Reading a particular column value from a file using Perl - linux

I am very new to Perl. I have a file called test.txt; the following shows the data in the file:
test count seed
in 5 100
checks 5 100
comb 5 100
reload 5 100
reset 5 100
There are 3 columns in the file: test, count and seed. First, I want to read the test name "in" and store it in one variable, and then store 5 and 100 in different variables, like $test = "in", $count = 5 and $seed = 100, because I will use these variables in one more file; that's why I'm storing them. After that, checks, comb, reload and reset should be handled likewise: each time it should read a test and store its values into the declared variables.
I'm able to read the file, but I'm not able to do this. Finally, I'll print those values to see whether they are stored or not.
Finally I worked out the answer myself; here is the code:
while ($line = <RD>) {
    if ($line =~ m/^\#\w+/) {
        # skip comment lines
    }
    elsif ($line =~ m/^\s*\w+/) {
        $line1 = $&;             # the text matched above
        $line1 =~ s/\s+//g;
        #print "$line1\n";
        push(@arr, split(" ", $line));
        $test_count = $arr[1];
        for ($i = 1; $i <= $test_count; $i = $i + 1) {
            print "make $arr[0] $arr[2] $arr[3] $arr[4]\n";
        }
        $#arr = -1;              # empty the array for the next line
    }
}

You say:
I want to store into one variable after that 5 and 100 into the
different variables like $test=in, $count=5 and $seed=100
I bet that's not what you want. I bet it will be far more useful for you to store the data in a complex data structure. Something like this:
my @data;
my @cols;

# Assuming you've opened the file and stored
# the filehandle in $fh
while (<$fh>) {
    # Skip lines without any data
    next unless /\S/;

    if ($. == 1) {
        @cols = split;
        next;
    }

    my %row;
    @row{@cols} = split;
    push @data, \%row;
}
As you've made no effort to attempt the problem (or, at least, you've not shown any evidence of any effort) I'm not going to spend time explaining my solution to you. You'll get some useful information from reading perldoc perldsc.
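For comparison, the same idea, a list of records keyed by the header row, can be sketched in Python (my illustration, not part of the original answer; it assumes the whitespace-separated test.txt shown in the question):
data = []
with open("test.txt") as fh:
    cols = fh.readline().split()          # ['test', 'count', 'seed']
    for line in fh:
        if not line.strip():
            continue                      # skip lines without any data
        data.append(dict(zip(cols, line.split())))

# e.g. data[0] == {'test': 'in', 'count': '5', 'seed': '100'}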


How to download the workspace into an Excel sheet where I can update a variable in Excel and have it update in MATLAB as well?

Suppose I have a generic MATLAB script as follows:
A = [1 2 0; 2 5 -1; 4 10 -1];
b = [1;3;5];
x = A\b;
r = A*x - b;
These variables will be stored in the Workspace. How can I download the Workspace variables (A, b, x, r) into an Excel sheet, such that I can modify the variables in Excel, upload that Excel sheet into the Current Folder in MATLAB, and have the Workspace updated with the changes I made in Excel? For example, I download the workspace in Excel, open the Excel sheet and change r = A*x - b to r = 'Hello World'. Then I upload that sheet into MATLAB, and the new r updates in the Workspace.
Please consider the following approach as a reference.
First, your arrays and operations can be defined as strings, which are then evaluated. Please take this part of my proposal with a grain of salt and make sure that the instructions you are evaluating are syntactically valid. Keep in mind that the eval function has its own risks.
% Clean your workspace
clear
close
clc
% Create your two arrays and additional variables
A = [1 2 0; 2 5 -1; 4 10 -1];
b = [1;3;5];
% Define all the necessary operations as strings. Make sure that these
% operations are absolutely valid before proceeding. Here you can spend
% some time defining some error-checking logic.
x_oper = "A\b";
r_oper = "A*x - b";
% To be safe, we evaluate if the instructions are valid,
% otherwise we throw an error --> typos and other stuff can go wrong!
try
    x = eval(x_oper); % be careful!
    r = eval(r_oper); % be careful!
    sprintf("Expressions successfully evaluated!")
catch err
    sprintf("Error evaluating expression >> %s\n", err.message)
end
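As an aside (my addition, not part of the original answer), the same guarded-eval pattern can be sketched in Python with NumPy, where np.linalg.solve plays the role of A\b:
import numpy as np

A = np.array([[1, 2, 0], [2, 5, -1], [4, 10, -1]], dtype=float)
b = np.array([1.0, 3.0, 5.0])

x_oper = "np.linalg.solve(A, b)"   # the equivalent of A\b
r_oper = "A @ x - b"

try:
    x = eval(x_oper)   # be careful: eval runs arbitrary code
    r = eval(r_oper)   # be careful!
    print("Expressions successfully evaluated!")
except Exception as err:
    print(f"Error evaluating expression >> {err}")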
The values and instructions can then be formatted as individual tables to be saved as .csv files, which can be read using Excel (or LibreOffice in my case).
Save your 'workspace' contents into two different files. For the sake of clarity, I am using one file for values and another one for operations.
% Define the two filenames
varsFile = "pseudo-workspace.csv"
operFile = "operations.csv"
% Convert variables and operations/instructions to tables
dataTable = table(A, b, x, r)
instrTable = table(x_oper, r_oper)
% Write the tables to their respective files
writetable(dataTable, varsFile)
writetable(instrTable, operFile)
(Screenshots omitted: dataTable holds the numeric values of A, b, x and r; instrTable holds the two operation strings.)
After this point, your work is saved in two different files and is ready to be edited elsewhere. Perhaps you want to share the files with someone else, or you don't have access to MATLAB on a different computer and need to change the operations and/or values. Then, in a different .m file, you read these files into your current workspace and assign them to the same variable names:
clear A % I only clear my variables since I am working on the same m file
clear b
clear x
clear r
% Now we read the values and operations from a previous session or
% externally edited in excel/text editor
rawValuesTable = readtable(varsFile)
% Retrieve the values from A and b from the table that we just read
A = [rawValuesTable.A_1, rawValuesTable.A_2, rawValuesTable.A_3];
b = rawValuesTable.b;
rawOperations = readtable(operFile);
% The operations are read as cell arrays, therefore we index
% with the suffix {1} to get plain strings to evaluate
try
    x = eval(rawOperations.x_oper{1})
    r = eval(rawOperations.r_oper{1})
    sprintf("Expressions successfully evaluated!")
catch err
    sprintf("Error evaluating expression >> %s\n", err.message)
end
Finally, you obtain the same output, granted nothing was changed.
You could execute both procedures (write/read) using two different functions. Once again, this is my take on your particular case, and you will surely come up with different ideas based on this.
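If you ever need to touch the saved CSV outside of Excel or MATLAB, here is a minimal Python sketch with pandas (my addition; the file name matches the one used above):
import pandas as pd

# Read the values written by writetable(), edit them, and write them back
# in the same layout so readtable() can pick them up again.
vals = pd.read_csv("pseudo-workspace.csv")
vals["b"] = vals["b"] * 2                # example edit
vals.to_csv("pseudo-workspace.csv", index=False)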

loop to read multiple files

I am using ObsPy's _read_segy function to read a SEGY file, using the following line of code:
line_1=_read_segy('st1.segy')
However, I have a large number of files in a folder, as follows:
st1.segy
st2.segy
st3.segy
.
.
st700.segy
I want to use a for loop to read the data, but I am new, so can anyone help me in this regard?
Currently I am using repeated lines to read the data, as follows:
line_1=_read_segy('st1.segy')
line_2=_read_segy('st2.segy')
The next step is to display the SEGY data using matplotlib; again, I am using the following lines of code on each line individually, which makes for far too much repeated work. Can someone help me with creating a loop to display the data and save the figures?
data=np.stack([t.data for t in line_1.traces])
vm=np.percentile(data,99)
plt.figure(figsize=(60,30))
plt.imshow(data.T, cmap='seismic',vmin=-vm, vmax=vm, aspect='auto')
plt.title('Line_1')
plt.savefig('Line_1.png')
plt.show()
Your kind suggestions will help me a lot as I am a beginner in python programming.
Thank you
If you want to reduce code duplication, you can use functions. And if you want to do something repeatedly, you can use loops. So you can call a function in a loop if you want to do this for all files.
Now, for reading the files in a folder, you can use Python's glob module. Something like below:
import glob

import matplotlib.pyplot as plt
import numpy as np
from obspy.io.segy.core import _read_segy

def save_fig(in_file_name, out_file_name):
    line_1 = _read_segy(in_file_name)
    data = np.stack([t.data for t in line_1.traces])  # np.stack needs a sequence, not a generator
    vm = np.percentile(data, 99)
    plt.figure(figsize=(60, 30))
    plt.imshow(data.T, cmap='seismic', vmin=-vm, vmax=vm, aspect='auto')
    plt.title(out_file_name)
    plt.savefig(out_file_name)
    plt.close()

segy_files = list(glob.glob(segy_files_path + "/*.segy"))
for index, file in enumerate(segy_files):
    save_fig(file, "Line_{}.png".format(index + 1))
segy_files_path is the folder where your files reside.
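One caveat (my addition, not part of the original answer): glob returns files in arbitrary order, and a plain lexicographic sort would put st10.segy before st2.segy. A small sketch of sorting the matches numerically, assuming the st<number>.segy naming from the question:
import glob
import os
import re

def numeric_key(path):
    # pull the integer out of names like 'st17.segy'
    match = re.search(r"(\d+)", os.path.basename(path))
    return int(match.group(1)) if match else 0

segy_files = sorted(glob.glob(segy_files_path + "/*.segy"), key=numeric_key)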
You just need to dynamically open the files in a loop. Fortunately they all follow the same naming pattern.
N = 700
for n in range(1, N + 1):  # the files are named st1.segy .. st700.segy
    line_n = _read_segy(f"st{n}.segy")  # Dynamic name.
    data = np.stack([t.data for t in line_n.traces])
    vm = np.percentile(data, 99)
    plt.figure(figsize=(60, 30))
    plt.imshow(data.T, cmap="seismic", vmin=-vm, vmax=vm, aspect="auto")
    plt.title(f"Line_{n}")
    plt.savefig(f"Line_{n}.png")  # save before show(), which clears the figure
    plt.show()
    plt.close()  # Needed if you don't want to keep 700 figures open.
I'll focus on addressing the file looping, as you said you're new and I'm assuming simple loops are something you'd like to learn about (the first example is sufficient for this).
If you'd like an answer to your second question, it might be worth providing some example data, the output (graph) of your current attempt, and a description of your desired output. If you provide that reproducible example and a clear description of the problem you're having, it'd be easier to answer.
Create a list (or other iterable) to hold the file names to read, and another container (maybe a dict) to hold the result of your read_segy.
files = ['st1.segy', 'st2.segy']
lines = {}  # creates an empty dictionary; dictionaries consist of key: value pairs

for f in files:  # f will first be 'st1.segy', then 'st2.segy'
    lines[f] = read_segy(f)
As stated in the comment by @Guimoute, if you want to generate the file names dynamically, you can build the files list by appending integers to the base file name.
lines = {}  # creates an empty dictionary; dictionaries have key: value pairs
missing_files = []

for i in range(1, 701):
    f = f"st{i}.segy"  # gives "st1.segy" for i = 1
    try:  # in case one of the files is missing or can't be read
        lines[f] = read_segy(f)
    except Exception:
        missing_files.append(f)  # store names of missing or unreadable files
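From there you could reuse the plotting code from the question on each entry of the dictionary, for example (a sketch; it assumes the numpy and matplotlib imports from the question):
for name, line in lines.items():
    data = np.stack([t.data for t in line.traces])
    vm = np.percentile(data, 99)
    plt.figure(figsize=(60, 30))
    plt.imshow(data.T, cmap='seismic', vmin=-vm, vmax=vm, aspect='auto')
    plt.title(name)
    plt.savefig(name.replace('.segy', '.png'))
    plt.close()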

Comparing two jumbled / unaligned text files

I have two text files containing IL code for example, both text files include 100% the same code:
http://pastebin.com/iDvbu1tD
http://pastebin.com/u5fi9NMh
They are, however, unaligned/jumbled, so it's not possible for me to find any differences. I am looking for a way to take such files, filter out the code that is the same, and only show the blocks that are different.
Is there a way to do such a task?
First the "TL;DR": the two pastes are in fact identical, as you say, except for a simple difference in the order of the methods, and a very minor formatting problem. Namely, the .method line of the middle method in each disassembly listing isn't indented by two spaces like the other two .method lines in the same file.
I copied the files to a Ubuntu Linux system, and took the liberty of fixing the minor .method formatting problem by hand in both of them.
Then I wrote some extraction code in the TXR language whose purpose is to normalize the files: it extracts the methods, identifying the name of each one, and then prints a reformatted version of the disassembly. These reformatted versions can then be compared.
In the reformatting, the methods are sorted by name. Also, the IL_XXXX offsets on the instructions are deleted, so that minor differences, like the insertion of an instruction sequence, will not result in huge differences due to the subsequent offsets changing. However, this is not demonstrated with the given data, since the code is in fact identical.
Here is what the reformatting of one of the inputs looks like, abbreviated manually:
$ txr norm.txr disasm-a
.method public hidebysig newslot virtual final instance bool IsRoomConnected() cil managed
{
// Code size 21 (0x15)
.maxstack 8
ldarg.0
ldfld class [UnityEngine]UnityEngine.AndroidJavaObject GooglePlayGames.Android.AndroidRtmpClient::mRoom
brfalse IL_0013
ldarg.0
[ ... snip ... ]
ldc.i4.0
ret
} // end of method AndroidRtmpClient::IsRoomConnected
[ ... other methods ... ]
The .method material is folded into one line; IsRoomConnected comes first because the other two functions start with P and O, respectively; and the offsets are gone from the instructions.
Comparing the reformatted versions can be done in the Linux environment in one step without temporary files thanks to Bash "process substitution" <(command ...) syntax. This syntax lets us use, in the place of a file name argument, a program which produces output. The program which is invoked thinks it is a file:
$ diff -u <(txr norm.txr disasm-a) <(txr norm.txr disasm-b)
# no output!
That is to say, when we diff the normalized versions of the two disassembly listings, there is no output: they are character-for-character identical!
A code listing of norm.txr follows:
@(collect)
@ (freeform)
.method @decl {
@ (bind pieces @(tok-str decl #/\S+/))
@ (bind name @(find-if (op search-str @1 "(") pieces))
@ (collect)
@ (cases)
IL_@offset: @line
@ (or)
@line
@ (end)
@ (last)
} // @eom
@ (end)
@(end)
@(set (name decl line eom) @(multi-sort (list name decl line eom)
[list less]))
@(output)
@ (repeat)
.method @decl
{
@ (repeat)
@line
@ (end)
} // @eom
@ (end)
@(end)
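If you don't have TXR handy, the normalization idea itself is portable. Here is a rough Python sketch (my addition; it assumes methods are delimited by '.method' and 'end of method' lines as in the listings above, and strips only the leading IL_xxxx: labels, so branch targets are preserved just as in the TXR version):
import re
import sys

def normalize(path):
    # Collect each method body, drop the leading IL_xxxx: labels so an
    # inserted instruction doesn't cascade into huge diffs, and sort the
    # methods by their header line for a stable order.
    methods, current = [], None
    with open(path) as fh:
        for line in fh:
            if line.lstrip().startswith(".method"):
                current = []
            if current is not None:
                current.append(re.sub(r"^(\s*)IL_[0-9A-Fa-f]+:\s*", r"\1", line))
            if "end of method" in line:
                methods.append("".join(current))
                current = None
    return sorted(methods)

for method in normalize(sys.argv[1]):
    print(method)
Saved as, say, norm.py, the two normalized outputs can then be compared exactly as above: diff -u <(python norm.py disasm-a) <(python norm.py disasm-b).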

how can I "help along" node's garbage collection when dealing with large arrays?

I have about 5 very large csv files that I need to parse, munge and insert into a database. The code looks approximately like this:
i = 0

processFile = (linecount, file, onDone) ->
  # process the csv as a stream
  # NOTE: **this is where the large array gets declared**
  # insert every relevant line into an array
  # process the array and insert it into the db (about 5k records at a time)
  # call onDone when db insert is done

getLinesAndProcess = (i, onDone) ->
  inputFile = inputFiles[i]
  if inputFile?
    getFileSizeAndProcess = -> # this helps the GC
      puts = (error, stdout, stderr) ->
        totalLines = stdout.split(" ")[0]
        processFile(totalLines, inputFile, ->
          getLinesAndProcess(++i)
        )
      console.log "processing: #{inputFile}"
      exec "wc -l '#{inputFile}'", puts
    setTimeout(getFileSizeAndProcess, 5000)

getLinesAndProcess(i, ->
  # close db connection, exit process and so on
)
The first million lines go fine and take about 3 minutes. Then it chunks along on the next record -- until node hits its memory limit (1.4GB), when it just crawls. The most likely thing is that v8's GC is not cleaning up the recordsToInsert array, even though it's in a closure that is done.
My solution in the short term is to just run one file at a time. That's fine, it works and so on, but I'm stuck on what to do to fix the multi-file problem. I've tried the --max-old-space-size=8192 fix from caustik's blog, but it hasn't helped -- node is still getting stuck at 1.4GB. I added in the setTimeout based on a suggestion in another SO post. It doesn't appear to help.
In the end, I just had to set the array back to an empty array before calling the callback. That works fine but it feels like v8's GC is failing me here.
Can I do anything to get v8 to be smarter about GC when dealing with large arrays?
Closed-over variables only seem like local variables because syntactically they look like locals that live on the stack.
However, they are very, very different.
If you had an object like this:
function Closure() {
    this.totalLines = [];
    this.otherVariable = 3;
}

var object = new Closure();
The majority of people would understand that totalLines will never be garbage collected as long as object lives on. And if they were done with totalLines long before they were done with the object, they would assign it to null, or at least understand that the garbage collector cannot collect it.
However, when it comes to actual JavaScript closures, it works exactly the same way, yet people find it odd that they have to explicitly set the closed-over variable to null, because the syntax is deceiving them.
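The same principle can be illustrated in Python (my sketch, not from the original answer): a closed-over variable keeps its object alive until the closure rebinds or drops it.
def make_processor():
    records = [0] * 10_000_000      # large list captured by the closure

    def process():
        nonlocal records
        total = sum(records)
        records = None              # release the list explicitly, the
                                    # analogue of resetting the array to []
        return total

    return process

proc = make_processor()
proc()   # after this call the big list is collectable, because the
         # closure cell no longer references it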

how can I print the columns in a text file row-wise using a unix command?

I have a file like this, for example:
number,dac,amountdac,expdate,0
1111,1,0.000000,2010-07-21,0
1111,2,0.000000,2010-07-21,0
1111,3,0.000000,2010-07-21,0
1111,4,0.000000,2010-07-21,0
1111,5,0.000000,2010-07-21,0
1111,6,0.000000,2010-07-21,0
1111,7,0.000000,2010-07-21,0
1111,8,0.000000,2010-07-21,0
1111,9,0.000000,2010-07-21,0
1111,10,0.000000,2010-07-21,0
2222,1,50.000000,2010-07-21,0
2222,2,0.000000,2010-07-21,0
2222,3,0.000000,2010-07-21,0
2222,4,0.000000,2010-07-21,0
2222,5,0.000000,2010-07-21,0
2222,6,0.000000,2010-07-21,0
2222,7,0.000000,2010-07-21,0
2222,8,10.000000,2010-07-21,0
2222,9,0.000000,2010-07-21,0
2222,10,0.000000,2010-07-21,0
3333,1,0.000000,2010-07-21,0
3333,2,0.000000,2010-07-21,0
3333,3,0.000000,2010-07-21,0
3333,4,0.000000,2010-07-21,0
3333,5,0.000000,2010-07-21,0
3333,6,0.000000,2010-07-21,0
3333,7,0.000000,2010-07-21,0
3333,8,0.000000,2010-07-21,0
3333,9,200.000000,2010-07-21,0
3333,10,50.000000,2010-07-21,0
I want output like this; the column 1 number is the same for all of dac1 to dac10. I give the header just for your reference; the original file has no header.
number,dac1,dac2,dac3,dac4,dac5,dac6,dac7,dac8,dac9,dac10,amountdac1,amountdac2,amountdac3,amountdac4,amountdac5,amountdac6,amountdac7,amountdac8,amountdac9,amountdac10,expdate1,expdate2,expdate3,expdate4,expdate5,expdate6,expdate7,expdate8,expdate9,expdate10,0
1111,1,2,3,4,5,6,7,8,9,10,0.000000,0.000000,0.000000,0.000000,0.000000,0.000000,0.000000,0.000000,0.000000,0.000000,2010-07-21,2010-07-21,2010-07-21,2010-07-21,2010-07-21,2010-07-21,2010-07-21,2010-07-21,2010-07-21,2010-07-21,0
2222,1,2,3,4,5,6,7,8,9,10,50.000000,0.000000,0.000000,0.000000,0.000000,0.000000,0.000000,10.000000,0.000000,0.000000,2010-07-21,2010-07-21,2010-07-21,2010-07-21,2010-07-21,2010-07-21,2010-07-21,2010-07-21,2010-07-21,2010-07-21,0
3333,1,2,3,4,5,6,7,8,9,10,0.000000,0.000000,0.000000,0.000000,0.000000,0.000000,0.000000,0.000000,200.000000,50.000000,2010-07-21,2010-07-21,2010-07-21,2010-07-21,2010-07-21,2010-07-21,2010-07-21,2010-07-21,2010-07-21,2010-07-21,0
awk -F"," '{
a[$1];
b[$1]=b[$1]","$2
c[$1]=c[$1]","$3
d[$1]=d[$1]","$4
e[$1]=e[$1]","$5 }
END{ for(i in a){ print i,b[i],c[i],d[i],e[i] } } ' file
You could write a python script to break that up:
numbers = []
dacs = []
amountdacs = []
expdates = []

with open('file') as text:
    for row in text:
        number, dac, amountdac, expdate, zero = row.strip().split(',')
        numbers.append(number)
        dacs.append(dac)
        amountdacs.append(amountdac)
        expdates.append(expdate)

# print things out however you want them
You could probably do something similar in Perl, if you're more facile with it than I am.
Basically, the idea is that you need to transpose the data. Stack Overflow has a similar question with a very good solution, so the only task left is to use your scripting skills to:

take a chunk of data, i.e., 10 rows at a time

remove the first column in those 10 rows and transpose the data

add the first column value (here 1111, 2222 or 3333) back in front

All three steps should be repeated for all the rows in the input file. I guess half of the solution is provided and you can manage the rest with simple scripting; a sketch of these steps follows.
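Here is a minimal Python sketch of those three steps (my addition; it assumes the comma-separated, five-column layout from the question and groups on the first column instead of counting exactly 10 rows):
groups = {}   # first column -> list of the remaining fields, row by row
with open('file') as fh:
    for line in fh:
        number, dac, amountdac, expdate, zero = line.strip().split(',')
        groups.setdefault(number, []).append((dac, amountdac, expdate, zero))

for number, rows in groups.items():
    dacs, amounts, expdates, zeros = zip(*rows)   # transpose the chunk
    print(','.join((number,) + dacs + amounts + expdates + (zeros[0],)))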
