I have a file as below:
number=49090005940;
NUMBER TRANSLATION DATA
NUMBER DATA
NUMBER TYPE SUBCOND
49090005940 IN
NUMPRE
1117230111
END
number=49090005942;
NUMBER TRANSLATION DATA
NUMBER DATA
NUMBER TYPE SUBCOND
49090005942 IN
NUMPRE
1117230111
END
I want to have an output with NUMBER, TYPE, and NUMPRE as below:
NUMBER=49090005940; TYPE=IN; NUMPRE=1117230111;
NUMBER=49090005942; TYPE=IN; NUMPRE=1117230111;
This is a mouthful, but it works.
awk '/number=/{split($0, a, "[=;]"); num=a[2]} nextrec==1 && /[^ ]/{pre=$0; nextrec=0} /NUMPRE/{nextrec=1} $1==num{ty=$2} /END/{print "NUMBER="num"; TYPE="ty"; NUMPRE="pre";"}' infile
Here is what the awk does:
If it finds a record matching "number=" (/number=/), it splits the record on an equals sign or semicolon and stores the pieces in array a (split($0, a, "[=;]")). It then puts the second element of the array, the number itself, into variable num (num=a[2]).
It looks for a line containing the word NUMPRE (/NUMPRE/); if it finds one, it sets variable nextrec to 1 (nextrec=1).
If nextrec is set to 1 and the record contains at least one non-blank character (nextrec==1 && /[^ ]/), it stores that line, the NUMPRE value, in variable pre (pre=$0) and resets nextrec to 0.
If the line starts with what is stored in num ($1==num), it stores the second field of that record in variable ty (ty=$2).
Finally, if we hit a record containing END (/END/), it prints the desired output (print "NUMBER="num"; TYPE="ty"; NUMPRE="pre";").
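The same script written out over several lines with comments may be easier to follow (functionally identical to the one-liner above):
awk '
/number=/ {                      # "number=..." line: grab the value after the =
    split($0, a, "[=;]")
    num = a[2]
}
nextrec == 1 && /[^ ]/ {         # first non-blank line after NUMPRE: the prefix value
    pre = $0
    nextrec = 0
}
/NUMPRE/ { nextrec = 1 }         # the next non-blank line holds the NUMPRE value
$1 == num { ty = $2 }            # the "49090005940 IN" line: grab the TYPE
/END/ {                          # end of a block: print the collected values
    print "NUMBER=" num "; TYPE=" ty "; NUMPRE=" pre ";"
}' infile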
With GNU awk for multi-char RS (this assumes the blocks in the real file are separated by blank lines, which the sample above may have lost):
$ awk -v RS='\n\n\n' -v OFS='; ' -v ORS=';\n' '{print $7"="$10, $8"="$11, $12"="$13 }' file
NUMBER=49090005940; TYPE=IN; NUMPRE=1117230111;
NUMBER=49090005942; TYPE=IN; NUMPRE=1117230111;
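If the blocks in your file are not separated by blank lines, a variant of the same idea (a sketch, assuming END appears only as the block terminator; still GNU awk, since a multi-character RS is treated as a regexp) is to use END itself as the record separator and skip the empty trailing record:
awk -v RS='END' -v OFS='; ' -v ORS=';\n' 'NF{print $7"="$10, $8"="$11, $12"="$13}' file
which should produce the same two output lines as above.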
What I am trying to do is show the 2 rows above and the 2 rows below a line that meets a certain criterion, using awk and without a pipe. For example, when I search for the string 's62234' I want to print the matching row together with the two rows before it and the two rows after it (the desired output is shown further down).
This is the file I am using (thefmifile.txt)
s62098:x:1271:504:Velizar Vrabchev,SI,3,1:/home/SI/s62098:/bin/bash
s62101:x:1272:504:Georgi Georgiev,SI,3,5:/home/SI/s62101:/bin/bash
s62108:x:1273:504:Sherif Kunch,SI,3,1:/home/SI/s62108:/bin/bash
s62111:x:1274:504:Yulian Bizeranov,SI,3,3:/home/SI/s62111:/bin/bash
s62121:x:1275:504:Daniel Dimitrov,SI,2,1:/home/SI/s62121:/bin/bash
s62133:x:1276:504:Ivaylo Ivanov,SI,2,2:/home/SI/s62133:/bin/bash
s62160:x:1277:504:Veniyana Tsolova,SI,2,3:/home/SI/s62160:/bin/bash
s62199:x:1278:504:Nikola Petrov,SI,2,5:/home/SI/s62199:/bin/bash
s62219:x:1279:504:Viliyan Ivanov,SI,2,6:/home/SI/s62219:/bin/bash
s62234:x:1280:504:Viktoriya Dobreva,SI,2,3:/home/SI/s62234:/bin/bash
s855264:x:1281:504:Toni Dupkarski,SI,4,2:/home/SI/s855264:/bin/bash
s81555:x:1282:503:Elena Georgieva,KN,2,0:/home/KN/s81555:/bin/bash
s81585:x:1283:503:Stela Marinova,KN,2,0:/home/KN/s81585:/bin/bash
s81441:x:1284:503:Vesela Plamenova Borislavova , KN, k2, g7:/home/KN/s81441:/bin/bash
s81644:x:1285:503:Viktor Rusev, KN, k2, g7:/home/KN/s81644:/bin/bash
s81628:x:1286:503:Iliyan Yordanov Yordanov, KN, k2, g6:/home/KN/s81628:/bin/bash
s81490:x:1287:503:Yana Spasova, KN, k2, g6:/home/KN/s81490:/bin/bash
What I have tried is using awk to find the row that meets the criterion and NR to get the numbers of the other rows needed, but it seems I am missing something.
Here is the command I used:
cat thefmifile.txt | awk -F ':' '$1==s62234 {for (x = NR -2; x <= NR + 2; x++){print}}'
The output was not what I expected.
And this is the desired output:
s62199:x:1278:504:Nikola Petrov,SI,2,5:/home/SI/s62199:/bin/bash
s62219:x:1279:504:Viliyan Ivanov,SI,2,6:/home/SI/s62219:/bin/bash
s62234:x:1280:504:Viktoriya Dobreva,SI,2,3:/home/SI/s62234:/bin/bash
s855264:x:1281:504:Toni Dupkarski,SI,4,2:/home/SI/s855264:/bin/bash
s81555:x:1282:503:Elena Georgieva,KN,2,0:/home/KN/s81555:/bin/bash
When the action is {print x} it shows the numbers of the lines I need, but is there some way to access the lines of the file as elements of an array and just use this x as an index (e.g. something like NR[x])?
Or Is there some other way to retrieve these rows?
Thank you!
$ awk -v n=2 -F':' '$1=="s62234"{for (i=0;i<n;i++) print buf[(NR+i)%n]; c=n+1} c&&c--; {buf[NR%n]=$0}' file
s62199:x:1278:504:Nikola Petrov,SI,2,5:/home/SI/s62199:/bin/bash
s62219:x:1279:504:Viliyan Ivanov,SI,2,6:/home/SI/s62219:/bin/bash
s62234:x:1280:504:Viktoriya Dobreva,SI,2,3:/home/SI/s62234:/bin/bash
s855264:x:1281:504:Toni Dupkarski,SI,4,2:/home/SI/s855264:/bin/bash
s81555:x:1282:503:Elena Georgieva,KN,2,0:/home/KN/s81555:/bin/bash
buf[] is just an array storing the n lines preceding the current line, so those can be printed when your $1=="s62234" condition is met. c&&c--; is a condition that is true while c is greater than zero, and a true condition with no action makes awk print the current line (the default action). Because c is set to n+1 when your condition is met, that prints the matching line and then keeps printing, decrementing c on each record, until c reaches zero n lines later.
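Spread over several lines with comments, the same one-liner looks like this (functionally identical):
awk -v n=2 -F':' '
$1 == "s62234" {                 # the line we are searching for
    for (i = 0; i < n; i++)      # print the n buffered lines, oldest first
        print buf[(NR + i) % n]
    c = n + 1                    # arrange for this line plus n more to print
}
c && c--                         # true while c > 0: default action prints the line
{ buf[NR % n] = $0 }             # always remember the current line for later
' file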
You could try the following; a simple grep can handle this task.
grep -A2 -B2 '^s62234:' Input_file
Equivalently, you can combine the two options with -C, which prints two lines of context on both sides of the match:
grep -C2 '^s62234:' Input_file
That's easily doable with grep:
-B, --before-context=NUM print NUM lines of leading context
-A, --after-context=NUM print NUM lines of trailing context
-C, --context=NUM print NUM lines of output context
-NUM same as --context=NUM
With awk, you could do something like this:
awk -F ':' '$1=="s62234"{print l2;print l1;a=3}a&&a-->0{print}{l2=l1;l1=$0}' thefmifile.txt
You can handle the number of before-lines dynamically by storing them in an array and using a loop.
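A sketch of that array-based approach (the b and a parameters and the prev array are illustrative names, not from an existing answer; note that it keeps every line read so far in memory):
awk -v b=2 -v a=2 -F':' '
$1 == "s62234" {                      # found the target line
    for (i = FNR - b; i < FNR; i++)   # print the stored before-lines
        if (i in prev) print prev[i]
    after = a + 1                     # this line plus a following lines
}
after && after--                      # true while the counter is positive: print
{ prev[FNR] = $0 }                    # remember every line by its number
' thefmifile.txt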
I have a file with lines like the ones below
12345343|559|-2,0,-200000,-20|20161108000000|FL|62,859,1439,1956|0,0,21300,0|S
7778880|123|500,100|20161108000000|AL|21,135|3|S
I'm looking for a way to split each line into multiple records, pairing up the comma-separated values in the 3rd field with those in the 6th field.
Required output:
12345343|559|-2|20161108000000|FL|62|0,0,21300,0|S
12345343|559|0|20161108000000|FL|859|0,0,21300,0|S
12345343|559|-200000|20161108000000|FL|1439|0,0,21300,0|S
12345343|559|-20|20161108000000|FL|1956|0,0,21300,0|S
7778880|123|500|20161108000000|AL|21|3|S
7778880|123|100|20161108000000|AL|135|3|S
This might work for you (GNU sed):
sed -r 's/^(([^|]*\|){2})([^,|]*),([^|]*)(\|([^|]*\|){2})([^,|]*),([^|]*)(.*)/\1\3\5\7\9\n\1\4\5\8\9/;P;D' file
Iteratively split the current line in two, building a pair of lines separated by a newline: the first gets the head of the 3rd and 6th fields, the second gets their tails. Print and then delete the first of the two lines, and repeat until the comma-separated lists in the 3rd and 6th fields are consumed.
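For the first input line the loop proceeds like this: after the first substitution the pattern space holds two lines,
12345343|559|-2|20161108000000|FL|62|0,0,21300,0|S
12345343|559|0,-200000,-20|20161108000000|FL|859,1439,1956|0,0,21300,0|S
P prints the first one, D deletes it and re-runs the script on the second, and this repeats until the 3rd field no longer contains a comma, at which point the substitution fails, the remaining line is printed as-is and the next input line is read.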
You can use this awk
awk -F'|' -vOFS='|' '{a="";b=split($3,c,",");split($6,d,",");for(e=1;e<=b;e++){if(a)a=a RS;$3=c[e];$6=d[e];a=a$0};print a}' infile
Explanation:
awk -F'|' -vOFS='|' ' # fields are separated by | for both input and output
{
a=""
b=split($3,c,",") # split field 3 in array c
# b is the number of elements
split($6,d,",") # split field 6 in array d
for(e=1;e<=b;e++) # for each element of array c and d
{
if(a) # if a is defined, append RS (\n) at the end
a=a RS
$3=c[e]
$6=d[e] # substitute fields 3 and 6 with the value of array c and d
a=a$0 # append the complete line to a
}
print a # at the end of the loop print a
}
' infile
I am facing a problem extracting a specific value from a .txt file using grep and awk.
I show below an excerpt from the .txt file:
"-
bravais-lattice index = 2
lattice parameter (alat) = 10.0000 a.u.
unit-cell volume = 250.0000 (a.u.)^3
number of atoms/cell = 2
number of atomic types = 1
number of electrons = 28.00
number of Kohn-Sham states= 18
kinetic-energy cutoff = 60.0000 Ry
charge density cutoff = 300.0000 Ry
convergence threshold = 1.0E-09
mixing beta = 0.7000
I also defined some variables: ELEMENT and lat.
I want to extract the "unit-cell volume" value which is equal to 250.00.
I tried the following to extract the value using grep and awk:
volume=`grep "unit-cell volume" ./latt.10/$ELEMENT.scf.latt_$lat.out | awk '{printf "%15.12f\n",$5}'`
However, when I run the bash file I always get 00.000000 as a result instead of the correct value of 250.00.
Can anyone help, please?
Thanks in advance.
awk '{printf "%15.12f\n",$5}'
You're asking awk to print out the fifth field of the line ($5).
unit-cell volume  =  250.0000  (a.u.)^3
   $1       $2   $3     $4        $5
The fifth field is (a.u.)^3, which you are then asking awk to interpret as a number via the %f format code. It's not a number, though (or actually, doesn't start with a number), and when awk is asked to treat a non-numeric string as a number, it uses 0 instead. Thus it prints 0.
Solution: use $4 instead.
By the way, you can skip invoking grep by using awk itself to select the line, e.g.
awk '/^ unit-cell/ {...}' file
The /^ unit-cell/ is a regular expression that matches "unit-cell" (with a leading space) at the beginning of the line. Adjust as necessary if you have other lines that start with unit-cell which you don't want to select.
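Applied to the command from the question, that gives something like this (a sketch keeping the question's path and variables; the %.4f precision is just an example):
volume=$(awk '/^ *unit-cell volume/ {printf "%.4f\n", $4}' "./latt.10/$ELEMENT.scf.latt_$lat.out")
echo "$volume"    # should print 250.0000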
You never need grep when you're using awk since awk can do anything useful that grep can do. It sounds like this is all you need:
$ awk -F'=' '/unit-cell volume/{printf "%.2f\n",$2}' file
250.00
The above works because, with FS set to =, $2 is <spaces>250.0000 (a.u.)^3, and when awk is asked to convert a string to a number it strips off leading spaces and ignores everything after the numeric part, which leaves 250.0000 to be formatted by %.2f.
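You can see that conversion rule on its own (a small illustrative check, not part of the original answer):
$ awk 'BEGIN{printf "%.2f\n", "   250.0000 (a.u.)^3" + 0}'
250.00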
In the script you posted $5 was failing because the 5th space-separated field in:
$1 $2 $3 $4 $5
<unit-cell> <volume> <=> <250.0000> <(a.u.)^3>
is (a.u.)^3 - you could have just added print $5 to see that.
Since you are processing key-value pairs where the key can contain a variable amount of space, you would need to tune that field number ($4, $5, etc.) separately for each record you want to process, unless you set the field separator (FS) appropriately to FS=" *= *". Then the key will always be in $1 and the value in $2.
Then use split to split the value and unit parts from each other.
Also, you can lose that grep by defining in awk a pattern (or condition, /unit-cell volume/) for that print action:
$ awk 'BEGIN{FS=" *= *"} /unit-cell volume/{split($2,a," +");print a[1]}' file
250.0000
Explained:
$ awk '
BEGIN { FS=" *= *" } # set appropriate field separator
/unit-cell volume/ { # pattern or condition
split($2,a," +") # split value part to value and possible unit parts
print a[1] # output value part
}' file
The title is probably not very well worded, but I currently need to script a search that finds a given string in a CSV, then parses the line that's found and does another grep with an element within that line.
Example:
KEY1,TRACKINGKEY1,TRACKINGNUMBER1-1,PACKAGENUM1-1
,TRACKINGKEY1,TRACKINGNUMBER1-2,PACKAGENUM1-2
,TRACKINGKEY1,TRACKINGNUMBER1-3,PACKAGENUM1-3
,TRACKINGKEY1,TRACKINGNUMBER1-4,PACKAGENUM1-4
,TRACKINGKEY1,TRACKINGNUMBER1-5,PACKAGENUM1-5
KEY2,TRACKINGKEY2,TRACKINGNUMBER2-1,PACKAGENUM2-1
KEY3,TRACKINGKEY3,TRACKINGNUMBER3-1,PACKAGENUM3-1
,TRACKINGKEY3,TRACKINGNUMBER3-2,PACKAGENUM3-2
What I need to do is grep the .csv file for a given key [key1 in this example] and then grab TRACKINGKEY1 so that I can grep the remaining lines. Our shipping software doesn't output the packingslip key on every line, which is why I have to first search by KEY and then by TRACKINGKEY in order to get all of the tracking numbers.
So using KEY1 initially I eventually want to output myself a nice little string like "TRACKINGNUMBER1-1;TRACKINGNUMBER1-2;TRACKINGNUMBER1-3;TRACKINGNUMBER1-4;TRACKINGNUMBER1-5"
$ awk -v key=KEY1 -F, '$1==key{f=1} ($1!~/^ *$/)&&($1!=key){f=0} f{print $3}' file
TRACKINGNUMBER1-1
TRACKINGNUMBER1-2
TRACKINGNUMBER1-3
TRACKINGNUMBER1-4
TRACKINGNUMBER1-5
glennjackman helpfully points out that by using a "smarter" value for FS the internal logic can be simpler.
awk -v key=KEY1 -F' *,' '$1==key{f=1} $1 && $1!=key{f=0} f{print $3}' file
-v key=KEY1 assign the value KEY1 to the awk variable key
-F' *,' assign the value ' *,' (a regular expression matching optional spaces followed by a comma) to the awk FS variable (controls field splitting)
$1==key{f=1} if the first field of the line is equal to the value of the key variable (KEY1) then assign the value 1 to the variable f (find our first desired key line)
$1 && $1!=key{f=0} if the first field has a truth-y value (in awk a non-zero, non-empty string) and the value of the first field is not equal to the value of the key variable assign the value 0 to the variable f (find the end of our keyless lines)
f{print $3} if the variable f has a truth-y value (remember non-zero, non-empty string) then print the third field of the line
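If you want the single semicolon-separated string from the question rather than one tracking number per line, the same logic can collect the values and print them at the end (a sketch building on the FS=' *,' version above):
awk -v key=KEY1 -F' *,' '$1==key{f=1} $1 && $1!=key{f=0} f{s = s (s ? ";" : "") $3} END{print s}' file
which should print TRACKINGNUMBER1-1;TRACKINGNUMBER1-2;TRACKINGNUMBER1-3;TRACKINGNUMBER1-4;TRACKINGNUMBER1-5.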
awk '/KEY1/ {print $3}' FS=, file
Result
TRACKINGNUMBER1-1
TRACKINGNUMBER1-2
TRACKINGNUMBER1-3
TRACKINGNUMBER1-4
TRACKINGNUMBER1-5
$ sed -nr '/^KEY1/,/^KEY/{/^(KEY1|[ ,])/!d;s/.*(TRACKINGNUMBER[^,]+).*/\1/p}' input
TRACKINGNUMBER1-1
TRACKINGNUMBER1-2
TRACKINGNUMBER1-3
TRACKINGNUMBER1-4
TRACKINGNUMBER1-5
One more awk
awk -F, '/^KEY1,/,/^KEY2,/{if (!/^KEY2,/) print $3}' file
or, given the sample data (GNU awk for the 3-argument match()):
awk 'match($0,/([^,]+NUMBER1[^,]+)/,a){print a[0]}' file
or even
awk -F, '$3~/NUMBER1/&&$0=$3' file
I have a flat file separated by | that I want to update using information already inside the file. I want to fill in the third field using the first and second fields: when comparing first fields, the last two digits should be ignored, and the second field must match exactly. I do not want to create a new flat file; I want to update the existing one. I researched a way to pull out the first two fields from the file, but I do not know whether that is even helpful for what I am trying to achieve. To sum it up, I want to compare the first and second fields against the other lines in the file so I can fill in the third field where it is missing.
awk -F'|' -v OFS='|' '{sub(/[0-9 ]+$/,"",$1)}1 {print $1 "\t" $2}' tstfile
first field|second field|third field
Original input:
t1ttt01|/a1
t1ttt01|/b1
t1ttt01|/c1
t1ttt03|/a1|1
t1ttt03|/b1|1
t1ttt03|/c1|1
l1ttt03|/a1|3
l1ttt03|/b1|3
l1ttt03|/c1|3
What it should do:
t1ttt03|/a1|1 matches t1ttt01|/a1
because, ignoring the last two digits of the first field, the comparison is t1ttt|/a1 = t1ttt|/a1.
Therefore
t1ttt01|/a1 becomes t1ttt01|/a1|1
What I want the Output to look like:
t1ttt01|/a1|1
t1ttt01|/b1|1
t1ttt01|/c1|1
t1ttt03|/a1|1
t1ttt03|/b1|1
t1ttt03|/c1|1
l1ttt03|/a1|3
l1ttt03|/b1|3
l1ttt03|/c1|3
One way with awk:
awk '
# set the input and output field separator to "|"
BEGIN{FS=OFS="|"}
# Do this only while reading the file the first time (NR==FNR) and only for
# lines with 3 fields: strip the trailing digits from field 1 and use it,
# together with field 2, as an array key whose value is field 3.
NR==FNR&&NF==3{sub(/[0-9]+$/,"",$1);a[$1$2]=$3;next}
# On the second pass, for lines with 2 fields: save the line, strip the
# trailing digits from field 1, and look the key up in the array. If it is
# there, print the saved line followed by "|" and the stored third field.
NF==2{line=$0;sub(/[0-9]+$/,"",$1);if($1$2 in a){print line OFS a[$1$2]};next}
# The trailing 1 is an always-true condition whose default action prints
# every remaining line (the 3-field lines) unchanged.
1
' file file
Test:
$ cat file
t1ttt01|/a1
t1ttt01|/b1
t1ttt01|/c1
t1ttt03|/a1|1
t1ttt03|/b1|1
t1ttt03|/c1|1
l1ttt03|/a1|3
l1ttt03|/b1|3
l1ttt03|/c1|3
$ awk 'BEGIN{FS=OFS="|"}NR==FNR&&NF==3{sub(/[0-9]+$/,"",$1);a[$1$2]=$3;next}NF==2{line=$0;sub(/[0-9]+$/,"",$1);if($1$2 in a){print line OFS a[$1$2]};next}1' file file
t1ttt01|/a1|1
t1ttt01|/b1|1
t1ttt01|/c1|1
t1ttt03|/a1|1
t1ttt03|/b1|1
t1ttt03|/c1|1
l1ttt03|/a1|3
l1ttt03|/b1|3
l1ttt03|/c1|3
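Since the goal is to update the existing file rather than create a new one, one way (a sketch; the .tmp name is just an example) is to write the result to a temporary file and move it back over the original:
awk 'BEGIN{FS=OFS="|"}NR==FNR&&NF==3{sub(/[0-9]+$/,"",$1);a[$1$2]=$3;next}NF==2{line=$0;sub(/[0-9]+$/,"",$1);if($1$2 in a){print line OFS a[$1$2]};next}1' file file > file.tmp &&
mv file.tmp file
The temporary file keeps both passes reading the original data, which is why in-place editing is avoided here.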