How to pass column name of a file as variable in awk - linux

I am trying to print the data in particular columns by passing them into the awk command.
I have tried using "-v" to set the column list as a variable, but awk treats the "$" as a literal string. My delimiter is the special character ^A (typed as Ctrl+V Ctrl+A).
vi test_file.dat
a^Ab^Ac^Ad^Ae^Af^Ag^Ah^Ai^Aj^Ak^Al^Am^An^Ao^Ap
Working code
awk -F'^A' '{print $2,$5,$7}' test_file.dat
It prints
b e g
But if I try
export fields='$2,$5,$7'
export file='test_file.dat'
awk -v columns="$fields" -F'^A' '{print columns}' "$file"
It prints
$2 $5 $7
I expect the output as
b e g
And I want to pass the delimiter, columns, file name as a parameter like
export fields='$2,$5,$7'
export file='test_file.dat'
export delimiter='^A'
awk -v columns="$fields" -v file_delimiter="$delimiter" -F'file_delimiter' '{print columns}' "$file"

In awk, the $ symbol is effectively a unary operator that takes a field number as its argument. That argument can be any expression, which is why $NF works for denoting the last field: NF is evaluated and its value is handed to the $ operator. So, as you can see, we should not include the dollar sign in the exported field list.
If you're using the environment to pass material to Awk, the right thing to do is to have Awk pick it up from the environment.
The environment can be accessed using the ENVIRON associative array. If a variable called delimiter holds the field separator, we might do something like
BEGIN { FS = ENVIRON["delimiter"] }
in the Awk code. Then we aren't dealing with yet another round of shell parameter interpolation issues.
We can pick up the field numbers similarly. The split function can be used to get them into an array. Refer to this one-liner:
$ fields=1,3,9 awk 'BEGIN { split(ENVIRON["fields"], f, ",") ;
for (i in f)
printf("f[%d] = %d\n", i, f[i]) ; }'
f[1] = 1
f[2] = 3
f[3] = 9
In GNU Awk, the expression length(f) gives the number of fields.
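Putting those pieces together, a minimal sketch (the file and variable names are illustrative) that takes both the delimiter and the field list from the environment:

```shell
# Build a sample file whose delimiter is a literal Ctrl-A (\001)
printf 'a\001b\001c\001d\001e\001f\001g\n' > sample.dat

# Both values travel through the environment; awk reads them via
# ENVIRON, so no extra round of shell interpolation is involved.
export delimiter=$'\001'
export fields='2,5,7'

awk 'BEGIN { FS = ENVIRON["delimiter"]; n = split(ENVIRON["fields"], f, ",") }
     { for (i = 1; i <= n; i++) printf "%s%s", $(f[i]), (i < n ? OFS : ORS) }' sample.dat
# prints: b e g
```

Note that the exported field list here is plain numbers (2,5,7) with no dollar signs; the $ is applied inside awk via $(f[i]).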

In order to get awk to see the special characters while reading the file you could use cat -v file (there might be a built-in method, although I'm not aware of one). Then the key to getting the ^A (Control-A) delimiter recognized is to escape it with a \, since otherwise the regex engines of awk, gawk, etc. treat ^ as the start-of-line anchor.
export fields='$2,$5,$7'
export test_file='test_file.dat'
export delimiter='\\^A'
awk -F "$delimiter" '{ print '"$fields"' }' < <(cat -v "$test_file")
There's also no need to set awk variables for bash variables that have already been set in the shell; you can interpolate them directly, so you can essentially eliminate all of the -v assignments.
One thing to note, if you did want to set them in awk, is that a single columns variable wouldn't work, because awk variables are usually assigned from bash variables individually. For example, -v var1='$2' -v var2='$5' -v var3='$7' would give you { print var1, var2, var3 } in awk, and even then each variable would hold a literal string like $2. It's doubtful that a single string can be translated into three field references without additional steps.
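Instead of round-tripping the file through cat -v, most modern shells also accept ANSI-C quoting ($'\001') for the literal Control-A byte, which lets the interpolation approach work directly on the raw file; a sketch (file name illustrative):

```shell
# Sample file with literal Ctrl-A (\001) separators
printf 'a\001b\001c\001d\001e\001f\001g\n' > test_file.dat

fields='$2,$5,$7'
delimiter=$'\001'     # a literal Control-A via ANSI-C quoting

# The field list is spliced into the awk program text by the shell,
# so awk actually sees the program: { print $2,$5,$7 }
awk -F "$delimiter" '{ print '"$fields"' }' test_file.dat
# prints: b e g
```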

Related

Use sed to find and replace a number followed by its successor in bash

I have a string that contains multiple occurrences of number ranges, which are separated by a comma, e.g.,
2-12,59-89,90-102,103-492,593-3990,3991-4930
Now I would like to merge all directly neighbouring ranges in the string, i.e., remove anything that is of the form -(x),(x+1), to get something like this:
2-12,59-492,593-4930
Can anyone think of a method to accomplish this? I can honestly not post anything that I have tried, because all my tries were highly unsuccessful. To me it seems like it is not possible to actually find anything of the form -(x),(x+1) using sed, since that would require doing operations or comparisons of a found number by another number that has to be part of the command that is currently searching for numbers.
If everybody agrees that sed is NOT the correct tool for doing this, I will do it another way, but I am still interested if it's possible.
with awk
awk -F, -v RS="-" -v ORS="-" '$2!=$1+1' file
With the appropriate separator settings, this prints the record when the second field is not the first field plus one.
RS is the record separator and ORS is the output record separator.
test:
> awk -F, -v RS="-" -v ORS="-" '$2!=$1+1' <<< "2-12,59-89,90-102,103-492,593-3990,3991-4930"
2-12,59-492,593-4930
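To see why the RS trick works, it helps to print the records awk actually receives when - is the record separator; each record then pairs the end of one range with the start of the next, which is exactly what $1 and $2 compare:

```shell
printf '2-12,59-89,90-102' |
  awk -v RS="-" '{ printf "record %d: [%s]\n", NR, $0 }'
# record 1: [2]
# record 2: [12,59]
# record 3: [89,90]
# record 4: [102]
```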
awk solution:
awk -F'-' '{ r=$1;
             for (i=2; i<=NF; i++) {
                 split($i, a, ",");
                 r=sprintf("%s%s", r, a[2]-a[1]==1 ? "" : FS $i)
             }
             print r
           }' file
-F'-' - treat -(hyphen) as field separator
r - resulting string
split($i, a, ",") - split adjacent range boundaries into array a by separator ,
a[2]-a[1]==1 - crucial condition, reflects (x),(x+1)
The output:
2-12,59-492,593-4930
This might work for you (GNU sed):
sed -r ' s/^/\n/;:a;ta;s/\n([^-]*-)([0-9]*)(.*,)/\1\n\2\n\2\n\3/;Td;:b;s/(\n.*\n.*)9(_*\n)/\1_\2/;tb;s/(\n.*\n)(_*\n)/\10\2/;s/$/\n0123456789/;s/(\n.*\n[0-9]*)([0-8])(_*\n.*)\n.*\2(.).*/\1\4\3/;:z;tz;s/(\n.*\n[^_]*)_([^\n]*\n)/\10\2/;tz;:c;tc;s/([0-9]*-)\n(.*)\n(.*)\n,(\3)-/\n\1/;ta;s/\n(.*)\n.*\n,/\1,\n/;ta;:d;s/\n//g' file
This proof-of-concept sed solution iteratively increments and compares the end of one range with the start of the next. If the comparison is true it removes both and repeats; otherwise it moves on to the next range, repeating until all ranges have been compared.

How to extract specific value using grep and awk?

I am facing a problem to extract a specific value in a .txt file using grep and awk.
I show below an excerpt from the .txt file:
"-
bravais-lattice index = 2
lattice parameter (alat) = 10.0000 a.u.
unit-cell volume = 250.0000 (a.u.)^3
number of atoms/cell = 2
number of atomic types = 1
number of electrons = 28.00
number of Kohn-Sham states= 18
kinetic-energy cutoff = 60.0000 Ry
charge density cutoff = 300.0000 Ry
convergence threshold = 1.0E-09
mixing beta = 0.7000"
I also defined some variable: ELEMENT and lat.
I want to extract the "unit-cell volume" value which is equal to 250.00.
I tried the following to extract the value using grep and awk:
volume=`grep "unit-cell volume" ./latt.10/$ELEMENT.scf.latt_$lat.out | awk '{printf "%15.12f\n",$5}'`
However, when I run the bash file I always get 00.000000 as a result instead of the correct value of 250.00.
Can anyone help, please?
Thanks in advance.
awk '{printf "%15.12f\n",$5}'
You're asking awk to print out the fifth field of the line ($5).
unit-cell   volume    =    250.0000   (a.u.)^3
    1          2      3        4          5
The fifth field is (a.u.)^3, which you are then asking awk to interpret as a number via the %f format code. It's not a number, though (or actually, doesn't start with a number), and when awk is asked to treat a non-numeric string as a number, it uses 0 instead. Thus it prints 0.
Solution: use $4 instead.
By the way, you can skip invoking grep by using awk itself to select the line, e.g.
awk '/^ unit-cell/ {...}'
The /^ unit-cell/ is a regular expression that matches "unit-cell" (with a leading space) at the beginning of the line. Adjust as necessary if you have other lines that start with unit-cell which you don't want to select.
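Putting that together, a sketch of the corrected command with awk doing both the selection and the printing (sample.txt stands in for the real output file):

```shell
# A stand-in for the line of interest in the real output file
printf '     unit-cell volume          =     250.0000 (a.u.)^3\n' > sample.txt

# Let awk select the line itself and print the 4th field as a number
volume=$(awk '/^ *unit-cell volume/ { printf "%15.12f\n", $4 }' sample.txt)
echo "$volume"
# prints: 250.000000000000
```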
You never need grep when you're using awk since awk can do anything useful that grep can do. It sounds like this is all you need:
$ awk -F'=' '/unit-cell volume/{printf "%.2f\n",$2}' file
250.00
The above works because when FS is = , $2 is <spaces>250.0000 (a.u.)^3. When awk is asked to convert a string to a number it strips off leading spaces and anything after the numeric part, so that leaves 250.0000 to be converted to a number by %.2f.
In the script you posted $5 was failing because the 5th space-separated field in:
$1 $2 $3 $4 $5
<unit-cell> <volume> <=> <250.0000> <(a.u.)^3>
is (a.u.)^3 - you could have just added print $5 to see that.
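awk's string-to-number coercion, which both behaviours above hinge on, is easy to demonstrate in isolation:

```shell
# Leading spaces and a trailing unit are stripped during conversion
echo '    250.0000 (a.u.)^3' | awk '{ printf "%.2f\n", $0 }'
# prints: 250.00

# A string with no leading numeric part converts to 0
awk 'BEGIN { print "(a.u.)^3" + 0 }'
# prints: 0
```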
Since you are processing key-value pairs where the key can contain a variable amount of space, you would need to tune that field number ($4, $5, etc.) separately for each record you want to process, unless you set the field separator (FS) appropriately to FS=" *= *". Then the key will always be in $1 and the value in $2.
Then use split to split the value and unit parts from each other.
Also, you can lose that grep by defining in awk a pattern (or condition, /unit-cell volume/) for that print action:
$ awk 'BEGIN{FS=" *= *"} /unit-cell volume/{split($2,a," +");print a[1]}' file
250.0000
Explained:
$ awk '
BEGIN { FS=" *= *" } # set appropriate field separator
/unit-cell volume/ { # pattern or condition
split($2,a," +") # split value part to value and possible unit parts
print a[1] # output value part
}' file

bash Changing every other comma to point

I am working with set of data which is written in Swedish format. comma is used instead of point for decimal numbers in Sweden.
My data set is like this:
1,188,1,250,0,757,0,946,8,960
1,257,1,300,0,802,1,002,9,485
1,328,1,350,0,846,1,058,10,021
1,381,1,400,0,880,1,100,10,418
Which I want to change every other comma to point and have output like this:
1.188,1.250,0.757,0.946,8.960
1.257,1.300,0.802,1.002,9.485
1.328,1.350,0.846,1.058,10.021
1.381,1.400,0.880,1.100,10.418
Any idea of how to do that with simple shell scripting. It is fine If I do it in multiple steps. I mean if I change first the first instance of comma and then the third instance and ...
Thank you very much for your help.
Using sed
sed 's/,\([^,]*\(,\|$\)\)/.\1/g' file
1.188,1.250,0.757,0.946,8.960
1.257,1.300,0.802,1.002,9.485
1.328,1.350,0.846,1.058,10.021
1.381,1.400,0.880,1.100,10.418
For reference, here is a possible way to achieve the conversion using awk:
awk -F, '{for(i=1;i<=NF;i=i+2) {printf $i "." $(i+1); if(i<NF-2) printf FS }; printf "\n" }' file
The for loop iterates every 2 fields separated by a comma (set by the option -F,) and prints the current element and the next one separated by a dot.
The comma separator represented by FS is printed except at the end of line.
As a Perl one-liner, using split and array manipulation:
perl -F, -lane '@a = @b = (); while (@b = splice @F, 0, 2) {
push @a, join ".", @b } print join ",", @a' file
Output:
1.188,1.250,0.757,0.946,8.960
1.257,1.300,0.802,1.002,9.485
1.328,1.350,0.846,1.058,10.021
1.381,1.400,0.880,1.100,10.418
Many sed dialects allow you to specify which instance of a pattern to replace by specifying a numeric option to s///.
sed -e 's/,/./9' -e 's/,/./7' -e 's/,/./5' -e 's/,/./3' -e 's/,/./'
ISTR some sed dialects would allow you to simplify this to
sed 's/,/./1,2'
but this is not supported on my Debian.
Demo: http://ideone.com/6s2lAl
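Applied to one of the sample lines, the chain of numbered substitutions behaves like this (each s/,/./N replaces the Nth remaining comma, working from the back so earlier commas keep their positions):

```shell
echo '1,188,1,250,0,757,0,946,8,960' |
  sed -e 's/,/./9' -e 's/,/./7' -e 's/,/./5' -e 's/,/./3' -e 's/,/./'
# prints: 1.188,1.250,0.757,0.946,8.960
```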

Grep / search for a string in a csv, then parse that line and do another grep

The title is probably not very well worded, but I currently need to script a search that finds a given string in a CSV, then parses the line that's found and do another grep with an element within that line.
Example:
KEY1,TRACKINGKEY1,TRACKINGNUMBER1-1,PACKAGENUM1-1
,TRACKINGKEY1,TRACKINGNUMBER1-2,PACKAGENUM1-2
,TRACKINGKEY1,TRACKINGNUMBER1-3,PACKAGENUM1-3
,TRACKINGKEY1,TRACKINGNUMBER1-4,PACKAGENUM1-4
,TRACKINGKEY1,TRACKINGNUMBER1-5,PACKAGENUM1-5
KEY2,TRACKINGKEY2,TRACKINGNUMBER2-1,PACKAGENUM2-1
KEY3,TRACKINGKEY3,TRACKINGNUMBER3-1,PACKAGENUM3-1
,TRACKINGKEY3,TRACKINGNUMBER3-2,PACKAGENUM3-2
What I need to do is grep the .csv file for a given key [key1 in this example] and then grab TRACKINGKEY1 so that I can grep the remaining lines. Our shipping software doesn't output the packingslip key on every line, which is why I have to first search by KEY and then by TRACKINGKEY in order to get all of the tracking numbers.
So using KEY1 initially I eventually want to output myself a nice little string like "TRACKINGNUMBER1-1;TRACKINGNUMBER1-2;TRACKINGNUMBER1-3;TRACKINGNUMBER1-4;TRACKINGNUMBER1-5"
$ awk -v key=KEY1 -F, '$1==key{f=1} ($1!~/^ *$/)&&($1!=key){f=0} f{print $3}' file
TRACKINGNUMBER1-1
TRACKINGNUMBER1-2
TRACKINGNUMBER1-3
TRACKINGNUMBER1-4
TRACKINGNUMBER1-5
glennjackman helpfully points out that by using a "smarter" value for FS the internal logic can be simpler.
awk -v key=KEY1 -F' *,' '$1==key{f=1} $1 && $1!=key{f=0} f{print $3}' file
-v key=KEY1 assign the value KEY1 to the awk variable key
-F' *,' assign the value ' *,' (which is a regular expression) to the awk FS variable (controls field splitting)
$1==key{f=1} if the first key of the line is equal to the value of the key variable (KEY1) then assign the value 1 to the variable f (find our first desired key line)
$1 && $1!=key{f=0} if the first field has a truth-y value (in awk a non-zero, non-empty string) and the value of the first field is not equal to the value of the key variable assign the value 0 to the variable f (find the end of our keyless lines)
f{print $3} if the variable f has a truth-y value (remember non-zero, non-empty string) then print the third field of the line
awk '/KEY1/ {print $3}' FS=,
Result
TRACKINGNUMBER1-1
TRACKINGNUMBER1-2
TRACKINGNUMBER1-3
TRACKINGNUMBER1-4
TRACKINGNUMBER1-5
$ sed -nr '/^KEY1/, /^KEY/ { /^(KEY1| )/!d; s/.*(TRACKINGNUMBER[^,]+).*/\1/ p}' input
TRACKINGNUMBER1-1
TRACKINGNUMBER1-2
TRACKINGNUMBER1-3
TRACKINGNUMBER1-4
TRACKINGNUMBER1-5
One more awk
awk -F, '/KEY1/,/KEY/{print $3}' file
or given the sample data
awk 'match($0,/([^,]+NUMBER1[^,]+)/,a){print a[0]}'
or even
awk -F, '$3~/NUMBER1/&&$0=$3' file
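Since the stated goal was a single semicolon-joined string, the matching tracking numbers can also be collected and joined inside awk; a sketch building on the accepted approach (shipping.csv is an illustrative name for the data above):

```shell
cat > shipping.csv <<'EOF'
KEY1,TRACKINGKEY1,TRACKINGNUMBER1-1,PACKAGENUM1-1
,TRACKINGKEY1,TRACKINGNUMBER1-2,PACKAGENUM1-2
,TRACKINGKEY1,TRACKINGNUMBER1-3,PACKAGENUM1-3
KEY2,TRACKINGKEY2,TRACKINGNUMBER2-1,PACKAGENUM2-1
EOF

awk -v key=KEY1 -F, '
    $1 == key             { f = 1 }   # start of the desired key block
    $1 != "" && $1 != key { f = 0 }   # a different key ends the block
    f { s = (s == "" ? $3 : s ";" $3) }   # accumulate tracking numbers
    END { print s }
' shipping.csv
# prints: TRACKINGNUMBER1-1;TRACKINGNUMBER1-2;TRACKINGNUMBER1-3
```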

Awk using index with Substring

I have a command that cuts a string.
I would like to understand in detail how the index is controlled in Linux "awk".
I have two different case.
I want to get word "Test" in below example string.
1. "Test-01-02-03"
2. "01-02-03-Test-Ref1-Ref2
The first one I can get like this:
substr("Test-01-02-03", 0, index("Test-01-02-03", "-"))
-> it will then return only "Test".
For the second case I am not sure how I can get Test using the index function.
Do you have any idea about this using awk?
Thanks!
This is how to use index() to find/print a substring:
$ cat file
Test-01-02-03
01-02-03-Test-Ref1-Ref2
$ awk -v tgt="Test" 's=index($0,tgt){print substr($0,s,length(tgt))}' file
Test
Test
but that may not be the best solution for whatever your actual problem is.
For comparison here's how to do the equivalent with match() for an RE:
$ awk -v tgt="Test" 'match($0,tgt){print substr($0,RSTART,RLENGTH)}' file
Test
Test
and if you like the match() synopsis, here's how to write your own function to do it for strings:
awk -v tgt="Test" '
function strmatch(source,target) {
SSTART = index(source,target)
SLENGTH = length(target)
return SSTART
}
strmatch($0,tgt){print substr($0,SSTART,SLENGTH)}
' file
If these lines are the direct input to awk then the following work:
echo 'Test-01-02-03' | awk -F- '{print $1}' # First field
echo '01-02-03-Test-Ref1-Ref2' | awk -F- '{print $(NF-2)}' # Third field from the end.
If these lines are pulled out of a larger line in an awk script and need to be split again then the following snippets will do that:
str="Test-01-02-03"; split(str, a, /-/); print a[1]
str="01-02-03-Test-Ref1-Ref2"; numfields=split(str, a, /-/); print a[numfields-2]
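A runnable sketch wrapping those snippets into one awk program that handles both line formats (strings.txt is an illustrative file name):

```shell
cat > strings.txt <<'EOF'
Test-01-02-03
01-02-03-Test-Ref1-Ref2
EOF

awk '{
    n = split($0, a, /-/)            # split each line on hyphens
    if (a[1] == "Test") print a[1]   # first format: Test is the lead field
    else print a[n-2]                # second format: third field from the end
}' strings.txt
# prints "Test" for each line
```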
