How to use a bash command in awk - linux

Here is my problem
I have a File 1 where I have some data
Var1.1 Var1.2 Var1.3
Var2.1 Var2.2 Var2.3
Var3.1 Var3.2 Var3.3
And I have a File 2 that I would like to edit using the data above:
File2 (1)
***pattern with Var2.1***
some text...
File2(2)
***pattern with Var2.1***
Here I want to add Var2.2 and Var2.3
some text
My first idea is to use AWK, but I don't know how to call a bash command from it. The AWK script should do something like:
Search for the pattern in File2.
When awk finds it, awk calls a script which returns the wanted values from File1.
Then awk can edit File2.
Don't hesitate to suggest other approaches if there are simpler ones!
Thank you!

This is how I run an external command from within awk to base64-decode a string:
cmd = "/usr/bin/base64 -i -d <<< " $2 " 2>/dev/null"
while ( ( cmd | getline result ) > 0 ) { }
close(cmd)
split(result, a, "[:=,]")
name=a[2]
Perhaps you can get some inspiration from it...
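For reference, here is a minimal self-contained sketch of the same `cmd | getline` pattern. The input data and the `printf ... | tr` command are made up for illustration. awk hands the command string to `sh`, where bash here-strings (`<<<`) may not be available, so the sketch feeds the value through a pipe instead.

```shell
# Hypothetical input: "name word" pairs; the external command upper-cases the word.
result=$(printf 'alice hello\nbob world\n' | awk '{
    # Build the command line; $2 is pasted in unquoted, so only use trusted input.
    cmd = "printf %s " $2 " | tr a-z A-Z"
    out = ""
    while ((cmd | getline line) > 0)   # read every line the command prints
        out = out line
    close(cmd)                         # allow the same command string to run again
    print $1, out
}')
printf '%s\n' "$result"
```

Calling close(cmd) matters when the same command string can occur more than once, because awk otherwise keeps the finished pipe open and returns nothing on the next read.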

There's no need to run an external script to accomplish what you want. It can be done completely within a short AWK script.
awk 'FNR == NR {arr[$1] = $2 " " $3; next} {print; for (lookup in arr) {if ($0 ~ lookup) {split(arr[lookup], a); print "Here I want to add " a[1] " and " a[2]}}}' File1 File2
Explanation:
FNR == NR {arr[$1] = $2 " " $3; next} - Loop through the first file and save all the values in an array indexed by the first column. The record number equals the file record number for the first file.
print - Print every input line.
for (lookup in arr) {if ($0 ~ lookup) { - Loop through each of the array indices and see if the input line matches.
split(arr[lookup], a) - Split the value stored at the matched index into a temporary array.
print "Here I want to add " a[1] " and " a[2] - Print some text using the two values resulting from the split.
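To try the idea end to end, the sketch below builds two throwaway files that mimic File1 and File2 from the question and runs the same two-file logic on them. One deliberate difference, labeled in the code: it uses index() for a literal substring test, because with `$0 ~ lookup` the dots in a key such as Var2.1 act as regex wildcards. The temporary directory and file contents are assumptions for the demo.

```shell
tmp=$(mktemp -d)
printf '%s\n' 'Var1.1 Var1.2 Var1.3' 'Var2.1 Var2.2 Var2.3' 'Var3.1 Var3.2 Var3.3' > "$tmp/File1"
printf '%s\n' '***pattern with Var2.1***' 'some text' > "$tmp/File2"

out=$(awk '
    FNR == NR { arr[$1] = $2 " " $3; next }   # first file: remember columns 2 and 3 by key
    {
        print                                  # second file: echo every line
        for (lookup in arr)
            if (index($0, lookup)) {           # literal substring test instead of a regex match
                split(arr[lookup], a)
                print "Here I want to add " a[1] " and " a[2]
            }
    }' "$tmp/File1" "$tmp/File2")

printf '%s\n' "$out"
rm -rf "$tmp"
```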

Related

changing two lines of a text file

I have a bash script which takes a text file as input and two parameters (the first and second line numbers), then swaps those two lines in the text. Here is the code:
#!/bin/bash
awk -v var="$1" -v var1="$2" 'NR==var {
s=$0
for(i=var+1; i < var1 ; i++) {
getline; s1=s1?s1 "\n" $0:$0
}
getline; print; print s1 s
next
}1' Ham > newHam_changed.txt
It works fine for any two lines that are not consecutive, but for consecutive lines (e.g. lines 5 and 6) it works but creates a blank line between them. How can I fix that?
I think your actual script is not what you posted in the question. I think the line with all the prints contains:
print s1 "\n" s
The problem is that when the lines are consecutive, s1 will be empty (the for loop is skipped), but it will still print a newline before s, producing a blank line.
So you need to make that newline conditional.
awk -v var="4" -v var1="6" 'NR==var {
s=$0
for(i=var+1; i < var1 ; i++) {
getline; s1=s1?s1 "\n" $0:$0
}
getline; print; print (s1 ? s1 "\n" : "") s
next
}1' Ham > newHam_changed.txt
Using getline always makes awk scripts a bit complicated. It is better to avoid getline and rely on awk's pattern { action } syntax, which makes for perfectly readable scripts. In any other language you would write a loop and fetch the next line, but in awk I think it is best to make good use of this feature.
awk -v var="$1" -v var1="$2" '
NR==var {s=$0; collect=1; next}
NR==var1 {collect=0; print; printf "%s", inbetween; print s; next}
collect {inbetween=inbetween $0 "\n"; next}
1' Ham
Here I capture the first line in s when I find it and set the collect flag. This triggers the collect block on the following lines, which gathers all the lines in between. When the second line is found, it sets collect back to zero and prints first the current line, then the in-between lines and then s. If the lines are consecutive, inbetween is empty and printf prints nothing.
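A small runnable sketch of this flag-based approach, wrapped in a hypothetical helper called swap_lines that reads standard input (the helper name and the sample letters are assumptions for the demo). The `next` at the end of the second rule keeps that line from being printed a second time by the final `1`.

```shell
swap_lines() {   # hypothetical helper: swap_lines FIRST SECOND, reads stdin
    awk -v var="$1" -v var1="$2" '
        NR == var  { s = $0; collect = 1; next }              # hold back the first line
        NR == var1 { collect = 0; print; printf "%s", inbetween; print s; next }
        collect    { inbetween = inbetween $0 "\n"; next }    # buffer the lines in between
        1                                                     # print everything else as is
    '
}

printf '%s\n' a b c d e | swap_lines 2 4   # non-consecutive lines
printf '%s\n' a b c d e | swap_lines 2 3   # consecutive lines, no blank line
```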
Too complex for my taste, here is something quite simple that achieves the same task:
#!/bin/bash
ORIGFILE='original.txt' # original text file
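PROCFILE='processed.txt' # copy of the original file to be processed
CHGL1=$(sed "$1q;d" "$ORIGFILE") # get original line $1
CHGL2=$(sed "$2q;d" "$ORIGFILE") # get original line $2
cp "$ORIGFILE" "$PROCFILE" # work on a copy
# note: breaks if a line contains "/", "&" or "\" (special in a sed replacement)
sed -i "$2s/^.*/$CHGL1/" "$PROCFILE" # put line $1 where line $2 was
sed -i "$1s/^.*/$CHGL2/" "$PROCFILE" # put line $2 where line $1 was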
More code doesn't mean more useful; keep it simple. This code does not use a for loop and instead goes directly to the specific lines.
EDIT:
A simple way on one line to do this task:
printf '%s\n' 14m26 26-m14- w q | ed -s file
Found in this answer.

linux grep pattern in an unknown number of column

I have a text file with many rows and columns and I want to grep a column by the 'column name'.
M121 M125 M123 M124 M131 M126 M211 N
0.41463252 1.00296561 -0.1713496 0.15923644 -1.49682602 -1.9478695 1.45223392 …
-0.46775802 0.14591103 1.122446 0.83648981 -0.3038532 -1.1841548 2.18074729 …
0.67736835 2.12969375 -0.8187298 0.13582824 -1.49290987 -0.6798428 1.04353114 …
0.08673344 -0.40437672 1.8441559 -0.63679375 0.47998832 0.1702844 0.54029264 …
-0.32606297 -0.95551833 0.6157599 0.02819133 1.44818627 -0.9528659 0.09207864 …
-0.51781121 0.88806507 -0.2913757 -0.00463802 0.05037374 0.953773 0.01244763 …
-0.25724472 0.05119051 0.2109025 -0.26083822 -0.52094072 -0.938595 -0.01275275 …
1.94348766 -1.83607523 1.2010512 -0.54109756 -0.88323831 -0.6263788 -0.96973544 …
0.1900408 -0.61025656 0.4586306 -0.69181051 -0.90713834 0.3589271 0.6870383 …
0.54866057 -0.03861159 -1.505861 0.54871682 -0.24602601 -0.3941754 0.85673905 …
For example, I want to grep the M211 column, but I don't know its column number. I tried:
awk '$i == "M211"' filename or awk '$0 == "M211"' filename
awk: illegal field $(), name "i"
input record number 1, filename
source line number 1
Is there any solution ? Thank you.
awk solution - iterates over the column names on the first line of the input file and saves the column number if it matches the desired name, then prints that column. No output if no match is found.
$ awk 'NR==1{ for(i=1;i<=NF;i++){if($i=="M125")c=i;} if(c==0)exit; }
{print $c}' ip.txt
M125
1.00296561
0.14591103
2.12969375
-0.40437672
-0.95551833
0.88806507
0.05119051
-1.83607523
-0.61025656
-0.03861159
Similar solution with perl
$ perl -lane '@i = grep {$F[$_] eq "M123"} 0..$#F if $.==1; exit if !@i;
print @F[@i]' ip.txt
M123
-0.1713496
1.122446
-0.8187298
1.8441559
0.6157599
-0.2913757
0.2109025
1.2010512
0.4586306
-1.505861
@i = grep {$F[$_] eq "M123"} 0..$#F if $.==1 for the header line, get index for which column value matches the string M123
exit if !@i exit if no match found
print @F[@i] print the matched column
assumes there'll be only one column match
for multiple matches, use
perl -lane '@i = grep {$F[$_] =~ /^(M121|M126)$/} 0..$#F if $.==1; exit if !@i;
print join " ", @F[@i]' ip.txt
Another in awk:
$ awk 'NR==1 {for(i=NF;i>0;i--) if($i=="M125") break; if(!i) exit} {print $i}' file
M125
1.00296561
0.14591103
2.12969375
-0.40437672
-0.95551833
0.88806507
0.05119051
-1.83607523
-0.61025656
-0.03861159
Explained:
NR==1 {                        # for the first record
    for(i=NF;i>0;i--)          # iterate over the fields backwards, for a change
        if($i=="M125") break   # until the desired column; remember i
    if (!i) exit               # if the column is not found, exit
}
{print $i}                     # print the value of the ith field
If you are more familiar with Python:
# Python 3
import csv

column_name = "M125"
with open("file", newline="") as f:
    # assumes the columns are separated by exactly one space
    data_dict = csv.DictReader(f, delimiter=" ")
    print(column_name)
    for item in data_dict:
        print(item[column_name])
To do anything with columns ("fields" in awk) by name rather than number you should first create an array that maps the field name to number and from then on just access the fields using that array indexed by the field name(s) rather than accessing them directly by field number(s):
$ awk 'NR==1{for (i=1;i<=NF;i++) f[$i]=i} {print $(f["M124"])}' file
M124
0.15923644
0.83648981
0.13582824
-0.63679375
0.02819133
-0.00463802
-0.26083822
-0.54109756
-0.69181051
0.54871682
or if you don't want to hard-code the column name:
$ awk -v c=M124 'NR==1{for (i=1;i<=NF;i++) f[$i]=i} {print $(f[c])}' file
M124
0.15923644
0.83648981
0.13582824
-0.63679375
0.02819133
-0.00463802
-0.26083822
-0.54109756
-0.69181051
0.54871682
and to print any number of columns in the order you choose:
$ awk -v cols='M211 M124' 'NR==1{for (i=1;i<=NF;i++) f[$i]=i; n=split(cols,c)} {for (i=1;i<=n;i++) printf "%s%s", $(f[c[i]]), (i<n ? OFS : ORS)}' file
M211 M124
1.45223392 0.15923644
2.18074729 0.83648981
1.04353114 0.13582824
0.54029264 -0.63679375
0.09207864 0.02819133
0.01244763 -0.00463802
-0.01275275 -0.26083822
-0.96973544 -0.54109756
0.6870383 -0.69181051
0.85673905 0.54871682
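One caveat with the name-to-number map: if the requested name is not in the header, f[c] is empty and $(f[c]) evaluates to $0, so whole lines are printed. The sketch below adds a guard for that case. The helper name pick_column, the error message and the sample file are assumptions made for the demo.

```shell
pick_column() {   # hypothetical helper: pick_column NAME FILE
    awk -v c="$1" '
        NR == 1 {
            for (i = 1; i <= NF; i++) f[$i] = i      # map column name to field number
            if (!(c in f)) {                         # unknown name: complain and stop
                print "no such column: " c > "/dev/stderr"
                missing = 1
                exit
            }
        }
        { print $(f[c]) }
        END { exit missing }                         # non-zero status when the name is unknown
    ' "$2"
}

tmp=$(mktemp)
printf '%s\n' 'M121 M125 M124' '0.41 1.00 0.15' '-0.46 0.14 0.83' > "$tmp"
found=$(pick_column M124 "$tmp")
status=0
pick_column M999 "$tmp" 2>/dev/null || status=$?
printf '%s\n' "$found" "exit status for unknown column: $status"
rm -f "$tmp"
```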

How to Compare CSV Column using awk?

I receive a CSV like this:
column$1,column$2,column$3
john,P,10
john,P,10
john,A,20
john,T,30
john,T,10
marc,P,10
marc,C,10
marc,C,20
marc,T,30
marc,A,10
I need to sum the values and display the name and the results, but column$2 needs to show the sum of the T values separately from the sum of the P, A and C values.
Output should be this:
column$1,column$2,column$3,column$4
john,PCA,40
john,T,40,CORRECT
marc,PCA,50
marc,T,30,INCORRECT
All I could do was extract the columns I need from the original CSV:
awk -F "|" '{print $8 "|" $9 "|" $4}' input.csv >> output.csv
Also sort by the correct column:
sort -t "|" -k1 input.csv >> output.csv
And add a new column to the end of the csv:
awk -F, '{NF=2}1' OFS="|" input.csv >> output.csv
I managed to sum and display the sum by column$1 and column$2, but I don't know how to group different values from column$2:
awk -F "," '{col[$1,$2]++} END {for(i in col) print i, col[i]}' file > output
Awk is stream-oriented: it reads input and writes the result to standard output. It does not edit files in place.
You just need to add a corresponding print
awk '{if($2 == "T") {print "MATCHED"}}'
If you want to output more than the "matched" you need to add it to the print
e.g. '{print $1 "|" $2 "|" $3 "|" " MATCHED"}'
or use print $0 as comment mentions above.
Assuming that "CORRECT" and "INCORRECT" are determined by comparing the "PCA" value to the "T" value, the following awk script should do the trick:
awk -F, -vOFS=, '$2=="T"{t[$1]+=$3;n[$1]} $2!="T"{s[$1]+=$3;n[$1]} END{ for(i in n){print i,"PCA",s[i]; print i,"T",t[i],(t[i]==s[i] ? "CORRECT" : "INCORRECT")} }' inputfile
Broken out for easier reading, here's what this looks like:
awk -F, -vOFS=, '
$2=="T" { # match all records that are "T"
t[$1]+=$3 # add the value for this record to an array of totals
n[$1] # record this name in our authoritative name list
}
$2!="T" { # match all records that are NOT "T"
s[$1]+=$3 # add the value for this record to an array of sums
n[$1] # record this name too
}
END { # Now that we've collected data, analyse the results
for (i in n) { # step through our authoritative list of names
print i,"PCA",s[i]
print i,"T",t[i],(t[i]==s[i] ? "CORRECT" : "INCORRECT")
}
}
' inputfile
Note that array order is not guaranteed in awk, so your output may not come out in the same order as your input.
If you want your output to be delimited using vertical bars, change the -vOFS=, to -vOFS='|'.
Then you can sort using:
awk ... | sort
which defaults to -k1.
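As a check, here is a sketch that runs the same grouping on the sample data from the question. It assumes the input starts with a header line, which is skipped, and it pipes the result through sort in the C locale so the order is predictable. The variable names csv and summary are made up for the demo.

```shell
csv='column$1,column$2,column$3
john,P,10
john,P,10
john,A,20
john,T,30
john,T,10
marc,P,10
marc,C,10
marc,C,20
marc,T,30
marc,A,10'

summary=$(printf '%s\n' "$csv" | awk -F, -v OFS=, '
    NR == 1   { next }                  # skip the header line
              { n[$1] }                 # remember every name seen
    $2 == "T" { t[$1] += $3; next }     # total of the T rows
              { s[$1] += $3 }           # total of everything else (P, C, A)
    END {
        for (i in n) {
            print i, "PCA", s[i]
            print i, "T", t[i], (t[i] == s[i] ? "CORRECT" : "INCORRECT")
        }
    }' | LC_ALL=C sort)

printf '%s\n' "$summary"
```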

Exchange columns in bash

I have this file:
$ cat file
1515523 A45678BF141 A11269151
2234545 A45678BE145 A87979746
5432568 A45678B2123 A40629187
7234573 A45678B4154 A98879129
8889568 A45678B5123 A13409137
9234511 A45678B9176 A23589941
3904568 A45678B7123 A52329165
3234555 A45678B1169 A23589497
9643568 A45678B6123 A39969112
1234547 A45678B2132 A40579243
and this script:
cat file | awk '{FS = " "} {print $1" "$3" "$5}'| awk '{
n = split($3, a, "");
s = "";
for (i = 1; i <= n; i += 2) s = s a[i+1] a[i];
print $1, substr($2, length($2)-3, 4), s
}'| cut -d" " -f3,1 > output
And when I open the output with vi, I have:
1515523 F141 11621915^M
2234545 E145 78797964^M
5432568 2123 04261978^M
7234573 4154 89781992^M
8889568 5123 31041973^M
9234511 9176 32859914^M
3904568 7123 25231956^M
3234555 1169 32854979^M
9643568 6123 93691921^M
1234547 2132 04752934^M
I don't know why I get ^M. Also, when I try to run this awk snippet:
cat imei | awk '{FS=" "} {print $2","$1}'
the output is wrong, i.e., the columns are not exchanged and the second column is not printed. Any ideas on what may be happening?
There are carriage returns (^M or Control-M) in the data file. It probably came from a Windows machine at some point.
When you print $2","$1 (which concatenates $2 with a string containing a comma and then $1 — it took me a couple of looks to see what it was really doing), the carriage return makes the second column overwrite the first.
Look at the data file with od -c or similar tools to see the carriage returns in it.
You can use dos2unix or tr or various other techniques to convert the file from DOS/Windows format to Unix format.
Also, given the data format shown, I'd expect not to use -F " " (or the FS = " ", which is equivalent), so that you have columns $1, $2, and $3, which is more obvious than working with columns 1, 3, 5 as shown. You could set OFS to double-blank if you wanted the output with two blanks between columns.
$ dos2unix file
$ awk '{split($3,a,""); print $1, substr($2,8), a[3]a[2]a[5]a[4]a[7]a[6]a[9]a[8]}' file
1515523 F141 11621915
2234545 E145 78797964
5432568 2123 04261978
7234573 4154 89781992
8889568 5123 31041973
9234511 9176 32859914
3904568 7123 25231956
3234555 1169 32854979
9643568 6123 93691921
1234547 2132 04752934
Since you are using awk, you do not need dos2unix.
Simply insert
gsub(/\r/,"");
as the first statement in your awk script.
It cleans up each line as it is read, so subsequent matching or processing never sees a carriage return.
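A quick sketch of that fix on two lines in the question's format, with DOS line endings generated by printf (the variable name fixed is an assumption for the demo). Because gsub changes $0, awk re-splits the fields, so $3 no longer carries the carriage return.

```shell
fixed=$(printf '1515523 A45678BF141 A11269151\r\n2234545 A45678BE145 A87979746\r\n' |
    awk '{ gsub(/\r/, ""); print $3 "," $1 }')   # strip CRs first, then swap the columns
printf '%s\n' "$fixed"
```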
How about a perl 'one liner' (with a continuation line)
$ dos2unix file
$ perl -lane \
'$xxxx = substr($F[1],-4);
@c = split(//,$F[2]);
print "$F[0] $xxxx $c[2]$c[1]$c[4]$c[3]$c[6]$c[5]$c[8]$c[7]"' file

find a string in a string using awk

here is column 6 in a file:
ttttttttttt
tttttttttt
ttttttttt
tttttttattt
tttttttttt
ttttttttttt
How can I use awk to print the lines that include "a"?
If you only want to search the sixth column, use:
awk '$6 ~ /a/' file
If you want the whole line, any of these should work:
awk '/a/' file
grep a file
sed '/^[^a]*$/d' file
If you want to print only the lines whose 6th column contains a, this works:
awk '$6~/a/' file
If you are looking for an exact match (which yours is not), use:
$6 == "a"
http://www.pement.org/awk/awk1line.txt
is an excellent resource
awk can also tell you where the pattern is in the column:
awk '{++line_num}{ if ( match($6,"a")) { print "found a at position",RSTART, " line " ,line_num} }' file
though this example only reports the first "a" in column 6; a loop would be needed to report every occurrence (I think)
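One possible way to report every occurrence is to call match() repeatedly on the remainder of the field and keep a running offset. This is a sketch under the assumption that column 6 is the field of interest; the sample line and the variable names are made up.

```shell
positions=$(printf '1 2 3 4 5 ttatta\n' | awk '{
    s = $6; offset = 0
    while (match(s, /a/)) {                     # next occurrence in what is left of the field
        print "found a at position", offset + RSTART, "line", NR
        offset += RSTART + RLENGTH - 1          # characters consumed so far
        s = substr(s, RSTART + RLENGTH)         # continue after this match
    }
}')
printf '%s\n' "$positions"
```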
You could try
gawk '{ if ( $1 ~ /a/ ) { print $1 } }' filename
