How to use AWK to print the line with the highest number? - linux

I have a question. Assume I dump a file and grep for foo, and the result comes out like this:
Foo-bar-120:'foo name 1'
Foo-bar-130:'foo name 2'
Foo-bar-1222:'foo name 3'
Etc.
All I want is to extract the foo name with the largest number. For instance, in this case the largest number is 1222 and the result I expect is foo name 3.
Is there an easy way to achieve this using awk or sed, rather than pulling the number out line by line and looping through to find the largest?

Code for awk (split each line on - or :, keep the running maximum of field 3, and remember field 4):
awk -F[-:] '$3>a {a=$3; b=$4} END {print b}' file
$ cat file
Foo-bar-120:'foo name 1'
Foo-bar-130:'foo name 2'
Foo-bar-1222:'foo name 3'
$ awk -F[-:] '$3>a {a=$3; b=$4} END {print b}' file
'foo name 3'

Here's how I would do it. I just tested this in Cygwin. Hopefully it works under Linux as well. Put this into a file, such as mycommand:
#!/usr/bin/awk -f
BEGIN {
    FS = "-";
    max = 0;
    maxString = "";
}
{
    num = $3 + 0;   # convert string to int
    if (num > max) {
        max = num;
        split($3, arr, "'");
        maxString = arr[2];
    }
}
END {
    print maxString;
}
Then make the file executable (chmod 755 mycommand). Now you can pipe whatever you want through it by typing, for example, cat somefile | ./mycommand.

Assuming the line format is as shown with 2 hyphens before "the number":
cut -d- -f3- | sort -rn | sed '1{s/^[0-9]\+://; q}'
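For example, running the pipeline on the three sample lines from the question (a sketch assuming GNU sed, as on Linux):

```shell
# cut keeps everything from the 3rd '-'-separated field on (e.g. 120:'foo name 1'),
# sort -rn puts the largest leading number first, and sed strips the number
# and colon from line 1 and then quits.
printf '%s\n' \
  "Foo-bar-120:'foo name 1'" \
  "Foo-bar-130:'foo name 2'" \
  "Foo-bar-1222:'foo name 3'" |
  cut -d- -f3- | sort -rn | sed '1{s/^[0-9]\+://; q}'
# -> 'foo name 3'
```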

Is this OK for you?
awk -F'[:-]' '{n=$(NF-1);if(n>m){v=$NF;m=n}}END{print v}'
With your data:
kent$ echo "Foo-bar-120:'foo name 1'
Foo-bar-130:'foo name 2'
Foo-bar-1222:'foo name 3'"|awk -F'[:-]' '{n=$(NF-1);if(n>m){v=$NF;m=n}}END{print v}'
'foo name 3'
P.S. I like the field separator [:-]

$ awk '{gsub(/.*:.|.$/,"")} (NR==1)||($NF>max){max=$NF; val=$0} END{print val}' file
foo name 3
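To see what the gsub is doing, you can run it on a single line (a quick sketch):

```shell
# The alternation /.*:.|.$/ deletes everything up to and including the colon
# and the opening quote, and separately the closing quote at end of line,
# leaving only the bare name.
echo "Foo-bar-1222:'foo name 3'" | awk '{gsub(/.*:.|.$/,"")} 1'
# -> foo name 3
```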

You don't need to use grep; you can use awk directly on your file:
awk -F"[-:]" '/Foo/ && $3>prev{val=$NF;prev=$3}END{print val}' file
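For example, piping the sample data from the question through it (a sketch; adjust if your real lines differ):

```shell
# Fields split on - or :, so $3 is the number and $NF is the quoted name;
# the running maximum of $3 decides which name wins.
printf '%s\n' \
  "Foo-bar-120:'foo name 1'" \
  "Foo-bar-130:'foo name 2'" \
  "Foo-bar-1222:'foo name 3'" |
  awk -F"[-:]" '/Foo/ && $3>prev{val=$NF;prev=$3}END{print val}'
# -> 'foo name 3'
```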

Related

Linux cut, paste

I have to write a script to cut the following column and paste it at the end of the same row in a new .arff file. I guess the file type doesn't matter.
Current file:
63,male,typ_angina,145,233,t,left_vent_hyper,150,no,2.3,down,0,fixed_defect,'<50'
67,male,asympt,160,286,f,left_vent_hyper,108,yes,1.5,flat,3,normal,'>50_1'
The output should be:
male,typ_angina,145,233,t,left_vent_hyper,150,no,2.3,down,0,fixed_defect,'<50',63
male,asympt,160,286,f,left_vent_hyper,108,yes,1.5,flat,3,normal,'>50_1',67
How can I do this using a Linux script?
sed -r 's/^([^,]*),(.*)$/\2,\1/' Input_file
Brief explanation:
^([^,]*) matches the first comma-separated field, and \1 refers back to that match
(.*)$ is the remaining part after the first comma, and \2 refers to that match
Shorter awk solution:
$ awk -F, '{$(NF+1)=$1;sub($1",","")}1' OFS=, input.txt
gives:
male,typ_angina,145,233,t,left_vent_hyper,150,no,2.3,down,0,fixed_defect,'<50',63
male,asympt,160,286,f,left_vent_hyper,108,yes,1.5,flat,3,normal,'>50_1',67
Explanation:
{$(NF+1)=$1 # add extra field with value of field $1
sub($1",","") # search for string "$1," in $0, replace it with ""
}1 # print $0
EDIT: Reading your comments following your question, it looks like you're swapping more columns than just the first to the end of the line. You might consider using a swap function that you call multiple times:
func swap(i,j){s=$i; $i=$j; $j=s}
However, this won't work whenever you want to move a column to the end of the line. So let's change that function:
func swap(i,j){
    s=$i
    if (j>NF){
        for (k=i;k<NF;k++) $k=$(k+1)
        $NF=s
    } else {
        $i=$j
        $j=s
    }
}
So now you can do this:
$ cat tst.awk
BEGIN{FS=OFS=","}
{swap(1,NF+1); swap(2,5)}1
func swap(i,j){
    s=$i
    if (j>NF){
        for (k=i;k<NF;k++) $k=$(k+1)
        $NF=s
    } else {
        $i=$j
        $j=s
    }
}
and:
$ awk -f tst.awk input.txt
male,t,145,233,typ_angina,left_vent_hyper,150,no,2.3,down,0,fixed_defect,'<50',63
male,f,160,286,asympt,left_vent_hyper,108,yes,1.5,flat,3,normal,'>50_1',67
Why use sed or awk? The shell can handle this easily:
while read l;do echo ${l#*,},${l%%,*};done <infile
If it's a Windows file with \r line endings:
while read l;do f=${l%[[:cntrl:]]};echo ${f#*,},${l%%,*};done <infile
If you want to modify the file in place:
printf "%s" "$(while read l;do f=${l%[[:cntrl:]]};printf "%s\n" "${f#*,},${l%%,*}";done <infile)">infile
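The two parameter expansions do all the work here; a quick sketch of what each one yields on a sample row:

```shell
l='63,male,typ_angina,145'
echo "${l#*,}"           # shortest prefix through the first comma removed -> male,typ_angina,145
echo "${l%%,*}"          # longest suffix from the first comma removed     -> 63
echo "${l#*,},${l%%,*}"  # recombined -> male,typ_angina,145,63
```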

grep pattern in an unknown number of columns - linux

I have a text file with many rows and columns and I want to grep a column by the 'column name'.
M121 M125 M123 M124 M131 M126 M211 N
0.41463252 1.00296561 -0.1713496 0.15923644 -1.49682602 -1.9478695 1.45223392 …
-0.46775802 0.14591103 1.122446 0.83648981 -0.3038532 -1.1841548 2.18074729 …
0.67736835 2.12969375 -0.8187298 0.13582824 -1.49290987 -0.6798428 1.04353114 …
0.08673344 -0.40437672 1.8441559 -0.63679375 0.47998832 0.1702844 0.54029264 …
-0.32606297 -0.95551833 0.6157599 0.02819133 1.44818627 -0.9528659 0.09207864 …
-0.51781121 0.88806507 -0.2913757 -0.00463802 0.05037374 0.953773 0.01244763 …
-0.25724472 0.05119051 0.2109025 -0.26083822 -0.52094072 -0.938595 -0.01275275 …
1.94348766 -1.83607523 1.2010512 -0.54109756 -0.88323831 -0.6263788 -0.96973544 …
0.1900408 -0.61025656 0.4586306 -0.69181051 -0.90713834 0.3589271 0.6870383 …
0.54866057 -0.03861159 -1.505861 0.54871682 -0.24602601 -0.3941754 0.85673905 …
For example, I want to grep the M211 column but I don't know the column number. I tried:
awk '$i == "M211"' filename or awk '$0 == "M211"' filename
awk: illegal field $(), name "i"
input record number 1, filename
source line number 1
Is there any solution ? Thank you.
An awk solution: iterate over the column names on the first line of the input file and save the column number if one matches the desired pattern, then print that column. There is no output if no match is found.
$ awk 'NR==1{ for(i=1;i<=NF;i++){if($i=="M125")c=i;} if(c==0)exit; }
{print $c}' ip.txt
M125
1.00296561
0.14591103
2.12969375
-0.40437672
-0.95551833
0.88806507
0.05119051
-1.83607523
-0.61025656
-0.03861159
Similar solution with perl
$ perl -lane '@i = grep {$F[$_] eq "M123"} 0..$#F if $.==1; exit if !@i;
print @F[@i]' ip.txt
M123
-0.1713496
1.122446
-0.8187298
1.8441559
0.6157599
-0.2913757
0.2109025
1.2010512
0.4586306
-1.505861
@i = grep {$F[$_] eq "M123"} 0..$#F if $.==1 for the header line, get the index of the column whose value matches the string M123
exit if !@i exit if no match found
print @F[@i] print the matched column
assumes there'll be only one column match
for multiple matches, use
perl -lane '@i = grep {$F[$_] =~ /^(M121|M126)$/} 0..$#F if $.==1; exit if !@i;
print join " ", @F[@i]' ip.txt
Another in awk:
$ awk 'NR==1 {for(i=NF;i>0;i--) if($i=="M125") break; if(!i) exit} {print $i}' file
M125
1.00296561
0.14591103
2.12969375
-0.40437672
-0.95551833
0.88806507
0.05119051
-1.83607523
-0.61025656
-0.03861159
Explained:
NR==1 { # for the first record
for(i=NF;i>0;i--) # iterate fields backwards for change
if($i=="M125") break # until desired column, remember i
if (!i) exit # if column not found, exit
}
{print $i} # print value from ith field
If you are more familiar with Python:
import csv

column_name = "M125"
with open("file", newline="") as f:
    data_dict = csv.DictReader(f, delimiter=" ")
    print(column_name)
    for item in data_dict:
        print(item[column_name])
To do anything with columns ("fields" in awk) by name rather than number, first create an array that maps each field name to its number; from then on, access the fields through that array, indexed by field name, rather than directly by field number:
$ awk 'NR==1{for (i=1;i<=NF;i++) f[$i]=i} {print $(f["M124"])}' file
M124
0.15923644
0.83648981
0.13582824
-0.63679375
0.02819133
-0.00463802
-0.26083822
-0.54109756
-0.69181051
0.54871682
or if you don't want to hard-code the column name:
$ awk -v c=M124 'NR==1{for (i=1;i<=NF;i++) f[$i]=i} {print $(f[c])}' file
M124
0.15923644
0.83648981
0.13582824
-0.63679375
0.02819133
-0.00463802
-0.26083822
-0.54109756
-0.69181051
0.54871682
and to print any number of columns in the order you choose:
$ awk -v cols='M129 M124' 'NR==1{for (i=1;i<=NF;i++) f[$i]=i; n=split(cols,c)} {for (i=1;i<=n;i++) printf "%s%s", $(f[c[i]]), (i<n ? OFS : ORS)}' file
M129 M124
1.45223392 0.15923644
2.18074729 0.83648981
1.04353114 0.13582824
0.54029264 -0.63679375
0.09207864 0.02819133
0.01244763 -0.00463802
-0.01275275 -0.26083822
-0.96973544 -0.54109756
0.6870383 -0.69181051
0.85673905 0.54871682

Unix command to create new output file by combining 2 files based on condition

I have 2 files. Basically, I want to match the column names from File 1 with the column names listed in File 2. The resulting output file should have data for the columns that match File 2 and Null values for the remaining column names in File 2.
Example:
file1
Name|Phone_Number|Location|Email
Jim|032131|xyz|xyz#qqq.com
Tim|037903|zzz|zzz#qqq.com
Pim|039141|xxz|xxz#qqq.com
File2
Location
Name
Age
Based on these 2 files, I want to create new file which has data in the below format:
Output:
Location|Name|Age
xyz|Jim|Null
zzz|Tim|Null
xxz|Pim|Null
Is there a way to get this result using join, awk, or sed? I tried with join but couldn't get it working.
$ cat tst.awk
BEGIN { FS=OFS="|" }
NR==FNR { names[++numNames] = $0; next }
FNR==1 {
    for (nameNr=1; nameNr<=numNames; nameNr++) {
        name = names[nameNr]
        printf "%s%s", name, (nameNr<numNames ? OFS : ORS)
    }
    for (i=1; i<=NF; i++) {
        name2fldNr[$i] = i
    }
    next
}
{
    for (nameNr=1; nameNr<=numNames; nameNr++) {
        name = names[nameNr]
        fldNr = name2fldNr[name]
        printf "%s%s", (fldNr ? $fldNr : "Null"), (nameNr<numNames ? OFS : ORS)
    }
}
$ awk -f tst.awk file2 file1
Location|Name|Age
xyz|Jim|Null
zzz|Tim|Null
xxz|Pim|Null
Get the book Effective Awk Programming, 4th Edition, by Arnold Robbins.
I'd suggest using csvcut, which is part of CSVKit (https://csvkit.readthedocs.org), along the lines of the following:
#!/bin/bash
HEADERS=File2
PSV=File1
headers=$(tr '\n' , < "$HEADERS" | sed 's/,$//')
awk -F'|' '
  BEGIN {OFS=FS}
  NR==1 {print $0, "Age"; next}
  {print $0, "Null"}' "$PSV" |
csvcut -d'|' -c "$headers"
I realize this may not be entirely satisfactory, but csvcut doesn't currently have options to handle missing columns or translate missing data to a specified value.

Replace a line in a file with a string

I have a file file1 with the following content
{"name":"clio5", "value":"13"}
{"name":"citroen_c4", "value":"23"}
{"name":"citroen_c3", "value":"12"}
{"name":"golf4", "value":"16"}
{"name":"golf3", "value":"8"}
I want to look for the line which contains the word clio5 and then replace that line with the following string:
string='{"name":"clio5", "value":"1568688554"}'
$ string='{"name":"clio5", "value":"1568688554"}'
$ awk -F'"(:|, *)"' -v string="$string" 'BEGIN{split(string,s)} {print ($2==s[2]?string:$0)}' file
{"name":"clio5", "value":"1568688554"}
{"name":"citroen_c4", "value":"23"}
{"name":"citroen_c3", "value":"12"}
{"name":"golf4", "value":"16"}
{"name":"golf3", "value":"8"}
$ string='{"name":"citroen_c3", "value":"1568688554"}'
$ awk -F'"(:|, *)"' -v string="$string" 'BEGIN{split(string,s)} {print ($2==s[2]?string:$0)}' file
{"name":"clio5", "value":"13"}
{"name":"citroen_c4", "value":"23"}
{"name":"citroen_c3", "value":"1568688554"}
{"name":"golf4", "value":"16"}
{"name":"golf3", "value":"8"}
Updated the above based on @dogbane's comment so it will work even if the text contains "s. It will still fail if the text can contain ":" (with appropriate escapes), but that seems highly unlikely, and the OP can tell us if it's a valid concern.
First you extract the name part from your $string as
NAME=`echo $string | sed 's/[^:]*:"\([^"]*\).*/\1/'`
Then, use the $NAME to replace the string as
sed -i "/\<$NAME\>/s/.*/$string/" file1
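For example, the first sed extracts the name like this (a sketch using the sample $string from the question):

```shell
# [^:]*:" skips past the first colon and opening quote; \([^"]*\) captures
# everything up to the next quote (the name); the rest of the line is dropped.
string='{"name":"clio5", "value":"1568688554"}'
NAME=$(echo "$string" | sed 's/[^:]*:"\([^"]*\).*/\1/')
echo "$NAME"
# -> clio5
```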
Use awk like this:
awk -v str="$string" -F '[,{}:]+' '{
    split(str, a);
    if (a[3] ~ $3)
        print str;
    else
        print
}' file.json

find a string in a string using awk

Here is column 6 in a file:
ttttttttttt
tttttttttt
ttttttttt
tttttttattt
tttttttttt
ttttttttttt
How can I use awk to print out the lines that include "a"?
If you only want to search the sixth column, use:
awk '$6 ~ /a/' file
If you want the whole line, any of these should work:
awk /a/ file
grep a file
sed '/^[^a]*$/d' file
If you wish to print only those lines in which the 6th column contains a, then this would work:
awk '$6~/a/' file
If it is an exact match (which yours is not) you're looking for:
$6 == "a"
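The difference between ~ and == on a field, on two made-up lines:

```shell
printf 'x x x x x ab\nx x x x x a\n' | awk '$6 ~ /a/'    # regex match: prints both lines
printf 'x x x x x ab\nx x x x x a\n' | awk '$6 == "a"'   # exact match: prints only the second
```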
http://www.pement.org/awk/awk1line.txt
is an excellent resource
awk can also tell you where the pattern is in the column:
awk '{++line_num}{ if ( match($6,"a")) { print "found a at position",RSTART, " line " ,line_num} }' file
though this example will only show the first "a" in column 6; a for loop would be needed to show all instances (I think)
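A sketch of that loop, reporting every "a" rather than just the first (shown on $1 so the demo is self-contained; use $6 on the real file):

```shell
# match() sets RSTART to the position of the match within s; after each hit,
# chop off everything through that position and keep a running offset so the
# reported positions stay relative to the start of the original field.
printf '%s\n' tataattt | awk '{
    s = $1; off = 0
    while (match(s, "a")) {
        print "found a at position", off + RSTART, "line", NR
        off += RSTART
        s = substr(s, RSTART + 1)
    }
}'
# -> found a at position 2 line 1
#    found a at position 4 line 1
#    found a at position 5 line 1
```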
You could try
gawk '{ if ( $1 ~ /a/ ) { print $1 } }' filename
