find a string in a string using awk - string

here is column 6 in a file:
ttttttttttt
tttttttttt
ttttttttt
tttttttattt
tttttttttt
ttttttttttt
how can I use awk to print out lines that include "a"

If you only want to search the sixth column, use:
awk '$6 ~ /a/' file
If you want the whole line, any of these should work:
awk /a/ file
grep a file
sed '/^[^a]*$/d' file

If you wish to print only those lines in which 6th column contains a then this would work -
awk '$6~/a/' file

if it is an exact match (which yours is not) you're looking for:
$6 == "a"
http://www.pement.org/awk/awk1line.txt
is an excellent resource
awk can also tell you where the pattern is in the column:
awk '{++line_num}{ if ( match($6,"a")) { print "found a at position",RSTART, " line " ,line_num} }' file
though this example will only show the first "a" in column 6; a for loop would be needed to show all instances (I think)

You could try
gawk '{ if ( $1 ~ /a/ ) { print $1 } }' filename

Related

How to remove double quotes in a specific column by using sub() in AWK

My sample data is
cat > myfile
"a12","b112122","c12,d12"
a13,887988,c13,d13
a14,b14121,c79,d13
when I try to remove " from colum 2 by
awk -F, 'BEGIN { OFS = FS } $2 ~ /"/ { sub(/"/, "", $2) }1' myfile
"a12",b112122","c12,d12"
a13,887988,c13,d13
a14,b14121,c79,d13
It only remove only 1 comma, instead of b112122 i am getting b112122"
how to remove all " in 2nd column
From the documentation:
Search target, which is treated as a string, for the leftmost, longest substring matched by the regular expression regexp.[...] Return the number of substitutions made (zero or one).
It is quite clear that the function sub is doing at most one single replacement and does not replace all occurences.
Instead, use gsub:
Search target for all of the longest, leftmost, nonoverlapping matching substrings it can find and replace them with replacement. The ‘g’ in gsub() stands for “global,” which means replace everywhere.
So you can add a 'g' to your line and it works fine:
awk -F, 'BEGIN { OFS = FS } $2 ~ /"/ { gsub(/"/, "", $2) }1' myfile
When you dealing with CSV file, not using FPAT, it will break sooner or later.
Here is a gnu awk that does the jib.
awk -v OFS="," -v FPAT="([^,]+)|(\"[^\"]+\")" '{gsub(/"/,"",$2)}1' file
"a12",b112122,"c12,d12"
a13,887988,c13,d13
a14,b14121,c79,d13
It will work fine on any column, number 3 as well.
Example on remove " on column 3 at the same time change separator to |
awk -v OFS="|" -v FPAT="([^,]+)|(\"[^\"]+\")" '{gsub(/"/,"",$3);$1=$1}1' file
"a12"|"b112122"|c12,d12
a13|887988|c13|d13
a14|b14121|c79|d13

Linux cut, paste

I have to write a script file to cut the following column and paste it the end of the same row in a new .arff file. I guess the file type doesn't matter.
Current file:
63,male,typ_angina,145,233,t,left_vent_hyper,150,no,2.3,down,0,fixed_defect,'<50'
67,male,asympt,160,286,f,left_vent_hyper,108,yes,1.5,flat,3,normal,'>50_1'
The output should be:
male,typ_angina,145,233,t,left_vent_hyper,150,no,2.3,down,0,fixed_defect,'<50',63
male,asympt,160,286,f,left_vent_hyper,108,yes,1.5,flat,3,normal,'>50_1',67
how can I do this? using a Linux script file?
sed -r 's/^([^,]*),(.*)$/\2,\1/' Input_file
Brief explanation,
^([^,]*) would match the first field which separated by commas, and \1 behind refer to the match
(.*)$ would be the remainding part except the first comma, and \2 would refer to the match
Shorter awk solution:
$ awk -F, '{$(NF+1)=$1;sub($1",","")}1' OFS=, input.txt
gives:
male,typ_angina,145,233,t,left_vent_hyper,150,no,2.3,down,0,fixed_defect,'<50',63
male,asympt,160,286,f,left_vent_hyper,108,yes,1.5,flat,3,normal,'>50_1',67
Explanation:
{$(NF+1)=$1 # add extra field with value of field $1
sub($1",","") # search for string "$1," in $0, replace it with ""
}1 # print $0
EDIT: Reading your comments following your question, looks like your swapping more columns than just the first to the end of the line. You might consider using a swap function that you call multiple times:
func swap(i,j){s=$i; $i=$j; $j=s}
However, this won't work whenever you want to move a column to the end of the line. So let's change that function:
func swap(i,j){
s=$i
if (j>NF){
for (k=i;k<NF;k++) $k=$(k+1)
$NF=s
} else {
$i=$j
$j=s
}
}
So now you can do this:
$ cat tst.awk
BEGIN{FS=OFS=","}
{swap(1,NF+1); swap(2,5)}1
func swap(i,j){
s=$i
if (j>NF){
for (k=i;k<NF;k++) $k=$(k+1)
$NF=s
} else {
$i=$j
$j=s
}
}
and:
$ awk -f tst.awk input.txt
male,t,145,233,typ_angina,left_vent_hyper,150,no,2.3,down,0,fixed_defect,'<50',63
male,f,160,286,asympt,left_vent_hyper,108,yes,1.5,flat,3,normal,'>50_1',67
Why using sed or awk, the shell can handle this easily
while read l;do echo ${l#*,},${l%%,*};done <infile
If it's a win file with \r
while read l;do f=${l%[[:cntrl:]]};echo ${f#*,},${l%%,*};done <infile
If you want to keep the file in place.
printf "%s" "$(while read l;do f=${l%[[:cntrl:]]};printf "%s\n" "${f#*,},${l%%,*}";done <infile)">infile

linux grep pattern in an unknown number of column

I have a text file with many rows and columns and I want to grep a column by the 'column name'.
M121 M125 M123 M124 M131 M126 M211 N
0.41463252 1.00296561 -0.1713496 0.15923644 -1.49682602 -1.9478695 1.45223392 …
-0.46775802 0.14591103 1.122446 0.83648981 -0.3038532 -1.1841548 2.18074729 …
0.67736835 2.12969375 -0.8187298 0.13582824 -1.49290987 -0.6798428 1.04353114 …
0.08673344 -0.40437672 1.8441559 -0.63679375 0.47998832 0.1702844 0.54029264 …
-0.32606297 -0.95551833 0.6157599 0.02819133 1.44818627 -0.9528659 0.09207864 …
-0.51781121 0.88806507 -0.2913757 -0.00463802 0.05037374 0.953773 0.01244763 …
-0.25724472 0.05119051 0.2109025 -0.26083822 -0.52094072 -0.938595 -0.01275275 …
1.94348766 -1.83607523 1.2010512 -0.54109756 -0.88323831 -0.6263788 -0.96973544 …
0.1900408 -0.61025656 0.4586306 -0.69181051 -0.90713834 0.3589271 0.6870383 …
0.54866057 -0.03861159 -1.505861 0.54871682 -0.24602601 -0.3941754 0.85673905 …
for example, I want to grep M211 column but I don't know the number of column. I tried:
awk '$i == "M211"' filename or awk '$0 == "M211"' filename
awk: illegal field $(), name "i"
input record number 1, filename
source line number 1
Is there any solution ? Thank you.
awk solution - iterates over column names for first line of input file and saves column number if it matches desired pattern. Then print that column. No output if match is not found
$ awk 'NR==1{ for(i=1;i<=NF;i++){if($i=="M125")c=i;} if(c==0)exit; }
{print $c}' ip.txt
M125
1.00296561
0.14591103
2.12969375
-0.40437672
-0.95551833
0.88806507
0.05119051
-1.83607523
-0.61025656
-0.03861159
Similar solution with perl
$ perl -lane '#i = grep {$F[$_] eq "M123"} 0..$#F if $.==1; exit if !#i;
print #F[#i]' ip.txt
M123
-0.1713496
1.122446
-0.8187298
1.8441559
0.6157599
-0.2913757
0.2109025
1.2010512
0.4586306
-1.505861
#i = grep {$F[$_] eq "M123"} 0..$#F if $.==1 for the header line, get index for which column value matches the string M123
exit if !#i exit if no match found
print #F[#i] print the matched column
assumes there'll be only one column match
for multiple matches, use
perl -lane '#i = grep {$F[$_] =~ /^(M121|M126)$/} 0..$#F if $.==1; exit if !#i;
print join " ", #F[#i]' ip.txt
Another in awk:
$ awk 'NR==1 {for(i=NF;i>0;i--) if($i=="M125") break; if(!i) exit} {print $i}' file
M125
1.00296561
0.14591103
2.12969375
-0.40437672
-0.95551833
0.88806507
0.05119051
-1.83607523
-0.61025656
-0.03861159
Explained:
NR==1 { # for the first record
for(i=NF;i>0;i--) # iterate fields backwards for change
if($i=="M125") break # until desired column, remember i
if (!i) exit # if column not found, exit
}
{print $i} # print value from ith field
If you are more familiar with Python:
import csv
column_name = "M125"
with open("file", "rb") as f:
data_dict = csv.DictReader(f, delimiter=" ")
print column_name
for item in data_dict:
print item[column_name]
To do anything with columns ("fields" in awk) by name rather than number you should first create an array that maps the field name to number and from then on just access the fields using that array indexed by the field name(s) rather than accessing them directly by field number(s):
$ awk 'NR==1{for (i=1;i<=NF;i++) f[$i]=i} {print $(f["M124"])}' file
M124
0.15923644
0.83648981
0.13582824
-0.63679375
0.02819133
-0.00463802
-0.26083822
-0.54109756
-0.69181051
0.54871682
or if you don't want to hard-code the column name:
$ awk -v c=M124 'NR==1{for (i=1;i<=NF;i++) f[$i]=i} {print $(f[c])}' file
M124
0.15923644
0.83648981
0.13582824
-0.63679375
0.02819133
-0.00463802
-0.26083822
-0.54109756
-0.69181051
0.54871682
and to print any number of columns in the order you choose:
$ awk -v cols='M129 M124' 'NR==1{for (i=1;i<=NF;i++) f[$i]=i; n=split(cols,c)} {for (i=1;i<=n;i++) printf "%s%s", $(f[c[i]]), (i<n ? OFS : ORS)}' file
M129 M124
1.45223392 0.15923644
2.18074729 0.83648981
1.04353114 0.13582824
0.54029264 -0.63679375
0.09207864 0.02819133
0.01244763 -0.00463802
-0.01275275 -0.26083822
-0.96973544 -0.54109756
0.6870383 -0.69181051
0.85673905 0.54871682

How to use AWK to print line with highest number?

I have a question. Assuming I dump a file and do a grep for foo and comes out the result like this:
Foo-bar-120:'foo name 1'
Foo-bar-130:'foo name 2'
Foo-bar-1222:'foo name 3'
Etc.
All I want is trying to extract the foo name with largest number. For instance in this case, largest number is 1222 and the result I expect is foo name 3
Is there a easy way using awk and sed to achieve this? Rather than pull the number out line by line and loop through to find the largest number?
Code for awk:
awk -F[-:] '$3>a {a=$3; b=$4} END {print b}' file
$ cat file
Foo-bar-120:'foo name 1'
Foo-bar-130:'foo name 2'
Foo-bar-1222:'foo name 3'
$ awk -F[-:] '$3>a {a=$3; b=$4} END {print b}' file
'foo name 3'
Here's how I would do it. I just tested this in Cygwin. Hopefully it works under Linux as well. Put this into a file, such as mycommand:
#!/usr/bin/awk -f
BEGIN {
FS="-";
max = 0;
maxString = "";
}
{
num = $3 + 0; # convert string to int
if (num > max) {
max = num;
split($3, arr, "'");
maxString = arr[2];
}
}
END {
print maxString;
}
Then make the file executable (chmod 755 mycommand). Now you can pipe whatever you want through it by typing, for example, cat somefile | ./mycommand.
Assuming the line format is as shown with 2 hyphens before "the number":
cut -d- -f3- | sort -rn | sed '1{s/^[0-9]\+://; q}'
is this ok for you?
awk -F'[:-]' '{n=$(NF-1);if(n>m){v=$NF;m=n}}END{print v}'
with your data:
kent$ echo "Foo-bar-120:’foo name 1’
Foo-bar-130:’foo name 2’
Foo-bar-1222:’foo name 3’"|awk -F'[:-]' '{n=$(NF-1);if(n>m){v=$NF;m=n}}END{print v}'
’foo name 3’
P.S. I like the Field separator [:-]
$ awk '{gsub(/.*:.|.$/,"")} (NR==1)||($NF>max){max=$NF; val=$0} END{print val}' file
foo name 3
You don't need to use grep. you can use awk directly on your file as:
awk -F"[-:]" '/Foo/ && $3>prev{val=$NF;prev=$3}END{print val}' file

How to use a bash command in awk

Here is my problem
I have a File 1 where I have some data
Var1.1 Var1.2 Var1.3
Var2.1 Var2.2 Var2.3
Var3.1 Var3.2 Var3.3
And I have a File 2 that I would like edit thanks to the above data
File2 (1)
***pattern with Var2.1***
some text...
File2(2)
***pattern with Var2.1***
Here I want to add Var2.2 and Var2.3
some text
My first solution is to use AWK, but I don't know to include a bash command in. The AWK should make something like:
Search the pattern in the File2
When awk get it, awk calls a script which returns the wanted values from the File1.
Then awk can edit the File2
don't hesitate to explain me other possibilities if there are which are more simple !
Thank you !
This is how I run an external command from within awk to base64-decode a string:
cmd = "/usr/bin/base64 -i -d <<< " $2 " 2>/dev/null"
while ( ( cmd | getline result ) > 0 ) { }
close(cmd)
split(result, a, "[:=,]")
name=a[2]
Perhaps you can get some inspiration from it...
There's no need to run an external script to accomplish what you want. It can be done completely within a short AWK script.
awk 'FNR == NR {arr[$1] = $2 " " $3; next} {print; for (lookup in arr) {if ($0 ~ lookup) {split(arr[lookup], a); print "Here I want to add " a[1] " and " a[2]}}}' File1 File2
Explanation:
FNR == NR {arr[$1] = $2 " " $3; next} - Loop through the first file and save all the values in an array indexed by the first column. The record number equals the file record number for the first file.
print - Print every input line.
for (lookup in arr) {if ($0 ~ lookup) { - Loop through each of the array indices and see if the input line matches.
split(arr[lookup], a) - Split the value stored at the matched index into a temporary array.
print "Here I want to add " a[1] " and " a[2] - Print some text using the two values resulting from the split.

Resources