Using an if/else statement in the middle of AWK - linux

I have a 5-column file:
PS 6 15 0 1
PS 1 17 0 1
PS 4 18 0 1
that I would like to get it in this 7-column format:
PS.15 PS 6 N 1 0 1
PS.17 PS 1 P 1 0 1
PS.18 PS 4 N 1 0 1
To create 6 of the 7 columns requires just grabbing directly (and sometimes applying small arithmetic) from columns in the original file. However, to create one column (column 4) requires an if-else statement.
Specifically, to create new columns 1, 2, 3, I use:
cat File | awk '{print $1"."$3"\t"$1"\t"$2}'
and to create new columns 5, 6,7, I use:
cat testFileB | awk '{print $4+$5"\t"$4/($4+$5)"\t"$5/($4+$5)}'
and to create new column 4, I use:
cat testFileB | awk '{if ($2 == 1 || $2 == 2 || $2 == 3) print "P"; else print "N";}'
These three statements work fine independently and get me what I want (the correct values for the columns that are all separated by tabs). However, when I try to apply them simultaneously (create all 7 columns at once), I can only do so with unwanted new lines (instead of tabs) before and after column 4 (the if/else statement column):
For instance, my attempt to simultaneously create columns 1, 2, 3, 4:
cat File | awk '{print $1"."$3"\t"$1"\t"$2; if ($2 == 1 || $2 == 2 || $2 == 3) print "P"; else print "N";}'
results in unwanted new lines before column 4:
PS.15 PS 6
N
PS.17 PS 1
P
PS.18 PS 4
Similarly, my attempt to simultaneously create columns 4, 5, 6, 7:
cat File | awk '{if ($2 == 1 || $2 == 2 || $2 == 3) print "P"; else print "N"; print $4+$5"\t"$4/($4+$5)"\t"$5/($4+$5)}'
results in unwanted new lines after column 4:
N
1 0 1
P
1 0 1
N
1 0 1
Is there a solution so that I can create all 7 columns at once, and there are only tabs between them (no new lines)?

If you don't want automatic line feeds, you can just use printf instead of print. I'm not quite sure if you want a tab separating the N1 or not, but that's easy enough to adjust;
cat testfile | awk '{printf "%s.%s\t%s\t%s\t",$1,$3,$1,$2; if ($2 == 1 || $2 == 2 || $2 == 3) printf "P"; else printf "N"; print $4+$5"\t"$4/($4+$5)"\t"$5/($4+$5)}'
PS.15 PS 6 N1 0 1
PS.17 PS 1 P1 0 1
PS.18 PS 4 N1 0 1

Simply set your OFS (instead of repeating a \t all across the line), and use the ternary operator to print P or N:
$ awk -v OFS='\t' '{s=$4+$5;print $1"."$3,$1,$2,($2~/^[123]$/?"P":"N"),s,$4/s,$5/s}' file
PS.15 PS 6 N 1 0 1
PS.17 PS 1 P 1 0 1
PS.18 PS 4 N 1 0 1

Related

Insert a row and a column in a matrix using awk

I have a gridded dataset with 250 rows x 300 columns in matrix form:
ifile.txt
2 3 4 1 2 3
3 4 5 2 4 6
2 4 0 5 0 7
0 0 5 6 3 8
I would like to insert the latitude values at the first column and longitude values at the top. Which looks like:
ofile.txt
20.00 20.33 20.66 20.99 21.32 21.65
100.00 2 3 4 1 2 3
100.33 3 4 5 2 4 6
100.66 2 4 0 5 0 7
100.99 0 0 5 6 3 8
The increment is 0.33
I can do it for a small size matrix in manually, but I can't able to get any idea how to get my output in my desired format. I was writing a script in the following way, but completely useless.
echo 20 > latitude.txt
for i in `seq 1 250`;do
i1=$(( i + 0.33 )) #bash can't recognize fractions
echo $i1 >> latitude.txt
done
echo 100 > longitude.txt
for j in `seq 1 300`;do
j1=$(( j + 0.33 ))
echo $j1 >> longitude.txt
done
paste longitude.txt ifile.txt > dummy_file.txt
cat latitude.txt dummy_file.txt > ofile.txt
$ cat tst.awk
BEGIN {
lat = 100
lon = 20
latWid = lonWid = 6
latDel = lonDel = 0.33
latFmt = lonFmt = "%*.2f"
}
NR==1 {
printf "%*s", latWid, ""
for (i=1; i<=NF; i++) {
printf lonFmt, lonWid, lon
lon += lonDel
}
print ""
}
{
printf latFmt, latWid, lat
lat += latDel
for (i=1; i<=NF; i++) {
printf "%*s", lonWid, $i
}
print ""
}
$ awk -f tst.awk file
20.00 20.33 20.66 20.99 21.32 21.65
100.00 2 3 4 1 2 3
100.33 3 4 5 2 4 6
100.66 2 4 0 5 0 7
100.99 0 0 5 6 3 8
Following awk may also help you on same.
awk -v col=100 -v row=20 'FNR==1{printf OFS;for(i=1;i<=NF;i++){printf row OFS;row=row+.33;};print ""} {col+=.33;$1=$1;print col OFS $0}' OFS="\t" Input_file
Adding non one liner form of above solution too now:
awk -v col=100 -v row=20 '
FNR==1{
printf OFS;
for(i=1;i<=NF;i++){
printf row OFS;
row=row+.33;
};
print ""
}
{
col+=.33;
$1=$1;
print col OFS $0
}
' OFS="\t" Input_file
Awk solution:
awk 'NR == 1{
long = 20.00; lat = 100.00; printf "%12s%.2f", "", long;
for (i=1; i<NF; i++) { long += 0.33; printf "\t%.2f", long } print "" }
NR > 1{ lat += 0.33 }
{
printf "%.2f%6s", lat, "";
for (i=1; i<=NF; i++) printf "\t%d", $i; print ""
}' file
With perl
$ perl -lane 'print join "\t", "", map {20.00+$_*0.33} 0..$#F if $.==1;
print join "\t", 100+(0.33*$i++), #F' ip.txt
20 20.33 20.66 20.99 21.32 21.65
100 2 3 4 1 2 3
100.33 3 4 5 2 4 6
100.66 2 4 0 5 0 7
100.99 0 0 5 6 3 8
-a to auto-split input on whitespaces, result saved in #F array
See https://perldoc.perl.org/perlrun.html#Command-Switches for details on command line options
if $.==1 for the first line of input
map {20.00+$_*0.33} 0..$#F iterate based on size of #F array, and for each iteration, we get a value based on equation inside {} where $_ will be 0, 1, etc upto last index of #F array
print join "\t", "", map... use tab separator to print empty element and results of map
For all the lines, print contents of #F array pre-fixed with results of 100+(0.33*$i++) where $i will be initially 0 in numeric context. Again, tab is used as separator while joining these values
Use sprintf if needed for formatting, also $, can be initialized instead of using join
perl -lane 'BEGIN{$,="\t"; $st=0.33}
print "", map { sprintf "%.2f", 20+$_*$st} 0..$#F if $.==1;
print sprintf("%.2f", 100+($st*$i++)), #F' ip.txt

How to get the rows corespondent to a specific column values using linux command.

I have an o/p like below.I want the values of first column correspondent to a input value for second column.
Ex: in column 1, 0 and 1 belongs to 0 value of column 2.
So I need a command in which if I pass 0(second column values) I must get 0,1
dmpgdo dbsconfig 0 | grep AMP | grep Online | awk -F' ' '{print $1,$4}'
0 0
1 0
2 1
3 1
4 2
5 2
6 3
7 3
Will this do?
printf "0 0\n1 0\n2 1\n3 1\n4 2\n5 2\n6 3\n7 3" | awk '{if ($2 == 0) print $1}'
0
1

Print a selective column after subtraction from another file

I have two files with equal number of rows and columns. I would like to subtract the 2nd column in one file from the 2nd column in another file without considering the missing values. e.g.
ifile1.txt
3 5 2 2
1 ? 2 1
4 6 5 2
5 5 7 1
ifile2.txt
1 2 1 3
1 3 0 2
2 ? 5 1
0 0 1 1
Here "?" is the missing value and should not be considered in computation.
ofile.txt i.e. [$2(ifile1.txt) - $2(ifile2.txt)]
3
?
?
5
I could able to do it without any missing values in following way. But can't able to succeed with a missing value like here "?".
paste ifile1.txt ifile2.txt > ifile3.txt
awk '{n=NF/2; for (i=1;i<=n;i++) printf "%5.2f ", $i-$(i+n); print ""}' ifile3.txt > ifile4.txt
awk '{printf ("%.2f\n",$2)}' ifile4.txt > ofile.txt
$ awk 'NR==FNR{a[NR]=$2;next} {print ((a[FNR]$2)~/?/ ? "?" : a[FNR]-$2)}' file1 file2
3
?
?
5
POSIX shell script, and paste.
paste ifile[12].txt | \
while read a b c d e f g ; do \
[ "$b$f" -eq "$b$f" ] 2> /dev/null \
&& echo $(( b - f )) \
|| echo '?' ; \
done
Output:
3
?
?
5

Find rows common in more than two files using awk [duplicate]

This question already has answers here:
How to find common rows in multiple files using awk
(2 answers)
Closed 7 years ago.
I have tab delimited text files in which common rows between them are to be found based on columns 1 and 2 as key columns.
Sample files:
file1.txt
aba 0 0
abc 0 1
abd 1 1
xxx 0 0
file2.txt
xyz 0 0
aba 0 0 0 0
xxx 0 0
abc 1 1
file3.txt
xyx 0 0
aba 0 0
aba 0 1 0
xxx 0 0 0 1
abc 1 1
I would like to get rows common in 2 files or 3 files using columns 1 and 2 as key to search. For the common rows based on column 1 and 2 reporting the first occurrence in any file would do the job.
Sample Output for rows common in 2 files:
abc 1 1
Sample output for rows common in 3 files:
aba 0 0
xxx 0 0
In real scenario I do have to specify different values for number of files. Can anybody suggest a generalized solution to pass the value for number of files in which it has to be common.
I have this piece of code which looks for rows common in all files.
awk '
FNR == NR {
arr[$1,$2] = 1
line[$1,$2] = line[$1,$2] ( line[$1,$2] ? SUBSEP : "" ) $0
next
}
FNR == 1 { delete found }
{ if ( arr[$1,$2] && ! found[$1,$2] ) { arr[$1,$2]++; found[$1,$2] = 1 } }
END {
num_files = ARGC -1
for ( key in arr ) {
if ( arr[key] < num_files ) { continue }
split( line[ key ], line_arr, SUBSEP )
for ( i = 1; i <= length( line_arr ); i++ ) {
printf "%s\n", line_arr[ i ]
}
}
}
' *.txt > commoninall.txt
This should work:
cat file[123].txt | sort | awk 'BEGIN{FS="\t"; V1=""; V2=""}
{if (V1==$1 && V2==$2) { b=b+1 } else
{ print b":"$0; b=1; V1=$1; V2=$2} }' |grep "2:"|awk '
BEGIN{FS=":"} {print $2}'
I cat all file in one stream, sort the lines, check if the first two tab seperated colums are equal (if they are then print the line) and then filter out all duplicated lines.
BTW: I took this nice file[123].txt globbing idea from the comment of William Pursell.
This should work too
I put all the lines in an array (b) with two first values and accumulate in a the number of repetitions. If number > 1 it will be printed from b which has the last line saved for this pair combination column1/column2
cat *.txt | awk -F" " '{a[$1$2]=a[$1$2]+1; b[$1$2]=$0} END{ for (i in a){if(a[i]>1){print b[i]}}}'
Is it ok too?
EDIT
To show all lines in all files, you need just a little more:
cat *.txt | awk -F" " '{a[$1$2]=a[$1$2]+1; c=a[$1$2]; b[$1$2c]=$0} END{ for (i in a){if(a[i]>1){for(c=1; c<=a[i];++c){print b[i c]}}}}'
Very thanks to #PeterPaulKiefer for the cat *txt idea

How to select rows in which column two and three are not equal to each other and to 0 or 1?(with awk)

I have a file like this:
AX-75448119 0 1
AX-75448118 0.45 0.487179
AX-75474642 0 0
AX-75474643 0.25 0.820513
AX-75448113 1 0
AX-75474641 1 1
and I want to select the rows that column 2 and 3 are not equal each other and 0 or 1 (both of them)! (i.e if column 2 and 3 are similar but equal to 0.5 (or any other number except 0 and 1) I would like to have that row)
so the output would be:
AX-75448119 0 1
AX-75448118 0.45 0.487179
AX-75474643 0.25 0.820513
AX-75448113 1 0
I know how to write the command to select the rows that column 2 and 3 are equal to each other and are equal to 0 or 1 which is this:
awk '$2=$3==1 || $2=$3==0' test.txt | wc -l
but I want exactly the opposite, to select every rows that are not the output of the above command!
Thanks, I hope I was able to explain what I want
It might work for you if I get your requirements correctly.
awk ' $2 != $3 { print; next } $2 == $3 && $2 != 0 && $2 != 1 { print }' INPUTFILE
See it in action at Ideone.com
This might work for you:(?)
awk '($2==0 || $2==1) && ($3==0 || $3==1) && $2==$3{next}1' file

Resources