How to select rows in which columns two and three are not both equal to each other and to 0 or 1 (with awk)? - linux

I have a file like this:
AX-75448119 0 1
AX-75448118 0.45 0.487179
AX-75474642 0 0
AX-75474643 0.25 0.820513
AX-75448113 1 0
AX-75474641 1 1
and I want to select the rows in which columns 2 and 3 are not both equal to each other and equal to 0 or 1. (That is, if columns 2 and 3 are equal to each other but their value is 0.5, or any other number except 0 and 1, I still want that row.)
so the output would be:
AX-75448119 0 1
AX-75448118 0.45 0.487179
AX-75474643 0.25 0.820513
AX-75448113 1 0
I know how to write the command to select the rows in which columns 2 and 3 are equal to each other and equal to 0 or 1, which is this:
awk '$2==$3 && ($2==0 || $2==1)' test.txt | wc -l
but I want exactly the opposite: to select every row that is not in the output of the above command.
Thanks, I hope I was able to explain what I want

This might work for you, if I understand your requirements correctly.
awk ' $2 != $3 { print; next } $2 == $3 && $2 != 0 && $2 != 1 { print }' INPUTFILE
See it in action at Ideone.com

This might work for you:
awk '($2==0 || $2==1) && ($3==0 || $3==1) && $2==$3{next}1' file
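Since the goal is the exact opposite of "equal to each other and equal to 0 or 1", another option is to write that condition once and negate it. The following is only a sketch run against the sample rows from the question; the variable names input and result are made up for illustration.

```shell
# Sample rows copied from the question (the variable names are illustrative).
input='AX-75448119 0 1
AX-75448118 0.45 0.487179
AX-75474642 0 0
AX-75474643 0.25 0.820513
AX-75448113 1 0
AX-75474641 1 1'

# Drop a row only when both columns are equal AND the shared value is 0 or 1;
# every other row is printed by the default action.
result=$(printf '%s\n' "$input" | awk '!($2 == $3 && ($2 == 0 || $2 == 1))')
printf '%s\n' "$result"
```

The fields are compared numerically here, so a value written as 1.0 would be treated the same as 1.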

Related

How to get the rows corresponding to a specific column value using a Linux command

I have output like the one below. I want the values of the first column that correspond to an input value for the second column.
Ex: in column 1, 0 and 1 belong to the value 0 in column 2.
So I need a command to which I can pass 0 (a second-column value) and get back 0,1.
dmpgdo dbsconfig 0 | grep AMP | grep Online | awk -F' ' '{print $1,$4}'
0 0
1 0
2 1
3 1
4 2
5 2
6 3
7 3
Will this do?
printf "0 0\n1 0\n2 1\n3 1\n4 2\n5 2\n6 3\n7 3" | awk '{if ($2 == 0) print $1}'
0
1
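If the value should be passed in rather than hard-coded, and the matches are wanted on one line as 0,1, something along these lines might do. It is a sketch only: the variable names data, want, out and sep are assumptions, and the sample rows stand in for the output of the dmpgdo pipeline.

```shell
# Stand-in for the two-column output shown in the question.
data='0 0
1 0
2 1
3 1
4 2
5 2
6 3
7 3'

# "want" is the second-column value to look up (hypothetical variable name).
# Matching first-column values are joined with commas and printed once at the end.
result=$(printf '%s\n' "$data" |
    awk -v want=0 '$2 == want { out = out sep $1; sep = "," } END { print out }')
printf '%s\n' "$result"   # prints 0,1
```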

Print a selective column after subtraction from another file

I have two files with equal number of rows and columns. I would like to subtract the 2nd column in one file from the 2nd column in another file without considering the missing values. e.g.
ifile1.txt
3 5 2 2
1 ? 2 1
4 6 5 2
5 5 7 1
ifile2.txt
1 2 1 3
1 3 0 2
2 ? 5 1
0 0 1 1
Here "?" is the missing value and should not be considered in computation.
ofile.txt i.e. [$2(ifile1.txt) - $2(ifile2.txt)]
3
?
?
5
I was able to do it in the following way when there are no missing values, but I can't get it to work with a missing value such as the "?" here.
paste ifile1.txt ifile2.txt > ifile3.txt
awk '{n=NF/2; for (i=1;i<=n;i++) printf "%5.2f ", $i-$(i+n); print ""}' ifile3.txt > ifile4.txt
awk '{printf ("%.2f\n",$2)}' ifile4.txt > ofile.txt
$ awk 'NR==FNR{a[NR]=$2;next} {print ((a[FNR]$2)~/[?]/ ? "?" : a[FNR]-$2)}' file1 file2
3
?
?
5
A POSIX shell script, using paste:
paste ifile[12].txt | \
while read a b c d e f g ; do \
[ "$b$f" -eq "$b$f" ] 2> /dev/null \
&& echo $(( b - f )) \
|| echo '?' ; \
done
Output:
3
?
?
5
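The paste idea can also be combined with a single awk call, which avoids the shell loop. This is a sketch under the assumption that every row has exactly four columns, so that after paste the second column of the second file is field 6; the temporary directory is only there to make the example self-contained.

```shell
# Recreate the two sample files from the question in a scratch directory.
tmp=$(mktemp -d)
printf '%s\n' '3 5 2 2' '1 ? 2 1' '4 6 5 2' '5 5 7 1' > "$tmp/ifile1.txt"
printf '%s\n' '1 2 1 3' '1 3 0 2' '2 ? 5 1' '0 0 1 1' > "$tmp/ifile2.txt"

# After paste, $2 is column 2 of ifile1.txt and $6 is column 2 of ifile2.txt.
# If either one is the missing-value marker, print "?", otherwise the difference.
result=$(paste "$tmp/ifile1.txt" "$tmp/ifile2.txt" |
    awk '{ print (($2 == "?" || $6 == "?") ? "?" : $2 - $6) }')
printf '%s\n' "$result"

rm -rf "$tmp"
```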

Counting number of rows depending on more than 1 column condition

I have a data file like this
H1 H2 H3 E1 E2 E3 C1 C2 C3
0 0 0 0 0 0 0 0 1
1 0 0 0 1 0 0 0 1
0 1 0 0 1 0 1 0 1
Now I want to count the rows where H1, H2, H3 have the same pattern as E1, E2, E3. For example, I want to count the number of times H1,H2,H3 and E1,E2,E3 are both 010 or both 000.
I tried to use this code, but it doesn't really work:
awk -F "" '!($1==0 && $2==1 && $3==0 && $4==0 && $5==1 && $6==0)' file | wc -l
Something like
>>> awk '$1$2$3 == $4$5$6' input | wc -l
2
What does it do?
$1$2$3 == $4$5$6 checks whether the string formed by columns 1, 2 and 3 is equal to the string formed by columns 4, 5 and 6. When it is true, awk takes the default action of printing the entire line, and wc takes care of counting those lines.
Or, if you want a complete awk solution, you can write
>>> awk '$1$2$3 == $4$5$6{count++} END{print count}' input
2
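If only particular patterns should be counted, such as the 010 and 000 mentioned in the question, the same string comparison can be narrowed down. A sketch, with h, e and n as made-up variable names:

```shell
# Sample data from the question, header line included.
data='H1 H2 H3 E1 E2 E3 C1 C2 C3
0 0 0 0 0 0 0 0 1
1 0 0 0 1 0 0 0 1
0 1 0 0 1 0 1 0 1'

# h and e are the concatenated H and E columns; a row is counted only when
# they match each other and the shared pattern is one of the two wanted ones.
# n + 0 makes awk print 0 instead of an empty line when nothing matches.
result=$(printf '%s\n' "$data" | awk '
    { h = $1 $2 $3; e = $4 $5 $6 }
    h == e && (h == "010" || h == "000") { n++ }
    END { print n + 0 }')
printf '%s\n' "$result"   # prints 2
```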

Find rows common in more than two files using awk [duplicate]

I have tab delimited text files in which common rows between them are to be found based on columns 1 and 2 as key columns.
Sample files:
file1.txt
aba 0 0
abc 0 1
abd 1 1
xxx 0 0
file2.txt
xyz 0 0
aba 0 0 0 0
xxx 0 0
abc 1 1
file3.txt
xyx 0 0
aba 0 0
aba 0 1 0
xxx 0 0 0 1
abc 1 1
I would like to get the rows common to 2 files or 3 files, using columns 1 and 2 as the search key. For rows that are common based on columns 1 and 2, reporting the first occurrence in any file would do the job.
Sample Output for rows common in 2 files:
abc 1 1
Sample output for rows common in 3 files:
aba 0 0
xxx 0 0
In the real scenario I have to specify different values for the number of files. Can anybody suggest a generalized solution that takes the number of files in which a row must be common as a parameter?
I have this piece of code, which looks for rows common to all files.
awk '
FNR == NR {
    # first file: register every key and remember its line(s)
    arr[$1,$2] = 1
    line[$1,$2] = line[$1,$2] ( line[$1,$2] ? SUBSEP : "" ) $0
    next
}
FNR == 1 { delete found }
# later files: count each known key at most once per file
{ if ( arr[$1,$2] && ! found[$1,$2] ) { arr[$1,$2]++; found[$1,$2] = 1 } }
END {
    num_files = ARGC - 1
    for ( key in arr ) {
        if ( arr[key] < num_files ) { continue }
        # split() returns the number of pieces, so length() on an array is not needed
        n = split( line[ key ], line_arr, SUBSEP )
        for ( i = 1; i <= n; i++ ) {
            printf "%s\n", line_arr[ i ]
        }
    }
}
' *.txt > commoninall.txt
This should work:
cat file[123].txt | sort | awk 'BEGIN{FS="\t"; V1=""; V2=""}
{if (V1==$1 && V2==$2) { b=b+1 } else
{ if (NR>1) print b":"prev; b=1; V1=$1; V2=$2; prev=$0} }
END{ if (NR>0) print b":"prev }' |grep "^2:"|awk '
BEGIN{FS=":"} {print $2}'
I cat all files into one stream, sort the lines, and count runs of consecutive lines whose first two tab-separated columns are equal. Each run is emitted as its count, a colon and the first line of the run; grep and the final awk then keep only the lines of runs with a count of 2.
BTW: I took this nice file[123].txt globbing idea from the comment of William Pursell.
This should work too
I store each line in an array (b) keyed by its first two values, and accumulate the number of repetitions in a. If the count is greater than 1, the line is printed from b, which holds the last line saved for that column 1/column 2 combination.
cat *.txt | awk -F" " '{a[$1$2]=a[$1$2]+1; b[$1$2]=$0} END{ for (i in a){if(a[i]>1){print b[i]}}}'
Is it ok too?
EDIT
To show all lines in all files, you need just a little more:
cat *.txt | awk -F" " '{a[$1$2]=a[$1$2]+1; c=a[$1$2]; b[$1$2c]=$0} END{ for (i in a){if(a[i]>1){for(c=1; c<=a[i];++c){print b[i c]}}}}'
Many thanks to #PeterPaulKiefer for the cat *txt idea
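For the generalized case the question asks about, one possibility is to count the number of distinct files in which each key appears and compare that count with a parameter. The sketch below is not taken from any of the answers; the function name common_in_n and the awk variables fileno, first, seen and count are all made up for illustration, and it assumes that "common in n files" means "present in exactly n files". The sample files are written with spaces here, but the default field splitting handles tabs just as well.

```shell
# common_in_n N FILE... : print the first occurrence of every row whose
# (column 1, column 2) key appears in exactly N of the given files.
# (Hypothetical helper, written for this example only.)
common_in_n() {
    n=$1; shift
    awk -v n="$n" '
    FNR == 1 { fileno++ }                       # a new input file has started
    {
        key = $1 SUBSEP $2
        if (!(key in first)) first[key] = $0    # remember the first occurrence
        if (seen[key] != fileno) {              # count each file only once per key
            seen[key] = fileno
            count[key]++
        }
    }
    END { for (key in count) if (count[key] == n) print first[key] }
    ' "$@"
}

# Recreate the three sample files from the question in a scratch directory.
tmp=$(mktemp -d)
printf '%s\n' 'aba 0 0' 'abc 0 1' 'abd 1 1' 'xxx 0 0' > "$tmp/file1.txt"
printf '%s\n' 'xyz 0 0' 'aba 0 0 0 0' 'xxx 0 0' 'abc 1 1' > "$tmp/file2.txt"
printf '%s\n' 'xyx 0 0' 'aba 0 0' 'aba 0 1 0' 'xxx 0 0 0 1' 'abc 1 1' > "$tmp/file3.txt"

# The for-in loop has no defined order, so the output is sorted afterwards.
result2=$(common_in_n 2 "$tmp"/file[123].txt | sort)
result3=$(common_in_n 3 "$tmp"/file[123].txt | sort)
printf 'in 2 files:\n%s\nin 3 files:\n%s\n' "$result2" "$result3"

rm -rf "$tmp"
```

Because a key is counted once per file, the two aba lines in file3.txt do not inflate the count. To get "at least n files" instead, the comparison in the END block would become count[key] >= n.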

Using an if/else statement in the middle of AWK

I have a 5-column file:
PS 6 15 0 1
PS 1 17 0 1
PS 4 18 0 1
that I would like to convert to this 7-column format:
PS.15 PS 6 N 1 0 1
PS.17 PS 1 P 1 0 1
PS.18 PS 4 N 1 0 1
To create 6 of the 7 columns requires just grabbing directly (and sometimes applying small arithmetic) from columns in the original file. However, to create one column (column 4) requires an if-else statement.
Specifically, to create new columns 1, 2, 3, I use:
cat File | awk '{print $1"."$3"\t"$1"\t"$2}'
and to create new columns 5, 6,7, I use:
cat testFileB | awk '{print $4+$5"\t"$4/($4+$5)"\t"$5/($4+$5)}'
and to create new column 4, I use:
cat testFileB | awk '{if ($2 == 1 || $2 == 2 || $2 == 3) print "P"; else print "N";}'
These three statements work fine independently and get me what I want (the correct values for the columns that are all separated by tabs). However, when I try to apply them simultaneously (create all 7 columns at once), I can only do so with unwanted new lines (instead of tabs) before and after column 4 (the if/else statement column):
For instance, my attempt to simultaneously create columns 1, 2, 3, 4:
cat File | awk '{print $1"."$3"\t"$1"\t"$2; if ($2 == 1 || $2 == 2 || $2 == 3) print "P"; else print "N";}'
results in unwanted new lines before column 4:
PS.15 PS 6
N
PS.17 PS 1
P
PS.18 PS 4
N
Similarly, my attempt to simultaneously create columns 4, 5, 6, 7:
cat File | awk '{if ($2 == 1 || $2 == 2 || $2 == 3) print "P"; else print "N"; print $4+$5"\t"$4/($4+$5)"\t"$5/($4+$5)}'
results in unwanted new lines after column 4:
N
1 0 1
P
1 0 1
N
1 0 1
Is there a solution so that I can create all 7 columns at once, and there are only tabs between them (no new lines)?
If you don't want automatic line feeds, you can just use printf instead of print. I'm not quite sure whether you want a tab separating the N and the 1 or not, but that's easy enough to adjust:
cat testfile | awk '{printf "%s.%s\t%s\t%s\t",$1,$3,$1,$2; if ($2 == 1 || $2 == 2 || $2 == 3) printf "P"; else printf "N"; print $4+$5"\t"$4/($4+$5)"\t"$5/($4+$5)}'
PS.15 PS 6 N1 0 1
PS.17 PS 1 P1 0 1
PS.18 PS 4 N1 0 1
Simply set your OFS (instead of repeating a \t all across the line), and use the ternary operator to print P or N:
$ awk -v OFS='\t' '{s=$4+$5;print $1"."$3,$1,$2,($2~/^[123]$/?"P":"N"),s,$4/s,$5/s}' file
PS.15 PS 6 N 1 0 1
PS.17 PS 1 P 1 0 1
PS.18 PS 4 N 1 0 1
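If you would rather keep the original if/else statement than switch to the ternary operator, a third variant is to let the if/else set a variable and print everything in one statement. This is a sketch; the variable names flag and s are illustrative, and it assumes $4+$5 is never zero.

```shell
# Sample rows from the question.
input='PS 6 15 0 1
PS 1 17 0 1
PS 4 18 0 1'

# The if/else only decides the value of "flag"; the single print at the end
# emits all seven columns, separated by OFS (a tab), on one line.
result=$(printf '%s\n' "$input" | awk -v OFS='\t' '{
    if ($2 == 1 || $2 == 2 || $2 == 3) flag = "P"; else flag = "N"
    s = $4 + $5
    print $1 "." $3, $1, $2, flag, s, $4 / s, $5 / s
}')
printf '%s\n' "$result"
```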
