awk for comparing, selecting and processing columns - linux

I have a list, list.txt:
1 10691 0.12 54 + 1 10692 0.13 55 -
2 10720 0.23 -1 + 2 10721 0.13 43 -
3 10832 0.43 123 + 3 10833 0.13 88 -
4 11032 0.22 -1 + 4 11033 0.13 -1 -
5 11248 0.12 45 + 5 11249 0.13 -1 -
6 15214 0.88 33 + 6 15215 0.13 45 -
I wish to extract data from columns 3 ($3) and 8 ($8) using a few rules:
Compare columns 4 ($4) and 9 ($9)
i) If both are negative, output "-1".
ii) If $4 > 0 and $9 < 0, output $3; if $4 < 0 and $9 > 0, output $8.
iii) If both $4 and $9 > 0, output $3+$8.
So I tried something like this:
awk '{a[$4]; b[$9]}
END{
for (x in a) {
for (y in b) {
if (x >0 && y >0) {
print $3+$8
}
else if (x >0 && y <=0) {
print $3;
}
else if (x <= 0 && y >0) {
print $8;
}
else if (x <=0 && y <=0) {
print "-1";
}
}
}
}' list.txt
Somehow this script doesn't give the correct number of lines (it should match the number of lines in list.txt) or the right data :(
Using list.txt, one should get:
0.25
0.13
0.56
-1
0.12
1.01

By using the nested for loops, you are comparing all the values of column 4 with all the values of column 9 instead of comparing the values on corresponding rows.
Working with each line as it is read is probably more what you want:
awk '{
x=$4; y=$9;
if (x >0 && y >0) {
print $3+$8
}
else if (x >0 && y <=0) {
print $3;
}
else if (x <= 0 && y >0) {
print $8;
}
else if (x <=0 && y <=0) {
print "-1";
}
}' list.txt

Although there is an accepted answer, I don't think it's idiomatic awk. You can definitely get rid of the if/else blocks, and you should.
$ awk '{x=$4>0;y=$9>0} x&&y{w=$3+$8} x&&!y{w=$3} !x&&y{w=$8} !x&&!y{w=-1} {print w}' xy
0.25
0.13
0.56
-1
0.12
1.01
Better yet:
$ awk '{w=0; x=$4>0; y=$9>0} x{w=$3} y{w+=$8} !x&&!y{w=-1} {print w}' xy
(Note the w=0 reset on every line; without it, a value left over from the previous record would leak into the cases where only one of x, y is true.)
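If you prefer a single expression over condition-action pairs, the same logic can also be written with ternaries (a sketch against the same sample file as above):
$ awk '{w = ($4>0 ? $3 : 0) + ($9>0 ? $8 : 0); print ($4<=0 && $9<=0 ? -1 : w)}' xy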

Related

Insert a row and a column in a matrix using awk

I have a gridded dataset with 250 rows x 300 columns in matrix form:
ifile.txt
2 3 4 1 2 3
3 4 5 2 4 6
2 4 0 5 0 7
0 0 5 6 3 8
I would like to insert the latitude values as the first column and the longitude values as the top row, so that it looks like:
ofile.txt
20.00 20.33 20.66 20.99 21.32 21.65
100.00 2 3 4 1 2 3
100.33 3 4 5 2 4 6
100.66 2 4 0 5 0 7
100.99 0 0 5 6 3 8
The increment is 0.33
I can do it manually for a small matrix, but I can't figure out how to get my output in the desired format for the full grid. I was writing a script along the following lines, but it is completely useless.
echo 20 > latitude.txt
for i in `seq 1 250`;do
i1=$(( i + 0.33 )) #bash can't recognize fractions
echo $i1 >> latitude.txt
done
echo 100 > longitude.txt
for j in `seq 1 300`;do
j1=$(( j + 0.33 ))
echo $j1 >> longitude.txt
done
paste longitude.txt ifile.txt > dummy_file.txt
cat latitude.txt dummy_file.txt > ofile.txt
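As an aside on the shell attempt: the loop fails only because bash's $(( )) arithmetic is integer-only. If you want to stay with the "generate the two axis files, then paste/cat" plan, GNU seq can produce fractional steps directly. A sketch (the file names are just placeholders, and the end points 118.67 = 20 + 299*0.33 and 182.17 = 100 + 249*0.33 are spelled out for the stated 300-column by 250-row grid; double-check the counts, since floating-point stepping can be off by one at the boundary):
# top row: 300 values starting at 20, step 0.33, joined onto one line
seq -f '%.2f' 20 0.33 118.67 | paste -sd' ' - > toprow.txt
# left column: 250 values starting at 100, step 0.33, one per line
seq -f '%.2f' 100 0.33 182.17 > leftcol.txt
paste -d' ' leftcol.txt ifile.txt | cat toprow.txt - > ofile.txt
This does not try to line the header up with the data columns the way the awk answers below do; it only shows how to get the fractional increments.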
$ cat tst.awk
BEGIN {
lat = 100
lon = 20
latWid = lonWid = 6
latDel = lonDel = 0.33
latFmt = lonFmt = "%*.2f"
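# the "*" in "%*s" / "%*.2f" takes the field width from the next printf argument (latWid / lonWid below)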
}
NR==1 {
printf "%*s", latWid, ""
for (i=1; i<=NF; i++) {
printf lonFmt, lonWid, lon
lon += lonDel
}
print ""
}
{
printf latFmt, latWid, lat
lat += latDel
for (i=1; i<=NF; i++) {
printf "%*s", lonWid, $i
}
print ""
}
$ awk -f tst.awk file
20.00 20.33 20.66 20.99 21.32 21.65
100.00 2 3 4 1 2 3
100.33 3 4 5 2 4 6
100.66 2 4 0 5 0 7
100.99 0 0 5 6 3 8
The following awk may also help with the same:
awk -v col=100 -v row=20 'FNR==1{printf OFS;for(i=1;i<=NF;i++){printf row OFS;row=row+.33;};print ""} {$1=$1;print col OFS $0;col+=.33}' OFS="\t" Input_file
Adding a non-one-liner form of the above solution too:
awk -v col=100 -v row=20 '
FNR==1{
printf OFS;
for(i=1;i<=NF;i++){
printf row OFS;
row=row+.33;
};
print ""
}
{
$1=$1;
print col OFS $0;
col+=.33
}
' OFS="\t" Input_file
Awk solution:
awk 'NR == 1{
long = 20.00; lat = 100.00; printf "%12s%.2f", "", long;
for (i=1; i<NF; i++) { long += 0.33; printf "\t%.2f", long } print "" }
NR > 1{ lat += 0.33 }
{
printf "%.2f%6s", lat, "";
for (i=1; i<=NF; i++) printf "\t%d", $i; print ""
}' file
With perl
$ perl -lane 'print join "\t", "", map {20.00+$_*0.33} 0..$#F if $.==1;
print join "\t", 100+(0.33*$i++), @F' ip.txt
20 20.33 20.66 20.99 21.32 21.65
100 2 3 4 1 2 3
100.33 3 4 5 2 4 6
100.66 2 4 0 5 0 7
100.99 0 0 5 6 3 8
-a to auto-split input on whitespace, result saved in the @F array
See https://perldoc.perl.org/perlrun.html#Command-Switches for details on command line options
if $.==1 for the first line of input
map {20.00+$_*0.33} 0..$#F iterates based on the size of the @F array; for each iteration, we get a value from the expression inside {}, where $_ will be 0, 1, etc. up to the last index of @F
print join "\t", "", map... use tab separator to print empty element and results of map
For all lines, print the contents of the @F array prefixed with the result of 100+(0.33*$i++), where $i is initially 0 in numeric context. Again, tab is used as the separator while joining these values
Use sprintf if needed for formatting, also $, can be initialized instead of using join
perl -lane 'BEGIN{$,="\t"; $st=0.33}
print "", map { sprintf "%.2f", 20+$_*$st} 0..$#F if $.==1;
print sprintf("%.2f", 100+($st*$i++)), @F' ip.txt

Convert column to matrix format using awk

I have a gridded data file in column format as:
ifile.txt
x y value
20.5 20.5 -4.1
21.5 20.5 -6.2
22.5 20.5 0.0
20.5 21.5 1.2
21.5 21.5 4.3
22.5 21.5 6.0
20.5 22.5 7.0
21.5 22.5 10.4
22.5 22.5 16.7
I would like to convert it to matrix format as:
ofile.txt
20.5 21.5 22.5
20.5 -4.1 1.2 7.0
21.5 -6.2 4.3 10.4
22.5 0.0 6.0 16.7
Here the top row 20.5 21.5 22.5 indicates y, the values down the side indicate x, and the inner values are the corresponding grid values.
I found a similar question here Convert a 3 column file to matrix format but the script is not working in my case.
The script is
awk '{ h[$1,$2] = h[$2,$1] = $3 }
END {
for(i=1; i<=$1; i++) {
for(j=1; j<=$2; j++)
printf h[i,j] OFS
printf "\n"
}
}' ifile
The following awk script handles:
a matrix of any size;
no assumed relation between row and column indices, so it keeps track of them separately;
missing row/column combinations, whose values default to zero.
This is done in this way:
awk '
BEGIN{PROCINFO["sorted_in"] = "@ind_num_asc"}
(NR==1){next}
{row[$1]=1;col[$2]=1;val[$1" "$2]=$3}
END { printf "%8s",""; for (j in col) { printf "%8.3f",j }; printf "\n"
for (i in row) {
printf "%8.3f",i; for (j in col) { printf "%8.3f",val[i" "j] }; printf "\n"
}
}' <file>
How does it work:
PROCINFO["sorted_in"] = "#ind_num_asc", states that all arrays are sorted numerically by index.
(NR==1){next} : skip the first line
{row[$1]=1;col[$2]=1;val[$1" "$2]=$3}, process the line by storing the row and column index and accompanying value.
The END block does all the printing.
This outputs:
20.500 21.500 22.500
20.500 -4.100 1.200 7.000
21.500 -6.200 4.300 10.400
22.500 0.000 6.000 16.700
Note: PROCINFO["sorted_in"] is a gawk feature.
However, if you make a couple of assumptions, you can do it much shorter:
the file contains all possible entries, no missing values
you do not want the indices of the rows and columns printed out;
the indices are sorted in column-major order.
Then you can use the following short versions:
sort -g <file> | awk '($1+0!=$1){next}
($1!=o)&&(NR!=1){printf "\n"}
{printf "%8.3f",$3; o=$1 }'
which outputs
-4.100 1.200 7.000
-6.200 4.300 10.400
0.000 6.000 16.700
or for the transposed:
awk '(NR==1){next}
($2!=o)&&(NR!=2){printf "\n"}
{printf "%8.3f",$3; o=$2 }' <file>
This outputs
-4.100 -6.200 0.000
1.200 4.300 6.000
7.000 10.400 16.700
Adjusted my old GNU awk solution for your current input data:
matrixize.awk script:
#!/bin/awk -f
BEGIN { PROCINFO["sorted_in"]="@ind_num_asc"; OFS="\t" }
NR==1{ next }
{
b[$1]; # accumulating unique indices
($1 != $2)? a[$1][$2] = $3 : a[$2][$1] = $3; # set `diagonal` relation between different indices
}
END {
h = "";
for (i in b) {
h = h OFS i # form header columns
}
print h; # print header column values
for (i in b) {
row = i; # index column
# iterating through the row values (for each intersection point)
for (j in a[i]) {
row = row OFS a[i][j]
}
print row
}
}
Usage:
awk -f matrixize.awk yourfile
The output:
20.5 21.5 22.5
20.5 -4.1 1.2 7.0
21.5 -6.2 4.3 10.4
22.5 0.0 6.0 16.7
Perl solution:
#!/usr/bin/perl -an
$h{ $F[0] }{ $F[1] } = $F[2] unless 1 == $.;
END {
@s = sort { $a <=> $b } keys %h;
print ' ' x 5;
printf '%5.1f' x @s, @s;
print "\n";
for my $u (@s) {
print "$u ";
printf '%5.1f', $h{$u}{$_} for @s;
print "\n";
}
}
-n reads the input line by line
-a splits each line on whitespace into the @F array
See sort, print, printf, and keys.
awk solution:
sort -n ifile.txt | awk 'BEGIN{header="\t"}NR>1{if((NR-1)%3==1){header=header sprintf("%4.1f\t",$1); matrix=matrix sprintf("%4.1f\t",$1)}matrix= matrix sprintf("%4.1f\t",$3); if((NR-1)%3==0 && NR!=10)matrix=matrix "\n"}END{print header; print matrix}';
20.5 21.5 22.5
20.5 -4.1 1.2 7.0
21.5 -6.2 4.3 10.4
22.5 0.0 6.0 16.7
Explanations:
sort -n ifile.txt sorts the file numerically
the header variable stores what is needed to build the header line: it is initialized to header="\t" and is appended to via header=header sprintf("%4.1f\t",$1) on the lines satisfying (NR-1)%3==1
the matrix is built the same way in the matrix variable: matrix=matrix sprintf("%4.1f\t",$1) creates the first-column label, matrix=matrix sprintf("%4.1f\t",$3) adds the grid values, and if((NR-1)%3==0 && NR!=10) matrix=matrix "\n" adds the end-of-line in the right places
note that the %3 and NR!=10 tests hard-code the 3x3 size of this example

Biggest and smallest of all lines

I have output like this:
3.69
0.25
0.80
1.78
3.04
1.99
0.71
0.50
0.94
I want to find the biggest number and the smallest number in the above output
I need output like
smallest is 0.25 and biggest as 3.69
Just sort your input numerically first and print the first and last values. One method:
$ sort -n file | awk 'NR==1{min=$1}END{print "Smallest",min,"Biggest",$0}'
Smallest 0.25 Biggest 3.69
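If you'd rather not depend on sort at all, a single awk pass over the same one-number-per-line input works too (a sketch; the output wording just copies the question):
awk 'NR==1 { min = max = $1 }                      # seed both from the first number
     $1 < min { min = $1 }
     $1 > max { max = $1 }
     END { print "smallest is " min " and biggest as " max }' file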
Hope this helps.
OUTPUT="3.69 0.25 0.80 1.78 3.04 1.99 0.71 0.50 0.94"
SORTED=`echo $OUTPUT | tr ' ' '\n' | sort -n`
SMALLEST=`echo "$SORTED" | head -n 1`
BIGGEST=`echo "$SORTED" | tail -n 1`
echo "Smallest is $SMALLEST"
echo "Biggest is $BIGGEST"
Added the awk one-liner the OP requested.
I'm not good at awk, but this works anyway. :)
echo "3.69 0.25 0.80 1.78 3.04 1.99 0.71 0.50 0.94" | awk '{
for (i=1; i<=NF; i++) {
if (length(s) == 0) s = $i;
if (length(b) == 0) b = $i;
if ($i < s) s = $i;
if (b < $i) b = $i;
}
print "Smallest is", s;
print "Biggest is", b;
}'
You want an awk solution?
echo "3.69 0.25 0.80 1.78 3.04 1.99 0.71 0.50 0.94" | \
awk -v RS=' ' '/.+/ { biggest = ((biggest == "") || ($1 > biggest)) ? $1 : biggest;
smallest = ((smallest == "") || ($1 < smallest)) ? $1:smallest}
END { print biggest, smallest}'
This produces the following output:
3.69 0.25
You can also use this method:
sort -n file | echo -e `sed -nr '1{s/(.*)/smallest is :\1/gp};${s/(.*)/biggest no is :\1/gp}'`
TXR solution:
$ txr -e '(let ((nums [mapcar tofloat (gun (get-line))]))
(if nums
(pprinl `smallest is #(find-min nums) and biggest is #(find-max nums)`)
(pprinl "empty input")))'
0.1
-1.0
3.5
2.4
smallest is -1.0 and biggest is 3.5

How do I check if a field is empty or null in a text file using awk and bash

I have two text files and I want to compare their corresponding values according to their rows and columns. Each value (field) in the text files is separated by tabs.
Here are the files:
file1.txt
Name Col1 Col2 Col3
-----------------------
row1 1 4 7
row2 2 5 8
row3 3 6 9
file2.txt
Name Col1 Col2 Col3
-----------------------
row2 1 4 11
row1 2 5 12
row3 3 9
Here is the code I have so far:
awk '
FNR < 2 {next}
FNR == NR {
for (i = 2; i <= NF; i++) {
a[i,$1] = $i;
}
next;
}
# only compare if a row in file2 exists in file1
($1 in b) {
for (i = 2; i <= NF; i++)
{
if (a[i,$1] == $i)
{
print "EQUAL"
}
else if ( //condition that checks if value is null// )
{
print "NULL"
}
else
{
print "NOT EQUAL"
}
}
}' file1.txt file2.txt
I am having difficulties with checking if there is a null value (row3 and col2 in file2.txt) in file2.txt. I don't even get an output for that null value. So far I tried if ($i == "") and it is still not giving me any output. Any suggestions? Thanks. (I'm using gnu awk in a bash script)
Let me know if further explanation is required.
Just set the FS to tab:
awk -F'\t' '....'
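To see why that matters: with the default FS, awk treats any run of blanks/tabs as a single separator, so the empty Col2 in row3 of file2.txt simply collapses away and $i == "" can never match; with -F'\t' the empty field between two consecutive tabs is preserved. A quick demonstration (the printf line just fakes that one tab-separated record):
$ printf 'row3\t3\t\t9\n' | awk '{ print NF, "[" $3 "]" }'
3 [9]
$ printf 'row3\t3\t\t9\n' | awk -F'\t' '{ print NF, "[" $3 "]" }'
4 []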

How to select rows in which columns two and three are not equal to each other and to 0 or 1? (with awk)

I have a file like this:
AX-75448119 0 1
AX-75448118 0.45 0.487179
AX-75474642 0 0
AX-75474643 0.25 0.820513
AX-75448113 1 0
AX-75474641 1 1
and I want to select the rows where columns 2 and 3 are not both equal to each other and to 0 or 1 (i.e. if columns 2 and 3 are equal but equal to 0.5, or any other number except 0 and 1, I would still like to keep that row)
so the output would be:
AX-75448119 0 1
AX-75448118 0.45 0.487179
AX-75474643 0.25 0.820513
AX-75448113 1 0
I know how to write the command to select the rows where columns 2 and 3 are equal to each other and equal to 0 or 1, which is this:
awk '$2=$3==1 || $2=$3==0' test.txt | wc -l
but I want exactly the opposite: to select every row that is not in the output of the above command!
Thanks, I hope I was able to explain what I want
This might work for you, if I understand your requirements correctly:
awk ' $2 != $3 { print; next } $2 == $3 && $2 != 0 && $2 != 1 { print }' INPUTFILE
See it in action at Ideone.com
This might also work for you:
awk '($2==0 || $2==1) && ($3==0 || $3==1) && $2==$3{next}1' file
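The trailing 1 is the usual awk shorthand for "true, so print the line". If you prefer to state it as a single condition rather than skip-and-print, this should be equivalent:
awk '!($2 == $3 && ($2 == 0 || $2 == 1))' file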
