How to test two entries per line to given intervals? - linux

I have a reference file ref with certain values (v1 and v2), and for every value there is an interval with upper (ub) and lower (lb bonds) and a group number (gn)defined:
v1 v2 ub1 lb1 ub2 lb2 gn
50 25 51 49 26 24 1
86 13 86.5 85.5 14 12 2
...
Now I have a file test with many lines and two of the entries of every line have values that lie within the intervals defined in ref. The goal is to assign every line the group number which corresponds to the entries in the reference file.
Input file:
50.2 24.6
85.7 13.9
86.3 12.6
Desired output:
50.2 24.6 1
85.7 13.9 2
86.3 12.6 2
My approach so far is this code with bash and awk:
while read line
do
lin=( ${line} )
rot=${lin[0]}
tilt=${lin[1]}
awk -v line="${line}" -v rot="$rot" -v tilt="$tilt" ' {if ((rot>$4) && (rot<$3) && (tilt>$6) && (tilt<$5)) {print line,$7} } ' reference >> output
done < test
But it won't work, the test file has 130000 lines, but the output file has only 11000. So obviously I am doing something wrong. I'm grateful for any suggestions.

with the . used as decimal separator
$ awk 'NR==FNR && NR>1{ub1[$NF]=$3;lb1[$NF]=$4;ub2[$NF]=$5;lb2[$NF]=$6; next}
{for(k in lb1)
if(lb1[k]<$1 && $1<ub1[k] &&
lb2[k]<$2 && $2<ub2[k]) print $0, k}' file input
50.2 24.6 1
85.7 13.9 2
86.3 12.6 2
you may need to change the locale settings to use , as the decimal separator. Also the code assumes there is one pair of ranges per group number (so indexed ranges with group number), if not you need to index by row number and keep a mapping to row number to group number as well.

Related

How to find pattern and make operation in another field in awk?

I have a file with 4 columns separated by space like this bellow:
1_86500000 50 1_87500000 19
1_87500000 13 1_89500000 42
1_89500000 25 1_90500000 10
1_90500000 3 1_91500000 11
1_91500000 23 1_92500000 29
1_92500000 34 1_93500000 4
1_93500000 39 1_94500000 49
1_94500000 35 1_95500000 26
2_35500000 1 2_31500000 81
2_31500000 12 2_4150000 50
The First and Third columns are not in phase so I can not divide the value of one by another.
As there are only two or one possible columns $1 or $3, a solution would be look for the pattern and divide its value in the another column or set it to 0 if there is none like this expected result shows:
P.S. the second field in this expected result is just illustrative to shown the division.
1_86500000 0/50 0
1_87500000 19/13 1.46154
1_89500000 42/25 1.68
1_90500000 10/3 3.333
1_91500000 11/23 0.47826
1_92500000 29/34 0.85294
1_93500000 4/39 0.10256
1_94500000 49/35 1.4
2_35500000 0/1 0
2_31500000 81/12 6.75
2_4150000 50/0 50
I do not archived anything by myself other than this. So I do not have any starting point by now.
I tried separate the fields merged with _ to see if I could match by subtracting the coordinates. If I got 0 would mean that the columns was in phase and correct. But I could not go further.
awk '{if( ($5-$2)==0) print $1,$2,$3,$4,$5,$6}' file
I tried to match both columns but I only got phased results:
awk '{if(($1==$3)) print $1,$4/$2}' file
Can you help me?
awk to the rescue!
$ awk '{d[$1]=$2; n[$3]=$4}
END {for(k in n)
if(k in d) {print k,n[k]"/"d[k],n[k]/d[k]; delete d[k]}
else print k,n[k]"/0",n[k];
for(k in d) print k,"0/"d[k],0}' file | sort
1_86500000 0/50 0
1_87500000 19/13 1.46154
1_89500000 42/25 1.68
1_90500000 10/3 3.33333
1_91500000 11/23 0.478261
1_92500000 29/34 0.852941
1_93500000 4/39 0.102564
1_94500000 49/35 1.4
1_95500000 26/0 26
2_31500000 81/12 6.75
2_35500000 0/1 0
2_4150000 50/0 50
your division by zero result is little strange though!
Explanation keep two arrays for numerator and denominator. Once scanned the file, go over numerator array and find the corresponding denominator and make the division. For the denominators not used apply the convention given.

Use part of a column in one file as search term in other file

I have two files. The output file I am searching has earthquake locations and has the following format:
19090212 1323 30.12 36 19.41 103 28.24 7.29 0.00 4 149 25.8 0.02 5.7 9.8 D - 0
19090216 1828 49.61 36 13.27 101 35.38 10.94 0.00 13 54 38.5 0.07 0.3 0.7 B 0
19090711 2114 54.11 35 1.07 99 56.42 7.00 0.00 7 177 18.7 4.00 63.3 53.2 D # 0
I want to use the last 6 digits of the first column (i.e. '090418' out of '19090418') with the first 3 digits of the second column (i.e. '072' out of '0728') as my search term. The file I am searching has the following format:
SC17 P 090212132329.89
X25A P 090212132330.50
AMTX P 090216182814.12
X29A P 090216182813.70
Y28A P 090216182822.36
MSTX P 090216182826.80
Y27A P 090216182831.43
After I search the second file for the term, I need to figure out how many lines are in that section. So for this example, if I were searching the terms shown for the second file above, I want to know there are 2 lines for 090212132 and 5 lines for 090216182.
This is my first post, so please let me know how I can improve clarity or conciseness in my posts. Thanks for the help!
awk to the rescue!
$ awk 'NR==FNR{a[substr($1,3) substr($2,1,3)]; next}
{k=substr($3,1,9)}
k in a{a[k]++}
END{for(k in a) if(a[k]>0) print k,a[k]}' file1 file2
with your input files, there is no output as expected.
The answer karakfa suggested worked! My output looks like this:
100224194 7
100117172 18
091004005 11
090520220 10
090526143 21
090122033 20
Thanks for the help!
Karafka answer with explanation
awk 'NR==FNR { # For first file
$1 = substr($1, 3); # Get last 6 characters from first col
$2 = substr($2, 1, 3); # Get first 3 characters from second col
a[$1 $2]; # Add to an array
next } # Move to next record in first file
# Start processing second file
{k = substr($3, 1, 9)} # Get first 9 character for third col
k in a {a[k]++} # If key in a, then increment the key
END {
for (k in a) # Iterate array
if (a[k] > 0) # If pattern was matched
print k, a[k] # print the pattern and num occurrence
}'

Select rows in one file based on specific values in the second file (Linux)

I have two files:
One is "total.txt". It has two columns: the first column is natural numbers (indicator) ranging from 1 to 20, the second column contains random numbers.
1 321
1 423
1 2342
1 7542
2 789
2 809
2 5332
2 6762
2 8976
3 42
3 545
... ...
20 432
20 758
The other one is "index.txt". It has three columns:(1.indicator, 2:low value, 3: high value)
1 400 5000
2 600 800
11 300 4000
I want to output the rows of "total.txt" file with first column matches with the first column of "index.txt" file. And at the same time, the second column of output results must be larger than (>) the second column of the "index.txt" and smaller than (<) the third column of the "index.txt".
The expected result is as follows:
1 423
1 2342
2 809
2 5332
2 6762
11 ...
11 ...
I have tried this:
awk '$1==(awk 'print($1)' index.txt) && $2 > (awk 'print($2)' index.txt) && $1 < (awk 'print($2)' index.txt)' total.txt > result.txt
But it failed!
Can you help me with this? Thank you!
You need to read both files in the same awk script. When you read index.txt, store the other columns in an array.
awk 'FNR == NR { low[$1] = $2; high[$1] = $3; next }
$2 > low[$1] && $2 < high[$1] { print }' index.txt total.txt
FNR == NR is the common awk idiom to detect when you're processing the first file.
Use join like Barmar said:
# To join on the first columns
join -11 -21 total.txt index.txt
And if the files aren't sorted in lexical order by the first column then:
join -11 -21 <(sort -k1,1 total.txt) <(sort -k1,1 index.txt)

Rearrange column with empty values using awk or sed

i want to rearrange the columns of a txt file, but there are empty values, which cause a problem. For example:
testfile:
Name ID Count Date Other
A 1 10 513 x
6 15 312 x
3 18 314 x
B 19 31 942 x
8 29 722 x
when i tried $ more testfile |awk '{print $2"\t"$1"\t"$3"\t"$4"\t"$5}'
it becomes:
ID Name Count Date Other
1 A 10 513 x
15 6 312 x
18 3 314 x
19 B 31 942 x
29 8 722 x
which is not i want, please help,i want it to be
ID Name Count Date Other
1 A 10 513 x
15 6 312 x
18 3 314 x
19 B 31 942 x
29 8 722 x
moreover i am not sure which columns might contain empty values, and the column length is not fixed, thank you
Assuming your input file is not tab-separated and you have (or can get) GNU awk then I recommend:
$ awk -v FIELDWIDTHS="8 8 8 8 8" -v OFS='\t' '{
for (i=1;i<=NF;i++) {
gsub(/^\s+|\s+$/,"",$i)
}
t=$1; $1=$2; $2=t'
}1' file
ID Name Count Date Other
1 A 10 513 x
6 15 312 x
3 18 314 x
19 B 31 942 x
8 29 722 x
If your file is tab-separated then all you need is:
awk 'BEGIN{FS=OFS="\t"} {t=$1; $1=$2; $2=t}1' file
Another awk alternative is using the number of fields. If you know your data and it's only deficit in the first column you can try this.
awk -v OFS="\t" 'NF==4{$5=$4;$4=$3;$3=$2;$2=$1;$1=""} {print $2,$1,$3,$4,$5}'
However, the output will be tab separated instead of fixed length format. You can achieve the same using printf and changing OFS, but perhaps tab separated is what you really need for tabular representation.
The most natural model for awk to use is columns as defined by the transitions from white-space to non-white-space and back. Since you have columns that may themselves be white-space, the natural model won't work.
However, you can revert to using a model based on column positions rather than transitions, meaning that a file containing only spaces (the presence of tabs will complicate things):
Name ID Count Date Other
A 1 10 513 x
6 15 312 x
3 18 314 x
B 19 31 942 x
8 29 722 x
can still be rearranged, though not as succinctly as transition-based columns.
The following awk script will do the trick, swapping name and id:
{
name = substr($0, 1,7);
id = substr($0, 9,7);
count = substr($0,17,7);
date = substr($0,25,7);
other = substr($0,33 );
print id" "name" "count" "date" "other;
}
If the original file is called pax.in and the awk script is stored in pax.awk, the command awk -f pax.awk pax.in will give you, as desired:
ID Name Count Date Other
1 A 10 513 x
6 15 312 x
3 18 314 x
19 B 31 942 x
8 29 722 x
Keep in mind I've written that script to be relatively flexible, allowing you to change the order of the columns quite easily. If all you want is to swap the first two columns, you could use:
awk '{print substr($0,9,8)substr($0,1,8)substr($0,17)}' qq.in
or the slightly shorter (if you're allowed to use other tools):
sed -E 's/^(.{8})(.{8})/\2\1/' qq.in

How to use awk with a for loop?

I have a 5 column file
2 649 2 82 1
3 651 1 83 1
2 652 3 84 2
... ... ... ... ...
The first column is n number of points in segment, the second is the x coordinate, the third is delta x, the delta between the current x coordinate and the next one, similarly the fourth column is the y coordinate and fifth is delta y. I need to generate all points in the segments so the output should be, from the data in the first line
649 82
650 82.5
From the data in the second line
651 83
651.33 83.33
651.67 83.67
From the data in the third line
652 84
653.5 85
Any idea How to do it?
This will do:
awk '{n=$1; x=$2; dx=$3; y=$4; dy=$5; \
for(i=0;i<n;i++) printf "%.2f %.2f\n", x+i*dx/n, y+i*dy/n; }' file
You can adjust %.2f as you desired. For e.g. %.4f to print 4 digits of fraction.
I only used variables for clarity. Otherwise, you could simply do:
awk '{for(i=0;i<$1;i++) printf "%.2f %.2f\n", $2+i*$3/$1, $4+i*$5/$1; }' file

Resources