Awk script to concatenate two columns and look for concatenated values in another file - linux

I need your help in solving this puzzle. Any kind of help will be appreciated, and links to any documents I can read to learn how to deal with such scenarios would be helpful.
Concatenate column1 and column2 of File1, then check for the concatenated value in column1 of File2. If found, extract the corresponding column2 and column3 values of File2 and concatenate column1 and column2 of File2. Now look for this second concatenated value in File1.
For example, concatenate column1 (262881626) and column2 (10) of File1, then look for this concatenated value (26288162610) in column1 of File2 and extract the corresponding column2 and column3 values of File2.
Now concatenate column1 and column2 of File2 and look for this concatenated value (2628816261050) in File1. Multiply the exchange rate (2) fetched via 26288162610 with the taxable value (65) that corresponds to 2628816261050 in File1, and store the result of the multiplication in column4 (AD) of File1 only.
File1
Bill Doc LineNo Taxablevalue AD
262881626 10 245
262881627 10 32
262881628 20 456
262881629 30 0
262881630 40 45
2628816261050 11 65
2628816271060 12 34
2628816282070 13 45
2628816293080 14 0
2628816304090 15
File2
Bill.Doc Item Exch.Rate
26288162610 50 2
26288162710 60 1
26288162820 70 45
26288162930 80 1
26288163040 90 5
Output File
Bill Doc LineNo Taxablevalue AD
262881626 10 245
262881627 10 32
262881628 20 456
262881629 30 0
262881630 40
2628816261050 11 65 130
2628816271060 12 34 34
2628816282070 13 45 180
2628816293080 14 0 0
2628816304090 15

Though your expected output is not fully clear, could you please try the following and let me know if this helps you.
awk -F"|" 'FNR==NR{a[$1$2]=$NF;next} {print $0,$1 in a?"|" a[$1]*$NF:""}' OFS="" File2 File1
Explanation:
awk -F"|" ' ##Setting field separator as |(pipe) here.
FNR==NR{ ##Checking condition here FNR==NR which will be TRUE when first file named File2 is being read.
a[$1$2]=$NF; ##Creating an array named a whose index is $1$2(first and second field of current line) and value if last field.
next} ##next will skip all further statements from here.
{ ##Statements from here will be executed when only 2nd Input_file named File1 is being read.
print $0,$1 in a?"|" a[$1]*$NF:"" ##Printing $0(current line) and then checking if $1 of current line is present in array a is yes then print a value * $NF else print NULL.
}
' OFS="" File2 File1 ##Setting OFS to NULL here and mentioning both the Input_file(s) name here.

Related

How to use AWK to continuously output lines from a file

I have a file with multiple lines, and I want to continuously output a sliding window of lines from the file: the first time, print lines 1 to 5; the next time, lines 2 to 6; and so on.
I find AWK very useful, and I tried to write the code on my own, but it just outputs nothing.
Following is my code
#!/bin/bash
for n in `seq 1 3`
do
N1=$n
N2=$((n+4))
awk -v n1="$N1" -v n2="$N2" 'NR == n1, NR == n2 {print $0}' my_file >> new_file
done
For example, I have an input file called my_file
1 99 tut
2 24 bcc
3 32 los
4 33 rts
5 642 pac
6 23 caas
7 231 cdos
8 1 caee
9 78 cdsa
Then I expect an output file as
1 99 tut
2 24 bcc
3 32 los
4 33 rts
5 642 pac
2 24 bcc
3 32 los
4 33 rts
5 642 pac
6 23 caas
3 32 los
4 33 rts
5 642 pac
6 23 caas
7 231 cdos
Could you please try the following, written and tested with the shown samples in GNU awk. You need to list all the line numbers to print from in the lines_from variable; a second variable named till_lines tells us how many further lines to print from each of those lines (e.g., from the 1st line, print the next 4 lines too). On another note, I have tested OP's code and it worked fine for me, generating the output file new_file. Since calling awk in a bash loop is NOT good practice, I am adding this as an improvement here too.
awk -v lines_from="1,2,3" -v till_lines="4" '
BEGIN{
num=split(lines_from,arr,",")
for(i=1;i<=num;i++){ line[arr[i]] }
}
FNR==NR{
value[FNR]=$0
next
}
(FNR in line){
print value[FNR] > "output_file"
j=""
while(++j<=till_lines){ print value[FNR+j] > "output_file" }
}
' Input_file Input_file
When I check the contents of output_file, I see the following:
cat output_file
1 99 tut
2 24 bcc
3 32 los
4 33 rts
5 642 pac
2 24 bcc
3 32 los
4 33 rts
5 642 pac
6 23 caas
3 32 los
4 33 rts
5 642 pac
6 23 caas
7 231 cdos
Explanation: Adding detailed explanation for above.
awk -v lines_from="1,2,3" -v till_lines="4" ' ##Starting awk program from here and creating 2 variables, lines_from and till_lines: lines_from holds all the line numbers one wants to print from, and till_lines is how many further lines to print from each of them.
BEGIN{ ##Starting BEGIN section of this program from here.
num=split(lines_from,arr,",") ##Splitting lines_from into array arr with , as the delimiter here.
for(i=1;i<=num;i++){ ##Running a for loop from i=1 to num here.
line[arr[i]] ##Creating array line whose index is the value of arr[i] here.
}
}
FNR==NR{ ##Checking condition FNR==NR which will be TRUE when 1st time Input_file is being read.
value[FNR]=$0 ##Creating array value with index FNR and the current line as its value.
next ##next will skip all further statements from here.
}
(FNR in line){ ##Checking whether the current line number is present in array line; if so, do the following.
print value[FNR] > "output_file" ##Printing value with index of FNR into output_file
j="" ##Nullifying value of j here.
while(++j<=till_lines){ ##Running a while loop from j=1 up to till_lines here.
print value[FNR+j] > "output_file" ##Printing array value with index FNR+j into output_file.
}
}
' Input_file Input_file ##Mentioning Input_file names here.
Another awk variant
awk '
BEGIN {N1=1; N2=5}
{arr[NR]=$0}
END {
while (arr[N2]) {
for (i=N1; i<=N2; i++)
print arr[i]
N1++
N2++
}
}
' file
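To avoid hardcoding N1 and N2 in the BEGIN block, the same variant can take the start line and window size from the command line; a sketch where the variable names start and win are just illustrative:
awk -v start=1 -v win=5 '
{arr[NR]=$0}                      ##Store every line, indexed by line number.
END {
  n1=start; n2=start+win-1
  while (n2 in arr) {             ##Slide the window until it runs past the last stored line.
    for (i=n1; i<=n2; i++) print arr[i]
    n1++; n2++
  }
}
' file
Using n2 in arr instead of arr[n2] also keeps the loop going if one of the stored lines happens to be empty.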

How to divide a column based on the corresponding value in another file?

I have multiple files (66) and want to divide column 3 of each file by its corresponding value in info.file and insert the new value as column 4 of each file.
My manual code is:
awk '{print $4=$3/Number from info.file}1' file
But this takes me hours to do for each individual file. So I want to automate it for all files. Thanks
file1:
chrm name value
4 a 8
3 b 4
file2:
chrm name value
3 g 6
5 s 12
info.file:
file_name average
file1 8
file2 6
file3 10
output:
file1:
chrm name value new_value
4 a 8 1
3 b 4 0.5
file2:
chrm name value new_value
3 g 6 1
5 s 12 2
without error handling
$ awk 'NR==FNR {a[$1]=$2; next}
FNR==1 {out=FILENAME".new"; print $0, "new_value" > out; next}
{v=$NF/a[FILENAME]; $++NF=v; print > out}' info file1 file2
will generate updated files
$ head file{1,2}.new | column -t
==> file1.new <==
chrm name value new_value
4 a 8 1
3 b 4 0.5
==> file2.new <==
chrm name value new_value
3 g 6 1
5 s 12 2
Explanation
NR==FNR {a[$1]=$2; next} scan the first file and save the file/value pairs in the associative array
FNR==1 matches the header line of each data file
out=FILENAME".new" set an output filename
print $0, "new_value" > out print existing header appended with the new column name
v=$NF/a[FILENAME] for every data line, scale the last field and assign to v
$++NF=v increment number of fields and assign the new computed value to the last field
print > out print the new line to the same file set before
info file1 file2 the list of files should be preceded by the info file
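Since the question mentions 66 data files, the same command can process all of them in one invocation; a sketch using bash brace expansion, assuming the data files are literally named file1 through file66 and those names appear in the first column of the info file:
awk 'NR==FNR {a[$1]=$2; next}
     FNR==1 {out=FILENAME".new"; print $0, "new_value" > out; next}
     {v=$NF/a[FILENAME]; $++NF=v; print > out}' info file{1..66}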
I have prepared the following double nested awk command for you:
awk 'NR>1{system("awk -v div="$2" -f div_column3.awk "$1" | column -t > new_"$1);}' info.file
with div_column3.awk being an awk script file with the following content:
$ cat div_column3.awk
NR==1{print $0" new_value"}NR>1{print $0" "$3/div}

How can I make awk match up lines in file 1 with the lines in file 2 based on number ranges in file 2?

I have the following two files:
file 1:
22
2
42
32
file 2:
1 10 valuea
11 20 valueb
21 30 valuec
31 40 valued
41 50 valuee
51 60 valuef
How can I make awk grab each value from file 1, match it up with file 2 based on whether it falls between the number range in columns 1 and 2 of file 2, and then print out column 3 from the matched line in file 2? The output would resemble the following:
valuec
valuea
valuee
valued
I tried using the following AWK command (based on what I found in this post: How to check value of a column lies between values of two columns in other file and print corresponding value from column in Unix?), but it does not seem to be working correctly.
#!/bin/bash
awk 'FNR == NR { val[$1] = $1 }
FNR != NR { if (val[$1] >= $1 && val[$1] <= $2)
print $3
}' file1 file2
Also, I did not include it here for obvious reasons, but for the actual application of this script, file 1 would include around 7,000 entries while file 2 would include 68,000 entries.
alternative awk script
$ awk 'FNR == NR {a[$1]=$2; v[$1]=$3; next}
{for(k in a)
if(k+0<=$1 && $1+0<=a[k]) print v[k]}' file2 file1
valuec
valuea
valuee
valued
Note that file2 is the first file. This will cover multiple range matches as well. The +0 is to force numerical comparison.
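Array indices in awk are always strings, so without the +0 the k<=$1 comparison could fall back to string comparison; a small standalone illustration of the difference:
$ awk 'BEGIN { s = ("9" < "10"); n = (9 < 10); print s, n }'
0 1
Compared as strings, "9" sorts after "10" (so the first result is 0), while compared numerically 9 < 10.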

Select rows in one file based on specific values in the second file (Linux)

I have two files:
One is "total.txt". It has two columns: the first column is natural numbers (indicator) ranging from 1 to 20, the second column contains random numbers.
1 321
1 423
1 2342
1 7542
2 789
2 809
2 5332
2 6762
2 8976
3 42
3 545
... ...
20 432
20 758
The other one is "index.txt". It has three columns:(1.indicator, 2:low value, 3: high value)
1 400 5000
2 600 800
11 300 4000
I want to output the rows of the "total.txt" file whose first column matches the first column of the "index.txt" file. At the same time, the second column of the output rows must be larger than (>) the second column of "index.txt" and smaller than (<) the third column of "index.txt".
The expected result is as follows:
1 423
1 2342
2 809
2 5332
2 6762
11 ...
11 ...
I have tried this:
awk '$1==(awk 'print($1)' index.txt) && $2 > (awk 'print($2)' index.txt) && $1 < (awk 'print($2)' index.txt)' total.txt > result.txt
But it failed!
Can you help me with this? Thank you!
You need to read both files in the same awk script. When you read index.txt, store the other columns in an array.
awk 'FNR == NR { low[$1] = $2; high[$1] = $3; next }
$2 > low[$1] && $2 < high[$1] { print }' index.txt total.txt
FNR == NR is the common awk idiom to detect when you're processing the first file.
Use join like Barmar said:
# To join on the first columns
join -11 -21 total.txt index.txt
And if the files aren't sorted in lexical order by the first column then:
join -11 -21 <(sort -k1,1 total.txt) <(sort -k1,1 index.txt)
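Note that join by itself only matches on the indicator column; to also apply the range condition from the question, one option (a sketch, relying on join's default output of the key followed by the remaining fields of each file, e.g. 1 423 400 5000) is to pipe the result through a small awk filter:
join -1 1 -2 1 <(sort -k1,1 total.txt) <(sort -k1,1 index.txt) |
awk '$2 > $3 && $2 < $4 { print $1, $2 }'
The output order then follows the lexical sort rather than the original order of total.txt.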

How to append a column to the result set in a shell script

I need a script for the scenario below. I am very new to shell scripting.
wc file1 file2
The above command gives the following result:
40 149 947 file1
2294 16638 97724 file2
Now I need to get the result as follows: the 1st column, 3rd column, and 4th column of the above result set, plus a new column with default values:
40 947 file1 DF.tx1
2294 97724 file2 DF.rb2
Here the last column values are always known, i.e. DF.tx1 for file1 and DF.rb2 for file2.
If the filenames are given in any order, the default values should not change.
Please help me to write this script. Thanks in advance!!
You can use awk:
wc file1 file2 |
awk '$4 != "total"{if ($4 ~ /file1/) f="DF.tx1"; else if ($4 ~ /file2/) f="DF.rb2";
else if ($4 ~ /file3/) f="foo.bar"; print $1, $3, $4, f}'
1 12 file1 DF.tx1
9 105 file2 DF.rb2
5 15 file3 foo.bar
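If there are more files than the few shown, the if/else chain can be replaced with a lookup table filled in a BEGIN block; a sketch using the file-to-default mapping from the question (this matches the exact name printed by wc rather than a regex):
wc file1 file2 |
awk 'BEGIN { def["file1"]="DF.tx1"; def["file2"]="DF.rb2" }
     $4 != "total" { print $1, $3, $4, def[$4] }'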
