Appending the line even though there is no match with awk - linux

I am trying to compare two files and append another column when a certain condition is satisfied.
file1.txt
1 101 111 . BCX 123
1 298 306 . CCC 234
1 299 305 . DDD 345
file2.txt
1 101 111 BCX P1#QQQ
1 299 305 DDD P2#WWW
The output should be:
1 101 111 . BCX 123;P1#QQQ
1 298 306 . CCC 234
1 299 305 . DDD 345;P2#WWW
So far, I can only do this for the lines that have a match:
awk 'NR==FNR{ a[$1,$2,$3,$4]=$5; next }{ s=SUBSEP; k=$1 s $2 s $3 s $5 }k in a{ print $0,a[k] }' file2.txt file1.txt
1 101 111 . BCX 123 P1#QQQ
1 299 305 . DDD 345 P2#WWW
But then I am missing the second line of file1.
How can I keep it even when it has no match among the file2 regions?

If you want to print every line, you need your print command not to be limited by your condition.
awk '
NR==FNR {
a[$1,$2,$3,$4]=$5; next
}
{
s=SUBSEP; k=$1 s $2 s $3 s $5
}
k in a {
$6=$6 ";" a[k]
}
1' file2.txt file1.txt
The 1 is shorthand that says "print every line". It's a condition (without command statements) that always evaluates "true".
The k in a condition simply replaces your existing 6th field with the concatenated one. If the condition is not met, the replacement doesn't happen, but we still print because of the 1.
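As a rough, self-contained way to try the script above, the sketch below recreates the two sample files in a temporary directory and runs the same program. The temporary directory and the `result` variable are only there for the demonstration; they are not part of the original answer.

```shell
#!/bin/sh
# Demo only: recreate the sample input in a throwaway directory.
tmp=$(mktemp -d)
printf '%s\n' '1 101 111 . BCX 123' '1 298 306 . CCC 234' '1 299 305 . DDD 345' > "$tmp/file1.txt"
printf '%s\n' '1 101 111 BCX P1#QQQ' '1 299 305 DDD P2#WWW' > "$tmp/file2.txt"

result=$(awk '
NR==FNR { a[$1,$2,$3,$4]=$5; next }      # first file read: store the annotation by key
{ k=$1 SUBSEP $2 SUBSEP $3 SUBSEP $5 }   # second file read: build the matching key
k in a { $6=$6 ";" a[k] }                # modify the line only when the key is known
1                                        # pattern-only rule: print every line
' "$tmp/file2.txt" "$tmp/file1.txt")

printf '%s\n' "$result"
rm -rf "$tmp"
```

Removing the final `1` rule from this sketch prints nothing at all, which shows that the `k in a` rule only edits the line and never prints it.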

The following awk command may help:
awk 'FNR==NR{a[$1,$2,$3,$4]=$NF;next} (($1,$2,$3,$5) in a){print $0";"a[$1,$2,$3,$5];next} 1' file2.txt file1.txt
Output will be as follows.
1 101 111 . BCX 123;P1#QQQ
1 298 306 . CCC 234
1 299 305 . DDD 345;P2#WWW

another awk
$ awk ' {t=5-(NR==FNR); k=$1 FS $2 FS $3 FS $t}
NR==FNR {a[k]=$NF; next}
k in a {$0=$0 ";" a[k]}1' file2 file1
1 101 111 . BCX 123;P1#QQQ
1 298 306 . CCC 234
1 299 305 . DDD 345;P2#WWW
The last component of the key is the 4th field for the first input file and the 5th field for the second one; setting it accordingly lets the script use a single k variable for both files. Note that
t=5-(NR==FNR)
can be written more conventionally as
t=NR==FNR?4:5

Related

how to write awk code with specific condition

I want to write a script that processes a column of numbers and turns only the negative ones into positive numbers by multiplying each negative number by itself.
example
data
10
11
-12
-13
-14
expected output
10
11
144
169
196
This is what I've tried:
awk 'int($0)<0 {$4 = int($0) + 360}
END {print $4}' data.txt
But I don't get the expected output. Can anyone help me?
awk '$0 < 0 { $0 = $0 * $0 } 1' data.txt
The first condition multiplies the value by itself when it's negative. The condition 1 is always true, so the line is printed unconditionally.
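A quick way to check this without a data file is to feed the sample values on standard input. This is only a sketch; the `result` variable is introduced here for the demonstration.

```shell
#!/bin/sh
# Demo only: pipe the sample values in instead of reading data.txt.
result=$(printf '%s\n' 10 11 -12 -13 -14 |
  awk '$0 < 0 { $0 = $0 * $0 }   # square negative values in place
       1                         # print every line, changed or not
      ')

printf '%s\n' "$result"
```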
Also:
awk '{print ($0 < 0 ? $0*$0 : $0)}' input
$ awk '{print $0 ^ (/-/ ? 2 : 1)}' file
10
11
144
169
196
You could also match only lines that consist of a - followed by digits and, in that case, multiply the value by itself:
awk '{print (/^-[0-9]+$/ ? $0 * $0 : $0)}' data.txt
Output
10
11
144
169
196

How to use awk '{print $1*Number}' from the second line or telling him to ignore NaN values?

I have a file called 'waterproofposters.jsonl' with this type of output:
Regular price
100
200
300
400
500
And I need to take out 2% of each value. I have used the following code:
awk '{print $1*0.98}' waterproofposters.jsonl
And then I have the following output:
0
98
196
294
392
490
And then I'm stuck, because I need 'Regular price' on the first line instead of '0'.
I thought of replacing '0' with 'Regular price' using
find . -name "waterproof.jsonl" | xargs sed -i -e 's/0/Regular price/g'
But that replaces every '0' with 'Regular price'.
To print the first line as-is:
awk '{print (NR>1 ? $0*0.98 : $0)}'
To print lines that are not a number as-is:
awk '{print ($0+0 == $0 ? $0*0.98 : $0)}'
I'm using $0 instead of $1 in the multiplication because:
They're the same thing in your numerical input, and
I aesthetically prefer using the same value across the whole script rather than different values for the numeric vs non-numeric lines, and
When you use a specific field it causes awk to do field-splitting so it's a bit more efficient to not reference a field when the whole record will do.
Here's both of the above working with the posted sample input:
$ awk '{print (NR>1 ? $0*0.98 : $0)}' file
Regular price
98
196
294
392
490
$ awk '{print ($0+0 == $0 ? $0*0.98 : $0)}' file
Regular price
98
196
294
392
490
and here's the difference between the two given input that has a non-numeric value mid input file:
$ cat file
Regular price
100
200
foobar
400
500
$ awk '{print (NR>1 ? $0*0.98 : $0)}' file
Regular price
98
196
0
392
490
$ awk '{print ($0+0 == $0 ? $0*0.98 : $0)}' file
Regular price
98
196
foobar
392
490
You can certainly achieve what you need with a single awk call, but an answer to why your sed -i -e 's/0/Regular price/g' command did not work as expected is that you used 0 as the regex pattern. 0 matches any zero char inside the string.
You want to replace 0s that are the only char on a line.
Hence, you need to use ^ and $ anchors to match the start and end of the line respectively:
sed -i 's/^0$/Regular price/'
If you need to replace on the first line only add the 1 address before the substitution command:
sed -i '1 s/^0$/Regular price/'
Note you do not need g, since you only expect one replacement per line and g is only needed when performing multiple replacements on a line. By default, all lines will get processed.
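To see the effect of the anchors, here is a small sketch that runs the three substitutions on made-up lines read from standard input, so no file is modified and -i is not needed. The variable names are only for illustration.

```shell
#!/bin/sh
# Demo only: sample lines resembling the awk output that contained a stray 0.
input=$(printf '%s\n' 0 98 196 490)

# Unanchored with g: every 0 character is replaced, including the one in 490.
unanchored=$(printf '%s\n' "$input" | sed 's/0/Regular price/g')

# Anchored: only a line that is exactly "0" is replaced.
anchored=$(printf '%s\n' "$input" | sed 's/^0$/Regular price/')

# Anchored and restricted to line 1 by the address.
first_only=$(printf '%s\n' "$input" | sed '1 s/^0$/Regular price/')

printf '%s\n---\n%s\n---\n%s\n' "$unanchored" "$anchored" "$first_only"
```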
How to use awk '{print $1*Number}' from the second line or telling him to ignore NaN values?
I would do it the following way using GNU AWK. Let the content of file.txt be
Regular price
100
200
300
400
500
then
awk 'NR==1{print}NR>=2{print $1*0.98}' file.txt
output
Regular price
98
196
294
392
490
Explanation: if it is the 1st line, just print it; if it is the 2nd or a later line, print 0.98 times the value of the 1st column.
(tested in GNU Awk 5.0.1)

Select rows in one file based on specific values in the second file (Linux)

I have two files:
One is "total.txt". It has two columns: the first column is natural numbers (indicator) ranging from 1 to 20, the second column contains random numbers.
1 321
1 423
1 2342
1 7542
2 789
2 809
2 5332
2 6762
2 8976
3 42
3 545
... ...
20 432
20 758
The other one is "index.txt". It has three columns:(1.indicator, 2:low value, 3: high value)
1 400 5000
2 600 800
11 300 4000
I want to output the rows of "total.txt" whose first column matches the first column of "index.txt". At the same time, the second column of each output row must be larger than (>) the second column of "index.txt" and smaller than (<) the third column of "index.txt".
The expected result is as follows:
1 423
1 2342
2 809
2 5332
2 6762
11 ...
11 ...
I have tried this:
awk '$1==(awk 'print($1)' index.txt) && $2 > (awk 'print($2)' index.txt) && $1 < (awk 'print($2)' index.txt)' total.txt > result.txt
But it failed!
Can you help me with this? Thank you!
You need to read both files in the same awk script. When you read index.txt, store the other columns in an array.
awk 'FNR == NR { low[$1] = $2; high[$1] = $3; next }
$2 > low[$1] && $2 < high[$1] { print }' index.txt total.txt
FNR == NR is the common awk idiom to detect when you're processing the first file.
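The sketch below runs the same idea on a shortened copy of the sample data. It adds one assumption that is not in the answer above: an explicit `$1 in low` test, so that indicators missing from index.txt are skipped without creating empty array entries. The temporary files and the `result` variable exist only for this demonstration.

```shell
#!/bin/sh
# Demo only: a shortened version of the sample files.
tmp=$(mktemp -d)
printf '%s\n' '1 321' '1 423' '1 2342' '1 7542' '2 789' '2 809' '3 42' > "$tmp/total.txt"
printf '%s\n' '1 400 5000' '2 600 800' '11 300 4000' > "$tmp/index.txt"

result=$(awk '
FNR == NR { low[$1] = $2; high[$1] = $3; next }   # index.txt: store the bounds per indicator
($1 in low) && $2 > low[$1] && $2 < high[$1]      # total.txt: print rows strictly inside the bounds
' "$tmp/index.txt" "$tmp/total.txt")

printf '%s\n' "$result"
rm -rf "$tmp"
```

With this shortened input the rows `1 423`, `1 2342` and `2 789` are printed, since those are the only values strictly between the bounds for their indicator.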
Use join like Barmar said:
# To join on the first columns
join -11 -21 total.txt index.txt
And if the files aren't sorted in lexical order by the first column then:
join -11 -21 <(sort -k1,1 total.txt) <(sort -k1,1 index.txt)
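On its own, join only pairs the rows by indicator; the range condition still has to be applied afterwards. A possible sketch, assuming the joined line has the layout "indicator value low high", pipes the result through a short awk filter. Sorted temporary files are used instead of process substitution so that the example also runs in a plain sh; the file names and the `result` variable are illustrative.

```shell
#!/bin/sh
# Demo only: fixed locale so sort and join agree on the ordering.
LC_ALL=C; export LC_ALL
tmp=$(mktemp -d)
printf '%s\n' '1 321' '1 423' '1 2342' '1 7542' '2 789' '2 809' '3 42' > "$tmp/total.txt"
printf '%s\n' '1 400 5000' '2 600 800' '11 300 4000' > "$tmp/index.txt"

# join needs both inputs sorted on the join field; the second key keeps values in numeric order.
sort -k1,1 -k2,2n "$tmp/total.txt" > "$tmp/total.sorted"
sort -k1,1 "$tmp/index.txt" > "$tmp/index.sorted"

# Joined lines look like: indicator value low high
result=$(join -1 1 -2 1 "$tmp/total.sorted" "$tmp/index.sorted" |
  awk '$2 > $3 && $2 < $4 { print $1, $2 }')

printf '%s\n' "$result"
rm -rf "$tmp"
```

Note that the output follows the sorted order of the indicators, which is lexical rather than numeric (for example, 11 sorts before 2).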

Process multiple files and append them in linux/unix

I have over 100 files with at least 5-8 columns (tab-separated) in each file. I need to extract first three columns from each file and add fourth column with some predefined text and append them.
Let's say I have 3 files: file001.txt, file002.txt, file003.txt.
file001.txt:
chr1 1 2 15
chr2 3 4 17
file002.txt:
chr1 1 2 15
chr2 3 4 17
file003.txt:
chr1 1 2 15
chr2 3 4 17
combined_file.txt:
chr1 1 2 f1
chr2 3 4 f1
chr1 1 2 f2
chr2 3 4 f2
chr1 1 2 f3
chr2 3 4 f3
For simplicity I kept file contents same.
My script is as follows:
#!/bin/bash
for i in {1..3}; do
j=$(printf '%03d' $i)
awk 'BEGIN { OFS="\t"}; {print $1,$2,$3}' file${j}.txt | awk -v k="$j" 'BEGIN {print $0"\t$k”}' | cat >> combined_file.txt
done
But the script is giving the following errors:
awk: non-terminated string $k”}... at source line 1
context is
<<<
awk: giving up
source line number 2
awk: non-terminated string $k”}... at source line 1
context is
<<<
awk: giving up
source line number 2
Can some one help me to figure it out?
You don't need two different awk scripts. And you don't use $ to refer to variables in awk, that's used to refer to input fields (i.e. $k means access the field whose number is in the variable k).
for i in {1..3}; do
j=$(printf '%03d' $i)
awk -v k="$j" -v OFS='\t' '{print $1, $2, $3, k}' file$j.txt
done > combined_file.txt
As pointed out in the comments, your problem is that you're trying to use a typographic quote character (”) as if it were a plain double quote. Once you fix that, though, you don't need a loop or any of that other complexity; all you need is:
$ awk 'BEGIN{FS=OFS="\t"} {$NF="f"ARGIND} 1' file*
chr1 1 2 f1
chr2 3 4 f1
chr1 1 2 f2
chr2 3 4 f2
chr1 1 2 f3
chr2 3 4 f3
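The above uses GNU awk for ARGIND. Note that $NF="f"ARGIND overwrites the last field, which fits the four-column sample; for files with more columns you would print $1, $2, $3 and the label explicitly.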
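If GNU awk is not available, one possible alternative is to count files by hand with FNR == 1 and to print the first three columns explicitly, which also covers inputs with more than four columns. This is a sketch under stated assumptions: the counter `n`, the five-column sample rows and the temporary directory are made up for the demonstration, and unlike ARGIND the counter does not advance for empty files.

```shell
#!/bin/sh
# Demo only: three tab-separated files with five columns each.
tmp=$(mktemp -d)
for f in file001 file002 file003; do
  printf 'chr1\t1\t2\t15\t0.5\nchr2\t3\t4\t17\t0.7\n' > "$tmp/$f.txt"
done

result=$(awk 'BEGIN { FS = OFS = "\t" }
FNR == 1 { n++ }              # first line of each non-empty file: bump the file counter
{ print $1, $2, $3, "f" n }   # keep three columns and append the label
' "$tmp"/file*.txt)

printf '%s\n' "$result"
rm -rf "$tmp"
```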

Linux shell command to copy text data from a file to another

file_1 contents:
aaa 111 222 333
bbb 444 555 666
ccc 777 888 999
file_2 contents:
ddd
eee
fff
How do I copy only part of the text from file_1 to file_2,
so that file_2 would become:
ddd 111 222 333
eee 444 555 666
fff 777 888 999
Try with awk:
awk 'NR==FNR{a[FNR]=$2FS$3FS$4;next} {print $0, a[FNR]}' file_1 file_2
Explanation:
NR is the number of the current input line counted across all files, and FNR is the line number within the current file. You can see that with
$ awk '{print NR,FNR}' file_1 file_2
1 1
2 2
3 3
4 1
5 2
6 3
So, the condition NR==FNR is only true when reading the first file, and that's when the columns $2, $3, and $4 get saved in a[FNR]. After reading file_1, the condition NR==FNR becomes false and the block {print $0, a[FNR]} is executed, where $0 is the whole line in file_2.
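The command above only prints the merged lines. To make file_2 actually contain them, one common approach is to write to a temporary file and then move it over the original. The sketch below does that in a throwaway directory; the `.new` suffix and the `result` variable are just illustrative choices.

```shell
#!/bin/sh
# Demo only: recreate the two sample files.
tmp=$(mktemp -d)
printf '%s\n' 'aaa 111 222 333' 'bbb 444 555 666' 'ccc 777 888 999' > "$tmp/file_1"
printf '%s\n' ddd eee fff > "$tmp/file_2"

# Write the merged lines to a new file, then replace file_2 only if awk succeeded.
awk 'NR==FNR { a[FNR] = $2 FS $3 FS $4; next }   # file_1: remember columns 2-4 by line number
     { print $0, a[FNR] }                        # file_2: append the saved columns
    ' "$tmp/file_1" "$tmp/file_2" > "$tmp/file_2.new" &&
  mv "$tmp/file_2.new" "$tmp/file_2"

result=$(cat "$tmp/file_2")
printf '%s\n' "$result"
rm -rf "$tmp"
```

Redirecting straight to file_2 would truncate it before awk reads it, which is why the intermediate file is used.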
