How to compare two columns in multiple files in linux with awk

I have this code:
[motaro@Cyrax ]$ awk '{print $1}' awk1.txt awk2.txt
line1a
line2a
file1a
file2a
It shows the first column from both files.
How can I get $1 of file 1 and $1 of file 2 separately?

As per the comments above, for three or more files, set the conditionals like:
FILENAME == ARGV[1]
For example:
awk 'FILENAME == ARGV[1] { print $1 } FILENAME == ARGV[2] { print $1 } FILENAME == ARGV[3] { print $1 }' file1.txt file2.txt file3.txt
Alternatively, if you pass a glob of files, change the conditionals to:
FILENAME == "file1.txt"
For example:
awk 'FILENAME == "file1.txt" { print $1 } FILENAME == "file2.txt" { print $1 } FILENAME == "file3.txt" { print $1 }' *.txt
You may also want to read more about the variables ARGC and ARGV. Please let me know if anything requires more explanation. Cheers.
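When the number of files is not known in advance, writing one comparison per file gets unwieldy. A minimal sketch of an alternative, assuming every input file is non-empty: FNR restarts at 1 for each file, so counting those restarts gives the index of the current file. The function name first_columns_by_file and the sample file contents are made up for illustration.

```shell
# Create two small sample files (hypothetical contents modelled on the question).
tmp=$(mktemp -d)
printf 'line1a x\nline2a y\n' > "$tmp/awk1.txt"
printf 'file1a x\nfile2a y\n' > "$tmp/awk2.txt"

first_columns_by_file() {
    # FNR is 1 on the first line of every file, so n counts the files seen so far.
    awk 'FNR == 1 { n++ } { print "file" n ": " $1 }' "$@"
}

first_columns_by_file "$tmp/awk1.txt" "$tmp/awk2.txt"
```

An empty file never produces a line with FNR == 1, so it would not advance the counter; the FILENAME comparisons above do not have that limitation.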

I am not sure exactly what you need.
You probably need the predefined variable FILENAME:
awk '{print $1,FILENAME}' awk1.txt awk2.txt
The above command outputs:
line1a awk1.txt
line2a awk1.txt
file1a awk2.txt
file2a awk2.txt

awk 'NR==FNR{a[FNR]=$0;next} {print a[FNR],$0}' file_1 file_2
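The one-liner above relies on a common idiom: NR counts lines across all files while FNR restarts for each file, so NR==FNR is true only while the first file is being read. A small sketch of the same idea restricted to the first column, using made-up sample files. It assumes the first file is not empty, since otherwise NR==FNR would also hold for the second file.

```shell
tmp=$(mktemp -d)
printf 'line1a x\nline2a y\n' > "$tmp/file_1"
printf 'file1a x\nfile2a y\n' > "$tmp/file_2"

pair_first_columns() {
    # First file: remember $1 by line number and skip to the next line.
    # Second file: print the remembered value next to the current $1.
    awk 'NR == FNR { a[FNR] = $1; next } { print a[FNR], $1 }' "$1" "$2"
}

pair_first_columns "$tmp/file_1" "$tmp/file_2"
```

Each output line then holds $1 of file 1 followed by $1 of file 2 for the same line number, which makes the two columns easy to compare.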

Related

Adding double quotes around non-numeric columns by awk

I have a file like this:
2018-01-02;1.5;abcd;111
2018-01-04;2.75;efgh;222
2018-01-07;5.25;lmno;333
2018-01-09;1.25;prs;444
I'd like to add double quotes to the non-numeric columns, so the new file should look like this:
"2018-01-02";1.5;"abcd";111
"2018-01-04";2.75;"efgh";222
"2018-01-07";5.25;"lmno";333
"2018-01-09";1.25;"prs";444
I tried this so far, but I know it is not the correct way:
head myfile.csv -n 4 | awk 'BEGIN{FS=OFS=";"} {gsub($1,echo $1 ,$1)} 1' | awk 'BEGIN{FS=OFS=";"} {gsub($3,echo "\"" $3 "\"",$3)} 1'
Thanks in advance.

You may use this awk that sets ; as input/output delimiter and then wraps each field with "s if that field is non-numeric:
awk '
BEGIN {
FS = OFS = ";"
}
{
for (i=1; i<=NF; ++i)
$i = ($i+0 == $i ? $i : "\"" $i "\"")
} 1' file
"2018-01-02";1.5;"abcd";111
"2018-01-04";2.75;"efgh";222
"2018-01-07";5.25;"lmno";333
"2018-01-09";1.25;"prs";444
Alternative gnu-awk solution:
awk -v RS='[;\n]' '$0+0 != $0 {$0 = "\"" $0 "\""} {ORS=RT} 1' file

Using GNU awk and typeof(): Fields ... that are numeric strings have the strnum attribute. Otherwise, they have the string attribute.
$ gawk 'BEGIN {
FS=OFS=";"
}
{
for(i=1;i<=NF;i++)
if(typeof($i)=="string")
$i=sprintf("\"%s\"",$i)
}1' file
Some output:
"2018-01-02";1.5;"abcd";111
...
Edit:
If some of the fields are already quoted:
$ gawk 'BEGIN {
FS=OFS=";"
}
{
for(i=1;i<=NF;i++)
if(typeof($i)=="string")
gsub(/^"?|"?$/,"\"",$i)
}1' <<< 'string;123;"quoted string"'
Output:
"string";123;"quoted string"

Further enhancing anubhava's solution (including handling fields that are already double-quoted):
gawk -e 'sub(".+",$-_==+$-_?"&":(_)"&"_\
)^gsub((_)_, _)^(ORS = RT)' RS='[;\n]' \_='\42'
"2018-01-02";1.5;"abcd";111
"2018-01-04";2.75;"efgh";222
"2018-01-07";5.25;"lmno";333
"2018-01-09";1.25;"prs";444
"2018-01-09";1.25;"prs";111111111111111111112222222222
222222223333333333333333333333
333344444444444444444499999999
999991111111111111111111122222
222222222222233333333333333333
333333333444444444444444444999
999999999991111111111111111111
122222222222222222233333333333
333333333333333444444444444444
444999999999999991111111111111
111111122222222222222222233333
333333333333333333333444444444
444444444999999999999991111111
111111111111122222222222222222
233333333333333333333333333444
444444444444444999999999999991
111111111111111111122222222222
222222233333333333333333333333
333444444444444444444999999999
999991111111111111111111122222
222222222222233333333333333333
333333333444444444444444444999
999999999999

unix concatenate list of files into one line

In a directory, there are several files such as:
file1
file2
file3
Is there a simple way to concatenate those files to get one line (connected by "OR") in bash as follows:
file1 OR file2 OR file3
Or do I need to write a script for it?

You can use this function to print all filenames (including ones with space, newline or special characters) with " OR " as separator (assuming your filename doesn't contain ASCII code 4):
orfiles() {
local IFS=$'\4'
local out="$*"
echo "${out//$'\4'/ OR }"
}
Then call it as:
orfiles *
How it works:
We set IFS (Internal Field Separator) to ASCII 4 locally inside the function.
We store the expansion of "$*" in the local variable out. This places \4 between the filenames in $out.
Finally, using Bash string substitution, we globally replace \4 with " OR " while printing $out.
"$*" joins the arguments using only the first character of IFS, so IFS cannot supply the multi-character string " OR " and we have to do this in two steps as shown above.
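The reason for the two-step approach can be checked directly: "$*" joins the positional parameters with the first character of IFS only. A small sketch, where join_args is a hypothetical helper written for this illustration:

```shell
join_args() {
    # $1 = value to put in IFS, remaining arguments = words to join
    sep=$1
    shift
    # The subshell keeps the IFS change from leaking into the caller.
    ( IFS=$sep; printf '%s\n' "$*" )
}

join_args ',' file1 'file 2' file3      # joined with the single character ","
join_args ' OR ' file1 'file 2' file3   # only the leading space of " OR " is used
```

The second call does not insert " OR " between the names, which is why the function above joins on a placeholder character first and substitutes afterwards.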

You can simply do that with
printf '%s OR ' $(ls -1 *) | sed 's/OR $/''/'; echo -e '\n'
where ls -1 * lists the files in the directory.

One thing to consider is that a filename may contain whitespace.
Use the following ls + awk solution:
ls -1 * | awk '{ r=(r)? r" OR "$0 : $0 }END{ print r }'
Workaround for filenames with newline(s):
echo -e $(ls -1b hello* | awk -v RS= '{gsub(/\n/," OR ",$0); gsub(/\\ /," ",$0); print $0}')
-b is the ls option that prints C-style escapes for nongraphic characters.

ls -1|awk -v q='"' '{printf "%s%s", NR==1?"":" OR ", q $0 q}END{print ""}'
The ls and awk way to do it, with an example where a filename contains spaces:
kent$ ls -1
file1
file2
'file with OR and space'
kent$ ls -1|awk -v q='"' '{printf "%s%s", NR==1?"":" OR ", q $0 q}END{print ""}'
"file1" OR "file2" OR "file with OR and space"

$ for f in *; do printf '%s%s' "$s" "$f"; s=" OR "; done; printf '\n'
file1 OR file2 OR file3

Why AWK uses my arguments as input file

I'm writing an awk script to parse something for me. For convenience, I want the awk script to be executable in Linux. Here is my code:
#!/usr/bin/awk -f
BEGIN {
FILENAME=ARGV[1]
sub_name=ARGV[2]
run=ARGV[3]
count=0
}
{
if ($4 == "ARGV[2]" && $8 == ARGV[3])
{
print $15
count=count+1
}
}
END {
print count
}
When I run my awk script in Linux like this:
./my_script 001.log type1 2
awk says:
awk: ./awk_script:23: fatal: cannot open file `type1' for reading (No such
file or directory)
I just want to use the argument "type1" as a variable in my script, not as an input file to parse. How can I stop awk from treating it as an input file?
Thank you,

Don't use a shebang to execute the awk script as it just complicates things:
/usr/bin/awk -v sub_name="$2" -v run="$3" '
{
if ($4 == sub_name && $8 == run)
{
print $15
count=count+1
}
}
END {
print count
}
' "$1"
Note that your script could be cleaned up a bit:
/usr/bin/awk -v sub_name="$2" -v run="$3" '
($4 == sub_name) && ($8 == run) {
print $15
count++
}
END { print count+0 }
' "$1"

Delete the non-file arguments from ARGV in the BEGIN block, after copying them into variables:
delete ARGV[2]
delete ARGV[3]
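Put together, this could look like the sketch below. It copies the extra arguments into variables in BEGIN and then deletes them from ARGV, so awk only opens the log file. The log line layout and the helper name filter_log are invented for illustration; only the field positions ($4, $8, $15) come from the question.

```shell
tmp=$(mktemp -d)
# Hypothetical log lines with 15 space-separated fields each.
printf '%s\n' \
    'a b c type1 e f g 2 i j k l m n hit1' \
    'a b c type2 e f g 2 i j k l m n miss' \
    'a b c type1 e f g 3 i j k l m n miss' > "$tmp/001.log"

filter_log() {
    # $1 = log file, $2 = sub_name, $3 = run
    awk '
    BEGIN {
        sub_name = ARGV[2]
        run = ARGV[3]
        # Remove the non-file arguments so awk does not try to open them.
        delete ARGV[2]
        delete ARGV[3]
    }
    $4 == sub_name && $8 == run { print $15; count++ }
    END { print count + 0 }
    ' "$1" "$2" "$3"
}

filter_log "$tmp/001.log" type1 2
```

Only the first sample line has type1 in field 4 and 2 in field 8, so the call prints that line's field 15 followed by a count of 1.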

If you want to use them as variables, you have to pass them with the -v option. The way you are doing it tells awk that the second argument is an input file.

Multi-input files for awk

I have two CSV files, the first one looks like below:
File1:
3124,3124,0,2,,1,0,1,1,0,0,0,0,0,0,0,0,1106,11
6118,6118,0,0,,0,0,1,0,0,0,0,1,1,1,1,1,5156,51
6679,6679,0,0,,1,0,1,0,0,0,0,0,1,0,1,0,1106,11
5249,5249,0,0,,0,0,1,1,0,0,0,0,0,0,0,0,1106,13
2658,2658,0,0,,1,0,1,1,0,0,0,0,0,0,0,0,1197,11
4322,4322,0,0,,1,0,1,1,0,0,0,0,0,0,0,0,1307,13
File2:
7792,1307,2012-06-07,,,,
5249,4001,2016-07-02,,,,
6001,1334,2017-01-23,,,,
2658,4001,2009-02-09,,,,
9279,1326,2014-12-20,,,,
What I need:
If $2 in file2 is 4001, match $1 of file2 against $1 of file1; if $18 in file1 is 1106 for the matched $1, print that line.
The expected output:
5249,5249,0,0,,0,0,1,1,0,0,0,0,0,0,0,0,1106,13
I have tried something as the following, but with no success.
awk 'NR=FNR {A[$1]=$1;next} {print $1}'
P.S: The files are compressed, so I have to use the zcat command

I would try something like:
$ cat t.awk
BEGIN { FS = "," }
# Processing first file
NR == FNR && $18 == 1106 { a[$1] = $0; next }
# Processing second file
$2 == 4001 && $1 in a { print a[$1] }
$ awk -f t.awk file1.txt file2.txt
5249,5249,0,0,,0,0,1,1,0,0,0,0,0,0,0,0,1106,13
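Since the question mentions that the files are compressed, here is one possible way to apply the same matching without unpacking them to disk first. It is a sketch under a few assumptions: the files are gzip-compressed, gzip -dc is used as the equivalent of zcat, and the helper name match_compressed and the file names are hypothetical. Unlike t.awk, it reads file2 first through a pipe and then filters file1, so matches are printed in file1 order.

```shell
tmp=$(mktemp -d)
# A few lines from each sample file, compressed to mimic the question's setup.
printf '%s\n' \
    '5249,5249,0,0,,0,0,1,1,0,0,0,0,0,0,0,0,1106,13' \
    '2658,2658,0,0,,1,0,1,1,0,0,0,0,0,0,0,0,1197,11' | gzip -c > "$tmp/file1.csv.gz"
printf '%s\n' \
    '5249,4001,2016-07-02,,,,' \
    '2658,4001,2009-02-09,,,,' \
    '6001,1334,2017-01-23,,,,' | gzip -c > "$tmp/file2.csv.gz"

match_compressed() {
    # $1 = compressed file1, $2 = compressed file2
    gzip -dc "$1" | awk -F, -v keys_cmd="gzip -dc '$2'" '
    BEGIN {
        # Read file2 through a pipe and remember the IDs whose $2 is 4001.
        while ((keys_cmd | getline line) > 0) {
            split(line, f, ",")
            if (f[2] == 4001) want[f[1]] = 1
        }
        close(keys_cmd)
    }
    # file1 arrives on standard input; print lines that pass both tests.
    $18 == 1106 && ($1 in want)
    '
}

match_compressed "$tmp/file1.csv.gz" "$tmp/file2.csv.gz"
```

The path of file2 is embedded in a command string, so paths containing single quotes or backslashes would need extra care.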

Bash Script Awk Condition

I have a problem with this code. I can't figure out what to write as the condition to filter my file with awk.
i=0
while [ $i -lt 10 ]; # from 1 to 9, Ap1..Ap9
do
case $i in
1) RX="54:75:D0:3F:1E:F0";;
2) RX="54:75:D0:3F:4D:00";;
3) RX="54:75:D0:3F:51:50";;
4) RX="54:75:D0:3F:53:60";;
5) RX="54:75:D0:3F:56:10";;
6) RX="54:75:D0:3F:56:E0";;
7) RX="54:75:D0:3F:5A:B0";;
8) RX="54:75:D0:3F:5F:90";;
9) RX="D0:D0:FD:68:BC:70";;
*) echo "Numero invalido!";;
esac
echo "RX = $RX" #check
awk -F, '$2 =="$RX" { print $0 }' File1 > File2[$i] #this is the line!
i=$(( $i + 1 ))
done
The echo command prints correctly, but when I use the same "$RX" as the condition in awk it doesn't work (the output file is empty).
My File1:
1417164082794,54:75:D0:3F:53:60,54:75:D0:3F:1E:F0,-75,2400,6
1417164082794,54:75:D0:3F:56:10,54:75:D0:3F:1E:F0,-93,2400,4
1417164082794,54:75:D0:3F:56:E0,54:75:D0:3F:1E:F0,-89,2400,4
1417164082794,54:75:D0:3F:5A:B0,54:75:D0:3F:1E:F0,-80,2400,4
1417164082794,54:75:D0:3F:53:60,54:75:D0:3F:1E:F0,-89,5000,2
Could you tell me the right "awk -F ..." expression?
Thank you very much!

To pass variables from shell to awk use -v:
awk -F, -v R="$RX" '$2 ==R { print $0 }' File1 > File2[$i]
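To see why the original condition never matched, the two forms can be compared side by side. This is a small sketch using three lines from the File1 sample in the question; the helper name filter_by_rx is made up for illustration.

```shell
tmp=$(mktemp -d)
printf '%s\n' \
    '1417164082794,54:75:D0:3F:53:60,54:75:D0:3F:1E:F0,-75,2400,6' \
    '1417164082794,54:75:D0:3F:56:10,54:75:D0:3F:1E:F0,-93,2400,4' \
    '1417164082794,54:75:D0:3F:53:60,54:75:D0:3F:1E:F0,-89,5000,2' > "$tmp/File1"

RX="54:75:D0:3F:53:60"

# Inside single quotes the shell leaves $RX alone, so awk compares $2
# with the literal string $RX and no line matches.
awk -F, '$2 == "$RX"' "$tmp/File1"

# With -v the shell expands the value and awk receives it as the variable R.
filter_by_rx() {
    # $1 = input file, $2 = address to look for in column 2
    awk -F, -v R="$2" '$2 == R' "$1"
}

filter_by_rx "$tmp/File1" "$RX"
```

The first awk call prints nothing, while the function prints the two lines whose second column equals the address.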

@Ricky - any time you write a loop in shell just to manipulate text you have the wrong approach. It's just not what the shell was created to do - it's what awk was created to do and the shell was created to invoke commands like awk.
Just use a single awk command and instead of reading File1 10 times and switching on variables for every line of the file, just do it all once, something like this:
BEGIN {
split(file2s,f2s)
split("54:75:D0:3F:1E:F0 \
54:75:D0:3F:4D:00 \
54:75:D0:3F:51:50 \
54:75:D0:3F:53:60 \
54:75:D0:3F:56:10 \
54:75:D0:3F:56:E0 \
54:75:D0:3F:5A:B0 \
54:75:D0:3F:5F:90 \
D0:D0:FD:68:BC:70", rxs)
for (i in rxs) {
rx2file2s[rxs[i]] = f2s[i]
}
}
{
if ($2 in rx2file2s) {
print > rx2file2s[$2]
}
else {
print NR, $2, "Numero invalido!" | "cat>&2"
}
}
which you'd then invoke as awk -v file2s="${File2[*]}" -f script.awk File1
I say "something like" because you didn't provide any sample input (File1 contents) or expected output (File2* values and contents) so I couldn't test it but it will be very close to what you need if not exactly right.
