Multi-input files for awk - linux
I have two CSV files, the first one looks like below:
File1:
3124,3124,0,2,,1,0,1,1,0,0,0,0,0,0,0,0,1106,11
6118,6118,0,0,,0,0,1,0,0,0,0,1,1,1,1,1,5156,51
6679,6679,0,0,,1,0,1,0,0,0,0,0,1,0,1,0,1106,11
5249,5249,0,0,,0,0,1,1,0,0,0,0,0,0,0,0,1106,13
2658,2658,0,0,,1,0,1,1,0,0,0,0,0,0,0,0,1197,11
4322,4322,0,0,,1,0,1,1,0,0,0,0,0,0,0,0,1307,13
File2:
7792,1307,2012-06-07,,,,
5249,4001,2016-07-02,,,,
6001,1334,2017-01-23,,,,
2658,4001,2009-02-09,,,,
9279,1326,2014-12-20,,,,
What I need:
If $2 in file2 is 4001, match $1 of that file2 line against $1 in file1; if $18 of the matching file1 line is 1106, print that file1 line.
the expected output:
5249,5249,0,0,,0,0,1,1,0,0,0,0,0,0,0,0,1106,13
I have tried something as the following, but with no success.
awk 'NR=FNR {A[$1]=$1;next} {print $1}'
P.S: The files are compressed, so I have to use the zcat command
I would try something like:
$ cat t.awk
BEGIN { FS = "," }
# Processing first file
NR == FNR { if ($18 == 1106) a[$1] = $0; next }
# Processing second file
$2 == 4001 && $1 in a { print a[$1] }
$ awk -f t.awk file1.txt file2.txt
5249,5249,0,0,,0,0,1,1,0,0,0,0,0,0,0,0,1106,13
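Since the question mentions that the files are compressed, the same logic can read them through process substitution instead of plain file names. The following is only a sketch, assuming bash (process substitution is not available in plain POSIX sh); the names file1.csv.gz and file2.csv.gz and the helper filter_compressed are made up for illustration, and a few sample rows from the question are recreated in a temporary directory so the snippet runs on its own.

```shell
#!/usr/bin/env bash
# Sketch: run the two-file awk logic on gzip-compressed inputs.
# File names, the helper name and the temporary directory are assumptions.
set -eu
workdir=$(mktemp -d)

# Recreate part of the sample data from the question and compress it.
printf '%s\n' \
  '3124,3124,0,2,,1,0,1,1,0,0,0,0,0,0,0,0,1106,11' \
  '5249,5249,0,0,,0,0,1,1,0,0,0,0,0,0,0,0,1106,13' \
  '2658,2658,0,0,,1,0,1,1,0,0,0,0,0,0,0,0,1197,11' | gzip > "$workdir/file1.csv.gz"
printf '%s\n' \
  '7792,1307,2012-06-07,,,,' \
  '5249,4001,2016-07-02,,,,' \
  '2658,4001,2009-02-09,,,,' | gzip > "$workdir/file2.csv.gz"

filter_compressed() {
  # $1: compressed file1, $2: compressed file2
  awk -F, '
    # First input: remember file1 lines whose 18th field is 1106.
    NR == FNR { if ($18 == 1106) keep[$1] = $0; next }
    # Second input: print the remembered line when field 2 is 4001.
    $2 == 4001 && ($1 in keep) { print keep[$1] }
  ' <(zcat "$1") <(zcat "$2")
}

result=$(filter_compressed "$workdir/file1.csv.gz" "$workdir/file2.csv.gz")
printf '%s\n' "$result"
rm -rf "$workdir"
```

Each `<(zcat ...)` is presented to awk as a separate input, so the NR == FNR test still distinguishes the first file from the second.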
Related
comparing files Unix
I have two files, file1.txt and file2.txt.
file1.txt:
name|mandatory|
age|mandatory|
address|mandatory|
email|mandatory|
country|not-mandatory|
file2.txt:
gabrielle||nashville|gabrielle#outlook.com||
These are my exact data files. In file1, column 1 is the field name and column 2 notes whether the field must not be null in file2. In file2, the data is a single row separated by |. The age field, marked as mandatory in file1, is missing from file2 (which is a single row), and that is what my output should report.
Expected output:
age mandatory
I have code that works when file2 is in the same format as file1, with "mandatory" replaced by the field data:
awk -F '|' '
NR==FNR && $3=="mandatory" {m[$2]++}
NR>FNR && $3=="" && m[$2] {printf "%s mandatory\n", $2}
' file1.txt file2.txt
You have to iterate over the fields of the file2 row: for (... i <= NF ...).
awk -F '|' '
NR==FNR { name[NR]=$1; man[NR]=$2 }
NR!=FNR {
    for (i = 1; i <= NF; ++i) {
        if ($i == "" && man[i] == "mandatory") {
            printf("Field %s is mandatory!\n", name[i]);
        }
    }
}
' file1.txt file2.txt
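A variant that prints the exact format the question asks for ("age mandatory") might look like the sketch below. It assumes that the n-th line of file1.txt describes the n-th column of file2.txt, and it bounds the loop by the number of definitions read from file1.txt; the variable names are illustrative.

```shell
# Sketch: report mandatory fields that are empty in the data row.
# Assumes line n of file1.txt describes column n of file2.txt.
set -eu
workdir=$(mktemp -d)
printf '%s\n' 'name|mandatory|' 'age|mandatory|' 'address|mandatory|' \
  'email|mandatory|' 'country|not-mandatory|' > "$workdir/file1.txt"
printf '%s\n' 'gabrielle||nashville|gabrielle#outlook.com||' > "$workdir/file2.txt"

missing=$(awk -F '|' '
  # file1.txt: store the field name and its rule, indexed by line number.
  NR == FNR { name[FNR] = $1; rule[FNR] = $2; n = FNR; next }
  # file2.txt: check every described column of the data row.
  {
    for (i = 1; i <= n; i++)
      if ($i == "" && rule[i] == "mandatory")
        print name[i], "mandatory"
  }
' "$workdir/file1.txt" "$workdir/file2.txt")
printf '%s\n' "$missing"
rm -rf "$workdir"
```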
Merge two files using awk in linux
I have a 1.txt file:
betomak#msn.com||o||0174686211||o||7880291304ca0404f4dac3dc205f1adf||o||Mario||o||Mario||o||Kawati
zizipi#libero.it||o||174732943.0174732943||o||e10adc3949ba59abbe56e057f20f883e||o||Tiziano||o||Tiziano||o||D'Intino
frankmel#hotmail.de||o||0174844404||o||8d496ce08a7ecef4721973cb9f777307||o||Melanie||o||Melanie||o||Kiesel
apoka-paris#hotmail.fr||o||0174847613||o||536c1287d2dc086030497d1b8ea7a175||o||Sihem||o||Sihem||o||Sousou
sofianomovic#msn.fr||o||174902297.0174902297||o||9893ac33a018e8d37e68c66cae23040e||o||Nabile||o||Nabile||o||Nassime
donaldduck#yahoo.com||o||174912161.0174912161||o||0c770713436695c18a7939ad82bc8351||o||Donald||o||Donald||o||Duck
cernakova#centrum.cz||o||0174991962||o||d161dc716be5daf1649472ddf9e343e6||o||Dagmar||o||Dagmar||o||Cernakova
trgsrl#tiscali.it||o||0175099675||o||d26005df3e5b416d6a39cc5bcfdef42b||o||Esmeralda||o||Esmeralda||o||Trogu
catherinesou#yahoo.fr||o||0175128896||o||2e9ce84389c3e2c003fd42bae3c49d12||o||Cat||o||Cat||o||Sou
ermimurati24#hotmail.com||o||0175228687||o||a7766a502e4f598c9ddb3a821bc02159||o||Anna||o||Anna||o||Beratsja
cece_89#live.fr||o||0175306898||o||297642a68e4e0b79fca312ac072a9d41||o||Celine||o||Celine||o||Jacinto
kendinegel39#hotmail.com||o||0175410459||o||a6565ca2bc8887cde5e0a9819d9a8ee9||o||Adem||o||Adem||o||Bulut
A 2.txt file:
9893ac33a018e8d37e68c66cae23040e:134:#a1
536c1287d2dc086030497d1b8ea7a175:~~#!:/92\
8d496ce08a7ecef4721973cb9f777307:demodemo
FS for 1.txt is "||o||" and for 2.txt is ":".
I want to merge the two files into a single file, result.txt, based on this condition: the 3rd column of 1.txt must match the 1st column of 2.txt, and it should be replaced by the 2nd column of 2.txt.
The expected output will contain all the matching lines. I am showing you one of them:
sofianomovic#msn.fr||o||174902297.0174902297||o||134:#a1||o||Nabile||o||Nabile||o||Nassime
I tried the script:
awk -F"||o||" 'NR==FNR{s=$0; sub(/:[^:]*$/, "", s); a[s]=$NF;next} {s = $5; for (i=6; i<=NF; ++i) s = s "," $i; if (s in a) { NF = 5; $5=a[s]; print } }' FS=: <(tr -d '\r' < 2.txt) FS="||o||" OFS="||o||" <(tr -d '\r' < 1.txt) > result.txt
But I am getting an empty file as the result. Any help would be highly appreciated.
If your actual input files look like the samples shown, the following awk may help.
awk -v s1="||o||" '
FNR==NR{
  a[$9]=$1 s1 $5
  b[$9]=$13 s1 $17 s1 $21
  next
}
($1 in a){
  print a[$1] s1 $2 FS $3 s1 b[$1]
}
' FS="|" 1.txt FS=":" 2.txt
EDIT: Since the OP changed the requirement slightly, here is code for the new ask. It also creates two files: one with the IDs present in 1.txt and not in 2.txt, and the other the reverse.
awk -v s1="||o||" '
FNR==NR{
  a[$9]=$1 s1 $5
  b[$9]=$13 s1 $17 s1 $21
  c[$9]=$0
  next
}
($1 in a){
  val=$1
  $1=""
  sub(/:/,"")
  print a[val] s1 $0 s1 b[val]
  d[val]=$0
  next
}
{
  print > "NOT_present_in_2.txt"
}
END{
  for(i in d){
    delete c[i]
  }
  for(j in c){
    print j,c[j] > "NOT_present_in_1.txt"
  }
}
' FS="|" 1.txt FS=":" OFS=":" 2.txt
You can use this awk to get your output. FS is changed on the first record of file1, after that record has already been split, so $0=$0 forces it to be re-split with the new separator:
awk -F ':' 'NR==FNR{a[$1]=$2 FS $3; next} FNR==1{FS=OFS="||o||"; gsub(/[|]/, "\\\\&", FS); $0=$0} $3 in a{$3=a[$3]; print}' file2 file1 > result.txt
cat result.txt
frankmel#hotmail.de||o||0174844404||o||demodemo:||o||Melanie||o||Melanie||o||Kiesel
apoka-paris#hotmail.fr||o||0174847613||o||~~#!:/92\||o||Sihem||o||Sihem||o||Sousou
sofianomovic#msn.fr||o||174902297.0174902297||o||134:#a1||o||Nabile||o||Nabile||o||Nassime
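A multi-character FS is treated as a regular expression, so the pipes in "||o||" have to be made literal. One way to sketch this is to put each pipe in a bracket expression and to assign FS on the command line just before the file it applies to. The rows below are made-up stand-ins with the same layout as the question's data, and the array name map is illustrative.

```shell
# Sketch: merge on column 3 using a literal "||o||" separator.
# The sample rows are invented; only their layout follows the question.
set -eu
workdir=$(mktemp -d)
printf '%s\n' \
  'alice#example.com||o||0100000001||o||hash-aaa||o||Alice||o||Alice||o||Smith' \
  'bob#example.com||o||0100000002||o||hash-bbb||o||Bob||o||Bob||o||Jones' \
  'carol#example.com||o||0100000003||o||hash-ccc||o||Carol||o||Carol||o||Lee' \
  > "$workdir/1.txt"
printf '%s\n' 'hash-ccc:12:#x' 'hash-aaa:plain' > "$workdir/2.txt"

merged=$(awk -v OFS='||o||' '
  # 2.txt: the key is the text before the first colon, the value is the rest,
  # so values that contain colons are kept whole.
  NR == FNR {
    p = index($0, ":")
    if (p) map[substr($0, 1, p - 1)] = substr($0, p + 1)
    next
  }
  # 1.txt: replace the third column when its value is a known key.
  $3 in map { $3 = map[$3]; print }
' "$workdir/2.txt" FS='[|][|]o[|][|]' "$workdir/1.txt")
printf '%s\n' "$merged"
rm -rf "$workdir"
```

Because the FS assignment sits between the two file names, it takes effect before the first record of 1.txt is read, so no record needs to be re-split.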
Run query in Linux for selecting CSV'S
On Linux, there are many .csv files in a folder. I have to select the CSV files in which the column named PREDICT has the value 646 ({'PREDICT' = 646}). See this link: https://prnt.sc/gone85. What kind of query would work?
Providing test data, since none was given:
$ cat > file1
ACTUAL PREDICT
1 2
3 646
$ cat > file2
ACTUAL PREDICT
1 2
3 666
Then some GNU awk (nextfile) to select the files in which there is a column PREDICT with a value of 646:
$ awk 'FNR==1{p=0;for(i=1;i<=NF;i++)if($i=="PREDICT")p=i}p&&$p==646{print FILENAME;nextfile}' file1 file2
file1
Explained:
awk '
FNR==1 {                      # on the header line of each file
    p=0                       # forget the column number from the previous file
    for(i=1;i<=NF;i++)        # find the column number of PREDICT
        if($i=="PREDICT")
            p=i               # and store it in p
}
p && $p==646 {                # if PREDICT exists and $p==646, we have a match
    print FILENAME            # print the filename
    nextfile                  # and move on to the next file
}' file1 file2                # all the candidate files
GNU awk solution without a loop:
$ cat tst.awk
BEGIN{FS=","}
FNR==1 && (s=substr($0,1,index($0,"PREDICT"))) {   # take the header up to PREDICT
    i=gsub(/,/, "", s) + 1                          # and count the commas before it
                                                    # to get the column number
}
s && $i==646 { print FILENAME; nextfile }
Some input:
$ cat file1.csv
ACTUAL,PREDICT,COUNTRY,REGION,DIVISION,PRODUCTTYPE,PRODUCT,QUARTER,YEAR,MONTH
925,850,CANADA,EAST,EDUCATION,FURNITURE,SOFA,1,1993,12054
925,533,CANADA,EAST,EDUCATION,FURNITURE,SOFA,1,1993,12054
925,646,CANADA,EAST,EDUCATION,FURNITURE,SOFA,1,1993,12054
$ cat file2.csv
ACTUAL,PREDICT,COUNTRY,REGION,DIVISION,PRODUCTTYPE,PRODUCT,QUARTER,YEAR,MONTH
925,850,CANADA,EAST,EDUCATION,FURNITURE,SOFA,1,1993,12054
925,533,CANADA,EAST,EDUCATION,FURNITURE,SOFA,1,1993,12054
925,111,CANADA,EAST,EDUCATION,FURNITURE,SOFA,1,1993,12054
and:
$ cp file1.csv file3.csv
gives:
$ awk -f tst.awk *.csv
file1.csv
file3.csv
Or use a one-liner:
$ awk -F, 'FNR==1 && (s=substr($0,1,index($0,"PREDICT"))) {i=gsub(/,/, "", s) + 1}s && $i==646 { print FILENAME; nextfile }' *.csv
file1.csv
file3.csv
Comparing two CSV files in linux
I have two CSV files in the following format:
File1:
No.1, No.2
983264,72342349
763498,81243970
736493,83740940
File2:
No.1,No.2
"7938493","7364987"
"2153187","7387910"
"736493","83740940"
I need to compare the two files and output the matched and unmatched values. I did it through awk:
#!/bin/bash
awk 'BEGIN { FS = OFS = "," }
if (FNR==1){next}
NR>1 && NR==FNR { a[$1]; next }
FNR>1 {
print ($1 in a) ? $1 FS "Match" : $1 FS "In file2 but not in file1"
delete a[$1]
}
END { for (x in a) { print x FS "In file1 but not in file2" } }'file1 file2
But the output is:
"7938493",In file2 but not in file1
"2153187",In file2 but not in file1
"8172470",In file2 but not in file1
7938493,In file1 but not in file2
2153187,In file1 but not in file2
8172470,In file1 but not in file2
Can you please tell me where I am going wrong?
Here are some corrections to your script:
BEGIN {
    # FS = OFS = ","
    FS = "[,\"]+"
    OFS = ", "
}
# if (FNR==1){next}
FNR == 1 {next}
# NR>1 && NR==FNR {
NR==FNR {
    a[$1]; next
}
# FNR>1 {
$2 in a {
    # print ($1 in a) ? $1 FS "Match" : $1 FS "In file2 but not in file1"
    print ($2 in a) ? $2 OFS "Match" : $2 "In file2 but not in file1"
    delete a[$2]
}
END {
    for (x in a) {
        print x, "In file1 but not in file2"
    }
}
This is an awk script, so you can run it like awk -f script.awk file1 file2. Doing so gives these results:
$ awk -f script.awk file1 file2
736493, Match
763498, In file1 but not in file2
983264, In file1 but not in file2
The main problem with your script was that it didn't correctly handle the double quotes around the numbers in file2. I changed the input field separator so that the double quotes are treated as part of the separator to deal with this. As a result, the first field $1 in the second file is empty (it is the bit between the start of the line and the first "), so you need to use $2 to refer to the first value you're interested in. Aside from that, I removed some redundant conditions from your other blocks and used OFS rather than FS in your first print statement.
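If all three categories are wanted (matches, keys only in file2, keys only in file1), another option is to keep the comma as separator and strip the quotes from the key instead. The sketch below takes that route, which differs from the separator trick above; it assumes the key is always the first column and never contains a comma, and the array name seen is illustrative.

```shell
# Sketch: classify keys as matched, only in file2, or only in file1.
# Assumes the key is column 1 and that quotes are the only decoration.
set -eu
workdir=$(mktemp -d)
printf '%s\n' 'No.1, No.2' '983264,72342349' '763498,81243970' \
  '736493,83740940' > "$workdir/file1"
printf '%s\n' 'No.1,No.2' '"7938493","7364987"' '"2153187","7387910"' \
  '"736493","83740940"' > "$workdir/file2"

report=$(awk -F, -v OFS=',' '
  FNR == 1 { next }                 # skip the header row of each file
  { gsub(/"/, "", $1) }             # drop double quotes around the key
  NR == FNR { seen[$1]; next }      # file1: remember every key
  {                                 # file2: classify each key
    print $1, (($1 in seen) ? "Match" : "In file2 but not in file1")
    delete seen[$1]
  }
  END {                             # whatever is left was never seen in file2
    for (k in seen) print k, "In file1 but not in file2"
  }
' "$workdir/file1" "$workdir/file2" | LC_ALL=C sort)
printf '%s\n' "$report"
rm -rf "$workdir"
```

The output is piped through sort because the order of a for (k in seen) loop is not defined in awk.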
How to compare two columns in multiple files in linux with awk
I have this code:
[motaro#Cyrax ]$ awk '{print $1}' awk1.txt awk2.txt
line1a
line2a
file1a
file2a
It shows the first column from both files. How can I get $1 of file 1 and $1 of file 2 separately?
As per the comments above, for three or more files, set the conditionals like:
FILENAME == ARGV[1]
For example:
awk 'FILENAME == ARGV[1] { print $1 } FILENAME == ARGV[2] { print $1 } FILENAME == ARGV[3] { print $1 }' file1.txt file2.txt file3.txt
Alternatively, if you have a glob of files, change the conditionals to:
FILENAME == "file1.txt"
For example:
awk 'FILENAME == "file1.txt" { print $1 } FILENAME == "file2.txt" { print $1 } FILENAME == "file3.txt" { print $1 }' *.txt
You may also want to read more about the variables ARGC and ARGV. Please let me know if anything requires more explanation. Cheers.
I am not sure exactly what you need. You probably need the predefined variable FILENAME:
awk '{print $1,FILENAME}' awk1.txt awk2.txt
The above command will output:
line1a awk1.txt
line2a awk1.txt
file1a awk2.txt
file2a awk2.txt
This prints each line of file_1 next to the line with the same number in file_2:
awk 'NR==FNR{a[FNR]=$0;next} {print a[FNR],$0}' file_1 file_2
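Another way to tell the inputs apart without hard-coding file names is to count files as they start, using the fact that FNR resets to 1 at the beginning of each input. This is only a sketch: the counter name fileno and the second column in the sample rows are assumptions (the question shows only the first column), and an empty input file would not advance the counter because it has no first record.

```shell
# Sketch: collect column 1 separately for each input file.
# The second column of the sample rows is invented for illustration.
set -eu
workdir=$(mktemp -d)
printf '%s\n' 'line1a line1b' 'line2a line2b' > "$workdir/awk1.txt"
printf '%s\n' 'file1a file1b' 'file2a file2b' > "$workdir/awk2.txt"

columns=$(awk '
  FNR == 1 { fileno++ }                       # a new input file starts
  { first[fileno] = first[fileno] " " $1 }    # append column 1 to this file
  END {
    for (i = 1; i <= fileno; i++)
      print "file " i ":" first[i]
  }
' "$workdir/awk1.txt" "$workdir/awk2.txt")
printf '%s\n' "$columns"
rm -rf "$workdir"
```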