Comparing two CSV files in linux

Comparing two CSV files in linux - linux

I have two CSV files with me in the following format:
File1:
No.1, No.2
983264,72342349
763498,81243970
736493,83740940
File2:
No.1,No.2
"7938493","7364987"
"2153187","7387910"
"736493","83740940"
I need to compare the two files and output the matched,unmatched values.
I did it through awk:
#!/bin/bash
awk 'BEGIN {
FS = OFS = ","
}
if (FNR==1){next}
NR>1 && NR==FNR {
a[$1];
next
}
FNR>1 {
print ($1 in a) ? $1 FS "Match" : $1 FS "In file2 but not in file1"
delete a[$1]
}
END {
for (x in a) {
print x FS "In file1 but not in file2"
}
}'file1 file2
But the output is:
"7938493",In file2 but not in file1
"2153187",In file2 but not in file1
"8172470",In file2 but not in file1
7938493,In file1 but not in file2
2153187,In file1 but not in file2
8172470,In file1 but not in file2
Can you please tell me where I am going wrong?

Here are some corrections to your script:
BEGIN {
# FS = OFS = ","
FS = "[,\"]+"
OFS = ", "
}
# if (FNR==1){next}
FNR == 1 {next}
# NR>1 && NR==FNR {
NR==FNR {
a[$1];
next
}
# FNR>1 {
$2 in a {
# print ($1 in a) ? $1 FS "Match" : $1 FS "In file2 but not in file1"
print ($2 in a) ? $2 OFS "Match" : $2 "In file2 but not in file1"
delete a[$2]
}
END {
for (x in a) {
print x, "In file1 but not in file2"
}
}
This is an awk script, so you can run it like awk -f script.awk file1 file2. Doing so gives these results:
$ awk -f script.awk file1 file2
736493, Match
763498, In file1 but not in file2
983264, In file1 but not in file2
The main problem with your script was that it didn't correctly handle the double quotes around the numbers in file2. I changed the input field separator so that the double quotes are treated as part of the separator to deal with this. As a result, the first field $1 in the second file is empty (it is the bit between the start of the line and the first "), so you need to use $2 to refer to the first value you're interested in. Aside from that, I removed some redundant conditions from your other blocks and used OFS rather than FS in your first print statement.

Related

Merge two files using AWK with conditions

I am new to bash scripting and need help with below Question. I parsed a log file to get below and now stuck on later part.
I have a file1.csv with content as:
mac-test-1,10.32.9.12,15
mac-test-2,10.32.9.13,10
mac-test-3,10.32.9.14,11
mac-test-4,10.32.9.15,13
and second file2.csv has below content:
mac-test-3,10.32.9.14
mac-test-4,10.32.9.15
I want to do a file comparison and if the line in second file matches any line in first file then change the content of file 1 as below:
mac-test-1,10.32.9.12, 15, no match
mac-test-2,10.32.9.13, 10, no match
mac-test-3,10.32.9.14, 11, matched
mac-test-4,10.32.9.15, 13, matched
I tried this
awk -F "," 'NR==FNR{a[$1]; next} $1 in a {print $0",""matched"}' file2.csv file1.csv
but it prints below and doesn't include the not matching records
mac-test-3,10.32.9.14,11,matched
mac-test-4,10.32.9.15,13,matched
Also, in some cases the file2 can be empty so the result should be like this:
mac-test-1,10.32.9.12,15, no match
mac-test-2,10.32.9.13,10, no match
mac-test-3,10.32.9.14,11, no match
mac-test-4,10.32.9.15,13, no match

With your shown samples please try following awk code. You need not to check condition first and then print the statement because when you are checking $1 in a then those items who doesn't exist will NEVER come inside this condition's block. So its better to print whole line
of file1.csv and then print status of that particular line either its matched OR not-matched based on their existence inside array.
awk '
BEGIN { FS=OFS="," }
FNR==NR{
arr[$0]
next
}
{
print $0,(($1 OFS $2) in arr)?"Matched":"Not-matched"
}
' file2.csv file1.csv
EDIT: Adding a solution to handle empty file of file2.csv scenario here, same concept wise as above only thing it handles scenarios when file2.csv is an Empty file.
awk -v lines=$(wc -l < file2.csv) '
BEGIN { FS=OFS=","}
(lines==0){
print $0,"Not-Matched"
next
}
FNR==NR{
arr[$0]
next
}
{
print $0,(($1 OFS $2) in arr)?"Matched":"Not-matched"
}
' file2.csv file1.csv

You are not printing the else case:
awk -F "," 'NR==FNR{a[$1]; next}
{
if ($1 in a) {
print $0 ",matched"
} else {
print $0 ",no match"
}
}' file2.csv file1.csv
Output
mac-test-1,10.32.9.12,15,no match
mac-test-2,10.32.9.13,10,no match
mac-test-3,10.32.9.14,11,matched
mac-test-4,10.32.9.15,13,matched
Or in short, without manually printing the comma but using OFS:
awk 'BEGIN{FS=OFS=","} NR==FNR{a[$1];next}{ print $0 OFS (($1 in a)?"":"no")"match"}' file2.csv file1.csv
Edit
I found a solution on this page handling FNR==NR on an empty file.
When file2.csv is empty, all output lines will be:
mac-test-1,10.32.9.12,15,no match
Example
awk -F "," '
ARGV[1] == FILENAME{a[$1];next}
{
if ($1 in a) {
print $0 ",matched"
} else {
print $0 ",no match"
}
}' file2.csv file1.csv

Each of #RavinderSingh13's and #Thefourthbird's answers contain large parts of the solution but here it is all together:
awk '
BEGIN { FS=OFS="," }
{ key = $1 FS $2 }
FILENAME == ARGV[1] {
arr[key]
next
}
{
print $0, ( key in arr ? "matched" : "no match")
}
' file2.csv file1.csv
or if you prefer:
awk '
BEGIN { FS=OFS="," }
{ key = $1 FS $2 }
!f {
arr[key]
next
}
{
print $0, ( key in arr ? "matched" : "no match")
}
' file2.csv f=1 file1.csv

Merge two files AWK

I have to merge two files and need help with:
File1.csv
mac-test-2,10.57.8.2,Compliant
mac-test-6,10.57.8.6,Compliant
mac-test-12,10.57.8.12,Compliant
mac-test-17,10.57.8.17,Noncompliant
File2.csv
mac-test-17,10.57.8.17,2022-10-21
After Merge the content should be
Merge.csv
mac-test-2,10.57.8.2,Compliant,NA
mac-test-6,10.57.8.6,Compliant,NA
mac-test-12,10.57.8.12,Compliant,NA
mac-test-17,10.57.8.17,Noncompliant,2022-10-21
so logic is if the File1.txt doesnt have a matching record in File2.txt then "NA" should be inserted and if it is a match then date should be inserted in the fourth column.
I have written below
awk -F "," '
ARGV[1] == FILENAME{a[$1];next}
{
if ($1 in a) {
print $0 ","
} else {
print $0 ",NA"
}
}
' File2.csv File1.csv
But this is printing
mac-test-2,10.57.8.2,Compliant,NA
mac-test-6,10.57.8.6,Compliant,NA
mac-test-12,10.57.8.12,Compliant,NA
mac-test-17,10.57.8.17,Noncompliant,
I am not sure how I can print the date if it matches.

With your shown samples please try following awk code. Written and tested with your shown samples only.
awk '
BEGIN{ FS=OFS="," }
FNR==NR{
arr[$1]=$NF
next
}
{
print $0,($1 in arr?arr[$1]:"NA")
}
' file2.csv file1.csv
To handle empty file2.csv please try following awk program.
awk '
BEGIN{ FS=OFS="," }
ARGV[1] == FILENAME{
arr[$1]=$NF
next
}
{
if ($1 in arr) {
print $0,arr[$1]
}
else{
print $0,"N/A"
}
}' file2.csv file1.csv

comparing files Unix

I have 2 scripts file.txt and file2.txt
file1.txt
name|mandatory|
age|mandatory|
address|mandatory|
email|mandatory|
country|not-mandatory|
file2.txt
gabrielle||nashville|gabrielle#outlook.com||
These are my exact data files, In file1 column1 is the field name and column2 is to note whether the field should not be null in file2.
In file2 data is in single row separated by |.
The age mentioned as mandatory in file1 is not present in file2[which is a single row] and that is what my needed output too.
Expected output:
age mandatory
I got with code that file2 is in same format as file1 where mandatory is replaced with field2 data.
awk -F '|' '
NR==FNR && $3=="mandatory" {m[$2]++}
NR>FNR && $3=="" && m[$2] {printf "%s mandatory\n", $2}
' file1.txt file2.txt

You have to iterate over fields for(... i <= NR ...).
awk -F '|' '
NR==FNR { name[NR]=$1; man[NR]=$2 }
NR!=FNR {
for (i = 1; i <= NR; ++i) {
if ($i == "" && man[i] == "mandatory") {
printf("Field %s is mandatory!\n", name[i]);
}
}
}
' file1.txt file2.txt

Merge two files using awk in linux

I have a 1.txt file:
betomak#msn.com||o||0174686211||o||7880291304ca0404f4dac3dc205f1adf||o||Mario||o||Mario||o||Kawati
zizipi#libero.it||o||174732943.0174732943||o||e10adc3949ba59abbe56e057f20f883e||o||Tiziano||o||Tiziano||o||D'Intino
frankmel#hotmail.de||o||0174844404||o||8d496ce08a7ecef4721973cb9f777307||o||Melanie||o||Melanie||o||Kiesel
apoka-paris#hotmail.fr||o||0174847613||o||536c1287d2dc086030497d1b8ea7a175||o||Sihem||o||Sihem||o||Sousou
sofianomovic#msn.fr||o||174902297.0174902297||o||9893ac33a018e8d37e68c66cae23040e||o||Nabile||o||Nabile||o||Nassime
donaldduck#yahoo.com||o||174912161.0174912161||o||0c770713436695c18a7939ad82bc8351||o||Donald||o||Donald||o||Duck
cernakova#centrum.cz||o||0174991962||o||d161dc716be5daf1649472ddf9e343e6||o||Dagmar||o||Dagmar||o||Cernakova
trgsrl#tiscali.it||o||0175099675||o||d26005df3e5b416d6a39cc5bcfdef42b||o||Esmeralda||o||Esmeralda||o||Trogu
catherinesou#yahoo.fr||o||0175128896||o||2e9ce84389c3e2c003fd42bae3c49d12||o||Cat||o||Cat||o||Sou
ermimurati24#hotmail.com||o||0175228687||o||a7766a502e4f598c9ddb3a821bc02159||o||Anna||o||Anna||o||Beratsja
cece_89#live.fr||o||0175306898||o||297642a68e4e0b79fca312ac072a9d41||o||Celine||o||Celine||o||Jacinto
kendinegel39#hotmail.com||o||0175410459||o||a6565ca2bc8887cde5e0a9819d9a8ee9||o||Adem||o||Adem||o||Bulut
A 2.txt file:
9893ac33a018e8d37e68c66cae23040e:134:#a1
536c1287d2dc086030497d1b8ea7a175:~~#!:/92\
8d496ce08a7ecef4721973cb9f777307:demodemo
FS for 1.txt is "||o||" and for 2.txt is ":"
I want to merge two files in a single file result.txt based on the condition that the 3rd column of 1.txt must match with 1st column of 2.txt file and should be replaced by the 2nd column of 2.txt file.
The expected output will contain all the matching lines:
I am showing you one of them:
sofianomovic#msn.fr||o||174902297.0174902297||o||134:#a1||o||Nabile||o||Nabile||o||Nassime
I tried the script:
awk -F"||o||" 'NR==FNR{s=$0; sub(/:[^:]*$/, "", s); a[s]=$NF;next} {s = $5; for (i=6; i<=NF; ++i) s = s "," $i; if (s in a) { NF = 5; $5=a[s]; print } }' FS=: <(tr -d '\r' < 2.txt) FS="||o||" OFS="||o||" <(tr -d '\r' < 1.txt) > result.txt
But getting an empty file as the result. Any help would be highly appreciated.

If your actual Input_file(s) are same as shown sample then following awk may help you in same.
awk -v s1="||o||" '
FNR==NR{
a[$9]=$1 s1 $5;
b[$9]=$13 s1 $17 s1 $21;
next
}
($1 in a){
print a[$1] s1 $2 FS $3 s1 b[$1]
}
' FS="|" 1.txt FS=":" 2.txt
EDIT: Since OP has changed requirement a bit so providing code as per new ask where it will create 2 files too 1 file which will have ids present in 1.txt and NOT in 2.txt and other will be vice versa of it.
awk -v s1="||o||" '
FNR==NR{
a[$9]=$1 s1 $5;
b[$9]=$13 s1 $17 s1 $21;
c[$9]=$0;
next
}
($1 in a){
val=$1;
$1="";
sub(/:/,"");
print a[val] s1 $0 s1 b[val];
d[val]=$0;
next
}
{
print > "NOT_present_in_2.txt"
}
END{
for(i in d){
delete c[i]
};
for(j in c){
print j,c[j] > "NOT_present_in_1.txt"
}}
' FS="|" 1.txt FS=":" OFS=":" 2.txt

You can use this awk to get your output:
awk -F ':' 'NR==FNR{a[$1]=$2 FS $3; next} FNR==1{FS=OFS="||o||"; gsub(/[|]/, "\\\\&", FS)}
$3 in a{$3=a[$3]; print}' file2 file1 > result.txt
cat result.txt
frankmel#hotmail.de||o||0174844404||o||demodemo:||o||Melanie||o||Melanie||o||Kiesel
apoka-paris#hotmail.fr||o||0174847613||o||~~#!:/92\||o||Sihem||o||Sihem||o||Sousou
sofianomovic#msn.fr||o||174902297.0174902297||o||134:#a1||o||Nabile||o||Nabile||o||Nassime

Multi-input files for awk

I have two CSV files, the first one looks like below:
File1:
3124,3124,0,2,,1,0,1,1,0,0,0,0,0,0,0,0,1106,11
6118,6118,0,0,,0,0,1,0,0,0,0,1,1,1,1,1,5156,51
6679,6679,0,0,,1,0,1,0,0,0,0,0,1,0,1,0,1106,11
5249,5249,0,0,,0,0,1,1,0,0,0,0,0,0,0,0,1106,13
2658,2658,0,0,,1,0,1,1,0,0,0,0,0,0,0,0,1197,11
4322,4322,0,0,,1,0,1,1,0,0,0,0,0,0,0,0,1307,13
File2:
7792,1307,2012-06-07,,,,
5249,4001,2016-07-02,,,,
6001,1334,2017-01-23,,,,
2658,4001,2009-02-09,,,,
9279,1326,2014-12-20,,,,
what I need:
if the $2 in file2 = 4001, then has to match $1 of file2 with file1, if $18 in file1 = 1106 for the matched $1 then print that line.
the expected output:
5249,5249,0,0,,0,0,1,1,0,0,0,0,0,0,0,0,1106,13
I have tried something as the following, but with no success.
awk 'NR=FNR {A[$1]=$1;next} {print $1}'
P.S: The files are compressed, so I have to use the zcat command

I would try something like:
$ cat t.awk
BEGIN { FS = "," }
# Processing first file
NR == FNR && $18 == 1106 { a[$1] = $0; next }
# Processing second file
$2 == 4001 && $1 in a { print a[$1] }
$ awk -f t.awk file1.txt file2.txt
5249,5249,0,0,,0,0,1,1,0,0,0,0,0,0,0,0,1106,13

Develop Reference

node.js excel linux python-3.x azure haskell apache-spark rust .htaccess string

Comparing two CSV files in linux - linux

Related

Merge two files using AWK with conditions

Merge two files AWK

comparing files Unix

Merge two files using awk in linux

Multi-input files for awk

Categories

Resources