Merge two files with AWK - Linux

I have to merge two files and need help with the following.
File1.csv
mac-test-2,10.57.8.2,Compliant
mac-test-6,10.57.8.6,Compliant
mac-test-12,10.57.8.12,Compliant
mac-test-17,10.57.8.17,Noncompliant
File2.csv
mac-test-17,10.57.8.17,2022-10-21
After the merge, the content should be:
Merge.csv
mac-test-2,10.57.8.2,Compliant,NA
mac-test-6,10.57.8.6,Compliant,NA
mac-test-12,10.57.8.12,Compliant,NA
mac-test-17,10.57.8.17,Noncompliant,2022-10-21
So the logic is: if File1.csv doesn't have a matching record in File2.csv, then "NA" should be inserted, and if there is a match, then the date should be inserted in the fourth column.
This is what I have written:
awk -F "," '
ARGV[1] == FILENAME{a[$1];next}
{
if ($1 in a) {
print $0 ","
} else {
print $0 ",NA"
}
}
' File2.csv File1.csv
But this is printing
mac-test-2,10.57.8.2,Compliant,NA
mac-test-6,10.57.8.6,Compliant,NA
mac-test-12,10.57.8.12,Compliant,NA
mac-test-17,10.57.8.17,Noncompliant,
I am not sure how I can print the date if it matches.

With your shown samples, please try the following awk code. Written and tested with your shown samples only.
awk '
BEGIN{ FS=OFS="," }
FNR==NR{
arr[$1]=$NF
next
}
{
print $0,($1 in arr?arr[$1]:"NA")
}
' file2.csv file1.csv
To handle an empty file2.csv, please try the following awk program instead. (When file2.csv is empty, FNR==NR stays true for every line of file1.csv, so the version above would read all of file1.csv into the array and print nothing; comparing FILENAME against ARGV[1] avoids that.)
awk '
BEGIN{ FS=OFS="," }
ARGV[1] == FILENAME{
arr[$1]=$NF
next
}
{
if ($1 in arr) {
print $0,arr[$1]
}
else{
print $0,"N/A"
}
}' file2.csv file1.csv
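As a quick check (a hedged sketch, assuming you do not mind truncating file2.csv for the test), you can see the difference between the two approaches with an empty second file:
: > file2.csv
# FNR==NR version: prints nothing, because NR==FNR now holds for every line of file1.csv
awk 'BEGIN{FS=OFS=","} FNR==NR{arr[$1]=$NF;next} {print $0,($1 in arr?arr[$1]:"NA")}' file2.csv file1.csv
# ARGV[1] == FILENAME version: prints every file1.csv line with ",NA" appended
awk 'BEGIN{FS=OFS=","} ARGV[1]==FILENAME{arr[$1]=$NF;next} {print $0,($1 in arr?arr[$1]:"NA")}' file2.csv file1.csv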

Related

Merge two files using AWK with conditions

I am new to bash scripting and need help with the question below. I parsed a log file to get the data below and am now stuck on the later part.
I have a file1.csv with this content:
mac-test-1,10.32.9.12,15
mac-test-2,10.32.9.13,10
mac-test-3,10.32.9.14,11
mac-test-4,10.32.9.15,13
and a second file2.csv has the content below:
mac-test-3,10.32.9.14
mac-test-4,10.32.9.15
I want to do a file comparison, and if a line in the second file matches any line in the first file, then change the content of file1 as below:
mac-test-1,10.32.9.12, 15, no match
mac-test-2,10.32.9.13, 10, no match
mac-test-3,10.32.9.14, 11, matched
mac-test-4,10.32.9.15, 13, matched
I tried this
awk -F "," 'NR==FNR{a[$1]; next} $1 in a {print $0",""matched"}' file2.csv file1.csv
but it prints the below and doesn't include the non-matching records:
mac-test-3,10.32.9.14,11,matched
mac-test-4,10.32.9.15,13,matched
Also, in some cases file2 can be empty, so the result should be like this:
mac-test-1,10.32.9.12,15, no match
mac-test-2,10.32.9.13,10, no match
mac-test-3,10.32.9.14,11, no match
mac-test-4,10.32.9.15,13, no match
With your shown samples, please try the following awk code. You need not check the condition first and then print, because when you check $1 in a, the items that don't exist will NEVER enter that condition's block. It is better to print the whole line of file1.csv and then print the status of that particular line, either matched or not-matched, based on its existence inside the array.
awk '
BEGIN { FS=OFS="," }
FNR==NR{
arr[$0]
next
}
{
print $0,(($1 OFS $2) in arr)?"Matched":"Not-matched"
}
' file2.csv file1.csv
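For reference, with the shown samples this should print (note this version labels the lines Matched/Not-matched rather than matched/no match):
mac-test-1,10.32.9.12,15,Not-matched
mac-test-2,10.32.9.13,10,Not-matched
mac-test-3,10.32.9.14,11,Matched
mac-test-4,10.32.9.15,13,Matched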
EDIT: Adding a solution to handle the empty-file2.csv scenario here; it is concept-wise the same as above, the only difference being that it handles the case when file2.csv is an empty file.
awk -v lines=$(wc -l < file2.csv) '
BEGIN { FS=OFS=","}
(lines==0){
print $0,"Not-Matched"
next
}
FNR==NR{
arr[$0]
next
}
{
print $0,(($1 OFS $2) in arr)?"Matched":"Not-matched"
}
' file2.csv file1.csv
You are not printing the else case:
awk -F "," 'NR==FNR{a[$1]; next}
{
if ($1 in a) {
print $0 ",matched"
} else {
print $0 ",no match"
}
}' file2.csv file1.csv
Output
mac-test-1,10.32.9.12,15,no match
mac-test-2,10.32.9.13,10,no match
mac-test-3,10.32.9.14,11,matched
mac-test-4,10.32.9.15,13,matched
Or in short, without manually printing the comma but using OFS:
awk 'BEGIN{FS=OFS=","} NR==FNR{a[$1];next}{ print $0 OFS (($1 in a)?"":"no")"match"}' file2.csv file1.csv
Edit
I found a solution on this page for handling FNR==NR with an empty file: test FILENAME against ARGV[1] instead. When file2.csv is empty, every output line then ends with ",no match", e.g.:
mac-test-1,10.32.9.12,15,no match
Example
awk -F "," '
ARGV[1] == FILENAME{a[$1];next}
{
if ($1 in a) {
print $0 ",matched"
} else {
print $0 ",no match"
}
}' file2.csv file1.csv
Each of @RavinderSingh13's and @Thefourthbird's answers contains a large part of the solution, but here it is all together:
awk '
BEGIN { FS=OFS="," }
{ key = $1 FS $2 }
FILENAME == ARGV[1] {
arr[key]
next
}
{
print $0, ( key in arr ? "matched" : "no match")
}
' file2.csv file1.csv
or, if you prefer, use a command-line variable assignment between the two file names; awk processes f=1 only after it has finished with file2.csv, so !f is true exactly while file2.csv is being read, even when that file is empty:
awk '
BEGIN { FS=OFS="," }
{ key = $1 FS $2 }
!f {
arr[key]
next
}
{
print $0, ( key in arr ? "matched" : "no match")
}
' file2.csv f=1 file1.csv
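A minimal illustration of that ordering (hypothetical files x and y, not from the question):
awk '{print FILENAME, f}' x f=1 y
# lines from x print with f empty; lines from y print with f=1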

Implementing an Excel VLOOKUP-like function with awk

I have a question about implementing a VLOOKUP-like function with awk. I have a CSV file with id-score pairs like this (say, 1.csv):
id,score
1,16
3,12
5,13
11,8
13,32
17,37
23,74
29,7
31,70
41,83
There are "unscored" guys. I also have a csv file including all registered guys both scored and unscored like this (say, 2.csv) (I transposed for the want of space)
id,1,3,5,7,11,13,17,19,23,29,31,37,41
I would like to generate id-score pairs according to the 2nd CSV file so as to include both scored and unscored guys. For unscored guys, NAN would be used instead of the score.
In other words, the desired final result is like this:
id,score
1,16
3,12
5,13
7,NAN
11,8
13,32
17,37
19,NAN
23,74
29,7
31,70
37,NAN
41,83
When I tried to create a new table with the following awk command, it did not work for me. Thanks in advance for any advice.
awk 'FNR==NR{a[$1]++; next} {print $0, (a[$1]) ? a[$2] : "NAN"}' 1.csv 2.csv
Here is your script with fixes: set the field separators; save the score value for each id (a[$1]++ only counted the ids); and look the score up by id, printing "NAN" when it is missing:
$ awk 'BEGIN {FS=OFS=","}
FNR==NR {a[$1]=$2; next}
{print $1, (($1 in a)?a[$1]:"NAN")}' 1.csv 2.csv
id,score
1,16
3,12
5,13
7,NAN
11,8
13,32
17,37
19,NAN
23,74
29,7
31,70
37,NAN
41,83
With bash and join:
echo "id,score"
join --header -j 1 -t ',' <(sort 1.csv | grep -v '^id') <(tr ',' '\n' < 2.csv | grep -v '^id' | sort) -e "NAN" -a 2 -o 2.1,1.2 | sort -n
Output:
id,score
1,16
3,12
5,13
7,NAN
11,8
13,32
17,37
19,NAN
23,74
29,7
31,70
37,NAN
41,83
See: man join. Here -a 2 also prints lines of the second input that have no match, -e "NAN" fills in the missing field, and -o 2.1,1.2 selects the output columns (the id from the second input and the score from the first).
With awk, could you please try the following, written and tested with the shown samples in GNU awk. It assumes (like your shown samples) that both Input_files have headers on their first line.
awk -v counter=2 '
FNR==1{
next
}
FNR==NR{
a[FNR]=$0
b[FNR]=$1
next
}
{
if($0==b[counter]){
print a[counter]
counter++
}
else{
print $0",NA"
}
}
' FS="," 1.csv <(tr ',' '\n' < 2.csv)
Explanation: adding a detailed explanation of the above.
awk -v counter=2 ' ##Starting awk program from here and setting counter as 2.
FNR==1{ ##Checking condition if line is 1st then do following.
next ##next will skip all further statements from here.
}
FNR==NR{ ##Checking condition if FNR==NR which will be TRUE when Input_file 1.csv is being read.
a[FNR]=$0 ##Creating array a with index FNR and value of current line.
b[FNR]=$1 ##Creating array b with index FNR and value of 1st field of current line.
next ##next will skip all further statements from here.
}
{
if($0==b[counter]){ ##Checking condition if the current line is the same as array b's value at index counter; if so, do the following.
print a[counter] ##Printing array a's value at index counter here.
counter++ ##Increasing counter by 1 each time the cursor comes here.
}
else{ ##Else part of the above if condition starts here.
print $0",NAN" ##Printing the current line and NAN here.
}
}
' FS="," 1.csv <(tr ',' '\n' < 2.csv) ##Setting FS as , for Input_file 1.csv and sending 2.csv output by changing comma to new line to awk.
An awk solution could be (this one reads 2.csv literally as the single transposed line shown above, walks its fields, and keeps the id,score header):
awk -v FS=, -v OFS=, '
NR == 1 { print; next }
NR == FNR { score[$1] = $2; next }
{ for (i = 2; i <= NF; ++i)
print $i, score[$i] == "" ? "NAN" : score[$i] }
' 1.csv 2.csv

Comparison of two files using awk and printing non-matched records

I am comparing two files, file1 and file2, and I need to print the records of file2 that are changed or newly added relative to file1.
File1:
1|footbal|play1
2|cricket|play2
3|tennis|play3
5|golf|play5
File2:
1|footbal|play1
2|cricket|play2
3|tennis|play3
4|soccer|play4
5|golf|play6
output file:
4|soccer|play4
5|golf|play6
I have tried the solution below, but it does not give the expected output:
awk -F'|' 'FNR == NR { a[$3] = $3; a[$1]=$1; next; } { if ( !($3 in a) && !($1 in a) ) { print $0; } }' file1.txt file2.txt
I have compared column 1 and column 3 from both files.
Could you please try the following.
awk 'BEGIN{FS="|"}FNR==NR{a[$1,$3];next} !(($1,$3) in a)' Input_file1 Input_file2
Or a non-one-liner form of the solution:
awk '
BEGIN{
FS="|"
}
FNR==NR{
a[$1,$3]
next
}
!(($1,$3) in a)
' Input_file1 Input_file2
Explanation: adding a detailed explanation of the above code.
awk ' ##Starting awk program from here.
BEGIN{ ##Starting BEGIN section of this program from here.
FS="|" ##Setting FS as pipe here as per Input_file(s).
} ##Closing BEGIN block for this awk code here.
FNR==NR{ ##Checking condition FNR==NR which will be TRUE when 1st Input_file named file1 is being read.
a[$1,$3] ##Creating an array named a with index of $1,$3 of the current line.
next ##next will skip all further statements.
}
!(($1,$3) in a) ##Checking condition if $1,$3 are NOT present in array a then print that line from Input_file2.
' Input_file1 Input_file2 ##mentioning Input_file names here.
Output will be as follows.
4|soccer|play4
5|golf|play6

grep or awk or sed to concatenate or merge two files linux

I have a file1.csv file which has four columns.
udp,4080,10.11.76.172,10.121.147.99
tcp,22,10.21.146.131,10.131.149.91
tcp,8080,10.56.10.91,10.151.150.90
Another file, file2.yml, is as below:
ssh_port: "22"
Jenkins_port: 8080
sqlstr_port: "5162-5164"
I need to compare the two files and merge them into one based on the port number.
I have tried something like this.
for port in $(cat file1.csv | cut -d',' -f2); do if [[ $port =~ ..
Is there any simple method to merge the two files based on the port number? I need to get output similar to this:
tcp,22,10.21.146.131,10.131.149.91,ssh_port
tcp,8080,10.56.10.91,10.151.150.90,jenkins_port
Could you please try the following awk and let me know if this helps you (this does not handle port ranges in file2).
awk 'FNR==NR{sub(/:/,"",$1);gsub(/\"/,"",$NF);a[$NF]=$1;next} ($2 in a){print $0,a[$2]}' FIle2.yml FS="," OFS="," FIle1.csv
EDIT: If you have ranges in your file2 separated with -, then the following may help with that too.
awk 'FNR==NR{sub(/:/,"",$1);gsub(/\"/,"",$NF);if($NF~/-/){num=split($NF,array,"-");for(i=array[1];i<=array[num];i++){a[i]=$1}} else {a[$NF]=$1};next} ($2 in a){print $0,a[$2]}' FIle2.yml FS="," OFS="," FIle1.csv
Adding a non-one-liner form of the above solution too:
awk '
FNR==NR{
sub(/:/,"",$1);
gsub(/\"/,"",$NF);
if($NF~/-/) { num=split($NF,array,"-");
for(i=array[1];i<=array[num];i++) { a[i]=$1 }}
else {a[$NF]=$1}; next }
($2 in a) { print $0,a[$2] }
' file2.yml FS="," OFS="," file1.csv
Extended awk solution considering multiple ports like 5162-5164:
awk 'NR == FNR{
gsub(/[:"]/, "");
len = split($2, a, "-");
for (i=1; i<=len; i++) ports[a[i]] = $1;
next
}
$2 in ports{ print $0, ports[$2] }' file2.yml FS=',' OFS=',' file1.csv
The output:
tcp,22,10.21.146.131,10.131.149.91,ssh_port
tcp,8080,10.56.10.91,10.151.150.90,Jenkins_port
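As a side note (a minimal hedged sketch, not part of either answer above), the range handling simply expands "5162-5164" into one array key per port:
echo 'sqlstr_port: "5162-5164"' | awk '{sub(/:/,"",$1); gsub(/"/,"",$NF); n=split($NF,r,"-"); for(i=r[1];i<=r[n];i++) print i, $1}'
# 5162 sqlstr_port
# 5163 sqlstr_port
# 5164 sqlstr_port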

Comparing two CSV files in Linux

I have two CSV files in the following format:
File1:
No.1, No.2
983264,72342349
763498,81243970
736493,83740940
File2:
No.1,No.2
"7938493","7364987"
"2153187","7387910"
"736493","83740940"
I need to compare the two files and output the matched and unmatched values.
I did it through awk:
#!/bin/bash
awk 'BEGIN {
FS = OFS = ","
}
if (FNR==1){next}
NR>1 && NR==FNR {
a[$1];
next
}
FNR>1 {
print ($1 in a) ? $1 FS "Match" : $1 FS "In file2 but not in file1"
delete a[$1]
}
END {
for (x in a) {
print x FS "In file1 but not in file2"
}
}' file1 file2
But the output is:
"7938493",In file2 but not in file1
"2153187",In file2 but not in file1
"8172470",In file2 but not in file1
7938493,In file1 but not in file2
2153187,In file1 but not in file2
8172470,In file1 but not in file2
Can you please tell me where I am going wrong?
Here are some corrections to your script:
BEGIN {
# FS = OFS = ","
FS = "[,\"]+"
OFS = ", "
}
# if (FNR==1){next}
FNR == 1 {next}
# NR>1 && NR==FNR {
NR==FNR {
a[$1];
next
}
# FNR>1 {
$2 in a {
# print ($1 in a) ? $1 FS "Match" : $1 FS "In file2 but not in file1"
print ($2 in a) ? $2 OFS "Match" : $2 OFS "In file2 but not in file1"
delete a[$2]
}
END {
for (x in a) {
print x, "In file1 but not in file2"
}
}
This is an awk script, so you can run it like awk -f script.awk file1 file2. Doing so gives these results:
$ awk -f script.awk file1 file2
736493, Match
763498, In file1 but not in file2
983264, In file1 but not in file2
The main problem with your script was that it didn't correctly handle the double quotes around the numbers in file2. I changed the input field separator so that the double quotes are treated as part of the separator to deal with this. As a result, the first field $1 in the second file is empty (it is the bit between the start of the line and the first "), so you need to use $2 to refer to the first value you're interested in. Aside from that, I removed some redundant conditions from your other blocks and used OFS rather than FS in your first print statement.
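For illustration (a minimal sketch; the echo pipeline is just an assumed demonstration, not part of the original answer), this is how FS = "[,\"]+" splits one of the quoted lines:
echo '"7938493","7364987"' | awk 'BEGIN{FS="[,\"]+"} {printf "$1=[%s] $2=[%s] $3=[%s]\n", $1, $2, $3}'
# $1=[] $2=[7938493] $3=[7364987]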
