I managed to find half of the solution to my problem, but I cannot work out a conditional to handle the other half. I am using awk. The field separator is ; and the values are enclosed in double quotes ("). Each file has only three fields per line.
I have two files (file1.txt and file2.txt) and want to store the differences in a third file (results.txt).
file1.txt
"SWITCH1";"rack7";"Datacenter1"
"SWTICH46";"rack1";"rack1"
"ROUTER3";"";"rack1"
"SWITCH7";"rack1";"rack1"
"ROUTER9";"rack1";"rack1"
"ROUTER22";"rack1";"Datacenter4"
file2.txt
"SWITCH1";"rack7";"Datacenter1"
"ROUTER22";";"Datacenter4"
"SWITCH51";"rack7";"Datacenter2"
If I use:
awk -F';' 'FNR==NR {a[$0];next} !($0 in a)' file1.txt file2.txt
I get:
"ROUTER22";";"Datacenter4"
"SWITCH51";"rack7";"Datacenter2"
But I do not want a lone " in $2 of file2.txt and "rack1" in $2 of file1.txt to count as a difference between the files. Whenever an entry in file2.txt has " in $2 and the entry with the same $1 in file1.txt has "rack1" in $2, I want to discard it rather than report it.
The file is generated dynamically every night, and when this happens $2 is "rack1" in file1.txt while $2 is " in file2.txt. I want to exclude this kind of match in addition to the exact matches that the awk command above already excludes. The expected output is below:
Desired results.txt
"SWITCH51";"rack7";"Datacenter2"
I am struggling to find a conditional to handle this scenario.
You could store the original lines in array a, as you already do, plus modified copies in which "rack1" in $2 is replaced by a lone ":
$ awk -F';' -vOFS=';' 'FNR==NR {a[$0]; if($2=="\"rack1\"") {$2="\"";a[$0]}; next}
!($0 in a)' file1.txt file2.txt
"SWITCH51";"rack7";"Datacenter2"
Note the OFS (output field separator) setting. It is needed because when we modify $2, awk rebuilds $0 using OFS, which defaults to a space, whereas it must remain a semicolon for the comparison to work when parsing file2.txt.
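To see the effect of OFS in isolation, here is a minimal sketch. The sample line is taken from file1.txt; the shell variable names are only for illustration. Assigning to $2 makes awk rebuild $0 with OFS, so without -v OFS=';' the modified line no longer looks like a line of file2.txt:

```shell
line='"ROUTER22";"rack1";"Datacenter4"'

# Default OFS (a single space): the rebuilt record loses its semicolons.
default_ofs=$(printf '%s\n' "$line" | awk -F';' '{ $2 = "\""; print }')

# OFS set to ';': the rebuilt record keeps the original delimiter.
semicolon_ofs=$(printf '%s\n' "$line" | awk -F';' -v OFS=';' '{ $2 = "\""; print }')

printf '%s\n%s\n' "$default_ofs" "$semicolon_ofs"
# prints:
# "ROUTER22" " "Datacenter4"
# "ROUTER22";";"Datacenter4"
```

Only the second form matches the corresponding line of file2.txt character for character.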
You could check whether field 2 is just " and, if so, replace it with "rack1".
If $0 is not in array a after the replacement, print the unmodified line, which is saved in the tmp variable in the example.
awk '
BEGIN{FS=OFS=";"}
FNR==NR {a[$0];next}
{
tmp = $0
sub(/^"$/, "\"rack1\"", $2)
if (!($0 in a)) print tmp
}
' file1.txt file2.txt
Output
"SWITCH51";"rack7";"Datacenter2"
Based on your samples, please try the following awk code. In short: while reading the first input file, it builds two arrays, a indexed by $0 and b indexed by $1,$3. While reading the second input file, it prints a line only if $1,$3 is NOT present in b AND $0 is NOT present in a.
awk -F';' '
FNR==NR{
a[$0]
b[$1,$3]
next
}
!(($1,$3) in b) && !($0 in a)
' file1.txt file2.txt
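One caveat with keying on $1,$3 only: any change in $2 is ignored, not just the "rack1" versus " case. Below is a small sketch with partly made-up data (the rack9 line is hypothetical, not from the question) that compares this approach with a check that special-cases only the lone quote:

```shell
tmp=$(mktemp -d)
printf '%s\n' '"SWITCH1";"rack7";"Datacenter1"' \
  '"ROUTER22";"rack1";"Datacenter4"' > "$tmp/file1.txt"
# Hypothetical file2: SWITCH1 really moved to rack9; ROUTER22 shows the lone-quote quirk.
printf '%s\n' '"SWITCH1";"rack9";"Datacenter1"' \
  '"ROUTER22";";"Datacenter4"' > "$tmp/file2.txt"

# Key on $1,$3 only: the rack7 -> rack9 change is not reported.
by_key=$(awk -F';' 'FNR==NR { a[$0]; b[$1,$3]; next }
  !(($1,$3) in b) && !($0 in a)' "$tmp/file1.txt" "$tmp/file2.txt")

# Special-case only $2 == "\"": the rack9 line is reported.
by_rule=$(awk -F';' 'NR==FNR { a[$0]; next }
  { key = $1 FS ($2 == "\"" ? "\"rack1\"" : $2) FS $3 }
  !(key in a)' "$tmp/file1.txt" "$tmp/file2.txt")

printf 'by_key=[%s]\nby_rule=[%s]\n' "$by_key" "$by_rule"
rm -rf "$tmp"
```

If other changes in $2 should still show up as differences, the narrower rule is probably the safer choice.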
awk -F';' '
NR==FNR { a[$0]; next }
{ key = $1 FS ($2 == "\"" ? "\"rack1\"" : $2) FS $3 }
!(key in a)
' file1.txt file2.txt
"SWITCH51";"rack7";"Datacenter2"
I have a question about vlookup function implementation with awk. I have a csv file having id-score pairs like this (say 1.csv):
id,score
1,16
3,12
5,13
11,8
13,32
17,37
23,74
29,7
31,70
41,83
There are "unscored" guys. I also have a csv file including all registered guys both scored and unscored like this (say, 2.csv) (I transposed for the want of space)
id,1,3,5,7,11,13,17,19,23,29,31,37,41
I would like to generate id-score pairs according to 2nd csv file so as to include both scored and unscored guys. For unscored guys, NAN would be used instead of the digit.
In other words, final result is desired to be like this:
id,score
1,16
3,12
5,13
7,NAN
11,8
13,32
17,37
19,NAN
23,74
29,7
31,70
37,NAN
41,83
When I tried to create the new table with the following awk command, it did not work for me. Thanks in advance for any advice.
awk 'FNR==NR{a[$1]++; next} {print $0, (a[$1]) ? a[$2] : "NAN"}' 1.csv 2.csv
Here is your script with fixes: set the field separators, save the score for each id, and print the looked-up value, or NAN if it is missing. (This assumes file2 lists one id per line rather than in the transposed layout.)
$ awk 'BEGIN {FS=OFS=","}
FNR==NR {a[$1]=$2; next}
{print $1, (($1 in a)?a[$1]:"NAN")}' file1 file2
id,score
1,16
3,12
5,13
7,NAN
11,8
13,32
17,37
19,NAN
23,74
29,7
31,70
37,NAN
41,83
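A side note on the ($1 in a) test: it checks whether the id exists as a key, which is not the same as testing the stored value. A minimal sketch with a made-up score of 0 shows the difference (the file names scores.csv and ids.csv and their contents are assumptions for illustration only):

```shell
tmp=$(mktemp -d)
printf '%s\n' 'id,score' '1,0' '3,12' > "$tmp/scores.csv"
printf '%s\n' 'id' '1' '3' '7' > "$tmp/ids.csv"

# Testing the value: a genuine score of 0 is treated as missing.
by_value=$(awk 'BEGIN { FS = OFS = "," }
  FNR==NR { a[$1] = $2; next }
  { print $1, (a[$1] ? a[$1] : "NAN") }' "$tmp/scores.csv" "$tmp/ids.csv")

# Testing membership: only ids absent from scores.csv become NAN.
by_key=$(awk 'BEGIN { FS = OFS = "," }
  FNR==NR { a[$1] = $2; next }
  { print $1, (($1 in a) ? a[$1] : "NAN") }' "$tmp/scores.csv" "$tmp/ids.csv")

printf '%s\n--\n%s\n' "$by_value" "$by_key"
rm -rf "$tmp"
```

With the value test, id 1 comes out as NAN even though it has a score; with the membership test it keeps its 0.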
With bash and join:
echo "id,score"
join --header -j 1 -t ',' <(sort 1.csv | grep -v '^id') <(tr ',' '\n' < 2.csv | grep -v '^id' | sort) -e "NAN" -a 2 -o 2.1,1.2 | sort -n
Output:
id,score
1,16
3,12
5,13
7,NAN
11,8
13,32
17,37
19,NAN
23,74
29,7
31,70
37,NAN
41,83
See: man join
With awk, could you please try the following, written and tested with the shown samples in GNU awk. It assumes (as in your samples) that both input files have a header on their first line.
awk -v counter=2 '
FNR==1{
next
}
FNR==NR{
a[FNR]=$0
b[FNR]=$1
next
}
{
if($0==b[counter]){
print a[counter]
counter++
}
else{
print $0",NAN"
}
}
' FS="," 1.csv <(tr ',' '\n' < 2.csv)
Explanation: Adding detailed explanation for above.
awk -v counter=2 ' ##Starting awk program from here and setting counter as 2.
FNR==1{ ##Checking condition if line is 1st then do following.
next ##next will skip all further statements from here.
}
FNR==NR{ ##Checking condition if FNR==NR which will be TRUE when Input_file 1.csv is being read.
a[FNR]=$0 ##Creating array a with index FNR and value of current line.
b[FNR]=$1 ##Creating array b with index FNR and value of 1st field of current line.
next ##next will skip all further statements from here.
}
{
if($0==b[counter]){ ##Checking condition if current line is same as array b with index counter value then do following.
print a[counter] ##Printing array a with index of counter here.
counter++ ##Increasing count of counter by 1 each time cursor comes here.
}
else{ ##Else part of for above if condition starts here.
print $0",NAN" ##Printing current line and NAN here.
}
}
' FS="," 1.csv <(tr ',' '\n' < 2.csv) ##Setting FS as , for Input_file 1.csv and sending 2.csv output by changing comma to new line to awk.
An awk solution could be:
awk -v FS=, -v OFS=, '
NR == 1 { print; next }
NR == FNR { score[$1] = $2; next }
{ for (i = 2; i <= NF; ++i)
print $i, score[$i] == "" ? "NAN" : score[$i] }
' 1.csv 2.csv
I am comparing two files, file1 and file2. I need to print the records of file2 that were changed or newly added relative to file1.
File1:
1|footbal|play1
2|cricket|play2
3|tennis|play3
5|golf|play5
File2:
1|footbal|play1
2|cricket|play2
3|tennis|play3
4|soccer|play4
5|golf|play6
output file:
4|soccer|play4
5|golf|play6
I have tried the solution below, but it does not give the expected output:
awk -F'|' 'FNR == NR { a[$3] = $3; a[$1]=$1; next; } { if ( !($3 in a) && !($1 in a) ) { print $0; } }' file1.txt file2.txt
I compared column 1 and column 3 of both files.
Could you please try following.
awk 'BEGIN{FS="|"}FNR==NR{a[$1,$3];next} !(($1,$3) in a)' Input_file1 Input_file2
OR a non-one liner form of solution.
awk '
BEGIN{
FS="|"
}
FNR==NR{
a[$1,$3]
next
}
!(($1,$3) in a)
' Input_file1 Input_file2
Explanation: Adding detailed explanation for above code.
awk ' ##Starting awk program from here.
BEGIN{ ##Starting BEGIN section of this program from here.
FS="|" ##Setting FS as pipe here as per Input_file(s).
} ##Closing BEGIN block for this awk code here.
FNR==NR{ ##Checking condition FNR==NR which will be TRUE when 1st Input_file named file1 is being read.
a[$1,$3] ##Creating an array named a with index of $1,$3 of current line.
next ##next will skip all further statements.
}
!(($1,$3) in a) ##Checking condition if $1,$3 are NOT present in array a then print that line from Input_file2.
' Input_file1 Input_file2 ##mentioning Input_file names here.
Output will be as follows.
4|soccer|play4
5|golf|play6
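For reference, a[$1,$3] builds a single key by joining the two values with SUBSEP, so the pair is tested as a unit. That is the difference from the attempt in the question, which stored $1 and $3 as separate keys in the same array, so 5|golf|play6 was hidden because 5 alone was already known. A minimal sketch (values taken from the sample files, variable names made up):

```shell
out=$(awk 'BEGIN {
  a["5", "play5"]                           # one element, keyed "5" SUBSEP "play5"
  same_pair  = (("5", "play5") in a)        # same id and same third field
  other_pair = (("5", "play6") in a)        # same id, different third field
  explicit   = (("5" SUBSEP "play5") in a)  # the explicit spelling of the same key
  print same_pair
  print other_pair
  print explicit
}')
printf '%s\n' "$out"
# prints 1, 0 and 1 on separate lines
```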
I am trying to merge data from 2 text files based on some condition.
I have two files:
1.txt
gera077||o||emi_riv_90#hotmail.com||||200.45.113.254||o||0f8caa3ced5dc172901a427410d20540
okan1993||||killa-o#hotmail.de||||84.141.125.140||o||69c1cb5ddbc66cceebe0dddba3eddf68
Tosiunia||||tosia_19#amorki.pl||o||83.22.193.86|||||ddcbba2076646980391cb4971b8030
DREP
glen-666||o||glen-666#hotmail.com||||84.196.42.167||o||f139d8b49085d012af9048bb1cba3534
Page 1
Sheyes1 ||||summer_faerie_dustyrose#yahoo.com|||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
.
BenPhynix||||BenPhynix#aol.de||||| 62.226.181.57||||11dea24f1caebb012e11285579050f38
menopause |||totoche#wanadoo.fr||o||83.193.209.52||o||d7ca4d78fc79a795695ae1c161ce82ea
jonof.|o||joflem#medi3.no||o||213.161.242.106||o||239f33743e4a070b728d4dcbd1091f1a
2.txt
f139d8b49085d012af9048bb1cba3534: 12883 #: "#
d7ca4d78fc79a795695ae1c161ce82ea: 123422
0f8caa3ced5dc172901a427410d20540 :: demo
Contains the matching lines from 1.txt and hash is replaced with corresponding value in 2.txt
result.txt
gera077||o||emi_riv_90#hotmail.com||||200.45.113.254||o||: demo
glen-666||o||glen-666#hotmail.com||||84.196.42.167||o||12883 #: "#
menopause |||totoche#wanadoo.fr||o||83.193.209.52||o||123422
Contains the non-matching lines from 1.txt
left.txt
okan1993||||killa-o#hotmail.de||||84.141.125.140||o||69c1cb5ddbc66cceebe0dddba3eddf68
Tosiunia||||tosia_19#amorki.pl||o||83.22.193.86|||||ddcbba2076646980391cb4971b8030
DREP
Page 1
Sheyes1 ||||summer_faerie_dustyrose#yahoo.com|||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
.
BenPhynix||||BenPhynix#aol.de||||| 62.226.181.57||||11dea24f1caebb012e11285579050f38
jonof.|o||joflem#medi3.no||o||213.161.242.106||o||239f33743e4a070b728d4dcbd1091f1a
The script I am trying is :
awk -v s1="||o||" '
FNR==NR{
a[$9]=$1 s1 $5;
b[$9]=$13 s1 $17 s1 $21;
c[$9]=$0;
next
}
($1 in a){
val=$1;
$1="";
sub(/:/,"");
print a[val] s1 $0 s1 b[val];
d[val]=$0;
next
}
END{
for(i in d){
delete c[i]
};
for(j in c){
print c[j] > "left.txt"
}}
' FS="|" 1.txt FS=":" OFS=":" 2.txt > result.txt
But it is giving me empty result.txt
I am facing difficulty in debugging the issue.
Any help would be highly appreciated.
Try the following awk (based entirely on your sample input files, and assuming that 2.txt contains no duplicates) and let me know if it helps.
awk 'FNR==NR{a[$NF]=$0;next} $1~/:/{sub(/:/,"",$1);flag=1} ($1 in a){val=$1;if($0 ~ /:/ && !flag){sub(/[^:]*/,"");sub(/:/,"")};print a[val] OFS $0 > "result.txt";flag="";delete a[val]} END{for(i in a){print a[i]>"left.txt"}}' FS="|" 1.txt FS=" " OFS="||o||" 2.txt
The output will be two files named result.txt and left.txt. I will add a non-one-liner form and an explanation for the above code shortly.
Adding a non-one liner form of solution too now.
awk '
FNR==NR{ ##FNR and NR both are awk out of the box variables and they denote line numbers in Input_file(s), difference between them is FNR value will be RESET when it complete reading 1 Input_file and NR value will be keep increasing till it completes reading all the Input_file(s).
a[$NF]=$0; ##Creating an array named a whose index is $NF(value of last field of current line) and value is current line.
next ##next is awk out of the box keyword which will skip all further statements now.
}
$1~/:/{ ##Checking condition here if current lines 1st field has a colon in it then do following:
sub(/:/,"",$1); ##Using sub function of awk which will substitute colon with NULL of 1st field of current line of current Input_file.
flag=1 ##Setting a variable named flag here (it records that the 1st colon has already been removed, so there is no need for another colon removal).
}
($1 in a){ ##Checking a condition here if current line $1 is present in array a then do following:
val=$1; ##Setting variable named val value to $1 here.
if($0 ~ /:/ && !flag){ ##Checking condition here if current line has a colon and variable flag is NOT set, then do following:
sub(/[^:]*/,""); ##Substituting all the values from starting to till colon comes with NULL.
sub(/:/,"")}; ##Then substituting only 1 colon here.
print a[val] OFS $0 > "result.txt"; ##Printing the value of array a whose index is variable val, then OFS (output field separator), then the current line, to the output file named result.txt here.
flag=""; ##Unsetting the value of variable flag here.
delete a[val] ##Deleting the value of array a whose index is variable val here.
}
END{ ##Starting end section of this awk program here. which will be executed once all Input_file(s) have been read.
for(i in a){ ##Traversing through the array a now.
print a[i]>"left.txt"} ##Printing the value of array a(which will basically provide those values which are NOT matched in both files) in left.txt file.
}
' FS="|" 1.txt FS=" " OFS="||o||" 2.txt ##Setting FS="|" for 1.txt Input_file and then setting FS=" " and OFS="||o||" for 2.txt Input_file, 1.txt and 2.txt are Input_files for this program to run.
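A note on the FS=... arguments at the end of the command: awk processes a var=value operand at the moment it reaches it in the argument list, so it only affects the files that come after it. A minimal sketch with two throwaway files (one.txt and two.txt are made-up names):

```shell
tmp=$(mktemp -d)
printf 'a|b\n' > "$tmp/one.txt"
printf 'c:d\n' > "$tmp/two.txt"

# FS="|" is in effect for one.txt; FS=":" is assigned just before two.txt is read.
out=$(awk '{ print $2 }' FS='|' "$tmp/one.txt" FS=':' "$tmp/two.txt")
printf '%s\n' "$out"
# prints b, then d
rm -rf "$tmp"
```

These assignments happen after any BEGIN block has run, so a BEGIN block cannot see them.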
This awk script may also help.
$ awk 'BEGIN{FS=OFS="|"}NR==FNR{data[$1]=$2;}
NR!=FNR{if($NF in data){
$NF=data[$NF];print >"result.txt"
}else{
print >"left.txt"}
}' <( sed 's/\s*:\s*/|/' 2.txt) 1.txt 2>/dev/null
Output
$ cat result.txt
gera077||o||emi_riv_90#hotmail.com||||200.45.113.254||o||: demo
glen-666||o||glen-666#hotmail.com||||84.196.42.167||o||12883 #: "#
menopause |||totoche#wanadoo.fr||o||83.193.209.52||o||123422
$ cat left.txt
okan1993||||killa-o#hotmail.de||||84.141.125.140||o||69c1cb5ddbc66cceebe0dddba3eddf68
Tosiunia||||tosia_19#amorki.pl||o||83.22.193.86|||||ddcbba2076646980391cb4971b8030
DREP
Page 1
Sheyes1 ||||summer_faerie_dustyrose#yahoo.com|||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
.
BenPhynix||||BenPhynix#aol.de||||| 62.226.181.57||||11dea24f1caebb012e11285579050f38
jonof.|o||joflem#medi3.no||o||213.161.242.106||o||239f33743e4a070b728d4dcbd1091f1a
We preprocessed 2.txt (the first input) with sed so that its field delimiter is |, and used process substitution to pass the result to awk.
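As for why the script in the question produced an empty result.txt, one likely cause (judging only from the sample data) is the field numbering: with FS="|" every || contributes an empty field, so $9 is not the hash, and the hashes from 2.txt never match the keys of array a. A quick way to check is to print the field count next to the fields in question:

```shell
line='gera077||o||emi_riv_90#hotmail.com||||200.45.113.254||o||0f8caa3ced5dc172901a427410d20540'

# Print the number of fields, field 9, and the last field for the sample line.
out=$(printf '%s\n' "$line" | awk -F'|' '{ print NF; print $9; print $NF }')
printf '%s\n' "$out"
# prints:
# 13
# 200.45.113.254
# 0f8caa3ced5dc172901a427410d20540
```

This is why both answers key on $NF rather than on a fixed field number.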
I have 2 files:
1.txt:
e10adc3949ba59abbe56e057f20f883e
f8b46e989c5794eec4e268605b63eb59
e3ceb5881a0a1fdaad01296d7554868d
2.txt:
e10adc3949ba59abbe56e057f20f883e:1111
679ab793796da4cbd0dda3d0daf74ec1:1234
f8b46e989c5794eec4e268605b63eb59:1#/233:
I want 2 files as output:
One is result.txt which contains lines from 2.txt whose match is in 1.txt
and another is left.txt which contains lines from 1.txt whose match is not in 2.txt
Expected output of both files is below:
result.txt
e10adc3949ba59abbe56e057f20f883e:1111
f8b46e989c5794eec4e268605b63eb59:1#/233:
left.txt
e3ceb5881a0a1fdaad01296d7554868d
I tried one or two approaches with awk but did not succeed. Any help would be highly appreciated.
My script:
awk '
FNR==NR{
val=$1;
sub(/[^:]*/,"");
sub(/:/,"");
a[val]=$0;
next
}
!($NF in a){
print > "left.txt";
next
}
{
print $1,$2,a[$NF]> "result.txt"
}
' FS=":" 2.txt FS=":" OFS=":" 1.txt
The following awk may help you with this.
awk 'FNR==NR{a[$1]=$0;next} ($0 in a){print a[$0] > "results.txt";next} {print > "left.txt"}' FS=":" OFS=":" 2.txt FS=" " OFS=":" 1.txt
EDIT: Adding explanation of code too here.
awk '
FNR==NR{ ##FNR==NR condition will be TRUE when first Input_file is being read by awk. Where FNR and NR are the out of the box variables for awk.
a[$1]=$0; ##creating an array named a whose index is $1 and whose value is the whole line ($0) from 2.txt Input_file.
next ##next is out of the box keyword from awk and will skip all further statements of awk.
}
($0 in a){ ##Checking here condition if current line of Input_file 1.txt is present in array named a then do following.
print a[$0] > "results.txt"; ##Printing the matching line saved from 2.txt into output file named results.txt, since current line of 1.txt is an index of array named a(which was created by 1st file).
next ##next is awk keyword which will skip further statements for awk code now.
}
{
print > "left.txt" ##Printing all lines which skip above condition(which means they did not come into array a) to output file named left.txt as per OP need.
}
' FS=":" OFS=":" 2.txt FS=" " OFS=":" 1.txt ##Setting FS(field separator) as colon for 2.txt and Setting FS to space for 1.txt here. yes, we could set multiple field separators for different Input_file(s).
How about this one:
awk 'BEGIN{ FS = ":" }NR==FNR{ a[$0]; next }$1 in a{ print $0 > "results.txt"; delete a[$1]; next }END{ for ( i in a ) print i > "left.txt" }' 1.txt 2.txt
Output:
results.txt
e10adc3949ba59abbe56e057f20f883e:1111
f8b46e989c5794eec4e268605b63eb59:1#/233:
left.txt
e3ceb5881a0a1fdaad01296d7554868d
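To try this against the sample data, here is a self-contained sketch of the same approach. It writes result.txt (the name used in the question) rather than results.txt and works in a temporary directory; both choices are mine, as are the res and left variable names. Note that for (i in a) visits keys in an unspecified order, so left.txt is not guaranteed to follow the order of 1.txt when more than one line is left over.

```shell
tmp=$(mktemp -d)
printf '%s\n' e10adc3949ba59abbe56e057f20f883e \
  f8b46e989c5794eec4e268605b63eb59 \
  e3ceb5881a0a1fdaad01296d7554868d > "$tmp/1.txt"
printf '%s\n' 'e10adc3949ba59abbe56e057f20f883e:1111' \
  '679ab793796da4cbd0dda3d0daf74ec1:1234' \
  'f8b46e989c5794eec4e268605b63eb59:1#/233:' > "$tmp/2.txt"

awk -v res="$tmp/result.txt" -v left="$tmp/left.txt" '
BEGIN { FS = ":" }
NR == FNR { a[$0]; next }                      # 1.txt: remember every hash
$1 in a   { print > res; delete a[$1]; next }  # 2.txt: matched line, hash accounted for
END       { for (i in a) print i > left }      # hashes never seen in 2.txt
' "$tmp/1.txt" "$tmp/2.txt"

result=$(cat "$tmp/result.txt")
leftover=$(cat "$tmp/left.txt")
printf 'result.txt:\n%s\nleft.txt:\n%s\n' "$result" "$leftover"
rm -rf "$tmp"
```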