Awk script to sum multiple columns if the value in column1 is duplicated - linux

Need your help to resolve the below query.
I want to sum up the values in columns 3, 5, 6, 7, 9 and 10 whenever the value in column 1 is duplicated.
I also need to collapse the duplicate rows into a single row in the output file, and to append the value of column 1 as an extra final column of the output file.
Input file:
a|b|c|d|e|f|g|h|i|j
IN27201800024099|a|2.01|ad|5|56|6|rr|1|5
IN27201800023963|b|3|4|rt|67|6|61|ty|6
IN27201800024099|a|4|87|ad|5|6|1|rr|7.45
IN27201800024099|a|5|98|ad|5|6|1|rr|8
IN27201800023963|b|7|7|rt|5|5|1|ty|56
IN27201800024098|f|80|67|ty|6|6|1|4rght|765
Expected output file:
a|b|c|d|e|f|g|h|i|j|k
IN27201800024099|a|11.01|190|ad|66|18|3|rr|20.45|IN27201800024099
IN27201800023963|b|10|11|rt|72|11|62|ty|62|IN27201800023963
IN27201800024098|f|80|67|ty|6|6|1|4rght|765|IN27201800024098
I tried the code below, but it is not working, and I have no clue how to complete it to get the correct output:
awk 'BEGIN {FS=OFS="|"} FNR==1 {a[$1]+= (f3[key]+=$3;f5[key]+=$5;f6[key]+=$6;f7[key]+=$7;f9[key]+=$9;f10[key]+=$10;)} input.txt > output.txt

$ cat tst.awk
BEGIN {
    FS = OFS = "|"
}
NR==1 {
    print $0, "h"
    next
}
{
    keys[$1]
    for (i=2; i<=NF; i++) {
        sum[$1,i] += $i
    }
}
END {
    for (key in keys) {
        printf "%s", key
        for (i=2; i<=NF; i++) {
            printf "%s%s", OFS, sum[key,i]
        }
        print OFS key
    }
}
$ awk -f tst.awk file
a|b|c|d|e|f|g|h
IN27201800023963|10|11|72|11|62|62|IN27201800023963
IN27201800024098|80|67|6|0|1|765|IN27201800024098
IN27201800024099|11.01|190|66|18|3|20.45|IN27201800024099
The above outputs the lines in random order. If you want them output in the same order as the key values were read in, it's just a couple more lines of code:
$ cat tst.awk
BEGIN {
    FS = OFS = "|"
}
NR==1 {
    print $0, "h"
    next
}
!seen[$1]++ {
    keys[++numKeys] = $1
}
{
    for (i=2; i<=NF; i++) {
        sum[$1,i] += $i
    }
}
END {
    for (keyNr=1; keyNr<=numKeys; keyNr++) {
        key = keys[keyNr]
        printf "%s", key
        for (i=2; i<=NF; i++) {
            printf "%s%s", OFS, sum[key,i]
        }
        print OFS key
    }
}
$ awk -f tst.awk file
a|b|c|d|e|f|g|h
IN27201800024099|11.01|190|66|18|3|20.45|IN27201800024099
IN27201800023963|10|11|72|11|62|62|IN27201800023963
IN27201800024098|80|67|6|0|1|765|IN27201800024098
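The question's expected output keeps the non-numeric columns (a, ad, rr, ...) instead of summing them to 0. Here is a minimal sketch of that variation, assuming the column list stated in the question (3, 5, 6, 7, 9, 10) is what gets summed, that every data row has the same number of fields, and that the remaining columns are identical within a key group (the sample rows look partly misaligned, so the numbers will not reproduce the expected output exactly):
$ cat tst2.awk
BEGIN {
    FS = OFS = "|"
    # columns the question asks to sum; all others keep their first-seen value
    split("3 5 6 7 9 10", tmp, " ")
    for (i in tmp) sumCols[tmp[i]]
}
NR==1 {
    print $0, "k"
    next
}
!seen[$1]++ {
    keys[++numKeys] = $1
}
{
    for (i=2; i<=NF; i++) {
        if (i in sumCols) sum[$1,i] += $i
        else if (!(($1,i) in first)) first[$1,i] = $i
    }
}
END {
    # NF here is whatever the last row had, hence the same-NF assumption
    for (keyNr=1; keyNr<=numKeys; keyNr++) {
        key = keys[keyNr]
        printf "%s", key
        for (i=2; i<=NF; i++) {
            printf "%s%s", OFS, (i in sumCols ? sum[key,i] : first[key,i])
        }
        print OFS key
    }
}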

Related

Merge two files using AWK with conditions

I am new to bash scripting and need help with the question below. I parsed a log file to get the data below and am now stuck on the later part.
I have file1.csv with this content:
mac-test-1,10.32.9.12,15
mac-test-2,10.32.9.13,10
mac-test-3,10.32.9.14,11
mac-test-4,10.32.9.15,13
and a second file, file2.csv, with this content:
mac-test-3,10.32.9.14
mac-test-4,10.32.9.15
I want to do a file comparison: if a line in the second file matches any line in the first file, then change the content of file1 as below:
mac-test-1,10.32.9.12, 15, no match
mac-test-2,10.32.9.13, 10, no match
mac-test-3,10.32.9.14, 11, matched
mac-test-4,10.32.9.15, 13, matched
I tried this
awk -F "," 'NR==FNR{a[$1]; next} $1 in a {print $0",""matched"}' file2.csv file1.csv
but it prints the following and doesn't include the non-matching records:
mac-test-3,10.32.9.14,11,matched
mac-test-4,10.32.9.15,13,matched
Also, in some cases file2 can be empty, in which case the result should be like this:
mac-test-1,10.32.9.12,15, no match
mac-test-2,10.32.9.13,10, no match
mac-test-3,10.32.9.14,11, no match
mac-test-4,10.32.9.15,13, no match
With your shown samples, please try the following awk code. You need not check the condition first and then print, because when you test $1 in a, the items that don't exist will NEVER enter that condition's block. It is better to print the whole line of file1.csv and then print the status of that particular line, either matched or not-matched, based on its existence in the array.
awk '
BEGIN { FS=OFS="," }
FNR==NR{                     # first file: remember each full line of file2.csv
    arr[$0]
    next
}
{                            # second file: report whether the name,IP pair was seen
    print $0,(($1 OFS $2) in arr)?"Matched":"Not-matched"
}
' file2.csv file1.csv
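Run against the shown samples, this prints:
mac-test-1,10.32.9.12,15,Not-matched
mac-test-2,10.32.9.13,10,Not-matched
mac-test-3,10.32.9.14,11,Matched
mac-test-4,10.32.9.15,13,Matched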
EDIT: Adding a solution to handle the empty-file2.csv scenario. It is the same concept as above; the extra lines==0 guard matters because when file2.csv is empty awk reads no records from it, so FNR==NR would otherwise be true while reading file1.csv.
awk -v lines="$(wc -l < file2.csv)" '
BEGIN { FS=OFS="," }
(lines==0){                  # file2.csv is empty: every record read is from file1.csv
    print $0,"Not-Matched"
    next
}
FNR==NR{                     # reading file2.csv
    arr[$0]
    next
}
{                            # reading file1.csv
    print $0,(($1 OFS $2) in arr)?"Matched":"Not-matched"
}
' file2.csv file1.csv
You are not printing the else case:
awk -F "," 'NR==FNR{a[$1]; next}
{
if ($1 in a) {
print $0 ",matched"
} else {
print $0 ",no match"
}
}' file2.csv file1.csv
Output
mac-test-1,10.32.9.12,15,no match
mac-test-2,10.32.9.13,10,no match
mac-test-3,10.32.9.14,11,matched
mac-test-4,10.32.9.15,13,matched
Or in short, without manually printing the comma but using OFS:
awk 'BEGIN{FS=OFS=","} NR==FNR{a[$1];next} {print $0, (($1 in a) ? "matched" : "no match")}' file2.csv file1.csv
Edit
I found a solution on this page for handling FNR==NR on an empty file.
When file2.csv is empty, every output line will look like:
mac-test-1,10.32.9.12,15,no match
Example
awk -F "," '
ARGV[1] == FILENAME{a[$1];next}
{
if ($1 in a) {
print $0 ",matched"
} else {
print $0 ",no match"
}
}' file2.csv file1.csv
Each of @RavinderSingh13's and @Thefourthbird's answers contains large parts of the solution, but here it is all together:
awk '
BEGIN { FS=OFS="," }
{ key = $1 FS $2 }
FILENAME == ARGV[1] {    # true only while reading file2.csv
    arr[key]
    next
}
{
    print $0, (key in arr ? "matched" : "no match")
}
' file2.csv file1.csv
or if you prefer:
awk '
BEGIN { FS=OFS="," }
{ key = $1 FS $2 }
!f {                     # f stays unset until the f=1 assignment below
    arr[key]
    next
}
{
    print $0, (key in arr ? "matched" : "no match")
}
' file2.csv f=1 file1.csv
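(The f=1 between the two file names is an awk variable assignment that is processed after file2.csv has been fully read, so the test still distinguishes the files correctly even when file2.csv is empty, which is exactly the case FNR==NR gets wrong.)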

Merge two files AWK

I have to merge two files and need help with:
File1.csv
mac-test-2,10.57.8.2,Compliant
mac-test-6,10.57.8.6,Compliant
mac-test-12,10.57.8.12,Compliant
mac-test-17,10.57.8.17,Noncompliant
File2.csv
mac-test-17,10.57.8.17,2022-10-21
After the merge the content should be
Merge.csv
mac-test-2,10.57.8.2,Compliant,NA
mac-test-6,10.57.8.6,Compliant,NA
mac-test-12,10.57.8.12,Compliant,NA
mac-test-17,10.57.8.17,Noncompliant,2022-10-21
So the logic is: if a record in File1.csv doesn't have a matching record in File2.csv, then "NA" should be inserted; if there is a match, the date should be inserted as the fourth column.
I have written the following:
awk -F "," '
ARGV[1] == FILENAME{a[$1];next}
{
if ($1 in a) {
print $0 ","
} else {
print $0 ",NA"
}
}
' File2.csv File1.csv
But this prints
mac-test-2,10.57.8.2,Compliant,NA
mac-test-6,10.57.8.6,Compliant,NA
mac-test-12,10.57.8.12,Compliant,NA
mac-test-17,10.57.8.17,Noncompliant,
I am not sure how I can print the date if it matches.
With your shown samples, please try the following awk code (written and tested against your shown samples only):
awk '
BEGIN{ FS=OFS="," }
FNR==NR{                 # reading file2.csv: map name -> date
    arr[$1]=$NF
    next
}
{                        # reading file1.csv: append the date, or NA when absent
    print $0,($1 in arr?arr[$1]:"NA")
}
' file2.csv file1.csv
To handle an empty file2.csv, please try the following awk program.
awk '
BEGIN{ FS=OFS="," }
ARGV[1] == FILENAME{     # true only while reading file2.csv, even when it is empty
    arr[$1]=$NF
    next
}
{
    if ($1 in arr) {
        print $0,arr[$1]
    }
    else {
        print $0,"NA"
    }
}' file2.csv file1.csv
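If you would rather not compare FILENAME against ARGV[1] (that test misfires when both arguments name the same file), the variable-assignment trick from the previous question works here too. A sketch, using the same sample file names:
awk '
BEGIN{ FS=OFS="," }
!f {                     # f is still unset, so this is File2.csv (even if it is empty)
    arr[$1]=$NF
    next
}
{                        # f=1 was assigned between the files: this is File1.csv
    print $0,($1 in arr?arr[$1]:"NA")
}
' File2.csv f=1 File1.csv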

comparing files Unix

I have 2 files, file1.txt and file2.txt
file1.txt
name|mandatory|
age|mandatory|
address|mandatory|
email|mandatory|
country|not-mandatory|
file2.txt
gabrielle||nashville|gabrielle@outlook.com||
These are my exact data files. In file1, column 1 is the field name and column 2 notes whether that field must not be null in file2.
In file2, the data is in a single row, separated by |.
The age field, marked mandatory in file1, is empty in file2 (which is a single row), and that is what I need in the output.
Expected output:
age mandatory
I got this working with code for the case where file2 has the same format as file1, with 'mandatory' replaced by the field's data:
awk -F '|' '
NR==FNR && $3=="mandatory" {m[$2]++}
NR>FNR && $3=="" && m[$2] {printf "%s mandatory\n", $2}
' file1.txt file2.txt
You have to iterate over the fields of file2's single row, checking each against the list stored from file1:
awk -F '|' '
NR==FNR { name[FNR] = $1; man[FNR] = $2; n = FNR }
NR!=FNR {
    for (i = 1; i <= n; ++i) {
        if ($i == "" && man[i] == "mandatory") {
            printf("Field %s is mandatory!\n", name[i]);
        }
    }
}
' file1.txt file2.txt
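With the two sample files this prints:
Field age is mandatory!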

manipulating files using awk linux

I have a 1.txt file (with field separator as ||o||):
aidagolf6@gmail.com||o||bb1e6b92d60454122037f302359d8a53||o||Aida||o||Aida||o||Muji?
aidagolf6@gmail.com||o||bcfddb5d06bd02b206ac7f9033f34677||o||Aida||o||Aida||o||Muji?
aidagolf6@gmail.com||o||bf6265003ae067b19b88fa4359d5c392||o||Aida||o||Aida||o||Garic Gara
aidagolf6@gmail.com||o||d3a6a8b1ed3640188e985f8a1efbfe22||o||Aida||o||Aida||o||Muji?
aidagolfa@hotmail.com||o||14f87ec1e760d16c0380c74ec7678b04||o||Aida||o||Aida||o||Rodriguez Puerto
2.txt (with field separator as :):
bf6265003ae067b19b88fa4359d5c392:hyworebu:#
14f87ec1e760d16c0380c74ec7678b04:sujycugu
I want a result.txt file (matching the 2nd column of 1.txt against the 1st column of 2.txt; if they match, the 2nd column of 1.txt is replaced with the 2nd column of 2.txt):
aidagolf6@gmail.com||o||hyworebu:#||o||Aida||o||Aida||o||Garic Gara
aidagolfa@hotmail.com||o||sujycugu||o||Aida||o||Aida||o||Rodriguez Puerto
And a left.txt file (which contains the rows from 1.txt that have no match in 2.txt):
aidagolf6@gmail.com||o||d3a6a8b1ed3640188e985f8a1efbfe22||o||Aida||o||Aida||o||Muji?
aidagolf6@gmail.com||o||bb1e6b92d60454122037f302359d8a53||o||Aida||o||Aida||o||Muji?
aidagolf6@gmail.com||o||bcfddb5d06bd02b206ac7f9033f34677||o||Aida||o||Aida||o||Muji?
The script I am trying is:
awk -F '[|][|]o[|][|]' -v s1="||o||" '
NR==FNR {
a[$2] = $1;
b[$2]= $3s1$4s1$5;
next
}
($1 in a){
$1 = "";
sub(/:/, "")
print a[$1]s1$2s1b[$1] > "result.txt";
next
}' 1.txt 2.txt
The problem is that the script uses ||o|| as the separator for 2.txt as well, due to which I am getting wrong results.
EDIT
Modified script:
awk -v s1="||o||" '
NR==FNR {
a[$2] = $1;
b[$2]= $3s1$4s1$5;
next
}
($1 in a){
$1 = "";
sub(/:/, "")
print a[$1]s1$2s1b[$1] > "result.txt";
next
}' FS = "||o||" 1.txt FS = ":" 2.txt
Now I am getting the following error:
awk: fatal: cannot open file `FS' for reading (No such file or directory)
I've modified your original script:
awk -F'[|][|]o[|][|]' -v s1="||o||" '
NR == FNR {
    a[$2] = $1;
    b[$2] = $3 s1 $4 s1 $5;
    c[$2] = $0;                           # keep the line for left.txt
}
NR != FNR {
    split($0, d, ":");
    r = substr($0, index($0, ":") + 1);   # right side of the 1st ":"
    if (a[d[1]] != "") {
        print a[d[1]] s1 r s1 b[d[1]] > "result.txt";
        c[d[1]] = "";                     # drop from the list for left.txt
    }
}
END {
    for (var in c) {
        if (c[var] != "") {
            print c[var] > "left.txt"
        }
    }
}' 1.txt 2.txt
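Note that left.txt comes out in whatever order for (var in c) visits the array, which is effectively random; if the original order matters, track insertion order as in the first question above.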
The next version changes the order of file reading to reduce memory consumption:
awk -F'[|][|]o[|][|]' -v s1="||o||" '
NR == FNR {
    split($0, a, ":");
    r = substr($0, index($0, ":") + 1);   # right side of the 1st ":"
    map[a[1]] = r;
}
NR != FNR {
    if (map[$2] != "") {
        print $1 s1 map[$2] s1 $3 s1 $4 s1 $5 > "result.txt";
    } else {
        print $0 > "left.txt"
    }
}' 2.txt 1.txt
and the final version makes use of a file-based database (Perl's DB_File module, which ties a hash to a Berkeley DB file on disk), which minimizes DRAM consumption, although I'm not sure if Perl is acceptable on your system.
perl -e '
use DB_File;
use Fcntl;                # provides the O_CREAT|O_RDWR flag constants
$file1 = "1.txt";
$file2 = "2.txt";
$result = "result.txt";
$left = "left.txt";
my $dbfile = "tmp.db";
tie(%db, "DB_File", $dbfile, O_CREAT|O_RDWR, 0644) or die "$dbfile: $!";
# load 2.txt into the on-disk hash: md5 -> replacement text
open(FH, $file2) or die "$file2: $!";
while (<FH>) {
    chop;
    @_ = split(/:/, $_, 2);
    $db{$_[0]} = $_[1];
}
close FH;
open(FH, $file1) or die "$file1: $!";
open(RESULT, "> $result") or die "$result: $!";
open(LEFT, "> $left") or die "$left: $!";
while (<FH>) {
    @_ = split(/\|\|o\|\|/, $_);
    if (defined $db{$_[1]}) {
        $_[1] = $db{$_[1]};
        print RESULT join("||o||", @_);
    } else {
        print LEFT $_;
    }
}
close FH;
untie %db;
'
rm tmp.db

Sending 'awk' results of several files to new files with the same name

I have a folder that contains text files. I need to extract the 20 lines right after the LAST occurrence of the word 'Input' and send the results to files with the same names in a different folder.
I use the following:
for i in error/*.log; do awk '/Input/ {n=NR} {a[NR]=$0} END {for (i=n;i<=n+20;i++) print a[i]}' $i > exceeded/`basename $i` done
What am I doing wrong?
Thanks for the help in advance
If you have spaces in the file names (the * part of *.log) then try this:
for i in error/*.log; do awk '/Input/ {n=NR} {a[NR]=$0} END {for (i=n;i<=n+20;i++) print a[i]}' "$i" > "exceeded/`basename \"$i\"`" ; done
Also, I am assuming that both the "error" and "exceeded" directories are in the same directory (and that "exceeded" already exists).
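Note that the quoted version also restores the ; that was missing before done, which is what broke the original command's syntax. Equivalently, with the more readable $( ) form of command substitution (and note that, as written, the loop prints the matching 'Input' line plus the 20 lines after it; start at i=n+1 to exclude the match itself):
for i in error/*.log; do
    awk '/Input/ {n=NR} {a[NR]=$0} END {for (i=n; i<=n+20; i++) print a[i]}' "$i" > "exceeded/$(basename "$i")"
done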
I expect no upvotes for this:
When I hear "do something after the last ...", I think "reverse the input and do something after the *first* ..."
This awk only has to remember 20 lines: helpful if you have large files.
for f in error/*.log; do
    tac "$f" |
    awk -v n=20 -v pattern="Input" '
        BEGIN { for (i=1; i<=n; i++) line[i] = "" }
        function keep_line(l) {
            for (i=2; i<=n; i++) line[i-1] = line[i]
            line[n] = l
        }
        $0 ~ pattern {
            for (i=1; i<=n; i++) print line[i]
            exit
        }
        { keep_line($0) }
    ' |
    tac > "exceeded/$(basename "$f")"
done
How's this?
for i in error/*.log; do
    awk '/Input/ { i=21; delete a; next }
         --i > 0 { a[21-i] = $0 }
         END { for (i=1; i<=20; ++i) print a[i] }' "$i" > exceeded/"${i#error/}"
done
for i in error/*.log; do
    awk 'NR==FNR{if(/Input/)n=NR;next} FNR>n && FNR<(n+21)' "$i" "$i" > "exceeded/$(basename "$i")"
done
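(Here the same file is read twice: the first pass, where NR==FNR, only records n, the line number of the last 'Input'; the second pass prints lines n+1 through n+20.)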
