awk group by and print if matches a condition - linux

I have this structure:
aaa,up
bbb,down
aaa,down
aaa,down
aaa,up
bbb,down
ccc,down
ccc,down
ddd,up
ddd,down
And I would like to have the next output:
aaa,up
bbb,down
ccc,down
ddd,up
So, the first thing is to group by the first field. Then, if at least one line in the group is up, print up; else print down.
So far I have this:
awk -F"," '$2=="up"{arr[$1]++}END{for (a in arr) print a,arr[a]}'
Then I change it to $2=="down" and join the two results into one. But with this, a key that has both up and down lines shows up in both results.
Sometimes, instead of ups and downs, I receive 0,1,2,3,4, which are more states; there, 0 and 1 mean the up status.
Thanks in advance.

How about saving the value you see, with a preference for "up"?
awk -F "," '$2 ~ /^(0|1)$/ { $2 = "up" }
$2 ~ /^[2-9]/ { $2 = "down" }
$2 == "up" || !($1 in a) { a[$1]=$2 }
END { OFS=FS; for(k in a) print k, a[k] }' file | sort
That is, if the value is "up", we always save it. Otherwise, we only save the value if we don't yet have a value for this key.
I'm not sure I grasped your 0,1,2,3,4 requirement. The first two lines now convert a number into either "up" or "down".
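As a quick check, here is the script run on the sample data from the question (saved as "file", the name used in the command above):

```shell
# Sample input from the question, saved as "file"
cat > file <<'EOF'
aaa,up
bbb,down
aaa,down
aaa,down
aaa,up
bbb,down
ccc,down
ccc,down
ddd,up
ddd,down
EOF

awk -F "," '$2 ~ /^(0|1)$/ { $2 = "up" }
$2 ~ /^[2-9]/ { $2 = "down" }
$2 == "up" || !($1 in a) { a[$1]=$2 }
END { OFS=FS; for(k in a) print k, a[k] }' file | sort
# aaa,up
# bbb,down
# ccc,down
# ddd,up
```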

It's similar to triplee's, but imho it's sufficiently different to deserve an answer of its own. In particular, I think the logical flow is clearer here: processing is skipped once a variable has already been "upped", and the job of discriminating the different possible types of $2 is handed off to a simple user function.
awk -F"," '
function up_p(x){
if(x==0||x=="down") return "down"; else return "up"
}
a[$1]=="up" {next}
{a[$1]=up_p($2)}
END {for(k in a) print k "," a[k]}' file | sort
aaa,up
bbb,down
ccc,down
ddd,up
On second thought, the user function is unnecessary...
awk -F"," '
a[$1]=="up" {next}
{a[$1]=($2==0||$2=="down")?"down":"up"}
END {for(k in a) print k "," a[k]}' file | sort
aaa,up
bbb,down
ccc,down
ddd,up
but it comes down to personal taste so I leave both versions in my answer.

Related

using awk to count the number of occurrences of pattern from another file

I am trying to take a file containing a list and count how many times items in that list occur in a target file. Something like:
list.txt
blonde
red
black
target.txt
bob blonde male
sam blonde female
desired_output.txt
blonde 2
red 0
black 0
I have coopted the following code to get the values that are present in target.txt:
awk '{count[$2]++} END {for (word in count) print word, count[word]}' target.txt
But the output does not include the desired items that are in list.txt but not in target.txt.
current_output.txt
blonde 2
I have tried a few things to get this working including:
awk '{word[$1]++;next;count[$2]++} END {for (word in count) print word, count[word]}' list.txt target.txt
However, I have had no success.
Could anyone help me make it so that this awk statement also reads the list.txt file? Any explanation of the code would also be much appreciated.
Thanks!
awk '
NR==FNR{a[$0]; next}
{
for(i=1; i<=NF; i++){
if ($i in a){ a[$i]++ }
}
}
END{
for(key in a){ printf "%s %d\n", key, a[key] }
}
' list.txt target.txt
NR==FNR{a[$0]; next} The condition NR==FNR is only true for the first file, so
the keys of array a are lines of list.txt.
for(i=1; i<=NF; i++) Now for the second file, this loops over all
its fields.
if ($i in a){ a[$i]++ } This checks if the field $i is present as a key
in the array a. If yes, the value (initially zero) associated with that key is incremented.
At the END, we just print the key followed by the number of occurrences a[key] and a newline (\n).
Output:
blonde 2
red 0
black 0
Notes:
Because of %d, the printf statement forces the conversion of a[key] to an integer in case it is still unset. The whole statement could be replaced by a simpler print key, a[key]+0. I missed that when writing the answer, but now you know two ways of doing the same thing. ;)
In your attempt you were, for some reason, only addressing field 2 ($2), ignoring other columns.
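A tiny demo of the two equivalent coercions, on made-up one-key data: referencing an unset array element yields the empty string, which both %d and +0 turn into 0.

```shell
printf 'blonde\n' | awk '
{ count[$1]++ }
END {
    printf "%s %d\n", "red", count["red"]   # %d coerces the unset value to 0
    print "red", count["red"]+0             # arithmetic +0 does the same
}'
# red 0
# red 0
```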

match between two files and merge the output using awk

I have two files. The first column is common between them, and I would like to merge the files, copying the first file's second column into a third column of the output every time the first columns match.
file1
412234;mark
413234;raja
file2
412234;value1
412234;value2
412234;value3
412234;value4
413234;value1
413234;value2
413234;value3
Output file
412234;value1;mark
412234;value2;mark
412234;value3;mark
412234;value4;mark
413234;value1;raja
413234;value2;raja
413234;value3;raja
Try this:
awk -F';' 'BEGIN{FS=OFS=";"} FNR==NR{a[$1]=$2; next} ($1 in a){print $1, $2, a[$1]}' file1 file2
explanation:
-F';' means that awk will use ; as the field separator;
BEGIN{FS=OFS=";"} also sets the output field separator, which is used by the print statement;
AWK parse all files sequentially, the condition:
FNR==NR
is true only when parsing the first file.
While parsing file1, it fills a vector a with the first field as index and the second field as value;
a is expected to be
a[412234] = mark
a[413234] = raja
($1 in a) is the condition to meet, true when the first field of a file2 line is found as an index of vector a.
If true, then execute:
print $1, $2, a[$1]
which prints the two fields from file2 and the value from vector a saved while reading file1.
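To make this concrete, here is the command run on the two sample files from the question (the -F';' option is dropped here, since BEGIN{FS=OFS=";"} already sets the separator):

```shell
cat > file1 <<'EOF'
412234;mark
413234;raja
EOF
cat > file2 <<'EOF'
412234;value1
412234;value2
412234;value3
412234;value4
413234;value1
413234;value2
413234;value3
EOF

awk 'BEGIN{FS=OFS=";"} FNR==NR{a[$1]=$2; next} ($1 in a){print $1, $2, a[$1]}' file1 file2
# 412234;value1;mark
# 412234;value2;mark
# 412234;value3;mark
# 412234;value4;mark
# 413234;value1;raja
# 413234;value2;raja
# 413234;value3;raja
```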
----- EDIT
In case file1 contains multiple lines with the same index, you need to save all the values in a vector and then scan the whole vector for multiple matches on file2:
awk -F';' '
function vlen(a) {n=0; for(i in a) n++; return n}                # helper: number of elements in a
function contained(val, vect) {found=0; for (x in vect) if (vect[x]==val) found=1; return found}
BEGIN {FS=OFS=";"}                                               # set the output field separator
FNR==NR {n=vlen(a); a[n]=$1; b[n]=$2; next}                      # scan file1: save all indexes and values in parallel vectors
{ if (contained($1,a)) { for (i in a) if (a[i]==$1) print $1, $2, b[i] }  # for each file2 line, scan the whole vector a for matches
  else print $1, $2 }
' file1 file2
Here we define the vlen and contained helper functions.
Would you try the following:
awk '
BEGIN {FS=OFS=";"}
NR==FNR {
c[$1]++
a[$1,c[$1]]=$2
next
}
{
if (c[$1]) {
for (i=1; i<=c[$1]; i++) {
$3=a[$1,i]; print
}
} else {
print
}
}' file1 file2
Result with the file1 and file2 provided in the OP's last comment:
412234;value1;mark
412234;value1;raja
412234;value2;mark
412234;value2;raja
413234;value1
413234;value2
If the index in the 1st column (such as 412234) appears more than once
in file1, we need to preserve the existing value in the 2nd column
(such as mark) without overwriting.
Then an array c is introduced to count the occurrences of the index.
Note that the order of the result differs from the OP's expected output.
I hope it is acceptable.

Bash addition by the first column, if data in second column is the same

I have a list with delimiters |
40|192.168.1.2|user4
42|192.168.1.25|user2
58|192.168.1.55|user3
118|192.168.1.3|user11
67|192.168.1.25|user2
As you can see, I have the same IP in the line 42|192.168.1.25|user2 and in the line 67|192.168.1.25|user2. How can I sum such lines together? Can you give me a solution using awk, with some examples?
I need in a result something like this:
40|192.168.1.2|user4
58|192.168.1.55|user3
109|192.168.1.25|user2
118|192.168.1.3|user11
As you can see, the numbers from the first column have been summed.
If you need the output in the same order as in the Input_file, then the following awk may help:
awk -F"|" '!c[$2,$3]++{val++;v[val]=$2$3} {a[$2,$3]+=$1;b[$2,$3]=$2 FS $3;} END{for(j=1;j<=val;j++){print a[v[j]] FS b[v[j]]}}' SUBSEP="" Input_file
Adding a non-one-liner form of the solution too now:
awk -F"|" ' ##Making field separator as pipe(|) here for all the lines for Input_file.
!c[$2,$3]++{ ##Checking if array C whose index is $2,$3 is having its first occurrence in array c then do following.
val++; ##incrementing variable val value with 1 each time cursor comes here.
v[val]=$2$3 ##creating an array named v whose index is val and value is $2$3(second field 3rd field).
} ##Closing c array block here now.
{
a[$2,$3]+=$1; ##creating an array named a whose index is $2 $3 and incrementing its value with 1st field value and add in its same index values to get SUM.
b[$2,$3]=$2 FS $3;##create array b with index of $2$3 and setting its value to $2 FS $3, where FS is field separator.
} ##closing this block here.
END{ ##Starting awk code END bock here.
for(j=1;j<=val;j++){ ##starting a for loop here from variable named j value 1 to till value of variable val here.
print a[v[j]] FS b[v[j]] ##printing value of array a whose index is value of array v with index j, and array b with index of array v with index j here.
}}
' SUBSEP="" Input_file ##Setting SUBSEP to NULL here and mentioning the Input_file name here.
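A quick run of the one-liner on the question's data (saved as Input_file, the name used above) shows that first-seen input order is preserved while the sums accumulate:

```shell
cat > Input_file <<'EOF'
40|192.168.1.2|user4
42|192.168.1.25|user2
58|192.168.1.55|user3
118|192.168.1.3|user11
67|192.168.1.25|user2
EOF

awk -F"|" '!c[$2,$3]++{val++;v[val]=$2$3} {a[$2,$3]+=$1;b[$2,$3]=$2 FS $3;} END{for(j=1;j<=val;j++){print a[v[j]] FS b[v[j]]}}' SUBSEP="" Input_file
# 40|192.168.1.2|user4
# 109|192.168.1.25|user2
# 58|192.168.1.55|user3
# 118|192.168.1.3|user11
```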
Short GNU datamash + awk solution:
datamash -st'|' -g2,3 sum 1 <file | awk -F'|' '{print $3,$1,$2}' OFS='|'
-g2,3 - group by the 2nd and 3rd fields (i.e. by IP address and user id)
sum 1 - sum the 1st field values within grouped records
The output:
40|192.168.1.2|user4
109|192.168.1.25|user2
118|192.168.1.3|user11
58|192.168.1.55|user3
Modifying the sample data to include different users for ip address 192.168.1.25:
$ cat ipfile
40|192.168.1.2|user4
42|192.168.1.25|user1 <=== same ip, different user
58|192.168.1.55|user3
118|192.168.1.3|user11
67|192.168.1.25|user9 <=== same ip, different user
And a simple awk script:
$ awk '
BEGIN { FS="|" ; OFS="|" }
{ sum[$2]+=$1 ; if (user[$2]=="") { user[$2]=$3 } }
END { for (idx in sum) { print sum[idx],idx,user[idx] } }
' ipfile
58|192.168.1.55|user3
40|192.168.1.2|user4
118|192.168.1.3|user11
109|192.168.1.25|user1 <=== captured first user id
BEGIN { FS="|" ; OFS="|" } : define input and output field separators; executed once at beginning
sum[$2]+=$1 : store/add field #1 to array (indexed by ip address == field #2); executed once for each row in data file
if .... : if a user hasn't already been stored for a given ip address, then store it now; this has the effect of saving the first user id we find for a given ip address; executed once for each row in data file
END { for .... / print ...} : loop through array indexes, printing our sum, ip address and (first) user id; executed once at the end
NOTE: No sorting requirement was provided in the original question; sorting could be added as needed ...
awk to the rescue!
$ awk 'BEGIN {FS=OFS="|"}
{a[$2 FS $3]+=$1}
END {for(k in a) print a[k],k}' file | sort -n
40|192.168.1.2|user4
58|192.168.1.55|user3
109|192.168.1.25|user2
118|192.168.1.3|user11
if user* is not part of the key and you want to capture the first value
$ awk 'BEGIN {FS=OFS="|"}
{c[$2]+=$1;
if(!($2 in u)) u[$2]=$3} # capture first user
END {for(k in c) print c[k],k,u[k]}' file | sort -n
which ends up almost the same as #markp's answer.
Another idea on the same path but allows for different users:
awk -F'|' '{c[$2] += $1}u[$2] !~ $3{u[$2] = (u[$2]?u[$2]",":"")$3}END{for(i in c)print c[i],i,u[i]}' OFS='|' input_file
If multiple users they will be separated by a comma
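Here is that variant run on the modified sample with two different users on 192.168.1.25 (data saved as input_file, the name in the command); since for (i in c) iterates in unspecified order, the output is piped through sort -n to make it deterministic:

```shell
cat > input_file <<'EOF'
40|192.168.1.2|user4
42|192.168.1.25|user1
58|192.168.1.55|user3
118|192.168.1.3|user11
67|192.168.1.25|user9
EOF

awk -F'|' '{c[$2] += $1}u[$2] !~ $3{u[$2] = (u[$2]?u[$2]",":"")$3}END{for(i in c)print c[i],i,u[i]}' OFS='|' input_file |
    sort -t'|' -k1,1n
# 40|192.168.1.2|user4
# 58|192.168.1.55|user3
# 109|192.168.1.25|user1,user9
# 118|192.168.1.3|user11
```

Note that u[$2] !~ $3 uses the user field as a regular expression, which is fine for simple user ids like these.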

How to use a bash command in awk

Here is my problem
I have a File 1 where I have some data
Var1.1 Var1.2 Var1.3
Var2.1 Var2.2 Var2.3
Var3.1 Var3.2 Var3.3
And I have a File 2 that I would like edit thanks to the above data
File2 (1)
***pattern with Var2.1***
some text...
File2(2)
***pattern with Var2.1***
Here I want to add Var2.2 and Var2.3
some text
My first solution is to use awk, but I don't know how to include a bash command in it. The awk script should do something like this:
Search for the pattern in File2.
When awk finds it, awk calls a script which returns the wanted values from File1.
Then awk can edit File2.
Don't hesitate to suggest other possibilities if there are simpler ones!
Thank you !
This is how I run an external command from within awk to base64-decode a string:
cmd = "/usr/bin/base64 -i -d <<< " $2 " 2>/dev/null"
while ( ( cmd | getline result ) > 0 ) { }
close(cmd)
split(result, a, "[:=,]")
name=a[2]
Perhaps you can get some inspiration from it...
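One caveat: the <<< here-string is a bashism, and awk runs the pipe through sh, which may not support it. Here is a portable, self-contained sketch of the same getline-from-a-pipe pattern using a printf pipe instead; the two-field input line is made up for the demo, and $2 is assumed to contain only base64 characters so splicing it into the shell command is safe:

```shell
echo 'id aGVsbG8=' | awk '{
    # Build the decode command; $2 is assumed to be clean base64 text.
    cmd = "printf %s " $2 " | base64 -d 2>/dev/null"
    result = ""
    # Read everything the command prints, one record at a time.
    while ( (cmd | getline line) > 0 ) result = result line
    close(cmd)
    print result
}'
# hello
```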
There's no need to run an external script to accomplish what you want. It can be done completely within a short AWK script.
awk 'FNR == NR {arr[$1] = $2 " " $3; next} {print; for (lookup in arr) {if ($0 ~ lookup) {split(arr[lookup], a); print "Here I want to add " a[1] " and " a[2]}}}' File1 File2
Explanation:
FNR == NR {arr[$1] = $2 " " $3; next} - Loop through the first file and save all the values in an array indexed by the first column. FNR (the per-file record number) equals NR (the overall record number) only while reading the first file.
print - Print every input line.
for (lookup in arr) {if ($0 ~ lookup) { - Loop through each of the array indices and see if the input line matches.
split(arr[lookup], a) - Split the value stored at the matched index into a temporary array.
print "Here I want to add " a[1] " and " a[2] - Print some text using the two values resulting from the split.
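Run on the question's own sample files (created below as File1 and File2), the one-liner produces exactly the File2(2) shape the asker described:

```shell
cat > File1 <<'EOF'
Var1.1 Var1.2 Var1.3
Var2.1 Var2.2 Var2.3
Var3.1 Var3.2 Var3.3
EOF
cat > File2 <<'EOF'
***pattern with Var2.1***
some text...
EOF

awk 'FNR == NR {arr[$1] = $2 " " $3; next} {print; for (lookup in arr) {if ($0 ~ lookup) {split(arr[lookup], a); print "Here I want to add " a[1] " and " a[2]}}}' File1 File2
# ***pattern with Var2.1***
# Here I want to add Var2.2 and Var2.3
# some text...
```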

Check variables from different lines with awk

I want to combine values from multiple lines of different lengths into one line using awk, if they match. In the following sample, lines match on the value of the first field, and the values of the second field are aggregated into a list.
Input, sample csv:
222;a;DB;a
222;b;DB;a
555;f;DB;a
4444;a;DB;a
4444;d;DB;a
4444;z;DB;a
Output:
222;a|b
555;f
4444;a|d|z
How can I write an awk expression (or maybe some other shell expression) to check whether the first field's value matches that of the next/previous line, and then print the second-field values aggregated and separated by a pipe?
awk '
BEGIN {FS=";"}
{ if ($1==prev) {sec=sec "|" $2; }
else { if (prev) { print prev ";" sec; };
prev=$1; sec=$2; }}
END { if (prev) { print prev ";" sec; }}'
This, as you requested, checks the consecutive lines.
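Because this version only compares each line with the previous one, it assumes the input is already grouped by the first field, as the sample is (saved here as sample.csv, a name made up for the demo):

```shell
cat > sample.csv <<'EOF'
222;a;DB;a
222;b;DB;a
555;f;DB;a
4444;a;DB;a
4444;d;DB;a
4444;z;DB;a
EOF

awk '
BEGIN {FS=";"}
{ if ($1==prev) {sec=sec "|" $2; }
  else { if (prev) { print prev ";" sec; };
  prev=$1; sec=$2; }}
END { if (prev) { print prev ";" sec; }}' sample.csv
# 222;a|b
# 555;f
# 4444;a|d|z
```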
Does this one-liner work?
awk -F';' '{a[$1]=a[$1]?a[$1]"|"$2:$2;} END{for(x in a) print x";"a[x]}' file
tested here:
kent$ cat a
222;a;DB;a
222;b;DB;a
555;f;DB;a
4444;a;DB;a
4444;d;DB;a
4444;z;DB;a
kent$ awk -F';' '{a[$1]=a[$1]?a[$1]"|"$2:$2;} END{for(x in a) print x";"a[x]}' a
555;f
4444;a|d|z
222;a|b
if you want to keep it sorted, add a |sort at the end.
Slightly convoluted, but does the job:
awk -F';' \
'{
if (a[$1]) {
a[$1]=a[$1] "|" $2
} else {
a[$1]=$2
}
}
END {
for (k in a) {
print k ";" a[k]
}
}' file
Assuming that you have set the field separator ( -F ) to ; :
{
    if ( $1 != last ) { if ( last != "" ) print last ";" s; s = ""; }
    last = $1;
    s = ( s == "" ? $2 : s "|" $2 );
} END {
    if ( last != "" ) print last ";" s;
}
The two emptiness checks on last and s take care of the first line and the first separator character, which the simplest version of this approach gets slightly wrong.
this should work:
Command:
awk -F';' '{if(a[$1]){a[$1]=a[$1]"|"$2}else{a[$1]=$2}}END{for (i in a){print i";" a[i] }}' file
Input:
222;a;DB;a
222;b;DB;a
555;f;DB;a
4444;a;DB;a
4444;d;DB;a
4444;z;DB;a
Output:
222;a|b
555;f
4444;a|d|z
