Check variables from different lines with awk - linux

I want to combine values from multiple lines with different lengths using awk into one line if they match. In the following sample match values for first field,
aggregating values from second field into a list.
Input, sample csv:
222;a;DB;a
222;b;DB;a
555;f;DB;a
4444;a;DB;a
4444;d;DB;a
4444;z;DB;a
Output:
222;a|b
555;f
4444;a|d|z
How can I write an awk expression (maybe some other shell expression) to check if the first field value match with the next/previous line, and then print a list of second fields values aggregated and separated by a pipe?

awk '
BEGIN {FS=";"}
{ if ($1==prev) {sec=sec "|" $2; }
else { if (prev) { print prev ";" sec; };
prev=$1; sec=$2; }}
END { if (prev) { print prev ";" sec; }}'
This, as you requested, checks the consecutive lines.

does this oneliner work?
awk -F';' '{a[$1]=a[$1]?a[$1]"|"$2:$2;} END{for(x in a) print x";"a[x]}' file
tested here:
kent$ cat a
222;a;DB;a
222;b;DB;a
555;f;DB;a
4444;a;DB;a
4444;d;DB;a
4444;z;DB;a
kent$ awk -F';' '{a[$1]=a[$1]?a[$1]"|"$2:$2;} END{for(x in a) print x";"a[x]}' a
555;f
4444;a|d|z
222;a|b
if you want to keep it sorted, add a |sort at the end.

Slightly convoluted, but does the job:
awk -F';' \
'{
if (a[$1]) {
a[$1]=a[$1] "|" $2
} else {
a[$1]=$2
}
}
END {
for (k in a) {
print k ";" a[k]
}
}' file

Assuming that you have set the field separator ( -F ) to ; :
{
if ( $1 != last ) { print s; s = ""; }
last = $1;
s = s "|" $2;
} END {
print s;
}
The first line and the first character are slightly wrong, but that's an exercise for the reader :-). Two simple if's suffice to fix that.
(Edit: Missed out last line.)

this should work:
Command:
awk -F';' '{if(a[$1]){a[$1]=a[$1]"|"$2}else{a[$1]=$2}}END{for (i in a){print i";" a[i] }}' fil
Input:
222;a;DB;a
222;b;DB;a
555;f;DB;a
4444;a;DB;a
4444;d;DB;a
4444;z;DB;a
Output:
222;a|b
555;f
4444;a|d|z

Related

Changing previous duplicate line in awk

I want to change all duplicate names in .csv to unique, but after finding duplicate I cannot reach previous line, because it's already printed. I've tried to save all lines in array and print them in End section, but it doesn't work and I don't understand how to access specific field in this array (two-dimensional array isn't supported in awk?).
sample input
...,9,phone,...
...,43,book,...
...,27,apple,...
...,85,hook,...
...,43,phone,...
desired output
...,9,phone9,...
...,43,book,...
...,27,apple,...
...,85,hook,...
...,43,phone43,...
My attempt ($2 - id field, $3 - name field)
BEGIN{
FS=","
OFS=","
marker=777
}
{
if (names[$3] == marker) {
$3 = $3 $2
#Attempt to change previous duplicate
results[nameLines[$3]]=$3 id[$3]
}
names[$3] = marker
id[$3] = $2
nameLines[$3] = NR
results[NR] = $0
}
END{
#it prints some numbers, not saved lines
for(result in results)
print result
}
Here is single pass awk that stores all records in buffer:
awk -F, '
{
rec[NR] = $0
++fq[$3]
}
END {
for (i=1; i<=NR; ++i) {
n = split(rec[i], a, /,/)
if (fq[a[3]] > 1)
a[3] = a[3] a[2]
for (k=1; k<=n; ++k)
printf "%s", a[k] (k < n ? FS : ORS)
}
}' file
...,9,phone9,...
...,43,book,...
...,27,apple,...
...,85,hook,...
...,43,phone43,...
This could be easily done in 2 pass Input_file in awk where we need not to create 2 dimensional arrays in it. With your shown samples written in GNU awk.
awk '
BEGIN{FS=OFS=","}
FNR==NR{
arr1[$3]++
next
}
{
$3=(arr1[$3]>1?$3 $2:$3)
}
1
' Input_file Input_file
Output will be as follows:
...,9,phone9,...
...,43,book,...
...,27,apple,...
...,85,hook,...
...,43,phone43,...

Extract part of one column and save into another file using awk

I have a requirement to extract fields from a csv file. There are two columns billing_info and key_id. billing_info is a object which has multiple data items in curly braces. I need to extract billing_info.id_encrypted, key_id into a different file.
input.csv
billing_info,key_id
{id: '1B82', id_encrypted: '1Q4AW5bwyU', address: 'san jose', phone: '13423', country: 'v73jyqgE='},bf6-96f751
output.csv
billing_info.id_encrypted,key_id
1Q4AW5bwyU,bf6-96f751
May i know how to use awk command to extract the data in format mentioned in output.csv. Please help
Making some assumptions:
the first line of input lists the column names
the brace-delimited element contains an arbitrary number of comma-separated key-value pairs
key-value pairs can appear in an arbitrary order
values are delimited by single-quotes
commas cannot appear inside keys or values
single-quotes do not appear anywhere else
<csvfile | awk -F, '
BEGIN {
getline
print "billing_info.id_encrypted,key_id"
}
{
for (i=1; i<NF; i++)
if ($i ~ /id_encrypted/)
split($i, e, /\047/)
print e[2] "," $NF
}
'
Notes:
-F, splits input lines into comma-separated fields
BEGIN section handles the header
we output the header even if there is no input
for loop runs through all the fields (except the final one)
($i ~ /id_encrypted/) looks for any that contain the key word
split splits that field on single-quotes (/\047/)
print outputs the value found, and the final field
Here is a fast and elegant solution using awk:
awk -F ":" '{split($3,arr1,",");split($6,arr2,",");print arr1[1] "," arr2[2]}' input.csv > output.csv
With an explanation:
-F ":" make the awk field separator :
split($3,arr1,",") split the 3rd field by the ,into array having 2 elements.
split($6,arr2,",") split the 6th field by the ,into array having 2 elements.
Then print out the first element in arr1 and the second element in arr2.
I recommend you just convert your whole input to CSV and THEN you can trivially extract whatever fields you like from it using awk or Excel or any other tool, e.g.:
$ cat tst.awk
BEGIN { FS=OFS="," }
FNR==1 {
split($0,hdr)
next
}
{
fld[1] = fld[2] = $0
sub(/,[^,]*$/,"",fld[1])
gsub(/^{|}$/,"",fld[1])
sub(/.*,/,"",fld[2])
# print "trace: " hdr[1] "=<" fld[1] ">" | "cat>&2"
# print "trace: " hdr[2] "=<" fld[2] ">" | "cat>&2"
numTags = split(fld[1],tags,/'[^']*'/,vals)
delete tags[numTags--]
for (tagNr=1; tagNr<=numTags; tagNr++) {
gsub(/^, *|: *$/,"",tags[tagNr])
gsub(/^'|'$/,"",vals[tagNr])
# print "trace: " tagNr ": <" tags[tagNr] "=" vals[tagNr] ">" | "cat>&2"
}
}
FNR == 2 {
for (tagNr=1; tagNr<=numTags; tagNr++) {
printf "%s.%s%s", hdr[1], tags[tagNr], OFS
}
print hdr[2]
}
{
for (tagNr=1; tagNr<=numTags; tagNr++) {
printf "\"%s\"%s", vals[tagNr], OFS
}
printf "\"%s\"%s", fld[2], ORS
}
.
$ awk -f tst.awk file
billing_info.id,billing_info.id_encrypted,billing_info.address,billing_info.phone,billing_info.country,key_id
"1B82","1Q4AW5bwyU","san jose","13423","v73jyqgE=","bf6-96f751"
The above uses GNU awk for the 4th arg to split(). Uncomment the print trace lines to see what each step is doing if you like. You don't need to add the double quotes around each output field if you remove or replace any commas within each field (esp. the address).

AWK to to find first occurrence of string and assign to variable for compare

I have written following line of code which explodes the string by the first occurrence of the string after a delimiter.
echo "$line" | awk -F':' '{ st = index($0,":");print "field1: "$1 "
=> " substr($0,st+1)}';
But I don't want to display it. Want to take both occurrences in variable so I tried the following code
explodetext="$line" | awk -F':' '{ st = index($0,":")}';
Sample data:
id:1
url:http://test.com
Expected OutPUt will be:
key=id
val=1
key=url
val=http://test.com
but not working as expected.Any solution?
Thanks
Your code, expanded:
echo "$line" \
| awk -F':' '
{
st = index($0,":")
print "field1: " $1 " => " substr($0,st+1)
}'
The output of this appears merely to split the line according to the first colon. From the sample data you've provided, it seems that your lines contain two fields, which are separated by the first colon found. This means you can't safely use awk's field separator to find your data (though you can use it for field names), making index() a reasonable approach.
One strategy might be to place your input into an array, for assessment:
#!/usr/bin/awk -f
BEGIN {
FS=":"
}
{
record[$1]=substr($0,index($0,":")+1);
}
END {
if (record["id"] > 0) {
printf("Record ID %d had a value of %s.\n", record["id"], record["url"])
} else {
print "No valid records found."
}
}
I suppose that your text file input.txt is stored in the format as given below:
id:1
url:http://test1.com
You could use the below piece of code, say awkscript, to achieve what you wish to do :
#!/bin/bash
awk '
BEGIN{FS=":"}
{
if ($2 > 0) {
if ( getline > 0){
st = index($0,":")
url = substr($0,st+1);
system("echo Do something with " url);
}
}
}' $1
Run the code as ./awkscript input.txt
Note: I assume that that the input file contains only one id/url pair as you confirmed in your comment.

Awk between two patterns with pattern in the middle

Hi i am looking for an awk that can find two patterns and print the data between them to
a file only if in the middle there is a third patterns in the middle.
for example:
Start
1
2
middle
3
End
Start
1
2
End
And the output will be:
Start
1
2
middle
3
End
I found in the web awk '/patterns1/, /patterns2/' path > text.txt
but i need only output with the third patterns in the middle.
And here is a solution without flags:
$ awk 'BEGIN{RS="End"}/middle/{printf "%s", $0; print RT}' file
Start
1
2
middle
3
End
Explanation: The RS variable is the record separator, so we set it to "End", so that each Record is separated by "End".
Then we filter the Records that contain "middle", with the /middle/ filter, and for the matched records we print the current record with $0 and the separator with print RT
This awk should work:
awk '$1=="Start"{ok++} ok>0{a[b++]=$0} $1=="middle"{ok++} $1=="End"{if(ok>1) for(i=0; i<length(a); i++) print a[i]; ok=0;b=0;delete a}' file
Start
1
2
middle
3
End
Expanded:
awk '$1 == "Start" {
ok++
}
ok > 0 {
a[b++] = $0
}
$1 == "middle" {
ok++
}
$1 == "End" {
if (ok > 1)
for (i=0; i<length(a); i++)
print a[i];
ok=0;
b=0;
delete a
}' file
Just use some flags with awk:
/Start/ {
start_flag=1
}
/middle/ {
mid_flag=1
}
start_flag {
n=NR;
lines[NR]=$0
}
/End/ {
if (start_flag && mid_flag)
for(i=n;i<NR;i++)
print lines[i]
start_flag=mid_flag=0
delete lines
}
Modified the awk user000001
awk '/middle/{printf "%s%s\n",$0,RT}' RS="End" file
EDIT:
Added test for Start tag
awk '/Start/ && /middle/{printf "%s%s\n",$0,RT}' RS="End" file
This will work with any modern awk:
awk '/Start/{f=1;rec=""} f{rec=rec $0 ORS} /End/{if (rec~/middle/) printf "%s",rec}' file
The solutions that set RS to "End" are gawk-specific, which may be fine but it's definitely worth mentioning.

Sorting List and Adding Together Amounts Shellscript

I have a list such as:
10,Car Tyres
8,Car Tyres
4,Wheels
18,Crowbars
5,Jacks
5,Jacks
8,Jacks
The first number is quantity, second is item name. I need to get this list so that it only shows each item once and it adds together the quantity if the item appears more than once. The output of this working correctly would be:
18,Car Tyres
4,Wheels
18,Crowbars
18,Jacks
This will need to work on lists in this format of a few thousand lines, preferably coded in Linux shellscript, any help appreciated, thanks!
awk -F"," '{ t[$2] = t[$2] + $1 }
END{
for(o in t){
print o, t[o]
}
}' file
output
$ ./shell.sh
Crowbars 18
Wheels 4
Car Tyres 18
Jacks 18
How about a perl script?:
#!/usr/bin/perl -w
use strict;
my %parts;
while (<>) {
chomp;
my #fields = split /,/, $_;
if (scalar #fields > 1) {
if ($parts{$fields[1]}) {
$parts{$fields[1]} += $fields[0];
} else {
$parts{$fields[1]} = $fields[0];
}
}
}
foreach my $k (keys %parts) {
print $parts{$k}, ",$k\n";
}
awk -v FS=, '{ if (! $2 in a) {
a[$2] = $1;
}
else {
a[$2] += $1;
}
}
END {
for (name in a) {
printf("%s\t%d\n", name, a[name]);
}
}'
Look at:
man sort
man awk
The actual command you need is:
sort -n -t, +1 yourfile.txt | awk ......
You could also do this entirely in awk
Sum by group

Resources