Sorting a list and adding together amounts - Linux shell script

I have a list such as:
10,Car Tyres
8,Car Tyres
4,Wheels
18,Crowbars
5,Jacks
5,Jacks
8,Jacks
The first number is the quantity, the second is the item name. I need this list to show each item only once, adding the quantities together when an item appears more than once. The correct output would be:
18,Car Tyres
4,Wheels
18,Crowbars
18,Jacks
This needs to work on lists in this format of a few thousand lines, preferably as a Linux shell script. Any help appreciated, thanks!

awk -F"," '{ t[$2] = t[$2] + $1 }
END{
for(o in t){
print o, t[o]
}
}' file
Output:
$ ./shell.sh
Crowbars 18
Wheels 4
Car Tyres 18
Jacks 18

How about a Perl script?
#!/usr/bin/perl -w
use strict;
my %parts;
while (<>) {
    chomp;
    my @fields = split /,/, $_;
    if (scalar @fields > 1) {
        if ($parts{$fields[1]}) {
            $parts{$fields[1]} += $fields[0];
        } else {
            $parts{$fields[1]} = $fields[0];
        }
    }
}
foreach my $k (keys %parts) {
    print $parts{$k}, ",$k\n";
}

awk -v FS=, '{
    if (!($2 in a)) {
        a[$2] = $1;
    }
    else {
        a[$2] += $1;
    }
}
END {
    for (name in a) {
        printf("%s\t%d\n", name, a[name]);
    }
}'

Look at:
man sort
man awk
The actual command you need is:
sort -t, -k2 yourfile.txt | awk ......
You could also do this entirely in awk; see: Sum by group.

Changing previous duplicate line in awk

I want to change all duplicate names in a .csv to unique ones, but after finding a duplicate I cannot reach the previous line, because it has already been printed. I've tried saving all lines in an array and printing them in the END section, but it doesn't work, and I don't understand how to access a specific field in this array (are two-dimensional arrays not supported in awk?).
Sample input:
...,9,phone,...
...,43,book,...
...,27,apple,...
...,85,hook,...
...,43,phone,...
Desired output:
...,9,phone9,...
...,43,book,...
...,27,apple,...
...,85,hook,...
...,43,phone43,...
My attempt ($2 is the id field, $3 the name field):
BEGIN{
    FS=","
    OFS=","
    marker=777
}
{
    if (names[$3] == marker) {
        $3 = $3 $2
        # Attempt to change previous duplicate
        results[nameLines[$3]] = $3 id[$3]
    }
    names[$3] = marker
    id[$3] = $2
    nameLines[$3] = NR
    results[NR] = $0
}
END{
    # It prints some numbers, not saved lines
    for (result in results)
        print result
}
Here is a single-pass awk that stores all records in a buffer:
awk -F, '
{
rec[NR] = $0
++fq[$3]
}
END {
for (i=1; i<=NR; ++i) {
n = split(rec[i], a, /,/)
if (fq[a[3]] > 1)
a[3] = a[3] a[2]
for (k=1; k<=n; ++k)
printf "%s", a[k] (k < n ? FS : ORS)
}
}' file
...,9,phone9,...
...,43,book,...
...,27,apple,...
...,85,hook,...
...,43,phone43,...
This can easily be done with a two-pass read of Input_file in awk, with no need for two-dimensional arrays. With your shown samples, written in GNU awk:
awk '
BEGIN{FS=OFS=","}
FNR==NR{
arr1[$3]++
next
}
{
$3=(arr1[$3]>1?$3 $2:$3)
}
1
' Input_file Input_file
Output will be as follows:
...,9,phone9,...
...,43,book,...
...,27,apple,...
...,85,hook,...
...,43,phone43,...

How can I convert this format string to CSV?

My string is:
AA:: aaaaaaaaaaaaaaaaa
BB:: bbbbbbbbbbbbbbbb
C: ccccccccccccccccc
DD:: DDDDDDDDDDDD
E: EEEEEEEEEEEEE

AA: aaaaaaaaaaaaaaaaa2
BB:: bbbbbbbbbbbbbbbb2
C:: ccccccccccccccccc2
DD: DDDDDDDDDDDD2
E: EEEEEEEEEEEEE
....
I need to get this format using a standard Linux command like awk, or a Perl function:
AA,BB,C,DD,E
aaaaaaaa,bbbbbb,ccccc,dddddd,eeeee
aaaaaaaa2,bbbbbb2,ccccc2,dddddd2,eeeee2
exm: OUTPUT_STRING | awk ....
or perlFunction(OUTPUT_STRING){ .....
return formated_string; }
I searched Google and tried many suggestions from several sites, and none worked, so please don't send me a link.
Some fields have a single : and some have a double :: (this is random).
I tried these, and they did not work for me:
sed -r 's/\\,|,|CN=|OU*//g' |awk -F "|=|:" '{printf $2"|"}'
or
sed -n '1h; 2,$H;${g;s/\n/,/g;p}' | sed 's/,,/\n/g'
or
awk -F ":" '{printf $2} {if (NF==0) {printf "\n"}}' | sed "s/ //" | sed "s/ /;/g"
One of many ways to achieve the desired result:
use strict;
use warnings;
my $file = do { local $/; <DATA> };      # read whole file
my @blocks = split /\n\n/, $file;        # split file into blocks
my $print_header = 1;                    # flag to print header
foreach my $block (@blocks) {            # process each block
    $block =~ s/:+/:/g;                  # clean up the block :: -> :
    my @lines = split /\n/, $block;      # split the block into lines
    my(@header,@data);                   # arrays to store header and data
    foreach my $line (@lines) {          # process each line
        my($h,$d) = split /:\s*/, $line; # split line into header and data part
        push @header, $h;                # add header names into array
        push @data, $d;                  # add data into array
    }
    if( $print_header ){                 # if header not printed yet
        print join(',', @header) . "\n"; # print header array
        $print_header = 0;               # flag the header is printed
    }
    print join(',', @data) . "\n";       # print data array
}
__DATA__
AA:: aaaaaaaaaaaaaaaaa
BB:: bbbbbbbbbbbbbbbb
C: ccccccccccccccccc
DD:: DDDDDDDDDDDD
E: EEEEEEEEEEEEE

AA: aaaaaaaaaaaaaaaaa2
BB:: bbbbbbbbbbbbbbbb2
C:: ccccccccccccccccc2
DD: DDDDDDDDDDDD2
E: EEEEEEEEEEEEE2
Output:
AA,BB,C,DD,E
aaaaaaaaaaaaaaaaa,bbbbbbbbbbbbbbbb,ccccccccccccccccc,DDDDDDDDDDDD,EEEEEEEEEEEEE
aaaaaaaaaaaaaaaaa2,bbbbbbbbbbbbbbbb2,ccccccccccccccccc2,DDDDDDDDDDDD2,EEEEEEEEEEEEE2
This GNU awk should do:
awk -v RS='' -F':* ?|\n' 'NR==1{print $1","$3","$5","$7","$9} {print $2","$4","$6","$8","$10}' t
AA,BB,C,DD,E
aaaaaaaaaaaaaaaaa,bbbbbbbbbbbbbbbb,ccccccccccccccccc,DDDDDDDDDDDD,EEEEEEEEEEEEE
aaaaaaaaaaaaaaaaa2,bbbbbbbbbbbbbbbb2,ccccccccccccccccc2,DDDDDDDDDDDD2,EEEEEEEEEEEEE
RS='' sets the record separator to nothing, so awk works in block (paragraph) mode.
-F':* ?|\n' sets the field separator to : or :: (optionally followed by a space), or newline.
NR==1{print $1","$3","$5","$7","$9} prints the header for the first block.
{print $2","$4","$6","$8","$10} prints the data fields.
A more generic solution that should work with more fields:
awk -v RS='' -F':* ?|\n' 'NR==1{for(i=1;i<=NF-2;i+=2) printf "%s,",$i;print $i} {for(i=2;i<=NF-2;i+=2) printf "%s,",$i;print $i}' file
AA,BB,C,DD,E
aaaaaaaaaaaaaaaaa,bbbbbbbbbbbbbbbb,ccccccccccccccccc,DDDDDDDDDDDD,EEEEEEEEEEEEE
aaaaaaaaaaaaaaaaa2,bbbbbbbbbbbbbbbb2,ccccccccccccccccc2,DDDDDDDDDDDD2,EEEEEEEEEEEEE
PS: If not all records have all IDs, then it's a whole other story to program.
Using Text::CSV to handle edge cases:
use strict;
use warnings;
use Text::CSV 'csv';
my $input = do { local $/; readline }; # input from STDIN or filename argument
my @aoh;
my %headers;
foreach my $block (split /\n\n+/, $input) {
my %row;
foreach my $line (split /^/, $block) {
if ($line =~ m/^([^:]+):+\s*(.*)$/) {
$row{$1} = $2;
$headers{$1} = 1;
}
}
push @aoh, \%row;
}
csv(in => \@aoh, out => \*STDOUT, headers => [sort keys %headers],
    encoding => 'UTF-8', auto_diag => 2);

AWK file reformatting

I'm struggling to reformat a comma-separated file using awk. The file contains minute data for a day, for multiple servers and multiple metrics,
e.g. 2 records per minute, per server, for 24 hours.
Example input file:
server01,00:01:00,AckDelayAverage,9999
server01,00:01:00,AckDelayMax,8888
server01,00:02:00,AckDelayAverage,666
server01,00:02:00,AckDelayMax,5555
.....
server01,23:58:00,AckDelayAverage,4545
server01,23:58:00,AckDelayMax,8777
server01,23:59:00,AckDelayAverage,4686
server01,23:59:00,AckDelayMax,7820
server02,00:01:00,AckDelayAverage,1231
server02,00:01:00,AckDelayMax,4185
server02,00:02:00,AckDelayAverage,1843
server02,00:02:00,AckDelayMax,9982
.....
server02,23:58:00,AckDelayAverage,1022
server02,23:58:00,AckDelayMax,1772
server02,23:59:00,AckDelayAverage,1813
server02,23:59:00,AckDelayMax,9891
I'm trying to reformat the file to have a single row for each minute, with a unique concatenation of fields 1 and 3 as the column headers.
e.g the expected output file would look like:
Minute,server01-AckDelayAverage,server01-AckDelayMax,server02-AckDelayAverage,server02-AckDelayMax
00:01:00,9999,8888,1231,4185
00:02:00,666,5555,1843,9982
...
...
23:58:00,4545,8777,1022,1772
23:59:00,4686,7820,1813,9891
A solution using GNU awk. Call this as awk -F, -f script input_file:
/Average/ { average[$2, $1] = $4; }
/Max/     { maximum[$2, $1] = $4; }
{
    if (!($2 in minutes)) {
        minutes[$2] = 1;
    }
    if (!($1 in servers)) {
        servers[$1] = 1;
    }
}
END {
    mcount = asorti(minutes, smin);
    scount = asorti(servers, sserv);
    printf "minutes";
    for (col = 1; col <= scount; col++) {
        printf "," sserv[col] "-average," sserv[col] "-maximum";
    }
    print "";
    for (row = 1; row <= mcount; row++) {
        key = smin[row];
        printf key;
        for (col = 1; col <= scount; col++) {
            printf "," average[key, sserv[col]] "," maximum[key, sserv[col]];
        }
        print "";
    }
}
Run the awk command as: ./script.awk file
#! /bin/awk -f
BEGIN {
    FS=",";
    OFS=","
}
$1 ~ /server01/ && $3 ~ /Average/ {
    a[$2]["Avg01"] = $4;
}
$1 ~ /server01/ && $3 ~ /Max/ {
    a[$2]["Max01"] = $4;
}
$1 ~ /server02/ && $3 ~ /Average/ {
    a[$2]["Avg02"] = $4;
}
$1 ~ /server02/ && $3 ~ /Max/ {
    a[$2]["Max02"] = $4;
}
END {
    print "Minute","server01-AckDelayAverage","server01-AckDelayMax","server02-AckDelayAverage","server02-AckDelayMax"
    for (i in a) {
        print i,a[i]["Avg01"],a[i]["Max01"],a[i]["Avg02"],a[i]["Max02"] | "sort"
    }
}
With awk and sort:
awk -F, -v OFS=, '{
a[$2]=(a[$2]?a[$2]","$4:$4)
}
END{
for ( i in a ) print i,a[i]
}' File | sort
If $4 can hold 0 values (which would make the ternary test above misfire):
awk -F, -v OFS=, '!a[$2]{a[$2]=$2} {a[$2]=a[$2]","$4} END{for ( i in a ) print a[i]}' File | sort
!a[$2]{a[$2]=$2}: if array a has no entry indexed by $2 (the minute), create one with $2 as its value. This is true the first time a minute occurs on a line.
{a[$2]=a[$2]","$4}: concatenate the value $4 onto that entry.
END: print all values in array a.
Finally, pipe the awk result to sort.

Syntax error at or near {

I have an assignment where I have to count the number of words in each .c, .cc and .h file. The problem is it keeps showing the syntax error at lines 8 and 10, at or near {. This is not a finished script! It may have some other problems, but I only need help with the syntax error.
awk 'BEGIN {FS=" ";drb=0;valt=0;}
{if ( valt == 0 ){
for( i=1; i<=NF; i++)
drb++;
valt++;
}
else{
FNR==1{ printf "File name: %s,Word count: %d\n",FILENAME, drb;drb=0;}
for(i=1;i<=NF;i++)
drb++;}}
END {printf "File name: %s,Word count: %d",FILENAME,drb }' `find $1 -name '*.c' -o -name '*.cc' -o -name '*.h'`
Inside an action block the awk condition syntax is C-like so you need:
if (FNR==1) { foo }
instead of
FNR==1 { foo }
but more importantly it SOUNDS like all you need is:
awk '
{ drb += NF }
ENDFILE { printf "File name: %s,Word count: %d\n",FILENAME,drb; drb=0 }
' files...
The above uses GNU awk for ENDFILE. Note that this will work even for empty files, which is a BIG problem for solutions using other awks (if they rely on testing FNR==1 instead of looping over ARGV[] in an END section, they will skip the file instead of printing its name with a word count of zero).
The correct way to do this with non-gawk awks (assuming no duplicate file names) is:
awk '
{ drb[FILENAME] += NF }
END {
for (i=1;i<ARGC;i++) {
fname = ARGV[i]
printf "File name: %s,Word count: %d",fname,drb[fname]
}
}
' files...
If you CAN have duplicate file names then it gets even harder to implement, something like this (untested):
awk '
FNR==1 { ++cnt[FILENAME] }
{ drb[FILENAME,cnt[FILENAME]] += NF }
END {
delete cnt
for (i=1;i<ARGC;i++) {
fname = ARGV[i]
printf "File name: %s,Word count: %d",fname,drb[fname,++cnt[fname]]
}
}
' files...
I don't think the accepted answer is correct. Since this is a homework problem, I'll provide a template for you to understand and work on:
awk 'FNR==1{if(s) print f,s; s=0; f=FILENAME} {s+=NF} END{print f,s}' files
Notes: you already have NF as the loop condition; NF is itself the per-line word count, so just use it directly.
Special handling is needed for the first file, but that can be done in other ways too.
Of course, what you actually need is already implemented as the wc command:
wc -w files
will give you the results you need; pipe it to awk for your formatting needs.

Check variables from different lines with awk

I want to combine values from multiple lines of different lengths using awk into one line if they match. In the following sample, lines are matched on the first field,
aggregating values from the second field into a list.
Input, sample CSV:
222;a;DB;a
222;b;DB;a
555;f;DB;a
4444;a;DB;a
4444;d;DB;a
4444;z;DB;a
Output:
222;a|b
555;f
4444;a|d|z
How can I write an awk expression (or maybe some other shell expression) that checks whether the first field matches on the next/previous line, and then prints the second-field values aggregated into a pipe-separated list?
awk '
BEGIN { FS=";" }
{
    if ($1 == prev) { sec = sec "|" $2 }
    else {
        if (prev) { print prev ";" sec }
        prev = $1; sec = $2
    }
}
END { if (prev) { print prev ";" sec } }'
This, as you requested, compares consecutive lines.
Does this one-liner work?
awk -F';' '{a[$1]=a[$1]?a[$1]"|"$2:$2;} END{for(x in a) print x";"a[x]}' file
tested here:
kent$ cat a
222;a;DB;a
222;b;DB;a
555;f;DB;a
4444;a;DB;a
4444;d;DB;a
4444;z;DB;a
kent$ awk -F';' '{a[$1]=a[$1]?a[$1]"|"$2:$2;} END{for(x in a) print x";"a[x]}' a
555;f
4444;a|d|z
222;a|b
if you want to keep it sorted, add a |sort at the end.
Slightly convoluted, but does the job:
awk -F';' '
{
    if (a[$1]) {
        a[$1] = a[$1] "|" $2
    } else {
        a[$1] = $2
    }
}
END {
    for (k in a) {
        print k ";" a[k]
    }
}' file
Assuming that you have set the field separator (-F) to ;:
{
    if ($1 != last) { print s; s = ""; }
    last = $1;
    s = s "|" $2;
} END {
    print s;
}
The first line and the first character are slightly wrong, but that's an exercise for the reader :-). Two simple if's suffice to fix that.
(Edit: Missed out last line.)
This should work:
Command:
awk -F';' '{if(a[$1]){a[$1]=a[$1]"|"$2}else{a[$1]=$2}}END{for (i in a){print i";" a[i] }}' fil
Input:
222;a;DB;a
222;b;DB;a
555;f;DB;a
4444;a;DB;a
4444;d;DB;a
4444;z;DB;a
Output:
222;a|b
555;f
4444;a|d|z
