Get the sum of a particular column using bash? - linux

I am trying to get the sum of the 5th column of a .csv file using bash, but the command I am using keeps returning zero. I am piping the file through grep to remove the column header row:
grep -v Header results.csv | awk '{sum += $5} END {print sum}'

Your awk call splits on whitespace by default, so $5 is not the CSV column you expect, which is why the sum comes out as zero. Here's how I would do it:
tail -n+2 results.csv | cut -d, -f5 | awk '{sum+=$1} END {print sum}'
or:
tail -n+2 results.csv | awk -F, '{sum+=$5} END {print sum}'
(depending on which turns out to be faster).
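Alternatively, keeping the original grep approach, the only change needed is telling awk to split on commas (a minimal sketch reusing the file and header name from the question):
grep -v Header results.csv | awk -F, '{sum += $5} END {print sum}'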

Related

How to get "," separated output using awk command in linux

I am trying to print the output of an awk command with "," as the delimiter, and to get the same output using cut.
cat File1
dot|is-big|a
dot|is-round|a
dot|is-gray|b
cat|is-big|a
hot|in-summer|a
dot|is-big|a
dot|is-round|b
dot|is-gray|a
cat|is-big|a
hot|in-summer|a
Command tried:
$awk 'BEGIN{FS="|"; OFS=","} {print $1,$3}' file1.csv | sort | uniq -c
Output Got:
2 cat,a
4 dot,a
2 dot,b
2 hot,a
Desired Output:
2,cat,a
4,dot,a
2,dot,b
2,hot,a
A couple of other commands tried:
$cat file1.csv |cut --output-delimiter="|" -d'|' -f1,3 | sort | uniq -c
You need to change the delimiter to , after running uniq -c, since uniq -c prepends the count as a new first column. Re-joining the fields with awk and OFS="," works because the $1=$1 assignment forces the record to be rebuilt with the new separator:
awk -F'|' '{print $1, $3}' file1.csv | sort | uniq -c | awk 'BEGIN{OFS=","} {$1=$1;print}'
But you don't need sort | uniq -c at all if you're using awk; it can do the counting itself:
awk 'BEGIN{FS="|";OFS=","} {a[$1 OFS $3]++} END{for(k in a) print a[k], k}' file1.csv
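Note that for (k in a) iterates in no guaranteed order; if the output order matters, the same command can be piped through sort:
awk 'BEGIN{FS="|";OFS=","} {a[$1 OFS $3]++} END{for(k in a) print a[k], k}' file1.csv | sort -t, -k2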

Awk or cut how to output the count of one unique column and other column values

Right now I have
grep "\sinstalled" combined_dpkg.log | awk -F ' ' '{print $5}' | sort | uniq -c | sort -rn
grep "\sinstalled" combined_dpkg.log | sort -k1 | awk '!a[$5]++' | cut -d " " -f1,5,6
And would like to combine the two into one query that includes the count of $5 together with fields 1, 5 and 6.
Is there a way to do that, or a way to retain values so they can still be output after the final pipe?
The head -3 result of the first bash command above:
11 man-db:amd64
10 libc-bin:amd64
9 mime-support:all
And of the second bash command:
2015-11-10 linux-headers-4.2.0-18-generic:amd64 4.2.0-18.22
2015-11-10 linux-headers-4.2.0-18:all 4.2.0-18.22
2015-11-10 linux-signed-image-4.2.0-18-generic:amd64 4.2.0-18.22
File format looks like:
2015-11-05 13:23:53 upgrade firefox:amd64 41.0.2+build2-0ubuntu1 42.0+build2-0ubuntu0.15.10.1
2015-11-05 13:23:53 status half-configured firefox:amd64 41.0.2+build2-0ubuntu1
2015-11-05 13:23:53 status unpacked firefox:amd64 41.0.2+build2-0ubuntu1
2015-11-05 13:23:53 status half-installed firefox:amd64 41.0.2+build2-0ubuntu1
grep "\sinstalled" combined_dpkg.log | sort -k1 | awk '!a[$5]' | cut -d " " -f1,5,6 | uniq -c
Based on your comment: "For each package find the earliest (first) version ever installed. Print the package name, the version and the total number of times it was installed."
I guess this awk would do.
awk '$0!~/ installed/{next} !($5 in a){a[$5]=$1 FS $5 FS $6; count[$5]++; next} count[$5]>0 && a[$5]~$6{count[$5]++} END{for (i in a) print a[i],count[i]}' file
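For readability, the same one-liner can be laid out as a multi-line script with comments (a sketch of the logic above, not a behavioural change; note that $6, the version string, is used as a regex here just as in the original):
awk '
$0 !~ / installed/ {next}                    # skip lines that are not " installed" entries
!($5 in a) {                                 # first time this package ($5) is seen:
    a[$5] = $1 FS $5 FS $6                   #   remember date, package and version
    count[$5]++
    next
}
count[$5] > 0 && a[$5] ~ $6 {count[$5]++}    # another install matching that first version
END {for (i in a) print a[i], count[i]}
' file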

find maximum number in row string and show two columns - linux

I want to find the maximum number in the strings inside a file; I already have a script to get the maximum number. The data looks like this:
counters_2016080822.log:2016-08-08 15:55:00,10.26.x.x,SERVER#10.26.x.x,SSCM_VRC/sscm-vrc-flow-20160602,,transactions.tps,13
counters_2016080823.log:2016-08-08 23:00:00,10.26.x.x,SERVER#10.26.x.x,SSCM_VRC/sscm-vrc-flow-20160602,,transactions.tps,14
counters_2016080823.log:2016-08-08 23:05:00,10.26.x.x,SERVER#10.26.x.1x,SSCM_VRC/sscm-vrc-flow-20160602,,transactions.tps,19
It works by first putting the last column (which is the number) into a new .txt file using sed:
sed 's/^.*tps,//'
13
14
19
and then sorting and taking the first row:
grep -Eo '[0-9]+' myfile.txt | sort -rn | head -n 1
19
But now I want to find the maximum number and print it together with its time (date & time, or just the time), as below:
23:05:00 19
Maybe something like
echo "counters_2016080822.log:2016-08-08 15:55:00,10.26.x.x,SERVER#10.26.x.x,SSCM_VRC/sscm-vrc-flow-20160602,,transactions.tps,13
counters_2016080823.log:2016-08-08 23:00:00,10.26.x.x,SERVER#10.26.x.x,SSCM_VRC/sscm-vrc-flow-20160602,,transactions.tps,14
counters_2016080823.log:2016-08-08 23:05:00,10.26.x.x,SERVER#10.26.x.1x,SSCM_VRC/sscm-vrc-flow-20160602,,transactions.tps,19" | \
sed -r 's/^.*:([0-9]{4}-[0-9]{2}-[0-9]{2} [0-9]{2}:[0-9]{2}:[0-9]{2}),.*,([0-9]+)$/\1 \2/' | \
sort -n -k 3 -t ' ' | tail -n 1
You can use awk alone.
Time and max:
awk -F, '$NF > max {max=$NF; time=$1}; END{ print substr(time,(length(time)-7))" "max}' myfile.txt
Date, time and max:
awk -F, '$NF > max {max=$NF; time=$1}; END{ print substr(time,(length(time)-18))" "max}' myfile.txt
-F : the input field separator variable
NF : the total number of fields in a record
Or with awk and cut.
This is time and max:
awk -F, '$NF > max {max=$NF; time=$1}; END{ print time" "max}' myfile.txt | cut -d' ' -f2,3
This is date, time and max:
awk -F, '$NF > max {max=$NF; time=$1}; END{ print time" "max}' myfile.txt | cut -d: -f2-
Here is another solution:
$ awk -F'[ ,]' '{print $2,$NF}' file | sort -k2nr | head -1
23:05:00 19
$ awk -F'[ ,]' 'NR==1{m=$NF} $NF>=m{m=$NF; t=$2} END{print t, m}' file
23:05:00 19
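If the date should be kept as well, a variant of the last one-liner could strip the filename prefix from the first field (a sketch, assuming the log-file name contains no spaces or commas):
$ awk -F'[ ,]' '$NF+0 > m {m=$NF; d=$1; t=$2} END{sub(/^[^:]*:/, "", d); print d, t, m}' file
2016-08-08 23:05:00 19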

How to count number of rows per distinct row in Linux bash

I have a file like this:
id|domain
9930|googspf.biz
9930|googspf.biz
9930|googspf.biz
9931|googspf.biz
9931|googspf.biz
9931|googspf.biz
9931|googspf.biz
9931|googspf.biz
9942|googspf.biz
And I would like to count the number of times a distinct id shows up in my data like below:
9930|3
9931|5
9942|1
How can I do that in bash? Currently I am using the following, but it counts all lines:
cat filename | grep 'googspf.biz'| sort -t'|' -k1,1 | wc
Can anybody help?
Try this:
awk -F'|' '
/googspf.biz/{a[$1]++}
END{for (i in a) {print i, a[i]}}
' OFS='|' file
or
awk '
BEGIN {FS=OFS="|"}
/googspf.biz/{a[$1]++}
END{for (i in a) {print i, a[i]}}
' file
sed 1d file | cut -d'|' -f1 | sort | uniq -c
I first thought of using uniq -c (-c is for count) since your data seems to be sorted:
~$ grep "googspf.biz" f | cut -d'|' -f1|uniq -c
3 9930
5 9931
1 9942
And in order to format accordingly, I had to use awk:
~$ grep "googspf.biz" f | cut -d'|' -f1|uniq -c|awk '{print $2"|"$1}'
9930|3
9931|5
9942|1
But then, with awk only:
~$ awk -F'|' '/googspf/{a[$1]++}END{for (i in a){print i"|"a[i]}}' f
9930|3
9931|5
9942|1
-F'|' sets | as the delimiter; if the line matches googspf (or, with NR>1, if the line number is greater than 1, i.e. it is not the header), the counter for the first field is incremented. At the end the counts are printed in the requested format.
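For reference, the NR>1 variant mentioned above counts every id after the header line, regardless of domain (a sketch based on the sample file):
awk -F'|' 'NR>1{a[$1]++} END{for (i in a) print i "|" a[i]}' f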

Linux bc command total counts

Here is the output of my netstat command. I want to total the numbers in the first field, like 7+8+1+1+1+1+3+1+2... and so on. How do I use bc or any other command to add them up?
[root#example httpd]# netstat -natp | grep 7143 | grep EST | awk -F' ' '{print $5}' | awk -F: '{print $1}' | sort -nr | uniq -c
7 209.139.35.xxx
8 209.139.35.xxx
1 209.139.35.xxx
1 209.139.35.xxx
1 208.46.149.xxx
3 96.17.177.xxx
1 96.17.177.xxx
2 96.17.177.xxx
You need to get the first column with awk (You don't actually need this, but I'm leaving it as a monument to my eternal shame)
awk {'print $1'}
and then use awk again to sum the column of numbers and print the result
awk '{ sum+=$1} END {print sum}'
All together:
netstat -natp | grep 7143 | grep EST | awk -F' ' '{print $5}' | awk -F: '{print $1}' | sort -nr | uniq -c | awk {'print $1'} | awk '{ sum+=$1} END {print sum}'
I know this doesn't use bc, but it gets the job done, so hopefully that's enough.
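If bc is really wanted, one option (a sketch, not part of the original answer) is to join the counts with + using paste and hand the expression to bc:
netstat -natp | grep 7143 | grep EST | awk -F' ' '{print $5}' | awk -F: '{print $1}' | sort -nr | uniq -c | awk '{print $1}' | paste -sd+ - | bc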
