Get the sum of a particular column using bash? - linux

I am trying to get the sum of the 5th column of a .csv file using bash, but the command I am using keeps returning zero. I am piping the file through grep to remove the column header row:
grep -v Header results.csv | awk '{sum += $5} END {print sum}'

Your awk call splits on whitespace by default, so $5 is not the CSV column you expect, which is why the sum comes out as zero. Here's how I would do it:
tail -n+2 results.csv | cut -d, -f5 | awk '{sum+=$1} END {print sum}'
or:
tail -n+2 results.csv | awk -F, '{sum+=$5} END {print sum}'
(depending on which turns out to be faster).
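Alternatively, keeping the original grep approach, the only change needed is telling awk to split on commas (a minimal sketch reusing the file and header name from the question):
grep -v Header results.csv | awk -F, '{sum += $5} END {print sum}'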

Related

How to get "," separated output using awk command in linux

I am trying to print the output of an awk command with "," as the delimiter, and to get the same output using cut.
cat File1
dot|is-big|a
dot|is-round|a
dot|is-gray|b
cat|is-big|a
hot|in-summer|a
dot|is-big|a
dot|is-round|b
dot|is-gray|a
cat|is-big|a
hot|in-summer|a
Command tried:
$awk 'BEGIN{FS="|"; OFS=","} {print $1,$3}' file1.csv | sort | uniq -c
Output Got:
2 cat,a
4 dot,a
2 dot,b
2 hot,a
Desired Output:
2,cat,a
4,dot,a
2,dot,b
2,hot,a
A couple of other commands tried:
$cat file1.csv |cut --output-delimiter="|" -d'|' -f1,3 | sort | uniq -c
You need to change the delimiter to , after running uniq -c, since uniq -c prepends the count as a new first column. Re-joining the fields with awk and OFS="," works because the $1=$1 assignment forces the record to be rebuilt with the new separator:
awk -F'|' '{print $1, $3}' file1.csv | sort | uniq -c | awk 'BEGIN{OFS=","} {$1=$1;print}'
But you don't need sort | uniq -c at all if you're using awk; it can do the counting itself:
awk 'BEGIN{FS="|";OFS=","} {a[$1 OFS $3]++} END{for(k in a) print a[k], k}' file1.csv
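Note that for (k in a) iterates in no guaranteed order; if the output order matters, the same command can be piped through sort:
awk 'BEGIN{FS="|";OFS=","} {a[$1 OFS $3]++} END{for(k in a) print a[k], k}' file1.csv | sort -t, -k2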

Awk or cut how to output the count of one unique column and other column values

Right now I have
grep "\sinstalled" combined_dpkg.log | awk -F ' ' '{print $5}' | sort | uniq -c | sort -rn
grep "\sinstalled" combined_dpkg.log | sort -k1 | awk '!a[$5]++' | cut -d " " -f1,5,6
And would like to combine the two into one query that includes the count of $5 together with fields 1, 5 and 6.
Is there a way to do that, or a way to retain values so they can still be output after the final pipe?
The head -3 result of the first bash command above:
11 man-db:amd64
10 libc-bin:amd64
9 mime-support:all
And of the second bash command:
2015-11-10 linux-headers-4.2.0-18-generic:amd64 4.2.0-18.22
2015-11-10 linux-headers-4.2.0-18:all 4.2.0-18.22
2015-11-10 linux-signed-image-4.2.0-18-generic:amd64 4.2.0-18.22
File format looks like:
2015-11-05 13:23:53 upgrade firefox:amd64 41.0.2+build2-0ubuntu1 42.0+build2-0ubuntu0.15.10.1
2015-11-05 13:23:53 status half-configured firefox:amd64 41.0.2+build2-0ubuntu1
2015-11-05 13:23:53 status unpacked firefox:amd64 41.0.2+build2-0ubuntu1
2015-11-05 13:23:53 status half-installed firefox:amd64 41.0.2+build2-0ubuntu1
grep "\sinstalled" combined_dpkg.log | sort -k1 | awk '!a[$5]' | cut -d " " -f1,5,6 | uniq -c
Based on your comment: "For each package find the earliest (first) version ever installed. Print the package name, the version and the total number of times it was installed."
I guess this awk would do.
awk '$0!~/ installed/{next} !($5 in a){a[$5]=$1 FS $5 FS $6; count[$5]++; next} count[$5]>0 && a[$5]~$6{count[$5]++} END{for (i in a) print a[i],count[i]}' file
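For readability, the same one-liner can be laid out as a multi-line script with comments (a sketch of the logic above, not a behavioural change; note that $6, the version string, is used as a regex here just as in the original):
awk '
$0 !~ / installed/ {next}                    # skip lines that are not " installed" entries
!($5 in a) {                                 # first time this package ($5) is seen:
    a[$5] = $1 FS $5 FS $6                   #   remember date, package and version
    count[$5]++
    next
}
count[$5] > 0 && a[$5] ~ $6 {count[$5]++}    # another install matching that first version
END {for (i in a) print a[i], count[i]}
' file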

find maximum number in row string and show two columns - linux

I want to find the maximum number in the strings inside a file; I already have a script to get the maximum number. The data looks like this:
counters_2016080822.log:2016-08-08 15:55:00,10.26.x.x,SERVER#10.26.x.x,SSCM_VRC/sscm-vrc-flow-20160602,,transactions.tps,13
counters_2016080823.log:2016-08-08 23:00:00,10.26.x.x,SERVER#10.26.x.x,SSCM_VRC/sscm-vrc-flow-20160602,,transactions.tps,14
counters_2016080823.log:2016-08-08 23:05:00,10.26.x.x,SERVER#10.26.x.1x,SSCM_VRC/sscm-vrc-flow-20160602,,transactions.tps,19
It works by first putting the last column (which is the number) into a new .txt file using sed:
sed 's/^.*tps,//'
13
14
19
and then sorting and taking the first row:
grep -Eo '[0-9]+' myfile.txt | sort -rn | head -n 1
19
But now I want to find the maximum number and print it together with its time (date & time, or just the time), as below:
23:05:00 19
Maybe something like
echo "counters_2016080822.log:2016-08-08 15:55:00,10.26.x.x,SERVER#10.26.x.x,SSCM_VRC/sscm-vrc-flow-20160602,,transactions.tps,13
counters_2016080823.log:2016-08-08 23:00:00,10.26.x.x,SERVER#10.26.x.x,SSCM_VRC/sscm-vrc-flow-20160602,,transactions.tps,14
counters_2016080823.log:2016-08-08 23:05:00,10.26.x.x,SERVER#10.26.x.1x,SSCM_VRC/sscm-vrc-flow-20160602,,transactions.tps,19" | \
sed -r 's/^.*:([0-9]{4}-[0-9]{2}-[0-9]{2} [0-9]{2}:[0-9]{2}:[0-9]{2}),.*,([0-9]+)$/\1 \2/' | \
sort -n -k 3 -t ' ' | tail -n 1
You can use awk alone.
Time and max:
awk -F, '$NF > max {max=$NF; time=$1}; END{ print substr(time,(length(time)-7))" "max}' myfile.txt
Date, time and max:
awk -F, '$NF > max {max=$NF; time=$1}; END{ print substr(time,(length(time)-18))" "max}' myfile.txt
-F : the input field separator variable
NF : the total number of fields in a record
Or with awk and cut.
This is time and max:
awk -F, '$NF > max {max=$NF; time=$1}; END{ print time" "max}' myfile.txt | cut -d' ' -f2,3
This is date, time and max:
awk -F, '$NF > max {max=$NF; time=$1}; END{ print time" "max}' myfile.txt | cut -d: -f2-
Here is another solution:
$ awk -F'[ ,]' '{print $2,$NF}' file | sort -k2nr | head -1
23:05:00 19
$ awk -F'[ ,]' 'NR==1{m=$NF} $NF>=m{m=$NF; t=$2} END{print t, m}' file
23:05:00 19
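If the date should be kept as well, a variant of the last one-liner could strip the filename prefix from the first field (a sketch, assuming the log-file name contains no spaces or commas):
$ awk -F'[ ,]' '$NF+0 > m {m=$NF; d=$1; t=$2} END{sub(/^[^:]*:/, "", d); print d, t, m}' file
2016-08-08 23:05:00 19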

How to count number of rows per distinct row in Linux bash

I have a file like this:
id|domain
9930|googspf.biz
9930|googspf.biz
9930|googspf.biz
9931|googspf.biz
9931|googspf.biz
9931|googspf.biz
9931|googspf.biz
9931|googspf.biz
9942|googspf.biz
And I would like to count the number of times a distinct id shows up in my data like below:
9930|3
9931|5
9942|1
How can I do that in bash? Currently I am using the following, but it counts all lines:
cat filename | grep 'googspf.biz'| sort -t'|' -k1,1 | wc
Can anybody help?
Try this:
awk -F'|' '
/googspf.biz/{a[$1]++}
END{for (i in a) {print i, a[i]}}
' OFS='|' file
or
awk '
BEGIN {FS=OFS="|"}
/googspf.biz/{a[$1]++}
END{for (i in a) {print i, a[i]}}
' file
sed 1d file | cut -d'|' -f1 | sort | uniq -c
I first thought of using uniq -c (-c is for count) since your data seems to be sorted:
~$ grep "googspf.biz" f | cut -d'|' -f1|uniq -c
3 9930
5 9931
1 9942
And in order to format accordingly, I had to use awk:
~$ grep "googspf.biz" f | cut -d'|' -f1|uniq -c|awk '{print $2"|"$1}'
9930|3
9931|5
9942|1
But then, with awk only:
~$ awk -F'|' '/googspf/{a[$1]++}END{for (i in a){print i"|"a[i]}}' f
9930|3
9931|5
9942|1
-F'|' sets | as the delimiter; if the line matches googspf (or, with NR>1, if the line number is greater than 1, i.e. it is not the header), the counter for the first field is incremented. At the end the counts are printed in the requested format.
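For reference, the NR>1 variant mentioned above counts every id after the header line, regardless of domain (a sketch based on the sample file):
awk -F'|' 'NR>1{a[$1]++} END{for (i in a) print i "|" a[i]}' f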

Linux bc command total counts

Here is the output of my netstat command. I want to total the numbers in the first field, like 7+8+1+1+1+1+3+1+2... and so on. How do I use bc or any other command to add them up?
[root#example httpd]# netstat -natp | grep 7143 | grep EST | awk -F' ' '{print $5}' | awk -F: '{print $1}' | sort -nr | uniq -c
7 209.139.35.xxx
8 209.139.35.xxx
1 209.139.35.xxx
1 209.139.35.xxx
1 208.46.149.xxx
3 96.17.177.xxx
1 96.17.177.xxx
2 96.17.177.xxx
You need to get the first column with awk (You don't actually need this, but I'm leaving it as a monument to my eternal shame)
awk {'print $1'}
and then use awk again to sum the column of numbers and print the result
awk '{ sum+=$1} END {print sum}'
All together:
netstat -natp | grep 7143 | grep EST | awk -F' ' '{print $5}' | awk -F: '{print $1}' | sort -nr | uniq -c | awk {'print $1'} | awk '{ sum+=$1} END {print sum}'
I know this doesn't use bc, but it gets the job done, so hopefully that's enough.
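If bc is really wanted, one option (a sketch, not part of the original answer) is to join the counts with + using paste and hand the expression to bc:
netstat -natp | grep 7143 | grep EST | awk -F' ' '{print $5}' | awk -F: '{print $1}' | sort -nr | uniq -c | awk '{print $1}' | paste -sd+ - | bc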
