How to get ","-separated output using the awk command in Linux

I am trying to print the output of an awk command with the fields delimited by ",".
I am also trying to get the same output using cut.
cat File1
dot|is-big|a
dot|is-round|a
dot|is-gray|b
cat|is-big|a
hot|in-summer|a
dot|is-big|a
dot|is-round|b
dot|is-gray|a
cat|is-big|a
hot|in-summer|a
Command tried:
$ awk 'BEGIN{FS="|"; OFS=","} {print $1,$3}' file1.csv | sort | uniq -c
Output Got:
2 cat,a
4 dot,a
2 dot,b
2 hot,a
Desired Output:
2,cat,a
4,dot,a
2,dot,b
2,hot,a
A couple of other commands tried:
$ cat file1.csv | cut --output-delimiter="|" -d'|' -f1,3 | sort | uniq -c

You need to change the delimiter to "," after running uniq -c, since uniq -c prepends the count as a new, space-separated first column. The $1=$1 in the second awk below forces awk to rebuild the line with the new OFS.
awk -F'|' '{print $1, $3}' file1.csv | sort | uniq -c | awk 'BEGIN{OFS=","} {$1=$1;print}'
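Another option is to keep the original command from the question as-is and let sed rewrite just the count column that uniq -c prepends (a sketch, assuming a sed that supports -E for extended regular expressions):
awk 'BEGIN{FS="|"; OFS=","} {print $1,$3}' file1.csv | sort | uniq -c | sed -E 's/^ *([0-9]+) /\1,/'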
But you don't need sort | uniq -c at all if you're using awk; it can do the counting itself.
awk 'BEGIN{FS="|";OFS=","} {a[$1 OFS $3]++} END{for(k in a) print a[k], k}' file1.csv
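One caveat: for (k in a) visits the array in an unspecified order, so if you want the rows ordered like the desired output above, pipe the result through sort (a sketch):
awk 'BEGIN{FS="|";OFS=","} {a[$1 OFS $3]++} END{for(k in a) print a[k], k}' file1.csv | sort -t, -k2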

Related

Awk or cut: how to output the count of one unique column along with other column values

Right now I have
grep "\sinstalled" combined_dpkg.log | awk -F ' ' '{print $5}' | sort | uniq -c | sort -rn
grep "\sinstalled" combined_dpkg.log | sort -k1 | awk '!a[$5]++' | cut -d " " -f1,5,6
And I would like to combine the two into one query that includes the count of $5 alongside fields 1, 5 and 6.
Is there a way to do so, or a way to retain values so they can still be output after the final pipe?
The head -3 result of the first bash command above:
11 man-db:amd64
10 libc-bin:amd64
9 mime-support:all
And of the second bash command:
2015-11-10 linux-headers-4.2.0-18-generic:amd64 4.2.0-18.22
2015-11-10 linux-headers-4.2.0-18:all 4.2.0-18.22
2015-11-10 linux-signed-image-4.2.0-18-generic:amd64 4.2.0-18.22
File format looks like:
2015-11-05 13:23:53 upgrade firefox:amd64 41.0.2+build2-0ubuntu1 42.0+build2-0ubuntu0.15.10.1
2015-11-05 13:23:53 status half-configured firefox:amd64 41.0.2+build2-0ubuntu1
2015-11-05 13:23:53 status unpacked firefox:amd64 41.0.2+build2-0ubuntu1
2015-11-05 13:23:53 status half-installed firefox:amd64 41.0.2+build2-0ubuntu1
grep "\sinstalled" combined_dpkg.log | sort -k1 | awk '!a[$5]' | cut -d " " -f1,5,6 | uniq -c
Based on your comment: "For each package find the earliest (first) version ever installed. Print the package name, the version and the total number of times it was installed."
I guess this awk would do.
awk '$0!~/ installed/{next} !($5 in a){a[$5]=$1 FS $5 FS $6; count[$5]++; next} count[$5]>0 && a[$5]~$6{count[$5]++} END{for (i in a) print a[i],count[i]}' file
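The same logic spelled out over several lines with comments, in case the one-liner is hard to follow (note that a[$5] ~ $6 treats the version string as a regular expression, which works for the sample data but could misfire on versions containing regex metacharacters):
awk '
$0 !~ / installed/ {next}                  # only look at "installed" events
!($5 in a) {                               # first time this package is seen
  a[$5] = $1 FS $5 FS $6                   # remember date, package and first version
  count[$5]++
  next
}
count[$5] > 0 && a[$5] ~ $6 {count[$5]++}  # the first version was installed again
END {for (i in a) print a[i], count[i]}
' file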

Get the sum of a particular column using bash?

I am trying to get the sum of the 5th column of a .csv file using bash, but the command I am using keeps returning zero. I am piping the file through grep to remove the column header row:
grep -v Header results.csv | awk '{sum += $5} END {print sum}'
Here's how I would do it:
tail -n+2 results.csv | cut -d, -f5 | awk '{sum+=$1} END {print sum}'
or:
tail -n+2 results.csv | awk -F, '{sum+=$5} END {print sum}'
(depending on what turns out to be faster.)
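For the record, the original command most likely printed zero because awk splits fields on whitespace by default, so $5 is empty on a comma-separated line. With the field separator set and the header skipped inside awk, a single process is enough (a sketch, assuming results.csv has no quoted fields containing commas):
awk -F, 'NR > 1 {sum += $5} END {print sum}' results.csv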

How to count the number of rows per distinct id in Linux bash

I have a file like this:
id|domain
9930|googspf.biz
9930|googspf.biz
9930|googspf.biz
9931|googspf.biz
9931|googspf.biz
9931|googspf.biz
9931|googspf.biz
9931|googspf.biz
9942|googspf.biz
And I would like to count the number of times each distinct id shows up in my data, like below:
9930|3
9931|5
9942|1
How can I do that in bash on Linux? Currently I am using the command below, but it counts all lines together:
cat filename | grep 'googspf.biz'| sort -t'|' -k1,1 | wc
Can anybody help?
Try this:
awk -F'|' '
/googspf.biz/{a[$1]++}
END{for (i in a) {print i, a[i]}}
' OFS='|' file
or
awk '
BEGIN {FS=OFS="|"}
/googspf.biz/{a[$1]++}
END{for (i in a) {print i, a[i]}}
' file
sed 1d file | cut -d'|' -f1 | sort | uniq -c
I first thought of using uniq -c (-c is for count) since your data seems to be sorted:
~$ grep "googspf.biz" f | cut -d'|' -f1|uniq -c
3 9930
5 9931
1 9942
And in order to format accordingly, I had to use awk:
~$ grep "googspf.biz" f | cut -d'|' -f1|uniq -c|awk '{print $2"|"$1}'
9930|3
9931|5
9942|1
But then, with awk only:
~$ awk -F'|' '/googspf/{a[$1]++}END{for (i in a){print i"|"a[i]}}' f
9930|3
9931|5
9942|1
-F'|' sets | as the field delimiter; if the line matches googspf (or, alternatively, NR>1: if the line number is greater than 1, which skips the header), the counter keyed on the first field is incremented. The END block then prints each id with its count in the requested format.
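Since NR>1 is mentioned as an alternative to the pattern match, here is that variant; it does not hard-code googspf.biz, so it keeps working if other domains appear (a sketch):
awk -F'|' 'NR>1{a[$1]++} END{for (i in a) print i"|"a[i]}' file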

How to get the name of the person with the maximum age using UNIX commands

I would like to get the name of the person with the maximum age from a Unix data file. How can I do this?
Rob,20
Tom,30
I tried the command below, but it gives me only the max age.
awk -F"," '{print $2}' age.txt | sort -r | head -1
$ cat file | awk -F, '{print $2,$1;}' | sort -n | tail -n1
30 Tom
$ cat file | awk -F, '{print $2,$1;}' | sort -n | tail -n1 | awk '{print $2;}'
Tom
Try perhaps
awk -F, '{if (maxage<$2) { maxage= $2; name=$1; };} END{print name}' \
age.txt
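On the two-line age.txt from the question this prints the following (maxage starts out uninitialized, which awk treats as 0 in the numeric comparison, so any non-negative age beats it):
$ awk -F, '{if (maxage<$2) { maxage= $2; name=$1; };} END{print name}' age.txt
Tom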
traditional:
sort -t, -nr +1 age.txt | head -1 | cut -d, -f1
POSIXy:
sort -t, -k2,2nr age.txt | head -n 1 | cut -d, -f1
I think you can easily do this using the command below (note the -n so ages sort numerically rather than as strings):
echo -e "Rob,20\nTom,30\nMin,10\nMax,50" | sort -t ',' -nrk 2 | head -n 1
Please comment in case of any issues.

Linux bc command total counts

Here is the output of my netstat command. I want to add up the numbers in the first field (7+8+1+1+... and so on). How do I use bc or any other command to get the total?
[root@example httpd]# netstat -natp | grep 7143 | grep EST | awk -F' ' '{print $5}' | awk -F: '{print $1}' | sort -nr | uniq -c
7 209.139.35.xxx
8 209.139.35.xxx
1 209.139.35.xxx
1 209.139.35.xxx
1 208.46.149.xxx
3 96.17.177.xxx
1 96.17.177.xxx
2 96.17.177.xxx
You need to get the first column with awk (you don't actually need this step, but I'm leaving it as a monument to my eternal shame):
awk '{print $1}'
and then use awk again to sum the column of numbers and print the result
awk '{ sum+=$1} END {print sum}'
All together:
netstat -natp | grep 7143 | grep EST | awk -F' ' '{print $5}' | awk -F: '{print $1}' | sort -nr | uniq -c | awk '{print $1}' | awk '{ sum+=$1} END {print sum}'
I know this doesn't use bc, but it gets the job done, so hopefully that's enough.
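If you specifically want bc, a common pattern is to join the counts with + and let bc evaluate the resulting expression (a sketch; paste -sd+ - reads the counts from stdin and joins them with plus signs):
netstat -natp | grep 7143 | grep EST | awk -F' ' '{print $5}' | awk -F: '{print $1}' | sort -nr | uniq -c | awk '{print $1}' | paste -sd+ - | bc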
