I have a file like this:
id|domain
9930|googspf.biz
9930|googspf.biz
9930|googspf.biz
9931|googspf.biz
9931|googspf.biz
9931|googspf.biz
9931|googspf.biz
9931|googspf.biz
9942|googspf.biz
And I would like to count the number of times a distinct id shows up in my data like below:
9930|3
9931|5
9942|1
How can I do that with Bash on Linux? Currently I am using this, but it counts all lines instead of the occurrences per id:
cat filename | grep 'googspf.biz'| sort -t'|' -k1,1 | wc
Can anybody help?
Try this:
awk -F'|' '
/googspf.biz/{a[$1]++}
END{for (i in a) {print i, a[i]}}
' OFS='|' file
or
awk '
BEGIN {FS=OFS="|"}
/googspf.biz/{a[$1]++}
END{for (i in a) {print i, a[i]}}
' file
sed 1d file | cut -d'|' -f1 | sort | uniq -c
I first thought of using uniq -c (-c is for count) since your data seems to be sorted:
~$ grep "googspf.biz" f | cut -d'|' -f1|uniq -c
3 9930
5 9931
1 9942
And in order to format accordingly, I had to use awk:
~$ grep "googspf.biz" f | cut -d'|' -f1|uniq -c|awk '{print $2"|"$1}'
9930|3
9931|5
9942|1
But then, with awk only:
~$ awk -F'|' '/googspf/{a[$1]++}END{for (i in a){print i"|"a[i]}}' f
9930|3
9931|5
9942|1
-F'|' tells awk to use | as the field delimiter; if a line matches googspf (or use NR>1, i.e. every line after the header), the counter for its first field is incremented. At the end, the counts are printed accordingly.
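For reference, here is the same counting script spelled out with comments and run against a small inline sample (a sketch; the sample file name is made up):

```shell
# Recreate a small sample of the input (header plus id|domain rows).
printf 'id|domain\n9930|googspf.biz\n9930|googspf.biz\n9930|googspf.biz\n9931|googspf.biz\n9942|googspf.biz\n' > sample.txt

# Count how many times each id appears on matching lines.
out=$(awk -F'|' '
    /googspf.biz/ { a[$1]++ }              # bump the counter for field 1
    END { for (i in a) print i "|" a[i] }  # emit id|count pairs
' sample.txt | sort)

echo "$out"
```

The trailing sort is there because awk's for (i in a) iterates in an unspecified order.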
Related
I am trying to print the output of an awk command delimited with ",", and also trying to get the same output using cut.
cat File1
dot|is-big|a
dot|is-round|a
dot|is-gray|b
cat|is-big|a
hot|in-summer|a
dot|is-big|a
dot|is-round|b
dot|is-gray|a
cat|is-big|a
hot|in-summer|a
Command tried:
$awk 'BEGIN{FS="|"; OFS=","} {print $1,$3}' file1.csv | sort | uniq -c
Output Got:
2 cat,a
4 dot,a
2 dot,b
2 hot,a
Desired Output:
2,cat,a
4,dot,a
2,dot,b
2,hot,a
A couple of other commands tried:
$cat file1.csv |cut --output-delimiter="|" -d'|' -f1,3 | sort | uniq -c
You need to change the delimiter to , after running uniq -c, since uniq -c prepends the count as a new first column.
awk -F'|' '{print $1, $3}' file1.csv | sort | uniq -c | awk 'BEGIN{OFS=","} {$1=$1;print}'
But you don't need to use sort | uniq -c if you're using awk, it can do the counting itself.
awk 'BEGIN{FS="|";OFS=","} {a[$1 OFS $3]++} END{for(k in a) print a[k], k}' file1.csv
Right now I have
grep "\sinstalled" combined_dpkg.log | awk -F ' ' '{print $5}' | sort | uniq -c | sort -rn
grep "\sinstalled" combined_dpkg.log | sort -k1 | awk '!a[$5]++' | cut -d " " -f1,5,6
and I would like to combine the two into one query that includes the count of $5 alongside the fields selected with -f1,5,6, if there is a way to do so, or a way to retain values to be output after the final pipe.
The head -3 result of the first bash command above:
11 man-db:amd64
10 libc-bin:amd64
9 mime-support:all
And of the second bash command:
2015-11-10 linux-headers-4.2.0-18-generic:amd64 4.2.0-18.22
2015-11-10 linux-headers-4.2.0-18:all 4.2.0-18.22
2015-11-10 linux-signed-image-4.2.0-18-generic:amd64 4.2.0-18.22
File format looks like:
2015-11-05 13:23:53 upgrade firefox:amd64 41.0.2+build2-0ubuntu1 42.0+build2-0ubuntu0.15.10.1
2015-11-05 13:23:53 status half-configured firefox:amd64 41.0.2+build2-0ubuntu1
2015-11-05 13:23:53 status unpacked firefox:amd64 41.0.2+build2-0ubuntu1
2015-11-05 13:23:53 status half-installed firefox:amd64 41.0.2+build2-0ubuntu1
grep "\sinstalled" combined_dpkg.log | sort -k1 | awk '!a[$5]' | cut -d " " -f1,5,6 | uniq -c
Based on your comment : "For each package find the earliest (first) version ever installed. Print the package name, the version and the total number of times it was installed."
I guess this awk script would do:
awk '
    $0 !~ / installed/ { next }     # only look at "installed" lines
    !($5 in a) {                    # first install of this package:
        a[$5] = $1 FS $5 FS $6      # remember date, package and version
        count[$5]++
        next
    }
    count[$5] > 0 && a[$5] ~ $6 {   # later install of the same version
        count[$5]++                 # (note: $6 is matched as a regex here)
    }
    END { for (i in a) print a[i], count[i] }
' file
I want to find the maximum number among the strings inside a file. I already have a script to get the maximum number:
counters_2016080822.log:2016-08-08 15:55:00,10.26.x.x,SERVER#10.26.x.x,SSCM_VRC/sscm-vrc-flow-20160602,,transactions.tps,13
counters_2016080823.log:2016-08-08 23:00:00,10.26.x.x,SERVER#10.26.x.x,SSCM_VRC/sscm-vrc-flow-20160602,,transactions.tps,14
counters_2016080823.log:2016-08-08 23:05:00,10.26.x.x,SERVER#10.26.x.1x,SSCM_VRC/sscm-vrc-flow-20160602,,transactions.tps,19
first by extracting the last column (the number) into a new .txt file using sed:
sed 's/^.*tps,//'
13
14
19
then sorting and taking the first row:
grep -Eo '[0-9]+' myfile.txt | sort -rn | head -n 1
19
But now I want to find the maximum and also get its time (date & time, or just the time), like below:
23:05:00 19
Maybe something like
echo "counters_2016080822.log:2016-08-08 15:55:00,10.26.x.x,SERVER#10.26.x.x,SSCM_VRC/sscm-vrc-flow-20160602,,transactions.tps,13
counters_2016080823.log:2016-08-08 23:00:00,10.26.x.x,SERVER#10.26.x.x,SSCM_VRC/sscm-vrc-flow-20160602,,transactions.tps,14
counters_2016080823.log:2016-08-08 23:05:00,10.26.x.x,SERVER#10.26.x.1x,SSCM_VRC/sscm-vrc-flow-20160602,,transactions.tps,19" | \
sed -r 's/^.*:([0-9]{4}-[0-9]{2}-[0-9]{2} [0-9]{2}:[0-9]{2}:[0-9]{2}),.*,([0-9]+)$/\1 \2/' | \
sort -n -k 3 -t ' ' | tail -n 1
You can use awk alone.
Time and max:
awk -F, '$NF > max {max=$NF; time=$1}; END{ print substr(time,(length(time)-7))" "max}' myfile.txt
Date, time and max:
awk -F, '$NF > max {max=$NF; time=$1}; END{ print substr(time,(length(time)-18))" "max}' myfile.txt
-F sets the input field separator; NF is the total number of fields in a record, so $NF is the last field (the number we want).
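A quick way to see what those fields look like on one of the sample lines (a sketch using a line from the question):

```shell
# One sample log line; split on commas, NF is the field count
# and $NF is the last field (the tps number).
line='counters_2016080823.log:2016-08-08 23:05:00,10.26.x.x,SERVER#10.26.x.1x,SSCM_VRC/sscm-vrc-flow-20160602,,transactions.tps,19'
out=$(printf '%s\n' "$line" | awk -F, '{ print NF, $NF }')
echo "$out"
```

Note the empty fifth field (the ,, in the line) still counts as a field.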
Or with awk and cut.
This is time and max:
awk -F, '$NF > max {max=$NF; time=$1}; END{ print time" "max}' myfile.txt | cut -d' ' -f2,3
This is date, time and max:
awk -F, '$NF > max {max=$NF; time=$1}; END{ print time" "max}' myfile.txt | cut -d: -f2-
Here is another solution:
$ awk -F'[ ,]' '{print $2,$NF}' file | sort -k2nr | head -1
23:05:00 19
$ awk -F'[ ,]' 'NR==1{m=$NF} $NF>=m{m=$NF; t=$2} END{print t, m}' file
23:05:00 19
I am trying to get the sum of the 5th column of a .csv file using bash, however the command I am using keeps giving me zero. I am piping the file through grep to remove the header row:
grep -v Header results.csv | awk '{sum += $5} END {print sum}'
Here's how I would do it:
tail -n +2 results.csv | cut -d, -f5 | awk '{sum+=$1} END {print sum}'
or:
tail -n +2 results.csv | awk -F, '{sum+=$5} END {print sum}'
(depending on what turns out to be faster.)
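The likely reason the original grep | awk pipeline prints zero is that awk splits on whitespace by default, so on a comma-separated line $5 is empty. Setting the field separator handles it in one step (a sketch with made-up data and a made-up file name):

```shell
# Made-up CSV: header row plus two data rows with the value in column 5.
printf 'h1,h2,h3,h4,Value\na,b,c,d,10\ne,f,g,h,32\n' > results.csv

# -F, splits on commas; NR > 1 skips the header row.
sum=$(awk -F, 'NR > 1 { s += $5 } END { print s }' results.csv)
echo "$sum"
```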
Given a .txt files with space separated words such as:
But where is Esope the holly Bastard
But where is
And the awk pipeline:
cat /pathway/to/your/file.txt | tr ' ' '\n' | sort | uniq -c | awk '{print $2"#"$1}'
I get the following output in my console :
1 Bastard
1 Esope
1 holly
1 the
2 But
2 is
2 where
How do I get this printed into myFile.txt? I actually have 300,000 lines and nearly 2 million words, so it's better to output the result to a file.
EDIT: Used answer (by #Sudo_O):
$ awk '{a[$1]++}END{for(k in a)print a[k],k}' RS=" |\n" myfile.txt | sort > myfileout.txt
Your pipeline isn't very efficient; you should do the whole thing in awk instead:
awk '{a[$1]++}END{for(k in a)print a[k],k}' RS=" |\n" file > myfile
If you want the output in sorted order:
awk '{a[$1]++}END{for(k in a)print a[k],k}' RS=" |\n" file | sort > myfile
The actual output given by your pipeline is:
$ tr ' ' '\n' < file | sort | uniq -c | awk '{print $2"#"$1}'
Bastard#1
But#2
Esope#1
holly#1
is#2
the#1
where#2
Note: using cat is useless here; we can just redirect the input with <. The awk script doesn't make sense either: it just reverses the order of the words and their frequencies and separates them with a #. If we drop the awk script, the output is closer to the desired output (notice the leading spaces, however, and that it's unsorted):
$ tr ' ' '\n' < file | sort | uniq -c
1 Bastard
2 But
1 Esope
1 holly
2 is
1 the
2 where
We could sort again and remove the leading spaces with sed:
$ tr ' ' '\n' < file | sort | uniq -c | sort | sed 's/^\s*//'
1 Bastard
1 Esope
1 holly
1 the
2 But
2 is
2 where
But as I mentioned at the start, let awk handle it:
$ awk '{a[$1]++}END{for(k in a)print a[k],k}' RS=" |\n" file | sort
1 Bastard
1 Esope
1 holly
1 the
2 But
2 is
2 where
Just redirect output to a file.
cat /pathway/to/your/file.txt | tr ' ' '\n' | sort | uniq -c | \
awk '{print $2"#"$1}' > myFile.txt
Just use shell redirection:
echo "test" > overwrite-file.txt
echo "test" >> append-to-file.txt
Tips
A useful command is tee, which allows you to redirect to a file while still seeing the output:
echo "test" | tee overwrite-file.txt
echo "test" | tee -a append-file.txt
Sorting and locale
I see you are working with an Asian script; you need to be careful with the locale used by your system, as the resulting sort might not be what you expect:
* WARNING * The locale specified by the environment affects sort order. Set LC_ALL=C to get the traditional sort order that uses native byte values.
And have a look at the output of:
locale
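For example, the C locale compares raw byte values, so all uppercase letters sort before lowercase ones, which can be surprising next to a dictionary-style locale sort (a small sketch):

```shell
# Under LC_ALL=C, sort compares bytes: "Cherry" (0x43...) comes
# before "apple" and "banana" (0x61, 0x62...).
out=$(printf 'banana\nCherry\napple\n' | LC_ALL=C sort)
echo "$out"
```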