bash: getting percentage from a frequency table - linux

I made a small bash script to get the frequency of items in a certain column of a file.
The output looks something like this:
A 30
B 25
C 20
D 15
E 10
The command I used inside the script is this:
cut -f $1 $2 | sort | uniq -c |
sort -r -k1,1 -n | awk '{printf "%-20s %-15d\n", $2, $1}'
How can I modify it to also show the relative percentage for each item, so the output would be like this?
A 30 30%
B 25 25%
C 20 20%
D 15 15%
E 10 10%

Try this (with the sort moved to the end):
cut -f $1 $2 | sort | uniq -c |
awk '{array[$2]=$1; sum+=$1}
     END { for (i in array) printf "%-20s %-15d %6.2f%%\n", i, array[i], array[i]/sum*100 }' |
sort -r -k2,2 -n
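With the sample counts above (which conveniently sum to 100), the output would look something like this:
A                    30               30.00%
B                    25               25.00%
C                    20               20.00%
D                    15               15.00%
E                    10               10.00%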

Change your awk command to something like this:
awk '{ a[++n,1] = $2; a[n,2] = $1; t += $1 }
END {
    for (i = 1; i <= n; i++)
        printf "%-20s %-15d%d%%\n", a[i,1], a[i,2], 100 * a[i,2] / t
}'

Related

How to generate a log record in a comma-separated format

Earlier I was using this command, which generates records in a space-separated format:
sudo cat path of the file |
awk -v d="$(date --date="1 days ago" +"%h %e")" '
/Accepted/ && ($0 ~ d) { print $1,$2,$9,$11}' |
sort | uniq -c
like this
xx xx xx xxxx xxx
xx xx xx xxxx xxx
Now my aim is to generate comma-separated records. For that, I'm using this:
sudo cat path of the file |
awk -v d="$(date --date="1 days ago" +"%h %e")" '
/Accepted/ && ($0 ~ d) { print $1","$2","$9","$11}' |
sort | uniq -c
but it is giving me records like this:
xx xx,xx,xxxx,xxx
xx xx,xx,xxxx,xxx
I want a comma after the first column and so on.
How can I do it? Help, please.
The problem is that uniq -c prefixes the count to the line, separating the count from the data with a space (and usually including spaces before the count, at least for counts up to 3 digits long on a Mac, 6 digits long on RHEL 7.4). So, if there is no way to change the separator (there isn't on a Mac; and uniq 8.22 on RHEL 7.4 does not include such an option), then you'll have to do it yourself.
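You can see that padding with a quick test (GNU coreutils shown here; the exact width varies by implementation):
$ printf 'a\na\nb\n' | uniq -c
      2 a
      1 b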
However, you can use awk to do the counting and formatting, leaving an optional sort for post-processing:
sudo cat /var/log/secure |
awk -v d="$(date --date="1 days ago" +"%h %e")" '
/Accepted/ && ($0 ~ d) { key = $1 "," $2 "," $9 "," $11; count[key]++ }
END { for (key in count) print count[key] "," key }'
Warning: untested code — there isn't any usable sample data in the question!
I have used this; if anyone has a better approach, do tell me:
sudo cat path of the file |
awk -v d="$(date --date="1 days ago" +"%h %e")" '
/Accepted/ && ($0 ~ d) { print $1,$2,$9,$11}' |
sort | uniq -c | sed -E -e 's/^[[:blank:]]+//g' -e 's/[[:blank:]]+/,/g'
I would suggest using sed to replace the whitespace with commas. Here is a simple echo statement that does that:
[user@hostname ~]$ echo "xx xx xx xxxx xxx" | sed 's/ /,/g'
xx,xx,xx,xxxx,xxx
Similarly, you can add another pipe at the end and run sed like this:
sudo cat path of the file |
awk -v d="$(date --date="1 days ago" +"%h %e")" '
/Accepted/ && ($0 ~ d) { print $1,$2,$9,$11}' |
sort | uniq -c | sed -E 's/^ *//; s/ +/,/g'

Linux bash scripting: Sum one column using awk for overall cpu utilization and display all fields

Problem below.
Script: I execute a ps command with pid, user, etc., and I am trying to use awk to sum the overall CPU utilization of different processes.
Command:
ps -eo pid,user,state,comm,%cpu,command --sort=-%cpu | egrep -v '(0.0)|(%CPU)' | head -n10 | awk '
{ process[$4]+=$5; }
END {
    for (i in process)
    {
        printf($1" "$2" "$3" ""%-20s %s\n", i, process[i]" "$6);
    }
}' | sort -nrk 5 | head
Awk: Sum 5th column according to the process name (4th column)
Output:
10935 zbynda S dd 93.3 /usr/libexec/gnome-terminal-server
10935 zbynda S gnome-shell 1.9 /usr/libexec/gnome-terminal-server
10935 zbynda S sublime_text 0.6 /usr/libexec/gnome-terminal-server
10935 zbynda S sssd_kcm 0.2 /usr/libexec/gnome-terminal-server
As you can see, the fourth and fifth columns are correct, but the other columns just repeat the first entry from the ps output. I should have 4 different processes, as the fourth column shows, but, for example, the last column shows the same command every time.
How do I get the other entries from the ps output (not only the first one)?
Try this
ps -eo pid,user,state,comm,%cpu,command --sort=-%cpu | egrep -v '(0.0)|(%CPU)' | head -n10 | awk '
{ process[$4]+=$5; a1[$4]=$1; a2[$4]=$2; a3[$4]=$3; a6[$4]=$6 }
END {
    for (i in process)
    {
        printf(a1[i]" "a2[i]" "a3[i]" ""%-20s %s\n", i, process[i]" "a6[i]);
    }
}' | sort -nrk 5 | head
An END rule is executed only once, after all the input has been read.
Your printf uses $6, which retains its value from the last input line; I think you want to use i instead.
Of course, $1, $2, and $3 have the same problem, so you will need to preserve the incoming values as well. Fixing this is left as an exercise for the student.
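For completeness, a sketch of one way to finish that exercise, remembering the first pid/user/state seen for each command name (same idea as the answer above; the array names seen and cmd are illustrative):
ps -eo pid,user,state,comm,%cpu,command --sort=-%cpu | egrep -v '(0.0)|(%CPU)' | head -n10 | awk '
{
    process[$4] += $5                                    # sum %cpu per command name
    if (!($4 in seen)) { seen[$4] = $1 " " $2 " " $3 }   # keep the first pid/user/state seen
    cmd[$4] = $6                                         # remember a command path for this name
}
END {
    for (i in process)
        printf("%s %-20s %s %s\n", seen[i], i, process[i], cmd[i])
}' | sort -nrk 5 | head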

Get awk result greater than X

Command:
grep "redirect=on" access_log | awk '{print $1}' | sort -n | uniq -c | sort -nr | head -3
Output is:
34 3.247.44.149
6 5.218.131.185
3 7.173.135.94
Question: how can I output only the lines where the count is greater than 10? In this case:
34 3.247.44.149
I already tried playing with $1 > 10, but $1 is the IP and not the number.
Thank you.
With a single awk:
awk -F'[[:space:]]+|[?]' '$8=="redirect=on"{ a[$1]++ }
END{ for(ip in a) if(a[ip] > 10) print a[ip],ip }' access_log
-F'[[:space:]]+|[?]' - the field separator: runs of whitespace or a literal ?
$8=="redirect=on" - considering only records with query param "redirect=on"
a[$1]++ - count same IP address occurrences
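Alternatively, if you want to keep your original pipeline: after uniq -c the count is field 1 (the IP becomes field 2), so you can simply append one more awk as a filter:
grep "redirect=on" access_log | awk '{print $1}' | sort -n | uniq -c | sort -nr | awk '$1 > 10'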

Sum all the numbers in a file given by positional parameter

I want to sum all the numbers (across columns and lines) in the file given by the first parameter, but my program produces sum=sum+$i as text instead of the numeric sum:
sum=0;
file=$1
for i in $file
do
sum=sum+$i;
done;
echo "The sum is: " $sum
Input file:
$cat file.txt
10 20 10
40
50
Expected output :
The sum is: 130
Maybe there is an awk method to solve this?
Try this -
$cat file1.txt
10 20 10
40
50
$awk '{for(i=1;i<=NF;i++) {sum+=$i}} END {print sum}' file1.txt
130
OR
$xargs < file1.txt | tr ' ' + | bc
130
cat file.txt | xargs | sed -e 's/\ /+/g' | bc
You can also use a simple read and an array to sum the values, relying on word splitting via the default IFS (Internal Field Separator) to separate the values into an array, e.g.
#!/bin/bash
declare -i sum=0
fn="${1:-/dev/stdin}" ## read from file as 1st argument (default stdin)
while read -r line; do ## read each line
a=( $line ) ## separate values into array
for i in ${a[@]}; do ## for each value in array
((sum += i)) ## add to sum
done
done <"$fn"
echo "sum: $sum"
Example Input File
$ cat dat/numfile.txt
10 20 10
40
50
Example Use/Output
$ bash sumnumfile.sh dat/numfile.txt
sum: 130
Another for some awks (at least mawk and gawk):
$ awk -v RS="[^0-9]" '{s+=$1}END{print s}' file
130
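And for the record, the original script can be repaired with two changes: iterate over the file's contents rather than its name, and use arithmetic expansion instead of string concatenation. A sketch:
#!/bin/bash
sum=0
file=$1
for i in $(cat "$file"); do   ## word-split the file contents into numbers
    sum=$((sum + i))          ## arithmetic expansion, not string concatenation
done
echo "The sum is: $sum"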

Linux command (Calculating the sum)

I have a .txt file with the following content:
a 3
a 4
a 5
a 6
b 1
b 3
b 5
c 9
c 10
I am wondering if there is any command (no awk if possible) that can read the .txt file and give the following output (sorted by the second column):
c 19
a 18
b 9
You can use awk piped to sort:
awk '{sums[$1] += $2} END {for (i in sums) print i, sums[i]}' file | sort -rnk2
c 19
a 18
b 9
sums[$1] += $2 adds the value of $2 into the array sums, indexed by the first field ($1).
sort -rnk2 reverse-sorts the awk output numerically on field 2.
You can use this code:
cat 1.txt | awk '{arr[$1]+=$2}END{for (var in arr) print var," ",arr[var]}' | sort -rnk 2
Explanation:
cat 1.txt - reads the file 1.txt
awk - a language that is very useful for data manipulation
{arr[$1]+=$2} - for each line, increases the array item keyed by the first field by the value of the second field. The default field separator is whitespace.
END{for (var in arr) print var," ",arr[var]} - after all lines are processed, prints the array contents
sort -rnk 2 - reverse numeric sort on field 2
Non-awk solutions.
perl
perl -lane '
$sum{$F[0]} += $F[1]
} END {
$, = " ";
print $_, $sum{$_} for reverse sort {$sum{$a} <=> $sum{$b}} keys %sum
' file.txt
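(The braces only look unbalanced: -n wraps the code in while (<>) { ... }, so the stray } closes that implicit loop before the END block.)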
bash version 4
declare -A sum
while read key val; do (( sum[$key] += $val )); done < file.txt
for key in "${!sum[@]}"; do echo "$key ${sum[$key]}"; done | sort -rn -k2
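Run against the sample file above, this prints:
c 19
a 18
b 9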
non-awk challenge accepted
vars=$(cut -d" " -f1 nums | uniq); paste <(echo "$vars") <(cat <(sed -e 's/ /+=/' nums) <(echo "$vars" | sed 's/$/;/') | bc) | sort -k2,2nr
c 19
a 18
b 9
