Get awk result greater than X - linux

Command:
grep "redirect=on" access_log | awk '{print $1}' | sort -n | uniq -c | sort -nr | head -3
Output is:
34 3.247.44.149
6 5.218.131.185
3 7.173.135.94
Question: How to output only the lines where the count is greater than 10? In this case:
34 3.247.44.149
I already tried playing with $1 > 10, but $1 is the IP and not the count.
Thank you.
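A minimal tweak, assuming you want to filter the pipeline output shown above: after uniq -c, the count is the first field and the IP the second, so appending one more awk stage that tests $1 works:
grep "redirect=on" access_log | awk '{print $1}' | sort -n | uniq -c | sort -nr | awk '$1 > 10'
34 3.247.44.149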

With a single awk:
awk -F'[[:space:]]+|[?]' '$8=="redirect=on"{ a[$1]++ }
END{ for(ip in a) if(a[ip] > 10) print a[ip],ip }' access_log
-F'[[:space:]]+|[?]' - field separator: runs of whitespace, or a literal ? (bracketed so the ERE is valid)
$8=="redirect=on" - consider only records whose query string is "redirect=on"
a[$1]++ - count occurrences of each IP address
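For reference, here is how that field separator splits a typical combined-log line (a hypothetical example; field positions depend on your log format, and the test assumes the query string is exactly redirect=on):
# 1.2.3.4 - - [10/Oct/2023:13:55:36 +0000] "GET /index.php?redirect=on HTTP/1.1" 200 512
# with -F'[[:space:]]+|[?]' this splits into:
#   $1 = 1.2.3.4        (client IP, counted via a[$1]++)
#   $7 = /index.php     (the path, cut off at the ?)
#   $8 = redirect=on    (matched by $8=="redirect=on")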

Related

Linux bash scripting: Sum one column using awk for overall cpu utilization and display all fields

The problem:
I execute a ps command with pid, user, etc., and I am trying to use awk to sum the overall CPU utilization across the different processes.
Command:
ps -eo pid,user,state,comm,%cpu,command --sort=-%cpu | egrep -v '(0.0)|(%CPU)' | head -n10 | awk '
{ process[$4]+=$5; }
END{
for (i in process)
{
printf($1" "$2" "$3" ""%-20s %s\n",i, process[i]" "$6) ;
}
}' | sort -nrk 5 | head
Awk: Sum 5th column according to the process name (4th column)
Output:
10935 zbynda S dd 93.3 /usr/libexec/gnome-terminal-server
10935 zbynda S gnome-shell 1.9 /usr/libexec/gnome-terminal-server
10935 zbynda S sublime_text 0.6 /usr/libexec/gnome-terminal-server
10935 zbynda S sssd_kcm 0.2 /usr/libexec/gnome-terminal-server
As you can see, the fourth and fifth columns are all good, but the other columns just repeat the values of a single entry from the ps output. I should have 4 different processes, as in the fourth column, but, for example, the last column shows the same command on every row.
How do I get the other entries from the ps output (not only the one repeated entry)?
Try this:
ps -eo pid,user,state,comm,%cpu,command --sort=-%cpu | egrep -v '(0.0)|(%CPU)' | head -n10 | awk '
{ process[$4]+=$5; a1[$4]=$1;a2[$4]=$2;a3[$4]=$3;a6[$4]=$6}
END{
for (i in process)
{
printf(a1[i]" "a2[i]" "a3[i]" ""%-20s %s\n",i, process[i]" "a6[i]) ;
}
}' | sort -nrk 5 | head
An END rule is executed only once, after all the input has been read.
Your printf uses $6, which retains its value from the last line read. I think you want to use i instead.
Of course $1, $2, and $3 have the same problem, so you will need to preserve the incoming values as well; fixing that is left as an exercise for the student.
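A quick way to see this behavior (a toy example, with gawk): in the END rule the field variables still hold whatever was on the last record read:
$ printf 'a 1\nb 2\n' | awk 'END{ print $1, $2 }'
b 2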

Print lines not containing a period - linux

I have a file with thousands of rows. I want to print the rows which do not contain a period.
awk '{print $2}' file.txt | head
I have used this to print the column I am interested in, column 2 (the file only has two columns).
I then removed the head and did
awk '{print $2}' file.txt | grep -v "." | head
But I only get blank lines, not any of the actual values I expected. I think it has included the blank lines between the rows, but I am not sure.
Is there an alternative command?
As suggested by Jim, I did:
awk '{print $2}' file.txt | grep -v "\." | head
However, the number of lines is greater than before; is this expected? Also, my output is a list of numbers with blank lines in between them (vertically); is this normal?
file.txt example below:
120.4 3
270.3 7.9
400.8 3.9
200.2 4
100.2 8.7
300.2 3.4
102.3 6
49.0 2.3
38.0 1.2
So the expected (and correct) output would be 3 lines, as there are 3 values in column 2 without a period:
$ awk '{print $2}' file.txt | grep -v "\." | head
3
4
6
However, when running the command as above, I instead get 5 lines, which I think is also counting the blank lines between the rows:
$ awk '{print $2}' file.txt | grep -v "\." | head
3

4

6
You seldom need to use grep if you're already using awk.
This would print the second column on each line where that second column doesn't contain a dot:
awk '$2 !~ /\./ {print $2}'
But you also wanted to skip empty lines, or perhaps ones where the second column is empty. So just test for that, too:
awk '$2 != "" && $2 !~ /\./ {print $2}'
(A more amusing version would be awk '$2 ~ /./ && $2 !~ /\./ {print $2}' )
As you said, grep -v "." gives you only blank lines. That's because the dot means "any character", and with -v, the only lines printed are those that don't contain, well, any characters.
grep is interpreting the dot as a regex metacharacter (the dot will match any single character). Try escaping it with a backslash:
awk '{print $2}' file.txt | grep -v "\." | head
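A quick demonstration of the difference, using a couple of the column values from above:
$ printf '7.9\n3\n' | grep -v "."
$ printf '7.9\n3\n' | grep -v "\."
3
The first command prints nothing: every non-empty line contains "any character", so -v discards them all. The second discards only lines containing a literal period.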
If I understand correctly, you can try this sed:
sed ':A;N;${s/.*/&\n/};/\n$/!bA;s/\n/ /g;s/\([^ ]*\.[^ ]* \)//g' file.txt
output
3
4
6

Sum of two maximum patterns in linux file

I am a newbie in Linux and need help with a command.
I have file in linux with following values:
2-1
2-10
2-11
2-12
2-2
2-3
1-1
1-10
1-11
1-2
1-3
1-9
The needed output is 23: the sum of the maxima of the 1- and 2- patterns, i.e. 11 from 1-11 plus 12 from 2-12.
awk -F"-" 'BEGIN{a=0; b=0;} {if(int($1)==1 && int($2)>a){a=int($2)}; if(int($1)==2 && int($2)>b){b=int($2)}}END{print a+b}' file
output:
23
Another awk, using the ternary operator:
awk -v FS='-' '{m1=($1==1?(m1>$2?m1:$2):m1);m2=($1==2?(m2>$2?m2:$2):m2)}END{print m1+m2}' file
sort + awk pipeline (after the numeric sort on the second field, the last value stored for each key is its maximum):
sort -t- -k2 -n file | awk -F'-' '{a[$1]=$2}END{ print a[1]+a[2] }'
The output:
23
$ awk -F'-' '{max[$1] = ($2 > max[$1] ? $2 : max[$1])} END{for (key in max) sum+=max[key]; print sum}' file
23
$ awk -F- 'a[$1]<$2{a[$1]=$2}END{for(i in a)s+=a[i]; print s}' infile
23
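All of these follow the same per-key maximum pattern. A commented sketch of that logic, for readability (equivalent to the one-liners above, not a different method):
awk -F'-' '
  $2 > max[$1] { max[$1] = $2 }        # keep the largest suffix seen for each prefix
  END {
    for (key in max) sum += max[key]   # add up the per-prefix maxima
    print sum                          # 23 for the sample input
  }' file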

Linux command (Calculating the sum)

I have a .txt file with the following content:
a 3
a 4
a 5
a 6
b 1
b 3
b 5
c 9
c 10
I am wondering if there is any command (no awk if possible) that can read the .txt file and give the following output (Sorted by the second column):
c 19
a 18
b 9
You can use awk piped to sort:
awk '{sums[$1] += $2} END {for (i in sums) print i, sums[i]}' file | sort -rnk2
c 19
a 18
b 9
sums[$1] += $2 adds the value of $2 to the array sums, indexed by the first field ($1).
sort -rnk2 reverse-sorts the awk output numerically on field 2.
You can use this code:
cat 1.txt | awk '{arr[$1]+=$2}END{for (var in arr) print var," ",arr[var]}' | sort -rnk 2
Explanation:
cat 1.txt - read the 1.txt file
awk - a language very useful for data manipulation
{arr[$1]+=$2} - for each line of the file, add the second field's value to the array item keyed by the first field. The field separator is whitespace by default.
END{for (var in arr) print var," ",arr[var]} - after all lines have been processed, print the array contents
sort -rnk 2 - reverse numeric sort on field 2
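Side note: the cat is unnecessary, since awk can read the file itself:
awk '{arr[$1]+=$2} END{for (var in arr) print var," ",arr[var]}' 1.txt | sort -rnk 2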
Non-awk solutions.
perl (the unbalanced braces are intentional: -n wraps the code in a while (<>) { ... } loop, so the stray } closes that loop before the END block)
perl -lane '
$sum{$F[0]} += $F[1]
} END {
$, = " ";
print $_, $sum{$_} for reverse sort {$sum{$a} <=> $sum{$b}} keys %sum
' file.txt
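For the sample file this prints:
c 19
a 18
b 9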
bash version 4 (for associative arrays)
declare -A sum
while read key val; do (( sum[$key] += $val )); done < file.txt
for key in "${!sum[@]}"; do echo "$key ${sum[$key]}"; done | sort -rn -k2
non-awk challenge accepted (note this relies on the keys being grouped in the file, since uniq only collapses adjacent duplicates)
vars=$(cut -d" " -f1 nums | uniq); paste <(echo "$vars") <(cat <(sed -e 's/ /+=/' nums) <(echo "$vars" | sed 's/$/;/') | bc) | sort -k2,2nr
c 19
a 18
b 9
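The trick: sed 's/ /+=/' rewrites each data line as a bc assignment, the appended key lines ending in ; make bc print each accumulated total, and paste glues the key column back on. A peek at the stream fed to bc (assuming the sample data is in nums):
$ cat <(sed -e 's/ /+=/' nums) <(cut -d" " -f1 nums | uniq | sed 's/$/;/')
a+=3
a+=4
a+=5
a+=6
b+=1
b+=3
b+=5
c+=9
c+=10
a;
b;
c;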

bash: getting percentage from a frequency table

I made a small bash script to get the frequency of the items in a certain column of a file.
The output is something like this:
A 30
B 25
C 20
D 15
E 10
The command I used inside the script is:
cut -f $1 $2 | sort | uniq -c |
sort -r -k1,1 -n | awk '{printf "%-20s %-15d\n", $2, $1}'
How can I modify it to also show the relative percentage for each case, so it would be like:
A 30 30%
B 25 25%
C 20 20%
D 15 15%
E 10 10%
Try this (with the sort moved to the end, since the percentages can only be computed once the total is known, in the END block):
cut -f $1 $2 | sort | uniq -c | awk '{array[$2]=$1; sum+=$1} END { for (i in array) printf "%-20s %-15d %6.2f%%\n", i, array[i], array[i]/sum*100}' | sort -r -k2,2 -n
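With the sample counts above (which total 100), this prints something like:
A                    30               30.00%
B                    25               25.00%
C                    20               20.00%
D                    15               15.00%
E                    10               10.00%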
Change your awk command to something like this:
awk '{ a[++n,1] = $2; a[n,2] = $1; t += $1 }   # buffer rows in input order; t = grand total
END {
    for (i = 1; i <= n; i++)
        printf "%-20s %-15d%d%%\n", a[i,1], a[i,2], 100 * a[i,2] / t
}'
Because the rows are buffered in input order, the upstream sort -r -k1,1 -n still determines the output order and no re-sort is needed.
