Wanted to count total attempts by day with awk [duplicate] - linux

This question already has answers here:
Best way to simulate "group by" from bash?
(17 answers)
Closed 2 years ago.
I'm trying to count the number of occurrences for a list in awk. Currently I am able to get the total attempts for each user, but I want the total attempts by day. I have a txt file something like:
ID, Event, Date, Type, Message, API, User, Protocol, Attemps
1, ERROR, 30-NOV-20, 4, TEXT, 2, user1, GUI, 9
I used the awk below to count total attempts:
awk 'FNR == NR {count[$(NF-3)]++; next} {print $(NF-3), $3 "\t" count[$(NF-3)]}' file file
Can someone help me?
Expected output:
USER ATTEMPS DATE
user1 3 20-NOV-2020
user1 6 22-NOV-2020
user2 2 01-DEC-2020
user3 4 12-NOV-2020
user3 19 18-NOV-2020

This does not use only awk, but it should work if you need total attempts per day:
awk -F, '{print $3}' file | sort | uniq -c
Edit: to have total attempts per day and per user, you can do the following (the comma in print keeps the date and the user separated):
awk -F, '{print $3, $7}' file | sort | uniq -c

This should do it, but I couldn't test with more data:
$ awk -F', ' -v OFS='\t' '
NR==1 {print $7,$NF,$3; next}
NF {a[$7,$3]+=$NF}
END {for(k in a)
{split(k,ks,SUBSEP);
print ks[1],a[k],ks[2]}}' file
User Attemps Date
user1 9 30-NOV-20

awk -F, 'NR > 1 { map[$3]+=$NF } END { for (i in map) { print i" - "map[i] } }' file
Using GNU awk (any POSIX awk works here), set the field delimiter to a comma and use the 3rd field (the date) as the index into an array map, with the value being a running total of attempts ($NF). Once all lines are processed, we loop through the map array, printing each index and its value, i.e. the date and the attempt total.
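As a variation on the same idea, if you also want the per-user breakdown shown in the expected output (USER ATTEMPS DATE), you can key the array on both the user ($7) and the date ($3). This is only a sketch, assuming the comma-plus-space field layout from the sample:
awk -F', ' 'NR > 1 { sum[$7 "\t" $3] += $NF }
    END { for (k in sum) { split(k, p, "\t"); print p[1], sum[k], p[2] } }' file
On the single sample data line above this prints: user1 9 30-NOV-20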

Related

Linux Unique values count [closed]

I have a .csv file, and I want to count the total number of values in column 5, but only when the corresponding value in column 8 is not equal to 999.
I have tried this, but I am not getting the desired output:
cat test.csv | sed "1 d" |awk -F , '$8 != 999' | cut -d, -f5 | sort | uniq | wc -l >test.txt
Note that the total number of records is more than 20K.
I do get a number of unique values, but it is not excluding the rows where column 8 is 999.
Can anyone help?
Sample Input:
Col1,Col2,Col3,Col4,Col5,Col6,Col7,Col8,Col9,Col10,Col11
1,0,0,0,ABCD,0,0,5436,0,0,0
1,0,0,0,543674,0,0,18999,0,0,0
1,0,0,0,143527,0,0,1336,0,0,0
1,0,0,0,4325,0,0,999,0,0,0
1,0,0,0,MCCDU,0,0,456,0,0,0
1,0,0,0,MCCDU,0,0,190,0,0,0
1,0,0,0,4325,0,0,190,0,0,0
What I want is to not count the value in col5 if the corresponding value in col8 == 999.
By count total I mean total lines.
In the sample input above, the col5 value in lines 6 and 7 is the same, so I need them to count as one.
I need to sort because col5 values can be duplicated, and I need to find the total number of unique values.
Script:
awk 'BEGIN {FS=","} NR > 1 && $8 != 999 {uniq[$5]++} END {for(key in uniq) {print key, uniq[key]; total+=uniq[key]}; print "Total: "total}' input.csv
Output:
543674 1
143527 1
ABCD 1
MCCDU 2
4325 1
Total: 6
Note that this sums the per-value occurrence counts (MCCDU is counted twice), giving 6 rather than the 5 unique values wanted.
With an awk that supports length(array) (e.g. GNU awk and some others):
$ awk -F',' '(NR>1) && ($8!=999){vals[$5]} END{print length(vals)}' test.csv
5
With any awk:
$ awk -F',' '(NR>1) && ($8!=999) && !seen[$5]++{ cnt++ } END{print cnt+0}' test.csv
5
The +0 in the END is so you get numeric output even if the input file is empty.
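For example, on an empty input the END block still runs with cnt unset, so print cnt alone would emit a blank line while print cnt+0 prints 0 (using /dev/null here as a stand-in for an empty file):
$ awk 'END{print cnt+0}' /dev/null
0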

Sum of two maximum patterns in linux file

I am a newbie in Linux and need help with a command.
I have a file in Linux with the following values:
2-1
2-10
2-11
2-12
2-2
2-3
1-1
1-10
1-11
1-2
1-3
1-9
The needed output is 23: the sum of the maxima from the 1- and 2- patterns, i.e. 11 from 1-11 and 12 from 2-12.
awk -F"-" 'BEGIN{a=0; b=0;} {if(int($1)==1 && int($2)>a){a=int($2)}; if(int($1)==2 && int($2)>b){b=int($2)}}END{print a+b}' file
output:
23
Another awk, using the ternary operator:
awk -v FS='-' '{m1=($1==1?(m1>$2?m1:$2):m1);m2=($1==2?(m2>$2?m2:$2):m2)}END{print m1+m2}' file
sort + awk pipeline:
sort -t- -k2 -n file | awk -F'-' '{a[$1]=$2}END{ print a[1]+a[2] }'
The output:
23
$ awk -F'-' '{max[$1] = ($2 > max[$1] ? $2 : max[$1])} END{for (key in max) sum+=max[key]; print sum}' file
23
$ awk -F- 'a[$1]<$2{a[$1]=$2}END{for(i in a)s+=a[i]; print s}' infile
23

How to add the number of identical lines next to the line itself? [duplicate]

This question already has answers here:
Find duplicate lines in a file and count how many time each line was duplicated?
(7 answers)
Closed 7 years ago.
I have a file file.txt which looks like this:
a
b
b
c
c
c
I want to know the command which takes file.txt as input and produces the output:
a 1
b 2
c 3
I think uniq is the command you are looking for. The output of uniq -c is a little different from your format, but this can be fixed easily.
$ uniq -c file.txt
1 a
2 b
3 c
If you want to count the occurrences, you can use uniq with -c.
If the file is not sorted, you have to use sort first:
$ sort file.txt | uniq -c
1 a
2 b
3 c
If you really need the line first followed by the count, swap the columns with awk
$ sort file.txt | uniq -c | awk '{ print $2 " " $1}'
a 1
b 2
c 3
You can use this awk:
awk '!seen[$0]++{ print $0, (++c) }' file
a 1
b 2
c 3
seen is an array that holds only unique items: !seen[$0]++ is true only the first time an index is populated. In the action we print the record and an incrementing counter, i.e. the position of that unique line, not its repeat count.
Update: based on a comment below, if the intent is to get a repeat count in the 2nd column, then use this awk command:
awk 'seen[$0]++{} END{ for (i in seen) print i, seen[i] }' file
a 1
b 2
c 3
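Note that for (i in seen) does not guarantee any particular output order. If the order of first appearance matters, a variation along these lines should work (a sketch built on the same seen idea, not taken from the original answers):
awk 'seen[$0]++ { next } { order[++n] = $0 } END { for (i = 1; i <= n; i++) print order[i], seen[order[i]] }' file
a 1
b 2
c 3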

How to retrieve one day's data from a log file having multiple days of data

I have a zipped log file containing 3 days of data, and I want to retrieve only one day's data. Currently the code for calculating the sum of data volume is as given below:
Server_Sent_bl1=`gzcat $LOGDIR/blprxy1/archive"$i"/*.log.gz | nawk -F"|" '{sum+=$(NF -28)} END{print sum}'`
There are 3 logs; suppose all 3 logs contain data for 06/jul/2014. How do I retrieve the Jul 6th data from those 3 files and then sum up the data volume?
You could try this:
$ gzcat $LOGDIR/blprxy1/archive"$i"/*.log.gz | grep "06/jul/2014" | nawk -F"|" '{sum+=$(NF -28)} END{print sum}'
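If you would rather keep the filtering inside nawk (assuming the date string 06/jul/2014 appears verbatim in each log line, as the grep implies), a sketch like this should also work:
$ gzcat $LOGDIR/blprxy1/archive"$i"/*.log.gz | nawk -F"|" 'index($0, "06/jul/2014") { sum += $(NF-28) } END { print sum }'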

linux command to get the last appearance of a string in a text file

I want to find the last appearance of a string in a text file with linux commands. For example
1 a 1
2 a 2
3 a 3
1 b 1
2 b 2
3 b 3
1 c 1
2 c 2
3 c 3
In such a text file, I want to find the line number of the last appearance of b, which is 6.
I can find the first appearance with
awk '/ b / {print NR;exit}' textFile.txt
but I have no idea how to do it for the last occurrence.
cat -n textfile.txt | grep " b " | tail -1 | cut -f 1
cat -n prints the file to STDOUT prepending line numbers.
grep greps out all lines containing "b" (you can use egrep for more advanced patterns or fgrep for faster grep of fixed strings)
tail -1 prints last line of those lines containing "b"
cut -f 1 prints first column, which is line # from cat -n
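On the sample file above this pipeline prints the line number 6, though cat -n left-pads the number with spaces; the tr -d ' ' stage below is my addition (not part of the original answer) to strip that padding:
$ cat -n textfile.txt | grep " b " | tail -1 | cut -f 1 | tr -d ' '
6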
Or you can use Perl if you wish (It's very similar to what you'd do in awk, but frankly, I personally don't ever use awk if I have Perl handy - Perl supports 100% of what awk can do, by design, as 1-liners - YMMV):
perl -ne '{$n=$. if / b /} END {print "$n\n"}' textfile.txt
This can work:
$ awk '{if ($2~"b") a=NR} END{print a}' your_file
We check whether the second field contains "b" and record the line number. It is overwritten on each match, so by the time we finish reading the file it holds the last one.
Test:
$ awk '{if ($2~"b") a=NR} END{print a}' your_file
6
Update based on sudo_O's advice:
$ awk '{if ($2=="b") a=NR} END{print a}' your_file
to avoid matching something like abc in the 2nd field.
This one is also valid (shorter; I keep the one above because it is the one I thought of :D):
$ awk '$2=="b" {a=NR} END{print a}' your_file
Another approach if $2 is always grouped (may be more efficient than waiting until the end):
awk 'NR==1||$2=="b",$2=="b"{next} {print NR-1; exit}' file
or
awk '$2=="b"{f=1} f==1 && $2!="b" {print NR-1; exit}' file
