Binned histogram of timings in log file on command line - linux

To quickly evaluate the timings of various operations from a log file on a linux server, I would like to extract them from the log and create a textual/tsv style histogram. To have a better idea of how the timings are distributed, I want to bin them into ranges of 0-10ms, 10-20ms etc.
The output should look something like this:
121 10
39 20
12 30
7 40
1 100
How to achieve this with the usual set of unix command line tools?

Quick answer:
cat <file> | egrep -o '[0-9]+' | sed "s/$/ \/10*10/" | bc | sort -n | uniq -c
Detailed answer:
grep the pattern of your timing or number. You may need several grep steps to extract exactly the numbers you want from your logs.
use sed to append an arithmetic expression that divides each number by the desired bin width (integer division) and multiplies it back up, snapping each value to the lower bound of its bin
bc performs the calculation
the well-known sort | uniq -c combo counts the occurrences per bin
A worked sketch on a hypothetical log format follows below.
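A worked sketch, assuming log lines carry timings in the form "took 135 ms"; the pattern, the file name app.log and the bin width of 10 ms are placeholders to adapt:
# extract "took NNN ms" timings from app.log and bin them into 10 ms buckets
grep -o 'took [0-9]* ms' app.log | grep -Eo '[0-9]+' | sed 's!$! /10*10!' | bc | sort -n | uniq -c
The output is one line per bin, "count bin-lower-bound", matching the example above.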

Related

Piping, sorting then counting

I have this task on hand.
Finish this task using a combined command (a.k.a. piping) using four individual commands. Task: Find all lines containing the keyword "wonderful" from file "day.txt" and sort the lines alphabetically. Pick the first fifteen of the sorted lines and count how many characters there are in the fifteen lines.
I know the first command should be grep "wonderful" day.txt but then I can't figure out what to do afterwards.
Something like this would do it.
grep "wonderful" day.txt | sort | head -15 | wc -c
References:
sort
head
wc
Piping in linux

How to capture running log using bash script? [Case Closed]

I'm new to bash scripting. There's something I want to know about capturing a log file with a bash script.
Let's say there is a server which stores a log file every hour, with the filename format below.
file[20160509_130000].log
The logfile has detailed information like this.
13:00:00 INFO [file.chdev130] Event: ADIEVN_PLAY_DONE, Finished , nbytes=16360
13:00:00 INFO [file.chdev39] adiCollectDigits() success
My question is: how can I read the running log (or the one from an hour before) and extract a specific parameter (e.g. "Event") to a new file using a bash script?
Can someone teach me how to do it? Thanks.
Update
Here is the flow that I want (for now I want to know how the first point works):
Get the running log or one hour before.
Read the 15 minute interval (13:00:00 - 13:15:00).
grep the parameter (e.g. "Event") in that interval.
Count the parameter.
Store it to another file.
SOLVED
Here is the solution in case someone needs it.
List all the files by time stamp using ls -t, then pipe it
Use grep -v ^d (I still don't know the exact explanation for ^d), pipe again
Display first few lines with head
So the result is,
ls -t *.log | grep -v ^d | head -1 (to display the running log)
ls -t *.log | grep -v ^d | head -2 | tail -1 (to display the log before the running one)
Hope it'll help. Thanks
== Case Closed ==
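A side note on the grep -v ^d above, since it is left unexplained: -v inverts the match and ^d matches lines starting with the letter d, which is how directories appear in ls -l long listings; with plain ls -t *.log output (bare file names) it normally filters nothing. A small comparison:
ls -lt | grep -v '^d' | head -5   # long listing: directory lines start with "d" and are dropped
ls -t *.log | head -1             # bare names: grep -v '^d' is redundant here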
tail -f /path-to-logfile will allow you to read a file continuously until you cancel tail execution.
grep "^13:" /path-to-logfile will show you all strings in the file, which starts from "13:", in our case you'll get every record for 13-th hour.
grep "Event" /path-to-logfile will show you all strings with "Event" in them, i think, you got the idea about grep already. )
You can figure out the current or the previous log filename with date, based on your logfile naming convention:
fname="file[`date -v-1H '+%Y%m%d_%H0000'`].log"
will give you the previous filename. Omit -v-1H to get the current one.
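Note that -v-1H is BSD date syntax; on a typical Linux server with GNU coreutils, the equivalent is -d '1 hour ago'. A sketch assuming the same filename convention:
fname="file[$(date -d '1 hour ago' '+%Y%m%d_%H0000')].log"   # previous hour's log (GNU date)
fname="file[$(date '+%Y%m%d_%H0000')].log"                   # current hour's log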
For the 15-minute intervals, you may use regexps like '^\d\d:(0\d|1[0-4])' for 00:00-14:59 interval, '^\d\d:(1[5-9]|2\d)' for 15:00-29:59, etc.
For example, in the first regex, ^ matches the beginning of the line, \d\d: matches two digits followed by a colon, and (0\d|1[0-4]) matches either 0 followed by any digit, or 1 followed by a digit from 0 to 4. In the second regex, (1[5-9]|2\d) matches 1 followed by a digit from 5 to 9, or 2 followed by any digit.
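A quick way to sanity-check the first interval regex, using a few made-up timestamps:
printf '13:07:12 a\n13:14:59 b\n13:15:00 c\n13:29:59 d\n' | grep -P '^\d\d:(0\d|1[0-4])'
# prints only the 13:07:12 and 13:14:59 lines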
Then you grep -Po '(?<=Event: ).+(?=,)' to pull the event type out of each line, assuming that the event type always ends with a ,. That regexp greedily matches as many characters as it can (at least one) between the strings Event: and ,; Event: and , themselves are not part of the match, which is what the lookbehind/lookahead are for. Then use sort | uniq -c to count the number of entries for each distinct event.
So the resulting script would look something like
fname="file[`date -v-1H '+%Y%m%d_%H0000'`].log"
grep -P '^\d\d:(0\d|1[0-4])' "$fname" | grep -Po '(?<=Event: ).+(?=,)' | sort | uniq -c > "/tmp/${fname}_00-14" # entries for the first 15 minutes
grep -P '^\d\d:(1[5-9]|2\d)' "$fname" | grep -Po '(?<=Event: ).+(?=,)' | sort | uniq -c > "/tmp/${fname}_15-29" # entries for the second 15 minutes
grep -P '^\d\d:(3\d|4[0-4])' "$fname" | grep -Po '(?<=Event: ).+(?=,)' | sort | uniq -c > "/tmp/${fname}_30-44" # entries for the third 15 minutes
grep -P '^\d\d:(4[5-9]|5\d)' "$fname" | grep -Po '(?<=Event: ).+(?=,)' | sort | uniq -c > "/tmp/${fname}_45-59" # entries for the fourth 15 minutes
for the last hour log.
Another approach is to use logtail with cron entries to get last log entries, instead of grepping. You can set up your script to be run by cron at 00, 15, 30 and 45 minutes each hour, determine the log filename inside it and logtail the file like logtail -f "$fname". The catch here would be that when running at 00 minutes, you'd need to use the -v-1H switch for date, and this approach is not as accurate as grepping out the times.
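A minimal crontab sketch for that variant, assuming the summarizing script is saved as /usr/local/bin/summarize_events.sh (path and name are made up here):
# m h dom mon dow   command
0,15,30,45 * * * *  /usr/local/bin/summarize_events.sh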

unix script sort showing the number of items?

I have a shell script which is grepping the results of a file then it calls sort -u to get the unique entries. Is there a way to have sort also tell me how many of each of those entries there are? So the output would be something like:
user1 - 50
user2 - 23
user3 - 40
etc..
Use sort input | uniq -c. uniq does what -u does in sort -u, but also has the additional -c option for counting.
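If you want the exact "name - count" layout from the question, a small awk step can swap the columns (a sketch, assuming each input line is a single field such as a username):
sort input | uniq -c | awk '{print $2, "-", $1}'   # turns "     50 user1" into "user1 - 50"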
grep has a -c switch to count matching lines:
grep -c needle haystack
will give the number of lines containing needle, which you can then sort as needed.
Given a sorted list, uniq -c will show the item, and how many. It will be the first column, so I will often do something like:
sort file.txt | uniq -c | sort -nr
The -n in the second sort makes it compare the counts numerically, so 9 sorts before 11 (and with -r the order is reversed, since I usually want the lines with the higher counts first).
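A quick illustration of the difference, on a made-up two-line input:
printf '11 b\n9 a\n' | sort      # lexicographic: "11 b" sorts before "9 a"
printf '11 b\n9 a\n' | sort -n   # numeric: "9 a" sorts before "11 b"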

Unix command "uniq" & "sort"

As we know,
uniq [options] [file1 [file2]]
It removes duplicate adjacent lines from the sorted file1. The option -c prints each line once, prefixed with a count of how many times it occurred. So if we have the following result:
34 Operating System
254 Data Structure
5 Crypo
21 C++
1435 C Language
589 Java 1.6
And if we sort the above data using "sort -k1nr", the result is as below:
1435 C Language
589 Java 1.6
254 Data Structure
34 Operating System
21 C++
5 Crypo
Can anyone help me out with how to output only the book names in this order (without the numbers)?
uniq -c filename | sort -k 1nr | awk '{$1="";print}'
You can also use sed for that, as follows:
uniq -c filename | sort -k 1nr | sed 's/[0-9]\+ \(.\+\)/\1/g'
Test:
echo "34 Data Structure" | sed 's/[0-9]\+ \(.\+\)/\1/g'
Data Structure
This can also be done with a simplified regex (courtesy William Pursell):
echo "34 Data Structure" | sed 's/[0-9]* *//'
Data Structure
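Putting the pieces together, one possible end-to-end pipeline (a sketch, assuming filename holds the raw, uncounted list of book names, one per line):
sort filename | uniq -c | sort -k1,1nr | sed 's/^ *[0-9]* *//'   # names only, most frequent first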
Why do you use uniq -c to print the number of occurrences, which you then want to remove with some cut/awk/sed dance?
Instead, you could just use
sort -u $file1 $file2 /path/to/more_files_to_glob*
Or do some systems come with a version of sort which doesn't support -u?

How to crop(cut) text files based on starting and ending line-numbers in cygwin?

I have few log files around 100MBs each.
Personally I find it cumbersome to deal with such big files. I know that the log lines interesting to me span only about 200 to 400 lines.
What would be a good way to extract the relevant log lines from these files, i.e. I just want to pipe a range of line numbers to another file?
For example, the inputs are:
filename: MyHugeLogFile.log
Starting line number: 38438
Ending line number: 39276
Is there a command that I can run in cygwin to cat out only that range in that file? I know that if I can somehow display that range in stdout then I can also pipe to an output file.
Note: Adding Linux tag for more visibility, but I need a solution that might work in cygwin. (Usually linux commands do work in cygwin).
Sounds like a job for sed:
sed -n '8,12p' yourfile
...will send lines 8 through 12 of yourfile to standard out.
If you want to prepend the line number, you may wish to use cat -n first:
cat -n yourfile | sed -n '8,12p'
You can use wc -l to figure out the total # of lines.
You can then combine head and tail to get at the range you want. Let's assume the log is 40,000 lines: you want the last 1562 lines, and of those you want the first 838. So:
tail -1562 MyHugeLogFile.log | head -838 | ....
Or there's probably an easier way using sed or awk.
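In general, for an inclusive range the line count is END - START + 1, so a generic form (START and END are just shell variables introduced here) is:
START=38438; END=39276
tail -n +"$START" MyHugeLogFile.log | head -n "$((END - START + 1))"   # lines 38438 through 39276, 839 lines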
I saw this thread when I was trying to split a file into files of 100,000 lines each. A better tool than sed for that is:
split -l 100000 database.sql database-
It will give files like:
database-aaa
database-aab
database-aac
...
And if you simply want to cut part of a file - say from line 26 to 142 - and write it to a new file:
cat file-to-cut.txt | sed -n '26,142p' >> new-file.txt
How about this:
$ seq 1 100000 | tail -n +10000 | head -n 10
10000
10001
10002
10003
10004
10005
10006
10007
10008
10009
It uses tail to output everything from the 10,000th line onwards and then head to keep only 10 lines.
The same (almost) result with sed:
$ seq 1 100000 | sed -n '10000,10010p'
10000
10001
10002
10003
10004
10005
10006
10007
10008
10009
10010
This one has the advantage of allowing you to input the line range directly.
If you are interested only in the last X lines, you can use the "tail" command like this.
$ tail -n XXXXX yourlogfile.log >> mycroppedfile.txt
This will write the last XXXXX lines of your log file to "mycroppedfile.txt" (note that >> appends if the file already exists).
This is an old thread but I was surprised nobody mentioned grep. The -A option allows specifying a number of lines to print after a search match and the -B option includes lines before a match. The following command would output 10 lines before and 10 lines after occurrences of "my search string" in the file "mylogfile.log":
grep -A 10 -B 10 "my search string" mylogfile.log
If there are multiple matches within a large file the output can rapidly get unwieldy. Two helpful options are -n which tells grep to include line numbers and --color which highlights the matched text in the output.
If there is more than one file to be searched, grep allows multiple files to be listed, separated by spaces. Wildcards can also be used. Putting it all together:
grep -A 10 -B 10 -n --color "my search string" *.log someOtherFile.txt
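grep also accepts -C N as a shorthand for the same amount of context before and after each match, so the command above can be shortened to:
grep -C 10 -n --color "my search string" *.log someOtherFile.txt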
