How to capture running log using bash script? [Case Closed] - linux

I'm new to bash scripting. There's something I want to know about capturing a logfile with a bash script.
Let's say there is a server which stores a logfile every hour, with the filename format below.
file[20160509_130000].log
The logfile has detailed information like this.
13:00:00 INFO [file.chdev130] Event: ADIEVN_PLAY_DONE, Finished , nbytes=16360
13:00:00 INFO [file.chdev39] adiCollectDigits() success
My question is: how can I read the running log (or the one from an hour before) and store the lines with a specific parameter (e.g. "Event") to a new file using bash scripting?
Can someone teach me how to do it? Thanks.
Update
Here is the flow that I want (for now I just want to know how the first point works):
Get the running log or the one from an hour before.
Read a 15-minute interval (13:00:00 - 13:15:00).
grep the parameter (e.g. "Event") in that interval.
Count the parameter.
Store it to another file.
SOLVED
Here is the solution in case someone needs it.
List all the files by time stamp using ls -t, then pipe it
Use grep -v ^d, then pipe again (I still don't know the exact explanation for ^d; it drops lines starting with d, i.e. directories in an ls -l style listing, so with ls -t *.log it has no effect)
Display the first few lines with head
So the result is,
ls -t *.log | grep -v ^d | head -1 (for display the running log)
ls -t *.log | grep -v ^d | head -2 | tail -1 (for display the one log before the running log)
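For the remaining steps of the flow above (read a 15-minute interval, grep the parameter, count it, store it), a minimal sketch could be built on top of that; the 13:00-13:15 window and the output filename below are only examples:
logfile=$(ls -t *.log | head -1)   # step 1: the running log
grep -E '^13:(0[0-9]|1[0-4]):' "$logfile" | grep -c 'Event' > event_count_1300-1315.txt   # steps 2-5: window, grep, count, store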
Hope it'll help. Thanks
== Case Closed ==

tail -f /path-to-logfile will allow you to read a file continuously until you cancel tail execution.
grep "^13:" /path-to-logfile will show you all strings in the file, which starts from "13:", in our case you'll get every record for 13-th hour.
grep "Event" /path-to-logfile will show you all strings with "Event" in them, i think, you got the idea about grep already. )

You can figure out the current or the previous log filename using date, using your logfile name convention:
fname="file[`date -v-1H '+%Y%m%d_%H0000'`].log"
will give you the previous filename. Omit -v-1H to get the current one.
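Note that -v-1H is BSD date syntax; if your server has GNU date (typical on Linux), the equivalent sketch would be:
fname="file[$(date -d '1 hour ago' '+%Y%m%d_%H0000')].log"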
For the 15-minute intervals, you may use regexps like '^\d\d:(0\d|1[0-4])' for 00:00-14:59 interval, '^\d\d:(1[5-9]|2\d)' for 15:00-29:59, etc.
For example, in the first regex, ^ matches the beginning of the line, \d\d: matches two digits and a colon, and (0\d|1[0-4]) matches either 0 followed by any digit, or 1 followed by a digit from 0 to 4. In the second regex, (1[5-9]|2\d) matches 1 followed by a digit from 5 to 9, or 2 followed by any digit.
Then you grep -Po '(?<=Event: ).+?(?=,)' the type of events in your log, assuming that the type of event always ends with a ,. That regexp matches one or more characters between the string Event: and the next , (the lazy .+? stops at the first comma, so trailing fields like Finished are not swallowed; Event: and , themselves are not part of the match, which is what the lookbehind/lookahead are for). Then use sort | uniq -c to count the number of occurrences of each distinct event.
So the resulting script would look something like
fname="file[`date -v-1H '+%Y%m%d_%H0000'`].log"
grep -P '^\d\d:(0\d|1[0-4])' "$fname" | grep -Po '(?<=Event: ).+?(?=,)' | sort | uniq -c > "/tmp/${fname}_00-14" # entries for the first 15 minutes
grep -P '^\d\d:(1[5-9]|2\d)' "$fname" | grep -Po '(?<=Event: ).+?(?=,)' | sort | uniq -c > "/tmp/${fname}_15-29" # entries for the second 15 minutes
grep -P '^\d\d:(3\d|4[0-4])' "$fname" | grep -Po '(?<=Event: ).+?(?=,)' | sort | uniq -c > "/tmp/${fname}_30-44" # entries for the third 15 minutes
grep -P '^\d\d:(4[5-9]|5\d)' "$fname" | grep -Po '(?<=Event: ).+?(?=,)' | sort | uniq -c > "/tmp/${fname}_45-59" # entries for the fourth 15 minutes
for the last hour log.
Another approach is to use logtail with cron entries to get last log entries, instead of grepping. You can set up your script to be run by cron at 00, 15, 30 and 45 minutes each hour, determine the log filename inside it and logtail the file like logtail -f "$fname". The catch here would be that when running at 00 minutes, you'd need to use the -v-1H switch for date, and this approach is not as accurate as grepping out the times.
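A rough sketch of that cron-based variant (the script path and the output file are placeholders, and the whole thing is untested):
# crontab entry, runs at :00, :15, :30 and :45
0,15,30,45 * * * * /usr/local/bin/count_events.sh
# /usr/local/bin/count_events.sh
#!/bin/bash
if [ "$(date '+%M')" = "00" ]; then
    fname="file[`date -v-1H '+%Y%m%d_%H0000'`].log"   # at :00 the new entries still belong to the previous hour's file
else
    fname="file[`date '+%Y%m%d_%H0000'`].log"
fi
logtail -f "$fname" | grep -Po '(?<=Event: ).+?(?=,)' | sort | uniq -c >> /tmp/event_counts.txt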

Related

Getting specific PID from CentOS Journalctl

I'm writing a bash script that will print to the screen all the latest logs from a service that has already died (or is still alive; both situations must work). I know its name and don't have to guess.
I'm having difficulty getting the latest PID for a process that has already died from journalctl. I'm not talking about this:
journalctl | grep "<processname>"
This will give me all the logs that include processname in their text.
I've also tried:
journalctl | pgrep -f "<processname>"
This command gave me a list of numbers which supposedly should include the pid of my process. It was not there.
These ideas came from searching for previous questions. I haven't found a question that answers specifically what I asked.
How can I extract the latest PID from journalctl for a specific process?
I figured this out.
First, you must be printing your PID in your logs. It doesn't appear there automatically. Then, you can use grep -E and awk to grab exactly the expression you want from your log:
Var=$(journalctl --since "24 hours ago" | grep -E "\[([0-9]+)\]" | tail -n 1 | awk '{print $5}' | awk -F"[][{}]" '{print $2}')
This one-liner takes the logs from the last 24 hours, uses grep -E to keep only lines containing a bracketed [PID]-style number, uses tail -n 1 to grab the most recent of those lines, and then uses awk twice: once to pick the field that holds name[PID], and once more, splitting on the brackets, to print the PID itself.
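If you only care about one particular service, a variation of the same one-liner could narrow the search first; the unit name below is only a placeholder, and this assumes the default short output where the fifth field is name[PID]::
Var=$(journalctl -u myservice.service --since "24 hours ago" | grep -E "\[([0-9]+)\]" | tail -n 1 | awk '{print $5}' | awk -F"[][{}]" '{print $2}')
echo "Latest PID: $Var"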

Optimizing search in linux

I have a huge log file close to 3GB in size.
My task is to generate some reporting based on # of times something is being logged.
I need to find the number of times StringA, StringB and StringC are each being logged, counted separately.
What I am doing right now is:
grep "StringA" server.log | wc -l
grep "StringB" server.log | wc -l
grep "StringC" server.log | wc -l
This is a long process and my script takes close to 10 minutes to complete. What I want to know is whether this can be optimized or not. Is it possible to run one grep command and find out the number of times StringA, StringB and StringC have been called individually?
You can use grep -c instead of wc -l:
grep -c "StringA" server.log
grep can't report separate counts for several strings in one pass. You can use awk (the +0 in the END block prints 0 instead of an empty field when a string never appears, which keeps the array below aligned):
out=$(awk '/StringA/{a++;} /StringB/{b++;} /StringC/{c++;} END{print a+0, b+0, c+0}' server.log)
Then you can extract each count with a simple bash array:
arr=($out)
echo "StringA="${arr[0]}
echo "StringA="${arr[1]}
echo "StringA="${arr[2]}
This (grep -c instead of grep piped to wc -l) is certainly going to be faster, and the awk solution is possibly faster still, but I haven't measured either.
Certainly this approach could be optimized further, since grep doesn't perform any text indexing; I would use a text indexing engine. Also, you may consider using journald from systemd, which stores logs in a structured and indexed format, so lookups are more efficient.
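For example, if the application already logged to the journal, the time range could be pulled straight from the indexed journal instead of scanning a flat 3GB file (the unit name here is just a placeholder):
journalctl -u myapp.service --since today | grep -c "StringA"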
So many greps so little time... :-)
According to David Lyness, a straight grep search is about 7 times as fast as an awk in large file searches.
If that is the case, the current approach could be optimized by changing grep to fgrep, but only if the patterns being searched for are not regular expressions. fgrep is optimized for fixed patterns.
If the number of instances is relatively small compared to the original log file entries, it may be an improvement to use the egrep version of grep to create a temporary file filled with all three instances:
egrep "StringA|StringB|StringC" server.log > tmp.log
grep "StringA" tmp.log | wc -c
grep "StringB" tmp.log | wc -c
grep "StringC" tmp.log | wc -c
The egrep variant of grep allows a | (vertical bar/pipe) character to be used between two or more search strings, so that you can find multiple strings in one statement. You can use grep -E to do the same thing.
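For instance, the grep -E equivalent of the egrep line above would be:
grep -E "StringA|StringB|StringC" server.log > tmp.log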
Full documentation is in the man grep page; information about the Extended Regular Expressions that egrep uses is in man 7 re_format.

bash script - print X rows from a selected file in a folder

I'm trying to write a script which help to follows the logs of my application.
The logs of my application are written to "/var/log/MyLogs/" with the following pattern:
runningNumber_XXX.txt, for example:
0_XXX.txt
37_xxx.txt
99_xxx.txt
101_xxx.txt
103_xxx.txt
I'm trying to write a bash script (without success so far) which will print the last 20 rows of the last log file (the last log file is the one with the biggest prefix number).
I know I need to go over the files in the folder (for file in /var/log/MyLogs/*) and check which file name has the biggest prefix, and then print the last 20 rows from the selected file.
please help me....
Thanks...
find /var/log/MyLogs -iname '*_xxx.txt' | sort -V | tail -1 | xargs tail -20
Get correct files
Sort by the numeric prefix (-V version sort; a plain -n would compare the leading path rather than the number)
Get last log file
Get last 20 rows
tail -20 "$(ls -1 /var/log/MyLogs/*_*.txt | sort -Vr | head -1)"
ls -1 [0-9]*_XXX.txt | sort -rn | head -1 | xargs tail -20
Using ls in shell scripts is usually bad practice, but if you can ensure that the logfiles don't contain spaces or other strange characters, you can use a simple:
tail -20 $(ls -t1 /var/log/[0-9]*_XXX.txt | head -1)
The:
ls -t sorts the files by modification time, newest first
head takes the 1st one
tail prints the last 20 lines
AGAIN, this is usually a bad practice; you should use it only when you know what you're doing.

How do I get an output from Linux Top in Batch Mode on every iteration?

I'm trying to log CPU and memory stats into a file by using top on Arch Linux. I'm just interested in one specific process and get the wanted parameters as shown below:
top -b -n1 -p 310 | tail -fn 1 | awk '{printf "%s,%s,%s,%s\n",$1,$12,$9,$10}'
This gives me an output to command line like:
310,name,0.0,10.5
So now, if I want to run this command like 10 times with a delay of 1s and write the output to a logfile I use:
top -b -n10 -p 310 -d 1 | tail -fn 1 | awk '{printf "%s,%s,%s,%s\n",$1,$12,$9,$10}' >> log.txt
But instead of printing line by line to the logfile, I only get the last output. So my logfile contains only 1 line, although top must have been executed 10 times.
What am I doing wrong here?
PS: Printing to the command line instead of a logfile produces only 1 line (the last output) as well...
The problem is the tail command you use. Try something like this:
top -p 310 -b -n2 -d 1 | grep -w 310 | awk '{printf "%s,%s,%s,%s\n",$1,$12,$9,$10}'
I use grep -w to keep only the lines containing the info you are interested in.
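Applied to the original 10-sample logging command, that would look something like this (an untested sketch; add --line-buffered to grep if you want the lines to appear in log.txt as they are produced rather than when top finishes):
top -b -n10 -d 1 -p 310 | grep -w 310 | awk '{printf "%s,%s,%s,%s\n",$1,$12,$9,$10}' >> log.txt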

How do I grep multiple lines (output from another command) at the same time?

I have a Linux driver running in the background that is able to return the current system data/stats. I view the data by running a console utility (let's call it dump-data) in a console. All data is dumped every time I run dump-data. The output of the utility is like below
Output:
- A=reading1
- B=reading2
- C=reading3
- D=reading4
- E=reading5
...
- variableX=readingX
...
The list of readings returned by the utility can be really long. Depending on the scenario, certain readings would be useful while everything else would be useless.
I need a way to grep only the useful readings, whose names might have nothing in common (via a bash script). I.e., sometimes I'll need to collect A, D, E; and other times I'll need C, D, E.
I'm attempting to graph the readings over time to look for trends, so I can't run something like this:
# forgive my pseudocode
Loop
dump-data | grep A
dump-data | grep D
dump-data | grep E
End Loop
to collect A, D, E, as that would actually give me readings from 3 separate calls of dump-data, which would not be accurate.
If you want to save all results of grep in the same file, you can just join all expressions into one:
grep -E 'expr1|expr2|expr3'
But if you want to have results (for expr1, expr2 and expr3) in separate files, things are getting more interesting.
You can do this using tee >(command).
For example, here I process the same pipe with three different commands:
$ echo abc | tee >(sed s/a/_a_/ > file1) | tee >(sed s/b/_b_/ > file2) | sed s/c/_c_/ > file3
$ grep "" file[123]
file1:_a_bc
file2:a_b_c
file3:ab_c_
But the command seems too complex.
I would rather save the dump-data results to a file and then grep it.
TEMP=$(mktemp /tmp/dump-data-XXXXXXXX)
dump-data > ${TEMP}
grep A ${TEMP}
grep B ${TEMP}
grep C ${TEMP}
You can use dump-data | grep -E "A|D|E". Note the -E option of grep. Alternatively you could use egrep without the -E option.
you can simply use:
dump-data | grep -E 'A|D|E'
If you also want the matches split into a separate output file per input file, awk can do that in one pass (it writes each matching line to matches-<name of the file it came from>):
awk '/MY PATTERN/{print > "matches-"FILENAME;}' myfile{1,3}
thx Guru at Stack Exchange
