Search for log entries with a value greater than a certain time - linux

Below is the sample log:
2020-10-14 00:05:44,621 debug [org.jboss.as] ...............
2020-10-14 00:05:45,560 debug [org.jboss.as] ...............
2020-10-14 00:05:46,222 debug [org.jboss.as] ...............
2020-10-14 00:05:46,608 debug [org.jboss.as] ...JBoss .... server is started ............
Below is the desired output:
2020-10-14 00:05:46,608 debug [org.jboss.as] ...JBoss .... server is started ............
When I use awk with the exact time, it displays the output; otherwise it displays nothing. Below is the code:
Not displaying output:
awk /'2020-10-14 00:05:46,607'/ '/home/notyo/application.log' | grep -e 'JBoss.*started'
Displaying output:
awk /'2020-10-14 00:05:46,608'/ '/home/notyo/application.log' | grep -e 'JBoss.*started'
2020-10-14 00:05:46,608 debug [org.jboss.as] ...JBoss .... server is started ............
Besides that, I managed to find a solution. The below works, but not as expected.
Displaying output as expected:
grep -e 'JBoss.*started' '/home/notyo/application.log' | awk '$0 >= "2020-10-14 00:05:46,608"'
2020-10-14 00:05:46,608 debug [org.jboss.as] ...JBoss .... server is started ............
Displaying output, though I expected it not to display anything since I'm using >:
grep -e 'JBoss.*started' '/home/notyo/application.log' | awk '$0 > "2020-10-14 00:05:46,608"'
2020-10-14 00:05:46,608 debug [org.jboss.as] ...JBoss .... server is started ............
Can I know why it behaves this way? Could you advise on the correct approach?

If you compare strings with $0 > "2020-10-14 00:05:46,608", awk compares the whole record 2020-10-14 00:05:46,608 debug [org.jboss.as] ...JBoss .... server is started ............ with the string 2020-10-14 00:05:46,608; since the record continues past the timestamp, it is lexicographically greater, so the comparison returns true.
In order to compare only the date and time portion, please try instead:
grep -e 'JBoss.*started' '/home/notyo/application.log' | awk '($1" "$2) > "2020-10-14 00:05:46,608"'
If you want to include the exact time above, please replace > with >=, which also works as you expect.
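A minimal reproduction (using the sample line from the question) shows both comparisons side by side:

```shell
# Single sample line from the question's log.
line='2020-10-14 00:05:46,608 debug [org.jboss.as] ...JBoss .... server is started'

# Whole-record comparison: the text after the timestamp makes $0
# lexicographically greater than the bare timestamp, so the line prints.
printf '%s\n' "$line" | awk '$0 > "2020-10-14 00:05:46,608"'

# Field comparison: the two strings are equal, > is false, nothing prints.
printf '%s\n' "$line" | awk '($1" "$2) > "2020-10-14 00:05:46,608"'
```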

Your command:
awk '$0 > "2020-10-14 00:05:46,608"'
is not working because you're comparing the full record against the date-time string, but you should be comparing ($1 " " $2) against the date-time string, like this:
awk '($1 " " $2) > "2020-10-14 00:05:46,607"' file.log
2020-10-14 00:05:46,608 debug [org.jboss.as] ...JBoss server is started ............
For a proper time comparison, you may use this awk script with a call to mktime() to convert the date string into epoch seconds, adding the millisecond value separately:
cat srchlog.awk
function convt(ts, ms) {
    ms = ts
    sub(/.*,/, "", ms)      # keep only the millisecond part after the comma
    sub(/,.*/, "", ts)      # drop the milliseconds from the timestamp
    gsub(/[-:]/, " ", ts)   # "YYYY MM DD HH MM SS", the format mktime() expects
    return mktime(ts) + ms/1000
}
BEGIN {
    sval = convt(dt)        # threshold passed in with -v dt='...'
}
/JBoss.*started/ && convt($1 " " $2) > sval
Then use it as:
awk -v dt='2020-10-14 00:05:46,607' -f srchlog.awk file.log
2020-10-14 00:05:46,608 debug [org.jboss.as] ...JBoss server is started ............
And then:
awk -v dt='2020-10-14 00:05:46,608' -f srchlog.awk file.log
# no output
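One portability note (an addition, not part of the answer above): mktime() is a GNU awk extension and not in POSIX awk, so srchlog.awk needs gawk. Because the YYYY-MM-DD HH:MM:SS,mmm layout is fixed-width and zero-padded, string order matches time order, so a plain lexical comparison works in any POSIX awk. A sketch against a made-up sample.log:

```shell
# Build a tiny sample log (a stand-in for the real application.log).
cat > sample.log <<'EOF'
2020-10-14 00:05:44,621 debug [org.jboss.as] ...
2020-10-14 00:05:46,608 debug [org.jboss.as] ...JBoss server is started ...
EOF

# Lexical comparison of the first two fields: works in any POSIX awk
# because the timestamp format is fixed-width and zero-padded.
awk -v dt='2020-10-14 00:05:46,607' \
    '/JBoss.*started/ && ($1 " " $2) > dt' sample.log
```

With dt='2020-10-14 00:05:46,608' the same command prints nothing, matching the mktime() version.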

Related

if condition in awk command is not working as expected in BEGIN block. Correct me if I am wrong

I have a command like the one below:
kubectl get pods |
grep -v 1/1 |
grep -v 2/2 |
awk '{print $1:$2:$3}' |
awk 'BEGIN{ print "<style>table,th,td {border:1px solid black;}
Sample output from kubectl:
NAMESPACE NAME READY STATUS RESTARTS
ABC ABC-jkij 1/1 RUNNING 897
BAC BAC-jkij 2/2 RUNNING 897
HJI HJI-jkij 2/2 RUNNING 897
kubectl get pods | grep -v 1/1 | grep -v 2/2 | awk '{print $1:$2:$3}'
The above command will output only the headers, like below, since I used -v:
NAMESPACE NAME READY STATUS RESTARTS
In awk we have the number-of-records variable (NR), right? I want to put a condition in the awk block that prints only if there are any 0/1 or 0/2 pod results, not just the headers.
So to summarize:
NAMESPACE NAME READY STATUS RESTARTS
--> awk should not print anything
NAMESPACE NAME READY STATUS RESTARTS
ABC ABC-jkij 0/1 RUNNING 897
ABC ABC-jkij 0/2 RUNNING 897
Awk should print only in above scenarios.
The above command will give just the headers (NAME NAMESPACE STATUS) as output if there are no pods with 0/1, 0/2, etc. status. Now I want to include an if condition saying if (NR > 1) then it should print; otherwise it should print nothing, i.e., no headers. If I put if (NR>1) in the BEGIN block, it still prints the headers.
You should generally avoid piping Awk into Awk; it's a scripting language, so you can put as many statements as you want.
kubectl get pods |
awk 'NR>1 && !/1\/1|2\/2/ {
if (!headers++) print "<style>table,th,td {border:1px solid black;}</style><table>"
print "<tr>"
for (i=1; i<=3; ++i) printf "<td>%s</td>\n", $i
print "</tr>"
}
END { if (headers) print "</table>" }'
I had to guess a bit at your expected output, but if there are glitches, it should be easy to see what to fix.
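For reference, a simulated run against made-up pod listings (the real kubectl get pods needs a live cluster):

```shell
# Feed the awk program fabricated `kubectl get pods` output; only the
# 0/1 pod survives the NR>1 and !/1\/1|2\/2/ filter.
printf '%s\n' \
  'NAMESPACE NAME     READY STATUS  RESTARTS' \
  'ABC       ABC-jkij 0/1   RUNNING 897' \
  'BAC       BAC-jkij 2/2   RUNNING 897' |
awk 'NR>1 && !/1\/1|2\/2/ {
  if (!headers++) print "<style>table,th,td {border:1px solid black;}</style><table>"
  print "<tr>"
  for (i=1; i<=3; ++i) printf "<td>%s</td>\n", $i
  print "</tr>"
}
END { if (headers) print "</table>" }'
```

With only 1/1 and 2/2 pods in the input, nothing at all is printed, which is the behaviour the question asks for.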

Different script behavior between Ubuntu 20.04.2 and 21.04

I can run the below command in a shell script with no problem on Ubuntu 21.04:
grep -h "new tip" logs/node.log | tail -1000 | sed -rn "s/\[(.*)\].\[(.*)\].* ([0-9]+)/\2,\3/p" | awk -F, -v threshold=2 '{"date +%s.%3N -d\""$1"\""|getline actualTime; delay=actualTime-$2-1591566291; sum+=delay; if (delay > threshold ) print $1 " " delay;} END {print "AVG=", sum/NR}'
but when I run the exact same script on Ubuntu 20.04.2, I get this error :
/bin/sh: 1: Syntax error: Unterminated quoted string
It's definitely the exact same script because I scp'd it from the 21.04 machine to the 20.04.2 one. I couldn't find any topics on Stack Overflow or the wider internet that addressed this difference. Both Ubuntu machines are Linux cloud servers. About the only way to run the script with no error is taking out this awk part: "date +%s.%3N -d\""$1"\""|getline actualTime;
I tried playing around with the reference to the $1 field but nothing would work. Tried it with nawk instead of awk, but no luck. Maybe as a last resort I can upgrade the OS from v 20 to v 21.
Has anyone seen this before?
Added: Thanks all for the quick replies. Here are the first lines of the log file that the script is running against
[Relaynod:cardano.node.ChainDB:Notice:60] [2021-06-30 02:20:14.36 UTC] Chain extended, new tip: de56b9f458e8942ca74c6a1913dc58fa896823dc19b366285e15481f434ed337 at slot 33453323
[Relaynod:cardano.node.ChainDB:Notice:60] [2021-06-30 02:20:15.17 UTC] Chain extended, new tip: e88ea4f438944bd15186fe93f321c117ec769cfbd33667654634f4510cfd3780 at slot 33453324
Just to make sure it's not a data issue. I ran the script on the Ubuntu 21 server against the file (worked), then copied the file to the Ubuntu 20 server and ran the exact same script against the copied file, and get the error.
I'll try out the suggestions on this topic and will let everyone know the answer.
New update: after laptop crash and replacement, remembered to come back to this post. I ended up using mktime like Ed mentioned. It's working now.
The shell script:
#!/bin/bash
grep -h "new tip" logs/node.log | tail -1000 | sed -rn "s/\[(.*)\].\[(.*)\].* ([0-9]+)/\2,\3/p" | mawk -F, -v threshold=2 -f check_delay.awk
The awk script:
BEGIN{ ENVIRON["TZ"] = "UTC"; }
{
year = substr($1,1,4);
month = substr($1,6,2);
day = substr($1,9,2);
hour = substr($1,12,2);
min = substr($1,15,2);
sec = substr($1,18,2);
timestamp = year" "month" "day" "hour" "min" "sec;
actualTime=mktime(timestamp) + 7200;
delay=actualTime-$2-1591566291;
sum+=delay;
if (delay >= threshold )
print $1 " " delay;}
END {print "AVG=", sum/NR}
You're spawning a shell to call date using whatever value happens to be in $1 in your data so the result of that will depend on your data. Look:
$ echo '3/27/2021' | awk '{"date +%s.%3N -d\""$1"\"" | getline; print}'
1616821200.000
$ echo 'a"b' | awk '{"date +%s.%3N -d\""$1"\"" | getline; print}'
sh: -c: line 0: unexpected EOF while looking for matching `"'
sh: -c: line 1: syntax error: unexpected end of file
a"b
and what this command outputs from a log file:
sed -rn "s/\[(.*)\].\[(.*)\].* ([0-9]+)/\2,\3/p"
will vary greatly depending on the contents of specific lines in the log file since the parts you're trying to isolate aren't anchored and use .*s when you presumably meant to use [^]]*s. For example:
$ echo '[foo] [3/27/2021] 15 something [probably] happened at line 50'
[foo] [3/27/2021] 15 something [probably] happened at line 50
$ echo '[foo] [3/27/2021] 15 something [probably] happened at line 50' | sed -rn "s/\[(.*)\].\[(.*)\].* ([0-9]+)/\2,\3/p"
3/27/2021] 15 something [probably,50
$ echo '[foo] [3/27/2021] 15 something [probably] happened at line 50' | sed -rn "s/\[(.*)\].\[(.*)\].* ([0-9]+)/\2,\3/p" | awk -F, -v threshold=2 '{"date +%s.%3N -d\""$1"\""|getline actualTime; delay=actualTime-$2-1591566291; sum+=delay; if (delay > threshold ) print $1 " " delay;} END {print "AVG=", sum/NR}'
date: invalid date ‘3/27/2021] 15 something [probably’
AVG= -1591566341
If you want to do that then you could introduce a check for a valid date to avoid THAT specific error, e.g. (but obviously create a better date verification regexp than this):
$ echo 'a"b' | awk '$1 ~ "[0-9]/[0-9]+/[0-9]" {"date +%s.%3N -d\""$1"\"" | getline; print}'
$
but it's still fragile and extremely slow.
You're using GNU sed (for -r), so you have or can get GNU awk, and that has builtin time functions, so you shouldn't be spawning a subshell to call date in the first place; you should just be using mktime(). See https://stackoverflow.com/a/68180908/1745001. That will avoid cryptic errors like this one and run orders of magnitude faster.

Change date format with awk

I have a log file, I'm trying to reformat using sed/awk/grep but running into difficulties with the date format. The log looks like this:
1.2.3.4 - - [28/Mar/2019:11:43:58 +0000] "GET /e9bb2dddd28b/5.6.7.8/YL0000000000.rom HTTP/1.1" "-" "Yealink W52P 25.81.0.10 00:00:00:00:00:00" 404 - 1 5 0.146
I would like the output as so:
Yealink,1.2.3.4,28-03-2019 11:43:58
I have tried the following:
grep Yealink access.log | grep 404 | sed 's/\[//g' | awk '{print "Yealink,",$1,",",strftime("%Y-%m-%d %H:%M:%S", $4)}' | sed 's/, /,/g' | sed 's/ ,/,/g'
Edit: removing the [ before passing the date string to strftime, based on the comments, but it's still not working as expected.
However this returns a null date - so clearly I have the strftime syntax wrong:
Yealink,1.2.3.4,1970-01-01 01:00:00
Update 2019-10-25: gawk is now getting strptime() in an extension library, see https://groups.google.com/forum/#!msg/comp.lang.awk/Ft6_h7NEIaE/tmyxd94hEAAJ
Original post:
See the gawk manual for strftime, it doesn't expect a time in any format except seconds since the epoch. If gawk had a strptime() THEN that would work, but it doesn't (and I can't persuade the maintainers to provide one) so you have to massage the timestamp into a format that mktime() can convert to seconds and then pass THAT to strftime(), e.g.:
$ awk '{
split($4,t,/[[\/:]/)
old = t[4] " " (index("JanFebMarAprMayJunJulAugSepOctNovDec",t[3])+2)/3 " " t[2] " " t[5] " " t[6] " " t[7];
secs = mktime(old)
new = strftime("%d-%m-%Y %T",secs);
print $4 ORS old ORS secs ORS new
}' file
[28/Mar/2019:11:43:58
2019 3 28 11 43 58
1553791438
28-03-2019 11:43:58
but of course you don't need mktime() or strftime() at all - just shuffle the date components around:
$ awk '{
split($4,t,/[[\/:]/)
new = sprintf("%02d-%02d-%04d %02d:%02d:%02d",t[2],(index("JanFebMarAprMayJunJulAugSepOctNovDec",t[3])+2)/3,t[4],t[5],t[6],t[7])
print $4 ORS new
}' file
[28/Mar/2019:11:43:58
28-03-2019 11:43:58
That will work in any awk, not just GNU awk, since it doesn't require time functions.
(index("JanFebMarAprMayJunJulAugSepOctNovDec",t[3])+2)/3 is just the idiomatic way to convert a 3-char month name abbreviation (e.g. Mar) into the equivalent month number (3).
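The idiom can be checked in isolation:

```shell
# index() is 1-based: "Mar" starts at position 7, and (7+2)/3 = 3;
# "Dec" starts at position 34, and (34+2)/3 = 12.
echo 'Mar Dec' | awk '{
  for (i = 1; i <= NF; i++)
    print $i, (index("JanFebMarAprMayJunJulAugSepOctNovDec", $i) + 2) / 3
}'
# prints:
# Mar 3
# Dec 12
```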
Another awk; thanks @EdMorton for reviewing the getline usage.
The idea here is to use the date command inside awk, since it accepts abbreviated month names.
$ date -d"28/Mar/2019:11:43:58 +0000" "+%F %T" # Fails
date: invalid date ‘28/Mar/2019:11:43:58 +0000’
$ date -d"28 Mar 2019:11:43:58 +0000" "+%F %T" # Again fails because of : before time section
date: invalid date ‘28 Mar 2019:11:43:58 +0000’
$ date -d"28 Mar 2019 11:43:58 +0000" "+%F %T" # date command works but incorrect time because of + in the zone
2019-03-28 17:13:58
$ date -d"28 Mar 2019 11:43:58" "+%F %T" # correct value after stripping +0000
2019-03-28 11:43:58
$
Results
awk -F"[][]" -v OFS=, '/Yealink/ {
split($1,a," "); #Format $1 to get IP
gsub("/", " ",$2); sub(":"," ",$2); sub("\\+[0-9]+","",$2); # Massage to get date value
cmd = "date -d\047" $2 "\047 \047+%F %T\047"; if ( (cmd | getline line) > 0 ) $2=line; close(cmd) # use system date
print "Yealink",a[1],$2
} ' access.log
Below is the file content
$ cat access.log
1.2.3.4 - - [28/Mar/2019:11:43:58 +0000] "GET /e9bb2dddd28b/5.6.7.8/YL0000000000.rom HTTP/1.1" "-" "Yealink W52P 25.81.0.10 00:00:00:00:00:00" 404 - 1 5 0.146
$

Grepping archive logs with grep, awk, and a pipe

I am trying to extract some counts from Tomcat catalina.out logs and am able to extract the count from a single catalina.out file. I used the command below:
grep "WHAT: " /appl/cas/tomcat/logs/catalina.out | awk -F'for ' '{if($2=="") print $1; else print $2;}' | awk -F'WHAT: ' '{if($2=="") print $1; else print $2;}' | sort | uniq -c
which gives the expected results, like:
1 http://whiteyellowpages.com/whiteyellowpages/commonpage.do?locale=fr&advanced=no&search.name=oujdial
1 http://whiteyellowpages.com/whiteyellowpages/commonpage.do?locale=fr&advanced=no&search.name=OUKHOUIA
I want to extract some counts from archive logs of catalina.out, with a filename format of: 2015-03-24_03:50:50_catalina.out ... and so on.
I used the same command with *_catalina.out as below:
grep "WHAT: " /appl/cas/tomcat/logs/*_catalina.out | awk -F'for ' '{if($2=="") print $1; else print $2;}' | awk -F'WHAT: ' '{if($2=="") print $1; else print $2;}' | sort | uniq -c
which gives correct results, but I want to include the name of the file being processed in the output. The expected result is:
2015-03-24_03:50:50_catalina.out 1 http://whiteyellowpages.com/whiteyellowpages/commonpage.do?locale=fr&advanced=no&search.name=oujdial
2015-03-24_03:50:50_catalina.out 1 http://whiteyellowpages.com/whiteyellowpages/commonpage.do?locale=fr&advanced=no&search.name=OUKHOUIA
I tried grep with the -H and -l options, but had no success. Can you help with this?
Sample logs
2015-03-23 03:43:52,987 INFO [http-apr-10.155.50.93-4443-exec-4][AuthenticationManagerImpl:96] org.jasig.cas.support.spnego.authentication.handler.support.JCIFSSpnegoAuthenticationHandler successfully authenticated SV003006$
2015-03-23 03:43:52,988 DEBUG [http-apr-10.155.50.93-4443-exec-4][SpnegoCredentialsToPrincipalResolver:54] Attempting to resolve a principal...
2015-03-23 03:43:52,989 DEBUG [http-apr-10.155.50.93-4443-exec-4][SpnegoCredentialsToPrincipalResolver:64] Creating SimplePrincipal for [SV003006$]
2015-03-23 03:43:52,989 DEBUG [http-apr-10.155.50.93-4443-exec-4][LdapPersonAttributeDao:103] Created seed map='{username=[SV003006$]}' for uid='SV003006$'
2015-03-23 03:43:52,990 DEBUG [http-apr-10.155.50.93-4443-exec-4][LdapPersonAttributeDao:301] Adding attribute 'sAMAccountName' with value '[SV003006$]' to query builder 'null'
2015-03-23 03:43:52,990 DEBUG [http-apr-10.155.50.93-4443-exec-4][LdapPersonAttributeDao:328] Generated query builder '(sAMAccountName=SV003006$)' from query Map {username=[SV003006$]}.
2015-03-23 03:43:52,992 INFO [http-apr-10.155.50.93-4443-exec-4][AuthenticationManagerImpl:119] Resolved principal SV003006$
2015-03-23 03:43:52,993 INFO [http-apr-10.155.50.93-4443-exec-4][AuthenticationManagerImpl:61] org.jasig.cas.support.spnego.authentication.handler.support.JCIFSSpnegoAuthenticationHandler#4a23d87f authenticated SV003006$ with credential SV003006$.
2015-03-23 03:43:52,993 DEBUG [http-apr-10.155.50.93-4443-exec-4][AuthenticationManagerImpl:62] Attribute map for SV003006$: {}
2015-03-23 03:43:52,994 INFO [http-apr-10.155.50.93-4443-exec-4][Slf4jLoggingAuditTrailManager:41] Audit trail record BEGIN
=============================================================
WHO: SV003006$
WHAT: supplied credentials: SV003006$
ACTION: AUTHENTICATION_SUCCESS
APPLICATION: CAS
WHEN: Mon Mar 23 03:43:52 CET 2015
CLIENT IP ADDRESS: 10.155.70.144
SERVER IP ADDRESS: 10.155.50.93
=============================================================

Selective string operation

I need help with string processing in a csh/tcsh script.
I know basic processing, but need help understanding how to handle more advanced string operations.
I have a log file whose format is something like this:
[START_A]
log info of A
[END_A]
[START_B]
log info of B
[END_B]
[START_C]
log info of C
[END_C]
My requirement is to selectively extract the content between each start and end tag and store it in a file.
For example, the content between START_A and END_A will be stored in A.log.
this should work for you:
awk -F'_' '/\[START_.\]/{s=1;gsub(/]/,"",$2);f=$2".txt";next;}/\[END_.\]/{s=0} s{print $0 > f}' yourLog
test:
kent$ cat test
[START_A]
log info of A
[END_A]
[START_B]
log info of B
[END_B]
[START_C]
log info of C
[END_C]
kent$ awk -F'_' '/\[START_.\]/{s=1;gsub(/]/,"",$2);f=$2".txt";next;}/\[END_.\]/{s=0} s{print $0 > f}' test
kent$ head *.txt
==> A.txt <==
log info of A
==> B.txt <==
log info of B
==> C.txt <==
log info of C
Awk can do them one at a time (the brackets need escaping, or they'd be treated as character classes):
awk '/\[START_A\]/,/\[END_A\]/' log
This might work for you:
sed '/.*\[START_\([^]]*\)\].*/s||/\\[START_\1\\]/,/\\[END_\1\\]/w \1.log|p;d' file |
sed -nf - file
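That two-pass trick is dense, so here is what the first pass actually emits: each [START_X] line is turned into a sed range plus a w command, and the second invocation (sed -nf - file) runs that generated script against the file again. Showing the generated script for a one-block sample:

```shell
# Run only the first pass on a minimal sample: every [START_X] line is
# rewritten into a range + w command; everything else is deleted.
printf '%s\n' '[START_A]' 'log info of A' '[END_A]' |
sed '/.*\[START_\([^]]*\)\].*/s||/\\[START_\1\\]/,/\\[END_\1\\]/w \1.log|p;d'
```

The output is the one-line script /\[START_A\]/,/\[END_A\]/w A.log, which the second pass then executes to write the block into A.log.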
