Grepping archive logs with grep, awk, and a pipe - linux

I am trying to extract some counts from Tomcat catalina.out logs, and I am able to extract the count from a single catalina.out file. I used the command below:
grep "WHAT: " /appl/cas/tomcat/logs/catalina.out | awk -F'for ' '{if($2=="") print $1; else print $2;}' | awk -F'WHAT: ' '{if($2=="") print $1; else print $2;}' | sort | uniq -c
which gives the expected results, for example:
1 http://whiteyellowpages.com/whiteyellowpages/commonpage.do?locale=fr&advanced=no&search.name=oujdial
1 http://whiteyellowpages.com/whiteyellowpages/commonpage.do?locale=fr&advanced=no&search.name=OUKHOUIA
I want to extract the same counts from the archived catalina.out logs, which have a filename format like 2015-03-24_03:50:50_catalina.out, and so on.
I used the same command with *_catalina.out as below:
grep "WHAT: " /appl/cas/tomcat/logs/*_catalina.out | awk -F'for ' '{if($2=="") print $1; else print $2;}' | awk -F'WHAT: ' '{if($2=="") print $1; else print $2;}' | sort | uniq -c
which gives correct results, but I also want to add the filename that the command is processing. The expected result is:
2015-03-24_03:50:50_catalina.out 1 http://whiteyellowpages.com/whiteyellowpages/commonpage.do?locale=fr&advanced=no&search.name=oujdial
2015-03-24_03:50:50_catalina.out 1 http://whiteyellowpages.com/whiteyellowpages/commonpage.do?locale=fr&advanced=no&search.name=OUKHOUIA
I tried grep with the -H and -l options but had no success. Can you help with this?
Sample logs
2015-03-23 03:43:52,987 INFO [http-apr-10.155.50.93-4443-exec-4][AuthenticationManagerImpl:96] org.jasig.cas.support.spnego.authentication.handler.support.JCIFSSpnegoAuthenticationHandler successfully authenticated SV003006$
2015-03-23 03:43:52,988 DEBUG [http-apr-10.155.50.93-4443-exec-4][SpnegoCredentialsToPrincipalResolver:54] Attempting to resolve a principal...
2015-03-23 03:43:52,989 DEBUG [http-apr-10.155.50.93-4443-exec-4][SpnegoCredentialsToPrincipalResolver:64] Creating SimplePrincipal for [SV003006$]
2015-03-23 03:43:52,989 DEBUG [http-apr-10.155.50.93-4443-exec-4][LdapPersonAttributeDao:103] Created seed map='{username=[SV003006$]}' for uid='SV003006$'
2015-03-23 03:43:52,990 DEBUG [http-apr-10.155.50.93-4443-exec-4][LdapPersonAttributeDao:301] Adding attribute 'sAMAccountName' with value '[SV003006$]' to query builder 'null'
2015-03-23 03:43:52,990 DEBUG [http-apr-10.155.50.93-4443-exec-4][LdapPersonAttributeDao:328] Generated query builder '(sAMAccountName=SV003006$)' from query Map {username=[SV003006$]}.
2015-03-23 03:43:52,992 INFO [http-apr-10.155.50.93-4443-exec-4][AuthenticationManagerImpl:119] Resolved principal SV003006$
2015-03-23 03:43:52,993 INFO [http-apr-10.155.50.93-4443-exec-4][AuthenticationManagerImpl:61] org.jasig.cas.support.spnego.authentication.handler.support.JCIFSSpnegoAuthenticationHandler#4a23d87f authenticated SV003006$ with credential SV003006$.
2015-03-23 03:43:52,993 DEBUG [http-apr-10.155.50.93-4443-exec-4][AuthenticationManagerImpl:62] Attribute map for SV003006$: {}
2015-03-23 03:43:52,994 INFO [http-apr-10.155.50.93-4443-exec-4][Slf4jLoggingAuditTrailManager:41] Audit trail record BEGIN
=============================================================
WHO: SV003006$
WHAT: supplied credentials: SV003006$
ACTION: AUTHENTICATION_SUCCESS
APPLICATION: CAS
WHEN: Mon Mar 23 03:43:52 CET 2015
CLIENT IP ADDRESS: 10.155.70.144
SERVER IP ADDRESS: 10.155.50.93
=============================================================
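One possible approach (a sketch, untested against the real logs) is to do the whole extraction in a single awk program and use awk's built-in FILENAME variable, so that the file name becomes part of the sort | uniq -c key:
awk -F'WHAT: ' '/WHAT: / {
    val = $2
    sub(/.*for /, "", val)           # keep only the part after "for ", if present (mirrors the -F"for " step)
    n = split(FILENAME, p, "/")      # basename of the file currently being read
    print p[n], val
}' /appl/cas/tomcat/logs/*_catalina.out | sort | uniq -c | awk '{ print $2, $1, $3 }'
The trailing awk only reorders uniq -c's output from count filename URL into the filename count URL layout shown above; it assumes the extracted values contain no embedded spaces, so drop it if having the count printed first is acceptable.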

Related

Search for log having value greater than certain time

Below is the sample log:
2020-10-14 00:05:44,621 debug [org.jboss.as] ...............
2020-10-14 00:05:45,560 debug [org.jboss.as] ...............
2020-10-14 00:05:46,222 debug [org.jboss.as] ...............
2020-10-14 00:05:46,608 debug [org.jboss.as] ...JBoss .... server is started ............
Below is the desired output:
2020-10-14 00:05:46,608 debug [org.jboss.as] ...JBoss .... server is started ............
When I use awk with the exact timestamp it displays the output; otherwise it displays nothing. Below is the code:
Not displaying output:
awk /'2020-10-14 00:05:46,607'/ '/home/notyo/application.log' | grep -e 'JBoss.*started'
Displaying output:
awk /'2020-10-14 00:05:46,608'/ '/home/notyo/application.log' | grep -e 'JBoss.*started'
2020-10-14 00:05:46,608 debug [org.jboss.as] ...JBoss .... server is started ............
Besides that, I managed to find a solution. The commands below work, but not quite as expected.
Displaying output as expected:
grep -e 'JBoss.*started' '/home/notyo/application.log' | awk '$0 >= "2020-10-14 00:05:46,608"'
2020-10-14 00:05:46,608 debug [org.jboss.as] ...JBoss .... server is started ............
Displaying output, although I expected it not to, since I am using >:
grep -e 'JBoss.*started' '/home/notyo/application.log' | awk '$0 > "2020-10-14 00:05:46,608"'
2020-10-14 00:05:46,608 debug [org.jboss.as] ...JBoss .... server is started ............
Can you explain this behaviour? Could you advise on the correct approach?
If you compare strings with $0 > "2020-10-14 00:05:46,608", it compares the whole record 2020-10-14 00:05:46,608 debug [org.jboss.as] ...JBoss .... server is started ............ with the string 2020-10-14 00:05:46,608; the record is a longer string with the same prefix, so the comparison returns true.
To compare only the date and time portion, please try this instead:
grep -e 'JBoss.*started' '/home/notyo/application.log' | awk '($1" "$2) > "2020-10-14 00:05:46,608"'
If you want to include the exact time above, please replace > with >=, which also works as you expect.
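To see that string comparison concretely, here is a small self-contained check (the record is a longer string that begins with the same timestamp, so it compares greater):
awk 'BEGIN {
    rec = "2020-10-14 00:05:46,608 debug [org.jboss.as] ...JBoss .... server is started"
    ts  = "2020-10-14 00:05:46,608"
    if (rec > ts) print "record compares greater than the bare timestamp"
}'
record compares greater than the bare timestamp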
Your command:
awk '$0 > "2020-10-14 00:05:46,608"'
is not working because you're comparing the full record against the date-time string; you should instead be comparing ($1 " " $2) against the date-time string, like this:
awk '($1 " " $2) > "2020-10-14 00:05:46,607"' file.log
2020-10-14 00:05:46,608 debug [org.jboss.as] ...JBoss server is started ............
For a proper time comparison, you can use this awk script (GNU awk, since it relies on mktime) to convert the date string into epoch seconds, adding the millisecond value separately:
cat srchlog.awk
function convt(ts, ms) {
    ms = ts
    sub(/.*,/, "", ms)
    gsub(/[-:,]/, " ", ts)
    return mktime(ts) + ms/1000
}
BEGIN {
    sval = convt(dt)
}
/JBoss.*started/ && convt($1 " " $2) > sval
Then use it as:
awk -v dt='2020-10-14 00:05:46,607' -f srchlog.awk file.log
2020-10-14 00:05:46,608 debug [org.jboss.as] ...JBoss server is started ............
And then:
awk -v dt='2020-10-14 00:05:46,608' -f srchlog.awk file.log
# no output

Change date format with awk

I have a log file that I'm trying to reformat using sed/awk/grep, but I'm running into difficulties with the date format. The log looks like this:
1.2.3.4 - - [28/Mar/2019:11:43:58 +0000] "GET /e9bb2dddd28b/5.6.7.8/YL0000000000.rom HTTP/1.1" "-" "Yealink W52P 25.81.0.10 00:00:00:00:00:00" 404 - 1 5 0.146
I would like the output as so:
Yealink,1.2.3.4,28-03-2019 11:43:58
I have tried the following:
grep Yealink access.log | grep 404 | sed 's/\[//g' | awk '{print "Yealink,",$1,",",strftime("%Y-%m-%d %H:%M:%S", $4)}' | sed 's/, /,/g' | sed 's/ ,/,/g'
Edit: I removed the [ before passing the date string to strftime, based on the comments, but it's still not working as expected.
However, this returns a null date, so clearly I have the strftime syntax wrong:
Yealink,1.2.3.4,1970-01-01 01:00:00
Update 2019-10-25: gawk is now getting strptime() in an extension library, see https://groups.google.com/forum/#!msg/comp.lang.awk/Ft6_h7NEIaE/tmyxd94hEAAJ
Original post:
See the gawk manual for strftime: it doesn't expect a time in any format except seconds since the epoch. If gawk had a strptime() then that would work, but it doesn't (and I can't persuade the maintainers to provide one), so you have to massage the timestamp into a format that mktime() can convert to seconds and then pass THAT to strftime(), e.g.:
$ awk '{
    split($4,t,/[[\/:]/)
    old  = t[4] " " (index("JanFebMarAprMayJunJulAugSepOctNovDec",t[3])+2)/3 " " t[2] " " t[5] " " t[6] " " t[7]
    secs = mktime(old)
    new  = strftime("%d-%m-%Y %T",secs)
    print $4 ORS old ORS secs ORS new
}' file
[28/Mar/2019:11:43:58
2019 3 28 11 43 58
1553791438
28-03-2019 11:43:58
but of course you don't need mktime() or strftime() at all - just shuffle the date components around:
$ awk '{
    split($4,t,/[[\/:]/)
    new = sprintf("%02d-%02d-%04d %02d:%02d:%02d",t[2],(index("JanFebMarAprMayJunJulAugSepOctNovDec",t[3])+2)/3,t[4],t[5],t[6],t[7])
    print $4 ORS new
}' file
[28/Mar/2019:11:43:58
28-03-2019 11:43:58
That will work in any awk, not just GNU awk, since it doesn't require time functions.
(index("JanFebMarAprMayJunJulAugSepOctNovDec",t[3])+2)/3 is just the idiomatic way to convert a 3-char month name abbreviation (e.g. Mar) into the equivalent month number (3).
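To see the idiom in action as a standalone check (using the same month string; index() finds Mar at position 7, and (7+2)/3 = 3):
awk 'BEGIN { n = (index("JanFebMarAprMayJunJulAugSepOctNovDec", "Mar") + 2) / 3; print n }'
3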
Another awk approach; thanks @EdMorton for reviewing the getline usage.
The idea here is to call the date command from awk, since date accepts abbreviated month names.
$ date -d"28/Mar/2019:11:43:58 +0000" "+%F %T" # Fails
date: invalid date ‘28/Mar/2019:11:43:58 +0000’
$ date -d"28 Mar 2019:11:43:58 +0000" "+%F %T" # Again fails because of : before time section
date: invalid date ‘28 Mar 2019:11:43:58 +0000’
$ date -d"28 Mar 2019 11:43:58 +0000" "+%F %T" # date command works but incorrect time because of + in the zone
2019-03-28 17:13:58
$ date -d"28 Mar 2019 11:43:58" "+%F %T" # correct value after stripping +0000
2019-03-28 11:43:58
$
Results
awk -F"[][]" -v OFS=, '/Yealink/ {
    split($1,a," ")                                              # Format $1 to get the IP
    gsub("/", " ",$2); sub(":"," ",$2); sub("\\+[0-9]+","",$2)   # Massage $2 to get the date value
    cmd = "date -d\047" $2 "\047 \047+%F %T\047"                 # Use the system date command
    if ( (cmd | getline line) > 0 ) $2 = line
    close(cmd)
    print "Yealink",a[1],$2
}' access.log
Below is the file content
$ cat access.log
1.2.3.4 - - [28/Mar/2019:11:43:58 +0000] "GET /e9bb2dddd28b/5.6.7.8/YL0000000000.rom HTTP/1.1" "-" "Yealink W52P 25.81.0.10 00:00:00:00:00:00" 404 - 1 5 0.146
$

Search and Print a specific digit from a logfile

I have a log.txt file with this structure:
user session login_time application database db_connect_time request request_time connection_source connection_ip request_state
+-------------------+-------------------+-------------------+-------------------+-------------------+-------------------+-------------------+-------------------+-------------------+-------------------+-------------------
admin 0 9 0 none 0 Not Requested* a00:bf32::
admin 989855740 1335 DRRDEVMH DRRPRODB 1201 none 0 Not Requested a00:8a45::
admin 1768947706 932 test test 916 none 0 Not Requested a00:94b6::
WARNING - 1241024 - Possible string truncation in column 1.
WARNING - 1241028 - Output column defined with warnings.
WARNING - 1241024 - Possible string truncation in column 9.
WARNING - 1241028 - Output column defined with warnings.
WARNING - 1241024 - Possible string truncation in column 10.
WARNING - 1241028 - Output column defined with warnings.
OK/INFO - 1241044 - Records returned: [3].
As you can see, the last line of log.txt contains the string Records returned: [3]. That digit (3 in this case) is my target; after extracting it, I want to print the following line to a separate file:
The total records returned = 3
I am using:
sed -n 's#^.*Records returned.*[\(.*\)$#\1#p' log.txt > out.txt
but it's not giving the result. What mistake am I making here, please?
You need to escape the [. Try this one:
sed -n 's#^.*Records returned.*\[\(.*\)\].*$#\1#p' log.txt > out.txt
Edit
If you want to print out a string like this:
The total records returned = 3
just prepend The total records returned = before the \1, so the script becomes:
sed -n 's#^.*Records returned.*\[\(.*\)\].*$#The total records returned = \1#p' log.txt > out.txt
Using awk
awk -F "[][]" '$0~t {print "The total",t,"=",$2}' t="Records returned" log.txt > out.txt
cat out.txt
The total Records returned = 3
sed -n '$ s/.*\([[:digit:]]\{1,\}\)].$/The total records returned = \1/p'
Assuming, as your sample and explanation state, that the info is on the last line with this format.
Supposing your data is in a Test.txt file, you can simply use the command below:
echo "Total Records Count = $(tail -n 1 Test.txt | cut -d '[' -f2 | cut -d ']' -f1)"
Total Records Count = 3

Script to get the browser version for user

I've written a script to get the browser version of users, but I need to clean up the output. The script looks in the Apache logs for # and IE8 and then emails me the information. The problem is the output: when grep finds an email address and IE8 it gives me the full request path, i.e. /page/code/user#foobar.com/home.php, whereas the output I'm looking for is just the email address, and I only want this information recorded once a day:
Example:
user#foobar IE8
Thanks
#!/bin/bash
#Setting date and time (x and y and z aren't being used at the moment)
x="$(date +'%d/%b/%Y')"
y="$(date +'%T')"
z="$(date +'%T' | awk 'BEGIN { FS =":"} ; {print $1}')"
#Human readable for email title
emaildate=$(date +"%d%b%Y--Hour--%H")
#Setting date and time for grep and filename
beta="$(date +'%d/%b/%Y:%H')"
sigma="$(date +'%d-%b-%Y-%H')"
#CurrentAccess logs
log='/var/logs/access.log'
#Set saved log location
newlogs=/home/user/Scripts/browser/logs
#Perform the grep for the current day
grep '#' "$log" | grep "$beta" | awk 'BEGIN { FS = " " } ; { print $7 }' | sort -u >> "$newlogs/browserusage$sigma.txt"
mail -s "IE8 usage for $emaildate" user#example.com < "$newlogs/browserusage$sigma.txt"
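One possible cleanup (a sketch, reusing the $log, $beta, $newlogs and $sigma variables from the script above; the character classes in the regex are an assumption about what the user identifiers look like) is to let grep -o print only the user#domain token instead of the whole request path, then deduplicate with sort -u:
#Extract only the user#domain token from the request path, one entry per user per file
grep '#' "$log" | grep "$beta" |
    grep -oE '[[:alnum:]._-]+#[[:alnum:].-]+' |
    sort -u |
    sed 's/$/ IE8/' >> "$newlogs/browserusage$sigma.txt"
The sed at the end just appends the literal IE8 tag shown in the expected output; if the logs contain other browsers as well, an additional grep on the User-Agent string (e.g. MSIE 8) would be needed first.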

Selective string operation

I need help with string processing in a CSH/TCSH script.
I know basic processing, but need help understanding how to handle more advanced string-operation requirements.
I have a log file whose format is something like this:
[START_A]
log info of A
[END_A]
[START_B]
log info of B
[END_B]
[START_C]
log info of C
[END_C]
My requirement is to selectively extract the content between a start and end tag and store it in a file.
For example, the content between START_A and END_A should be stored in A.log.
this should work for you:
awk -F'_' '/\[START_.\]/{s=1;gsub(/]/,"",$2);f=$2".txt";next;}/\[END_.\]/{s=0} s{print $0 > f}' yourLog
test:
kent$ cat test
[START_A]
log info of A
[END_A]
[START_B]
log info of B
[END_B]
[START_C]
log info of C
[END_C]
kent$ awk -F'_' '/\[START_.\]/{s=1;gsub(/]/,"",$2);f=$2".txt";next;}/\[END_.\]/{s=0} s{print $0 > f}' test
kent$ head *.txt
==> A.txt <==
log info of A
==> B.txt <==
log info of B
==> C.txt <==
log info of C
Awk can do them one at a time; the brackets need to be escaped so they are matched literally:
awk '/\[START_A\]/,/\[END_A\]/' log
This might work for you. The first sed converts every [START_X] line into a generated sed command of the form /\[START_X\]/,/\[END_X\]/w X.log; the second sed (-nf - file) then reads that generated script from standard input and runs it against the file again, writing each block (including its START/END marker lines) to its own log file:
sed '/.*\[START_\([^]]*\)\].*/s||/\\[START_\1\\]/,/\\[END_\1\\]/w \1.log|p;d' file |
sed -nf - file
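For the sample file above, the script generated by the first sed should look like this (one write command per block):
/\[START_A\]/,/\[END_A\]/w A.log
/\[START_B\]/,/\[END_B\]/w B.log
/\[START_C\]/,/\[END_C\]/w C.log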
