I need to parse ModSecurity logs so that only the date and the ID of the triggered rule are displayed.
For example, I have a log line like this:
[Fri Jan 29 19:12:14 test test] [:error] ModSecurity: Warning. detected XSS using libinjection. [file "/etc/apache2 r_configs/OWASP3/rules/REQUEST-941-APPLICATION-ATTACK-XSS.conf"] [line "37"] [id "941100"] [rev "2"] [msg "XSS Attack Detected via libinjection"] [data "Matched Data: x-forwarded-for found within ARGS:data[]: [vc_row full_width=\x22stretch_row\x22 initial_loading_animation=\x22fadeIn\x22 show_overlay=\x221\x22 pofo_enable_responsive_css=\x221\x22 pofo_hidden_markup_1507889268_2_40=\x22\x22 css=\x22.vc_custom_1608665830226{background-image: url(https://argeoslab.tech/wp-content/uploads/2020/12/geniemebanner.jpg?id=22321) !important;}\x22][vc_column width=\x221/4\x22 pofo_hidden_markup_1507901669_2_21=\x22\x22 pofo_hidden_markup_1507901601_2_49=\x22\x22 of..."] [severity "CRITICAL"] [ver "OWASP_CRS/3.0.0"] [maturity "1"] [accuracy "9"] [tag "application-multi"] [tag "language-multi"] [tag "platform-multi"] [tag "attack-xss"]
I would need to get the date at the beginning and then the ID. I can get the ID with this:
awk '{for (I=1;I<NF;I++) if ($I == "[id") print $(I+1)}'
I've tried piping it into a second awk whose if statement tests $i == "Fri" || $i == "Mon" and so on, but it did not work well.
However, I cannot figure out how to get both the date ([Fri Jan 29 19:12:14) and the ID.
I initially produce the input by grepping for modsec in the Apache log, so the real output is much bigger, and I need to process every matching line, not only the first occurrence.
Any help is appreciated, thanks
Probably one of these...
$ grep -Eo '\[(Sun|Mon|Tue|Wed|Thu|Fri|Sat|id)[ a-zA-Z0-9:\"]+\]' filename
[Fri Jan 29 19:12:14 test test]
[id "941100"]
$ awk -F '(] )' '{ c=0; for(i=1;i<=NF;++i) if(match($(i),/^\[(Sun|Mon|Tue|Wed|Thu|Fri|Sat|id)/)) { ++c; print $(i)"]"; if(c==2) break }}' filename
[Fri Jan 29 19:12:14 test test]
[id "941100"]
$ awk -F '(] )' '{ for(i=1;i<=NF;++i) if(match($(i),/^\[(Sun|Mon|Tue|Wed|Thu|Fri|Sat|id)/)) { if(i==1) { s=$(i)"] " } else { s=$(i)"]"; i=NF } printf("%s",s) } print "" }' filename
[Fri Jan 29 19:12:14 test test] [id "941100"]
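If GNU awk isn't available, the same one-line output can also be produced in a single pass with POSIX awk's match()/RSTART/RLENGTH. A sketch, tested only against a trimmed copy of the sample line (the /tmp path is just for the demo):

```shell
# A trimmed copy of the sample log line to test against.
printf '%s\n' '[Fri Jan 29 19:12:14 test test] [:error] ModSecurity: Warning. [file "REQUEST-941-APPLICATION-ATTACK-XSS.conf"] [line "37"] [id "941100"] [msg "XSS Attack Detected via libinjection"]' > /tmp/modsec.log

line=$(awk '{
    d = substr($0, 1, index($0, "]"))     # first "[...]" field = the date
    if (match($0, /\[id "[0-9]+"\]/))     # locate the [id "NNNNNN"] field
        print d, substr($0, RSTART, RLENGTH)
}' /tmp/modsec.log)
echo "$line"
```

This prints `[Fri Jan 29 19:12:14 test test] [id "941100"]`, the same one-line form as the last variant above, but works in any POSIX awk.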
I have a log file, I'm trying to reformat using sed/awk/grep but running into difficulties with the date format. The log looks like this:
1.2.3.4 - - [28/Mar/2019:11:43:58 +0000] "GET /e9bb2dddd28b/5.6.7.8/YL0000000000.rom HTTP/1.1" "-" "Yealink W52P 25.81.0.10 00:00:00:00:00:00" 404 - 1 5 0.146
I would like the output as so:
Yealink,1.2.3.4,28-03-2019 11:43:58
I have tried the following:
grep Yealink access.log | grep 404 | sed 's/\[//g' | awk '{print "Yealink,",$1,",",strftime("%Y-%m-%d %H:%M:%S", $4)}' | sed 's/, /,/g' | sed 's/ ,/,/g'
edit: removed the [ before passing the date string to strftime, based on the comments - but it's still not working as expected
However this returns a null date - so clearly I have the strftime syntax wrong:
Yealink,1.2.3.4,1970-01-01 01:00:00
Update 2019-10-25: gawk is now getting strptime() in an extension library, see https://groups.google.com/forum/#!msg/comp.lang.awk/Ft6_h7NEIaE/tmyxd94hEAAJ
Original post:
See the gawk manual for strftime: it doesn't accept a time in any format except seconds since the epoch. If gawk had a strptime() THEN that would work, but it doesn't (and I can't persuade the maintainers to provide one), so you have to massage the timestamp into a format that mktime() can convert to seconds, and then pass THAT to strftime(), e.g.:
$ awk '{
split($4,t,/[[\/:]/)
old = t[4] " " (index("JanFebMarAprMayJunJulAugSepOctNovDec",t[3])+2)/3 " " t[2] " " t[5] " " t[6] " " t[7];
secs = mktime(old)
new = strftime("%d-%m-%Y %T",secs);
print $4 ORS old ORS secs ORS new
}' file
[28/Mar/2019:11:43:58
2019 3 28 11 43 58
1553791438
28-03-2019 11:43:58
but of course you don't need mktime() or strftime() at all - just shuffle the date components around:
$ awk '{
split($4,t,/[[\/:]/)
new = sprintf("%02d-%02d-%04d %02d:%02d:%02d",t[2],(index("JanFebMarAprMayJunJulAugSepOctNovDec",t[3])+2)/3,t[4],t[5],t[6],t[7])
print $4 ORS new
}' file
[28/Mar/2019:11:43:58
28-03-2019 11:43:58
That will work in any awk, not just GNU awk, since it doesn't require time functions.
(index("JanFebMarAprMayJunJulAugSepOctNovDec",t[3])+2)/3 is just the idiomatic way to convert a 3-char month name abbreviation (e.g. Mar) into the equivalent month number (3).
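For example, worked through for Mar as a standalone check:

```shell
# "Mar" starts at character 7 of the month string, so (7 + 2) / 3 = 3.
mon=$(awk 'BEGIN {
    print (index("JanFebMarAprMayJunJulAugSepOctNovDec", "Mar") + 2) / 3
}')
echo "$mon"    # 3
```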
Another awk; thanks @EdMorton for reviewing the getline usage.
The idea here is to call the date command from within awk, since date accepts abbreviated month names.
$ date -d"28/Mar/2019:11:43:58 +0000" "+%F %T" # Fails
date: invalid date ‘28/Mar/2019:11:43:58 +0000’
$ date -d"28 Mar 2019:11:43:58 +0000" "+%F %T" # Again fails because of : before time section
date: invalid date ‘28 Mar 2019:11:43:58 +0000’
$ date -d"28 Mar 2019 11:43:58 +0000" "+%F %T" # date now runs, but the +0000 zone makes it convert the time from UTC to local time
2019-03-28 17:13:58
$ date -d"28 Mar 2019 11:43:58" "+%F %T" # correct value after stripping +0000
2019-03-28 11:43:58
$
Results
awk -F"[][]" -v OFS=, '/Yealink/ {
split($1,a," "); #Format $1 to get IP
gsub("/", " ",$2); sub(":"," ",$2); sub("\\+[0-9]+","",$2); # Massage to get data value
cmd = "date -d\047" $2 "\047 \047+%F %T\047"; if ( (cmd | getline line) > 0 ) $2=line; close(cmd) # use system date
print "Yealink",a[1],$2
} ' access.log
Below is the file content
$ cat access.log
1.2.3.4 - - [28/Mar/2019:11:43:58 +0000] "GET /e9bb2dddd28b/5.6.7.8/YL0000000000.rom HTTP/1.1" "-" "Yealink W52P 25.81.0.10 00:00:00:00:00:00" 404 - 1 5 0.146
$
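The cmd | getline / close(cmd) pattern used in that answer can be seen in isolation (GNU date assumed, with -u and a fixed timestamp so the result is deterministic):

```shell
converted=$(awk 'BEGIN {
    # \047 is the octal escape for a single quote inside an awk string,
    # so cmd becomes: date -u -d '\''2019-03-28 11:43:58'\'' '\''+%F %T'\''
    cmd = "date -u -d \0472019-03-28 11:43:58\047 \047+%F %T\047"
    if ((cmd | getline line) > 0)   # read one line of the command output
        print line
    close(cmd)                      # always close the command stream
}')
echo "$converted"    # 2019-03-28 11:43:58
```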
Hoping someone can help me with a bash Linux script to generate a report from HTTP logs.
Logs format:
domain.com 101.100.144.34 - r.c.bob [14/Feb/2017:11:31:20 +1100] "POST /webmail/json HTTP/1.1" 200 1883 "https://example.domain.com/webmail/index-rui.jsp?v=1479958955287" "Mozilla/5.0 (Windows NT 10.0; WOW64; Trident/7.0; rv:11.0) like Gecko" 1588 2566 "110.100.34.39" 9FC1CC8A6735D43EF75892667C08F9CE 84670 - - - -
Output require:
time in epoch,host,Resp Code,count
1485129842,101.100.144.34,200,4000
1485129842,101.101.144.34,404,1889
What I have so far, which is nowhere near what I am trying to achieve:
tail -100 httpd_access_*.log | awk '{print $5 " " $2 " " $10}' | sort | uniq
awk 'BEGIN{
# print header
print "time in epoch,host,Resp Code,count"
# prepare month conversion array
split( "Jan Feb Mar Apr May Jun Jul Aug Sep Oct Nov Dec", tmp)
for (i in tmp) M[tmp[i]]=i
}
{
# prepare the time for mktime() by splitting the timestamp into an array
# from 14/Feb/2017:11:31:20
# to   YYYY MM DD HH MM SS
split( $5, aT, /[[\/:]/)
t = aT[4] " " M[aT[3]] " " aT[2] " " aT[5] " " aT[6] " " aT[7]
# count occurrences per (epoch, host, response code) triple
Count[ sprintf( "%s,%s,%s", mktime( t), $2, $10)]++
}
END{
# display the counted result
for( e in Count) printf( "%s,%d\n", e, Count[e])
}
' httpd_access_*.log
"count" should be specified more precisely, to be sure about the criteria to count
needs GNU awk for the mktime() function
assumes the time is always in this format
no validation or filtering (not the purpose here)
Sure, the pure awk based solution above would be much faster and more complete.
But it can also be done in smaller steps.
First get the date and convert it to epoch:
$ dt=$(awk '{print $5,$6}' file.log)
$ ep=$(date -d "$(sed -e 's,/,-,g' -e 's,:, ,' <<<"${dt:1:-1}")" +"%s")
$ echo "$ep"
1487032280
Now that you have the epoch date in the bash variable $ep, you can continue with your initial awk like this:
$ awk -v edt=$ep '{print edt","$2","$10}' file.log
1487032280,101.100.144.34,200
If you want a header, you can just print one with a simple echo before the last awk.
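To also produce the count column from the required output, the epoch,host,code lines can be piped through sort | uniq -c and reshaped. A sketch on fabricated sample lines (the /tmp file is just for the demo):

```shell
# Three pre-formatted "epoch,host,code" lines, two of them identical.
printf '%s\n' '1487032280,101.100.144.34,200' \
              '1487032280,101.100.144.34,200' \
              '1487032280,101.101.144.34,404' > /tmp/codes.txt

report=$(sort /tmp/codes.txt | uniq -c |
         awk '{ print $2 "," $1 }')    # move uniq's leading count to the end
echo "$report"
```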
I am trying to extract some counts from Tomcat catalina.out logs, and I am able to extract the counts from a single catalina.out file. I used the command below:
grep "WHAT: " /appl/cas/tomcat/logs/catalina.out | awk -F'for ' '{if($2=="") print $1; else print $2;}' | awk -F'WHAT: ' '{if($2=="") print $1; else print $2;}' | sort | uniq -c
which is giving expected results like
1 http://whiteyellowpages.com/whiteyellowpages/commonpage.do?locale=fr&advanced=no&search.name=oujdial
1 http://whiteyellowpages.com/whiteyellowpages/commonpage.do?locale=fr&advanced=no&search.name=OUKHOUIA
I want to extract the same counts from the archived catalina.out logs, whose filenames look like 2015-03-24_03:50:50_catalina.out, and so on.
I used the same command with *_catalina.out, as below:
grep "WHAT: " /appl/cas/tomcat/logs/*_catalina.out | awk -F'for ' '{if($2=="") print $1; else print $2;}' | awk -F'WHAT: ' '{if($2=="") print $1; else print $2;}' | sort | uniq -c
which gives correct results, but I want to prefix each line with the name of the file it came from. The expected result is:
2015-03-24_03:50:50_catalina.out 1 http://whiteyellowpages.com/whiteyellowpages/commonpage.do?locale=fr&advanced=no&search.name=oujdial
2015-03-24_03:50:50_catalina.out 1 http://whiteyellowpages.com/whiteyellowpages/commonpage.do?locale=fr&advanced=no&search.name=OUKHOUIA
I tried grep with the -H and -l options, but with no success. Can you help with this?
Sample logs
2015-03-23 03:43:52,987 INFO [http-apr-10.155.50.93-4443-exec-4][AuthenticationManagerImpl:96] org.jasig.cas.support.spnego.authentication.handler.support.JCIFSSpnegoAuthenticationHandler successfully authenticated SV003006$
2015-03-23 03:43:52,988 DEBUG [http-apr-10.155.50.93-4443-exec-4][SpnegoCredentialsToPrincipalResolver:54] Attempting to resolve a principal...
2015-03-23 03:43:52,989 DEBUG [http-apr-10.155.50.93-4443-exec-4][SpnegoCredentialsToPrincipalResolver:64] Creating SimplePrincipal for [SV003006$]
2015-03-23 03:43:52,989 DEBUG [http-apr-10.155.50.93-4443-exec-4][LdapPersonAttributeDao:103] Created seed map='{username=[SV003006$]}' for uid='SV003006$'
2015-03-23 03:43:52,990 DEBUG [http-apr-10.155.50.93-4443-exec-4][LdapPersonAttributeDao:301] Adding attribute 'sAMAccountName' with value '[SV003006$]' to query builder 'null'
2015-03-23 03:43:52,990 DEBUG [http-apr-10.155.50.93-4443-exec-4][LdapPersonAttributeDao:328] Generated query builder '(sAMAccountName=SV003006$)' from query Map {username=[SV003006$]}.
2015-03-23 03:43:52,992 INFO [http-apr-10.155.50.93-4443-exec-4][AuthenticationManagerImpl:119] Resolved principal SV003006$
2015-03-23 03:43:52,993 INFO [http-apr-10.155.50.93-4443-exec-4][AuthenticationManagerImpl:61] org.jasig.cas.support.spnego.authentication.handler.support.JCIFSSpnegoAuthenticationHandler#4a23d87f authenticated SV003006$ with credential SV003006$.
2015-03-23 03:43:52,993 DEBUG [http-apr-10.155.50.93-4443-exec-4][AuthenticationManagerImpl:62] Attribute map for SV003006$: {}
2015-03-23 03:43:52,994 INFO [http-apr-10.155.50.93-4443-exec-4][Slf4jLoggingAuditTrailManager:41] Audit trail record BEGIN
=============================================================
WHO: SV003006$
WHAT: supplied credentials: SV003006$
ACTION: AUTHENTICATION_SUCCESS
APPLICATION: CAS
WHEN: Mon Mar 23 03:43:52 CET 2015
CLIENT IP ADDRESS: 10.155.70.144
SERVER IP ADDRESS: 10.155.50.93
=============================================================
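One way to keep the originating filename is to drop grep and let awk track FILENAME itself, counting per (file, message) pair. A hedged sketch on tiny fabricated files (the real /appl path and the "for " suffix handling from the question's pipeline would need to be re-added):

```shell
# Two fabricated archive files with a couple of WHAT: lines each.
mkdir -p /tmp/catdemo
printf 'WHAT: supplied credentials: A\nWHAT: supplied credentials: A\n' \
    > '/tmp/catdemo/2015-03-24_03:50:50_catalina.out'
printf 'WHAT: supplied credentials: B\n' \
    > '/tmp/catdemo/2015-03-25_01:00:00_catalina.out'

report=$(awk '/WHAT: / {
    sub(/^.*WHAT: /, "")              # keep only the text after "WHAT: "
    n = split(FILENAME, p, "/")       # p[n] = basename of the current file
    cnt[p[n] SUBSEP $0]++             # count per (file, message) pair
}
END {
    for (k in cnt) { split(k, a, SUBSEP); print a[1], cnt[k], a[2] }
}' /tmp/catdemo/*_catalina.out | sort)
echo "$report"
```

Each output line is "filename count message", matching the expected layout.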
I want a script to extract logs from a file xyz.rawlog, then create a xyz directory full of files named:
Apr-14-00.rawlog
Apr-14-01.rawlog
Full example:
~/xyz/Apr-14-02.rawlog
One possible issue: log lines may have a leading 0 in the day field, or the day may be space-padded instead.
Example:
Apr 01 12:
Apr 1 12:
Sample Logs:
Apr 14 02:35:33 DC501.xx.org/10.1.7.145/1.13.136.2 MSWinEventLog,4,Security,3959142,Tue Apr 14 02:35:32 2015,4769,Microsoft-Windows-Security-Auditing,XX.ORG\PereyrR1#XX.ORG,N/A,Success Audit,DC501.xx.org,Kerberos Service Ticket Operations,,A Kerberos service ticket was requested. Account Information: Account Name: PereyrR1#XX.ORG Account Domain: XX.ORG Logon GUID: {2F6FCDED-FBA0-DBF5-88D2-0B048E612E21} Service Information: Service Name: AHCTXXML501$ Service ID: ...
Apr 14 04:32:16 1232-devr01/127.0.0.1/1.14.0.65 kernel: iptables:IN= OUT=upstream1 SRC=2.7.1.238 DST=207.188.35.17 EN=52 TOS=0x00 PREC=0x00 TTL=64 ID=2574 DF PROTO=TCP SPT=34030 DPT=61613 WINDOW=112 RES=0x00 ACK PSH FIN URGP=0
This is how I want the command syntax.
~/Logsplit.sh xyz
Working Script:
#!/bin/bash
mkdir $1
awk -v fpath="$1" -F":" '{
filename = fpath "/" gensub("[ ]+", "-", "g", $1) ".rawlog";
print >> filename
}' $1.rawlog
exit;
If I understood your question correctly, you want to split the content by date, with each full line inserted into the corresponding new file.
You can do something like:
mkdir $1
awk -v fpath="$1" -F":" '{
filename = fpath "/" gensub("[ ]+", "-", "g", $1) ".rawlog";
print >> filename
}' $1.rawlog
This should do:
awk -F: '{ split($1, a, " "); f = sprintf("%s-%02d-%02d.rawlog", a[1], a[2], a[3]); print > f }' xyz.rawlog
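A quick sanity check of that splitting idea on the two sample day formats (zero-padded and space-padded), writing into a scratch directory; this variant keeps the full line and zero-pads the file names:

```shell
# Two shortened sample lines: "Apr 14" (zero-padded hour) and "Apr  1" (spaced day).
mkdir -p /tmp/splitdemo
printf '%s\n' 'Apr 14 02:35:33 DC501.xx.org MSWinEventLog,4,Security' \
              'Apr  1 12:00:00 1232-devr01 kernel: iptables:IN=' \
    > /tmp/splitdemo/xyz.rawlog

awk -F: '{
    split($1, a, " ")                  # a[1]=month  a[2]=day  a[3]=hour
    f = sprintf("/tmp/splitdemo/%s-%02d-%02d.rawlog", a[1], a[2], a[3])
    print > f                          # full line into the per-hour file
}' /tmp/splitdemo/xyz.rawlog
```

The split on " " collapses the run of spaces in "Apr  1", and %02d restores the leading zero, so both lines land in consistently named files (Apr-14-02.rawlog and Apr-01-12.rawlog).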
I have a text file with the following format:
Wind River Linux glibc_cgl (cgl) 3.0.3
Build label: NDPGSN_5_0_SRC_GSN_LINUX_GPBOS_2
Build host: eselnvlx1114
Build date: Mon Mar 18 23:24:08 CET 2013
Installed: Fri Jun 20 02:22:08 EEST 2014
Last login: Fri Aug 8 11:37:08 2014 from 172
gsh list_imsins
=== sysadm#eqm01s14p2 ANCB ~ # gsh list_imsin
ps Class Identifiers |
---------------------------------------
A imsins -imsi 20201
A imsins -imsi 20205
A imsins -imsi 20210
A imsins -imsi 204045
I want to extract the numbers next to -imsi. The output would look like:
20201
20205
20210
204045
And after that process the output further, which I've already done. At first I was informed that the text format was static, so I wrote the following script:
for (( r=1; r<5; r++));
do
awk 'NR>12' IMSI$r.txt | awk '{print $NF "\r"}' > N$r
awk 'NR>12' IMSI$r.txt | awk '{print $NF "\r"}' >> out
done
I had 2 files as output because I needed to use both for other purposes.
Is there any way to make the script more flexible, to deal with dynamic text files?
As a possible solution, is it possible to make the script look for the phrase -imsi and grab the record after it? And continue doing so until it finds the end of file?
I tried doing that using grep and awk, but I never got the right output. If you have any other ideas, please share.
I would go for something like:
$ awk '/-imsi/ {print $NF}' file
20201
20205
20210
204045
This prints the last word on those lines containing -imsi.
You can also use grep with a look-behind, to print the numbers after -imsi.
$ grep -Po '(?<=-imsi )[0-9]*' file
20201
20205
20210
204045