Using awk to generate report from apache http logs


I'm hoping someone can help me with a Linux bash script to generate a report from HTTP logs.
Log format:
domain.com 101.100.144.34 - r.c.bob [14/Feb/2017:11:31:20 +1100] "POST /webmail/json HTTP/1.1" 200 1883 "https://example.domain.com/webmail/index-rui.jsp?v=1479958955287" "Mozilla/5.0 (Windows NT 10.0; WOW64; Trident/7.0; rv:11.0) like Gecko" 1588 2566 "110.100.34.39" 9FC1CC8A6735D43EF75892667C08F9CE 84670 - - - -
Required output:
time in epoch,host,Resp Code,count
1485129842,101.100.144.34,200,4000
1485129842,101.101.144.34,404,1889
What I have so far, which is nowhere near what I am trying to achieve:
tail -100 httpd_access_*.log | awk '{print $5 " " $2 " " $10}' | sort | uniq

awk 'BEGIN{
# print header
print "time in epoch,host,Resp Code,count"
# prepare month conversion array: M["Jan"]=1 ... M["Dec"]=12
split( "Jan Feb Mar Apr May Jun Jul Aug Sep Oct Nov Dec", tmp)
for (i in tmp) M[tmp[i]]=i
}
{
# prepare time conversion for mktime() by splitting $5
# from [14/Feb/2017:11:31:20
# to YYYY MM DD HH MM SS [DST]
# aT[1] is empty (text before "["), then day, month, year, hour, minute, second
split( $5, aT, /[[\/:]/)
t = aT[4] " " M[aT[3]] " " aT[2] " " aT[5] " " aT[6] " " aT[7]
# count identical (epoch, host, response code) triples;
# mktime() interprets t in the local time zone and ignores the +1100 offset in $6
Count[ sprintf( "%s,%s,%s", mktime( t), $2, $10)]++
}
END{
# display the counted result
for( e in Count) printf( "%s,%d\n", e, Count[e])
}
' httpd_access_*.log
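The counting criterion needs to be described more precisely; as written, lines are counted per identical second, host and response code.
GNU awk is needed for the mktime() function.
The timestamp is assumed to always be in this format.
There is no input validation or filtering (not the purpose here).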
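If GNU awk is not available, the epoch can also be computed by hand. The following is only a sketch, not part of the answer above: `epoch_report`, `days_from_civil` and `to_epoch` are hypothetical names, the day count uses the usual civil-calendar arithmetic, and the log layout is assumed to match the sample line (timestamp in $5, UTC offset in $6). Unlike mktime(), it applies the offset found in the log line rather than the local time zone.

```shell
# Hypothetical helper: print "epoch,host,code,count" for Apache-style logs
# read from the files given as arguments (or stdin), using POSIX awk only.
epoch_report() {
  awk '
  # Days since 1970-01-01 for a Gregorian date.
  function days_from_civil(y, m, d,    era, yoe, doy) {
    if (m <= 2) y--
    era = int(y / 400)
    yoe = y - era * 400
    doy = int((153 * (m > 2 ? m - 3 : m + 9) + 2) / 5) + d - 1
    return era * 146097 + yoe * 365 + int(yoe / 4) - int(yoe / 100) + doy - 719468
  }
  # ts looks like "[14/Feb/2017:11:31:20", tz like "+1100]".
  function to_epoch(ts, tz,    t, mon, off) {
    sub(/^\[/, "", ts)
    split(ts, t, "[/:]")             # t[1]=day t[2]=month t[3]=year t[4..6]=h m s
    mon = (index("JanFebMarAprMayJunJulAugSepOctNovDec", t[2]) + 2) / 3
    off = substr(tz, 2, 2) * 3600 + substr(tz, 4, 2) * 60
    if (substr(tz, 1, 1) == "-") off = -off
    return days_from_civil(t[3], mon, t[1]) * 86400 + t[4] * 3600 + t[5] * 60 + t[6] - off
  }
  { count[sprintf("%d,%s,%s", to_epoch($5, $6), $2, $10)]++ }
  END { for (k in count) printf "%s,%d\n", k, count[k] }
  ' "$@"
}
```

It would be called as `epoch_report httpd_access_*.log`; the output order is whatever awk's `for (k in count)` yields, so pipe it through sort if the order matters.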

The pure awk solution above is certainly faster and more complete, but the job can also be done in smaller steps.
The steps below assume file.log contains a single line.
First get the date and convert it to epoch:
$ dt=$(awk '{print $5,$6}' file.log)
$ ep=$(date -d "$(sed -e 's,/,-,g' -e 's,:, ,' <<<"${dt:1:-1}")" +"%s")
$ echo "$ep"
1487032280
Now that the epoch date is in the bash variable $ep, you can continue with your initial awk like this:
$ awk -v edt=$ep '{print edt","$2","$10}' file.log
1487032280,101.100.144.34,200
If you want a header, just print one with a simple echo before the last awk.
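The steps above convert a single timestamp. For a log with many lines, the same idea could be wrapped in a loop, roughly as sketched here. `epoch_lines` is a hypothetical name, GNU date is assumed (as in the steps above), and starting one date process per line will be slow on large logs.

```shell
# Hypothetical helper: emit "epoch,host,code" for every line of the given log.
# Assumes GNU date for -d; one date call per line, so slow on big files.
epoch_lines() {
  awk '{ print $5, $6, $2, $10 }' "$@" |
  while read -r ts tz host code; do
    ts=${ts#\[}                      # drop the leading "["
    tz=${tz%\]}                      # drop the trailing "]"
    # 14/Feb/2017:11:31:20 becomes 14-Feb-2017 11:31:20
    when=$(printf '%s\n' "$ts" | sed -e 's,/,-,g' -e 's,:, ,')
    printf '%s,%s,%s\n' "$(date -d "$when $tz" +%s)" "$host" "$code"
  done
}
```

Piping the result through `sort | uniq -c` would then give a count per distinct line.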

Related

Parsing Mod Security rules

I need to parse modsec logs so only the date and ID of the triggered rule would display.
For example, I have such log:
[Fri Jan 29 19:12:14 test test] [:error] ModSecurity: Warning. detected XSS using libinjection. [file "/etc/apache2 r_configs/OWASP3/rules/REQUEST-941-APPLICATION-ATTACK-XSS.conf"] [line "37"] [id "941100"] [rev "2"] [msg "XSS Attack Detected via libinjection"] [data "Matched Data: x-forwarded-for found within ARGS:data[]: [vc_row full_width=\x22stretch_row\x22 initial_loading_animation=\x22fadeIn\x22 show_overlay=\x221\x22 pofo_enable_responsive_css=\x221\x22 pofo_hidden_markup_1507889268_2_40=\x22\x22 css=\x22.vc_custom_1608665830226{background-image: url(https://argeoslab.tech/wp-content/uploads/2020/12/geniemebanner.jpg?id=22321) !important;}\x22][vc_column width=\x221/4\x22 pofo_hidden_markup_1507901669_2_21=\x22\x22 pofo_hidden_markup_1507901601_2_49=\x22\x22 of..."] [severity "CRITICAL"] [ver "OWASP_CRS/3.0.0"] [maturity "1"] [accuracy "9"] [tag "application-multi"] [tag "language-multi"] [tag "platform-multi"] [tag "attack-xss"]
I would need to get the date at the beginning and then the ID. I can get the ID with this:
awk '{for (I=1;I<NF;I++) if ($I == "[id") print $(I+1)}'
I tried piping it into a second awk that works the same way, with an if statement testing $i == "Fri" || $i == "Mon" and so on, but that did not work well.
However, I cannot figure out how to get both the date ([Fri Jan 29 19:12:14) and the ID.
I initially get the input by grepping the Apache log for modsec, so the real output is much bigger, and I need to go through every line, not only the first occurrence.
Any help is appreciated, thanks.

Probably one of these...
$ grep -Eo '\[(Sun|Mon|Tue|Wed|Thu|Fri|Sat|id)[ a-zA-Z0-9:\"]+\]' filename
[Fri Jan 29 19:12:14 test test]
[id "941100"]
$ awk -F '(] )' '{ c=0; for(i=1;i<=NF;++i) if(match($(i),/^\[(Sun|Mon|Tue|Wed|Thu|Fri|Sat|id)/)) { ++c; print $(i)"]"; if(c==2) break }}' filename
[Fri Jan 29 19:12:14 test test]
[id "941100"]
$ awk -F '(] )' '{ for(i=1;i<=NF;++i) if(match($(i),/^\[(Sun|Mon|Tue|Wed|Thu|Fri|Sat|id)/)) { if(i==1) { s=$(i)"] " } else { s=$(i)"]"; i=NF } printf("%s",s) } print "" }' filename
[Fri Jan 29 19:12:14 test test] [id "941100"]
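If only the numeric rule ID is wanted next to the date group, a variant along the following lines might do. It is a sketch under the assumption that the first bracketed group on each line is the date and that the first `[id "..."]` group is the rule ID; `modsec_ids` is a hypothetical name.

```shell
# Hypothetical helper: print "date-group id" for each ModSecurity log line.
modsec_ids() {
  awk '
  match($0, /\[id "[0-9]+"\]/) {
    id = substr($0, RSTART + 5, RLENGTH - 7)   # digits between the quotes
    stamp = substr($0, 1, index($0, "]"))      # first bracketed group
    print stamp, id
  }' "$@"
}
```

Lines without an `[id "..."]` group are skipped, so it can be fed the raw grep output.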

Change date format with awk

I have a log file, I'm trying to reformat using sed/awk/grep but running into difficulties with the date format. The log looks like this:
1.2.3.4 - - [28/Mar/2019:11:43:58 +0000] "GET /e9bb2dddd28b/5.6.7.8/YL0000000000.rom HTTP/1.1" "-" "Yealink W52P 25.81.0.10 00:00:00:00:00:00" 404 - 1 5 0.146
I would like the output as so:
Yealink,1.2.3.4,28-03-2019 11:43:58
I have tried the following:
grep Yealink access.log | grep 404 | sed 's/\[//g' | awk '{print "Yealink,",$1,",",strftime("%Y-%m-%d %H:%M:%S", $4)}' | sed 's/, /,/g' | sed 's/ ,/,/g'
edit - removing [ before passing date string to strftime based on comments - but still not working as expected
However this returns a null date - so clearly I have the strftime syntax wrong:
Yealink,1.2.3.4,1970-01-01 01:00:00

Update 2019-10-25: gawk is now getting strptime() in an extension library, see https://groups.google.com/forum/#!msg/comp.lang.awk/Ft6_h7NEIaE/tmyxd94hEAAJ
Original post:
See the gawk manual for strftime, it doesn't expect a time in any format except seconds since the epoch. If gawk had a strptime() THEN that would work, but it doesn't (and I can't persuade the maintainers to provide one) so you have to massage the timestamp into a format that mktime() can convert to seconds and then pass THAT to strftime(), e.g.:
$ awk '{
split($4,t,/[[\/:]/)
old = t[4] " " (index("JanFebMarAprMayJunJulAugSepOctNovDec",t[3])+2)/3 " " t[2] " " t[5] " " t[6] " " t[7];
secs = mktime(old)
new = strftime("%d-%m-%Y %T",secs);
print $4 ORS old ORS secs ORS new
}' file
[28/Mar/2019:11:43:58
2019 3 28 11 43 58
1553791438
28-03-2019 11:43:58
but of course you don't need mktime() or strftime() at all - just shuffle the date components around:
$ awk '{
split($4,t,/[[\/:]/)
new = sprintf("%02d-%02d-%04d %02d:%02d:%02d",t[2],(index("JanFebMarAprMayJunJulAugSepOctNovDec",t[3])+2)/3,t[4],t[5],t[6],t[7])
print $4 ORS new
}' file
[28/Mar/2019:11:43:58
28-03-2019 11:43:58
That will work in any awk, not just GNU awk, since it doesn't require time functions.
(index("JanFebMarAprMayJunJulAugSepOctNovDec",t[3])+2)/3 is just the idiomatic way to convert a 3-char month name abbreviation (e.g. Mar) into the equivalent month number (3).
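To see why the idiom works: the abbreviations start at positions 1, 4, 7, ... of the string, so adding 2 and dividing by 3 yields 1 to 12. A small sketch, with `month_num` as a hypothetical wrapper name and an extra guard (not in the answer) against strings that are not a real abbreviation:

```shell
# Hypothetical helper: map a three-letter English month abbreviation to 1..12.
month_num() {
  awk -v m="$1" 'BEGIN {
    pos = index("JanFebMarAprMayJunJulAugSepOctNovDec", m)   # 1, 4, 7, ... or 0
    if (length(m) != 3 || pos % 3 != 1) exit 1               # not a known abbreviation
    print (pos + 2) / 3
  }'
}
```

For example, `month_num Mar` prints 3, and an unknown name makes the function return a non-zero status.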

Another awk, thanks @EdMorton for reviewing the getline usage.
The idea here is to call the date command from awk, since date accepts abbreviated month names:
$ date -d"28/Mar/2019:11:43:58 +0000" "+%F %T" # Fails
date: invalid date ‘28/Mar/2019:11:43:58 +0000’
$ date -d"28 Mar 2019:11:43:58 +0000" "+%F %T" # Again fails because of : before time section
date: invalid date ‘28 Mar 2019:11:43:58 +0000’
$ date -d"28 Mar 2019 11:43:58 +0000" "+%F %T" # date command works, but the +0000 zone makes it convert to local time (here UTC+05:30)
2019-03-28 17:13:58
$ date -d"28 Mar 2019 11:43:58" "+%F %T" # correct value after stripping +0000
2019-03-28 11:43:58
$
Results
awk -F"[][]" -v OFS=, '/Yealink/ {
split($1,a," "); #Format $1 to get IP
gsub("/", " ",$2); sub(":"," ",$2); sub("\\+[0-9]+","",$2); # Massage to get data value
cmd = "date -d\047" $2 "\047 \047+%F %T\047"; if ( (cmd | getline line) > 0 ) $2=line; close(cmd) # use system date
print "Yealink",a[1],$2
} ' access.log
Below is the file content
$ cat access.log
1.2.3.4 - - [28/Mar/2019:11:43:58 +0000] "GET /e9bb2dddd28b/5.6.7.8/YL0000000000.rom HTTP/1.1" "-" "Yealink W52P 25.81.0.10 00:00:00:00:00:00" 404 - 1 5 0.146
$

SED to parse apache logs between timestamp

I am trying to parse a log and get the lines between two timestamps. I tried the sed approach below, but I am having trouble with the regex.
Log pattern:
IP - - [20/Apr/2018:14:25:37 +0000] "GET / HTTP/1.1" 301 3936 "-" "
IP - - [20/Apr/2018:14:44:08 +0000]
----------------------------------
IP- - [20/Apr/2018:20:43:46 +0000]
I need to get the lines between 14:25 and 20:43 for 20th April, as the log contains other dates too.
I tried this:
sed -n '/\[14:25/,/\[20:43/p' *-https_access.log.1
but it is not working.

Since you mentioned you want logs for 20th April, I'd suggest something like :
$ sed -n '/20\/Apr\/2018:14:25/,/20\/Apr\/2018:20:43/p' *-https_access.log.1
This is much less likely to produce false matches in case "20:43" occurs elsewhere.

sed is not well suited here because it cannot easily compare elements such as the day and the hour.
With awk (commented inline):
awk -F '[ []' '
{
# separate date and time, then re-split the record into fields
# (the printed lines are the rebuilt records, without the leading "[")
sub(/:/, " ", $5);$0=$0""
}
# print if it is the right day and the time is between the two bounds (string comparison works here)
$5 ~ /20.Apr.2018/ && $6 >= "14:25" && $6 < "20:44"
' YourFile
More generally, the date and times could be passed to awk as variables (not the purpose here).
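A parameterised variant might look like the following sketch. `between` is a hypothetical function name, and the fixed offsets assume the timestamp directly follows the first "[" in the dd/Mon/yyyy:HH:MM:SS layout shown in the question. It compares zero-padded HH:MM strings and prints the original lines unchanged.

```shell
# Hypothetical helper: between DAY FROM TO [FILE...]
# Prints lines from DAY whose HH:MM lies in [FROM, TO], comparing as strings.
between() {
  day=$1 from=$2 to=$3
  shift 3
  awk -v day="$day" -v from="$from" -v to="$to" '
  {
    start = index($0, "[")                 # timestamp follows the first "["
    if (start == 0) next
    d  = substr($0, start + 1, 11)         # e.g. 20/Apr/2018
    hm = substr($0, start + 13, 5)         # e.g. 14:25
  }
  d == day && hm >= from && hm <= to
  ' "$@"
}
```

For the question's case the call would be `between 20/Apr/2018 14:25 20:43 *-https_access.log.1`. Neither bound has to occur in the log for this to work.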

To print lines between match1 and match2 with sed or awk you can do:
sed -n '/match1/,/match2/p' inputfile
awk '/match1/,/match2/' inputfile
In your example match1 is 20/Apr/2018:14:25 and match2 is 20/Apr/2018:20:43, so either of these commands should work for you:
sed -n '/20\/Apr\/2018:14:25/,/20\/Apr\/2018:20:43/p' inputfile
awk '/20\/Apr\/2018:14:25/,/20\/Apr\/2018:20:43/' inputfile
Or use | as the sed delimiter to avoid escaping the slashes:
sed -n '\|20/Apr/2018:14:25|,\|20/Apr/2018:20:43|p' inputfile

The best solution is to use awk for this. What you need to do is convert your time-stamps to a unix-time and then do the comparisons. In awk you can do this using mktime()
mktime(datespec [, utc-flag ]): Turn datespec into a timestamp in the same form as is returned by systime(). It is similar to the
function of the same name in ISO C. The argument, datespec, is a
string of the form YYYY MM DD HH MM SS [DST]. The string consists of
six or seven numbers representing, respectively, the full year
including century, the month from 1 to 12, the day of the month from 1
to 31, the hour of the day from 0 to 23, the minute from 0 to 59, the
second from 0 to 60, and an optional daylight-savings flag.
To convert your timestamps of the form 20/Apr/2018:14:25:37 +0000 into the form 2018 04 20 14 25 37 that mktime() expects, split them and map the month name to its number:
awk -v tstart="20/Apr/2018:14:25:00" -v tend="20/Apr/2018:20:43:00" \
'function tounix(str) {
split(str,a,"/|:| ")
return mktime(a[3]" "month[a[2]]" "a[1]" "a[4]" "a[5]" "a[6])
}
BEGIN{
month["Jan"]="01";month["Feb"]="02";month["Mar"]="03"
month["Apr"]="04";month["May"]="05";month["Jun"]="06"
month["Jul"]="07";month["Aug"]="08";month["Sep"]="09"
month["Oct"]="10";month["Nov"]="11";month["Dec"]="12"
FS="\\[|\\]"
t1=tounix(tstart)
t2=tounix(tend)
}
{ t=tounix($2) }
(t1<=t && t<=t2)' <file>
This method is robust because it does true time comparisons, which are independent of leap years and of day, month or year crossovers. In contrast to the other solutions provided, it also does not require the timestamps tstart and tend to actually be present in the file.

How to convert epoch to yyyy-mm-ddThh:mm:ss linux

How can I convert an epoch value like "1444039517190" to yyyy-mm-ddThh:mm:ss on Linux?
I have tried using the following script but it gives the wrong output:
echo 1444039517190 | awk '{ print strftime("%Y-%m-%d %H:%M:%S",$1) }'
Output:
47729-10-16 09:06:30

Your timestamp is not in seconds; try:
echo 1444039517190 | awk '{ print strftime("%Y-%m-%d %H:%M:%S",$1/1000) }'
Or if you don't mind losing precision:
date --date=@$((1444039517190 / 1000)) +%Y-%m-%d\ %H:%M:%S
And check if your version of strftime(3) support the shorter format %F %T instead of %Y-%m-%d %H:%M:%S.

Why not use date directly? You still have to divide by 1000:
date -d @1444039517.190
Output
Mon Oct 5 12:05:17 CEST 2015
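Neither command above produces the T separator the question asks for. A sketch that does, assuming GNU date (for `-d @SECONDS`) and that UTC output is acceptable; `ms_to_iso` is a hypothetical name:

```shell
# Hypothetical helper: convert epoch milliseconds to yyyy-mm-ddThh:mm:ss in UTC.
# Assumes GNU date; the millisecond part is truncated by the integer division.
ms_to_iso() {
  date -u -d "@$(( $1 / 1000 ))" +%Y-%m-%dT%H:%M:%S
}
```

Dropping `-u` would print the same instant in the local time zone instead.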

parse httpd log in bash

My httpd log has the following format:
123.251.0.000 - - [05/Sep/2014:18:19:24 -0700] "GET /myapp/MyService?param1=value1&param2=value2&param3=value3 HTTP/1.1" 200 15138 "-" "-"
I need to extract the following fields and display them on one line:
IP value1 httpResponseCode(eg.200), dataLength
What's the most efficient way to do this in bash?

As you're using Linux, chances are that you also have GNU awk installed. If so:
$ awk 'match ($7, /param1=([^& ]*)/, m) { print $1, m[1], $9",", $10 }' http.log
gives:
123.251.0.000 value1 200, 15138
This works as long as value1 hasn't got an ampersand or space in it, which they shouldn't if the request has been escaped correctly.
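The three-argument match() is a GNU awk extension. If only a POSIX awk is at hand, the same extraction can be sketched with RSTART and RLENGTH; `param1_report` is a hypothetical name, and the request is assumed to be in $7 as in the sample line.

```shell
# Hypothetical helper: print "IP value1 code, length" using POSIX awk only.
param1_report() {
  awk 'match($7, /param1=[^& ]*/) {
    value = substr($7, RSTART + 7, RLENGTH - 7)   # text after "param1="
    print $1, value, $9 ",", $10
  }' "$@"
}
```

It is called like the original, for example `param1_report http.log`.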

$ cat tmp.txt
123.251.0.000 - - [05/Sep/2014:18:19:24 -0700] "GET /myapp/MyService?param1=value1&param2=value2&param3=value3 HTTP/1.1" 200 15138 "-" "-"
$ awk '{ print "IP", $1, $9, $10 }' tmp.txt
IP 123.251.0.000 200 15138
