I have a HAProxy log file with content similar to this:
Feb 28 11:16:10 localhost haproxy[20072]: 88.88.88.88:6152 [28/Feb/2017:11:16:01.220] frontend backend_srvs/srv1 9063/0/0/39/9102 200 694 - - --VN 9984/5492/191/44/0 0/0 {Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/56.0.2924.87 Safari/537.36|http://subdomain.domain.com/location1} "GET /location1 HTTP/1.1"
Feb 28 11:16:10 localhost haproxy[20072]: 88.88.88.88:6152 [28/Feb/2017:11:16:10.322] frontend backend_srvs/srv1 513/0/0/124/637 200 14381 - - --VN 9970/5491/223/55/0 0/0 {Mozilla/5.0 AppleWebKit/537.36 Chrome/56.0.2924.87 Safari/537.36|http://subdomain.domain.com/location2} "GET /location2 HTTP/1.1"
Feb 28 11:16:13 localhost haproxy[20072]: 88.88.88.88:6152 [28/Feb/2017:11:16:10.960] frontend backend_srvs/srv1 2245/0/0/3/2248 200 7448 - - --VN 9998/5522/263/54/0 0/0 {another user agent with fewer columns|http://subdomain.domain.com/location3} "GET /location3 HTTP/1.1"
Feb 28 11:16:13 localhost haproxy[20072]: 88.88.88.88:6152 [28/Feb/2017:11:16:10.960] frontend backend_srvs/srv1 2245/0/0/3/2248 200 7448 - - --VN 9998/5522/263/54/0 0/0 {Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/56.0.2924.87 Safari/537.36|} "GET /another_location HTTP/1.1"
I want to extract some of the fields in order to have the following output:
Field 1: Date/time
Field 2: HTTP status code
Field 3: HTTP method
Field 4: Request
Field 5: HTTP version
Field 6: Referer URL
Basically, in this particular case the output should be:
Feb 28 11:16:10 200 GET /location1 HTTP/1.1 http://subdomain.domain.com/location1
Feb 28 11:16:10 200 GET /location2 HTTP/1.1 http://subdomain.domain.com/location2
Feb 28 11:16:13 200 GET /location3 HTTP/1.1 http://subdomain.domain.com/location3
Feb 28 11:16:13 200 GET /another_location HTTP/1.1
The only problem here is extracting the Referer URL, which sits between curly brackets together with the user agent, separated from it by a pipe. Also, the user agent itself has a variable number of fields.
The only solution I could think of was extracting the referer url separately and then pasting the columns together:
requests_temp=`grep -F " 88.88.88.88:" /root/file.log | tr -d '"'`
echo "${requests_temp}" | awk '{print $1" "$2" "$3" "$11, $(NF-2), $(NF-1), $NF}' > /tmp/requests_tmp
echo "${requests_temp}" | awk 'NR > 1 {print $1}' RS='{' FS='}' | awk -F'|' '{ print $2 }' > /tmp/referer_url_tmp
paste /tmp/requests_tmp /tmp/referer_url_tmp
But I don't really like this method. Is there any other way to do it using only one awk invocation? Maybe assign the referer URL column to a variable inside awk and then use it to build the same output?
Try the solution below:
awk '/88.88.88.88/ {gsub(/"/,"",$0); split($(NF-3),a,"|"); print $1,$2,$3,$11, $(NF-2), $(NF-1), $NF, substr(a[2],1,length(a[2])-1)}' file.log
Feb 28 11:16:10 200 GET /location1 HTTP/1.1 http://subdomain.domain.com/location1
Feb 28 11:16:10 200 GET /location2 HTTP/1.1 http://subdomain.domain.com/location2
Feb 28 11:16:13 200 GET /location3 HTTP/1.1 http://subdomain.domain.com/location3
Feb 28 11:16:13 200 GET /another_location HTTP/1.1
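For readability, here is the same one-liner spread out with comments (a sketch; it assumes the log file is named file.log):
awk '/88.88.88.88/ {
    gsub(/"/,"",$0)                      # strip double quotes; modifying $0 re-splits the fields
    split($(NF-3),a,"|")                 # $(NF-3) is the "...|referer}" token just before the request
    ref=substr(a[2],1,length(a[2])-1)    # referer, minus the trailing }
    print $1, $2, $3, $11, $(NF-2), $(NF-1), $NF, ref
}' file.log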
You can do it all at once using awk:
awk '$6 ~ /88\.88\.88\.88:[0-9]+/{
split($0,a,/[{}]/)
$0=a[1] OFS a[3]
split(a[2],b,"|")
print $1,$2,$3,$11,substr($18,2),$19,substr($20,1,length($20)-1),b[2]
}' file.log
The first split splits the variable part of the line (the part enclosed in {...}) into the array a, using the braces as delimiters.
The line is then rebuilt to have a fixed number of fields: $0=a[1] OFS a[3]
The second split extracts the URL from the variable part, splitting on the | character.
At last the print shows all the needed elements. Note that the substr calls are there for removing the ".
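If it helps to see what the first split produces, here is a minimal standalone sketch on a toy line:
$ echo 'before {UA part|http://referer} "GET /x HTTP/1.1"' |
  awk '{ split($0,a,/[{}]/); print "a[1]=" a[1]; print "a[2]=" a[2]; print "a[3]=" a[3] }'
a[1]=before
a[2]=UA part|http://referer
a[3]= "GET /x HTTP/1.1"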
Experts, after reading how to provide a minimal reproducible example, I am posting my question again.
I want to filter the fully qualified hostnames (e.g. dtc4028.ptc.db01.delta.com) and count the repetitions for each individual host.
Below is my raw data:
Feb 24 07:20:56 dbv0102 postfix/smtpd[29531]: NOQUEUE: reject: RCPT from dtc4023.ptc.db01.delta.com[172.10.10.161]: 554 5.7.1 <beta_st@dtc.com>: Sender address rejected: Access denied; from=<beta_st@dtc.com> to=<stordb@dtc.com> proto=ESMTP helo=<dtc4023.ptc.db01.delta.com>
Feb 24 07:21:20 dbv0102 postfix/smtpd[29528]: NOQUEUE: reject: RCPT from dtc4023.ptc.db01.delta.com[172.10.10.161]: 554 5.7.1 <beta_st@dtc.com>: Sender address rejected: Access denied; from=<beta_st@dtc.com> to=<stordb@dtc.com> proto=ESMTP helo=<dtc4023.ptc.db01.delta.com>
Feb 21 05:05:06 dbv0102 postfix/smtpd[32001]: disconnect from dtc4028.ptc.db01.delta.com[172.12.78.81]
Feb 21 05:05:23 dbv0102 postfix/smtpd[32010]: connect from dtc4028.ptc.db01.delta.com[172.12.78.81]
Feb 21 05:06:15 dbv0102 postfix/smtpd[31994]: connect from dtc3024.ptc.db01.delta.com[172.10.10.166]
Feb 21 05:06:15 dbv0102 postfix/smtpd[31994]: disconnect from dtc3024.ptc.db01.delta.com[172.10.10.166]
Feb 21 13:05:08 dbv0102 postfix/smtpd[29043]: lost connection after CONNECT from dtc4028.ptc.db01.delta.com[172.12.78.81]
Feb 21 13:05:08 dbv0102 postfix/smtpd[29048]: lost connection after CONNECT from dtc4028.ptc.db01.delta.com[172.12.78.82]
What I tried myself:
Here I am just taking the desired columns 1, 2, 4 and 8:
$ awk '/from dtc/{print $1, $2, $4, $8}' maillog.log
Feb 24 dbv0102 RCPT
Feb 24 dbv0102 RCPT
Feb 21 dbv0102 dtc4028.ptc.db01.delta.com[172.12.78.81]
Feb 21 dbv0102 dtc4028.ptc.db01.delta.com[172.12.78.81]
Feb 21 dbv0102 dtc3024.ptc.db01.delta.com[172.10.10.166]
Feb 21 dbv0102 dtc3024.ptc.db01.delta.com[172.10.10.166]
Feb 21 dbv0102 after
Feb 21 dbv0102 after
Secondly, I am removing the RCPT|after lines, as those do not carry hostnames, and then also stripping the [...] part so that only the hostnames remain and their repetitions can be counted:
$ awk '/from dtc/{print $1, $2, $4, $8}' maillog.log| egrep -v "RCPT|after" | awk '{print $4}'| cut -d"[" -f1 | uniq -c
2 dtc4028.ptc.db01.delta.com
2 dtc3024.ptc.db01.delta.com
What I wish:
I wish this could be written more intelligently with awk itself, rather than doing it this dirty way.
Note: can we get only the FQDN hostnames, like dtc4028.ptc.db01.delta.com, that appear after the 6th column?
Based on your shown samples, you could try the following. Written and tested in GNU awk.
awk '
match($0,/from .*com\[/){
count[substr($0,RSTART+5,RLENGTH-6)]++
}
END{
for(key in count){
print count[key],key
}
}
' Input_file
Explanation: a detailed explanation of the above.
awk ' ##Starting awk program from here.
match($0,/from .*com\[/){ ##Using match function to match regex from .*com\[
count[substr($0,RSTART+5,RLENGTH-6)]++ ##A successful match sets RSTART (the starting position of the matched text) and RLENGTH (its length); the substr skips the 5 characters of "from " and drops the trailing "[", keeping only the hostname as the array key.
}
END{ ##Starting END block of this program from here.
for(key in count){ ##Traversing through count array here.
print count[key],key ##Printing its key and value here.
}
}
' Input_file ##Mentioning Input_file name here.
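As a quick standalone illustration of how match() sets RSTART and RLENGTH (not part of the answer itself, just a sketch on one sample line):
$ echo 'connect from dtc4028.ptc.db01.delta.com[172.12.78.81]' |
  awk 'match($0,/from .*com\[/){ print RSTART, RLENGTH, substr($0,RSTART+5,RLENGTH-6) }'
9 32 dtc4028.ptc.db01.delta.com
The match starts at position 9 (the f of from) and spans 32 characters up to and including the [; skipping the 5 characters of "from " and dropping the trailing [ leaves exactly the hostname.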
$ awk -F'[[ ]' '$8=="from"{ cnt[$9]++ } END{ for (host in cnt) print cnt[host], host }' file
2 dtc4028.ptc.db01.delta.com
2 dtc3024.ptc.db01.delta.com
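The same one-liner spread out with comments, in case the field separator trick is not obvious (a sketch of the logic, not a new solution):
awk -F'[[ ]' '    # split fields on spaces and on [ characters
$8 == "from" {    # only connect/disconnect lines have "from" as the 8th field
    cnt[$9]++     # the [ separator cuts the IP off, so $9 is the bare hostname
}
END {
    for (host in cnt) print cnt[host], host
}' file
This also quietly excludes the RCPT and lost connection lines, because in those lines "from" is not the 8th field.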
I have a log file which contains millions of lines like this:
$ cat file.log
10.0.7.92 - - [05/Jun/2017:03:50:06 +0000] "GET /adserver/html5/inwapads/?category=[IAB]&size=320x280&ak=AY1234&output=vast&version=1.1&sleepAfter=&requester=John&adFormat=preappvideo HTTP/1.1" 200 131 "-" "Mozilla/5.0 (Linux; Android 6.0.1; SM-S120VL Build/MMB29M; wv) AppleWebKit/537.36 (KHTML, like Gecko) Version/4.0 Chrome/58.0.3029.83 Mobile Safari/537.36" 0.000 1029 520 127.0.0.1
10.0.6.91 - - [05/Jun/2017:03:50:06 +0000] "GET /adserver/html5/inwapads/?category=[IAB]&output=vast&version=1.1&sleepAfter=&requester=John&size=320x280&ak=AY1234&adFormat=preappvideo HTTP/1.1" 200 131 "-" "Mozilla/5.0 (Linux; Android 6.0.1; SM-S120VL Build/MMB29M; wv) AppleWebKit/537.36 (KHTML, like Gecko) Version/4.0 Chrome/58.0.3029.83 Mobile Safari/537.36" 0.000 1029 520 127.0.0.1
I want to print, for every line like this, output in Excel with separate columns:
inwapads AY1234 john 320x280
How do I do that using awk, or do I need to use another method?
If your input looks like this file data:
$ cat file.log
10.0.7.92 - - [05/Jun/2017:03:50:06 +0000] "GET /adserver/html5/inwapads/?category=[IAB]&size=320x280&ak=AY1234&output=vast&version=1.1&sleepAfter=&requester=John&adFormat=preappvideo HTTP/1.1" 200 131 "-" "Mozilla/5.0 (Linux; Android 6.0.1; SM-S120VL Build/MMB29M; wv) AppleWebKit/537.36 (KHTML, like Gecko) Version/4.0 Chrome/58.0.3029.83 Mobile Safari/537.36" 0.000 1029 520 127.0.0.1
10.0.6.91 - - [05/Jun/2017:03:50:06 +0000] "GET /adserver/html5/inwapads/?category=[IAB]&output=vast&version=1.1&sleepAfter=&requester=John&size=320x280&ak=AY1234&adFormat=preappvideo HTTP/1.1" 200 131 "-" "Mozilla/5.0 (Linux; Android 6.0.1; SM-S120VL Build/MMB29M; wv) AppleWebKit/537.36 (KHTML, like Gecko) Version/4.0 Chrome/58.0.3029.83 Mobile Safari/537.36" 0.000 1029 520 127.0.0.1
Then you can simply use GNU awk working on column $7 with gensub(/regex/, replacement, occurrence, target), gawk's general substitution function:
$ awk '{
item=gensub( /(^.*\/)(.*\/)(.*)(\/)(\?.*$)/ , "\\3" , 1, $7 )
ak=gensub( /(^.*ak\=)([A-Z]*[0-9]*)(\&)(.*$)/ , "\\2" , 1, $7)
req=gensub( /(^.*requester\=)([A-Za-z]*)(\&)(.*$)/ , "\\2", 1, $7)
s=gensub( /(^.*size\=)([0-9]*x[0-9]*)(\&.*$)/, "\\2", 1, $7)
print item, ak, req, s
}' file.log
Output:
inwapads AY1234 John 320x280
inwapads AY1234 John 320x280
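Note that gensub() is specific to GNU awk. If gawk is not available, here is a rough equivalent sketch using POSIX match() and substr(), under the same assumptions about where the parameters sit in $7:
awk '{
    item=$7
    sub(/\/\?.*/,"",item); sub(/.*\//,"",item)    # last path component before the ?
    ak="";  if (match($7,/ak=[A-Z0-9]+/))         ak=substr($7,RSTART+3,RLENGTH-3)
    req=""; if (match($7,/requester=[A-Za-z]+/))  req=substr($7,RSTART+10,RLENGTH-10)
    s="";   if (match($7,/size=[0-9]+x[0-9]+/))   s=substr($7,RSTART+5,RLENGTH-5)
    print item, ak, req, s
}' file.log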
In addition to my question, here is an example of the log:
10.10.10.10 - - [21/Mar/2016:00:00:00 +0000] "GET /example?page=&per_page=100&scopes= HTTP/1.1" 200 769 "-" "" "1.1.1.1"
10.10.10.10 - - [21/Mar/2016:00:00:00 +0000] "GET /example?page=&per_page=500&scopes= HTTP/1.1" 200 769 "-" "" "1.1.1.1"
10.10.10.10 - - [21/Mar/2016:00:00:00 +0000] "GET /example?page=&per_page=100&scopes= HTTP/1.1" 200 769 "-" "" "1.1.1.1"
11.11.11.11 - - [21/Mar/2016:00:00:00 +0000] "GET /example?page=&per_page=10&scopes= HTTP/1.1" 200 769 "-" "" "1.1.1.1"
12.12.12.12 - - [21/Mar/2016:00:00:00 +0000] "GET /example?page=&per_page=500&scopes= HTTP/1.1" 200 769 "-" "" "1.1.1.1"
13.13.13.13 - - [21/Mar/2016:00:00:00 +0000] "GET /example HTTP/1.1" 200 769 "-" "" "1.1.1.1"
With the following command
awk --re-interval '/per_page=[0-9]{3}/{cnt[$1]++} END{for (ip in cnt) print ip, cnt[ip]}' file
I can get a counted and grouped result of the IPs whose requests contain per_page >= 100 in the parameters:
12.12.12.12 1
10.10.10.10 3
How can I modify it to also output the per_page parameter value? For example (in any format):
12.12.12.12 - per_page-500 - 1
10.10.10.10 - per_page-100 - 2
10.10.10.10 - per_page-500 - 1
awk to the rescue!
$ awk --re-interval -v OFS=' - ' '
match($0,/per_page=[0-9]{3}/){cnt[$1 OFS substr($0, RSTART,RLENGTH)]++}
END{for (ip in cnt) print ip, cnt[ip]}' file
12.12.12.12 - per_page=500 - 1
10.10.10.10 - per_page=500 - 1
10.10.10.10 - per_page=100 - 2
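One caveat: with {3}, a value like per_page=1000 would still match (on its first three digits) but be reported as per_page=100. If larger values are possible, an interval of {3,} captures the whole number; a sketch under the same input assumptions:
$ awk --re-interval -v OFS=' - ' '
    match($0,/per_page=[0-9]{3,}/){cnt[$1 OFS substr($0,RSTART,RLENGTH)]++}
    END{for (ip in cnt) print ip, cnt[ip]}' file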
Why can't Linux print to the printer model Custom Engineering VKP80?
$ lpstat -p -d
printer CUSTOM_Engineering_VKP80 is idle. enabled since Sat 05 Apr 2014 10:50:52 PM CEST
printer LabelWriter-450 is idle. enabled since Tue 25 Mar 2014 10:47:06 AM CET
printer PDF is idle. enabled since Tue 25 Mar 2014 10:40:12 AM CET
printer Zebra_TLP2844 is idle. enabled since Tue 25 Mar 2014 05:52:37 AM CET
system default destination: LabelWriter-450
$ echo 'test' > /tmp/test.txt
$ lpr -P CUSTOM_Engineering_VKP80 /tmp/test.txt
Nothing happens.
EDIT: relevant entries from the CUPS error_log and access_log:
E [05/Apr/2014:22:49:07 +0200] [Client 17] Empty Basic password.
E [05/Apr/2014:22:49:12 +0200] [Client 16] pam_authenticate() returned 7 (Authentication failure)
E [05/Apr/2014:22:49:20 +0200] [Client 15] pam_authenticate() returned 7 (Authentication failure)
E [05/Apr/2014:22:49:25 +0200] [Client 15] pam_authenticate() returned 7 (Authentication failure)
W [05/Apr/2014:22:50:58 +0200] CreateProfile failed: org.freedesktop.ColorManager.AlreadyExists:profile id 'CUSTOM_Engineering_VKP80-Gray..' already exists
E [05/Apr/2014:22:56:39 +0200] [Job 133] Aborting job because it has no files.
E [05/Apr/2014:22:57:20 +0200] [Job 134] Aborting job because it has no files.
E [05/Apr/2014:22:57:30 +0200] [Job 135] Aborting job because it has no files.
E [05/Apr/2014:22:58:08 +0200] [Job 136] Aborting job because it has no files.
E [05/Apr/2014:23:01:57 +0200] [Job 137] Stopping unresponsive job.
localhost - - [05/Apr/2014:22:50:52 +0200] "POST /admin HTTP/1.1" 200 1752 - -
localhost - sun [05/Apr/2014:22:50:52 +0200] "POST /admin HTTP/1.1" 200 1752 - -
localhost - - [05/Apr/2014:22:50:52 +0200] "POST /admin/ HTTP/1.1" 401 392 CUPS-Add-Modify-Printer successful-ok
localhost - sun [05/Apr/2014:22:50:52 +0200] "POST /admin/ HTTP/1.1" 200 392 CUPS-Add-Modify-Printer successful-ok
localhost - sun [05/Apr/2014:22:50:52 +0200] "POST /admin HTTP/1.1" 200 10930 - -
localhost - sun [05/Apr/2014:22:50:58 +0200] "POST /admin HTTP/1.1" 200 404 - -
localhost - - [05/Apr/2014:22:50:58 +0200] "POST /admin/ HTTP/1.1" 401 8463 CUPS-Add-Modify-Printer successful-ok
localhost - sun [05/Apr/2014:22:50:58 +0200] "POST /admin/ HTTP/1.1" 200 8463 CUPS-Add-Modify-Printer successful-ok
localhost - sun [05/Apr/2014:22:50:58 +0200] "POST /admin HTTP/1.1" 200 2789 - -
localhost - - [05/Apr/2014:22:51:38 +0200] "POST /printers/LabelWriter-450 HTTP/1.1" 200 311 Create-Job successful-ok
localhost - - [05/Apr/2014:22:52:11 +0200] "POST /printers/LabelWriter-450 HTTP/1.1" 200 328 Create-Job successful-ok
localhost - - [05/Apr/2014:22:52:26 +0200] "POST /printers/LabelWriter-450 HTTP/1.1" 200 329 Create-Job successful-ok
localhost - - [05/Apr/2014:22:53:07 +0200] "POST /printers/CUSTOM_Engineering_VKP80 HTTP/1.1" 200 321 Create-Job successful-ok
localhost - - [05/Apr/2014:22:54:26 +0200] "POST /printers/CUSTOM_Engineering_VKP80 HTTP/1.1" 200 322 Create-Job successful-ok
localhost - - [05/Apr/2014:22:54:26 +0200] "POST /printers/CUSTOM_Engineering_VKP80 HTTP/1.1" 200 282 Send-Document successful-ok
localhost - - [05/Apr/2014:23:00:19 +0200] "POST /printers/CUSTOM_Engineering_VKP80 HTTP/1.1" 200 322 Create-Job successful-ok
localhost - - [05/Apr/2014:23:00:19 +0200] "POST /printers/CUSTOM_Engineering_VKP80 HTTP/1.1" 200 282 Send-Document successful-ok
localhost - - [05/Apr/2014:23:01:57 +0200] "POST /jobs HTTP/1.1" 200 139 Restart-Job successful-ok
localhost - sun [05/Apr/2014:23:22:28 +0200] "GET /admin/log/error_log? HTTP/1.1" 200 899 - -
I have access logs like this, and I would like to grab each and every IP address and then order them by which one appears the most.
173.192.238.41 - - [28/Feb/2013:07:06:09 -0500] "GET / HTTP/1.1" 200 20644 "-" "Mozilla/5.0 (X11; U; Linux x86_64; en-US; rv:1.9.0.19; aggregator:Spinn3r (Spinn3r 3.1); http://spinn3r.com/robot) Gecko/2010040121 Firefox/3.0.19"
208.115.113.84 - - [28/Feb/2013:07:06:19 -0500] "GET /tag/bright HTTP/1.1" 404 327 "-" "Mozilla/5.0 (compatible; Ezooms/1.0; ezooms.bot@gmail.com)"
94.228.34.214 - - [28/Feb/2013:07:10:16 -0500] "GET /alli-comes-home-12-10-09-day-224-2264/feed HTTP/1.1" 404 359 "-" "magpie-crawler/1.1 (U; Linux amd64; en-GB; +http://www.brandwatch.net)"
209.171.42.71 - - [28/Feb/2013:07:11:19 -0500] "GET /feed/atom HTTP/1.1" 404 326 "-" "Mozilla/5.0 (compatible; BlogScope/1.0; +http://www.blogscope.net/; U of Toronto)"
94.228.34.229 - - [28/Feb/2013:07:12:48 -0500] "GET /the-latest-design-franck-muller-watches-and-versace-watches-6838/feed HTTP/1.1" 404 386 "-" "magpie-crawler/1.1 (U; Linux amd64; en-GB; +http://www.brandwatch.net)"
Is this the right way to cat and sort it?
cat /path/to/access.log | awk '{print $1}' | sort | uniq -c
You're close. After counting them, you have to sort by the count:
awk '{print $1}' /path/to/access.log | sort | uniq -c | sort -n
You can also do the counting in awk rather than using sort and uniq:
awk '{count[$1]++} END {for (ip in count) print count[ip], ip}' /path/to/access.log | sort -n
awk '{a[$1]++}END{for(i in a)print a[i],i}' your_log|sort -rn
or
perl -lane '$x{$F[0]}++;END{for(keys %x){print $x{$_}." ".$_;}}' your_log|sort -rn
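If you have GNU awk 4.0 or later, you can also sort inside awk and drop the external sort entirely (a gawk-only sketch):
gawk '{a[$1]++} END{PROCINFO["sorted_in"]="@val_num_desc"; for(i in a) print a[i], i}' your_log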
Here's one way you can order the IPv4 addresses by occurrence and then by address:
# cut takes only the first column from access.log
<access.log cut -d' ' -f1 |
# Presort the IP addresses so uniq can count them
sort |
uniq -c |
# Format the stream so it only contains `.' delimiters
sed 's/^ *//; s/ /./' |
# Now sort numerically based on each consecutive dot delimited column
sort -t. -k1,1n -k2,2n -k3,3n -k4,4n -k5,5n |
# Restore the first delimiter
sed 's/\./ /'
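If your sort supports version sort (-V, GNU coreutils), the dot-delimiter trick can be shortened; a sketch that likewise orders by count and then by address:
<access.log cut -d' ' -f1 | sort | uniq -c | sort -k1,1n -k2,2V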
Test input:
cat << EOF > access.log
173.192.238.41 - - [28/Feb/2013:07:06:09 -0500] "GET / HTTP/1.1" 200 20644 "-" "Mozilla/5.0 (X11; U; Linux x86_64; en-US; rv:1.9.0.19; aggregator:Spinn3r (Spinn3r 3.1); http://spinn3r.com/robot) Gecko/2010040121 Firefox/3.0.19"
208.115.113.84 - - [28/Feb/2013:07:06:19 -0500] "GET /tag/bright HTTP/1.1" 404 327 "-" "Mozilla/5.0 (compatible; Ezooms/1.0; ezooms.bot@gmail.com)"
94.228.34.229 - - [28/Feb/2013:07:12:48 -0500] "GET /the-latest-design-franck-muller-watches-and-versace-watches-6838/feed HTTP/1.1" 404 386 "-" "magpie-crawler/1.1 (U; Linux amd64; en-GB; +http://www.brandwatch.net)"
94.228.34.214 - - [28/Feb/2013:07:10:16 -0500] "GET /alli-comes-home-12-10-09-day-224-2264/feed HTTP/1.1" 404 359 "-" "magpie-crawler/1.1 (U; Linux amd64; en-GB; +http://www.brandwatch.net)"
209.171.42.71 - - [28/Feb/2013:07:11:19 -0500] "GET /feed/atom HTTP/1.1" 404 326 "-" "Mozilla/5.0 (compatible; BlogScope/1.0; +http://www.blogscope.net/; U of Toronto)"
209.71.42.71 - - [28/Feb/2013:07:11:19 -0500] "GET /feed/atom HTTP/1.1" 404 326 "-" "Mozilla/5.0 (compatible; BlogScope/1.0; +http://www.blogscope.net/; U of Toronto)"
94.228.34.229 - - [28/Feb/2013:07:12:48 -0500] "GET /the-latest-design-franck-muller-watches-and-versace-watches-6838/feed HTTP/1.1" 404 386 "-" "magpie-crawler/1.1 (U; Linux amd64; en-GB; +http://www.brandwatch.net)"
94.229.34.229 - - [28/Feb/2013:07:12:48 -0500] "GET /the-latest-design-franck-muller-watches-and-versace-watches-6838/feed HTTP/1.1" 404 386 "-" "magpie-crawler/1.1 (U; Linux amd64; en-GB; +http://www.brandwatch.net)"
94.227.34.229 - - [28/Feb/2013:07:12:48 -0500] "GET /the-latest-design-franck-muller-watches-and-versace-watches-6838/feed HTTP/1.1" 404 386 "-" "magpie-crawler/1.1 (U; Linux amd64; en-GB; +http://www.brandwatch.net)"
EOF
Output:
1 94.227.34.229
1 94.228.34.214
1 94.229.34.229
1 173.192.238.41
1 208.115.113.84
1 209.71.42.71
1 209.171.42.71
2 94.228.34.229