Grok expression to Parse Log data - logstash-grok

I have just started using grok with Logstash and I am trying to parse the log line below with the grok filter.
10.210.57.60 0x756682x2 connectadmin [12/May/2020:00:00:00 +0530] "GET /rest/auth/1/session HTTP/1.1" 200 286 456 "-" "Jersey/2.11 (HttpUrlConnection 1.8.0_171)" "1twyrho"
I am interested in:
IP: 10.210.57.60
user: connectadmin
timestamp: 12/May/2020:00:00:00 +0530
URL: /rest/auth/1/session
Response code: 200
I am currently stuck at the grok expression %{IPV4:client_ip} %{WORD:skip_me1} %{USERNAME}, which gets me the IP and the username. Can you please help me proceed?
Thank you.

I used the grok debugger at https://grokdebug.herokuapp.com/ to get the desired output.
Below is a grok pattern that matches your requirement:
%{IPV4:IP} %{GREEDYDATA:girbish} %{USERNAME:user} \[%{HTTPDATE:timestamp}\] "(?:%{WORD:verb} %{NOTSPACE:request}(?: HTTP/%{NUMBER:httpversion})?|%{DATA:rawrequest})" %{NUMBER:response} (?:%{NUMBER:bytes}|-) %{GREEDYDATA:girbish}
Also, below is a screenshot of the final output after applying the grok pattern.
Note: you can remove unnecessary fields with Logstash's mutate filter.
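Dropped into a pipeline, the pattern plus the cleanup step might look like this (a sketch; girbish is the throwaway field name used in the pattern above):

```
filter {
  grok {
    match => { "message" => '%{IPV4:IP} %{GREEDYDATA:girbish} %{USERNAME:user} \[%{HTTPDATE:timestamp}\] "(?:%{WORD:verb} %{NOTSPACE:request}(?: HTTP/%{NUMBER:httpversion})?|%{DATA:rawrequest})" %{NUMBER:response} (?:%{NUMBER:bytes}|-) %{GREEDYDATA:girbish}' }
  }
  mutate {
    remove_field => [ "girbish" ]   # drop the throwaway captures
  }
}
```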

Related

Logstash - Grok Syntax Issues

I'm using Filebeat to send logs to Logstash, but I'm having issues with grok syntax in Logstash. I used the grok debugger in Kibana and managed to come to a solution.
The problem is that I can't find the equivalent syntax for Logstash.
The original log :
{"log":"188.188.188.188 - tgaro [22/Aug/2022:11:37:54 +0200] \"PROPFIND /remote.php/dav/files/xxx#yyyy.com/ HTTP/1.1\" 207 1035 \"-\" \"Mozilla/5.0 (Windows) mirall/2.6.1stable-Win64 (build 20191105) (Nextcloud)\"\n","stream":"stdout","time":"2022-08-22T09:37:54.782377901Z"}
The message receive in Logstash :
"message" => "{\"log\":\"188.188.188.188 - tgaro [22/Aug/2022:11:37:54 +0200] \\\"PROPFIND /remote.php/dav/files/xxx#yyyy.com/ HTTP/1.1\\\" 207 1035 \\\"-\\\" \\\"Mozilla/5.0 (Windows) mirall/2.6.1stable-Win64 (build 20191105) (Nextcloud)\\\"\\n\",\"stream\":\"stdout\",\"time\":\"2022-08-22T09:37:54.782377901Z\"}",
The Grok Pattern i used on Grok Debugger (Kibana):
{\\"log\\":\\"%{IPORHOST:clientip} %{HTTPDUSER:ident} %{HTTPDUSER:auth} \[%{HTTPDATE:timestamp}\] \\\\\\"(?:%{WORD:verb} %{NOTSPACE:request}(?: HTTP/%{NUMBER:httpversion})?|%{DATA:rawrequest})\\\\\\" (?:-|%{NUMBER:response}) (?:-|%{NUMBER:bytes}) \\\\\\("%{DATA:referrer}\\\\\\") \\\\\\"%{DATA:user-agent}\\\\\\"
The real problem is that I can't even manage to get the IP (188.188.188.188).
I tried :
match => { "message" => '{\\"log\\":\\"%{IPORHOST:clientip}' # backslash to escape the backslash
match => { "message" => '{\\\"log\\\":\\\"%{IPORHOST:clientip}' # backslash to escape the quote
match => { "message" => "{\\\"log\\\":\\\"%{IPORHOST:clientip}" # backslash to escape the quote
Help would be appreciated
Thanks !
PS: the log shown here is shortened. The real log is a mix of JSON and plain strings, so I can't ship it as JSON from Filebeat.
OK, so I managed to make it work by using this:
grok {
  match => { "message" => '%{SYSLOGTIMESTAMP:syslog_timestamp} %{IPORHOST:syslog_server} %{WORD:syslog_tag}: %{GREEDYDATA:jsonMessage}' }
}
json {
  source => "jsonMessage"
}
grok {
  match => { "jsonMessage" => '%{IPORHOST:clientip} %{HTTPDUSER:ident} %{HTTPDUSER:auth} \[%{HTTPDATE:timestamp}\] \\"(?:%{WORD:verb} %{NOTSPACE:request}(?: HTTP/%{NUMBER:httpversion})?|%{DATA:rawrequest})\\" (?:-|%{NUMBER:response}) (?:-|%{NUMBER:bytes}) \\("%{DATA:referrer}\\") \\"%{DATA:user-agent}\\"'}
}
with log like this :
Aug 24 00:00:01 hostname containers: {"log":"188.188.188.188 - user.name#things.com [23/Aug/2022:23:59:52 +0200] \"PROPFIND /remote.php/dav/files/ HTTP/1.1\" 207 1159 \"-\" \"Mozilla/5.0 (Linux) mirall/3.4.2-1ubuntu1 (Nextcloud, ubuntu-5.15.0-46-generic ClientArchitecture: x86_64 OsArchitecture: x86_64)\"\n","stream":"stdout","time":"2022-08-23T21:59:52.612843092Z"}
The first match fetches the first three fields (time, hostname and tag) and then captures everything after the colon into jsonMessage with the GREEDYDATA pattern.
Then the json filter is applied to jsonMessage. After that, the information we need is in the new log field created by the json filter.
I still don't understand why my grok works in the Kibana debugger but not in Logstash. It's probably because some characters need to be escaped, but even when I escaped them it didn't work.
@RaiZy_Style has already extracted the JSON using the json filter and is trying to match the jsonMessage field using the grok filter.
I used the grok debugger to create a grok pattern that matches the jsonMessage field.
The jsonMessage that I assume comes out of the json filter for the log example above is:
188.188.188.188 - user.name#things.com [23/Aug/2022:23:59:52 +0200] \\"PROPFIND /remote.php/dav/files/ HTTP/1.1\\" 207 1159 \\"-\\" \\"Mozilla/5.0 (Linux) mirall/3.4.2-1ubuntu1 (Nextcloud, ubuntu-5.15.0-46-generic ClientArchitecture: x86_64 OsArchitecture: x86_64)\\"\\n"
Here is pattern that will work:
%{IPORHOST:clientip} %{USER:ident} %{DATA:auth} \[%{HTTPDATE:timestamp}\] \\\\"(?:%{WORD:verb} %{NOTSPACE:request}(?: HTTP/%{NUMBER:httpversion})?|%{DATA:rawrequest})" %{NUMBER:response} (?:%{NUMBER:bytes}|-) \\\\"
Output screenshot:
Note: if you want to expand the rawrequest field value into individual fields, that can be done as well.
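In a Logstash config, the pattern above would sit in a grok block like this (a sketch; the single-quoted pattern string avoids a second layer of double-quote escaping, and jsonMessage is the field name from the earlier steps):

```
grok {
  match => { "jsonMessage" => '%{IPORHOST:clientip} %{USER:ident} %{DATA:auth} \[%{HTTPDATE:timestamp}\] \\\\"(?:%{WORD:verb} %{NOTSPACE:request}(?: HTTP/%{NUMBER:httpversion})?|%{DATA:rawrequest})" %{NUMBER:response} (?:%{NUMBER:bytes}|-) \\\\"' }
}
```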

How to use IF ELSE condition in grok pattern in logstash

I have web and API logs combined and I want to store them separately in Elasticsearch. So I want to write one pattern: if the request is for the API, the if part should execute; if the request is for the web, the else part should execute.
Below are a few web and API logs.
00:06:27,778 INFO [stdout] (ajp--0.0.0.0-8009-38) 00:06:27.777 [ajp--0.0.0.0-8009-38] INFO c.r.s.web.rest.WidgetController - Method getWidgetDetails() started to get widget details.
00:06:27,783 INFO [stdout] (ajp--0.0.0.0-8009-38) ---> HTTP GET http://api.survey.me/v1/getwidgetdetails?profileName=jeremy-steffens&profileLevel=INDIVIDUAL&companyProfileName=premier-nationwide-lending&hideHistory=true
00:06:27,817 INFO [stdout] (ajp--0.0.0.0-8009-38) <--- HTTP 200 http://api.survey.me/v1/getwidgetdetails?profileName=jeremy-steffens&profileLevel=INDIVIDUAL&companyProfileName=premier-nationwide-lending&hideHistory=true (29ms)
00:06:27,822 INFO [stdout] (ajp--0.0.0.0-8009-38) 00:06:27.822 [ajp--0.0.0.0-8009-38] INFO c.r.s.web.rest.WidgetController - Method getWidgetDetails() finished.
00:06:27,899 INFO [stdout] (ajp--0.0.0.0-8009-40) 00:06:27.899 [ajp--0.0.0.0-8009-40] INFO c.r.s.web.controller.LoginController - Inside initLoginPage() of LoginController
I tried to write a condition but it's not working. It works only up to the thread name; after the thread I have multiple log formats, so I'm not able to write a pattern without an if condition.
(?:%{TIME:CREATED_ON})(?:%{SPACE})%{WORD:LEVEL}%{SPACE}\[%{NOTSPACE}\]%{SPACE}\(%{NOTSPACE:THREAD}\)
Can anybody give me suggestion?
You don't need an if/else condition to do this; you can use multiple patterns, one that matches the API log lines and another that matches the web log lines.
For the API log lines you can use the following pattern:
(?:%{TIME:CREATED_ON})(?:%{SPACE})%{WORD:LEVEL}%{SPACE}\[%{NOTSPACE}\]%{SPACE}\(%{NOTSPACE:THREAD}\)%{SPACE}(?:%{DATA})%{SPACE}\[%{DATA}\]%{SPACE}%{WORD}%{SPACE}%{GREEDYDATA:MSG}
And the result will be something like this:
{
  "MSG": "c.r.s.web.controller.LoginController - Inside initLoginPage() of LoginController",
  "CREATED_ON": "00:06:27,899",
  "LEVEL": "INFO",
  "THREAD": "ajp--0.0.0.0-8009-40"
}
For the web lines you can use the following pattern:
(?:%{TIME:CREATED_ON})(?:%{SPACE})%{WORD:LEVEL}%{SPACE}\[%{NOTSPACE}\]%{SPACE}\(%{NOTSPACE:THREAD}\)%{SPACE}%{DATA}%{WORD:PROTOCOL}%{SPACE}%{WORD:MethodOrStatus}%{SPACE}%{GREEDYDATA:ENDPOINT}
And the result will be:
{
  "CREATED_ON": "00:06:27,783",
  "PROTOCOL": "HTTP",
  "ENDPOINT": "http://api.survey.me/v1/getwidgetdetails?profileName=jeremy-steffens&profileLevel=INDIVIDUAL&companyProfileName=premier-nationwide-lending&hideHistory=true",
  "LEVEL": "INFO",
  "THREAD": "ajp--0.0.0.0-8009-38",
  "MethodOrStatus": "GET"
}
To use multiple patterns in grok, just do this:
grok {
  match => ["message", "pattern1", "pattern2"]
}
Or you can save your patterns to a file and use patterns_dir to point to the directory of the file.
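The patterns_dir variant might look like this (a sketch; the directory path and the API_LOG/WEB_LOG names are hypothetical, with each line of the patterns file being a name followed by its pattern):

```
# /etc/logstash/patterns/applogs   (hypothetical patterns file)
# API_LOG <the API pattern shown above>
# WEB_LOG <the web pattern shown above>

grok {
  patterns_dir => ["/etc/logstash/patterns"]
  match => ["message", "%{API_LOG}", "%{WEB_LOG}"]
}
```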
If you still want to use a conditional, just check for anything in the message, for example:
if "HTTP" in [message] {
  grok { your grok for the web messages }
} else {
  grok { your grok for the api messages }
}
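Filled in with the two patterns from above, that conditional would look roughly like this (a sketch; per the labelling above, the lines containing HTTP are the web messages):

```
if "HTTP" in [message] {
  grok {
    match => { "message" => "(?:%{TIME:CREATED_ON})(?:%{SPACE})%{WORD:LEVEL}%{SPACE}\[%{NOTSPACE}\]%{SPACE}\(%{NOTSPACE:THREAD}\)%{SPACE}%{DATA}%{WORD:PROTOCOL}%{SPACE}%{WORD:MethodOrStatus}%{SPACE}%{GREEDYDATA:ENDPOINT}" }
  }
} else {
  grok {
    match => { "message" => "(?:%{TIME:CREATED_ON})(?:%{SPACE})%{WORD:LEVEL}%{SPACE}\[%{NOTSPACE}\]%{SPACE}\(%{NOTSPACE:THREAD}\)%{SPACE}(?:%{DATA})%{SPACE}\[%{DATA}\]%{SPACE}%{WORD}%{SPACE}%{GREEDYDATA:MSG}" }
  }
}
```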

grok filter for processing log4j logs pattern in Logstash

I am stuck finding a grok filter for processing the log4j conversion pattern %d{HH:mm:ss.SSS} %-5p [%t][%c] %m%n.
Here are some example log entries:
2018-02-12 12:10:03 INFO classname:25 - Exiting application.
2017-12-31 05:09:06 WARN foo:133 - Redirect Request : login
2015-08-19 08:07:03 INFO DBConfiguration:47 - Initiating DynamoDb Configuration...
2016-02-12 11:06:49 ERROR foo:224 - Error Code : 500
Can anyone help me find the Logstash grok filter?
Here is a filter for your log4j pattern:
filter {
  mutate {
    gsub => ['message', "\n", " "]
  }
  grok {
    match => { "message" => "(?<date>[0-9]{4}-[0-9]{2}-[0-9]{2} [0-9]{2}:[0-9]{2}:[0-9]{2}) (?:%{LOGLEVEL:loglevel}) +(?:%{WORD:caller_class}):(?:%{NONNEGINT:caller_line}) - (?:%{GREEDYDATA:msg})" }
  }
}
However, this is specific to the above log.
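As a variation, the hand-written date regex can be replaced with the stock TIMESTAMP_ISO8601 pattern, and a date filter can promote the parsed value to @timestamp (a sketch, equivalent to the filter above for these sample lines):

```
filter {
  grok {
    match => { "message" => "%{TIMESTAMP_ISO8601:date} +%{LOGLEVEL:loglevel} +%{WORD:caller_class}:%{NONNEGINT:caller_line} - %{GREEDYDATA:msg}" }
  }
  date {
    match => [ "date", "yyyy-MM-dd HH:mm:ss" ]   # sets @timestamp from the parsed date
  }
}
```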

grok filter logstash JSON parse error, original data now in message field

I'm using Logstash with a configuration of input{rabbitmq}, filter{grok}, output{elasticsearch}.
From RabbitMQ I receive nginx logs in this format:
- - [06/Mar/2017:15:45:53 +0000] "GET /check HTTP/1.1" 200 0 "-" "ELB-HealthChecker/2.0"
and I'm using a grok filter as simple as the following:
filter {
  if [type] == "nginx" {
    grok {
      match => { "message" => "%{NGINXACCESS}" }
    }
  }
}
and the pattern is
NGUSERNAME [a-zA-Z\.\#\-\+_%]+
NGUSER %{NGUSERNAME}
NGINXACCESS %{NGUSER:ident} %{NGUSER:auth} \[%{HTTPDATE:timestamp}\] "%{WORD:verb} %{URIPATHPARAM:request} HTTP/%{NUMBER:httpversion}" %{NUMBER:response} (?:%{NUMBER:bytes}|-) (?:"(?:%{URI:referrer}|-)"|%{QS:referrer}) %{QS:agent}
I tried the pattern in the grok debugger and it seems to work just fine, but running the pipeline I get this error:
[2017-03-06T16:46:40,692][ERROR][logstash.codecs.json ] JSON parse error, original data now in message field {:error=>#, :data=>"- - [06/Mar/2017:16:46:40 +0000] \"GET /check HTTP/1.1\" 200 0 \"-\" \"ELB-HealthChecker/2.0\""}
It seems like something (Logstash?) is adding backslashes to the result...
Hope to get some help, thanks!
This does not seem to be a grok error at all. If grok fails to parse, it adds the tag _grokparsefailure to your event. A JSON parse error would be due to your input using codec => json {} when your log format is plainly not JSON. Make sure the input plugin handling these log types uses codec => plain or another appropriate codec.
See logstash codecs for more info.
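For example, with the rabbitmq input the codec can be forced to plain like this (a sketch; the host and queue values are placeholders):

```
input {
  rabbitmq {
    host  => "localhost"      # placeholder
    queue => "nginx-logs"     # placeholder
    codec => plain { }        # don't try to parse each event as JSON
  }
}
```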

conditional matching with grok for logstash

I have a PHP log in this format:
[Day Mon DD HH:MM:SS YYYY] [Log-Type] [client <ipv4 ip address>] <some php error type>: <other msg with /path/of/a/php/script/file.php and something else>
[Day Mon DD HH:MM:SS YYYY] [Log-Type] [client <ipv4 ip address>] <some php error type>: <other msg without any file name in it>
[Day Mon DD HH:MM:SS YYYY] [Log-Type] [client <ipv4 ip address>] <some msg with out semicolon in it but /path/of/a/file inside the message>
This I am trying to send to Graylog2 after processing through Logstash. Using this post here, I was able to get started. Now I would like to add some additional fields, so that my final version would look something like this:
{
  "message" => "<The entire error message goes here>",
  "#version" => "1",
  "#timestamp" => "converted timestamp from Day Mon DD HH:MM:SS YYYY",
  "host" => "<ipv4 ip address>",
  "logtime" => "Day Mon DD HH:MM:SS YYYY",
  "loglevel" => "Log-Type",
  "clientip" => "<ipv4 ip address>",
  "php_error_type" => "<some php error type>",
  "file_name_from_the_log" => "/path/of/a/file || /path/of/a/php/script/file.php",
  "errormsg" => "<the error message after the first colon (:) found>"
}
I have expressions for the individual lines, or at least I think these should parse, using the grok debugger. Something like this:
%{DATA:php_error_type}: %{DATA:message_part1}%{URIPATHPARAM:file_name}%{GREEDYDATA:errormsg}
%{DATA:php_error_type}: %{GREEDYDATA:errormsg}
%{DATA:message_part1}%{URIPATHPARAM:file_name}%{GREEDYDATA:errormsg}
But somehow I am finding it very difficult to make this work for the entire log file.
Any suggestions, please? Also, I'm not sure whether other types of error messages will show up in the log file, but the intention is to get the same format for all. Any suggestions on how to tackle these logs to get the format above?
The grok filter can be configured with multiple patterns:
grok {
  match => [
    "message", "%{DATA:php_error_type}: %{DATA:message_part1}%{URIPATHPARAM:file_name}%{GREEDYDATA:errormsg}",
    "message", "%{DATA:php_error_type}: %{GREEDYDATA:errormsg}",
    "message", "%{DATA:message_part1}%{URIPATHPARAM:file_name}%{GREEDYDATA:errormsg}"
  ]
}
(Instead of a single filter with multiple patterns you could have multiple grok filters, but then you'd probably want to disable the _grokparsefailure tagging with tag_on_failure => [].)
If part of your log line is sometimes missing, you can use the following syntax:
(?:%{PATTERN1}|%{PATTERN2})
or
(?:%{PATTERN1}|)
to allow PATTERN1 or the empty string.
Using this, you only have one pattern to manage:
grok {
  match => [
    "message", "(?:%{DATA:php_error_type}: |)(?:%{DATA:message_part1}:)(?:%{URIPATHPARAM:file_name}|)%{GREEDYDATA:errormsg}"
  ]
}
If you have problems, maybe replace %{DATA} with a more restrictive pattern.
You can also use this syntax (more regex-like):
(?:%{PATTERN1})?
To debug a complex grok pattern, I recommend:
https://grokconstructor.appspot.com/do/match (multiline option, multiple input lines at a time, and other options)
https://grokdebug.herokuapp.com/ (simpler to use)
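To also get the converted #timestamp from the desired output, a date filter can be run on the logtime field after the grok (a sketch; the Joda-style format string assumes the earlier pattern captured the bracketed date into logtime in the form "Mon Aug 15 10:23:45 2016"):

```
date {
  match => [ "logtime", "EEE MMM dd HH:mm:ss yyyy" ]
  # on success, @timestamp is populated from logtime
}
```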
