How to filter a value from the request line of a log - logstash-grok

I have the following log and I need to filter only salePoint from it.
"GET /supero/global/grocery/fullgrainMenu.jsp?id=cat12216&salePoint=0012FT&locale=es_ES&version=0510091431 HTTP/1.1"
I tried \"(%{NOTSPACE:request}(?:&salePoint=%{DATA:salePoint})?)\" but it gives the wrong output:
"salePoint": "0012FT&locale=es_ES&version=0510091431 HTTP/1.1"
Expected output is "salePoint": "0012FT"
Thanks

Since the question specifies that the intention is to extract only salePoint, you can use the following grok pattern:
(%{GREEDYDATA:before})?(salePoint=%{WORD:salePoint})(%{GREEDYDATA:after})?
Explanation:
before: stores the optional data before the salePoint entry.
salePoint: stores the salePoint value.
after: stores the optional data after salePoint.
As always, you can add more to the pattern if you need to extract more fields.
Example :
"GET /supero/global/grocery/fullgrainMenu.jsp?id=cat12216&salePoint=0012FT&locale=es_ES&version=0510091431 HTTP/1.1"
With the above pattern output is :
{
"before": [
[
"\"GET /supero/global/grocery/fullgrainMenu.jsp?id=cat12216&"
]
],
"salePoint": [
[
"0012FT"
]
],
"after": [
[
"&locale=es_ES&version=0510091431 HTTP/1.1\""
]
]
}
Please use Grok Debugger to play around with the pattern.
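For a quick sanity check outside of grok, the same extraction can be sketched as a plain regex — here in Python's re, with WORD approximated as \w+ and GREEDYDATA as .* (an illustration, not part of the original answer):

```python
import re

# Plain-regex equivalent of:
# (%{GREEDYDATA:before})?(salePoint=%{WORD:salePoint})(%{GREEDYDATA:after})?
# WORD is approximated as \w+ and GREEDYDATA as .*
pattern = re.compile(r'(?P<before>.*)salePoint=(?P<salePoint>\w+)(?P<after>.*)')

line = ('"GET /supero/global/grocery/fullgrainMenu.jsp'
        '?id=cat12216&salePoint=0012FT&locale=es_ES'
        '&version=0510091431 HTTP/1.1"')

m = pattern.search(line)
print(m.group('salePoint'))  # 0012FT
```

Note that \w+ stops at the & separator, which is exactly why salePoint no longer swallows the rest of the query string.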

Related

How do you find a quoted string with specific word in a log message using grok pattern

I have a log message from my server with the format below:
{"host":"a.b.com","source_type":"ABCD"}
I have this grok pattern so far, but it accepts any word in double quotes:
\A%{QUOTEDSTRING}:%{PROG}
How can I change QUOTEDSTRING so that it only checks for "host"?
"host" is not always at the beginning of the message; it can also be found in the middle.
Thanks for your help.
Since the question specifies that "host" can appear anywhere in the log, you can use the following:
\{(\"%{GREEDYDATA:data_before}\",)?(\"host\":\"%{DATA:host_value}\")?(,\"%{GREEDYDATA:data_after}\")?\}
Explanation:
data_before: stores the optional data before the host entry is found. You can split it further as per your need.
host_value: stores the host value.
data_after: stores the optional data after the host entry is found. You can split it further as per your need.
Example :
{"host":"a.b.com","source_type":"ABCD"}
Output :
{
"data_before": [
[
null
]
],
"host_value": [
[
"a.b.com"
]
],
"data_after": [
[
"source_type":"ABCD"
]
]
}
{"host":"a.b.com"}
Output :
{
"data_before": [
[
null
]
],
"host_value": [
[
"a.b.com"
]
],
"data_after": [
[
null
]
]
}
{"source_type":"ABCD","host":"a.b.com","data_type":"ABCD"}
Output :
{
"data_before": [
[
"source_type":"ABCD"
]
],
"host_value": [
[
"a.b.com"
]
],
"data_after": [
[
"data_type":"ABCD"
]
]
}
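To see why the optional before/after groups behave this way across all three placements, here is a rough Python re sketch of the same idea (an approximation for illustration, not the exact grok compilation):

```python
import re

# Rough re equivalent of the grok pattern above: "host" may appear at the
# start, middle, or end, with optional data captured before and after it.
pattern = re.compile(r'\{(?:"(?P<data_before>.*?)",)?'
                     r'"host":"(?P<host_value>[^"]*)"'
                     r'(?:,"(?P<data_after>.*)")?\}')

lines = ['{"host":"a.b.com","source_type":"ABCD"}',
         '{"host":"a.b.com"}',
         '{"source_type":"ABCD","host":"a.b.com","data_type":"ABCD"}']

hosts = [pattern.search(l).group('host_value') for l in lines]
print(hosts)  # ['a.b.com', 'a.b.com', 'a.b.com']
```

The two optional groups simply match empty when "host" is at the edge of the message, which mirrors the null entries in the grok debugger output above.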
Tip: Use the following resources to tune and test your grok patterns:
Grok Debugger
Grok Patterns Full List

Logstash Grok Filter for Opentracing in Quarkus' log

Working on getting our Quarkus log files into Elasticsearch. My problem is in trying to process the logs in logstash... How can I get the traceId and spanId using a grok filter?
Here's a sample log entry:
21:11:32 INFO traceId=50a4f8740c30b9ca, spanId=50a4f8740c30b9ca, sampled=true [or.se.po.re.EmployeeResource] (vert.x-eventloop-thread-1) getEmployee with [id:2]
Here is my grok:
%{TIME} %{LOGLEVEL} %{WORD:traceId} %{WORD:spanId} %{GREEDYDATA:msg}
Using grok debugger, it seems traceId and spanId are not detected.
AFAIK, grok expressions need to match the original text exactly, so add the commas, spaces, and even the literal text you do not want to capture, for instance traceId=:
%{TIME} %{LOGLEVEL} traceId=%{WORD:traceId}, spanId=%{WORD:spanId}, %{GREEDYDATA:msg}
This is the output from https://grokdebug.herokuapp.com/ for your log line and my suggested grok expression.
{
"TIME": [
[
"21:11:32"
]
],
"HOUR": [
[
"21"
]
],
"MINUTE": [
[
"11"
]
],
"SECOND": [
[
"32"
]
],
"LOGLEVEL": [
[
"INFO"
]
],
"traceId": [
[
"50a4f8740c30b9ca"
]
],
"spanId": [
[
"50a4f8740c30b9ca"
]
],
"msg": [
[
"sampled=true [or.se.po.re.EmployeeResource] (vert.x-eventloop-thread-1) getEmployee with [id:2]"
]
]
}
As other users have mentioned, it is important to notice the spaces between the words. For instance, there are two spaces between the log level and the traceId. You can use the \s+ regular expression to absorb them, but using it too heavily can hurt performance.
%{TIME}\s+%{LOGLEVEL}\s+traceId=%{WORD:traceId},\s+spanId=%{WORD:spanId},\s+%{GREEDYDATA:msg}
The issue could be a couple of things:
The spacing between fields might be off (try adding \s? or perhaps \t after %{LOGLEVEL})
The %{WORD} pattern might not be picking up the value because of the inclusion of =
Something like this pattern could work (you might need to modify it some):
^%{TIME:time} %{LOGLEVEL:level}\s?traceId=%{WORD:traceid}, spanId=%{WORD:spanid}, sampled=%{WORD:sampled} %{GREEDYDATA:msg}$
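As a quick cross-check outside grok, the \s+ variant above can be sketched in Python's re (TIME and LOGLEVEL are approximated with simple character classes; this is an illustration, not the exact grok compilation):

```python
import re

# Literal "traceId="/"spanId=" anchors plus \s+ to absorb variable spacing;
# TIME ~ \d{2}:\d{2}:\d{2} and LOGLEVEL ~ [A-Z]+ are simplifications.
pattern = re.compile(
    r'(?P<time>\d{2}:\d{2}:\d{2})\s+(?P<level>[A-Z]+)\s+'
    r'traceId=(?P<traceId>\w+),\s+spanId=(?P<spanId>\w+),\s+(?P<msg>.*)')

line = ('21:11:32 INFO  traceId=50a4f8740c30b9ca, '
        'spanId=50a4f8740c30b9ca, sampled=true '
        '[or.se.po.re.EmployeeResource] (vert.x-eventloop-thread-1) '
        'getEmployee with [id:2]')

m = pattern.match(line)
print(m.group('traceId'), m.group('spanId'))
```

The double space after INFO is absorbed by \s+, which is exactly what the bare %{WORD} version failed to do.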

Ignore and move to next pattern if log contains a specific word

I have a log file produced by a Spring application. The log has three formats. The first two formats are single lines: if a line contains the keyword app-info, it was printed by our own developers; if not, it was printed by the Spring framework. We may treat developer messages differently from Spring framework ones. The third format is a multiline stack trace.
Here is an example of our own format:
2018-04-27 10:42:49 [http-nio-8088-exec-1] - INFO - app-info - injectip ip 192.168.16.89
The above line has the app-info keyword, so it is from our own developers.
2018-04-27 10:42:23 [RMI TCP Connection(10)-127.0.0.1] - INFO - org.apache.catalina.core.ContainerBase.[Tomcat].[localhost].[/] - Initializing Spring FrameworkServlet 'dispatcherServlet'
The above line does not have the app-info keyword, so it was printed by the Spring framework.
In my grok filter, the first pattern is for messages printed by the Spring framework, the second is for developer messages, and the third is for the multiline stack trace. I want the first regex to state clearly that the Spring framework pattern does not contain the keyword app-info, so that it raises a parse exception and the second pattern (our own developers' format) takes over. I tried the following in a regex tool, but I got a compile error. My regex is as follows:
(?<timestamp>[\d\-\s\:]+)\s\[(?<threadname>[\d\.\w\s\(\)\-]+)\]\s-\s(?<loglevel>[\w]+)\s+-\s+(?<systemmsg>[^((?app-info).)*\s\.\w\-\'\:\d\[\]\/]+)
In the grok filter, I follow the instructions from this link:
filter {
grok {
match => [ "message", "PATTERN1", "PATTERN2" , "PATTERN3" ]
}
}
My current logstash configuration is as follows; it does not mention app-info explicitly in the pattern:
filter {
grok {
match => [
"message",
'(?<timestamp>[\d\-\s\:]+)\s\[(?<threadname>[\d\.\w\s\(\)\-]+)\]\s-\s(?<loglevel>[\w]+)\s+-\s+(?<systemmsg>[\s\.\w\-\'\:\d\[\]\/^[app-info]]+)',
'(?<timestamp>[\d\-\s\:]+)\s\[(?<threadname>[\d\.\w\s\(\)\-]+)\]\s-\s(?<loglevel>[\w]+)\s+-\s(?<appinfo>app-info)\s-\s(?<systemmsg>[\w\d\:\{\}\,\-\(\)\s\"]+)',
'(?<timestamp>[\d\-\s\:]+)\s\[(?<threadname>[\w\-\d]+)\]\s-\s(?<loglevel>[\w]+)\s\-\s(?<appinfo>app-info)\s-\s(?<params>params):(?<jsonstr>[\"\w\d\,\:\.\{\}]+)\s(?<exceptionname>[\w\d\.]+Exception):\s(?<exceptiondetail>[\w\d\.]+)\n\t(?<extralines>at[\s\w\.\d\~\?\n\t\(\)\_\[\]\/\:\-]+)\n\d'
]
}
}
With the format in the logstash configuration above, when handling
2018-04-27 10:42:49 [http-nio-8088-exec-1] - INFO - app-info - injectip ip 192.168.16.89
the first pattern (the Spring framework pattern) already matches, so the line never falls through to the second pattern, which is our own developers' format. It parses successfully as follows:
{
"timestamp": [
[
"2018-04-27 10:42:49"
]
],
"threadname": [
[
"http-nio-8088-exec-1"
]
],
"loglevel": [
[
"INFO"
]
],
"systemmsg": [
[
"app-info - injectip ip 192.168.16.89\n\n"
]
]
}
Any hints on how I could make the first pattern state clearly that systemmsg must not contain the keyword "app-info"?
EDIT:
My goal is that if there is no keyword app-info, pattern 1 handles the log, and if there is the keyword app-info, pattern 2 handles the log.
With the following log, which does not contain the keyword app-info (so pattern 1 should match):
2018-04-27 10:42:23 [RMI TCP Connection(10)-127.0.0.1] - INFO - org.apache.catalina.core.ContainerBase.[Tomcat].[localhost].[/] - Initializing Spring FrameworkServlet 'dispatcherServlet'
I get no match with the first pattern modified per your suggestion, which is not my goal:
(?<timestamp>[\d\-\s\:]+)\s\[(?<threadname>[\d\.\w\s\(\)\-]+)\]\s-\s(?<loglevel>[\w]+)\s+-\s+(?<systemmsg>[^(?:(?!app\-info).)*\s\.\w\-\'\:\d\[\]\/]+)
See demo. My goal is to extract the timestamp, thread name, log level, and system msg, but the first pattern does not give the expected result; the tool says there is no match.
If I remove ^(?:(?!app-info).)*, then the above log (without the keyword app-info) parses. See demo.
But then it also matches logs that do contain the keyword app-info, which is not expected: I want to extract timestamp, threadname, loglevel, app-info (as its own field, whether present or not), and then systemmsg. The expectation is that the first pattern returns an error so that the second pattern can handle the log. The demo shows the pattern also matching a log with the keyword app-info, and systemmsg swallows the app-info field, which is not expected.
So I want pattern 1 to handle logs without the keyword app-info, and pattern 2 to handle logs with it. Pattern 1 should clearly throw a parse error when the line contains the keyword app-info, so that the second pattern can handle the log.
You can use the following as your first pattern,
(?<data>^(?!.*app-info).*)%{LOGLEVEL:log}%{DATA:other_data}%{IP:ip}$
It will fail to match whenever app-info appears anywhere in the line, so the line falls through to the 2nd PATTERN.
EXAMPLE
Log without app-info,
2018-04-27 10:42:49 [http-nio-8088-exec-1] - INFO injectip ip 192.168.16.89
You can filter it as per your requirements.
OUTPUT
{
"data": [
[
"2018-04-27 10:42:49 [http-nio-8088-exec-1] - "
]
],
"log": [
[
"INFO"
]
],
"other_data": [
[
" injectip ip "
]
],
"ip": [
[
"192.168.16.89"
]
]
}
Now log with app-info,
2018-04-27 10:42:49 [http-nio-8088-exec-1] - INFO app-info injectip ip 192.168.16.89
OUTPUT
No Matches
Please test it here
EDIT 2:
If you make PATTERN1 equal to (?<data>^(?!.*app-info).*)
you will get,
{
"data": [
[
"2018-04-27 10:42:49 [http-nio-8088-exec-1] - INFO injectip ip 192.168.16.89"
]
]
}
You can then add a second grok filter for the data field as follows:
grok {
match => {"data" => "DEFINE PATTERN HERE"}
}
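The two-pass idea — first gate on the absence of app-info, then split the captured data field — can be sketched in Python's re like this (the field regexes are simplified illustrations, not the exact grok patterns):

```python
import re

# Pass 1: match the whole line only if it does NOT contain "app-info".
first = re.compile(r'^(?P<data>(?!.*app-info).*)$')
# Pass 2: split the captured data into its parts (simplified patterns).
second = re.compile(r'(?P<timestamp>[\d\- :]+) \[(?P<threadname>[^\]]+)\] '
                    r'- (?P<loglevel>\w+) (?P<systemmsg>.*)')

ok = '2018-04-27 10:42:49 [http-nio-8088-exec-1] - INFO injectip ip 192.168.16.89'
bad = ('2018-04-27 10:42:49 [http-nio-8088-exec-1] - INFO - '
       'app-info - injectip ip 192.168.16.89')

parts = second.match(first.match(ok).group('data')).groupdict()
print(parts['loglevel'], parts['systemmsg'])
print(first.match(bad))  # None: the line falls through to the next pattern
```

The negative lookahead at the start of pass 1 is what makes the whole match fail fast on app-info lines, which is the "throw a parse error" behaviour the question asks for.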
I used GREEDYDATA for this. Suppose you have the following log line:
Redirect Controller: successful redirection for click data: {a:123, b:345}
and you want to skip everything up to "data:"; then use GREEDYDATA as follows:
%{GREEDYDATA}data:%{SPACE}%{rest of the pattern}
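In plain-regex terms, %{GREEDYDATA} before a literal anchor just skips everything up to that anchor. A minimal Python re sketch:

```python
import re

# GREEDYDATA ~ .* skips everything up to the literal "data:" anchor,
# and whatever follows it is captured.
pattern = re.compile(r'.*data:\s*(?P<payload>.*)')

line = 'Redirect Controller: successful redirection for click data: {a:123, b:345}'
m = pattern.match(line)
print(m.group('payload'))  # {a:123, b:345}
```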

Concatenate a word to an email using pre-defined grok filter

First of all, thank you for reading my question.
I have an email address in a log in the following format:
Apr 24 19:38:51 ip-10-0-1-204 sendmail[9489]: w3OJco1s009487: sendid:name#test.co.uk, delay=00:00:01, xdelay=00:00:01, mailer=smtp, pri=120318, relay=webmx.bglen.net. [10.0.3.231], dsn=2.0.0, stat=Sent (Ok: queued as E2DEF60724), w3OJco1s009487: to=<username#domain.us>, delay=00:00:01, xdelay=00:00:01, mailer=smtp, pri=120318, relay=webmx.[redacted].net. [10.0.3.231], dsn=2.0.0, stat=Sent (Ok: queued as E2DEF60724)
and I need to extract the email along with the word sendid.
The output should look like this:
{
"DATA": [
[
"sendid:name#test.co.uk"
]
]
}
I have tried the following, but it only extracts the email (I tested it at http://grokdebug.herokuapp.com/):
sendid:%{DATA},
How can I concatenate the word sendid: to the email without creating a new field or defining a new regex? Can someone please help?
I have also tried this, but it doesn't work:
sendid:%{"sendid:"} %{DATA},
Your sendid:%{DATA}, won't work because anything you provide outside a grok pattern is matched as literal surrounding text; in your case everything between sendid: and , is matched, and it gives you:
{
"DATA": [
[
"name#test.co.uk"
]
]
}
You need to create a custom pattern and combine it with a pre-defined pattern, since no pre-defined pattern covers the whole string on its own.
Logstash allows you to create custom patterns using the Oniguruma regex library for such situations. The syntax is:
(?<field_name>the pattern here)
In your case it will be:
\b(?<data>sendid:%{EMAILADDRESS})\b
OUTPUT:
{
"data": [
[
"sendid:name#test.co.uk"
]
],
"EMAILADDRESS": [
[
"name#test.co.uk"
]
],
"EMAILLOCALPART": [
[
"name"
]
],
"HOSTNAME": [
[
"test.co.uk"
]
]
}
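The same trick — one named group wrapping a literal prefix plus a sub-pattern — works in any Oniguruma-style regex engine. A Python re sketch (the email regex below is a simplified stand-in for grok's EMAILADDRESS, and # is kept because the log line uses it in place of @):

```python
import re

# One named group captures the literal "sendid:" together with the email.
# The email regex is a simplified stand-in for grok's EMAILADDRESS;
# '#' is kept because the log line uses it in place of '@'.
email = r'[A-Za-z0-9._%+-]+#[A-Za-z0-9.-]+\.[A-Za-z]{2,}'
pattern = re.compile(r'\b(?P<data>sendid:' + email + r')')

line = 'w3OJco1s009487: sendid:name#test.co.uk, delay=00:00:01'
m = pattern.search(line)
print(m.group('data'))  # sendid:name#test.co.uk
```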

Separate output values from a single grok query?

I've been capturing web logs using logstash, and specifically I'm trying to capture web URLs while also splitting them up.
If I take an example log entry URL:
"GET https://www.stackoverflow.com:443/some/link/here.html HTTP/1.1"
I use this grok pattern:
\"(?:%{NOTSPACE:http_method}|-)(?:%{SPACE}http://)?(?:%{SPACE}https://)?(%{NOTSPACE:http_site}:)?(?:%{NUMBER:http_site_port:int})?(?:%{GREEDYDATA:http_site_url})? (?:%{WORD:http_type|-}/)?(?:%{NOTSPACE:http_version:float})?(?:%{SPACE})?\"
I get this:
{
"http_method": [
[
"GET"
]
],
"SPACE": [
[
" ",
null,
""
]
],
"http_site": [
[
"www.stackoverflow.com"
]
],
"BASE10NUM": [
[
"443"
]
],
"http_site_url": [
[
"/some/link/here.html"
]
],
"http_type": [
[
"HTTP"
]
]
}
The trouble is, I'm trying to ALSO capture the entire URL:
https://www.stackoverflow.com:443/some/link/here.html
So in total, I'm seeking 4 separate outputs:
http_site_complete https://www.stackoverflow.com:443/some/link/here.html
http_site www.stackoverflow.com
http_site_port 443
http_site_url /some/link/here.html
Is there some way to do this?
First, look at the built-in patterns for dealing with URLs. Putting something like URIHOST in your pattern will be easier to read and maintain than a bunch of WORDs or NOTSPACEs.
Second, once you have lots of little fields, you can always use logstash's filters to manipulate them. You could use:
mutate {
add_field => { "http_site_complete" => "%{http_site}:%{http_site_port}%{http_site_url}" }
}
Or you could get fancy with your regexp and use a named group:
(?<total>%{WORD:wordOne} %{WORD:wordTwo} %{WORD:wordThree})
which would individually capture three fields and make one more field from the whole string.
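The named-group approach, applied to the URL in the question, can be sketched in Python's re (field names follow the question; the URL regex is a simplified illustration, not a full URI grammar):

```python
import re

# An outer named group wraps the inner ones, so the full URL and its
# parts all come out of a single match.
pattern = re.compile(
    r'(?P<http_site_complete>https?://(?P<http_site>[^:/\s]+)'
    r':(?P<http_site_port>\d+)(?P<http_site_url>\S*))')

line = '"GET https://www.stackoverflow.com:443/some/link/here.html HTTP/1.1"'
m = pattern.search(line)
print(m.group('http_site_complete'))
print(m.group('http_site'), m.group('http_site_port'), m.group('http_site_url'))
```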
