Working on getting our Quarkus log files into Elasticsearch. My problem is in trying to process the logs in Logstash... How can I extract the traceId and spanId using a grok filter?
Here's a sample log entry:
21:11:32 INFO traceId=50a4f8740c30b9ca, spanId=50a4f8740c30b9ca, sampled=true [or.se.po.re.EmployeeResource] (vert.x-eventloop-thread-1) getEmployee with [id:2]
Here is my grok:
%{TIME} %{LOGLEVEL} %{WORD:traceId} %{WORD:spanId} %{GREEDYDATA:msg}
Using the grok debugger, it seems traceId and spanId are not detected.
AFAIK, grok expressions need to match the original text exactly, so you have to include the commas, spaces, and even the literal text you do not want to capture, for instance traceId=:
%{TIME} %{LOGLEVEL} traceId=%{WORD:traceId}, spanId=%{WORD:spanId}, %{GREEDYDATA:msg}
This is the output from https://grokdebug.herokuapp.com/ for your log line and my suggested grok expression:
{
  "TIME": [
    [
      "21:11:32"
    ]
  ],
  "HOUR": [
    [
      "21"
    ]
  ],
  "MINUTE": [
    [
      "11"
    ]
  ],
  "SECOND": [
    [
      "32"
    ]
  ],
  "LOGLEVEL": [
    [
      "INFO"
    ]
  ],
  "traceId": [
    [
      "50a4f8740c30b9ca"
    ]
  ],
  "spanId": [
    [
      "50a4f8740c30b9ca"
    ]
  ],
  "msg": [
    [
      "sampled=true [or.se.po.re.EmployeeResource] (vert.x-eventloop-thread-1) getEmployee with [id:2]"
    ]
  ]
}
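For completeness, a minimal sketch of wiring that expression into a Logstash pipeline (assuming the log line arrives in the default message field):
filter {
  grok {
    match => { "message" => "%{TIME} %{LOGLEVEL} traceId=%{WORD:traceId}, spanId=%{WORD:spanId}, %{GREEDYDATA:msg}" }
  }
}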
As other users have mentioned, it is important to notice the spaces between the words. For instance, there are two spaces between the log level and the traceId. You can use the \s+ regular expression to match any run of whitespace regardless of its length, although using it too liberally can have a noticeable impact on grok performance.
%{TIME}\s+%{LOGLEVEL}\s+traceId=%{WORD:traceId},\s+spanId=%{WORD:spanId},\s+%{GREEDYDATA:msg}
The issue could be a couple of things:
The spacing between fields might be off (try adding \s+ or perhaps \t after %{LOGLEVEL})
The %{WORD} pattern might not be picking up the value because of the inclusion of =
Something like this pattern could work (you might need to modify it some):
^%{TIME:time} %{LOGLEVEL:level}\s+(?:%{WORD}=%{WORD:traceid}), (?:%{WORD}=%{WORD:spanid}), (?:%{WORD}=%{WORD:sampled}) %{GREEDYDATA:msg}$
This is the field:
device_version => 2.6.1.1280 [eng:v1.3.26.0 rul:v2018.07.12.09 act:v2018.01.20.01 sws:v2018.07.12.09]
How can I get the eng and rul ... values and put them individually into new fields?
Thanks
If you just want to match the eng and rul values, you can simply match them using %{DATA}:
eng:%{DATA:eng}\srul:%{DATA:rul}\s
This will output:
{
  "eng": [
    [
      "v1.3.26.0"
    ]
  ],
  "rul": [
    [
      "v2018.07.12.09"
    ]
  ]
}
You can test it at https://grokdebug.herokuapp.com/
Edit:
filter {
  grok {
    match => { "device_version" => "eng:%{DATA:eng}\srul:%{DATA:rul}\s" }
  }
}
You should also have a look at the default grok patterns available: https://github.com/elastic/logstash/blob/v1.4.2/patterns/grok-patterns
Question 1 -
56dd573d.5edd is my session ID, and I have a grok filter like
%{WORD:session_id}.%{WORD:session_id} - this will read the session ID, and the output will look like this:
"session_id": [
[
"56dd573d",
"5edd"
]
]
Is there any way I can get output like this:
"session_id": [
[
"56dd573d.5edd"
]
]
I just need it in a single field.
Question 2 -
2016-03-08 06:48:15.477 GMT
This is a line from my log entry. I have used the
%{DATESTAMP:log_time} %{WORD}
grok filter to read this date; here I simply want to drop or ignore the GMT.
Is there any special pattern to ignore the next word from the log line, which is not useful?
Updated
Question 3 - How do I handle a null value? It's after the GMT:
2016-03-07 10:26:05 GMT,,
This is my PostgreSQL log entry:
2016-03-08 06:48:15.477 GMT,"postgres","sugarcrm",24285,"[local]",56dd573d.5edd,4,"idle",2016-03-07 10:26:05 GMT,,0,LOG,00000,"disconnection: session time: 20:22:09.928 user=postgres database=sugarcrm host=[local]",,,,,,,,,""
Note - a null value may appear as "" or ,,
Answer for question 3
I found the solution for handling ,,.
Below is the configuration for handling a ,, value by replacing it with 0:
input {
  file {
    path => "/var/log/logstash/postgres.log"
    start_position => "beginning"
    type => "postgres"
  }
}
filter {
  mutate {
    # the ",," substitution is repeated because gsub does not re-match
    # adjacent occurrences in one pass (",,,," needs several passes)
    gsub => [
      "message", "^,", "0,",
      "message", ",,", ",0,",
      "message", ",,", ",0,",
      "message", ",,", ",0,",
      "message", ",$", ",0"
    ]
  }
  grok {
    match => ["message","%{GREEDYDATA:msg1}"]
  }
}
output {
  stdout { codec => rubydebug }
}
Reference -
http://comments.gmane.org/gmane.comp.sysutils.logstash.user/13842
But for the "" null value, I tried the configuration below and I am getting a configuration error:
filter { mutate {
gsub => [
"message", "^,", "0,",
"message", ",,", ",0,",
"message", ",,", ",0,",
"message", ",,", ",0,",
"message", ",$", ",0",
"message", "^\"" "null\""
"message", """" ""null""
"message", """" ""null""
"message", ""$", ""null"
] }
I need to replace "" with null.
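Update: I suspect the error comes from the missing commas between the gsub entries and the unescaped double quotes. Something like this might work (untested), using single-quoted strings so the literal double quotes need no escaping:
filter {
  mutate {
    gsub => [
      # turn every empty quoted field ("") into the literal text null
      "message", '""', "null"
    ]
  }
}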
Regarding question 1: it separates the two because essentially what you're asking it to do is add another value to session_id. You want something like:
(?<session_ID>%{WORD}\.%{WORD})
Try it out on https://grokdebug.herokuapp.com/, where you can test your patterns. The above isn't the greatest of solutions, but I don't have enough information about the rest of the message. If you know more, you can throw away the WORD match. If it is a structured session_ID with a fixed length, for example, you can do:
(?<session_ID>([a-zA-Z0-9]{1,8}\.)[a-zA-Z0-9]{1,4})
Regarding the second question, I would hard-code it as a quick hack:
%{DATESTAMP:log_time} GMT
Give some more information and we can give a better, more specific answer. The above should work, but there are several ways to skin a cat!
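Putting both together, a sketch for the start of the PostgreSQL line from question 3 could look like this (untested; the field names are illustrative):
%{TIMESTAMP_ISO8601:log_time} GMT,"%{DATA:user}","%{DATA:database}",%{INT:pid},"%{DATA:host}",(?<session_ID>%{WORD}\.%{WORD}),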
I've been capturing web logs using Logstash, and specifically I'm trying to capture web URLs, but also split them up.
If I take an example log entry URL:
"GET https://www.stackoverflow.com:443/some/link/here.html HTTP/1.1"
I use this grok pattern:
\"(?:%{NOTSPACE:http_method}|-)(?:%{SPACE}http://)?(?:%{SPACE}https://)?(%{NOTSPACE:http_site}:)?(?:%{NUMBER:http_site_port:int})?(?:%{GREEDYDATA:http_site_url})? (?:%{WORD:http_type|-}/)?(?:%{NOTSPACE:http_version:float})?(?:%{SPACE})?\"
I get this:
{
  "http_method": [
    [
      "GET"
    ]
  ],
  "SPACE": [
    [
      " ",
      null,
      ""
    ]
  ],
  "http_site": [
    [
      "www.stackoverflow.com"
    ]
  ],
  "BASE10NUM": [
    [
      "443"
    ]
  ],
  "http_site_url": [
    [
      "/some/link/here.html"
    ]
  ],
  "http_type": [
    [
      "HTTP"
    ]
  ]
}
The trouble is, I'm trying to ALSO capture the entire URL:
https://www.stackoverflow.com:443/some/link/here.html
So in total, I'm seeking 4 separate outputs:
http_site_complete https://www.stackoverflow.com:443/some/link/here.html
http_site www.stackoverflow.com
http_site_port 443
http_site_url /some/link/here.html
Is there some way to do this?
First, look at the built-in patterns for dealing with URLs. Putting something like URIHOST in your pattern will be easier to read and maintain than a bunch of WORDs or NOTSPACEs.
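For example, a single expression along these lines (a sketch, assuming the stock grok patterns; the field names are illustrative) would produce all four of the fields you listed:
\"%{WORD:http_method} (?<http_site_complete>https?://%{IPORHOST:http_site}:%{POSINT:http_site_port}%{URIPATHPARAM:http_site_url}) HTTP/%{NUMBER:http_version}\"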
Second, once you have lots of little fields, you can always use Logstash's filters to manipulate them. You could use:
mutate {
  add_field => { "http_site_complete" => "%{http_site}:%{http_site_port}%{http_site_url}" }
}
Or you could get fancy with your regexp and use a named group:
(?<total>%{WORD:wordOne} %{WORD:wordTwo} %{WORD:wordThree})
which would individually capture three fields and make one more field from the whole string.
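For instance, run against the string one two three, that pattern would produce something like:
{
  "total": [
    [
      "one two three"
    ]
  ],
  "wordOne": [
    [
      "one"
    ]
  ],
  "wordTwo": [
    [
      "two"
    ]
  ],
  "wordThree": [
    [
      "three"
    ]
  ]
}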
I have the following I'm trying to parse with GROK:
Hello|STATSTIME=20-AUG-15 12.20.03.051000 PM|World
I can parse the first bunch of it with GROK like so:
match => ["message","%{WORD:FW}\|STATSTIME=%{MONTHDAY:MDAY}-%{WORD:MON}-%{INT:YY} %{INT:HH}"]
Anything further than that gives me an error. I can't figure out how to escape the : character; \: does not work, and %{TIME:time} does not work. I'd like to be able to get the whole thing as a timestamp, but can't get it broken up. Any ideas?
You can use https://grokdebug.herokuapp.com/ to debug grok expressions.
The time format is as shown here
To parse 12.20.03.051000:
%{INT:hour}.%{INT:min}.%{INT:sec}.%{INT:ms}
The output will be something like this:
{
  "hour": [
    [
      "12"
    ]
  ],
  "min": [
    [
      "20"
    ]
  ],
  "sec": [
    [
      "03"
    ]
  ],
  "ms": [
    [
      "051000"
    ]
  ]
}
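If you want the whole STATSTIME value as an actual timestamp, another option is to capture it in one field and hand it to the date filter. A rough sketch, assuming Joda-style format tokens (the format string, especially the fractional seconds and the AM/PM marker, may need tweaking):
filter {
  grok {
    match => ["message", "%{WORD:FW}\|STATSTIME=%{DATA:statstime}\|%{GREEDYDATA:rest}"]
  }
  date {
    # e.g. 20-AUG-15 12.20.03.051000 PM
    match => ["statstime", "dd-MMM-yy hh.mm.ss.SSSSSS a"]
  }
}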