How to extract values from a log with grok and Logstash

I need to extract values from a log composed of rows like this:
<38>1 [2017-03-15T08:45:23.168Z] apache.01.mysite.com event=login;src_ip=xxx.xxx.xxx.xxx\, xxx.xxx.xxx.xxx\, xxx.xxx.xxx.xxx;site=FE-B1-Site;cstnr=1454528;user=498119;result=SUCCESS
For example, with %{IP:source} I obtain only the first IP, but sometimes there are 3 IP addresses.
How can I extract all the IPs, 'cstnr', 'result' and 'user'?

Looks like you have a nested, delimited key-value format. The first delimiter is ;, and each of those segments is a key=value pair. Additionally, the src_ip value is itself delimited on \,. You have enough grok to get the first IP address, but I suggest doing something a bit different (a combined sketch follows this list):
Use grok to grab the entire string after your site-name.
Use the kv filter with field_split => ';', which will create fields named the same as your keys.
Use the csv filter on the src_ip field captured in the kv filter stage.
Use columns => ['src_ip1', 'src_ip2', 'src_ip3'] to get the split IP fields named right.
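
A minimal sketch of that chain, assuming the <38>1 [timestamp] hostname prefix shown above; the src_ip1..src_ip3 column names are illustrative, not mandated by anything in the log:
filter {
  grok {
    # grab everything after the hostname into one field
    match => { "message" => "^<%{NONNEGINT:pri}>%{NONNEGINT:ver} \[%{TIMESTAMP_ISO8601:timestamp}\] %{NOTSPACE:hostname} %{GREEDYDATA:kvfield}" }
  }
  kv {
    # creates event, src_ip, site, cstnr, user and result fields
    source      => "kvfield"
    field_split => ";"
  }
  mutate {
    # the IPs are separated by escaped commas ("\, "); normalize them
    gsub => ["src_ip", "\\, ", ","]
  }
  csv {
    # split the IP list and name the pieces (illustrative names)
    source    => "src_ip"
    separator => ","
    columns   => ["src_ip1", "src_ip2", "src_ip3"]
  }
}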

Related

Logstash Grok regex parsing

I am trying to parse a plaintext message using grok; my goal is to explode the plaintext into a JSON log.
The message has a quite rigid format, as follows:
<timestamp> <loglevel> <greedydata> field1=value1, field2=value2, .... fieldN=valueN
where the number of fields is not fixed.
Is it possible to capture every field=value pair using a named capturing group, so that the same "field" name is used as the key in the output message?
Thanks
TL;DR - use dissect instead of grok
You want something like:
{
  "timestamp": <timestamp>,
  "loglevel": <loglevel>,
  "field1": value1,
  "field2": value2,
  ....
  "fieldN": valueN
}
Where the keys (field1, fieldN etc) are dynamic.
You cannot use grok to do this. Even using a pattern like this (then using array position indices) won't work:
( field[0-9]+=%{DATA:value})+$
You need to handle this a different way. Your options are:
handle this before it hits logstash
use a ruby filter (a sketch follows this list)
use the dissect filter
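
For the ruby-filter option, here is a minimal sketch. It assumes the pairs are comma-separated, keys match \w+, and values contain no commas; adjust the regex to your data:
filter {
  ruby {
    # scan the raw message for key=value pairs and promote each one
    # to a top-level field; the keys become dynamic field names
    code => '
      event.get("message").scan(/(\w+)=([^,]+)/).each do |key, value|
        event.set(key, value.strip)
      end
    '
  }
}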

How can you parse 2 properties out of 1 value with grok in logstash?

Some context:
I want to parse the following log statement using grok in logstash
07:51:45,729 TRACE [com.company.Class] (ajp-/1.2.3.4:8080-251) USERID called path: /url and took: 1000 ms
I am now using the following syntax to parse the complete message:
%{DATA:time}\s%{DATA:level}\s%{DATA:class}\s%{DATA:thread}\s%{DATA:userid}\s.*path:\s%{DATA:url}\s.*:\s%{NUMBER:duration:int}\sms
Which gives me all the properties that I have defined.
My question:
I want to parse this part (ajp-/1.2.3.4:8080-251) into a 'thread' property and an ip property.
The result needs to be:
thread: (ajp-/1.2.3.4:8080-251)
ip: 1.2.3.4
How can I do this?
Thanks
Just add a second grok filter after your working one. Do not put this in your existing grok filter because it will finish after the first match.
Example:
grok {
  match => [ 'thread', '%{IP:ip}' ]
}
This obtains your previous field thread => "(ajp-/1.2.3.4:8080-251)" and adds a new field ip => "1.2.3.4"
Apart from that, I would recommend being more specific with your patterns. You used DATA every time, which is imprecise. Start with something like this:
%{TIME:timestamp} %{WORD:method} \[%{JAVACLASS:class}\] \(%{DATA:thread}\) %{NUMBER:userid} %{DATA}%{URIPATH:uri}%{DATA}
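
Putting both stages together, a minimal sketch (userid is assumed to be a single token, and thread is captured without the surrounding parentheses):
filter {
  grok {
    # first pass: parse the whole line (TIME accepts the ",729" fraction)
    match => { "message" => "%{TIME:timestamp} %{WORD:level} \[%{JAVACLASS:class}\] \(%{DATA:thread}\) %{NOTSPACE:userid} called path: %{URIPATH:url} and took: %{NUMBER:duration:int} ms" }
  }
  grok {
    # second pass: pull the IP out of the captured thread field
    match => { "thread" => "%{IP:ip}" }
  }
}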

How to define a grok pattern for a pipe-delimited log message?

Setting up ELK is very easy until you hit the logstash filter. I have a log with 10 pipe-delimited fields. Some fields may be blank, but I am sure there will always be 10 fields:
7/5/2015 10:10:18 AM|KDCVISH01|
|ClassNameUnavailable:MethodNameUnavailable|CustomerView|xwz261|ef315792-5c41-4bdf-aa66-73317e82e4d6|52|6182d1a1-7916-4874-995b-bc9a23437dab|<Exception>
afkh akla 487234 &*<Exception>
Q:
1- I am confused about how a grok or regex pattern will pick only the field I am looking for and not a similar match from another field. For example, what guarantees that the DATESTAMP pattern picks only the first value and not a timestamp present in the last field (buried in the stack trace)?
2- Is there a way to define a positional mapping? For example, the 1st field is dateTime, the 2nd is machine name, the 3rd is class name, and so on. This would make sure the fields are displayed in Kibana whether or not a field value is present.
I know I am a little late, but here is a simple solution which I am using: replace your | with a space.
Option 1:
filter {
  mutate {
    gsub => ["message", "\|", " "]
  }
  grok {
    match => ["message", "%{DATESTAMP:time} %{WORD:MESSAGE1} %{WORD:EXCEPTION} %{WORD:MESSAGE2}"]
  }
}
Option 2: escape the | characters:
filter {
  grok {
    match => ["message", "%{DATESTAMP:time}\|%{WORD:MESSAGE1}\|%{WORD:EXCEPTION}\|%{WORD:MESSAGE2}"]
  }
}
It works fine; you can verify it at http://grokdebug.herokuapp.com/.
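
For the positional mapping asked about in question 2, one alternative worth sketching is the csv filter with a pipe separator, which assigns every position a name whether or not the value is blank; the column names below are assumptions, not names taken from your log:
filter {
  csv {
    # reads the "message" field by default; each position gets a name
    separator => "|"
    columns   => ["dateTime", "machineName", "className", "methodName",
                  "viewName", "userId", "requestId", "duration",
                  "sessionId", "exception"]
  }
}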

logstash grok filter for logs with arbitrary attribute-value pairs

(This is related to my other question logstash grok filter for custom logs )
I have a logfile whose lines look something like:
14:46:16.603 [http-nio-8080-exec-4] INFO METERING - msg=93e6dd5e-c009-46b3-b9eb-f753ee3b889a CREATE_JOB job=a820018e-7ad7-481a-97b0-bd705c3280ad data=71b1652e-16c8-4b33-9a57-f5fcb3d5de92
14:46:17.378 [http-nio-8080-exec-3] INFO METERING - msg=c1ddb068-e6a2-450a-9f8b-7cbc1dbc222a SET_STATUS job=a820018e-7ad7-481a-97b0-bd705c3280ad status=ACTIVE final=false
I built a pattern that matched the first line:
%{TIME:timestamp} %{NOTSPACE:http} %{WORD:loglevel}%{SPACE}%{WORD:logtype} - msg=%{NOTSPACE:msg}%{SPACE}%{WORD:action}%{SPACE}job=%{NOTSPACE:job}%{SPACE}data=%{NOTSPACE:data}
but obviously that only works for lines that have data= at the end, versus the status= and final= at the end of the second line, or other attribute-value pairs on other lines. How can I set up a pattern that says that after a certain point there will be an arbitrary number of foo=bar pairs that I want to recognize and output as attribute/value pairs?
You can change your grok pattern like this to have all the key value pairs in one field (kvpairs):
%{TIME:timestamp} %{NOTSPACE:http} %{WORD:loglevel}%{SPACE}%{WORD:logtype} - %{GREEDYDATA:kvpairs}
Afterwards you can use the kv filter to parse the key value pairs.
kv {
  source       => "kvpairs"
  remove_field => [ "kvpairs" ] # Delete the field afterwards
}
Unfortunately, you have some bare values inside your kv pairs (e.g. CREATE_JOB). You could capture those with grok, then use one kv filter for the pairs before and another kv filter for the pairs after those bare values; a combined sketch follows.
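
A minimal sketch of that combination, assuming exactly one bare token (the action) follows msg=, as in the sample lines:
filter {
  grok {
    # capture msg= and the bare action token explicitly, leaving
    # the remaining key=value pairs in one field
    match => { "message" => "%{TIME:timestamp} %{NOTSPACE:http} %{WORD:loglevel}%{SPACE}%{WORD:logtype} - msg=%{NOTSPACE:msg}%{SPACE}%{WORD:action}%{SPACE}%{GREEDYDATA:kvpairs}" }
  }
  kv {
    # parse whatever pairs remain (job=..., data=..., status=..., final=...)
    source       => "kvpairs"
    remove_field => [ "kvpairs" ]
  }
}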

Multi-value fields only store last value

I'm relatively new to ELK and grok. I'm trying to parse a log file that may contain 1 or more repetitions of the same value. For example the log file could contain:
value1;value2;value3;
value1;
value1;value2;value3;value4;........value900;
For this example, I'm using the following grok pattern:
((?<tag>[a-z0-9]*)[;])+
This appears to work properly and parses each value. The problem is that the "tag" field only contains the last value (i.e., value900); all of the previous values are overwritten.
Is there a way to gather all of the values and store them into an array instead of only getting the last value?
Simply use mutate:
mutate {
  split => ["tag", ";"]
}
This will split the value in the tag field into an array. So just match the whole string in your grok with (?<tag>[a-z0-9;]+) and then split it from there.
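
Putting both steps together, a minimal sketch:
filter {
  grok {
    # capture the whole run of values and semicolons into one field
    match => { "message" => "(?<tag>[a-z0-9;]+)" }
  }
  mutate {
    # then split that field on the semicolons into an array
    split => ["tag", ";"]
  }
}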
