logstash grok filter for logs with arbitrary attribute-value pairs - logstash

(This is related to my other question, "logstash grok filter for custom logs".)
I have a logfile whose lines look something like:
14:46:16.603 [http-nio-8080-exec-4] INFO METERING - msg=93e6dd5e-c009-46b3-b9eb-f753ee3b889a CREATE_JOB job=a820018e-7ad7-481a-97b0-bd705c3280ad data=71b1652e-16c8-4b33-9a57-f5fcb3d5de92
14:46:17.378 [http-nio-8080-exec-3] INFO METERING - msg=c1ddb068-e6a2-450a-9f8b-7cbc1dbc222a SET_STATUS job=a820018e-7ad7-481a-97b0-bd705c3280ad status=ACTIVE final=false
I built a pattern that matched the first line:
%{TIME:timestamp} %{NOTSPACE:http} %{WORD:loglevel}%{SPACE}%{WORD:logtype} - msg=%{NOTSPACE:msg}%{SPACE}%{WORD:action}%{SPACE}job=%{NOTSPACE:job}%{SPACE}data=%{NOTSPACE:data}
but obviously that only works for lines that end with data=, not for lines like the second one that end with status= and final=, or for lines with other attribute-value pairs. How can I set up a pattern that says that after a certain point there will be an arbitrary number of foo=bar pairs that I want to recognize and emit as attribute/value pairs in the output?

You can change your grok pattern like this to have all the key value pairs in one field (kvpairs):
%{TIME:timestamp} %{NOTSPACE:http} %{WORD:loglevel}%{SPACE}%{WORD:logtype} - %{GREEDYDATA:kvpairs}
Afterwards you can use the kv filter to parse the key value pairs.
kv {
  source => "kvpairs"
  remove_field => [ "kvpairs" ] # Delete the field afterwards
}
Unfortunately, you have some simple values inside your kv pairs (e.g. CREATE_JOB). You could parse them with grok and use one kv filter for the values before and another kv filter for the values after those simple values.
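One way to handle the bare action word is sketched below. It is a minimal sketch rather than a tested configuration: it assumes that every line starts with msg=<id> followed by exactly one bare word such as CREATE_JOB, and that everything after that word is key=value pairs. The field names msg, action and kvpairs are taken from the question and the answer above.

```
filter {
  grok {
    # Capture the fixed prefix, msg= and the bare action word explicitly;
    # everything that follows goes into kvpairs (assumed to be key=value only).
    match => {
      "message" => "%{TIME:timestamp} %{NOTSPACE:http} %{WORD:loglevel}%{SPACE}%{WORD:logtype} - msg=%{NOTSPACE:msg}%{SPACE}%{WORD:action}%{SPACE}%{GREEDYDATA:kvpairs}"
    }
  }
  kv {
    # Defaults split fields on spaces and keys from values on "="
    source => "kvpairs"
    remove_field => [ "kvpairs" ]
  }
}
```

With this layout a single kv filter is enough, because grok has already consumed the part of the line that is not in key=value form.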

Related

Regex group from within custom grok pattern

I'm trying to create custom grok patterns to extract various data using logstash and am racking my brain to get the syntax right for pulling the regex group 1 equivalent from my log rows. I've looked at a ton of threads on this over the past 2 days, but nothing out there fits my example, and none of the canned grok patterns seem like they will pull the value I need.
3 example log file rows look similar to this (with abbreviated data for the examples):
2022-04-07 12:52:06,184:INFO :Thread-70_SCHEDULE.0001: MsgID=63759111848731967
2022-04-07 07:23:39,876:INFO :Thread-53_OrderInterfaceIntServer: MsgID=21316889724753182|
07:23:40,482 INFO [stdout] (http-/0.0.0.0:8080-20) 2022-04-07 07:23:40,482:ERROR
I want to create a custom grok pattern called SERVICE that extracts a pattern match using a regex match string:
Thread-[0-9]{2}_(.*?)\:
that for the 3 rows would return:
SCHEDULE.0001
OrderInterfaceIntServer
""
In the log:
SERVICE will always be prefixed by "Thread-xx_" where xx = 2-digit number followed by underscore. Some logs may not have this pattern at all (like row 3). In that case, no match.
SERVICE is always followed by a colon
In grok, I can define this in 2 ways:
SERVICE Thread-[0-9]{2}_(.*?)\:
or as a field using (?<service>Thread-[0-9]{2}_(.*?)\:)
however, for row 1, I get the response value of:
{
  "service": [
    [
      "Thread-70_SCHEDULE.0001:"
    ]
  ]
}
What I want is:
{
  "service": [
    [
      "SCHEDULE.0001"
    ]
  ]
}
Which is the equivalent of the regex group 1 response. I can't figure out how to manage the grok patterns to get the result I need.
You do not have to include all of the pattern in the capture group. You can use
grok { match => { "message" => "Thread-[0-9]{2}_(?<service>.*?):" } }
That will result in
"service" => "SCHEDULE.0001",
"service" => "OrderInterfaceIntServer",
and a "_grokparsefailure" tag on the third event.
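If rows without a Thread-xx_ prefix are expected and should not be tagged as failures, the grok filter's tag_on_failure option can be set to an empty list. The following is a sketch under that assumption; whether to suppress the tag is a choice for the pipeline owner, not something the question asks for.

```
filter {
  grok {
    # Only the part inside the named group ends up in the "service" field
    match => { "message" => "Thread-[0-9]{2}_(?<service>.*?):" }
    # Assumption: a missing SERVICE is normal, so do not add _grokparsefailure
    tag_on_failure => []
  }
}
```

Events like row 3 then simply pass through without a service field.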

parsing in Logstash whole files as one event each

I want to use Logstash to monitor a directory of individual files, each of which describes one event and consists of key-value pairs:
> /var/log/dir/history.*
> head /var/log/dir/history.1234
key1 = value1
key2 = value2
...
The kv filter plugin can parse the key-value pairs, but the input needs some kind of multiline handling to combine each file into one event.
The multiline codec requires a pattern to match, as well as the what setting, which says whether a matching line belongs to the previous or the next event.
Since I consider the whole file one event, I have no real regex to match.
How can I read one file as one event with Logstash's multiline codec, or is there a better input plugin for this use case?
Taking the solution from https://discuss.elastic.co/t/log-file-parsing-and-ingesting-into-es/247806/6:
to ingest a whole file, the trick is to match on a pattern that never occurs. With negate => true and what => "previous", every line that does not match is appended to the previous line, so the whole file becomes one event; auto_flush_interval emits the event once no new lines have arrived for 2 seconds.
codec => multiline {
  pattern => "^Spalanzanidonotmatchinanycase"
  negate => true
  what => "previous"
  auto_flush_interval => 2
}
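For context, here is a sketch of where the codec could sit in a complete pipeline, followed by a kv filter for the key = value lines. The path comes from the question. The start_position setting and all kv options are assumptions: field_split => "\n" relies on the kv filter treating the string as regex character-class content, and trim_key / trim_value are the option names in recent versions of the kv plugin, so check them against the installed version.

```
input {
  file {
    path => "/var/log/dir/history.*"
    # Assumption: files should be read from the top when first discovered
    start_position => "beginning"
    codec => multiline {
      pattern => "^Spalanzanidonotmatchinanycase"
      negate => true
      what => "previous"
      auto_flush_interval => 2
    }
  }
}

filter {
  kv {
    source => "message"
    # One pair per line, written as "key = value"
    field_split => "\n"
    value_split => "="
    # Strip the spaces around "=" from keys and values
    trim_key => " "
    trim_value => " "
  }
}
```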

How to extract value from log with grok and logstash

I need to extract values from a log made up of rows like this:
<38>1 [2017-03-15T08:45:23.168Z] apache.01.mysite.com event=login;src_ip=xxx.xxx.xxx.xxx\, xxx.xxx.xxx.xxx\, xxx.xxx.xxx.xxx;site=FE-B1-Site;cstnr=1454528;user=498119;result=SUCCESS
For example, with %{IP:source}
I get only the first IP, but sometimes there are 3 IP addresses.
How can I extract all the IPs, as well as 'cstnr', 'result' and 'user'?
Looks like you have a nested, delimited key-value format. The first delimiter is ;, and each of the resulting pieces is a key=value pair. Additionally, the src_ip values are delimited by commas. You have enough grok to get the first IP address, but I suggest doing something a bit different:
Use grok to grab the entire string after your site-name.
Use the kv filter with field_split => ';', which will create fields named the same as your keys.
Use the csv filter on the src_ip address captured in the kv filter stage.
Use columns => [ 'cstnr', 'result', 'user' ] to get those fields named right.
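The steps above might be sketched as follows. This is an illustration under assumptions, not a tested configuration: the field names timestamp, server and kvpairs and the column names src_ip_1 to src_ip_3 are hypothetical. In this sketch the kv stage already produces cstnr, user and result, so the csv columns are used only to name the individual addresses.

```
filter {
  grok {
    # Grab everything after the host name into one field
    match => { "message" => "\[%{TIMESTAMP_ISO8601:timestamp}\] %{NOTSPACE:server} %{GREEDYDATA:kvpairs}" }
  }
  kv {
    source => "kvpairs"
    # Pairs are separated by ";" and written as key=value
    field_split => ";"
    value_split => "="
    remove_field => [ "kvpairs" ]
  }
  csv {
    # Split the comma-separated address list captured by kv
    source => "src_ip"
    separator => ","
    columns => [ "src_ip_1", "src_ip_2", "src_ip_3" ]
  }
}
```

The split values may still carry the backslash and the leading space that appear around each comma in the raw log; a mutate gsub on src_ip before the csv stage could remove them.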

How to define grok pattern for pipe delimited log message?

Setting up ELK is very easy until you hit the Logstash filter. I have a pipe-delimited log with 10 fields. Some fields may be blank, but I am sure there will always be 10 fields:
7/5/2015 10:10:18 AM|KDCVISH01|
|ClassNameUnavailable:MethodNameUnavailable|CustomerView|xwz261|ef315792-5c41-4bdf-aa66-73317e82e4d6|52|6182d1a1-7916-4874-995b-bc9a23437dab|<Exception>
afkh akla 487234 &*<Exception>
Q:
1- I am confused about how a grok or regex pattern will pick only the field I am looking for and not a similar match from another field. For example, what guarantees that the DATESTAMP pattern picks only the first value and not the timestamp present in the last field (buried in the stack trace)?
2- Is there a way to define a positional mapping? For example, the 1st field is dateTime, the 2nd is machine name, the 3rd is class name and so on. This would make sure the fields are displayed in Kibana whether or not a field value is present.
I know I am a little late, but here is a simple solution that I am using:
replace your | with a space
option 1:
filter {
  mutate {
    gsub => ["message","\|"," "]
  }
  grok {
    match => ["message","%{DATESTAMP:time} %{WORD:MESSAGE1} %{WORD:EXCEPTION} %{WORD:MESSAGE2}"]
  }
}
option 2: escape the |
filter {
  grok {
    match => ["message","%{DATESTAMP:time}\|%{WORD:MESSAGE1}\|%{WORD:EXCEPTION}\|%{WORD:MESSAGE2}"]
  }
}
It works fine; you can check it at http://grokdebug.herokuapp.com/.
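For the positional-mapping part of the question, a different technique from the two options above is to anchor the pattern at the start of the message and capture each position with a "anything but a pipe" group. The sketch below assumes exactly 10 fields as stated in the question; all field names are hypothetical, and keep_empty_captures is set so that blank positions still produce a field.

```
filter {
  grok {
    # \A pins the match to the start of the message, so the first group can only
    # be field 1; [^|]* cannot cross a pipe, so each group stays in its position.
    # (?m) lets the final .* run across the line breaks of the stack trace.
    match => {
      "message" => "(?m)\A(?<dateTime>[^|]*)\|(?<machineName>[^|]*)\|(?<field3>[^|]*)\|(?<classAndMethod>[^|]*)\|(?<field5>[^|]*)\|(?<field6>[^|]*)\|(?<field7>[^|]*)\|(?<field8>[^|]*)\|(?<field9>[^|]*)\|(?<exception>.*)"
    }
    # Keep fields whose captured value is empty
    keep_empty_captures => true
  }
}
```

Because the pattern is anchored and each group is bounded by pipes, a timestamp buried in the last field cannot be mistaken for the first field.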

Multi-value fields only store last value

I'm relatively new to ELK and grok. I'm trying to parse a log file that may contain 1 or more repetitions of the same value. For example the log file could contain:
value1;value2;value3;
value1;
value1;value2;value3;value4;........value900;
For this example, I'm using the following grok pattern:
((?<tag>[a-z0-9]*)[;])+
This appears to work properly, and parse each value. The problem is that the "tag" field only contains the last value (ie value900). All of the previous values seem to be overwritten.
Is there a way to gather all of the values and store them into an array instead of only getting the last value?
Simply use mutate:
mutate {
  split => ["tag",";"]
}
This will split the value that's in the tag field into an array. So just match the whole string in your grok with (?<tag>[a-z0-9;]+) and then split it from there.
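Put together, the two steps could look like the sketch below. It assumes the values live in the message field and uses the hash form of the mutate split option; adjust the source field to match the actual event.

```
filter {
  grok {
    # Capture the whole semicolon-separated run as one string
    match => { "message" => "(?<tag>[a-z0-9;]+)" }
  }
  mutate {
    # Turn "value1;value2;value3;" into an array of values
    split => { "tag" => ";" }
  }
}
```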
