Get name of pattern that matched in grok in logstash

If I have a patterns file with a bunch of regex patterns such as the following:
A .*foo.*
B .*bar.*
C .*baz.*
and my grok filter looks like the following:
grok {
  patterns_dir => ["/location/of/patterns"]
  match => { "request" => [ "%{A}", "%{B}", "%{C}" ] }
}
Is there any way to know which one matched, i.e. the name of the SYNTAX? I would like to annotate the document with the name of the one that matched.

What you would usually do is name the matched variables. The syntax for that would be
(taking your example):
grok {
  patterns_dir => ["/location/of/patterns"]
  match => {
    "request" => [ "%{A:A}", "%{B:NameOfB}", "%{C:SomeOtherName}" ]
  }
}
Accordingly, the matches of your grok would now be named:
A: A
B: NameOfB
C: SomeOtherName
So in your case you could just name them after the patterns. That should work just fine.
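You could then annotate the event based on which named field is present, for example (a sketch using the field names from the config above; "matched_pattern" is a made-up field name):
if [NameOfB] {
  mutate { add_field => { "matched_pattern" => "B" } }
}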
Alternatively (I just tested this with the grok debugger), it appears that if you do not name your matched patterns, the resulting fields default to the name of the pattern (which I think is what you want). The downside of this is that if you reuse a pattern, the result will be an array of values.
This is the test I ran:
Input:
Caused by: com.my.application.IOException: null Caused by: com.my.application.IOException: null asd asd
grok:
(.*?)Caused by:%{GREEDYDATA}:%{GREEDYDATA}
Output:
{
  "GREEDYDATA": [
    [
      " com.my.application.IOException: null Caused by: com.my.application.IOException",
      " null asd asd"
    ]
  ]
}
Hope that solves your problems,
Artur
EDIT:
Based on OP's other question here is my approach to solving that issue dynamically.
You will still have to match the names. Decide on a common prefix on how to name your matches. I will base my example on 2 json strings to make this easier:
{"a" : "b", "prefix_patterna" : "", "prefix_patternb" : "bla"}
{"a" : "b", "prefix_patterna" : "sd", "prefix_patternb" : ""}
Note how there are 2 artificial matches, prefix_patterna and prefix_patternb. So I decided on the prefix "prefix" and use it to identify which event fields to inspect. (You can also configure your grok to drop empty fields, if that is something you want.)
Then in my filter, I use ruby to iterate through all event fields to find the one that matched my pattern:
ruby {
  code => "
    toAdd = nil;
    event.to_hash.each { |k,v|
      if k.start_with?('prefix_') && v.to_s != ''
        toAdd = k
      end
    }
    if toAdd.to_s != ''
      event['test'] = toAdd
    end
  "
}
All this code does is check the event keys for the prefix and see whether the value of that field is empty or nil. If it finds a field that has a value, it writes that field's name into a new event field called "test".
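Note that event['test'] = ... uses the old direct-access event API; on Logstash 5.x and later the ruby filter has to go through event.get and event.set. A minimal sketch of the same logic for newer versions, assuming the same prefix_ naming:
ruby {
  code => "
    # find the non-empty field whose name starts with the prefix
    to_add = nil
    event.to_hash.each { |k, v|
      to_add = k if k.start_with?('prefix_') && v.to_s != ''
    }
    # record which field matched in a new field called 'test'
    event.set('test', to_add) unless to_add.nil?
  "
}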
Here are my tests:
Settings: Default pipeline workers: 8
Pipeline main started
{"a" : "b", "prefix_patterna" : "sd", "prefix_patternb" : ""}
{
"message" => "{\"a\" : \"b\", \"prefix_patterna\" : \"sd\", \"prefix_patternb\" : \"\"}",
"#version" => "1",
"#timestamp" => "2016-09-15T09:48:29.418Z",
"host" => "pandaadb",
"a" => "b",
"prefix_patterna" => "sd",
"prefix_patternb" => "",
"test" => "prefix_patterna"
}
{"a" : "b", "prefix_patterna" : "", "prefix_patternb" : "bla"}
{
"message" => "{\"a\" : \"b\", \"prefix_patterna\" : \"\", \"prefix_patternb\" : \"bla\"}",
"#version" => "1",
"#timestamp" => "2016-09-15T09:48:36.359Z",
"host" => "pandaadb",
"a" => "b",
"prefix_patterna" => "",
"prefix_patternb" => "bla",
"test" => "prefix_patternb"
}
Note how the first test writes "prefix_patterna" while the second test writes "prefix_patternb".
I hope this solves your issue,
Artur

You can tag the match, (or add fields) by having multiple grok filters as follows.
It doesn't feel elegant, is not very scalable as it is prone to a lot of repetition (not DRY), but seems to be the only way to "flag" matches of complex patterns - especially predefined library patterns.
Note you have to add conditionals to the subsequent filters so that they are not run once a previous filter has already matched. Otherwise you'll still get _grokparsefailure tags from the later filters.
You also need to remove the failure tags on all but the final "else" filter. Otherwise you will get spurious _grokparsefailure tags, e.g. from A when B or C matches.
grok {
  patterns_dir => ["/location/of/patterns"]
  match => { "request" => "%{A}" }
  add_tag => [ "pattern_A" ]
  add_field => { "pattern" => "A" } # another option
  tag_on_failure => [ ] # prevent false failure tags
}
if "pattern_A" not in [tags] {
  grok {
    patterns_dir => ["/location/of/patterns"]
    match => { "request" => "%{B}" }
    add_tag => [ "pattern_B" ]
    tag_on_failure => [ ] # prevent false failure tags
  }
}
if "pattern_A" not in [tags] and "pattern_B" not in [tags] {
  grok {
    patterns_dir => ["/location/of/patterns"]
    match => { "request" => "%{C}" }
    add_tag => [ "pattern_C" ]
  }
}
There may be ways to simplify / tune this, but I'm not an expert (yet!).
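As a usage sketch (not part of the original answer; the index names are made up), the tags can then drive routing later in the pipeline or in the output, for example:
output {
  if "pattern_A" in [tags] {
    elasticsearch { index => "requests-pattern-a" }
  } else if "pattern_B" in [tags] {
    elasticsearch { index => "requests-pattern-b" }
  } else {
    elasticsearch { index => "requests-other" }
  }
}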

Related

logstash GROK filter along with KV plugin couldn't able to process the events

I am new to ELK. When I onboarded the below log file, it went to the "dead letter queue" in Logstash because Logstash couldn't process the events. I have written a grok filter to parse the events, but Logstash still could not process them. Any help would be appreciated.
Below is the sample log format.
25193662345 [http-nio-8080-exec-44] DEBUG c.s.b.a.m.PerformanceMetricsFilter - method=PUT status=201 appLogicTime=1, streamInTime=0, blobStorageTime=31, totalTime=33 tenantId=b9sdfs-1033-4444-aba5-csdfsdfsf, immutableBlobId=bss_c_586331/Sample_app12-sdas-157123148464.txt, blobSize=2862, domain=abc
2519366789 [http-nio-8080-exec-47] DEBUG q.s.b.y.m.PerformanceMetricsFilter - method=PUT status=201 appLogicTime=1, streamInTime=0, blobStorageTime=32, totalTime=33 tenantId=b0csdfsd-1066-4444-adf4-ce7bsdfssdf, immutableBlobId=bss_c_586334/Sample_app15-615223-157sadas6648465.txt, blobSize=2862, domain=cde
GROK filter:
dissect { mapping => { "message" => "%{NUMBER:number} [%{thread}] %{level} %{class} - %{[@metadata][msg]}" } }
kv { source => "[@metadata][msg]" field_split => "," }
Thanks
You have basically two problems in your configuration.
1.) You are using the dissect filter, not grok. Both are used to parse messages, but grok uses regular expressions to validate the value of a field, while dissect is purely positional and performs no validation: if you have a WORD value in the position of a field that expects a NUMBER, grok will fail, but dissect will not.
If your log lines always have the same pattern, you should continue to use dissect, since it is faster and needs less CPU.
Your correct dissect mapping should be:
dissect {
  mapping => { "message" => "%{number} [%{thread}] %{level} %{class} - %{[@metadata][msg]}" }
}
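For comparison, a roughly equivalent grok match (my own sketch, not from the original answer) that also validates the leading number would be:
grok {
  match => { "message" => "%{NUMBER:number} \[%{DATA:thread}\] %{LOGLEVEL:level} %{NOTSPACE:class} - %{GREEDYDATA:[@metadata][msg]}" }
}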
2.) The field that contains the kv message is wrong: its fields are separated both by spaces and by commas, and kv won't work this way.
After your dissect filter, this is the content of [@metadata][msg]:
method=PUT status=201 appLogicTime=1, streamInTime=0, blobStorageTime=32, totalTime=33 tenantId=b0csdfsd-1066-4444-adf4-ce7bsdfssdf, immutableBlobId=bss_c_586334/Sample_app15-615223-157sadas6648465.txt, blobSize=2862, domain=cde
To solve this you should use a mutate filter to remove the commas from [@metadata][msg] and then use the kv filter with its default configuration.
This should be your filter configuration
filter {
  dissect {
    mapping => { "message" => "%{number} [%{thread}] %{level} %{class} - %{[@metadata][msg]}" }
  }
  mutate {
    gsub => ["[@metadata][msg]", ",", ""]
  }
  kv {
    source => "[@metadata][msg]"
  }
}
Your output should be something like this:
{
"number" => "2519366789",
"#timestamp" => 2019-11-03T16:42:11.708Z,
"thread" => "http-nio-8080-exec-47",
"appLogicTime" => "1",
"domain" => "cde",
"method" => "PUT",
"level" => "DEBUG",
"blobSize" => "2862",
"#version" => "1",
"immutableBlobId" => "bss_c_586334/Sample_app15-615223-157sadas6648465.txt",
"streamInTime" => "0",
"status" => "201",
"blobStorageTime" => "32",
"message" => "2519366789 [http-nio-8080-exec-47] DEBUG q.s.b.y.m.PerformanceMetricsFilter - method=PUT status=201 appLogicTime=1, streamInTime=0, blobStorageTime=32, totalTime=33 tenantId=b0csdfsd-1066-4444-adf4-ce7bsdfssdf, immutableBlobId=bss_c_586334/Sample_app15-615223-157sadas6648465.txt, blobSize=2862, domain=cde",
"totalTime" => "33",
"tenantId" => "b0csdfsd-1066-4444-adf4-ce7bsdfssdf",
"class" => "q.s.b.y.m.PerformanceMetricsFilter"
}

How to capture a repeated pattern in logstash (5.4.0) grok?

I would appreciate if someone can help me out with logstash grok.
Given a log like the one below,
IN 192.168.11.2 IN 192.168.11.3
My goal is to put the IP addresses into an array using grok. The list of IPs is dynamic and may contain more than 2 entries.
e.g.
tmp = [
"192.168.11.2", "192.168.11.3"
]
However, if I use a filter like the one below, everything ends up in a single field.
filter {
grok {
match => { "message" => "(?<tmp>(IN %{IPV4}(\s)?)*)" }
}
}
Result:
"path" => "/tmp/sample.csv",
"#timestamp" => 2017-08-24T05:00:08.093Z,
"tmp" => "IN 192.168.11.2 IN 192.168.11.3",
"#version" => "1",
"host" => "host.ywlocal.net",
"message" => "IN 192.168.11.2 IN 192.168.11.3"
Would this be possible?
You can use the ruby filter for more advanced parsing:
filter {
  ruby {
    code => "event.set('ips', event.get('message').scan(/\d{1,3}\.\d{1,3}\.\d{1,3}\.\d{1,3}/))"
  }
}
The regexp is not a 100% correct IP-address match, but it should work for your needs.
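With the corrected code and the sample line above, the resulting field should look roughly like this:
"ips" => [
  [0] "192.168.11.2",
  [1] "192.168.11.3"
]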

Logstash grok plugin, add field when matched

I have a grok match like this:
grok { match => [ "message", "Duration: %{NUMBER:duration}", "Speed: %{NUMBER:speed}" ] }
I also want to add another field to the captured variables if it matches a grok pattern. I know I can use the mutate plugin and if-else to add new fields, but I have too many matches and it would get too long that way. As an example, I want to capture the right-side fields for the given texts.
"Duration: 12" => [duration: "12", type: "duration_type"]
"Speed: 12" => [speed: "12", type: "speed_type"]
Is there a way to do this?
I am not 100% sure if that is what you need, but I did something similar. I have basic parsing for my message, and then I additionally analyse a specific field with optional matches.
grok {
  break_on_match => false
  patterns_dir => "/etc/logstash/conf.d/patterns"
  match => {
    "message" => "\[%{LOGLEVEL:level}\] \[%{IPORHOST:from}\] %{TIMESTAMP_ISO8601:timestamp} \[%{DATA:thread}\] \[%{NOTSPACE:logger}\] %{GREEDYDATA:msg}"
    "thread" => "(%{GREEDYDATA}%{REQUEST_TYPE:reqType}%{SPACE}%{URIPATH:reqPath}(%{URIPARAM:reqParam})?)?"
  }
}
As you can see, the first one simply matches the complete message. I have a field thread, which is basically the logger information. However, in my setup, HTTP requests append some info to the thread name. In these cases, I want to OPTIONALLY match that as well.
With the above setup, the fields reqType, reqPath and reqParam are only created if thread can match them. Otherwise they aren't.
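You can then branch on the presence of those optional fields further down the pipeline, for example (a sketch; the tag name is made up):
if [reqPath] {
  mutate { add_tag => [ "http_request" ] }
}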
I hope this is what you wanted.
Thanks,
Artur
Something like this?
filter {
  grok { match => [ "message", "%{GREEDYDATA:types}: %{NUMBER:value}" ] }
  mutate {
    lowercase => [ "types" ]
    add_field => {
      "%{types}" => "%{value}"
      "type" => "%{types}_type"
    }
    remove_field => [ "value", "types" ]
  }
}
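Assuming an input of "Duration: 12", that filter should leave the event with roughly these fields (my own illustration):
"duration" => "12",
"type" => "duration_type"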

logstash : how to extract data from log4j message?

I am trying to extract data from my log4j messages with Logstash.
The messages look like this:
Method findAll - Start by : bokc
I would like to extract the method name, "findAll", and the user, "bokc".
How can I do this?
I use Logstash 1.5.2 and my config is:
input {
  log4j {
    mode => "server"
    type => "log4j-artemis"
    port => 4560
  }
}
filter {
  multiline {
    type => "log4j-artemis"
    pattern => "^\\s"
    what => "previous"
  }
  mutate {
    add_field => [ "source_ip", "%{host}" ]
  }
}
}
Use a grok filter:
filter {
  grok {
    match => [
      "message",
      "^Method %{WORD:method} - Start by : %{USER:user}"
    ]
    tag_on_failure => []
  }
}
This extracts the two words into the fields "method" and "user". The tag_on_failure setting makes sure that non-matching messages aren't tagged with _grokparsefailure. Since most messages aren't supposed to match the pattern, it doesn't make sense to mark them as failures.
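For the sample message above, the resulting fields would be roughly:
"method" => "findAll",
"user" => "bokc"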

Negative regexp in logstash configuration

I cannot get negative regexp expressions working within LogStash (as described in the docs)
Consider the following positive regex which works correctly to detect fields that have been assigned a value:
if [remote_ip] =~ /(.+)/ {
mutate { add_tag => ["ip"] }
}
However, the negative expression seems to return false even when the field is blank:
if [remote_ip] !~ /(.+)/ {
mutate { add_tag => ["no_ip"] }
}
Am I misunderstanding the usage?
Update: this was fuzzy thinking on my part. There were issues with my config file. If the rest of your config file is sane, the above should work.
This was fuzzy thinking on my part - there were issues with the rest of my config file.
Based on Ben Lim's example, I came up with an input that is easier to test:
input {
  stdin { }
}
filter {
  if [message] !~ /(.+)/ {
    mutate { add_tag => ["blank_message"] }
  }
  if [noexist] !~ /(.+)/ {
    mutate { add_tag => ["tag_does_not_exist"] }
  }
}
output {
  stdout { debug => true }
}
The output for a blank message is:
{
"message" => "",
"#version" => "1",
"#timestamp" => "2014-02-27T01:33:19.285Z",
"host" => "benchmark.example.com",
"tags" => [
[0] "blank_message",
[1] "tag_does_not_exist"
]
}
The output for a message with the content "test message" is:
test message
{
"message" => "test message",
"#version" => "1",
"#timestamp" => "2014-02-27T01:33:25.059Z",
"host" => "benchmark.example.com",
"tags" => [
[0] "tag_does_not_exist"
]
}
Thus, the negative match !~ /(.+)/ returns true only when the field is empty or the field does not exist.
The negative match !~ /(.*)/ will only return true when the field does not exist; if the field exists (whether empty or with a value), the result will be false.
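In other words (my own summary of the behaviour above; the tag names are made up), the two checks can be read like this:
if [field] !~ /(.+)/ {
  # true when [field] is missing or empty
  mutate { add_tag => ["no_value"] }
}
if [field] !~ /(.*)/ {
  # true only when [field] does not exist at all
  mutate { add_tag => ["field_missing"] }
}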
Below is my configuration. The type field does not exist, therefore the negative expression returns true.
input {
  stdin { }
}
filter {
  if [type] !~ /(.+)/ {
    mutate { add_tag => ["aa"] }
  }
}
output {
  stdout { debug => true }
}
The regexp /(.+)/ matches any value with at least one character. So when the "type" field exists and has a value, it meets the regexp. Therefore, in your example, if the remote_ip field exists and is non-empty, your "negative expression" will return false.
