Groking and then mutating? - logstash

I am running the following filter in a logstash config file:
filter {
if [type] == "logstash" {
grok {
match => {
"message" => [
"\[%{DATA:timestamp}\]\[%{DATA:severity}\]\[%{DATA:instance}\]%{DATA:mymessage}, reason:%{GREEDYDATA:reason}",
"\[%{DATA:timestamp}\]\[%{DATA:severity}\]\[%{DATA:instance}\]%{GREEDYDATA:mymessage}"
]
}
}
}
}
It kind of works:
it does identify and carve out variables "timestamp", "severity", "instance", "mymessage", and "reason"
What I really wanted was for the text that is now in %{mymessage} to become the message field, but when I add any sort of mutate command to this grok it stops working (by the way, is there a log that tells me what is breaking? I didn't see one... ironic for a logging solution not to have verbose logging).
Here's what I tried:
filter {
if [type] == "logstash" {
grok {
match => {
"message" => [
"\[%{DATA:timestamp}\]\[%{DATA:severity}\]\[%{DATA:instance}\]%{DATA:mymessage}, reason:%{GREEDYDATA:reason}",
"\[%{DATA:timestamp}\]\[%{DATA:severity}\]\[%{DATA:instance}\]%{GREEDYDATA:mymessage}"
]
}
mutate => {
replace => [ "message", "%{mymessage}"]
remove => [ "mymessage" ]
}
}
}
}
So in summary I'd like to understand:
Are there log files I can look at to see why/where a failure is happening?
Why would my mutate commands illustrated above not work?
I also thought that if I never used the mymessage variable but instead just referred to message as the variable that maybe it would automatically truncate message to just the matched pattern but that appeared to append the results instead ... what is the correct behaviour?

Using the overwrite option is the best solution, but I thought I'd address a couple of your questions directly anyway.
It depends on how Logstash is started. Normally you'd run it via an init script that passes the -l or --log option. /var/log/logstash would be typical.
mutate is a filter of its own, not a part of grok. You could have done it like this (or used rename instead of replace + remove):
grok {
...
}
mutate {
replace => [ "message", "%{mymessage}" ]
remove_field => [ "mymessage" ]
}

I'd do it a different way. For what you're trying to do, the overwrite option might be more apt.
Something like this:
grok {
overwrite => [ "message" ]
match => {
"message" => [
"\[%{DATA:timestamp}\]\[%{DATA:severity}\]\[%{DATA:instance}\]%{DATA:message}, reason:%{GREEDYDATA:reason}",
"\[%{DATA:timestamp}\]\[%{DATA:severity}\]\[%{DATA:instance}\]%{GREEDYDATA:message}"
]
}
}
This'll replace 'message' with the 'grokked' bit.
I know that doesn't directly answer your question. About all I can say is that when you start logstash, it writes to STDOUT (at least on the version I'm using), which I capture and write to a file. Some of the errors are reported there.
There's a -l option to logstash that lets you specify a log file to use - this will usually show you what's going on in the parser, but bear in mind that if something doesn't match a rule, it won't necessarily tell you why it didn't.
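To see how the two patterns interact with overwrite, the behaviour can be approximated outside Logstash. This is only a rough Python sketch: it assumes DATA and GREEDYDATA expand to .*? and .*, the grok_overwrite helper and the sample log lines are made up for illustration, and Python spells named groups as (?P<name>...).

```python
import re

# Hand-expanded approximations of the two grok patterns above:
# %{DATA:x} -> (?P<x>.*?), %{GREEDYDATA:x} -> (?P<x>.*)
PREFIX = r"\[(?P<timestamp>.*?)\]\[(?P<severity>.*?)\]\[(?P<instance>.*?)\]"
PATTERNS = [
    re.compile(PREFIX + r"(?P<message>.*?), reason:(?P<reason>.*)"),
    re.compile(PREFIX + r"(?P<message>.*)"),
]


def grok_overwrite(event):
    """Try each pattern in order; the first match wins and its captures
    replace same-named fields (roughly what overwrite on "message" does)."""
    for pattern in PATTERNS:
        match = pattern.search(event["message"])
        if match:
            event.update(match.groupdict())
            return event
    # No pattern matched: mimic grok's default failure tag.
    event.setdefault("tags", []).append("_grokparsefailure")
    return event


# Hypothetical log lines, not taken from the question.
with_reason = grok_overwrite(
    {"message": "[2015-01-01][WARN][node1]write failed, reason:disk full"}
)
without_reason = grok_overwrite({"message": "[2015-01-01][INFO][node1]started"})

print(with_reason["message"], "|", with_reason["reason"])
print(without_reason["message"])
```

The more specific pattern (with reason) has to come first, because the second pattern would match every line the first one matches.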

Related

Logstash Add field from grok filter

Is it possible to match a message to a new field in logstash using grok and mutate?
Example log:
"<30>Dec 19 11:37:56 7f87c507df2a[20103]: [INFO] 2018-12-19 16:37:56 _internal (MainThread): 192.168.0.6 - - [19/Dec/2018 16:37:56] \"\u001b[37mGET / HTTP/1.1\u001b[0m\" 200 -\r"
I am trying to create a new key value where I match container_id to 7f87c507df2a.
filter {
grok {
match => [ "message", "%{SYSLOG5424PRI}%{NONNEGINT:ver} +(?:%{TIMESTAMP_ISO8601:ts}|-) +(?:%{HOSTNAME:service}|-) +(?:%{NOTSPACE:containerName}|-) +(?:%{NOTSPACE:proc}|-) +(?:%{WORD:msgid}|-) +(?:%{SYSLOG5424SD:sd}|-|) +%{GREEDYDATA:msg}" ]
}
mutate {
add_field => { "container_id" => "%{containerName}"}
}
}
The resulting logfile renders this, where the value of containerName isn't being referenced from grok, it is just a string literal:
"container_id": "%{containerName}"
I am trying to have the conf create:
"container_id": "7f87c507df2a"
Obviously the value of containerName isn't being linked from grok. Is what I want to do even possible?
As explained in the comments, my grok pattern was incorrect. For anyone that may wander towards this post that needs help with grok go here to make building your pattern less time consuming.
Here was the working snapshot:
filter {
grok {
match => [ "message", "\A%{SYSLOG5424PRI}%{SYSLOGTIMESTAMP}%{SPACE}%{BASE16NUM:docker_id}%{SYSLOG5424SD}%{GREEDYDATA:python_log_message}" ]
add_field => { "container_id" => "%{docker_id}" }
}
}
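The literal "%{containerName}" in the output is what a field reference looks like when the field was never created, which is the case when the grok pattern does not match. A simplified Python sketch of that substitution behaviour follows; the sprintf helper here is hypothetical and much cruder than what Logstash actually does (no nested fields, no date formats).

```python
import re

FIELD_REF = re.compile(r"%\{([^}]+)\}")


def sprintf(template, event):
    """Replace %{field} with the event's value; leave the reference
    untouched when the event has no such field."""
    def substitute(match):
        name = match.group(1)
        return str(event[name]) if name in event else match.group(0)

    return FIELD_REF.sub(substitute, template)


# Event after a failed grok: no captured fields exist.
failed = {"message": "...", "tags": ["_grokparsefailure"]}
# Event after a successful grok that captured docker_id.
matched = {"message": "...", "docker_id": "7f87c507df2a"}

print(sprintf("%{containerName}", failed))
print(sprintf("%{docker_id}", matched))
```

So an unresolved reference in the output is usually a sign that the pattern itself needs fixing, not the add_field line.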

Grok filter for class name containing $

I am facing an issue with the grok filter. The filter below works as expected as long as the class name does not have $ in it. When the thread name is something like PropertiesReader$, it fails. What else can I use so that it can parse class names with special characters?
filter {
grok {
match => [ "message", "%{TIMESTAMP_ISO8601:LogDate} %{LOGLEVEL:loglevel} %{WORD:threadName}:%{NUMBER:ThreadID} - %{GREEDYDATA:Line}" ]
}
json {
source => "Line"
}
mutate {
remove_field => [ "Line" ]
}
}
You aren't limited to grok pattern names. You can do any regex. For example in place of %{WORD:threadName} you can put (?<threadName>[^:]+) which says to match any character that isn't a : and assign it to threadName.
You are using WORD as the pattern for your thread name, and WORD does not match special characters. To confirm this, take a look at its definition: WORD \b\w+\b
Use a custom pattern. Just describe it in a file like this:
MYPATTERN ([A-Za-z]+\$?)
Then you can use it in your config like this:
grok {
patterns_dir => ["/path/to/pattern/dir"]
match => [ "message", "%{TIMESTAMP_ISO8601:LogDate} %{LOGLEVEL:loglevel} %{MYPATTERN:threadName}:%{NUMBER:ThreadID} - %{GREEDYDATA:Line}" ]
}
You'll find more information about custom patterns in the docs
You could also try with %{DATA:threadName} instead of %{WORD:threadName}, if your threadName won't contain whitespaces or colons.
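The difference between these suggestions is easy to check with plain regexes. A small Python sketch, assuming only the threadName:ThreadID part of the line matters and using a made-up thread_name helper and sample strings:

```python
import re

WORD = r"\b\w+\b"                  # grok's WORD definition
NOT_COLON = r"[^:]+"               # the inline regex suggested above
LETTERS_DOLLAR = r"[A-Za-z]+\$?"   # letters with an optional trailing $


def thread_name(name_regex, text):
    """Return the captured thread name, or None if the pattern fails."""
    pattern = "(?P<threadName>" + name_regex + r"):(?P<ThreadID>\d+)"
    match = re.search(pattern, text)
    return match.group("threadName") if match else None


plain = "PropertiesReader:12"        # hypothetical input
with_dollar = "PropertiesReader$:12"  # hypothetical input

print(thread_name(WORD, plain))
print(thread_name(WORD, with_dollar))
print(thread_name(NOT_COLON, with_dollar))
print(thread_name(LETTERS_DOLLAR, with_dollar))
```

WORD fails on the second input because \w does not match $, so nothing it can capture is directly followed by the colon.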

Logstash grok filter doesn't work for the last field

With Logstash 2.3.3, grok filter doesn't work for the last field.
To reproduce the problem, create test.conf as follows:
input {
file {
path => "/Users/izeye/Applications/logstash-2.3.3/test.log"
}
}
filter {
grok {
match => { "message" => "%{DATA:id1},%{DATA:id2},%{DATA:id3},%{DATA:id4},%{DATA:id5}" }
}
}
output {
stdout {
codec => rubydebug
}
}
Run ./bin/logstash -f test.conf
and after it started, in another terminal run echo "1,2,3,4,5" >> test.log
and I got the following output:
Johnnyui-MacBook-Pro:logstash-2.3.3 izeye$ ./bin/logstash -f test.conf
Settings: Default pipeline workers: 8
Pipeline main started
{
"message" => "1,2,3,4,5",
"@version" => "1",
"@timestamp" => "2016-07-07T07:57:42.830Z",
"path" => "/Users/izeye/Applications/logstash-2.3.3/test.log",
"host" => "Johnnyui-MacBook-Pro.local",
"id1" => "1",
"id2" => "2",
"id3" => "3",
"id4" => "4"
}
You can see the missing id5.
I'm not sure this is a bug or mis-configured.
Any hint will be appreciated.
I think it is because of how the DATA pattern is defined. Its regex is .*?, which is a lazy match, so at the end of the pattern it is satisfied by the empty string.
It's not a bug, it's how regex works (example).
But you might want to ask a regex question in order to get an accurate answer.
As a solution, you can replace the last DATA with NUMBER (or something appropriate to your situation). GREEDYDATA would also work.
Though, in that situation, the csv or dissect filters might be a better fit, as they are easier to configure and more performant.
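The lazy-match effect can be reproduced with any regex engine. A minimal Python sketch, assuming DATA expands to .*? and GREEDYDATA to .* (Python writes named groups as (?P<name>...)):

```python
import re

line = "1,2,3,4,5"
head = r"(?P<id1>.*?),(?P<id2>.*?),(?P<id3>.*?),(?P<id4>.*?),"

# Last field lazy and unanchored: it is allowed to match nothing.
lazy = re.search(head + r"(?P<id5>.*?)", line)
# Last field greedy (GREEDYDATA-style): it takes the rest of the line.
greedy = re.search(head + r"(?P<id5>.*)", line)
# Last field lazy but anchored at end of line: forced to consume "5".
anchored = re.search(head + r"(?P<id5>.*?)$", line)

print(repr(lazy.group("id5")))
print(repr(greedy.group("id5")))
print(repr(anchored.group("id5")))
```

In the lazy case id5 does match, but with an empty string. As far as I know, grok drops empty captures unless keep_empty_captures is enabled, which would explain why the field is missing from the event entirely.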

Logstash grok plugin, add field when matched

I have a grok match like this:
grok{ match => [ "message", "Duration: %{NUMBER:duration}", "Speed: %{NUMBER:speed}" ] }
I also want to add another field to captured variables if it matches a grok pattern. I know I can use mutate plugin and if-else to add new fields but I have too many matches and it will be too long that way. As an example, I want to capture right-side fields for given texts.
"Duration: 12" => [duration: "12", type: "duration_type"]
"Speed: 12" => [speed: "12", type: "speed_type"]
Is there a way to do this?
I am not 100% sure if that is what you need, but I did something similar. I have a basic parsing for my message, and then I analyse a specific field additionally with optional matches.
grok {
break_on_match => false
patterns_dir => "/etc/logstash/conf.d/patterns"
match => {
"message" => "\[%{LOGLEVEL:level}\] \[%{IPORHOST:from}\] %{TIMESTAMP_ISO8601:timestamp} \[%{DATA:thread}\] \[%{NOTSPACE:logger}\] %{GREEDYDATA:msg}"
"thread" => "(%{GREEDYDATA}%{REQUEST_TYPE:reqType}%{SPACE}%{URIPATH:reqPath}(%{URIPARAM:reqParam})?)?"
}
}
As you can see, the first one simply matches the complete message. I have a field thread, that is basically the Logger information. However, in my setup, http requests append some info to the thread name. In these cases, I want to OPTIONALLY match these as well.
With the above setup, the fields reqType, reqPath and reqParam are only created if thread can match them. Otherwise they aren't.
I hope this is what you wanted.
Thanks,
Artur
Something like this?
filter{
grok { match => [ "message", "%{GREEDYDATA:types}: %{NUMBER:value}" ] }
mutate {
lowercase => [ "types" ]
add_field => { "%{types}" => "%{value}"
"type" => "%{types}_type" }
remove_field => [ "value", "types" ]
}
}
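What the grok plus mutate combination above is meant to produce can be sketched in Python. This is an illustration only: the classify helper is hypothetical, and NUMBER is simplified here to an optionally signed decimal.

```python
import re

# %{GREEDYDATA:types}: %{NUMBER:value}, with NUMBER simplified.
PATTERN = re.compile(r"(?P<types>.*): (?P<value>-?\d+(?:\.\d+)?)")


def classify(message):
    """Turn 'Duration: 12' into a field named after the lowercased label,
    plus a type field derived from the same label."""
    match = PATTERN.search(message)
    if match is None:
        return {"message": message, "tags": ["_grokparsefailure"]}
    name = match.group("types").lower()
    return {"message": message, name: match.group("value"), "type": name + "_type"}


duration = classify("Duration: 12")
speed = classify("Speed: 12")
print(duration)
print(speed)
```

One pattern covers every label, so no if-else chain is needed; the trade-off is that any text before ": <number>" becomes a field name.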

logstash fails to match a grok filter

I'm stuck. I cannot get why grok fails to match a simple regex under logstash.
grok works just fine as a standalone thing.
The only pattern which works for me is ".*" everything else just fails.
$ cat ./sample2-logstash.conf
input {
stdin {}
}
filter {
grok {
match => [ "message1", "foo.*" ]
add_tag => [ "this_is_foo" ]
tag_on_failure => [ "STUPID_LOGSTASH" ]
}
}
output {
stdout { codec => json_lines }
}
Here's the output:
$ echo "foo" |~/bin/logstash-1.4.0/bin/logstash -f ./sample2-logstash.conf
{"message":"foo","@version":"1","@timestamp":"2014-05-07T00:32:49.915Z","host":"serega-sv","tags":["STUPID_LOGSTASH"]}
Looks like I missed something in logstash, because vanilla grok works just fine:
$ cat grok.conf
program {
file "./sample.log"
match {
pattern: "foo.*"
reaction: "LINE MATCHED! %{@LINE}"
}
}
Plain grok's output:
$ echo "foo" > ./sample.log; grok -f grok.conf
LINE MATCHED! foo
Thanks!
Your configuration has an error. The field grok should match against is message, not message1.
Also, the logstash grok page has an example that shows how to use grok. I think you have misunderstood it. For example, if your log is
55.3.244.1 GET /index.html 15824 0.043
The grok pattern for logstash is
%{IP:client} %{WORD:method} %{URIPATHPARAM:request} %{NUMBER:bytes} %{NUMBER:duration}
For %{IP:client}, the first part (IP) is the name of the grok pattern and the second part (client) is the field in which the matched text is stored.
Everything @Ben Lim said. The very next section of the documentation shows how to apply semantics to generic regex syntax:
filter {
grok {
match => [ "message",
"^(?<ip>\S+) (?<verb>\S+) (?<request>\S+) (?<bytes>\S+) (?<delay>\S+)$"
]
}
}
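Both points (matching against the wrong field, and named captures in plain regex) can be tried out in Python. A rough sketch, with a hypothetical grok helper and Python's (?P<name>...) spelling of named groups:

```python
import re

LINE = re.compile(
    r"^(?P<ip>\S+) (?P<verb>\S+) (?P<request>\S+) (?P<bytes>\S+) (?P<delay>\S+)$"
)


def grok(event, field):
    """Match LINE against event[field]; a missing field simply fails."""
    value = event.get(field)
    match = LINE.search(value) if isinstance(value, str) else None
    if match is None:
        return {**event, "tags": ["_grokparsefailure"]}
    return {**event, **match.groupdict()}


event = {"message": "55.3.244.1 GET /index.html 15824 0.043"}
ok = grok(event, "message")
missing = grok(event, "message1")  # no such field, as in the question

print(ok["ip"], ok["verb"], ok["request"], ok["bytes"], ok["delay"])
print(missing["tags"])
```

The pattern itself is fine in both calls; only the field name decides whether there is anything to match against.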
