Logstash Grok pattern for session id and null values

Question 1 -
56dd573d.5edd is my session ID, and I have a grok filter like
%{WORD:session_id}.%{WORD:session_id} - this reads the session ID, and the output looks like this:
"session_id": [
[
"56dd573d",
"5edd"
]
]
Is there any way I can get output like this:
"session_id": [
[
"56dd573d.5edd"
]
]
I just need it in a single field.
Question 2 -
2016-03-08 06:48:15.477 GMT
this is a line from my log entry. I have used the
%{DATESTAMP:log_time} %{WORD}
grok filter to read this date; here I simply want to drop or ignore the GMT.
Is there any special pattern to ignore the next word in the log line when it is not useful?
Updated
Question 3 - How do I handle null values? They appear after GMT:
2016-03-07 10:26:05 GMT,,
This is my PostgreSQL log entry:
2016-03-08 06:48:15.477 GMT,"postgres","sugarcrm",24285,"[local]",56dd573d.5edd,4,"idle",2016-03-07 10:26:05 GMT,,0,LOG,00000,"disconnection: session time: 20:22:09.928 user=postgres database=sugarcrm host=[local]",,,,,,,,,""
Note - a null value may appear as "" or as ,,
Answer for question 3
I found a solution for handling ,,
Below is a configuration that handles a ,, value by replacing it with 0:
input {
  file {
    path => "/var/log/logstash/postgres.log"
    start_position => "beginning"
    type => "postgres"
  }
}
filter {
  mutate {
    gsub => [
      # Insert a 0 for an empty field at the start of the line...
      "message", "^,", "0,",
      # ...in the middle (gsub does not handle overlapping matches, so a
      # run of consecutive commas needs several passes of the same rule)...
      "message", ",,", ",0,",
      "message", ",,", ",0,",
      "message", ",,", ",0,",
      # ...and at the end of the line.
      "message", ",$", ",0"
    ]
  }
  grok {
    match => [ "message", "%{GREEDYDATA:msg1}" ]
  }
}
output {
  stdout { codec => rubydebug }
}
Reference -
http://comments.gmane.org/gmane.comp.sysutils.logstash.user/13842
But for the "" null value, I tried the configuration below and I am getting a configuration error:
filter {
  mutate {
    gsub => [
      "message", "^,", "0,",
      "message", ",,", ",0,",
      "message", ",,", ",0,",
      "message", ",,", ",0,",
      "message", ",$", ",0",
      "message", "^\"" "null\""
      "message", """" ""null""
      "message", """" ""null""
      "message", ""$", ""null"
    ]
  }
}
I need to replace "" with null
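A working approach (a sketch, not from the original thread): in the Logstash config language you can single-quote the gsub arguments, so the literal double quotes don't need escaping, which is what breaks the attempt above. This assumes an empty value only ever appears as a standalone "" field:
filter {
  mutate {
    gsub => [
      # Single quotes avoid escaping the double-quote characters;
      # every empty quoted field ("") becomes the bare word null.
      "message", '""', 'null'
    ]
  }
}
The rule can also simply be appended to the existing gsub list above, after the ,, rules have run.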

Regarding question 1: it separates the two because essentially what you're asking it to do is add another value to session_id. You want something like:
(?<session_ID>%{WORD}\.%{WORD})
Try it out on https://grokdebug.herokuapp.com/, where you can test your patterns. The above isn't the greatest of solutions, but I don't have enough information about the rest of the message. If you know more, you can throw away the WORD match. If it is a structured session_ID with a fixed length, for example, you can do:
(?<session_ID>([a-zA-Z0-9]{1,8}\.)[a-zA-Z0-9]{1,4})
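Dropped into a filter, either capture would look like this (a sketch; the surrounding pattern depends on the rest of your log line):
grok {
  # The custom capture keeps both halves of the ID in a single field.
  match => { "message" => "(?<session_id>%{WORD}\.%{WORD})" }
}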
Regarding the second question, I would hard-code it as a quick hack:
%{DATESTAMP:log_time} GMT
Give some more information and we can give a better, more specific answer. The above should work, but there are several ways to skin a cat!
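One caveat (my note, not from the thread): the stock DATESTAMP pattern expects day- or month-first dates, so for an ISO-style line like 2016-03-08 06:48:15.477 GMT the TIMESTAMP_ISO8601 pattern is likely the better fit:
grok {
  # The literal GMT is matched but, having no field name, is not stored.
  match => { "message" => "^%{TIMESTAMP_ISO8601:log_time} GMT" }
}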

Related

using Grok to skip parts of message or logs

I have just started using grok for Logstash and I am trying to parse my log file with a grok filter.
My log line is something like the one below:
03-30-2017 13:26:13 [00089] TIMER XXX.TimerLog: entType [organization], queueType [output], memRecno = 446323718, audRecno = 2595542711, elapsed time = 998ms
I want to capture only the initial date/time stamp, entType [organization], and elapsed time = 998ms.
However, it looks like I have to match a pattern for every word and number in the line. Is there a way I can skip them? I tried to look everywhere but couldn't find anything. Kindly help.
As per Charles Duffy's comment.
There are 2 ways of doing this:
The GREEDYDATA way (?:.*):
grok {
  match => { "message" => "^%{DATE_US:dte}\s*%{TIME:tme}\s*\[%{GREEDYDATA}elapsed time\s*=\s*%{BASE10NUM}" }
}
Or, telling grok not to stop at the first match, so it tries every pattern in the list:
grok {
  break_on_match => false
  match => { "message" => "^%{DATE_US:dte}\s*%{TIME:tme}\s*\[" }
  match => { "message" => "elapsed time\s*=\s*%{BASE10NUM:elapsedTime}" }
}
You can then rejoin the date & time into a single field and convert it to a timestamp.
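That rejoin might look like this (a sketch, assuming the dte and tme fields captured above and the MM-dd-yyyy format of the sample line):
mutate {
  add_field => { "timestamp" => "%{dte} %{tme}" }
}
date {
  # Parses the combined field; by default the result goes to @timestamp.
  match => [ "timestamp", "MM-dd-yyyy HH:mm:ss" ]
}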
As Charles Duffy suggested, you can simply bypass data you don't need.
You can use .* to do that.
The following will produce the output you want:
%{DATE_US:dateTime}.*entType\s*\[%{WORD:org}\].*elapsed time\s*=\s*%{BASE10NUM}
Explanation:
\s* matches zero or more whitespace characters.
\[ matches a literal [ character.
%{WORD:org} matches a single word and places it in a new field, org.
Outputs
{
"dateTime": [
[
"03-30-2017"
]
],
"MONTHNUM": [
[
"03"
]
],
"MONTHDAY": [
[
"30"
]
],
"YEAR": [
[
"2017"
]
],
"org": [
[
"organization"
]
],
"BASE10NUM": [
[
"998"
]
]
}
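Wrapped in a config, that pattern would sit in a filter block like this (a sketch; note that in Logstash itself grok keeps only explicitly named captures by default, so the number is named here):
filter {
  grok {
    match => { "message" => "%{DATE_US:dateTime}.*entType\s*\[%{WORD:org}\].*elapsed time\s*=\s*%{BASE10NUM:elapsedTime}" }
  }
}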
A list of all available grok patterns is maintained in the logstash-patterns-core repository on GitHub.

Grok filter for class name containing $

I am facing an issue while using the grok filter. Below is my filter, which works as expected as long as the class name does not contain a $. When the thread name is something like PropertiesReader$, it fails. What else can I use so it can parse class names with special characters?
filter {
  grok {
    match => [ "message", "%{TIMESTAMP_ISO8601:LogDate} %{LOGLEVEL:loglevel} %{WORD:threadName}:%{NUMBER:ThreadID} - %{GREEDYDATA:Line}" ]
  }
  json {
    source => "Line"
  }
  mutate {
    remove_field => [ "Line" ]
  }
}
You aren't limited to grok pattern names. You can do any regex. For example in place of %{WORD:threadName} you can put (?<threadName>[^:]+) which says to match any character that isn't a : and assign it to threadName.
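Dropped into the filter above, that looks like (a sketch):
grok {
  # (?<threadName>[^:]+) captures everything up to the colon, $ included.
  match => [ "message", "%{TIMESTAMP_ISO8601:LogDate} %{LOGLEVEL:loglevel} (?<threadName>[^:]+):%{NUMBER:ThreadID} - %{GREEDYDATA:Line}" ]
}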
You are using WORD as the pattern for your threadName, and WORD does not match special characters. To confirm this, take a look at its definition: WORD \b\w+\b
Use a custom pattern. Just describe it in a pattern file like this:
MYPATTERN ([A-Za-z]+\$?)
Then you can use it in your config like this:
grok {
  patterns_dir => ["/path/to/pattern/dir"]
  match => [ "message", "%{TIMESTAMP_ISO8601:LogDate} %{LOGLEVEL:loglevel} %{MYPATTERN:threadName}:%{NUMBER:ThreadID} - %{GREEDYDATA:Line}" ]
}
You'll find more information about custom patterns in the docs
You could also try %{DATA:threadName} instead of %{WORD:threadName}, if your threadName won't contain whitespace or colons.

Logstash grok plugin, add field when matched

I have a grok match like this:
grok { match => [ "message", "Duration: %{NUMBER:duration}", "Speed: %{NUMBER:speed}" ] }
I also want to add another field to the captured variables when it matches a grok pattern. I know I can use the mutate plugin with if-else to add new fields, but I have too many matches and it would get too long that way. As an example, I want to produce the right-side fields for the given texts:
"Duration: 12" => [duration: "12", type: "duration_type"]
"Speed: 12" => [speed: "12", type: "speed_type"]
Is there a way to do this?
I am not 100% sure if that is what you need, but I did something similar. I have a basic parse for my message, and then I additionally analyse a specific field with optional matches.
grok {
  break_on_match => false
  patterns_dir => "/etc/logstash/conf.d/patterns"
  match => {
    "message" => "\[%{LOGLEVEL:level}\] \[%{IPORHOST:from}\] %{TIMESTAMP_ISO8601:timestamp} \[%{DATA:thread}\] \[%{NOTSPACE:logger}\] %{GREEDYDATA:msg}"
    "thread" => "(%{GREEDYDATA}%{REQUEST_TYPE:reqType}%{SPACE}%{URIPATH:reqPath}(%{URIPARAM:reqParam})?)?"
  }
}
As you can see, the first one simply matches the complete message. I have a field thread that is basically the logger information. However, in my setup, HTTP requests append some info to the thread name. In these cases, I want to OPTIONALLY match those as well.
With the above setup, the fields reqType, reqPath, and reqParam are only created if thread matches them; otherwise they aren't.
I hope this is what you wanted.
Thanks,
Artur
Something like this?
filter {
  grok { match => [ "message", "%{GREEDYDATA:types}: %{NUMBER:value}" ] }
  mutate {
    lowercase => [ "types" ]
    add_field => {
      "%{types}" => "%{value}"
      "type" => "%{types}_type"
    }
    remove_field => [ "value", "types" ]
  }
}
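For "Duration: 12" this yields duration => 12 plus type => duration_type: the %{types} sprintf reference on the left-hand side of add_field names the new field dynamically.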

logstash generate @timestamp from parsed message

I have a file containing a series of messages like this:
component+branch.job 2014-09-04_21:24:46 2014-09-04_21:24:49
It is a string, some whitespace, the first date and time, some whitespace, and the second date and time. Currently I'm using this filter:
filter {
  grok {
    match => [ "message", "%{WORD:componentName}\+%{WORD:branchName}\.%{WORD:jobType}\s+20%{DATE:dateStart}_%{TIME:timeStart}\s+20%{DATE:dateStop}_%{TIME:timeStop}" ]
  }
}
I would like to convert dateStart and timeStart into @timestamp for that message.
I found that there is a date filter, but I don't know how to use it on two separate fields.
I have also tried something like this as a filter:
date {
  match => [ "message", "YYYY-MM-dd_HH:mm:ss" ]
}
but it didn't work as expected.
Based on the duplicate suggested by Magnus Bäck, I created a solution for my problem. The solution was to mutate the parsed data into one field:
mutate {
  add_field => { "tmp_start_timestamp" => "20%{dateStart}_%{timeStart}" }
}
and then parse it as I suggested in my question.
So the final solution looks like this:
filter {
  grok {
    match => [ "message", "%{WORD:componentName}\+%{WORD:branchName}\.%{DATA:jobType}\s+20%{DATE:dateStart}_%{TIME:timeStart}\s+20%{DATE:dateStop}_%{TIME:timeStop}" ]
  }
  mutate {
    add_field => { "tmp_start_timestamp" => "20%{dateStart}_%{timeStart}" }
  }
  date {
    match => [ "tmp_start_timestamp", "YYYY-MM-dd_HH:mm:ss" ]
  }
}
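By default the date filter writes the parsed time to @timestamp, which is exactly what the question asked for.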

Grokking and then mutating?

I am running the following filter in a logstash config file:
filter {
  if [type] == "logstash" {
    grok {
      match => {
        "message" => [
          "\[%{DATA:timestamp}\]\[%{DATA:severity}\]\[%{DATA:instance}\]%{DATA:mymessage}, reason:%{GREEDYDATA:reason}",
          "\[%{DATA:timestamp}\]\[%{DATA:severity}\]\[%{DATA:instance}\]%{GREEDYDATA:mymessage}"
        ]
      }
    }
  }
}
It kind of works:
it does identify and carve out variables "timestamp", "severity", "instance", "mymessage", and "reason"
Really what I wanted was for the text that is now %{mymessage} to become the %{message} field, but when I add any sort of mutate command to this grok it stops working. (By the way, should there be a log that tells me what is breaking? I didn't see one... ironic for a logging solution not to have verbose logging.)
Here's what I tried:
filter {
  if [type] == "logstash" {
    grok {
      match => {
        "message" => [
          "\[%{DATA:timestamp}\]\[%{DATA:severity}\]\[%{DATA:instance}\]%{DATA:mymessage}, reason:%{GREEDYDATA:reason}",
          "\[%{DATA:timestamp}\]\[%{DATA:severity}\]\[%{DATA:instance}\]%{GREEDYDATA:mymessage}"
        ]
      }
      mutate => {
        replace => [ "message", "%{mymessage}" ]
        remove => [ "mymessage" ]
      }
    }
  }
}
So in summary I'd like to understand:
Are there log files I can look at to see why/where a failure is happening?
Why would my mutate commands illustrated above not work?
I also thought that if I never used the mymessage variable and instead just referred to message, it might automatically truncate message to just the matched pattern, but that appeared to append the results instead... what is the correct behaviour?
Using the overwrite option is the best solution, but I thought I'd address a couple of your questions directly anyway.
It depends on how Logstash is started. Normally you'd run it via an init script that passes the -l or --log option. /var/log/logstash would be typical.
mutate is a filter of its own, not a part of grok. You could've done it like this (or used rename instead of replace + remove):
grok {
  ...
}
mutate {
  replace => [ "message", "%{mymessage}" ]
  remove => [ "mymessage" ]
}
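The rename variant would be (a sketch; rename moves mymessage onto message, replacing its old value):
mutate {
  rename => [ "mymessage", "message" ]
}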
I'd do it a different way. For what you're trying to do, the overwrite option might be more apt.
Something like this:
grok {
  overwrite => [ "message" ]
  match => {
    "message" => [
      "\[%{DATA:timestamp}\]\[%{DATA:severity}\]\[%{DATA:instance}\]%{DATA:message}, reason:%{GREEDYDATA:reason}",
      "\[%{DATA:timestamp}\]\[%{DATA:severity}\]\[%{DATA:instance}\]%{GREEDYDATA:message}"
    ]
  }
}
This'll replace 'message' with the 'grokked' bit.
I know that doesn't directly answer your question - about all I can say is when you start logstash, it writes to STDOUT - at least on the version I'm using - which I'm capturing and writing to a file. In here, it reports some of the errors.
There's a -l option to logstash that lets you specify a log file to use - this will usually show you what's going on in the parser, but bear in mind that if something doesn't match a rule, it won't necessarily tell you why it didn't.
