Grok pattern for matching content in already parsed log line - logstash

I will try to explain what I want to achieve. I have a grok pattern to match a simple log line. The log line contains a response message with a JSON body, which I am able to parse with a custom regex pattern and see in the Kibana dashboard as expected. The thing is, I would also like to extract some data from the body itself.
Here is the working grok pattern:
%{TIMESTAMP_ISO8601:timestamp}%{SPACE}*%{LOGLEVEL:level}:%{SPACE}*%{DATA}%{BODY:body}
Custom pattern for BODY:
BODY (Body:.* \{.*})
And I see it parsed using grok debugger:
{
"timestamp": "2022-11-04 17:09:28.052",
"level": "INFO",
"body": {\"status\":200,\"page\":1, \"fieldToBeParsed\":12 .....//more json conten}
}
Is there any way to parse some of the content of the body together with the whole body?
So I can get result similar to:
{
"timestamp": "2022-11-04 17:09:28.052",
"level": "INFO",
"body": {\"status\":200,\"page\":1, \"fieldToBeParsed\":12 .....//more json conten},
"parsedFromBody: 12
}
Example:
%{TIMESTAMP_ISO8601:timestamp}%{SPACE}*%{LOGLEVEL:level}:%{SPACE}*%{DATA}%{BODY:body}%{FROMBODY:frombody}
Thank you!

Once you have used grok to parse a field, you can use a second grok filter to parse fields created by the first grok. Do not try to do both in one grok; it may or may not work, since the matches are a hash and Java hashes are not ordered.
grok { match => { "body" => "fieldToBeParsed\":%{NUMBER:someField:int}" } }
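For completeness, a minimal sketch of the chained approach, assuming the custom BODY pattern from the question is registered inline via pattern_definitions rather than a patterns file:
filter {
  grok {
    # register the question's custom BODY pattern inline
    pattern_definitions => { "BODY" => "(Body:.* \{.*})" }
    match => { "message" => "%{TIMESTAMP_ISO8601:timestamp}%{SPACE}*%{LOGLEVEL:level}:%{SPACE}*%{DATA}%{BODY:body}" }
  }
  # the second grok parses the field created by the first grok
  grok {
    match => { "body" => "fieldToBeParsed\":%{NUMBER:someField:int}" }
  }
}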

Related

Logstash Grok Pattern that matches after 7th character only

I want to parse my logs with a grok pattern.
This is my sample log file, which has 7 |*| delimiters in every log line.
2022-12-15 01:11:22|*|639a7439798a26.27827512|*|5369168485-532|*|3622857|*|app.DEBUG|*|Checking the step |*|{"current_process":"PROVIDE PROBLEM INFORMATION","current_step":"SUBMIT_MEMO","queue_steps":}|*|{"_environment":"test","_application":"TEST"}
I am creating fields with a grok pattern, but in the end I want to pick only the last JSON part after the 7th |*|, i.e. {"_environment":"test","_application":"TEST"}, in every log and parse it with the JSON filter in Logstash.
How can I get only the JSON object after the 7th |*| in every log?
Try this:
grok {
  match => [ "message", "(?<notRequiredField>(?:.*?(\*)+){6}.*?((\*)+))\|%{GREEDYDATA:requiredField}" ]
}
mutate {
  remove_field => [ "notRequiredField" ]
}
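The question mentions feeding the extracted part to the JSON filter; a minimal follow-up sketch, assuming the grok above has populated requiredField:
json {
  # parse the JSON captured after the 7th |*| into top-level fields
  source => "requiredField"
}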

Parse logs in Logstash or Filebeat and transform them to JSON to push to Elasticsearch

I am using Filebeat to send logs to Logstash and then store them in Elasticsearch.
My logs file contains string like these:
MESSAGE: { "url": "http://IP:PORT/index.html" , "msg": "aaa" }
MESSAGE: { "url": "http://IP:PORT/index_2.html" , "msg": "bbb" }
I would like to store in Elasticsearch just the JSON object.
I am struggling in using some regex in order to parse the data and then transform them in JSON to send to an index to Elasticsearch.
Any help? Should I put in logic to strip out "MESSAGE:"?
You need to split your message into two parts: one with just MESSAGE:, which you can ignore, and another with the JSON, which you will then send to a json filter.
Try the following:
filter {
  grok {
    match => { "message" => "MESSAGE:%{SPACE}%{GREEDYDATA:json_message}" }
  }
  json {
    source => "json_message"
  }
}
This will take your original message and apply a grok filter to capture only the JSON part in a field called json_message; the json filter will then parse this field and create the fields url and msg for each event.
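For the first sample line, the resulting event would then contain roughly these fields (a sketch; real events also carry @timestamp and other metadata):
{
  "json_message": "{ \"url\": \"http://IP:PORT/index.html\" , \"msg\": \"aaa\" }",
  "url": "http://IP:PORT/index.html",
  "msg": "aaa"
}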

LogStash dissect with key=value, comma

I have a pattern of logs that contain performance and statistical data. I have configured Logstash to dissect this data as CSV in order to save the values to ES.
<1>,www1,3,BISTATS,SCAN,330,712.6,2035,17.3,221.4,656.3
I am using the following Logstash filter and getting the desired results.
grok {
  match => { "Message" => "\A<%{POSINT:priority}>,%{DATA:pan_host},%{DATA:pan_serial_number},%{DATA:pan_type},%{GREEDYDATA:message}\z" }
  overwrite => [ "Message" ]
}
csv {
  separator => ","
  columns => ["pan_scan","pf01","pf02","pf03","kk04","uy05","xd06"]
}
This is currently working well for me as long as the order of the columns doesn't get messed up.
However, I want to make this logfile more meaningful and have each column name in the original log, for example: <1>,www1,30000,BISTATS,SCAN,pf01=330,pf02=712.6,pf03=2035,kk04=17.3,uy05=221.4,xd06=656.3
This way I can keep inserting or appending key/values in the middle of the process without corrupting the data. (Using Logstash 5.3)
By using #baudsp's recommendations, I was able to formulate the following. I deleted the csv{} block completely and replaced it with the kv{} block. The kv{} filter automatically created all the key/value pairs, leaving me only to mutate{} the fields into floats and integers (a conversion sketch follows the sample below).
json {
  source => "message"
  remove_field => [ "message", "headers" ]
}
date {
  match => [ "timestamp", "YYYY-MM-dd'T'HH:mm:ss.SSS'Z'" ]
  target => "timestamp"
}
grok {
  match => { "Message" => "\A<%{POSINT:priority}>,%{DATA:pan_host},%{DATA:pan_serial_number},%{DATA:pan_type},%{GREEDYDATA:message}\z" }
  overwrite => [ "Message" ]
}
kv {
  allow_duplicate_values => false
  field_split_pattern => ","
}
Using the above block, I was able to insert the K=V pairs anywhere in the message. Thanks again for all the help. I have added a sample code block for anyone trying to accomplish this task.
Note: I am using NLog for logging, which produces JSON outputs. From the C# code, the format looks like this.
var logger = NLog.LogManager.GetCurrentClassLogger();
logger.ExtendedInfo("<1>,www1,30000,BISTATS,SCAN,pf01=330,pf02=712.6,pf03=2035,kk04=17.3,uy05=221.4,xd06=656.3");
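The float/integer conversion mentioned above is not part of the sample; a minimal sketch, assuming the field names from the example log line:
mutate {
  convert => {
    "pf01" => "integer"
    "pf02" => "float"
    "pf03" => "integer"
    "kk04" => "float"
    "uy05" => "float"
    "xd06" => "float"
  }
}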

Logstash grok split values in cookie string

I want to split the following cookie information into different fields with grok.
Case 1
id=279ddd995;+user=Demo;+country=GB
Output
{
"id": "279ddd995",
"user": "Demo",
"country": "GB",
}
Your data is conveniently provided in key/value pairs. Rather than make a regexp for each field, you can use the kv{} filter to split them apart. This has the side benefit of generically handling any keys in any order.
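A minimal sketch of that kv{} approach, assuming the cookie string arrives in the default message field:
filter {
  kv {
    # both ';' and '+' act as separators in "id=279ddd995;+user=Demo;+country=GB"
    field_split => ";+"
    value_split => "="
  }
}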
Try the following grok pattern:
filter {
  grok {
    match => ["message", "id=(?<id>[^;]+);.*?user=(?<user>[^;]+);.*?country=(?<country>[^;]+).*?"]
  }
}

How to add a new dynamic value (which is not in the input) to Logstash output?

My input has a timestamp in the format Apr20 14:59:41248 Dataxyz.
Now in my output I need the timestamp in the format Day Month Monthday Hour:Minute:Second Year DataXYZ. I was able to remove the timestamp from the input, but I am not quite sure how to add the new timestamp.
I matched the message using grok while receiving the input:
match => ["message","%{WORD:word} %{TIME:time} %{GREEDYDATA:content}"]
I tried using mutate add_field, but was not successful in adding the value of DAY: add_field => [ "timestamp","%{DAY}" ]. I got the output as the literal word 'DAY' and not the value of DAY. Can someone please throw some light on what is being missed?
You need to grok it out into the individual named fields, and then you can reference those fields in add_field.
So your grok would start like this:
%{MONTH:month}%{MONTHDAY:mday}
And then you can put them back together like this:
mutate {
  add_field => {
    "newField" => "%{mday} %{month}"
  }
}
You can check my answer; I think this will be very helpful to you.
grok {
  match => { "message" => "%{TIMESTAMP_ISO8601:time} \[%{NUMBER:thread}\] %{LOGLEVEL:loglevel} %{JAVACLASS:class} - %{GREEDYDATA:msg}" }
}
if "Exception" in [msg] {
  mutate {
    add_field => { "msg_error" => "%{msg}" }
  }
}
You can use custom grok patterns to extract/rename fields. You can extract other fields similarly and rearrange/play around with them in the mutate filter. Refer to Custom Patterns for more information.
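For example, an inline Oniguruma named capture can pull the month and day out of the Apr20 prefix from the question (a minimal sketch; the field names are just examples):
grok {
  # (?<name>regex) defines a one-off custom pattern and names the captured field
  match => { "message" => "(?<month>[A-Za-z]{3})(?<mday>\d{1,2}) %{GREEDYDATA:content}" }
}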
