logstash keep matched field in message

I have this log line:
2016-11-30 15:43:09.3060 DEBUG 20 Company.Product.LoggerDataFilter [UOW:583ee57782fe0140c6dfbfd8] [DP:0] Creating DeviceDataTransformationRequest for logger [D:4E3239200C5032593D004100].
I parse it with this grok pattern:
%{TIMESTAMP_ISO8601:timestamp} %{LOGLEVEL:loglevel}\s+ %{INT:threadId} %{DATA:loggerName} %{UOW} %{DATAPACKET} %{GREEDYDATA:message} %{DEVICEID}
The output of that is
{
"timestamp": [
"2016-11-30 15:43:09.3060"
],
"loglevel": [
"DEBUG"
],
"threadId": [
"20"
],
"loggerName": [
"Tts.IoT.DataLogger.Etl.Core.Filters.LoggerDataFilter"
],
"correlationId": [
"583ee57782fe0140c6dfbfd8"
],
"datapacket": [
"0"
],
"message": [
"Creating DeviceDataTransformationRequest for logger"
],
"deviceId": [
"4E3239200C5032593D004100"
]
}
This is good, except that the message now lacks the device ID that I extracted. I want both: the device ID as a separate field, and the device ID still present in the message.
Can you do that?
(On a side note... how does structured logging like serilog help in this regard?)

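Try changing
%{GREEDYDATA:message} %{DEVICEID}
to
%{GREEDYDATA:testmessage} %{DEVICEID}
and then rebuild the message from the captured fields. The field reference has to use the captured field name (deviceId in your output), not the pattern name DEVICEID, and replace avoids turning an existing message field into an array:
mutate {
  replace => {
    "message" => "%{testmessage} [D:%{deviceId}]"
  }
  remove_field => ["testmessage"]
}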
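An alternative that avoids the temporary field is to wrap the tail of the pattern in a named group, so the device ID is captured on its own and also stays inside message. This is only a sketch: the question does not show the custom UOW, DATAPACKET and DEVICEID patterns, so the three definitions under pattern_definitions are assumptions reconstructed from the sample line and the output.

```
filter {
  grok {
    # Assumed definitions: the real custom patterns are not shown in the question.
    pattern_definitions => {
      "UOW" => "\[UOW:%{WORD:correlationId}\]"
      "DATAPACKET" => "\[DP:%{INT:datapacket}\]"
      "DEVICEID" => "\[D:%{WORD:deviceId}\]"
    }
    match => {
      # (?<message>...) captures the free text together with the device ID block,
      # while %{DEVICEID} inside it still fills deviceId on its own.
      "message" => "%{TIMESTAMP_ISO8601:timestamp} %{LOGLEVEL:loglevel}\s+%{INT:threadId} %{DATA:loggerName} %{UOW} %{DATAPACKET} (?<message>%{GREEDYDATA} %{DEVICEID})"
    }
    # Replace the original line in message instead of appending to it.
    overwrite => ["message"]
  }
}
```

With the sample line, message should then end in "for logger [D:4E3239200C5032593D004100]" while deviceId holds the bare ID.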

Related

using grok pattern to skip a part from message

How can I extract just the session ID number from a pattern in grok?
For example:
"sessionid$:999"
I am trying to use %{DATA:line}, but it captures
"line": [
[
" Sessionid$:999"
]
How can I get just the session number and ignore "sessionId$" in it?
Thanks
Try this:
GROK pattern:
Sessionid.\s*:%{NUMBER:line}
OUTPUT:
{
"line": [
[
"999"
]
],
"BASE10NUM": [
[
"999"
]
]
}
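Inside a Logstash pipeline the same idea could look like the sketch below. The field name session_id, the case-insensitive flag and the :int conversion are assumptions added for illustration; the question spells the prefix both as "sessionid$" and "Sessionid$", which is why the flag is there.

```
filter {
  grok {
    match => {
      # (?i) ignores case; \$ matches the literal dollar sign; :int stores the value as an integer.
      "message" => "(?i)sessionid\$\s*:%{NUMBER:session_id:int}"
    }
  }
}
```

Because the prefix is matched literally and only the number sits inside a named capture, the event gets session_id without the "Sessionid$:" text.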

logstash grok, parse a line with json filter

I am using ELK (Elasticsearch, Kibana, Logstash, Filebeat) to collect logs. I have a log file with the following lines. Every line contains a JSON object, and my goal is to use Logstash grok to extract the key/value pairs from the JSON and forward them to Elasticsearch.
2018-03-28 13:23:01 charge:{"oldbalance":5000,"managefee":0,"afterbalance":"5001","cardid":"123456789","txamt":1}
2018-03-28 13:23:01 manage:{"cuurentValue":5000,"payment":0,"newbalance":"5001","posid":"123456789","something":"new2","additionalFields":1}
I am using Grok Debugger to make regex pattern and see the result. My current regex is:
%{TIMESTAMP_ISO8601} %{SPACE} %{WORD:$:data}:{%{QUOTEDSTRING:key1}:%{BASE10NUM:value1}[,}]%{QUOTEDSTRING:key2}:%{BASE10NUM:value2}[,}]%{QUOTEDSTRING:key3}:%{QUOTEDSTRING:value3}[,}]%{QUOTEDSTRING:key4}:%{QUOTEDSTRING:value4}[,}]%{QUOTEDSTRING:key5}:%{BASE10NUM:value5}[,}]
As you can see, it is hard-coded: in the real log the JSON keys can be any word, the values can be integers, doubles or strings, and the length of the keys varies, so my solution is not acceptable. Its result is shown below, just for reference. I am using Grok patterns.
My first question is whether it is wise to extract the keys from the JSON at all, given that Elasticsearch also uses JSON. Second, if I do extract the keys/values from the JSON, are there correct, concise Grok patterns for it?
My current Grok pattern gives the following output when parsing the first of the lines above.
{
"TIMESTAMP_ISO8601": [
[
"2018-03-28 13:23:01"
]
],
"YEAR": [
[
"2018"
]
],
"MONTHNUM": [
[
"03"
]
],
"MONTHDAY": [
[
"28"
]
],
"HOUR": [
[
"13",
null
]
],
"MINUTE": [
[
"23",
null
]
],
"SECOND": [
[
"01"
]
],
"ISO8601_TIMEZONE": [
[
null
]
],
"SPACE": [
[
""
]
],
"WORD": [
[
"charge"
]
],
"key1": [
[
""oldbalance""
]
],
"value1": [
[
"5000"
]
],
"key2": [
[
""managefee""
]
],
"value2": [
[
"0"
]
],
"key3": [
[
""afterbalance""
]
],
"value3": [
[
""5001""
]
],
"key4": [
[
""cardid""
]
],
"value4": [
[
""123456789""
]
],
"key5": [
[
""txamt""
]
],
"value5": [
[
"1"
]
]
}
Second edit
Is it possible to use the json filter of Logstash? In my case the JSON is only part of the line/event; the whole event is not JSON.
===========================================================
Third edit
The updated solution does not parse the JSON for me. My configuration is as follows:
filter {
  grok {
    match => {
      "message" => [
        "%{TIMESTAMP_ISO8601}%{SPACE}%{GREEDYDATA:json_data}"
      ]
    }
  }
}
filter {
  json {
    source => "json_data"
    target => "parsed_json"
  }
}
The result does not contain key/value pairs; instead it is the message prefix plus the JSON string. The JSON is not parsed.
The test data is below:
2018-03-28 13:23:01 manage:{"cuurentValue":5000,"payment":0,"newbalance":"5001","posid":"123456789","something":"new2","additionalFields":1}
2018-03-28 13:23:03 payment:{"cuurentValue":5001,"reload":0,"newbalance":"5002","posid":"987654321","something":"new3","additionalFields":2}
2018-03-28 13:24:07 management:{"cuurentValue":5002,"payment":0,"newbalance":"5001","posid":"123456789","something":"new2","additionalFields":1}
[2018-06-04T15:01:30,017][WARN ][logstash.filters.json ] Error parsing json {:source=>"json_data", :raw=>"manage:{\"cuurentValue\":5000,\"payment\":0,\"newbalance\":\"5001\",\"posid\":\"123456789\",\"something\":\"new2\",\"additionalFields\":1}", :exception=>#<LogStash::Json::ParserError: Unrecognized token 'manage': was expecting ('true', 'false' or 'null')
at [Source: (byte[])"manage:{"cuurentValue":5000,"payment":0,"newbalance":"5001","posid":"123456789","something":"new2","additionalFields":1}"; line: 1, column: 8]>}
[2018-06-04T15:01:30,017][WARN ][logstash.filters.json ] Error parsing json {:source=>"json_data", :raw=>"payment:{\"cuurentValue\":5001,\"reload\":0,\"newbalance\":\"5002\",\"posid\":\"987654321\",\"something\":\"new3\",\"additionalFields\":2}", :exception=>#<LogStash::Json::ParserError: Unrecognized token 'payment': was expecting ('true', 'false' or 'null')
at [Source: (byte[])"payment:{"cuurentValue":5001,"reload":0,"newbalance":"5002","posid":"987654321","something":"new3","additionalFields":2}"; line: 1, column: 9]>}
[2018-06-04T15:01:34,986][WARN ][logstash.filters.json ] Error parsing json {:source=>"json_data", :raw=>"management:{\"cuurentValue\":5002,\"payment\":0,\"newbalance\":\"5001\",\"posid\":\"123456789\",\"something\":\"new2\",\"additionalFields\":1}", :exception=>#<LogStash::Json::ParserError: Unrecognized token 'management': was expecting ('true', 'false' or 'null')
at [Source: (byte[])"management:{"cuurentValue":5002,"payment":0,"newbalance":"5001","posid":"123456789","something":"new2","additionalFields":1}"; line: 1, column: 12]>}
Please check the result:
You can use GREEDYDATA to assign the entire block of JSON to a separate field, like this:
%{TIMESTAMP_ISO8601}%{SPACE}%{GREEDYDATA:json_data}
This will create a separate field for your JSON data:
{
"TIMESTAMP_ISO8601": [
[
"2018-03-28 13:23:01"
]
],
"json_data": [
[
"charge:{"oldbalance":5000,"managefee":0,"afterbalance":"5001","cardid":"123456789","txamt":1}"
]
]
}
Then apply a json filter on json_data field as follows,
json {
  source => "json_data"
  target => "parsed_json"
}
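The "Unrecognized token 'manage'" warnings quoted in the question suggest that json_data still starts with the word before the colon (charge:, manage:, payment:), which is not valid JSON. A minimal sketch that captures that word separately, so that only the braces and their content reach the json filter, might look like this. The field names timestamp and event_type are assumptions chosen for illustration.

```
filter {
  grok {
    match => {
      # event_type takes "charge", "manage", ...; json_data starts at the opening brace.
      "message" => "%{TIMESTAMP_ISO8601:timestamp}%{SPACE}%{WORD:event_type}:%{GREEDYDATA:json_data}"
    }
  }
  json {
    source => "json_data"
    target => "parsed_json"
    # Drop the raw string once it has been parsed successfully.
    remove_field => ["json_data"]
  }
}
```

For the first sample line this should give event_type "charge" and nested fields such as [parsed_json][oldbalance], whatever keys the JSON happens to contain.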

grok parser (unexpected null in %{IPORHOST:syslog_server}) HAproxy

Following log:
Jul 25 07:45:12 tst-proxy202 haproxy[1104]: 10.64.111.222:36635 [25/Jul/2016:07:45:12.479] promocloud~ promocloud/tst-service-proxy203 32/0/1/27/60 200 664 - - ---- 0/0/0/0/0 0/0 {} {} "POST /RTI HTTP/1.1"
It is parsed with the %{HAPROXYHTTP} grok pattern:
%{SYSLOGTIMESTAMP:syslog_timestamp} %{IPORHOST:syslog_server} %{SYSLOGPROG}: %{IP:client_ip}:%{INT:client_port} \[%{HAPROXYDATE:accept_date}\] %{NOTSPACE:frontend_name} %{NOTSPACE:backend_name}/%{NOTSPACE:server_name} %{INT:time_request}/%{INT:time_queue}/%{INT:time_backend_connect}/%{INT:time_backend_response}/%{NOTSPACE:time_duration} %{INT:http_status_code} %{NOTSPACE:bytes_read} %{DATA:captured_request_cookie} %{DATA:captured_response_cookie} %{NOTSPACE:termination_state} %{INT:actconn}/%{INT:feconn}/%{INT:beconn}/%{INT:srvconn}/%{NOTSPACE:retries} %{INT:srv_queue}/%{INT:backend_queue} (\{%{HAPROXYCAPTUREDREQUESTHEADERS}\})?( )?(\{%{HAPROXYCAPTUREDRESPONSEHEADERS}\})?( )?"(<BADREQ>|(%{WORD:http_verb} (%{URIPROTO:http_proto}://)?(?:%{USER:http_user}(?::[^@]*)?@)?(?:%{URIHOST:http_host})?(?:%{URIPATHPARAM:http_request})?( HTTP/%{NUMBER:http_version})?))?"
This works well, except for an unexpected null under HOSTNAME, next to the syslog_server value:
"syslog_server": [
[
"tst-proxy202"
]
],
"HOSTNAME": [
[
"tst-proxy202",
null <<<<<<<<<
]
],
"IP": [
[
null,
null
]
],
"IPV6": [
[
null,
null,
null
]
],
"IPV4": [
[
null,
"10.64.111.222",
null
]
],
I tested this with https://grokdebug.herokuapp.com/.
On its own, the IPORHOST pattern (listed at https://grokdebug.herokuapp.com/patterns#) works well against the hostname
tst-proxy202
%{IPORHOST:syslog_server}
{
"syslog_server": [
[
"tst-proxy202"
]
],
"HOSTNAME": [
[
"tst-proxy202"
]
],
"IP": [
[
null
]
],
"IPV6": [
[
null
]
],
"IPV4": [
[
null
]
]
}
Any idea what might be the problem?
If I understand you correctly, you are trying to get rid of that null value. The null comes from the last part of the HAPROXYHTTP pattern, (?:%{URIHOST:http_host})?(?:%{URIPATHPARAM:http_request})?( HTTP/%{NUMBER:http_version})?))?". URIHOST is itself built on IPORHOST, so it contributes a second HOSTNAME capture, and that capture stays empty because the request "POST /RTI HTTP/1.1" contains no host. Luckily, this is not a serious problem, and here is why:
The default options of the grok filter include named_captures_only => true and keep_empty_captures => false. Set these two options in the grok debugger and your output looks pretty clean. In Logstash you don't have to change anything.
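For reference, a filter that spells out those two defaults could look like the sketch below. Setting them explicitly is optional and shown only to make the behaviour visible. It assumes the HAPROXYHTTP pattern shipped with Logstash is the one in use.

```
filter {
  grok {
    match => { "message" => "%{HAPROXYHTTP}" }
    # Both values are the documented defaults; they are written out here for clarity.
    named_captures_only => true    # keep only captures with a field name, e.g. syslog_server
    keep_empty_captures => false   # drop captures that matched nothing
  }
}
```

With named_captures_only enabled, unnamed sub-patterns such as HOSTNAME, IP, IPV4 and IPV6 are not stored on the event at all, so the extra null never reaches Elasticsearch.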
If Logstash misinterprets your hostname, try to set it from the grok values yourself (e.g. with the mutate filter):
filter {
  mutate {
    replace => { "HOSTNAME" => "%{syslog_server}" }
  }
}
Please let me know if you have further problems.

What is the GROK pattern for this log?

Can anyone please tell me the GROK pattern for this log
I am new to Logstash. Any help is appreciated
: "ppsweb1 [ERROR] [JJN01234313887b4319ad0536bf6324j34h5469624340M] [913h56a5-e359-4a75-be9a-fae60d1a5ecb] 2016-07-28 13:14:58.848 [http-nio-8080-exec-4] PaymentAction - Net amount 149644"
I tried the following:
%{WORD:field1} \[%{LOGLEVEL:field2}\] \[%{NOTSPACE:field3}\] \[%{NOTSPACE:field4}\] %{TIMESTAMP_ISO8601:timestamp} \[%{NOTSPACE:field5}\] %{WORD:field6} - %{GREEDYDATA:field7} %{NUMBER:filed8}
And I got the output as:
{
"field1": [
[
"ppsweb1"
]
],
"field2": [
[
"ERROR"
]
],
"field3": [
[
"JJN01234313887b4319ad0536bf6324j34h5469624340M"
]
],
"field4": [
[
"913h56a5-e359-4a75-be9a-fae60d1a5ecb"
]
],
"timestamp": [
[
"2016-07-28 13:14:58.848"
]
],
"field5": [
[
"http-nio-8080-exec-4"
]
],
"field6": [
[
"PaymentAction"
]
],
"field7": [
[
"Net amount"
]
],
"filed8": [
[
"149644"
]
]
}
You can change the names of fields as you want. You haven't mentioned anything about expected output in your question. So this is just to give you a basic idea. For further modifications you can use http://grokdebug.herokuapp.com/ to verify your filter.
Note: I have used basic patterns, there are complex patterns available and you can play around with the debugger to suit your requirements.
Good luck!
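Once the pattern matches, the same expression can go into a Logstash filter with more descriptive field names. All field names below are assumptions, since the question does not say what each bracketed value means; the :int suffix and the date filter are optional extras.

```
filter {
  grok {
    match => {
      # Same structure as the pattern above; only the field names differ.
      "message" => "%{WORD:host_name} \[%{LOGLEVEL:log_level}\] \[%{NOTSPACE:transaction_id}\] \[%{NOTSPACE:request_id}\] %{TIMESTAMP_ISO8601:timestamp} \[%{NOTSPACE:thread}\] %{WORD:logger} - %{GREEDYDATA:log_text} %{NUMBER:amount:int}"
    }
  }
  date {
    # Use the timestamp from the log line as the event time.
    match => ["timestamp", "yyyy-MM-dd HH:mm:ss.SSS"]
  }
}
```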

Logstash Grok filter getting multiple values per match

I have a server that sends access logs over to Logstash in a custom log format, and I am using Logstash to filter these logs and send them to Elasticsearch.
A log line looks something like this:
0.0.0.0 - GET / 200 - 29771 3 ms ELB-HealthChecker/1.0\n
And gets parsed using this grok filter:
grok {
  match => [
    "message", "%{IP:remote_host} %{USER:remote_user} %{WORD:method} %{URIPATHPARAM:requested_uri} %{NUMBER:status_code} - %{NUMBER:content_length} %{NUMBER:elapsed_time:int} ms %{GREEDYDATA:user_agent}",
    "message", "%{IP:remote_host} - %{WORD:method} %{URIPATHPARAM:requested_uri} %{NUMBER:status_code} - %{NUMBER:content_length} %{NUMBER:elapsed_time:int} ms %{GREEDYDATA:user_agent}",
    "message", "%{IP:remote_host} %{USER:remote_user} %{WORD:method} %{URIPATHPARAM:requested_uri} %{NUMBER:status_code} - - %{NUMBER:elapsed_time:int} ms %{GREEDYDATA:user_agent}",
    "message", "%{IP:remote_host} - %{WORD:method} %{URIPATHPARAM:requested_uri} %{NUMBER:status_code} - - %{NUMBER:elapsed_time:int} ms %{GREEDYDATA:user_agent}"
  ]
  add_field => {
    "protocol" => "HTTP"
  }
}
The final log gets parsed into this object (with real IPs stubbed out, and other fields taken out):
{
"_source": {
"message": " 0.0.0.0 - GET / 200 - 29771 3 ms ELB-HealthChecker/1.0\n",
"tags": [
"bunyan"
],
"@version": "1",
"host": "0.0.0.0:0000",
"remote_host": [
"0.0.0.0",
"0.0.0.0"
],
"remote_user": [
"-",
"-"
],
"method": [
"GET",
"GET"
],
"requested_uri": [
"/",
"/"
],
"status_code": [
"200",
"200"
],
"content_length": [
"29771",
"29771"
],
"elapsed_time": [
"3",
3
],
"user_agent": [
"ELB-HealthChecker/1.0",
"ELB-HealthChecker/1.0"
],
"protocol": [
"HTTP",
"HTTP"
]
}
}
Any ideas why I am getting multiple matches per log? Shouldn't Grok be breaking on the first match that successfully parses?
Chances are you have multiple config files that are being loaded. If you look at the output, specifically the elapsed_time shows up as both an integer and a string. From the config file you've provided, that's not possible since you have :int on anything that matches elapsed_time.
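When Logstash is pointed at a directory of config files, it concatenates them into a single pipeline, so a leftover copy of the filter in another file runs on the same event, and grok appends to fields that already exist. Removing the duplicate file is the real fix. As a stopgap, a guard like the sketch below keeps a second grok from running; the guard is an assumption for illustration and not part of the answer above, and only the first pattern from the question is repeated here.

```
filter {
  # Skip grok when an earlier copy of the filter has already set remote_host.
  if ![remote_host] {
    grok {
      match => {
        "message" => "%{IP:remote_host} %{USER:remote_user} %{WORD:method} %{URIPATHPARAM:requested_uri} %{NUMBER:status_code} - %{NUMBER:content_length} %{NUMBER:elapsed_time:int} ms %{GREEDYDATA:user_agent}"
      }
      add_field => { "protocol" => "HTTP" }
    }
  }
}
```

The guard only helps if this copy of the filter is evaluated after the stray one; otherwise the stray copy still appends its values.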
