GROK pattern for optional field - logstash-grok

I have a log string like:
2018-08-02 12:02:25.904 [http-nio-8080-exec-1] WARN o.s.w.s.m.s.DefaultHandlerExceptionResolver.handleTypeMismatch - Failed to bind request element
In the above string, [http-nio-8080-exec-1] is an optional field; it is present in some log statements and absent in others.
I created a grok pattern like this, with some references from the net:
%{TIMESTAMP_ISO8601:timestamp} (\[%{DATA:thread}\])? %{LOGLEVEL:level}%{SPACE}%{JAVACLASS:class}\.%{DATA:method} - %{GREEDYDATA:loggedString}
It seems it's not working if I remove the thread name string.

You need to make the space character following the thread name optional. In your pattern the space sits outside the optional group, so when the thread is missing grok still expects two consecutive spaces between the timestamp and the log level. Move the space inside the group: (\[%{DATA:thread}\] )?
input:
2018-08-02 12:02:25.904 WARN o.s.w.s.m.s.DefaultHandlerExceptionResolver.handleTypeMismatch - Failed to bind request element
pattern:
%{TIMESTAMP_ISO8601:timestamp} (\[%{DATA:thread}\] )?%{LOGLEVEL:level}%{SPACE}%{JAVACLASS:class}\.%{DATA:method} - %{GREEDYDATA:loggedString}
output:
{
  "loggedString": "Failed to bind request element",
  "method": "handleTypeMismatch",
  "level": "WARN",
  "class": "o.s.w.s.m.s.DefaultHandlerExceptionResolver",
  "timestamp": "2018-08-02 12:02:25.904"
}
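For reference, a minimal Logstash filter using the fixed pattern might look like this sketch (the field names are the ones from the output above):

filter {
  grok {
    match => { "message" => "%{TIMESTAMP_ISO8601:timestamp} (\[%{DATA:thread}\] )?%{LOGLEVEL:level}%{SPACE}%{JAVACLASS:class}\.%{DATA:method} - %{GREEDYDATA:loggedString}" }
  }
}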

Related

Grok Patterns for SSSD Logs

I am trying to parse the SSSD daemon logs using Logstash grok patterns for better visibility.
Log samples:
(Mon Nov 9 12:08:56 2020) [sssd[nss]] [client_recv] (0x0200): Client disconnected!
(Mon Nov 9 12:08:56 2020) [sssd[nss]] [client_close_fn] (0x2000): Terminated client [0x55ffd29d93c0][22]
I have created the custom grok patterns stated below:
SSSD_TIME [ \(%{DAY} %{MONTH} %{MONTHDAY} %{TIME} %{YEAR}\)]+
SSSD_DEMON \[[a-z]*\[[a-z]*\]\]+
SSSD_FUNCTION \[[a-z,_]*\]+
SSD_LOG_LEVEL (\(\dx\d*\))+
I am getting the output below using the above custom grok patterns with the following query:
%{SSSD_TIME:time} %{SSSD_DEMON:demon} %{SSSD_FUNCTION:function} %{SSD_LOG_LEVEL:loglevel}[:]\s+%{GREEDYDATA:message}
Output:
{
  "function": "[client_recv]",
  "loglevel": "(0x0200)",
  "time": "(Mon Nov 9 12:08:56 2020)",
  "demon": "[sssd[nss]]",
  "message": "Client disconnected!"
}
I need to extract only the values within the brackets, not the whole bracketed content.
I tried skipping the brackets, but it only works for the first value.
The query below skips the first bracket:
\(%{SSSD_TIME:time}\) %{SSSD_DEMON:demon} %{SSSD_FUNCTION:function} %{SSD_LOG_LEVEL:loglevel}[:]\s+%{GREEDYDATA:message}
I need to get the below output
{
  "function": "client_recv",
  "loglevel": "0x0200",
  "time": "Mon Nov 9 12:08:56 2020",
  "demon": "sssd[nss]",
  "message": "Client disconnected!"
}
If anyone can help me with this, that would be great.
Thanks!
Here is the grok pattern for your desired output:
\((?<timestamp>%{DAY} %{MONTH} %{MONTHNUM} %{TIME} %{YEAR})\) \[(?<daemon>(.*))\] \[%{DATA:function}\] \(%{DATA:log_level}\): %{GREEDYDATA:message}
I used the Grok Debugger to build and verify the pattern; it produces exactly the desired output shown above.
If you want, you can then remove unnecessary fields like DAY, MONTH, etc. using Logstash's mutate filter.
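For example, a minimal sketch (assuming those sub-captures actually show up as fields in your output; by default Logstash keeps only named captures, so this step may not even be necessary):

filter {
  mutate {
    # drop the intermediate date sub-captures if they appear as fields
    remove_field => ["DAY", "MONTH", "MONTHNUM", "TIME", "YEAR"]
  }
}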

How to use IF ELSE condition in grok pattern in logstash

I have web and API logs combined, and I want to store them separately in Elasticsearch. So I want to write one pattern: if the request is for the API, the if part should execute; if the request is for the web, the else part should execute.
Below are a few web and API log lines.
00:06:27,778 INFO [stdout] (ajp--0.0.0.0-8009-38) 00:06:27.777 [ajp--0.0.0.0-8009-38] INFO c.r.s.web.rest.WidgetController - Method getWidgetDetails() started to get widget details.
00:06:27,783 INFO [stdout] (ajp--0.0.0.0-8009-38) ---> HTTP GET http://api.survey.me/v1/getwidgetdetails?profileName=jeremy-steffens&profileLevel=INDIVIDUAL&companyProfileName=premier-nationwide-lending&hideHistory=true
00:06:27,817 INFO [stdout] (ajp--0.0.0.0-8009-38) <--- HTTP 200 http://api.survey.me/v1/getwidgetdetails?profileName=jeremy-steffens&profileLevel=INDIVIDUAL&companyProfileName=premier-nationwide-lending&hideHistory=true (29ms)
00:06:27,822 INFO [stdout] (ajp--0.0.0.0-8009-38) 00:06:27.822 [ajp--0.0.0.0-8009-38] INFO c.r.s.web.rest.WidgetController - Method getWidgetDetails() finished.
00:06:27,899 INFO [stdout] (ajp--0.0.0.0-8009-40) 00:06:27.899 [ajp--0.0.0.0-8009-40] INFO c.r.s.web.controller.LoginController - Inside initLoginPage() of LoginController
I tried to write a condition, but it's not working. It works only up to the thread name; after the thread I have multiple log formats, so I am not able to write a pattern without an if condition.
(?:%{TIME:CREATED_ON})(?:%{SPACE})%{WORD:LEVEL}%{SPACE}\[%{NOTSPACE}\]%{SPACE}\(%{NOTSPACE:THREAD}\)
Can anybody give me a suggestion?
You don't need an if/else condition to do this; you can use multiple patterns, one matching the API log lines and the other matching the web log lines.
For the API log lines you can use the following pattern:
(?:%{TIME:CREATED_ON})(?:%{SPACE})%{WORD:LEVEL}%{SPACE}\[%{NOTSPACE}\]%{SPACE}\(%{NOTSPACE:THREAD}\)%{SPACE}(?:%{DATA})%{SPACE}\[%{DATA}\]%{SPACE}%{WORD}%{SPACE}%{GREEDYDATA:MSG}
Your result will be something like this:
{
  "MSG": "c.r.s.web.controller.LoginController - Inside initLoginPage() of LoginController",
  "CREATED_ON": "00:06:27,899",
  "LEVEL": "INFO",
  "THREAD": "ajp--0.0.0.0-8009-40"
}
For the web lines you can use the following pattern:
(?:%{TIME:CREATED_ON})(?:%{SPACE})%{WORD:LEVEL}%{SPACE}\[%{NOTSPACE}\]%{SPACE}\(%{NOTSPACE:THREAD}\)%{SPACE}%{DATA}%{WORD:PROTOCOL}%{SPACE}%{WORD:MethodOrStatus}%{SPACE}%{GREEDYDATA:ENDPOINT}
And the result will be:
{
  "CREATED_ON": "00:06:27,783",
  "PROTOCOL": "HTTP",
  "ENDPOINT": "http://api.survey.me/v1/getwidgetdetails?profileName=jeremy-steffens&profileLevel=INDIVIDUAL&companyProfileName=premier-nationwide-lending&hideHistory=true",
  "LEVEL": "INFO",
  "THREAD": "ajp--0.0.0.0-8009-38",
  "MethodOrStatus": "GET"
}
To use multiple patterns in grok, just do this:
grok {
  match => ["message", "pattern1", "pattern2"]
}
Or you can save your patterns to a file and use patterns_dir to point to the directory of the file.
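For example, a sketch using patterns_dir (the directory path and the pattern names API_LOG and WEB_LOG are placeholders, not part of your existing setup):

filter {
  grok {
    # hypothetical directory; put a file there with lines such as:
    #   API_LOG <the API pattern above>
    #   WEB_LOG <the web pattern above>
    patterns_dir => ["/etc/logstash/patterns"]
    match => ["message", "%{API_LOG}", "%{WEB_LOG}"]
  }
}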
If you still want to use a conditional, just check for something in the message, for example:
if "HTTP" in [message] {
  grok { ... } # your grok for the web messages
} else {
  grok { ... } # your grok for the API messages
}

How to remove filebeat tags like id, hostname, version, grok_failure message

I am new to ELK. My sample log looks like:
2017-01-05T14:28:00 INFO zeppelin IDExtractionService transactionId abcdef1234 operation extractOCRData received request duration 12344 exception error occured
My Filebeat configuration is below:
filebeat.prospectors:
- input_type: log
  paths:
    - /opt/apache-tomcat-7.0.82/logs/*.log
  document_type: apache-access
  fields_under_root: true

output.logstash:
  hosts: ["10.2.3.4:5044"]
And my logstash filter.conf file:
filter {
  grok {
    match => [ "message", "transactionId %{WORD:transaction_id} operation %{WORD:otype} received request duration %{NUMBER:duration} exception %{WORD:error}" ]
  }
}

filter {
  if "beats_input_codec_plain_applied" in [tags] {
    mutate {
      remove_tag => ["beats_input_codec_plain_applied"]
    }
  }
}
In the Kibana dashboard I can see the log output below:
beat.name: ebb8a5ec413b
beat.hostname: ebb8a5ec413b
host: ebb8a5ec413b
tags:
beat.version: 6.2.2
source: /opt/apache-tomcat-7.0.82/logs/IDExtraction.log
otype: extractOCRData
duration: 12344
transaction_id: abcdef1234
@timestamp: April 9th 2018, 16:20:31.853
offset: 805,655
@version: 1
error: error
message: 2017-01-05T14:28:00 INFO zeppelin IDExtractionService transactionId abcdef1234 operation extractOCRData received request duration 12344 exception error occured
_id: 7X0HqmIBj3MEd9pqhTu9
_type: doc
_index: filebeat-2018.04.09
_score: 6.315
1. How can I remove Filebeat fields like id, hostname, and version, and the grok_failure message?
2. How can I sort logs by timestamp? Newly generated logs are not appearing at the top of the Kibana dashboard.
3. Are any changes required in my grok filter?
You can remove the Filebeat tags by setting fields_under_root: false in the Filebeat configuration file. You can read about this option here:
If this option is set to true, the custom fields are stored as
top-level fields in the output document instead of being grouped under
a fields sub-dictionary. If the custom field names conflict with other
field names added by Filebeat, the custom fields overwrite the other
fields.
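Applied to the configuration from the question, that would look like the sketch below (only fields_under_root changes):

filebeat.prospectors:
- input_type: log
  paths:
    - /opt/apache-tomcat-7.0.82/logs/*.log
  document_type: apache-access
  fields_under_root: false   # custom fields are grouped under a fields sub-dictionary

output.logstash:
  hosts: ["10.2.3.4:5044"]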
You can check whether _grokparsefailure is in tags using if "_grokparsefailure" in [tags] and remove it with remove_tag => ["_grokparsefailure"].
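As a complete filter block, that check would look like this:

filter {
  if "_grokparsefailure" in [tags] {
    mutate {
      remove_tag => ["_grokparsefailure"]
    }
  }
}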
Your grok filter seems to be alright.
Hope it helps.

grok filter logstash JSON parse error, original data now in message field

I'm using Logstash with a configuration of input { rabbitmq }, filter { grok }, output { elasticsearch }.
From RabbitMQ I receive nginx logs in this format:
- - [06/Mar/2017:15:45:53 +0000] "GET /check HTTP/1.1" 200 0 "-" "ELB-HealthChecker/2.0"
and I'm using a grok filter as simple as the following:
filter {
  if [type] == "nginx" {
    grok {
      match => { "message" => "%{NGINXACCESS}" }
    }
  }
}
and the pattern is:
NGUSERNAME [a-zA-Z\.\#\-\+_%]+
NGUSER %{NGUSERNAME}
NGINXACCESS %{NGUSER:ident} %{NGUSER:auth} \[%{HTTPDATE:timestamp}\] "%{WORD:verb} %{URIPATHPARAM:request} HTTP/%{NUMBER:httpversion}" %{NUMBER:response} (?:%{NUMBER:bytes}|-) (?:"(?:%{URI:referrer}|-)"|%{QS:referrer}) %{QS:agent}
I tried the pattern in the Grok Debugger and it seems to work just fine, but when running the pipeline I get this error:
[2017-03-06T16:46:40,692][ERROR][logstash.codecs.json ] JSON parse error, original data now in message field {:error=>#, :data=>"- - [06/Mar/2017:16:46:40 +0000] \"GET /check HTTP/1.1\" 200 0 \"-\" \"ELB-HealthChecker/2.0\""}
It seems like something (Logstash?) is adding \ to the result...
Hope to get some help, thanks!
This does not seem to be a grok error at all. If grok fails to parse, it adds the tag _grokparsefailure to your event. A JSON parse error comes from your input trying to read codec => json {} when your log format is plainly not JSON. Make sure that the input plugin handling these log types uses codec => plain or another appropriate codec.
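A sketch of what that input could look like (the host and queue values are placeholders; keep your existing connection settings):

input {
  rabbitmq {
    host => "rabbitmq.example.com"   # placeholder
    queue => "nginx-logs"            # placeholder
    codec => plain {}                # treat messages as plain text instead of JSON
  }
}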
See logstash codecs for more info.

ElasticSearch 1.5 upgrade: JodaTime conversion script error

I'm working through upgrading our Elasticsearch version from 1.3 to 1.5. We use the Java API heavily, including the following script in an ES query:
{
  "script" : {
    "script" : "values contains (int)doc['timestamp'].date.toDateTime(DateTimeZone.forID('America/New_York')).getMonthOfYear()",
    "params" : {
      "values" : [ 1 ]
    },
    "lang" : "groovy"
  }
}
This works with 1.3, but gives the following error in 1.5:
org.elasticsearch.action.search.SearchPhaseExecutionException: Failed to execute phase [query_fetch], all shards failed; shardFailures {[XwOu9zq0TMi2uOptdfIS7w][eventdata][0]: QueryPhaseExecutionException[[eventdata][0]: query[filtered(ConstantScore(ScriptFilter(values contains (int)doc['timestamp'].date.toDateTime(DateTimeZone.forID('America/New_York')).getMonthOfYear().toString())))->cache(org.elasticsearch.index.search.nested.NonNestedDocsFilter#2a8c0465)],from[0],size[10]: Query Failed [Failed to execute main query]]; nested: GroovyScriptExecutionException[MissingMethodException[No signature of method: Script1.contains() is applicable for argument types: (java.lang.Class) values: [int]
Possible solutions: toString(), toString(), notify()]]; }
at org.elasticsearch.action.search.type.TransportSearchTypeAction$BaseAsyncAction.onFirstPhaseResult(TransportSearchTypeAction.java:238)
at org.elasticsearch.action.search.type.TransportSearchTypeAction$BaseAsyncAction$1.onFailure(TransportSearchTypeAction.java:184)
at org.elasticsearch.search.action.SearchServiceTransportAction$23.run(SearchServiceTransportAction.java:565)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
What's the best way to address this?
That looks like a brackets issue with the cast, so the parser thinks you're passing the class int to contains. Call contains as a method and add brackets to help out the parser:
values.contains((int)doc['timestamp'].date.toDateTime(DateTimeZone.forID('America/New_York')).getMonthOfYear())
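Dropped back into the original query, the script block would then read:

{
  "script" : {
    "script" : "values.contains((int)doc['timestamp'].date.toDateTime(DateTimeZone.forID('America/New_York')).getMonthOfYear())",
    "params" : {
      "values" : [ 1 ]
    },
    "lang" : "groovy"
  }
}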
