Refinement on my working grok pattern for multiline Java stack logs - logstash

My log file has numerous spaces and newline characters, and I have written a grok pattern to extract data from it. I need some confirmation on whether this approach is suitable for this kind of log; if there is a better approach, please suggest it.
Original Log file:
Active: 37 minutes 0.00 seconds
User:
ServiceUser1
Tenant:
Session:
9F0071A66D89544155D149CCE2453E9A:mx2135649930e123d964:(WebServiceFacade.java:84)
Parameters:
bosContext _cntx:
user:
ContextUser1
depth:
3
session id:
9F0071A66D89544155D149CCE2453E9A:mx2135649930e123d964:(WebServiceFacade.java:84)
bosUTF _className:
TestClassName1
bosStringList _construct:
2 entries
$$MXRIP$$|java.util.HashMap
1
bosUTF _methodName:
TestMethodName1
The working grok pattern for the above log, without the extra spaces, is this:
Active:((?m))%{GREEDYDATA:Active}\n\s*User\:\n((?m))%{GREEDYDATA:User}\n\s*Tenant:\n((?m))%{GREEDYDATA:Tenant}\n\s*Session:\n((?m))%{DATA:session}\n\s*Parameters:\n\s*bosContext\s_cntx:\n\s*user:\n((?m))%{GREEDYDATA:ContextUser}\n\s*depth:\n((?m))%{GREEDYDATA:depth}\n\s*session\sid:\n((?m))%{GREEDYDATA:SessionID}\n\s*bosUTF\s_className:\n((?m))%{DATA:ClassName}\n\s*bosStringList\s_construct:\n((?m))%{GREEDYDATA:Construct}\n\s*bosUTF\s_methodName:\n((?m))%{GREEDYDATA:Method}
Is it really a good approach to have this many whitespace tokens and GREEDYDATA captures in a single grok pattern? Please confirm.
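On the maintainability question: one alternative worth considering (a sketch, not a drop-in replacement; it assumes every value sits on the line after its label, and it only covers the first four fields) is to factor the repeated ((?m))%{GREEDYDATA:...} idiom into a custom sub-pattern via grok's pattern_definitions option, so each capture stops at the end of its own line:
filter {
  grok {
    # FIELDLINE = "the rest of one line"; unlike GREEDYDATA in (?m) mode,
    # it cannot overrun into the next label.
    pattern_definitions => { "FIELDLINE" => "[^\r\n]*" }
    match => {
      # The \n? after each capture lets empty fields (like Tenant above) match too.
      "message" => "Active:\s*%{FIELDLINE:Active}\n\s*User:\n%{FIELDLINE:User}\n?\s*Tenant:\n%{FIELDLINE:Tenant}\n?\s*Session:\n%{FIELDLINE:session}"
    }
  }
}
The remaining fields extend the same way, one label/capture pair each.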

Related

Filebeat not sending correct multiline log to logstash

For some reason filebeat is not sending the correct logs while using the multiline filter in the filebeat.yml file. The log file I'm reading has some multiline logs and some single lines; however, they all follow the same format, starting with a date. For example, here are a couple of lines:
2017-Aug-23 10:33:43: OutputFile: This is a sample message
2017-Aug-23 10:34:23: MainClass: Starting connection:
http.InputProcess: 0
http.OutPutProcess: 1
2017-Aug-23 10:35:21: OutputFile: This is a sample message 2
My Filebeat yml is:
- input_type: log
  paths:
    - /home/user/logfile.log
  document_type: chatapp
  multiline:
    pattern: "^%{YYYY-MMM-dd HH:mm:ss}"
    negate: true
    match: before
For some reason, when I see the filebeat logs hit Elasticsearch, all of the logs are aggregated into one log line, so it does not seem to be actually reading the file date by date. Can anyone help? Thanks!
Use
pattern: "^%{YEAR}-%{MONTH}-%{MONTHDAY}"
The pattern you are currently using is not valid: %{YYYY-MMM-dd HH:mm:ss} is not a defined grok pattern, whereas %{YEAR}, %{MONTH}, and %{MONTHDAY} are.
You can test multiline patterns using the Grok Constructor; I built this pattern from the grok patterns predefined in Logstash.
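If grok-style tokens give you any trouble, note that Filebeat's multiline.pattern is a plain regular expression, so you can also spell the date prefix out explicitly. A sketch of the corrected section (I've also flipped match to after, the usual companion of negate: true for appending continuation lines to the dated line that precedes them):
- input_type: log
  paths:
    - /home/user/logfile.log
  document_type: chatapp
  multiline:
    # Matches date prefixes such as "2017-Aug-23 10:33:43"
    pattern: '^[0-9]{4}-[A-Z][a-z]{2}-[0-9]{2}'
    negate: true
    # Lines NOT starting with a date (http.InputProcess: 0, ...) are
    # appended after the preceding dated line.
    match: after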

logstash filter, unfiltered lines

I am new to Logstash filters and have been going through different blogs and links to understand them in detail. I have a few questions which are still unanswered.
If my log file has different log patterns, e.g.:
2017-01-30 14:30:58 INFO ThreadName:33 - {"t":1485786658088,"h":"abcd1234", "l":"INFO", "cN":"org.logstash.demo", "mN":"getNextvalue", "m":"fetching next value"}
2017-01-30 14:30:58 INFO AnotherThread:33 -my log pattern is different
I have the below filter, which successfully parses line 1 of the log:
grok {
  match => [ "message", "%{TIMESTAMP_ISO8601:LogDate} %{LOGLEVEL:loglevel} %{WORD:threadName}:%{NUMBER:ThreadID} - %{GREEDYDATA:Line}" ]
}
json {
  source => "Line"
}
What will happen to the lines which cannot be matched by the filter pattern?
Is there any way to capture all the lines which were not filtered and send them to Elasticsearch?
Is there any good reading material where I can read about the input, filter, and output plugins, with examples?
To answer your questions:
The lines which cannot be parsed by grok end up tagged with _grokparsefailure. Make sure you handle them, for example by dropping the lines which don't actually match the filter criteria.
As far as I know you can't capture them separately and push them to ES. Maybe for this you can have multiple grok patterns, so that you can filter the lines out and send them to different ES indices thereafter.
As @darth_vader points out, you'll get a _grokparsefailure tag on each document that doesn't match your pattern(s) in a grok{} filter. However, how you handle this failure is up to you.
By default, all the events will fall through to your output{} section, which presumably would send them to elasticsearch. You could also have a conditional output{} section, which sent parsed logs to one output and unparsed logs to another (a file{} output, or a different index, or...).
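A sketch of such a conditional output{} section (the tag check is the stock _grokparsefailure tag; the file path and index name are placeholders):
output {
  if "_grokparsefailure" in [tags] {
    # Unparsed events land in a file for later inspection.
    file { path => "/var/log/logstash/unparsed.log" }
  } else {
    elasticsearch {
      hosts => ["localhost:9200"]
      index => "applogs-%{+YYYY.MM.dd}"
    }
  }
}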
As for examples, the official doc tends to include incomplete fragments (at best), so you're probably going to find better examples in random internet blogs.

GROK Pattern filtering

Hi, I am new to logstash and grok filtering. I have a sample log like this:
1/11/2017 12:00:17 AM :
Error thrown is:
No Error
Request sent is:
webMethod:GetOSSUpdatedOrderHeader|appCode:OSS|regionCode:EMEA|orderKeyList:|lastModifedDateTime:1/10/2017 11:59:13 PM|
I want to filter out the line separator, which is a line full of ** (the last line in the attached image).
Also, I want to be able to capture an entire token including the ":" in one field. For example, in the above log, webMethod:GetOSSUpdatedOrderHeader has to be captured in one field in my grok pattern. Is there a way to achieve this? TIA. Please refer to the attached image for the sample log message.
A few tips:
Photos of logs are not a good way to offer someone an example; copy and paste the log instead
The Grok Debugger is a great way of building your own grok patterns
This should work for the sample log line you pasted in:
%{NOTSPACE:webMethod}\|%{NOTSPACE:appCode}\|%{NOTSPACE:regionCode}\|%{NOTSPACE:orderKeyList}\|%{NOTSPACE:lastModifedDateTime}
However, what you requested probably isn't quite what you want, as you presumably want just the field content in the result, not the name of the field as well. This should give you more sensible results:
webMethod:%{NOTSPACE:webMethod}\|appCode:%{NOTSPACE:appCode}\|regionCode:%{NOTSPACE:regionCode}\|orderKeyList:(?:%{NOTSPACE:orderKeyList}|)\|lastModifedDateTime:%{NOTSPACE:lastModifedDateTime}
You would then want to process the lastModifedDateTime field with the date filter to get the date stamp into a format logstash can store.
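A sketch of that date filter, assuming the US-style month-first ordering shown in the sample (1/10/2017 11:59:13 PM):
date {
  # Joda-style format for values like "1/10/2017 11:59:13 PM";
  # the parsed result lands in @timestamp by default.
  match => [ "lastModifedDateTime", "M/d/yyyy h:mm:ss a" ]
}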

Logstash Grok Problems with 4 digit year

I've spent the past hour trying to set up a grok filter for logstash. Working with the Grok Debugger, everything's good until I get to the timestamp: grok chokes on the four-digit year.
Here is a logfile entry as it's sent to logstash:
Jul 8 11:54:29 192.168.1.144 1 2016-07-08T15:55:09.629Z era.somedomain.local ETAServer 1755 Syslog {"event_type":"Threat_Event","ipv4":"192.168.1.118","source_uuid":"7ecab29a-7db3-4c79-96f5-3946de54cbbf","occured":"08-Jul-2016 15:54:54","severity":"Warning","threat_type":"trojan","threat_name":"HTML/Agent.V","scanner_id":"HTTP filter","scan_id":"virlog.dat","engine_version":"13773 (20160708)","object_type":"file","object_uri":"http://malware.wicar.org/data/java_jre17_exec.html","action_taken":"connection terminated","threat_handled":true,"need_restart":false,"username":"DOMAIN\username","processname":"C:\Program Files (x86)\Google\Chrome\Application\chrome.exe"}
What I'm having trouble with is the first part before the JSON data. The first part of my grok statement:
%{MONTH}\ %{MONTHDAY}%{SPACE}%{TIME}%{SPACE}%{IPV4}%{SPACE}%{NUMBER}%{SPACE}
works fine, correctly identifying everything up to the number '1' just before the year in the timestamp. The problem is when I add the following:
%{MONTH}\ %{MONTHDAY}%{SPACE}%{TIME}%{SPACE}%{IPV4}%{SPACE}%{NUMBER}%{SPACE}%{TIMESTAMP_ISO8601}
then I get "No Matches" in the grok debugger. Messing around with it a bit more it appears that problem is somewhere between the number '1' and the first two digits of the year in the timestamp since %{TIMESTAMP_ISO8601} only uses a two digit year.
Any suggestions or help would be greatly appreciated.
Digging a little deeper into regex and grok, it looks like I figured it out. I replaced %{TIMESTAMP_ISO8601} with:
([^\d\d]%{YEAR})[./-]%{MONTHNUM}[./-]%{MONTHDAY}[T ]%{HOUR}:?%{MINUTE}(?::?%{SECOND})?%{ISO8601_TIMEZONE}
and it worked perfectly. The key was the [^\d\d] in front of %{YEAR} (inside a character class the repeated \d is redundant, so this is simply a non-digit guard, equivalent to [^\d]).
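Assembled into a full filter, with the substitution in place plus a json filter for the trailing payload (the field names here are illustrative, not from the original answer):
filter {
  grok {
    match => { "message" => "%{MONTH} %{MONTHDAY}%{SPACE}%{TIME}%{SPACE}%{IPV4:relay_ip}%{SPACE}%{NUMBER:version}%{SPACE}([^\d\d]%{YEAR})[./-]%{MONTHNUM}[./-]%{MONTHDAY}[T ]%{HOUR}:?%{MINUTE}(?::?%{SECOND})?%{ISO8601_TIMEZONE}%{SPACE}%{HOSTNAME:host}%{SPACE}%{WORD:app}%{SPACE}%{NUMBER:procid}%{SPACE}%{WORD:msgtype}%{SPACE}%{GREEDYDATA:json_payload}" }
  }
  json {
    # Note: the unescaped backslashes in the sample's "username" value would
    # still make this particular payload invalid JSON.
    source => "json_payload"
  }
}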

Handling different log formats in the same file

I have a single log file that contains differing output formats.
For example:
line 1 = 2015-01-1 12:04:56 INFO 192.168.0.1 my_user someone logged in
line 2 = 2015-01-1 12:04:56 WARN [webserver-thread] (MyClass.java:66) user authenticated
Whilst the real solution is either to split them into separate files or to unify the formats, is it possible to grok differing log formats with Logstash?
My first recommendation is to run one grok{} to strip off the common stuff - the datetime and log level. You can put the remaining stuff back into the [message] field:
%{TIMESTAMP_ISO8601} %{WORD:level} %{GREEDYDATA:message}
Make sure to use the 'overwrite' parameter in grok{}.
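That first pass might look like this (a sketch; the timestamp field name is illustrative):
grok {
  match => { "message" => "%{TIMESTAMP_ISO8601:timestamp} %{WORD:level} %{GREEDYDATA:message}" }
  # overwrite lets the GREEDYDATA capture replace the original message
  # instead of being added alongside it.
  overwrite => [ "message" ]
}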
Then if you want to parse the remaining information, your (multiple) regexps will be running against a shorter string, which should make them more efficient.
You can then have multiple patterns:
grok {
  match => [
    "message", "PATTERN1",
    "message", "PATTERN2"
  ]
}
By default, grok will stop processing a field when it hits the first matching pattern (the break_on_match => true default).
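Filled in for the two sample lines above (the pattern choices and field names are illustrative):
grok {
  match => [
    # remainder of line 1: "192.168.0.1 my_user someone logged in"
    "message", "%{IPV4:client_ip} %{USER:user} %{GREEDYDATA:action}",
    # remainder of line 2: "[webserver-thread] (MyClass.java:66) user authenticated"
    "message", "\[%{DATA:thread}\] \(%{JAVAFILE:source_file}:%{POSINT:source_line}\) %{GREEDYDATA:action}"
  ]
}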
