How can I extract key word value from a string - logstash-grok

I am using this tool https://grokdebug.herokuapp.com/ to test my grok parser. The origin string I have is something like:
2020-05-01 01:59:10 server1 17.5.36.8 POST /v1/user.aspx r=1000&11:59:11.219&Method=Start&Credentials=xxxxxx:++http://localhost/v1/user.aspx&Reque
I'd like to parse the data to:
{
Method: Start,
r: 1000
Credential: xxxxx
}
I am looking at the parser https://github.com/logstash-plugins/logstash-patterns-core/blob/master/patterns/grok-patterns but I can't find a good one to use.

I don't know how to achieve this using solely grok patterns. If you're also using Logstash, you can try the following solution:
The pattern:
%{TIMESTAMP_ISO8601:timestamp}\s%{GREEDYDATA:server}\s%{IP:ip}\s%{GREEDYDATA:request_type}\s%{PATH:path}\sr=%{NUMBER:r}&%{TIME:some_time}&Method=%{GREEDYDATA:method}&Credentials=%{GREEDYDATA:username}:%{GREEDYDATA:password}%{MY_URI:uri}
It works with the custom pattern:
MY_URI http://(?:%{USER}(?::[^#]*)?#)?(?:%{URIHOST})?(?:%{URIPATHPARAM})?
I used the custom pattern because the %{URIPROTO} from the original %{URI} pattern won't separate credentials from the uri properly. I assumed that credentials are given in the following format:
username:password
In case I'm wrong, please replace:
Credentials=%{GREEDYDATA:username}:%{GREEDYDATA:password}
with:
Credentials=%{GREEDYDATA:credentials}
If the :++ string separates credentials from uri, you can use it to apply the %{URI} instead of %{MY_URI}.
The pattern works in the Grok Debugger. However the output is more readable in the Grok Constructor Matcher.
Since you're only interested in some fields, use the remove_field plugin. Use the add_field plugin to enclose the result in a new field.
Your logstash.conf file may look like this (if you place the file with the custom pattern in the patterns directory alongside the config file):
# logstash.conf
…
filter {
grok {
patterns_dir => ["./patterns"]
match => {
"message" => "%{TIMESTAMP_ISO8601:timestamp}\s%{GREEDYDATA:server}\s%{IP:ip}\s%{GREEDYDATA:request_type}\s%{PATH:path}\sr=%{NUMBER:r}&%{TIME:some_time}&Method=%{GREEDYDATA:method}&Credentials=%{GREEDYDATA:username}:%{GREEDYDATA:password}%{MY_URI:uri}"
}
mutate {
add_field => { "result" => "Method: %{method}, r: %{r} Credential: %{username}}
remove_field => ["timestamp", "server", "ip", "request_type", "path", "some_time", "password", "uri", "method", "r", "username"]
}
}
…

Related

logstash converting ruby code to logstash filters

I was wondering what will be the best way to implement the following task in logstash :
I have the following field that contains multiple paths divided by ':' :
my_field : "/var/log/my_custom_file.txt:/var/log/otherfile.log/:/root/aaa.jar
I want to add a new field called "first_file" that will contain only the file_name(without suffix) of the first path :
first_file : my_custom_file
I implemented it with the following ruby code ;
code => 'event.set("first_file",event.get("[my_field]").split(":")[0].split("/")[-1].split(".")[0])'
How can I use logstash filters (add_field,split,grok) to do the same task ? I feel like using ruby code should be my last option.
You could do it using just grok, but I think it would be clearer to use mutate to pull out the first value
mutate { split => { "my_field" => ":" } }
mutate { replace => "{ "my_field" => "[my_field][0]" } }
grok { match => { "my_field" => "/(?<my_field>[^/]+)\.%{WORD}$" } overwrite => [ "my_field" ] }
rather than
grok { match => { "my_field" => "/(?<my_field>[^/]+)\.%{WORD}:" } overwrite => [ "my_field" ] }
The (?<my_field>[^/]+) is a custom pattern (documented here) which creates a field called [my_field] from a sequence of one or more (+) characters which are not /
Yes with a basic grok you could match every field in the value.
This kind of filter must work (put it in your logstash configuration file), this one extract the "basename" of the file (filename without extension and path) :
filter{
grok {
match => { "my_field" => "%{GREEDYDATA}/%{WORD:filename}.%{WORD}:%{GREEDYDATA}/%{WORD:filename2}.%{WORD}:%{GREEDYDATA}/%{WORD:filename3}.%{WORD}" }
}
}
You could be more strict in grok with use of PATH in place of GREYDATA, I let you determine your best approach that works in your context.
You could debug the grok pattern with the online tool grokdebug.

Hard to stash a log file with different occurrence of order for a field using Logstash

I am trying to stash a log file to elasticsearch using Logstash. I am facing a problem while doing this.
If the log file has same kind of log lines like the below,
[12/Sep/2016:18:23:07] VendorID=5037 Code=C AcctID=5317605039838520
[12/Sep/2016:18:23:22] VendorID=9108 Code=A AcctID=2194850084423218
[12/Sep/2016:18:23:49] VendorID=1285 Code=F AcctID=8560077531775179
[12/Sep/2016:18:23:59] VendorID=1153 Code=D AcctID=4433276107716482
where the date, vendorId, code and acctID's order of occurrence of fields does not change or a new element is not added in to it, then the filter(given below) in the config files work well.
\[%{MONTHDAY}/%{MONTH}/%{YEAR}:%{TIME}\] VendorID=%{INT:VendorID} Code=%{WORD:Code} AcctID=%{INT:AcctID}
Suppose the order changes like the example given below or if a new element is added to one of the log lines, then the grokparsefailure occurs.
[12/Sep/2016:18:23:07] VendorID=5037 Code=C AcctID=5317605039838520
[12/Sep/2016:18:23:22] VendorID=9108 Code=A AcctID=2194850084423218
[12/Sep/2016:18:23:49] VendorID=1285 Code=F AcctID=8560077531775179
[12/Sep/2016:18:23:59] VendorID=1153 Code=D AcctID=4433276107716482
[12/Sep/2016:18:24:50] AcctID=3168124750473449 VendorID=1065 Code=L
[12/Sep/2016:18:24:50] AcctID=3168124750473449 VendorID=1065 Code=L
[12/Sep/2016:18:24:50] AcctID=3168124750473449 VendorID=1065 Code=L
Here in the example, the last three log lines are different from the first four log lines in order of occurrence of the fields. And because of this, the filter message with the grok pattern could not parse the below three lines as it is written for the first four lines.
How should I handle this scenario, when i come across this case? Please help me solve this problem. Also provide any link to any document for detailed explanation with examples.
Thank you very much in advance.
As correctly pointed out by baudsp, this can be achieved by multiple grok filters. The KV filter seems like a nicer option, but as for grok, this is one solution:
input {
stdin {}
}
filter {
grok {
match => {
"message" => ".*test1=%{INT:test1}.*"
}
}
grok {
match => {
"message" => ".*test2=%{INT:test2}.*"
}
}
}
output {
stdout { codec => rubydebug }
}
By having 2 different grok filter applied, we can disregard the order of the logs coming in. The patterns specified basically do not care about what comes before or after the String test and rather just standalone match their respective patterns.
So, for these 2 strings:
test1=12 test2=23
test2=23 test1=12
You will get the correct output. Test:
artur#pandaadb:~/dev/logstash$ ./logstash-2.3.2/bin/logstash -f conf_grok_ordering/
Settings: Default pipeline workers: 8
Pipeline main started
test1=12 test2=23
{
"message" => "test1=12 test2=23",
"#version" => "1",
"#timestamp" => "2016-12-21T16:48:24.175Z",
"host" => "pandaadb",
"test1" => "12",
"test2" => "23"
}
test2=23 test1=12
{
"message" => "test2=23 test1=12",
"#version" => "1",
"#timestamp" => "2016-12-21T16:48:29.567Z",
"host" => "pandaadb",
"test1" => "12",
"test2" => "23"
}
Hope that helps

How to combine characters to create custom pattern in GROK

I'm new to logstash and grok and have a question regarding a pattern.
Jul 26 09:46:37
The above content contains %{MONTH} %{MONTHDAY} %{TIME} and white spaces.
I need to know how to combine all these and create a pattern %{sample_timestamp}
Thanks!
Quotes from the Grok Custom Patterns Docs (RTFM):
First, you can use the Oniguruma syntax for named capture which will
let you match a piece of text and save it as a field:
(?<field_name>the pattern here)
...
Alternately, you can create a custom patterns file.
Create a directory called patterns with a file in it called extra (the file name doesn’t matter, but name it meaningfully for yourself)
In that file, write the pattern you need as the pattern name, a space, then the regexp for that pattern.
So you could create a pattern file that contained the line:
CUST_DATE %{MONTH} %{MONTHDAY} %{TIME}
Then use the patterns_dir setting in this plugin to tell logstash
where your custom patterns directory is.
filter {
grok {
patterns_dir => ["./patterns"]
match => { "message" => "%{CUST_DATE:datestamp}" }
}
}
Would result in the field:
datestamp => "Jul 26 09:46:37"
Filter
use pattern_definitions to define your patterns
filter {
grok {
pattern_definitions => { "MY_DATE" => "%{MONTH} %{MONTHDAY} %{TIME}" }
match => { "message" => "%{MY_DATE:timestamp}" }
}
}
Result
{
"timestamp": "Jul 26 09:46:37"
}
Tested using Logstash 6.5

Grok Pattern not working in Logstash

After parsing logs I am find there are some new lines at the end of the message
Sample message
ts:2016-04-26 05-02-16-018
CDT|ll:TRACE|tid:10000.140|scf:xxxxxxxxxxxxxxxxxxxxxxxxxxx.pc|mn:null|fn:xxxxxxxxxxxxxxxxxxxxxxxxxxx|ln:749|auid:xxxxxxxxxxxxxxxxxxxxxxxxxxx|eid:xxx.xxx.xxx.xxx-58261618-1-1461664935955-139|cid:900009865|ml:null|mid:-99|uip:xxx.xxx.xxx.xxx|hip:xxx.xxx.xxx.xxx|pli:null|msg:
xxxxxxxxxxxxxxxxxxxxxxxxxxx|pl: xxxxxxxxxxxxxxxxxxxxxxxxxxx
TAKE 1 xxxxxxxxxxxxxxxxxxxxxxxxxxx xxxxxxxxxxxxxxxxxxxxxxxxxxx
xxxxxxxxxxxxxxxxxxxxxxxxxxx xxxxxxxxxxxxxxxxxxxxxxxxxxx
I am using the regex pattern below as suggested below as answers
ts:(?(([0-9]+)-)+ ([0-9]+-)+ [A-Z]+)\|ll:%{WORD:ll}\|tid:%{NUMBER:tid}\|scf:%{DATA:scf}\|mn:%{WORD:mn}\|fn:%{WORD:fn}\|ln:%{WORD:ln}\|auid:%{WORD:auid}\|eid:%{DATA:eid}\|cid:%{WORD:cid}\|ml:%{WORD:ml}\|mid:%{NUMBER:mid}\|uip:%{DATA:uip}\|hip:%{DATA:hip}\|pli:%{WORD:pli}\|\smsg:%{GREEDYDATA:msg}(\|pl:(?(.|\r|\n)))
But unfortunately it is not working properly when the last part of the log is not present
ts:2016-04-26 05-02-16-018
CDT|ll:TRACE|tid:10000.140|scf:xxxxxxxxxxxxxxxxxxxxxxxxxxx.pc|mn:null|fn:xxxxxxxxxxxxxxxxxxxxxxxxxxx|ln:749|auid:xxxxxxxxxxxxxxxxxxxxxxxxxxx|eid:xxx.xxx.xxx.xxx-58261618-1-1461664935955-139|cid:900009865|ml:null|mid:-99|uip:xxx.xxx.xxx.xxx|hip:xxx.xxx.xxx.xxx
What should be the correct pattern?
-------------------Previous Question --------------------------------------
I am trying to parse log line such as this one.
ts:2016-04-26 05-02-16-018 CDT|ll:TRACE|tid:10000.140|scf:xxxxxxxxxxxxxxxxxxxxxxxxxxx.pc|mn:null|fn:xxxxxxxxxxxxxxxxxxxxxxxxxxx|ln:749|auid:xxxxxxxxxxxxxxxxxxxxxxxxxxx|eid:xxx.xxx.xxx.xxx-58261618-1-1461664935955-139|cid:900009865|ml:null|mid:-99|uip:xxx.xxx.xxx.xxx|hip:xxx.xxx.xxx.xxx|pli:null|msg: xxxxxxxxxxxxxxxxxxxxxxxxxxx xxxxxxxxxxxxxxxxxxxxxxxxxxx
Below is my logstash filter
filter {
grok {
match => ["mesage", "ts:(?<date>(([0-9]+)-*)+ ([0-9]+-*)+ [A-Z]+)\|ll:%{WORD:ll}\|tid:%{WORD:tid}\|scf:%{WORD:scf}\|mn:%{WORD:mn}\|fn:%{WORD:fn}\|ln:%{WORD:ln}\|auid:%{WORD:auid}\|eid:%{WORD:eid}\|cid:%{WORD:cid}\|ml:%{WORD:ml}\|mid:%{WORD:mid}\|uip:%{WORD:uip}\|hip:%{WORD:hip}\|pli:%{WORD:pli}\|msg:%{WORD:msg}"]
}
date {
match => ["ts","yyyy-MM-dd HH-mm-ss-SSS ZZZ"]
target => "#timestamp"
}
}
I am getting "_grokparsefailure"
I have tested the configuration from #HAL, there was a few things to change:
In the grok filter mesage => message
In the date filter ts => date so the date parsing is on the right field
The CDT is a time zone name, it is captured by z in the date syntax.
So the right configuration would look like this :
filter{
grok {
match => ["message", "ts:(?<date>(([0-9]+)-*)+ ([0-9]+-*)+ [A-Z]+)\|ll:%{WORD:ll}\|tid:%{NUMBER:tid}\|scf:%{DATA:scf}\|mn:%{WORD:mn}\|fn:%{WORD:fn}\|ln:%{WORD:ln}\|auid:%{WORD:auid}\|eid:%{DATA:eid}\|cid:%{WORD:cid}\|ml:%{WORD:ml}\|mid:%{NUMBER:mid}\|uip:%{DATA:uip}\|hip:%{DATA:hip}\|pli:%{WORD:pli}\|\s*msg:%{GREEDYDATA:msg}"]
}
date {
match => ["date","yyyy-MM-dd HH-mm-ss-SSS z"]
target => "#timestamp"
}
}
Tried to parse your input via grokdebug with your expression but it failed to read out any fields. Managed to get it to work by changing the expression to:
ts:(?<date>(([0-9]+)-*)+ ([0-9]+-*)+ [A-Z]+)\|ll:%{WORD:ll}\|tid:%{NUMBER:tid}\|scf:%{DATA:scf}\|mn:%{WORD:mn}\|fn:%{WORD:fn}\|ln:%{WORD:ln}\|auid:%{WORD:auid}\|eid:%{DATA:eid}\|cid:%{WORD:cid}\|ml:%{WORD:ml}\|mid:%{NUMBER:mid}\|uip:%{DATA:uip}\|hip:%{DATA:hip}\|pli:%{WORD:pli}\|\s*msg:%{GREEDYDATA:msg}
I also think that you need to change the name of the column that logstash shall parse from mesage to message.
Also, the date parsing pattern should match the format of the date in the input. There is no timezone identity (ZZZ) in your input data (at least not in the example).
Something like this should work better (not tested though):
filter {
grok {
match => ["mesage", "ts:(?<date>(([0-9]+)-*)+ ([0-9]+-*)+ [A-Z]+)\|ll:%{WORD:ll}\|tid:%{NUMBER:tid}\|scf:%{DATA:scf}\|mn:%{WORD:mn}\|fn:%{WORD:fn}\|ln:%{WORD:ln}\|auid:%{WORD:auid}\|eid:%{DATA:eid}\|cid:%{WORD:cid}\|ml:%{WORD:ml}\|mid:%{NUMBER:mid}\|uip:%{DATA:uip}\|hip:%{DATA:hip}\|pli:%{WORD:pli}\|\s*msg:%{GREEDYDATA:msg}"]
}
date {
match => ["ts","yyyy-MM-dd HH-mm-ss-SSS"]
target => "#timestamp"
}
}

logstash pattern don't match in the expected way

I'm using logstash to collect my server.log from several glassfish domains. Unfortunatly in the log is no domainname. But the pathname have.
So I tried to get a part of the filename to match it to the GF-domain. The Problem is, that the pattern I defined don't matche the right part.
here the logstash.conf
file {
type => "GlassFish_Server"
sincedb_path => "D:/logstash/.sincedb_GF"
#start_position => beginning
path => "D:/logdir/GlassFish/Logs/GF0/server.log"
}
grok {
patterns_dir => "./patterns"
match =>
[ 'path', '%{DOMAIN:Domain}']
}
I' ve created a custom-pattern file and filled it with a regexp
my custom-pattern-file
DOMAIN (?:[a-zA-Z0-9_-]+[\/]){3}([a-zA-Z0-9_-]+)
And the result is:
"Domain" => "logdir/GlassFish/Logs/GF0"
I've tested my RegExp on https://www.regex101.com/ and is working fine.
Using http://grokdebug.herokuapp.com/ to verify the pattern brings the same "unwanted" result.
What I'm doing wrong? Has anybody an idea to get only the domain name "GF0", e.g. modify my pattern or using mutate in the logstash.conf?
I'm assuming that you're trying to strip out the GF0 portion from path?
If that's the case and you know that the path will always be in the same format, you could just use something like this for the grok:
filter {
grok {
match => [ 'path', '(?i)/Logs/%{WORD:Domain}/' ]
}
}
not as elegant as a regexp, but it should work.

Resources