I am finding that logstash is not a fan of my filter. Would be nice to have a second set of eyes on it.
First, my log file has the following entries, with a new line for every volume:
/vol/vol0/ 298844160 6916836 291927324 2% /vol/vol0/
My config file looks as follows:
input {
  file {
    type => "testing"
    path => "/opt/log_repo/ssh/netapp/*"
    tags => "netapp"
    start_position => "beginning"
    sincedb_path => "/dev/null"
  }
}
filter {
  if [type] == "testing" {
    grok {
      match => [ "@message", "{UNIXPATH:volume}%{SPACE}%{INT:total}%{SPACE}%{INT:used}%{SPACE}%{INT:avail}%{SPACE}%{PROG:cap}%{SPACE}%{UNIXPATH:vols}" ]
    }
  }
}
output {
  if [type] == "testing" {
    elasticsearch {
      action => "index"
      hosts => ["http://localhost:9200"]
      index => ["testing4-%{+YYYY.MM.dd}"]
    }
  }
}
When I run this, it tells me I have a bad config file. If I change the filter to:
match => [ "@message", "{UNIXPATH:volume}" ]
it creates a new field called volume containing the volume name. I am using %{SPACE} because the log is simply not consistent: some volumes have four spaces between the fields, and some have more or fewer depending on the volume name and the size.
To get to this configuration I leveraged the following sites:
https://grokdebug.herokuapp.com/discover?#
http://grokconstructor.appspot.com/do/constructionstep
Still struggling to see what I am missing. Any help would be greatly appreciated.
UPDATE: After adding the recommendation below it still doesn't create a new field; the index only contains the following fields:
_index string
message string
type string
tags string
path string
@timestamp date
@version string
host string
_source _source
_id string
_type string
_score
Your pattern doesn't match the sample log for a very simple and silly reason: you are missing a % at the start of your pattern. If you add it, it works like a charm.
So the full filter is:
if [type] == "testing" {
  grok {
    match => [ "@message", "%{UNIXPATH:volume}%{SPACE}%{INT:total}%{SPACE}%{INT:used}%{SPACE}%{INT:avail}%{SPACE}%{PROG:cap}%{SPACE}%{UNIXPATH:vols}" ]
  }
}
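For reference, with the % in place the sample line from the top of the question should come out with fields along these lines (all values remain strings, since the pattern does not use :int casts):
    "volume" => "/vol/vol0/",
    "total" => "298844160",
    "used" => "6916836",
    "avail" => "291927324",
    "cap" => "2%",
    "vols" => "/vol/vol0/"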
FMT="1358 15:41:07W19/03/21 (A) Interlocking Link 116 Restored" STY="A" AMSEQ="LINKFAIL" AMSST="RTN" ALTID="1358" TS="20210319154107" CP="LOC A" CP="LOC X" MP="104.95" MP="104.95" EQ="MDIPRIMARYOFF" POS="TC-NORTH"
The log format is as above. I would like to capture the following fields using grok
Time - 15:41:07
Date - 19/03/21
Message - Interlocking Link 116 Restored
Location - Loc X
Can anyone help with creating a grok pattern that I can use in my Logstash filter to parse these logs?
I would not use grok to start with. This is key/value data, so a kv filter will get you started, then you can grok the parts of the FMT field out.
kv { include_keys => [ "FMT", "CP" ] target => "[@metadata]" }
mutate { add_field => { "Location" => "%{[@metadata][CP][1]}" } }
grok { match => { "[@metadata][FMT]" => "%{NUMBER} %{TIME:Time}W%{DATE_EU:Date} \(%{WORD}\) %{GREEDYDATA:Message}" } }
will result in
"Message" => "Interlocking Link 116 Restored",
"Date" => "19/03/21",
"Time" => "15:41:07",
"Location" => "LOC X",
Although having multiple CP fields feels fragile.
The include_keys option on the kv filter tells the filter to ignore other keys. Using target to put the fields under [@metadata] means they are available to other filters but are not sent to the output. The remove_field option on the kv filter is only processed if the filter is able to parse the message, so if your kv data is invalid you will have a [message] field on the event that you can look at.
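As a sketch, if you also want the original message dropped once the kv parsing succeeds, remove_field can be added to the same kv filter:
kv {
  include_keys => [ "FMT", "CP" ]
  target => "[@metadata]"
  remove_field => [ "message" ]
}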
I have a filename in the format <key>:<value>-<key>:<value>.log, e.g. pr:64-author:mxinden-platform:aws.log, containing the logs of a test run.
I want to stream each line of the file to elasticsearch via logstash. Each line should be treated as a separate document. Each document should get the fields according to the filename. So e.g. for the above example let's say log-line 17-12-07 foo something happened bar would get the fields: pr with value 64, author with value mxinden and platform with value aws.
At the time I write the Logstash configuration, I do not know the names of the fields.
How do I dynamically add fields to each line based on the fields contained in the filename?
The static approach so far is:
filter {
  mutate { add_field => { "file" => "%{[@metadata][s3][key]}" } }
  else {
    grok { match => { "file" => "pr:%{NUMBER:pr}-" } }
    grok { match => { "file" => "author:%{USERNAME:author}-" } }
    grok { match => { "file" => "platform:%{USERNAME:platform}-" } }
  }
}
Changes to the filename structure are fine.
Answering my own question, based on @dan-griffiths' comment:
The solution for a file like pr=64,author=mxinden,platform=aws.log is to use the Logstash kv filter, e.g.:
filter {
kv {
source => "file"
field_split => ","
}
}
where file is a field extracted from the filename via the AWS S3 input plugin.
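Putting the pieces together, a minimal sketch (reusing the [@metadata][s3][key] reference from the static snippet in the question) could look like this:
filter {
  mutate { add_field => { "file" => "%{[@metadata][s3][key]}" } }
  kv {
    source => "file"
    field_split => ","
    # value_split defaults to "=", which fits pr=64,author=mxinden,platform=aws.log
    # note: the trailing ".log" ends up in the last value unless it is stripped first
  }
}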
I am trying to index a document in Elasticsearch through Logstash. An example line from the file I am trying to index is as follows:
GET firstname=john&lastname=smith 400
My objective is to create an index that looks something like the following
HTTPMethod: GET
firstname : john
lastname: smith
query_time : 400
I did the following so far
filter {
  grok {
    match => { "message" => "%{WORD:HttpMethod} %{GREEDYDATA:KVText} %{NUMBER:time:int}" }
  }
  kv {
    source => "KVText"
    value_split => "&"
    remove_field => [ "KVText" ]
  }
}
However, when I run Logstash with this conf file, I see the following:
"query_time": 400,
"message": "GET firstname=john&lastname=smith 400\r",
"HttpMethod": "GET",
"firstname=john": "lastname=smith"
I am not getting each key=value pair as a discrete field, e.g. firstname: john and lastname: smith.
Also, whenever I make a change to my log file, the Logstash process doesn't pick up the change for indexing in real time; I have to rename the file and restart Logstash. I understand this has something to do with the sincedb_path in my logstash.conf.
Any pointers are truly appreciated.
Thanks
Nick
You are configuring the kv filter the wrong way.
The value_split parameter tells the filter which character splits the key from the value within a single key/value pair (here it should be "="), while the field_split parameter tells it which character separates the pairs from each other in the string (here "&"). Try using:
kv {
source => "KVText"
value_split => "="
field_split => "&"
remove_field => [ "KVText" ]
}
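With that change, the sample line GET firstname=john&lastname=smith 400 should come out with discrete fields, roughly (field names as captured by the filter shown above):
    "HttpMethod" => "GET",
    "firstname" => "john",
    "lastname" => "smith",
    "time" => 400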
my log format is:
XXX: 03-20 17:52:28: XXX. * 0 XXX [XXX] [X XX: X]:XXX\tABC:AD_EF:123\t0\tXXXXXXXXXXXXXXXX\tXXXXXXXXXXXXXXXXXXX
How do I write the Logstash output config to get ABC, AD_EF, and 123?
Desired output, for example:
good,ABC,DEF,123
output {
  file {
    path => "/xxx/xxx/xxx/output.txt"
    codec => plain {
      format => "good,ABC,DEF,123" # how to write this regular expression?
    }
    flush_interval => 0
  }
}
Your log output seems to have embedded tabs in it, and those tabs bracket your data. This is good, as it means the csv filter can pull that out for you.
filter {
csv {
separator => " "
columns => [ 'garbage1', 'good', 'garbage2', 'garbage3', 'garbage4' ]
source => "message"
}
}
Note, that is the actual tab character in there, which is hard to represent here.
You would then output the content of the good field to your file.
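A sketch of that output, reusing the file output from the question (the path is just a placeholder) with a sprintf reference to the good field:
output {
  file {
    path => "/xxx/xxx/xxx/output.txt"
    # %{good} expands to the second tab-separated column, i.e. ABC:AD_EF:123
    codec => plain { format => "good,%{good}" }
  }
}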
Thanks for all the help, but maybe I made a mistake.
In the end, I got the answer to my question:
filter {
  grok {
    match => {
      "message" => "XXX\t(?<field1>\w+?):(?<field2>\w+?):(?<field3>\d+?)\t"
    }
  }
}
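To then write those captured fields out in the good,... format from the question, they can be referenced with sprintf in the output codec, e.g. (paths are placeholders):
output {
  file {
    path => "/xxx/xxx/xxx/output.txt"
    codec => plain {
      # pulls in the fields captured by the grok above
      format => "good,%{field1},%{field2},%{field3}"
    }
    flush_interval => 0
  }
}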
I've got log lines in the following format and want to extract fields:
[field1: content1] [field2: content2] [field3: content3] ...
I know neither the field names nor the number of fields.
I tried it with backreferences and the sprintf format but got no results:
match => [ "message", "(?:\[(\w+): %{DATA:\k<-1>}\])+" ] # not working
match => [ "message", "(?:\[%{WORD:fieldname}: %{DATA:%{fieldname}}\])+" ] # not working
This seems to work for only one field but not more:
match => [ "message", "(?:\[%{WORD:field}: %{DATA:content}\] ?)+" ]
add_field => { "%{field}" => "%{content}" }
The kv filter is also not appropriate because the content of the fields may contain whitespace.
Is there any plugin / strategy to fix this problem?
Logstash Ruby Plugin can help you. :)
Here is the configuration:
input {
  stdin {}
}

filter {
  ruby {
    code => "
      # split the message into '[name: value' chunks on '] ['
      fieldArray = event['message'].split('] [')
      for field in fieldArray
        # strip the leftover brackets
        field = field.delete '['
        field = field.delete ']'
        # split each chunk into name and value and add them to the event
        result = field.split(': ')
        event[result[0]] = result[1]
      end
    "
  }
}

output {
  stdout {
    codec => rubydebug
  }
}
With your logs:
[field1: content1] [field2: content2] [field3: content3]
This is the output:
{
"message" => "[field1: content1] [field2: content2] [field3: content3]",
"#version" => "1",
"#timestamp" => "2014-07-07T08:49:28.543Z",
"host" => "abc",
"field1" => "content1",
"field2" => "content2",
"field3" => "content3"
}
I have tried it with 4 fields and it also works.
Please note that the event in the ruby code is the Logstash event. You can use it to access any of your event fields, such as message, @timestamp, etc.
Enjoy it!!!
I found another way using regex:
ruby {
  code => "
    # grab every 'name: value' chunk that sits between square brackets
    fields = event['message'].scan(/(?<=\[)\w+: .*?(?=\](?: |$))/)
    for field in fields
      field = field.split(': ')
      event[field[0]] = field[1]
    end
  "
}
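Note that both ruby snippets above use the old event syntax (event['message']). On Logstash 5.x and later the ruby filter has to go through the event API instead, so a sketch of the same logic would be:
ruby {
  code => "
    fieldArray = event.get('message').split('] [')
    for field in fieldArray
      field = field.delete '['
      field = field.delete ']'
      result = field.split(': ')
      # event.set replaces the old event[key] = value assignment
      event.set(result[0], result[1])
    end
  "
}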
I know that this is an old post, but I just came across it today, so I thought I'd offer an alternate method. Please note that, as a rule, I would almost always use a ruby filter, as suggested in either of the two previous answers. However, I thought I would offer this as an alternative.
If there is a fixed number of fields or a maximum number of fields (i.e., there may be fewer than three fields, but there will never be more than three fields), this can be done with a combination of grok and mutate filters, as well.
# Test message is: `[fieldname: value]`
# Store values in [@metadata] so we don't have to explicitly delete them.
grok {
  match => {
    "[message]" => [
      "\[%{DATA:[@metadata][_field_name_01]}:\s+%{DATA:[@metadata][_field_value_01]}\]( \[%{DATA:[@metadata][_field_name_02]}:\s+%{DATA:[@metadata][_field_value_02]}\])?( \[%{DATA:[@metadata][_field_name_03]}:\s+%{DATA:[@metadata][_field_value_03]}\])?"
    ]
  }
}
# Rename the fieldname, value combinations. I.e., if the following data is in the message:
#
# [foo: bar]
#
# It will be saved in the elasticsearch output as:
#
# {"foo":"bar"}
#
mutate {
  rename => {
    "[@metadata][_field_value_01]" => "[%{[@metadata][_field_name_01]}]"
    "[@metadata][_field_value_02]" => "[%{[@metadata][_field_name_02]}]"
    "[@metadata][_field_value_03]" => "[%{[@metadata][_field_name_03]}]"
  }
  tag_on_failure => []
}
For those who may not be as familiar with regex, the captures in ()? are optional regex matches, meaning that if there is no match, the expression won't fail. The tag_on_failure => [] option in the mutate filter ensures that no error will be appended to tags if one of the renames fails because there was no data to capture and, as a result, there is no field to rename.
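For the three-field sample used in the earlier answers, [field1: content1] [field2: content2] [field3: content3], this grok + mutate combination should end up adding roughly:
    "field1" => "content1",
    "field2" => "content2",
    "field3" => "content3"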