Can I use gsub to recursively replace all fieldnames with another field? - logstash

After changing my mapping in ElasticSearch to more definitively type the data I am inputting into the system, I have unwittingly made my new variables a nested object. Upon thinking about it more, I actually like the idea of those fields being nested objects because that way I can explicitly know if that src_port statistic is from netflow or from the ASA logs, as an example.
I'd like to use a mutate (gsub, perhaps?) to rename all of my field names for a given type to newtype.fieldname. I see that there is gsub, which uses a regexp, and rename, which takes the literal field name, but I would like to avoid writing 30 distinct gsub/rename statements when I will be replacing all of the fields in that type with the "newtype" prefix.
Is there a way to do this?

Here is an example for your reference.
input {
  stdin {
    type => 'netflow'
  }
}
filter {
  mutate {
    add_field => { "%{type}.message" => "%{message}" }
    remove_field => ["message"]
  }
}
output {
  stdout {
    codec => rubydebug
  }
}
In this example I have changed the message field name to type.message, then deleted the original message field. I think you can use this sample to do what you want.
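For example, feeding the line hello into the stdin input above would produce rubydebug output roughly like this (the timestamp and host values are illustrative). Note that netflow.message ends up as a single top-level field whose name contains a literal dot, not a nested object:
{
    "netflow.message" => "hello",
           "@version" => "1",
         "@timestamp" => "2015-01-01T00:00:00.000Z",
               "host" => "example-host",
               "type" => "netflow"
}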
Hope this can help you.
I have updated my answer: use the ruby plugin to do what you want.
Please note that Elasticsearch uses the @timestamp field for indexing, so I recommend not renaming that field.
input {
  stdin {
    type => 'netflow'
  }
}
filter {
  ruby {
    # Note: this uses the older ruby event API (event['field']), which newer
    # Logstash versions no longer support.
    code => "
      data = event.clone.to_hash
      type = event['type']
      data.each do |k, v|
        if k != '@timestamp'
          newFieldName = type + '.' + k
          event[newFieldName] = v
          event.remove(k)
        end
      end
    "
  }
}
output {
  stdout {
    codec => rubydebug
  }
}
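For Logstash 5.x and later, where the ruby filter uses event.get/event.set instead of event['field'], an equivalent filter might look like this (an untested sketch; adjust the list of fields to skip as needed):
filter {
  ruby {
    code => "
      type = event.get('type')
      # to_hash returns a copy, so it is safe to modify the event while iterating
      event.to_hash.each do |k, v|
        next if k == '@timestamp' || k == '@version'
        # type + '.' + k creates a flat field with a dot in its name;
        # use '[' + type + '][' + k + ']' instead for a real nested object
        event.set(type + '.' + k, v)
        event.remove(k)
      end
    "
  }
}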

Related

Logstash: Dynamic field names based on filename

I got a filename in the format <key>:<value>-<key>:<value>.log, e.g. pr:64-author:mxinden-platform:aws.log, containing the logs of a test run.
I want to stream each line of the file to Elasticsearch via Logstash. Each line should be treated as a separate document, and each document should get fields according to the filename. For the above example, a log line such as 17-12-07 foo something happened bar would get the fields pr with value 64, author with value mxinden and platform with value aws.
At the time I write the Logstash configuration I do not know the names of the fields.
How do I dynamically add fields to each line based on the fields contained in the filename?
The static approach so far is:
filter {
  mutate { add_field => { "file" => "%{[@metadata][s3][key]}" } }
  grok { match => { "file" => "pr:%{NUMBER:pr}-" } }
  grok { match => { "file" => "author:%{USERNAME:author}-" } }
  grok { match => { "file" => "platform:%{USERNAME:platform}-" } }
}
Changes to the filename structure are fine.
Answering my own question, based on @dan-griffiths' comment:
The solution, for a filename like pr=64,author=mxinden,platform=aws.log, is to use the Logstash kv filter, e.g.:
filter {
  kv {
    source => "file"
    field_split => ","
  }
}
where file is a field extracted from the filename via the AWS S3 input plugin.
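If you would rather keep the original pr:64-author:mxinden-platform:aws.log naming, the kv filter should also be able to handle it by adjusting the separators. Here is an untested sketch that first strips the .log suffix with a mutate gsub before splitting on - and ::
filter {
  mutate {
    # strip the trailing ".log" so it does not end up in the last value
    gsub => ["file", "[.]log$", ""]
  }
  kv {
    source => "file"
    field_split => "-"
    value_split => ":"
  }
}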

Avoid collision of 'type' key when both event and logstash input provide values

I am trying to write a pipeline to load a file into Logstash. My setup requires specifying the type field in the input section in order to run multiple independent Logstash config files, each with its own input, filter and output. Unfortunately the source data already contains a field named type, and it looks like the value from the source data is conflicting with the value provided in the input configuration.
The source data contains a json array like the following
[
  {"key1":"obj1", "type":"looks like a bad choice for a key name"},
  {"key1":"obj2", "type":"you can say that again"}
]
My pipeline looks like the following
input {
  exec {
    command => "cat /path/file_containing_above_json_array.txt"
    codec => "json"
    type => "typeSpecifiedInInput"
    interval => 3600
  }
}
output {
  if [type] == "typeSpecifiedInInput" {
    stdout {
      codec => rubydebug
    }
  }
}
The output never gets called because type has been set to the value provided from the source data instead of the value provided from the input section.
How can I set up the input pipeline to avoid this conflict?
Nathan
Create a new field in your input instead of reusing 'type'. The exec{} input has add_field available.
Below is the final pipeline, which uses add_field instead of type. A filter phase was added to clean up the document so that the type field contains the value expected when writing into Elasticsearch (the class of similar documents). The type value from the original JSON document is preserved in the key typeSpecifiedFromDoc. The mutate step had to be split into two separate blocks so that the replace would not overwrite type before its original value had been copied into the new field typeSpecifiedFromDoc.
input {
  exec {
    command => "cat /path/file_containing_above_json_array.txt"
    codec => "json"
    add_field => ["msgType", "typeSpecifiedInInput"]
    interval => 3600
  }
}
filter {
  if [msgType] == "typeSpecifiedInInput" {
    mutate {
      add_field => ["typeSpecifiedFromDoc", "%{type}"]
    }
    mutate {
      replace => ["type", "%{msgType}"]
      remove_field => ["msgType"]
    }
  }
}
output {
  if [type] == "typeSpecifiedInInput" {
    stdout {
      codec => rubydebug
    }
  }
}
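As a small variation (an untested sketch): keeping the marker under [@metadata] instead of in a temporary msgType field means it is never sent to any output, so no remove_field cleanup is needed:
input {
  exec {
    command => "cat /path/file_containing_above_json_array.txt"
    codec => "json"
    add_field => { "[@metadata][msgType]" => "typeSpecifiedInInput" }
    interval => 3600
  }
}
filter {
  if [@metadata][msgType] == "typeSpecifiedInInput" {
    # preserve the document's original type, then overwrite it with the marker
    mutate { add_field => { "typeSpecifiedFromDoc" => "%{type}" } }
    mutate { replace => { "type" => "%{[@metadata][msgType]}" } }
  }
}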

Logstash Filter geo_point with lat long?

Part of my grok filter (working) grabs the following two fields:
%{NUMBER:XCent} %{NUMBER:YCent}
which are lat, long points.
I'm attempting to add a location pin but keep getting a config failure when I use the --debug flag on my configuration file
All of my configuration works until I get to this section.
if [XCent] and [YCent] {
  mutate {
    add_field => {
      "[location][lat]" => "%{XCent}"
      "[location][lon]" => "%{YCent}"
    }
  }
  mutate {
    convert => {
      "[location][lat]" => "float"
      "[location][lon]" => "float"
    }
  }
  mutate {
    convert => {"[location]", "geo_point"}
  }
}
My thought was that this is basically what the Elasticsearch 1.4 mapping documentation suggested:
https://www.elastic.co/guide/en/elasticsearch/reference/1.4/mapping-geo-point-type.html
Edit: found better way to apply configuration in filter, updated code.
The third mutate filter is invalid: convert accepts a hash as its argument, and the valid conversion targets are integer, float, string, and boolean (geo_point is not one of them). You don't need this filter, so you can simply remove it.
To set the location field as a geo_point type you need to modify the Elasticsearch index template you are using for your data.
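For example, a minimal index template declaring location as a geo_point might look like the following (the template name and index pattern are placeholders; this uses the pre-6.x template syntax that matches the documentation linked above):
PUT _template/logstash_geo
{
  "template": "logstash-*",
  "mappings": {
    "_default_": {
      "properties": {
        "location": { "type": "geo_point" }
      }
    }
  }
}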

Eliminate the top-level field in Logstash

I am using Logstash and one of my applications is sending me fields like:
[message][UrlVisited]
[message][TotalDuration]
[message][AccountsProcessed]
I'd like to be able to collapse these fields, removing the top level message altogether. So the above fields will become:
[UrlVisited]
[TotalDuration]
[AccountsProcessed]
Is there a way to do this in Logstash?
Assuming the names of all such subfields are known in advance you can use the mutate filter:
filter {
  mutate {
    rename => ["[message][UrlVisited]", "UrlVisited"]
  }
  mutate {
    rename => ["[message][TotalDuration]", "TotalDuration"]
  }
  mutate {
    rename => ["[message][AccountsProcessed]", "AccountsProcessed"]
  }
  mutate {
    remove_field => ["message"]
  }
}
Alternatively, use a ruby filter (which works even if you don't know the field names):
filter {
  ruby {
    code => "
      event.get('message').each { |k, v|
        event.set(k, v)
      }
      event.remove('message')
    "
  }
}
This example works on Logstash 2.4 and later. For earlier versions use event['message'].each ... and event[k] = v instead.

Logstash grok filter - name fields dynamically

I've got log lines in the following format and want to extract fields:
[field1: content1] [field2: content2] [field3: content3] ...
I neither know the field names, nor the number of fields.
I tried it with backreferences and the sprintf format but got no results:
match => [ "message", "(?:\[(\w+): %{DATA:\k<-1>}\])+" ] # not working
match => [ "message", "(?:\[%{WORD:fieldname}: %{DATA:%{fieldname}}\])+" ] # not working
This seems to work for only one field but not more:
match => [ "message", "(?:\[%{WORD:field}: %{DATA:content}\] ?)+" ]
add_field => { "%{field}" => "%{content}" }
The kv filter is also not appropriate because the content of the fields may contain whitespaces.
Is there any plugin / strategy to fix this problem?
Logstash Ruby Plugin can help you. :)
Here is the configuration:
input {
  stdin {}
}
filter {
  ruby {
    code => "
      fieldArray = event['message'].split('] [')
      for field in fieldArray
        field = field.delete '['
        field = field.delete ']'
        result = field.split(': ')
        event[result[0]] = result[1]
      end
    "
  }
}
output {
  stdout {
    codec => rubydebug
  }
}
With your logs:
[field1: content1] [field2: content2] [field3: content3]
This is the output:
{
       "message" => "[field1: content1] [field2: content2] [field3: content3]",
      "@version" => "1",
    "@timestamp" => "2014-07-07T08:49:28.543Z",
          "host" => "abc",
        "field1" => "content1",
        "field2" => "content2",
        "field3" => "content3"
}
I have tried it with 4 fields, and it also works.
Please note that event in the ruby code is the Logstash event object. You can use it to access all of your event fields, such as message, @timestamp, etc.
Enjoy it!!!
I found another way using regex:
ruby {
  code => "
    fields = event['message'].scan(/(?<=\[)\w+: .*?(?=\](?: |$))/)
    for field in fields
      field = field.split(': ')
      event[field[0]] = field[1]
    end
  "
}
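On Logstash 2.4 and later, where the ruby filter uses the event.get/event.set API mentioned above, the same regex approach might look like this (an untested sketch):
ruby {
  code => "
    event.get('message').scan(/(?<=\[)\w+: .*?(?=\](?: |$))/).each do |field|
      # split only on the first ': ' so values may contain colons
      key, value = field.split(': ', 2)
      event.set(key, value)
    end
  "
}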
I know that this is an old post, but I just came across it today, so I thought I'd offer an alternate method. Please note that, as a rule, I would almost always use a ruby filter, as suggested in either of the two previous answers. However, I thought I would offer this as an alternative.
If there is a fixed number of fields or a maximum number of fields (i.e., there may be fewer than three fields, but there will never be more than three fields), this can be done with a combination of grok and mutate filters, as well.
# Test message is: `[fieldname: value]`
# Store values in [@metadata] so we don't have to explicitly delete them.
grok {
  match => {
    "[message]" => [
      "\[%{DATA:[@metadata][_field_name_01]}:\s+%{DATA:[@metadata][_field_value_01]}\]( \[%{DATA:[@metadata][_field_name_02]}:\s+%{DATA:[@metadata][_field_value_02]}\])?( \[%{DATA:[@metadata][_field_name_03]}:\s+%{DATA:[@metadata][_field_value_03]}\])?"
    ]
  }
}
# Rename the fieldname, value combinations. I.e., if the following data is in the message:
#
#   [foo: bar]
#
# it will be saved in the elasticsearch output as:
#
#   {"foo":"bar"}
#
mutate {
  rename => {
    "[@metadata][_field_value_01]" => "[%{[@metadata][_field_name_01]}]"
    "[@metadata][_field_value_02]" => "[%{[@metadata][_field_name_02]}]"
    "[@metadata][_field_value_03]" => "[%{[@metadata][_field_name_03]}]"
  }
  tag_on_failure => []
}
For those who may not be as familiar with regex, the captures in ()? are optional regex matches, meaning that if there is no match, the expression won't fail. The tag_on_failure => [] option in the mutate filter ensures that no error will be appended to tags if one of the renames fails because there was no data to capture and, as a result, there is no field to rename.
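For example, with an input line of [foo: bar] [baz: qux], the rubydebug output would contain roughly the following (host and timestamp are illustrative; the [@metadata] fields are not printed by default):
{
       "message" => "[foo: bar] [baz: qux]",
      "@version" => "1",
    "@timestamp" => "2017-12-07T00:00:00.000Z",
          "host" => "example-host",
           "foo" => "bar",
           "baz" => "qux"
}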
