Logstash Filter - How to use the value of a field as the name of a new field with parsed json?

I have a JSON event coming into Logstash that looks like this (values differ in each event):
{
  "metadata": {
    "app_type": "Foo",
    "app_namespace": "System.Bar",
    "json_string": "{\"test\":true}"
  }
}
I would like my Logstash filter to output the following to Elasticsearch (I'm not worried about trimming the existing fields):
{
  "app_event": {
    "foo": {
      "system_bar": {
        "test": true
      }
    }
  }
}
I cannot figure out the syntax for how to use field values as the names of new fields.
I tried a few different ways; all resulted in a config error when starting my Logstash instance. I'm only including the filter block of logstash.conf in my examples; the rest of the config works as expected.
1.
# note this one does not have the replacement and lowercase values I want
filter {
  json {
    source => "[metadata][json_string]"
    target => "[app_event][%{[metadata][app_type]}][%{[metadata][app_namespace]}]"
  }
}
2.
I tried to create variables with the values I need and then use those in the target:
filter {
  appType = "%{[metadata][app_type]}".downcase
  appNamespace = "%{[metadata][app_namespace]}".downcase
  appNamespace.gsub(".", "_")
  json {
    source => "[metadata][json_string]"
    target => "[app_event][#{appType}][#{appNamespace}]"
  }
}
Just to confirm that everything else was set up correctly: this does work, but it does not include the dynamically generated field structure I'm looking for.
filter {
  json {
    source => "[metadata][json_string]"
    target => "[app_event]"
  }
}

The json filter does not sprintf the value of target, so you cannot use a json filter in those ways. To make the field name variable you must use a ruby filter. Try
json { source => "message" target => "[@metadata][json1]" remove_field => [ "message" ] }
json { source => "[@metadata][json1][metadata][json_string]" target => "[@metadata][json2]" }
ruby {
  code => '
    namespace = event.get("[@metadata][json1][metadata][app_namespace]")
    if namespace
      namespace = namespace.downcase.gsub(".", "_")
    end
    type = event.get("[@metadata][json1][metadata][app_type]")
    if type
      type = type.downcase
    end
    value = event.get("[@metadata][json2]")
    if type and namespace and value
      event.set("app_event", { type => { namespace => value } })
    end
  '
}
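The transformation inside that ruby block can be sanity-checked outside Logstash as plain Ruby. This sketch replaces the event API with local variables (the sample values are taken from the question) and shows the normalization: lowercase the type, lowercase the namespace and turn dots into underscores, then nest the parsed JSON under both:

```ruby
require 'json'

# Simulated metadata fields the ruby filter would read from the event
app_type = "Foo"
app_namespace = "System.Bar"
json_string = '{"test":true}'

# Same normalization the filter applies
type = app_type.downcase                          # "foo"
namespace = app_namespace.downcase.gsub(".", "_") # "system_bar"

# Build the nested structure that event.set("app_event", ...) would store
app_event = { type => { namespace => JSON.parse(json_string) } }
puts JSON.generate(app_event)
# => {"foo":{"system_bar":{"test":true}}}
```

Note that gsub (not sub) is needed so every dot in a namespace like System.Bar.Baz is replaced, not just the first.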

Logstash date format

I am using Logstash to parse a JSON input message and then add another field from one of the parsed values:
filter {
  json {
    source => "message"
    target => "data"
  }
  mutate {
    add_field => {
      "index_date" => "%{[data][@timestamp]}"
    }
  }
}
This works fine, but now I need index_date to be only the date.
How can I format the [data][@timestamp] field to return only the date?
You will need to install the date_formatter plugin with
bin/logstash-plugin install logstash-filter-date_formatter
and then you can use something like this in your Logstash filter block:
date_formatter {
  source => "index_date"
  target => "[@metadata][indexDateOnlyDate]"
  pattern => "YYYY.MM.dd"
}
This should work :)
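As an alternative that avoids installing an extra plugin, a ruby filter can do the same reformatting; its core is ordinary Ruby. A sketch (the sample timestamp value is made up, and the pattern matches the YYYY.MM.dd format above):

```ruby
require 'time'

# A value like the one copied into index_date from [data][@timestamp]
timestamp = "2017-03-15T10:30:00.000Z"

# Parse the ISO8601 string and keep only the date, dot-separated
index_date = Time.parse(timestamp).strftime("%Y.%m.%d")
puts index_date
# => 2017.03.15
```

Inside a pipeline this would live in a ruby filter using event.get / event.set on the relevant fields.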

Logstash: Dynamic field names based on filename

I got a filename in the format <key>:<value>-<key>:<value>.log like e.g. pr:64-author:mxinden-platform:aws.log containing logs of a test run.
I want to stream each line of the file to elasticsearch via logstash. Each line should be treated as a separate document. Each document should get the fields according to the filename. So e.g. for the above example let's say log-line 17-12-07 foo something happened bar would get the fields: pr with value 64, author with value mxinden and platform with value aws.
At the time I write the Logstash configuration, I do not know the names of the fields.
How do I dynamically add fields to each line based on the fields contained in the filename?
The static approach so far is:
filter {
  mutate { add_field => { "file" => "%{[@metadata][s3][key]}" } }
  grok { match => { "file" => "pr:%{NUMBER:pr}-" } }
  grok { match => { "file" => "author:%{USERNAME:author}-" } }
  grok { match => { "file" => "platform:%{USERNAME:platform}-" } }
}
Changes to the filename structure are fine.
Answering my own question based on @dan-griffiths' comment:
The solution for a file like pr=64,author=mxinden,platform=aws.log is to use the Logstash kv filter, e.g.:
filter {
  kv {
    source => "file"
    field_split => ","
  }
}
where file is a field extracted from the filename via the AWS S3 input plugin.
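The kv filter's behavior on such a filename can be reproduced in plain Ruby, which is handy for checking the field_split / value_split choices before wiring them into the pipeline. A sketch using the filename from the answer:

```ruby
# Filename in key=value pairs, as in the accepted answer
file = "pr=64,author=mxinden,platform=aws.log"

# Strip the extension, then split on "," (field_split) and "=" (value_split)
fields = file.sub(/\.log\z/, "")
             .split(",")
             .map { |pair| pair.split("=", 2) }
             .to_h
puts fields.inspect
# => {"pr"=>"64", "author"=>"mxinden", "platform"=>"aws"}
```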

Avoid collision of 'type' key when both event and logstash input provide values

I am trying to write a pipeline to load a file into Logstash. My setup requires specifying the type field in the input section so I can run multiple independent Logstash config files, each with its own input, filter, and output. Unfortunately the source data already contains a field named type, and the value from the source data conflicts with the value provided by the input configuration.
The source data contains a json array like the following
[
{"key1":"obj1", "type":"looks like a bad choice for a key name"},
{"key1":"obj2", "type":"you can say that again"}
]
My pipeline looks like the following
input {
  exec {
    command => "cat /path/file_containing_above_json_array.txt"
    codec => "json"
    type => "typeSpecifiedInInput"
    interval => 3600
  }
}
output {
  if [type] == "typeSpecifiedInInput" {
    stdout {
      codec => rubydebug
    }
  }
}
The output never gets called because type has been set to the value provided from the source data instead of the value provided from the input section.
How can I set up the input pipeline to avoid this conflict?
Nathan
Create a new field in your input instead of reusing 'type'. The exec{} input has add_field available.
Below is the final pipeline, which uses add_field instead of type. A filter phase was added to clean up the document so that the type field contains the value expected when writing into Elasticsearch (the class of similar documents). The type value from the original JSON document is preserved in the key typeSpecifiedFromDoc. The mutate step had to be broken into separate phases so that the replace would not affect type before its original value had been copied into the new field typeSpecifiedFromDoc.
input {
  exec {
    command => "cat /path/file_containing_above_json_array.txt"
    codec => "json"
    add_field => ["msgType", "typeSpecifiedInInput"]
    interval => 3600
  }
}
filter {
  if [msgType] == "typeSpecifiedInInput" {
    mutate {
      add_field => ["typeSpecifiedFromDoc", "%{type}"]
    }
    mutate {
      replace => ["type", "%{msgType}"]
      remove_field => ["msgType"]
    }
  }
}
output {
  if [type] == "typeSpecifiedInInput" {
    stdout {
      codec => rubydebug
    }
  }
}
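Because Logstash applies the filters in order, the two-step mutate boils down to a simple hash manipulation. This plain-Ruby sketch shows what the filter phase does to each event (sample values taken from the question):

```ruby
# Event as it arrives: type was overwritten by the source document,
# and the input added the msgType marker field
event = {
  "key1"    => "obj1",
  "type"    => "looks like a bad choice for a key name",
  "msgType" => "typeSpecifiedInInput"
}

# Step 1: preserve the document's original type under a new key
event["typeSpecifiedFromDoc"] = event["type"]

# Step 2: replace type with the pipeline's value and drop the marker field
event["type"] = event.delete("msgType")

puts event.inspect
```

Doing step 2 before step 1 would lose the document's original value, which is why the two mutate blocks cannot be merged.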

Logstash Checking Existence of & Parsing Sub-Field

In Logstash I have a JSON payload which I've decoded like so:
filter {
  json {
    source => "payload"
    target => "payloadData"
  }
}
Now I want to check if payloadData._extraData exists, and then deserialise that too.
I've tried this method, working from the example given in this related question:
filter {
  json {
    source => "payload"
    target => "payloadData"
  }
  if [payloadData._extraData] =~ /.+/
  {
    source => "payloadData._extraData"
    target => "payloadData.extraData"
  }
}
But it doesn't do anything (no crash, no error message, just doesn't do anything)
The correct syntax is:
if [payloadData][_extraData] =~ /.+/ { }
Example input:
{"foo":"bar","spam":"eggs","abc":"xyz","one":"two","three":"four","five":"six","seven":{"eight":"nine"}}
Config:
filter {
  json { source => "message" }
  if [seven][eight] =~ /.+/ {
    # do something
  }
}
Apart from that, the code inside your if statement doesn't do anything. You need to specify a filter that should be executed. e.g.:
if [payloadData][_extraData] =~ /.+/ {
  json {
    source => "[payloadData][_extraData]"
    target => "[payloadData][extraData]"
  }
}
What do you want to deserialize in your if statement? The first json filter should recognize nested objects.
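The existence check plus the second JSON parse can be mimicked in plain Ruby, assuming _extraData holds a JSON string inside the already-decoded payload (the sample data below is made up for illustration):

```ruby
require 'json'

# payloadData after the first json filter; _extraData is still a JSON string
payload_data = JSON.parse('{"foo":"bar","_extraData":"{\"nested\":1}"}')

# Equivalent of: if [payloadData][_extraData] =~ /.+/
if payload_data["_extraData"].to_s =~ /.+/
  payload_data["extraData"] = JSON.parse(payload_data["_extraData"])
end

puts payload_data["extraData"].inspect
# => {"nested"=>1}
```

If _extraData is already a nested object rather than a string, the first json filter has decoded it and no second parse is needed.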

Can I use gsub to recursively replace all fieldnames with another field?

After changing my mapping in ElasticSearch to more definitively type the data I am inputting into the system, I have unwittingly made my new variables a nested object. Upon thinking about it more, I actually like the idea of those fields being nested objects because that way I can explicitly know if that src_port statistic is from netflow or from the ASA logs, as an example.
I'd like to use a mutate (gsub, perhaps?) to rename all of my field names for a given type to newtype.fieldname. I see that there is gsub, which uses a regexp, and rename, which takes the literal field name, but I would like to avoid having 30 distinct gsub/rename statements when I will be replacing all of the fields in that type with the newtype prefix.
Is there a way to do this?
Here is an example for your reference.
input {
  stdin {
    type => 'netflow'
  }
}
filter {
  mutate {
    add_field => { "%{type}.message" => "%{message}" }
    remove_field => ["message"]
  }
}
output {
  stdout {
    codec => rubydebug
  }
}
In this example I have changed the message field name to type.message, then deleted the original message field. I think you can use this sample to do what you want. Hope this helps.
Update: use the ruby plugin to do this for all fields at once.
Please note that Elasticsearch uses the @timestamp field for indexing, so I recommend not renaming that field.
input {
  stdin {
    type => 'netflow'
  }
}
filter {
  ruby {
    code => "
      data = event.to_hash
      type = event.get('type')
      data.each do |k, v|
        if k != '@timestamp'
          newFieldName = type + '.' + k
          event.set(newFieldName, v)
          event.remove(k)
        end
      end
    "
  }
}
output {
  stdout {
    codec => rubydebug
  }
}
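Stripped of the event API, the loop in that ruby filter is ordinary hash rewriting. This sketch applies the same prefixing logic to a plain hash, skipping @timestamp (sample field values are made up):

```ruby
# A flattened event and its type
event = {
  "@timestamp" => "2017-01-01T00:00:00Z",
  "type"       => "netflow",
  "src_port"   => 514,
  "message"    => "hello"
}

prefix = event["type"]
renamed = event.each_with_object({}) do |(k, v), out|
  if k == "@timestamp"
    out[k] = v                  # leave the index field untouched
  else
    out["#{prefix}.#{k}"] = v   # e.g. "src_port" -> "netflow.src_port"
  end
end
puts renamed.inspect
```

Note that type itself is also renamed (to netflow.type here), which matches what the filter above does; keep the field as-is instead if downstream outputs rely on it.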