I have a filename in the format <key>:<value>-<key>:<value>.log, e.g. pr:64-author:mxinden-platform:aws.log, containing the logs of a test run.
I want to stream each line of the file to Elasticsearch via Logstash, treating each line as a separate document. Each document should get fields derived from the filename. For the example above, a log line such as 17-12-07 foo something happened bar would get the fields pr with value 64, author with value mxinden and platform with value aws.
At the time I write the Logstash configuration I do not know the names of the fields.
How do I dynamically add fields to each line based on the fields contained in the filename?
The static approach so far is:
filter {
  mutate { add_field => { "file" => "%{[@metadata][s3][key]}" } }

  grok { match => { "file" => "pr:%{NUMBER:pr}-" } }
  grok { match => { "file" => "author:%{USERNAME:author}-" } }
  grok { match => { "file" => "platform:%{USERNAME:platform}-" } }
}
Changes to the filename structure are fine.
Answering my own question based on @dan-griffiths' comment:
The solution, for a filename like pr=64,author=mxinden,platform=aws.log, is to use the Logstash kv filter, e.g.:
filter {
  kv {
    source      => "file"
    field_split => ","
  }
}
where file is a field extracted from the filename via the AWS S3 input plugin.
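For completeness, here is a minimal sketch of the whole filter under that scheme; the gsub step and the explicit value_split are my additions, assuming the .log suffix should not end up in the last value:
filter {
  mutate { add_field => { "file" => "%{[@metadata][s3][key]}" } }

  # strip the extension so the last pair parses as platform=aws rather than platform=aws.log
  mutate { gsub => [ "file", "\.log$", "" ] }

  kv {
    source      => "file"
    field_split => ","
    value_split => "="
  }
}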
Related
I have a json event coming into logstash that looks like (values are different in each event):
{
  "metadata": {
    "app_type": "Foo",
    "app_namespace": "System.Bar",
    "json_string": "{\"test\":true}"
  }
}
I would like my logstash filter to output the following to elasticsearch (I'm not worried about trimming the existing fields):
{
  "app_event": {
    "foo": {
      "system_bar": {
        "test": true
      }
    }
  }
}
I cannot figure out the syntax for how to use field values as the names of new fields.
I tried a few different ways; all resulted in a config error when starting my Logstash instance. I'm only including the filter block of logstash.conf in my examples; the rest of the config works as expected.
1.
# note this one does not have the replacement and lowercase values I want
filter {
  json {
    source => "[metadata][json_string]"
    target => "[app_event][%{[metadata][app_type]}][%{[metadata][app_namespace]}]"
  }
}
2.
I tried to create variables with the values I need and then use those in the target:
filter {
  appType = "%{[metadata][app_type]}".downcase
  appNamespace = "%{[metadata][app_namespace]}".downcase
  appNamespace.gsub(".", "_")
  json {
    source => "[metadata][json_string]"
    target => "[app_event][#{appType}][#{appNamespace}]"
  }
}
Just to confirm that everything else is set up correctly: this does work, but it does not include the dynamically generated field structure I'm looking for.
filter {
  json {
    source => "[metadata][json_string]"
    target => "[app_event]"
  }
}
The json filter does not sprintf the value of target, so you cannot use a json filter in those ways. To make the field name variable you must use a ruby filter. Try:
# parse into @metadata so the intermediate objects do not end up in the indexed document
json { source => "message" target => "[@metadata][json1]" remove_field => [ "message" ] }
json { source => "[@metadata][json1][metadata][json_string]" target => "[@metadata][json2]" }
ruby {
  code => '
    namespace = event.get("[@metadata][json1][metadata][app_namespace]")
    if namespace
      # "System.Bar" -> "system_bar"
      namespace = namespace.downcase.sub(".", "_")
    end
    type = event.get("[@metadata][json1][metadata][app_type]")
    if type
      # "Foo" -> "foo"
      type = type.downcase
    end
    value = event.get("[@metadata][json2]")
    if type and namespace and value
      # build the nested structure [app_event][foo][system_bar]
      event.set("app_event", { "#{type}" => { "#{namespace}" => value } })
    end
  '
}
I am finding that logstash is not a fan of my filter. Would be nice to have a second set of eyes on it.
First, my log file has entries like the following, with a new line for every volume:
/vol/vol0/ 298844160 6916836 291927324 2% /vol/vol0/
My config file looks as follows:
INPUT
file {
  type => "testing"
  path => "/opt/log_repo/ssh/netapp/*"
  tags => "netapp"
  start_position => "beginning"
  sincedb_path => "/dev/null"
}
FILTER
if [type] == "testing" {
  grok {
    match => [ "@message", "{UNIXPATH:volume}%{SPACE}%{INT:total}%{SPACE}%{INT:used}%{SPACE}%{INT:avail}%{SPACE}%{PROG:cap}%{SPACE}%{UNIXPATH:vols}" ]
  }
}
OUTPUT
if [type] == "testing" {
  elasticsearch {
    action => "index"
    hosts => ["http://localhost:9200"]
    index => ["testing4-%{+YYYY.MM.dd}"]
  }
}
When I run this, it tells me it has a bad config file. If I change the filter to:
match => [ "#message", "{UNIXPATH:volume}" ]
It creates a new field called volume with the volume name. I am using %{SPACE} because the log is simply not consistent: some volumes have 4 spaces between the columns and some have more or less, depending on the volume name and size.
To get to this configuration I leveraged the following sites:
https://grokdebug.herokuapp.com/discover?#
http://grokconstructor.appspot.com/do/constructionstep
Still struggling with what I am missing... Any help would be greatly appreciated.
UPDATE: After applying the recommendation below, it still doesn't create a new field. The fields I see on the indexed documents are:
_index string
message string
type string
tags string
path string
@timestamp date
@version string
host string
_source _source
_id string
_type string
_score
Your pattern doesn't match the sample log for a very simple and silly reason: you are missing the % at the start of your pattern. If you add it, it works like a charm.
So the full filter is:
if [type] == "testing" {
  grok {
    match => [ "@message", "%{UNIXPATH:volume}%{SPACE}%{INT:total}%{SPACE}%{INT:used}%{SPACE}%{INT:avail}%{SPACE}%{PROG:cap}%{SPACE}%{UNIXPATH:vols}" ]
  }
}
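If the corrected pattern still produces no new fields, as the update above reports, it may also be worth checking the source field name: the field listing in the update shows message rather than @message, which is where recent Logstash versions put the raw line. A sketch of that variant, assuming everything else stays the same:
if [type] == "testing" {
  grok {
    match => [ "message", "%{UNIXPATH:volume}%{SPACE}%{INT:total}%{SPACE}%{INT:used}%{SPACE}%{INT:avail}%{SPACE}%{PROG:cap}%{SPACE}%{UNIXPATH:vols}" ]
  }
}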
I am trying to write a pipeline to load a file into Logstash. My setup requires specifying the type field in the input section in order to run multiple independent Logstash config files, each with its own input, filter and output. Unfortunately the source data already contains a field named type, and the value from the source data conflicts with the value provided by the input configuration.
The source data contains a json array like the following
[
  {"key1":"obj1", "type":"looks like a bad choice for a key name"},
  {"key1":"obj2", "type":"you can say that again"}
]
My pipeline looks like the following
input {
  exec {
    command => "cat /path/file_containing_above_json_array.txt"
    codec => "json"
    type => "typeSpecifiedInInput"
    interval => 3600
  }
}

output {
  if [type] == "typeSpecifiedInInput" {
    stdout {
      codec => rubydebug
    }
  }
}
The output never gets called because type has been set to the value provided from the source data instead of the value provided from the input section.
How can I set up the input pipeline to avoid this conflict?
Nathan
Create a new field in your input instead of reusing 'type'. The exec{} input has add_field available.
Below is the final pipeline that uses add_field instead of type. A filter phase was added to clean up the document so that the type field contains the expected value needed for writing into Elasticsearch (the class of similar documents). The type value from the original JSON document is preserved in the key typeSpecifiedFromDoc. The mutate step had to be broken into separate phases so that the replace would not affect type before its original value had been copied into the new field typeSpecifiedFromDoc.
input {
  exec {
    command => "cat /path/file_containing_above_json_array.txt"
    codec => "json"
    add_field => ["msgType", "typeSpecifiedInInput"]
    interval => 3600
  }
}

filter {
  if [msgType] == "typeSpecifiedInInput" {
    # first preserve the type value that came from the source document
    mutate {
      add_field => ["typeSpecifiedFromDoc", "%{type}"]
    }
    # only then overwrite type with the value supplied by the input section
    mutate {
      replace => ["type", "%{msgType}"]
      remove_field => ["msgType"]
    }
  }
}

output {
  if [type] == "typeSpecifiedInInput" {
    stdout {
      codec => rubydebug
    }
  }
}
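For the first element of the sample array, the document that reaches stdout should carry roughly these fields, in addition to the @timestamp, @version, host and command fields that Logstash and the exec input add on their own:
key1                 => "obj1"
type                 => "typeSpecifiedInInput"
typeSpecifiedFromDoc => "looks like a bad choice for a key name"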
I currently have a file name like this:
[SERIALNUMBER][2014_12_04][00_45_22][141204T014214]AB_DEF.log
I basically want to extract the year (2014) from the filename and add it to the index name in my Logstash config file, logstash.conf.
Below is my conf file.
input {
  file {
    path => "C:/ABC/DEF/HJK/LOGS/**/*"
    start_position => beginning
    type => syslog
  }
}

filter {
  grok {
    match => { "message" => "%{COMBINEDAPACHELOG}" }
  }
}

output {
  elasticsearch {
    index => "type1logs"
  }
  stdout {}
}
Please help.
thanks
IIRC, you get a field called 'path'. You can then run another grok{} filter, using 'path' as input and extracting the data you want.
Based on your use of COMBINEDAPACHELOG in your existing config, your log entries already have the date. It's a more common practice to use this timestamp field in the date{} filter to change the @timestamp field. Then, the default elasticsearch{} output config will create an index called "logstash-YYYY.MM.DD".
Having daily indexes like this (usually) makes data retention easier.
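A minimal sketch of both suggestions; the bracketed-date pattern, the log_year field name and the type1logs-%{log_year} index name are illustrative assumptions, not something from the original config:
filter {
  # pull the year out of the bracketed date in the path,
  # e.g. [SERIALNUMBER][2014_12_04]... -> log_year = 2014
  grok {
    match => { "path" => "\[%{YEAR:log_year}_%{MONTHNUM}_%{MONTHDAY}\]" }
  }

  # alternatively, let date{} set @timestamp from the Apache timestamp
  # that COMBINEDAPACHELOG already captured
  date {
    match => [ "timestamp", "dd/MMM/yyyy:HH:mm:ss Z" ]
  }
}

output {
  elasticsearch {
    # sprintf the extracted year into the index name
    index => "type1logs-%{log_year}"
  }
}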
After changing my mapping in ElasticSearch to more definitively type the data I am inputting into the system, I have unwittingly made my new variables a nested object. Upon thinking about it more, I actually like the idea of those fields being nested objects because that way I can explicitly know if that src_port statistic is from netflow or from the ASA logs, as an example.
I'd like to use a mutate (gsub, perhaps?) to cause all of my fieldnames for a given type to be renamed to newtype.fieldname. I see that there is gsub which uses a regexp, and rename which takes the literal field name, but I would like to prevent having 30 distinct gsub/rename statements when I will be replacing all of the fields in that type with the "newtype" prefix.
Is there a way to do this?
Here is an example for your reference.
input {
  stdin {
    type => 'netflow'
  }
}

filter {
  mutate {
    add_field => { "%{type}.message" => "%{message}" }
    remove_field => ["message"]
  }
}

output {
  stdout {
    codec => rubydebug
  }
}
In this example I have changed the message field name to type.message, then deleted the original message field. I think you can use this sample to do what you want.
Hope this can help you.
I have updated my answer!
Use the ruby plugin to do what you want!
Please note that Elasticsearch uses the @timestamp field for indexing, so I recommend not changing that field name.
input {
  stdin {
    type => 'netflow'
  }
}

filter {
  ruby {
    code => "
      # iterate over a copy so the event can be modified safely
      data = event.clone.to_hash;
      type = event['type']
      data.each do |k,v|
        if k != '@timestamp'
          # prefix every remaining field with the type, e.g. message -> netflow.message
          newFieldName = type +'.'+ k
          event[newFieldName] = v
          event.remove(k)
        end
      end
    "
  }
}

output {
  stdout {
    codec => rubydebug
  }
}
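Note that the event['field'] / event.remove style above only works on older Logstash releases; from Logstash 5 onward the ruby filter has to go through the event API (event.get / event.set). A minimal sketch of the same idea under that assumption:
filter {
  ruby {
    code => "
      type = event.get('type')
      # to_hash returns a copy, so it is safe to mutate the event while iterating
      event.to_hash.each do |k, v|
        next if k == '@timestamp' || k == '@version'
        event.set(type + '.' + k, v)
        event.remove(k)
      end
    "
  }
}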