I have the following pipeline. The requirement is that I need to write "metrics" data to ONE file and EVENT data to another file. I am having two issues with this pipeline.
The file output is not creating a timestamped file every 30 seconds; instead it creates a single file named output%{[@metadata][ts]}.csv and keeps appending data to it.
The csv output does create a new timestamped file every 30 seconds, but somehow it also creates one extra file named output%{[@metadata][ts]} and keeps appending meta info to that file.
Can someone please guide me on how I can fix this?
input {
  beats {
    port => 5045
  }
}
filter {
  ruby {
    code => '
      event.set("[@metadata][ts]", Time.now.to_i / 30)
      event.set("[@metadata][suffix]", "output" + (Time.now.to_i / 30).to_s + ".csv")
    '
  }
}
filter {
  metrics {
    meter => [ "code" ]
    add_tag => "metric"
    clear_interval => 30
    flush_interval => 30
  }
}
output {
  if "metric" in [tags] {
    file {
      flush_interval => 30
      codec => line { format => "%{[code][count]} %{[code][count]}" }
      path => "C:/lgstshop/local/csv/output%{[@metadata][ts]}.csv"
    }
    stdout {
      codec => line {
        format => "rate: %{[code][count]}"
      }
    }
  }
  file {
    path => "output.log"
  }
  csv {
    fields => [ "created", "level", "code" ]
    path => "C:/lgstshop/local/output%{[@metadata][ts]}.evt"
  }
}
A metrics filter generates new events in the pipeline. Those events only go through filters that come after it, so the metric events do not have a [@metadata][ts] field and the sprintf references in the output section are not substituted. Move the ruby filter so that it is after the metrics filter.
If you do not want the metrics sent to the csv output, then wrap that output with if "metric" not in [tags] { ... }, or put it in an else branch of the existing conditional, as in the sketch below.
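For illustration, here is a minimal sketch of that reordering, reusing the paths and field names from the question (the format string written to the metric file is an assumption; adjust it to the metric fields you actually want):
filter {
  metrics {
    meter => [ "code" ]
    add_tag => "metric"
    clear_interval => 30
    flush_interval => 30
  }
  # now runs after the metrics filter, so the generated metric events also get [@metadata][ts]
  ruby {
    code => 'event.set("[@metadata][ts]", Time.now.to_i / 30)'
  }
}
output {
  if "metric" in [tags] {
    file {
      flush_interval => 30
      codec => line { format => "rate: %{[code][count]}" }
      path => "C:/lgstshop/local/csv/output%{[@metadata][ts]}.csv"
    }
  } else {
    csv {
      fields => [ "created", "level", "code" ]
      path => "C:/lgstshop/local/output%{[@metadata][ts]}.evt"
    }
  }
}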
I have a testing environment to test some Logstash plugins before moving to production.
For now, I am using the Kiwi syslog generator to generate some syslog messages for testing.
The fields I have are as follows:
@timestamp
message
+ elastic metadata
Starting from these basic fields, I start filtering my data.
The first thing is to add a new field based on the timestamp and message, as follows:
input {
  syslog {
    port => 514
  }
}
filter {
  prune {
    whitelist_names => ["timestamp", "message", "newfield", "message_count"]
  }
  mutate {
    add_field => { "newfield" => "%{@timestamp}%{message}" }
  }
}
The prune filter is just there so unwanted data is not processed.
And this works just fine, as I am getting a new field with those two values.
The next step was to run some aggregation based on specific content of the message, such as whether the message contains logged in or logged out,
and to do this I used the aggregate filter:
filter {
  grok {
    match => {
      "message" => [
        "(?<[@metadata][event_type]>logged out)",
        "(?<[@metadata][event_type]>logged in)",
        "(?<[@metadata][event_type]>workstation locked)"
      ]
    }
  }
  aggregate {
    task_id => "%{message}"
    code => "
      map['message_count'] ||= 0; map['message_count'] += 1;
    "
    push_map_as_event_on_timeout => true
    timeout_timestamp_field => "@timestamp"
    timeout => 60
    inactivity_timeout => 50
    timeout_tags => ['_aggregatetimeout']
  }
}
This worked as expected, but I am having a problem here: when the aggregation times out, the only field populated for that aggregation is message_count.
As you can see in the screenshot above, newfield and message (the one on the far left; sorry, it didn't fit in the screenshot) are both empty.
For demonstration and testing purposes that is absolutely fine, but it would become unmanageable if I got hundreds of syslog messages per second and did not know which message each message_count refers to.
Please, I am struggling here and I don't know how to solve this issue. Can somebody please help me understand how I can fill newfield with the content of the message it refers to?
This is my whole Logstash configuration, to make it easier:
input {
  syslog {
    port => 514
  }
}
filter {
  prune {
    whitelist_names => ["timestamp", "message", "newfield", "message_count"]
  }
  mutate {
    add_field => { "newfield" => "%{@timestamp}%{message}" }
  }
  grok {
    match => {
      "message" => [
        "(?<[@metadata][event_type]>logged out)",
        "(?<[@metadata][event_type]>logged in)",
        "(?<[@metadata][event_type]>workstation locked)"
      ]
    }
  }
  aggregate {
    task_id => "%{message}"
    code => "
      map['message_count'] ||= 0; map['message_count'] += 1;
    "
    push_map_as_event_on_timeout => true
    timeout_timestamp_field => "@timestamp"
    timeout => 60
    inactivity_timeout => 50
    timeout_tags => ['_aggregatetimeout']
  }
}
output {
  elasticsearch {
    hosts => ["localhost:9200"]
    index => "logstash_index"
  }
  stdout {
    codec => rubydebug
  }
  csv {
    path => "C:\Users\adminuser\Desktop\syslog\syslogs-%{+yyyy.MM.dd}.csv"
    fields => ["timestamp", "message", "message_count", "newfield"]
  }
}
output {
elasticsearch {
hosts => ["localhost:9200"]
index => "logstash_index"
}
stdout {
codec => rubydebug
}
csv {
path => "C:\Users\adminuser\Desktop\syslog\syslogs-%{+yyyy.MM.dd}.csv"
fields => ["timestamp", "message", "message_count", "newfield"]
}
}
push_map_as_event_on_timeout => true
When you use this and a timeout occurs, it creates a new event using the contents of the map. If you want fields from the original messages to be in the new event, you have to add them to the map. For the task_id there is a shorthand notation to do this using the timeout_task_id_field option on the filter; otherwise you have to add them explicitly:
map['newfield'] ||= event.get('newfield');
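A minimal sketch of the adjusted aggregate filter, reusing the field names from the question (timeout_task_id_field is the shorthand mentioned above for the task_id, which is %{message} here):
aggregate {
  task_id => "%{message}"
  code => "
    map['message_count'] ||= 0; map['message_count'] += 1;
    # copy fields from the original event into the map so they show up in the timeout event
    map['newfield'] ||= event.get('newfield');
  "
  push_map_as_event_on_timeout => true
  # writes the task_id value (the message) back into this field on the timeout event
  timeout_task_id_field => "message"
  timeout_timestamp_field => "@timestamp"
  timeout => 60
  inactivity_timeout => 50
  timeout_tags => ['_aggregatetimeout']
}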
I have a filename in the format <key>:<value>-<key>:<value>.log, e.g. pr:64-author:mxinden-platform:aws.log, containing the logs of a test run.
I want to stream each line of the file to Elasticsearch via Logstash. Each line should be treated as a separate document, and each document should get fields according to the filename. So, for the above example, the log line 17-12-07 foo something happened bar would get the fields pr with value 64, author with value mxinden and platform with value aws.
At the time of writing the Logstash configuration I do not know the names of the fields.
How do I dynamically add fields to each line based on the key/value pairs contained in the filename?
The static approach so far is:
filter {
  mutate { add_field => { "file" => "%{[@metadata][s3][key]}" } }
  grok { match => { "file" => "pr:%{NUMBER:pr}-" } }
  grok { match => { "file" => "author:%{USERNAME:author}-" } }
  grok { match => { "file" => "platform:%{USERNAME:platform}-" } }
}
Changes to the filename structure are fine.
Answering my own question based on @dan-griffiths' comment:
The solution, for a file named like pr=64,author=mxinden,platform=aws.log, is to use the Logstash kv filter, e.g.:
filter {
  kv {
    source => "file"
    field_split => ","
  }
}
where file is a field extracted from the filename via the AWS S3 input plugin.
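For completeness, a rough sketch of the whole filter, assuming the S3 object key is available under [@metadata][s3][key] as in the static approach above:
filter {
  # expose the S3 object key as an event field so kv can parse it
  mutate { add_field => { "file" => "%{[@metadata][s3][key]}" } }
  # pr=64,author=mxinden,platform=aws.log -> pr, author, platform fields
  kv {
    source => "file"
    field_split => ","
    value_split => "="
  }
  # note: the trailing ".log" ends up in the last value unless stripped, e.g. with mutate gsub
}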
I send JSON-formatted logs to my Logstash server. The log looks something like this (note: the whole message is really on ONE line, but I show it across multiple lines to ease reading):
2016-09-01T21:07:30.152Z 153.65.199.92
{
  "type": "trm-system",
  "host": "susralcent09",
  "timestamp": "2016-09-01T17:17:35.018470-04:00",
  "@version": "1",
  "customer": "cf_cim",
  "role": "app_server",
  "sourcefile": "/usr/share/tomcat/dist/logs/trm-system.log",
  "message": "some message"
}
What do I need to put in my Logstash configuration to get the "sourcefile" value, and ultimately get the filename, e.g., trm-system.log?
If you pump the hash part (without the timestamp prefix) into ES, it should recognize it.
If you want to do it inside a Logstash pipeline, you would use the json filter and point its source => at the second part of the line (possibly adding the timestamp prefix back in).
This results in all the fields being added to the current message, and you can access them directly or all combined:
Config:
input { stdin { } }

filter {
  # split the line into timestamp, IP and JSON parts
  grok { match => [ "message", "%{NOTSPACE:ts} %{NOTSPACE:ip} %{GREEDYDATA:js}" ] }
  # parse the JSON part (called "js") and add its fields to the event
  json { source => "js" }
}

output {
  # stdout { codec => rubydebug }
  # you can access fields directly with %{fieldname}:
  stdout { codec => line { format => "sourcefile: %{sourcefile}" } }
}
Sample run
2016-09-01T21:07:30.152Z 153.65.199.92 { "sourcefile":"/usr" }
sourcefile: /usr
and with rubydebug (host and @timestamp removed):
{
       "message" => "2016-09-01T21:07:30.152Z 153.65.199.92 { \"sourcefile\":\"/usr\" }",
      "@version" => "1",
            "ts" => "2016-09-01T21:07:30.152Z",
            "ip" => "153.65.199.92",
            "js" => "{ \"sourcefile\":\"/usr\" }",
    "sourcefile" => "/usr"
}
As you can see, the field sourcefile is directly available with its value in the rubydebug output.
Depending on the source of your log records, you might also need the multiline codec. You might also want to delete the js field, rename @timestamp to _parsedate, and parse ts into the record's timestamp (to keep Kibana happy); this is not shown in the sample. I would also remove message to save space.
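As a rough sketch of that cleanup (the field names follow the sample above; _parsedate is just the name suggested in the text):
filter {
  # keep Logstash's own ingest time under a different name
  mutate { copy => { "@timestamp" => "_parsedate" } }
  # use the timestamp from the log line as the event timestamp
  date { match => [ "ts", "ISO8601" ] }
  # drop intermediate/bulky fields
  mutate { remove_field => [ "js", "message" ] }
}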
I am trying to write a pipeline to load a file into Logstash. My setup requires specifying the type field in the input section so I can run multiple independent Logstash config files, each with its own input, filter and output. Unfortunately, the source data already contains a field named type, and the value from the source data conflicts with the value provided by the input configuration.
The source data contains a JSON array like the following:
[
  {"key1":"obj1", "type":"looks like a bad choose for a key name"},
  {"key1":"obj2", "type":"you can say that again"}
]
My pipeline looks like the following
input {
  exec {
    command => "cat /path/file_containing_above_json_array.txt"
    codec => "json"
    type => "typeSpecifiedInInput"
    interval => 3600
  }
}
output {
  if [type] == "typeSpecifiedInInput" {
    stdout {
      codec => rubydebug
    }
  }
}
The output conditional never matches because type has been set to the value from the source data instead of the value provided in the input section.
How can I set up the input pipeline to avoid this conflict?
Nathan
Create a new field in your input instead of reusing 'type'. The exec{} input has add_field available.
Below is the final pipeline, which uses add_field instead of type. A filter phase was added to clean up the document so that the type field contains the value expected when writing to Elasticsearch (the class of similar documents). The type value from the original JSON document is preserved in the key typeSpecifiedFromDoc. The mutate step had to be broken into separate phases so that the replace would not affect type before its original value had been added as the new field typeSpecifiedFromDoc.
input {
  exec {
    command => "cat /path/file_containing_above_json_array.txt"
    codec => "json"
    add_field => ["msgType", "typeSpecifiedInInput"]
    interval => 3600
  }
}
filter {
  if [msgType] == "typeSpecifiedInInput" {
    mutate {
      add_field => ["typeSpecifiedFromDoc", "%{type}"]
    }
    mutate {
      replace => ["type", "%{msgType}"]
      remove_field => ["msgType"]
    }
  }
}
output {
  if [type] == "typeSpecifiedInInput" {
    stdout {
      codec => rubydebug
    }
  }
}
I am using Logstash to fetch a log from a URL using http_poller. This works fine. The problem I have is that the log that gets received does not get sent to Elasticsearch in the right way. I tried splitting the result into separate events, but the only event that shows up in Kibana is the last event from the log. Since I am pulling the log every 2 minutes, a lot of log information gets lost this way.
The input is like this:
input {
  http_poller {
    urls => {
      logger1 => {
        method => get
        url => "http://servername/logdirectory/thislog.log"
      }
    }
    keepalive => true
    automatic_retries => 0
    # Check the site every 2 minutes
    interval => 120
    # Wait no longer than 110 seconds for the request to complete
    request_timeout => 110
    # Store metadata about the request in this field
    metadata_target => http_poller_metadata
    type => 'log4j'
    codec => "json"
    # important tag settings
    tags => stackoverflow
  }
}
I then use a filter to add some fields and to split the logs:
filter {
  if "stackoverflow" in [tags] {
    split {
      terminator => "\n"
    }
    mutate {
      add_field => {
        "Application" => "app-stackoverflow"
        "Environment" => "Acceptation"
      }
    }
  }
}
The output then gets sent to the Kibana server using the following output configuration:
output {
  redis {
    host => "kibanaserver.internal.com"
    data_type => "list"
    key => "logstash N"
  }
}
Any suggestions as to why not all of the events end up stored in Kibana?