Logstash aggregation returns empty message

I have a testing environment to test some Logstash plugins before moving to production.
For now, I am using the Kiwi syslog generator to generate some syslog messages for testing.
The fields I have are as follows:
@timestamp
message
+ Elastic metadata
Starting from these basic fields, I start filtering my data.
The first thing is to add a new field based on the timestamp and the message, as follows:
input {
    syslog {
        port => 514
    }
}
filter {
    prune {
        whitelist_names => ["timestamp", "message", "newfield", "message_count"]
    }
    mutate {
        add_field => { "newfield" => "%{@timestamp}%{message}" }
    }
}
The prune filter is just there to drop unwanted fields.
This works just fine: I get a new field containing those two values.
The next step was to run some aggregation based on specific content of the message, such as whether the message contains "logged in" or "logged out".
To do this, I used the aggregate filter:
grok {
    match => {
        "message" => [
            "(?<[@metadata][event_type]>logged out)",
            "(?<[@metadata][event_type]>logged in)",
            "(?<[@metadata][event_type]>workstation locked)"
        ]
    }
}
aggregate {
    task_id => "%{message}"
    code => "
        map['message_count'] ||= 0; map['message_count'] += 1;
    "
    push_map_as_event_on_timeout => true
    timeout_timestamp_field => "@timestamp"
    timeout => 60
    inactivity_timeout => 50
    timeout_tags => ['_aggregatetimeout']
}
This worked as expected, but I am having a problem here: when the aggregation times out, the only field populated on the aggregated event is message_count.
As you can see in the screenshot above, newfield and message (the one on the far left; sorry, it didn't fit in the screenshot) are both empty.
For demonstration and testing purposes that is absolutely fine, but it will become unmanageable if I get hundreds of syslog messages per second and don't know which message each message_count refers to.
I am struggling here and I don't know how to solve this issue. Can somebody please help me understand how I can fill newfield with the content of the message it refers to?
Here is my whole Logstash configuration, to make it easier:
input {
    syslog {
        port => 514
    }
}
filter {
    prune {
        whitelist_names => ["timestamp", "message", "newfield", "message_count"]
    }
    mutate {
        add_field => { "newfield" => "%{@timestamp}%{message}" }
    }
    grok {
        match => {
            "message" => [
                "(?<[@metadata][event_type]>logged out)",
                "(?<[@metadata][event_type]>logged in)",
                "(?<[@metadata][event_type]>workstation locked)"
            ]
        }
    }
    aggregate {
        task_id => "%{message}"
        code => "
            map['message_count'] ||= 0; map['message_count'] += 1;
        "
        push_map_as_event_on_timeout => true
        timeout_timestamp_field => "@timestamp"
        timeout => 60
        inactivity_timeout => 50
        timeout_tags => ['_aggregatetimeout']
    }
}
output {
    elasticsearch {
        hosts => ["localhost:9200"]
        index => "logstash_index"
    }
    stdout {
        codec => rubydebug
    }
    csv {
        path => "C:\Users\adminuser\Desktop\syslog\syslogs-%{+yyyy.MM.dd}.csv"
        fields => ["timestamp", "message", "message_count", "newfield"]
    }
}

push_map_as_event_on_timeout => true
When you use this and a timeout occurs, it creates a new event using the contents of the map. If you want fields from the original messages to be on the new event, you have to add them to the map. For the task_id there is a shorthand notation to do this using the timeout_task_id_field option on the filter; otherwise you have to add them explicitly:
map['newfield'] ||= event.get('newfield');
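Applied to the configuration in the question, the aggregate filter would look something like this (a sketch reusing the field names already defined above):
aggregate {
    task_id => "%{message}"
    code => "
        map['message_count'] ||= 0; map['message_count'] += 1;
        # copy fields from the original event into the map so they
        # appear on the event pushed when the timeout fires
        map['newfield'] ||= event.get('newfield');
    "
    push_map_as_event_on_timeout => true
    timeout_timestamp_field => "@timestamp"
    timeout => 60
    inactivity_timeout => 50
    timeout_tags => ['_aggregatetimeout']
}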

Related

In a Logstash config file, how to calculate the difference between two UNIX-format date fields?

This is the filter part of my Logstash config file:
filter {
    mutate {
        split => ["message", "|"]
        add_field => {
            "start_time" => "%{[message][1]}"
            "end_time" => "%{[message][2]}"
            "channel" => "%{[message][5]}"
            "[range_time][gte]" => "%{[message][1]}"
            "[range_time][lte]" => "%{[message][2]}"
            # "duration" => "%{[end_time]-[start_time]}"
        }
        # remove_field => ["message"]
    }
    date {
        match => ["start_time", "yyyyMMddHHmmss"]
        target => "start_time"
    }
    date {
        match => ["end_time", "yyyyMMddHHmmss"]
        target => "end_time"
    }
    ruby {
        code => "
            event.set('start_time', event.get('start_time').to_i)
            event.set('end_time', event.get('end_time').to_i)
        "
    }
    mutate {
        remove_field => ["message", "@timestamp"]
    }
    ruby {
        init => "require 'time'"
        code => "event['duration'] = event['end_time'] - event['start_time'];"
    }
}
In the end, I want to create a new field named duration to represent the difference between end_time and start_time.
Obviously, the last ruby part is wrong. How should I write that part?
To start, you must make sure that the field you want to put the duration in exists before you set its value, so add the field up front.
As it will be numeric, you could do it like this:
mutate {
    add_field => {
        "duration" => 0
    }
}
After this you can calculate the value and set it using ruby:
ruby {
    code => "event.set('duration', event.get('end_time').to_i - event.get('start_time').to_i)"
}
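Putting it together with the date filters from the question, a minimal sketch of the relevant filter section (field names as in the original config) would be:
filter {
    date {
        match => ["start_time", "yyyyMMddHHmmss"]
        target => "start_time"
    }
    date {
        match => ["end_time", "yyyyMMddHHmmss"]
        target => "end_time"
    }
    mutate {
        add_field => { "duration" => 0 }
    }
    ruby {
        # to_i converts the parsed timestamps to epoch seconds,
        # so duration is the difference in whole seconds
        code => "event.set('duration', event.get('end_time').to_i - event.get('start_time').to_i)"
    }
}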

Logstash metric filter for log-level

Can someone please help me with my metrics filter? I want to set up Logstash to check for log-level = ERROR every 5 s and, if the count of log-level = ERROR exceeds 1, send an email. I am using Logstash 2.2.4.
input {
    file {
        path => "/var/log/logstash/example"
        start_position => beginning
    }
}
filter {
    grok {
        match => { "message" => "\[%{TIMESTAMP_ISO8601:timestamp}\]\[%{LOGLEVEL:log-level}\s*\]" }
    }
    if [log-level] == "ERROR" {
        metrics {
            meter => [ "log-level" ]
            flush_interval => 5
            clear_interval => 5
        }
    }
}
output {
    if [log-level] == "ERROR" {
        if [log-level][count] < 1 {
            email {
                port => 25
                address => "mail.abc.com"
                authentication => "login"
                use_tls => true
                from => "alerts@logstash.com"
                subject => "logstash alert"
                to => "siya@abc.com"
                via => "smtp"
                body => "here is the event line %{message}"
                debug => true
            }
        }
    }
}
Editorial:
I am not a fan of the metrics {} filter, because it breaks assumptions. Logstash is multi-threaded, and metrics is one of the filters that only keeps state within its thread. If you use it, you need to be aware that if you're running 4 pipeline workers, you have 4 independent threads keeping their own state. This breaks the assumption that all events coming "into logstash" will be counted "by the metrics filter".
For your use-case, I'd recommend not using Logstash to issue this email, and instead rely on an external polling mechanism that hits your backing stores.
Because this is the metrics filter, I highly recommend you set your number of filter-workers to 1. This is the -w command-line option when logstash starts. You'll lose parallelism, but you'll gain the ability for a single filter to see all events. If you don't, you can get cases where all, say, 6 threads each see an ERROR event; and you will get six emails.
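For example (the config path here is illustrative):
bin/logstash -w 1 -f /etc/logstash/conf.d/logstash.conf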
Your config could use some updates. It's recommended to add a tag (or something similar) to the metrics {} filter:
metrics {
    meter => [ "log-level" ]
    flush_interval => 5
    clear_interval => 5
    add_tag => "error_metric"
}
This way, you can better filter your email segment.
output {
    if "error_metric" in [tags] and [log-level][count] > 1 {
        email {
        }
    }
}
This is because the metrics {} filter creates a new event when it flushes, rather than amending an existing one. You need to catch the new event with your filters.

How to call another filter from within a ruby filter in logstash

I'm building out logstash and would like to build functionality to anonymize fields as specified in the message.
Given the message below, the field fta is a list of fields to anonymize. I would like to just use %{fta} and pass it through to the anonymize filter, but that doesn't seem to work.
{ "containsPII":"True", "fta":["f1","f2"], "f1":"test", "f2":"5551212" }
My config is as follows:
input {
    stdin { codec => json }
}
filter {
    if [containsPII] {
        anonymize {
            algorithm => "SHA1"
            key => "123456789"
            fields => %{fta}
        }
    }
}
output {
    stdout {
        codec => rubydebug
    }
}
The output is:
{
    "containsPII" => "True",
    "fta" => [
        [0] "f1",
        [1] "f2"
    ],
    "f1" => "test",
    "f2" => "5551212",
    "@version" => "1",
    "@timestamp" => "2016-07-13T22:07:04.036Z",
    "host" => "..."
}
Does anyone have any thoughts? I have tried several permutations at this point with no luck.
Thanks,
-D
EDIT:
After posting in the Elastic forums, I found out that this is not possible using base logstash functionality. I will try using the ruby filter instead. So, to amend my question: how do I call another filter from within the ruby filter? I tried the following with no luck, and honestly can't even figure out where to look. I'm very new to Ruby.
filter {
    if [containsPII] {
        ruby {
            code => "event['fta'].each { |item| event[item] = LogStash::Filters::Anonymize.execute(event[item],'12345','SHA1') }"
            add_tag => ["Rubyrun"]
        }
    }
}
You can execute filters from a ruby script. The steps are:
Create the required filter instance in the init block of the inline ruby script.
For every event, call the filter method of the filter instance.
Below is an example for the problem statement above. It replaces the my_ip field in the event with its SHA1 hash.
The same can be achieved using a ruby script file.
Here is the sample config file:
input { stdin { codec => json_lines } }
filter {
    ruby {
        init => "
            require 'logstash/filters/anonymize'
            # Create an instance of the filter with the applicable parameters
            @anonymize = LogStash::Filters::Anonymize.new({'algorithm' => 'SHA1',
                                                           'key' => '123456789',
                                                           'fields' => ['my_ip']})
            # Make sure to call register
            @anonymize.register
        "
        code => "
            # Invoke the filter
            @anonymize.filter(event)
        "
    }
}
output { stdout { codec => rubydebug { metadata => true } } }
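A quick way to try this out (assuming the config above is saved as anonymize.conf; the file name is illustrative):
echo '{"my_ip":"10.1.2.3"}' | bin/logstash -f anonymize.conf
The stdin input with the json_lines codec reads the JSON line as one event, and the rubydebug output prints the event with my_ip replaced by its SHA1 digest.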
Well, I wasn't able to figure out how to call another filter from within a ruby filter, but I did get to the functional goal.
filter {
    if [fta] {
        ruby {
            init => "require 'openssl'"
            code => "event['fta'].each { |item| event[item] = OpenSSL::HMAC.hexdigest(OpenSSL::Digest::SHA256.new, '123456789', event[item]) }"
        }
    }
}
If the fta field exists, this replaces each of the fields listed in that array with its keyed SHA-256 (HMAC) digest.
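Note that the event['field'] syntax used above only works on older Logstash versions; from Logstash 5 onwards the event must be accessed through the get/set API. A sketch of the same filter using that API:
filter {
    if [fta] {
        ruby {
            init => "require 'openssl'"
            code => "
                event.get('fta').each do |item|
                    # replace each listed field with its keyed HMAC-SHA256 digest
                    event.set(item, OpenSSL::HMAC.hexdigest(OpenSSL::Digest::SHA256.new, '123456789', event.get(item)))
                end
            "
        }
    }
}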

Logstash http_poller only shows last log message in Kibana

I am using Logstash to get the log from a URL using http_poller. This works fine. The problem I have is that the log that gets received does not get sent to Elasticsearch in the right way. I tried splitting the result into different events, but the only event that shows in Kibana is the last event from the log. Since I am pulling the log every 2 minutes, a lot of log information gets lost this way.
The input is like this:
input {
    http_poller {
        urls => {
            logger1 => {
                method => get
                url => "http://servername/logdirectory/thislog.log"
            }
        }
        keepalive => true
        automatic_retries => 0
        # Check the site every 2 minutes
        interval => 120
        # Wait no longer than 110 seconds for the request to complete
        request_timeout => 110
        # Store metadata about the request in this field
        metadata_target => http_poller_metadata
        type => 'log4j'
        codec => "json"
        # important tag settings
        tags => stackoverflow
    }
}
I then use a filter to add some fields and to split the logs:
filter {
    if "stackoverflow" in [tags] {
        split {
            terminator => "\n"
        }
        mutate {
            add_field => {
                "Application" => "app-stackoverflow"
                "Environment" => "Acceptation"
            }
        }
    }
}
The output is then sent to the Kibana server using the following output config:
output {
    redis {
        host => "kibanaserver.internal.com"
        data_type => "list"
        key => "logstash N"
    }
}
Any suggestions as to why not all of the events are stored in Kibana?

How to get metric plugin working in logstash

I am failing to understand how to print the metric.
With the following Logstash config:
input {
    generator {
        type => "generated"
    }
}
filter {
    metrics {
        type => "generated"
        meter => "events"
        add_tag => "metric"
    }
}
output {
    stdout {
        tags => "metric"
        message => "rate: %{events.rate_1m}"
    }
}
all I see is:
rate: %{events.rate_1m}
rate: %{events.rate_1m}
instead of the actual value.
When I enable debug in stdout, I see that @fields has the data that the metrics filter is supposed to print.
"#fields" => {
"events.count" => 114175,
"events.rate_1m" => 6478.26368594885,
"events.rate_5m" => 5803.767865770155,
"events.rate_15m" => 5686.915084346328
},
How do I access #fields.events.count?
logstash version = 1.1.13
It looks like a known issue in logstash 1.1.13 and lower.
You need to escape the '.' in %{events.rate_1m} as %{events\.rate_1m}.
Details are in this logstash JIRA.
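Applied to the config above, the output section becomes (same options, with the dot escaped):
output {
    stdout {
        tags => "metric"
        message => "rate: %{events\.rate_1m}"
    }
}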
