Logstash prune plugin causes JSON to become text (Ruby objects)

I'm using Logstash with output to Coralogix. It works fine unless I use the prune plugin to whitelist fields. Once I use the prune plugin, it outputs text/string instead of a JSON object.
The config I'm using:
input {
  beats {
    port => 5000
  }
}
filter {
  if [should_prune] {
    prune {
      whitelist_names => [
        "^test$",
        "^@timestamp$",
        "^tags$"
      ]
      add_tag => [ "pruned" ]
    }
  }
}
output {
  coralogix_logger {
    config_params => {
      "PRIVATE_KEY" => "********"
      "APP_NAME" => "myawesomeapp"
      "SUB_SYSTEM" => "subapp"
    }
    is_json => true
  }
}
An example of the output is:
{"@timestamp"=>2019-07-15T06:47:57.364Z, "tags"=>["pruned"], "test"=>"ok"}
instead of:
{"@timestamp":"2019-07-15T06:47:57.364Z", "tags":["pruned"], "test":"ok"}
Any logs that are not pruned (in this scenario, ones that don't contain the should_prune field) are passed through just fine.
Any ideas?
Thanks!

Apparently my issue was not related to the prune plugin; it was related to the plugin I used to output my logs (in my case, Coralogix). I had to update my output plugin (https://github.com/coralogix/logstash-output-coralogix).
If you encounter this issue and are not using Coralogix, another way to resolve it can be the json_encode filter plugin: https://www.elastic.co/guide/en/logstash/current/plugins-filters-json_encode.html
Generally speaking, it should work out of the box, just like the configuration I used in the initial question.
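For reference, a minimal sketch of the json_encode approach (the plugin is not bundled by default and can be installed with bin/logstash-plugin install logstash-filter-json_encode; the field names below are illustrative, not from the original question):
filter {
  json_encode {
    # Serialize the contents of the (hypothetical) "payload" field into a JSON string
    source => "payload"
    target => "payload_json"
  }
}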

Related

Logstash filter according to the source field declared in the Filebeat config

input {
  beats {
    port => 5042
  }
}
output {
  if [source] == "access" {
    elasticsearch {
      hosts => ["16.113.56.102:9200"]
      index => "logstsh-access-nginxlogs-%{+YYYY.MM.dd}"
    }
  }
  else if [source] == "error" {
    elasticsearch {
      hosts => ["16.113.56.102:9200"]
      index => "logstsh-error-nginxlogs-%{+YYYY.MM.dd}"
    }
  }
}
I would like to separate the log files using the source field declared in the Filebeat input. On the Kibana side the logs have source set to either access or error; however, Logstash won't pass the logs to Elasticsearch. I'm wondering whether this is the right way to reference the source field. When I use an absolute path in the input part it works like a charm, so I think the issue is with either the Filebeat input or Logstash.
First of all, I recommend you use one output if you don't need to send the data to multiple elasticsearch clusters. For example:
output {
  elasticsearch {
    hosts => ["16.113.56.102:9200"]
    index => "logstash-%{source}-nginxlogs-%{+YYYY.MM.dd}"
  }
}
In Filebeat, you can add fields to each document with the add_fields processor:
processors:
  - add_fields:
      target: project
      fields:
        name: myproject
        id: '574734885120952459'
Ref: https://www.elastic.co/guide/en/beats/filebeat/current/add-fields.html
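A rough sketch of how a field added this way could then drive the index name on the Logstash side (the [project][name] reference matches the processor example above; adapt it to whatever field you actually add, e.g. a source field):
output {
  elasticsearch {
    hosts => ["16.113.56.102:9200"]
    # add_fields puts its fields under the configured target,
    # so the value is referenced here as [project][name]
    index => "logstash-%{[project][name]}-nginxlogs-%{+YYYY.MM.dd}"
  }
}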
Note: to check whether the problem is on the Logstash side, you can use the method described in:
Why are the logs not indexed in the elasticsearch logstash structure I designed?

Logstash metric filter for log-level

Can someone please help me with my metrics filter? I want to set up Logstash to check for log-level = ERROR every 5 s, and if the number of ERROR events exceeds 1, it should send an email. I am using Logstash 2.2.4.
input {
  file {
    path => "/var/log/logstash/example"
    start_position => "beginning"
  }
}
filter {
  grok {
    match => { "message" => "\[%{TIMESTAMP_ISO8601:timestamp}\]\[%{LOGLEVEL:log-level}\s*\]" }
  }
  if [log-level] == "ERROR" {
    metrics {
      meter => [ "log-level" ]
      flush_interval => 5
      clear_interval => 5
    }
  }
}
output {
  if [log-level] == "ERROR" {
    if [log-level][count] < 1 {
      email {
        port => 25
        address => "mail.abc.com"
        authentication => "login"
        use_tls => true
        from => "alerts@logstash.com"
        subject => "logstash alert"
        to => "siya@abc.com"
        via => "smtp"
        body => "here is the event line %{message}"
        debug => true
      }
    }
  }
}
Editorial:
I am not a fan of the metrics {} filter, because it breaks assumptions. Logstash is multi-threaded, and metrics is one of the filters that only keeps state within its thread. If you use it, you need to be aware that if you're running 4 pipeline workers, you have 4 independent threads keeping their own state. This breaks the assumption that all events coming "into logstash" will be counted "by the metrics filter".
For your use-case, I'd recommend not using Logstash to issue this email, and instead rely on an external polling mechanism that hits your backing stores.
Because this is the metrics filter, I highly recommend you set your number of filter-workers to 1. This is the -w command-line option when logstash starts. You'll lose parallelism, but you'll gain the ability for a single filter to see all events. If you don't, you can get cases where all, say, 6 threads each see an ERROR event; and you will get six emails.
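For example, assuming the pipeline above is saved as /etc/logstash/conf.d/metrics.conf (the path is illustrative), Logstash can be started with a single pipeline worker like this:
bin/logstash -w 1 -f /etc/logstash/conf.d/metrics.conf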
Your config could use some updates. It's recommended to add a tag or something to the metrics {} filter.
metrics {
  meter => [ "log-level" ]
  flush_interval => 5
  clear_interval => 5
  add_tag => "error_metric"
}
This way, you can better filter your email segment.
output {
  if "error_metric" in [tags] and [log-level][count] > 1 {
    email {
      # email options go here, as in the original output block
    }
  }
}
This is because the metrics {} filter creates a new event when it flushes, rather than amending an existing one. You need to catch the new event with your filters.
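Putting the pieces together, a minimal sketch of the full filter and output sections under these assumptions (the email settings are placeholders taken from the question, and the exact field layout of the flushed metrics event may differ between Logstash versions):
filter {
  grok {
    match => { "message" => "\[%{TIMESTAMP_ISO8601:timestamp}\]\[%{LOGLEVEL:log-level}\s*\]" }
  }
  if [log-level] == "ERROR" {
    metrics {
      meter => [ "log-level" ]
      flush_interval => 5
      clear_interval => 5
      add_tag => "error_metric"
    }
  }
}
output {
  # Only the periodic event emitted by metrics {} carries the tag and the counters
  if "error_metric" in [tags] and [log-level][count] > 1 {
    email {
      address => "mail.abc.com"
      port => 25
      from => "alerts@logstash.com"
      to => "siya@abc.com"
      subject => "logstash alert"
      body => "ERROR count in the last interval: %{[log-level][count]}"
    }
  }
}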

How to call another filter from within a ruby filter in logstash.

I'm building out logstash and would like to build functionality to anonymize fields as specified in the message.
Given the message below, the field fta is a list of fields to anonymize. I would like to just use %{fta} and pass it through to the anonymize filter, but that doesn't seem to work.
{ "containsPII":"True", "fta":["f1","f2"], "f1":"test", "f2":"5551212" }
My config is as follows
input {
  stdin { codec => json }
}
filter {
  if [containsPII] {
    anonymize {
      algorithm => "SHA1"
      key => "123456789"
      fields => %{fta}
    }
  }
}
output {
stdout {
codec => rubydebug
}
}
The output is:
{
  "containsPII" => "True",
  "fta" => [
    [0] "f1",
    [1] "f2"
  ],
  "f1" => "test",
  "f2" => "5551212",
  "@version" => "1",
  "@timestamp" => "2016-07-13T22:07:04.036Z",
  "host" => "..."
}
Does anyone have any thoughts? I have tried several permutations at this point with no luck.
Thanks,
-D
EDIT:
After posting in the Elastic forums, I found out that this is not possible using base Logstash functionality. I will try using the ruby filter instead. So, to amend my question: how do I call another filter from within the ruby filter? I tried the following with no luck and honestly can't even figure out where to look. I'm very new to Ruby.
filter {
  if [containsPII] {
    ruby {
      code => "event['fta'].each { |item| event[item] = LogStash::Filters::Anonymize.execute(event[item],'12345','SHA1') }"
      add_tag => ["Rubyrun"]
    }
  }
}
You can execute filters from a Ruby script. The steps are:
Create the required filter instance in the init block of the inline Ruby script.
For every event, call the filter method of that filter instance.
The following is an example for the above problem statement. It replaces the my_ip field in the event with its SHA1.
The same can be achieved using a Ruby script file.
Following is the sample config file:
input { stdin { codec => json_lines } }
filter {
  ruby {
    init => "
      require 'logstash/filters/anonymize'
      # Create an instance of the filter with the applicable parameters
      @anonymize = LogStash::Filters::Anonymize.new({'algorithm' => 'SHA1',
                                                     'key' => '123456789',
                                                     'fields' => ['my_ip']})
      # Make sure to call register
      @anonymize.register
    "
    code => "
      # Invoke the filter
      @anonymize.filter(event)
    "
  }
}
output { stdout { codec => rubydebug { metadata => true } } }
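A quick way to try this sketch, assuming it is saved as anonymize.conf (the file name is illustrative), is to pipe a JSON line into Logstash on stdin:
echo '{"my_ip":"10.0.0.1"}' | bin/logstash -f anonymize.conf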
Well, I wasn't able to figure out how to call another filter from within a ruby filter, but I did get to the functional goal.
filter {
  if [fta] {
    ruby {
      init => "require 'openssl'"
      code => "event['fta'].each { |item| event[item] = OpenSSL::HMAC.hexdigest(OpenSSL::Digest::SHA256.new, '123456789', event[item]) }"
    }
  }
}
If the field fta exists, this hashes (HMAC-SHA256) each of the fields listed in that array.
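Note that the event['field'] style only works with the older (2.x) event API; on Logstash 5.x and later, a Ruby filter has to go through the event getter/setter API, roughly like this:
filter {
  if [fta] {
    ruby {
      init => "require 'openssl'"
      code => "
        # Same HMAC-SHA256 hashing, using event.get/event.set instead of event[]
        event.get('fta').each do |item|
          event.set(item, OpenSSL::HMAC.hexdigest('SHA256', '123456789', event.get(item)))
        end
      "
    }
  }
}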

Avoid collision of 'type' key when both event and logstash input provide values

I am trying to write a pipeline to load a file into Logstash. My setup requires specifying the type field in the input section in order to run multiple independent Logstash config files with input, filter and output. Unfortunately the source data already contains a field named type, and it looks like the value from the source data conflicts with the value provided by the input configuration.
The source data contains a JSON array like the following:
[
  {"key1":"obj1", "type":"looks like a bad choose for a key name"},
  {"key1":"obj2", "type":"you can say that again"}
]
My pipeline looks like the following
input {
  exec {
    command => "cat /path/file_containing_above_json_array.txt"
    codec => "json"
    type => "typeSpecifiedInInput"
    interval => 3600
  }
}
output {
  if [type] == "typeSpecifiedInInput" {
    stdout {
      codec => rubydebug
    }
  }
}
The output never gets called because type has been set to the value provided from the source data instead of the value provided from the input section.
How can I set up the input pipeline to avoid this conflict?
Nathan
Create a new field in your input instead of reusing 'type'. The exec{} input has add_field available.
Below is the final pipeline that uses add_field instead of type. A filter phase was added to clean up the document so that the type field contains the expected value needed for writing into Elasticsearch (the class of similar documents). The type value from the original JSON document is preserved in the key typeSpecifiedFromDoc. The mutate step had to be broken into separate phases so that the replace would not affect type before its original value had been added as the new field typeSpecifiedFromDoc.
input {
  exec {
    command => "cat /path/file_containing_above_json_array.txt"
    codec => "json"
    add_field => ["msgType", "typeSpecifiedInInput"]
    interval => 3600
  }
}
filter {
  if [msgType] == "typeSpecifiedInInput" {
    mutate {
      add_field => ["typeSpecifiedFromDoc", "%{type}"]
    }
    mutate {
      replace => ["type", "%{msgType}"]
      remove_field => ["msgType"]
    }
  }
}
output {
  if [type] == "typeSpecifiedInInput" {
    stdout {
      codec => rubydebug
    }
  }
}
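A variation on the same idea, if the marker field is only needed for routing and should not end up in the stored document, is to put it under [@metadata]: metadata fields can be used in filters and conditionals but are not sent to outputs, so no cleanup mutate is needed. A rough sketch under that assumption:
input {
  exec {
    command => "cat /path/file_containing_above_json_array.txt"
    codec => "json"
    add_field => { "[@metadata][msgType]" => "typeSpecifiedInInput" }
    interval => 3600
  }
}
output {
  if [@metadata][msgType] == "typeSpecifiedInInput" {
    stdout {
      # rubydebug hides [@metadata] unless metadata => true is set
      codec => rubydebug { metadata => true }
    }
  }
}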

Extracting fields from paths in logstash

I am configuring logstash to collect logs from multiple workers on multiple hosts. I'm currently adding fields for host:
input {
  file {
    path => "/data/logs/box-1/worker-*.log"
    add_field => {
      "original_host" => "box-1"
    }
  }
  file {
    path => "/data/logs/box-2/worker-*.log"
    add_field => {
      "original_host" => "box-2"
    }
  }
}
However, I'd also like to add a field {'worker': 'A'} and so on. I have lots of workers, so I don't want to write a file { ... } block for every combination of host and worker.
Do I have any alternatives?
You should be able to do a path => "/data/logs/*/worker-*.log" and then add a grok filter to pull out what you need.
filter { grok { match => [ "path", "/(?<original_host>[^/]+)/worker-(?<worker>.*).log" ] } }
or something very close to that. You might want to surround it with if [path] =~ /worker/, depending on what else you have in your config file.
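Putting that together, a rough sketch of the combined input and filter (untested; note that on newer Logstash versions with ECS compatibility enabled the file input records the path under [log][file][path] instead of [path]):
input {
  file {
    path => "/data/logs/*/worker-*.log"
  }
}
filter {
  if [path] =~ /worker/ {
    grok {
      # Pull the host directory and worker name out of the file path,
      # e.g. /data/logs/box-1/worker-A.log -> original_host "box-1", worker "A"
      match => [ "path", "/(?<original_host>[^/]+)/worker-(?<worker>.*)\.log" ]
    }
  }
}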
