Extracting fields from paths in logstash

I am configuring Logstash to collect logs from multiple workers on multiple hosts. I'm currently adding a field for the host:
input {
  file {
    path => "/data/logs/box-1/worker-*.log"
    add_field => {
      "original_host" => "box-1"
    }
  }
  file {
    path => "/data/logs/box-2/worker-*.log"
    add_field => {
      "original_host" => "box-2"
    }
  }
}
However, I'd also like to add a field {'worker': 'A'} and so on. I have lots of workers, so I don't want to write a file { ... } block for every combination of host and worker.
Do I have any alternatives?

You should be able to do a path => "/data/logs/*/worker-*.log" and then add a grok filter to pull out what you need.
filter { grok { match => [ "path", "/(?<original_host>[^/]+)/worker-(?<worker>.*)\.log" ] } }
or something very close to that. You might want to surround it with if [path] =~ /worker/, depending on what else you have in your config file.
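As a minimal sketch (not a drop-in config), here is how the wildcard path and the conditional grok could fit together; it assumes, as in the question's paths, that the host name is the directory directly containing the worker logs:
input {
  file {
    path => "/data/logs/*/worker-*.log"
  }
}
filter {
  if [path] =~ /worker/ {
    grok {
      match => { "path" => "/(?<original_host>[^/]+)/worker-(?<worker>[^/]+)\.log" }
    }
  }
}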

Related

How to create multiple indexes in logstash.conf file with common host

I am pretty new to logstash.
In our application we are creating multiple indexes; from the thread below I could understand how to resolve that:
How to create multiple indexes in logstash.conf file?
but that results in many duplicate lines in the conf file (for host, ssl, etc.), so I wanted to check if there is a better way of doing it.
output {
  stdout { codec => rubydebug }
  if [type] == "trial" {
    elasticsearch {
      hosts => "localhost:9200"
      index => "trial_indexer"
    }
  } else {
    elasticsearch {
      hosts => "localhost:9200"
      index => "movie_indexer"
    }
  }
}
Instead of the above config, can I have something like the below?
output {
  stdout { codec => rubydebug }
  elasticsearch {
    hosts => "localhost:9200"
  }
  if [type] == "trial" {
    elasticsearch {
      index => "trial_indexer"
    }
  } else {
    elasticsearch {
      index => "movie_indexer"
    }
  }
}
What you are looking for is using environment variables in the Logstash pipeline. You define these once and can reuse them for the otherwise redundant values you mentioned, such as host, SSL, etc.
For more information, see Logstash Use Environment Variables.
e.g.,
output {
  elasticsearch {
    hosts => "${ES_HOST}"
    index => "%{type}-indexer"
  }
}
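A slightly fuller sketch, assuming you also want a fallback when the variable is unset; the ${ES_HOST:localhost:9200} form uses Logstash's documented default-value syntax, and the index name follows the answer above:
output {
  stdout { codec => rubydebug }
  elasticsearch {
    hosts => ["${ES_HOST:localhost:9200}"]
    index => "%{type}-indexer"
  }
}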
Let me know if that helps.

Logstash: Dynamic field names based on filename

I have filenames in the format <key>:<value>-<key>:<value>.log, e.g. pr:64-author:mxinden-platform:aws.log, containing the logs of a test run.
I want to stream each line of the file to Elasticsearch via Logstash. Each line should be treated as a separate document, and each document should get its fields from the filename. So, for the above example, a log line such as 17-12-07 foo something happened bar would get the fields pr with value 64, author with value mxinden, and platform with value aws.
At the time I write the Logstash configuration, I do not know the names of the fields.
How do I dynamically add fields to each line based on the fields contained in the filename?
The static approach so far is:
filter {
  mutate { add_field => { "file" => "%{[@metadata][s3][key]}" } }

  grok { match => { "file" => "pr:%{NUMBER:pr}-" } }
  grok { match => { "file" => "author:%{USERNAME:author}-" } }
  grok { match => { "file" => "platform:%{USERNAME:platform}-" } }
}
Changes to the filename structure are fine.
Answering my own question based on @dan-griffiths' comment:
The solution, for a file renamed to something like pr=64,author=mxinden,platform=aws.log, is to use the Logstash kv filter, e.g.:
filter {
  kv {
    source => "file"
    field_split => ","
  }
}
where file is a field extracted from the filename via the AWS S3 input plugin.
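A fuller sketch under the same assumptions (the S3 object key is available in [@metadata][s3][key] and the filename uses = and , as separators; value_split defaults to = but is spelled out here for clarity):
filter {
  mutate { add_field => { "file" => "%{[@metadata][s3][key]}" } }
  kv {
    source      => "file"
    field_split => ","
    value_split => "="
  }
}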

Multiple identical messages with logstash/kibana

I'm running an ELK stack on my local filesystem. I have the following configuration file set up:
input {
  file {
    path => "/var/log/rfc5424"
    type => "RFC"
  }
}
filter {
  grok {
    match => { "message" => "%{SYSLOG5424LINE}" }
  }
}
output {
  elasticsearch {
    hosts => ["localhost:9200"]
  }
}
I have a kibana instance running as well. I write a line to /var/log/rfc5424:
$ echo '<11>1' "$(date +'%Y-%m-%dT%H:%M:%SZ')" 'test-machine test-tag f81d4fae-7dec-11d0-a765-00a0c91e6bf6 log [nsId orgID="12 \"hey\" 345" projectID="2345[hehe]6"] this is a test message' >> /var/log/rfc5424
And it shows up in Kibana. Great! However, weirdly, it shows up six times.
As far as I can tell, everything about these messages is identical, and I only have one instance of Logstash/Kibana running, so I have no idea what could be causing this duplication.
Check whether there is a .swp or .tmp file for your configuration under the conf directory; when Logstash is pointed at a config directory it loads every file in it, so a leftover editor backup of the config will apply the same output more than once.
Alternatively, add a document id to the documents, so that repeated copies overwrite the same document instead of being indexed again:
output {
  elasticsearch {
    hosts => ["localhost:9200"]
    document_id => "%{uuid_field}"
  }
}
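If the events don't already carry a unique field like uuid_field, one common approach (an assumption on my part, not something from the original answer) is to derive a stable id with the fingerprint filter and use that as the document id, so re-ingested duplicates overwrite the same document:
filter {
  fingerprint {
    source => ["message"]
    target => "[@metadata][fingerprint]"
    method => "SHA1"
    key    => "dedup-key"
  }
}
output {
  elasticsearch {
    hosts       => ["localhost:9200"]
    document_id => "%{[@metadata][fingerprint]}"
  }
}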

Logstash Grok Filter key/value pairs

I'm working on getting our ESET log files (JSON format) into Elasticsearch. I'm shipping logs to our syslog server (syslog-ng), then to Logstash, and then to Elasticsearch. Everything is flowing as it should. My problem is in trying to process the logs in Logstash: I cannot seem to separate the key/value pairs into separate fields.
Here's a sample log entry:
Jul 8 11:54:29 192.168.1.144 1 2016-07-08T15:55:09.629Z era.somecompany.local ERAServer 1755 Syslog {"event_type":"Threat_Event","ipv4":"192.168.1.118","source_uuid":"7ecab29a-7db3-4c79-96f5-3946de54cbbf","occured":"08-Jul-2016 15:54:54","severity":"Warning","threat_type":"trojan","threat_name":"HTML/Agent.V","scanner_id":"HTTP filter","scan_id":"virlog.dat","engine_version":"13773 (20160708)","object_type":"file","object_uri":"http://malware.wicar.org/data/java_jre17_exec.html","action_taken":"connection terminated","threat_handled":true,"need_restart":false,"username":"BATHSAVER\\sickes","processname":"C:\\Program Files (x86)\\Google\\Chrome\\Application\\chrome.exe"}
Here is my logstash conf:
input {
  udp {
    type => "esetlog"
    port => 5515
  }
  tcp {
    type => "esetlog"
    port => 5515
  }
}
filter {
  if [type] == "esetlog" {
    grok {
      match => { "message" => "%{DATA:timestamp}\ %{IPV4:clientip}\ <%{POSINT:num1}>%{POSINT:num2}\ %{DATA:syslogtimestamp}\ %{HOSTNAME}\ %{IPORHOST}\ %{POSINT:syslog_pid}\ %{DATA:type}\ %{GREEDYDATA:msg}" }
    }
    kv {
      source => "msg"
      value_split => ":"
      target => "kv"
    }
  }
}
output {
  elasticsearch {
    hosts => ['192.168.1.116:9200']
    index => "eset-%{+YYYY.MM.dd}"
  }
}
When the data is displayed in Kibana, everything other than the date and time is lumped together in the "message" field, with no separate key/value pairs.
I've been reading and searching for a week now. I've done similar things with other log files with no problems at all, so I'm not sure what I'm missing. Any help/suggestions are greatly appreciated.
Can you try the Logstash configuration below?
grok {
  match => {
    "message" => ["%{CISCOTIMESTAMP:timestamp} %{IPV4:clientip} %{POSINT:num1} %{TIMESTAMP_ISO8601:syslogtimestamp} %{USERNAME:hostname} %{USERNAME:iporhost} %{NUMBER:syslog_pid} Syslog %{GREEDYDATA:msg}"]
  }
}
json {
  source => "msg"
}
It's working, and I tested it at http://grokconstructor.appspot.com/do/match#result
Regards.
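For completeness, here is a sketch (not part of the original answer) of how those snippets could slot into the question's filter block, reusing the question's type guard and dropping the raw msg field once its JSON payload has been parsed into top-level fields:
filter {
  if [type] == "esetlog" {
    grok {
      match => {
        "message" => ["%{CISCOTIMESTAMP:timestamp} %{IPV4:clientip} %{POSINT:num1} %{TIMESTAMP_ISO8601:syslogtimestamp} %{USERNAME:hostname} %{USERNAME:iporhost} %{NUMBER:syslog_pid} Syslog %{GREEDYDATA:msg}"]
      }
    }
    json {
      source => "msg"
      remove_field => ["msg"]
    }
  }
}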

How to call another filter from within a ruby filter in logstash.

I'm building out Logstash and would like to add functionality to anonymize fields as specified in the message.
Given the message below, the field fta is a list of fields to anonymize. I would like to just use %{fta} and pass it through to the anonymize filter, but that doesn't seem to work.
{ "containsPII":"True", "fta":["f1","f2"], "f1":"test", "f2":"5551212" }
My config is as follows
input {
  stdin { codec => json }
}
filter {
  if [containsPII] {
    anonymize {
      algorithm => "SHA1"
      key => "123456789"
      fields => %{fta}
    }
  }
}
output {
  stdout {
    codec => rubydebug
  }
}
The output is
{
  "containsPII" => "True",
  "fta" => [
    [0] "f1",
    [1] "f2"
  ],
  "f1" => "test",
  "f2" => "5551212",
  "@version" => "1",
  "@timestamp" => "2016-07-13T22:07:04.036Z",
  "host" => "..."
}
Does anyone have any thoughts? I have tried several permutations at this point with no luck.
Thanks,
-D
EDIT:
After posting in the Elastic forums, I found out that this is not possible using base Logstash functionality. I will try using the ruby filter instead. So, to amend my question: how do I call another filter from within the ruby filter? I tried the following with no luck and honestly can't even figure out where to look. I'm very new to Ruby.
filter {
  if [containsPII] {
    ruby {
      code => "event['fta'].each { |item| event[item] = LogStash::Filters::Anonymize.execute(event[item],'12345','SHA1') }"
      add_tag => ["Rubyrun"]
    }
  }
}
You can execute filters from a Ruby script. The steps are:
Create the required filter instance in the init block of the inline Ruby script.
For every event, call the filter method of that filter instance.
The following is an example for the above problem statement. It will replace the my_ip field in the event with its SHA1 hash.
The same can be achieved using a Ruby script file.
The following is the sample config file.
input { stdin { codec => json_lines } }
filter {
  ruby {
    init => "
      require 'logstash/filters/anonymize'

      # Create an instance of the filter with the applicable parameters
      @anonymize = LogStash::Filters::Anonymize.new({'algorithm' => 'SHA1',
                                                     'key' => '123456789',
                                                     'fields' => ['my_ip']})
      # Make sure to call register
      @anonymize.register
    "
    code => "
      # Invoke the filter for every event
      @anonymize.filter(event)
    "
  }
}
output { stdout { codec => rubydebug {metadata => true} } }
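To try it out, you could feed a JSON line like the following on stdin; my_ip is simply the field this example anonymizes, and the value is made up:
{"my_ip":"10.1.2.3","other":"left alone"}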
Well, I wasn't able to figure out how to call another filter from within a ruby filter, but I did get to the functional goal.
filter {
  if [fta] {
    ruby {
      init => "require 'openssl'"
      code => "event['fta'].each { |item| event[item] = OpenSSL::HMAC.hexdigest(OpenSSL::Digest::SHA256.new, '123456789', event[item] ) }"
    }
  }
}
If the field fta exists, this replaces each of the fields listed in that array with its HMAC-SHA256 hex digest.
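As a side note, on Logstash 5.x and later the direct event['field'] access used above was replaced by the event API, so the same idea would look roughly like this (a sketch, using the same hard-coded key as above):
filter {
  if [fta] {
    ruby {
      init => "require 'openssl'"
      code => "
        # Replace each field named in the fta array with its HMAC-SHA256 hex digest (event API)
        event.get('fta').each do |item|
          event.set(item, OpenSSL::HMAC.hexdigest(OpenSSL::Digest::SHA256.new, '123456789', event.get(item)))
        end
      "
    }
  }
}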
