File input add_field not adding field to every row

File input add_field not adding field to every row - logstash

I am parsing several logfiles of different load balanced serverclusters with my logstash config and would like to add a field "log_origin" to each file's entries for the later easy filtering.
Here's my input->file config in a simple example:
input {
file {
type => "node1"
path => "C:/Development/node1/log/*"
add_field => [ "log_origin", "live_logs" ]
}
file {
type => "node2"
path => "C:/Development/node2/log/*"
add_field => [ "log_origin", "live_logs" ]
}
file {
type => "node3"
path => "C:/Development/node1/log/*"
add_field => [ "log_origin", "live_logs" ]
}
file {
type => "node4"
path => "C:/Development/node1/log/*"
add_field => [ "log_origin", "live_logs" ]
}
}
filter {
grok {
match => [
"message","%{DATESTAMP:log_timestamp}%{SPACE}\[%{DATA:class}\]%{SPACE}%{LOGLEVEL:loglevel}%{SPACE}%{GREEDYDATA:log_message}"
]
}
date {
match => [ "log_timestamp", "dd.MM.YY HH:mm:ss", "ISO8601" ]
target => "#timestamp"
}
mutate {
lowercase => ["loglevel"]
strip => ["loglevel"]
}
if "_grokparsefailure" in [tags] {
multiline {
pattern => ".*"
what => "previous"
}
}
if[fields.log_origin] == "live_logs"{
if [type] == "node1" {
mutate {
add_tag => "realsServerName1"
}
}
if [type] == "node2" {
mutate {
add_tag => "realsServerName2"
}
}
if [type] == "node3" {
mutate {
add_tag => "realsServerName3"
}
}
if [type] == "node4" {
mutate {
add_tag => "realsServerName4"
}
}
}
}
output {
stdout { }
elasticsearch { embedded => true }
}
I would have expected logstash to add this field with the value given to every logentry it finds, but it doesn't. Maybe I am completely taking the wrong approach here?
Edit: I am not able to retrieve the logs directly from the nodes, but have to copy them over to my "server". Otherwise i would be able to just use the filepath for distinguishing different clusters...
Edit: It's working. I should have cleand my data in between. Old entries without the field added cluttered up my results.

The add_field expects a hash. It should be
add_field => {
"log_origin" => "live_logs"
}

Related

Using if-else with Logstash split

I have a string field called description delimited with _.
I split it as follows:
filter {
mutate {
split => ["description", "_"]
add_field => {"location" => "%{[description][3]}"}
}
How can I check if the split values are empty or not?
I have attempted:
if !["%{[description][3]}"] {
# do something
}
if ![[description][3]] {
# do something
}
if ![description][3] {
# do something
}
None of them work.
The goal is to have the value of the new field location as its actual value or a generic value such as NA.

you made a really simple mistake with your mutate split.
this
mutate {
split => ["description", "_"]
add_field => {"location" => "%{[description][3]}"}
}
should have been
mutate {
split => ["description"=> "_"] <=== see I removed the comma and added =>
add_field => {"location" => "%{[description][3]}"}
}
here is sample I tested out with
filter {
mutate {
remove_field => ["headers", "#version"]
add_field => { "description" => "Python_Java_ruby_perl " }
}
mutate {
split => {"description" => "_"}
}
if [description][4] {
mutate {
add_field => {"result" => "The 4 th field exists"}
}
} else {
mutate {
add_field => {"result" => "The 4 th field DOES NOT exists"}
}
}
and the result on console (since there is no 4 th element, it went to else block
{
"host" => "0:0:0:0:0:0:0:1",
"result" => "The 4 th field DOES NOT exists", <==== from else block
"#timestamp" => 2020-01-14T19:35:41.013Z,
"message" => "hello",
"description" => [
[0] "Python",
[1] "Java",
[2] "ruby",
[3] "perl "
]
}

Logs are overwritten in the specified index under the same _id

I'm using filebeat - 6.5.1, Logstash - 6.5.1 and elasticsearch - 6.5.1
I'm using multiple GROK in the single config file and trying to send the logs into Elasticsearch
Below is my Filebeat.yml
filebeat.prospectors:
type: log
paths:
var/log/message
fields:
type: apache_access
tags: ["ApacheAccessLogs"]
type: log
paths:
var/log/indicate
fields:
type: apache_error
tags: ["ApacheErrorLogs"]
type: log
paths:
var/log/panda
fields:
type: mysql_error
tags: ["MysqlErrorLogs"]
output.logstash:
The Logstash hosts
hosts: ["logstash:5044"]
Below is my logstash config file -
input {
beats {
port => 5044
tags => [ "ApacheAccessLogs", "ApacheErrorLogs", "MysqlErrorLogs" ]
}
}
filter {
if "ApacheAccessLogs" in [tags] {
grok {
match => [
"message" , "%{COMBINEDAPACHELOG}+%{GREEDYDATA:extra_fields}",
"message" , "%{COMMONAPACHELOG}+%{GREEDYDATA:extra_fields}"
]
overwrite => [ "message" ]
}
mutate {
convert => ["response", "integer"]
convert => ["bytes", "integer"]
convert => ["responsetime", "float"]
}
geoip {
source => "clientip"
target => "geoip"
add_tag => [ "apache-geoip" ]
}
date {
match => [ "timestamp" , "dd/MMM/YYYY:HH:mm:ss Z" ]
remove_field => [ "timestamp" ]
}
useragent {
source => "agent"
}
}
if "ApacheErrorLogs" in [tags] {
grok {
match => { "message" => ["[%{APACHE_TIME:[apache2][error][timestamp]}] [%{LOGLEVEL:[apache2][error][level]}]( [client %{IPORHOST:[apache2][error][client]}])? %{GREEDYDATA:[apache2][error][message]}",
"[%{APACHE_TIME:[apache2][error][timestamp]}] [%{DATA:[apache2][error][module]}:%{LOGLEVEL:[apache2][error][level]}] [pid %{NUMBER:[apache2][error][pid]}(:tid %{NUMBER:[apache2][error][tid]})?]( [client %{IPORHOST:[apache2][error][client]}])? %{GREEDYDATA:[apache2][error][message1]}" ] }
pattern_definitions => {
"APACHE_TIME" => "%{DAY} %{MONTH} %{MONTHDAY} %{TIME} %{YEAR}"
}
remove_field => "message"
}
mutate {
rename => { "[apache2][error][message1]" => "[apache2][error][message]" }
}
date {
match => [ "[apache2][error][timestamp]", "EEE MMM dd H:m:s YYYY", "EEE MMM dd H:m:s.SSSSSS YYYY" ]
remove_field => "[apache2][error][timestamp]"
}
}
if "MysqlErrorLogs" in [tags] {
grok {
match => { "message" => ["%{LOCALDATETIME:[mysql][error][timestamp]} ([%{DATA:[mysql][error][level]}] )?%{GREEDYDATA:[mysql][error][message]}",
"%{TIMESTAMP_ISO8601:[mysql][error][timestamp]} %{NUMBER:[mysql][error][thread_id]} [%{DATA:[mysql][error][level]}] %{GREEDYDATA:[mysql][error][message1]}",
"%{GREEDYDATA:[mysql][error][message2]}"] }
pattern_definitions => {
"LOCALDATETIME" => "[0-9]+ %{TIME}"
}
remove_field => "message"
}
mutate {
rename => { "[mysql][error][message1]" => "[mysql][error][message]" }
}
mutate {
rename => { "[mysql][error][message2]" => "[mysql][error][message]" }
}
date {
match => [ "[mysql][error][timestamp]", "ISO8601", "YYMMdd H:m:s" ]
remove_field => "[apache2][access][time]"
}
}
}
output {
if "ApacheAccessLogs" in [tags] {
elasticsearch { hosts => ["elasticsearch:9200"]
index => "apache"
document_id => "apacheaccess"
}
}
if "ApacheErrorLogs" in [tags] {
elasticsearch { hosts => ["elasticsearch:9200"]
index => "apache"
document_id => "apacheerror"
}
}
if "MysqlErrorLogs" in [tags] {
elasticsearch { hosts => ["elasticsearch:9200"]
index => "apache"
document_id => "sqlerror"
}
}
stdout { codec => rubydebug }
}
The data is sent to elastic search but only 3 records are getting created for each document_id in the same index.
Only 3 records are created and every new logs incoming are overwritten onto the same document_id and the old one is lost.
Can you guys please help me out?

The definition of document_id is to provide an unique document id for an event. In your case, as they are static (apacheaccess, apacheerror, sqlerror), there will be only 1 event per index ingested into elasticsearch, overide by the newest event.
As you have 3 distinct data type, what you seems to be looking for provide for each event type (ApacheAccessLogs, ApacheErrorLogs, MysqlErrorLogs) a different index, as following :
output {
if "ApacheAccessLogs" in [tags] {
elasticsearch {
hosts => ["elasticsearch:9200"]
index => "apache-access"
}
}
if "ApacheErrorLogs" in [tags] {
elasticsearch {
hosts => ["elasticsearch:9200"]
index => "apache-error"
}
}
if "MysqlErrorLogs" in [tags] {
elasticsearch {
hosts => ["elasticsearch:9200"]
index => "mysql-error"
}
}
stdout {
codec => rubydebug
}
}
There are not many cases where you need to set the id manually (eg. in case of reingest of data), as Logstash & Elasticsearch will manage that by themself.
But if that's the case, and you can't use a field to identify each event individually, you could use the logstash-filter-fingerprint, that is made for that.

Grok help for a custom metric

I have a log line like this:
09 Nov 2018 15:51:35 DEBUG api.MapAnythingProvider - Calling API For Client: XXX Number of ELEMENTS Requested YYY
I want to ignore all other log lines and only want those lines that have the words "Calling API For Client" in it. Further, I am only interested in the String XXX and Number YYY.
Thanks for the help.

input {
file {
path => ["C:/apache-tomcat-9.0.7/logs/service/service.log"]
sincedb_path => "nul"
start_position => "beginning"
}
}
filter {
grok {
match => {
"message" => "%{MONTHDAY:monthDay} %{MONTH:mon} %{YEAR:year} %{TIME:ts} %{WORD:severity} %{JAVACLASS:claz} - %{GREEDYDATA:logmessage}"
}
}
grok {
match => {
"logmessage" => "%{WORD:keyword} %{WORD:customer} %{WORD:key2} %{NUMBER:mapAnythingCreditsConsumed:float} %{WORD:key3} %{NUMBER:elementsFromCache:int}"
}
}
if "_grokparsefailure" in [tags] {
drop {}
}
mutate {
remove_field => [ "monthDay", "mon", "ts", "severity", "claz", "keyword", "key2", "path", "message", "year", "key3" ]
}
}
output {
if [logmessage] =~ /ExecutingJobFor/ {
elasticsearch {
hosts => ["localhost:9200"]
index => "test"
manage_template => false
}
stdout {
codec => rubydebug
}
}
}

logstash-output-file-as time based

My output of logstash directed to the file called apache.log.
This file needs to be generated in every hour.
For Example: apache-2018-04-16-10:00.log or something similar to this.
Here my configuration file :
# INPUT HERE
input {
beats {
port => 5044
}
}
# FILTER HERE
filter {
if [source]=="/var/log/apache2/error.log"
{
mutate {
remove_tag => [ "beats_input_codec_plain_applied" ]
add_tag => [ "apache_logs" ]
}
}
if [source]=="/var/log/apache2/access.log"
{
mutate {
remove_tag => [ "beats_input_codec_plain_applied" ]
add_tag => [ "apache_logs" ]
}
}
}
# OUTPUT HERE
output {
if "apache_logs" in [tags] {
file {
path => "/home/ubuntu/apache/apache-%{+yyyy-mm-dd}.log"
codec => "json"
}
}
}
Please help out to solve.

From the joda-time documentation (http://www.joda.org/joda-time/key_format.html), you have H hour of day (0~23). So output configuration to solve your problem would be:
output {
if "apache_logs" in [tags] {
file {
path => "/home/ubuntu/apache/apache-%{+yyyy-mm-dd-HH}.log"
codec => "json"
}
}
}

Logstash compare a field to a number

I'm searching a way to compare a Logstash field to a number in a conditional statement, but couldn't find anything in the documentation.
Something like this for example:
if [myfiels] => 1{
mutate {
add_field => ["fild", "1"]
}
or
if [myfiels] >= 1 and [myfiels] <= 3 {
mutate {
add_field => ["fild", "2"]
}
Thanks.

You first need to convert column type.
input {
stdin{}
}
filter {
grok {
match => ["message","%{NUMBER:num}" ]
}
mutate {
convert => { "num" => "integer" }
}
if [num] >= 5 {
mutate {
add_field => { "xyz" => "123" }
}
}
}
output {
stdout { codec => rubydebug }
}

Develop Reference

node.js excel linux python-3.x azure haskell apache-spark rust .htaccess string

File input add_field not adding field to every row - logstash

The add_field expects a hash. It should be add_field => { "log_origin" => "live_logs" }

Related

Using if-else with Logstash split

Logs are overwritten in the specified index under the same _id

Grok help for a custom metric

logstash-output-file-as time based

Logstash compare a field to a number

Categories

Resources