Filtering messages out of logstash.conf

Filtering messages out of logstash.conf - logstash

My logstash.conf can be seen below.
How would I go about filtering out messages which include a specific string? In this case, some of my messages are reading as "DEBUG: xxx-xxx-xxx", and I would like for these to be filtered OUT of logstash.
input {
tcp {
port => 5000
type => syslog
}
udp {
port => 5000
type => syslog
}
}
filter {
if [loglevel] == "debug" {
drop { }
}
if [type] == "syslog" {
grok {
match => {
"message" => "%{SYSLOG5424PRI}%{NONNEGINT:ver} +(?:% {TIMESTAMP_ISO8601:ts}|-) +(?:%{HOSTNAME:containerid}|-) +(?:%{NOTSPACE:containername}|-)
+(?:%{NOTSPACE:proc}|-) +(?:%{WORD:msgid}|-) +(?:%{SYSLOG5424SD:sd}|-|) +%{GREEDYDATA:msg}" }
}
syslog_pri { }
date {
match => [ "syslog_timestamp", "MMM d HH:mm:ss", "MMM dd HH:mm:ss" ]
}
if !("_grokparsefailure" in [tags]) {
mutate {
replace => [ "#source_host", "%{syslog_hostname}" ]
replace => [ "#message", "%{syslog_message}" ]
}
}
mutate {
remove_field => [ "syslog_hostname", "syslog_message", "syslog_timestamp" ]
}
}
}
output {
elasticsearch { host => "elasticsearch" }
stdout { codec => rubydebug }
}
EDIT:
I should clarify that it is the GREEDYDATA:msg field that I wish to drop if it includes a "DEBUG" message.

You're looking for drop:
filter {
if [myField] == "badValue" {
drop { }
}
}
While it's better to use exact matches, you can also do regexp conditionals:
filter {
if [myField] =~ "DEBUG" {
drop { }
}
}

Related

Logs are overwritten in the specified index under the same _id

I'm using filebeat - 6.5.1, Logstash - 6.5.1 and elasticsearch - 6.5.1
I'm using multiple GROK in the single config file and trying to send the logs into Elasticsearch
Below is my Filebeat.yml
filebeat.prospectors:
type: log
paths:
var/log/message
fields:
type: apache_access
tags: ["ApacheAccessLogs"]
type: log
paths:
var/log/indicate
fields:
type: apache_error
tags: ["ApacheErrorLogs"]
type: log
paths:
var/log/panda
fields:
type: mysql_error
tags: ["MysqlErrorLogs"]
output.logstash:
The Logstash hosts
hosts: ["logstash:5044"]
Below is my logstash config file -
input {
beats {
port => 5044
tags => [ "ApacheAccessLogs", "ApacheErrorLogs", "MysqlErrorLogs" ]
}
}
filter {
if "ApacheAccessLogs" in [tags] {
grok {
match => [
"message" , "%{COMBINEDAPACHELOG}+%{GREEDYDATA:extra_fields}",
"message" , "%{COMMONAPACHELOG}+%{GREEDYDATA:extra_fields}"
]
overwrite => [ "message" ]
}
mutate {
convert => ["response", "integer"]
convert => ["bytes", "integer"]
convert => ["responsetime", "float"]
}
geoip {
source => "clientip"
target => "geoip"
add_tag => [ "apache-geoip" ]
}
date {
match => [ "timestamp" , "dd/MMM/YYYY:HH:mm:ss Z" ]
remove_field => [ "timestamp" ]
}
useragent {
source => "agent"
}
}
if "ApacheErrorLogs" in [tags] {
grok {
match => { "message" => ["[%{APACHE_TIME:[apache2][error][timestamp]}] [%{LOGLEVEL:[apache2][error][level]}]( [client %{IPORHOST:[apache2][error][client]}])? %{GREEDYDATA:[apache2][error][message]}",
"[%{APACHE_TIME:[apache2][error][timestamp]}] [%{DATA:[apache2][error][module]}:%{LOGLEVEL:[apache2][error][level]}] [pid %{NUMBER:[apache2][error][pid]}(:tid %{NUMBER:[apache2][error][tid]})?]( [client %{IPORHOST:[apache2][error][client]}])? %{GREEDYDATA:[apache2][error][message1]}" ] }
pattern_definitions => {
"APACHE_TIME" => "%{DAY} %{MONTH} %{MONTHDAY} %{TIME} %{YEAR}"
}
remove_field => "message"
}
mutate {
rename => { "[apache2][error][message1]" => "[apache2][error][message]" }
}
date {
match => [ "[apache2][error][timestamp]", "EEE MMM dd H:m:s YYYY", "EEE MMM dd H:m:s.SSSSSS YYYY" ]
remove_field => "[apache2][error][timestamp]"
}
}
if "MysqlErrorLogs" in [tags] {
grok {
match => { "message" => ["%{LOCALDATETIME:[mysql][error][timestamp]} ([%{DATA:[mysql][error][level]}] )?%{GREEDYDATA:[mysql][error][message]}",
"%{TIMESTAMP_ISO8601:[mysql][error][timestamp]} %{NUMBER:[mysql][error][thread_id]} [%{DATA:[mysql][error][level]}] %{GREEDYDATA:[mysql][error][message1]}",
"%{GREEDYDATA:[mysql][error][message2]}"] }
pattern_definitions => {
"LOCALDATETIME" => "[0-9]+ %{TIME}"
}
remove_field => "message"
}
mutate {
rename => { "[mysql][error][message1]" => "[mysql][error][message]" }
}
mutate {
rename => { "[mysql][error][message2]" => "[mysql][error][message]" }
}
date {
match => [ "[mysql][error][timestamp]", "ISO8601", "YYMMdd H:m:s" ]
remove_field => "[apache2][access][time]"
}
}
}
output {
if "ApacheAccessLogs" in [tags] {
elasticsearch { hosts => ["elasticsearch:9200"]
index => "apache"
document_id => "apacheaccess"
}
}
if "ApacheErrorLogs" in [tags] {
elasticsearch { hosts => ["elasticsearch:9200"]
index => "apache"
document_id => "apacheerror"
}
}
if "MysqlErrorLogs" in [tags] {
elasticsearch { hosts => ["elasticsearch:9200"]
index => "apache"
document_id => "sqlerror"
}
}
stdout { codec => rubydebug }
}
The data is sent to elastic search but only 3 records are getting created for each document_id in the same index.
Only 3 records are created and every new logs incoming are overwritten onto the same document_id and the old one is lost.
Can you guys please help me out?

The definition of document_id is to provide an unique document id for an event. In your case, as they are static (apacheaccess, apacheerror, sqlerror), there will be only 1 event per index ingested into elasticsearch, overide by the newest event.
As you have 3 distinct data type, what you seems to be looking for provide for each event type (ApacheAccessLogs, ApacheErrorLogs, MysqlErrorLogs) a different index, as following :
output {
if "ApacheAccessLogs" in [tags] {
elasticsearch {
hosts => ["elasticsearch:9200"]
index => "apache-access"
}
}
if "ApacheErrorLogs" in [tags] {
elasticsearch {
hosts => ["elasticsearch:9200"]
index => "apache-error"
}
}
if "MysqlErrorLogs" in [tags] {
elasticsearch {
hosts => ["elasticsearch:9200"]
index => "mysql-error"
}
}
stdout {
codec => rubydebug
}
}
There are not many cases where you need to set the id manually (eg. in case of reingest of data), as Logstash & Elasticsearch will manage that by themself.
But if that's the case, and you can't use a field to identify each event individually, you could use the logstash-filter-fingerprint, that is made for that.

Grok help for a custom metric

I have a log line like this:
09 Nov 2018 15:51:35 DEBUG api.MapAnythingProvider - Calling API For Client: XXX Number of ELEMENTS Requested YYY
I want to ignore all other log lines and only want those lines that have the words "Calling API For Client" in it. Further, I am only interested in the String XXX and Number YYY.
Thanks for the help.

input {
file {
path => ["C:/apache-tomcat-9.0.7/logs/service/service.log"]
sincedb_path => "nul"
start_position => "beginning"
}
}
filter {
grok {
match => {
"message" => "%{MONTHDAY:monthDay} %{MONTH:mon} %{YEAR:year} %{TIME:ts} %{WORD:severity} %{JAVACLASS:claz} - %{GREEDYDATA:logmessage}"
}
}
grok {
match => {
"logmessage" => "%{WORD:keyword} %{WORD:customer} %{WORD:key2} %{NUMBER:mapAnythingCreditsConsumed:float} %{WORD:key3} %{NUMBER:elementsFromCache:int}"
}
}
if "_grokparsefailure" in [tags] {
drop {}
}
mutate {
remove_field => [ "monthDay", "mon", "ts", "severity", "claz", "keyword", "key2", "path", "message", "year", "key3" ]
}
}
output {
if [logmessage] =~ /ExecutingJobFor/ {
elasticsearch {
hosts => ["localhost:9200"]
index => "test"
manage_template => false
}
stdout {
codec => rubydebug
}
}
}

logstash-output-file-as time based

My output of logstash directed to the file called apache.log.
This file needs to be generated in every hour.
For Example: apache-2018-04-16-10:00.log or something similar to this.
Here my configuration file :
# INPUT HERE
input {
beats {
port => 5044
}
}
# FILTER HERE
filter {
if [source]=="/var/log/apache2/error.log"
{
mutate {
remove_tag => [ "beats_input_codec_plain_applied" ]
add_tag => [ "apache_logs" ]
}
}
if [source]=="/var/log/apache2/access.log"
{
mutate {
remove_tag => [ "beats_input_codec_plain_applied" ]
add_tag => [ "apache_logs" ]
}
}
}
# OUTPUT HERE
output {
if "apache_logs" in [tags] {
file {
path => "/home/ubuntu/apache/apache-%{+yyyy-mm-dd}.log"
codec => "json"
}
}
}
Please help out to solve.

From the joda-time documentation (http://www.joda.org/joda-time/key_format.html), you have H hour of day (0~23). So output configuration to solve your problem would be:
output {
if "apache_logs" in [tags] {
file {
path => "/home/ubuntu/apache/apache-%{+yyyy-mm-dd-HH}.log"
codec => "json"
}
}
}

Logstash compare a field to a number

I'm searching a way to compare a Logstash field to a number in a conditional statement, but couldn't find anything in the documentation.
Something like this for example:
if [myfiels] => 1{
mutate {
add_field => ["fild", "1"]
}
or
if [myfiels] >= 1 and [myfiels] <= 3 {
mutate {
add_field => ["fild", "2"]
}
Thanks.

You first need to convert column type.
input {
stdin{}
}
filter {
grok {
match => ["message","%{NUMBER:num}" ]
}
mutate {
convert => { "num" => "integer" }
}
if [num] >= 5 {
mutate {
add_field => { "xyz" => "123" }
}
}
}
output {
stdout { codec => rubydebug }
}

File input add_field not adding field to every row

I am parsing several logfiles of different load balanced serverclusters with my logstash config and would like to add a field "log_origin" to each file's entries for the later easy filtering.
Here's my input->file config in a simple example:
input {
file {
type => "node1"
path => "C:/Development/node1/log/*"
add_field => [ "log_origin", "live_logs" ]
}
file {
type => "node2"
path => "C:/Development/node2/log/*"
add_field => [ "log_origin", "live_logs" ]
}
file {
type => "node3"
path => "C:/Development/node1/log/*"
add_field => [ "log_origin", "live_logs" ]
}
file {
type => "node4"
path => "C:/Development/node1/log/*"
add_field => [ "log_origin", "live_logs" ]
}
}
filter {
grok {
match => [
"message","%{DATESTAMP:log_timestamp}%{SPACE}\[%{DATA:class}\]%{SPACE}%{LOGLEVEL:loglevel}%{SPACE}%{GREEDYDATA:log_message}"
]
}
date {
match => [ "log_timestamp", "dd.MM.YY HH:mm:ss", "ISO8601" ]
target => "#timestamp"
}
mutate {
lowercase => ["loglevel"]
strip => ["loglevel"]
}
if "_grokparsefailure" in [tags] {
multiline {
pattern => ".*"
what => "previous"
}
}
if[fields.log_origin] == "live_logs"{
if [type] == "node1" {
mutate {
add_tag => "realsServerName1"
}
}
if [type] == "node2" {
mutate {
add_tag => "realsServerName2"
}
}
if [type] == "node3" {
mutate {
add_tag => "realsServerName3"
}
}
if [type] == "node4" {
mutate {
add_tag => "realsServerName4"
}
}
}
}
output {
stdout { }
elasticsearch { embedded => true }
}
I would have expected logstash to add this field with the value given to every logentry it finds, but it doesn't. Maybe I am completely taking the wrong approach here?
Edit: I am not able to retrieve the logs directly from the nodes, but have to copy them over to my "server". Otherwise i would be able to just use the filepath for distinguishing different clusters...
Edit: It's working. I should have cleand my data in between. Old entries without the field added cluttered up my results.

The add_field expects a hash. It should be
add_field => {
"log_origin" => "live_logs"
}

Develop Reference

node.js excel linux python-3.x azure haskell apache-spark rust .htaccess string

Filtering messages out of logstash.conf - logstash

You're looking for drop: filter { if [myField] == "badValue" { drop { } } } While it's better to use exact matches, you can also do regexp conditionals: filter { if [myField] =~ "DEBUG" { drop { } } }

Related

Logs are overwritten in the specified index under the same _id

Grok help for a custom metric

logstash-output-file-as time based

Logstash compare a field to a number

File input add_field not adding field to every row

Categories

Resources