Logs are overwritten in the specified index under the same _id - logstash

I'm using filebeat - 6.5.1, Logstash - 6.5.1 and elasticsearch - 6.5.1
I'm using multiple GROK in the single config file and trying to send the logs into Elasticsearch
Below is my Filebeat.yml
filebeat.prospectors:
type: log
paths:
var/log/message
fields:
type: apache_access
tags: ["ApacheAccessLogs"]
type: log
paths:
var/log/indicate
fields:
type: apache_error
tags: ["ApacheErrorLogs"]
type: log
paths:
var/log/panda
fields:
type: mysql_error
tags: ["MysqlErrorLogs"]
output.logstash:
The Logstash hosts
hosts: ["logstash:5044"]
Below is my logstash config file -
input {
beats {
port => 5044
tags => [ "ApacheAccessLogs", "ApacheErrorLogs", "MysqlErrorLogs" ]
}
}
filter {
if "ApacheAccessLogs" in [tags] {
grok {
match => [
"message" , "%{COMBINEDAPACHELOG}+%{GREEDYDATA:extra_fields}",
"message" , "%{COMMONAPACHELOG}+%{GREEDYDATA:extra_fields}"
]
overwrite => [ "message" ]
}
mutate {
convert => ["response", "integer"]
convert => ["bytes", "integer"]
convert => ["responsetime", "float"]
}
geoip {
source => "clientip"
target => "geoip"
add_tag => [ "apache-geoip" ]
}
date {
match => [ "timestamp" , "dd/MMM/YYYY:HH:mm:ss Z" ]
remove_field => [ "timestamp" ]
}
useragent {
source => "agent"
}
}
if "ApacheErrorLogs" in [tags] {
grok {
match => { "message" => ["[%{APACHE_TIME:[apache2][error][timestamp]}] [%{LOGLEVEL:[apache2][error][level]}]( [client %{IPORHOST:[apache2][error][client]}])? %{GREEDYDATA:[apache2][error][message]}",
"[%{APACHE_TIME:[apache2][error][timestamp]}] [%{DATA:[apache2][error][module]}:%{LOGLEVEL:[apache2][error][level]}] [pid %{NUMBER:[apache2][error][pid]}(:tid %{NUMBER:[apache2][error][tid]})?]( [client %{IPORHOST:[apache2][error][client]}])? %{GREEDYDATA:[apache2][error][message1]}" ] }
pattern_definitions => {
"APACHE_TIME" => "%{DAY} %{MONTH} %{MONTHDAY} %{TIME} %{YEAR}"
}
remove_field => "message"
}
mutate {
rename => { "[apache2][error][message1]" => "[apache2][error][message]" }
}
date {
match => [ "[apache2][error][timestamp]", "EEE MMM dd H:m:s YYYY", "EEE MMM dd H:m:s.SSSSSS YYYY" ]
remove_field => "[apache2][error][timestamp]"
}
}
if "MysqlErrorLogs" in [tags] {
grok {
match => { "message" => ["%{LOCALDATETIME:[mysql][error][timestamp]} ([%{DATA:[mysql][error][level]}] )?%{GREEDYDATA:[mysql][error][message]}",
"%{TIMESTAMP_ISO8601:[mysql][error][timestamp]} %{NUMBER:[mysql][error][thread_id]} [%{DATA:[mysql][error][level]}] %{GREEDYDATA:[mysql][error][message1]}",
"%{GREEDYDATA:[mysql][error][message2]}"] }
pattern_definitions => {
"LOCALDATETIME" => "[0-9]+ %{TIME}"
}
remove_field => "message"
}
mutate {
rename => { "[mysql][error][message1]" => "[mysql][error][message]" }
}
mutate {
rename => { "[mysql][error][message2]" => "[mysql][error][message]" }
}
date {
match => [ "[mysql][error][timestamp]", "ISO8601", "YYMMdd H:m:s" ]
remove_field => "[apache2][access][time]"
}
}
}
output {
if "ApacheAccessLogs" in [tags] {
elasticsearch { hosts => ["elasticsearch:9200"]
index => "apache"
document_id => "apacheaccess"
}
}
if "ApacheErrorLogs" in [tags] {
elasticsearch { hosts => ["elasticsearch:9200"]
index => "apache"
document_id => "apacheerror"
}
}
if "MysqlErrorLogs" in [tags] {
elasticsearch { hosts => ["elasticsearch:9200"]
index => "apache"
document_id => "sqlerror"
}
}
stdout { codec => rubydebug }
}
The data is sent to elastic search but only 3 records are getting created for each document_id in the same index.
Only 3 records are created and every new logs incoming are overwritten onto the same document_id and the old one is lost.
Can you guys please help me out?

The definition of document_id is to provide an unique document id for an event. In your case, as they are static (apacheaccess, apacheerror, sqlerror), there will be only 1 event per index ingested into elasticsearch, overide by the newest event.
As you have 3 distinct data type, what you seems to be looking for provide for each event type (ApacheAccessLogs, ApacheErrorLogs, MysqlErrorLogs) a different index, as following :
output {
if "ApacheAccessLogs" in [tags] {
elasticsearch {
hosts => ["elasticsearch:9200"]
index => "apache-access"
}
}
if "ApacheErrorLogs" in [tags] {
elasticsearch {
hosts => ["elasticsearch:9200"]
index => "apache-error"
}
}
if "MysqlErrorLogs" in [tags] {
elasticsearch {
hosts => ["elasticsearch:9200"]
index => "mysql-error"
}
}
stdout {
codec => rubydebug
}
}
There are not many cases where you need to set the id manually (eg. in case of reingest of data), as Logstash & Elasticsearch will manage that by themself.
But if that's the case, and you can't use a field to identify each event individually, you could use the logstash-filter-fingerprint, that is made for that.

Related

Add log4net Level field to logstash.conf file

I'm trying to add LEVEL field (so it shows up in Kibana). My logstash.conf
Input:
2018-03-18 15:43:40.7914 - INFO: Tick
2018-03-18 15:43:40.7914 - ERROR: Tock
file:
input {
beats {
port => 5044
}
}
filter {
grok {
match => {
"message" => "(?m)^%{TIMESTAMP_ISO8601:timestamp}~~\[%{DATA:thread}\]~~\[%{DATA:user}\]~~\[%{DATA:requestId}\]~~\[%{DATA:userHost}\]~~\[%{DATA:requestUrl}\]~~%{DATA:level}~~%{DATA:logger}~~%{DATA:logmessage}~~%{DATA:exception}\|\|"
}
match => {
"levell" => "(?m)^%{DATA:level}"
}
add_field => {
"received_at" => "%{#timestamp}"
"received_from" => "%{host}"
"level" => "levell"
}
remove_field => ["message"]
}
date {
match => [ "timestamp", "yyyy-MM-dd HH:mm:ss:SSS" ]
}
}
output {
elasticsearch {
hosts => ["http://localhost:9200"]
sniffing => true
index => "filebeat-%{+YYYY.MM.dd}"
document_type => "%{[#metadata][type]}"
#user => "elastic"
#password => "changeme"
}
stdout { codec => rubydebug }
}
this prints out "levell" instead of "INFO/ERROR" etc
EDIT:
Input:
2018-03-18 15:43:40.7914 - INFO: Tick
configuration:
# Sample Logstash configuration for creating a simple
# Beats -> Logstash -> Elasticsearch pipeline.
input {
beats {
port => 5044
}
}
filter {
grok {
match => { "message" => "(?m)^%{TIMESTAMP_ISO8601:timestamp}~~\[%{DATA:thread}\]~~\[%{DATA:user}\]~~\[%{DATA:requestId}\]~~\[%{DATA:userHost}\]~~\[%{DATA:requestUrl}\]~~%{DATA:level}~~%{DATA:logger}~~%{DATA:logmessage}~~%{DATA:exception}\|\|" }
add_field => {
"received_at" => "%{#timestamp}"
"received_from" => "%{host}"
}
}
grok {
match => { "message" => "- %{LOGLEVEL:level}" }
remove_field => ["message"]
}
date {
match => [ "timestamp", "yyyy-MM-dd HH:mm:ss:SSS" ]
}
}
output {
elasticsearch {
hosts => ["http://localhost:9200"]
sniffing => true
index => "filebeat-%{+YYYY.MM.dd}"
document_type => "%{[#metadata][type]}"
#user => "elastic"
#password => "changeme"
}
stdout { codec => rubydebug }
}
Output I'm getting. Still missing received_at and level:
In that part of the configuration:
add_field => {
"received_at" => "%{#timestamp}"
"received_from" => "%{host}"
"level" => "levell"
}
When using "level" => "levell", you just put the String levell in the field level. To put the value of the field named levell, you have to use %{levell}. So in you case, it would look like:
add_field => {
"received_at" => "%{#timestamp}"
"received_from" => "%{host}"
"level" => "%{levell}"
}
Also the grok#match, according to the documentation:
A hash that defines the mapping of where to look, and with which patterns.
So trying to match on the levell field won't work, since it look like it doesn't exist yet. And the grok pattern you're using to match the message field don't match the example you provided.

Grok help for a custom metric

I have a log line like this:
09 Nov 2018 15:51:35 DEBUG api.MapAnythingProvider - Calling API For Client: XXX Number of ELEMENTS Requested YYY
I want to ignore all other log lines and only want those lines that have the words "Calling API For Client" in it. Further, I am only interested in the String XXX and Number YYY.
Thanks for the help.
input {
file {
path => ["C:/apache-tomcat-9.0.7/logs/service/service.log"]
sincedb_path => "nul"
start_position => "beginning"
}
}
filter {
grok {
match => {
"message" => "%{MONTHDAY:monthDay} %{MONTH:mon} %{YEAR:year} %{TIME:ts} %{WORD:severity} %{JAVACLASS:claz} - %{GREEDYDATA:logmessage}"
}
}
grok {
match => {
"logmessage" => "%{WORD:keyword} %{WORD:customer} %{WORD:key2} %{NUMBER:mapAnythingCreditsConsumed:float} %{WORD:key3} %{NUMBER:elementsFromCache:int}"
}
}
if "_grokparsefailure" in [tags] {
drop {}
}
mutate {
remove_field => [ "monthDay", "mon", "ts", "severity", "claz", "keyword", "key2", "path", "message", "year", "key3" ]
}
}
output {
if [logmessage] =~ /ExecutingJobFor/ {
elasticsearch {
hosts => ["localhost:9200"]
index => "test"
manage_template => false
}
stdout {
codec => rubydebug
}
}
}

Logstash issue with json_formater [LogStash::Json::ParserError: Unexpected character ('-' (code 45)): was expecting comma to separate ARRAY entries]

I have an issue with converting value through logstash, I can't find solution for it. it seems to be linked to the date.
#Log line
[2017-08-15 12:30:17] api.INFO: {"sessionId":"a216925---ff5992be7520924ff25992be75209c7","action":"processed","time":1502789417,"type":"bookingProcess","page":"order"} [] []
Logstash configuration
filter {
if [type] == "api-prod-log" {
grok {
match => {"message" => "\[%{TIMESTAMP_ISO8601:timestamp}\] %{WORD:module}.%{WORD:level}: (?<log_message>.*) \[\] \[\]" }
add_field => [ "received_from", "%{host}" ]
}
json {
source => "log_message"
target => "flightSearchRequest"
remove_field=>["log_message"]
}
date {
match => [ "timestamp", "YYYY-MM-dd HH:mm:ss" ]
timezone => "Asia/Jerusalem"
}
}
}
Any idea ?
Thanks
What version of Logstash are you using?
On Logstash 5.2.2 with the following Logstash config:
input {
stdin{}
}
filter {
grok {
match => {"message" => '\[%{TIMESTAMP_ISO8601:timestamp}\] %{WORD:module}.%{WORD:level}: (?<log_message>.*) \[\] \[\]' }
}
json {
source => "log_message"
target => "flightSearchRequest"
remove_field=>["log_message"]
}
date {
match => [ "timestamp", "YYYY-MM-dd HH:mm:ss" ]
timezone => "Asia/Jerusalem"
}
}
output{
stdout {codec => "rubydebug"}
}
I get a perfectly correct result and no errors, when I pass your log line as input:
{
"#timestamp" => 2017-08-15T09:30:17.000Z,
"flightSearchRequest" => {
"action" => "processed",
"sessionId" => "a216925---ff5992be7520924ff25992be75209c7",
"time" => 1502789417,
"page" => "order",
"type" => "bookingProcess"
},
"level" => "INFO",
"module" => "api",
"#version" => "1",
"message" => "[2017-08-15 12:30:17] api.INFO: {\"sessionId\":\"a216925---ff5992be7520924ff25992be75209c7\",\"action\":\"processed\",\"time\":1502789417,\"type\":\"bookingProcess\",\"page\":\"order\"} [] []",
"timestamp" => "2017-08-15 12:30:17"
}
I've removed the check for "type" in the beginning, can you test if that can affect the result?

Filtering/Visualising JMX fields

I am successfully able to integrate JMX plugin for logstash. Now to i am trying to Visualize the JMX data.
For this i am trying to add custom fields to the parsed jmx data.
Example:
input{
beats{
port => 27080
congestion_threshold => 1500
}
jmx {
path => "file://Machine01/Users/username/projects/Logstash/logstash/bin/jmx"
polling_frequency => 15
type => "jmx"
nb_thread => 4
}
}
filter {
if [type] == "Type1"{
grok{
break_on_match => false
patterns_dir => ["C:\Users\users\projects\Logstash\logstash\bin\patterns"]
match => { "message" => "%{YEAR:Year}%{MONTHNUM:Month}%{MONTHDAY:Day} %{HOUR:Hour}%{MINUTE:Minute}%{SECOND:Second} %{LogLevel:LogVerbosity} %{MODULE:MODULENAME}%{SPACE}%{MESSAGEID:MESSAGEID} %{SUBMODULE:SUBMODULE} %{MESSAGE:MESSAGE}"}
add_field => [ "received_at", "%{#timestamp}" ]
add_field => [ "received_from", "%{host}" ]
add_tag => ["Groked"]
}
if "_grokparsefailure" in [tags] {
drop { }
}
if [type] == "jmx" {
if ("OperatingSystem.ProcessCpuLoad" in [metric_path] or "OperatingSystem.SystemCpuLoad" in [metric_path]) {
ruby {
code => "event['cpuLoad'] = event['metric_value_number'] * 100"
add_tag => [ "cpuLoad" ]
}
}
}
}
}
output {
if [type] == "jmx" {
elasticsearch {
hosts => ["http://localhost:9200"]
index => "jmx"
}
} else {
elasticsearch {
hosts => ["http://localhost:9200"]
manage_template => true
index => "%{[#metadata][beat]}-%{+YYYY.MM.dd}"
document_type => "%{[#metadata][type]}"
}
}
}
But, In KIbana it is not displaying any such newly added field, this is the data i am getting in Kibana.
#version:1
#timestamp:May 30th 2016, 18:50:36.622
host:host
path:file://Machine01/Users/username/projects/Logstash/logstash/bin/jmx
type:jmx
metric_path:OperatingSystem.ProcessCpuLoad
metric_value_number:0.003
_id:AVUB0r_4sUXN-4lFtxGq
_type:jmx
_index:jmx _score:
How can i change this to add new field which i have defined in the filter.
Also, is there any better way to visualise JMX data on Kibana.

File input add_field not adding field to every row

I am parsing several logfiles of different load balanced serverclusters with my logstash config and would like to add a field "log_origin" to each file's entries for the later easy filtering.
Here's my input->file config in a simple example:
input {
file {
type => "node1"
path => "C:/Development/node1/log/*"
add_field => [ "log_origin", "live_logs" ]
}
file {
type => "node2"
path => "C:/Development/node2/log/*"
add_field => [ "log_origin", "live_logs" ]
}
file {
type => "node3"
path => "C:/Development/node1/log/*"
add_field => [ "log_origin", "live_logs" ]
}
file {
type => "node4"
path => "C:/Development/node1/log/*"
add_field => [ "log_origin", "live_logs" ]
}
}
filter {
grok {
match => [
"message","%{DATESTAMP:log_timestamp}%{SPACE}\[%{DATA:class}\]%{SPACE}%{LOGLEVEL:loglevel}%{SPACE}%{GREEDYDATA:log_message}"
]
}
date {
match => [ "log_timestamp", "dd.MM.YY HH:mm:ss", "ISO8601" ]
target => "#timestamp"
}
mutate {
lowercase => ["loglevel"]
strip => ["loglevel"]
}
if "_grokparsefailure" in [tags] {
multiline {
pattern => ".*"
what => "previous"
}
}
if[fields.log_origin] == "live_logs"{
if [type] == "node1" {
mutate {
add_tag => "realsServerName1"
}
}
if [type] == "node2" {
mutate {
add_tag => "realsServerName2"
}
}
if [type] == "node3" {
mutate {
add_tag => "realsServerName3"
}
}
if [type] == "node4" {
mutate {
add_tag => "realsServerName4"
}
}
}
}
output {
stdout { }
elasticsearch { embedded => true }
}
I would have expected logstash to add this field with the value given to every logentry it finds, but it doesn't. Maybe I am completely taking the wrong approach here?
Edit: I am not able to retrieve the logs directly from the nodes, but have to copy them over to my "server". Otherwise i would be able to just use the filepath for distinguishing different clusters...
Edit: It's working. I should have cleand my data in between. Old entries without the field added cluttered up my results.
The add_field expects a hash. It should be
add_field => {
"log_origin" => "live_logs"
}

Resources