Filebeat multiline pattern for PHP stack trace - logstash

I am trying to import the PHP-FPM logs into an ELK stack. For this I use Filebeat to read the files. Before the data is sent to Logstash, the multiline log entries should be merged.
For this I built the following Filebeat configuration:
filebeat.inputs:
# Each - is an input. Most options can be set at the input level, so
# you can use different inputs for various configurations.
# Below are the input specific configurations.
- type: filestream
  # Change to true to enable this input configuration.
  enabled: true
  # Paths that should be crawled and fetched. Glob based paths.
  paths:
    - '/var/log/app/fpm/*.log'
  multiline.type: pattern
  multiline.pattern: '^\[\d{2}-\w{3}-\d{4} \d{2}:\d{2}:\d{2} [\w/]*\] PHP\s*at.*'
  multiline.negate: false
  multiline.match: after
  processors:
    - add_fields:
        fields.docker.service: "fpm"
But as you can see in the rubydebug output from Logstash, the messages were not merged:
{
"#timestamp" => 2021-08-10T13:54:10.149Z,
"agent" => {
"version" => "7.13.4",
"hostname" => "3cb76d7d4c7d",
"id" => "61dec25e-12ec-4a65-9f1f-ec72a5aa83ee",
"ephemeral_id" => "631db0d8-60ad-4625-891c-3da09cb0a442",
"type" => "filebeat"
},
"input" => {
"type" => "filestream"
},
"log" => {
"offset" => 344,
"file" => {
"path" => "/var/log/app/fpm/error.log"
}
},
"tags" => [
[0] "beats_input_codec_plain_applied",
[1] "_grokparsefailure"
],
"fields" => {
"docker" => {
"service" => "fpm"
}
},
"#version" => "1",
"message" => "[17-Jun-2021 13:07:56 Europe/Berlin] PHP [WARN] (/var/www/html/Renderer/RendererTranslator.php:92) - unable to translate type integer. It is not a string (/url.php)",
"ecs" => {
"version" => "1.8.0"
}
}
{
"input" => {
"type" => "filestream"
},
"module" => "PHP IES\\ServerException",
"ecs" => {
"version" => "1.8.0"
},
"#version" => "1",
"log" => {
"offset" => 73,
"file" => {
"path" => "/var/log/ies/fpm/error.log"
}
},
"#timestamp" => 2021-06-17T11:10:41.000Z,
"agent" => {
"version" => "7.13.4",
"hostname" => "3cb76d7d4c7d",
"id" => "61dec25e-12ec-4a65-9f1f-ec72a5aa83ee",
"ephemeral_id" => "631db0d8-60ad-4625-891c-3da09cb0a442",
"type" => "filebeat"
},
"tags" => [
[0] "beats_input_codec_plain_applied"
],
"fields" => {
"docker" => {
"service" => "fpm"
}
},
"message" => "core.login"
}
{
"#timestamp" => 2021-08-10T13:54:10.149Z,
"agent" => {
"version" => "7.13.4",
"hostname" => "3cb76d7d4c7d",
"id" => "61dec25e-12ec-4a65-9f1f-ec72a5aa83ee",
"ephemeral_id" => "631db0d8-60ad-4625-891c-3da09cb0a442",
"type" => "filebeat"
},
"ecs" => {
"version" => "1.8.0"
},
"input" => {
"type" => "filestream"
},
"tags" => [
[0] "beats_input_codec_plain_applied",
[1] "_grokparsefailure"
],
"fields" => {
"docker" => {
"service" => "fpm"
}
},
"#version" => "1",
"message" => "[17-Jun-2021 13:10:41 Europe/Berlin] PHP at App\\Module\\ComponentModel\\ComponentModel->doPhase(/var/www/html/Component/Container.php:348)",
"log" => {
"offset" => 204,
"file" => {
"path" => "/var/log/app/fpm/error.log"
}
}
}
I tested the regular expression with Rubular and it matches the stack trace messages.
What am I doing wrong here?

Instead of adjusting the Filebeat configuration, I adjusted the logging configuration of the application.
It now writes JSON files, which Filebeat can read easily. Handling the line breaks is then no longer necessary.
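For reference, a minimal sketch of what reading such JSON log files with Filebeat could look like, assuming the filestream input's ndjson parser is available (recent 7.x releases); the *.json path is hypothetical:
- type: filestream
  enabled: true
  paths:
    - '/var/log/app/fpm/*.json'   # hypothetical path, adjust to the real JSON log location
  parsers:
    - ndjson:
        target: ""            # merge the decoded JSON keys into the root of the event
        add_error_key: true   # tag events whose lines are not valid JSON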

You need to set multiline.negate to true.
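As a sketch of what that could look like (an assumption on my part: the pattern is also changed to match the timestamp that starts a new log entry, so that every following line that does not match it is appended to that entry):
multiline.type: pattern
# a new entry starts with a timestamp like [17-Jun-2021 13:07:56 Europe/Berlin]
multiline.pattern: '^\[\d{2}-\w{3}-\d{4} \d{2}:\d{2}:\d{2}'
multiline.negate: true
multiline.match: after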

Related

Logstash - Drop logs containing kv value

I am unsuccessfully trying to drop logs based on the value of a field parsed out by the kv filter.
filter {
  if [type] == "cef" {
    mutate {
      add_field => { "tmp_message" => "%{message}" }
      split => ["message", "|"]
      add_field => { "version" => "%{message[0]}" }
      add_field => { "device_vendor" => "%{message[1]}" }
      add_field => { "device_product" => "%{message[2]}" }
      add_field => { "device_version" => "%{message[3]}" }
      add_field => { "sig_id" => "%{message[4]}" }
      add_field => { "sig_name" => "%{message[5]}" }
      add_field => { "sig_severity" => "%{message[6]}" }
    }
    kv {
      field_split => " "
      trim_value => "<>\[\],"
    }
    mutate {
      replace => { "message" => "%{tmp_message}" }
      remove_field => [ "tmp_message" ]
    }
  }
  if [FTNTFGTsrcintfrole_s] == "wan" {
    drop { }
  }
}
[FTNTFGTsrcintfrole_s] is one of the keys that are parsed out by kv. If the value of the key is "wan", it should drop the log. That's not happening.
How can I filter out those logs?
Edit: Here is an example of the parsed data
{
"dst" => "xxx.xxx.xxx.xxx",
"FTNTFGTtz" => "+0000",
"FTNTFGTsubtype" => "forward",
"message" => "%{tmp_message}",
"host" => "xxx.xxx.xxx.xxx",
"spt" => "59975",
"type" => "cef",
"deviceInboundInterface" => "ssl.root",
"FTNTFGTdstintfrole" => "wan",
"FTNTFGTduration" => "180",
"FTNTFGTdstcountry" => "United",
"FTNTFGTpolicyid" => "47",
"FTNTFGTpolicytype" => "policy",
"FTNTFGTpoluuid" => "801d40c2-3b60-51ea-d66a-293bf886d27e",
"FTNTFGTeventtime" => "1633506791693710149",
"sourceTranslatedAddress" => "xxx.xxx.xxx.xxx",
"dpt" => "8253",
"app" => "udp/8253",
"FTNTFGTpolicyname" => "xxxxxxxx",
"tags" => [
[0] "fortigate",
[1] "_mutate_error"
],
"act" => "accept",
"FTNTFGTlogid" => "0000000013",
"in" => "64",
"sourceTranslatedPort" => "59975",
"FTNTFGTsentpkt" => "1",
"FTNTFGTtrandisp" => "snat",
"FTNTFGTsrcintfrole" => "wan",
"#version" => "1",
"FTNTFGTrcvdpkt" => "1",
"deviceExternalId" => "xxxxx",
"FTNTFGTauthserver" => "xxxxx",
"#timestamp" => 2021-10-06T07:53:11.729Z,
"FTNTFGTsrccountry" => "Reserved",
"deviceOutboundInterface" => "wan1",
"proto" => "17",
"out" => "48",
"src" => "xxx.xxx.xxx.xxx",
"externalId" => "870512",
"FTNTFGTlevel" => "notice",
"FTNTFGTvd" => "root",
"duser" => "xxxxx",
"cat" => "traffic:forward",
"FTNTFGTappcat" => "unscanned"
}
I found the answer thanks to @YLR and @Filip. The SIEM was adding "_s" to the key name when creating the field, which led me to believe that this was the original key name and therefore what I should be filtering on. After seeing the log output and realizing that wasn't the case, I corrected the filter and it worked.
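For reference, based on the parsed output above (where the key appears as FTNTFGTsrcintfrole, without the _s suffix), the corrected conditional would presumably look like this:
if [FTNTFGTsrcintfrole] == "wan" {
  drop { }
}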

Logstash Aggregate filter plugin Not working properly

Hi, I am new to Logstash and was trying the demo ("example-1") in the documentation here: https://www.elastic.co/guide/en/logstash/current/plugins-filters-aggregate.html#plugins-filters-aggregate. I used the exact same script and input but still got a different output: I was expecting a single entry in Kibana, but it shows 3 entries. Please help.
filter {
  grok {
    match => [ "message", "%{LOGLEVEL:loglevel} - %{NOTSPACE:taskid} - %{NOTSPACE:logger} - %{WORD:label}( - %{INT:duration:int})?" ]
  }
  if [logger] == "TASK_START" {
    aggregate {
      task_id => "%{taskid}"
      code => "map['sql_duration'] = 0"
      map_action => "create"
    }
  }
  if [logger] == "SQL" {
    aggregate {
      task_id => "%{taskid}"
      code => "map['sql_duration'] += event.get('duration')"
      map_action => "update"
    }
  }
  if [logger] == "TASK_END" {
    aggregate {
      task_id => "%{taskid}"
      code => "event.set('sql_duration', map['sql_duration'])"
      map_action => "update"
      end_of_task => true
      timeout => 120
    }
  }
}
INPUT
INFO - 12345 - TASK_START - start
INFO - 12345 - SQL - sqlQuery1 - 12
INFO - 12345 - SQL - sqlQuery2 - 34
INFO - 12345 - TASK_END - end
EXPECTED OUTPUT
{
"message" => "INFO - 12345 - TASK_END - end message",
"sql_duration" => 46
}
MY OUTPUT
{
"host" => "BEN",
"message" => "INFO - 12345 - TASK_START - start\r",
"#timestamp" => 2021-04-27T14:17:28.151Z,
"loglevel" => "INFO",
"taskid" => "12345",
"logger" => "TASK_START",
"path" => "C:/software/Notepad++/log72.log",
"type" => "technical1234",
"label" => "start",
"#version" => "1"
}
{
"host" => "BEN",
"message" => "INFO - 12345 - SQL - sqlQuery1 - 12\r",
"#timestamp" => 2021-04-27T14:17:28.174Z,
"type" => "technical1234",
"label" => "sqlQuery1",
"taskid" => "12345",
"loglevel" => "INFO",
"logger" => "SQL",
"duration" => 12,
"path" => "C:/software/Notepad++/log72.log",
"#version" => "1"
}
{
"host" => "BEN",
"message" => "INFO - 12345 - SQL - sqlQuery2 - 34\r",
"#timestamp" => 2021-04-27T14:17:28.175Z,
"type" => "technical1234",
"label" => "sqlQuery2",
"taskid" => "12345",
"loglevel" => "INFO",
"logger" => "SQL",
"duration" => 34,
"path" => "C:/software/Notepad++/log72.log",
"#version" => "1"
}

how to use elapsed filter- logstash

I am working with the elapsed filter. I read the guide for the elapsed filter in Logstash, then made a sample config file and CSV to test how it works, but it does not seem to be working; there is no change in the data uploaded to ES. I have attached the CSV file and the config code. Can you give some examples of how to use the elapsed filter?
Here's my CSV data:
Here's my config file:
input {
  file {
    path => "/home/paulsteven/log_cars/aggreagate.csv"
    start_position => "beginning"
    sincedb_path => "/dev/null"
  }
}
filter {
  csv {
    separator => ","
    quote_char => "%"
    columns => ["state","city","haps","ads","num_id","serial"]
  }
  elapsed {
    start_tag => "taskStarted"
    end_tag => "taskEnded"
    unique_id_field => "num_id"
  }
}
output {
  elasticsearch {
    hosts => "localhost:9200"
    index => "el03"
    document_type => "details"
  }
  stdout {}
}
Output in ES:
{
"city" => "tirunelveli",
"path" => "/home/paulsteven/log_cars/aggreagate.csv",
"num_id" => "2345-1002-4501",
"message" => "tamil nadu,tirunelveli,hap0,ad1,2345-1002-4501,1",
"#version" => "1",
"serial" => "1",
"haps" => "hap0",
"state" => "tamil nadu",
"host" => "smackcoders",
"ads" => "ad1",
"#timestamp" => 2019-05-06T10:03:51.443Z
}
{
"city" => "chennai",
"path" => "/home/paulsteven/log_cars/aggreagate.csv",
"num_id" => "2345-1002-4501",
"message" => "tamil nadu,chennai,hap0,ad1,2345-1002-4501,5",
"#version" => "1",
"serial" => "5",
"haps" => "hap0",
"state" => "tamil nadu",
"host" => "smackcoders",
"ads" => "ad1",
"#timestamp" => 2019-05-06T10:03:51.447Z
}
{
"city" => "kottayam",
"path" => "/home/paulsteven/log_cars/aggreagate.csv",
"num_id" => "2345-1002-4501",
"message" => "kerala,kottayam,hap1,ad2,2345-1002-4501,9",
"#version" => "1",
"serial" => "9",
"haps" => "hap1",
"state" => "kerala",
"host" => "smackcoders",
"ads" => "ad2",
"#timestamp" => 2019-05-06T10:03:51.449Z
}
{
"city" => "Jalna",
"path" => "/home/paulsteven/log_cars/aggreagate.csv",
"num_id" => "2345-1002-4501",
"message" => "mumbai,Jalna,hap2,ad3,2345-1002-4501,13",
"#version" => "1",
"serial" => "13",
"haps" => "hap2",
"state" => "mumbai",
"host" => "smackcoders",
"ads" => "ad3",
"#timestamp" => 2019-05-06T10:03:51.452Z
}
You have to tag your events so that Logstash can find the start/end tags.
Basically, you have to know when an event is considered a start event and when it is an end event.
The elapsed filter plugin works only between two events (for example a request event and a response event, in order to get the latency between them).
Both of these events need to have an ID field that uniquely identifies that particular task. The name of this field is stored in unique_id_field.
For your example you have to identify a pattern for the start and end events. Let's say that your CSV has a column type (see the code below): when type contains "START", the line is considered a start event, and if it contains "END" it's an end event, pretty straightforward. You also need a column id that stores the unique identifier.
filter {
  csv {
    separator => ","
    quote_char => "%"
    columns => ["state","city","haps","ads","num_id","serial", "type", "id"]
  }
  grok {
    match => { "type" => ".*START.*" }
    add_tag => [ "taskStarted" ]
  }
  grok {
    match => { "type" => ".*END.*" }
    add_tag => [ "taskTerminated" ]
  }
  elapsed {
    start_tag => "taskStarted"
    end_tag => "taskTerminated"
    unique_id_field => "id"
  }
}
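For illustration only, two hypothetical CSV lines that would fit the columns assumed above (the type and id values are made up): the first line gets tagged taskStarted, the second taskTerminated, and the elapsed filter adds the elapsed time to the end event for id task-42.
tamil nadu,tirunelveli,hap0,ad1,2345-1002-4501,1,START,task-42
tamil nadu,chennai,hap0,ad1,2345-1002-4501,5,END,task-42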
I feel like your need is different.
If you want to aggregate more than two events, for example all the events with the same value for the state column, please check out the aggregate filter plugin instead.

Replace @timestamp in logstash

I'm going crazy with my Logstash configuration.
I can't find a way to replace the @timestamp field with another one.
Here is what Logstash receives:
{
"offset" => 6718968,
"Varnish_txid" => "639657758",
"plateform" => "cdnfronts",
"Referer" => "-",
"input_type" => "log",
"respsize" => "281",
"source" => "/var/log/varnish/varnish4xx-5xx.log",
"UA" => "Microsoft-WebDAV-MiniRedir/5.1.2600",
"type" => "varnish-logs",
"tags" => [
[0] "json",
[1] "varnish",
[2] "beats_input_codec_json_applied",
[3] "_dateparsefailure"
],
"st_snt2c_or_sntfromb" => "405",
"RemoteHost" => "32.26.21.21",
"#timestamp" => 2017-02-14T13:38:47.808Z,
"Varnish.Handling" => "pass",
"tot_bytes_rcvby_c_or_sntby_b" => "-",
"time_req_rcv4c_or_snt4b" => "[14/Feb/2017:14:38:44 +0100]",
"#version" => "1",
"beat" => {
"hostname" => "cdn1",
"name" => "cdn1",
"version" => "5.1.2"
},
"host" => "cdn1",
"time_1st_byte" => "0.010954",
"Varnish_side" => "c",
"reqfirstline" => "OPTIONS http://a.toto.com/ HTTP/1.1"
}
Here is my Logstash config:
input {
  beats {
    port => 5000
    codec => "json"
    ssl => true
    ssl_certificate => "/etc/logstash/ssl/logstash-forwarder.crt"
    ssl_key => "/etc/logstash/ssl/logstash-forwarder.key"
  }
}
filter {
  if "json" in [tags] {
    json {
      source => "message"
    }
    if "varnish" in [tags] {
      date {
        locale => "en"
        match => [ "[time_req_rcv4c_or_snt4b]","dd/MMM/yyyy:HH:mm:ss Z" ]
        remove_field => "[time_req_rcv4c_or_snt4b]"
      }
    }
  }
}
output {
  if "varnish" in [tags] {
    elasticsearch {
      hosts => ["elasticsearch:9200"]
      index => "logstash-varnish-%{+YYYY.MM.dd}"
    }
  } else {
    elasticsearch {
      hosts => ["elasticsearch:9200"]
    }
  }
  stdout {
    codec => rubydebug
  }
}
I tried:
match => [ "time_req_rcv4c_or_snt4b","dd/MMM/yyyy:HH:mm:ss Z" ]
remove_field => "time_req_rcv4c_or_snt4b"
and
match => [ "[time_req_rcv4c_or_snt4b]","dd/MMM/yyyy:HH:mm:ss Z" ]
remove_field => "[time_req_rcv4c_or_snt4]
Can anybody explain what I missed? I didn't find anything relevant on Google so far.
From your output:
"time_req_rcv4c_or_snt4b" => "[14/Feb/2017:14:38:44 +0100]",
Your date field has [] around it, so you need to match those in your date pattern or strip them off when you first match the date.
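As a sketch of the second option (stripping the brackets before parsing; the field name and date pattern are taken from the output and config above):
mutate {
  # remove the literal [ and ] around the timestamp string
  gsub => [ "time_req_rcv4c_or_snt4b", "[\[\]]", "" ]
}
date {
  locale => "en"
  match => [ "time_req_rcv4c_or_snt4b", "dd/MMM/yyyy:HH:mm:ss Z" ]
  remove_field => [ "time_req_rcv4c_or_snt4b" ]
}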

logstash - dynamic field names

I have a problem with dynamic field names in my Logstash configuration.
This is my test config:
input {
  generator {
    lines => [ "May 15 13:42:55 logstash puppet-agent[3551]: Finished catalog run in 43",
               "May 16 14:57:07 logstash puppet-agent[3551]: Starting Puppet client version" ]
    count => 7
  }
}
filter {
  grok {
    match => [ "message", "%{SYSLOGBASE} %{WORD:log}.*" ]
  }
  if "Starting" in [log] {
    metrics {
      meter => [ "%{logsource}.%{log}" ]
      add_tag => [ "metric" ]
      add_field => { "server" => "%{logsource}"
                     "bad" => "true" }
      clear_interval => 5
    }
  }
}
output {
  stdout { codec => rubydebug }
}
and here is my output (just the end of the output):
{
"message" => "May 15 13:42:55 logstash puppet-agent[3551]: Finished catalog run in 43",
"#version" => "1",
"#timestamp" => "2016-06-07T07:37:50.138Z",
"host" => "logstash.test.lan",
"sequence" => 6,
"timestamp" => "May 15 13:42:55",
"logsource" => "test",
"program" => "puppet-agent",
"pid" => "3551",
"log" => "Finished"
}
{
"message" => "May 16 14:57:07 logstash puppet-agent[3551]: Starting Puppet client version",
"#version" => "1",
"#timestamp" => "2016-06-07T07:37:50.138Z",
"host" => "logstash.test.lan",
"sequence" => 6,
"timestamp" => "May 16 14:57:07",
"logsource" => "test",
"program" => "puppet-agent",
"pid" => "3551",
"log" => "Starting"
}
{
"#version" => "1",
"#timestamp" => "2016-06-07T07:37:50.288Z",
"message" => "Counting: 7",
"logstash.Starting" => {
"count" => 7,
"rate_1m" => 0.0,
"rate_5m" => 0.0,
"rate_15m" => 0.0
},
"server" => "%{logsource}",
"bad" => "true",
"tags" => [
[0] "metric"
]
}
Why doesn't the server field have logstash as its value from the input logs? %{logsource} works for the meter option, so why not for add_field?
Thanks for the help.
When a log event is received, the SYSLOGBASE information is extracted from the content. This is where the %{logsource} value is defined. If the event isn't coming from a log entry that contains SYSLOGBASE information, then logsource will be undefined.
When you receive a start message, logsource is defined in scope and is added to your message.
The metrics plugin is generating a new message per interval. This message is generated from scratch so it does not have the value of logsource or anything else that would normally be obtained from an individual log entry.
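A minimal illustration of the difference, assuming the same generator input as above: a sprintf reference such as %{logsource} resolves on the original events (where grok has just created the field), but not on the flushed metric event, which is generated from scratch.
filter {
  grok {
    match => [ "message", "%{SYSLOGBASE} %{WORD:log}.*" ]
    # runs per log event: logsource exists here, so server gets a real value
    add_field => { "server" => "%{logsource}" }
  }
  if "Starting" in [log] {
    metrics {
      meter => [ "%{logsource}.%{log}" ]   # interpolated per matching source event
      add_tag => [ "metric" ]
      clear_interval => 5
      # add_field here would run on the flushed metric event, which has no
      # logsource field, so "%{logsource}" would stay as a literal string
    }
  }
}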
