Logstash. Get fields by position number - logstash

Background
I have the scheme: logs from my app go through rsyslog to central log server, then to Logstash and Elasticsearch.
Logs from app is a pure JSON, but rsyslog adds to log "timestamp", "app name" and "server name" fileds. And log becomes to this:
timestamp app-name server-name [JSON]
Question
How can I remove first three fields with Logstash filters?
Can I get fields by position numbers (like in awk) and do something like:
filter {
somefilter_name {
remove_field => $1, $2, $3
}
}
Or maybe my vision is totally wrong and I must do this in another way?
Thank you!

Use grok{} to match them (they may be useful on their own!) and put the remainder of the event back into the [message] field:
Given input like:
2015-06-16 13:37:30 myApp myServer { "jsonField": "jsonValue" }
And this config:
grok {
pattern => "%{TIMESTAMP_ISO8601:timestamp} %{WORD:app} %{WORD:server} %{GREEDYDATA:message}"
overwrite => [ "message" ]
}
json {
source => "message"
}
Will produce this document:
{
"message" => "{ \"jsonField\": \"jsonValue\" }",
"#version" => "1",
"#timestamp" => "2015-06-16T20:38:55.658Z",
"host" => "0.0.0.0",
"timestamp" => "2015-06-16 13:37:30",
"app" => "myApp",
"server" => "myServer",
"jsonField" => "jsonValue"
}

Related

logstash GROK filter along with KV plugin couldn't able to process the events

i am new to ELK. when i onboarded the below log file, it is going to "dead letter queue" in logstash because logstash couldn't able to process the events.I have written the GROK filter to parse the events but logstash still couldn't not process the events. Any help would be appreciated.
Below is the sample log format.
25193662345 [http-nio-8080-exec-44] DEBUG c.s.b.a.m.PerformanceMetricsFilter - method=PUT status=201 appLogicTime=1, streamInTime=0, blobStorageTime=31, totalTime=33 tenantId=b9sdfs-1033-4444-aba5-csdfsdfsf, immutableBlobId=bss_c_586331/Sample_app12-sdas-157123148464.txt, blobSize=2862, domain=abc
2519366789 [http-nio-8080-exec-47] DEBUG q.s.b.y.m.PerformanceMetricsFilter - method=PUT status=201 appLogicTime=1, streamInTime=0, blobStorageTime=32, totalTime=33 tenantId=b0csdfsd-1066-4444-adf4-ce7bsdfssdf, immutableBlobId=bss_c_586334/Sample_app15-615223-157sadas6648465.txt, blobSize=2862, domain=cde
GROK filter:
dissect { mapping => { "message" => "%{NUMBER:number} [%{thread}] %{level} %{class} - %{[#metadata][msg]}" } }
kv { source => "[#metadata][msg]" field_split => "," }
Thanks
You have basically two problems in your configuration.
1.) You are using the dissect filter, not grok, both are used to parse messages, but grok uses regular expressions to validate the value of the field and dissect is just positional, it does not perform any validation, if you have a WORD value in the position of a field that expects a NUMBER, grok will fail, but dissect will not.
If your log lines always have the same pattern, you should continue to use dissect since it is faster and needs less cpu.
Your correct dissect mapping should be:
dissect {
mapping => { "message" => "%{number} [%{thread}] %{level} %{class} - %{[#metadata][msg]}" }
}
2.) The field that contains the kv message is wrong, it has fields separated by space and by comma, kv won't work this way.
After your dissect filter this is the content of [#metadata][msg].
method=PUT status=201 appLogicTime=1, streamInTime=0, blobStorageTime=32, totalTime=33 tenantId=b0csdfsd-1066-4444-adf4-ce7bsdfssdf, immutableBlobId=bss_c_586334/Sample_app15-615223-157sadas6648465.txt, blobSize=2862, domain=cde
To solve this you should use a mutate filter to remove the comma from the [#metadata][msg] and use the kv filter with the default configurations.
This should be your filter configuration
filter {
dissect {
mapping => { "message" => "%{number} [%{thread}] %{level} %{class} - %{[#metadata][msg]}" }
}
mutate {
gsub => ["[#metadata][msg]",",",""]
}
kv {
source => "[#metadata][msg]"
}
}
Your output should be something like this:
{
"number" => "2519366789",
"#timestamp" => 2019-11-03T16:42:11.708Z,
"thread" => "http-nio-8080-exec-47",
"appLogicTime" => "1",
"domain" => "cde",
"method" => "PUT",
"level" => "DEBUG",
"blobSize" => "2862",
"#version" => "1",
"immutableBlobId" => "bss_c_586334/Sample_app15-615223-157sadas6648465.txt",
"streamInTime" => "0",
"status" => "201",
"blobStorageTime" => "32",
"message" => "2519366789 [http-nio-8080-exec-47] DEBUG q.s.b.y.m.PerformanceMetricsFilter - method=PUT status=201 appLogicTime=1, streamInTime=0, blobStorageTime=32, totalTime=33 tenantId=b0csdfsd-1066-4444-adf4-ce7bsdfssdf, immutableBlobId=bss_c_586334/Sample_app15-615223-157sadas6648465.txt, blobSize=2862, domain=cde",
"totalTime" => "33",
"tenantId" => "b0csdfsd-1066-4444-adf4-ce7bsdfssdf",
"class" => "q.s.b.y.m.PerformanceMetricsFilter"
}

How to capture repeated pattern in logstash(5.4.0) grok?

I would appreciate if someone can help me out with logstash grok.
Given a log like below ,
IN 192.168.11.2 IN 192.168.11.3
My goal is to put the ip address into array using grok. List of ip is dynamic and possible to extend more than 2.
e.g
tmp = [
"192.168.11.2", "192.168.11.3"
]
However, if I use a filter like below it ends up in single field.
filter {
grok {
match => { "message" => "(?<tmp>(IN %{IPV4}(\s)?)*)" }
}
}
Result,
"path" => "/tmp/sample.csv",
"#timestamp" => 2017-08-24T05:00:08.093Z,
"tmp" => "IN 192.168.11.2 IN 192.168.11.3",
"#version" => "1",
"host" => "host.ywlocal.net",
"message" => "IN 192.168.11.2 IN 192.168.11.3"
Would this be possible?
You can use the ruby filter for more advanced parsing:
filter {
ruby {
code => "event.set('ips') = event.get('message').scan(/\d{1,3}\.\d{1,3}\.\d{1,3}\.\d{1,3}\/)"
}
}
Regexp is not 100% correct to match ip address but should work for your needs

Kibana does not show fields from grok filter in filebeat

I have log file with apache logs that i want to show in Kibana.
The logs start with IP. I have debuged my pattern and it passes.
I'm trying to add fields in the beats input configuration file, but are not show in Kibana even after refresh of the fields.
Here is the configuration file
filter {
if[type] == "apache" {
grok {
match => { "message" => "%{HOST:log_host}%{GREEDYDATA:remaining}" }
add_field => { "testip" => "%{log_host}" }
add_field => { "data_left" => "%{remaining}" }
}
}
...
Just to add that I have restarted all the services: logstash, elasticsearch, kibana after the new configuration.
The issue could be that your grok pattern is using too rigid of patterns.
Chances are that HOST should be IPORHOST based on your test_ip field's name.
Assuming that the data is actually coming in with the type defined as apache, then it should be:
filter {
if [type] == "apache" {
grok {
match => {
message => "%{IPORHOST:log_host}%{GREEDYDATA:remaining}"
}
add_field => {
testip => "%{log_host}"
data_left => "%{remaining}"
}
}
}
}
Having said that, your usage of add_field is completely unnecessary. The grok pattern itself is creating two fields: log_host and remaining, so there's no need to define extra fields called testip and data_left.
Perhaps even more usefully, you don't need to craft your own Apache web log grok pattern. The COMBINEDAPACHELOG pattern already exists, which gives all of the standard fields automatically.
filter {
if [type] == "apache" {
grok {
match => { "message" => "%{COMBINEDAPACHELOG}" }
}
# Set #timestamp to the log's time and drop the unneeded timestamp
date {
match => [ "timestamp" , "dd/MMM/yyyy:HH:mm:ss Z" ]
remove_field => "timestamp"
}
}
}
You can see this in a more complete example in the Logstash documentation here.

logstash : how to extract data from log4j message?

I try to extract data from my log4j message with logstash.
The message look like this :
Method findAll - Start by : bokc
I would like to extract the method name : "findAll" and the user "bokc".
How can I do this?
I use logstash 1.5.2 and my config is :
input {
log4j {
mode => "server"
type => "log4j-artemis"
port => 4560
}
}
filter {
multiline {
type => "log4j-artemis"
pattern => "^\\s"
what => "previous"
}
mutate {
add_field => [ "source_ip", "%{host}" ]
}
}
Use a grok filter:
filter {
grok {
match => [
"message",
"^Method %{WORD:method} - Start by : %{USER:user}"
]
tag_on_failure => []
}
}
This extracts the two words into the fields "method" and "user". The setting of tag_on_failure makes sure that non-matching messages aren't tagged with _grokparsefailure. Since most messages aren't supposed to match the pattern it doesn't make sense to mark them as failures.

Logstash not adding fields

I am using Logstash 1.4.2 and I have the following conf file.
I would expect to see in Kibana in the "Fields" section on the left the options for "received_at" and "received_from" and "description", but I don't.
I see
#timestamp
#version
_id
_index
_type host path
I do see in the _source section on the right side the following...
received_at:2015-05-11 14:19:40 UTC received_from:PGP02 descriptionError1!
So home come these don't appear in the list of "Popular Fields"?
I'd like to filter the right side to not show EVERY field in the _source section on the right. Excuse the redaction blocks.
input
{
file {
path => "C:/ServerErrlogs/office-log.txt"
start_position => "beginning"
sincedb_path => "c:/tools/logstash-1.4.2/office-log.sincedb"
tags => ["product_qa", "office"]
}
file {
path => "C:/ServerErrlogs/dis-log.txt"
start_position => "beginning"
sincedb_path => "c:/tools/logstash-1.4.2/dis-log.sincedb"
tags => ["product_qa", "dist"]
}
}
filter {
grok {
match => ["path","%{GREEDYDATA}/%{GREEDYDATA:filename}\.log"]
match => [ "message", "%{TIMESTAMP_ISO8601:logdate}: %{LOGLEVEL:loglevel} (?<logmessage>.*)" ]
add_field => [ "received_at", "%{#timestamp}" ]
add_field => [ "received_from", "%{host}" ]
}
date {
match => [ "logdate", "ISO8601", "yyyy-MM-dd HH:mm:ss,SSSSSSSSS" ]
}
#logdate is now parsed into timestamp, remove original log message too
mutate {
remove_field => ['message', 'logdate' ]
add_field => [ "description", "Error1!" ]
}
}
output {
elasticsearch {
protocol => "http"
host => "0.0.0.x"
}
}
Update:
I have tired searching with a query like:
tags: data AND loglevel : INFO
then saving this query, and then reloading the page.
But still I don't see loglevel appearing as 'Popular Fields'
If the fields don't appear on the left side, it's probably a kibana caching problem. Go to Settings->Indices, select your index, and click the orange Refresh button.
I had the same issue with logstash not adding fields and after quite a lot of searching and testing other things, suddenly I had the solution (but I´am using the logstash-logback-encoder, so I have JSON already - if you don´t, then you need to transform things into JSON in the logstash "input"-phase).
I added a "json" plugin-filter, that did the magic for me:
filter {
json {
source => "message"
}
}

Resources