logstash : how to extract data from log4j message? - log4j

I try to extract data from my log4j message with logstash.
The message look like this :
Method findAll - Start by : bokc
I would like to extract the method name : "findAll" and the user "bokc".
How can I do this?
I use logstash 1.5.2 and my config is :
input {
log4j {
mode => "server"
type => "log4j-artemis"
port => 4560
}
}
filter {
multiline {
type => "log4j-artemis"
pattern => "^\\s"
what => "previous"
}
mutate {
add_field => [ "source_ip", "%{host}" ]
}
}

Use a grok filter:
filter {
grok {
match => [
"message",
"^Method %{WORD:method} - Start by : %{USER:user}"
]
tag_on_failure => []
}
}
This extracts the two words into the fields "method" and "user". The setting of tag_on_failure makes sure that non-matching messages aren't tagged with _grokparsefailure. Since most messages aren't supposed to match the pattern it doesn't make sense to mark them as failures.

Related

supporting different kind of logs with Logstash

I'm trying to collect logs using Logstash where I have different kinds of logs in the same file.
I want to extract certain fields if they exist in the log, and otherwise do something else.
input{
file {
path => ["/home/ubuntu/XXX/XXX/results/**/log_file.txt"]
start_position => "beginning"
}
}
filter {
grok{
match => { "message" => ["%{WORD:logger} %{SPACE}\-%{SPACE} %{LOGLEVEL:level} %{SPACE}\-%{SPACE} %{DATA:message} %{NUMBER:score:float}",
"%{WORD:logger} %{SPACE}\-%{SPACE} %{LOGLEVEL:level} %{SPACE}\-%{SPACE} %{DATA:message}"]}
}
}
output{
elasticsearch {
hosts => ["X.X.X.X:9200"]
}
stdout { codec => rubydebug }
}
for example, log type 1 is:
root - INFO - Best score yet: 35.732
and type 2 is:
root - INFO - Starting an experiment
One of the problems I face is that when a message doesn't contain a number, the field still exists as null in the JSON created which prevents me from using desired functionalities in Kibana.
One option, is just to add a tag on logstash when field is not defined to be able to have a way to filter easily on kibana side. The null value is only set during the insertion in elasticsearch (on logstash side, the field is not defined)
This solution looks like this in you case :
filter {
grok{
match => { "message" => ["%{WORD:logger} %{SPACE}\-%{SPACE} %{LOGLEVEL:level} %{SPACE}\-%{SPACE} %{DATA:message} %{NUMBER:score:float}",
"%{WORD:logger} %{SPACE}\-%{SPACE} %{LOGLEVEL:level} %{SPACE}\-%{SPACE} %{DATA:message}"]}
}
if ![score] {
mutate { add_tag => [ "score_not_set" ] }
}
}

logstash GROK filter along with KV plugin couldn't able to process the events

i am new to ELK. when i onboarded the below log file, it is going to "dead letter queue" in logstash because logstash couldn't able to process the events.I have written the GROK filter to parse the events but logstash still couldn't not process the events. Any help would be appreciated.
Below is the sample log format.
25193662345 [http-nio-8080-exec-44] DEBUG c.s.b.a.m.PerformanceMetricsFilter - method=PUT status=201 appLogicTime=1, streamInTime=0, blobStorageTime=31, totalTime=33 tenantId=b9sdfs-1033-4444-aba5-csdfsdfsf, immutableBlobId=bss_c_586331/Sample_app12-sdas-157123148464.txt, blobSize=2862, domain=abc
2519366789 [http-nio-8080-exec-47] DEBUG q.s.b.y.m.PerformanceMetricsFilter - method=PUT status=201 appLogicTime=1, streamInTime=0, blobStorageTime=32, totalTime=33 tenantId=b0csdfsd-1066-4444-adf4-ce7bsdfssdf, immutableBlobId=bss_c_586334/Sample_app15-615223-157sadas6648465.txt, blobSize=2862, domain=cde
GROK filter:
dissect { mapping => { "message" => "%{NUMBER:number} [%{thread}] %{level} %{class} - %{[#metadata][msg]}" } }
kv { source => "[#metadata][msg]" field_split => "," }
Thanks
You have basically two problems in your configuration.
1.) You are using the dissect filter, not grok, both are used to parse messages, but grok uses regular expressions to validate the value of the field and dissect is just positional, it does not perform any validation, if you have a WORD value in the position of a field that expects a NUMBER, grok will fail, but dissect will not.
If your log lines always have the same pattern, you should continue to use dissect since it is faster and needs less cpu.
Your correct dissect mapping should be:
dissect {
mapping => { "message" => "%{number} [%{thread}] %{level} %{class} - %{[#metadata][msg]}" }
}
2.) The field that contains the kv message is wrong, it has fields separated by space and by comma, kv won't work this way.
After your dissect filter this is the content of [#metadata][msg].
method=PUT status=201 appLogicTime=1, streamInTime=0, blobStorageTime=32, totalTime=33 tenantId=b0csdfsd-1066-4444-adf4-ce7bsdfssdf, immutableBlobId=bss_c_586334/Sample_app15-615223-157sadas6648465.txt, blobSize=2862, domain=cde
To solve this you should use a mutate filter to remove the comma from the [#metadata][msg] and use the kv filter with the default configurations.
This should be your filter configuration
filter {
dissect {
mapping => { "message" => "%{number} [%{thread}] %{level} %{class} - %{[#metadata][msg]}" }
}
mutate {
gsub => ["[#metadata][msg]",",",""]
}
kv {
source => "[#metadata][msg]"
}
}
Your output should be something like this:
{
"number" => "2519366789",
"#timestamp" => 2019-11-03T16:42:11.708Z,
"thread" => "http-nio-8080-exec-47",
"appLogicTime" => "1",
"domain" => "cde",
"method" => "PUT",
"level" => "DEBUG",
"blobSize" => "2862",
"#version" => "1",
"immutableBlobId" => "bss_c_586334/Sample_app15-615223-157sadas6648465.txt",
"streamInTime" => "0",
"status" => "201",
"blobStorageTime" => "32",
"message" => "2519366789 [http-nio-8080-exec-47] DEBUG q.s.b.y.m.PerformanceMetricsFilter - method=PUT status=201 appLogicTime=1, streamInTime=0, blobStorageTime=32, totalTime=33 tenantId=b0csdfsd-1066-4444-adf4-ce7bsdfssdf, immutableBlobId=bss_c_586334/Sample_app15-615223-157sadas6648465.txt, blobSize=2862, domain=cde",
"totalTime" => "33",
"tenantId" => "b0csdfsd-1066-4444-adf4-ce7bsdfssdf",
"class" => "q.s.b.y.m.PerformanceMetricsFilter"
}

Kibana does not show fields from grok filter in filebeat

I have log file with apache logs that i want to show in Kibana.
The logs start with IP. I have debuged my pattern and it passes.
I'm trying to add fields in the beats input configuration file, but are not show in Kibana even after refresh of the fields.
Here is the configuration file
filter {
if[type] == "apache" {
grok {
match => { "message" => "%{HOST:log_host}%{GREEDYDATA:remaining}" }
add_field => { "testip" => "%{log_host}" }
add_field => { "data_left" => "%{remaining}" }
}
}
...
Just to add that I have restarted all the services: logstash, elasticsearch, kibana after the new configuration.
The issue could be that your grok pattern is using too rigid of patterns.
Chances are that HOST should be IPORHOST based on your test_ip field's name.
Assuming that the data is actually coming in with the type defined as apache, then it should be:
filter {
if [type] == "apache" {
grok {
match => {
message => "%{IPORHOST:log_host}%{GREEDYDATA:remaining}"
}
add_field => {
testip => "%{log_host}"
data_left => "%{remaining}"
}
}
}
}
Having said that, your usage of add_field is completely unnecessary. The grok pattern itself is creating two fields: log_host and remaining, so there's no need to define extra fields called testip and data_left.
Perhaps even more usefully, you don't need to craft your own Apache web log grok pattern. The COMBINEDAPACHELOG pattern already exists, which gives all of the standard fields automatically.
filter {
if [type] == "apache" {
grok {
match => { "message" => "%{COMBINEDAPACHELOG}" }
}
# Set #timestamp to the log's time and drop the unneeded timestamp
date {
match => [ "timestamp" , "dd/MMM/yyyy:HH:mm:ss Z" ]
remove_field => "timestamp"
}
}
}
You can see this in a more complete example in the Logstash documentation here.

logstash grok multiline - how to merge to previous line any line that doesn't start with timestamp

sometimes I print to log indented pretty jsons which printed in multiple lines. so I need to be able to tell logstash to append these prints to the original line of the original event.
example:
xxx p:INFO d:2015-07-21 11:11:58,906 sourceThread:3iMind-Atlas-akka.actor.default-dispatcher-2 queryUserId: queryId: hrvJobId:6c1a4d60-e5e6-40d8-80aa-a4dc00e9f0c4 etlStreamId:70 etlOmdId: etlDocId: logger:tim.atlas.module.etl.mq.MQConnectorEtl msg:(st:Consuming) received NotifyMQ. sending to [openmind_exchange/job_ack] message:
{
"JobId" : "6c1a4d60-e5e6-40d8-80aa-a4dc00e9f0c4",
"Time" : "2015-07-21T11:11:58.904Z",
"Errors" : [ ],
"FeedItemSchemaCounts" : {
"Document" : 1,
"DocumentMetadata" : 1
},
"OtherSchemaCounts" : { }
}
Since I've set a special log4j appender to function solely as logstash input, this task should be quiet easy. I control the layout of the log, so I can add as many prefix/suffix indicators as I please.
here's how my appender look like:
log4j.appender.logstash-input.layout.ConversionPattern=xxx p:%p d:%d{yyyy-MM-dd HH:mm:ss,SSS}{UTC} sourceThread:%X{sourceThread} queryUserId:%X{userId} queryId:%X{queryId} hrvJobId:%X{hrvJobId} etlStreamId:%X{etlStreamId} etlOmdId:%X{etlOmdId} etlDocId:%X{etlDocId} logger:%c msg:%m%n
as you can see I've prefixed every message with 'xxx' so I could tell logstash to append any line which doesn't start with 'xxx' to the previous line
here's my logstash configuration:
if [type] == "om-svc-atlas" {
grok {
match => [ "message" , "(?m)p:%{LOGLEVEL:loglevel} d:%{TIMESTAMP_ISO8601:logdate} sourceThread:%{GREEDYDATA:sourceThread} queryUserId:%{GREEDYDATA:userId} queryId:%{GREEDYDATA:queryId} hrvJobId:%{GREEDYDATA:hrvJobId} etlStreamId:%{GREEDYDATA:etlStreamId} etlOmdId:%{GREEDYDATA:etlOmdId} etlDocId:%{GREEDYDATA:etlDocId} logger:%{GREEDYDATA:logger} msg:%{GREEDYDATA:msg}" ]
add_tag => "om-svc-atlas"
}
date {
match => [ "logdate" , "YYYY-MM-dd HH:mm:ss,SSS" ]
timezone => "UTC"
}
multiline {
pattern => "<please tell me what to put here to tell logstash to append any line which doesnt start with xxx to the previous line>"
what => "previous"
}
}
yes it was easy indeed :
if [type] == "om-svc-atlas" {
grok {
match => [ "message" , "(?m)p:%{LOGLEVEL:loglevel} d:%{TIMESTAMP_ISO8601:logdate} sourceThread:%{GREEDYDATA:sourceThread} queryUserId:%{GREEDYDATA:userId} queryId:%{GREEDYDATA:queryId} hrvJobId:%{GREEDYDATA:hrvJobId} etlStreamId:%{GREEDYDATA:etlStreamId} etlOmdId:%{GREEDYDATA:etlOmdId} etlDocId:%{GREEDYDATA:etlDocId} logger:%{GREEDYDATA:logger} msg:%{GREEDYDATA:msg}" ]
add_tag => "om-svc-atlas"
}
date {
match => [ "logdate" , "YYYY-MM-dd HH:mm:ss,SSS" ]
timezone => "UTC"
}
multiline {
pattern => "^(?!xxx).+"
what => "previous"
}
}

logstash output not showing the desired timestamp

I am trying to get the desired time stamp format from logstash output. I can''t get that if I use this format in syslog
Please share your thoughts about convert to the other format that’s in the _source field like Yyyy-mm-ddThh:mm:ss.sssZ format?
filter {
grok {
match => [ "logdate", "Yyyy-mm-ddThh:mm:ss.sssZ" ]
overwrite => ["host", "message"]
}
_source: {
message: "activity_log: {"created_at":1421114642210,"actor_ip":"192.168.1.1","note":"From system","user":"4561c9d7aaa9705a25f66d","user_id":null,"actor":"4561c9d7aaa9705a25f66d","actor_id":null,"org_id":null,"action":"user.failed_login","data":{"transaction_id":"d6768c473e366594","name":"user.failed_login","timing":{"start":1422127860691,"end":14288720480691,"duration":0.00257},"actor_locatio
I am using this code in syslog file
filter {
if [message] =~ /^activity_log: / {
grok {
match => ["message", "^activity_log: %{GREEDYDATA:json_message}"]
}
json {
source => "json_message"
remove_field => "json_message"
}
date {
match => ["created_at", "UNIX_MS"]
}
mutate {
rename => ["[json][repo]", "repo"]
remove_field => "json"
}
}
}
output {
elasticsearch { host => localhost }
stdout { codec => rubydebug }
}
thanks
"message" => "<134>feb 1 20:06:12 {\"created_at\":1422765535789, pid=5450 tid=28643 version=b0b45ac proto=http ip=192.168.1.1 duration_ms=0.165809 fs_sent=0 fs_recv=0 client_recv=386 client_sent=0 log_level=INFO msg=\"http op done: (401)\" code=401" }
"#version" => "1",
"#timestamp" => "2015-02-01T20:06:12.726Z",
"type" => "activity_log",
"host" => "192.168.1.1"
The pattern in your grok filter doesn't make sense. You're using a Joda-Time pattern (normally used for the date filter) and not a grok pattern.
It seems your message field contains a JSON object. That's good, because it makes it easy to parse. Extract the part that comes after "activity_log: " to a temporary json_message field,
grok {
match => ["message", "^activity_log: %{GREEDYDATA:json_message}"]
}
and parse that field as JSON with the json filter (removing the temporary field if the operation was successful):
json {
source => "json_message"
remove_field => ["json_message"]
}
Now you should have the fields from the original message field at the top level of your message, including the created_at field with the timestamp you want to extract. That number is the number of milliseconds since the epoch so you can use the UNIX_MS pattern in a date filter to extract it into #timestamp:
date {
match => ["created_at", "UNIX_MS"]
}

Resources