I am getting a _grokparsefailure on some of these Apache logs, and it is not making sense to me. One of the Kibana tags for these entries is _grokparsefailure. Obviously something is wrong here, but I am having trouble figuring out what it is.
Example log entry that resulted in a failure:
127.0.0.1 - - [10/Oct/2016:19:05:54 +0000] "POST /v1/api/query.random HTTP/1.1" 201 - "-" "-" 188
Logstash output config file:
filter {
  if [type] == "access" {
    grok {
      match => { "message" => "%{COMBINEDAPACHELOG}" }
    }
  }
}
filter {
  if [type] == "requests" {
    grok {
      match => { "message" => "%{COMBINEDAPACHELOG}" }
    }
  }
}
output {
  elasticsearch {
    hosts => ["http://ESCLUSTER:9200"]
    index => "%{[@metadata][beat]}-%{+YYYY.MM.dd}"
    document_type => "[type]"
  }
  stdout {
    codec => rubydebug
  }
}
There are two spaces instead of one between the two dashes and between the second dash and the opening bracket: 127.0.0.1 -  - [.
The beginning of the pattern (%{IPORHOST:clientip} %{HTTPDUSER:ident} %{HTTPDUSER:auth}) expects only one space at these points.
So either correct your log format so that all logs use the same format, or replace %{COMBINEDAPACHELOG} with
%{IPORHOST:clientip} %{HTTPDUSER:ident}%{SPACE}%{HTTPDUSER:auth}%{SPACE}\[%{HTTPDATE:timestamp}\] \"(?:%{WORD:verb} %{NOTSPACE:request}(?: HTTP/%{NUMBER:httpversion})?|%{DATA:rawrequest})\" %{NUMBER:response} (?:%{NUMBER:bytes}|-) %{QS:referrer} %{QS:agent}
This pattern is equivalent to the COMBINEDAPACHELOG pattern, except that the spaces at the beginning are replaced by the %{SPACE} pattern, which matches one or more spaces.
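For reference, a minimal sketch of how your first filter block could look with this pattern inlined (same type conditional as in your config, pattern unchanged from above):
filter {
  if [type] == "access" {
    grok {
      match => { "message" => "%{IPORHOST:clientip} %{HTTPDUSER:ident}%{SPACE}%{HTTPDUSER:auth}%{SPACE}\[%{HTTPDATE:timestamp}\] \"(?:%{WORD:verb} %{NOTSPACE:request}(?: HTTP/%{NUMBER:httpversion})?|%{DATA:rawrequest})\" %{NUMBER:response} (?:%{NUMBER:bytes}|-) %{QS:referrer} %{QS:agent}" }
    }
  }
}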
Related
I'm trying to collect logs using Logstash where I have different kinds of logs in the same file.
I want to extract certain fields if they exist in the log, and otherwise do something else.
input {
  file {
    path => ["/home/ubuntu/XXX/XXX/results/**/log_file.txt"]
    start_position => "beginning"
  }
}
filter {
  grok {
    match => { "message" => ["%{WORD:logger} %{SPACE}\-%{SPACE} %{LOGLEVEL:level} %{SPACE}\-%{SPACE} %{DATA:message} %{NUMBER:score:float}",
                             "%{WORD:logger} %{SPACE}\-%{SPACE} %{LOGLEVEL:level} %{SPACE}\-%{SPACE} %{DATA:message}"] }
  }
}
output {
  elasticsearch {
    hosts => ["X.X.X.X:9200"]
  }
  stdout { codec => rubydebug }
}
for example, log type 1 is:
root - INFO - Best score yet: 35.732
and type 2 is:
root - INFO - Starting an experiment
One of the problems I face is that when a message doesn't contain a number, the field still exists as null in the resulting JSON, which prevents me from using the desired functionality in Kibana.
One option is to add a tag in Logstash when the field is not defined, so that you have an easy way to filter on the Kibana side. The null value is only set during insertion into Elasticsearch (on the Logstash side, the field is simply not defined).
In your case, this solution looks like:
filter {
  grok {
    match => { "message" => ["%{WORD:logger} %{SPACE}\-%{SPACE} %{LOGLEVEL:level} %{SPACE}\-%{SPACE} %{DATA:message} %{NUMBER:score:float}",
                             "%{WORD:logger} %{SPACE}\-%{SPACE} %{LOGLEVEL:level} %{SPACE}\-%{SPACE} %{DATA:message}"] }
  }
  if ![score] {
    mutate { add_tag => [ "score_not_set" ] }
  }
}
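On the Kibana side you can then filter on that tag easily. As a sketch, assuming the default Lucene query syntax in the Kibana search bar (the tag name comes from the config above), the query tags:score_not_set shows only the entries without a score, and NOT tags:score_not_set shows only the entries that have one.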
I am new to Logstash and grok, and I am trying to parse AWS ECS logs in an S3 bucket in the following format -
File Name - my-logs-s3-bucket/3d265ee3-d2ee-4029-a3d9-fd2255d69b92/ecs-fargate-container-8ff0e472-c76f-4f61-a363-64c2b80aa842/000000.gz
Sample Lines -
2019-05-09T16:16:16.983Z JBoss Bootstrap Environment
2019-05-09T16:16:16.983Z JBOSS_HOME: /app/jboss
2019-05-09T16:16:16.983Z JAVA_OPTS: -server -XX:+UseCompressedOops -Djboss.server.log.dir=/var/log/jboss -Xms128m -Xmx4096m
And logstash.conf
input {
  s3 {
    region => "us-east-1"
    bucket => "my-logs-s3-bucket"
    interval => "7200"
  }
}
filter {
  grok {
    match => ["message", "%{TIMESTAMP_ISO8601:tstamp}"]
  }
  date {
    match => ["tstamp", "ISO8601"]
  }
  mutate {
    remove_field => ["tstamp"]
    add_field => {
      "file" => "%{[@metadata][s3][key]}"
    }
    ######### NEED HELP HERE - START #########
    #grok {
    #  match => [ "file", "ecs-fargate-container-%{DATA:containerlogname}"]
    #}
    ######### NEED HELP HERE - END #########
  }
}
output {
  stdout { codec => rubydebug {
      #metadata => true
    }
  }
}
When I run Logstash with the above configuration, I am able to see all the logs parsed and the file name extracted. The file name from the output looks like below -
"file" => "myapp-logs/3d265ee3-d2ee-4029-a3d9-fd2255d69b92/ecs-fargate-container-8ff0e472-c76f-4f61-a363-64c2b80aa842/000000.gz",
I am trying to use grok to extract the file name as either ecs-fargate-container-8ff0e472-c76f-4f61-a363-64c2b80aa842 or 8ff0e472-c76f-4f61-a363-64c2b80aa842. When I uncomment the grok config lines between #NEED HELP HERE - START and END, Logstash fails with the error below -
Expected one of #, => at line 21, column 10 (byte 536) after filter {\n grok {\n match => [\"message\", \"%{TIMESTAMP_ISO8601:tstamp}\"]\n }\n date {\n match => [\"tstamp\", \"ISO8601\"]\n }\n mutate {\n #remove_field => [\"tstamp\"]\n add_field => {\n \"file\" => \"%{[@metadata][s3][key]}\"\n }\n grok ", :
I am not sure where I am going wrong with this. Please advise.
Your grok filter was inside the mutate filter; try the following.
filter {
  grok {
    match => ["message", "%{TIMESTAMP_ISO8601:tstamp}"]
  }
  date {
    match => ["tstamp", "ISO8601"]
  }
  mutate {
    remove_field => ["tstamp"]
    add_field => { "file" => "%{[@metadata][s3][key]}" }
  }
  grok {
    match => [ "file", "ecs-fargate-container-%{DATA:containerlogname}"]
  }
}
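The containerlogname field name above comes from your commented-out attempt. If you only want the UUID part (8ff0e472-c76f-4f61-a363-64c2b80aa842) instead, a sketch using the stock UUID grok pattern could look like this, where containerid is just an illustrative field name:
grok {
  match => [ "file", "ecs-fargate-container-%{UUID:containerid}" ]
}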
I have a log file with Apache logs that I want to show in Kibana.
The logs start with an IP address. I have debugged my pattern and it passes.
I'm trying to add fields in the Beats input configuration file, but they do not show up in Kibana even after refreshing the fields.
Here is the configuration file
filter {
  if[type] == "apache" {
    grok {
      match => { "message" => "%{HOST:log_host}%{GREEDYDATA:remaining}" }
      add_field => { "testip" => "%{log_host}" }
      add_field => { "data_left" => "%{remaining}" }
    }
  }
...
Just to add that I have restarted all the services (Logstash, Elasticsearch, Kibana) after applying the new configuration.
The issue could be that your grok pattern is too rigid.
Chances are that HOST should be IPORHOST, based on your testip field's name.
Assuming that the data is actually coming in with the type defined as apache, then it should be:
filter {
  if [type] == "apache" {
    grok {
      match => {
        "message" => "%{IPORHOST:log_host}%{GREEDYDATA:remaining}"
      }
      add_field => {
        "testip" => "%{log_host}"
        "data_left" => "%{remaining}"
      }
    }
  }
}
Having said that, your usage of add_field is completely unnecessary. The grok pattern itself is creating two fields: log_host and remaining, so there's no need to define extra fields called testip and data_left.
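A trimmed-down version of that filter without the add_field options (a sketch) would simply be:
filter {
  if [type] == "apache" {
    grok {
      match => { "message" => "%{IPORHOST:log_host}%{GREEDYDATA:remaining}" }
    }
  }
}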
Perhaps even more usefully, you don't need to craft your own Apache web log grok pattern. The COMBINEDAPACHELOG pattern already exists, which gives all of the standard fields automatically.
filter {
  if [type] == "apache" {
    grok {
      match => { "message" => "%{COMBINEDAPACHELOG}" }
    }
    # Set @timestamp to the log's time and drop the unneeded timestamp
    date {
      match => [ "timestamp" , "dd/MMM/yyyy:HH:mm:ss Z" ]
      remove_field => "timestamp"
    }
  }
}
You can see a more complete example of this in the Logstash documentation.
If you are backfilling logs into Logstash, you need to somehow pull the proper timestamps out of the log lines. Otherwise events get assigned the time at which the log line was received by Logstash.
This is achieved using a date filter like:
date { match => [ "timestamp", "dd/MMM/yyyy:HH:mm:ss Z" ] }
But unfortunately this does not work for me.
So I have the following Apache log line:
10.80.161.251 - - [15/Oct/2015:09:13:45 +0000] "- -" "POST /xxx HTTP/1.1" 200 696 29416 "-" "xxx" 4026
And the following pattern
ACCESS_LOG %{IPORHOST:clientip} %{USER:ident} %{USER:auth} \[%{HTTPDATE:[@metadata][timestamp]}\] "(?:TLSv%{NUMBER:tlsversion}|-) (?:%{NOTSPACE:cypher}|-)" "(?:%{WORD:verb} %{NOTSPACE:request}(?: HTTP/%{NUMBER:httpversion})?|%{DATA:rawrequest})" %{NUMBER:response} (?:%{NUMBER:bytes_in}|-) (?:%{NUMBER:bytes_out}|-) %{QS:referrer} %{QS:agent} %{NUMBER:tts}
And the following logstash config
# INPUTS
input {
  file {
    path => '/var/log/test.log'
    type => 'apache-access'
  }
}
# filter/mix/match
filter {
  if [type] == 'apache-access' {
    grok {
      patterns_dir => [ '/root/logstash-patterns' ]
      match => [ "message", "%{ACCESS_LOG}" ]
    }
    if !("_grokparsefailure" in [tags]) {
      mutate { add_field => ["timestamp_submitted", "%{@timestamp}"] }
      date {
        match => [ "timestamp", "dd/MMM/yyyy:HH:mm:ss Z" ]
      }
    }
  }
}
# now output
output {
  stdout { codec => rubydebug }
}
What am I doing wrong here? I tried adding timezones, locales and whatnot, and it still does not work. Any help is greatly appreciated (plus a drink of your choice if you happen to be in Sofia, Bulgaria).
Note to self: read more carefully
The issue here is that the date filter is not matching against the proper field.
Because of the default pattern for Apache logs, the timestamp from the log line ends up in [@metadata][timestamp] and not in timestamp.
So the match in the date filter should be:
date {
  match => [ "[@metadata][timestamp]", "dd/MMM/yyyy:HH:mm:ss Z" ]
}
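Put together, the relevant part of the filter (a sketch that only changes the field name in the date match and keeps the rest of your config as-is) becomes:
filter {
  if [type] == 'apache-access' {
    grok {
      patterns_dir => [ '/root/logstash-patterns' ]
      match => [ "message", "%{ACCESS_LOG}" ]
    }
    if !("_grokparsefailure" in [tags]) {
      mutate { add_field => ["timestamp_submitted", "%{@timestamp}"] }
      date {
        match => [ "[@metadata][timestamp]", "dd/MMM/yyyy:HH:mm:ss Z" ]
      }
    }
  }
}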
I am trying to parse this log line:
- 2014-04-29 13:04:23,733 [main] INFO (api.batch.ThreadPoolWorker) Command-line options for this run:
Here's the Logstash config file I use:
input {
stdin {}
}
filter {
grok {
match => [ "message", " - %{TIMESTAMP_ISO8601:time} \[%{WORD:main}\] %{LOGLEVEL:loglevel} %{JAVACLASS:class} %{DATA:mydata} "]
}
date {
match => [ "timestamp" , "dd/MMM/yyyy:HH:mm:ss Z" ]
}
output {
elasticsearch {
host => "localhost"
}
stdout { codec => rubydebug }
}
Here's the output I get:
{
    "message" => " - 2014-04-29 13:04:23,733 [main] INFO (api.batch.ThreadPoolWorker) Commans run:",
    "@version" => "1",
    "@timestamp" => "2015-02-02T10:53:58.282Z",
    "host" => "NAME_001.corp.com",
    "tags" => [
        [0] "_grokparsefailure"
    ]
}
Please, can anyone help me find where the problem is in the grok pattern?
I tried to parse that line in http://grokdebug.herokuapp.com/ but it only parses the timestamp, %{WORD} and %{LOGLEVEL}; the rest is ignored!
There are two errors in your config.
First
The error in the grok pattern is the JAVACLASS: you have to include the parentheses ( ) in the pattern, for example \(%{JAVACLASS:class}\).
Second
The date filter's match takes two values: the first is the field you want to parse, which in your example is time, not timestamp. The second value is the date pattern. You can refer to the date filter documentation.
Here is the config
input {
  stdin {
  }
}
filter {
  grok {
    match => [ "message", " - %{TIMESTAMP_ISO8601:time} \[%{WORD:main}\] %{LOGLEVEL:loglevel} \(%{JAVACLASS:class}\) %{GREEDYDATA:mydata}" ]
  }
  date {
    match => [ "time" , "YYYY-MM-dd HH:mm:ss,SSS" ]
  }
}
output {
  stdout {
    codec => rubydebug
  }
}
FYI. Hope this can help you.