If you are backfilling logs into Logstash, you should try to extract the proper timestamps from the log lines themselves. Otherwise events get assigned the time at which the log line was received by Logstash.
This is achieved with a date filter like:
date { match => [ "timestamp", "dd/MMM/yyyy:HH:mm:ss Z" ] }
But unfortunately this did not work for me.
I have the following Apache log line:
10.80.161.251 - - [15/Oct/2015:09:13:45 +0000] "- -" "POST /xxx HTTP/1.1" 200 696 29416 "-" "xxx" 4026
And the following pattern
ACCESS_LOG %{IPORHOST:clientip} %{USER:ident} %{USER:auth} \[%{HTTPDATE:[@metadata][timestamp]}\] "(?:TLSv%{NUMBER:tlsversion}|-) (?:%{NOTSPACE:cypher}|-)" "(?:%{WORD:verb} %{NOTSPACE:request}(?: HTTP/%{NUMBER:httpversion})?|%{DATA:rawrequest})" %{NUMBER:response} (?:%{NUMBER:bytes_in}|-) (?:%{NUMBER:bytes_out}|-) %{QS:referrer} %{QS:agent} %{NUMBER:tts}
And the following logstash config
# INPUTS
input {
  file {
    path => '/var/log/test.log'
    type => 'apache-access'
  }
}

# filter/mix/match
filter {
  if [type] == 'apache-access' {
    grok {
      patterns_dir => [ '/root/logstash-patterns' ]
      match => [ "message", "%{ACCESS_LOG}" ]
    }
    if !("_grokparsefailure" in [tags]) {
      mutate { add_field => ["timestamp_submitted", "%{@timestamp}"] }
      date {
        match => [ "timestamp", "dd/MMM/yyyy:HH:mm:ss Z" ]
      }
    }
  }
}

# now output
output {
  stdout { codec => rubydebug }
}
What am I doing wrong here? I tried adding timezones, locales and whatnot, and it still does not work. Any help is greatly appreciated (plus a drink of your choice if you happen to be in Sofia, Bulgaria).
Note to self: read more carefully.
The issue here is that the date filter is not matching against the proper field.
Because of the default pattern for Apache logs, the timestamp from the log line ends up in [@metadata][timestamp] and not in timestamp.
So the date filter should be:
date {
  match => [ "[@metadata][timestamp]", "dd/MMM/yyyy:HH:mm:ss Z" ]
}
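One debugging gotcha to keep in mind: the rubydebug codec does not print [@metadata] fields by default, so the captured timestamp stays invisible on stdout unless you switch metadata printing on. Roughly like this (a small tweak for testing only, not required for the fix itself):
output {
  # show [@metadata][*] fields while testing; remove once things work
  stdout { codec => rubydebug { metadata => true } }
}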
Related
I'm building an ELK setup and it's working fine. However, I've run into a situation where I want to remove certain fields from my system-log data while processing them through Logstash, using remove_field and remove_tag, which I've defined in my Logstash configuration file, but that's not working.
Looking for any expert advice to correct the config and make it work; thanks very much in advance.
My Logstash configuration file:
[root@sandbox-prd ~]# cat /etc/logstash/conf.d/syslog.conf
input {
  file {
    path => [ "/data/SYSTEMS/*/messages.log" ]
    start_position => beginning
    sincedb_path => "/dev/null"
    max_open_files => 64000
    type => "sj-syslog"
  }
}

filter {
  if [type] == "sj-syslog" {
    grok {
      match => { "message" => "%{SYSLOGTIMESTAMP:syslog_timestamp} %{SYSLOGHOST:syslog_hostname} %{DATA:syslog_program}(?:\[%{POSINT:syslog_pid}\])?: %{GREEDYDATA:syslog_message}" }
      add_field => [ "received_at", "%{@timestamp}" ]
      remove_field => ["@version", "host", "_type", "_index", "_score", "path"]
      remove_tag => ["_grokparsefailure"]
    }
    syslog_pri { }
    date {
      match => [ "syslog_timestamp", "MMM d HH:mm:ss", "MMM dd HH:mm:ss" ]
    }
  }
}

output {
  if [type] == "sj-syslog" {
    elasticsearch {
      hosts => "sandbox-prd02:9200"
      manage_template => false
      index => "sj-syslog-%{+YYYY.MM.dd}"
      document_type => "messages"
    }
  }
}
Data sample appearing on the Kibana Portal
syslog_pid:6662 type:sj-syslog syslog_message:(root) CMD (LANG=C LC_ALL=C /usr/bin/mrtg /etc/mrtg/mrtg.cfg --lock-file /var/lock/mrtg/mrtg_l --confcache-file /var/lib/mrtg/mrtg.ok) syslog_severity:notice syslog_hostname:dbaprod01 syslog_severity_code:5 syslog_timestamp:Feb 11 10:25:02 @timestamp:February 11th 2019, 23:55:02.000 message:Feb 11 10:25:02 dbaprod01 CROND[6662]: (root) CMD (LANG=C LC_ALL=C /usr/bin/mrtg /etc/mrtg/mrtg.cfg --lock-file /var/lock/mrtg/mrtg_l --confcache-file /var/lib/mrtg/mrtg.ok) syslog_facility:user-level syslog_facility_code:1 syslog_program:CROND received_at:February 11th 2019, 10:25:03.353 _id:KpHo2mgBybCgY5IwmRPn _type:messages
_index:sj-syslog-2019.02.11 _score: -
My resource details:
OS version: Linux 7
Logstash version: 6.5.4
You can't remove _type and _index; those are metadata fields Elasticsearch needs in order to work, as they hold the index name and the mapping type of your data. The _score field is also a metadata field, generated at search time; it's not part of your document.
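For illustration, a trimmed version of the grok block above that only removes fields which actually exist on the event might look like this (a sketch, keeping the rest of the poster's pattern unchanged):
grok {
  match => { "message" => "%{SYSLOGTIMESTAMP:syslog_timestamp} %{SYSLOGHOST:syslog_hostname} %{DATA:syslog_program}(?:\[%{POSINT:syslog_pid}\])?: %{GREEDYDATA:syslog_message}" }
  add_field => [ "received_at", "%{@timestamp}" ]
  # only fields that live on the event itself can be removed here
  remove_field => [ "@version", "host", "path" ]
  # note: add_field/remove_field/remove_tag only run when this grok succeeds,
  # so remove_tag cannot strip a _grokparsefailure produced by this same filter
  remove_tag => [ "_grokparsefailure" ]
}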
Since I've upgraded our ELK stack from 5.0.2 to 5.2, our grok filters fail and I have no idea why. Maybe I've overlooked something in the changelogs?
Filter
filter {
  if [type] == "nginx_access" {
    grok {
      match => { "message" => "%{IPORHOST:remote_addr} - %{USERNAME:remote_user} \[%{TIMESTAMP_ISO8601:timestamp}\] \"%{WORD:method} %{URIPATHPARAM:request} HTTP/%{NUMBER:httpversion}\" %{INT:status} %{INT:body_bytes_sent} %{QS:http_referer} %{QS:http_user_agent} \"%{DATA:host_uri}\" \"%{DATA:proxy}\" \"%{DATA:upstream_addr}\" \"%{WORD:cache_status}\" \[%{NUMBER:request_time}\] \[(?:%{NUMBER:proxy_response_time}|-)\]" }
      add_field => [ "received_at", "%{@timestamp}" ]
    }
    mutate {
      convert => {
        "proxy_response_time" => "float"
        "request_time" => "float"
        "body_bytes_sent" => "integer"
      }
    }
  }
}
Error
Invalid format: \"2017-02-05T15:55:38+01:00\" is malformed at \"-02-05T15:55:38+01:00\"
Full Error
[2017-02-05T15:55:49,500][WARN ][logstash.outputs.elasticsearch] Failed action. {:status=>400, :action=>["index", {:_id=>nil, :_index=>"filebeat-2017.02.05", :_type=>"nginx_access", :_routing=>nil}, 2017-02-05T14:55:38.000Z proxy2 4.3.2.1 - - [2017-02-05T15:55:38+01:00] "HEAD / HTTP/1.1" 200 0 "-" "Zabbix" "example.com" "host1:10040" "1.2.3.4:10040" "MISS" [0.095] [0.095]], :response=>{"index"=>{"_index"=>"filebeat-2017.02.05", "_type"=>"nginx_access", "_id"=>"AVoOxh7p5p68dsalXDFX", "status"=>400, "error"=>{"type"=>"mapper_parsing_exception", "reason"=>"failed to parse [timestamp]", "caused_by"=>{"type"=>"illegal_argument_exception", "reason"=>"Invalid format: \"2017-02-05T15:55:38+01:00\" is malformed at \"-02-05T15:55:38+01:00\""}}}}}
The whole thing works perfectly on http://grokconstructor.appspot.com and the TIMESTAMP_ISO8601 still seems the right choice (https://github.com/logstash-plugins/logstash-patterns-core/blob/master/patterns/grok-patterns)
Techstack
Ubuntu 16.04
Elasticsearch 5.2.0
Logstash 5.2.0
Filebeat 5.2.0
Kibana 5.2.0
Any ideas?
Cheers,
Finn
UPDATE
So this version works for some reason
filter {
  if [type] == "nginx_access" {
    grok {
      match => { "message" => "%{IPORHOST:remote_addr} - %{USERNAME:remote_user} \[%{TIMESTAMP_ISO8601:timestamp}\] \"%{WORD:method} %{URIPATHPARAM:request} HTTP/%{NUMBER:httpversion}\" %{INT:status} %{INT:body_bytes_sent} %{QS:http_referer} %{QS:http_user_agent} \"%{DATA:host_uri}\" \"%{DATA:proxy}\" \"%{DATA:upstream_addr}\" \"%{WORD:cache_status}\" \[%{NUMBER:request_time}\] \[(?:%{NUMBER:proxy_response_time}|-)\]" }
      add_field => [ "received_at", "%{@timestamp}" ]
    }
    date {
      match => [ "timestamp" , "yyyy-MM-dd'T'HH:mm:ssZ" ]
      target => "timestamp"
    }
    mutate {
      convert => {
        "proxy_response_time" => "float"
        "request_time" => "float"
        "body_bytes_sent" => "integer"
      }
    }
  }
}
If someone can shed some light on why I have to redefine a valid ISO8601 date, I would be happy to know.
Make sure you specify the format of the timestamp you are expecting in your documents, where the mapping could look like:
PUT index
{
  "mappings": {
    "your_index_type": {
      "properties": {
        "date": {
          "type": "date",
          "format": "yyyy-MM-ddTHH:mm:ss+01:SS" <-- make sure to give the correct one
        }
      }
    }
  }
}
If you do not specify it correctly, Elasticsearch will expect the timestamp value in ISO format. Or you could do a date match for your timestamp field, which could look something like this within your filter:
date {
  match => [ "timestamp" , "yyyy-MM-ddTHH:mm:ss+01:SS" ] <-- match the timestamp (I'm not sure what +01:SS stands for, make sure it matches)
  target => "timestamp"
  locale => "en"
  timezone => "UTC"
}
Or you could add a new field and match that against the timestamp if you wish, and then remove it if you aren't really using it, since you'd have the timestamp on the new field. Hope it helps.
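As an alternative to spelling the pattern out by hand, the date filter also understands the built-in ISO8601 pattern name, which matches timestamps such as 2017-02-05T15:55:38+01:00, so something along these lines should work as well:
date {
  # ISO8601 is a built-in pattern name recognised by the date filter
  match  => [ "timestamp", "ISO8601" ]
  target => "timestamp"
}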
I am getting a _grokparsefailure on some of these Apache logs, and it is not making sense to me. One of the Kibana tags for these events is _grokparsefailure. Obviously something is wrong here, but I am having trouble figuring out what.
Example log entry that resulted in a failure:
127.0.0.1 - - [10/Oct/2016:19:05:54 +0000] "POST /v1/api/query.random HTTP/1.1" 201 - "-" "-" 188
Logstash output config file:
filter {
  if [type] == "access" {
    grok {
      match => { "message" => "%{COMBINEDAPACHELOG}" }
    }
  }
}

filter {
  if [type] == "requests" {
    grok {
      match => { "message" => "%{COMBINEDAPACHELOG}" }
    }
  }
}

output {
  elasticsearch {
    hosts => ["http://ESCLUSTER:9200"]
    index => "%{[@metadata][beat]}-%{+YYYY.MM.dd}"
    document_type => "[type]"
  }
  stdout {
    codec => rubydebug
  }
}
There are two spaces instead of one between the two - characters and between the - and the [: 127.0.0.1 - - [.
The pattern (%{IPORHOST:clientip} %{HTTPDUSER:ident} %{HTTPDUSER:auth}) expects only one space at these points.
So either correct your log format so that all logs share the same format, or replace %{COMBINEDAPACHELOG} with
%{IPORHOST:clientip} %{HTTPDUSER:ident}%{SPACE}%{HTTPDUSER:auth}%{SPACE}\[%{HTTPDATE:timestamp}\] \"(?:%{WORD:verb} %{NOTSPACE:request}(?: HTTP/%{NUMBER:httpversion})?|%{DATA:rawrequest})\" %{NUMBER:response} (?:%{NUMBER:bytes}|-) %{QS:referrer} %{QS:agent}
This pattern is equivalent to the COMBINEDAPACHELOG pattern, but the spaces at the beginning are replaced with the %{SPACE} pattern, which matches one or more spaces.
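For example, one way to wire that looser pattern into the filter from the question is via the grok plugin's pattern_definitions option (available in recent versions of the grok filter; the LOOSE_COMBINEDAPACHELOG name is just an illustrative choice, and you could equally keep the pattern in a patterns_dir file):
filter {
  if [type] == "access" {
    grok {
      # define a named pattern inline instead of shipping a patterns file
      pattern_definitions => {
        "LOOSE_COMBINEDAPACHELOG" => "%{IPORHOST:clientip} %{HTTPDUSER:ident}%{SPACE}%{HTTPDUSER:auth}%{SPACE}\[%{HTTPDATE:timestamp}\] \"(?:%{WORD:verb} %{NOTSPACE:request}(?: HTTP/%{NUMBER:httpversion})?|%{DATA:rawrequest})\" %{NUMBER:response} (?:%{NUMBER:bytes}|-) %{QS:referrer} %{QS:agent}"
      }
      match => { "message" => "%{LOOSE_COMBINEDAPACHELOG}" }
    }
  }
}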
Sometimes I print indented, pretty-printed JSON to the log, which spans multiple lines, so I need to be able to tell Logstash to append these extra lines to the original event.
example:
xxx p:INFO d:2015-07-21 11:11:58,906 sourceThread:3iMind-Atlas-akka.actor.default-dispatcher-2 queryUserId: queryId: hrvJobId:6c1a4d60-e5e6-40d8-80aa-a4dc00e9f0c4 etlStreamId:70 etlOmdId: etlDocId: logger:tim.atlas.module.etl.mq.MQConnectorEtl msg:(st:Consuming) received NotifyMQ. sending to [openmind_exchange/job_ack] message:
{
  "JobId" : "6c1a4d60-e5e6-40d8-80aa-a4dc00e9f0c4",
  "Time" : "2015-07-21T11:11:58.904Z",
  "Errors" : [ ],
  "FeedItemSchemaCounts" : {
    "Document" : 1,
    "DocumentMetadata" : 1
  },
  "OtherSchemaCounts" : { }
}
Since I've set up a special log4j appender to function solely as Logstash input, this task should be quite easy. I control the layout of the log, so I can add as many prefix/suffix indicators as I please.
Here's what my appender looks like:
log4j.appender.logstash-input.layout.ConversionPattern=xxx p:%p d:%d{yyyy-MM-dd HH:mm:ss,SSS}{UTC} sourceThread:%X{sourceThread} queryUserId:%X{userId} queryId:%X{queryId} hrvJobId:%X{hrvJobId} etlStreamId:%X{etlStreamId} etlOmdId:%X{etlOmdId} etlDocId:%X{etlDocId} logger:%c msg:%m%n
As you can see, I've prefixed every message with 'xxx' so I can tell Logstash to append any line which doesn't start with 'xxx' to the previous line.
Here's my Logstash configuration:
if [type] == "om-svc-atlas" {
grok {
match => [ "message" , "(?m)p:%{LOGLEVEL:loglevel} d:%{TIMESTAMP_ISO8601:logdate} sourceThread:%{GREEDYDATA:sourceThread} queryUserId:%{GREEDYDATA:userId} queryId:%{GREEDYDATA:queryId} hrvJobId:%{GREEDYDATA:hrvJobId} etlStreamId:%{GREEDYDATA:etlStreamId} etlOmdId:%{GREEDYDATA:etlOmdId} etlDocId:%{GREEDYDATA:etlDocId} logger:%{GREEDYDATA:logger} msg:%{GREEDYDATA:msg}" ]
add_tag => "om-svc-atlas"
}
date {
match => [ "logdate" , "YYYY-MM-dd HH:mm:ss,SSS" ]
timezone => "UTC"
}
multiline {
pattern => "<please tell me what to put here to tell logstash to append any line which doesnt start with xxx to the previous line>"
what => "previous"
}
}
Yes, it was easy indeed:
if [type] == "om-svc-atlas" {
grok {
match => [ "message" , "(?m)p:%{LOGLEVEL:loglevel} d:%{TIMESTAMP_ISO8601:logdate} sourceThread:%{GREEDYDATA:sourceThread} queryUserId:%{GREEDYDATA:userId} queryId:%{GREEDYDATA:queryId} hrvJobId:%{GREEDYDATA:hrvJobId} etlStreamId:%{GREEDYDATA:etlStreamId} etlOmdId:%{GREEDYDATA:etlOmdId} etlDocId:%{GREEDYDATA:etlDocId} logger:%{GREEDYDATA:logger} msg:%{GREEDYDATA:msg}" ]
add_tag => "om-svc-atlas"
}
date {
match => [ "logdate" , "YYYY-MM-dd HH:mm:ss,SSS" ]
timezone => "UTC"
}
multiline {
pattern => "^(?!xxx).+"
what => "previous"
}
}
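One caveat worth noting: in newer Logstash releases the multiline filter has been removed in favour of the multiline codec on the input, so the same idea would be expressed roughly like this (the file path is hypothetical, shown only for illustration):
input {
  file {
    # hypothetical path, adjust to wherever your appender writes
    path => "/var/log/atlas/om-svc-atlas.log"
    type => "om-svc-atlas"
    codec => multiline {
      # any line NOT starting with the 'xxx' marker belongs to the previous event
      pattern => "^xxx"
      negate  => true
      what    => "previous"
    }
  }
}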
I am trying to parse this log line:
- 2014-04-29 13:04:23,733 [main] INFO (api.batch.ThreadPoolWorker) Command-line options for this run:
Here's the Logstash config file I use:
input {
  stdin {}
}

filter {
  grok {
    match => [ "message", " - %{TIMESTAMP_ISO8601:time} \[%{WORD:main}\] %{LOGLEVEL:loglevel} %{JAVACLASS:class} %{DATA:mydata} "]
  }
  date {
    match => [ "timestamp" , "dd/MMM/yyyy:HH:mm:ss Z" ]
  }
}

output {
  elasticsearch {
    host => "localhost"
  }
  stdout { codec => rubydebug }
}
Here's the output I get:
{
    "message" => " - 2014-04-29 13:04:23,733 [main] INFO (api.batch.ThreadPoolWorker) Commans run:",
    "@version" => "1",
    "@timestamp" => "2015-02-02T10:53:58.282Z",
    "host" => "NAME_001.corp.com",
    "tags" => [
        [0] "_grokparsefailure"
    ]
}
Can anyone please help me find where the problem is in the grok pattern?
I tried to parse that line in http://grokdebug.herokuapp.com/ but it parses only the timestamp, %{WORD} and %{LOGLEVEL}; the rest is ignored!
There are two errors in your config.
First
The error in the grok pattern is the JAVACLASS: you have to include the parentheses in the pattern, for example \(%{JAVACLASS:class}\).
Second
The date filter's match takes two values: the first is the field you want to parse, which in your example is time, not timestamp. The second is the date pattern. You can refer to the date filter documentation for details.
Here is the config
input {
  stdin {}
}

filter {
  grok {
    match => [ "message", " - %{TIMESTAMP_ISO8601:time} \[%{WORD:main}\] %{LOGLEVEL:loglevel} \(%{JAVACLASS:class}\) %{GREEDYDATA:mydata}" ]
  }
  date {
    match => [ "time" , "YYYY-MM-dd HH:mm:ss,SSS" ]
  }
}

output {
  stdout {
    codec => rubydebug
  }
}
FYI. Hope this can help you.