Logstash S3 output prefix - event date field value - logstash

How to set Logstash S3 output prefix dynamically with an event field value in format: "%{+YYYY}/%{+MM}/%{+dd}/%{+HH}" ?
input:
{"record_time":"2017-03-09T04:07:51.520Z"}
required s3 prefix:
2017/03/09/04

You can use grok to match record_time to extract year, month, day, hour and then mutate into s3 prefix:
grok {
match => {
"record_time" => "%{INT:year}-%{INT:month}-%{INT:day}T%{INT:hour}:%{GREEDYDATA}"
}
}
mutate {
# Create s3 prefix
add_field => {
"s3_prefix" => "%{year}/%{month}/%{day}/%{hour}"
}
# If you don't need separate values, remove them
remove_field => ["year", "month", "day", "hour"]
}

Related

Error parsing date with grok filter in logstash

I need to parse the date and timestamp in the log to show in #timestamp field. I am able to parse timestamp but not date.
Input Log:
"2010-08-18","00:01:55","text"
My Filter:
grok {
match => { "message" => '"(%{DATE})","(%{TIME})","(%{GREEDYDATA:message3})"’}
}
Here DATE throws grokparsefailure.
Also not sure how to update the #timestamp field.
Appreciate your help.
The %{DATE} pattern is not what you want. It's looking for something in M/D/Y, M-D-Y, D-M-Y, or D/M/Y format.
For a file like this, you could consider using the csv filter:
filter {
csv {
columns => ["date","time","message3"]
add_filed => {
"date_time" => "%{date} %{time}"
}
}
date {
match => [ "date_time", "yyyy-MM-dd HH:mm:ss" ]
remove_field => ["date", "time", "date_time" ]
}
}
This will handle the case where message3 has embedded quotes in it that have been escaped.

_dateparsefailure when the date already has a match

I'm trying to config logstash to process some test log, but I keep getting a dateparsefailure and I don't understand why. My input is
2016-09-18 00:00:02,013 UTC, idf="639b26a731284b43beac8b26f829bcab"
And my config (I've also tried including the timezone into the pattern):
input {
file {
path => "/tmp/test.log"
start_position => "beginning"
}
}
filter {
date {
match => ["message", "yyyy-MM-dd HH:mm:ss,SSS"]
timezone => "UTC"
add_field => { "debug" => "timestampMatched"}
}
grok {
match => { "message" => "%{YEAR:year}-%{MONTHNUM:month}-%{MONTHDAY:day} %{HOUR:hour}:%{MINUTE:minute}:%{SECOND:second},%{NUMBER:milis} UTC, idf=\"%{WORD:idf}\""}
}
}
output {
elasticsearch {
hosts => ["localhost:9200"]
}
stdout {
codec => rubydebug
}
}
Finaly, the error:
{:timestamp=>"2016-09-21T10:04:32.060000+0000", :message=>"Failed parsing date from field", :field=>"message", :value=>"2016-09-18 00:00:02,013 UTC, idf=\"639b26a731284b43beac8b26f829bcab\"", :exception=>"Invalid format: \"2016-09-18 00:00:02,013 UTC, idf=\"639b26a731284b4...\" is malformed at \" UTC, idf=\"639b26a731284b4...\"", :config_parsers=>"yyyy-MM-dd HH:mm:ss,SSS", :config_locale=>"default=en_US", :level=>:warn, :file=>"logstash/filters/date.rb", :line=>"354", :method=>"filter"}
It says that the date it is malformed after the end of it. Why does this happen, shouldn't it 'stop searching' since the date has already a match?
Before you can use the date filter, you first have to use grok to separate the date and the rest of the message. The date filter only accepts a timestamp. If you have any other information in the field the error you are describing will occur.
Using your provided logline I would recommend this:
filter {
grok {
match => { "message" => "%{TIMESTAMP_ISO8601:timedate} %{GREEDYDATA}"}
}
date {
match => [ "timedate" => "yyyy-MM-dd HH:mm:ss,SSS"]
}
}
In this minimal example I match the timestamp in the timedate field and then crunch it trough the date filter.

Logstash csv filter create index name based on timestamp

I want to create ES index based on the dates matching from the logfile. I am using logstash CSV filter to process the logs. For instance, the log data appears like below
2016-02-21 00:02:32.238,123.abc.com,data
2016-02-22 00:04:40.145,345.abc.com,data
Below is the logstash configuration file. Obviously the index will be created as testlog, however, i want the index to be created as testlog-2016.02.21 and testlog-2016.02.22, given that YYYY.mm.dd is the logstash preferred format for index dates. I have done this with grok filters, and I am trying the achieve the same with csv, but this doesn't seem to work.
filter {
csv {
columns => [ "timestamp", "host", "data" ]
separator => ","
remove_field => ["message"]
}
}
output {
elasticsearch {
hosts => ["localhost:9200"]
index => "testlog"
}
}
We are on Logstash 2.1.0, ES 2.1.0 and Kibana 4.3.0 version
Any inputs appreciated
You need to specify the #timestamp field in filter, and you also need to specify your index name as below:
filter {
csv {
columns => [ "timestamp", "host", "data" ]
separator => ","
remove_field => ["message"]
}
date {
match => [ "timestamp", "ISO8601" ] #specify timestamp format
timezone => "UTC" #specify timezone
target => "#timestamp"
}
}
output {
elasticsearch
hosts => ["localhost:9200"]
index => "testlog-%{+YYYY.MM.dd}"
}
}

logstash output not showing the desired timestamp

I am trying to get the desired time stamp format from logstash output. I can''t get that if I use this format in syslog
Please share your thoughts about convert to the other format that’s in the _source field like Yyyy-mm-ddThh:mm:ss.sssZ format?
filter {
grok {
match => [ "logdate", "Yyyy-mm-ddThh:mm:ss.sssZ" ]
overwrite => ["host", "message"]
}
_source: {
message: "activity_log: {"created_at":1421114642210,"actor_ip":"192.168.1.1","note":"From system","user":"4561c9d7aaa9705a25f66d","user_id":null,"actor":"4561c9d7aaa9705a25f66d","actor_id":null,"org_id":null,"action":"user.failed_login","data":{"transaction_id":"d6768c473e366594","name":"user.failed_login","timing":{"start":1422127860691,"end":14288720480691,"duration":0.00257},"actor_locatio
I am using this code in syslog file
filter {
if [message] =~ /^activity_log: / {
grok {
match => ["message", "^activity_log: %{GREEDYDATA:json_message}"]
}
json {
source => "json_message"
remove_field => "json_message"
}
date {
match => ["created_at", "UNIX_MS"]
}
mutate {
rename => ["[json][repo]", "repo"]
remove_field => "json"
}
}
}
output {
elasticsearch { host => localhost }
stdout { codec => rubydebug }
}
thanks
"message" => "<134>feb 1 20:06:12 {\"created_at\":1422765535789, pid=5450 tid=28643 version=b0b45ac proto=http ip=192.168.1.1 duration_ms=0.165809 fs_sent=0 fs_recv=0 client_recv=386 client_sent=0 log_level=INFO msg=\"http op done: (401)\" code=401" }
"#version" => "1",
"#timestamp" => "2015-02-01T20:06:12.726Z",
"type" => "activity_log",
"host" => "192.168.1.1"
The pattern in your grok filter doesn't make sense. You're using a Joda-Time pattern (normally used for the date filter) and not a grok pattern.
It seems your message field contains a JSON object. That's good, because it makes it easy to parse. Extract the part that comes after "activity_log: " to a temporary json_message field,
grok {
match => ["message", "^activity_log: %{GREEDYDATA:json_message}"]
}
and parse that field as JSON with the json filter (removing the temporary field if the operation was successful):
json {
source => "json_message"
remove_field => ["json_message"]
}
Now you should have the fields from the original message field at the top level of your message, including the created_at field with the timestamp you want to extract. That number is the number of milliseconds since the epoch so you can use the UNIX_MS pattern in a date filter to extract it into #timestamp:
date {
match => ["created_at", "UNIX_MS"]
}

LogStash: How to make a copy of the #timestamp field while maintaining the same time format?

I would like to create a copy of the #timestamp field such that it uses the same format as #timestamp.
I've tried the following:
mutate
{
add_field => ["read_time", "%{#timestamp}"]
}
but while #timestamp is in the format: 2014-08-01T18:34:46.824Z,
the read_time is in this format 2014-08-01 18:34:46.824 UTC
This is an issue as Kibana doesn't understand the "UTC" format for histograms.
Is there a way using the date filter to do this?
Kibana can't understand because the read_time field is a string, not a timestamp!
You can use ruby filter to do what you need. Just copy the #timestamp to a new field read_time and the field time is in timestamp, not string. The add_field is add a new field with string type!
Here is my config:
input {
stdin{}
}
filter {
ruby {
code => "event['read_time'] = event['#timestamp']"
}
mutate
{
add_field => ["read_time_string", "%{#timestamp}"]
}
}
output {
stdout {
codec => "rubydebug"
}
}
You can try and see the output, the output is:
{
"message" => "3243242",
"#version" => "1",
"#timestamp" => "2014-08-08T01:09:49.647Z",
"host" => "BENLIM",
"read_time" => "2014-08-08T01:09:49.647Z",
"read_time_string" => "2014-08-08 01:09:49 UTC"
}
Hope this can help you.
You don't need to run any Ruby code. You can just use the add_field setting of the Mutate filter plugin:
mutate {
# Preserve "#timestamp" as "logstash_intake_timestamp"
add_field => { "logstash_intake_timestamp"=> "%{#timestamp}" }
}
date {
# Redefines "#timestamp" field from parsed timestamp, rather than its default value (time of ingestion by Logstash)
# FIXME: include timezone:
match => [ "timestamp_in_weird_custom_format", "YYYY-MM-dd HH:mm:ss:SSS" ]
tag_on_failure => ["timestamp_parse_failed"]
target => "#timestamp"
}

Resources