How to modify @timestamp with an entry in logs using logstash

I have some logs whose entries contain only a time:
1. 17:20:45.331|ERR|....
2. 17:20:54.715|SYS|.....Logging started for [....] (Date=[07/28/2014], ...
3. 17:20:54.716|SYS....
and so on.
The date appears in only one line of the logs. Based on that, I want to create a timestamp: the logging date from that line plus the time in each entry.
I am able to get the time in each entry, and I can get log_message => " Logging started for [....] (Date=[07/28/2014], ..." as one entry.
Is it possible to get the date from this entry and modify every other entry's timestamp?
How can I combine the date and the time and modify the timestamp?
Any help would be appreciated, as I am new to logstash.
My filter in my logstash conf:
filter {
  grok {
    match => [ "message", "%{TIME:time}\|%{WORD:Message_type}\|%{GREEDYDATA:Component}\|%{NUMBER:line_number}\| %{GREEDYDATA:log_message}" ]
  }
  date {
    match => [ "timestamp", "dd/MMM/yyyy:HH:mm:ss Z" ]   # need to modify this to use date + %{time}
  }
}
The time field also has milliseconds.

Your options are:
Change how things are logged so that the date is included
Write something to fix the logs before they are picked up by logstash (i.e. something that looks for the date entry and modifies the log)
Use the memorize plugin that I wrote (and submitted a pull request for, to try and get it into a future version)
The plugin is detailed in this answer. The caveat with this solution is that if the plugin misses the line that has the date, you'll have issues with the remainder of the file. This could happen if you restart logstash, so you'll need to add in some logic to handle this -- in the example below, I assume that if it hasn't seen the date yet, it's today.
An implementation using the memorize plugin would look like this:
filter {
  if [message] =~ /Date=/ {
    grok { match => [ "message", "Date=%{DATE:date}" ] }
  }
  # either save the date field or restore it from the saved data
  memorize { fields => ["date"] }
  # if we still don't have a date, let's just assume it's today
  if ![date] {
    ruby {
      code => 'event["date"] = Time.now.strftime("%m/%d/%Y")'
    }
  }
  if [message] !~ /Date=/ {
    # grok to parse the message
    grok {
      match => [ "message", "%{TIME:time}\|%{WORD:Message_type}\|%{GREEDYDATA:Component}\|%{NUMBER:line_number}\| %{GREEDYDATA:log_message}" ]
    }
    # now add in the date
    mutate {
      add_field => {
        "datetime" => "%{date} %{time}"
      }
    }
  }
}
(This example has not been tested, so there may be syntax/logic errors, but it should get you down the right path).
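From there, a date filter can turn the combined field into @timestamp. A minimal sketch, assuming the MM/dd/yyyy date from your log plus a time with milliseconds (untested, like the rest):
date {
  # "07/28/2014 17:20:45.331" -> @timestamp
  match => [ "datetime", "MM/dd/yyyy HH:mm:ss.SSS" ]
}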

Related

How do I parse a json-formatted log message in Logstash to get a certain key/value pair?

I send JSON-formatted logs to my Logstash server. A log looks something like this (note: the whole message is really on ONE line, but I show it across multiple lines for readability):
2016-09-01T21:07:30.152Z 153.65.199.92
{
  "type":"trm-system",
  "host":"susralcent09",
  "timestamp":"2016-09-01T17:17:35.018470-04:00",
  "@version":"1",
  "customer":"cf_cim",
  "role":"app_server",
  "sourcefile":"/usr/share/tomcat/dist/logs/trm-system.log",
  "message":"some message"
}
What do I need to put in my Logstash configuration to get the "sourcefile" value, and ultimately get the filename, e.g., trm-system.log?
If you pump the hash field (w/o the timestamp) into ES it should recognize it.
If you want to do it inside a logstash pipeline you would use the json filter and point the source => to the second part of the line (possibly adding the timestamp prefix back in).
This results in all fields added to the current message, and you can access them directly or all combined:
Config:
input { stdin { } }
filter {
  # split the line into timestamp, IP, and JSON parts
  grok { match => [ "message", "%{NOTSPACE:ts} %{NOTSPACE:ip} %{GREEDYDATA:js}" ] }
  # parse the JSON part (called "js") and add its fields to the event
  json { source => "js" }
}
output {
  # stdout { codec => rubydebug }
  # you can access fields directly with %{fieldname}:
  stdout { codec => line { format => "sourcefile: %{sourcefile}" } }
}
Sample run
2016-09-01T21:07:30.152Z 153.65.199.92 { "sourcefile":"/usr" }
sourcefile: /usr
and with rubydebug (host and @timestamp removed):
{
       "message" => "2016-09-01T21:07:30.152Z 153.65.199.92 { \"sourcefile\":\"/usr\" }",
      "@version" => "1",
            "ts" => "2016-09-01T21:07:30.152Z",
            "ip" => "153.65.199.92",
            "js" => "{ \"sourcefile\":\"/usr\" }",
    "sourcefile" => "/usr"
}
As you can see, the sourcefile field shows up directly with its value in the rubydebug output.
Depending on the source of your log records you might need to use the multiline codec as well. You might also want to delete the js field, rename @timestamp to _parsedate, and parse ts into the record's timestamp (for Kibana to be happy). This is not shown in the sample. I would also remove message to save space.
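Those cleanup steps might look like this (an untested sketch; I copy @timestamp to _parsedate rather than renaming it, since the date filter overwrites it anyway):
filter {
  # keep the pipeline's own parse time under another name
  mutate { add_field => { "_parsedate" => "%{@timestamp}" } }
  # parse ts into @timestamp so Kibana sorts by event time
  date { match => [ "ts", "ISO8601" ] }
  # drop the fields we no longer need, to save space
  mutate { remove_field => [ "js", "message" ] }
}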

how to get part of the path name and add it to the index

I currently have a file name like this:
[SERIALNUMBER][2014_12_04][00_45_22][141204T014214]AB_DEF.log
I basically want to extract the year (2014) from the file name and add it to the index name in my logstash conf file (logstash.conf).
Below is my conf file.
input {
  file {
    path => "C:/ABC/DEF/HJK/LOGS/**/*"
    start_position => "beginning"
    type => "syslog"
  }
}
filter {
  grok {
    match => { "message" => "%{COMBINEDAPACHELOG}" }
  }
}
output {
  elasticsearch {
    index => "type1logs"
  }
  stdout {}
}
Please help.
thanks
IIRC, you get a field called 'path'. You can then run another grok{} filter, using 'path' as input and extracting the data you want.
Based on your use of COMBINEDAPACHELOG in your existing config, your log entries already have the date. It's more common practice to feed that timestamp field to the date{} filter to change the @timestamp field. Then, the default elasticsearch{} output config will create an index called "logstash-YYYY.MM.DD".
Having daily indexes like this (usually) makes data retention easier.
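If you do want the year from the file name instead, a sketch of the grok-on-path approach (untested; the field name logyear is mine, and the pattern assumes the [YYYY_MM_DD] bracket from your example):
filter {
  # pull the year out of e.g. [SERIALNUMBER][2014_12_04][...]...
  grok {
    match => [ "path", "\[%{YEAR:logyear}_%{MONTHNUM}_%{MONTHDAY}\]" ]
  }
}
output {
  elasticsearch {
    # sprintf the captured year into the index name
    index => "type1logs-%{logyear}"
  }
}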

Keeping last part of field only in logstash

How can I trim to only the last part of a key in logstash?
I have URLs formatted in the form of http://aaa.bbb/get?a=1&b=2, putting them into 'request' and splitting the field based on '?&' to save the GET parameters.
I care only about the specific API call, and not the host or protocol. What filter(s) can I chain to keep only the part after the final '/'? I've read up a bit on patterns but haven't stumbled upon how to reference the last part of a split field.
grok {
  match => [ "message", "%{TIMESTAMP_ISO8601:timestamp} %{NOTSPACE:loadbalancer} %{IP:client_ip}:%{NUMBER:client_port:int} %{IP:backend_ip}:%{NUMBER:backend_port:int} %{NUMBER:request_processing_time:float} %{NUMBER:backend_processing_time:float} %{NUMBER:response_processing_time:float} %{NUMBER:elb_status_code:int} %{NUMBER:backend_status_code:int} %{NUMBER:received_bytes:int} %{NUMBER:sent_bytes:int} %{QS:request}" ]
}
date {
  match => [ "timestamp", "ISO8601" ]
}
kv {
  field_split => "&?"
  source => "request"
}
I would suggest taking the existing URI-related patterns and modifying them to your needs. You will note that URIPATHPARAM parses out the URIPATH and URIPARAM but doesn't shove them into fields.
So, make your own URIPATHPARAM:
MYURIPATHPARAM %{URIPATH:uripath}(?:%{URIPARAM:uriparam})?
and then call it from your own URI:
MYURI %{URIPROTO}://(?:%{USER}(?::[^@]*)?@)?(?:%{URIHOST})?(?:%{MYURIPATHPARAM})?
In your previous grok{}, you ended up with %{request}. Make a new grok{} that runs [request] through MYURI, and you should end up with the two fields that you're after.
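Wired together, it might look like this (an untested sketch; it assumes the two pattern definitions above live in a file under ./patterns, and it yields the uripath and uriparam fields):
grok {
  patterns_dir => "./patterns"
  # run the quoted request field through the custom URI pattern
  match => [ "request", "%{MYURI}" ]
}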

logstash if statement within grok statement

I'm creating a logstash grok filter to pull events out of a backup server, and I want to be able to test a field for a pattern, and if it matches the pattern, further process that field and pull out additional information.
To that end I'm embedding an if statement within the grok statement itself. This is causing the test to fail with Error: Expected one of #, => right after the if.
This is the filter statement:
filter {
grok {
patterns_dir => "./patterns"
# NetWorker logfiles have some unusual fields that include undocumented engineering codes and what not
# time is in 12h format (ugh) so custom patterns need to be used.
match => [ "message", "%{NUMBER:engcode1} %{DATESTAMP_12H:timestamp} %{NUMBER:engcode2} %{NUMBER:engcode3} %{NUMBER:engcode4} %{NUMBER:ppid} %{NUMBER:pid} %{NUMBER:engcode5} %{WORD:processhost} %{WORD:processname} %{GREEDYDATA:daemon_message}" ]
# attempt to find completed savesets and pull that info from the daemon_message field
if [daemon_message] =~ /done\ saving\ to\ pool/ {
grok {
match => [ "daemon_message", "%{WORD:savehost}\:%{WORD:saveset} done saving to pool \'%{WORD:pool}\' \(%{WORD:volume}\) %{WORD:saveset_size}" ]
}
}
}
date {
# This is required to set the time from the logline to the timestamp and not have it create its own.
# Note the use of the trailing 'a' to denote AM or PM.
match => ["timestamp", "MM/dd/yyyy HH:mm:ss a"]
}
}
This block fails with the following:
$ /opt/logstash/bin/logstash -f ./networker_daemonlog.conf --configtest
Error: Expected one of #, => at line 12, column 12 (byte 929) after # Basic dumb simple networker daemon log grok filter for the NetWorker daemon.log
# no smarts to this and not really pulling any useful info from the files (yet)
filter {
grok {
... lines deleted ...
# attempt to find completed savesets and pull that info from the daemon_message field
if
I'm new to logstash, and I realise that using a conditional within the grok statement may not be possible, but I'd prefer doing conditional processing this way rather than with additional match lines, as this would leave the daemon_message field intact for other uses while pulling out the data I want.
ETA: I should also point out that totally removing the if statement allows the configtest to pass and the filter to parse logs.
Thanks in advance...
Conditionals go outside the filters, so something like:
if [field] == "value" {
grok {
...
}
]
would be correct. In your case, do the first grok, then test to run the second, i.e.:
grok {
  match => [ "message", "%{NUMBER:engcode1} %{DATESTAMP_12H:timestamp} %{NUMBER:engcode2} %{NUMBER:engcode3} %{NUMBER:engcode4} %{NUMBER:ppid} %{NUMBER:pid} %{NUMBER:engcode5} %{WORD:processhost} %{WORD:processname} %{GREEDYDATA:daemon_message}" ]
}
if [daemon_message] =~ /done\ saving\ to\ pool/ {
  grok {
    match => [ "daemon_message", "%{WORD:savehost}\:%{WORD:saveset} done saving to pool \'%{WORD:pool}\' \(%{WORD:volume}\) %{WORD:saveset_size}" ]
  }
}
This is really running two regexps for a record that matches. Since grok will only make fields when the regexp matches, you can do this:
grok {
  match => [ "message", "%{NUMBER:engcode1} %{DATESTAMP_12H:timestamp} %{NUMBER:engcode2} %{NUMBER:engcode3} %{NUMBER:engcode4} %{NUMBER:ppid} %{NUMBER:pid} %{NUMBER:engcode5} %{WORD:processhost} %{WORD:processname} %{GREEDYDATA:daemon_message}" ]
}
grok {
  match => [ "daemon_message", "%{WORD:savehost}\:%{WORD:saveset} done saving to pool \'%{WORD:pool}\' \(%{WORD:volume}\) %{WORD:saveset_size}" ]
}
You'd have to measure the performance across your actual log files since this will run fewer regexps, but the second one is more complicated.
If you really want to go nuts, you can do all of this in one grok{}, using the break_on_match feature.
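A sketch of that single-grok version (untested; both patterns run against message, and break_on_match => false makes grok try every pattern rather than stopping at the first hit):
grok {
  patterns_dir => "./patterns"
  break_on_match => false
  match => [
    "message", "%{NUMBER:engcode1} %{DATESTAMP_12H:timestamp} %{NUMBER:engcode2} %{NUMBER:engcode3} %{NUMBER:engcode4} %{NUMBER:ppid} %{NUMBER:pid} %{NUMBER:engcode5} %{WORD:processhost} %{WORD:processname} %{GREEDYDATA:daemon_message}",
    "message", "%{WORD:savehost}\:%{WORD:saveset} done saving to pool \'%{WORD:pool}\' \(%{WORD:volume}\) %{WORD:saveset_size}"
  ]
}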

How to correctly set timestamp in logstash from IIS log

I am trying to parse IIS log files using logstash and send them to elasticsearch.
I have the following log line
2014-02-25 07:49:32 172.17.0.96 GET /config/integration - 80 - 172.17.28.37 Mozilla/5.0+(Windows+NT+6.1;+WOW64)+AppleWebKit/537.36+(KHTML,+like+Gecko)+Chrome/33.0.1750.117+Safari/537.36 401 2 5 15
And use this filter:
filter {
  if [message] =~ "^#" {
    drop {}
  }
  grok {
    match => ["message", "%{TIMESTAMP_ISO8601} %{IP:host_ip} %{URIPROTO:method} %{URIPATH:path} (?:-|%{NOTSPACE:uri_query}sern) %{NUMBER:port} %{NOTSPACE:username} %{IP:client_ip} %{NOTSPACE:useragent} %{NUMBER:response} %{NUMBER:subresponse} %{NUMBER:scstatus} %{NUMBER:timetaken}"]
  }
  date {
    match => ["logtime", "YYYY-MM-dd HH:mm:ss"]
  }
}
Everything gets parsed correctly, but in the result the @timestamp field is the time I ran the parsing, not the time of the log event. This causes all the log events to end up stacked together at the time I started logstash when I view them. I would like @timestamp to be the time of the actual event.
What am I doing wrong?
First, capture the log time into a field in grok. Then use the date filter to parse that field into @timestamp; @timestamp will be updated to the log time. For example,
filter {
  if [message] =~ "^#" {
    drop {}
  }
  grok {
    match => ["message", "%{TIMESTAMP_ISO8601:logtime} %{IP:host_ip} %{URIPROTO:method} %{URIPATH:path} (?:-|%{NOTSPACE:uri_query}sern) %{NUMBER:port} %{NOTSPACE:username} %{IP:client_ip} %{NOTSPACE:useragent} %{NUMBER:response} %{NUMBER:subresponse} %{NUMBER:scstatus} %{NUMBER:timetaken}"]
  }
  date {
    match => ["logtime", "YYYY-MM-dd HH:mm:ss"]
  }
}
I solved it; I didn't realize I had to store the time from the log entry in a field, in this case eventtime:
grok {
  match => ["message", "%{DATESTAMP:eventtime} %{IP:host_ip} %{URIPROTO:method} %{URIPATH:path} (?:-|%{NOTSPACE:uri_query}sern) %{NUMBER:port} %{NOTSPACE:username} %{IP:client_ip} %{NOTSPACE:useragent} %{NUMBER:response} %{NUMBER:subresponse} %{NUMBER:scstatus} %{NUMBER:timetaken}"]
}
and then use that value to set @timestamp (which is the default target field of the date filter):
date {
  match => ["eventtime", "YY-MM-dd HH:mm:ss"]
}
A small gotcha was the two-digit year (YY) in the date expression; I guess DATESTAMP drops the century from the year.
