Logstash creating directory too early - linux

I have my logstash instance create a new directory every day to store its logs. The config file is below. It creates the directory (and starts using it) in the evening, a day early, instead of right after midnight when the date actually changes. I am on the West coast (UTC−08:00), running an OEL OS.
Configuration:
input {
  udp {
    port => 6379
  }
}
filter {
  ruby {
    code => "event['@timestamp'] = event['@timestamp'].localtime('-08:00')"
  }
}
output {
  file {
    path => ["/logstash-1.4.1/logs/%{+YYYY-MM-dd}/logstash_in.txt"]
  }
  elasticsearch {
    protocol => http
  }
  stdout {
    codec => rubydebug
  }
}
My system date and time are correct:
[root@xxx]# date
Mon Jul 14 18:22:37 PDT 2014

Short answer: the file output path timestamp %{+YYYY-MM-dd} refers to UTC time. That means the directory is created during your evening: at UTC−08:00, the new UTC day begins at 16:00 local time.
Long answer: you can refer to the file output source code. The path is
path = event.sprintf(@path)
Drilling down into event.rb:
t = @data["@timestamp"]
formatter = org.joda.time.format.DateTimeFormat.forPattern(key[1 .. -1])\
  .withZone(org.joda.time.DateTimeZone::UTC)
#next org.joda.time.Instant.new(t.tv_sec * 1000 + t.tv_usec / 1000).toDateTime.toString(formatter)
# Invoke a specific Instant constructor to avoid this warning in JRuby
# > ambiguous Java methods found, using org.joda.time.Instant(long)
org.joda.time.Instant.java_class.constructor(Java::long).new_instance(
  t.tv_sec * 1000 + t.tv_usec / 1000
).to_java.toDateTime.toString(formatter)
The path parameter %{+YYYY-MM-dd} is therefore formatted in UTC (org.joda.time.DateTimeZone::UTC).
So, there are two solutions to do what you need:
a) Modify event.rb to use your timezone instead of UTC (a minimal sketch follows the list).
b) Create your own day field and use that field in the path instead of %{+YYYY-MM-dd}.
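For option (a), the idea is to swap the hard-coded UTC zone in event.rb for your local zone. A minimal sketch, assuming the logstash 1.4.x event.rb shown above; "America/Los_Angeles" is just an example zone, and note that this patches logstash core, so the change will be lost on upgrade:
formatter = org.joda.time.format.DateTimeFormat.forPattern(key[1 .. -1])\
  .withZone(org.joda.time.DateTimeZone.forID("America/Los_Angeles"))  # was DateTimeZone::UTC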
Here is my configuration for option (b):
filter {
  ruby {
    code => "
      ownTime = event['@timestamp'].localtime('-08:00')
      event['day'] = ownTime.strftime('%Y-%m-%d')
    "
  }
}
output {
  file {
    path => "/logstash-1.4.1/logs/%{day}/logstash_in.txt"
  }
  stdout {
    codec => "rubydebug"
  }
}
Hope this can help you.

If you want to convert the timezone by name rather than by a fixed offset:
filter {
  date {
    match => [ "@timestamp", "ISO8601" ]
    timezone => "America/New_York"
  }
}

Related

logstash file output not working with metadata fields

I have the following pipeline. The requirement is that I need to write "metrics" data to one file and event data to another file. I am having two issues with this pipeline:
The file output is not creating a timestamped file every 30 seconds; instead it creates a single file literally named output%{[@metadata][ts]}.csv and keeps appending data to it.
The csv output does create a new timestamped file every 30 seconds, but it also creates an extra file named output%{[@metadata][ts]} and keeps appending metric info to that file.
Can someone please guide me on how to fix this?
input {
  beats {
    port => 5045
  }
}
filter {
  ruby {
    code => '
      event.set("[@metadata][ts]", Time.now.to_i / 30)
      event.set("[@metadata][suffix]", "output" + (Time.now.to_i / 30).to_s + ".csv")
    '
  }
}
filter {
  metrics {
    meter => [ "code" ]
    add_tag => "metric"
    clear_interval => 30
    flush_interval => 30
  }
}
output {
  if "metric" in [tags] {
    file {
      flush_interval => 30
      codec => line { format => "%{[code][count]} %{[code][count]}" }
      path => "C:/lgstshop/local/csv/output%{[@metadata][ts]}.csv"
    }
    stdout {
      codec => line {
        format => "rate: %{[code][count]}"
      }
    }
  }
  file {
    path => "output.log"
  }
  csv {
    fields => [ "created", "level", "code"]
    path => "C:/lgstshop/local/output%{[@metadata][ts]}.evt"
  }
}
A metrics filter generates new events in the pipeline, and those events only go through the filters that come after it. The metric events therefore do not have a [@metadata][ts] field, so the sprintf references in the output section are not substituted. Move the ruby filter so that it comes after the metrics filter.
If you do not want the metric events sent to the csv output, wrap that output with if "metric" not in [tags] { or put it in an else of the existing conditional. A sketch follows.
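A minimal, untested sketch of the reordered filters and guarded outputs; the field names, paths, and intervals are taken from the question:
filter {
  metrics {
    meter => [ "code" ]
    add_tag => "metric"
    clear_interval => 30
    flush_interval => 30
  }
}
# runs after metrics, so the generated metric events also get [@metadata][ts]
filter {
  ruby {
    code => 'event.set("[@metadata][ts]", Time.now.to_i / 30)'
  }
}
output {
  if "metric" in [tags] {
    file {
      flush_interval => 30
      codec => line { format => "rate: %{[code][count]}" }
      path => "C:/lgstshop/local/csv/output%{[@metadata][ts]}.csv"
    }
  } else {
    csv {
      fields => [ "created", "level", "code" ]
      path => "C:/lgstshop/local/output%{[@metadata][ts]}.evt"
    }
  }
}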

Logstash pull server date as config variable

As part of my logstash config I want to pull the current date from the server, which it uses as part of its API query using http_poller.
Is there any way to do that? I've tried something along the lines of this:
$(date +%d%m%y%H%M%S)
But it doesn't get picked up. This is the config:
input {
  http_poller {
    # proxy => { host => "" }
    proxy => ""
    urls => {
      q1 => {
        method => post
        url => ""
        headers => { .... }
        body => '{
          "rsid": "....",
          "globalFilters": [
            {
              "type": "dateRange",
              "dateRange": "%{+ddMMyyHHmmss}"
            }
            ................
        }'
      }
    }
    request_timeout => 60
    # Supports "cron", "every", "at" and "in" schedules by rufus scheduler
    schedule => { cron => "* * * * * UTC" }
    codec => "json"
    metadata_target => "http_poller_metadata"
  }
}
output {
  elasticsearch {
    hosts => ["xxxx"]
    index => "xxxx"
  }
}
There is nothing like a variable declaration that we can do in the input.
A workaround is to define environment variables that hold the dates, e.g. on Windows (PowerShell script):
$env:startDate=(Get-Date).AddDays(-1).ToString('yyyy-MM-dd')
$env:endDate=(Get-Date).AddDays(0).ToString('yyyy-MM-dd')
Then we can use these variables as ${startDate} in the url or body; a sketch follows below.
However, once logstash is started the dates remain static, so I guess we need to restart the logstash script every day for it to pick up the new value of the date.
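A sketch of how those environment variables could be referenced inside the poller body; the url, headers, and the exact dateRange format are elided/illustrative here, and the ${...} substitution assumes a logstash version with environment variable support in the config:
input {
  http_poller {
    urls => {
      q1 => {
        method => post
        url => ""
        body => '{
          "globalFilters": [
            { "type": "dateRange", "dateRange": "${startDate}/${endDate}" }
          ]
        }'
      }
    }
    schedule => { cron => "* * * * * UTC" }
  }
}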
Another alternative is to write a proxy webservice, probably in Java or another language, where the class can hold such variables; it then invokes the actual webservice and returns the response back to the logstash script.
This issue has been pending in logstash since 2016... not sure why it cannot be addressed!

how to get part of the path name and add it to the index

I currently have a file name like this:
[SERIALNUMBER][2014_12_04][00_45_22][141204T014214]AB_DEF.log
I basically want to extract the year from the file name (2014) and add it to the index name in my logstash.conf.
Below is my conf file:
input {
  file {
    path => "C:/ABC/DEF/HJK/LOGS/**/*"
    start_position => beginning
    type => syslog
  }
}
filter {
  grok {
    match => { "message" => "%{COMBINEDAPACHELOG}" }
  }
}
output {
  elasticsearch {
    index => "type1logs"
  }
  stdout {}
}
Please help.
thanks
IIRC, the file input gives you a field called 'path'. You can then run another grok{} filter, using 'path' as input and extracting the data you want; a sketch follows below.
Based on your use of COMBINEDAPACHELOG in your existing config, your log entries already carry a date. It's a more common practice to feed that timestamp field to the date{} filter to set the @timestamp field; the default elasticsearch{} output config will then create an index called "logstash-YYYY.MM.DD".
Having daily indexes like this (usually) makes data retention easier.
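A minimal sketch of the grok-on-path approach, assuming the bracketed date section of the file name shown in the question; the log_year field name and the type1logs-%{log_year} index are illustrative:
filter {
  # pull the year out of the bracketed date group, e.g. [2014_12_04]
  grok {
    match => { "path" => "\[%{YEAR:log_year}_%{MONTHNUM}_%{MONTHDAY}\]" }
  }
}
output {
  elasticsearch {
    index => "type1logs-%{log_year}"
  }
}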

Duplicate entries into Elastic Search while logstash instance is running

I have been trying to send logs from logstash to elasticsearch. Suppose I am running a logstash instance, and while it is running I make a change to the file that the instance is monitoring; then all the logs which have previously been saved in elasticsearch are saved again, so duplicates are created.
Also, when the logstash instance is closed and restarted, the logs get duplicated in elasticsearch.
How do I counter this problem? How do I send only the newest entry added to the file from logstash to elasticsearch?
My logstash instance command is the following:
bin/logstash -f logstash-complex.conf
and the configuration file is this:
input {
  file {
    path => "/home/amith/Desktop/logstash-1.4.2/accesslog1"
  }
}
filter {
  if [path] =~ "access" {
    mutate {
      replace => { "type" => "apache_access" }
    }
    grok {
      match => { "message" => "%{COMBINEDAPACHELOG}" }
    }
  }
  date {
    match => [ "timestamp" , "dd/MMM/yyyy:HH:mm:ss Z" ]
  }
}
output {
  elasticsearch {
    host => localhost
    index => feb9
  }
  stdout { codec => rubydebug }
}
I got the solution.
I was opening the file, adding a record and saving it. Because of that, logstash treated the same file as a different file each time I saved it, since it registered a different inode number for the same file.
The solution is to append a line to the file without opening it, by running the following command:
echo "the string you want to add to the file" >> filename
[ELK stack]
I wanted some custom configs in /etc/logstash/conf.d/vagrant.conf, so the first step was to make a backup: /etc/logstash/conf.d/vagrant.conf.bk.
This caused logstash to add 2 entries in elasticsearch for each entry in <file>.log; likewise, when I had 3 files in /etc/logstash/conf.d/*.conf.*, ES got 8 entries for each line in *.log.
As you mentioned in your question:
when the logstash instance is closed and is restarted again, the logs get duplicated in elasticsearch.
So it is probably because you have deleted the .sincedb. Please have a look here.
Try specifying the sincedb_path and start_position. For example:
input {
  file {
    path => "/home/amith/Desktop/logstash-1.4.2/accesslog1"
    start_position => "end"
    sincedb_path => "/home/amith/Desktop/sincedb"
  }
}

How to modify @timestamp with an entry in logs using logstash

I have some logs whose entries contain only the time:
1. 17:20:45.331|ERR|....
2. 17:20:54.715|SYS|.....Logging started for [....] (Date=[07/28/2014], ...
3. 17:20:54.716|SYS....
and so on.
The date appears in only one line of the logs. Based on that, I want to build a timestamp from the logging date in that line plus the time in each entry.
I am able to get the time in each entry, and I can get log_message => " Logging started for [....] (Date=[07/28/2014], ..." as one entry.
Is it possible to take the date from this entry and modify every other entry's timestamp? How can I combine the date and the time and modify the timestamp?
Any help will be appreciated, as I am new to logstash.
My filter in the logstash conf:
filter {
  grok {
    match => [ "message", "%{TIME:time}\|%{WORD:Message_type}\|%{GREEDYDATA:Component}\|%{NUMBER:line_number}\| %{GREEDYDATA:log_message}" ]
  }
  date {
    match => [ "timestamp" , "dd/MMM/yyyy:HH:mm:ss Z" ]   # need to modify this to use date + %{time}
  }
}
The time field also has milliseconds.
Your options are:
Change how things are logged so that the date is included
Write something to fix the logs before they are picked up by logstash (i.e. something that looks for the date entry and modifies the log)
Use the memorize plugin that I wrote (and submitted a pull request for, to try to get it into a future version)
The plugin is detailed in this answer. The caveat with this solution is that if the plugin misses the line that has the date, you'll have issues with the remainder of the file. This could happen if you restart logstash, so you'll need to add in some logic to handle it -- in the example below, I assume that if it hasn't seen the date yet, it's today.
An implementation using the memorize plugin would look like this:
filter {
  if ([message] =~ /Date=/) {
    grok { match => [ "message", "Date=%{DATE:date}" ] }
  }
  # either add the date field to the saved data or pull the date from the saved data
  memorize { fields => ["date"] }
  # if we still don't have a date, let's just assume it's today
  if ([date] == '') {
    ruby {
      code => 'event["date"] = Time.now.strftime("%m/%d/%Y")'
    }
  }
  if ([message] !~ /Date=/) {
    # grok to parse the message
    grok {
      match => [ "message", "%{TIME:time}\|%{WORD:Message_type}\|%{GREEDYDATA:Component}\|%{NUMBER:line_number}\| %{GREEDYDATA:log_message}" ]
    }
    # now add in the date
    mutate {
      add_field => {
        datetime => "%{date} %{time}"
      }
    }
  }
}
(This example has not been tested, so there may be syntax/logic errors, but it should get you down the right path).
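To then turn the combined datetime field into the event timestamp, a date filter along these lines could follow; the pattern assumes the MM/dd/yyyy form captured by the DATE grok pattern and the HH:mm:ss.SSS times from the sample log, so adjust it to your actual formats:
filter {
  date {
    match => [ "datetime", "MM/dd/yyyy HH:mm:ss.SSS" ]
    target => "@timestamp"
  }
}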
