Filebeat and Logstash sometimes read old files - Linux

I have a folder with log files from 2016 to the present and set up Filebeat with "ignore_older: 48h". All the files get rotated so that "log" is always the newest one, "log.1" is the next, and so on.
Logs are on a Linux NFS partition mounted on the Logstash host.
I expect Filebeat to pick up only the log files that were changed in the last 48h and ignore the older ones.
That mostly happens, except that from time to time it also picks up older files in no particular order.
I ran the "stat" command on one of the older files from 2018 and I see the following:
Access: 2019-03-02 03:15:32.254460960 +0000
Modify: 2018-09-06 13:12:00.331460890 +0000
Change: 2019-02-28 03:34:33.946462475 +0000
I am running Filebeat version 6.4.2.
Is this data confusing Logstash? What is it actually looking at when checking whether a file has changed, and how can I stop it from picking up older files?
UPDATE:
My Filebeat configuration looks like this:
- type: log
  enabled: true
  paths:
    - /path/to/my/log/file/log*
  fields:
    logname: "log.name"
  include_lines: ["SOME_TEXT"]
  ignore_older: 48h
Logs are in CSV format.
On another host I do the same but with Logstash directly; the input config is like this:
input {
  file {
    path => "/path/to/my/log/file/log*"
    mode => "tail"
    start_position => "beginning"
    close_older => "24h"
    ignore_older => "2w"
  }
}
I have the same issue here.

You can try two things. One is to remove the * after log in the path, like this:
- /path/to/my/log/file/log
since Filebeat will keep reading a rotated log file even after it is renamed, until the file reaches a certain age.
Alternatively, for Logstash the path parameter is an array, so you can list the files to be read explicitly if you know how many rotations are kept:
path => [ "/path/to/my/log/file/log", "/path/to/my/log/file/log.1", "/path/to/my/log/file/log.2" ]

Related

rsyslog: collect logs from files outside /var/log

I have different logs, written to our mounted NFS share, that I need to send to our syslog server (Graylog); they are located outside the /var/log folder.
So I added some extra config in /etc/rsyslog.d/.
For this example I have two files with the following config:
atlassian-application-confluence-log.conf
module(load="imfile")
module(load="imklog")
$MaxMessageSize 50k
global(workDirectory="/atlassian/test/confluence/logs")
# This is the main application log file
input(type="imfile"
File="/atlassian/test/confluence/logs/atlassian-confluence.log"
Tag="atlassian"
PersistStateInterval="200"
)
# This file contains entries related to the search index.
input(type="imfile"
File="/atlassian/test/confluence/logs/atlassian-confluence-index.log"
Tag="atlassian"
PersistStateInterval="200"
)
# Send to Graylog
action(type="omfwd" target="log-server-company.com" port="5140")
# if you want to keep a local copy of the logs.
action(type="omfile" File="/var/log/rsyslog.log" template="RSYSLOG_TraditionalFileFormat")
atlassian-application-jira-log.conf
module(load="imfile")
module(load="imklog")
$MaxMessageSize 50k
global(workDirectory="/atlassian/test/jira/log")
# Contains logging for most of Jira, including logs that aren’t specifically written elsewhere
input(type="imfile"
File="/atlassian/test/jira/log/atlassian-jira.log"
Tag="atlassian"
PersistStateInterval="200"
)
# Send to Graylog
action(type="omfwd" target="log-server-company.com" port="5140")
# if you want to keep a local copy of the logs.
action(type="omfile" File="/var/log/rsyslog.log" template="RSYSLOG_TraditionalFileFormat")
So, to my problem.
When I check the rsyslogd configuration with the following command:
rsyslogd -N1 -f /etc/rsyslog.d/atlassian-application-confluence-log.conf
it says the config is valid.
When I restart the rsyslog service I get the following errors:
module 'imfile' already in this config, cannot be added [v8.2102.0-10.el8 try https://www.rsyslog.com/e/2221 ]
module 'imklog' already in this config, cannot be added [v8.2102.0-10.el8 try https://www.rsyslog.com/e/2221 ]
error during parsing file /etc/rsyslog.d/atlassian-tomcat-confluence-log.conf, on or before line 6: parameter 'workdirectory' specified more than once - one instance is ignored. Fix config [v8.2102.0-10.el8 try https://www.rsyslog.com/e/2207]
error during parsing file /etc/rsyslog.d/atlassian-tomcat-confluence-log.conf, on or before line 6: parameter 'workDirectory' not known -- typo in config file? [v8.2102.0-10.el8 try https://www.rsyslog.com/e/2207]
module 'imfile' already in this config, cannot be added [v8.2102.0-10.el8 try https://www.rsyslog.com/e/2221 ]
module 'imklog' already in this config, cannot be added [v8.2102.0-10.el8 try https://www.rsyslog.com/e/2221 ]
[origin software="rsyslogd" swVersion="8.2102.0-10.el8" x-pid="379288" x-info="https://www.rsyslog.com"] start
imjournal: journal files changed, reloading... [v8.2102.0-10.el8 try https://www.rsyslog.com/e/0 ]
How can I get rid of the warnings?
I have already tried putting the two module loads in /etc/rsyslog.conf.
I get the following errors from that config:
parameter 'PersistStateInterval' not known
parameter 'Tag' not known
parameter 'File' not known
If there are multiple configuration files, they are processed in ascending sort order of the file names (numerically/alphabetically); see $IncludeConfig.
Therefore you don't have to include the global configuration (modules, work directories, rulesets, etc.) more than once. Include it once, in the config file that is loaded first.
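For example, you could move the shared pieces into a file that sorts first (a sketch; the 00- prefix and the /var/lib/rsyslog work directory are assumptions, not taken from your setup):
# /etc/rsyslog.d/00-modules.conf -- sorts before the application configs,
# so the modules and the work directory are declared exactly once
module(load="imfile")
module(load="imklog")
global(workDirectory="/var/lib/rsyslog")
The per-application files in /etc/rsyslog.d/ then keep only their input() and action() blocks.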

Logstash throws unexpected error: <ArgumentError: Setting "" hasn't been registered>

I followed the instructions at https://www.elastic.co/guide/en/logstash/current/plugins-inputs-jmx.html to set up Cassandra JMX metric monitoring.
My logstash.yml is as follows:
input {
  jmx {
    path => "/home/foo/elastic/logstash"
    polling_frequency => 15
    type => "jmx"
    nb_thread => 4
  }
}
output {
  stdout { codec => rubydebug }
}
Under /home/foo/elastic/logstash, I defined a jmx.conf file with the following content:
//Required, JMX listening host/ip
"host" : "192.168.1.139",
//Required, JMX listening port
"port" : 7199,
//Optional, the username to connect to JMX
"username" : "foo",
//Optional, the password to connect to JMX
"password": "foo",
//Optional, use this alias as a prefix in the metric name. If not set use <host>_<port>
"alias" : "cassandra",
Then I run Logstash with the command:
sudo bin/logstash -f /etc/logstash/logstash.yml --path.settings /etc/logstash --debug
I get the following error:
Sending Logstash's logs to /usr/share/logstash/logs which is now configured via log4j2.properties
[FATAL] 2017-10-28 22:57:23.812 [main] runner - An unexpected error occurred! {:error=>#, :backtrace=>["/usr/share/logstash/logstash-core/lib/logstash/settings.rb:32:in `get_setting'", "/usr/share/logstash/logstash-core/lib/logstash/settings.rb:64:in `set_value'", "/usr/share/logstash/logstash-core/lib/logstash/settings.rb:83:in `merge'", "org/jruby/RubyHash.java:1342:in `each'", "/usr/share/logstash/logstash-core/lib/logstash/settings.rb:83:in `merge'", "/usr/share/logstash/logstash-core/lib/logstash/settings.rb:135:in `validate_all'", "/usr/share/logstash/logstash-core/lib/logstash/runner.rb:243:in `execute'", "/usr/share/logstash/vendor/bundle/jruby/1.9/gems/clamp-0.6.5/lib/clamp/command.rb:67:in `run'", "/usr/share/logstash/logstash-core/lib/logstash/runner.rb:204:in `run'", "/usr/share/logstash/vendor/bundle/jruby/1.9/gems/clamp-0.6.5/lib/clamp/command.rb:132:in `run'", "/usr/share/logstash/lib/bootstrap/environment.rb:71:in `(root)'"]}
I figured it out myself.
There are two types of settings for Logstash: one for Logstash itself and one for the pipeline. Typically the settings in /etc/logstash are for Logstash, and the pipeline configuration goes in /etc/logstash/conf.d. I used a pipeline configuration as the Logstash settings file, which caused all the errors.
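On a package-based install the layout then typically looks like this (paths as used in this question; the comments are mine):
/etc/logstash/logstash.yml      # settings file, YAML format
/etc/logstash/pipelines.yml     # pipeline registry, YAML format
/etc/logstash/conf.d/*.conf     # pipeline definitions, Logstash config DSL (input/filter/output)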
The Problem
The error message is not very clear, but the logstash.yml file should contain data in YAML format, and here it doesn't (your data, input { jmx { path => ..., is clearly not in YAML format).
Why does it matter?
Because Logstash has two types of configuration files:
pipeline configuration files, with the .conf extension, which define the Logstash processing pipeline, and
settings files, which specify options that control Logstash startup and execution (logstash.yml, pipelines.yml, jvm.options, etc.). This means:
If you want to configure data input, filter or output, you should put your configuration in a logstash.conf file (or in any file with a .conf extension).
If you want to configure Logstash itself and do not want to pass options or flags on the command line, you can set them in logstash.yml.
How to fix it?
You can ignore (or delete) the file /etc/logstash/logstash.yml if you have no options to set for Logstash startup behaviour.
Set the content of the file /etc/logstash/pipelines.yml to the following:
# This file is where you define your pipelines. You can define multiple.
# For more information on multiple pipelines, see the documentation:
# https://www.elastic.co/guide/en/logstash/current/multiple-pipelines.html
- pipeline.id: main
  path.config: "/etc/logstash/conf.d/*.conf"
Put your Logstash data pipeline configuration in the file /etc/logstash/conf.d/logstash.conf, as follows:
input {
  jmx {
    path => "/home/foo/elastic/logstash"
    polling_frequency => 15
    type => "jmx"
    nb_thread => 4
  }
}
output {
  stdout { codec => rubydebug }
}
Et voilà :=)
I believe you made a mistake in the config file - it should be a JSON object, but I don't see the opening { at the beginning of the config - see the example config file in the article you referred to.
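For illustration, a corrected jmx.conf could start like this, with the object braces added (values copied from the question; the optional username/password entries and the queries section from the plugin documentation are omitted for brevity):
{
  "host" : "192.168.1.139",
  "port" : 7199,
  "alias" : "cassandra"
}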
I tried the following steps and it worked for me:
The input and output sections you specified in the logstash.yml file should go in a .conf file, e.g. jmx.conf.
Comment out everything in logstash.yml so that the default settings are loaded by Logstash.
That resolved the issue.
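For example, after moving the pipeline into /etc/logstash/conf.d/jmx.conf (hypothetical path), you would start Logstash against that file rather than the settings file:
sudo bin/logstash -f /etc/logstash/conf.d/jmx.conf --path.settings /etc/logstash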

Logstash reparses file if it is just renamed

I am parsing files using Logstash and storing them in MongoDB. I do not want Logstash to reparse a file if it is just renamed. How can I achieve this?
I included the sincedb_path setting and my config is like this:
input {
  file {
    path => "/file.log"
    sincedb_path => "/logstash"
  }
}
output {
  mongodb {
    collection => "collect"
    database => "db"
    uri => "mongodb://localhost"
  }
}
This gives the following error:
A plugin had an unrecoverable error. Will restart this plugin.
Error: Permission denied - /logstash.13562.1005.292789 or /logstash {:level=>:error}
Errno::EACCES: Permission denied - /logstash.13562.1005.545871 or /logstash
rename at org/jruby/RubyFile.java:987
atomic_write at /logstash/vendor/bundle/jruby/1.9/gems/filewatch-0.6.4/lib/filewatch/helper.rb:39
_sincedb_write at /logstash/vendor/bundle/jruby/1.9/gems/filewatch-0.6.4/lib/filewatch/tail.rb:236
sincedb_write at /logstash/vendor/bundle/jruby/1.9/gems/filewatch-0.6.4/lib/filewatch/tail.rb:206
teardown at /logstash/vendor/bundle/jruby/1.9/gems/logstash-input-file-1.0.0/lib/logstash/inputs/file.rb:157
inputworker at /logstash/vendor/bundle/jruby/1.9/gems/logstash-core-1.5.3-java/lib/logstash/pipeline.rb:203
synchronize at org/jruby/ext/thread/Mutex.java:149
inputworker at /logstash/vendor/bundle/jruby/1.9/gems/logstash-core-1.5.3-java/lib/logstash/pipeline.rb:203
start_input at /logstash/vendor/bundle/jruby/1.9/gems/logstash-core-1.5.3-java/lib/logstash/pipeline.rb:171
How can I solve this?
The default behavior of Logstash is to track files by inode in the sincedb, which handles renamed files properly. If it is not working for you, chances are your sincedb is set to a directory/file that Logstash can't write to. You can explicitly say where your sincedb goes: http://logstash.net/docs/1.4.2/inputs/file#sincedb_path
file {
  sincedb_path => '/some/writable/directory'
}
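A quick way to verify is to try creating a file there as the user Logstash runs as (the logstash user name and the path are assumptions):
sudo -u logstash touch /some/writable/directory/sincedb-test && echo "writable"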

Logstash: since_db not getting created

I was playing with the sincedb option and it appears that the sincedb file isn't getting created. Below is my Logstash file configuration. I have verified that I can create the file manually, so there is no permission issue. I would appreciate it if anyone could throw more light on this.
input {
  file {
    path => "/home/tom/fileData/*.log"
    type => "log"
    sincedb_path => "/home/tom/sincedb"
    start_position => "beginning"
  }
}
Can the user running Logstash write to the sincedb_path location? If not, that is what needs to get fixed. Other than that, your Logstash configuration should work fine.

Use regex as input file path in Logstash

I would like to parse a directory of log files with Logstash.
When the logs are named like this:
server-20140604.log
server-20140603.log
server-20140602.log
there is no problem; I am using globs like this:
input {
  file {
    path => ["D:/*.log"]
  }
}
But my logs are named like this:
server.log
server.log.1
server.log.2
client.log
client.log.1
client.log.2
So I would like to know how to tell Logstash to parse, in that folder, all the files whose names start with "server". I really need to do it like that, because I have other files in the folder (i.e. client logs) that I don't want to parse but also cannot remove from the folder.
With this configuration, only the log files starting with the prefix server get parsed:
input {
  file {
    path => ["D:/server*"]
  }
}
output {
  stdout {
    codec => rubydebug
  }
}
I think the problem you may have met is the start_position config. It determines where Logstash starts to read the logs; please refer to the documentation. Remember this option only modifies "first contact" situations, where a file is new and not seen before. If a file has already been seen before, this option has no effect.
When you stop Logstash, it saves a .sincedb* file in your home directory. Next time you start it, Logstash will resume reading the file according to the .sincedb*. If you do not write new logs to server.log, Logstash will never parse the old logs.
What you can try is to delete all the .sincedb* files before you start Logstash and add start_position to your config. In your comment you said that if you overwrite server.log, Logstash can parse the file from the beginning; that is because Logstash detects it as a new file and the .sincedb* holds no information about it, so Logstash will parse it! Try to find your .sincedb* files and delete them.
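A sketch of that procedure (the config file name is an assumption; on Logstash versions of that era the sincedb files live in the home directory of the user running Logstash):
# forget all saved read positions so every file counts as "first contact" again
rm ~/.sincedb*
# start Logstash with start_position => "beginning" set in the file input
bin/logstash -f server.conf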
