Logstash file limit? - linux

I am running Logstash 1.4.1 and ES 1.1.1. Logstash is reading log files from a central server (multiple servers' logs are collected there using rsyslog), so for each day a directory like 2014-11-17 is created and files are written into it.
The problem I faced was that the first time I ran Logstash it gave:
Caused by: java.net.SocketException: Too many open files
So I changed the nofile limit to 64000 in /etc/security/limits.conf and it worked fine.
Now my problem is that, with new files being created each day, the number of files will keep growing and Logstash will keep a handle open on all of them.
How do others handle log streams when the number of files is too large to keep open?
Should I set the limit to unlimited?

If you archive the files from your server, Logstash will stop watching them and release the file handle.
If you don't delete or move them, maybe there's a file pattern that only matches new files (see the sketch below)? Sometimes the current file is "foo.log" before being rotated to "foo.2014-11-17.log".
Logstash Forwarder (a lighter-weight shipper than the full Logstash) has a concept of "Dead Time", where it will stop watching a file if it's been inactive (default 24 hours).
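A rough sketch of that pattern idea, assuming the daily directories live under a hypothetical /var/log/central/ path; the file input's exclude option takes filename globs, so rotated copies can be skipped:

input {
  file {
    # Hypothetical layout; adjust the glob to your central log directory.
    path => "/var/log/central/*/*.log"
    # Skip rotated copies such as foo.2014-11-17.log so Logstash does not
    # hold handles on files that will never be written to again.
    exclude => "*.*-*-*.log"
  }
}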

Related

Generate auto increment sequence in logstash

I am pushing logs to Elasticsearch from Logstash and then I need to get back the logs in the order they were written. Sorting by timestamp does not help because there can be multiple log statements with the same timestamp. I followed the solution in Include monotonically increasing value in logstash field? and it worked perfectly on my Windows system.
But when the code was moved to the Linux production environment, Logstash would not start up, failing with the error below:
reason=>"Couldn't find any filter plugin named 'seq'. Are you sure
this is correct? Trying to load the seq filter plugin resulted in this
error: no such file to load -- logstash/filters/seq", :level=>:error}
Check if the seq.rb file is in the filter folder.
Also check that the line endings of your seq.rb are Unix-style. If you transferred the file from a Windows machine to Linux, the problem might come from there.
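For reference, once the custom seq.rb from the linked answer sits on the plugin path under logstash/filters/, it is invoked like any other filter; a minimal sketch, assuming the plugin registers itself under the name "seq":

filter {
  # The custom plugin is referenced by its registered name, just like a
  # built-in filter; its options depend on the linked answer's code.
  seq { }
}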

How to automate data ingestion with logstash pipelines?

I'm using ELK in order to ingest, store and visualize data, nothing fancy.
Everything is working fine, but each time I have new data to ingest I have to manually run the command /opt/logstash/bin/logstash -f mypipeline.conf.
I was wondering how to automate this last step so that data is ingested into Elasticsearch each time new data arrives in the input folder defined in my pipeline conf file.
I'm using the file input plugin:
file {
  path => "/path/to/myfiles*.csv"
  start_position => "beginning"
  sincedb_path => "/dev/null"
}
I guess I'm missing an important option that would let Logstash check whether new files are present or not.
Is it "discover_interval" or "stat_interval"? Or the sincedb_path?
thx
The setting you are looking for is discover_interval (see the file input plugin documentation).
discover_interval controls how often, in seconds, Logstash re-evaluates the path to check for new files; by default it is set to 15 seconds. If Logstash is running, placing a file into the watched directory and waiting about 20 seconds should show data from that file in Elasticsearch.
If this doesn't seem to be the case, try setting the value explicitly to something like discover_interval => 10. Setting this too low could generate a lot of unnecessary overhead for your process.
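A minimal sketch of the poster's input with the intervals set explicitly (the values are only illustrative):

input {
  file {
    path => "/path/to/myfiles*.csv"
    start_position => "beginning"
    sincedb_path => "/dev/null"
    # Re-scan the path glob for new files every 10 seconds (default 15).
    discover_interval => 10
    # Check already-discovered files for new lines every second (default 1).
    stat_interval => 1
  }
}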
Found it: just put the pipeline .conf files in the /etc/logstash/conf.d directory; at startup, all files in this directory will be loaded and executed.

Best way to manually periodically import log files into Graylog using logstash

I'm currently using Logstash to import dozens of log files from different webapps into Graylog. It works great; the files are tagged so I know from which webapp they originate.
I can't change the webapps, thus I can't add a GELF appender to their log4j configuration. The idea is to periodically retrieve the log files, parse them and import them with Logstash into Graylog.
My problem is: how do I make sure I don't import a log event I've already imported?
For example, I have a log file with a rotation pattern that increments: log.1, log.2, etc. So a log event could be in log.1 the first time, and two weeks later when I reimport, it may be in log.3.
I'm afraid I can't handle that with logstash's file input "sincedb_path" and "start_position".
So here are a few options I've gathered, and I'd like your input on them if anyone has encountered the same issue:
- Use a Logstash filter dropping all events before a certain date; this requires keeping an index of the last log date of every file imported (potentially 50+) and a lot of configuration writing (see the sketch below).
- Use a Drools rule in Graylog to refuse logs with timestamps prior to the last log received for a given type.
- Ask to change the log pattern to something like log.date instead of a pattern that renames files (but I'd rather avoid this one).
Any other idea?
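For the first option, a rough sketch of a date-threshold drop filter, assuming the Logstash 1.x/2.x event API (event['field']) and a hard-coded cutoff that you would look up per file:

filter {
  ruby {
    # Hypothetical cutoff: cancel anything older than the last event already
    # imported from this file. The date below is only an illustration.
    code => "event.cancel if event['@timestamp'].time < Time.utc(2014, 11, 17)"
  }
}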

Logstash file input plugin

Currently I am using the file input plugin to go over my log archive, but it is not the right solution for me because the file input plugin inherently expects a file to be a stream of events rather than a static file. This is causing a great deal of trouble because my log archive has 100,000+ log files and Logstash opens a handle on all of these files, which are never going to change.
I am facing the following problems:
1) Logstash fails with the problem mentioned in the SO question.
2) With that many open file handles, the log archive storage is getting very slow.
Does anybody know a way to tell Logstash to treat files as static, or to stop keeping a file handle once a file has been processed?
In the Logstash JIRA issue, I was told to write my own plugin, along with some other suggestions which won't help me much.
The Logstash file input can process static files. You need to add this configuration:
file {
  path => "/your/logs/path"
  start_position => "beginning"
}
After adding start_position, Logstash reads the file from the beginning. Please refer to the file input documentation for more information. Remember that this option only modifies "first contact" situations where a file is new and not seen before; if a file has already been seen, the option has no effect, unless you have set your sincedb_path to /dev/null.
For the first problem, I have answered in the comment: please try to increase the maximum number of open files.
As for my suggestion, you can try to write a script that copies log files into the path Logstash monitors and constantly moves them back out. You have to estimate the time Logstash takes to process a log file.
Also look out for this; turn on -v and --debug for Logstash:
{:timestamp=>"2016-05-06T18:47:35.896000+0530",
:message=>"_discover_file: /datafiles/server.log:
**skipping because it was last modified more than 86400.0 seconds ago**",
:level=>:debug, :file=>"filewatch/watch.rb", :line=>"330",
:method=>"_discover_file"}
The solution is to touch the file or change the ignore_older setting.
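A sketch of the ignore_older fix, assuming a file input version that supports the option (the path glob is illustrative and the value is in seconds):

input {
  file {
    path => "/datafiles/*.log"
    start_position => "beginning"
    # Raise the cutoff from the 86400-second (one day) default so files that
    # were last modified long ago are not skipped; pick a value larger than
    # the age of your oldest file.
    ignore_older => 2592000
  }
}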

Old logs are not imported into ES by logstash

When I start logstash, the old logs are not imported into ES.
Only the new request logs are recorded in ES.
Now I've seen this in the docs.
Even if I set start_position => "beginning", old logs are not inserted.
This only happens when I run Logstash on Linux.
If I run it on Windows with the same config, old logs are imported; I don't even need to set start_position => "beginning" there.
Any idea about this?
When Logstash reads an input log, it keeps a record of the position it has read up to in that file; this is called the sincedb.
Where to write the sincedb database (keeps track of the current position of monitored log files).
The default will write sincedb files to some path matching "$HOME/.sincedb*"
So, if you want to import old log files, you must delete all the .sincedb* files in your $HOME.
Then you need to set start_position => "beginning" in your configuration file.
Hope this can help you.
Please also see this line from the documentation:
This option only modifies "first contact" situations where a file is new and not seen before. If a file has already been seen before, this option has no effect.
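Putting that together, a minimal sketch (the path is a placeholder); pointing sincedb_path at /dev/null is an alternative to deleting the $HOME/.sincedb* files by hand:

input {
  file {
    path => "/var/log/myapp/*.log"
    # Read previously unseen files from the top instead of tailing them.
    start_position => "beginning"
    # Optional: discard position bookkeeping so old files are re-read on
    # every start, instead of deleting $HOME/.sincedb* manually.
    sincedb_path => "/dev/null"
  }
}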
