Logstash file input plugin - logstash

Currently I am using the file input plugin to go over my log archive, but the file input plugin is not the right solution for me because it inherently expects a file to be a stream of events rather than a static file. This is causing a great deal of problems for me because my log archive has 100,000+ log files, and logstash opens a handle on all of these files, which are never going to change.
I am facing the following problems:
1) Logstash fails with the problem mentioned in SO.
2) With that many open file handles, the log archival storage is getting very slow.
Does anybody know a way to tell logstash to treat the files as static, or to stop keeping a file handle on a file once it has been processed?
In a logstash Jira issue, I was told to write my own plugin, along with some other suggestions which won't help me much.

The Logstash file input can process a static file. You need to add this configuration:
file {
  path => "/your/logs/path"
  start_position => "beginning"
}
After adding start_position, logstash reads the file from the beginning. Please refer here for more information. Remember that this option only modifies "first contact" situations where a file is new and not seen before; if a file has already been seen before, this option has no effect. Alternatively, set your sincedb_path to /dev/null.
For the first question, I have answered in the comment: please try raising the maximum number of open files.
As a further suggestion, you can write a script that constantly copies log files into the path logstash monitors and moves them out again. You will have to estimate the time logstash takes to process a log file.
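If the archive must stay on the watched path, later versions of the file input can at least release handles on idle files. A minimal sketch, assuming the plugin's close_older option is available in your version (the path is a placeholder, not from the question):

```
file {
  path => "/your/logs/path/**/*.log"
  start_position => "beginning"
  # Close the handle on any file not modified for an hour. Depending on the
  # plugin version this may need to be a duration string such as "1 hour"
  # instead of a number of seconds.
  close_older => 3600
}
```

This does not stop logstash from discovering all 100,000 files, but it keeps the number of simultaneously open handles bounded by the set of recently modified files.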

Also look out for this; turn on -v and --debug for logstash:
{:timestamp=>"2016-05-06T18:47:35.896000+0530",
:message=>"_discover_file: /datafiles/server.log:
**skipping because it was last modified more than 86400.0 seconds ago**",
:level=>:debug, :file=>"filewatch/watch.rb", :line=>"330",
:method=>"_discover_file"}
The solution is to touch the file or change the ignore_older setting.
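The debug message above comes from the ignore_older check (86400 seconds, i.e. one day, in that version). A sketch of raising the limit so older files are still picked up; the path mirrors the log line and the ten-day value is only an example:

```
file {
  path => "/datafiles/server.log"
  start_position => "beginning"
  ignore_older => 864000  # seconds; files untouched for longer are still skipped
}
```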

Related

Can I delete a file after it is read, or notify when all are processed?

I have a list of files being read into Elasticsearch through Logstash. I want to know how to tell when the files have all been caught up.
I was thinking of deleting the file after it is fully read in.
I haven't seen anything, though, regarding notification or acknowledgement of file completion, or file deletion.
I'd love some insight, as I figure it would be part of the config file; I just have no insight.
Ideally, I would love to delete each file after it is fully consumed. That way, I can work my way through all of the file types, starting with the txt files.
Using the file input in logstash you can do that; you need to change two config options, mode and file_completed_action.
You need to change the mode option to read (the default is tail), and add file_completed_action with the value delete.
file {
  mode => "read"
  path => "/path/to/your/files/*.log"
  file_completed_action => "delete"
}
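If you would rather keep the files and just be notified of completion, the same read mode can record finished files instead of deleting them. A sketch assuming the plugin's "log" action; the log path is a placeholder:

```
file {
  mode => "read"
  path => "/path/to/your/files/*.log"
  # Append the full path of each completely read file to a log file instead
  # of deleting it; file_completed_log_path is required for this action.
  file_completed_action => "log"
  file_completed_log_path => "/var/log/logstash/completed.log"
}
```

Tailing that completion log would then serve as the acknowledgement the question asks about.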

Logstash save/ modify configuration in environment

In my system, I use logstash, filebeat and elasticsearch.
Filebeat reads the logs, the required fields in the logs are filtered with logstash, and the result is saved in elasticsearch.
I have a customer requirement to switch saving some fields in the log on and off with a single config change made by the customer.
My planned approach is to keep the switch variable as an environment variable in the "/etc/default/logstash" file and let the customer change the variable with a file operation.
But I have found out that the logstash config is not reloaded when we change that file, even if we set "config.reload.automatic: true". So I cannot continue with my planned approach.
Also, letting the customer edit the logstash ".conf" files is not a good approach either, because the code is so complex.
Please advise on this issue.
Thanks,
I have found that it is not possible to reload the value of an environment variable without restarting logstash, so I have used a file-read solution instead. The config block is below.
ruby {
  code => "event.set('variable1', IO.readlines('/etc/logstash/input.txt')[0])"
}
This has fixed my problem, but I would like to know whether there is a performance impact in executing a file operation for each event.
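If per-event disk reads turn out to be too costly, the ruby filter's init option can load the file once at pipeline startup instead. A sketch, under the assumption that the switch value only needs to be re-read on a pipeline restart rather than picked up live:

```
ruby {
  # init runs once when the filter is created, not once per event.
  init => "@variable1 = IO.readlines('/etc/logstash/input.txt')[0]"
  # Each event then reuses the cached value; no file IO per event.
  code => "event.set('variable1', @variable1)"
}
```

The trade-off is that a change to input.txt is only seen after logstash restarts or the pipeline reloads, which may or may not satisfy the customer requirement.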

How to automate data ingestion with logstash pipelines?

I'm using ELK in order to ingest, store and visualize data, no fancy things.
Everything is working fine, but each time I have new data to ingest I have to manually execute the command /opt/logstash/bin/logstash -f mypipeline.conf.
I was wondering how to automate this last step so that the data is ingested into elasticsearch each time new data arrives in the input folder defined in my pipeline conf file.
I'm using the input plugin :
file {
  path => "/path/to/myfiles*.csv"
  start_position => "beginning"
  sincedb_path => "/dev/null"
}
I guess I'm missing an important option that would let logstash check whether new files are present or not.
Is it "discover_interval" or "stat_interval"? Or the sincedb_path?
Thanks.
The setting you are looking for is discover_interval (reference here).
discover_interval controls the number of seconds between the times that Logstash re-evaluates the path to check for new files; by default it is set to 15 seconds. If Logstash is running, then placing a file into the proper directory and waiting 20 seconds should show data from that file in elastic.
If this doesn't seem to be the case, try setting the value manually to something like discover_interval => 10. Setting this too low could generate a lot of unnecessary overhead for your process.
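Putting the thread's options together, a sketch of the input with explicit polling intervals (the path and values are illustrative, not recommendations):

```
file {
  path => "/path/to/myfiles*.csv"
  start_position => "beginning"
  sincedb_path => "/dev/null"
  discover_interval => 10  # seconds between checks of path for new files
  stat_interval => 1       # seconds between stats of already-known files
}
```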
Found it: just put the pipeline .conf files in the /etc/logstash/conf.d directory; at startup, all files in this directory will be loaded and executed.

Configure Logstash to wait before parsing a file

I wonder if you can configure logstash in the following way:
Background Info:
Every day I get an xml file pushed to my server, which should be parsed.
To indicate a complete file transfer, I afterwards get an empty .ctl (custom) file transferred to the same folder.
Both files have the following name schema 'feedback_{year}{yearday}_UTC{hoursminutesseconds}_51.{extension}' (e.g. feedback_16002_UTC235953_51.xml), so they have the same file name, but one is the .xml and the other is the .ctl file.
Question:
Is there a way to configure logstash to wait to parse the xml file until the corresponding .ctl file is present?
EDIT:
Is there maybe a way to achieve that with filebeat?
EDIT2:
It would also be enough to be able to configure logstash so that it waits x minutes before starting to process a new file, if that is easier.
Thanks for any help in advance
Your problem is that you don't want to start the parser before the file transfer has been completed. So why not push the data to a file (file-complete.xml) once you find your flag file (empty.ctl)?
Here is the possible logic for a script that runs via crontab:
if empty.ctl exists:
Clear file-complete.xml
Add the content of file.xml to file-complete.xml.
Remove empty.ctl
This way, you'd only need to parse the data from file-complete.xml. I think it is simpler to debug and configure.
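The steps above can be sketched as a small POSIX shell script run from cron; the directory names and the .ctl/.xml pairing are assumptions based on the question, not tested against a real transfer:

```shell
#!/bin/sh
# publish_complete SRC DEST: for every .ctl flag file in SRC whose matching
# .xml exists, copy the .xml into DEST (the directory logstash watches) and
# remove both, so logstash only ever sees fully transferred files.
publish_complete() {
  src=$1
  dest=$2
  for ctl in "$src"/*.ctl; do
    [ -e "$ctl" ] || continue        # glob matched nothing: no flag files yet
    xml="${ctl%.ctl}.xml"
    if [ -f "$xml" ]; then
      cp "$xml" "$dest/" && rm -f "$ctl" "$xml"
    fi
  done
}
```

A crontab entry could then run publish_complete over the incoming folder every minute, with the logstash file input pointed only at the destination directory.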
Hope it helps,

Old logs are not imported into ES by logstash

When I start logstash, the old logs are not imported into ES.
Only the new request logs are recorded in ES.
Now I've seen this in the doc.
Even if I set start_position => "beginning", old logs are not inserted.
This only happens when I run logstash on Linux.
If I run it with the same config on Windows, the old logs are imported; I don't even need to set start_position => "beginning" there.
Any idea about this ?
When you read an input log with Logstash, Logstash keeps a record of the position it has read to in that file; that's called the sincedb.
Where to write the sincedb database (keeps track of the current position of monitored log files).
The default will write sincedb files to some path matching "$HOME/.sincedb*"
So, if you want to import old log files, you must delete all the .sincedb* files in your $HOME.
Then, you need to set

start_position => "beginning"

in your configuration file.
Hope this can help you.
Please also see this note from the docs:
This option only modifies "first contact" situations where a file is new and not seen before. If a file has already been seen before, this option has no effect.
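An alternative to deleting the sincedb files by hand, useful when you always want a full re-import on every run, is pointing the input's sincedb_path at /dev/null so no read position ever survives a restart. A sketch with a placeholder path (on Windows the equivalent device is "NUL"):

```
file {
  path => "/var/log/old/*.log"
  start_position => "beginning"
  sincedb_path => "/dev/null"  # never persist read positions
}
```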
