File input not picking up copied or moved files in Logstash

I'm successfully using Logstash to parse JSON-formatted events and send them to Elasticsearch. Each event is written to a separate file, one event per file, with a .json extension.
Logstash correctly picks up a file when I create it with "vi mydoc.json", paste in the content, and save. However, it does not pick up a file that I cp or mv into the directory.
The objective is to automatically copy files into a directory and have Logstash parse them.
Each file has a different name and size. I tried looking at the Logstash code to figure out which attribute it uses, but couldn't find the relevant code. I also tried deleting the .sincedb files, but that didn't help either.
The input config is as follows:
input {
  file {
    path => "/opt/rp/*.json"
    type => "tp"
    start_position => "beginning"
    stat_interval => 1
  }
}
How can I get Logstash to pick up copied files? Which file stat attribute does it use to decide whether a file is new?
Thanks

You can switch from Logstash to Apache Flume: Flume has a Spooling Directory Source (the counterpart of Logstash's input {}) and an Elasticsearch sink (the counterpart of Logstash's output {}).
The Spooling Directory Source is exactly what you are looking for, as far as I can see.
If you don't want to rewrite your Logstash filter {}, you can use Flume to collect the files and sink them into a single file (see the File Roll Sink) and let Logstash consume the events from it.
Be aware that file operations in Flume's spooling directory have to be atomic: don't change or append to a file once it has been processed.
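A minimal sketch of such an agent, assuming it is named a1 and the rolled files go to /var/flume/out (both are placeholders, not names from the question), could look roughly like this:
# name the components of the agent
a1.sources = spool
a1.channels = ch
a1.sinks = roll
# spooling directory source: picks up each new file dropped into /opt/rp
a1.sources.spool.type = spooldir
a1.sources.spool.spoolDir = /opt/rp
a1.sources.spool.channels = ch
# simple in-memory channel between source and sink
a1.channels.ch.type = memory
# file roll sink: writes the events into rolling files that Logstash can tail
a1.sinks.roll.type = file_roll
a1.sinks.roll.sink.directory = /var/flume/out
a1.sinks.roll.channel = ch
You would then point your existing Logstash file input at /var/flume/out/* and keep your filter {} unchanged.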

Related

Logstash not reading the current file after log rotation happens

I read input from a log file and write it to Kafka. Even after log rotation the inode doesn't change, so after rotation it still reads the rotated log file (xx.log.2020-xx-xx) instead of pointing back to the main file (xx.log).
Below is my input file configuration.
Do I need to add any other config to stop it from reading old files?
input {
  file {
    path => "C:/Users/xx.log"
  }
}
It's the same issue as this one. Logstash handles file rotation pretty well by default.
All you need to do is make sure to use a glob pattern (e.g. ...log*) that matches all of your log files, and Logstash will keep track of them:
input {
  file {
    path => "C:/Users/xx.log*"
  }
}

File Name and Variable in Flume

Right now I am working on a project where we are trying to read the Tomcat access log using Flume, process the data in Spark, and dump it into a DB in the proper format. The problem is that the Tomcat access log is a daily rolling file, so the file name changes every day. Something like...
localhost_access_log.2017-09-19.txt
localhost_access_log.2017-09-18.txt
localhost_access_log.2017-09-17.txt
and the source section of my Flume conf file looks something like
# Describe/configure the source
flumePullAgent.sources.nc1.type = exec
flumePullAgent.sources.nc1.command = tail -F /tomcatLog/localhost_access_log.2017-09-17.txt
#flumePullAgent.sources.nc1.selector.type = replicating
This runs the tail command on a fixed file name (I used a fixed name for testing only). How can I pass the file name as a parameter in the Flume conf file?
In fact, even if I could somehow pass the file name as a parameter, that still would not be a real solution. Say I start Flume today with some file name (for example "localhost_access_log.2017-09-19.txt"); tomorrow, when the file name changes (localhost_access_log.2017-09-19.txt to localhost_access_log.2017-09-20.txt), someone has to stop Flume and restart it with the new file name. That is not a continuous process; I would have to stop/start Flume with a cron job or something similar. Another problem is that I would lose some data every day during that window (the server we are working with is a high-throughput server, almost 700-800 TPS), i.e. the time it takes to generate the new file name plus the time to stop and restart Flume.
Does anyone have an idea how to run Flume with a rolling file name in a production environment? Any help will be highly appreciated...
The exec source is not suitable for your task; you can use the Spooling Directory Source instead. From the Flume user guide:
This source lets you ingest data by placing files to be ingested into a “spooling” directory on disk. This source will watch the specified directory for new files, and will parse events out of new files as they appear.
Then, in the config file, you'd point it at your logs directory like this:
agent.sources.spooling_src.spoolDir = /tomcatLog
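Keeping the agent and source names from the question (flumePullAgent and nc1), the source section would then look roughly like this sketch:
# Describe/configure the source: spooling directory instead of exec/tail
flumePullAgent.sources.nc1.type = spooldir
flumePullAgent.sources.nc1.spoolDir = /tomcatLog
Note that files in the spooling directory must be complete and immutable once placed there, so copy each rotated access log in only after Tomcat has finished writing it (e.g. yesterday's file, not the one currently being appended to).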

How to input multiple CSV files in Logstash (Elasticsearch)? Please give one simple example.

How do I input multiple CSV files in Logstash (Elasticsearch)? Please give one simple example.
I have five CSV files in one folder, and new files may still get created in the same location, so how can I process all new files in Logstash as well?
Only the file path needs to change: path => "/home/mahadev/data/*.csv"
Now Logstash can read all the CSV files inside the data folder, and any new CSV file that appears there will be picked up by Logstash as well.
Please note: the above path goes into Logstash's configuration file. If you are new to this, please read about the Logstash configuration file first, then come back to this.
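A minimal configuration sketch, assuming placeholder column names and a local Elasticsearch instance, might look like this:
input {
  file {
    path => "/home/mahadev/data/*.csv"
    start_position => "beginning"
  }
}
filter {
  csv {
    # column names below are placeholders; replace them with your real CSV headers
    separator => ","
    columns => ["id", "name", "value"]
  }
}
output {
  elasticsearch {
    hosts => ["localhost:9200"]
    index => "csvdata"
  }
}
The column names, index name, and Elasticsearch address above are assumptions; adjust them to match your data.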

Does Logstash update the .sincedb file after a log file is completely processed or during the reading process?

Does Logstash update the .sincedb file after a log file is read to the end, or during the reading process?
For example:
Let's say there is a directory being monitored by Logstash. A file [say file1.log, with a maximum offset (file size) of 10000] is copied into this directory.
Does the .sincedb file get updated/created (if not already present) with the info for file1.log only when Logstash reaches offset 10000?
I think Logstash should update the .sincedb file on a regular basis, but what I have noticed is that it gets updated/created only after a file is completely read.
The Logstash file input plugin writes the sincedb file on a regular basis, controlled by the sincedb_write_interval setting.
By default, the sincedb database is written every 15 seconds.
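For example, to flush the sincedb more often you can set the interval explicitly in the file input (the path below is a placeholder):
input {
  file {
    path => "/var/log/file1.log"
    # write the sincedb every 5 seconds instead of the default 15
    sincedb_write_interval => 5
  }
}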

Why is Logstash reloading duplicate data from a file on Linux?

I am using Logstash, Elasticsearch, and Kibana.
My Logstash configuration file is as follows.
input {
  file {
    path => "/home/rocky/Logging/logFiles/test1.txt"
    start_position => "end"
    sincedb_path => "test.db"
  }
}
output {
  stdout { codec => rubydebug }
  elasticsearch { host => localhost }
}
When I run Logstash in a Windows environment it works fine, but when I use the same configuration in my virtual Linux OS (Fedora) it creates a problem.
In Fedora, when I insert anything at the end of the log file while Logstash is running, sometimes it sends all the data in the file from the beginning, and sometimes half of it. It should only load the new data appended to the log file. The sincedb file is storing its data correctly, yet Logstash still does not produce the proper data on Fedora. Please help.
I had a similar problem on my Linux Mint machine using the official Logstash Docker image.
I was using a text editor (Geany) to add new lines to the file. After playing around a bit more, I figured out that it must have been related to what the editor was doing when saving the file after I added new lines (many editors save by writing a temporary file and renaming it over the original, which changes the inode, so the file input treats it as a brand-new file and reads it from the start).
When I added new lines using a simple echo command instead, things worked fine:
echo "some new line" >> my_file.log
I know this thread is old, but this was the only thing that came up at all when I googled for this, so hopefully this will help someone else...
