Generate auto increment sequence in logstash

I am pushing logs to Elasticsearch from Logstash, and I then need to get the logs back in the order they were written. Sorting by timestamp does not help because there can be multiple log statements with the same timestamp. I followed the solution in "Include monotonically increasing value in logstash field?" and it worked perfectly on my Windows system.
But when the code was moved to the Linux production environment, Logstash would not start. It fails with the error below:
reason=>"Couldn't find any filter plugin named 'seq'. Are you sure
this is correct? Trying to load the seq filter plugin resulted in this
error: no such file to load -- logstash/filters/seq", :level=>:error}

Check that the seq.rb file is in the filters folder (logstash/filters).
Also check that the line endings of your seq.rb are Linux-style (LF). If you transferred the file from a Windows machine to a Linux one, the problem might come from there.
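
A minimal sketch of the line-ending check, in plain Ruby. The helper names `crlf?` and `convert_to_lf` are made up for illustration, and the path in the usage comment is only a placeholder; adjust it to wherever your seq.rb actually lives.

```ruby
# Hypothetical helper: true if the file contains Windows (CRLF) line endings.
def crlf?(path)
  File.binread(path).include?("\r\n")
end

# Hypothetical helper: rewrite the file in place with Unix (LF) line endings.
def convert_to_lf(path)
  content = File.binread(path)
  File.binwrite(path, content.gsub("\r\n", "\n"))
end

# Usage (placeholder path, not taken from the question):
#   path = "/path/to/logstash/filters/seq.rb"
#   convert_to_lf(path) if crlf?(path)
```

Reading and writing in binary mode keeps Ruby from translating line endings on its own, so the check sees the bytes exactly as they are on disk.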

Related

Logstash save/ modify configuration in environment

In my system, I use Logstash, Filebeat and Elasticsearch.
Filebeat reads the logs, the required fields in the logs are filtered with Logstash, and the result is saved in Elasticsearch.
I have a customer requirement to switch saving of some log fields on or off through a single config change made by the customer.
My planned approach is to keep the switch as an environment variable in "/etc/default/logstash" and let the customer change the variable with a file operation.
But I have found that the Logstash config is not reloaded when that file changes, even with "config.reload.automatic: true" set. So I cannot continue with my planned approach.
Letting the customer edit the Logstash ".conf" files is not a good approach either, because the config is so complex.
Please advise on this issue.
Thanks,
I have found that it is not possible to reload the value of an environment variable without restarting Logstash. So I have used a file-read solution instead. The config block is below.
ruby {
code => "event.set( 'variable1',IO.readlines('/etc/logstash/input.txt')[0])"
}
This has fixed my problem. But I would like to know whether there is a performance impact from executing a file operation on every event.
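
One way to limit the per-event cost is to cache the value and re-read the file only when it changes. The sketch below is plain Ruby and is not taken from the config above: the class name `CachedFirstLine` is hypothetical, and it assumes a modification-time check is cheaper than opening and reading the file each time. Unlike the original one-liner, it also strips the trailing newline.

```ruby
# Hypothetical cache: re-read the file only when its modification time changes.
class CachedFirstLine
  def initialize(path)
    @path = path
    @mtime = nil
    @value = nil
  end

  # Returns the first line of the file without its trailing newline.
  # Each call does one File.mtime lookup; the file itself is read only
  # after its modification time has changed.
  def value
    mtime = File.mtime(@path)
    if mtime != @mtime
      @value = File.open(@path) { |f| f.gets }.to_s.chomp
      @mtime = mtime
    end
    @value
  end
end
```

In a ruby filter, an object like this would presumably be created once in the `init` option and queried from `code`, for example `event.set('variable1', @switch.value)`, where `@switch` is an assumed instance variable name.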

How can I prune executors' logs in spark streaming

I'm working on a Spark Streaming job that runs in standalone mode. By default, the executors append their logs to the $SPARK_HOME/work/app_idxxxx/stderr and stdout files. The problem comes when the app runs for a long time, say a month or more, and generates a lot of logs in the stderr file. I would like to roll the stderr file daily, keep a week of files, and archive (delete) anything older. I changed log4j.properties to use org.apache.log4j.RollingFileAppender and directed the logs to a file instead of stderr, but the file doesn't respect the rolling settings and keeps growing.
Creating a cron job to do this doesn't work either, since Spark holds a pointer to that specific file and renaming it probably has no effect.
I couldn't find any documentation for these specific logs. I would really appreciate any help.
After digging more, I finally found how to resolve the issue, and I am posting it here so that the next person doesn't have to go through all this suffering and trial and error.
The settings for those logs are in two different places. First, in $SPARK_HOME/conf/spark-defaults.conf, add these three lines on each executor:
spark.executor.logs.rolling.time.interval daily
spark.executor.logs.rolling.strategy time
spark.executor.logs.rolling.maxRetainedFiles 7
The other file that you need to change on each executor is $SPARK_HOME/conf/spark-env.sh. Add the following lines:
SPARK_WORKER_OPTS="$SPARK_WORKER_OPTS -Dspark.worker.cleanup.enabled=true -Dspark.worker.cleanup.interval=1800
-Dspark.worker.cleanup.appDataTtl=864000
-Dspark.executor.logs.rolling.strategy=time
-Dspark.executor.logs.rolling.time.interval=daily
-Dspark.executor.logs.rolling.maxRetainedFiles=7 "
export SPARK_WORKER_OPTS
After these changes it started working properly. Hope this helps some people :)
If you are in standalone mode, exporting a single environment variable is enough:
export SPARK_WORKER_OPTS="-Dspark.executor.logs.rolling.strategy=time -Dspark.executor.logs.rolling.time.interval=daily -Dspark.executor.logs.rolling.maxRetainedFiles=7"
You can also refer to: http://apache-spark-user-list.1001560.n3.nabble.com/Executor-Log-Rotation-Is-Not-Working-td18024.html

Configure Logstash to wait before parsing a file

I wonder if you can configure logstash in the following way:
Background Info:
Every day an XML file is pushed to my server, and it should be parsed.
To indicate that the transfer is complete, an empty .ctl file (a custom file type) is transferred to the same folder afterwards.
Both files follow the name schema 'feedback_{year}{yearday}_UTC{hoursminutesseconds}_51.{extension}' (e.g. feedback_16002_UTC235953_51.xml). So they have the same file name, but one ends in .xml and the other in .ctl.
Question:
Is there a way to configure Logstash to wait to parse the XML file until the corresponding .ctl file is present?
EDIT:
Is there maybe a way to achieve that with Filebeat?
EDIT2:
It would also be enough to be able to configure logstash in a way that it will wait x minutes before starting to process a new file, if that is easier.
Thanks for any help in advance
Your problem is that you don't want to start the parser before the file transfer has completed. So why not push the data to a separate file (file-complete.xml) once you find your flag file (empty.ctl)?
Here is possible logic for a script run from crontab:
if empty.ctl exists:
    Clear file-complete.xml
    Add the content of file.xml to file-complete.xml
    Remove empty.ctl
This way, Logstash only needs to parse the data from file-complete.xml. I think this is simpler to debug and configure.
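
The logic above could be sketched in Ruby roughly as follows. This is a variation rather than the exact steps: instead of a single file-complete.xml, it copies each completed .xml file into a separate directory that Logstash would watch. The function name `publish_completed` and both directory arguments are assumptions made for illustration.

```ruby
require "fileutils"

# Hypothetical helper: for every .ctl flag file in `incoming`, copy the
# matching .xml file into `ready` and remove the flag file.
# Returns the list of files that were published.
def publish_completed(incoming, ready)
  FileUtils.mkdir_p(ready)
  published = Dir.glob(File.join(incoming, "*.ctl")).sort.map do |ctl|
    xml = ctl.sub(/\.ctl\z/, ".xml")
    next unless File.exist?(xml)      # flag without data: retry on the next run
    target = File.join(ready, File.basename(xml))
    FileUtils.cp(xml, target)         # overwrites any stale copy
    FileUtils.rm(ctl)                 # so the file is not published twice
    target
  end
  published.compact
end

# Usage from cron (placeholder paths):
#   publish_completed("/data/incoming", "/data/ready")
```

Because the flag file is removed only after the copy succeeds, a run that fails midway simply retries the same file the next time cron fires.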
Hope it helps,

Best way to manually periodically import log files into Graylog using logstash

I'm currently using Logstash to import dozens of log files from different webapps into Graylog. It works great, and the files are tagged so I know which webapp they originate from.
I can't change the webapps, so I can't add a GELF appender to their log4j configuration. The idea is to periodically retrieve the log files, parse them, and import them into Graylog with Logstash.
My problem is: how do I make sure I don't import a log event I've already imported?
For example, I have a log file that has a log pattern that increments: log.1, log.2, etc. So I'll have log events that could be in log.1 the first time and 2 weeks later when I reimport them they'll maybe be in log.3.
I'm afraid I can't handle that with logstash's file input "sincedb_path" and "start_position".
So here are a few options I've gathered and I'd like your input about them, if anyone encountered the same issue:
- Use a Logstash filter that drops all events before a certain date. This requires keeping an index of the last log date of every file imported (potentially 50+) and writing a lot of configuration.
- Use a Drools rule in Graylog to refuse logs with timestamps prior to the last log received for a given type.
- Ask to change the log pattern to something like log.date instead of a pattern that renames files (but I'd rather avoid this one).
Any other idea?
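
To make the first option concrete, here is a minimal Ruby sketch of a per-source "last imported timestamp" index. Everything in it is hypothetical: the class name `ImportWatermark`, the in-memory hash (a real setup would need to persist it between imports), and the assumption that timestamps arrive as ISO 8601 strings.

```ruby
require "time"

# Hypothetical in-memory index: last imported timestamp per webapp/log type.
class ImportWatermark
  def initialize
    @last_seen = {}
  end

  # True if the event is newer than anything imported so far for `source`.
  # Records the new high-water mark as a side effect.
  # Caveat: an event whose timestamp equals the current mark is treated
  # as already imported, so distinct events in the same instant are lost.
  def fresh?(source, timestamp)
    t = Time.iso8601(timestamp)
    last = @last_seen[source]
    return false if last && t <= last
    @last_seen[source] = t
    true
  end
end
```

The caveat in the comment is the main weakness of any purely date-based cutoff, and it also assumes events for a given source are processed in chronological order.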

Old logs are not imported into ES by logstash

When I start logstash, the old logs are not imported into ES.
Only the new request logs are recorded in ES.
I've seen this in the docs.
Even if I set start_position=>"beginning", old logs are not inserted.
This only happens when I run Logstash on Linux.
If I run it on Windows with the same config, old logs are imported.
I don't even need to set start_position=>"beginning" on Windows.
Any idea about this ?
When Logstash reads an input log file, it keeps a record of the position it has read up to in that file; this is called the sincedb. From the docs:
"Where to write the sincedb database (keeps track of the current position of monitored log files).
The default will write sincedb files to some path matching "$HOME/.sincedb*""
So, if you want to import old log files, you must delete all the .sincedb* files in your $HOME.
Then, you need to set
start_position=>"beginning"
in your configuration file.
Hope this can help you.
Please see this line from the docs as well:
"This option only modifies "first contact" situations where a file is new and not seen before. If a file has already been seen before, this option has no effect."
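
A small Ruby sketch for locating the sincedb files mentioned above. The helper name `sincedb_files` is made up, and it assumes the default location quoted from the docs ($HOME/.sincedb*); if a custom sincedb path is configured, the files will be elsewhere.

```ruby
# Hypothetical helper: list sincedb files under a home directory.
# The pattern starts with an explicit dot, so hidden files are matched.
def sincedb_files(home = Dir.home)
  Dir.glob(File.join(home, ".sincedb*")).sort
end

# Usage (run as the user Logstash runs as, with Logstash stopped):
#   sincedb_files.each { |f| File.delete(f) }
```

Listing the files first makes it easy to confirm which ones will be removed before actually deleting anything.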
