Log rotation in logstash

I am using the file plugin as the input for logs in Logstash. My log files are rotated daily, so I wanted to ask how we can configure Logstash's file plugin so that it works with files that are rotated daily. In addition to this, is log rotation supported by Filebeat as well?

I am trying to answer your questions in part.
First - log rotation.
From the docs:
Note that the rotated filename will be treated as a new file so if
start_position is set to beginning the rotated file will be
reprocessed.
https://www.elastic.co/guide/en/logstash/current/plugins-inputs-file.html
That means that if your rotation renames files, you will likely end up with duplicate entries (unless, I believe, the path excludes the renamed file).
If your path excludes the renamed file, then it should be fine.
I fixed this in a different way (in Java and Python respectively).
I disable renaming of files and instead put the date into the log file name. So for me, in my Java app, the file name is:
my-server-log-%h-%d.log
Since I am working in a distributed environment, I incorporate the hostname into my logfile name.
%h = hostname
%d = date
This ends up in my file being named:
my-server-log-pandaadb-2016-06-20.log
This file is never renamed. I modified my rotation algorithm to simply not rename and instead at midnight create a new file and leave the previous file untouched.
This has the effect that logstash (correctly) knows that it has read all lines in the previous file. It picks up the new file since I am using wildcards in my input. No logs are duplicated.
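For illustration, a minimal file input along those lines might look like this (the path is a placeholder for wherever your dated log files land; path and start_position are standard options of the plugin):

input {
  file {
    # the glob picks up each day's new file; files are never renamed or reused
    path => "/var/log/myapp/my-server-log-*.log"
    # read a freshly created daily file from its first line
    start_position => "beginning"
  }
}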
This also works quite well in combination with rsync by the way.
I hope that helps,
Artur
Edit: I have not worked with filebeat so far, so I can't comment on that part.

Related

YAML file one line filled with null characters, #0000 character not supported while reading

I've built a Python-based application (which runs 24/7) that logs some information in a YAML file every few minutes. It was working perfectly for a few days. Suddenly, after approximately two weeks, one line in the YAML file was filled with NUL characters (416 NUL characters, to be precise).
The suspicion is that someone might have tried to open the already running application again, so both applications tried to write to/access the same YAML file, which could have caused this. But I couldn't replicate it.
Just wanted to know the cause of this issue.
Please let me know if someone faced the same issue before.
Some context about the file writing:
The YAML file is opened in append mode and a list is written to it using the code below:
with open(file_path, 'a') as file:
    yaml.dump(summary_list, file)
Concurrent access is a possible cause for this, especially since you are appending. For example, both instances may have opened the file and placed their start marker at the same position, while letting the file grow to the sum of both appended data dumps. That would leave part of the file unwritten, which might explain the NULs.
What exactly happened depends more on your OS and filesystem than on YAML, and even if we knew those we couldn't tell for sure.
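If two simultaneous writers turn out to be the cause, one possible guard (just a sketch, assuming Linux/POSIX; file_path and summary_list are the names from your snippet) is to take an exclusive lock around each append:

import fcntl
import yaml

with open(file_path, 'a') as file:
    fcntl.flock(file, fcntl.LOCK_EX)  # blocks until no other cooperating process holds the lock
    yaml.dump(summary_list, file)
    file.flush()
    fcntl.flock(file, fcntl.LOCK_UN)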
I recommend using a proper logging framework to avoid such issues; you can dump the YAML to a string and log that.
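A minimal sketch of that idea, using Python's standard logging module (the file name, rotation settings, and example data are placeholders):

import logging
import logging.handlers
import yaml

logger = logging.getLogger("summary")
# rotation handled by the framework instead of hand-appending to the YAML file
handler = logging.handlers.RotatingFileHandler("summary.log", maxBytes=1_000_000, backupCount=10)
handler.setFormatter(logging.Formatter("%(asctime)s %(message)s"))
logger.addHandler(handler)
logger.setLevel(logging.INFO)

summary_list = [{"status": "ok"}]  # stand-in for the list from the question
# dump the YAML to a string and log it, instead of appending to the YAML file directly
logger.info(yaml.dump(summary_list, default_flow_style=True))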

How to stream the content of log files with constantly changing file names in Perl?

I have a series of applications on Linux systems whose logs I need to constantly 'stream' out, or even just 'tail', but the challenge is that the file names are constantly rolling and changing.
They are all date-encoded (with the dates in different formats) and each then has a different increment scheme.
Most of them start at one and count up; one has no extension on the first file and then adds an extension for the files after it; another increments a number, but once it hits 99 it increments an alpha character instead, resets the number to 01, and counts up again, since it rolls so quickly.
I only have OS-level shell scripting, OS command-line utilities, and Perl available to handle this, so that another application can pick up and read these logs.
A new file is only created at the moment the application starts writing to it, and groups of different logs (some I am reading, some I am not) are written to the same directory, so I cannot just pick up anything that appears in the directory.
If I simply 'tail -n 1000000 -f |' them today, this works fine for the reader application I am using until the file changes. I cannot set up file lists or ranges within the reader application, but I can pre-process the logs so they appear as a continuous stream to the reader, rather than the reader invoking commands to read them directly. A simple Perl log reader also works fine for a static file name, but not for dynamic ones. It is critical that I do not re-process any log lines and only capture new lines as they are written to the logs.
I admit I am not any kind of Perl guru, and the best clue I have been able to find so far is Perl's glob function, but the examples I have found basically reprocess all of the files on each run and then seem to stop.
Example file names I am dealing with across the multiple apps I am trying to handle:
appA_YYMMDD.log
appA_YYMMDD_0001.log
appA_YYMMDD_0002.log
WS01APPB_YYMMDD.log
WS02APPB_YYMMDD.log
WS03AppB_YYMMDD.log
APPCMMDD_A01.log
APPCMMDD_B01.log
YYYYMMDD_001_APPD.log
As shown above, the files do not have the same inode, and simply monitoring the directory for changes is not possible because a lot of other things are written there. On the dev system more than 50 logs and thousands of files are being written to the directory, and I am only trying to retrieve 5 of them. I am checking whether multitail can be made available to try that suggestion, but it is not currently installed, and installing any additional RPMs in this environment is generally a multi-month battle.
ls -i
24792 APPA_180901.log
24805 APPA__180902.log
17011 APPA__180903.log
17072 APPA__180904.log
24644 APPA__180905.log
17081 APPA__180906.log
17115 APPA__180907.log
So really the root of what I am trying to do is get a continuous stream regardless of whether the file name changes, without having to run the extract command repeatedly and without big breaks in the data feed while some script figures out that the file being logged to has changed. I don't need to parse the contents (my other app does that). Is there an easy way of handling this changing file name?
How about monitoring the log directory for changes with Linux inotify, e.g. Linux::Inotify2? Then you could detect when new log files are created, stop reading from the old log file, and start reading from the new one.
Try tailswitch. I created this script to tail log files that are rotated daily and have YYYY-MM-DD in their names. To use it, you just say:
% tailswitch '*.log'
The quoting prevents the shell from interpreting the glob pattern. The script expands the glob pattern from time to time and switches to a newer file based on its name.

logrotate - backup any file

I have one file. It's not a log file. Every day I need to copy it somewhere else, adding a timestamp to its name. I need to keep the last ten (10) days of copies.
I am considering using logrotate service (server is running CentOS).
My question is whether logrotate is limited to rotating only log files, or whether I can use it for any other file and location. That is, may I specify some other location to put the timestamped copies in, and keep only the last 10 (daily) copies?
Thanks in advance for your hints.
You can rotate any file in any location; e.g. take a look at /etc/logrotate.d/samba (if installed), which rotates log.smbd. Just create a new file in /etc/logrotate.d/ and configure it for the file you want rotated. CentOS is no different in this respect.
The problem is the timestamp in the name. If you want that, it is not plain rotation any more: with logrotate the latest archived file is always number 1, and thus all older ones get renamed. They keep their last-modification time in the filesystem untouched, but that time is not part of the name.
As an alternative, you can just create a new cron job by adding a file in one of the /etc/cron.*/ directories. This cron job can move your file, adding a timestamp, e.g. by using date +%y%m%d_%H:%M, and create a new empty file in its place using touch. No need for logrotate.
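For example, a small script dropped into /etc/cron.daily/ could look roughly like this (the paths and the name myfile are placeholders; it follows the move-plus-touch idea above and then keeps only the ten newest copies):

#!/bin/bash
# move the file aside with a timestamp suffix and recreate it empty
mv /path/to/myfile /backup/myfile.$(date +%y%m%d_%H:%M)
touch /path/to/myfile
# delete everything but the ten newest copies
ls -1t /backup/myfile.* | tail -n +11 | xargs -r rm --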

log4j fileappender doesn't switch to the new file when logrotate rotates the log file

Context:
I want to use log4j to write audit-related logs to a specific log file, let's say audit.log. I don't want to use the SyslogAppender (UDP-based) because I cannot tolerate data loss. In addition, I am using logrotate to rotate audit.log when the file reaches a certain size.
Problem:
What I am encountering is that when logrotate rotates audit.log to audit.log.1, log4j keeps writing to audit.log.1 instead of writing to audit.log.
Possible approaches:
I know I could use RollingFileAppender to do the log rotation instead of logrotate, so that when RollingFileAppender rolls the file it switches to the new file without any hassle. But the reason I can't use RollingFileAppender is that I want logrotate's postrotate feature to trigger some scripts after the rotation happens, which RollingFileAppender cannot provide.
Another, more desperate option I can think of is to write a custom log4j appender myself that closes the old log file (audit.log.1) and opens the new one (audit.log) when it detects that the file has been rotated.
I have never used ExternallyRolledFileAppender, but would it be possible to use logrotate's postrotate to send the roll signal to ExternallyRolledFileAppender, make log4j aware that the file has been rotated, and start writing to the new file?
Question:
Just wondering: has an appender like that already been invented/written? Or do I have other options to solve this?
Check out logrotate's copytruncate option; it might help your case:
copytruncate
Truncate the original log file to zero size in place
after creating a copy, instead of moving the old log
file and optionally creating a new one. It can be
used when some program cannot be told to close its
logfile and thus might continue writing (appending) to
the previous log file forever. Note that there is a
very small time slice between copying the file and
truncating it, so some logging data might be lost.
When this option is used, the create option will have
no effect, as the old log file stays in place
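With that option, a logrotate stanza along these lines might cover the audit.log case (the path, size, retention, and postrotate script are placeholders; copytruncate lets log4j keep writing to the same file handle while postrotate still triggers your scripts):

/var/log/myapp/audit.log {
    size 100M
    rotate 7
    copytruncate
    postrotate
        /usr/local/bin/after-audit-rotate.sh
    endscript
}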

Linux - Restoring a file

I've written a very basic shell script that moves a specified file into the dustbin directory. The script is as follows:
#!/bin/bash
#move items to dustbin directory
mv "$#" ~/dustbin/
echo "File moved to dustbin"
This works fine for me; any file I specify gets moved to the dustbin directory. However, what I would like to do is create a new script that will move a file from the dustbin directory back to its original directory. I know I could easily write a script that moves it back to a location specified by the user, but I would prefer one that moves it back to its original directory.
Is this possible?
I'm using Mac OS X 10.6.4 and Terminal
You will have to store where the original file came from, then. Maybe in a separate file, a database, or in the file's attributes (metadata).
Create a logfile with 2 columns:
The complete filename in the dustbin
The complete original path and filename
You will need this logfile anyway: what will you do when a user deletes two files in different directories but with the same name, e.g. /home/user/.wgetrc and /home/user/old/.wgetrc?
What will you do when a user deletes a file, makes a new one with the same name, and then deletes that too? You'll need versions or timestamps or something.
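A rough bash sketch of the two-column logfile idea, using a deletion timestamp to keep colliding names and repeated deletions apart (file and index names are placeholders):

#!/bin/bash
# delete: move the file into ~/dustbin under a timestamped name and record the mapping
# resolve to an absolute path without GNU readlink -f (the asker is on Mac OS X)
src=$(cd "$(dirname "$1")" && pwd)/$(basename "$1")
stamp=$(date +%s)
mv "$src" ~/dustbin/"${stamp}_$(basename "$src")"
printf '%s\t%s\n' "${stamp}_$(basename "$src")" "$src" >> ~/dustbin/.index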
You need to store the original location somewhere, either in a database or in an extended attribute of the file. A database is definitely the easiest way to do it, though an extended attribute would be more robust. Looking in ~/.Trash/ I see that some, but not all, files have extended attributes, so I'm not sure how Apple does it.
You need to somehow encode the source directory in the file. I think the easiest would be to change the filename in the dustbin directory. So that /home/user/music/song.mp3 becomes ~/dustbin/song.mp3|home_user_music
And when you copy it back, your script needs to parse the file name and reconstruct the path from the part after the |.
Another approach would be to let the filesystem be your database.
A file moved from /some/directory/somewhere/filename would be moved to ~/dustbin/some/directory/somewhere/filename, and you'd do find ~/dustbin -name "$file" to find it based on its basename (from user input). Then you'd just trim "~/dustbin" from the output of find and you'd have the destination ready to use. If more than one file is returned by find, you can list the proposed files for user selection. You could use ~/dustbin/$deletiondate if you wanted to make it possible to roll back to earlier versions.
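A rough bash sketch of that approach (no error handling, the restore simply takes the first match, and the absolute path is resolved without GNU readlink -f since the asker is on Mac OS X):

# delete: recreate the original directory structure under ~/dustbin
src=$(cd "$(dirname "$1")" && pwd)/$(basename "$1")
mkdir -p ~/dustbin"$(dirname "$src")"
mv "$src" ~/dustbin"$src"

# restore: find the file by basename, strip the dustbin prefix, and move it back
match=$(find ~/dustbin -name "$1" | head -n 1)
dest=${match#"$HOME"/dustbin}
mkdir -p "$(dirname "$dest")"
mv "$match" "$dest"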
You could do a cron job that would periodically remove old files and the directories (if empty).
