Dynamically remove duplicate log messages - linux

Recently a single repeating line filled up /var/log/libvirt/qemu/.log (20+ GB in a matter of minutes), which crashed our system because the root partition filled up.
"block I/O error in device 'drive-virtio-disk0': Operation not permitted (1)"
Is there a way to ensure that duplicate lines are not pushed into logs, or a way to keep that directory from filling up? Logrotate's maxsize will not work for us since we run it from a daily cronjob.

It depends on which logging utility you are using (rsyslog or syslog-ng).
Rsyslog can collapse repeated messages, replacing them with a line like:
"last message repeated 3044 times".
To enable this option, add:
$RepeatedMsgReduction on
to /etc/rsyslog.conf
I don't know if such reduction is possible with syslog-ng.
Both syslog-ng and rsyslog can completely remove lines matching some pattern:
rsyslog - take a look at this manual: http://www.rsyslog.com/discarding-unwanted-messages/
syslog-ng - take a look at filters; there is an example of how to do it here: https://serverfault.com/questions/540038/excluding-some-messages-from-syslog-ng
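Putting both ideas together, here is a minimal /etc/rsyslog.conf sketch, assuming the message is actually routed through rsyslog (libvirt's per-domain qemu logs are normally written directly by libvirtd, so check how yours arrive) and using the error text from the question as the match pattern:
# Collapse identical consecutive messages into "last message repeated N times"
$RepeatedMsgReduction on
# Discard anything containing the offending text before it reaches any log file
:msg, contains, "block I/O error in device 'drive-virtio-disk0'" stop
On older rsyslog versions the discard action is written as ~ instead of stop.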

Related

What is $InputFilePollInterval in rsyslog.conf? Will increasing this value have an impact on the level of logging?

In the rsyslog configuration file we have configured all application logs to be written to /var/log/messages, but the logs get written at a very high rate. How can I decrease the level of logging at the application level?
Hope this is what you are looking for.
Open the file in a text editor:
/etc/rsyslog.conf
Change the following parameters to values that work for you:
$SystemLogRateLimitInterval 3
$SystemLogRateLimitBurst 40
Restart rsyslogd:
service rsyslog restart
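For reference, on rsyslog 7 and later the same rate limiting can be expressed in the newer module syntax; this is a hedged equivalent of the two legacy directives above:
module(load="imuxsock"
       SysSock.RateLimit.Interval="3"
       SysSock.RateLimit.Burst="40")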
$InputFilePollInterval is equivalent to "PollingInterval":
PollingInterval seconds
Default: 10
This setting specifies how often files are to be polled for new data.
The time specified is in seconds. During each polling interval, all
files are processed in a round-robin fashion.
A short poll interval provides more rapid message forwarding, but
requires more system resources. While it is possible, we strongly
recommend not setting the polling interval to 0 seconds.
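To see $InputFilePollInterval in context, here is a hedged sketch of a legacy imfile configuration; the file path, tag, and facility are placeholders:
$ModLoad imfile
$InputFilePollInterval 10        # check watched files for new data every 10 seconds
$InputFileName /var/log/myapp/app.log
$InputFileTag myapp:
$InputFileStateFile stat-myapp
$InputFileSeverity info
$InputFileFacility local3
$InputRunFileMonitor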
There are a few approaches to this, and it depends on what exactly you're looking to do, but you'll likely want to look into separating your facilities into separate output files, based on severity. This can be done using RFC5424 severity priority levels in your configuration file.
By splitting logging into separate files by facility and/or severity, and setting the stop option, messages based on severity can be output to as many or few files as you like.
Example (set in the rsyslog.conf file):
*.*;auth,authpriv,kern.none /var/log/syslog
kern.* /var/log/kern.log
kern.debug stop
*.=debug;\
auth,authpriv.none;\
news.none;mail.none /var/log/debug
This configuration:
Will not output any kern facility messages to syslog (due to kern.none)
Will output all kern messages (down to debug level) to kern.log and "stop" there
Will output any other debug-level logs that are not excluded by a .none entry to /var/log/debug
How you separate things out is up to you, but I would recommend looking over the first link I included. You also may want to look into the different local facilities that can be used as separate log pipelines.
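As a sketch of the local-facility idea, assuming an application that logs to local5 (e.g. via openlog(3) with LOG_LOCAL5, or logger -p local5.info), you could give it its own pipeline and keep it out of the main syslog file by placing these lines before the catch-all rules:
local5.*        -/var/log/myapp.log
local5.*        stop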

Collect some missing lines of a log file with logstash

How can we collect the new lines of a log file if we stop Logstash for a fixed period and then restart it, knowing that:
start_position => 'end'
As documented, the file input's start_position parameter only controls what happens the first time Logstash encounters a file. Once the first contact is made, Logstash only uses its internal bookmark stored in the sincedb file.
In other words, it's generally fine to restart Logstash or shut it down for a little while. It'll pick up everything when it starts up again. There are two caveats though:
If the logfile is rotated via a rename operation (which is the default in most cases) while Logstash is down and the filename pattern(s) don't cover the rotated file, the last lines will be lost. For this reason it's a good idea to have Logstash track the first rotated file too; if you e.g. want to track /var/log/syslog and that file is rotated to /var/log/syslog.1 every morning, it's wise to include both files (see the sketch after this list).
Logstash 1.4.2 doesn't shut down gracefully if it receives a SIGTERM signal. All messages currently in the 20-message buffer will be lost. Additionally, I'm not sure the sincedb file is flushed. Sending SIGINT will ensure a graceful shutdown.
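For the rotation caveat above, a minimal Logstash file-input sketch (the paths and sincedb location are examples) that tracks the first rotated file as well:
input {
  file {
    path           => ["/var/log/syslog", "/var/log/syslog.1"]
    start_position => "end"
    sincedb_path   => "/var/lib/logstash/sincedb-syslog"
  }
}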

Reduce Size of .forever Log Files Without Disrupting forever Process

The log files (in /root/.forever) created by forever have reached a large size and are almost filling up the hard disk.
If the log file were to be deleted while the forever process is still running, forever logs 0 will return undefined. The only way for logging of the current forever process to resume is to stop it and start the node script again.
Is there a way to just trim the log file without disrupting logging or the forever process?
So Forever.js will continue to write to the same file handle; ideally it would support something that allows you to send it a signal and rotate to a different file.
Without that (which would require a code change in the Forever.js package), your options look like:
A command line version:
Make a backup
Null out the file
cp forever-guid.log backup && :> forever-guid.log;
This has the slight risk that, if you are writing to the log file at a rapid pace, you'll end up writing a log line between the backup and the nulling, resulting in the loss of that line.
Use Logrotate w/copytruncate
You can set up logrotate to watch the forever log directory to copy and truncate automatically based on filesize or time.
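A hedged logrotate sketch for this; the path and limits are placeholders:
/root/.forever/*.log {
    copytruncate
    size 100M
    rotate 7
    compress
    missingok
    notifempty
}
Note that copytruncate has the same small race window as the manual cp-and-truncate approach above: lines written between the copy and the truncation are lost.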
Have your node code handle this
You can have your logging code check how large the log file is and then do the copy-and-truncate itself - this would allow you to avoid the potential data loss.
EDIT: I had originally thought that split and truncate could do the job. They probably can, but an implementation would look really awkward. Split doesn't have a good way of splitting the file into a short one (the original log) and a long one (the backup). Truncate (which, in addition, is not always installed) doesn't reset the write pointer, so forever just keeps writing at the same byte offset as it would have, resulting in strange data.
You can truncate the log file without losing its handle (reference).
cat /dev/null > largefile.txt

syslog: process specific priority

I have two user processes A and B. Both use syslog using facility LOG_USER.
I want to have different threshold levels for them:
For A, only messages of priority ERR-and-above must be logged
For B, only messages of priority CRIT-and-above must be logged
I found that if I set up /etc/syslog.conf as
user.err /var/log/messages
then messages of ERR-and-above are logged, but, from both A and B.
How can I have different minimum threshold levels for different processes?
Note: I am exploring if there is a config file based solution. Otherwise, there is another approach that works. In each process, we can use setlogmask() to install process specific priority mask.
EDIT (Nov 18): I want to use syslog and some portable solution.
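For completeness, a minimal sketch of the setlogmask() approach mentioned above; the program name and messages are made up:
#include <syslog.h>

int main(void)
{
    /* Process A: only ERR and above should ever reach syslogd */
    setlogmask(LOG_UPTO(LOG_ERR));
    openlog("A", LOG_PID, LOG_USER);

    syslog(LOG_ERR,     "logged (ERR passes the mask)");
    syslog(LOG_WARNING, "silently dropped in-process");

    closelog();
    return 0;
}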
A config file based solution is available. I think CentOS by default ships with rsyslog and even if it does not, you can always install rsyslog with yum. This solution works only with rsyslog and nothing else.
There is a catch, though. You cannot separate log messages with rsyslog (or pretty much any syslog daemon implementation) between processes with the same name, i.e. the same executable path. However, rsyslog does allow you to filter messages based on program name, and here lies a possible solution: most programs call openlog(3) using argv[0], i.e. the executable name, as the first argument. Now, since you don't reveal the actual program you're running, there is no way to determine this for you, but you can always read the sources of your own program, I guess.
In most cases the executable path is the program name, though some daemons do fiddle with argv[0] (notable examples are postfix and sendmail). Rsyslog, on the other hand, provides a filtering mechanism which allows one to filter messages based on the name of the sending program (you can now probably see how this is all connected to how openlog(3) is called). So, instead of trying to filter processes directly, we can filter on program names, and that we can affect by creating symbolic links.
So, this solution only works given the following conditions: a) the process you're running does not fiddle with argv[0] after it starts executing; b) you can create symlinks to the binary, thus creating two different names for the same program; c) your program calls openlog(3) using argv[0] as the first parameter.
Given those conditions, you can simply filter messages in /etc/rsyslog.conf like this (example directly from the rsyslog documentation):
if $programname == 'prog1' then {
    action(type="omfile" file="/var/log/prog1.log")
}
if $programname == 'prog2' then {
    action(type="omfile" file="/var/log/prog2.log")
}
E.g. if your program is called /usr/bin/foobar and you've created symbolic links /usr/bin/prog1 and /usr/bin/prog2 both pointing at /usr/bin/foobar, the above configuration file example will then direct messages from processes started as "prog1" and "prog2" to different log files respectively. This example will not fiddle with anything else, so all those messages are still going to general log files, unless you filter them out explicitly.
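The symlinks from that example could be created like this (paths taken from the example above):
ln -s /usr/bin/foobar /usr/bin/prog1
ln -s /usr/bin/foobar /usr/bin/prog2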
This manual page helped me: http://www.freebsd.org/cgi/man.cgi?query=syslog.conf&sektion=5. The following seems to work:
# process A: log only error and above
!A
*.err /var/log/messages
# process B: log only critical and above
!B
*.crit /var/log/messages
# all processes other than A and B: log only info and above
!-A,B
*.info /var/log/messages

Can syslog Performance Be Improved?

We have an application on Linux that used the syslog mechanism. After a week spent trying to figure out why this application was running slower than expected, we discovered that if we eliminated syslog, and just wrote directly to a log file, performance improved dramatically.
I understand why syslog is slower than direct file writes. But I was wondering: Are there ways to configure syslog to optimize its performance?
You can configure syslogd (and rsyslog at least) not to sync the log files after a log message by prepending a "-" to the log file path in the configuration file. This speeds up performance at the expense of the danger that log messages could be lost in a crash.
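For example (the facility and path are only illustrative), the leading "-" tells syslogd/rsyslog not to sync after each write to that file:
mail.*    -/var/log/maillog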
There are several options to improve syslog performance:
Optimizing out calls with a macro
#include <syslog.h>

/* Bit mask of the priorities we keep; LOG_UPTO() covers everything up to WARNING. */
int LogMask = LOG_UPTO(LOG_WARNING);

/* A function-like macro is not expanded recursively, so the inner call reaches libc's syslog(). */
#define syslog(a, ...) do { if (LOG_MASK(a) & LogMask) syslog((a), __VA_ARGS__); } while (0)

int main(int argc, char **argv)
{
    LogMask = LOG_UPTO(LOG_WARNING);  /* setlogmask() returns the previous mask, so set LogMask directly */
    setlogmask(LogMask);
    ...
}
An advantage of using a macro to filter syslog calls is that the entire call is reduced to a conditional jump on a global variable - very helpful if you happen to have DEBUG calls which pass large datasets through other functions.
setlogmask()
setlogmask(LOG_UPTO(LOG_LEVEL))
setlogmask() will optimize the call by not sending the message to /dev/log, but the program will
still evaluate the arguments (including any function calls) passed to syslog().
filtering with syslog.conf
*.err /var/log/messages
"check out the man page for syslog.conf for details."
configure syslog to do asynchronous or buffered logging
metalog used to buffer log output and flush it in blocks; stock syslogd and syslog-ng
do not do this as far as I know.
Before embarking in new daemon writing you can check if syslog-ng is faster (or can be configured to be faster) than plain old syslog.
One trick you can use if you control the source to the logging application is to mask out the log level you want in the app itself, instead of in syslog.conf. I did this years ago with an app that generated a huge, huge, huge amount of debug logs. Rather than remove the calls from the production code, we just masked so that debug level calls never got sent to the daemon. I actually found the code, it's Perl but it's just a front to the setlogmask(3) call.
use Sys::Syslog;
# Start system logging
# setlogmask controls what levels we're going to let get through. If we mask
# them off here, then the syslog daemon doesn't need to be concerned by them
# 1 = emerg
# 2 = alert
# 4 = crit
# 8 = err
# 16 = warning
# 32 = notice
# 64 = info
# 128 = debug
Sys::Syslog::setlogsock('unix');
openlog($myname,'pid,cons,nowait','mail');
setlogmask(127); # allow everything but debug
#setlogmask(255); # everything
syslog('debug',"syslog opened");
Not sure why I used decimal instead of a bitmask... shrug
Write your own syslog implementation. :-P
This can be accomplished in two ways.
Write your own LD_PRELOAD hook to override the syslog functions, and make them output to stderr instead. I actually wrote a post about this many years ago: http://marc.info/?m=97175526803720 :-P
Write your own syslog daemon. It's just a simple matter of grabbing datagrams out of /dev/log! :-P
Okay, okay, so these are both facetious answers. Have you profiled syslogd to see where it's choking up most?
You may configure syslogd's level (or facility) to log asynchronously by putting a minus before the path to the logfile (i.e.: user.* [tab] -/var/log/user.log).
Cheers.
The syslog-async() implementation may help, at the risk of lost log lines / bounded delays at other times.
http://thekelleys.org.uk/syslog-async/
Note: 'asynchronous' here refers to queueing log events within your application, and not the asynchronous syslogd output file configuration option that other answers refer to.
