Opening syslog files safely - linux

I have a syslog server running syslog-ng (soon to be running rsyslog) on RHEL 6 with almost 1,000 hosts logging to it. I want to write a script (probably in Ruby) to open each file read-only, one at a time, pull some data from it and close it. Will this mess up syslog or cause any other issues? What other pitfalls might I need to be aware of?
My main worry is syslog trying to write data to a file that I have open, even though I may only have it open for a very short time (maybe less than a second).
possible pseudocode (shell sketch):
for file in /path/to/logs/*; do
  # grep opens each file read-only, reads it and closes it
  grep "search string" "$file"
done

If the daemon only appends to the file, I think there's no risk. It's deleting/truncating I would worry about. I think it depends on the configuration of your daemon (not so much on which daemon is used), but I would assume most configs don't actually remove data.
Deleting is typically done with logrotate and other tools. I would look at these tools' configs (e.g. make them run at a different time than the script you're mentioning).
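For reference, a quick way to see when and how rotation happens on a RHEL-style box (the exact paths may differ on your system):
cat /etc/cron.daily/logrotate   # the cron job that normally drives logrotate
less /etc/logrotate.conf        # global rotation settings
ls /etc/logrotate.d/            # per-service snippets (syslog, httpd, ...)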

Related

Will an I/O redirection operation lock the file?

I have a growing nginx log file, about 20 GB already, and I wish to rotate it.
1. I mv the old log file to a new log file.
2. I do > old_log_file.log to truncate the old log file, which takes about 2~3 seconds.
Is there a lock (a write lock?) on the old log file while I am doing the truncating (those 2~3 seconds)?
During that 2~3 second period, will nginx return 502 while waiting to append logs to the old log file until the lock is released?
Thank you for explaining.
On Linux, there are (almost) no mandatory file locks (more precisely, there used to be a mandatory locking feature in the kernel, but it is deprecated and you really should avoid using it). File locking happens with flock(2) or lockf(3); it is advisory and has to be explicit (e.g. with the flock(1) command, or some program calling flock or lockf).
So any locking related to files is in practice a convention between all the software using that file (and mv(1) or the redirection done by your shell don't use file locking).
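As a tiny illustration that advisory locking is purely cooperative (the lock file and log names here are just examples), every writer has to opt in explicitly:
flock /var/lock/shared-log.lock -c 'echo "one record" >> shared.log'
# a second writer that skips flock is not blocked at all:
echo "another record" >> shared.log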
Remember that a file on Linux is mostly an i-node (see inode(7)) which can have zero, one or several file paths (see path_resolution(7), and be aware of link(2), rename(2), unlink(2)) and is used through some file descriptor. Read ALP (and perhaps Operating Systems: Three Easy Pieces) for more.
No file locking happens in the scenario of your question (and the i-nodes and file descriptors involved are independent).
Consider using logrotate(8).
Some software provides a way to reload its configuration and re-open its log files. You should read the documentation of your nginx.
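For nginx specifically, the usual rotation pattern (paths are examples; the pid file location may differ on your system) is to rename the file and then signal nginx to reopen its logs, so nothing is ever truncated while being written:
mv /var/log/nginx/access.log /var/log/nginx/access.log.1
kill -USR1 "$(cat /var/run/nginx.pid)"   # ask nginx to reopen its log files
# until the signal is handled, nginx keeps appending to the renamed file,
# so no request is blocked and no lock is involved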
It depends on the application whether it locks the file. The application that generates this log file must have an option to clear the log file. One example: in an editor like vim, a file can be externally modified while it is still open in the editor.

How to check if a file is opened in Linux?

The thing is, I want to track whether a user tries to open a file on a shared account. I'm looking for any record/technique that helps me know whether the file concerned is open, at run time.
I want to create a script which monitors if the file is open, and if it is, I want it to send an alert to a particular email address. The file I'm thinking of is a regular file.
I tried using lsof | grep filename for checking if a file is open in gedit, but the command doesn't return anything.
Actually, I'm trying this for a pet project, and thus the question.
The command lsof -t filename shows the IDs of all processes that have the particular file opened. lsof -t filename | wc -w gives you the number of processes currently accessing the file.
The fact that a file has been read into an editor like gedit does not mean that the file is still open. The editor most likely opens the file, reads its contents and then closes the file. After you have edited the file you have the choice to overwrite the existing file or save as another file.
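If polling with lsof is acceptable, a minimal watch-and-alert sketch could look like this (the path, interval and mail setup are assumptions; a working mail command must be configured on the host):
# naive polling loop: checks every 10 seconds, alerts once, then exits
while sleep 10; do
  if lsof -t /path/to/watched/file > /dev/null 2>&1; then
    echo "/path/to/watched/file is currently open" | mail -s "file opened" someone@example.com
    break
  fi
done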
You could (in addition of other answers) use the Linux-specific inotify(7) facilities.
I understand that you want to track one (or a few) particular given files, with a fixed file path (actually a given i-node). E.g. you would want to track when /var/run/foobar is accessed or modified, and do something when that happens.
In particular, you might want to install and use incrond(8) and configure it through incrontab(5).
If you want to run a script when some given file (on a native local filesystem, e.g. Ext4, Btrfs, ... but not NFS) is accessed or modified, use inotify; incrond is designed exactly for that purpose.
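A possible incrontab(5) entry, edited with incrontab -e (the watched path matches the example above; the handler script is hypothetical):
# path          events                    command ($@ = watched path, $% = event name)
/var/run/foobar IN_ACCESS,IN_CLOSE_WRITE  /usr/local/bin/notify-me.sh $@ $%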
PS. AFAIK, inotify doesn't work well for remote network files, e.g. NFS filesystems (in particular when another NFS client machine is modifying a file).
If the files you are fond of are somehow source files, you might be interested in revision control systems (like git) or build systems (like GNU make); in a certain way these tools are related to file modification.
You could also have the particular file sit in some FUSE filesystem, and write your own FUSE daemon.
If you can restrict and modify the programs accessing the file, you might want to use advisory locking, e.g. flock(2), lockf(3).
Perhaps the data sitting in the file should be in some database (e.g. SQLite, or a real DBMS like PostgreSQL or MongoDB). ACID properties are important.
Notice that the filesystem and the mount options may matter a lot.
You might want to use the stat(1) command.
It is difficult to help more without understanding the real use case and the motivation. You should avoid the XY problem.
Probably the workflow is wrong (having a shared file that several users are able to write), and you should approach the overall issue in some other way. For a pet project I would at least recommend using some advisory lock, and accessing & modifying the information only through your own programs (perhaps setuid) using flock (this excludes ordinary editors like gedit or commands like cat ...). However, your implicit use case seems well suited to a DBMS approach (a database does not have to contain a lot of data; it might be tiny), or some locked indexed file such as the GDBM library handles.
Remember that on POSIX systems and Linux, several processes can access (and even modify) the same file simultaneously (unless you use some locking or synchronization).
Reading the Advanced Linux Programming book (freely available) would give you a broader picture (but it does not mention inotify, which appeared after the book was written).
You can use ls -lrt; it lists files sorted by modification time, with the most recently written last, so you can see whether the file was recently written to. Make sure that you are in the right directory.

Monitor STDERR of all processes running on my linux machine

I would like to monitor the STDERR channel of all the processes running on my Linux machine. Monitoring should preferably be done in real time (i.e. while the process is running), but post-processing will also do. It should be done without requiring root permissions, and without breaking any security features.
I have done a good bit of searching, and found some utilities such as reptyr and screenify, and a few explanations on how to do this with gdb (for example here). However, all of these seem to be doing both too much and too little. Too much in the sense that they take full control of the process's stream handles (i.e. closing the original one and opening a new one). Too little in the sense that they have serious limitations, such as the fact that they require disabling security features like ptrace_scope.
Any advice would be highly appreciated!
Maybe this question would get more answers on SU. The only thing I could think of would be to monitor the files and devices already opened as STDERR. Of course, this would not work if STDERR is redirected to /dev/null.
You can get all the file descriptors for STDERR with:
ls -l /proc/[0-9]*/fd/2
If you own the process, accessing its STDERR file descriptor or output file should be possible in the language of your choice without being root.
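For a single process you own, a small sketch along those lines (the PID is a placeholder; this only helps when fd 2 points at a regular file rather than a pipe or /dev/null):
pid=12345
readlink /proc/"$pid"/fd/2                 # where this process's STDERR points
tail -f "$(readlink /proc/"$pid"/fd/2)"    # follow it if it is a regular file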

Clearing Large Apache Domain Logs

I am having an issue where Apache logs are growing out of proportion on several servers (Linux CentOS 5)... I will eventually disable logging completely but for now I need a quick fix to reclaim the hard disk space.
I have tried using the echo " " > /path/to/log.log or the * > /path/to/log.log but they take too long and almost crash the server as the logs are as large as 100GB
Deleting the files works fast, but my question is: will it cause a problem when I restart Apache? My servers are live and full of users, so I can't crash them.
Your help is appreciated.
Use the truncate command
truncate -s 0 /path/to/log.log
In the longer term you should use logrotate to keep the logs from getting out of hand.
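A minimal /etc/logrotate.d/httpd sketch for a CentOS 5-style layout (paths, rotation counts and the reload command are examples to adapt):
/var/log/httpd/*log {
    weekly
    rotate 4
    compress
    missingok
    sharedscripts
    postrotate
        /sbin/service httpd reload > /dev/null 2>&1 || true
    endscript
}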
Try this:
cat /dev/null > /path/to/log.log
mv /path/to/log.log /path/to/log.log.1
Do this for your access and error logs and, if you are really doing it on prod, your rewrite logs.
This doesn't affect Apache on *nix, since the file stays open. Then restart Apache. Yes, I know I said restart, but this usually takes a second or so, so I doubt that anyone will notice -- or they'll blame it on the network. The restarted Apache will be running with a new set of log files.
In terms of your current logs, IMO you need to keep at least the last 3 months of error logs and 1 month of access logs, but look at your volumetrics to decide your rough per-week volumes for error and access logs. Don't truncate the old files. If necessary, do a nice tail piped to gzip -c of these to archives. If you want to split them up, use a loop doing tail|head|gzip with the --bytes=nnG option. OK, you'll split across the odd line, but that's better than deleting the lot as you suggest.
Of course, you could just delete the lot as you and others propose, but what are you going to do if you've realised that the site has been hacked recently? "Sorry: too late; I've deleted the evidence!"
Then for goodness sake implement a logrotate regime.
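For the tail-to-gzip archiving mentioned above, a rough sketch (the size and paths are examples; adjust to your volumetrics):
# keep roughly the last 5 GB of the oversized access log as a compressed archive
tail --bytes=5G /var/log/httpd/access_log | gzip -c > /var/log/archive/access_log.tail.gz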

Redirecting multiple stdouts to single file

I have a program running on multiple machines with NFS and I'd like to log all their outputs into a single file. Can I just run ./my_program >> filename on every machine or is there an issue with concurrency I should be aware of? Since I'm only appending, I don't think there would be a problem, but I'm just trying to make sure.
That could work, but yes, you will have concurrency issues with it, and the log file will be basically indecipherable.
What I would recommend is that there be a log file for each machine, and then on some periodic basis (say nightly) concatenate the files together, with the machine name as the file name:
for i in /path/to/logfiles/*; do   # unquoted glob so it actually expands
  echo "Machine: $(basename "$i")";
  cat "$i";
done > filename.log
That should give you some ideas, I think.
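On each machine the redirect would then simply target its own per-host file, so no two writers ever append to the same NFS file (the directory is an example):
./my_program >> "/path/to/logfiles/$(hostname).log"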
The NFS protocol does not support atomic append writes, so append writes are never atomic on NFS for any platform. Files WILL end up corrupt if you try.
When appending to files from multiple threads or processes, the writes to that file are atomic on the condition that the file was opened in append mode, the string written to it does not exceed the filesystem block size, and the filesystem is local -- which with NFS is not the case.
There is a workaround, although I would not know how to do it from a shell script. The technique is called close-to-open cache consistency.
