Logging to a non-blocking named pipe? - linux

I have a question, and I couldn't find help anywhere on Stack Overflow or the web.
I have a program (the Celery distributed task queue) with multiple instances (workers), each writing to its own logfile (celery_worker1.log, celery_worker2.log).
The important errors are stored in a database, but I'd like to tail these logs from time to time when running new operations, to make sure everything is OK (the log level is lower).
My problem: these logs take up a lot of disk space.
What I would like to do: be able to "watch" the logs (tail -f) only when I need to, without them taking up a lot of space.
My ideas until now:
outputting logs to stdout, not to a file: not possible here, since I have many workers outputting to different files, but I want to tail them all at once (tail -f celery_worker*.log)
using logrotate: it is an "OK" solution for me. I don't want this to be a daily task and would rather not set up a one-minute crontab for it; moreover, the server is not mine, so it would mean some work on the sysadmin side
using named pipes: it looked good at first sight, but I didn't know that named pipes (Linux FIFOs) were blocking. Hence, when I don't tail -f ALL of the pipes at the same time, or when I just quit my tail, the write operations from the logger are blocked.
Is there a way to have a non-blocking named pipe, which would just write to stdout when tailed, and to /dev/null when not?
Or are there technical difficulties with such a type of pipe? If so, what are they?
Thank you for your answers!

Have each worker log to stdout, but connect each stdout to a utility that automatically spools and rotates logs based on size or time. multilog and svlogd are examples of such tools. With those programs, you'd merely tail the "current" log file.
You're right that logrotate is not quite the right solution for the problem you have.
Named pipes won't work as you want. At best, your writers could fill up their pipes and then have to discard subsequent logs, which is the inverse of the behavior you want.
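For example, with runit's svlogd (or daemontools' multilog) the setup could look roughly like this. This is only a sketch: the celery command line, the log directory and the default rotation settings are assumptions, not taken from your setup.
# Sketch: send one worker's output to svlogd, which rotates ./log/worker1/current by size.
mkdir -p log/worker1
celery -A proj worker -n worker1 --loglevel=INFO 2>&1 | svlogd -tt ./log/worker1 &
# Watch all workers only when you need to:
tail -f log/worker*/current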

You could try a shared memory device (see man shm_overview), or perhaps several of them. You need to organise them as circular buffers so they store the last N KB of your log, and whenever you read one with a reader it outputs everything to your console. This approach is used by BusyBox's syslogd/logread suite (see logread.c).
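As an illustration, the BusyBox flavour of that approach looks roughly like this (a sketch; the buffer size and the test message are made up):
syslogd -C128                      # BusyBox syslogd: keep the last 128 KiB in a circular in-memory buffer
logger "worker1: task finished"    # anything that logs via syslog feeds the buffer
logread -f                         # follow the buffer like tail -f; unread old entries are simply overwritten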

Related

How to attach the same console as output for one process and input for another process?

I am trying to use the suckless ii IRC client. I can listen to a channel by running tail -f on its out file. However, is it also possible for me to provide input from the same console by starting an echo or cat command?
If I background the process, it actually displays the output in this console, but that doesn't seem to be the right way. Logically, I think I need to get the fd of the console (but how do I do that?), then force the tail output to that fd and probably background it, and then use the present bash to start a cat > in.
Is it actually fine to do this, or am I creating a lot of process overhead for a simple task? In other words, piping a lot of stuff is nice, but doesn't it create a lot of overhead that would ideally be avoided by doing everything in a single process, if you are going to repeat the task a lot?
However, is it also possible for me to provide input from the same console by starting an echo or cat command?
Simply NO! cat writes the current content. cat has no idea that the content will grow later. echo writes variables and results from the given command line. echo itself is not made for writing the content of files.
If I background the process, it actually displays the output in this console, but that doesn't seem to be the right way.
If you do not redirect the output, the output goes to the console. That is the way it is designed :-)
Logically, I think I need to get the fd of the console (but how do I do that?), then force the tail output to that fd and probably background it.
As I understand it, that is the opposite direction. If you want to write to the stdin of a process, you can simply use a pipe for that. The (contrived) example below shows that cat writes to the pipe and the next command reads from the pipe. You can extend this to any other pipe read/write scenario. See the links given below.
Example:
cat main.cpp | cat /dev/stdin
cat main.cpp | tail -f
The last one will not exit, because it waits for the pipe to receive more content, which never happens.
Is it actually fine to do this, or am I creating a lot of process overhead for a simple task? In other words, piping a lot of stuff is nice, but doesn't it create a lot of overhead that would ideally be avoided by doing everything in a single process, if you are going to repeat the task a lot?
I have no idea how time-critical your job is, but I believe the overhead is quite low. Doing the same thing in a self-written program would not necessarily be faster. If everything is done in a single process and no file system access is required, it will be much faster. But if you also use system calls, e.g. for file system access, I don't believe it will be much faster. You always have to pay for the work you get.
For IO redirection please read:
http://www.tldp.org/LDP/abs/html/io-redirection.html
If your scenario is more complex, you can think of named pipes instead of IO redirection. For that you can have a look at:
http://www.linuxjournal.com/content/using-named-pipes-fifos-bash
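Applied to the ii case, a minimal sketch could look like this, assuming ii's usual layout where out is a regular file and in is a named pipe; the channel directory name is made up:
cd '#somechannel'   # hypothetical channel directory created by ii
tail -f out &       # follow the channel output in the background on this console
cat > in            # lines you type are written into the FIFO and sent to the channel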

What's the right way to write to file from Node.js to avoid bottlenecking?

I'm curious what the correct methodology is to write to a log file from a process that might be called dozens (or maybe even thousands) of times simultaneously.
I have a node process which is called via http and I wish to log from it, but I don't want it to bottleneck as it attempts to open/write/close the same file from all the various simultaneous requests.
I've read that stderr might be the answer to this problem, but am curious what makes that approach any less bottlenecky. At the end of the day, if stderr is going to some central location, isn't it going to have the exact same problem?
Best practice for node (e.g. http://12factor.net/) is to write to stdout or stderr. The expectation is that the OS will handle the file management / throughput that you want, or else you can have a custom-written log collector that can do it the way you want and redirect stdout or stderr to it.
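Concretely, that can be as small as the following sketch; server.js, the log directory and the choice of svlogd as collector are assumptions:
# The app only writes log lines to stdout/stderr; rotation and file handling happen outside it.
mkdir -p ./log
node server.js 2>&1 | svlogd -tt ./log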

How to find out which processes have written to a file on Linux

Is there a way to find out which process wrote to a given file earlier? I am having a problem where multiple processes seem to be writing to a file. I know one of the processes, but I'm not sure who else is writing to the file. I am on Linux/Ubuntu. Does the OS maintain a log of which processes have written to a specified file?
Create a small monitoring process which will periodically log who is currently accessing the file.
You can write a small script using fuser. Here is a quick example (to be improved):
#!/bin/bash
# Append the PIDs currently accessing the file to a log, polling every 0.2 seconds.
log=~/file-access.log
while true
do
fuser your_file >> "$log"
sleep 0.2s
done
But you will have to be lucky: the process writing to the file has to keep it open long enough for fuser to have a chance to detect it.
No, there is nothing by default to keep track of which processes wrote to a file after the fact.
If you can reproduce the problem at will, inotify or similar can help you monitor writes to the file as they happen.
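Combining the two answers above, a sketch of such a monitor could look like this (requires inotify-tools; the file path is a placeholder, and it is still racy if the writer closes the file quickly):
#!/bin/bash
# On every write event, record a timestamp plus whoever currently has the file open.
file=/path/to/your_file
log=~/file-access.log
inotifywait -m -e modify,close_write "$file" | while read -r path events
do
{ date '+%F %T'; echo "$events"; fuser -v "$file"; } >> "$log" 2>&1
done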

Nonblocking/asynchronous fifo/named pipe in shell/filesystem?

Is there a way to create a non-blocking/asynchronous named pipe, or something similar, in the shell? So that programs could place lines in it, those lines would stay in RAM, and when some program reads a few lines from the pipe, whatever it did not read would remain in the FIFO? It is also very probable that programs will be writing to and reading from this FIFO at the same time. At first I thought maybe this could be done using files, but after searching the web for a bit it seems nothing good can come from a file being read and written at the same time. Named pipes would almost work; there are just two problems: first, they block reads/writes if there is no one at the other end; second, even if I let writes block and set two processes to write to the pipe while no one is reading, each trying to write one line, and then try head -n 1 <fifo>, I get just one line as I need, but both writing processes terminate and the second line is lost. Any suggestions?
Edit: maybe some intermediate program could be used to help with this, acting as a mediator between writers and readers?
You can use a special program for this purpose: buffer. buffer is designed to keep the writer side continuously busy so that it can stream when writing to tape drives, but you can use it for other purposes as well. Internally, buffer is a pair of processes communicating via a large circular queue held in shared memory, so your processes will work asynchronously. The process feeding buffer will block when the queue is full, and the process reading from buffer will block when the queue is empty. Example:
bzcat archive.bz2 | buffer -m 16000000 -b 100000 | processing_script | bzip2 > archive_processed.bz2
http://linux.die.net/man/1/buffer
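For the FIFO scenario in the question, a sketch might look like this; the pipe and file names are made up, and writers block only while buffer's in-memory queue (plus the kernel pipe buffer) is full:
mkfifo /tmp/logpipe
buffer -m 1000000 < /tmp/logpipe >> /tmp/collected.log &   # drain the pipe through a ~1 MB in-RAM queue
exec 3> /tmp/logpipe             # hold a writing end open so buffer never sees end-of-file
echo "line from writer 1" >&3
echo "line from writer 2" >&3
tail -f /tmp/collected.log       # attach and detach whenever you like; nothing written is lost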

How to detect that no one is writing to a file in Linux?

I am wondering, is there a simple way to tell whether another entity has a certain file open for writing? I don't have time to use iNotify continuously to wait for any current writer to finish writing. I need to do an intermittent check.
Thanks.
What exactly are you doing where you "don't have time to use iNotify continuously"? First, you should be using the IN_CLOSE_WRITE flag so that iNotify just makes one notification when the file gets closed after being written. Using it continuously makes no sense. Second, if your timing is that critical, I'm thinking writing to a file isn't your ideal solution. Do you control the first writer? Do you have to worry about anything else writing to the file after the first writer closes it?
lsof LiSts Open Files. fuser (File USER) works similarly, telling you which processes are using the file.
See: http://www.refining-linux.org/archives/23/16-Introduction-to-lsof-and-fuser/
Since you seem to want a library-style interface rather than shelling out to a system tool, see ofl-lib.c. (It's really just the ofl program with everything but the main function removed.)
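For an intermittent check built on lsof, something like the following sketch could do; the path is a placeholder, and the result is inherently racy, as noted below:
# Print PID and command of every process holding the file open for writing (fd mode w or u).
lsof /path/to/your_file 2>/dev/null | awk 'NR > 1 && $4 ~ /[0-9]+[wu]/ { print $2, $1 }'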
You can't do so easily in the general case, and even if you could, you cannot use the information in a non-racy manner (see caf's comment).
So I'd say, redesign your application so you do not need to know.
