Linux - Disabling buffered I/O to a file in child processes

In my application I am creating a bunch of child processes. After fork() I open a per-process file, set the stdout/stderr of the created process to point to that file and then exec the intended program.
Is there an option for the parent process to set things up in such a way that when the child process does a printf, it gets flushed immediately to the output file without having to call flush()? Or is there an API that can be called from the child process itself (before exec) to disable buffered I/O?

The problem here is that printf is buffered. The underlying file descriptors are not buffered in that way (they are buffered in the kernel, but the other end can read from the same kernel buffer). You can change the buffering using setvbuf as mentioned in a comment which should have been an answer.
setvbuf(stdout, NULL, _IONBF, 0);
You do not need to do this for stdin or stderr.
You can't do this from the parent process. This is because the buffers are created by the child process. The parent process can only manipulate the underlying file descriptors (which are in the kernel), not stdout (which is part of the C library).
P.S. You mean fflush, not flush.
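For a program you control, the usual place for that call is at the top of main() in the program that gets exec'ed (a minimal sketch; note that a setvbuf done between fork and exec would not survive the exec, since stdio is re-initialized in the new program image):

#include <stdio.h>

int main(void)
{
    /* Make stdout unbuffered so every printf reaches the log file
       immediately; stderr is already unbuffered by default. */
    setvbuf(stdout, NULL, _IONBF, 0);

    printf("this line is flushed to the per-process file right away\n");
    return 0;
}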

Related

Most efficient way to save and later send output from many child processes

I want to do the following on Linux:
1. Spawn a child process, run it to completion, and save its stdout.
2. Later, write that saved stdout to a file.
The issue is that I want to do step 1 a few thousand times with different processes in a thread pool before doing step 2.
What's the most efficient way of doing this?
The normal way of doing this would be to have a pipe that the child process writes to, and then call sendfile() to send it to the output file (saving the copy to/from userspace). But this won't work for a few reasons. First of all, it would require me to have thousands of fds open at a time, which isn't supported in all linux configurations. Secondly, it would cause the child processes to block when their pipes fill up, and I want them to run to completion.
I considered using memfd_create to create the stdout fd for the child process. That solves the pipe-filling issue, but not the fd limit one. vmsplice looked promising: I could splice from a pipe to user memory, but according to the man page:
vmsplice() really supports true splicing only from user memory to a pipe. In the opposite direction, it actually just copies the data to user space.
Is there a way of doing this without copying to/from userspace in the parent process, and without having a high number of fds open at once?
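For illustration, here is a rough sketch of the memfd_create variant mentioned above (it avoids the pipe-filling problem, though not the fd-limit one); run_child and save_output are hypothetical helper names and error handling is omitted. The later copy uses sendfile(), which since Linux 2.6.33 accepts a regular file as the output fd and requires an mmap-able input fd, which a memfd is:

#define _GNU_SOURCE
#include <fcntl.h>
#include <sys/mman.h>      /* memfd_create (glibc 2.27+) */
#include <sys/sendfile.h>
#include <sys/wait.h>
#include <unistd.h>

/* Run argv with its stdout captured in an anonymous in-memory file. */
static int run_child(char *const argv[])
{
    int memfd = memfd_create("child-stdout", 0);

    pid_t pid = fork();
    if (pid == 0) {
        dup2(memfd, STDOUT_FILENO);   /* child writes into the memfd */
        close(memfd);
        execvp(argv[0], argv);
        _exit(127);
    }
    waitpid(pid, NULL, 0);
    return memfd;                     /* captured output, no pipe to drain */
}

/* Later: copy the captured output to out_path without a userspace copy
   in this process. */
static void save_output(int memfd, const char *out_path)
{
    int out = open(out_path, O_WRONLY | O_CREAT | O_TRUNC, 0644);
    off_t size = lseek(memfd, 0, SEEK_END);
    off_t off = 0;

    while (off < size)
        sendfile(out, memfd, &off, size - off);

    close(out);
    close(memfd);
}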

What can I assume about pthread_create and file descriptors?

I just debugged a program that did roughly:
pthread_create(...);
close(0);
int t = open("/named_pipe", O_RDONLY);
assert(t == 0);
Occasionally it fails, as pthread_create actually briefly opens file descriptors on the new thread – specifically /sys/devices/system/cpu/online – which, if you're unlucky, can happen between the close and the open above, making t something other than 0.
What's the safest way to do this? What, if anything, is guaranteed about pthread_create regarding file descriptors? Am I guaranteed that if there are 3 file descriptors open before I call pthread_create, then there will also be 3 open when it has returned and control has been passed to my function on the new thread?
In multi-threaded programs, you need to use dup2 or dup3 to replace file descriptors. The old trick of immediate reuse after close no longer works because other threads can create file descriptors at any time. Such file descriptors can even be created (and closed) implicitly by glibc, because many kernel interfaces use file descriptors.
dup2 is the standard interface. Linux also has dup3, with which you can atomically create the file descriptor with the O_CLOEXEC flag set. Otherwise, there would still be a race condition, and the descriptor could leak to a subprocess if the process ever forks and executes a new program.
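Applied to the snippet above, a minimal sketch (keeping the asker's placeholder path) looks like this; fd 0 is never closed first, so a descriptor briefly opened by another thread cannot grab slot 0:

#include <fcntl.h>
#include <unistd.h>

/* Point stdin at the named pipe without relying on fd-number reuse. */
static void stdin_from_pipe(const char *path)
{
    int t = open(path, O_RDONLY);   /* lands on whatever descriptor is free */
    if (t != 0) {
        dup2(t, 0);                 /* atomically replaces fd 0 with the pipe */
        close(t);                   /* drop the temporary descriptor */
    }
    /* dup3(t, 0, O_CLOEXEC) would additionally set close-on-exec on the
       new descriptor in the same step (Linux-specific). */
}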

Sending signalfd to another process

Process A sends a signalfd to process B. What will happen when B attempts to read()? If B adds the signalfd to an epoll, when will epoll_wait return?
There is a clue in the man page:
fork(2) semantics
After a fork(2), the child inherits a copy of the signalfd file descriptor. A read(2) from the file descriptor in the child will return information about signals queued to the child.
signalfds transferred via unix socket should behave the same as those inherited by fork(). Basically, it's irrelevant which process created the signalfd; read()ing from it always returns signals queued to the process that calls read().
There is a weird interaction with epoll, however: Since the epoll event queue is managed outside the context of any particular process, it decides the readiness of the signalfd based on the process which originally called epoll_ctl() to register interest in the signalfd. So if you arrange to watch a signalfd with an epoll FD, and then send both FDs to another process, the receiving process will see inconsistent results: epoll will signal readiness only when the sending process has a signal, but signalfd will return signals for the receiving process.
This situation is particularly easy to get into using fork(). For example, if you initialize an event loop library that uses epoll and signalfd, then call fork() (e.g. to daemonize the process), then try to use the library in the child process, you may find you cannot receive signals. (I spent all day yesterday trying to debug such a problem.)
This is inconsistent, or at least an under-documented corner case. Read signal(7) carefully.
A process A could send a signal (not a signalfd) using kill(2) or killpg(2) to a process B.
The process B is handling a signal (and there is some default behavior for handling certain signals). It could install (in a POSIX-standardized way) a signal handler using the old signal(2) or the newer sigaction(2), or it could ask (in a Linux-specific way), using signalfd(2), to get some data on a file descriptor.
So signalfd gives on success a fresh file descriptor, like open or socket do.
Read the signalfd(2) documentation; it explains what happens on B's side when it reads (the kernel delivers a struct signalfd_siginfo, I imagine from the point of view of the process getting the signal, not of the process reading the file descriptor; see the kernel's source file fs/signalfd.c) or waits with poll or epoll on the file descriptor given by signalfd; the polling will succeed when a signal has been received by B.
A successful signalfd call just gives you an open file descriptor (like the file descriptors open, socket, accept, and pipe give you), and you normally won't share that file descriptor with unrelated processes.
I won't make any supposition on what happens if you dare to send that file descriptor using sendmsg(2) on a unix(7) socket with SCM_RIGHTS to some other process. I guess it would be similar to pipe(7)s or fifo(7)s or netlink(7)s. But I certainly won't do that: signalfd is Linux specific, and you are in an undocumented corner-case situation. Read the kernel source code to understand what is happening, or ask on kernelnewbies. And don't expect too much consistency between future kernels and present ones on that undocumented aspect ...
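For reference, the ordinary single-process use of signalfd that both answers build on looks roughly like this (a minimal sketch): block the signal, create the descriptor, then read struct signalfd_siginfo records for signals sent to the reading process.

#include <signal.h>
#include <stdio.h>
#include <sys/signalfd.h>
#include <unistd.h>

int main(void)
{
    sigset_t mask;
    sigemptyset(&mask);
    sigaddset(&mask, SIGINT);

    /* The signal must be blocked, or it will be delivered the default
       way instead of being queued for the signalfd. */
    sigprocmask(SIG_BLOCK, &mask, NULL);

    int sfd = signalfd(-1, &mask, 0);

    struct signalfd_siginfo si;
    read(sfd, &si, sizeof si);   /* blocks until SIGINT is pending for this process */
    printf("got signal %u from pid %u\n", si.ssi_signo, si.ssi_pid);

    close(sfd);
    return 0;
}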

Call function in running Linux process using gdb?

I have a running Linux process that is stuck on poll(). It has some data in a buffer, but this buffer is not yet written to disk. Ordinarily I'd kill the process, which would cause it to flush the buffer and exit.
However, in this case the file it's writing to has been deleted from the file system, so I need the process to write the buffer before it exits, while the inode is still reachable via /proc/<pid>/fd/.
Is it possible to "kick" the process out of the poll() call and single step it until it has flushed the buffer to disk using GDB?
(For the curious, the source code is here: http://sourcecodebrowser.com/alsa-utils/1.0.15/arecordmidi_8c_source.html)
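One approach worth trying (a sketch only, not tested against arecordmidi specifically): attach gdb to the process, which stops it inside poll(), then use gdb's call command to run a flushing function in the process's context. If the pending data is sitting in a stdio buffer, fflush(NULL) pushes every stdio stream out to the kernel, after which the data can be copied out through the still-open descriptor under /proc/<pid>/fd/:

gdb -p <pid>
(gdb) call (int) fflush(0)
(gdb) detach
(gdb) quit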

What happens when a process is forked?

I've read about fork and from what I understand, the process is cloned but which process? The script itself or the process that launched the script?
For example:
I'm running rTorrent on my machine and when a torrent completes, I have a script run against it. This script fetches data from the web so it takes a few seconds to complete. During this time, my rtorrent process is frozen. So I made the script fork using the following
my $pid = fork();
if ($pid == 0) { blah blah blah; exit 0; }
If I run this script from the CLI, it comes back to the shell within a second while it runs in the background, exactly as I intended. However, when I run it from rTorrent, it seems to be even slower than before. So what exactly was forked? Did the rtorrent process clone itself and my script ran in that, or did my script clone itself? I hope this makes sense.
The fork() function returns TWICE! Once in the parent process, and once in the child process. In general, both processes are IDENTICAL in every way, as if EACH one had just returned from fork(). The only difference is that in one, the return value from fork() is 0, and in the other it is non-zero (the PID of the child process).
So whatever process was running your Perl script (if it is an embedded Perl interpreter inside rTorrent then rTorrent would be the process) would be duplicated at exactly the point that the fork() happened.
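In C terms, the same thing looks like this (a minimal sketch, separate from the Perl snippet above): both branches execute, one in each of the two now-identical processes.

#include <stdio.h>
#include <sys/wait.h>
#include <unistd.h>

int main(void)
{
    pid_t pid = fork();   /* the whole calling process is duplicated here */

    if (pid == 0) {
        /* child: fork() returned 0 */
        printf("child, pid %d\n", (int)getpid());
    } else {
        /* parent: fork() returned the child's pid */
        printf("parent of child %d\n", (int)pid);
        wait(NULL);       /* reap the child */
    }
    return 0;
}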
I believe I found the problem by looking through rTorrent's source. For some processes, it will read all of the output sent to stdout before continuing. If this is happening to your process, rTorrent will block until the stdout pipe is closed. Because you're forking, your child process shares the same stdout as the parent. Your parent process will exit, but the pipe remains open (because your child process is still running). If you did an strace of rTorrent, I'd bet that it'd be blocked on this read() call while executing your command.
Try closing/redirecting stdout in your perl script before the fork().
The entire process containing the interpreter forks. Fortunately memory is copy-on-write so it doesn't need to copy all the process memory in order to fork. However, things such as file descriptors remain open. This allows child processes to handle them, but may cause issues if they aren't closed appropriately. In general, fork() should not be used in an embedded interpreter except under extreme duress.
To answer the nominal question, since you commented that the accepted answer fails to do so, fork affects the process in which it is called. In your example of rTorrent spawning a Perl process which then calls fork, it is the Perl process which is duplicated, since it was the Perl process which called fork.
In the general case, there is no way for a process to fork any process other than itself. If it were possible to tell another arbitrary process to go fork itself, that would open up no end of security and performance issues.
My advice would be "don't do that".
If the Perl interpreter is embedded within the rtorrent process, you've almost certainly forked an entire rtorrent process, the effects of which are probably ill-defined at best. It's generally a bad idea to play with process-level stuff in an embedded interpreter regardless of language.
There's an excellent chance that some sort of lock is not being properly released, or that threads within the processes are proceeding in unintended and possibly competing ways.
When we create a process using fork, the child process gets a copy of the parent's address space, so the child can use that address space too. It can also access the files opened by the parent. The parent has some control over the child, and to get the child's exit status it can use wait.
