What happens to wait() if multiple child processes terminate simultaneously? - linux

Say I create a process with multiple child processes and I call wait() in the main process. If one child terminates, its pid is returned. But what happens if a couple of child processes terminate simultaneously? The call should return with one of them, and a second call should return with the other, right? Is there a particular order in which they will return (maybe the child with the lower pid takes precedence)?

No.
SUSv4 explicitly leaves unspecified the order (if any) in which child processes are reaped by one or several wait calls. There is also no "accidental" order that you could rely on, since different Linux kernel versions behave differently. (Source: M. Kerrisk, TLPI, 26.1.1, page 542).
Somewhat related trivia:
You might wonder why you can reliably wait on several child processes that terminate concurrently at all. If you think about how signals work, you might be inclined to believe that it is perfectly possible to lose child termination signals. It is well known that signals are not queued (except for realtime signals, but SIGCHLD isn't one!). So if you go strictly by the letter of the book, several children terminating at once could clearly cause some child termination signals to be lost!
You can only be in one wait call at a time, so you can at most consume one signal synchronously as it is generated, and have a second one made pending before your next call to wait. It appears that there is no way to account for any other signals that are generated in the meantime.
Luckily, that isn't the case. Waiting on child processes demonstrably works fine. Always, 100%. But why?
The reason is simple: the kernel is "cheating". If a child process exits while you are blocked in wait, there is no question as to what's happening. The signal is immediately delivered, the status information is filled in, the parent unblocks, and the child process goes * poof *.
On the other hand, if a child is exiting and the parent isn't in a wait call (or if the parent is in a wait call reaping another child), the system converts the exiting process to a "zombie".
Now, when the parent process performs a wait, and there are any zombie processes, it won't block waiting for a signal to be generated (which may never happen if all the children have already exited!). Instead, it will just reap one of the zombies, pretending the signal was delivered just now.
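A minimal sketch of the behaviour described above, in C. The function name spawn_and_reap is made up for illustration. It forks n children that exit at once, deliberately stays out of wait() for a second so that all of them become zombies, and then reaps them one call at a time. No reaping order is assumed.

```c
#include <errno.h>
#include <stdio.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical helper: fork n children that terminate immediately,
 * let them turn into zombies, then reap them with repeated wait() calls.
 * Returns the number of children that were reaped. */
int spawn_and_reap(int n)
{
    for (int i = 0; i < n; i++) {
        pid_t pid = fork();
        if (pid < 0)
            break;              /* fork failed: reap what we have */
        if (pid == 0)
            _exit(i);           /* child: terminate right away */
    }

    /* The parent is NOT in wait() here, so every exited child
     * becomes a zombie while we sleep. */
    sleep(1);

    int reaped = 0;
    for (;;) {
        int status;
        pid_t pid = wait(&status);
        if (pid < 0) {
            if (errno == EINTR)
                continue;       /* interrupted by a signal: retry */
            break;              /* ECHILD: no children left */
        }
        printf("reaped pid %ld, exit status %d\n",
               (long)pid, WEXITSTATUS(status));
        reaped++;
    }
    return reaped;
}
```

Calling spawn_and_reap(5) reaps all five children even though at most one SIGCHLD can have been left pending; the order in which the pids are printed is not guaranteed.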

Related

Does kill(SIGSTOP) take effect by the time kill() returns?

Suppose I have a parent process and a child process (started with e.g. fork() or clone()) running on Linux. Further suppose that there is some shared memory that both the parent and the child can modify.
Within the context of the parent process, I would like to stop the child process and know that it has actually stopped, and moreover that any shared memory writes made by the child are visible to the parent (including whatever synchronization or cache flushes that may require in a multi-processor system).
This answer, which speaks of using kill(SIGSTOP) to stop a child process, contains an interesting tidbit:
When the first kill() call succeeds, you can safely assume that the child has stopped.
Is this statement actually true, and if so, can anyone expound on it, or point me to some more detailed documentation (e.g. a Linux manpage)? Otherwise, is there another mechanism that I can use to ensure that the child process is completely stopped and is not going to be doing any more writes to the shared memory?
I'm imagining something along the lines of:
the parent sends a different signal (e.g. SIGUSR1), which the child can handle
the child handles the SIGUSR1 and does something like a pthread_cond_wait() in the signal handler to safely "stop" (though still running from the kernel perspective) -- this is not fully fleshed out in my mind yet, just an idea
I'd like to avoid reinventing the wheel if there's already an established solution to this problem. Note that the child process needs to be stopped preemptively; adding some kind of active polling to the child process is not an option in this case.
If it only existed on Linux, pthread_suspend() would be perfect ...
It definitely sounds like you should be using a custom signal with a handler, and not SIGSTOP.
It's rare not to care about the state of the child at all, e.g. being fine with it having stored 32 bits out of a single non-atomic 64-bit write, or being logically caught between two dependent writes.
Even if you are, POSIX allows the OS to not make shared writes immediately visible to other processes, so the child should have a chance to call msync for portability, to ensure that writes are completely synced.
The POSIX documentation on Signal Concepts strongly suggests, but does not explicitly say, that the targeted process will be STOPped by the time kill() returns:
A signal is said to be "generated" for (or sent to) a process or thread when the event that causes the signal first occurs... Examples of such events include ... invocations of the kill() and sigqueue() functions.
The documentation is at pains to distinguish signal generation from delivery (when the signal action takes effect) or acceptance. Unfortunately, it sometimes mentions actions taken in response to a stop signal upon generation, and sometimes upon delivery. Given that something must happen upon generation per se, I'd agree that the target process must be STOPped by the time your call returns.
However, at the cost of another syscall, you can be sure. Since you have a parent/child relationship in your design, you can waitpid()/WUNTRACED to receive notification that your child process has, indeed, STOPped.
Edit
See the other answer from that other guy [sic] for reasons why you might not want to do this.
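For what it's worth, the waitpid()/WUNTRACED approach could look roughly like the sketch below. The names stop_child_and_confirm and demo_stop_and_confirm are invented for this example. It only confirms that the kernel reports the child as stopped; it says nothing about whether the child was interrupted in the middle of a logical update to the shared memory.

```c
#include <errno.h>
#include <signal.h>
#include <stddef.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical helper: send SIGSTOP and block until the kernel reports
 * the child as stopped. Returns 0 once it is stopped, -1 on error or if
 * the child terminated instead. */
int stop_child_and_confirm(pid_t child)
{
    if (kill(child, SIGSTOP) < 0)
        return -1;

    int status;
    pid_t r;
    do {
        r = waitpid(child, &status, WUNTRACED);  /* also report stopped children */
    } while (r < 0 && errno == EINTR);

    if (r != child)
        return -1;
    return WIFSTOPPED(status) ? 0 : -1;
}

/* Hypothetical demo: the child just idles; the parent stops it,
 * checks the result, then cleans up. */
int demo_stop_and_confirm(void)
{
    pid_t child = fork();
    if (child < 0)
        return -1;
    if (child == 0) {
        for (;;)
            pause();            /* child: idle until signalled */
    }

    int rc = stop_child_and_confirm(child);

    kill(child, SIGKILL);       /* SIGKILL also works on a stopped process */
    waitpid(child, NULL, 0);    /* reap it so no zombie is left behind */
    return rc;
}
```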

exit and wait function in UNIX (or LINUX)

I’m writing a program that simulates an unix-based operating system and I have some questions:
From unix.org
The wait() function will suspend execution of the calling thread
until status information for one of its terminated child processes is available, or until delivery of a signal whose action is either to execute a signal-catching function or to terminate the process
Let's imagine there is process A with two child processes B and C. If B and C call the exit function, and then A calls the wait function, which exit status will be retrieved? The one from B or the one from C? Which first and why?
2. When the process is in the waiting state, it doesn't execute its code until, for example, the status information for one of the terminated child processes is available, is that right?
So it can't, for example, call the fork function while waiting, is that correct?
3. Are there any restrictions on when a process can normally be killed in UNIX?
3.a. Are users authorized to kill root processes? (all of the root processes at will?)
wait() returns the PID of whatever child process exited. If two have exited, you must call wait() twice and check the returned PIDs. You shouldn't rely on the order.
Correct, the entire purpose of wait() (unlike waitpid() with the WNOHANG option) is to block. So you cannot do anything else, apart from handling signals, in the waiting process.
I'm not sure exactly what you mean here, but I suspect the answer is mostly "no."
Users cannot kill root processes (at least, not without special configuration). Users also cannot kill processes owned by other users.
It is indeterminate whether B or C will be reported first.
While the process is in wait(), it can do nothing else (in a single-threaded process).
No restrictions for the most part. There are non-interruptible system calls, but the system tries to avoid getting processes hung in them.
No; a user can kill their own processes. User root can kill other people's processes (in general); but no one else can kill root's processes.
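To illustrate the first point (call wait() twice and check the returned PIDs), here is a small sketch. The function name collect_b_and_c and the exit codes 3 and 4 are arbitrary choices for the example; the code matches each status to its child by pid and makes no assumption about which one is reported first.

```c
#include <stdio.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical example: process A forks B and C, which exit with codes
 * 3 and 4. A then calls wait() twice and identifies each result by pid.
 * Returns 0 on success and stores both exit codes, -1 on error. */
int collect_b_and_c(int *status_b, int *status_c)
{
    pid_t b = fork();
    if (b < 0)
        return -1;
    if (b == 0)
        _exit(3);               /* child B */

    pid_t c = fork();
    if (c < 0) {
        waitpid(b, NULL, 0);    /* do not leave B behind as a zombie */
        return -1;
    }
    if (c == 0)
        _exit(4);               /* child C */

    for (int i = 0; i < 2; i++) {
        int status;
        pid_t done = wait(&status);     /* B or C, in unspecified order */
        if (done < 0 || !WIFEXITED(status))
            return -1;
        if (done == b)
            *status_b = WEXITSTATUS(status);
        else if (done == c)
            *status_c = WEXITSTATUS(status);
    }
    return 0;
}
```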

Proper way to use fork() and wait()

I have just started learning about fork and wait in Linux and came across this paragraph in the wait() manual page notes:
A child that terminates, but has not been waited for becomes a "zombie". The kernel maintains a minimal set of information about the zombie process (PID, termination status, resource usage information) in order to allow the parent to later perform a wait to obtain information about the child. As long as a zombie is not removed from the system via a wait, it will consume a slot in the kernel process table, and if this table fills, it will not be possible to create further processes. If a parent process terminates, then its "zombie" children (if any) are adopted by init(8), which automatically performs a wait to remove the zombies.
A question that came to mind after reading this:
Isn't the fact that not using wait() causes a resource waste until the parent terminates, a problem that amplifies when the parent process is meant to be a long lived process in the system?
Does this mean I should always use wait() as soon as possible after using fork?
Isn't the fact that not using wait() will cause a resource waste until
the parent will terminate?
While a child process is running, no resources are wasted; it's still doing its task. The resource waste that your citation talks about arises only when a child dies but its parent hasn't reaped it yet, i.e. hasn't wait()ed on the child process.
a problem that amplifies when the parent process is meant to be a long
lived process in the system?
When your application runs for a very long time and keeps forking children, there's a chance that the system might run out of resources, either because many child processes are still running or because the parent process didn't reap the exited children. It's the job of the application to manage system resources sensibly and to reap child processes as soon as they have finished.
Does this mean I should always use wait() as soon as possible after
using fork?
There's no straight "as early" or "as late" kind of answer to this. For example, the parent process might want to carry on doing something useful while the child is still running rather than waiting (it might even be unnecessary to check the children's status periodically with WNOHANG when the parent knows the children have long tasks to finish). So in this case, waiting immediately after forking a process might not be what you want. In general, the parent should call wait() whenever it expects the child(ren) to have completed their task (or wants to know the status of the children). The responsibility lies with the programmer to code correctly and call wait() at the most appropriate time.
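One way a long-lived parent can keep doing its own work and still collect finished children is to poll with waitpid() and WNOHANG. The sketch below is only an illustration: reap_finished_children and demo_periodic_reaping are invented names, and the 10 ms sleep stands in for whatever real work the parent does between polls.

```c
#include <stddef.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <time.h>
#include <unistd.h>

/* Hypothetical helper: reap every child that has already terminated,
 * without blocking. Returns how many zombies this call collected. */
int reap_finished_children(void)
{
    int reaped = 0;
    int status;
    /* waitpid() returns 0 when children exist but none has exited yet,
     * and -1 when there are no children at all. */
    while (waitpid(-1, &status, WNOHANG) > 0)
        reaped++;
    return reaped;
}

/* Hypothetical demo: start n short-lived children, then alternate
 * between "work" and non-blocking reaping until all are collected. */
int demo_periodic_reaping(int n)
{
    const struct timespec tick = { 0, 10 * 1000 * 1000 };  /* 10 ms */
    int started = 0;

    for (int i = 0; i < n; i++) {
        pid_t pid = fork();
        if (pid < 0)
            break;
        if (pid == 0)
            _exit(0);           /* child: finish immediately */
        started++;
    }

    int reaped = 0;
    while (reaped < started) {
        nanosleep(&tick, NULL); /* placeholder for the parent's real work */
        reaped += reap_finished_children();
    }
    return reaped;
}
```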

System V msg_send interrupted by SIGKILL

I have a multi-process application that works like so...
There is a parent process. The parent process queries a database to find work, then forks children to process that work. The children communicate back to the parent via System V message queues to indicate they're done with their work. When the parent process picks up that message, it updates the database to indicate that the work is complete.
This works okay but I'm struggling with handling the parent process being killed.
What happens is the parent receives a SIGINT(from CTRL-C), and then sends SIGKILLs to each of the children. If a child is currently blocking on a Sys V message queue write when it receives that signal, the write is "interrupted" by the signal and the blocking canceled and the parent never learns that the child's work was done, and the database never gets updated.
That means that the next time I run the script, it will re-run any work that was blocking on the System V queue write.
I don't have a good idea for a solution for this yet. Ideally I would like to be able to force the queue write to remain blocking even when it receives that SIGKILL but I don't think such a thing is possible.
Well SIGKILL is, by definition, immediately fatal to the process which receives it and cannot be trapped or handled.
That is why you should only use it as a last resort, when the process does not respond to more polite requests to shut down. Your parent process should start off by sending something like SIGINT or SIGTERM to the children, and only resort to SIGKILL if they don't exit within a reasonable period of time.
Signals like SIGINT and SIGTERM may still cause the system call in the child to return, with EINTR, but you can handle that and retry the call and let it complete before exiting.
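Both halves of this advice can be sketched in C. Everything named here (msgsnd_retry, terminate_child, demo_terminate, the 10 ms poll interval) is made up for illustration and is not taken from the original poster's code. The first function is the child side: retry the queue write when a handled signal interrupts it. The second is the parent side: send SIGTERM first and fall back to SIGKILL only after a timeout.

```c
#include <errno.h>
#include <signal.h>
#include <stddef.h>
#include <sys/ipc.h>
#include <sys/msg.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <time.h>
#include <unistd.h>

/* Child side (hypothetical): repeat msgsnd() while it fails with EINTR. */
int msgsnd_retry(int qid, const void *msg, size_t size, int flags)
{
    int rc;
    do {
        rc = msgsnd(qid, msg, size, flags);
    } while (rc < 0 && errno == EINTR);
    return rc;
}

/* Parent side (hypothetical): ask politely, escalate after timeout_ms.
 * Returns 0 if the child exited after SIGTERM, 1 if SIGKILL was needed,
 * -1 on error. */
int terminate_child(pid_t child, int timeout_ms)
{
    const struct timespec tick = { 0, 10 * 1000 * 1000 };  /* 10 ms */

    if (kill(child, SIGTERM) < 0)
        return -1;

    for (int waited = 0; waited < timeout_ms; waited += 10) {
        pid_t r = waitpid(child, NULL, WNOHANG);
        if (r == child)
            return 0;           /* exited on its own terms */
        if (r < 0 && errno != EINTR)
            return -1;
        nanosleep(&tick, NULL);
    }

    if (kill(child, SIGKILL) < 0)   /* last resort */
        return -1;
    while (waitpid(child, NULL, 0) < 0) {
        if (errno != EINTR)
            return -1;
    }
    return 1;
}

/* Hypothetical demo: the child idles, optionally ignoring SIGTERM. */
int demo_terminate(int child_ignores_sigterm)
{
    void (*old)(int) = SIG_DFL;
    if (child_ignores_sigterm)
        old = signal(SIGTERM, SIG_IGN);  /* SIG_IGN is inherited across fork() */

    pid_t child = fork();
    if (child == 0) {
        for (;;)
            pause();
    }
    if (child_ignores_sigterm)
        signal(SIGTERM, old);            /* parent only: restore disposition */
    if (child < 0)
        return -1;

    return terminate_child(child, 200);
}
```

On Linux, signal(7) lists msgsnd() among the calls that are not restarted automatically after a signal handler runs, even with SA_RESTART, which is why the explicit retry loop is needed. The retry only helps for signals the child actually handles; SIGKILL still ends the process immediately.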

Should I be worried about the order in which processes in a process group receive signals?

I want to terminate a process group by sending SIGTERM to processes within it. This can be accomplished via the kill command, but the manuals I found provide few details about how exactly it works:
int kill(pid_t pid, int sig);
...
If pid is less than -1, then sig is sent to every process in
the process group whose ID is -pid.
However, in which order will the signal be sent to the processes that form the group? Imagine the following situation: a pipe is set between master and slave processes in the group. If slave is killed during processing kill(-pid), while the master is still not, the master might report this as an internal failure (upon receiving notification that the child is dead). However, I want all processes to understand that such termination was caused by something external to their process group.
How can I avoid this confusion? Should I be doing something more than mere kill(-pid,SIGTERM)? Or it is resolved by underlying properties of the OS, about which I'm not aware?
Note that I can't modify the code of the processes in the group!
Try doing it as a three-step process:
kill(-pid, SIGSTOP);
kill(-pid, SIGTERM);
kill(-pid, SIGCONT);
The first SIGSTOP should put all the processes into a stopped state. They cannot catch this signal, so this should stop the entire process group.
The SIGTERM will be queued for the process but I don't believe it will be delivered, since the processes are stopped (this is from memory, and I can't currently find a reference but I believe it is true).
The SIGCONT will start the processes again, allowing the SIGTERM to be delivered. If the slave gets the SIGCONT first, the master may still be stopped so it will not notice the slave going away. When the master gets the SIGCONT, it will be followed by the SIGTERM, terminating it.
I don't know if this will actually work, and it may be implementation dependent on when all the signals are actually delivered (including the SIGCHLD to the master process), but it may be worth a try.
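Wrapped into a function with error checks, the three-step sequence could look like the sketch below. terminate_group and demo_terminate_group are invented names. The demo only exercises a group with a single process, so it shows that a stopped process is terminated by the pending SIGTERM once it is continued; it does not demonstrate anything about ordering between a master and a slave.

```c
#include <signal.h>
#include <stddef.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical helper: stop the whole group, leave SIGTERM pending for
 * every member, then let them all continue. Returns 0 on success. */
int terminate_group(pid_t pgid)
{
    if (kill(-pgid, SIGSTOP) < 0)   /* cannot be caught or ignored */
        return -1;
    if (kill(-pgid, SIGTERM) < 0)   /* stays pending while stopped */
        return -1;
    if (kill(-pgid, SIGCONT) < 0)   /* resume; SIGTERM is then delivered */
        return -1;
    return 0;
}

/* Hypothetical demo: put a child into its own process group, apply the
 * sequence, and return the signal that terminated it (0 if none). */
int demo_terminate_group(void)
{
    pid_t leader = fork();
    if (leader < 0)
        return -1;
    if (leader == 0) {
        setpgid(0, 0);              /* become leader of a new group */
        for (;;)
            pause();
    }
    /* Set the group from the parent as well, so it exists no matter
     * which process runs first after fork(). */
    setpgid(leader, leader);

    if (terminate_group(leader) < 0) {
        kill(leader, SIGKILL);
        waitpid(leader, NULL, 0);
        return -1;
    }

    int status;
    if (waitpid(leader, &status, 0) != leader)
        return -1;
    return WIFSIGNALED(status) ? WTERMSIG(status) : 0;
}
```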
My understanding is that you cannot rely on any specific order of signal delivery.
You could avoid the issue if you send the TERM signal to the master process only, and then have the master kill its children.
Even if all the various varieties of UNIX would promise to deliver the signals in a particular order, the scheduler might still decide to run the critical child process code before the parent code.
Even your STOP/TERM/CONT sequence will be vulnerable to this.
I'm afraid you may need something more complicated. Perhaps the child process could catch the SIGTERM and then loop until its parent exits before it exits itself? Be sure and add a timeout if you do this.
Untested: Use shared memory and put in some kind of "we're dying" semaphore, which may be checked before I/O errors are treated as real errors. mmap() with MAP_ANONYMOUS|MAP_SHARED and make sure it survives your way of fork()ing processes.
Oh, and be sure to use the volatile keyword or your semaphore is optimized away.

Resources