I have this issue with signals being lost. I mean I have this system, where signals are generated by child processes and received by other child processes of a parent process. I have used sigwait and sigprocmask to actually block and then wait for signals inside the signal receiving child processes rather than registering an asynchronous handler.
Now when I run this system. I can see that initially, the generated signals from the child processes are blocked by the receiving child processes and then using sigwait they actually process these pending signals. So the signals are pending and then fetched using sigwait and it goes on.
But as the time passes, I could see that signals are not consumed as much as before. I mean there are lots of signals generated and they are not being processed by the receiving processes. Is it possible that if I have lots of signals pending then it could result in the signals being lost?
Only real-time signals (those between SIGRTMIN and SIGRTMAX), if your OS supports them, are guaranteed to be queued (up to a maximum of SIGQUEUE_MAX queued signals). Other signals may be lost if the receiving process already has a pending signal with the same code.
From the specification for sigaction:
If a subsequent occurrence of a pending signal is generated, it is implementation-dependent as to whether the signal is delivered or accepted more than once in circumstances other than those in which queueing is required under the Realtime Signals Extension option.
Pending signals are not queued which will result in them being lost if you don't explicitly check for them or handle them. That's why you should check that all children terminate.
Sources: Link to some old lecture material.
Related
I'm a beginner in Linux and Process signal handling.
Let's say we have a process A and it execute pause() function, we know that puts the current process to sleep until a signal is received by the process.
But when we type ctrl-c, kernel also sends a SIGINT to process A and when A receives the signal, it execute the SIGINT's default handler which is terminating the current process. So my question is:
Does the process A resume first or handler get executed first?
For simplicity, let's assume process A has only a single thread, which is blocking in a pause() call, and exactly one signal gets sent to the process.
Does the process A resume first or handler get executed first?
The signal handler gets executed first, then the pause() call returns.
What if there are multiple signals?
Standard signals are not queued, so if you send say two INT signals to the process very quickly in succession, only one of them is delivered.
If there are multiple signals, the order is unspecified.
What about POSIX realtime signals? (SIGRTMIN+0 to SIGRTMAX-0)
They are just like standard named signals, except they are queued (to a limit), and if more than one of them is pending, they get delivered in increasing numerical order.
If there are both standard and realtime signals pending, it is unspecified which ones get delivered first; although in practice, in Linux and many other systems, the standard signals get delivered first, then the realtime ones.
What if there are multiple threads in the process?
The kernel will pick one thread among those that do not have the signal masked (via sigprocmask() or pthread_sigmask()), and use that thread to deliver the signal to the signal handler.
If there are more than one thread blocking in a pause() call, one of them gets woken up. If there are more than one pending signal, it is unspecified whether the one woken thread handles them all, or if more than one thread is woken up.
In general, I warmly recommend reading the man 7 signal, man 7 signal-safety, man 2 sigaction, man 2 sigqueue, and man 2 sigwaitinfo man pages. (While the links go to the Linux man pages project, each of the pages includes a Conforming To section naming the related standards, and Linux-specific behaviour is clearly marked.)
My application is sometimes terminating from SIGIO or SIGUSR1 signals even though I have blocked these signals.
My main thread starts off with blocking SIGIO and SIGUSR1, then makes 2 AIO read operations. These operations use threads to get notification about operation status. The notify functions (invoked as detached threads) start another AIO operation (they manipulate the data that has been read and start writing it back to the file) and notification is handled by sending signal (one operation uses SIGIO, the other uses SIGUSR1) to this process. I am receiving these signals synchronously by calling sigwait in the main thread. Unfortunately, sometimes my program crashes, being stopped by SIGUSR1 or SIGIO signal (which should be blocked by a sigmask).
One possible solution is to set SIG_IGN handlers for them but this doesn't solve the problem. Their handlers shouldn't be invoked, rather should they be retrieved from pending signals by sigwait in the next iteration of the main program loop.
I have no idea which thread handles this signal in this manner. Maybe it's the init who receives this signal? Or some shell thread? I have no idea.
I'd hazard a guess that the signal is being received by one of your AIO callback threads, or by the very thread which generates the signal. (Prove me wrong and I'll delete this answer.)
Unfortunately per the standard, "[t]he signal mask of [a SIGEV_THREAD] thread is implementation-defined." For example, on Linux (glibc 2.12), if I block SIGUSR1 in main, then contrive to run a SIGEV_THREAD handler from an aio_read call, the handler runs with SIGUSR1 unblocked.
This makes SIGEV_THREAD handlers unsuitable for an application that must reliably and portably handle signals.
Say I create a process with multiple child process and I call wait() in the main process. If one child terminates, its pid is returned. But what happens if a couple of child process terminate simultaneously? The call should return with one of them, and a second call should return with the other, right? Is there a particular order in which they will return (maybe there is a precedence to the child with lower pid)?
No.
SUSv4 leaves explicitly unspecified in which order (if any) child processes are reaped by one or several wait calls. There is also no "accidential" order that you could rely on, since different Linux kernel versions perform differently. (Source: M. Kerrisk, TLPI, 26.1.1, page 542).
Somewhat related trivia:
You might wonder why you can reliably wait on several child processes that terminate concurrently at all. If you think about how signals work, you might be inclined to believe that it is perfectly possible to lose child termination signals. Signals, a well-known fact, are not queued (except for realtime signals, but SIGCHLD isn't one!). Which means that if you go strictly by the letter of the book, then clearly several children terminating could cause some child termination signals becoming lost!
You can only call wait once at the same time, so you can at most consume one signal synchronously as it is generated, and have a second one made pending before your next call to wait. It appears that there is no way to account for any other signals that are generated in the mean time.
Luckily, that isn't the case. Waiting on child processes demonstrably works fine. Always, 100%. But why?
The reason is simple, the kernel is "cheating". If a child process exits while you are blocked in wait, there is no question as to what's happening. The signal is immediately delivered, the status information is being filled in, the parent unblocks, and the child process goes * poof *.
On the other hand, if a child is exiting and the parent isn't in a wait call (or if the parent is in a wait call reaping another child), the system converts the exiting process to a "zombie".
Now, when the parent process performs a wait, and there are any zombie processes, it won't block waiting for a signal to be generated (which may never happen if all the children have already exited!). Instead, it will just reap one of the zombies, pretending the signal was delivered just now.
I have a parent process spawning several child processes. I want to know when any child process exits by registering a SIGCHLD signal handler.
The question is, what happens if another SIGCHLD (or any other signal) is received, while the parent process is already in a signal handler?
I can think of the following outcomes:
The signal is ignored
The signal is queued, and will be processed as soon as the current handler returns
The current handler is in turn interrupted, just like the main program
Which one is correct?
In your concrete example (the same signal being received), the signal is delivered after the signal handler has finished (so bullet point #2 is correct). Note, however, that you may "lose" signals.
The reason for that is that while a signal is being inside its handler, it is blocked. Blocked signals are set to pending, but not queued. The term "pending" means that the operating system remembers that there is a signal waiting to be delivered at the next opportunity, and "not queued" means that it does this by setting a flag somewhere, but not by keeping an exact record of how many signals have arrived.
Thus, you may receive 2 or 3 (or 10) more SIGCHLD while in your handler, but only see one (so in some cases, bullet point #1 can be correct, too).
Note that several flags that you can pass to sigaction can affect the default behaviour, such as SA_NODEFER (prevents blocking signal) and SA_NOCLDWAIT (may not generate signal at all on some systems).
Now of course, if you receive a different type of signal, there's no guarantee that it won't interrupt your handler. For that reason, one preferrably doesn't use non signal safe functions.
Assume a multi-threaded application, with a signal handler defined in it.
Now if a signal is delivered to the PROCESS, and signal handler is invoked - My doubt is what happens to other threads during the period signal handler is running. Do they keep running, as if nothing has happened or they are suspended for that period .. or ...?
Also if someone can tell me WHY to justify the answer?
The specification is pretty clear how signals and threads interact:
Signals generated for the process shall be delivered to exactly one of those threads within the process which is in a call to a sigwait() function selecting that signal or has not blocked delivery of the signal.
As the signal is delivered to exactly one thread, other threads are unaffected (and keep running).
The threads are independent: a signal from one thread to a second thread will not affect any of the others. The why is because they are independent. The only reason why it would affect the others is if the signal handler of the thread in question somehow interacts with other threads.