Delivery of signal to multithreaded program with sigmask

If I have a program that has N running threads, and N-1 of them block delivery of the SIGUSR1 signal using pthread_sigmask:
    int rc;
    sigset_t signal_mask;

    sigemptyset(&signal_mask);
    sigaddset(&signal_mask, SIGUSR1);
    rc = pthread_sigmask(SIG_BLOCK, &signal_mask, NULL);
    if (rc != 0) {
        // handle error
    }
When the OS (Linux, recent kernel) delivers SIGUSR1 to the process, is it guaranteed to be delivered to the unblocked thread? Or could it, for example, try some subset of the blocked threads and then give up?

Yes, it is guaranteed that a process-directed signal will be delivered to one of the threads that has it unblocked (if there are any). The relevant quote from POSIX Signal Generation and Delivery:
Signals generated for the process shall be delivered to exactly one of
those threads within the process which is in a call to a sigwait()
function selecting that signal or has not blocked delivery of the
signal.
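The guarantee can be checked empirically. Below is a minimal sketch, assuming Linux with glibc and a C11 compiler; run_delivery_demo, on_usr1, worker and NUM_BLOCKED are names made up for this illustration, not part of any API. Every thread inherits a mask that blocks SIGUSR1, only the last thread unblocks it, and the handler records which thread it ran in.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <pthread.h>
#include <signal.h>
#include <stdatomic.h>
#include <stdint.h>
#include <time.h>
#include <unistd.h>

#define NUM_BLOCKED 4   /* hypothetical: threads that keep SIGUSR1 blocked */

static atomic_int handled_by = -1;      /* index of the thread that ran the handler */
static atomic_int done = 0;
static _Thread_local int my_index = -1; /* per-thread label, read by the handler */

static void sleep_ms(long ms)
{
    struct timespec ts = { ms / 1000, (ms % 1000) * 1000000L };
    nanosleep(&ts, NULL);
}

static void on_usr1(int signo)
{
    (void)signo;
    atomic_store(&handled_by, my_index);
}

static void *worker(void *arg)
{
    my_index = (int)(intptr_t)arg;
    if (my_index == NUM_BLOCKED) {
        /* The last thread is the only one that unblocks SIGUSR1. */
        sigset_t set;
        sigemptyset(&set);
        sigaddset(&set, SIGUSR1);
        pthread_sigmask(SIG_UNBLOCK, &set, NULL);
    }
    while (!atomic_load(&done))
        sleep_ms(1);
    return NULL;
}

/* Returns the index of the thread that handled the process-directed signal. */
int run_delivery_demo(void)
{
    pthread_t tids[NUM_BLOCKED + 1];
    struct sigaction sa;
    sigset_t set;

    /* Block SIGUSR1 before creating threads so they all inherit the mask. */
    sigemptyset(&set);
    sigaddset(&set, SIGUSR1);
    pthread_sigmask(SIG_BLOCK, &set, NULL);

    sa.sa_handler = on_usr1;
    sa.sa_flags = 0;
    sigemptyset(&sa.sa_mask);
    sigaction(SIGUSR1, &sa, NULL);

    for (int i = 0; i <= NUM_BLOCKED; i++) {
        int rc = pthread_create(&tids[i], NULL, worker, (void *)(intptr_t)i);
        assert(rc == 0);
        (void)rc;
    }

    kill(getpid(), SIGUSR1);            /* process-directed signal */

    /* Wait (up to about 5 s) for some thread to run the handler. */
    for (int i = 0; i < 5000 && atomic_load(&handled_by) == -1; i++)
        sleep_ms(1);

    atomic_store(&done, 1);
    for (int i = 0; i <= NUM_BLOCKED; i++)
        pthread_join(tids[i], NULL);
    return atomic_load(&handled_by);
}
```

Calling run_delivery_demo() from main should return NUM_BLOCKED, the index of the one thread that unblocked the signal. If the signal arrives before that thread has unblocked it, it simply stays pending for the process until then.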

Related

When several signals arrive at a process, what is the order between the process handling the signals?

When several signals arrive at a process, in what order does the process handle them?
What data structure is used to store the signals which have arrived at a process but not yet been delivered?
For example, from APUE:
Since the process group is orphaned when the parent terminates, POSIX.1 requires that every process in the newly orphaned process group that is stopped (as our child is) be sent the hang-up signal (SIGHUP) followed by the continue signal (SIGCONT).
This causes the child to be continued, after processing the hang-up signal. The
default action for the hang-up signal is to terminate the process, so we have to
provide a signal handler to catch the signal. We therefore expect the printf in
the sig_hup function to appear before the printf in the pr_ids function.
As the answer to "order of SIGCONT and SIGHUP sent to orphaned linux process group" says:
The SIGHUP cannot be delivered until the child's execution is resumed.
When a process is stopped, all signal delivery is suspended except for
SIGCONT and SIGKILL.
So, the SIGHUP does arrive first, but it cannot be processed until the
SIGCONT awakens the process execution.
So SIGHUP arrives before SIGCONT at a stopped process, but SIGHUP can't be delivered while the process is stopped, whereas SIGCONT can be.
Is SIGCONT handled before or after SIGHUP? The first quote seems to say "after", while the second quote seems to say "before" (by "until").
If "before":
How can SIGCONT be arranged to jump ahead of SIGHUP to be delivered?
How is SIGHUP not discarded when SIGCONT jumps ahead of it to be delivered?
Are the above implemented with some data structure, such as a FIFO queue or a FILO stack?
Thanks.

The situation is probably confused by different implementations and by the introduction of POSIX real time signals. signal(7) says that real-time signals are distinguished from old style signals by
Real-time signals are delivered in a guaranteed order. Multiple
real-time signals of the same type are delivered in the order
they were sent. If different real-time signals are sent to a
process, they are delivered starting with the lowest-numbered
signal. (I.e., low-numbered signals have highest priority.) By
contrast, if multiple standard signals are pending for a process,
the order in which they are delivered is unspecified.
As for the old-style signals, "The Design of the Unix Operating System" by Bach (written before the introduction of POSIX real-time signals) describes them as follows:
To send a signal to a process, the kernel sets a bit in the signal field of the process table entry, corresponding to the type of signal received. ... the kernel checks for receipt of a signal when the process returns from kernel mode to user mode and when it leaves the sleep state at a suitably low signalling priority.
You can see some of the current Linux data structures in sched.h. Looking at this, I suspect that the old-style bitmap alone has gone, and that a combination of a bitmap and a linked list is used to handle both old-style and POSIX real-time signals, but I have not gone through enough of the code to be sure of this.
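The difference between the two kinds of signal that signal(7) describes can be observed from userspace. This is a minimal sketch, assuming Linux; queue_and_drain is a made-up name. It blocks SIGUSR1 and two real-time signals, sends a mix of them to itself, and then dequeues whatever is pending with sigtimedwait().

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <signal.h>
#include <time.h>
#include <unistd.h>

/* Queue a mix of signals while they are blocked, then dequeue them one by
 * one. Returns how many signals were retrieved; signo[i] and value[i]
 * describe the i-th one (value is -1 unless it was sent with sigqueue()). */
int queue_and_drain(int *signo, int *value, int max)
{
    sigset_t set;
    siginfo_t info;
    struct timespec no_wait = { 0, 0 };
    union sigval v;
    int n = 0;

    assert(max > 0);

    sigemptyset(&set);
    sigaddset(&set, SIGUSR1);
    sigaddset(&set, SIGRTMIN);
    sigaddset(&set, SIGRTMIN + 1);
    sigprocmask(SIG_BLOCK, &set, NULL);   /* keep everything pending */

    v.sival_int = 1; sigqueue(getpid(), SIGRTMIN + 1, v);
    v.sival_int = 2; sigqueue(getpid(), SIGRTMIN, v);
    v.sival_int = 3; sigqueue(getpid(), SIGRTMIN, v);
    kill(getpid(), SIGUSR1);              /* standard signal, sent twice */
    kill(getpid(), SIGUSR1);

    /* A zero timeout makes sigtimedwait() poll the pending set. */
    while (n < max && sigtimedwait(&set, &info, &no_wait) > 0) {
        signo[n] = info.si_signo;
        value[n] = (info.si_code == SI_QUEUE) ? info.si_value.sival_int : -1;
        n++;
    }
    return n;
}
```

On Linux I would expect four signals back: SIGUSR1 once (the second instance is merged into the first), then the two SIGRTMIN instances in the order they were sent, then SIGRTMIN+1. That SIGUSR1 comes out ahead of the real-time signals is Linux behaviour; POSIX leaves that part unspecified.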

To add to mcdowella's response:
1) The specifics of "signal handling" can vary from platform to platform
2) In the specific case of Linux:
http://man7.org/linux/man-pages/man7/signal.7.html
Linux supports both POSIX reliable signals (hereinafter "standard
signals") and POSIX real-time signals.
3) See also:
About the delivery of standard signals
Why Linux decides that standard signals have higher priority than rt-signals?
The Linux Kernel - Signals

SIGCONT has special semantics.
Regardless of whether SIGCONT is caught, is ignored, or has default disposition, its generation will clear all pending stop signals and resume execution of a stopped process. [IEEE Std 1003.1-2017] Again, this resumption happens before any other signals are delivered, and even before SIGCONT's handler (if any) is invoked.
(This special “dispositionless” semantic makes sense. In order for a process to execute a signal handler, the process must itself be executing.)
POSIX is clearer than APUE here, saying that "[t]he default action for SIGCONT is to resume execution at the point where the process was stopped, after first handling any pending unblocked signals."
As others have mentioned, the actual order in which pending signals are delivered is implementation-specific. Linux, at least, delivers basic UNIX signals in ascending numeric order.
To demonstrate all this, consider the following code. It STOPs a process, then sends it several signals, then CONTinues it, having installed handlers for all catchable signals so we can see what is handled when:
    #define _POSIX_SOURCE
    #include <signal.h>
    #include <sys/types.h>
    #include <unistd.h>
    #include <stdlib.h>
    #include <stdio.h>

    static int signals[] = { SIGSTOP, SIGURG, SIGUSR1, SIGHUP, SIGCONT, 0 };

    static void
    handler(int signo) {
        // XXX not async-signal-safe
        printf("<signal %d>\n", signo);
    }

    int
    main(int argc, char **argv) {
        int *sig = signals;
        struct sigaction sa = { .sa_flags = 0, .sa_handler = handler };

        sigfillset(&sa.sa_mask);
        sig++; // can't catch SIGSTOP
        while (*sig) {
            sigaction(*sig, &sa, NULL); // XXX error check
            sig++;
        }

        if (fork() == 0) { // XXX error check
            sleep(2); // faux synchronization - let parent pause()
            sig = signals;
            while (*sig) {
                printf("sending signal %d\n", *sig);
                kill(getppid(), *sig);
                sig++;
            }
            exit(0);
        }

        pause();
        return 0;
    }
For me, this prints
    sending signal 19
    sending signal 23
    sending signal 10
    sending signal 1
    sending signal 18
    <signal 1>
    <signal 10>
    <signal 18>
    <signal 23>

Can ioctl system calls in linux be interrupted by signals?

Since signals are asynchronous in nature, a signal can interrupt a process at any time, whether it is running in kernel mode or user mode.
For example, in Robert Love's Linux System Programming, a read loop checks for EINTR and restarts the system call with the remaining bytes to read:
    ssize_t ret;

    while (len != 0 && (ret = read (fd, buf, len)) != 0) {
        if (ret == -1) {
            if (errno == EINTR)
                continue;
            perror ("read");
            break;
        }
        len -= ret;
        buf += ret;
    }
I came across a few ioctl calls from userspace that simply treat any return code < 0 as an error. I found the same thing done in the link below:
IOCTL call and checking return value
Is the same mechanism (checking for EINTR, as in the read call) needed for other system calls such as ioctl? Is it always necessary to check every system call for EINTR, regardless of whether my program does any signal handling?
I have also heard about the automatic restart functionality that Linux supports for certain system calls, where drivers return -ERESTARTSYS if the corresponding ioctl call is interrupted. I am not sure whether the ioctl system call falls into this category and is restarted transparently, so that userspace does not have to worry about ioctl failing due to signal interruption.
What happens if an ioctl is partially executed and a signal interrupts it in the middle? Does the kernel still restart the ioctl call automatically, without userspace involvement?

ioctl is a device-driver wildcard call: the driver is free to implement whatever functionality it wants, sleeping or not. If the driver decides to sleep on an event, it also decides whether that sleep can be interrupted by a signal. If it sleeps uninterruptibly, the kernel handles the signal only after the process wakes up (or delivers it to another thread that is not in such a state).
In any case, signals are delivered to a user process only on the way back to user mode. The kernel never suspends a system call midway, switches to user mode to run the signal handler, and then returns to kernel mode to finish the call; that would break the atomicity of system calls. If the ioctl call sleeps in uninterruptible mode, the signal has to wait, and any handler (or default action) runs after the ioctl call.
In a multithreaded process, another thread can receive the signal while one thread is in such a state. The kernel simply delivers the signal to one of the threads that is able to handle it.

pthread_sigmask not working properly with aio callback threads

My application is sometimes terminating from SIGIO or SIGUSR1 signals even though I have blocked these signals.
My main thread starts off with blocking SIGIO and SIGUSR1, then makes 2 AIO read operations. These operations use threads to get notification about operation status. The notify functions (invoked as detached threads) start another AIO operation (they manipulate the data that has been read and start writing it back to the file) and notification is handled by sending signal (one operation uses SIGIO, the other uses SIGUSR1) to this process. I am receiving these signals synchronously by calling sigwait in the main thread. Unfortunately, sometimes my program crashes, being stopped by SIGUSR1 or SIGIO signal (which should be blocked by a sigmask).
One possible solution is to set SIG_IGN handlers for them, but this doesn't solve the problem. Their handlers shouldn't be invoked; rather, the signals should be retrieved from the pending set by sigwait in the next iteration of the main program loop.
I have no idea which thread handles this signal in this manner. Maybe it's the init who receives this signal? Or some shell thread? I have no idea.

I'd hazard a guess that the signal is being received by one of your AIO callback threads, or by the very thread which generates the signal. (Prove me wrong and I'll delete this answer.)
Unfortunately per the standard, "[t]he signal mask of [a SIGEV_THREAD] thread is implementation-defined." For example, on Linux (glibc 2.12), if I block SIGUSR1 in main, then contrive to run a SIGEV_THREAD handler from an aio_read call, the handler runs with SIGUSR1 unblocked.
This makes SIGEV_THREAD handlers unsuitable for an application that must reliably and portably handle signals.

Handling multiple SIGCHLD

In a system running Linux 2.6.35+ my program creates many child processes and monitors them. If a child process dies I do some clean-up and spawn the process again. I use signalfd() to get the SIGCHLD signal in my process. signalfd is used asynchronously using libevent.
When using signal handlers for non-real time signals, while the signal handler is running for a particular signal further occurrence of the same signal has to be blocked to avoid getting into recursive handlers. If multiple signals arrive at that time then kernel invokes the handler only once (when the signal is unblocked).
Is it the same behavior when using signalfd() as well? Since signalfd based handling doesn't have the typical problems associated with the asynchronous execution of the normal signal handlers I was thinking kernel can queue all the further occurrences of SIGCHLD?
Can anyone clarify the Linux behavior in this case ...

On Linux, multiple children terminating before you read a SIGCHLD with signalfd() will be compressed into a single SIGCHLD. This means that when you read the SIGCHLD signal, you have to clean up after all children that have terminated:
    // Do this after you've read() a SIGCHLD from the signalfd file descriptor:
    while (1) {
        int status;
        pid_t pid = waitpid(-1, &status, WNOHANG);
        if (pid <= 0) {
            break;
        }
        // something happened with child 'pid', do something about it...
        // Details are in 'status', see waitpid() manpage
    }
I should note that I have in fact seen this signal compression when two child processes terminated at the same time. If I did only a single waitpid(), one of the children that terminated was not handled; the above loop fixed it.
Corresponding documentation:
http://man7.org/linux/man-pages/man7/signal.7.html "By contrast, if multiple instances of a standard signal are delivered while that signal is currently blocked, then only one instance is queued"
http://man7.org/linux/man-pages/man3/sigwait.3p.html "If prior to the call to sigwait() there are multiple pending instances of a single signal number, it is implementation-defined whether upon successful return there are any remaining pending signals for that signal number."
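To see the compression end to end, here is a minimal, Linux-only sketch. reap_after_sigchld and MAX_CHILDREN are made-up names, and the waitid() call with WNOWAIT is only there to make the demo deterministic by waiting until every child has exited without reaping it. The function forks several children that exit immediately, counts the SIGCHLD records readable from the signalfd, and then reaps with the waitpid() loop.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <signal.h>
#include <sys/signalfd.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

#define MAX_CHILDREN 16   /* hypothetical limit for this demo */

/* Returns the number of children reaped, or -1 on setup failure.
 * *records_read receives the number of SIGCHLD records read from the fd. */
int reap_after_sigchld(int nchildren, int *records_read)
{
    sigset_t mask;
    struct signalfd_siginfo si;
    siginfo_t info;
    pid_t pids[MAX_CHILDREN];
    int reaped = 0;

    assert(nchildren >= 0 && nchildren <= MAX_CHILDREN);
    *records_read = 0;

    /* SIGCHLD must be blocked, or it would be handled the normal way. */
    sigemptyset(&mask);
    sigaddset(&mask, SIGCHLD);
    if (sigprocmask(SIG_BLOCK, &mask, NULL) == -1)
        return -1;
    int sfd = signalfd(-1, &mask, SFD_NONBLOCK | SFD_CLOEXEC);
    if (sfd == -1)
        return -1;

    for (int i = 0; i < nchildren; i++) {
        pids[i] = fork();
        if (pids[i] == -1)
            return -1;
        if (pids[i] == 0)
            _exit(0);
    }

    /* Wait until every child has exited, but leave them reapable. */
    for (int i = 0; i < nchildren; i++)
        waitid(P_PID, (id_t)pids[i], &info, WEXITED | WNOWAIT);

    /* Drain the signalfd and count the SIGCHLD records it held. */
    while (read(sfd, &si, sizeof si) == (ssize_t)sizeof si)
        (*records_read)++;

    /* One record may stand for many children: reap until none are left. */
    for (;;) {
        int status;
        pid_t pid = waitpid(-1, &status, WNOHANG);
        if (pid <= 0)
            break;
        reaped++;
    }
    close(sfd);
    return reaped;
}
```

With three children, this should report a single SIGCHLD record but three reaped children, which is exactly why a lone waitpid() per signal is not enough.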

Actually, the hassle-free way would be the waitfd functionality, which would allow you to add a specific pid to poll()/epoll(). Unfortunately, it wasn't accepted into Linux when it was proposed years ago.

How do I suspend another thread (not the current one)?

I'm trying to implement a simulation of a microcontroller. This simulation is not meant to do a clock cycle precise representation of one specific microcontroller but check the general correctness of the code.
I thought of having a "main thread" executing normal code and a second thread executing ISR code. Whenever an ISR needs to be run, the ISR thread suspends the "main thread".
Of course, I want to have a feature to block interrupts.
I thought of solving this with a mutex that the ISR thread holds whenever it executes ISR code while the main thread holds it as long as "interrupts are blocked".
A POR (power on reset) can then be implemented by not only suspending but killing the main thread (and starting a new one executing the POR function).
The windows API provides the necessary functions.
But it seems to be impossible to do the above with posix threads (on linux).
I don't want to change the actual hardware independent microcontroller code. So inserting anything to check for pending interrupts is not an option.
Receiving interrupts at non well behaved points is desirable, as this also happens on microcontrollers (unless you block interrupts).
Is there a way to suspend another thread on linux? (Debuggers must use that option somehow, I think.)
Please, don't tell me this is a bad idea. I know that is true in most circumstances. But the main code does not use standard libs or lock/mutexes/semaphores.

SIGSTOP does not work - it always stops the entire process.
Instead you can use some other signals, say SIGUSR1 for suspending and SIGUSR2 for resuming:
    // at process start call init_pthread_suspending to install the handlers
    // to suspend a thread use pthread_kill(thread_id, SUSPEND_SIG)
    // to resume a thread use pthread_kill(thread_id, RESUME_SIG)
    #include <signal.h>

    #define RESUME_SIG SIGUSR2
    #define SUSPEND_SIG SIGUSR1

    static sigset_t wait_mask;
    // per-thread flag; volatile sig_atomic_t because handlers modify it
    static __thread volatile sig_atomic_t suspended;

    void resume_handler(int sig)
    {
        suspended = 0;
    }

    void suspend_handler(int sig)
    {
        if (suspended) return;
        suspended = 1;
        // sleep until resume_handler has cleared the flag
        do sigsuspend(&wait_mask); while (suspended);
    }

    void init_pthread_suspending()
    {
        struct sigaction sa;

        // while suspended, only the suspend and resume signals get through
        sigfillset(&wait_mask);
        sigdelset(&wait_mask, SUSPEND_SIG);
        sigdelset(&wait_mask, RESUME_SIG);

        sigfillset(&sa.sa_mask);
        sa.sa_flags = 0;
        sa.sa_handler = resume_handler;
        sigaction(RESUME_SIG, &sa, NULL);
        sa.sa_handler = suspend_handler;
        sigaction(SUSPEND_SIG, &sa, NULL);
    }
I am very annoyed by replies like "you should not suspend another thread, that is bad".
Guys why do you assume others are idiots and don't know what they are doing? Imagine that others, too, have heard about deadlocking and still, in full consciousness, want to suspend other threads.
If you don't have a real answer to their question why do you waste your and the readers' time.
And yes, IMO pthreads is a very short-sighted API, a disgrace for POSIX.

The HotSpot Java VM uses SIGUSR2 to implement suspend/resume for Java threads on Linux.
A procedure based on a signal handler for SIGUSR2 might be:
Providing a signal handler for SIGUSR2 allows a thread to request a lock
(which has already been acquired by the signal sending thread).
This suspends the thread.
As soon as the suspending thread releases the lock, the signal handler can
(and will?) get the lock. The signal handler releases the lock immediately and
leaves the signal handler.
This resumes the thread.
It will probably be necessary to introduce a control variable to make sure that the main thread is in the signal handler before starting the actual processing of the ISR.
(The details depend on whether the signal handler is called synchronously or asynchronously.)
I don't know, if this is exactly how it is done in the Java VM, but I think the above procedure does what I need.

Somehow I think sending the other thread SIGSTOP works.
However, you are far better off writing some thread communication involving semaphores/mutexes and global variables.
You see, if you suspend the other thread in malloc() and you call malloc() -> deadlock.
Did I mention that lots of C standard library functions, let alone other libraries you use, will call malloc() behind your back?
EDIT:
Hmmm, no standard library code. Maybe use setjmp()/longjmp() from a signal handler to simulate the POR, and a signal handler to simulate the interrupt.
TO THOSE WHO KEEP DOWNVOTING THIS: The answer was accepted for the contents after EDIT, which is a specific scenario that cannot be used in any other scenario.

Solaris has the thr_suspend(3C) call that would do what you want. Is switching to Solaris a possibility?
Other than that, you're probably going to have to do some gymnastics with mutexes and/or semaphores. The problem is that you'll only suspend when you check the mutex, which will probably be at a well-behaved point. Depending on what you're actually trying to accomplish, this might not be desirable.

It makes more sense to have the main thread execute the ISRs - because that's how the real controller works (presumably). Just have it check after each emulated instruction if there is both an interrupt pending, and interrupts are currently enabled - if so, emulate a call to the ISR.
The second thread is still used, but it just listens for the conditions that cause an interrupt and marks the relevant interrupt as pending (for the other thread to pick up later).
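A minimal sketch of that arrangement, assuming the simulator executes the code one emulated instruction at a time (which the question may not allow). All names here (irq_pending, irq_enabled, listener, check_interrupts, run_emulation) are hypothetical, and the listener raises a single interrupt just to keep the demo short.

```c
#include <assert.h>
#include <pthread.h>
#include <stdatomic.h>
#include <stdbool.h>

static atomic_bool irq_pending = false;  /* set by the listener thread */
static atomic_bool irq_enabled = true;   /* the emulated interrupt-enable flag */
static int isr_calls = 0;                /* touched only by the main thread */

static void isr(void)
{
    isr_calls++;
}

/* Listener thread: a stand-in for whatever detects the interrupt condition. */
static void *listener(void *arg)
{
    (void)arg;
    atomic_store(&irq_pending, true);
    return NULL;
}

/* Called by the main thread after each emulated instruction. The pending
 * flag is consumed only while interrupts are enabled, so an interrupt raised
 * while they are blocked stays pending until they are enabled again. */
static void check_interrupts(void)
{
    if (atomic_load(&irq_enabled) && atomic_exchange(&irq_pending, false))
        isr();
}

/* Runs the emulation loop until the ISR has executed once. */
int run_emulation(void)
{
    pthread_t tid;
    int rc = pthread_create(&tid, NULL, listener, NULL);

    assert(rc == 0);
    (void)rc;
    while (isr_calls == 0) {
        /* ... execute one emulated instruction here ... */
        check_interrupts();
    }
    pthread_join(tid, NULL);
    return isr_calls;
}
```

Because the ISR runs on the main thread, no thread ever has to be suspended from outside; blocking interrupts is just a store to irq_enabled.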
