Process A sends a signalfd to process B. What will happen when B attempts to read()? If B adds the signalfd to an epoll, when will epoll_wait return?
There is a clue in the man page:
fork(2) semantics
After a fork(2), the child inherits a copy of the signalfd file descriptor. A read(2) from the file descriptor in the child will return information about signals queued to the child.
signalfds transferred via unix socket should behave the same as those inherited by fork(). Basically, it's irrelevant which process created the signalfd; read()ing from it always returns signals queued to the process that calls read().
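A minimal sketch of the fork() case, assuming Linux and glibc. The function name child_reads_own_signal is made up for illustration. The parent creates the signalfd, and the child then queues SIGUSR1 to itself and reads it back through the inherited descriptor:

```c
#define _GNU_SOURCE
#include <signal.h>
#include <stdint.h>
#include <sys/signalfd.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical helper: returns 1 if a child reading an inherited signalfd
 * sees a signal that was queued to the child itself, 0 if it does not,
 * and -1 on setup failure. */
int child_reads_own_signal(void)
{
    sigset_t mask, old;
    sigemptyset(&mask);
    sigaddset(&mask, SIGUSR1);

    /* Block SIGUSR1 so it stays pending instead of killing the process.
     * The child inherits this mask across fork(). */
    if (sigprocmask(SIG_BLOCK, &mask, &old) == -1)
        return -1;

    int result = -1;
    int sfd = signalfd(-1, &mask, SFD_CLOEXEC);   /* created by the parent */
    if (sfd != -1) {
        pid_t pid = fork();
        if (pid == 0) {
            /* Child: queue a signal to ourselves, then read the inherited fd. */
            struct signalfd_siginfo si;
            kill(getpid(), SIGUSR1);
            ssize_t n = read(sfd, &si, sizeof si);
            int ok = n == (ssize_t)sizeof si &&
                     si.ssi_signo == (uint32_t)SIGUSR1 &&
                     si.ssi_pid == (uint32_t)getpid();   /* sender was the child */
            _exit(ok ? 0 : 1);
        }
        if (pid > 0) {
            int status;
            if (waitpid(pid, &status, 0) == pid)
                result = WIFEXITED(status) && WEXITSTATUS(status) == 0;
        }
        close(sfd);
    }
    sigprocmask(SIG_SETMASK, &old, NULL);
    return result;
}
```

The parent never has a SIGUSR1 pending here, so the read in the child can only be satisfied by the child's own signal.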
There is a weird interaction with epoll, however: Since the epoll event queue is managed outside the context of any particular process, it decides the readiness of the signalfd based on the process which originally called epoll_ctl() to register interest in the signalfd. So if you arrange to watch a signalfd with an epoll FD, and then send both FDs to another process, the receiving process will see inconsistent results: epoll will signal readiness only when the sending process has a signal, but signalfd will return signals for the receiving process.
This situation is particularly easy to get into using fork(). For example, if you initialize an event loop library that uses epoll and signalfd, then call fork() (e.g. to daemonize the process), then try to use the library in the child process, you may find you cannot receive signals. (I spent all day yesterday trying to debug such a problem.)
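One way to stay out of this trap, sketched here under the assumption that you control when the event loop is initialized, is to fork first and only then create the epoll instance and the signalfd in the process that will actually wait on them. The names make_signal_loop and signal_loop_after_fork are hypothetical:

```c
#define _GNU_SOURCE
#include <signal.h>
#include <sys/epoll.h>
#include <sys/signalfd.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical helper: build the epoll + signalfd pair in the calling process. */
static int make_signal_loop(int signo, int *epfd_out, int *sfd_out)
{
    sigset_t mask;
    sigemptyset(&mask);
    sigaddset(&mask, signo);
    if (sigprocmask(SIG_BLOCK, &mask, NULL) == -1)
        return -1;

    int sfd = signalfd(-1, &mask, SFD_NONBLOCK | SFD_CLOEXEC);
    if (sfd == -1)
        return -1;
    int epfd = epoll_create1(EPOLL_CLOEXEC);
    if (epfd == -1) {
        close(sfd);
        return -1;
    }

    /* epoll_ctl() is called by the process that will later call epoll_wait(). */
    struct epoll_event ev = { .events = EPOLLIN, .data.fd = sfd };
    if (epoll_ctl(epfd, EPOLL_CTL_ADD, sfd, &ev) == -1) {
        close(epfd);
        close(sfd);
        return -1;
    }
    *epfd_out = epfd;
    *sfd_out = sfd;
    return 0;
}

/* Fork first, initialize in the child. Returns 1 if the child's epoll_wait()
 * reported the signalfd as readable for the child's own signal, 0 if not,
 * -1 on failure. */
int signal_loop_after_fork(void)
{
    pid_t pid = fork();
    if (pid == -1)
        return -1;
    if (pid == 0) {
        int epfd, sfd;
        if (make_signal_loop(SIGUSR1, &epfd, &sfd) == -1)
            _exit(2);
        kill(getpid(), SIGUSR1);                 /* signal queued to the child */
        struct epoll_event ev;
        int n = epoll_wait(epfd, &ev, 1, 1000);  /* 1 s safety timeout */
        _exit(n == 1 && ev.data.fd == sfd ? 0 : 1);
    }
    int status;
    if (waitpid(pid, &status, 0) != pid)
        return -1;
    return WIFEXITED(status) && WEXITSTATUS(status) == 0;
}
```

If the library has already been initialized before the fork, the equivalent move would be to tear down and recreate both descriptors in the child.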
This is inconsistent, or at least an under-documented corner case. Read signal(7) carefully.
A process A could send a signal (not a signalfd) using kill(2) or killpg(2) to a process B.
Process B handles the signal (and some signals have a default behavior). It could install (in a POSIX-standardized way) a signal handler using the old signal(2) or the newer sigaction(2), or it could ask (in a Linux-specific way) to get some data on a file descriptor by using signalfd(2).
So on success signalfd gives a fresh file descriptor, like open or socket do.
Read the signalfd(2) documentation. It explains what happens on B's side when it reads the descriptor (the kernel returns some struct signalfd_siginfo, I imagine from the point of view of the process getting the signal, not of the process reading the file descriptor; see the kernel source file fs/signalfd.c), or when it waits with poll or epoll on the file descriptor given by signalfd. The polling will succeed when a signal has been received by B.
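To make B's side concrete, here is a small sketch, assuming Linux. The function name wait_for_signal_via_poll is invented, and the kill() to self stands in for process A. It blocks the signal, creates the signalfd, waits with poll(), then reads one struct signalfd_siginfo:

```c
#define _GNU_SOURCE
#include <poll.h>
#include <signal.h>
#include <sys/signalfd.h>
#include <unistd.h>

/* Hypothetical helper: returns the signal number read from the signalfd,
 * or -1 on failure or timeout. */
int wait_for_signal_via_poll(int signo)
{
    sigset_t mask, old;
    sigemptyset(&mask);
    sigaddset(&mask, signo);

    /* The signal must be blocked, otherwise its normal disposition runs
     * before the signalfd ever sees it. */
    if (sigprocmask(SIG_BLOCK, &mask, &old) == -1)
        return -1;

    int result = -1;
    int sfd = signalfd(-1, &mask, SFD_CLOEXEC);
    if (sfd != -1) {
        kill(getpid(), signo);   /* stand-in for process A calling kill(2) */

        struct pollfd pfd = { .fd = sfd, .events = POLLIN };
        if (poll(&pfd, 1, 1000) == 1 && (pfd.revents & POLLIN)) {
            struct signalfd_siginfo si;
            /* Reading consumes the pending signal. */
            if (read(sfd, &si, sizeof si) == (ssize_t)sizeof si)
                result = (int)si.ssi_signo;
        }
        close(sfd);
    }
    sigprocmask(SIG_SETMASK, &old, NULL);
    return result;
}
```

For example, wait_for_signal_via_poll(SIGUSR1) should hand back SIGUSR1 once poll() has reported the descriptor readable.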
A successful signalfd just gives you an open file descriptor (like the file descriptors that open, socket, accept, and pipe give you), and you won't share that file descriptor with unrelated processes.
I won't make any supposition about what happens if you dare to send that file descriptor to some other process using sendmsg(2) on a unix(7) socket with SCM_RIGHTS. I guess it would be similar to pipe(7)-s or fifo(7)-s or netlink(7)-s. But I certainly won't do that: signalfd is Linux specific, and you are in an undocumented corner-case situation. Read the kernel source code to understand what is happening, or ask on kernelnewbies. And don't expect future kernels to behave consistently with present ones on that undocumented aspect ...
Related
Suppose I have a parent process and a child process (started with e.g. fork() or clone()) running on Linux. Further suppose that there is some shared memory that both the parent and the child can modify.
Within the context of the parent process, I would like to stop the child process and know that it has actually stopped, and moreover that any shared memory writes made by the child are visible to the parent (including whatever synchronization or cache flushes that may require in a multi-processor system).
This answer, which speaks of using kill(SIGSTOP) to stop a child process, contains an interesting tidbit:
When the first kill() call succeeds, you can safely assume that the child has stopped.
Is this statement actually true, and if so, can anyone expound on it, or point me to some more detailed documentation (e.g. a Linux manpage)? Otherwise, is there another mechanism that I can use to ensure that the child process is completely stopped and is not going to be doing any more writes to the shared memory?
I'm imagining something along the lines of:
the parent sends a different signal (e.g. SIGUSR1), which the child can handle
the child handles the SIGUSR1 and does something like a pthread_cond_wait() in the signal handler to safely "stop" (though still running from the kernel perspective) -- this is not fully fleshed out in my mind yet, just an idea
I'd like to avoid reinventing the wheel if there's already an established solution to this problem. Note that the child process needs to be stopped preemptively; adding some kind of active polling to the child process is not an option in this case.
If it only existed on Linux, pthread_suspend() would be perfect ...
It definitely sounds like you should be using a custom signal with a handler, and not SIGSTOP.
It's rare not to care about the state of the child at all, e.g. being fine with it having stored 32 bits out of a single non-atomic 64-bit write, or being logically caught between two dependent writes.
Even if you are, POSIX allows the OS to not make shared writes immediately visible to other processes, so the child should have a chance to call msync for portability, to ensure that writes are completely synced.
The POSIX documentation on Signal Concepts strongly suggests, but does not explicitly say, that the targeted process will be STOPped by the time kill() returns:
A signal is said to be "generated" for (or sent to) a process or thread when the event that causes the signal first occurs... Examples of such events include ... invocations of the kill() and sigqueue() functions.
The documentation is at pains to distinguish signal generation from delivery (when the signal action takes effect) or acceptance. Unfortunately, it sometimes mentions actions taken in response to a stop signal upon generation, and sometimes upon delivery. Given that something must happen upon generation per se, I'd agree that the target process must be STOPped by the time your call returns.
However, at the cost of another syscall, you can be sure. Since you have a parent/child relationship in your design, you can call waitpid() with the WUNTRACED flag to receive notification that your child process has, indeed, STOPped.
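A minimal sketch of that extra syscall, with made-up function names (stop_child_and_wait, demo_stop_child) and a throwaway child that only calls pause():

```c
#define _GNU_SOURCE
#include <errno.h>
#include <signal.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical helper: send SIGSTOP to `child` and block until the kernel
 * reports it as stopped. Returns 0 on success, -1 otherwise. */
int stop_child_and_wait(pid_t child)
{
    if (kill(child, SIGSTOP) == -1)
        return -1;

    int status;
    pid_t r;
    do {
        /* WUNTRACED makes waitpid() also report stopped children. */
        r = waitpid(child, &status, WUNTRACED);
    } while (r == -1 && errno == EINTR);

    if (r != child)
        return -1;
    return WIFSTOPPED(status) ? 0 : -1;   /* it may have exited instead */
}

/* Demonstration only: fork an idle child, stop it, then clean up. */
int demo_stop_child(void)
{
    pid_t child = fork();
    if (child == -1)
        return -1;
    if (child == 0) {
        for (;;)
            pause();                      /* stand-in for the real workload */
    }
    int rc = stop_child_and_wait(child);
    kill(child, SIGKILL);                 /* SIGKILL also ends a stopped process */
    waitpid(child, NULL, 0);
    return rc;
}
```

This only tells you the child is stopped. It says nothing about whether the child was halfway through a multi-word update of the shared memory.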
Edit
See the other answer from that other guy [sic] for reasons why you might not want to do this.
In Windows there is the API WaitForMultipleObjects which, if one event is registered in many threads, will only wake one thread when the event occurs. I now have to port an application that uses this in its threadpool and I am looking for the best practice to do this in Linux.
I am aware of epoll, which can wait for fds (which I can create with pipe), but waiting on one FD in multiple threads may wake every thread on an event when only one is needed.
What would be the best practice to implement this behaviour on Linux? I really don't want to split up an event to have as many FDs as there are worker threads, as this may hit the FD limit on some systems, since I have many events (which would all be split up).
What I thought about is create 1 master thread that will delegate work to an available worker (or queue the task if all workers are working), but that would mean that I have one additional context switch (and thus giving up computation time) as the master will wake up and then wake up another worker. I would do this if there is no other possibility to cleanly implement this. Unfortunately I cannot get rid of the current architecture so I need to get around this.
Is there any API that would be applicable for this kind of problem?
epoll() is the correct solution, although you could consider using eventfd() file descriptors rather than pipe() file descriptors for the event signalling. See this text from the epoll(7) man page:
If multiple threads (or processes, if child processes have inherited
the epoll file descriptor across fork(2)) are blocked in
epoll_wait(2) waiting on the same epoll file descriptor
and a file descriptor in the interest list that is marked for
edge-triggered (EPOLLET) notification becomes ready, just one of the
threads (or processes) is awoken from epoll_wait(2). This provides
a useful optimization for avoiding "thundering herd" wake-ups in some
scenarios.
So to get this single-wakeup behaviour, you have to be calling epoll_wait() in each thread on the same epoll descriptor, and you have to have registered your event-notifying file descriptors in the epoll set as edge-triggered.
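Here is a rough sketch of that arrangement, assuming Linux with pthreads (compile with -pthread). The worker count, the 500 ms timeout, and the names worker and count_wakeups_for_one_event are arbitrary choices for illustration. Several threads block in epoll_wait() on one shared epoll descriptor, a single eventfd is registered with EPOLLET, and the event is fired once:

```c
#define _GNU_SOURCE
#include <pthread.h>
#include <stdatomic.h>
#include <stdint.h>
#include <sys/epoll.h>
#include <sys/eventfd.h>
#include <unistd.h>

#define NUM_WORKERS 4              /* arbitrary pool size for this sketch */

static int shared_epfd;            /* one epoll instance for all workers */
static atomic_int wakeups;         /* how many workers saw the event */

static void *worker(void *arg)
{
    (void)arg;
    struct epoll_event ev;
    /* The timeout lets the idle workers exit so the demo can finish. */
    if (epoll_wait(shared_epfd, &ev, 1, 500) == 1)
        atomic_fetch_add(&wakeups, 1);
    return NULL;
}

/* Fires one event and returns how many workers were woken, or -1 on error. */
int count_wakeups_for_one_event(void)
{
    int efd = eventfd(0, EFD_NONBLOCK | EFD_CLOEXEC);
    if (efd == -1)
        return -1;
    shared_epfd = epoll_create1(EPOLL_CLOEXEC);
    if (shared_epfd == -1) {
        close(efd);
        return -1;
    }

    /* Edge-triggered registration is what gives the single wake-up. */
    struct epoll_event ev = { .events = EPOLLIN | EPOLLET, .data.fd = efd };
    if (epoll_ctl(shared_epfd, EPOLL_CTL_ADD, efd, &ev) == -1) {
        close(shared_epfd);
        close(efd);
        return -1;
    }

    atomic_store(&wakeups, 0);
    pthread_t tids[NUM_WORKERS];
    int started = 0;
    for (int i = 0; i < NUM_WORKERS; i++) {
        if (pthread_create(&tids[started], NULL, worker, NULL) == 0)
            started++;
    }

    usleep(100 * 1000);            /* give the workers time to block */
    uint64_t one = 1;
    int wrote = write(efd, &one, sizeof one) == (ssize_t)sizeof one;

    for (int i = 0; i < started; i++)
        pthread_join(tids[i], NULL);
    close(shared_epfd);
    close(efd);
    return wrote ? atomic_load(&wakeups) : -1;
}
```

In a real pool the woken worker would read() the eventfd to reset its counter before going back to epoll_wait().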
I'm a beginner in Linux and Process signal handling.
Let's say we have a process A that executes the pause() function; we know that this puts the process to sleep until it receives a signal.
But when we type Ctrl-C, the kernel sends SIGINT to process A, and when A receives the signal, it executes SIGINT's default action, which is terminating the process. So my question is:
Does the process A resume first or handler get executed first?
For simplicity, let's assume process A has only a single thread, which is blocking in a pause() call, and exactly one signal gets sent to the process.
Does the process A resume first or handler get executed first?
The signal handler gets executed first, then the pause() call returns.
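You can observe this ordering with a small sketch. Assumptions: an interval timer delivering SIGALRM stands in for the Ctrl-C, a handler is installed instead of relying on the default action, and the names on_alarm and handler_runs_before_pause_returns are invented:

```c
#define _GNU_SOURCE
#include <errno.h>
#include <signal.h>
#include <string.h>
#include <sys/time.h>
#include <unistd.h>

static volatile sig_atomic_t handler_ran;

static void on_alarm(int signo)
{
    (void)signo;
    handler_ran = 1;               /* only async-signal-safe work here */
}

/* Returns 1 if the handler had already run when pause() returned with
 * EINTR, 0 if not, -1 on setup failure. */
int handler_runs_before_pause_returns(void)
{
    struct sigaction sa, old_sa;
    memset(&sa, 0, sizeof sa);
    sa.sa_handler = on_alarm;
    sigemptyset(&sa.sa_mask);
    if (sigaction(SIGALRM, &sa, &old_sa) == -1)
        return -1;

    handler_ran = 0;

    /* Periodic 100 ms timer: the repeat keeps pause() from hanging if the
     * first tick were to fire before pause() is entered. */
    struct itimerval tv;
    memset(&tv, 0, sizeof tv);
    tv.it_value.tv_usec = 100000;
    tv.it_interval.tv_usec = 100000;
    if (setitimer(ITIMER_REAL, &tv, NULL) == -1)
        return -1;

    int rc = pause();              /* sleeps until a handler has run */
    int saved_errno = errno;
    int seen = handler_ran;        /* sampled right after pause() returns */

    struct itimerval off;
    memset(&off, 0, sizeof off);
    setitimer(ITIMER_REAL, &off, NULL);   /* disarm the timer */
    sigaction(SIGALRM, &old_sa, NULL);    /* restore the old disposition */

    return rc == -1 && saved_errno == EINTR && seen == 1;
}
```

With the default action of SIGINT there is nothing to observe after pause(), because the process is terminated and pause() never returns.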
What if there are multiple signals?
Standard signals are not queued, so if you send say two INT signals to the process very quickly in succession, only one of them is delivered.
If there are multiple signals, the order is unspecified.
What about POSIX realtime signals? (SIGRTMIN+0 to SIGRTMAX-0)
They are just like standard named signals, except they are queued (to a limit), and if more than one of them is pending, they get delivered in increasing numerical order.
If there are both standard and realtime signals pending, it is unspecified which ones get delivered first; although in practice, in Linux and many other systems, the standard signals get delivered first, then the realtime ones.
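The queueing difference is easy to see in a sketch, assuming Linux and glibc. The function name send_and_drain_signals is hypothetical, and sigtimedwait() is used instead of handlers so the dequeue order is visible directly. Everything is sent while blocked, then drained:

```c
#define _GNU_SOURCE
#include <errno.h>
#include <signal.h>
#include <time.h>
#include <unistd.h>

/* Sends a burst of signals to the calling process while they are blocked,
 * then drains them. Stores up to `max` signal numbers in dequeue order in
 * out[] and returns the total number dequeued, or -1 on failure. */
int send_and_drain_signals(int out[], int max)
{
    sigset_t set, old;
    sigemptyset(&set);
    sigaddset(&set, SIGUSR1);
    sigaddset(&set, SIGRTMIN);
    sigaddset(&set, SIGRTMIN + 1);
    if (sigprocmask(SIG_BLOCK, &set, &old) == -1)
        return -1;

    pid_t self = getpid();
    kill(self, SIGUSR1);
    kill(self, SIGUSR1);           /* standard signal: merged with the first */
    kill(self, SIGRTMIN + 1);      /* sent before the lower-numbered one */
    kill(self, SIGRTMIN);
    kill(self, SIGRTMIN);          /* realtime signal: queued a second time */

    struct timespec zero = { 0, 0 };
    int n = 0;
    for (;;) {
        int sig = sigtimedwait(&set, NULL, &zero);
        if (sig == -1) {
            if (errno == EINTR)
                continue;
            break;                 /* EAGAIN: nothing left pending */
        }
        if (n < max)
            out[n] = sig;
        n++;
    }

    sigprocmask(SIG_SETMASK, &old, NULL);
    return n;
}
```

On a typical Linux system this dequeues four signals: one SIGUSR1, then SIGRTMIN twice, then SIGRTMIN+1. The position of SIGUSR1 relative to the realtime signals is the part that is only "in practice", not guaranteed.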
What if there are multiple threads in the process?
The kernel will pick one thread among those that do not have the signal masked (via sigprocmask() or pthread_sigmask()), and use that thread to deliver the signal to the signal handler.
If more than one thread is blocked in a pause() call, one of them gets woken up. If more than one signal is pending, it is unspecified whether the one woken thread handles them all, or if more than one thread is woken up.
In general, I warmly recommend reading the man 7 signal, man 7 signal-safety, man 2 sigaction, man 2 sigqueue, and man 2 sigwaitinfo man pages. (While the links go to the Linux man pages project, each of the pages includes a Conforming To section naming the related standards, and Linux-specific behaviour is clearly marked.)
After attaching to a pthread using its pid and manipulating the contents of its debug registers, I wait using waitpid(-1, &status, __WALL). I would like to be able to stop that thread and make additional manipulations (defining another breakpoint, etc.).
When I try sending a signal using kill() and waiting for the thread to be ready for additional ptrace requests, it works fine for just one target thread. On the other hand, when the number of traced threads increases, I get stuck within the waitpid() call and never get unblocked.
Is there a safe and fast mechanism to stop an attached thread that is running, so that I can make additional modifications?
cheers.
When sending a signal to a thread, do not use the pid. Sending a signal to a process (which is what you are doing) sends it to some random thread within that process, which is almost certainly not what you would like to do. The tool to send threads signals is pthread_kill.
That's where things become a little more hairy. The ptrace interface uses a "thread ID" (or tid). These are framed in the same context as process IDs, i.e. integers. pthread_kill, on the other hand, uses the pthread_t type, which is an opaque type, and is not the same thing.
Since using ptrace means you are in dark magic land already, the simplest solution is to use tgkill. Just place your tid and pid in the relevant fields, and you're golden.
Of course, tgkill has traditionally not been an exported function (glibc only gained a wrapper for it in version 2.30). On older systems you'll need to wrap it in syscall in order to invoke it.
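A sketch of such a wrapper, assuming Linux. The names send_signal_to_thread and demo_tgkill_self are made up, and the demo signals the calling thread itself rather than a tracee, just so it can run on its own:

```c
#define _GNU_SOURCE
#include <signal.h>
#include <string.h>
#include <sys/syscall.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical wrapper around the raw tgkill system call.
 * tgid is the thread group ID (the process's pid), tid the kernel thread ID. */
static int send_signal_to_thread(pid_t tgid, pid_t tid, int signo)
{
    return (int)syscall(SYS_tgkill, tgid, tid, signo);
}

static volatile sig_atomic_t got_signal;

static void on_usr1(int signo)
{
    (void)signo;
    got_signal = 1;
}

/* Demonstration only: direct SIGUSR1 at the calling thread.
 * Returns 1 if the handler ran, 0 if not, -1 on setup failure. */
int demo_tgkill_self(void)
{
    struct sigaction sa, old_sa;
    memset(&sa, 0, sizeof sa);
    sa.sa_handler = on_usr1;
    sigemptyset(&sa.sa_mask);
    if (sigaction(SIGUSR1, &sa, &old_sa) == -1)
        return -1;

    got_signal = 0;
    pid_t tid = (pid_t)syscall(SYS_gettid);   /* kernel tid, not pthread_t */
    int rc = send_signal_to_thread(getpid(), tid, SIGUSR1);

    /* An unblocked signal sent to the calling thread is handled before the
     * system call returns to this code. */
    int seen = got_signal;

    sigaction(SIGUSR1, &old_sa, NULL);
    return rc == 0 && seen == 1;
}
```

In the tracer you would presumably call something like send_signal_to_thread(pid, tid, SIGSTOP) for the one thread you want to stop and then wait for that tid to report a stop.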
If a send causes a SIGPIPE signal, which thread would handle it? The thread which did the send, or a random thread? In other words, does the Linux system send the signal with kill or with pthread_kill?
Asynchronous signals like SIGPIPE can go to any thread. You can use signal masks to limit which of the threads is eligible.
Synchronous signals like SIGSEGV will be delivered on the thread that caused them.
Summary
The answer to this question has two facets: How the system in question should behave and how it actually behaves.
Since most programmers expect Linux to be mostly POSIX-compatible, we can look into that standard, which actually unambiguously specifies the behavior – the signal is sent directly to the thread which did the write. But whether Linux adheres to it is unclear and Linux documentation is not helpful here. An examination of Linux behavior suggests it conforms to POSIX, but doesn't prove it, and a reading of the source gives us the necessary proof about the current version of Linux.
tl;dr: It is always handled by the thread that did the write.
POSIX Standard
The POSIX standard mandates (since IEEE Std. 1003.1-2001/Cor 2-2004) that SIGPIPE generated as a result of write to a pipe with no readers be delivered to the thread doing the write. See EPIPE in the ERRORS section of the description of write() (emphasis mine):
[EPIPE] An attempt is made to write to a pipe or FIFO that is not open for reading by any process, or that only has one end open. A SIGPIPE signal shall also be sent to the thread.
Linux documentation
That said, it is not clear whether Linux handles this correctly. The page man 7 signal doesn't give concrete lists of thread- and process-directed signals, just examples, and its definition of thread-directed signals doesn't include SIGPIPE:
A signal may be thread-directed because it was generated as a consequence of executing a specific machine-language instruction that triggered a hardware exception […]
SIGPIPE is not a result of a specific instruction, nor is it triggered by a hardware exception.
Glibc documentation doesn't discuss kernel-generated synchronous thread-directed signals at all (i.e. not even SIGSEGV or SIGBUS are discussed as being thread-directed), and there are years-old reports of bugs in NPTL, although these may have been fixed in the meantime.
Observable Linux behavior
I wrote a program which spawns a thread, which blocks SIGPIPE using pthread_sigmask, creates a pipe pair, closes the read end and writes a byte into the write end. If the signal is thread-directed, nothing should happen until the signal is unblocked again. If the signal is process-directed, the main thread should handle the signal and the process should die. The reason for this again comes from POSIX: If there is a thread which has the (process-directed) signal unblocked, it should be delivered there instead of queueing:
Signals generated for the process shall be delivered to exactly one of those threads within the process which […] has not blocked delivery of the signal. If […] all threads within the process block delivery of the signal, the signal shall remain pending on the process until […] a thread unblocks delivery of the signal, or the action associated with the signal is set to ignore the signal.
My experimentation suggests that on modern (2020) Linux with recent Glibc the signal is indeed directed to the thread which did the write, because blocking it with pthread_sigmask in the writing thread prevents SIGPIPE from being delivered until it's unblocked.
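The program itself is not reproduced here, but a comparable experiment can be sketched as follows. This is a reconstruction under assumptions, not the original code: the names are invented, a handler is installed in place of the default action so a process-directed signal would be noticed instead of killing the process, and the pending signal is consumed with sigwait() instead of being unblocked. Compile with -pthread.

```c
#define _GNU_SOURCE
#include <errno.h>
#include <pthread.h>
#include <signal.h>
#include <string.h>
#include <unistd.h>

static volatile sig_atomic_t main_handler_ran;

static void on_sigpipe(int signo)
{
    (void)signo;
    main_handler_ran = 1;   /* would run on a thread with SIGPIPE unblocked */
}

/* Writer thread: blocks SIGPIPE, writes into a pipe with no reader and
 * reports through *arg whether SIGPIPE is now pending for this thread. */
static void *writer(void *arg)
{
    int *pending_here = arg;
    sigset_t set, pend;
    sigemptyset(&set);
    sigaddset(&set, SIGPIPE);
    pthread_sigmask(SIG_BLOCK, &set, NULL);

    int fds[2];
    if (pipe(fds) == -1)
        return NULL;
    close(fds[0]);                           /* no reader left */
    char byte = 'x';
    ssize_t n = write(fds[1], &byte, 1);     /* expected to fail with EPIPE */
    int got_epipe = (n == -1 && errno == EPIPE);
    close(fds[1]);

    usleep(50 * 1000);   /* time for a process-directed signal to reach main */
    sigemptyset(&pend);
    sigpending(&pend);
    int is_pending = sigismember(&pend, SIGPIPE) == 1;
    *pending_here = got_epipe && is_pending;

    if (is_pending) {
        int sig;
        sigwait(&set, &sig);                 /* consume it before exiting */
    }
    return NULL;
}

/* Returns 1 if SIGPIPE stayed pending on the writing thread and never
 * reached the main thread, 0 otherwise, -1 on setup failure. */
int sigpipe_is_thread_directed(void)
{
    struct sigaction sa, old_sa;
    memset(&sa, 0, sizeof sa);
    sa.sa_handler = on_sigpipe;
    sigemptyset(&sa.sa_mask);
    if (sigaction(SIGPIPE, &sa, &old_sa) == -1)
        return -1;

    main_handler_ran = 0;
    int pending_in_writer = 0;
    pthread_t t;
    if (pthread_create(&t, NULL, writer, &pending_in_writer) != 0) {
        sigaction(SIGPIPE, &old_sa, NULL);
        return -1;
    }
    pthread_join(t, NULL);   /* main thread keeps SIGPIPE unblocked meanwhile */

    sigaction(SIGPIPE, &old_sa, NULL);
    return pending_in_writer && !main_handler_ran;
}
```

A result of 1 matches the observation above: the signal waits on the writing thread even though another thread was free to take it.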
Linux 5.4.28 source
The behavior observed above doesn't prove anything, because it is entirely possible that Linux simply violates POSIX in several places and the signal delivery depends on some factors I didn't take into account. To get the proof we seek, we can read the source. Of course, this only tells us about the current behavior, not about the intended one – but if we find the current behavior to be POSIX-conforming, it is probably here to stay.
Disclaimer: I'm not a kernel hacker and the following is a result of a cursory reading of the sources. I might have missed something important.
In kernel/signal.c, there is a SYNCHRONOUS_MASK listing the synchronous signals which are handled specially. These are SIGSEGV, SIGBUS, SIGILL, SIGTRAP, SIGFPE and SIGSYS – SIGPIPE is not in the list. However, that doesn't answer the question – it can be thread-directed without being synchronous.
So how is SIGPIPE sent? It originates from pipe_write() in fs/pipe.c, which calls send_sig() on task_struct current. The use of current already hints that the signal is thread-directed, but let's press on. The send_sig() function is defined in kernel/signal.c and through some indirection ultimately calls __send_signal() with pid_type type = PIDTYPE_PID.
In Linux terminology, PID refers to a single thread. And sure enough, with those parameters, the pending signal list is the thread-specific one, not the shared one; and complete_signal() (called at the end of the function) doesn't even try to find a thread to wake up, it just returns because the thread has already been chosen. I don't fully understand how the signal queues work, but it seems that the queue is per-thread and so the current thread is the one that gets the signal.