How are threads terminated during a Linux crash?

If you have a multithreaded program (Linux 2.26 kernel), and one thread does something that causes a segfault, will the other threads still be scheduled to run? How are the other threads terminated? Can someone explain the process shutdown procedure with regard to multithreaded programs?

When a fatal signal is delivered to a thread, either the do_coredump() or the do_group_exit() function is called. do_group_exit() sets the thread group exit code and then signals all the other threads in the thread group to exit with zap_other_threads(), before exiting the current thread. (do_coredump() calls coredump_wait() which similarly calls zap_threads()).
zap_other_threads() posts a SIGKILL for every other thread in the thread group and wakes it up with signal_wake_up(). signal_wake_up() calls kick_process(), which forces the thread into kernel mode so that it can receive the signal, using an inter-processor interrupt (IPI) if necessary (e.g., if the thread is executing on another CPU).
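The effect of zap_other_threads() can be observed from user space. The following is a minimal sketch, assuming glibc and a build with `-pthread`: a forked child starts a second thread that raises SIGSEGV while the main thread sleeps in pause(), and the parent checks how the child died. `raise(SIGSEGV)` stands in for a real invalid memory access so the demo is deterministic; the helper names are hypothetical.

```c
#define _GNU_SOURCE
#include <pthread.h>
#include <signal.h>
#include <sys/resource.h>
#include <sys/wait.h>
#include <unistd.h>

/* Raises SIGSEGV in this thread only (raise() is thread-directed in a
   multithreaded program). No handler is installed, so the default
   action (terminate + core dump) applies. */
static void *faulting_worker(void *arg) {
    (void)arg;
    raise(SIGSEGV);
    return NULL; /* not reached */
}

/* Forks a child whose main thread sleeps forever while a second thread
   raises SIGSEGV. Returns the signal that terminated the child, or -1. */
int run_segfaulting_child(void) {
    pid_t pid = fork();
    if (pid < 0)
        return -1;
    if (pid == 0) {
        /* Disable core dumps so the demo leaves no core file behind. */
        struct rlimit no_core = {0, 0};
        setrlimit(RLIMIT_CORE, &no_core);

        pthread_t t;
        if (pthread_create(&t, NULL, faulting_worker, NULL) != 0)
            _exit(1);
        for (;;)
            pause(); /* the main thread never exits on its own */
    }
    int status;
    if (waitpid(pid, &status, 0) != pid)
        return -1;
    return WIFSIGNALED(status) ? WTERMSIG(status) : -1;
}
```

Calling `run_segfaulting_child()` should return `SIGSEGV`: the sleeping main thread is taken down together with the faulting thread.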

Will the other thread still be scheduled to run?
No. The SIGSEGV is delivered to the faulting thread, but its default action applies to the whole process: unless you've handled the SIGSEGV (which is almost always a bad idea), the process will exit, and all threads with it.
I suspect that the other threads aren't handled very nicely. If the handler calls exit() or _exit(), thread cleanup handlers won't get called. This may be a good thing: if your program is severely corrupted, it's going to be hard to trust much of anything after a segfault.
One note from the signal man page:
According to POSIX, the behaviour of a process is undefined after it ignores a SIGFPE, SIGILL, or SIGSEGV signal that was not generated by the kill(2) or the raise(3) functions.
After a segfault you really don't want to be doing anything other than getting the heck out of that program.
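If you do install a handler, one way to "get out" is to restrict it to async-signal-safe calls: report with write(2) and leave with _exit(2), skipping atexit handlers and stdio flushing. This is a sketch under the assumption that a fixed exit code is acceptable; `SEGV_EXIT_CODE` and the helper names are hypothetical, and `raise(SIGSEGV)` again stands in for a real fault inside a forked child.

```c
#define _GNU_SOURCE
#include <signal.h>
#include <string.h>
#include <sys/wait.h>
#include <unistd.h>

enum { SEGV_EXIT_CODE = 70 }; /* hypothetical exit status for "crashed" */

/* Uses only async-signal-safe calls (write, _exit): report, then leave
   immediately without running exit() or any cleanup. */
static void segv_handler(int sig) {
    (void)sig;
    static const char msg[] = "fatal: SIGSEGV, terminating\n";
    ssize_t ignored = write(STDERR_FILENO, msg, sizeof msg - 1);
    (void)ignored;
    _exit(SEGV_EXIT_CODE);
}

int install_segv_handler(void) {
    struct sigaction sa;
    memset(&sa, 0, sizeof sa);
    sa.sa_handler = segv_handler;
    sigemptyset(&sa.sa_mask);
    return sigaction(SIGSEGV, &sa, NULL);
}

/* Runs the handler in a forked child and returns the child's exit
   status, or -1 if it did not exit normally. */
int run_child_with_handler(void) {
    pid_t pid = fork();
    if (pid < 0)
        return -1;
    if (pid == 0) {
        if (install_segv_handler() != 0)
            _exit(1);
        raise(SIGSEGV); /* stand-in for a real invalid access */
        _exit(2);       /* not reached: the handler calls _exit() */
    }
    int status;
    if (waitpid(pid, &status, 0) != pid)
        return -1;
    return WIFEXITED(status) ? WEXITSTATUS(status) : -1;
}
```

The handler never returns on purpose: returning from a handler for a genuine SIGSEGV would re-execute the faulting instruction.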

Related

Do I need to check for my threads exiting?

I have an embedded application, running as a single process on Linux.
I use sigaction() to catch problems, such as segmentation faults.
The process has a few threads, all of which, like the app, should run forever.
My question is whether (and how) I should detect if one of the threads dies.
Would a segfault in a thread be caught by the application’s sigaction() handler?
I was thinking of using pthread_cleanup_push/pop, but this page says “If any thread within a process calls exit, _Exit, or _exit, then the entire process terminates”, so I wonder if a thread dying would be caught at the process level …
You don't necessarily need to check whether a child thread has completed.
If you need to do something after a child thread completes its processing, you can call pthread_join() from the main thread, so that it waits until the child thread finishes and can then do the rest. If you call pthread_exit() in the main thread, the main thread terminates once it is done, leaving the spawned threads to continue execution. The process then terminates only after all the threads have finished.
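A minimal pthread_join() sketch: the main thread starts a worker and blocks until it returns, collecting the worker's return value. The summing worker and the function names are hypothetical placeholders for real work.

```c
#include <pthread.h>
#include <stdint.h>

/* Hypothetical worker: sums 1..n and hands the result back through its
   return value, which pthread_join() collects. */
static void *sum_worker(void *arg) {
    intptr_t n = (intptr_t)arg;
    intptr_t total = 0;
    for (intptr_t i = 1; i <= n; i++)
        total += i;
    return (void *)total;
}

/* Starts the worker and waits for it. Returns the sum, or -1 if the
   thread could not be created or joined. */
intptr_t run_and_join(intptr_t n) {
    pthread_t t;
    void *result;
    if (pthread_create(&t, NULL, sum_worker, (void *)n) != 0)
        return -1;
    if (pthread_join(t, &result) != 0) /* blocks until sum_worker returns */
        return -1;
    return (intptr_t)result;
}
```

For example, `run_and_join(10)` returns 55 only after the worker has finished.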
If you want to check the status of the spawned threads, you can use a flag to record whether each one is still running. See this question for more details:
How do you query a pthread to see if it is still running?
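One way to implement such a flag, sketched under the assumption that C11 atomics are available: the worker clears an `atomic_bool` from a pthread_cleanup_push() handler, so the flag is cleared when the thread reaches pthread_cleanup_pop(1), calls pthread_exit(), or is cancelled. The struct and function names are hypothetical. Note that this cannot detect a thread "dying" from SIGSEGV, because that takes down the whole process.

```c
#include <pthread.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical per-thread status record. */
struct worker_status {
    atomic_bool running;
};

/* Cleanup handler: runs on pthread_cleanup_pop(1), pthread_exit(),
   or cancellation. */
static void mark_stopped(void *arg) {
    struct worker_status *st = arg;
    atomic_store(&st->running, false);
}

static void *worker(void *arg) {
    struct worker_status *st = arg;
    pthread_cleanup_push(mark_stopped, st);
    /* ... real work would go here ... */
    pthread_cleanup_pop(1); /* 1 = run mark_stopped now */
    return NULL;
}

int start_worker(pthread_t *t, struct worker_status *st) {
    /* Initialise before the thread exists, so it cannot race with us. */
    atomic_init(&st->running, true);
    return pthread_create(t, NULL, worker, st);
}

bool worker_is_running(struct worker_status *st) {
    return atomic_load(&st->running);
}
```

Any thread can poll `worker_is_running()` without blocking, unlike pthread_join().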

pthread_sigmask not working properly with aio callback threads

My application is sometimes terminating from SIGIO or SIGUSR1 signals even though I have blocked these signals.
My main thread starts by blocking SIGIO and SIGUSR1, then issues two AIO read operations. These operations use threads for notification about operation status. The notify functions (invoked as detached threads) start another AIO operation (they manipulate the data that has been read and start writing it back to the file), and notification for that operation is handled by sending a signal (one operation uses SIGIO, the other uses SIGUSR1) to this process. I am receiving these signals synchronously by calling sigwait in the main thread. Unfortunately, my program sometimes crashes, terminated by a SIGUSR1 or SIGIO signal (which should be blocked by the signal mask).
One possible solution is to set SIG_IGN handlers for them, but this doesn't solve the problem: their handlers shouldn't be invoked at all; rather, the signals should be retrieved from the pending set by sigwait in the next iteration of the main program loop.
I have no idea which thread handles these signals in this manner. Maybe init receives them? Or some shell thread?
I'd hazard a guess that the signal is being received by one of your AIO callback threads, or by the very thread which generates the signal. (Prove me wrong and I'll delete this answer.)
Unfortunately per the standard, "[t]he signal mask of [a SIGEV_THREAD] thread is implementation-defined." For example, on Linux (glibc 2.12), if I block SIGUSR1 in main, then contrive to run a SIGEV_THREAD handler from an aio_read call, the handler runs with SIGUSR1 unblocked.
This makes SIGEV_THREAD handlers unsuitable for an application that must reliably and portably handle signals.
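For threads you create yourself, the reliable pattern is to block the signals with pthread_sigmask() before creating any thread (new threads inherit the mask) and collect them with sigwait(). The sketch below shows only that pattern; an ordinary thread calling kill() stands in for an AIO completion signal. Avoiding SIGEV_THREAD altogether, e.g. by requesting SIGEV_SIGNAL notification so no implementation-created threads are involved, is a common alternative but an assumption on my part, not something from the answer above. Function names are hypothetical.

```c
#define _GNU_SOURCE
#include <pthread.h>
#include <signal.h>
#include <unistd.h>

/* Sends a process-directed SIGUSR1. This thread inherited the blocked
   mask from its creator, so no thread can take the default action. */
static void *notifier(void *arg) {
    (void)arg;
    kill(getpid(), SIGUSR1);
    return NULL;
}

/* Blocks SIGIO and SIGUSR1 before creating any thread, then receives
   the signal synchronously. Returns the signal number, or -1. */
int block_then_sigwait(void) {
    sigset_t set;
    sigemptyset(&set);
    sigaddset(&set, SIGIO);
    sigaddset(&set, SIGUSR1);
    if (pthread_sigmask(SIG_BLOCK, &set, NULL) != 0)
        return -1;

    pthread_t t;
    if (pthread_create(&t, NULL, notifier, NULL) != 0)
        return -1;

    int sig = -1;
    if (sigwait(&set, &sig) != 0) /* dequeues the pending signal */
        sig = -1;
    pthread_join(t, NULL);
    return sig;
}
```

Because every thread has SIGUSR1 blocked, the signal stays pending on the process until sigwait() picks it up, and `block_then_sigwait()` returns `SIGUSR1`.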

When does a process handle a signal

I want to know when a Linux process handles a signal.
Assuming that the process has installed a signal handler for a signal, I want to know when the process's normal execution flow would be interrupted and the signal handler called.
According to http://www.tldp.org/LDP/tlk/ipc/ipc.html, the process would handle the signal when it exits from a system call. This would mean that a normal instruction like a = b+c (or its equivalent machine code) would not be interrupted because of signal.
Also, there are system calls that get interrupted (and fail with EINTR or get restarted) upon receiving a signal. This means that the signal is processed even before the system call completes. This behaviour seems to conflict with what I mentioned in the previous paragraph.
So, I am not clear on when the signal is processed, and in which process states it would be handled by the process. Can it be interrupted:
Anytime it enters from kernel space to user space, or
Anytime it is in user space, or
Anytime the process is scheduled for execution by the scheduler
Thanks!
According to http://www.tldp.org/LDP/tlk/ipc/ipc.html, the process would handle the signal when it exits from a system call. This would mean that a normal instruction like a = b+c (or its equivalent machine code) would not be interrupted because of signal.
Well, if that were the case, a CPU-intensive process would not obey the process scheduler. The scheduler, in fact, can interrupt a process at any point in time once its time quantum has elapsed, unless it is a FIFO real-time process.
A more accurate definition: one point at which a signal is delivered to the process is when control flow leaves kernel mode to resume executing user-mode code. That doesn't necessarily involve a system call; returning from a hardware interrupt (such as the timer interrupt) or from a page fault counts too.
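A small experiment consistent with this: a loop doing pure user-mode arithmetic (the `a = b+c` case from the question, no system calls) is still interrupted by a SIGALRM handler, because the kernel delivers the signal when it returns to user mode after an interrupt. This is a sketch assuming Linux and a 50 ms one-shot `setitimer(ITIMER_REAL)` timer; the function names are hypothetical.

```c
#define _GNU_SOURCE
#include <signal.h>
#include <string.h>
#include <sys/time.h>

static volatile sig_atomic_t alarm_fired = 0;

static void on_alarm(int sig) {
    (void)sig;
    alarm_fired = 1;
}

/* Spins without entering the kernel until the SIGALRM handler sets the
   flag. Returns the number of iterations, or -1 on setup failure. */
long spin_until_alarm(void) {
    struct sigaction sa;
    memset(&sa, 0, sizeof sa);
    sa.sa_handler = on_alarm;
    sigemptyset(&sa.sa_mask);
    if (sigaction(SIGALRM, &sa, NULL) != 0)
        return -1;

    /* One-shot timer: it_interval stays zero. */
    struct itimerval tv = { .it_value = { .tv_sec = 0, .tv_usec = 50000 } };
    alarm_fired = 0;
    if (setitimer(ITIMER_REAL, &tv, NULL) != 0)
        return -1;

    long iterations = 0;
    while (!alarm_fired)
        iterations++; /* no system call here, yet the handler still runs */
    return iterations;
}
```

If signals were only delivered on return from system calls, this loop would never terminate.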
A lot of the semantics of signal handling are documented (for Linux, anyway; other OSes probably have something similar, but not necessarily in the same place) in the section 7 signal manual page, which, if installed on your system, can be accessed like this:
man 7 signal
If manual pages are not installed, online copies are pretty easy to find.
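The EINTR behaviour mentioned in the question is covered in signal(7) under interruption of system calls by signal handlers. As a sketch, assuming Linux: a read() blocked on an empty pipe is interrupted by a SIGALRM handler installed without SA_RESTART, so it fails with EINTR instead of being restarted. The function names are hypothetical.

```c
#define _GNU_SOURCE
#include <errno.h>
#include <signal.h>
#include <string.h>
#include <sys/time.h>
#include <unistd.h>

static void on_alarm_noop(int sig) { (void)sig; }

/* Returns the errno from an interrupted read(), 0 if read() succeeded,
   or -1 on setup failure. */
int read_interrupted_errno(void) {
    int fds[2];
    if (pipe(fds) != 0)
        return -1;

    struct sigaction sa;
    memset(&sa, 0, sizeof sa);
    sa.sa_handler = on_alarm_noop;
    sigemptyset(&sa.sa_mask);
    sa.sa_flags = 0; /* no SA_RESTART: interrupted calls fail with EINTR */
    if (sigaction(SIGALRM, &sa, NULL) != 0)
        return -1;

    struct itimerval tv = { .it_value = { .tv_sec = 0, .tv_usec = 50000 } };
    if (setitimer(ITIMER_REAL, &tv, NULL) != 0)
        return -1;

    char c;
    /* The write end stays open and nothing is written, so this blocks. */
    ssize_t n = read(fds[0], &c, 1);
    int err = (n < 0) ? errno : 0;

    close(fds[0]);
    close(fds[1]);
    return err;
}
```

With `sa.sa_flags = SA_RESTART`, the same read() would instead be restarted after the handler returns and keep blocking.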

Is segmentation fault handler thread-safe?

When a segmentation fault occurs in a multithreaded application on Linux and the handler is called, are all other threads instantly stopped before the handler is called?
So, is it appropriate to rely on the fact that no parallel code will execute during segmentation fault handling?
Thank you.
From the signal(7) manual page:
A signal may be generated (and thus pending) for a process as a whole (e.g., when sent using kill(2)) or for a specific thread (e.g., certain signals, such as SIGSEGV and SIGFPE, generated as a consequence of executing a specific machine-language instruction are thread directed, as are signals targeted at a specific thread using pthread_kill(3)). A process-directed signal may be delivered to any one of the threads that does not currently have the signal blocked. If more than one of the threads has the signal unblocked, then the kernel chooses an arbitrary thread to which to deliver the signal.
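This paragraph says that certain signals, like SIGSEGV, are thread-directed: the handler runs in the thread that caused the fault, and the other threads are not stopped. So no, you cannot rely on nothing else running in parallel.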
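A sketch that makes this visible, assuming Linux and C11 lock-free atomics: one thread keeps incrementing a counter while the main thread raises SIGSEGV, and the handler (running in the main thread) watches the counter for up to two seconds. If it changes, the other thread kept running during the handler. `raise()` is used instead of a real fault so the handler can safely return; the names are hypothetical.

```c
#define _GNU_SOURCE
#include <pthread.h>
#include <signal.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <string.h>
#include <time.h>

static atomic_long progress;             /* bumped by the worker */
static atomic_bool stop_worker;
static volatile sig_atomic_t saw_progress;

static void *counting_worker(void *arg) {
    (void)arg;
    while (!atomic_load(&stop_worker))
        atomic_fetch_add(&progress, 1);
    return NULL;
}

/* Uses only lock-free atomics and clock_gettime() (async-signal-safe). */
static void segv_observer(int sig) {
    (void)sig;
    long before = atomic_load(&progress);
    struct timespec start, now;
    clock_gettime(CLOCK_MONOTONIC, &start);
    do {
        if (atomic_load(&progress) != before) {
            saw_progress = 1; /* the other thread ran during the handler */
            return;
        }
        clock_gettime(CLOCK_MONOTONIC, &now);
    } while (now.tv_sec - start.tv_sec < 2);
}

/* Returns 1 if the worker made progress while the handler ran, 0 if
   not, -1 on setup failure. */
int other_thread_runs_during_handler(void) {
    struct sigaction sa;
    memset(&sa, 0, sizeof sa);
    sa.sa_handler = segv_observer;
    sigemptyset(&sa.sa_mask);
    if (sigaction(SIGSEGV, &sa, NULL) != 0)
        return -1;

    atomic_store(&stop_worker, false);
    saw_progress = 0;
    pthread_t t;
    if (pthread_create(&t, NULL, counting_worker, NULL) != 0)
        return -1;

    raise(SIGSEGV); /* thread-directed: the handler runs in this thread */

    atomic_store(&stop_worker, true);
    pthread_join(t, NULL);
    return saw_progress;
}
```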

How to kill thread spawned using CLONE_THREAD and blocked on a shared resource in kernel space?

I have a test case in which threads are spawned using the CLONE_THREAD option of clone(). If I want to kill a particular thread, I suppose I should use SYS_tgkill with syscall(). But will the kill actually affect a thread if it is waiting in kernel space (say, in futex_wait)?
I tried killing a thread created in this manner, but when SIGKILL is sent to it, the whole process gets killed. Am I missing something in using syscall(SYS_tgkill, pid, tid, 9)?
SIGKILL always kills the target process. There is no way around this; it's unblockable, unignorable, and uncatchable.
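This is easy to confirm. In the sketch below (assuming Linux and glibc; helper names are hypothetical), a forked child sends SIGKILL to one of its own threads with `syscall(SYS_tgkill, ...)`, exactly as in the question, and the parent observes that the whole child was killed. If only the worker died, the child's main thread would reach `_exit(0)` after five seconds.

```c
#define _GNU_SOURCE
#include <pthread.h>
#include <signal.h>
#include <stdatomic.h>
#include <sys/syscall.h>
#include <sys/wait.h>
#include <unistd.h>

static atomic_int target_tid; /* worker's kernel thread ID, 0 until known */

static void *idle_worker(void *arg) {
    (void)arg;
    atomic_store(&target_tid, (int)syscall(SYS_gettid));
    for (;;)
        pause();
}

/* Returns the signal that ended the child, or -1. */
int tgkill_sigkill_one_thread(void) {
    pid_t pid = fork();
    if (pid < 0)
        return -1;
    if (pid == 0) {
        pthread_t t;
        if (pthread_create(&t, NULL, idle_worker, NULL) != 0)
            _exit(1);
        while (atomic_load(&target_tid) == 0)
            ; /* wait until the worker has published its tid */
        syscall(SYS_tgkill, getpid(), atomic_load(&target_tid), SIGKILL);
        sleep(5);
        _exit(0); /* reached only if SIGKILL had spared the main thread */
    }
    int status;
    if (waitpid(pid, &status, 0) != pid)
        return -1;
    return WIFSIGNALED(status) ? WTERMSIG(status) : -1;
}
```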
You could try sending another signal (like SIGUSR1 or SIGHUP or SIGRTMIN) and having a signal handler installed that calls pthread_exit (but note that this function is not async-signal-safe, so you must ensure that the signal handler did not interrupt another async-signal-unsafe function) or use cancellation (pthread_cancel) to stop the blocked thread.
This should work for normal blocking operations (like waiting for data from a pipe or socket), but it will not help you if the thread is in an uninterruptible sleep state (like trying to read from a badly scratched CD or failing hard disk).
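A minimal sketch of the cancellation route for the pipe case, assuming default (deferred) cancellation: read() is a cancellation point, so pthread_cancel() ends a thread blocked in it, and pthread_join() then reports `PTHREAD_CANCELED`. The function names are hypothetical; whether a thread blocked in a raw futex syscall behaves the same way is not shown here.

```c
#include <pthread.h>
#include <stdbool.h>
#include <unistd.h>

/* Blocks in read() on a pipe that nobody writes to. */
static void *blocked_reader(void *arg) {
    int fd = *(int *)arg;
    char c;
    ssize_t n = read(fd, &c, 1); /* cancellation point */
    (void)n;
    return NULL;
}

/* Returns true if the reader thread ended by cancellation. */
bool cancel_blocked_reader(void) {
    int fds[2];
    if (pipe(fds) != 0)
        return false;
    pthread_t t;
    if (pthread_create(&t, NULL, blocked_reader, &fds[0]) != 0)
        return false;

    /* Deferred cancellation acts at the next cancellation point, so it
       works whether or not the thread has already reached read(). */
    pthread_cancel(t);
    void *res = NULL;
    pthread_join(t, &res);

    close(fds[0]);
    close(fds[1]);
    return res == PTHREAD_CANCELED;
}
```

Unlike calling pthread_exit() from a signal handler, this avoids the async-signal-safety problem, at the cost of relying on the blocking call being a cancellation point.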
