linux: finding cause of realtime signal

I have a linux test prog with some custom drivers. When running, my prog will suddenly exit with "realtime signal 5". The core dump shows the signal being handled by a thread that was in a nanosleep call, so I guess it's an asynchronous signal coming from somewhere.
Can anyone recommend a strategy for tracking down the origin of the signal? E.g. are there specific functions in the kernel I could add some logging to (send_sig, maybe)? Thanks.
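
Before patching the kernel, one userspace option is to catch the signal with SA_SIGINFO and log who sent it: for signals sent with kill(), sigqueue(), or POSIX timers, siginfo_t carries si_code, si_pid and si_uid. Below is a minimal sketch, assuming "realtime signal 5" corresponds to SIGRTMIN+5 on your system (glibc's strsignal() numbers real-time signals relative to SIGRTMIN); adjust the signal number to whatever your core dump actually reports.

    /* Sketch: log the sender of the offending realtime signal before dying.
     * SIGRTMIN + 5 is an assumption; use the number from your core dump. */
    #define _GNU_SOURCE
    #include <signal.h>
    #include <stdio.h>
    #include <string.h>
    #include <unistd.h>

    static void rt_handler(int sig, siginfo_t *info, void *ctx)
    {
        (void)ctx;
        char buf[128];
        /* snprintf is not formally async-signal-safe, but acceptable for debugging */
        int n = snprintf(buf, sizeof buf,
                         "signal %d: si_code=%d si_pid=%ld si_uid=%ld\n",
                         sig, info->si_code, (long)info->si_pid, (long)info->si_uid);
        if (n > 0)
            write(STDERR_FILENO, buf, (size_t)n);

        /* restore the default action and re-raise so the process still dies */
        signal(sig, SIG_DFL);
        raise(sig);
    }

    int main(void)
    {
        struct sigaction sa;
        memset(&sa, 0, sizeof sa);
        sa.sa_sigaction = rt_handler;   /* three-argument handler */
        sa.sa_flags = SA_SIGINFO;       /* ask the kernel to fill in siginfo_t */
        sigemptyset(&sa.sa_mask);

        if (sigaction(SIGRTMIN + 5, &sa, NULL) == -1)
            perror("sigaction");

        pause();                        /* stand-in for the real program's work */
        return 0;
    }

As an alternative to adding logging to send_sig(), recent kernels also expose the signal:signal_generate and signal:signal_deliver tracepoints, which ftrace or perf can record system-wide without rebuilding anything.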

Related

Returning from signal handler of fatal-by-default signal

I want to have a signal handler on the fatal signals that default to dumping core, which will log the occurrence; the core should then still be dumped (unless disabled with ulimit or core_pattern).
I have tested (on Linux 4.15) that if the signal handler simply returns, this is what happens. However, I have not found any explicit statement in the documentation that would clearly state this.
So is it defined, in POSIX or Linux documentation, what shall happen when the signal handler returns, and where?
I did the first test by tweaking the code I needed to make work, and it turned out to be more convoluted than I thought. When I tested with a simple example program, the only approach that works in all cases is to reset the handler and re-raise the signal, as described in the accepted answer.
Core files have a definition in POSIX.1-2017 XBD (3.117 Core File):
A file of unspecified format that may be generated when a process terminates abnormally.
POSIX.1-2017 XSH (2.4.3 Signal Actions, under SIG_DFL) contains the following text (with any emphasized part from here on meaning that the corresponding text in the standard is XSI-shaded):
If the default action is to terminate the process abnormally, the process is terminated as if by a call to _exit(), except that the status made available to wait(), waitid(), and waitpid() indicates abnormal termination by the signal.
If the default action is to terminate the process abnormally with additional actions, implementation-defined abnormal termination actions, such as creation of a core file, may also occur.
In XBD (13. Headers, under <signal.h>) we see SIGABRT, SIGBUS, SIGFPE, SIGILL, SIGQUIT, SIGSEGV, SIGSYS, SIGTRAP, SIGXCPU and SIGXFSZ tagged as
A -- Abnormal termination of the process with additional actions.
So from a POSIX perspective you can't rely on a core file being generated, irrespective of signal dispositions.
However, every signal with a default action of "A" in POSIX is listed with a default disposition of "Core" in the Linux manual (signal(7)). That may be what the following excerpt of the manual about SIGSYS, SIGXCPU and SIGXFSZ refers to:
Linux 2.4 conforms to the POSIX.1-2001 requirements for these signals, terminating the process with a core dump.
As the POSIX excerpts above tell us, it's not a hard requirement in POSIX.1-2017.
Now that still doesn't answer the question of whether registering a signal-catching function nullifies the signal action of abnormal termination. I believe that if it does, it results in undefined behavior for at least a few signals, as per the following paragraph from XSH (2.4.3 Signal Actions, under Pointer to a Function):
The behavior of a process is undefined after it returns normally from a signal-catching function for a SIGBUS, SIGFPE, SIGILL, or SIGSEGV signal that was not generated by kill(), sigqueue(), or raise().
So to avoid UB in all cases, I believe you have to reset the signal disposition to SIG_DFL and then re-raise() the signal from within the signal handler anyway. Also, any handler catching those signals should probably run on an alternate signal stack, though I'm not entirely sure whether that makes the approach generally safe.
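
A minimal sketch of the reset-and-re-raise pattern described above (the signal list and the log message are illustrative; only async-signal-safe calls are used in the handler):

    /* Sketch: log the fatal signal, then restore SIG_DFL and re-raise it so the
     * default action (abnormal termination with a core dump) still takes place. */
    #include <signal.h>
    #include <stddef.h>
    #include <string.h>
    #include <unistd.h>

    static void fatal_handler(int sig)
    {
        /* write(2) is async-signal-safe; avoid printf/syslog/malloc here */
        static const char msg[] = "fatal signal caught, re-raising for core dump\n";
        write(STDERR_FILENO, msg, sizeof msg - 1);

        struct sigaction dfl;
        memset(&dfl, 0, sizeof dfl);
        dfl.sa_handler = SIG_DFL;
        sigemptyset(&dfl.sa_mask);
        sigaction(sig, &dfl, NULL);     /* back to the default disposition */

        /* the signal is blocked while this handler runs, so raise() marks it
         * pending; it is delivered with the default action when we return */
        raise(sig);
    }

    int main(void)
    {
        struct sigaction sa;
        memset(&sa, 0, sizeof sa);
        sa.sa_handler = fatal_handler;
        sigemptyset(&sa.sa_mask);

        const int fatal[] = { SIGSEGV, SIGBUS, SIGFPE, SIGILL, SIGABRT };
        for (size_t i = 0; i < sizeof fatal / sizeof fatal[0]; ++i)
            sigaction(fatal[i], &sa, NULL);

        raise(SIGSEGV);                 /* demo: trigger one of the handled signals */
        return 0;
    }

If stack overflow is a possible cause of the hardware-generated signals, the handler above should also be installed with SA_ONSTACK together with sigaltstack(), as mentioned above.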

Linux Signals and Interrupt handler

Reading about interrupts in Linux, I understand that their handlers run to completion (let's not consider the bottom halves here). So, assume that my code has a SIGINT handler registered (using the signal()/sigaction() call) with a while(1) loop in it (i.e. the handler never returns).
If I interrupt my program while it is running, shouldn't this scenario freeze my machine entirely? Won't my machine, with only one CPU core, go into an infinite loop?
What I mean is: since my interrupt handler never returns, won't the CPU be stuck executing only the while(1) code? (I.e. no other process would get the chance to run, because there would be no context switch/preemption inside the handler; or can the interrupt handler get preempted in the middle of the while(1) loop?)
You are definitely mixing up signal handlers and interrupt handlers, even though they are handled in similar ways. Unless you are writing kernel code, you will never deal with interrupt handlers directly.
But the ground rules for signal handlers are very similar: you should either return from the signal handler or terminate the program (the latter being the userland analogue of halting the whole system in kernel land). This includes exotic ways of leaving a signal handler, such as longjmp().
From the kernel's point of view, a process looping forever in a signal handler is no different from a process running the same loop in ordinary code such as main(). Entering a signal handler modifies the signal mask but doesn't change things radically: such a process can be stopped, traced, or killed in the same way as when it is not in a signal handler.
(All this does not apply to some special process classes with elevated privileges. For example, the X Window server can be special because it disables some kernel activity while handling the video adapter. But if you are writing such software, you presumably already know the relevant safety rules.)
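
A small demo of that point, assuming an ordinary desktop or server Linux: a SIGINT handler that never returns only makes this one process spin; the rest of the system keeps scheduling, and the process can still be killed from another terminal.

    /* Demo: a signal handler stuck in while(1) does not hang the machine.
     * Run it, press Ctrl-C, and the process spins inside the handler in user
     * mode; `kill -9 <pid>` from another shell still terminates it. */
    #include <signal.h>
    #include <stdio.h>
    #include <string.h>
    #include <unistd.h>

    static void sigint_handler(int sig)
    {
        (void)sig;
        while (1)
            ;   /* never returns: the process just burns CPU in user mode */
    }

    int main(void)
    {
        struct sigaction sa;
        memset(&sa, 0, sizeof sa);
        sa.sa_handler = sigint_handler;
        sigemptyset(&sa.sa_mask);
        sigaction(SIGINT, &sa, NULL);

        printf("pid %ld: press Ctrl-C here, then `kill -9 %ld` from another terminal\n",
               (long)getpid(), (long)getpid());
        while (1)
            pause();
    }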

Implementation of Signals under Linux and Windows?

I am not new to the use of signals in programming. I mostly work in C/C++ and Python.
But I am interested in knowing how signals are actually implemented in Linux (or Windows).
Does the OS check a signal descriptor table after each CPU instruction to see whether there are any registered signals left to process? Or is the process manager/scheduler responsible for this?
As signals are asynchronous, is it true that a CPU instruction can be interrupted before it completes?
The OS definitely does not process each and every instruction. No way. Too slow.
When the CPU encounters a problem (like division by 0, access to a restricted resource, or a memory location that is not backed by physical memory), it generates a special kind of interrupt called an exception (not to be confused with the high-level language exception abstraction in C++/Java/etc.).
The OS handles these exceptions. If so desired and if possible, it can reflect an exception back into the process in which it originated. The so-called Structured Exception Handling (SEH) in Windows is this kind of reflection, and C signals are presumably implemented using the same mechanism.
On the systems I'm familiar with (although I can't see why it should be much different elsewhere), signal delivery is done when the process returns from the kernel to user mode.
Let's consider the one cpu case first. There are three sources of signals:
the process sends a signal to itself
another process sends the signal
an interrupt handler (network, disk, usb, etc) causes a signal to be sent
In all those cases the target process is not running in userland but in kernel mode: either through a system call, or through a context switch (on a single CPU, the other process can only be running, and thus sending the signal, while our target process is not), or through an interrupt handler. So signal delivery is a simple matter of checking whether there are any signals to be delivered just before returning to userland from kernel mode.
In the multi cpu case if the target process is running on another cpu it's just a matter of sending an interrupt to the cpu it's running on. The interrupt does nothing other than force the other cpu to go into kernel mode and back so that signal processing can be done on the way back.
A process can send a signal to another process, and a process can register its own signal handler to handle a signal. SIGKILL and SIGSTOP are two signals that cannot be caught.
While a signal handler is executing, the same signal is blocked. That means that if another instance of the same signal arrives while the handler is running, it does not invoke the handler again (the signal is blocked); instead, a note is made that the signal has arrived (i.e. it is pending). Once the running handler finishes, the pending signal is handled. If you do not want the pending signal to be handled, you can ignore the signal.
The problem with the above scheme: assume process A has registered a signal handler for SIGUSR1.
1) Process A gets SIGUSR1 and executes signalhandler().
2) Process A gets SIGUSR1 again.
3) Process A gets SIGUSR1 again.
4) Process A gets SIGUSR1 again.
When step (2) occurs, the signal is marked as pending, i.e. it still needs to be served. But when step (3) occurs, the signal is simply dropped, because there is only one bit per signal available to indicate that it is pending. To avoid this problem, i.e. if we don't want to lose signals, we can use real-time signals, which are queued (see the sketch after this answer).
2) Handlers for different signals nest. E.g.:
1) the process is in the middle of the signal handler for SIGUSR1;
2) now it gets another signal, SIGUSR2;
3) it suspends the SIGUSR1 handler and runs the SIGUSR2 handler;
and once it is done with SIGUSR2, it resumes the SIGUSR1 handler.
3) IMHO, as far as I remember, the check for whether any signal has arrived for the process is done:
1) when a context switch happens.
Hope this helps to some extent.
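
The difference between the single "pending" bit of standard signals and the queueing of real-time signals can be seen with a small test program (a sketch; SIGRTMIN+1 and the payload values are arbitrary choices): a blocked SIGUSR1 sent three times is delivered once, while a blocked real-time signal sent three times is delivered three times.

    /* Sketch: standard signals coalesce while blocked, real-time signals queue.
     * We block SIGUSR1 and SIGRTMIN+1, send each of them three times with
     * sigqueue(), then unblock: SIGUSR1 arrives once, SIGRTMIN+1 three times. */
    #include <signal.h>
    #include <stdio.h>
    #include <string.h>
    #include <unistd.h>

    static void handler(int sig, siginfo_t *info, void *ctx)
    {
        (void)ctx;
        char buf[64];
        /* snprintf is not formally async-signal-safe, but fine for a demo */
        int n = snprintf(buf, sizeof buf, "delivered %d (value %d)\n",
                         sig, info->si_value.sival_int);
        if (n > 0)
            write(STDOUT_FILENO, buf, (size_t)n);
    }

    int main(void)
    {
        int rt = SIGRTMIN + 1;              /* arbitrary real-time signal */

        struct sigaction sa;
        memset(&sa, 0, sizeof sa);
        sa.sa_sigaction = handler;
        sa.sa_flags = SA_SIGINFO;
        sigemptyset(&sa.sa_mask);
        sigaction(SIGUSR1, &sa, NULL);
        sigaction(rt, &sa, NULL);

        /* block both signals so they stay pending while we send them */
        sigset_t block;
        sigemptyset(&block);
        sigaddset(&block, SIGUSR1);
        sigaddset(&block, rt);
        sigprocmask(SIG_BLOCK, &block, NULL);

        for (int i = 1; i <= 3; ++i) {
            union sigval v = { .sival_int = i };
            sigqueue(getpid(), SIGUSR1, v); /* coalesces: only one stays pending */
            sigqueue(getpid(), rt, v);      /* queues: all three stay pending */
        }

        sigprocmask(SIG_UNBLOCK, &block, NULL);   /* deliver everything now */
        sleep(1);                                  /* give the handlers a moment */
        return 0;
    }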

Is segmentation fault handler thread-safe?

When a segmentation fault occurs on Linux within a multithreaded application and the handler is called, are all other threads instantly stopped before the handler is called?
So, is it appropriate to rely on the fact that no parallel code will execute during segmentation fault handling?
Thank you.
From the signal(7) manual page:
A signal may be generated (and thus pending) for a process as a whole (e.g., when sent using kill(2)) or for a specific thread (e.g., certain signals, such as SIGSEGV and SIGFPE, generated as a consequence of executing a specific machine-language instruction are thread directed, as are signals targeted at a specific thread using pthread_kill(3)). A process-directed signal may be delivered to any one of the threads that does not currently have the signal blocked. If more than one of the threads has the signal unblocked, then the kernel chooses an arbitrary thread to which to deliver the signal.
This paragraph says that certain signals, such as SIGSEGV, are thread-directed, which should answer your question: the signal is delivered only to the thread whose instruction faulted, so the other threads are not stopped and you cannot rely on no parallel code executing while the handler runs.
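
A small demo of that, with hypothetical thread roles (a worker that keeps printing and a crasher that dereferences NULL): the SIGSEGV handler runs in the faulting thread while the worker thread carries on until the handler terminates the process. Compile with -pthread.

    /* Demo: a hardware-generated SIGSEGV is delivered to the thread that
     * faulted; the other thread is not stopped by the signal itself. */
    #define _GNU_SOURCE
    #include <pthread.h>
    #include <signal.h>
    #include <string.h>
    #include <unistd.h>

    static pthread_t crasher_tid;

    static void segv_handler(int sig)
    {
        (void)sig;
        static const char mine[]  = "SIGSEGV caught in the faulting thread\n";
        static const char other[] = "SIGSEGV caught in some other thread\n";
        if (pthread_equal(pthread_self(), crasher_tid))
            write(STDERR_FILENO, mine, sizeof mine - 1);
        else
            write(STDERR_FILENO, other, sizeof other - 1);
        _exit(1);               /* returning from a real SEGV handler would be UB */
    }

    static void *worker(void *arg)
    {
        (void)arg;
        for (;;) {              /* keeps running until the process exits */
            static const char tick[] = "worker still running\n";
            write(STDOUT_FILENO, tick, sizeof tick - 1);
            usleep(200000);
        }
        return NULL;
    }

    static void *crasher(void *arg)
    {
        (void)arg;
        sleep(1);               /* let the worker print a few lines first */
        volatile int *p = NULL;
        *p = 42;                /* deliberate fault: SIGSEGV hits this thread */
        return NULL;
    }

    int main(void)
    {
        struct sigaction sa;
        memset(&sa, 0, sizeof sa);
        sa.sa_handler = segv_handler;
        sigemptyset(&sa.sa_mask);
        sigaction(SIGSEGV, &sa, NULL);

        pthread_t w;
        pthread_create(&w, NULL, worker, NULL);
        pthread_create(&crasher_tid, NULL, crasher, NULL);
        pthread_join(crasher_tid, NULL);
        return 0;
    }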

Where linux signals are sent or processed inside the kernel?

How is the signalling (interrupt) mechanism handled in the kernel? The reason I ask is that somehow a SIGABRT signal is received by my application, and I want to find out where it comes from.
You should be looking in your application for the cause, not in the kernel.
Usually a process receives SIGABRT when it directly calls abort or when an assert fails. Finding exactly the piece of the kernel that delivers the signal will gain you nothing.
In conclusion, your code or a library your code is using is causing this. See abort(3) and assert.
cnicutar's answer is the best guess IMHO.
It is possible that the signal was emitted by another process, although in the case of SIGABRT it is most likely raised by the same process that receives it, via the abort(3) libc function.
If in doubt, you can run your application with strace -e kill yourapp your args ... to quickly check whether that kill system call is indeed invoked from within your program or its dependent libraries. Or use gdb's catch syscall.
Note that in some cases the kernel itself can emit signals, such as a SIGKILL when the infamous "OOM killer" goes into action.
BTW, signals are delivered asynchronously; they disrupt the normal flow of your program, which is why they are painful to trace. Besides machinery such as SystemTap, I don't know how to trace or log signal emission and delivery within the kernel.
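
One way to narrow down an in-process abort() (the common case per the accepted answer) is to catch SIGABRT, dump a backtrace, and then re-raise the signal so the core dump is still produced. A sketch using glibc's backtrace facilities (not formally async-signal-safe, but good enough as a debugging aid); compile with -rdynamic to get function names in the output:

    /* Sketch: log a backtrace when SIGABRT arrives, then re-raise it so the
     * default action (core dump) still happens. */
    #include <execinfo.h>
    #include <signal.h>
    #include <stdlib.h>
    #include <string.h>
    #include <unistd.h>

    static void abrt_handler(int sig)
    {
        void *frames[64];
        int n = backtrace(frames, 64);
        /* backtrace_symbols_fd writes straight to the fd, no malloc involved */
        backtrace_symbols_fd(frames, n, STDERR_FILENO);

        signal(sig, SIG_DFL);   /* restore the default action ... */
        raise(sig);             /* ... and let the process die with a core */
    }

    int main(void)
    {
        struct sigaction sa;
        memset(&sa, 0, sizeof sa);
        sa.sa_handler = abrt_handler;
        sigemptyset(&sa.sa_mask);
        sigaction(SIGABRT, &sa, NULL);

        abort();                /* demo: the backtrace should point at this call */
    }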
