As you can see, this is a sample from APUE.
#include "apue.h"
static void sig_int(int sig);
int main(int argc, char **argv)
{
char buf[MAXLINE];
pid_t pid;
int status;
if (signal(SIGINT, sig_int) == SIG_ERR) //sig_int is a simple handler function
err_sys("signal error");
printf("%% ");
while (fgets(buf, MAXLINE, stdin) != NULL) {
//This is a loop to implement a simple shell
}
return 0;
}
This is the signal handler:
void sig_int(int sig)
/* When I enter Ctrl+C, it'll say "got SIGSTOP", but then the program terminates. */
{
    if (sig == SIGINT)
        printf("got SIGSTOP\n");
}
When I enter Ctrl+C, it prints "got SIGSTOP", but then the program terminates immediately.
The short version is that the signal interrupts the current system call. You're doing fgets(), which is most likely blocked in a read() system call. The read() call is interrupted: it returns -1 and sets errno to EINTR.
This causes fgets to return NULL, your loop ends, and the program is finished.
Some background
glibc on Linux implements two different sets of semantics for signal(): one where system calls are automatically restarted across signals, and one where they are not.
When a signal occurs while the process is blocked in a system call, the system call is interrupted ("cancelled"). Execution resumes in the user-space application and the signal handler runs. The interrupted system call returns an error and sets errno to EINTR.
What happens next depends on whether system calls are restarted or not across signals.
If system calls are restartable, the runtime (glibc) simply retries the system call. For the read() system call, this would be similar to read() being implemented as:
ssize_t read(int fd, void *buf, size_t len)
{
    ssize_t sz;

    while ((sz = syscall_read(fd, buf, len)) == -1
           && errno == EINTR)
        ;
    return sz;
}
If system calls are not automatically restarted, read() would behave similar to:
ssize_t read(int fd, void *buf, size_t len)
{
    ssize_t sz;

    sz = syscall_read(fd, buf, len);
    return sz;
}
In the latter case it is up to your application to check whether read() failed because it was interrupted by a signal, to decide whether the failure was just a temporary one due to a signal being handled, and to retry the read() call if so.
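For example, if you stay with non-restarting semantics, the retry can be wrapped in a small helper. A minimal sketch, using a hypothetical fgets_retry() helper (not part of APUE or the question), might look like this:

#include <errno.h>
#include <stdio.h>

/* Hypothetical helper: retry fgets() when the underlying read()
 * was interrupted by a signal (EINTR). */
char *fgets_retry(char *buf, int size, FILE *fp)
{
    for (;;) {
        errno = 0;
        char *p = fgets(buf, size, fp);
        if (p != NULL)
            return p;                  /* got a line */
        if (!ferror(fp) || errno != EINTR)
            return NULL;               /* real EOF or real error */
        clearerr(fp);                  /* interrupted: clear the error and retry */
    }
}

The shell loop from the question would then call fgets_retry(buf, MAXLINE, stdin) instead of fgets().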
signal vs sigaction
By using sigaction() instead of signal(), you get control over whether system calls are restarted or not. The relevant flag you specify with sigaction() is
SA_RESTART
Provide behavior compatible with BSD signal semantics by making certain system calls restartable across signals. This flag is meaningful only when establishing a signal handler. See signal(7) for a discussion of system call restarting.
BSD vs SVR4 semantics
If you use signal(), the behavior depends on which semantics you get. As the description of SA_RESTART shows, with BSD signal semantics system calls are restarted. This is the default behavior in glibc.
Another difference is that BSD semantics leave the handler installed by signal() in place after a signal is handled. SVR4 semantics uninstall the handler, so your signal handler has to re-install itself if you want to catch further signals.
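Here is a minimal sketch of that SVR4-style pattern (my own illustration, not from the question): the handler re-arms itself first thing, and there is still a small window in which a second signal would hit the default disposition:

#include <signal.h>
#include <unistd.h>

static void sig_int(int sig)
{
    signal(SIGINT, sig_int);   /* re-arm: SVR4 reset the disposition to SIG_DFL */
    (void)sig;                 /* only async-signal-safe work here */
}

int main(void)
{
    signal(SIGINT, sig_int);
    for (;;)
        pause();               /* wait for signals */
}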
apue.h
The "apue.h" however, defines the macro _XOPEN_SOURCE 600 before including <signal.h>. This will cause signal() to have SVR4 semantics, where system calls are not restarted. Which will cause your fgets() call to "fail".
Don't use signal(), use sigaction()
Due to all these differences in behavior, use sigaction() instead of signal(). sigaction() lets you control what happens, instead of having the semantics change based on a (possibly) hidden #define as is the case with signal().
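For the program in the question, a minimal sigaction() sketch (my own, not the APUE code) that explicitly asks for restartable system calls would be:

#include <signal.h>
#include <stdio.h>
#include <string.h>

static void sig_int(int sig)
{
    (void)sig;                      /* only async-signal-safe work here */
}

int main(void)
{
    struct sigaction sa;

    memset(&sa, 0, sizeof(sa));
    sa.sa_handler = sig_int;
    sigemptyset(&sa.sa_mask);
    sa.sa_flags = SA_RESTART;       /* restart interrupted system calls */
    if (sigaction(SIGINT, &sa, NULL) < 0) {
        perror("sigaction");
        return 1;
    }
    /* ... the fgets() loop now survives Ctrl+C ... */
    return 0;
}

Set sa.sa_flags = 0 instead if you want the fgets() call to be interrupted.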
Related
I'm trying to understand what is happening behind the scenes with this code. This was asked at a final exam of an Intro to OS course I'm taking. As I understand it, when returning from kernel mode to user mode, the system checks if there are any pending signals (by examining the signal vector) and tends to them. So as the program returns from the kill syscall the OS sees that SIGUSR1 is pending and invokes the handler. By that logic, the handler should print "stack" in an infinite loop, but when running this code it actually prints "stackoverflow" in an infinite loop. Why is this happening?
Thanks in advance.
#include <signal.h>
#include <stdio.h>
#include <sys/types.h>
#include <unistd.h>

void handler(int signo) {
    printf("stack");
    kill(getpid(), SIGUSR1);
    printf("overflow\n");
}

int main() {
    struct sigaction act;
    act.sa_handler = &handler;
    sigaction(SIGUSR1, &act, NULL);
    kill(getpid(), SIGUSR1);
    return 0;
}
You actually have undefined behavior here, as you're calling sigaction with an incompletely initialized struct sigaction object. So depending on what values happen to be in the sa_flags and sa_mask fields, a variety of different things might happen.
Some of these would not block SIGUSR1 while the signal handler is running, which means a new handler invocation starts immediately when the first one calls kill (i.e. before the first handler returns and pops its stack frame). So you end up with many recursive handler stack frames (and outputs of 'stack') until the stack overflows.
Other combos would block the signal so it would not immediately trigger a second signal handler. Instead the signal would be "pending" until the first signal handler returns.
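For comparison, here is a minimal sketch with the struct fully initialized (printf in a handler is kept only to mirror the question; it is not async-signal-safe). With sa_flags = 0, SIGUSR1 stays blocked while the handler runs, so you deterministically get "stackoverflow" in an infinite loop; adding SA_NODEFER gives the recursive "stack" behaviour instead:

#include <signal.h>
#include <stdio.h>
#include <string.h>
#include <unistd.h>

void handler(int signo) {
    (void)signo;
    printf("stack");
    kill(getpid(), SIGUSR1);      /* blocked: stays pending until the handler returns */
    printf("overflow\n");
}

int main() {
    struct sigaction act;
    memset(&act, 0, sizeof(act)); /* no field left indeterminate */
    act.sa_handler = &handler;
    sigemptyset(&act.sa_mask);
    act.sa_flags = 0;             /* use SA_NODEFER here to allow recursion instead */
    sigaction(SIGUSR1, &act, NULL);
    kill(getpid(), SIGUSR1);
    return 0;
}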
We have a legacy embedded system which uses SDL to read images and fonts from an NFS share.
If there's a network problem, TTF_OpenFont() and IMG_Load() hang essentially forever. A test application reveals that open() behaves in the same way.
It occurred to us that a quick fix would be to call alarm() before the calls which open files on the NFS share. The man pages weren't entirely clear whether open() would fail with EINTR when interrupted by SIGALRM, so we put together a test app to verify this approach. We set up a signal handler with sigaction::sa_flags set to zero to ensure that SA_RESTART was not set.
The signal handler was called, but open() was not interrupted. (We observed the same behaviour with SIGINT and SIGTERM.)
I suppose the system treats open() as a "fast" operation even on "slow" infrastructure such as NFS.
Is there any way to change this behaviour and allow open() to be interrupted by a signal?
The man pages weren't entirely clear whether open() would fail with EINTR when interrupted by SIGALRM, so we put together a test app to verify this approach.
open(2) is a slow syscall (slow syscalls are those that can sleep forever and can be awakened when, and if, a signal is caught in the meantime) only for some file types. In general, opens that block the caller until some condition occurs are usually interruptible. Known examples include opening a FIFO (named pipe), or (back in the old days) opening a physical terminal device (it sleeps until the modem is dialed).
NFS-mounted filesystems probably don't cause open(2) to sleep in an interruptible state. After all, you are most likely opening a regular file, and in that case open(2) will not be interruptible.
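To see the difference, here is a minimal sketch (my own, not from the original answer) of a case where open() does sleep in an interruptible state: opening a FIFO for reading blocks until a writer appears, and SIGALRM aborts it with EINTR. It assumes /tmp/testfifo was created beforehand with mkfifo(1):

#include <errno.h>
#include <fcntl.h>
#include <signal.h>
#include <stdio.h>
#include <string.h>
#include <unistd.h>

static void on_alarm(int sig) { (void)sig; }    /* just interrupt the syscall */

int main(void)
{
    struct sigaction sa;

    memset(&sa, 0, sizeof(sa));
    sa.sa_handler = on_alarm;
    sigemptyset(&sa.sa_mask);
    sa.sa_flags = 0;                            /* no SA_RESTART */
    sigaction(SIGALRM, &sa, NULL);

    alarm(2);
    int fd = open("/tmp/testfifo", O_RDONLY);   /* blocks until a writer shows up */
    if (fd < 0 && errno == EINTR)
        printf("open(2) interrupted by SIGALRM\n");
    else if (fd >= 0)
        close(fd);
    return 0;
}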
Is there any way to change this behaviour and allow open() to be interrupted by a signal?
I don't think so, not without doing some (non-trivial) changes to the kernel.
I would explore the possibility of using setjmp(3) / longjmp(3) (see the manpage if you're not familiar; it's basically non-local gotos). You can initialize the environment buffer before calling open(2), and issue a longjmp(3) in the signal handler. Here's an example:
#include <stdio.h>
#include <stdlib.h>
#include <setjmp.h>
#include <unistd.h>
#include <signal.h>

static jmp_buf jmp_env;

void sighandler(int signo) {
    longjmp(jmp_env, 1);
}

int main(void) {
    struct sigaction sigact;

    sigact.sa_handler = sighandler;
    sigact.sa_flags = 0;
    sigemptyset(&sigact.sa_mask);
    if (sigaction(SIGALRM, &sigact, NULL) < 0) {
        perror("sigaction(2) error");
        exit(EXIT_FAILURE);
    }

    if (setjmp(jmp_env) == 0) {
        /* First time through
         * This is where we would open the file
         */
        alarm(5);

        /* Simulate a blocked open() */
        while (1)
            ; /* Intentionally left blank */

        /* If open(2) is successful here, don't forget to unset
         * the alarm
         */
        alarm(0);
    } else {
        /* SIGALRM caught, open(2) canceled */
        printf("open(2) timed out\n");
    }

    return 0;
}
It works by saving the context environment with the help of setjmp(3) before calling open(2). setjmp(3) returns 0 the first time through, and returns whatever value was passed to longjmp(3) otherwise.
Please be aware that this solution is not perfect. Here are some points to keep in mind:
There is a window of time between the call to alarm(2) and the call to open(2) (simulated here with while (1) { ... }) where the process may be preempted for a long time, so there is a chance the alarm expires before we actually attempt to open the file. Sure, with a large timeout such as 2 or 3 seconds this will most likely not happen, but it's still a race condition.
Similarly, there is a window of time between successfully opening the file and canceling the alarm where, again, the process may be preempted for a long time and the alarm may expire before we get the chance to cancel it. This is slightly worse because we have already opened the file so we will "leak" the file descriptor. Again, in practice, with a large timeout this will likely never happen, but it's a race condition nevertheless.
If the code catches other signals, there may be another signal handler in the midst of execution when SIGALRM is caught. Using longjmp(3) inside the signal handler will destroy the execution context of these other signal handlers, and depending on what they were doing, very nasty things may happen (inconsistent state if the signal handlers were manipulating other data structures in the program, etc.). It's as if it started executing, and suddenly crashed somewhere in the middle. You can fix it by: a) carefully setting up all signal handlers such that SIGALRM is blocked before they are invoked (this ensures that the SIGALRM handler does not begin execution until other handlers are done) and b) blocking these other signals before catching SIGALRM. Both actions can be accomplished by setting the sa_mask field of struct sigaction with the necessary mask (the operating system atomically sets the process's signal mask to that value before beginning execution of the handler and unsets it before returning from the handler). OTOH, if the rest of the code doesn't catch signals, then this is not a problem.
sleep(3) may be implemented with alarm(2), and alarm(2) and setitimer(2) share the same timer; if other portions in the code make use of any of these functions, they will interfere and the result will be a huge mess.
Just make sure you weigh these disadvantages before blindly using this approach. The use of setjmp(3) / longjmp(3) is usually discouraged and makes programs considerably harder to read, understand and maintain. It's not elegant, but right now I don't think you have a choice, unless you're willing to do some core refactoring in the project.
If you do end up using setjmp(3), then at the very least document these limitations.
Maybe there is a strategy of using a separate thread to do the open, so the main thread is not held up longer than desired.
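A minimal sketch of that idea, assuming glibc's pthread_timedjoin_np() GNU extension and a hypothetical NFS path; note that the stuck worker thread (and its eventual file descriptor) is simply abandoned here, which a real implementation would have to deal with:

#define _GNU_SOURCE             /* pthread_timedjoin_np() is a GNU extension */
#include <fcntl.h>
#include <pthread.h>
#include <stdint.h>
#include <stdio.h>
#include <time.h>
#include <unistd.h>

static void *open_worker(void *arg)
{
    const char *path = arg;
    return (void *)(intptr_t)open(path, O_RDONLY);
}

int main(void)
{
    pthread_t tid;
    void *res;
    struct timespec deadline;

    /* hypothetical path on the NFS share */
    pthread_create(&tid, NULL, open_worker, "/mnt/nfs/images/logo.png");

    clock_gettime(CLOCK_REALTIME, &deadline);
    deadline.tv_sec += 5;                          /* wait at most 5 seconds */

    if (pthread_timedjoin_np(tid, &res, &deadline) == 0) {
        printf("open() returned fd %d\n", (int)(intptr_t)res);
    } else {
        pthread_detach(tid);                       /* give up on the stuck open() */
        printf("open() timed out\n");
    }
    return 0;
}

Compile with -pthread.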
When my application crashes with a segmentation fault I'd like to get a core dump from the system. I do that by configuring beforehand
ulimit -c unlimited
I would also like to have an indication in my application logs that a segmentation fault has occurred. I do that by using sigaction(). If I do that, however, the signal does not reach its default handling and a core dump is not saved.
How can I have both the system core dump and a log line from my own signal handler at the same time?
Overwrite the default signal handler for SIGSEGV to call your custom logging function.
After it is logged, restore and trigger the default handler that will create the core dump.
Here is a sample program using signal:
#include <signal.h>
#include <sys/types.h>
#include <unistd.h>

void myLoggingFunction(void);   /* your own logging routine, defined elsewhere */

void sighandler(int signum)
{
    myLoggingFunction();
    // this is the trick: it will trigger the core dump
    signal(signum, SIG_DFL);
    kill(getpid(), signum);
}

int main()
{
    signal(SIGSEGV, sighandler);
    // ...
}
The same idea should also work with sigaction.
Source: How to handle SIGSEGV, but also generate a core dump
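For completeness, a minimal sketch of the same trick with sigaction(); myLoggingFunction is stubbed out with an async-signal-safe write(), and the null dereference is only there to force a crash:

#include <signal.h>
#include <string.h>
#include <unistd.h>

static void myLoggingFunction(void)
{
    static const char msg[] = "caught SIGSEGV\n";
    write(STDERR_FILENO, msg, sizeof(msg) - 1);    /* async-signal-safe */
}

static void sighandler(int signum)
{
    struct sigaction dfl;

    myLoggingFunction();

    memset(&dfl, 0, sizeof(dfl));
    dfl.sa_handler = SIG_DFL;
    sigemptyset(&dfl.sa_mask);
    sigaction(signum, &dfl, NULL);  /* restore the default disposition */
    raise(signum);                  /* deliver it again -> core dump */
}

int main(void)
{
    struct sigaction sa;

    memset(&sa, 0, sizeof(sa));
    sa.sa_handler = sighandler;
    sigemptyset(&sa.sa_mask);
    sa.sa_flags = 0;
    sigaction(SIGSEGV, &sa, NULL);

    *(volatile int *)0 = 1;         /* force a crash for demonstration */
    return 0;
}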
The answer: set up the sigaction with the SA_RESETHAND flag and just return from the handler. The same instruction executes again, causes a segmentation fault again, and invokes the default handler.
There's no need to do anything special in your signal handler
As explained at: Where does signal handler return back to? by default the program returns to the very instruction that caused the SIGSEGV after a signal gets handled.
Furthermore, tested as of Ubuntu 22.04, the default behavior for signal is that it automatically de-registers the handler. man signal does suggest that this is not very portable, however, so maybe using the more explicit sigaction syscall instead is better.
Therefore, what happens by default on that system is:
the signal gets handled
the handler is automatically disabled
after return, you go back to the instruction that caused the signal
signal happens again
there is no handler, so crash in basically the exact same way as if we hadn't handled the signal
The most important thing to check is whether you can generate core dumps at all, regardless of the signal handler. Notably, many newer systems such as Ubuntu 22.04 have a complex core dump handler which prevents creation of core files: https://askubuntu.com/questions/1349047/where-do-i-find-core-dump-files-and-how-do-i-view-and-analyze-the-backtrace-st/1442665#1442665 and which you can deactivate as a one-off with:
echo 'core' | sudo tee /proc/sys/kernel/core_pattern
Minimal runnable example:
main.c
#include <signal.h> /* signal, SIGSEGV */
#include <unistd.h> /* write, STDOUT_FILENO */

void signal_handler(int sig) {
    (void)sig;
    const char msg[] = "signal received\n";
    write(STDOUT_FILENO, msg, sizeof(msg));
}

int myfunc(int i) {
    *(int *)0 = 1;
    return i + 1;
}

int main(int argc, char **argv) {
    (void)argv;
    signal(SIGSEGV, signal_handler);
    int ret = myfunc(argc);
    return ret;
}
compile and run:
gcc -ggdb3 -O0 -std=c99 -Wall -Wextra -pedantic -o main.out main.c
./main.out
Terminal output contains:
signal received
Segmentation fault (core dumped)
so we see that the signal was both handled, and we got a core file.
And inspecting the core file with:
gdb main.out core.243260
does put us at the correct line:
#0 myfunc (i=1) at main.c:12
12 *(int *)0 = 1;
so we did return to it as expected.
Making it more portable with sigaction
man signal portability section has a Bible of a text about how signal() varied across different OSes and versions:
The only portable use of signal() is to set a signal's disposition to SIG_DFL or SIG_IGN. The semantics when using signal() to establish a signal handler vary across systems (and POSIX.1 explicitly permits this variation); do not use it for this purpose.
POSIX.1 solved the portability mess by specifying sigaction(2), which provides explicit control of the semantics when a signal handler is invoked; use that interface instead of signal().
In the original UNIX systems, when a handler that was established using signal() was invoked by the delivery of a signal, the disposition of the signal would be reset to SIG_DFL, and the system did not block delivery of further instances of the signal. This is equivalent to calling sigaction(2) with the following flags:
sa.sa_flags = SA_RESETHAND | SA_NODEFER;
System V also provides these semantics for signal(). This was bad because the signal might be delivered again before the handler had a chance to reestablish itself. Furthermore, rapid deliveries of the same signal could result in recursive invocations of the handler.
BSD improved on this situation, but unfortunately also changed the semantics of the existing signal() interface while doing so. On BSD, when a signal handler is invoked, the signal disposition is not reset, and further instances of the signal are blocked from being delivered while the handler is executing. Furthermore, certain blocking system calls are automatically restarted if interrupted by a signal handler (see signal(7)). The BSD semantics are equivalent to calling sigaction(2) with the following flags:
sa.sa_flags = SA_RESTART;
The situation on Linux is as follows:
The kernel's signal() system call provides System V semantics.
By default, in glibc 2 and later, the signal() wrapper function does not invoke the kernel system call. Instead, it calls sigaction(2) using flags that supply BSD semantics. This default behavior is provided as long as a suitable feature test macro is defined: _BSD_SOURCE on glibc 2.19 and earlier or _DEFAULT_SOURCE in glibc 2.19 and later. (By default, these macros are defined; see feature_test_macros(7) for details.) If such a feature test macro is not defined, then signal() provides System V semantics.
That seems to suggest that I should get BSD semantics by default, but I seem to get System V semantics for some reason because:
sudo strace -f -s999 -v ./main.out
contains:
rt_sigaction(SIGSEGV, {sa_handler=0x55b428604189, sa_mask=[], sa_flags=SA_RESTORER|SA_INTERRUPT|SA_NODEFER|SA_RESETHAND|0xffffffff00000000, sa_restorer=0x7fb173d0a520}, {sa_handler=SIG_DFL, sa_mask=[], sa_flags=0}, 8) = 0
which has the SA_NODEFER|SA_RESETHAND. Notably, the flag we care the most about is SA_RESETHAND, which resets the handler to the default behavior.
But maybe I just misinterpreted one of the verses of the Holy Text.
So, just to be more portable, we could do the same as above with sigaction instead:
sigaction.c
#define _XOPEN_SOURCE 700
#include <signal.h> /* sigaction, SIGSEGV */
#include <unistd.h> /* write, STDOUT_FILENO */

void signal_handler(int sig) {
    (void)sig;
    const char msg[] = "signal received\n";
    write(STDOUT_FILENO, msg, sizeof(msg));
}

int myfunc(int i) {
    *(int *)0 = 1;
    return i + 1;
}

int main(int argc, char **argv) {
    (void)argv;
    /* Adapted from: https://www.gnu.org/software/libc/manual/html_node/Sigaction-Function-Example.html */
    struct sigaction new_action;
    new_action.sa_handler = signal_handler;
    sigemptyset(&new_action.sa_mask);
    new_action.sa_flags = SA_NODEFER | SA_RESETHAND;
    sigaction(SIGSEGV, &new_action, NULL);
    int ret = myfunc(argc);
    return ret;
}
which behaves just like main.c in Ubuntu 22.04.
Is there some defined behaviour for segmentation faults which happen within a segmentation fault handler under Linux?
Will there be another call to the same handler? If so, is that defined on all platforms, and so on?
Thank you.
The answer depends on how you installed your signal handler. If you installed it using the deprecated signal() call, then it will either reset the signal handler to the default handler or block the signal being handled before calling your signal handler. If it blocked the signal, it will unblock it after your signal handler returns.
If you use sigaction(), you have control over which signals get blocked while your signal handler is being called. If you so specify, it is possible to cause infinite recursion.
It is possible to implement a safe wrapper around sigaction() that has an API similar to signal():
/* sighandler_t is a GNU typedef; define _GNU_SOURCE before <signal.h>
   (or use your own typedef for void (*)(int)) to get it. */
sighandler_t safe_signal(int sig, sighandler_t h) {
    struct sigaction sa;
    struct sigaction osa;

    sa.sa_handler = h;
    sigfillset(&sa.sa_mask);   /* block all signals while the handler runs */
    sa.sa_flags = 0;
    if (sigaction(sig, &sa, &osa) < 0) {
        return SIG_ERR;
    }
    return osa.sa_handler;
}
This blocks all signals for the duration of the signal handler call, which gets restored after the signal handler returns.
From the C11 standard, 7.14.1.1:
When a signal occurs and func points to a function, it is implementation-defined whether the equivalent of signal(sig, SIG_DFL); is executed or the implementation prevents some implementation-defined set of signals (at least including sig) from occurring until the current signal handling has completed;
So the standard says that it is implementation-defined whether recursive calls of the same signal handler are allowed. I would conclude that the behaviour is defined, but implementation-defined!
But it's a total mess if a segfault handler is itself segfaulting :)
I'm writing a multithreaded plugin-based application. I will not be the author of the plugins, so I would like to prevent the main application from crashing because of a segmentation fault in a plugin. Is that possible? Or does a crash in a plugin definitely compromise the main application as well?
I wrote a sketch program using Qt because my "real" application is heavily based on the Qt library. As you can see, I forced the thread to crash by calling the trimmed function on an unallocated QString. The signal handler is correctly called, but after the thread is forced to quit the main application also crashes. Did I do something wrong, or, as I said before, is what I'm trying to do simply not achievable?
Please note that in this simplified version of the program I avoided using plugins and used only a thread. Introducing plugins will add a new critical level, I suppose. I want to go step by step, and above all I want to understand whether my goal is feasible. Thanks a lot for any kind of help or suggestions.
#include <QString>
#include <QThread>
#include <csignal>
#include <QtGlobal>
#include <QtCore/QCoreApplication>

class MyThread : public QThread
{
public:
    static void sigHand(int sig)
    {
        qDebug("Thread crashed");
        QThread* th = QThread::currentThread();
        th->exit(1);
    }

    MyThread(QObject * parent = 0)
        : QThread(parent)
    {
        signal(SIGSEGV, sigHand);
    }

    ~MyThread()
    {
        signal(SIGSEGV, SIG_DFL);
        qDebug("Deleted thread, restored default signal handler");
    }

    void run()
    {
        QString* s;
        s->trimmed();
        qDebug("Should not reach this point");
    }
};

int main(int argc, char *argv[])
{
    QCoreApplication a(argc, argv);
    MyThread th(&a);
    th.run();
    while (th.isRunning());
    qDebug("Thread died but main application still on");
    return a.exec();
}
I'm currently working on the same issue and found this question via google.
There are several reasons your source is not working:
There is no new thread. The thread is only created, if you call QThread::start. Instead you call MyThread::run, which executes the run method in the main thread.
You call QThread::exit to stop the thread, which is not supposed to directly stop a thread, but sends a (qt) signal to the thread event loop, requesting it to stop. Since there is neither a thread nor an event loop, the function has no effect. Even if you had called QThread::start, it would not work, since writing a run method does not create a qt event loop. To be able to use exit with any QThread, you would need to call QThread::exec first.
However, QThread::exit is the wrong method anyway. To prevent the SIGSEGV, the thread must be stopped immediately, not after it receives the (qt) signal in its event loop. So although generally frowned upon, in this case QThread::terminate would have to be called.
But it is generally said to be unsafe to call complex functions like QThread::currentThread, QThread::exit or QThread::terminate from signal handlers, so you should never call them there.
Since the thread is still running after the signal handler (and I'm not sure even QThread::terminate would kill it fast enough), the signal handler exits to where it was called from, so it reexecutes the instruction causing the SIGSEGV, and the next SIGSEGV occurs.
Therefore I have used a different approach: the signal handler changes the register containing the instruction address to another function, which will then be run after the signal handler exits, instead of the crashing instruction. Like:
void signalHandler(int type, siginfo_t * si, void* ccontext){
    // The exact member holding the instruction pointer is platform-specific,
    // e.g. uc_mcontext.gregs[REG_EIP] / REG_RIP on Linux x86 / x86-64.
    (static_cast<ucontext_t*>(ccontext))->Eip = &recoverFromCrash;
}

struct sigaction sa;
memset(&sa, 0, sizeof(sa));
sa.sa_flags = SA_SIGINFO;
sa.sa_sigaction = &signalHandler;
sigaction(SIGSEGV, &sa, 0);
The recoverFromCrash function is then normally called in the thread causing the SIGSEGV. Since the signal handler is called for all SIGSEGV, from all threads, the function has to check which thread it is running in.
However, I did not consider it safe to simply kill the thread, since there might be other stuff depending on a running thread. So instead of killing it, I let it run in an endless loop (calling sleep to avoid wasting CPU time). Then, when the program is closed, it sets a global variable, and the thread is terminated. (Notice that the recover function must never return, since otherwise execution would return to the function which caused the SIGSEGV.)
Called from the mainthread on the other hand, it starts a new event loop, to let the program running.
if (QThread::currentThread() != QCoreApplication::instance()->thread()) {
    // sub thread
    QThread* t = QThread::currentThread();
    while (programIsRunning) ThreadBreaker::sleep(1);
    ThreadBreaker::forceTerminate();
} else {
    // main thread
    while (programIsRunning) {
        QApplication::processEvents(QEventLoop::AllEvents);
        ThreadBreaker::msleep(1);
    }
    exit(0);
}
ThreadBreaker is a trivial wrapper class around QThread, since msleep, sleep and setTerminationEnabled (which has to be called before terminate) of QThread are protected and could not be called from the recover function.
But this is only the basic picture. There are a lot of other things to worry about: Catching SIGFPE, Catching stack overflows (check the address of the SIGSEGV, run the signal handler in an alternate stack), have a bunch of defines for platform independence (64 bit, arm, mac), show debug messages (try to get a stack trace, wonder why calling gdb for it crashes the X server, wonder why calling glibc backtrace for it crashes the program)...