How to deal with errno and signal handlers in Linux?

When we write a signal handler that may change errno, should we save errno at the beginning of the signal handler and restore it at the end, like below?
void signal_handler(int signo)
{
    int temp_errno = errno;
    /* code here may change errno */
    errno = temp_errno;
}

The glibc documentation says:
signal handlers that call functions that may set errno or modify the floating-point environment must save their original values, and restore them before returning.
So go ahead and do that.
If you're writing a multi-threaded program using pthreads, there's a workaround that requires less effort: errno lives in thread-local storage, so if you dedicate one thread to handling process-directed signals and block the signal in all other threads, you don't have to worry about assignments to errno in the signal handler. A sketch of that pattern is below.
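A minimal sketch of the dedicated-thread approach, assuming a process-directed signal such as SIGTERM (the signal choice and thread structure here are illustrative, not part of the original question):
#include <pthread.h>
#include <signal.h>
#include <stdio.h>

/* One thread accepts the signal synchronously with sigwait(3); no
 * asynchronous handler ever runs, so no thread's errno can be clobbered. */
static void *signal_thread(void *arg)
{
    sigset_t *set = arg;
    int signo;

    for (;;) {
        if (sigwait(set, &signo) == 0)
            printf("got signal %d\n", signo); /* safe: not a handler */
    }
    return NULL;
}

int main(void)
{
    sigset_t set;
    pthread_t tid;

    sigemptyset(&set);
    sigaddset(&set, SIGTERM);
    /* Block before creating threads so every thread inherits the mask. */
    pthread_sigmask(SIG_BLOCK, &set, NULL);
    pthread_create(&tid, NULL, signal_thread, &set);

    /* ... rest of the program runs with SIGTERM blocked everywhere ... */
    pthread_join(tid, NULL);
    return 0;
}
Compile with -pthread.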

Related

ThreadSanitizer: signal handler spoils errno - how to avoid setting errno

I have a bit of code which handles POSIX signals, and as part of it (to be signal-safe) it calls sem_post() (which, according to http://man7.org/linux/man-pages/man3/sem_post.3.html, is async-signal-safe).
But when I run this code - very occasionally, I get the thread sanitizer complaint:
SUMMARY: ThreadSanitizer: signal handler spoils errno /home/lewis/Sandbox/Stroika-Build-Dir-Ubuntu1804_x86_64/Library/Sources/Stroika/Foundation/Execution/SignalHandlers.cpp:497 in Stroika::Foundation::Execution::SignalHandlerRegistry::FirstPassSignalHandler_(int)
I believe this is due to a call to sem_post, which may INDEED overwrite errno.
And yes - this could indeed mess up another thread if it happened at just the right (wrong?) time.
I've always found the 'thread local' errno mechanism a convenient way to handle errors, but I'm just now realizing how dangerous it is for signal handling code.
Is there some way to call system calls WITHOUT overwriting errno? Something at least vaguely portable?
Even syscall(2) (http://man7.org/linux/man-pages/man2/syscall.2.html) says it stores its result in errno.
On Linux you can use _syscall.
Another way would be to save errno at the beginning of the signal handler and restore before returning.
If you are sure that your function is safe in this respect, you can also use attributes (in both GCC and Clang) to disable instrumentation of specific functions, as sketched below.
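A hedged sketch of such an attribute (Clang spells it no_sanitize("thread"), GCC spells it no_sanitize_thread; check your compiler's documentation):
#include <errno.h>

#if defined(__clang__)
#define NO_TSAN __attribute__((no_sanitize("thread")))
#elif defined(__GNUC__)
#define NO_TSAN __attribute__((no_sanitize_thread))
#else
#define NO_TSAN
#endif

/* Only suppress instrumentation if the handler really does save and
 * restore errno around the async-signal-safe calls it makes. */
NO_TSAN void signal_handler(int signo)
{
    int saved_errno = errno;
    /* ... sem_post() etc. ... */
    errno = saved_errno;
}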

errno in signal handler

I call write() in my SIGCHLD signal handler.
But write() may sometimes set errno. Will this break my program?
Should I save and then restore errno like the following?
void sigchld_handler(int)
{
    int old_errno = errno;
    write(...);
    errno = old_errno;
}
Writing one character to a pipe is a common way to communicate from a signal handler to a poll loop. Yes, you should[1] save errno and restore it in the signal handler. It is also a good idea to put the pipe (or other file descriptor) you are writing to in non-blocking mode (O_NONBLOCK), so that the write call cannot block (which could happen if the signal handler is called many times before some thread has a chance to read the pipe). A sketch of the whole pattern is shown below.
signalfd is another way to safely communicate a signal to a poll loop.
[1]https://www.gnu.org/software/libc/manual/html_node/POSIX-Safety-Concepts.html
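A minimal sketch of that self-pipe pattern (the descriptor names and the SIGCHLD choice are illustrative):
#include <errno.h>
#include <fcntl.h>
#include <signal.h>
#include <unistd.h>

static int pipe_fds[2]; /* [0] read end for the poll loop, [1] write end */

static void sigchld_handler(int signo)
{
    int old_errno = errno; /* save: write() may clobber errno */
    char byte = 0;

    /* Non-blocking write: if the pipe is already full the byte is
     * dropped, which is fine since the poll loop will wake up anyway. */
    (void)write(pipe_fds[1], &byte, 1);
    errno = old_errno; /* restore before returning */
}

static int setup_sigchld_pipe(void)
{
    struct sigaction sa = { 0 };

    if (pipe(pipe_fds) < 0)
        return -1;
    fcntl(pipe_fds[1], F_SETFL, O_NONBLOCK);
    sa.sa_handler = sigchld_handler;
    sigemptyset(&sa.sa_mask);
    sa.sa_flags = SA_RESTART;
    return sigaction(SIGCHLD, &sa, NULL);
}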

Interrupting open() with SIGALRM

We have a legacy embedded system which uses SDL to read images and fonts from an NFS share.
If there's a network problem, TTF_OpenFont() and IMG_Load() hang essentially forever. A test application reveals that open() behaves in the same way.
It occurred to us that a quick fix would be to call alarm() before the calls which open files on the NFS share. The man pages weren't entirely clear whether open() would fail with EINTR when interrupted by SIGALRM, so we put together a test app to verify this approach. We set up a signal handler with sigaction::sa_flags set to zero to ensure that SA_RESTART was not set.
The signal handler was called, but open() was not interrupted. (We observed the same behaviour with SIGINT and SIGTERM.)
I suppose the system treats open() as a "fast" operation even on "slow" infrastructure such as NFS.
Is there any way to change this behaviour and allow open() to be interrupted by a signal?
The man pages weren't entirely clear whether open() would fail with EINTR when interrupted by SIGALRM, so we put together a test app to verify this approach.
open(2) is a slow syscall (slow syscalls are those that can sleep forever, and can be awakened when, and if, a signal is caught in the meantime) only for some file types. In general, opens that block the caller until some condition occurs are usually interruptible. Known examples include opening a FIFO (named pipe), or (back in the old days) opening a physical terminal device (it sleeps until the modem is dialed).
NFS-mounted filesystems probably don't cause open(2) to sleep in an interruptible state. After all, you are most likely opening a regular file, and in that case open(2) will not be interruptible.
Is there any way to change this behaviour and allow open() to be interrupted by a signal?
I don't think so, not without doing some (non-trivial) changes to the kernel.
I would explore the possibility of using sigsetjmp(3) / siglongjmp(3) (see the manpage if you're not familiar; they're basically non-local gotos, and the sig* variants also save and restore the signal mask, which matters when jumping out of a signal handler). You can initialize the environment buffer before calling open(2), and issue a siglongjmp(3) in the signal handler. Here's an example:
#include <stdio.h>
#include <stdlib.h>
#include <setjmp.h>
#include <unistd.h>
#include <signal.h>

static sigjmp_buf jmp_env;

void sighandler(int signo)
{
    siglongjmp(jmp_env, 1);
}

int main(void)
{
    struct sigaction sigact;

    sigact.sa_handler = sighandler;
    sigact.sa_flags = 0;
    sigemptyset(&sigact.sa_mask);
    if (sigaction(SIGALRM, &sigact, NULL) < 0) {
        perror("sigaction(2) error");
        exit(EXIT_FAILURE);
    }

    if (sigsetjmp(jmp_env, 1) == 0) {
        /* First time through.
         * This is where we would open the file.
         */
        alarm(5);

        /* Simulate a blocked open() */
        while (1)
            ; /* Intentionally left blank */

        /* If open(2) is successful here, don't forget to unset
         * the alarm.
         */
        alarm(0);
    } else {
        /* SIGALRM caught, open(2) canceled */
        printf("open(2) timed out\n");
    }

    return 0;
}
It works by saving the context environment with the help of sigsetjmp(3) before calling open(2). sigsetjmp(3) returns 0 the first time through, and returns whatever value was passed to siglongjmp(3) otherwise; the second argument of 1 tells it to save the signal mask, so SIGALRM is not left blocked after the jump out of the handler.
Please be aware that this solution is not perfect. Here are some points to keep in mind:
There is a window of time between the call to alarm(2) and the call to open(2) (simulated here with while (1) { ... }) where the process may be preempted for a long time, so there is a chance the alarm expires before we actually attempt to open the file. Sure, with a large timeout such as 2 or 3 seconds this will most likely not happen, but it's still a race condition.
Similarly, there is a window of time between successfully opening the file and canceling the alarm where, again, the process may be preempted for a long time and the alarm may expire before we get the chance to cancel it. This is slightly worse because we have already opened the file so we will "leak" the file descriptor. Again, in practice, with a large timeout this will likely never happen, but it's a race condition nevertheless.
If the code catches other signals, there may be another signal handler in the midst of execution when SIGALRM is caught. Using siglongjmp(3) inside the signal handler will destroy the execution context of these other signal handlers, and depending on what they were doing, very nasty things may happen (inconsistent state if the signal handlers were manipulating other data structures in the program, etc.). It's as if a handler started executing and suddenly crashed somewhere in the middle. You can fix it by: a) carefully setting up all signal handlers such that SIGALRM is blocked before they are invoked (this ensures that the SIGALRM handler does not begin execution until other handlers are done) and b) blocking these other signals before catching SIGALRM. Both actions can be accomplished by setting the sa_mask field of struct sigaction with the necessary mask (the operating system atomically sets the process's signal mask to that value before beginning execution of the handler and restores it before returning from the handler). OTOH, if the rest of the code doesn't catch signals, then this is not a problem.
sleep(3) may be implemented with alarm(2), and alarm(2) and setitimer(2) share the same timer; if other portions in the code make use of any of these functions, they will interfere and the result will be a huge mess.
Just make sure you weigh these disadvantages before blindly using this approach. The use of sigsetjmp(3) / siglongjmp(3) is usually discouraged and makes programs considerably harder to read, understand and maintain. It's not elegant, but right now I don't think you have a choice, unless you're willing to do some core refactoring in the project.
If you do end up using sigsetjmp(3), then at the very least document these limitations.
Another strategy would be to do the open() in a separate thread, so the main thread is not held up longer than desired.
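A rough sketch of the thread approach, assuming glibc (pthread_timedjoin_np is a GNU extension; the function name and timeout below are illustrative):
#define _GNU_SOURCE /* for pthread_timedjoin_np */
#include <fcntl.h>
#include <pthread.h>
#include <time.h>

static void *opener(void *arg)
{
    /* Smuggle the fd out through a pointer-sized integer. */
    return (void *)(long)open((const char *)arg, O_RDONLY);
}

/* Returns an fd, or -1 if open() did not finish within `seconds`.
 * Caveat: on timeout the detached thread may still complete the
 * open() later, leaking that descriptor. */
int open_with_timeout(const char *path, int seconds)
{
    pthread_t tid;
    void *res;
    struct timespec ts;

    if (pthread_create(&tid, NULL, opener, (void *)path) != 0)
        return -1;
    clock_gettime(CLOCK_REALTIME, &ts);
    ts.tv_sec += seconds;
    if (pthread_timedjoin_np(tid, &res, &ts) != 0) {
        pthread_detach(tid); /* give up; the thread keeps running */
        return -1;
    }
    return (int)(long)res;
}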

Why does it terminate even though I used signal(SIGINT, sig_int)?

As you can see, this is a sample from APUE.
#include "apue.h"
static void sig_int(int sig);
int main(int argc, char **argv)
{
char buf[MAXLINE];
pid_t pid;
int status;
if (signal(SIGINT, sig_int) == SIG_ERR) //sig_int is a simple handler function
err_sys("signal error");
printf("%% ");
while (fgets(buf, MAXLINE, stdin) != NULL) {
//This is a loop to implement a simple shell
}
return 0;
}
This is the signal handler:
void sig_int(int sig)
{
    /* When I enter Ctrl+C, it prints "got SIGSTOP", but then the program terminates. */
    if (sig == SIGINT)
        printf("got SIGSTOP\n");
}
When I enter Ctrl+C, it prints "got SIGSTOP", but then the program terminates immediately.
The short version is that the signal interrupts the current system call. You're doing fgets(), which is likely blocked in a read() system call. The read() call is interrupted: it returns -1 and sets errno to EINTR.
This causes fgets to return NULL, your loop ends, and the program is finished.
Some background
glibc on Linux implements two different semantics for signal(): one where system calls are automatically restarted across signals, and one where they are not.
When a signal occurs while the process is blocked in a system call, the system call is interrupted ("cancelled"). Execution resumes in the user-space application, and the signal handler runs. The interrupted system call returns an error and sets errno to EINTR.
What happens next depends on whether system calls are restarted or not across signals.
If system calls are restartable, the runtime (glibc) simply retries the system call. For the read() system call, this would be similar to read() being implemented as:
ssize_t read(int fd, void *buf, size_t len)
{
    ssize_t sz;

    while ((sz = syscall_read(fd, buf, len)) == -1 && errno == EINTR)
        ;
    return sz;
}
If system calls are not automatically restarted, read() would behave similarly to:
ssize_t read(int fd, void *buf, size_t len)
{
    ssize_t sz;

    sz = syscall_read(fd, buf, len);
    return sz;
}
In the latter case it would be up to your application to check whether read() failed because it was interrupted by a signal. And it is up you, to determine if read() just failed temporarily due to a signal getting handled, and it's up you you to re-try the read() call
signal vs sigaction
By using sigaction() instead of signal(), you get control over whether system calls are restarted or not. The relevant flag you specify with sigaction() is SA_RESTART:
Provide behavior compatible with BSD signal semantics by making certain system calls restartable across signals. This flag is meaningful only when establishing a signal handler. See signal(7) for a discussion of system call restarting.
BSD vs SVR4 semantics
If you use signal(), it depends on what semantics you want. As seen in the description of SA_RESTART, if it is BSD signal semantics, system calls are restarted. This is the default behavior in glibc.
Another difference is that BSD semantics leave the handler installed by signal() in place after a signal is handled, whereas SVR4 semantics uninstall it, so your signal handler has to re-install itself if you want to catch further signals.
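For illustration, a handler written for SVR4 semantics has to re-arm itself first thing (this re-installation is inherently racy, which is one more reason to prefer sigaction()):
void sig_int(int sig)
{
    signal(SIGINT, sig_int); /* SVR4: handler was reset to SIG_DFL */
    /* ... handle the signal ... */
}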
apue.h
The "apue.h" however, defines the macro _XOPEN_SOURCE 600 before including <signal.h>. This will cause signal() to have SVR4 semantics, where system calls are not restarted. Which will cause your fgets() call to "fail".
Don't use signal(), use sigaction()
Due to all these differences in behavior, use sigaction() instead of signal(). sigaction() lets you control what happens, instead of having the semantics change based on a (possibly hidden) #define, as is the case with signal(). A sketch follows.
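As a sketch, the signal() call in the APUE sample could be replaced with something like this (err_sys and sig_int come from the sample above; whether to set SA_RESTART is the choice being made explicit):
#include "apue.h" /* for err_sys() */
#include <signal.h>
#include <string.h>

extern void sig_int(int); /* handler from the sample above */

static void install_sigint(int restart)
{
    struct sigaction sa;

    memset(&sa, 0, sizeof sa);
    sa.sa_handler = sig_int;
    sigemptyset(&sa.sa_mask);
    /* SA_RESTART retries the interrupted read(); without it, fgets()
     * returns NULL with errno == EINTR, as seen in the question. */
    sa.sa_flags = restart ? SA_RESTART : 0;
    if (sigaction(SIGINT, &sa, NULL) < 0)
        err_sys("sigaction error");
}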

block alarm() in C in Linux

I need a program with a time limit.
So I used the alarm() function to exit the program within the time limit.
But I have a problem with synchronization.
There is a critical section in my program, so if the timeout happens within the critical section, I want to postpone the alarm until after the critical section.
Like this:
main() {
    alarm(5);
    ...
    disable_alarm();
    // critical section; the program shouldn't exit during this section
    {...}
    enable_alarm(); // if the alarm fired while disabled, the program must exit here
    ...
}
In this case, which functions should I use for disable_alarm() and enable_alarm()?
One way to do this would be to install a signal handler.
Something like the following pseudo-code:
volatile sig_atomic_t alarm_received;

void my_handler(int sig)
{
    alarm_received = 1;
}

void disable_alarm(void)
{
    alarm_received = 0;
    signal(SIGALRM, my_handler);
}

void enable_alarm(void)
{
    if (alarm_received)
        exit(1);
    signal(SIGALRM, SIG_DFL);
}
You should probably use sigaction() rather than signal(), since it's a good habit to get into.
Alternatively, sigprocmask() might be the solution you are looking for, depending on precisely what is meant by blocking a signal. A blocked signal stays pending and gets delivered when unblocked, so sigprocmask() is the simpler solution for your problem.
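A minimal sketch under that assumption (the mask variable name is illustrative):
#include <signal.h>

static sigset_t alarm_mask;

void disable_alarm(void)
{
    sigemptyset(&alarm_mask);
    sigaddset(&alarm_mask, SIGALRM);
    sigprocmask(SIG_BLOCK, &alarm_mask, NULL); /* SIGALRM now stays pending */
}

void enable_alarm(void)
{
    /* Any SIGALRM raised inside the critical section is delivered
     * here; the default action terminates the program, as required. */
    sigprocmask(SIG_UNBLOCK, &alarm_mask, NULL);
}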
On Linux you could also use poll(2) and timerfd_create(2), assuming you want to test at certain points whether the timer has expired.
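A short sketch of that approach (the 5-second deadline is illustrative):
#include <poll.h>
#include <stdio.h>
#include <sys/timerfd.h>

int main(void)
{
    int tfd = timerfd_create(CLOCK_MONOTONIC, 0);
    struct itimerspec its = { .it_value = { .tv_sec = 5 } };
    struct pollfd pfd = { .fd = tfd, .events = POLLIN };

    timerfd_settime(tfd, 0, &its, NULL); /* arm a one-shot 5 s deadline */

    /* ... critical section: simply don't check the timer here ... */

    if (poll(&pfd, 1, 0) > 0) { /* timeout of 0: a non-blocking check */
        printf("deadline passed, exiting\n");
        return 1;
    }
    /* ... continue ... */
    return 0;
}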
