errno in signal handler - linux

I call write() in my SIGCHLD signal handler.
But write() may sometimes set errno. Will this break my program?
Should I save and then restore errno like the following?
void sigchld_handler(int signo)
{
    int old_errno = errno;
    write(...);
    errno = old_errno;
}

Writing one character to a pipe is a common way to communicate from a signal handler to a poll loop. Yes, you should[1] save errno and restore it in the signal handler. It is also a good idea to put the pipe (or other file descriptor) you are writing to in non-blocking mode (O_NONBLOCK) so that the write() call cannot block (which could happen if the signal handler fires many times before some thread has a chance to drain the pipe).
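Here is a minimal sketch of that idea (the names self_pipe, setup_self_pipe and sigchld_handler are mine, not from the question): the pipe is created and put into non-blocking mode up front, and the handler saves errno, writes one byte, and restores errno before returning.

#include <errno.h>
#include <fcntl.h>
#include <unistd.h>

static int self_pipe[2];   /* [0] is read by the poll loop, [1] is written by the handler */

/* Hypothetical setup helper: create the pipe and make both ends non-blocking. */
static int setup_self_pipe(void)
{
    if (pipe(self_pipe) == -1)
        return -1;
    fcntl(self_pipe[0], F_SETFL, fcntl(self_pipe[0], F_GETFL) | O_NONBLOCK);
    fcntl(self_pipe[1], F_SETFL, fcntl(self_pipe[1], F_GETFL) | O_NONBLOCK);
    return 0;
}

static void sigchld_handler(int signo)
{
    int saved_errno = errno;            /* write() below may clobber errno */
    char byte = 0;

    (void)signo;
    /* Non-blocking, so this cannot hang even if the pipe is full; a full
     * pipe just means the poll loop is already going to wake up anyway. */
    (void)write(self_pipe[1], &byte, 1);

    errno = saved_errno;                /* restore before returning */
}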
signalfd is another way to safely communicate a signal to a poll loop.
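A rough sketch of the signalfd approach (Linux-specific; SIGCHLD and the bare poll() loop are purely for illustration): normal delivery of the signal is blocked, and the loop instead reads struct signalfd_siginfo records from the descriptor.

#include <poll.h>
#include <signal.h>
#include <stdio.h>
#include <sys/signalfd.h>
#include <unistd.h>

int main(void)
{
    sigset_t mask;
    sigemptyset(&mask);
    sigaddset(&mask, SIGCHLD);

    /* Block normal delivery so the signal is only reported via the fd. */
    sigprocmask(SIG_BLOCK, &mask, NULL);

    int sfd = signalfd(-1, &mask, SFD_NONBLOCK | SFD_CLOEXEC);
    struct pollfd pfd = { .fd = sfd, .events = POLLIN };

    for (;;) {
        if (poll(&pfd, 1, -1) > 0 && (pfd.revents & POLLIN)) {
            struct signalfd_siginfo si;
            while (read(sfd, &si, sizeof si) == sizeof si)
                printf("got signal %d from pid %d\n", (int)si.ssi_signo, (int)si.ssi_pid);
        }
    }
}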
[1]https://www.gnu.org/software/libc/manual/html_node/POSIX-Safety-Concepts.html

Related

ThreadSanitizer: signal handler spoils errno - how to avoid setting errno

I have a bit of code which handles POSIX signals, and as part of it (to be signal-safe) it does a sem_post() call (which, according to http://man7.org/linux/man-pages/man3/sem_post.3.html, is async-signal-safe).
But when I run this code - very occasionally, I get the thread sanitizer complaint:
SUMMARY: ThreadSanitizer: signal handler spoils errno /home/lewis/Sandbox/Stroika-Build-Dir-Ubuntu1804_x86_64/Library/Sources/Stroika/Foundation/Execution/SignalHandlers.cpp:497 in Stroika::Foundation::Execution::SignalHandlerRegistry::FirstPassSignalHandler_(int)
I believe this is due to a call to sem_post, which may INDEED overwrite errno.
And yes - this could indeed mess up another thread if it happened at just the right (wrong?) time.
I've always found the 'thread local' errno mechanism a convenient way to handle errors, but I'm just now realizing how dangerous it is for signal handling code.
Is there some way to call system calls WITHOUT overwriting errno? Something at least vaguely portable?
Even syscall(2) (http://man7.org/linux/man-pages/man2/syscall.2.html) says it reports errors through errno.
On Linux you can use _syscall.
Another way would be to save errno at the beginning of the signal handler and restore before returning.
If you are sure that your function is safe in this respect, you can also use some attributes (in both GCC and Clang) to disable ThreadSanitizer instrumentation of your functions.
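For example, a minimal sketch of the save/restore approach around sem_post() (wakeup_sem is assumed to be defined and initialized elsewhere):

#include <errno.h>
#include <semaphore.h>

extern sem_t wakeup_sem;           /* assumed: initialized elsewhere with sem_init() */

static void signal_handler(int signo)
{
    int saved_errno = errno;       /* sem_post() may overwrite errno */
    (void)signo;

    (void)sem_post(&wakeup_sem);   /* async-signal-safe per sem_post(3) */

    errno = saved_errno;           /* the interrupted thread sees errno unchanged */
}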

How to deal with errno and signal handler in Linux?

When we write a signal handler that may change errno, should we save errno at the beginning of the signal handler and restore it at the end? Just like below:
void signal_handler(int signo){
    int temp_errno = errno;
    /* ... code here may change errno ... */
    errno = temp_errno;
}
The glibc documentation says:
signal handlers that call functions that may set errno or modify the floating-point environment must save their original values, and restore them before returning.
So go ahead and do that.
If you're writing a multi-threaded program using pthreads, there's a workaround that requires less effort. errno will be in thread-local storage. If you dedicate one thread to handle process-directed signals, blocking the signal in all other threads, you don't have to worry about assignments to errno in the signal handler.
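A sketch of that setup, assuming SIGCHLD and SIGTERM are the process-directed signals of interest: block them in the main thread before creating any other threads (the mask is inherited), then have one dedicated thread accept them synchronously with sigwait(), so no asynchronous handler ever runs and no other thread's errno can be disturbed.

#include <pthread.h>
#include <signal.h>
#include <stdio.h>

/* The only thread that ever sees the process-directed signals. */
static void *signal_thread(void *arg)
{
    sigset_t *set = arg;
    int sig;

    for (;;) {
        if (sigwait(set, &sig) == 0)
            printf("received signal %d\n", sig);   /* handled synchronously */
    }
    return NULL;
}

int main(void)
{
    static sigset_t set;
    pthread_t tid;

    sigemptyset(&set);
    sigaddset(&set, SIGCHLD);
    sigaddset(&set, SIGTERM);

    /* Block the signals here, before any other thread exists, so every
     * thread created afterwards inherits the blocked mask. */
    pthread_sigmask(SIG_BLOCK, &set, NULL);

    pthread_create(&tid, NULL, signal_thread, &set);

    /* ... rest of the program is free to use errno normally ... */

    pthread_join(tid, NULL);
    return 0;
}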

Interrupting open() with SIGALRM

We have a legacy embedded system which uses SDL to read images and fonts from an NFS share.
If there's a network problem, TTF_OpenFont() and IMG_Load() hang essentially forever. A test application reveals that open() behaves in the same way.
It occurred to us that a quick fix would be to call alarm() before the calls which open files on the NFS share. The man pages weren't entirely clear whether open() would fail with EINTR when interrupted by SIGALRM, so we put together a test app to verify this approach. We set up a signal handler with sigaction::sa_flags set to zero to ensure that SA_RESTART was not set.
The signal handler was called, but open() was not interrupted. (We observed the same behaviour with SIGINT and SIGTERM.)
I suppose the system treats open() as a "fast" operation even on "slow" infrastructure such as NFS.
Is there any way to change this behaviour and allow open() to be interrupted by a signal?
The man pages weren't entirely clear whether open() would fail with EINTR when interrupted by SIGALRM, so we put together a test app to verify this approach.
open(2) is a slow syscall (slow syscalls are those that can sleep forever, and can be awakened when, and if, a signal is caught in the meantime) only for some file types. In general, opens that block the caller until some condition occurs are usually interruptible. Known examples include opening a FIFO (named pipe), or (back in the old days) opening a physical terminal device (it sleeps until the modem is dialed).
NFS-mounted filesystems probably don't cause open(2) to sleep in an interruptible state. After all, you are most likely opening a regular file, and in that case open(2) will not be interruptible.
Is there any way to change this behaviour and allow open() to be interrupted by a signal?
I don't think so, not without doing some (non-trivial) changes to the kernel.
I would explore the possibility of using setjmp(3) / longjmp(3) (see the manpage if you're not familiar; it's basically non-local gotos). You can initialize the environment buffer before calling open(2), and issue a longjmp(3) in the signal handler. Here's an example:
#include <stdio.h>
#include <stdlib.h>
#include <setjmp.h>
#include <unistd.h>
#include <signal.h>

static jmp_buf jmp_env;

void sighandler(int signo) {
    longjmp(jmp_env, 1);
}

int main(void) {
    struct sigaction sigact;

    sigact.sa_handler = sighandler;
    sigact.sa_flags = 0;
    sigemptyset(&sigact.sa_mask);

    if (sigaction(SIGALRM, &sigact, NULL) < 0) {
        perror("sigaction(2) error");
        exit(EXIT_FAILURE);
    }

    if (setjmp(jmp_env) == 0) {
        /* First time through.
         * This is where we would open the file.
         */
        alarm(5);

        /* Simulate a blocked open() */
        while (1)
            ; /* Intentionally left blank */

        /* If open(2) is successful here, don't forget to unset
         * the alarm.
         */
        alarm(0);
    } else {
        /* SIGALRM caught, open(2) canceled */
        printf("open(2) timed out\n");
    }

    return 0;
}
It works by saving the context environment with the help of setjmp(3) before calling open(2). setjmp(3) returns 0 the first time through, and returns whatever value was passed to longjmp(3) otherwise.
Please be aware that this solution is not perfect. Here are some points to keep in mind:
There is a window of time between the call to alarm(2) and the call to open(2) (simulated here with while (1) { ... }) where the process may be preempted for a long time, so there is a chance the alarm expires before we actually attempt to open the file. Sure, with a large timeout such as 2 or 3 seconds this will most likely not happen, but it's still a race condition.
Similarly, there is a window of time between successfully opening the file and canceling the alarm where, again, the process may be preempted for a long time and the alarm may expire before we get the chance to cancel it. This is slightly worse because we have already opened the file so we will "leak" the file descriptor. Again, in practice, with a large timeout this will likely never happen, but it's a race condition nevertheless.
If the code catches other signals, there may be another signal handler in the midst of execution when SIGALRM is caught. Using longjmp(3) inside the signal handler will destroy the execution context of these other signal handlers, and depending on what they were doing, very nasty things may happen (inconsistent state if the signal handlers were manipulating other data structures in the program, etc.). It's as if it started executing, and suddenly crashed somewhere in the middle. You can fix it by: a) carefully setting up all signal handlers such that SIGALRM is blocked before they are invoked (this ensures that the SIGALRM handler does not begin execution until other handlers are done) and b) blocking these other signals before catching SIGALRM. Both actions can be accomplished by setting the sa_mask field of struct sigaction with the necessary mask (the operating system atomically sets the process's signal mask to that value before beginning execution of the handler and unsets it before returning from the handler). OTOH, if the rest of the code doesn't catch signals, then this is not a problem.
sleep(3) may be implemented with alarm(2), and alarm(2) and setitimer(2) share the same timer; if other portions in the code make use of any of these functions, they will interfere and the result will be a huge mess.
Just make sure you weigh these disadvantages before blindly using this approach. The use of setjmp(3) / longjmp(3) is usually discouraged and makes programs considerably harder to read, understand and maintain. It's not elegant, but right now I don't think you have a choice, unless you're willing to do some core refactoring in the project.
If you do end up using setjmp(3), then at the very least document these limitations.
Maybe there is a strategy of using a separate thread to do the open(), so the main thread is not held up longer than desired.
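For what it's worth, here is one way that idea could look (entirely a sketch of mine, not from the original answers; open_req and open_with_timeout are made-up names). A detached helper thread performs the open() while the caller waits on a condition variable with a deadline. If the deadline passes, the helper may keep blocking in open() indefinitely; it then closes the descriptor and frees the request itself whenever open() finally returns, so nothing is reused, but the thread and descriptor can stay tied up. The path must stay valid for the life of the process (e.g. a string literal), since the helper may still be using it after the caller has given up.

#include <errno.h>
#include <fcntl.h>
#include <pthread.h>
#include <stdlib.h>
#include <time.h>
#include <unistd.h>

struct open_req {
    const char     *path;
    int             fd;        /* result of open(), valid once finished is set */
    int             finished;  /* set by the helper thread */
    int             abandoned; /* set by the caller after a timeout */
    pthread_mutex_t lock;
    pthread_cond_t  done;
};

static void *opener(void *arg)
{
    struct open_req *req = arg;
    int fd = open(req->path, O_RDONLY);   /* may hang, e.g. on a dead NFS mount */

    pthread_mutex_lock(&req->lock);
    if (req->abandoned) {
        /* The caller timed out long ago: nobody will use the descriptor. */
        pthread_mutex_unlock(&req->lock);
        if (fd >= 0)
            close(fd);
        free(req);
        return NULL;
    }
    req->fd = fd;
    req->finished = 1;
    pthread_cond_signal(&req->done);
    pthread_mutex_unlock(&req->lock);
    return NULL;
}

/* Returns an open descriptor, or -1 on error or timeout. */
int open_with_timeout(const char *path, int timeout_sec)
{
    struct open_req *req = calloc(1, sizeof *req);
    pthread_t tid;
    struct timespec deadline;
    int fd = -1;

    req->path = path;
    req->fd = -1;
    pthread_mutex_init(&req->lock, NULL);
    pthread_cond_init(&req->done, NULL);

    clock_gettime(CLOCK_REALTIME, &deadline);
    deadline.tv_sec += timeout_sec;

    pthread_create(&tid, NULL, opener, req);
    pthread_detach(tid);

    pthread_mutex_lock(&req->lock);
    while (!req->finished) {
        if (pthread_cond_timedwait(&req->done, &req->lock, &deadline) == ETIMEDOUT) {
            req->abandoned = 1;    /* helper cleans up whenever open() returns */
            break;
        }
    }
    if (req->finished)
        fd = req->fd;
    pthread_mutex_unlock(&req->lock);

    if (req->finished) {
        /* Helper is done and will never touch req again, so we clean up. */
        pthread_mutex_destroy(&req->lock);
        pthread_cond_destroy(&req->done);
        free(req);
    }
    return fd;
}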

How do I "disengage" from `accept` on a blocking socket when signalled from another thread?

I am in the same situation as this guy, but I don't quite understand the answer.
The problem:
Thread 1 calls accept on a socket, which is blocking.
Thread 2 calls close on this socket.
Thread 1 continues blocking. I want it to return from accept.
The solution:
what you should do is send a signal to the thread which is blocked in accept. This will give it EINTR and it can cleanly disengage - and then close the socket. Don't close it from a thread other than the one using it.
I don't get what to do here -- when the signal is received in Thread 1, accept is already blocking, and will continue to block after the signal handler has finished.
What does the answer really mean I should do?
If the Thread 1 signal handler can do something which will cause accept to return immediately, why can't Thread 2 do the same without signals?
Is there another way to do this without signals? I don't want to increase the caveats on the library.
Instead of blocking in accept(), block in select(), poll(), or one of the similar calls that allows you to wait for activity on multiple file descriptors and use the "self-pipe trick". All of the file descriptors passed to select() should be in non-blocking mode. One of the file descriptors should be the server socket that you use with accept(); if that one becomes readable then you should go ahead and call accept() and it will not block. In addition to that one, create a pipe(), set it to non-blocking, and check for the read side becoming readable. Instead of calling close() on the server socket in the other thread, send a byte of data to the first thread on the write end of the pipe. The actual byte value doesn't matter; the purpose is simply to wake up the first thread. When select() indicates that the pipe is readable, read() and ignore the data from the pipe, close() the server socket, and stop waiting for new connections.
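A sketch of that arrangement (accept_loop, listen_fd and wake_pipe_rd are placeholder names; the listening socket and both pipe ends are assumed to have been set non-blocking already):

#include <poll.h>
#include <sys/socket.h>
#include <unistd.h>

int accept_loop(int listen_fd, int wake_pipe_rd)
{
    struct pollfd fds[2] = {
        { .fd = listen_fd,    .events = POLLIN },
        { .fd = wake_pipe_rd, .events = POLLIN },
    };

    for (;;) {
        if (poll(fds, 2, -1) == -1)
            continue;                        /* EINTR etc.: just poll again */

        if (fds[1].revents & POLLIN) {
            char buf[16];
            (void)read(wake_pipe_rd, buf, sizeof buf);   /* drain and ignore */
            close(listen_fd);                /* safe: only this thread uses it */
            return 0;                        /* stop accepting */
        }

        if (fds[0].revents & POLLIN) {
            int conn = accept(listen_fd, NULL, NULL);    /* won't block now */
            if (conn >= 0) {
                /* ... hand the connection off to a worker ... */
                close(conn);
            }
        }
    }
}

The other thread then just writes one byte to the write end of the pipe instead of calling close() on the socket.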
The accept() call will return with error code EINTR if a signal is caught before a connection is accepted. So check the return value and error code then close the socket accordingly.
If you wish to avoid the signal mechanism altogether, use select() to determine if there are any incoming connections ready to be accepted before calling accept(). The select() call can be made with a timeout so that you can recover and respond to abort conditions.
I usually call select() with a timeout of 1000 to 3000 milliseconds from a while loop that checks for an exit/abort condition. If select() returns with a ready descriptor I call accept() otherwise I either loop around and block again on select() or exit if requested.
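Roughly, such a loop could look like this (accept_with_timeout and stop_requested are placeholder names; the timeout here is 2 seconds per pass):

#include <signal.h>
#include <sys/select.h>
#include <sys/socket.h>

static volatile sig_atomic_t stop_requested;   /* set elsewhere to request shutdown */

int accept_with_timeout(int listen_fd)
{
    while (!stop_requested) {
        fd_set readfds;
        struct timeval tv = { .tv_sec = 2, .tv_usec = 0 };

        FD_ZERO(&readfds);
        FD_SET(listen_fd, &readfds);

        int n = select(listen_fd + 1, &readfds, NULL, NULL, &tv);
        if (n > 0 && FD_ISSET(listen_fd, &readfds))
            return accept(listen_fd, NULL, NULL);   /* a connection is pending */
        /* n == 0: timeout, loop around and re-check stop_requested;
         * n == -1 with EINTR: loop around and retry. */
    }
    return -1;   /* aborted */
}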
Call shutdown() from Thread 2. accept will return with "invalid argument".
This seems to work but the documentation doesn't really explain its operation across threads -- it just seems to work -- so if someone can clarify this, I'll accept that as an answer.
Just close the listening socket, and handle the resulting error or exception from accept().
I believe signals can be used without increasing "the caveats on the library". Consider the following:
#include <pthread.h>
#include <signal.h>
#include <stddef.h>

static pthread_t thread;
static volatile sig_atomic_t sigCount;

/**
 * Executes a concurrent task. Called by `pthread_create()`.
 */
static void* startTask(void* arg)
{
    for (;;) {
        // calls to `select()`, `accept()`, `read()`, etc.
    }
    return NULL;
}

/**
 * Starts the concurrent task. Doesn't return until the task completes.
 */
void start()
{
    (void)pthread_create(&thread, NULL, startTask, NULL);
    (void)pthread_join(thread, NULL);
}

static void noop(const int sig)
{
    sigCount++;
}

/**
 * Stops the concurrent task. Causes `start()` to return.
 */
void stop()
{
    struct sigaction oldAction;
    struct sigaction newAction;

    (void)sigemptyset(&newAction.sa_mask);
    newAction.sa_flags = 0;
    newAction.sa_handler = noop;

    (void)sigaction(SIGTERM, &newAction, &oldAction);
    (void)pthread_kill(thread, SIGTERM);        // system calls return with EINTR
    (void)sigaction(SIGTERM, &oldAction, NULL); // restores previous handling

    if (sigCount > 1)                   // externally-generated SIGTERM was received
        oldAction.sa_handler(SIGTERM);  // call previous handler

    sigCount = 0;
}
This has the following advantages:
It doesn't require anything special in the task code other than normal EINTR handling; consequently, it makes reasoning about resource leakage easier than using pthread_cancel(), pthread_cleanup_push(), pthread_cleanup_pop(), and pthread_setcancelstate().
It doesn't require any additional resources (e.g. a pipe).
It can be enhanced to support multiple concurrent tasks.
It's fairly boilerplate.
It might even compile. :-)

O_RDWR on named pipes with poll()

I have gone through a variety of different Linux named pipe client/server implementations, but most of them use the blocking defaults on reads/writes.
As I am already using poll() to check other flags, I thought it would be a good idea to check for incoming FIFO data via poll() as well...
After all the research, I think that opening the pipe in O_RDWR mode is the only way to prevent an indefinite number of EOF events on a pipe when no writer has opened it.
This way both ends of the pipe are held open and other clients can open the writable end as well. To respond back I would use separate pipes...
My problem is that although I have found some examples that use the O_RDWR flag, the open() man pages describe this flag's behaviour as undefined when used on a FIFO. (http://linux.die.net/man/3/open)
But how would you use poll() on a pipe without O_RDWR? Do you think O_RDWR is a legitimate way to open pipes?
First, some preliminaries:
Using O_NONBLOCK and poll() is common practice -- not the other way around. To work successfully, you need to be sure to handle all poll() and read() return states correctly:
read() return value of 0 means EOF -- the other side has closed its connection. This corresponds (usually, but not on all OSes) to poll() returning a POLLHUP revent. You may want to check for POLLHUP before attempting read(), but it is not absolutely necessary since read() is guaranteed to return 0 after the writing side has closed.
If you call read() before a writer has connected, and you have O_RDONLY | O_NONBLOCK, you will get EOF (read() returning 0) repeatedly, as you've noticed. However, if you use poll() to wait for a POLLIN event before calling read(), it will wait for the writer to connect, and not produce the EOFs.
read() return value -1 usually means error. However, if errno == EAGAIN, this simply means there is no more data available right now and you're not blocking, so you can go back to poll() in case other devices need handling. If errno == EINTR, then read() was interrupted before reading any data, and you can either go back to poll() or simply call read() again immediately.
Now, for Linux:
If you open on the reading side with O_RDONLY, then:
The open() will block until there is a corresponding writer open.
poll() will give a POLLIN revent when data is ready to be read, or EOF occurs.
read() will block until either the requested number of bytes is read, the connection is closed (returns 0), it is interrupted by a signal, or some fatal IO error occurs. This blocking sort of defeats the purpose of using poll(), which is why poll() almost always is used with O_NONBLOCK. You could use an alarm() to wake up out of read() after a timeout, but that's overly complicated.
If the writer closes, then the reader will receive a poll() POLLHUP revent and read() will return 0 indefinitely afterwards. At this point, the reader must close its filehandle and reopen it.
If you open on the reading side with O_RDONLY | O_NONBLOCK, then:
The open() will not block.
poll() will give a POLLIN revent when data is ready to be read, or EOF occurs. poll() will also block until a writer is available, if none is present.
After all currently available data is read, read() will either return -1 and set errno == EAGAIN if the connection is still open, or it will return 0 if the connection is closed (EOF) or not yet opened by a writer. When errno == EAGAIN, this means it's time to return to poll(), since the connection is open but there is no more data. When errno == EINTR, read() has read no bytes yet and was interrupted by a signal, so it can be restarted.
If the writer closes, then the reader will receive a poll() POLLHUP revent, and read() will return 0 indefinitely afterwards. At this point the reader must close its filehandle and reopen it.
(Linux-specific:) If you open on the reading side with O_RDWR, then:
The open() will not block.
poll() will give a POLLIN revent when data is ready to be read. However, for named pipes, EOF will not cause POLLIN or POLLHUP revents.
read() will block until the requested number of bytes is read, it is interrupted by a signal, or some other fatal IO error occurs. For named pipes, it will not return errno == EAGAIN, nor will it even return 0 on EOF. It will just sit there until the exact number of bytes requested is read, or until it receives a signal (in which case it will return the number of bytes read so far, or return -1 and set errno == EINTR if no bytes were read so far).
If the writer closes, the reader will not lose the ability to read the named pipe later if another writer opens the named pipe, but the reader will not receive any notification either.
(Linux-specific:) If you open on the reading side with O_RDWR | O_NONBLOCK, then:
The open() will not block.
poll() will give a POLLIN revent when data is ready to be read. However, EOF will not cause POLLIN or POLLHUP revents on named pipes.
After all currently available data is read, read() will return -1 and set errno == EAGAIN. This is the time to return to poll() to wait for more data, possibly from other streams.
If the writer closes, the reader will not lose the ability to read the named pipe later if another writer opens the named pipe. The connection is persistent.
As you are rightly concerned, using O_RDWR with pipes is not standard, POSIX or elsewhere.
However, since this question seems to come up often, the best way on Linux to make "resilient named pipes" which stay alive even when one side closes, and which don't cause POLLHUP revents or return 0 for read(), is to use O_RDWR | O_NONBLOCK.
I see three main ways of handling named pipes on Linux:
(Portable.) Without poll(), and with a single pipe:
open(pipe, O_RDONLY);
Main loop:
read() as much data as needed, possibly looping on read() calls.
If read() == -1 and errno == EINTR, read() all over again.
If read() == 0, the connection is closed, and all data has been received.
(Portable.) With poll(), and with the expectation that pipes, even named ones, are only opened once, and that once they are closed, must be reopened by both reader and writer, setting up a new pipeline:
open(pipe, O_RDONLY | O_NONBLOCK);
Main loop:
poll() for POLLIN events, possibly on multiple pipes at once. (Note: This prevents read() from getting multiple EOFs before a writer has connected.)
read() as much data as needed, possibly looping on read() calls.
If read() == -1 and errno == EAGAIN, go back to poll() step.
If read() == -1 and errno == EINTR, read() all over again.
If read() == 0, the connection is closed, and you must terminate, or close and reopen the pipe.
(Non-portable, Linux-specific.) With poll(), and with the expectation that named pipes never terminate, and may be connected and disconnected multiple times:
open(pipe, O_RDWR | O_NONBLOCK);
Main loop:
poll() for POLLIN events, possibly on multiple pipes at once.
read() as much data as needed, possibly looping on read() calls.
If read() == -1 and errno == EAGAIN, go back to poll() step.
If read() == -1 and errno == EINTR, read() all over again.
If read() == 0, something is wrong -- it shouldn't happen with O_RDWR on named pipes, but only with O_RDONLY or unnamed pipes; it indicates a closed pipe which must be closed and re-opened. If you mix named and unnamed pipes in the same poll() event-handling loop, this case may still need to be handled.
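To make the third, Linux-specific variant concrete, here is a small sketch of a reader using O_RDWR | O_NONBLOCK with poll(); /tmp/myfifo is just a placeholder path, assumed to have been created beforehand with mkfifo.

#include <errno.h>
#include <fcntl.h>
#include <poll.h>
#include <stdio.h>
#include <unistd.h>

int main(void)
{
    int fd = open("/tmp/myfifo", O_RDWR | O_NONBLOCK);   /* we are reader and writer */
    if (fd == -1) {
        perror("open");
        return 1;
    }

    struct pollfd pfd = { .fd = fd, .events = POLLIN };

    for (;;) {
        if (poll(&pfd, 1, -1) == -1) {
            if (errno == EINTR)
                continue;               /* interrupted: just poll again */
            perror("poll");
            break;
        }

        if (pfd.revents & POLLIN) {
            char buf[512];
            ssize_t n;

            /* Drain whatever is available; writers may come and go freely. */
            while ((n = read(fd, buf, sizeof buf)) > 0)
                fwrite(buf, 1, (size_t)n, stdout);

            if (n == -1 && errno != EAGAIN && errno != EINTR) {
                perror("read");
                break;
            }
            /* n == -1 && errno == EAGAIN: no more data for now, back to poll() */
        }
    }

    close(fd);
    return 0;
}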
According to the open(2) man page, you can pass O_RDONLY|O_NONBLOCK or O_WRONLY|O_NONBLOCK to keep the open() call from blocking (with O_WRONLY|O_NONBLOCK, open() fails with errno == ENXIO if no process has the FIFO open for reading).
As I commented, also read the fifo(7) and mkfifo(3) man pages.
Just keep an open O_WRONLY file descriptor in the reading process alongside the O_RDONLY one. This achieves the same effect, ensuring that read() never returns end-of-file and that poll() and select() block until data arrives instead of reporting a hangup.
And it's 100% POSIX.
