Why can a signal handler not run during system calls? - linux

In Linux a process that is waiting for IO can be in either of the states TASK_INTERRUPTIBLE or TASK_UNINTERRUPTIBLE.
The latter, for example, is the case when the process waits until the read from a regular file completes. If this read takes very long or forever (for whatever reasons), it leads to the problem that the process cannot receive any signals during that time.
The former, is the case if, for example, one waits for a read from a socket (which may take an unbounded amount of time). If such a read call gets interrupted, however, it leads to the problem that it requires the programmer/user to handle the errno == EINTER condition correctly, which he might forget.
My question is: Wouldn't it be possible to allow all system calls to be interrupted by signals and, moreover, in such a way that after the signal handler ran, the original system call simply continues its business? Wouldn't this solve both problems? If it is not possible, why?

The two halves of the question are related but I may have digested too much spiked eggnog to determine if there is so much a one-to-one as opposed to a one-to-several relationship here.
On the kernel side I think a TASK_KILLABLE state was added a couple of years back. The consequence of that was that while the process blocked in a system call (for whatever reason), if the kernel encountered a signal that was going to kill the process anyway, it would let nature take its course. That reduces the chances of a process being permanent stuck in an uninterruptible state. What progress has been made in changing individual system calls to avail themselves of that option I do not know.
On the user side, SA_RESTART and its convenience function cousin siginterrupt go some distance to alleviating the application programmer pain. But the fact that these are specified per signal and aren't honored by all system calls is a pretty good hint that there are reasons why a blanket interruption scheme like you propose can't be implemented, or at least easily implemented. Just thinking about the possible interactions of the matrices of signals, individual system calls, calls that have historically expected and backwardly supported behavior (and possibly multiple historical semantics based on their system bloodline!), is a bit dizzying. But maybe that is the eggnog.

Related

Will non-blocking I/O be put to sleep during copying data from kernel to user?

I ask this question because I am looking at multiplexing I/O in Go, which is using epollwait.
When an socket is ready, a goroutine will be waked up and begin to read socket in non-blocking mode. If the read system call still will be blocked during copying data from kernel to user, I assume the kernel thread the gorouine attached to will be put to sleep as well.
I am not sure of that, hoping someone can help correct me if I am wrong.
I fail to quite parse what you've written.
I'll try to make a sheer guess and conjure you might be overseeing the fact that the write(2) and read(2) syscalls (and those of their ilk such as send(2) and recv(2)) on the sockets put into non-blocking mode are free to consume (and return, respectively) less data than requested.
In other words, a write(2) call on a non-blocking socket told to write 1 megabyte of data will consume just as much data currently fits into the assotiated kernel buffer and return immediately, signalling it consumed only as much data. The next immediate call to write(2) will likely return EWOULDBLOCK.
The same goes for the read(2) call: if you pass it a buffer large enough to hold 1 megabyte of data, and tell it to read that number of bytes, the call will only drain the contents of the kernel buffer and return immediately, signaling how much data it actually copied. The next immediate call to read(2) will likely return EWOULDBLOCK.
So, any attempt to get or put data to the socket succeeds almost immediately: either after the data had been shoveled between the kernel's buffer and the user space or right away—with the EAGAIN return code.
Sure, there's supposedly a possibility for an OS thread to be suspended right in the middle of performing such a syscall, but this does not count as "blocking in a syscall."
Update to the original answer in response to the following comment of the OP:
<…>
This is what I see in book
"UNIX Network Programming" (Volume 1, 3rd), chapter 6.2:
A synchronous I/O operation causes the requesting process
to be blocked until that I/O operation completes. Using these
definitions, the first four I/O models—blocking, nonblocking, I/O
multiplexing, and signal-driven I/O—are all synchronous because the
actual I/O operation (recvfrom) blocks the process.
It uses "blocks" to describe nonblocking I/O operation. That makes me confused.
I still don't understand why the book uses "blocks the process" if the process is actually not blocked.
I can only guess that the book's author intended to highlight that the process is indeed blocked since entering a syscall and until returning from it. Reads from and writes to a non-blocking socket do block to transfer the data, if available, between the kernel and the user space. We colloquially say this does not block because we mean "it does not block waiting and doing nothing for an indeterminate amount of time".
The book's author might contrast this to the so-called asynchronous I/O (called "overlapping" on Windows™)—where you basically give the kernel a buffer with/for data and ask it to do away with it completely in parallel with your code—in the sense the relevant syscall returns right away and the I/O is carried out in background (with regard to your user-space code).
To my knowledge, Go does not use kernel's async I/O facilities on neither platform it supports. You might look there for the developments regarding Linux and its contemporary io_uring subsystem.
Oh, and one more point. The book might (at that point through the narrative at least) be discussing a simplified "classic" scheme where there are no in-process threads, and the sole unit of concurrency is the process (with a single thread of execution). In this scheme, any syscall obviously blocks the whole process. In contrast, Go works only on kernels which support threads, so in a Go program a syscall never blocks the whole process—only the thread it's called on.
Let me take yet another stab at explaining the problem as—I perceive—the OP stated it.
The problem of serving multiple client requests is not new—one of the more visible first statements of it is "The C10k problem".
To quickly recap it, a single threaded server with blocking operations on the sockets it manages is only realistically able to handle a single client at a time.
To solve it, there exist two straightforward approaches:
Fork a copy of the server process to handle each incoming client connection.
On an OS which supports threads, fork a new thread inside the same process to handle each incoming client.
They have their pros and cons but they both suck with regard to resource usage, and—which is more important—they do not play well with the fact most clients have relatively low rate and bandwidth of I/O they perform with regard to the processing resources available on a typical server.
In other words, when serving a typical TCP/IP exchange with a client, the serving thread most of the time sleeps in the write(2) and read(2) calls on the client socket.
This is what most people mean when talking about "blocking operations" on sockets: if a socket is blocking, and operation on it will block until it can actually be carried out, and the originating thread will be put to sleep for an indeterminate amount of time.
Another important thing to note is that when the socket becomes ready, the amount of work done is typically miniscule compared to the amount of time slept between the wakeups.
While the tread sleeps, its resources (such as memory) are effectively wasted, as they cannot be used to do any other work.
Enter "polling". It combats the problem of wasted resources by noticing that the points of readiness of networked sockets are relatively rare and far in between, so it makes sense to have lots of such sockets been served by a single thread: it allows to keep the thread almost as busy as theoretically possible, and also allows to scale out when needed: if a single thread is unable to cope with the data flow, add another thread, and so on.
This approach is definitely cool but it has a downside: the code which reads and writes data must be re-written to use callback style instead of the original plain sequential style. Writing with callbacks is hard: you usuaully have to implement intricate buffer management and state machines to deal with this.
The Go runtime solves this problem by adding another layer of scheduling for its execution flow units—goroutines: for goroutines, operations on the sockets are always blocking, but when a goroutine is about to block on a socket, this is transparently handled by suspending only the goroutine itself—until the requested operation will be able to proceed—and using the thread the goroutine was running on to do other work¹.
This allows to have the best of both approaches: the programmer may write classic no-brainer sequential callback-free networking code but the threads used to handle networking requests are fully utilized².
As to the original question of blocking, both the goroutine and the thread it runs on are indeed blocked when the data transfer on a socket is happening, but since what happens is data shoveling between a kernel and a user-space buffer, the delay is most of the time small, and is no different to the classic "polling" case.
Note that performing of syscalls—including I/O on non-pollable descriptors—in Go (at leas up until, and including Go 1.14) does block both the calling goroutine and the thread it runs on, but is handled differently from those of pollable descriptors: when a special monitoring thread notices a goroutine spent in a syscall more that certain amount of time (20 µs, IIRC), the runtime pulls the so-called "processor" (a runtime thing which runs goroutines on OS threads) from under the gorotuine and tries to make it run another goroutine on another OS thread; if there is a goroutine wanting to run but no free OS thread, the Go runtime creates another one.
Hence "normal" blocking I/O is still blocking in Go in both senses: it blocks both goroutines and OS threads, but the Go scheduler makes sure the program as a whole still able to make progress.
This could arguably be a perfect case for using true asynchronous I/O provided by the kernel, but it's not there yet.
¹ See this classic essay for more info.
² The Go runtime is certainly not the first one to pioneer this idea. For instance, look at the State Threads library (and the more recent libtask) which implement the same approach in plain C; the ST library has superb docs which explain the idea.

Thread Block Operation

I asked two questions(1,2) about reading a file using using a thread so that the main thread doesn't get blocked. My problem wasn't so much writing and starting a thread, my problem was I didn't understand what operations were blocking. I've been told before that reading from a file is a blocking, using the for loop in my second example, is blocking. I don't really understand why or even spot it when looking at a piece of code.
So my question obviously is, how do you spot or determine when an operation is blocking a thread, and how do you fix it?
So my question obviously is, how do you spot or determine when an
operation is blocking a thread
There's no magic way to do it; in general you have to read the documentation for whatever functions you call in order to get an idea about whether they are guaranteed to return quickly or whether they might block for an extended period of time.
If you're looking at a running program and want to know what its threads are currently doing, you can either watch them using a debugger, or insert print statements at various locations so that you can tell (by seeing what text gets printed to stdout and what text doesn't) roughly where the thread is at and what it is doing.
, and how do you fix it?
Blocking is not "broken", so there's nothing to fix. Blocking is intentional behavior, so that e.g. when you call a function that reads from disk, it can provide you back some useful data when it returns. (Consider an alternative non-blocking read, which would always return immediately, but in most cases wouldn't be able to provide you with any data, since the hard drive had not had time to load in any data yet -- not terribly useful).
That said, for network operations you can set your socket(s) to non-blocking mode so that calls to send(), recv(), etc are guaranteed never to block (they will return an error code instead). That only works for networking though; most OS's don't support non-blocking I/O for disk access.

Can I safely access potentially unallocated memory addresses?

I'm trying to create memcpy like function that will fail gracefully (ie return an error instead of segfaulting) when given an address in memory that is part of an unallocated page. I think the right approach is to install a sigsegv signal handler, and do something in the handler to make the memcpy function stop copying.
But I'm not sure what happens in the case my program is multithreaded:
Is it possible for the signal handler to execute in another thread?
What happens if a segfault isn't related to any memcpy operation?
How does one handle two threads executing memcpy concurrently?
Am I missing something else? Am I looking for something that's impossible to implement?
Trust me, you do not want to go down that road. It's a can of worms for many reasons. Correct signal handling is already hard in single threaded environments, yet alone in multithreaded code.
First of all, returning from a signal handler that was caused by an exception condition is undefined behavior - it works in Linux, but it's still undefined behavior nevertheless, and it will give you problems sooner or later.
From man 2 sigaction:
The behaviour of a process is undefined after it returns normally from
a signal-catching function for a SIGBUS, SIGFPE, SIGILL or SIGSEGV
signal that was not generated by kill(), sigqueue() or raise().
(Note: this does not appear on the Linux manpage; but it's in SUSv2)
This is also specified in POSIX. While it works in Linux, it's not good practice.
Below the specific answers to your questions:
Is it possible for the signal handler to execute in another thread?
Yes, it is. A signal is delivered to any thread that is not blocking it (but is delivered only to one, of course), although in Linux and many other UNIX variants, exception-related signals (SIGILL, SIGFPE, SIGBUS and SIGSEGV) are usually delivered to the thread that caused the exception. This is not required though, so for maximum portability you shouldn't rely on it.
You can use pthread_sigmask(2) to block signals in every thread but one; that way you make sure that every signal is always delivered to the same thread. This makes it easy to have a single thread dedicated to signal handling, which in turn allows you to do synchronous signal handling, because the thread may use sigwait(2) (note that multithreaded code should use sigwait(2) rather than sigsuspend(2)) until a signal is delivered and then handle it synchronously. This is a very common pattern.
What happens if a segfault isn't related to any memcpy operation?
Good question. The signal is delivered, and there is no (trivial) way to portably differentiate a genuine segfault from a segfault in memcpy(3).
If you have one thread taking care of every signal, like I mentioned above, you could use sigwaitinfo(2), and then examine the si_addr field of siginfo_t once sigwaitinfo(2) returned. The si_addr field is the memory location that caused the fault, so you could compare that to the memory addresses passed to memcpy(3).
But some platforms, most notably Mac OS, do not implement sigwaitinfo(2) or its cousin sigtimedwait(2).
So there's no way to do it portably.
How does one handle two threads executing memcpy concurrently?
I don't really understand this question, what's so special about multithreaded memcpy(3)? It is the caller's responsibility to make sure regions of memory being read from and written to are not concurrently accessed; memcpy(3) isn't (and never was) thread-safe if you pass it overlapping buffers.
Am I missing something else? Am I looking for something that's
impossible to implement?
If you're concerned with portability, I would say it's pretty much impossible. Even if you just focus on Linux, it will be hard. If this was something easy to do, by this time someone would have probably done it already.
I think you're better off building your own allocator and force user code to rely on it. Then you can store state and manage allocated memory, and easily tell if the buffers passed are valid or not.

When is POSIX thread cancellation not immediate?

The POSIX specifies two types for thread cancellation type: PTHREAD_CANCEL_ASYNCHRONOUS, and PTHREAD_CANCEL_DEFERRED (set by pthread_setcanceltype(3)) determining when pthread_cancel(3) should take effect. By my reading, the POSIX manual pages do not say much about these, but Linux manual page says the following about PTHREAD_CANCEL_ASYNCHRONOUS:
The thread can be canceled at any time. (Typically, it will be canceled immediately upon receiving a cancellation request, but the system doesn't guarantee this.)
I am curious about the meaning about the system doesn't guarantee this. I can easily imagine this happening in multicore/multi-CPU systems (before context switch). But what about single core systems:
Could we have a thread not cancelled immediately when cancellation is requested and cancellation is enabled (pthread_setcancelstate(3)) and cancel type set to PTHREAD_CANCEL_ASYNCHRONOUS?
If yes, under what conditions could this happen?
I am mainly curious about Linux (LinuxThreads / NPTL), but also more generally about POSIX standard compliant way of viewing this cancellation business.
Update/Clarification: Here the real practical concern is usage of resources that are destroyed immediately after calling pthread_cancel() where the targeted thread have cancellation enabled and set to type PTHREAD_CANCEL_ASYNCHRONOUS!!! So the point really is: is there even a tiny possibility for the cancelled thread in this case to continue running normally after context switch (even for a very small time)?
Thanks for Damon's answer the question is reduced about signal delivery and handling in relation to the next context switch.
Update-2: I answered my own question to point that this is bad concern and that the underlying program design should be addressed in fundamentally different conceptual level. I wish this "wrong" question is useful for others wondering about mysteries of asynchronous cancellation.
The meaning is just what it says: It's not guaranteed to happen instantly. The reason for this is that a certain "liberty" for implementation details is needed and accounted for in the standard.
For example under Linux/NPTL, cancellation is implemented by sending signal nr. 32. The thread is cancelled when the signal is received, which usually happens at the next kernel-to-user switch, or at the next interrupt, or at the end of the time slice (which may accidentially be immediately, but usually is not). A signal is never received while the thread isn't running, however. So the real catch here is actually that signals are not necessarily received immediately.
If you think about it, it isn't even possible to do it much different, either. Since you can phtread_cleanup_push some handlers which the operating system must execute (it cannot just blast the thread out of existence!), the thread must necessarily run to be cancelled. There is no guarantee that any particular thread (including the one you want to cancel) is running at the exact time you cancel a thread, so there can be no guarantee that it is cancelled immediately.
Except of course, hypothetically, if the OS was implemented in a way as to block the calling thread and schedule the to-be-cancelled thread so it executes its handlers, and only unblocks pthread_cancel afterwards. But since pthread_cancel isn't specified as blocking, this would be an utterly nasty surprise. It would also be somewhat inacceptable because of interfering wtih execution time limits and scheduler fairness.
So, either your cancel type is "disable", then nothing happens. Or, it is "enable", and the cancel type is "deferred", then the thread cancels when calling a function that is listed as cancellation point in pthreads(7).
Or, it is "asynchronous", then as stated above, the OS will do "something" to cancel the thread as soon as it deems appropriate -- not at a precise, well-defined time, but "soon". In the case of Linux, by sending a signal.
If you need to wonder when the asynchronous cancellation happen, you are doing something terribly wrong.
Following Standards: You are eating ground below your feet by deliberately creating or allowing code to exist whose correctness depends on assumptions about the platform (single core, particular implementation, whatever). It is almost always better, if possible, to follow the standards (and document clearly when it is not possible). The name PTHREAD_CANCEL_ASYNCHROUNOUS itself suggests the meaning asynchronous, which is different from immediate or even almost immediate. The original poster specifically states single core, but why should you allow code to exist that will break in non-deterministic ways, when your code is put to run in truly parallel machines (multiple cores or CPUs) where it is practically impossible to have guarantee of immediateness (this would require stopping the other cores from running or waiting for context switch or some other terrible hack which your OS/CPU is not going to support to support your unconventional wishes).
Asynchronous thread cancellation mode is not meant for guaranteed immediate cancellation of a thread. Hence it is a terribly confusing hack to use them in this way even if it would work.
Async-Safeness: If you are concerned about the mechanism of asynchronous cancellation, it raises the suspicion that the threads in question (because of lack of independence) are maybe not purely computational or written in async-cancel-safe manner.
POSIX specifies only three functions as async-cancel safe: pthread_cancel(3), pthread_setcancelstate(3), and pthread_setcancelmode(3) - see IEEE Std 1003.1, 2013 Edition, 2.9.5. This cancellation mode is only suitable for purely computational tasks that do not call (other than purely computational) library functions; such code would not provide cancellation points if the threads were set to run in the default deferred cancellation mode. Hence the rationale for defining such mode.
It is possible to write async-cancel-safe code by disabling cancellation during critical sections. But library writers (including POSIX library implementors) in general should not care about async-safetyness by reasons of following general convention, avoiding complexity, and even avoiding performance overhead. Because the library writers should not care, you should never expect async-safetyness unless it is explicitly stated otherwise.
If your code is not async-safe (because for example calling other libraries, including POSIX/standard C libraries without temporarily disabling cancellation or changing cancellation mode) and asynchronous cancellation occurs, you might leak resources (memory, etc), leave behind inconsistent states and locked mutexes dead-locking other threads, and summon many other problems currently imaginable and non-imaginable. (If you are writing in C++, it seems you will have other issues to deal with due to POSIX thread cancellation's close association with exception handling.)

What mechanism do PIPES use to "wake up" the recipient?

I have two questions is one here.
On Windows, I am familiar with pipes and how they work. However, I am curious as to what mechanism the OS uses to notify the recipient thread of a message arrival.
Does the thread "poll & sleep" continuously for data? Does the OS check to see if the thread is sleeping and wake it up? Or is there some other mechanism used?
Specifically, I want to build an IPC system where many threads need to pass messages. I don't need to use pipes, but I do need to know the most efficient notification method possible.
The developer can decide how they want to work with the pipe, whether they will sleep/poll or else they want to call blocking functions and wait until the data is available.
About the mechanism that the pipe has for waking up the process --assuming that the process is in a blocking read call-- it is not the pipe, but the OS the one that takes charge, like in any other OS call: it registers the operation and blocks the process/thread until the data is available. When the data is available, it completes the system call.
This is an answer for Unix. I'd lay good money on Windows being pretty similar as the solution has been around a long time and is well known to be robust. The details will vary a bit (different API calls, specifics of semantics, etc.)
It depends on whether the other end is using the pipe's file descriptor in blocking or non-blocking mode.
In blocking mode, the process is waiting in the OS kernel for the data to become available. The way in which notification happens there depends on the OS. Chances are it involves a queue of processes that are considered to be runnable, and everything's made simpler by the fact that the kernel can (largely) control what interrupts it. In a simple (single processor) implementation you could go for something as trivial as noting on write to the pipe that the other process is waiting to read from it (via some kind of “interest set”), and so marking the reader as runnable at that point (at which time it becomes up to the scheduler to decide).
In non-blocking mode, either the process is polling from time to time (yuck!) or they're using a system call like select() or poll() (there are some higher-performance variants too). That's very much like the Windows call WaitForMultipleObjects() and works just great with pipes. That in turn ends up back at that runnable process queue, the interest set, and the scheduler.
It also doesn't really matter too much whether it's blocking because the pipe is full or the pipe is empty, as the control flow is pretty much symmetric between readers and writers. (Unlike the data flow, of course.)

Resources