When and how are system calls interrupted? - linux

This is a followup question to Is a successful send() "atomic"?, as I think it actually concerns system calls in general, not just sends on sockets.
Which system calls can be interrupted, and when they are, where is the interruption handled? I've learned about SA_RESTART, but don't exactly understand what is going on.
If I make a system call without SA_RESTART, can the call be interrupted by any kind of interrupts (e.g. user input) that don't concern my application, but require the OS to abort my call and do something else? Or is it only interrupted by signals that directly concern my process (CTRL+C, socket closed, ...)?
When setting SA_RESTART, what are the semantics of a send() or any other "slow" syscall? Will it always block until all of my data is transmitted or the socket goes down, or can it return with a number smaller than the count in send()'s parameter?
Where is restarting implemented? Does the OS know that I want the call to be restarted upon any interrupts, or is some signal sent to my process and then handled by library code? Or do I have to do it myself, e.g. wrap the call in a while loop and retry as often as necessary?

System calls can be interrupted by any signal, this includes such signals as SIGINT (generated by CTRL-C), SIGHUP, etc.
When SA_RESTART is set, a send() will return (with the sent count) if any data was transmitted before the signal was received, it will return an error EINTR if a send timeout was set (as those can't be restarted), otherwise the send() will be restarted.
System call restarting is implemented in the kernel's signal handling code. The system call internally returns -ERESTARTSYS upon detecting a pending signal (or having a wait interrupted by a signal), which causes the signal handling code to restore the instruction pointer and relevant registers to the state before the call, making the syscall repeat.

Related

Does read/write blocked system call put the process in TASK_UNINTERRUPTIBLE or TASK_INTERRUPTIBLE state?

As mentioned in the man page of signal(7),
Interruption of system calls and library functions by signal handlers
If a signal handler is invoked while a system call or library function call is blocked, then either:
* the call is automatically restarted after the signal handler returns; or
* the call fails with the error EINTR.
Which of these two behaviors occurs depends on the interface and whether or not the signal handler was established using the SA_RESTART flag (see sigaction(2)). The details vary across UNIX systems; below, the details for
Linux.
If a blocked call to one of the following interfaces is interrupted by a signal handler, then the call will be automatically restarted after the signal handler returns if the SA_RESTART flag was used; otherwise the call will
fail with the error EINTR:
* read(2), readv(2), write(2), writev(2), and ioctl(2) calls on "slow" devices. A "slow" device is one where the I/O call may block for an indefinite time, for example, a terminal, pipe, or socket. If an I/O call on a
slow device has already transferred some data by the time it is interrupted by a signal handler, then the call will return a success status (normally, the number of bytes transferred). Note that a (local) disk is not a
slow device according to this definition; I/O operations on disk devices are not interrupted by signals.
As it is mentioned that a blocked call to one of the following interfaces(read, write) is interrupted by a signal handler, then the call will be automatically restarted after the signal handler returns if the SA_RESTART flag was used, that means in case of blocked read/write system call, process must be in TASK_INTERRUPTIBLE state.
But when I was trying to find out blocked system calls that put process in TASK_UNINTERRUPTIBLE state, I found https://unix.stackexchange.com/questions/62697/why-is-i-o-uninterruptible and Why doing I/O in Linux is uninterruptible? , and in both the places it is mentioned that blocked I/O call(read, write) will put a process in TASK_UNINTERRUPTIBLE.
Also its mentioned here: https://access.redhat.com/sites/default/files/attachments/processstates_20120831.pdf
The Uninterruptible state is mostly used by device drivers waiting for disk or network I/O. When the process
is sleeping uninterruptibly, signals accumulated during the sleep are noticed when the process returns from
the system call or trap. In Linux systems. the command ps -l uses the letter D in the state field (S) to
indicate that the process is in an Uninterruptible sleep state. In that case, the process state flag is set as
follows:
p->state = TASK_UNINTERRUPTABLE
LEARN MORE: Read more about D states in the Red Hat Knowledgebase:
https://access.redhat.com/knowledge/solutions/59989/
It's kind of confusing.
Also I want to know other blocked system calls which can put a process in TASK_UNINTERRUPTIBLE state.
With read(2) or write(2) family syscalls, the type of sleep depends on the type of file that is being accessed. In the documentation you quoted, "slow" devices are those where a read/write will sleep interruptibly, and "fast" devices are those which will sleep uninterruptibly (the uninterruptible sleep state is named D for "Disk Wait", since originally read/write on disk files was the most common reason for this type of sleep).
Note that "blocking" technically refers only to interruptible sleep.
Almost any system call can enter uninterruptible sleep, because this can happen (among other things) when a process needs to acquire a lock protecting an internal kernel resource. Usually, this sort of uninterruptible sleep is so short-lived that you will not notice it.

When a goroutine blocks on I/O how does the scheduler identify that it has stopped blocking?

From what I've read here, the golang scheduler will automatically determine if a goroutine is blocking on I/O, and will automatically switch to processing others goroutines on a thread that isn't blocked.
What I'm wondering is how the scheduler then figures out that that goroutine has stopped blocking on I/O.
Does it just do some kind of polling every so often to check if it's still blocking? Is there some kind of background thread running that checks the status of all goroutines?
For example, if you were to do an HTTP GET request inside a goroutine that took 5s to get a response, it would block while waiting for the response, and the scheduler would switch to processing another goroutine. Now given that, when the server returns a response, how does the scheduler understand that the response has arrived, and it's time to go back to the goroutine that made the GET so that it can process the result of the GET?
All I/O must be done through syscalls, and the way syscalls are implemented in Go, they are always called through code that is controlled by the runtime. This means that when you call a syscall, instead of just calling it directly (thus giving up control of the thread to the kernel), the runtime is notified of the syscall you want to make, and it does it on the goroutine's behalf. This allows it to, for example, do a non-blocking syscall instead of a blocking one (essentially telling the kernel, "please do this thing, but instead of blocking until it's done, return immediately, and let me know later once the result is ready"). This allows it to continue doing other work in the meantime.

how and when -EINTR is set when linux system call is blocked

If a system call is blocked , the process state is set to TASK_INTERRUPTIBLE, and the process is removed from run queue.
When a signal is delivered to that process, kernel adds the signal to list of pending signals and sets the process state to TASK_RUNNING.
And when next time schedule() is called this process is executed.
What i did not understand is how exactly blocked system call returns -EINTR to userspace?
Any blocked system call can return -EINTR?
The logic of setting -EINTR is done by signal handling code or by system call itself?
AFAIK signal handling only happens before returning to userspace, is that true?
Does signal handling happens during context switch?
Please help me understand this.
When the process is running again (i.e., when schedule() returns), the driver must check for this case with the signal_pending() function, and abort what it's doing and return the -EINTR error code.
Many system calls are restartable, i.e., after interrupted by a signal, they could be just executed again without changing the functionality.
In that case, they return -ERESTARTSYS instead of -EINTR, and the kernel will handle the restarting automatically after the signal has been handled.
For an example, see the function uart_wait_modem_status in drivers/tty/serial/serial_core.c, or any other place where EINTR or ERESTARTSYS are used.

When does a process handle a signal

I want to know when does a linux process handles the signal.
Assuming that the process has installed the signal handler for a signal, I wanted to know when would the process's normal execution flow be interrupted and signal handler called.
According to http://www.tldp.org/LDP/tlk/ipc/ipc.html, the process would handle the signal when it exits from a system call. This would mean that a normal instruction like a = b+c (or its equivalent machine code) would not be interrupted because of signal.
Also, there are system calls which would get interrupted (and fail with EINTR or get restarted) upon receiving a signal. This means that signal is processed even before the system call completes. This behaviour seems to b conflicting with what I have mentioned in the previous paragraph.
So, I am not clear as to when is the signal processed and in which process states would it be handled by the process. Can it be interrupted
Anytime it enters from kernel space to user space, or
Anytime it is in user space, or
Anytime the process is scheduled for execution by the scheduler
Thanks!
According to http://www.tldp.org/LDP/tlk/ipc/ipc.html, the process would handle the signal when it exits from a system call. This would mean that a normal instruction like a = b+c (or its equivalent machine code) would not be interrupted because of signal.
Well, if that were the case, a CPU-intensive process would not obey the process scheduler. The scheduler, in fact, can interrupt a process at any point of time when its time quantum has elapsed. Unless it is a FIFO real-time process.
A more correct definition: One point when a signal is delivered to the process is when the control flow leaves the kernel mode to resume executing user-mode code. That doesn't necessarily involve a system call.
A lot of the semantics of signal handling are documented (for Linux, anyway - other OSes probably have similar, but not necessarily in the same spot) in the section 7 signal manual page, which, if installed on your system, can be accessed like this:
man 7 signal
If manual pages are not installed, online copies are pretty easy to find.

Implementation of Signals under Linux and Windows?

I am not new to the use of signals in programming. I mostly work in C/C++ and Python.
But I am interested in knowing how signals are actually implemented in Linux (or Windows).
Does the OS check after each CPU instruction in a signal descriptor table if there are any registered signals left to process? Or is the process manager/scheduler responsible for this?
As signal are asynchronous, is it true that a CPU instruction interrupts before it complete?
The OS definitely does not process each and every instruction. No way. Too slow.
When the CPU encounters a problem (like division by 0, access to a restricted resource or a memory location that's not backed up by physical memory), it generates a special kind of interrupt, called an exception (not to be confused with C++/Java/etc high level language exception abstract).
The OS handles these exceptions. If it's so desired and if it's possible, it can reflect an exception back into the process from which it originated. The so-called Structured Exception Handling (SEH) in Windows is this kind of reflection. C signals should be implemented using the same mechanism.
On the systems I'm familiar with (although I can't see why it should be much different elsewhere), signal delivery is done when the process returns from the kernel to user mode.
Let's consider the one cpu case first. There are three sources of signals:
the process sends a signal to itself
another process sends the signal
an interrupt handler (network, disk, usb, etc) causes a signal to be sent
In all those cases the target process is not running in userland, but in kernel mode. Either through a system call, or through a context switch (since the other process couldn't send a signal unless our target process isn't running), or through an interrupt handler. So signal delivery is a simple matter of checking if there are any signals to be delivered just before returning to userland from kernel mode.
In the multi cpu case if the target process is running on another cpu it's just a matter of sending an interrupt to the cpu it's running on. The interrupt does nothing other than force the other cpu to go into kernel mode and back so that signal processing can be done on the way back.
A process can send signal to another process. process can register its own signal handler to handle the signal. SIGKILL and SIGSTOP are two signals which can not be captured.
When process executes signal handler, it blocks the same signal, That means, when signal handler is in execution, if another same signal arrives, it will not invoke the signal handler [ called blocking the signal], but it makes the note that the signal has arrived [ ie: pending signal]. once the already running signal handler is executed, then the pending signal is handled. If you do not want to run the pending signal, then you can IGNORE the signal.
The problem in the above concept is:
Assume the following:
process A has registered signal handler for SIGUSR1.
1) process A gets signal SIGUSR1, and executes signalhandler()
2) process A gets SIGUSR1,
3) process A gets SIGUSR1,
4) process A gets SIGUSR1,
When step (2) occurs, is it made as 'pending signal'. Ie; it needs to be served.
And when the step (3) occors, it is just ignored as, there is only one bit
available to indicate the pending signal for each available signals.
To avoid such problem, ie: if we dont want to loose the signals, then we can use
real time signals.
2) Signals are executed synchronously,
Eg.,
1) process is executing in the middle of signal handler for SIGUSR1,
2) Now, it gets another signal SIGUSR2,
3) It stops the SIGUSR1, and continues with SIGUSR2,
and once it is done with SIGUSR2, then it continues with SIGUSR1.
3) IMHO, what i remember about checking if there are any signal has arrived to the process is:
1) When context switch happens.
Hope this helps to some extend.

Resources