How does a Linux signal lead to the instruction stream of an x86 processor being interrupted? Which CPU facility is used?
You have synchronous and asynchronous interrupts.
Synchronous interrupts are for things like page faults and other exceptions: problems caused by the instructions that are executing on the CPU.
Asynchronous interrupts come from external events: an IPI delivered through the LAPIC, a timer interrupt, or a device interrupt picked up by the I/O APIC and routed to the right LAPIC, which then interrupts the processor.
But which x86 mechanism does a signal use to interrupt the instruction stream and start running the signal handler?
It isn't an asynchronous interrupt AFAIK because interrupts are handled within the kernel and signals in user-space. But its behavior is very similar to that of an asynchronous interrupt.
The kernel has to deliver a signal to user-space. You're right that that doesn't just happen on its own in hardware. That's why signal handling can respect the user-space red-zone, sigaltstack, and default actions if there's no handler registered.
As soon as the kernel has control, it can deliver the signal to user-space (or do the default action of ignoring it or killing the process).
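To make the "handler, ignore, or kill" outcomes concrete, here is a minimal sketch that starts three child Python processes and sends each one SIGTERM. It assumes Linux and CPython; the helper name signal_child, the child scripts, and the exit code 7 are made up for the illustration.

```python
import signal
import subprocess
import sys

# Child with no handler: the kernel applies SIGTERM's default action (terminate).
DEFAULT_CHILD = "import time; print('ready', flush=True); time.sleep(30)"

# Child with a handler: the kernel diverts it into user-space handler code.
HANDLER_CHILD = (
    "import signal, sys, time\n"
    "signal.signal(signal.SIGTERM, lambda signum, frame: sys.exit(7))\n"
    "print('ready', flush=True)\n"
    "time.sleep(30)\n"
)

# Child that ignores the signal: it keeps running and exits normally.
IGNORE_CHILD = (
    "import signal, time\n"
    "signal.signal(signal.SIGTERM, signal.SIG_IGN)\n"
    "print('ready', flush=True)\n"
    "time.sleep(0.3)\n"
)


def signal_child(child_code, sig=signal.SIGTERM):
    """Start a child, wait until it is set up, send `sig`, return its exit status."""
    with subprocess.Popen([sys.executable, "-c", child_code],
                          stdout=subprocess.PIPE) as child:
        child.stdout.readline()      # the child prints "ready" once set up
        child.send_signal(sig)
        return child.wait(timeout=10)


if __name__ == "__main__":
    print(signal_child(DEFAULT_CHILD))   # negative: killed by the signal
    print(signal_child(HANDLER_CHILD))   # exit code chosen by the handler
    print(signal_child(IGNORE_CHILD))    # 0: the signal was ignored
```

A negative return code from subprocess means the child was terminated by that signal number, which is the kernel's default action rather than anything the child's own code did.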
If the signal was sent by a process running on another core, to a process running on this core, then probably it's delivered to user-space from an IPI handler, or else just at the next timer interrupt or system call that gives the kernel a chance to check for a pending signal.
When the IPI interrupt handler is preparing to return to user-space, it notices that there's a pending signal for the process that it's about to return to (either with a special case for one type of IPI, or by running the scheduler since we're in the kernel anyway). Instead of using iret to return to the interrupt frame pushed by hardware for the async interrupt, the kernel can instead iret to the address of the user-space signal handler.
The whole point of using an IPI (if that's what Linux even does) is to transfer control to the kernel sooner, instead of just waiting for it to notice the pending signal the next time it calls schedule().
If the process the signal is sent to isn't currently running on any core, then the kernel either wakes the process up if there's a free CPU, or the signal just sits there for that task until the scheduler on some core decides to run it. At that point the kernel will notice and deliver the pending signal.
Related
Are interrupts executed on all processors, or only on one?
For instance, when I type, do all processors handle the interrupt? Or only one of them, while the rest carry on with other tasks?
Here's a high-level view of the low-level processing. I'm describing a simple, typical architecture; real architectures can be more complex or differ in ways that don't matter at this level of detail.
When an interrupt occurs, the processor checks whether interrupts are masked. If they are, nothing happens until they are unmasked. When interrupts become unmasked, if there are any pending interrupts, the processor picks one.
Then the processor executes the interrupt by branching to a particular address in memory. The code at that address is called the interrupt handler. When the processor branches there, it masks interrupts (so the interrupt handler has exclusive control) and saves the contents of some registers in some place (typically other registers).
The interrupt handler does what it must do, typically by communicating with the peripheral that triggered the interrupt to send or receive data. If the interrupt was raised by the timer, the handler might trigger the OS scheduler, to switch to a different thread. When the handler finishes executing, it executes a special return-from-interrupt instruction that restores the saved registers and unmasks interrupts.
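The masking and pending logic described so far can be sketched as a toy model. This is only an illustration in Python, not how any real CPU or kernel is implemented; the class ToyCPU and the interrupt names are invented for the example.

```python
class ToyCPU:
    """Toy model of interrupt masking; not real hardware behaviour."""

    def __init__(self, handlers):
        self.handlers = handlers   # interrupt name -> handler function
        self.masked = False        # True while interrupts are masked
        self.pending = []          # interrupts raised while masked
        self.log = []

    def raise_irq(self, irq):
        if self.masked:
            self.pending.append(irq)   # remembered until interrupts are unmasked
        else:
            self._dispatch(irq)

    def unmask(self):
        self.masked = False
        while self.pending:            # pick pending interrupts one at a time
            self._dispatch(self.pending.pop(0))

    def _dispatch(self, irq):
        self.masked = True             # branching to the handler masks interrupts
        self.log.append(f"enter {irq}")
        self.handlers[irq](self)
        self.log.append(f"exit {irq}")
        self.unmask()                  # "return from interrupt" unmasks again


# The keyboard interrupt arrives while the timer handler is still running.
cpu = ToyCPU({
    "timer": lambda c: c.raise_irq("keyboard"),
    "keyboard": lambda c: None,
})
cpu.raise_irq("timer")
print(cpu.log)
# ['enter timer', 'exit timer', 'enter keyboard', 'exit keyboard']
```

The keyboard interrupt is not lost; it stays pending and is serviced only after the timer handler has returned and interrupts are unmasked.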
The interrupt handler must run quickly, because it's preventing any other interrupt from running. In the Linux kernel, interrupt processing is divided in two parts:
The “top half” is the interrupt handler. It does the minimum necessary, typically communicate with the hardware and set a flag somewhere in kernel memory.
The “bottom half” does any other necessary processing, for example copying data into process memory, updating kernel data structures, etc. It can take its time and even block waiting for some other part of the system since it runs with interrupts enabled.
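The split between the two halves can be sketched in a few lines. This is a toy model in Python with invented names (ToyDriver, top_half, bottom_half); real Linux drivers defer work through mechanisms such as softirqs, tasklets, or workqueues rather than a plain queue like this.

```python
from collections import deque


class ToyDriver:
    """Hypothetical driver split into a fast top half and a deferred bottom half."""

    def __init__(self):
        self.deferred = deque()   # work recorded by the top half
        self.delivered = []       # data handed over by the bottom half

    def top_half(self, raw_byte):
        # Do the minimum: take the byte from the "hardware" and note that
        # there is work to do, then return immediately.
        self.deferred.append(raw_byte)

    def bottom_half(self):
        # Runs later, at leisure: do the slow part (decoding, copying).
        while self.deferred:
            self.delivered.append(chr(self.deferred.popleft()))


driver = ToyDriver()
for byte in b"ok":          # two "interrupts" arrive back to back
    driver.top_half(byte)
driver.bottom_half()        # deferred processing happens afterwards
print(driver.delivered)     # ['o', 'k']
```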
I read some related posts:
(1) From Robert Love: http://permalink.gmane.org/gmane.linux.kernel.kernelnewbies/1791
You cannot sleep in an interrupt handler because interrupts do not have a backing
process context, and thus there is nothing to reschedule back into. In other
words, interrupt handlers are not associated with a task, so there is nothing to
"put to sleep" and (more importantly) "nothing to wake up". They must run
atomically.
(2) From Which context are softirq and tasklet in?
If sleep is allowed, then the linux cannot schedule them and finally cause a
kernel panic with a dequeue_task error. The interrupt context does not even
have a data structure describing the register info, so they can never be scheduled
by linux. If it is designed to have that structure and can be scheduled, the
performance for interrupt handling process will be effected.
So in my understanding, interrupt handlers run in interrupt context and cannot sleep; that is to say, they cannot be context-switched the way normal processes are, because there is no backing process context.
But an interrupt handler can be interrupted by another interrupt. And when the second interrupt handler finishes its work, control flow jumps back to the first interrupt handler.
How is this "restoring" implemented without a normal context switch? Is it like a normal function call, with all the registers and other related state stored on a certain stack?
The short answer is that an interrupt handler, if it can be interrupted by an interrupt, is interrupted precisely the same way anything else is interrupted by an interrupt.
Say process X is running. If process X is interrupted, then the interrupt handler runs. To the extent there is a context, it's still process X, though it's now running interrupt code in the kernel (think of the state as X->interrupt if you like). If another interrupt occurs, then the interrupt is interrupted, but there is still no special process context. The state is now X->first_interrupt->second_interrupt. When the second interrupt finishes, the first interrupt will resume just as X will resume when the first interrupt finishes. Still, the only process context is process X.
You can describe these as context switches, but they aren't like process context switches. They're more analogous to entering and exiting the kernel -- the process context stays the same but the execution level and unit of code can change.
The interrupt entry code stores some CPU state and registers before entering the real interrupt handler, and restores this information before returning to the interrupted task. Normally, this kind of saving and restoring is not called a context switch, as the context of the interrupted process is not changed.
As of 2020, interrupts (hard IRQs here) in Linux generally do not nest on the local CPU. This has been stated at least twice by people and groups actively contributing to the Linux kernel:
From NAPI updates written by Jakub Kicinski in 2020:
…Because normal interrupts don't nest in Linux, the system can't service any new interrupt while it's already processing one.
And from Bootlin in 2022:
…Interrupt handlers are run with all interrupts disabled on the local CPU…
So this question is probably less relevant nowadays, at least for the Linux kernel.
Why can we sleep in the software interrupt case, while it is not allowed in the hardware interrupt case?
E.g. system calls can sleep while an ISR cannot.
When you enter kernel code through a process (e.g., a syscall), the kernel is said to be in process context. This means that the kernel is executing on behalf of a process. The execution of the kernel is synchronous with the user level, and therefore it is possible to access user-level data. It is also possible to call sleeping functions, because the scheduler is able to schedule a new process.
When you enter the kernel from a hardware source (i.e., an interrupt), the kernel is said to be in interrupt context. The execution of the kernel is asynchronous with respect to the user level, and you cannot make any assumption about what is being executed at user level. For example, some resources may be in an inconsistent state. For this reason, the code cannot block, because the scheduler cannot schedule a new process.
This difference is well explained in Rubini's book Linux Device Drivers, 3rd edition, which is freely available on the web.
Normally, ISRs run with interrupts disabled, so if we slept in an ISR there would be no chance to wake up.
An interrupt handler uses the interrupted process's kernel stack. If we switched to another process inside an ISR, the kernel stack would change to that other process's stack.
I am not new to the use of signals in programming. I mostly work in C/C++ and Python.
But I am interested in knowing how signals are actually implemented in Linux (or Windows).
Does the OS check after each CPU instruction in a signal descriptor table if there are any registered signals left to process? Or is the process manager/scheduler responsible for this?
As signals are asynchronous, is it true that a CPU instruction can be interrupted before it completes?
The OS definitely does not process each and every instruction. No way. Too slow.
When the CPU encounters a problem (like division by 0, access to a restricted resource or a memory location that's not backed by physical memory), it generates a special kind of interrupt, called an exception (not to be confused with the exception abstractions of high-level languages such as C++ or Java).
The OS handles these exceptions. If it's so desired and if it's possible, it can reflect an exception back into the process from which it originated. The so-called Structured Exception Handling (SEH) in Windows is this kind of reflection. C signals should be implemented using the same mechanism.
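A quick way to watch a CPU exception being reflected back as a signal is to let a child process read from address 0 and look at how it died. This is a sketch assuming Linux, CPython, and an available ctypes module; the function name exit_status_after_null_read is made up for the example.

```python
import signal
import subprocess
import sys


def exit_status_after_null_read():
    """Run a child that reads from address 0 and return its exit status."""
    # ctypes.string_at(0) makes the child read memory at address 0. The CPU
    # raises a page fault, the kernel handles that exception, and since the
    # access is invalid it sends SIGSEGV to the child.
    child_code = "import ctypes; ctypes.string_at(0)"
    result = subprocess.run([sys.executable, "-c", child_code],
                            stderr=subprocess.DEVNULL)
    return result.returncode


if __name__ == "__main__":
    status = exit_status_after_null_read()
    print(status == -signal.SIGSEGV)   # True when the child was killed by SIGSEGV
```

Nothing in the child asked for a signal; the fault came from the hardware, and the kernel turned it into SIGSEGV for the offending process.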
On the systems I'm familiar with (although I can't see why it should be much different elsewhere), signal delivery is done when the process returns from the kernel to user mode.
Let's consider the one cpu case first. There are three sources of signals:
the process sends a signal to itself
another process sends the signal
an interrupt handler (network, disk, usb, etc) causes a signal to be sent
In all those cases the target process is not running in userland, but in kernel mode: either through a system call, or through a context switch (since on one CPU the other process can only send a signal while our target process isn't running), or through an interrupt handler. So signal delivery is a simple matter of checking if there are any signals to be delivered just before returning to userland from kernel mode.
In the multi cpu case if the target process is running on another cpu it's just a matter of sending an interrupt to the cpu it's running on. The interrupt does nothing other than force the other cpu to go into kernel mode and back so that signal processing can be done on the way back.
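The "delivered on the way back to user mode" behaviour can be observed from user space with a small experiment: block a signal, send it to yourself, see it sitting pending, then unblock it. This is a sketch assuming Linux and CPython 3.8 or later; the function name deliver_on_unblock is invented. Note that what runs here is a Python-level handler, which the interpreter calls right after the C-level handler the kernel actually invokes, so this shows the ordering rather than the raw mechanism.

```python
import signal


def deliver_on_unblock(sig=signal.SIGUSR1):
    """Return the order of events when a blocked, pending signal is unblocked."""
    events = []
    old_handler = signal.signal(
        sig, lambda signum, frame: events.append("handler ran"))
    old_mask = signal.pthread_sigmask(signal.SIG_BLOCK, {sig})
    try:
        signal.raise_signal(sig)   # generated, but it cannot be delivered yet
        events.append("pending" if sig in signal.sigpending() else "not pending")
        events.append("unblocking")
        # The unblock is a system call; the pending signal is delivered as
        # this call returns from the kernel.
        signal.pthread_sigmask(signal.SIG_UNBLOCK, {sig})
        events.append("after unblock")
    finally:
        signal.pthread_sigmask(signal.SIG_SETMASK, old_mask)
        signal.signal(sig, old_handler)
    return events


if __name__ == "__main__":
    print(deliver_on_unblock())
    # ['pending', 'unblocking', 'handler ran', 'after unblock']
```

The handler runs before the statement after the unblocking call, which fits the description above: the check for deliverable signals happens on the way out of the kernel.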
A process can send a signal to another process. A process can register its own signal handler to handle a signal. SIGKILL and SIGSTOP are the two signals which cannot be caught.
While a process executes a signal handler, it blocks that same signal. That means that if another instance of the same signal arrives while the handler is executing, the handler is not invoked again (this is called blocking the signal), but a note is made that the signal has arrived (i.e. a pending signal). Once the already running signal handler has finished, the pending signal is handled. If you do not want the pending signal to be handled at all, you can ignore the signal.
The problem in the above concept is:
Assume the following:
process A has registered signal handler for SIGUSR1.
1) process A gets signal SIGUSR1, and executes signalhandler()
2) process A gets SIGUSR1,
3) process A gets SIGUSR1,
4) process A gets SIGUSR1,
When step (2) occurs, the signal is marked as a 'pending signal', i.e. it still needs to be served.
And when step (3) occurs, it is simply dropped, because there is only one bit
available per signal to indicate that it is pending.
To avoid this problem, i.e. if we don't want to lose signals, we can use
real-time signals.
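The difference between the single pending bit and queued real-time signals can be checked directly. The sketch below assumes Linux and CPython 3.8 or later; the function name count_deliveries is invented. It blocks a signal, sends it three times to the current thread, and then counts how many instances the kernel actually kept.

```python
import signal


def count_deliveries(sig, times=3):
    """Send `sig` `times` times while it is blocked; count the queued instances."""
    old_mask = signal.pthread_sigmask(signal.SIG_BLOCK, {sig})
    try:
        for _ in range(times):
            signal.raise_signal(sig)   # stays pending because it is blocked
        delivered = 0
        # A timeout of 0 means "poll": take one pending instance if there is one.
        while signal.sigtimedwait({sig}, 0) is not None:
            delivered += 1
        return delivered
    finally:
        signal.pthread_sigmask(signal.SIG_SETMASK, old_mask)


if __name__ == "__main__":
    print(count_deliveries(signal.SIGUSR1))   # 1: standard signals coalesce
    print(count_deliveries(signal.SIGRTMIN))  # 3: real-time signals are queued
```

The real-time queue is still bounded by a per-user resource limit, so "never lost" holds only while that limit is not exceeded.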
2) Signals are executed synchronously,
Eg.,
1) The process is in the middle of executing the signal handler for SIGUSR1,
2) Now, it gets another signal, SIGUSR2,
3) It suspends the SIGUSR1 handler and runs the SIGUSR2 handler,
and once it is done with SIGUSR2, it continues with the SIGUSR1 handler.
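The nesting in this example can be reproduced with a small experiment. This is a sketch assuming Linux and CPython 3.8 or later, run from the main thread; the function name nested_handler_order is invented. The handlers here are Python-level handlers that the interpreter runs on the kernel's behalf, so the sketch only mirrors the ordering, and the short wait loops are there to give the interpreter a chance to run a handler that has been requested.

```python
import signal
import time


def nested_handler_order():
    """Record the order in which two nested signal handlers start and finish."""
    log = []

    def on_usr2(signum, frame):
        log.append("SIGUSR2 handler")

    def on_usr1(signum, frame):
        log.append("SIGUSR1 handler start")
        signal.raise_signal(signal.SIGUSR2)   # second signal arrives mid-handler
        deadline = time.monotonic() + 5
        while "SIGUSR2 handler" not in log and time.monotonic() < deadline:
            time.sleep(0.001)                 # let the nested handler run
        log.append("SIGUSR1 handler end")

    old_usr1 = signal.signal(signal.SIGUSR1, on_usr1)
    old_usr2 = signal.signal(signal.SIGUSR2, on_usr2)
    try:
        signal.raise_signal(signal.SIGUSR1)
        deadline = time.monotonic() + 5
        while "SIGUSR1 handler end" not in log and time.monotonic() < deadline:
            time.sleep(0.001)
    finally:
        signal.signal(signal.SIGUSR1, old_usr1)
        signal.signal(signal.SIGUSR2, old_usr2)
    return log


if __name__ == "__main__":
    print(nested_handler_order())
    # ['SIGUSR1 handler start', 'SIGUSR2 handler', 'SIGUSR1 handler end']
```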
3) IMHO, what I remember about when the kernel checks whether any signal has arrived for the process is:
1) When a context switch happens.
Hope this helps to some extent.
I know that Linux does nested interrupts, where one interrupt can "preempt" another interrupt, but what about other tasks?
I am just trying to understand how Linux handles interrupts. Can they be preempted by some other user task or kernel task?
Reading Why kernel code/thread executing in interrupt context cannot sleep?, which links to Robert Love's article, I read this:
some interrupt handlers (known in
Linux as fast interrupt handlers) run
with all interrupts on the local
processor disabled. This is done to
ensure that the interrupt handler runs
without interruption, as quickly as
possible. More so, all interrupt
handlers run with their current
interrupt line disabled on all
processors. This ensures that two
interrupt handlers for the same
interrupt line do not run
concurrently. It also prevents device
driver writers from having to handle
recursive interrupts, which complicate
programming.
So AFAIK all IRQs are disabled while within the interrupt handler, and therefore it cannot be interrupted!?
Simple answer: An interrupt can only be interrupted by interrupts of higher priority.
Therefore an interrupt can be interrupted by the kernel or a user task if the interrupt's priority is lower than the kernel scheduler interrupt priority or user task interrupt priority.
Note that by "user task" I mean user-defined interrupt.