Interrupts and context switches - multithreading

I'm reading some papers and source code about OS implementation and have a question.
In some operating systems found on GitHub, the context switch is made inside the timer interrupt handler.
They save the registers rbx, r12, r13, r14, r15, rbp and rsp and restore those registers from the saved state of the next thread.
These thread switches are made within the timer interrupt handling routine, before iret is called. My question is: once the mentioned registers have been restored by the interrupt handler, why is iret called? On switching the thread, will the next thread start immediately, or does it start after the interrupt handler finishes with the iret call?

When the mentioned registers were recovered by the interrupt handler, why is the iret called?
IRET returns the process to the state that it was in before the exception or interrupt that caused it to enter kernel mode.
The register switches that you see do change the process context, but that context is the state of the process while it was in a kernel-mode handler. The IRET instruction then returns the process to the state it had in user mode.

When the mentioned registers were recovered by the interrupt handler, why is the iret called? On switching the thread, the next thread will start immediately
You say, "on switching the thread," but the iret instruction is what makes the thread switch happen.
or does it start after finishing the interrupt handler with the iret call?
Don't think of iret as "return from interrupt." Think of it as "restore execution context" instead. It pops words from the stack into important context registers, always including the program counter, and maybe including registers that define virtual address space and privilege level. The next instruction that the CPU executes after the iret will be an instruction from the newly restored context.
The saved context that iret pops off the stack happens to be the same format as what a hardware interrupt pushes, but that doesn't mean that you can only pop the context that was pushed by the most recent hardware interrupt. You can pop a context that was pushed some time earlier, and then saved in some "thread" data structure. You can even pop an entirely new context that was manufactured from nothing in order to start a new thread.
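To make the point about popping an older or a manufactured context concrete, here is a minimal user-space sketch in C. It does not execute a real iretq; it only models the five-word frame that x86-64 hardware pushes on an interrupt and that iretq pops. The names struct thread, struct cpu, make_initial_frame and timer_interrupt are hypothetical, and the selector values 0x08 and 0x10 are placeholder assumptions, not taken from any particular kernel.

```c
#include <assert.h>
#include <stdint.h>

/* Frame that x86-64 hardware pushes on an interrupt and that iretq
 * pops again, lowest address first. */
struct iret_frame {
    uint64_t rip;
    uint64_t cs;
    uint64_t rflags;
    uint64_t rsp;
    uint64_t ss;
};

/* Hypothetical per-thread record: holds a frame saved at an earlier
 * interrupt, or one fabricated for a thread that has never run. */
struct thread {
    struct iret_frame frame;
};

/* Toy stand-in for the CPU registers that iretq loads. */
struct cpu {
    uint64_t rip, cs, rflags, rsp, ss;
};

/* Build a frame from nothing, so that the first "iret" into it starts
 * the thread at entry on its own stack. */
static struct iret_frame make_initial_frame(uint64_t entry, uint64_t stack_top)
{
    struct iret_frame f = {
        .rip = entry,
        .cs = 0x08,      /* assumed kernel code selector */
        .rflags = 0x202, /* IF (bit 9) set, plus the always-one bit 1 */
        .rsp = stack_top,
        .ss = 0x10,      /* assumed kernel data selector */
    };
    return f;
}

/* Model of the timer handler: keep the interrupted context in the
 * current thread, then "iret" with the next thread's frame. */
static void timer_interrupt(struct cpu *cpu, struct thread *current,
                            struct thread *next)
{
    assert(cpu && current && next);

    /* What the hardware pushed on entry, stored for a later resume. */
    current->frame = (struct iret_frame){
        cpu->rip, cpu->cs, cpu->rflags, cpu->rsp, cpu->ss
    };

    /* What iretq pops on exit: the CPU continues in next's context. */
    cpu->rip = next->frame.rip;
    cpu->cs = next->frame.cs;
    cpu->rflags = next->frame.rflags;
    cpu->rsp = next->frame.rsp;
    cpu->ss = next->frame.ss;
}
```

In this model nothing from the next thread runs until the final group of assignments, which plays the role of iret. A real kernel would additionally switch the callee-saved registers (rbx, r12 to r15, rbp) mentioned in the question.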

Related

kernel entry points on ARM

I was reading through the ARM kernel source code in order to better my understanding and came across something interesting.
Inside arch/arm/kernel/entry-armv.S there is a macro named vector_stub, that generates a small chunk of assembly followed by a jump table for various ARM modes. For instance, there is a call to vector_stub irq, IRQ_MODE, 4 which causes the macro to be expanded to a body with label vector_irq; and the same occurs for vector_dabt, vector_pabt, vector_und, and vector_fiq.
Inside each of these vector_* jump tables, there is exactly 1 DWORD with the address of a label with a _usr suffix.
I'd like to confirm that my understanding is accurate, please see below.
1. Does this mean that labels with the _usr suffix are executed only if the interrupt arises when the kernel thread executing on that CPU is in userspace context? For instance, irq_usr is executed if the interrupt occurs when the kernel thread is in userspace context, dabt_usr is executed if the interrupt occurs when the kernel thread is in userspace context, and so on.
2. If [1] is true, then which kernel threads are responsible for handling, say, irqs with a different suffix such as irq_svc? I am assuming that this is the handler for an interrupt request that happens in SVC mode. If so, which kernel thread handles this? The kernel thread currently in SVC mode, on whichever CPU receives the interrupt?
3. If [2] is true, then at what point does the kernel thread finish processing the second interrupt and return to where it had left off (also in SVC mode)? Is it ret_from_intr?
Inside each of these vector_* jump tables, there is exactly 1 DWORD with the address of a label with a _usr suffix.
This is correct. The table is indexed by the current mode. For instance, irq only has three distinct entries: irq_usr, irq_svc, and irq_invalid. IRQs should be disabled during data aborts, FIQ and other modes. Linux will always transfer to svc mode after this brief 'vector stub' code. It is accomplished with,
@
@ Prepare for SVC32 mode. IRQs remain disabled.
@
mrs r0, cpsr
eor r0, r0, #(\mode ^ SVC_MODE | PSR_ISETSTATE)
msr spsr_cxsf, r0
@ ... other unrelated code
movs pc, lr @ branch to handler in SVC mode
This is why irq_invalid is used for all other modes. Exceptions should never happen when this vector stub code is executing.
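The eor in the stub can be checked with a little arithmetic. The following C sketch mirrors only the mode-bit part of that instruction: XOR-ing the CPSR with (mode ^ SVC_MODE) flips exactly the bits in which the entry mode and SVC mode differ and leaves everything else, including the IRQ-disable bit, alone. It treats PSR_ISETSTATE as zero, which is an assumption (it corresponds to a kernel that is not built for Thumb), and to_svc_mode is a hypothetical helper name.

```c
#include <assert.h>
#include <stdint.h>

#define MODE_MASK 0x1fu /* CPSR mode field, bits [4:0] */
#define IRQ_MODE  0x12u
#define SVC_MODE  0x13u
#define PSR_I_BIT 0x80u /* IRQ disable bit */

/* Mirror of "eor r0, r0, #(\mode ^ SVC_MODE)" with PSR_ISETSTATE
 * assumed to be zero: only the differing mode bits are flipped. */
static uint32_t to_svc_mode(uint32_t cpsr, uint32_t entry_mode)
{
    /* The trick only works if the CPU really is in entry_mode. */
    assert((cpsr & MODE_MASK) == entry_mode);
    return cpsr ^ (entry_mode ^ SVC_MODE);
}
```

For example, a CPSR of 0x92 (IRQ mode with IRQs masked) becomes 0x93, which is SVC mode with IRQs still masked. That value is what the stub writes to the SPSR, so that movs pc, lr lands in the handler in SVC mode.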
Does this mean that labels with the _usr suffix are executed, only if the interrupt arises when the kernel thread executing on that CPU is in userspace context? For instance, irq_usr is executed if the interrupt occurs when the kernel thread is in userspace context, dabt_usr is executed if the interrupt occurs when the kernel thread is in userspace context, and so on.
Yes. The SPSR holds the mode that was interrupted, and the table is indexed by those mode bits.
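As a rough illustration of that indexing, here is a C model of the IRQ jump table. It assumes a 16-entry table indexed by the low four mode bits of the SPSR, with user mode at index 0 and SVC mode at index 3 and every other slot treated as invalid. The enum and the function irq_dispatch are hypothetical names for this sketch; the real table is a list of addresses in entry-armv.S.

```c
#include <assert.h>
#include <stdint.h>

/* IRQ_INVALID is deliberately 0 so that unlisted slots default to it. */
enum irq_target { IRQ_INVALID = 0, IRQ_USR, IRQ_SVC };

/* Pick the handler from the mode that was interrupted (taken from the
 * SPSR), in the way the vector stub indexes its jump table. */
static enum irq_target irq_dispatch(uint32_t spsr)
{
    static const enum irq_target table[16] = {
        [0x0] = IRQ_USR, /* USR mode is 0x10, low nibble 0 */
        [0x3] = IRQ_SVC, /* SVC mode is 0x13, low nibble 3 */
    };
    static_assert(sizeof table / sizeof table[0] == 16,
                  "one slot per value of the low four mode bits");

    return table[spsr & 0x0f];
}
```

An IRQ taken while the CPU was in abort mode (0x17) or in IRQ mode itself (0x12) lands in a slot that maps to IRQ_INVALID, which matches the statement above that only irq_usr and irq_svc are expected.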
If 1 is true, then which kernel threads are responsible for handling, say irqs, with a different suffix such as irq_svc. I am assuming that this is the handler for an interrupt request that happens in SVC mode. If so, which kernel thread handles this? The kernel thread currently in SVC mode, on whichever CPU receives the interrupt?
I think you have some misunderstanding here. There is a 'kernel thread' for user space processes. The irq_usr is responsible for storing the user mode registers as a reschedule might take place. The context is different for irq_svc as a kernel stack was in use and it is the same one the IRQ code will use. What happens when a user task calls read()? It uses a system call and code executes in a kernel context. Each process has both a user and svc/kernel stack (and thread info). A kernel thread is a process without any user space stack.
If 2 is true, then at what point does the kernel thread finish processing the second interrupt, and return to where it had left off(also in SVC mode)? Is it ret_from_intr?
Generally, Linux returns to the kernel thread that was interrupted so it can finish its work. However, there is a configuration option for pre-empting svc threads/contexts. If the interrupt resulted in a reschedule event, then a process/context switch may result if CONFIG_PREEMPT is active. See svc_preempt for this code.
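The decision described here can be sketched as a small predicate in C. This is only a model under stated assumptions: the real check is a few assembly instructions on the irq_svc exit path and may differ between kernel versions. The struct thread_info_model and the function should_preempt_on_irq_exit are hypothetical names, and the preempt_count condition is my reading of how preemption is normally gated, not something stated in the answer.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, reduced view of the per-thread state consulted when an
 * IRQ that interrupted SVC-mode code is about to return. */
struct thread_info_model {
    int preempt_count; /* non-zero: preemption currently not allowed */
    bool need_resched; /* set when the interrupt caused a reschedule event */
};

/* true:  switch to another context before returning (svc_preempt path)
 * false: return straight to the interrupted kernel code */
static bool should_preempt_on_irq_exit(bool config_preempt,
                                       const struct thread_info_model *ti)
{
    assert(ti != NULL);

    if (!config_preempt)
        return false; /* without CONFIG_PREEMPT, always resume */
    if (ti->preempt_count != 0)
        return false; /* assumed gate: preemption is disabled here */
    return ti->need_resched;
}
```

In every case where this returns false, the interrupted SVC-mode code simply continues where it left off.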
See also:
Linux kernel arm exception stack init
Arm specific irq initialization

What is "process context" exactly, and how does it relate to "interrupt context"?

What does the following phrase mean: "the kernel executes in the process context"?
Does it mean that if CPU is executing some process and then some interrupt occurs (system call, key press, etc.), the CPU will keep the page table for the currently running process loaded and then it will execute the interrupt handler which resides in the process's kernel space?
If this is what it means, then it seems like the interrupt handler is executed in the process context, so what does interrupt context mean?
A process's context is its current state.
We need to save the context of the currently running process so that it can be resumed after the interrupt is handled.
That context is basically what is in its registers:
esp
ss
eip
cs
and more.
We need to save the instruction pointer (EIP) and the CS (Code Segment) so that after the interrupt is handled we can continue running from where we were stopped.
The interrupt handler code resides in kernel memory. Once an interrupt occurs, we immediately switch from user mode to kernel mode. The state of the currently running process is saved, part of it on the user stack and the other part on the kernel stack (depending on the architecture). Assuming it's x86, the interrupt handler is then run by loading the appropriate ss and esp from the TSS and the appropriate cs and eip from the interrupt descriptor table.
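For the 32-bit x86 case, the saved part of the context that the CPU itself pushes can be written down as a C struct. The sketch below assumes an interrupt that arrives in user mode and carries no error code: after switching to the kernel stack taken from the TSS, the CPU pushes SS, ESP, EFLAGS, CS and EIP, so EIP ends up at the lowest address. The struct name and the helper interrupted_user_mode are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Words the CPU pushes on the kernel stack for an interrupt taken in
 * user mode (no error code), lowest address first. */
struct x86_interrupt_frame {
    uint32_t eip;    /* where the interrupted code resumes */
    uint32_t cs;     /* its code segment selector */
    uint32_t eflags;
    uint32_t esp;    /* user stack pointer */
    uint32_t ss;     /* user stack segment selector */
};

static_assert(offsetof(struct x86_interrupt_frame, eip) == 0, "eip is pushed last");
static_assert(offsetof(struct x86_interrupt_frame, ss) == 16, "ss is pushed first");

/* The low two bits of the saved CS hold the privilege level of the
 * interrupted code; 3 means user mode. */
static bool interrupted_user_mode(const struct x86_interrupt_frame *f)
{
    assert(f != NULL);
    return (f->cs & 3u) == 3u;
}
```

The remaining general-purpose registers are not saved by the hardware; the handler's entry code has to push them itself.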

How is interrupt context "restored" when an interrupt handler is interrupted by another interrupt?

I read some related posts:
(1) From Robert Love: http://permalink.gmane.org/gmane.linux.kernel.kernelnewbies/1791
You cannot sleep in an interrupt handler because interrupts do not have a backing
process context, and thus there is nothing to reschedule back into. In other
words, interrupt handlers are not associated with a task, so there is nothing to
"put to sleep" and (more importantly) "nothing to wake up". They must run
atomically.
(2) From Which context are softirq and tasklet in?
If sleep is allowed, then the linux cannot schedule them and finally cause a
kernel panic with a dequeue_task error. The interrupt context does not even
have a data structure describing the register info, so they can never be scheduled
by linux. If it is designed to have that structure and can be scheduled, the
performance for interrupt handling process will be effected.
So in my understanding, interrupt handlers run in interrupt context and cannot sleep; that is to say, they cannot perform a context switch the way normal processes do, because there is no backing mechanism.
But an interrupt handler can be interrupted by another interrupt. And when the second interrupt handler finishes its work, control flow jumps back to the first interrupt handler.
How is this "restoring" implemented without a normal context switch? Is it like a normal function call, with all the registers and other related state stored on a certain stack?
The short answer is that an interrupt handler, if it can be interrupted by an interrupt, is interrupted precisely the same way anything else is interrupted by an interrupt.
Say process X is running. If process X is interrupted, then the interrupt handler runs. To the extent there is a context, it's still process X, though it's now running interrupt code in the kernel (think of the state as X->interrupt if you like). If another interrupt occurs, then the interrupt is interrupted, but there is still no special process context. The state is now X->first_interrupt->second_interrupt. When the second interrupt finishes, the first interrupt will resume just as X will resume when the first interrupt finishes. Still, the only process context is process X.
You can describe these as context switches, but they aren't like process context switches. They're more analogous to entering and exiting the kernel -- the process context stays the same but the execution level and unit of code can change.
The interrupt entry routine stores some CPU state and registers before entering the real interrupt handler, and restores this information before returning to the interrupted task. Normally, this kind of storing and restoring is not called a context switch, as the context of the interrupted process is not changed.
As of 2020, interrupts (hard IRQs here) in Linux generally do not nest on a local CPU. This has been stated at least twice by maintainers and groups actively contributing to the Linux kernel:
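The X->first_interrupt->second_interrupt picture behaves like a stack, which can be modelled in a few lines of C. This is a toy model, not kernel code: struct cpu_model, interrupt_enter and interrupt_return are hypothetical names, and a single integer stands in for the registers that real entry code would push.

```c
#include <assert.h>
#include <stddef.h>

#define MAX_NESTING 8

/* Stand-in for the registers saved on entry: just remember what was
 * running when the interrupt arrived. */
struct saved_state {
    int interrupted_id;
};

struct cpu_model {
    int running; /* id of the code currently executing */
    struct saved_state stack[MAX_NESTING];
    size_t depth;
};

/* Interrupt entry: push the interrupted state, then run the handler. */
static void interrupt_enter(struct cpu_model *cpu, int handler_id)
{
    assert(cpu->depth < MAX_NESTING);
    cpu->stack[cpu->depth++].interrupted_id = cpu->running;
    cpu->running = handler_id;
}

/* Interrupt return: pop the most recently saved state and resume it. */
static void interrupt_return(struct cpu_model *cpu)
{
    assert(cpu->depth > 0);
    cpu->running = cpu->stack[--cpu->depth].interrupted_id;
}
```

With process X as id 0, calling interrupt_enter twice and interrupt_return twice resumes the first handler and then X, in that order. No scheduler is involved at any point, which is why this is not a process context switch.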
From NAPI updates written by Jakub Kicinski in 2020:
…Because normal interrupts don't nest in Linux, the system can't service any new interrupt while it's already processing one.
And from Bootlin in 2022:
…Interrupt handlers are run with all interrupts disabled on the local CPU…
So this question is probably less relevant nowadays, at least for the Linux kernel.

Must IRET be used when returning from an interrupt?

IRET can restore the registers from the stack, including EFLAGS, ESP, EIP and so on, but we can also restore the registers ourselves. For example, "movl" can be used to restore the %esp register, and "jmp" can jump to the address in the saved EIP on the stack.
The Linux kernel returns from all interrupts with IRET, which is a heavyweight instruction.
Some kernel operations (like context switches) happen frequently.
Isn't IRET a waste?
Besides all the heavy stuff IRET can and often should do in addition to a mere blend of POPF+RETF, there's one more thing that it does. It has a special function related to non-maskable interrupts (NMIs).
Concurrent NMIs are delivered to the CPU one by one. IRET signals to the NMI circuitry that another NMI can now be delivered. No other instruction can do this signalling.
If NMIs could preempt execution of other NMI ISRs, they would be able to cause a stack overflow, which rarely is a good thing. Unless we're talking about this wonderful website. :)
So, all in all, IRET is not a waste.
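The NMI behaviour described in this answer can be modelled as a small state machine in C. This is a simplified sketch, not a description of the exact hardware: it assumes a single blocked flag that is set when an NMI is delivered and cleared only by IRET, plus one latched pending NMI. The names struct nmi_model, nmi_raise and iret_executed are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

struct nmi_model {
    bool blocked;  /* an NMI handler is running, further NMIs are held */
    bool pending;  /* one NMI arrived while blocked and is latched */
    int delivered; /* number of times the NMI handler was entered */
};

/* An NMI arrives: deliver it now, or latch it if one is in progress. */
static void nmi_raise(struct nmi_model *m)
{
    assert(m != NULL);
    if (m->blocked) {
        m->pending = true;
        return;
    }
    m->blocked = true;
    m->delivered++;
}

/* IRET is the only event in this model that lifts the block. A latched
 * NMI is then delivered immediately, which blocks again. */
static void iret_executed(struct nmi_model *m)
{
    assert(m != NULL);
    m->blocked = false;
    if (m->pending) {
        m->pending = false;
        m->blocked = true;
        m->delivered++;
    }
}
```

If the handler left with a manual pop-and-jump sequence instead, iret_executed would never run in this model and the second NMI would stay latched forever.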
Probably because doing all that manually would need even more CPU clocks.
From Wikipedia:
The actual code that is invoked when an interrupt occurs is called the
Interrupt Service Routine (ISR). When an exception occurs, a program
invokes an interrupt, or the hardware raises an interrupt, the
processor uses one of several methods (to be discussed) to transfer
control to the ISR, whilst allowing the ISR to safely return control
to whatever it interrupted after execution is complete. At minimum,
FLAGS and CS:IP are saved and the ISR's CS:IP loaded; however, some
mechanisms cause a full task switch to occur before the ISR begins
(and another task switch when it ends).
So IRET isn't a waste; it is the minimal (and the fastest) way to return from an ISR. Also, all other CPU registers used in the ISR must be saved at the beginning and restored before IRET is executed!

Interrupt handlers executed in a different thread?

I wanted to know: when the processor gets interrupted and an ISR (interrupt service routine) is executed, is it executed in the context of the thread that was interrupted, or is it executed in its own thread before control goes back to where it left off in the original thread?
So a context switch actually occurs when an interrupt occurs?
A thread isn't created to handle the interrupt (part of why system calls can sometimes fail), though you can have a special thread to handle interrupts (read about "second level interrupt handlers" in the Wikipedia article on interrupt handling; I'm not certain if Windows uses SLIHs). There is a potential context switch since the ISR runs in kernel mode. Even if the current thread is in kernel mode, some context will be saved before calling the interrupt handler.
Still looking for documentation.

Resources