Linux Interrupts Concurency - linux

Are interrupts executed on all processors, or only on one?
For instance, when I type, do all processors handle the interrupt? Or only one of them and the rest carry on with other taks?

Here's a high-level view of the low-level processing. I'm describing a simple typical architecture, real architectures can be more complex or differ in ways that don't matter at this level of detail.
When an interrupt occurs, the processor looks if interrupts are masked. If they are, nothing happens until they are unmasked. When interrupts become unmasked, if there are any pending interrupts, the processor picks one.
Then the processor executes the interrupt by branching to a particular address in memory. The code at that address is called the interrupt handler. When the processor branches there, it masks interrupts (so the interrupt handler has exclusive control) and saves the contents of some registers in some place (typically other registers).
The interrupt handler does what it must do, typically by communicating with the peripheral that triggered the interrupt to send or receive data. If the interrupt was raised by the timer, the handler might trigger the OS scheduler, to switch to a different thread. When the handler finishes executing, it executes a special return-from-interrupt instruction that restores the saved registers and unmasks interrupts.
The interrupt handler must run quickly, because it's preventing any other interrupt from running. In the Linux kernel, interrupt processing is divided in two parts:
The “top half” is the interrupt handler. It does the minimum necessary, typically communicate with the hardware and set a flag somewhere in kernel memory.
The “bottom half” does any other necessary processing, for example copying data into process memory, updating kernel data structures, etc. It can take its time and even block waiting for some other part of the system since it runs with interrupts enabled.

Related

Sending user-mode interrupts on x86

On Linux x86, can I send interrupts (e.g., triggered by a timer, or other other mechanism), which will be handled by code running in user mode?
Assuming the answer is yes (and it is almost certainly yes, see e.g., timer_create), does delivering this interrupt occur solely in user mode, or is there some kernel transition involved (e.g., the interrupt is initially handled by the kernel, which then sends the signal to the user process).
All kernel timer interfaces work by delivering signals to user-space processes, after handling the timer interrupt inside the kernel (or otherwise noticing that or waiting until the deadline has been reached).
There are many big obstacles to having an interrupt handler run in ring 3, or from a user-space virtual address that's only mapped by one specific process. (Even if you pin that memory so it can't be paged, it is still only mapped at all when CR3 is set to that process's page tables. x86 uses virtual addresses in the IDT (interrupt descriptor table) and the page must be mapped when the interrupt fires (or else you get a page fault, I guess, which you really don't want to happen totally asynchronously). This is not a problem for normal kernel interrupt handlers; it always keeps kernel code mapped to the same virtual address across all user-space page tables. )
A kernel API that allowed registering a user-space function pointer as a ring 0 interrupt handler would be handing the keys to the kingdom to that userspace process, literally running with kernel privileges, so that's pretty much unreasonable.
It is technically possible for x86 to have an interrupt handler that runs in ring 3, but if the interrupt fired while in ring 0, iret would fault instead of returning back to the kernel code that got interrupted.
An interrupt handler would have to be written specially to return with iret, and to preserve all registers. e.g. __attribute__((interrupt_handler)) https://gcc.gnu.org/onlinedocs/gcc/Common-Function-Attributes.html. And any other process on the same core would be at the mercy of this process; any bugs in this (like destroying some architectural state, or dirtying SSE/AVX regs) could affect other processes. (If you can figure out how to get code in one process to run while CR3 might be set for another process...)
Avoiding deadlocks would also be a big issue; in the kernel there are a lot of limits on what you can do in an interrupt handler proper (the "top half") because it can run asynchronously in between any other instruction (unless you disable interrupts on that core).
I don't think it's really plausible for Linux to let you do this; even if you somehow solve all the (very hard) problems and even get the handler to run in ring 3, the kernel still has to trust it not to step on the architectural state of any other process.
There is precedent for things like X servers getting privileges to run in/out instructions (via iopl) and/or access /dev/mem (which would in theory let it steal info from other processes). But this would be even worse, and give you easy access to snapshots of register state from other processes.

Linux Process Context and SVC call in ARM

As per some Linux books
kernel code that services system calls issued by user applications
runs on behalf of the corresponding application process and is said to
be executing in process context. Interrupt Handlers run in interrupt
context.
Now svc and irq are two exceptions.
So when linux is handling svc it is in process context and while it is handling irq it is in interrupt context. Is that how it is mapped ?
Just one edit to this
It is also mentioned in books that tasklets / softirqs run in interrupt context while workqueues run in Process context. So does it mean that tasklet would run in CPSR.mode = IRQ ?
If I understand your confusion in the right way:
Since Linux is a capable, preemptive, complex operating system it has much finer handling of concepts such as handling of interrupts or serving software traps compared to bare metal hardware.
For example when a supervisor call (svc) happens hardware switches to SVC mode then Linux handles this as simple as preparing some data structures for handling it further then quits from SVC mode so core can continue serving in user mode thus making it possible to run into many more exception modes instead of blocking them.
It is same for IRQ mode, Linux handles bare minimum in IRQ mode. It prepares data structures as which IRQ happened, which handler should be invoked etc then exits from IRQ mode immediately to allow more to happen on that core. Later on some other internal kernel thread may process that interrupt further. Since hardware while being relatively simple runs really fast thus handling of interrupt runs in parallel with many processes.
Downside of this advanced approach is it gives no guarantees on response time requirements or overhead of it becomes visible on slower hardware like MCUs.
So ARM's exception modes provides two things for Linux: message type and priority backed with hardware support.
Message type is what exception mode is about, if it was a SVC, IRQ, FIQ, DATA ABORT, UNDEFINED INSTRUCTION, etc. So when hardware runs into an exception mode, Linux implicitly knows what it is handling.
Priority is about providing a capable and responsive hardware, for example system should be able to acknowledge an interrupt while handling some less important supervisor call.
Hardware support is for handling above two easier and faster. For example some registers are banked, or there is an extra system mode to handle reentrant IRQ easier.

How does a kernel return from the thread

I am doing some study hardcore study on computers etc. so I can get started on my own mini Hello World OS.
I was looking a how kernels work and I was wondering how the kernel makes the current thread return to the kernel (so it can switch to another) even though the kernel isn't running and the thread has no instruction to do so.
Does it use some kind of CPU interrupt that goes back to the kernel after a few nanoseconds?
Does it use some kind of CPU interrupt that goes back to the kernel after a few nanoseconds?
It is during timer interrupts and (blocking) system calls that the kernel decides whether to keep executing the currently active thread(s) or switch to another thread. The timer interupt handler updates resource usages, such as consumed system and user time, for the currently running process and scheduler_tick() function that decides whether a process/tread need to be pre-empted.
See "Preemption and Context Switching" on page 62 of Linux Kernel Development book.
The kernel, however, must know when to call schedule(). If it called schedule() only
when code explicitly did so, user-space programs could run indefinitely. Instead, the kernel
provides the need_resched flag to signify whether a reschedule should be performed (see
Table 4.1).This flag is set by scheduler_tick() when a process should be preempted, and
by try_to_wake_up() when a process that has a higher priority than the currently run-
ning process is awakened.The kernel checks the flag, sees that it is set, and calls schedule() to switch to a new process.The flag is a message to the kernel that the scheduler should be invoked as soon as possible because another process deserves to run.
Does it use some kind of CPU interrupt
Yes! Modern preemptive kernels are absolutely dependent upon interrupts from hardware to deliver good I/O performance. Keyboard, mouse, disk, NIC, USB, etc. drivers are all entered from interrupts and can make threads that are waiting on them ready/running when required (e.g., when data is available).
Threads can also change state as a result of making an OS call that changes the caller's own state of that of another thread.
The interrupt from the hardware timer is one of many interrupt sources and is only special in that many system operations have timeouts that are signaled by this interrupt. Other than that, the timer interrupt just causes a reschedule which, in most cases, changes nothing re. the ready/running state of threads. If the machine is grossly CPU-overloaded to the point where there are more ready threads than there are cores, there is a side-effect of the timer interrupt that causes CPU time to be shared amongst the ready threads.
Do not fixate on the timer interrupt—the other driver interrupts are absolutely essential. It is not impossible to build a functional preemptive multithreaded kernel with no timer interrupt at all.

Internals of a Linux system call

What happens (in detail) when a thread makes a system call by raising interrupt 80? What work does Linux do to the thread's stack and other state? What changes are done to the processor to put it into kernel mode? After running the interrupt handler, how is control restored back to the calling process?
What if the system call can't be completed quickly: e.g. a read from disk. How does the interrupt handler relinquish control so that the processor can do other stuff while data is being loaded and how does it then obtain control again?
A crash course in kernel mode in one stack overflow answer
Good questions! (Interview questions?)
What happens (in detail) when a
thread makes a system call by raising
interrupt 80?
The int $80 operation is vaguely like a function call. The CPU "takes a trap" and restarts at a known address in kernel mode, typically with a different MMU mode as well. The kernel will save many of the registers, though it doesn't have to save the registers that a program would not expect an ordinary function call to save.
What work does Linux do to the
thread's stack and other state?
Typically an OS will save registers that the ABI promises not to change during procedure calls. The stack will stay the same; the kernel will run on a per-thread kernel stack rather than the per-thread user stack. Naturally some state will change, otherwise there would be no reason to do the system call.
What changes are done to the
processor to put it into kernel mode?
This is usually entirely automatic. The CPU has, generically, a software-interrupt instruction that is a bit like a functional-call operation. It will cause the switch to kernel mode under controlled conditions. Typically, the CPU will change some sort of PSW protection bit, save the old PSW and PC, start at a well-known trap vector address, and may also switch to a different memory management protection and mapping arrangement.
After running the interrupt handler,
how is control restored back to the
calling process?
There will be some sort of "return from interrupt" or "return from trap" instruction, typically, that will act a bit like a complicated function-return instruction. Some RISC processors did very little automatically and required specific code to do the return and some CISC processors like x86 have (never-really-used) instructions that would execute dozens of operations documented in pages of architecture-manual pseudo-code for capability adjustments.
What if the system call can't be
completed quickly: e.g. a read from
disk. How does the interrupt handler
relinquish control so that the
processor can do other stuff while
data is being loaded and how does it
then obtain control again?
The kernel itself is threaded much like a threaded user program is. It just switches stacks (threads) and works on someone else's process for a while.
To answer the last part of the question - what does the kernel do if the system call needs to sleep -
After a system call, the kernel is still logically running in the context of the same task that made the system call - it's just in kernel mode rather than user mode - it is NOT a separate thread and most system calls do not invoke logic from another task/thread. What happens is that the system call calls wait_event, or wait_event_timeout or some other wait function, which adds the task to a list of tasks waiting for something, then puts the task to sleep, which changes its state, and calls schedule() to relinquish the current CPU.
After this the task cannot be run again until it gets woken up, typically by another task (kernel task, etc) or interrupt handler calling a wake* function which will wake up the task(s) sleeping waiting for that particular event, which means the scheduler will soon schedule them again.
It's worth noting that userspace tasks (i.e. threads) are only one type of task and there are a few others internal to the kernel which can do work as well - these are kernel threads and bottom half handlers / tasklets / task queues etc. Work which doesn't belong to any particular userspace process (for example network handling e.g. responding to pings) gets done in these. These tasks are allowed to go to sleep, unlike interrupts (which should not invoke the scheduler)
http://tldp.org/LDP/khg/HyperNews/get/syscall/syscall86.html
This should help people who seek for answers to what happens when the syscall instruction is executed which transfers the control to the kernel (user mode to kernel mode). This is based upon x86_64 architecture.
https://0xax.gitbooks.io/linux-insides/content/SysCall/syscall-2.html

what is a reentrant kernel

What is a reentrant kernel?
Much simpler answer:
Kernel Re-Entrance
If the kernel is not re-entrant, a process can only be suspended while it is in user mode. Although it could be suspended in kernel mode, that would still block kernel mode execution on all other processes. The reason for this is that all kernel threads share the same memory. If execution would jump between them arbitrarily, corruption might occur.
A re-entrant kernel enables processes (or, to be more precise, their corresponding kernel threads) to give away the CPU while in kernel mode. They do not hinder other processes from also entering kernel mode. A typical use case is IO wait. The process wants to read a file. It calls a kernel function for this. Inside the kernel function, the disk controller is asked for the data. Getting the data will take some time and the function is blocked during that time.
With a re-entrant kernel, the scheduler will assign the CPU to another process (kernel thread) until an interrupt from the disk controller indicates that the data is available and our thread can be resumed. This process can still access IO (which needs kernel functions), like user input. The system stays responsive and CPU time waste due to IO wait is reduced.
This is pretty much standard for today's desktop operating systems.
Kernel pre-emption
Kernel pre-emption does not help in the overall throughput of the system. Instead, it seeks for better responsiveness.
The idea here is that normally kernel functions are only interrupted by hardware causes: Either external interrupts, or IO wait cases, where it voluntarily gives away control to the scheduler. A pre-emptive kernel instead also interrupts and suspends kernel functions just like it would interrupt processes in user mode. The system is more responsive, as processes e.g. handling mouse input, are woken up even while heavy work is done inside the kernel.
Pre-emption on kernel level makes things harder for the kernel developer: The kernel function cannot be suspended only voluntarily or by interrupt handlers (which are somewhat a controlled environment), but also by any other process due to the scheduler. Care has to be taken to e.g. avoid deadlocks: A thread locks resource A but needing resource B is interrupted by another thread which locks resource B, but then needs resource A.
Take my explanation of pre-emption with a grain of salt. I'm happy for any corrections.
All Unix kernels are reentrant. This means that several processes may be executing in Kernel Mode at the same time. Of course, on uniprocessor systems, only one process can progress, but many can be blocked in Kernel Mode when waiting for the CPU or the completion of some I/O operation. For instance, after issuing a read to a disk on behalf of a process, the kernel lets the disk controller handle it and resumes executing other processes. An interrupt notifies the kernel when the device has satisfied the read, so the former process can resume the execution.
One way to provide reentrancy is to write functions so that they modify only local variables and do not alter global data structures. Such functions are called reentrant functions . But a reentrant kernel is not limited only to such reentrant functions (although that is how some real-time kernels are implemented). Instead, the kernel can include nonreentrant functions and use locking mechanisms to ensure that only one process can execute a nonreentrant function at a time.
If a hardware interrupt occurs, a reentrant kernel is able to suspend the current running process even if that process is in Kernel Mode. This capability is very important, because it improves the throughput of the device controllers that issue interrupts. Once a device has issued an interrupt, it waits until the CPU acknowledges it. If the kernel is able to answer quickly, the device controller will be able to perform other tasks while the CPU handles the interrupt.
Now let's look at kernel reentrancy and its impact on the organization of the kernel. A kernel control path denotes the sequence of instructions executed by the kernel to handle a system call, an exception, or an interrupt.
In the simplest case, the CPU executes a kernel control path sequentially from the first instruction to the last. When one of the following events occurs, however, the CPU interleaves the kernel control paths :
A process executing in User Mode invokes a system call, and the corresponding kernel control path verifies that the request cannot be satisfied immediately; it then invokes the scheduler to select a new process to run. As a result, a process switch occurs. The first kernel control path is left unfinished, and the CPU resumes the execution of some other kernel control path. In this case, the two control paths are executed on behalf of two different processes.
The CPU detects an exception-for example, access to a page not present in RAM-while running a kernel control path. The first control path is suspended, and the CPU starts the execution of a suitable procedure. In our example, this type of procedure can allocate a new page for the process and read its contents from disk. When the procedure terminates, the first control path can be resumed. In this case, the two control paths are executed on behalf of the same process.
A hardware interrupt occurs while the CPU is running a kernel control path with the interrupts enabled. The first kernel control path is left unfinished, and the CPU starts processing another kernel control path to handle the interrupt. The first kernel control path resumes when the interrupt handler terminates. In this case, the two kernel control paths run in the execution context of the same process, and the total system CPU time is accounted to it. However, the interrupt handler doesn't necessarily operate on behalf of the process.
An interrupt occurs while the CPU is running with kernel preemption enabled, and a higher priority process is runnable. In this case, the first kernel control path is left unfinished, and the CPU resumes executing another kernel control path on behalf of the higher priority process. This occurs only if the kernel has been compiled with kernel preemption support.
These information available on http://jno.glas.net/data/prog_books/lin_kern_2.6/0596005652/understandlk-CHP-1-SECT-6.html
More On http://linux.omnipotent.net/article.php?article_id=12496&page=-1
The kernel is the core part of an operating system that interfaces directly with the hardware and schedules processes to run.
Processes call kernel functions to perform tasks such as accessing hardware or starting new processes. For certain periods of time, therefore, a process will be executing kernel code. A kernel is called reentrant if more than one process can be executing kernel code at the same time. "At the same time" can mean either that two processes are actually executing kernel code concurrently (on a multiprocessor system) or that one process has been interrupted while it is executing kernel code (because it is waiting for hardware to respond, for instance) and that another process that has been scheduled to run has also called into the kernel.
A reentrant kernel provides better performance because there is no contention for the kernel. A kernel that is not reentrant needs to use a lock to make sure that no two processes are executing kernel code at the same time.
A reentrant function is one that can be used by more than one task concurrently without fear of data corruption. Conversely, a non-reentrant function is one that cannot be shared by more than one task unless mutual exclusion to the function is ensured either by using a semaphore or by disabling interrupts during critical sections of code. A reentrant function can be interrupted at any time and resumed at a later time without loss of data. Reentrant functions either use local variables or protect their data when global variables are used.
A reentrant function:
Does not hold static data over successive calls
Does not return a pointer to static data; all data is provided by the caller of the function
Uses local data or ensures protection of global data by making a local copy of it
Must not call any non-reentrant functions

Resources