local_bh_disable, preempt_disable, local_irq_disable - multithreading

local_bh_disable disables the processing of bottom halves (softirqs). Softirqs are processed on either, interrupt return path, or by the ksoftirqd-(per cpu)-thread that will be woken up if the system suffers of heavy softirq-load.
preempt_disable disables preemption, which means, that while a thread is executing inside a preempt_disable <-> preemt_enable scope, it will not be put to sleep by the scheduler.
This means that, if the system-timer-interrupt occurs while the current thread is inside that scope, it might update the accouting tables of the scheduler, but it will not switch context to another thread. this includes the softirqd.
local_irq_disable or local_irq_save disable interrupts for the local cpu. this means that the local cpu will not react to any irqs, so it will not run any interrupt return paths and hence, cannot run softirqs there.
Assuming my above statements are true (which i am not sure about), then wouldnt it be redundant to call local_bh_disable AFTER you called preempt_disable and local_irq_save (while being in process context)?

Yes. Once local_irq_save / disable has been called, no further protection is needed -- you won't be interrupted (except by an NMI or an exception in your code).
Often, however, you'll find bits of code that are designed to be callable from different contexts, so they may provide protection for some sub-operation that ends up being redundant in some paths.

preempt_disable/enable scope ensures that calling schedule inside that scope does nothing (i.e. preemption is disabled). However, a softirq or a irq can interrupt you.
Disabling irq will only disable hard interrupts, as disabling bh(softirqs) will only disable software interrupts, but you need to explicitly specify which one you want to disable.
There are 4 levels: NMI, IRQ, softirq, process. NMI(non maskable interrupts) can interrupt IRQ, softirq, process; IRQ can interrupt a softirq and a process; softirqs can interrupt a process.
Calling local_bh_disable() after local_irq_save() may be redundant (not sure), but calling local_bh_disable() after preempt_disable() is definitely needed if you want to disable BH.

Related

In which cases different interrupt handlers may be interrupted / preempted?

There are three type of interruptions:
External
Internal (software interrupts)
Syscall (based on internal)
I had got the question "can the interruptions be interrupted or preempted by scheduler (which is also interrupt by timer)?"
After some research i am totally confused:
Someone says there are priorities of the interruptions and only interruption with higher priority can interrupt another one. Does it pertain only to external interruptions? / How is it arranged in real OS, say x86-64 Linux? Is it used? / Okay, if some interruption has been interrupted, is it gonna be resumed? Someone says interruptions don't have a "context", but as i know preempting some process / thread occurs with interruptions from a timer, so it's possible to switch context back to the interruption that has occurred to preempt the process. Correct me please if i'm wrong in this.
Someone says there is some flag for interrupt handlers "INTERRUPTIBLE", and let's say if some syscall handler is executing with this flag set it may be interrupted by a signal. Is it related only to software interruptions? Don't external interruptions have that flag? / What if this flag is not set but the timer (for example) interruption occurs to preempt the thread? Is it ignored?
Someone says it only depends on what an interruption handler is doing. For example if there is read() syscall that is waiting for input, it sets particular flags so that scheduler (timer interruption) can interrupt and preempt it. But if the handler is doing something crucial it can forbid to interrupt or preempt it, so it's gonna possess a cpu by itself until it finished. It seems there are a lot of mechanisms in x86, and i don't absolutely get which one is used and how it works in real life.
(Answering for Linux on x86, since those are your tags).
There are only two types of events that could interrupt your code. Interrupts, and exceptions. Syscalls are considered exceptions.
I had got the question "can the interruptions be interrupted or preempted by scheduler (which is also interrupt by timer)?"
The scheduler doesn't interrupt anything. It's just code, it doesn't run on it's own. Only an interrupt or exception can interrupt your code. You can have a timer interrupt occur that then calls the schedule().
If you're in an interrupt, interrupts are likely disabled. So no timer interrupt, thus no schedule(). If you're in an interrupt with interrupts enabled (allowing for nested interrupts), a timer interrupt could try to invoke the scheduler, but it wouldn't run because it would detect that preemption is disabled. And preemption is always disabled when you're in an interrupt via the preempt_count field. The goal here is to prevent preemption in an interrupt context.
You have lots of other questions, but most of them can be answered by reading the available literature on the subject.

How is interrupt context "restored" when a interrupt handler is interrupted by another interrupt?

I read some related posts:
(1) From Robert Love: http://permalink.gmane.org/gmane.linux.kernel.kernelnewbies/1791
You cannot sleep in an interrupt handler because interrupts do not have a backing
process context, and thus there is nothing to reschedule back into. In other
words, interrupt handlers are not associated with a task, so there is nothing to
"put to sleep" and (more importantly) "nothing to wake up". They must run
atomically.
(2) From Which context are softirq and tasklet in?
If sleep is allowed, then the linux cannot schedule them and finally cause a
kernel panic with a dequeue_task error. The interrupt context does not even
have a data structure describing the register info, so they can never be scheduled
by linux. If it is designed to have that structure and can be scheduled, the
performance for interrupt handling process will be effected.
So in my understanding, interrupt handlers run in interrupt context, and can not sleep, that is to say, can not perform the context switch as normal processes do with backing mechanism.
But a interrupt handler can be interrupted by another interrupt. And when the second interrupt handler finishes its work, control flow would jump back to the first interrupt handler.
How is this "restoring" implemented without normal context switch? Is it like normal function calls with all the registers and other related stuff stored in a certain stack?
The short answer is that an interrupt handler, if it can be interrupted by an interrupt, is interrupted precisely the same way anything else is interrupted by an interrupt.
Say process X is running. If process X is interrupted, then the interrupt handler runs. To the extent there is a context, it's still process X, though it's now running interrupt code in the kernel (think of the state as X->interrupt if you like). If another interrupt occurs, then the interrupt is interrupted, but there is still no special process context. The state is now X->first_interrupt->second_interrupt. When the second interrupt finishes, the first interrupt will resume just as X will resume when the first interrupt finishes. Still, the only process context is process X.
You can describe these as context switches, but they aren't like process context switches. They're more analogous to entering and exiting the kernel -- the process context stays the same but the execution level and unit of code can change.
The interrupt routine will store some CPU state and registers before enter real interrupt handler, and will restore these information before returning to interrupted task. Normally, this kind of storing and restoring is not called context-switch, as the context of interrupted process is not changed.
As of 2020, interrupts (hard IRQ here) in Linux do not nest on a local CPU in general. This is at least mentioned twice by group/maintainer actively contributing to Linux kernel:
From NAPI updates written by Jakub Kicinski in 2020:
…Because normal interrupts don't nest in Linux, the system can't service any new interrupt while it's already processing one.
And from Bootlin in 2022:
…Interrupt handlers are run with all interrupts disabled on the local CPU…
So this question is probably less relevant nowadays, at least for Linux kernel.

Why disabling interrupts disables kernel preemption and how spin lock disables preemption

I am reading Linux Kernel Development recently, and I have a few questions related to disabling preemption.
In the "Interrupt Control" section of chapter 7, it says:
Moreover, disabling interrupts also disables kernel preemption.
I also read from the book that kernel preemption can occur in the follow cases:
When an interrupt handler exits, before returning to kernel-space.
When kernel code becomes preemptible again.
If a task in the kernel explicitly calls schedule()
If a task in ther kernel blocks (which results in a call to schedule())
But I can't relate disabling interrupts with these cases.
As far as I know, a spinlock would disable preemption with the preempt_disable() function.
The post What exactly are "spin-locks"?
says:
On a single core machine a spinlock is simply a "disable interrupts" or "raise IRQL" which prevents thread scheduling completely.
Does preempt_disable() disable preemption by disabling interrupts?
I am not a scheduler guru, but I would like to explain how I see it.
Here are several things.
preempt_disable() doesn't disable IRQ. It just increases a thread_info->preempt_count variable.
Disabling interrupts also disables preemption because scheduler isn't working after that - but only on a single-CPU machine. On the SMP it isn't enough because when you close the interrupts on one CPU the other / others still does / do something asynchronously.
The Big Lock (means - closing all interrupts on all CPUs) is slowing the system down dramatically - so it is why it not anymore in use. This is also the reason why preempt_disable() doesn't close the IRQ.
You can see what is preempt_disable(). Try this:
1. Get a spinlock.
2. Call schedule()
In the dmesg you will see something like "BUG: scheduling while atomic". This happens when scheduler detects that your process in atomic (not preemptive) context but it schedules itself.
Good luck.
In a test kernel module I wrote to monitor/profile a task, I've tried disabling interrupts by:
1 - Using local_irq_save()
2 - Using spin_lock_irqsave()
3 - Manually disable_irq() to all IRQs in /proc/interrupts
In all 3 cases I could still use the hrtimer to measure time even though IRQs were disabled (and a task I was monitoring got preempted as well).
I find this veeeeerrrryyyy strange... I personally was anticipating what Sebastian Mountaniol pointed out -> No interrupts - no clock. No clock - no timers...
Linux kernel 2.6.32 on a single core, single CPU... Can anyone have a better explanation ?
preempt_disable() doesn't disable the interrupts. It however increments the count of preempt counter. Let's say you call preempt_disable() n times in your code path, preemption will only enable at the nth preempt_enable().
disabling interrupts to prevent preemption : not a safe way. This will undoubtedly disable normal kernel preemption because scheduler_tick() won't be called on system tick (no interrupt handler invoked). However, if the program triggers the schedule function, preemption will occur if preempt_disable() was not invoked.
In linux, raw_spin_lock() doesn't disable local interrupts which may lead to deadlock. For instance, if an interrupt handler is invoked which tries to lock already held spin lock, it won't be able to unless the process itself releases it which is not possible as interrupt return wouldn't occur.
So, it's better to use raw_spin_lock_irq(), which disables interrupts.
Interrupt disabling disables some forms of kernel preemption, but there are other ways kernel preemption can happen. For this reason, disabling interrupts is not considered a safe way to prevent kernel preemption.
For instance, with interrupts disabled, cond_resched() would still cause preemption, but wouldn't if preemption was explicitly disabled.
This is why, in regards to your second question, spin locks don't use interrupt disabling to disable preemption. They explicitly call preempt_disable() which increments preempt_count, and disables all ways that preemption can happen except for explicit calls to schedule().

How does a kernel return from the thread

I am doing some study hardcore study on computers etc. so I can get started on my own mini Hello World OS.
I was looking a how kernels work and I was wondering how the kernel makes the current thread return to the kernel (so it can switch to another) even though the kernel isn't running and the thread has no instruction to do so.
Does it use some kind of CPU interrupt that goes back to the kernel after a few nanoseconds?
Does it use some kind of CPU interrupt that goes back to the kernel after a few nanoseconds?
It is during timer interrupts and (blocking) system calls that the kernel decides whether to keep executing the currently active thread(s) or switch to another thread. The timer interupt handler updates resource usages, such as consumed system and user time, for the currently running process and scheduler_tick() function that decides whether a process/tread need to be pre-empted.
See "Preemption and Context Switching" on page 62 of Linux Kernel Development book.
The kernel, however, must know when to call schedule(). If it called schedule() only
when code explicitly did so, user-space programs could run indefinitely. Instead, the kernel
provides the need_resched flag to signify whether a reschedule should be performed (see
Table 4.1).This flag is set by scheduler_tick() when a process should be preempted, and
by try_to_wake_up() when a process that has a higher priority than the currently run-
ning process is awakened.The kernel checks the flag, sees that it is set, and calls schedule() to switch to a new process.The flag is a message to the kernel that the scheduler should be invoked as soon as possible because another process deserves to run.
Does it use some kind of CPU interrupt
Yes! Modern preemptive kernels are absolutely dependent upon interrupts from hardware to deliver good I/O performance. Keyboard, mouse, disk, NIC, USB, etc. drivers are all entered from interrupts and can make threads that are waiting on them ready/running when required (e.g., when data is available).
Threads can also change state as a result of making an OS call that changes the caller's own state of that of another thread.
The interrupt from the hardware timer is one of many interrupt sources and is only special in that many system operations have timeouts that are signaled by this interrupt. Other than that, the timer interrupt just causes a reschedule which, in most cases, changes nothing re. the ready/running state of threads. If the machine is grossly CPU-overloaded to the point where there are more ready threads than there are cores, there is a side-effect of the timer interrupt that causes CPU time to be shared amongst the ready threads.
Do not fixate on the timer interrupt—the other driver interrupts are absolutely essential. It is not impossible to build a functional preemptive multithreaded kernel with no timer interrupt at all.

Context switch in Interrupt handlers

Why can't a context switch happen when an interrupt handler is executing ? More specifically, in the linux kernel, interrupt handlers run in the context of the process that was interrupted. Why is it not possible to do a context switch in the interrupt handler to schedule another process ?
On a multiprocessor, a context switch can certainly happen while an interrupt handler is executing. In fact, it would be difficult to prevent.
On a single-CPU machine, by definition it can only be running one thread of control at a time. It only has one register set, one ALU, etc. So if the interrupt handler is running there simply are no resources with which to execute a context switch.
Now, if you mean, can the interrupt handler actually call the context switch code and make one happen, well, I suppose on some systems that could be made to work. But for most, this wouldn't have much value and would be difficult to arrange. The CPU is running at elevated priority, and this priority cannot be lowered or synchronization between interrupt levels is lost. Critical sections in the OS are already synchronizing against interrupt execution and this would introduce complexities. Furthermore, a context switch happens by changing stacks, much like in a threaded user mode program, so it's hard to imagine how this might happen when the interrupt stack is needed for a return from the interrupt.
A couple of reasons, I guess, depending on the meaning of your question:
Q: Why would context switching during an interrupt be bad?
A: Interrupts are generally for interacting with hardware. Hardware is typically time-sensitive so the OS can't just stop dealing with it in the middle of something and come back when it feels like it.
Q: What stops a context switch from happening during an interrupt?
A: An interrupt happens in a special interrupt context, not a regular process context. Since it's not in a process, it's not subject to context switching as a normal process would be.
There's probably a better, deeper explanation to be made, but that's the extent of my own understanding of the matter.

Resources