Why disabling interrupts disables kernel preemption and how spin lock disables preemption


I am reading Linux Kernel Development recently, and I have a few questions related to disabling preemption.
In the "Interrupt Control" section of chapter 7, it says:
Moreover, disabling interrupts also disables kernel preemption.
I also read in the book that kernel preemption can occur in the following cases:
When an interrupt handler exits, before returning to kernel-space.
When kernel code becomes preemptible again.
If a task in the kernel explicitly calls schedule()
If a task in the kernel blocks (which results in a call to schedule())
But I can't relate disabling interrupts with these cases.
As far as I know, a spinlock would disable preemption with the preempt_disable() function.
The post What exactly are "spin-locks"?
says:
On a single core machine a spinlock is simply a "disable interrupts" or "raise IRQL" which prevents thread scheduling completely.
Does preempt_disable() disable preemption by disabling interrupts?

I am not a scheduler guru, but I would like to explain how I see it.
Here are several things.
preempt_disable() doesn't disable IRQs. It just increments the thread_info->preempt_count variable.
Disabling interrupts also disables preemption because the scheduler no longer gets to run after that, but only on a single-CPU machine. On SMP it isn't enough, because when you disable interrupts on one CPU the other CPUs still run asynchronously.
The Big Lock (meaning disabling all interrupts on all CPUs) slows the system down dramatically, which is why it is no longer in use. This is also the reason why preempt_disable() doesn't disable IRQs.
You can see what preempt_disable() does. Try this:
1. Get a spinlock.
2. Call schedule().
In dmesg you will see something like "BUG: scheduling while atomic". This happens when the scheduler detects that your process is in atomic (non-preemptible) context but calls schedule() anyway.
Good luck.
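The experiment above can be imitated in user space with a toy model. This is only a sketch and not kernel code: every model_* name is hypothetical, and it assumes, as described above, that taking a spinlock raises a preemption counter and that schedule() complains when that counter is non-zero.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>

/* Toy stand-in for the per-task preemption counter (hypothetical). */
int model_preempt_count = 0;

/* Model of taking a spinlock: preemption is disabled by bumping the counter. */
void model_spin_lock(void)
{
    model_preempt_count++;
}

/* Model of releasing a spinlock: the counter drops back. */
void model_spin_unlock(void)
{
    assert(model_preempt_count > 0);   /* unlock without lock is a bug */
    model_preempt_count--;
}

/*
 * Model of schedule(): returns true if the call is legal, false if it is
 * made in atomic context (the case the kernel reports as
 * "BUG: scheduling while atomic").
 */
bool model_schedule(void)
{
    if (model_preempt_count != 0) {
        fprintf(stderr, "model: scheduling while atomic (count=%d)\n",
                model_preempt_count);
        return false;
    }
    return true;
}
```

In this model, model_schedule() succeeds before model_spin_lock(), fails while the lock is held, and succeeds again after model_spin_unlock().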

In a test kernel module I wrote to monitor/profile a task, I've tried disabling interrupts by:
1 - Using local_irq_save()
2 - Using spin_lock_irqsave()
3 - Manually calling disable_irq() on all IRQs in /proc/interrupts
In all 3 cases I could still use the hrtimer to measure time even though IRQs were disabled (and a task I was monitoring got preempted as well).
I find this veeeeerrrryyyy strange... I personally was anticipating what Sebastian Mountaniol pointed out -> No interrupts - no clock. No clock - no timers...
Linux kernel 2.6.32 on a single core, single CPU... Does anyone have a better explanation?

preempt_disable() doesn't disable interrupts. It does, however, increment the preempt counter. If you call preempt_disable() n times in your code path, preemption is re-enabled only at the nth preempt_enable().
Disabling interrupts to prevent preemption is not a safe approach. It will undoubtedly disable normal kernel preemption, because scheduler_tick() won't be called on the system tick (no interrupt handler is invoked). However, if the code calls schedule() itself, preemption will still occur unless preempt_disable() was invoked.
In Linux, raw_spin_lock() doesn't disable local interrupts, which may lead to deadlock. For instance, if an interrupt handler runs and tries to take a spin lock that is already held, it will spin until the interrupted code releases the lock, which can never happen because the handler never returns.
So it's better to use raw_spin_lock_irq(), which disables interrupts.
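The nesting rule for preempt_disable() / preempt_enable() described above can be sketched as a plain counter. This is a user-space illustration under that assumption only; struct model_task and the model_* functions are made-up names, not kernel APIs.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical per-task state: just the nesting depth. */
struct model_task {
    int preempt_count;
};

/* Each call adds one level of "preemption disabled". */
void model_preempt_disable(struct model_task *t)
{
    t->preempt_count++;
}

/*
 * Each call removes one level. Returns true only when this call was the
 * outermost one, i.e. preemption is actually enabled again.
 */
bool model_preempt_enable(struct model_task *t)
{
    assert(t->preempt_count > 0);   /* unbalanced enable is a bug */
    t->preempt_count--;
    return t->preempt_count == 0;
}

/* The task may be preempted only when the depth is back to zero. */
bool model_preemptible(const struct model_task *t)
{
    return t->preempt_count == 0;
}
```

After three calls to model_preempt_disable(), the first two calls to model_preempt_enable() return false and only the third returns true.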

Interrupt disabling disables some forms of kernel preemption, but there are other ways kernel preemption can happen. For this reason, disabling interrupts is not considered a safe way to prevent kernel preemption.
For instance, with interrupts disabled, cond_resched() would still cause preemption, but wouldn't if preemption was explicitly disabled.
This is why, with regard to your second question, spin locks don't rely on interrupt disabling to disable preemption. They explicitly call preempt_disable(), which increments preempt_count and blocks every way preemption can happen except for explicit calls to schedule().
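The difference between the two mechanisms can be made concrete with a small user-space model. It is a sketch that follows this answer's description: struct model_cpu and the model_* functions are hypothetical, and whether a particular kernel build also warns about rescheduling with interrupts off is outside its scope.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical snapshot of the state that matters for preemption. */
struct model_cpu {
    int  preempt_count;   /* raised by preempt_disable() */
    bool irqs_disabled;   /* set by local_irq_disable() */
    bool need_resched;    /* a reschedule has been requested */
};

void model_preempt_disable(struct model_cpu *c)
{
    c->preempt_count++;
}

void model_preempt_enable(struct model_cpu *c)
{
    assert(c->preempt_count > 0);   /* unbalanced enable is a bug */
    c->preempt_count--;
}

/* Preemption on the interrupt return path needs an interrupt to arrive. */
bool model_preempt_on_irq_return(const struct model_cpu *c)
{
    return !c->irqs_disabled && c->preempt_count == 0 && c->need_resched;
}

/* Voluntary reschedule point modeled on cond_resched(): ignores IRQ state. */
bool model_cond_resched(const struct model_cpu *c)
{
    return c->preempt_count == 0 && c->need_resched;
}
```

With interrupts disabled and a reschedule pending, model_preempt_on_irq_return() is false but model_cond_resched() is still true; only model_preempt_disable() turns the latter off.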

Related

In which cases different interrupt handlers may be interrupted / preempted?

There are three types of interruptions:
External
Internal (software interrupts)
Syscall (based on internal)
I had got the question "can the interruptions be interrupted or preempted by scheduler (which is also interrupt by timer)?"
After some research I am totally confused:
Someone says interruptions have priorities, and only an interruption with a higher priority can interrupt another one. Does this apply only to external interruptions? / How is it arranged in a real OS, say x86-64 Linux? Is it used? / Okay, if an interruption has been interrupted, is it going to be resumed? Someone says interruptions don't have a "context", but as far as I know, preempting a process / thread is done by a timer interruption, so it should be possible to switch context back to the interruption that occurred to preempt the process. Please correct me if I'm wrong about this.
Someone says there is a flag for interrupt handlers, "INTERRUPTIBLE", and if, say, a syscall handler is executing with this flag set, it may be interrupted by a signal. Does this relate only to software interruptions? Don't external interruptions have that flag? / What if this flag is not set but a timer interruption (for example) occurs to preempt the thread? Is it ignored?
Someone says it depends only on what the interruption handler is doing. For example, if a read() syscall is waiting for input, it sets particular flags so that the scheduler (timer interruption) can interrupt and preempt it. But if the handler is doing something crucial, it can forbid being interrupted or preempted, so it keeps the CPU to itself until it has finished. It seems there are a lot of mechanisms in x86, and I don't really understand which one is used and how it works in real life.

(Answering for Linux on x86, since those are your tags).
There are only two types of events that could interrupt your code. Interrupts, and exceptions. Syscalls are considered exceptions.
I had got the question "can the interruptions be interrupted or preempted by scheduler (which is also interrupt by timer)?"
The scheduler doesn't interrupt anything. It's just code; it doesn't run on its own. Only an interrupt or exception can interrupt your code. You can have a timer interrupt occur that then calls schedule().
If you're in an interrupt, interrupts are likely disabled. So no timer interrupt, thus no schedule(). If you're in an interrupt with interrupts enabled (allowing for nested interrupts), a timer interrupt could try to invoke the scheduler, but it wouldn't run because it would detect that preemption is disabled. And preemption is always disabled when you're in an interrupt via the preempt_count field. The goal here is to prevent preemption in an interrupt context.
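How preempt_count can mark interrupt context is easier to see with a small model. The bit layout below is an assumption modeled on the kernel's preempt.h (the exact field widths differ between kernel versions), and all MODEL_* and model_* names are hypothetical user-space stand-ins.

```c
#include <assert.h>
#include <stdbool.h>

/* Assumed layout: low bits count preempt_disable() nesting, a higher
 * field counts hard interrupt nesting. Real widths vary by version. */
#define MODEL_PREEMPT_MASK   0x000000ffu
#define MODEL_HARDIRQ_MASK   0x000f0000u
#define MODEL_HARDIRQ_OFFSET 0x00010000u

unsigned int model_count = 0;

/* Entering a hard interrupt handler bumps the hardirq field. */
void model_irq_enter(void)
{
    model_count += MODEL_HARDIRQ_OFFSET;
}

/* Leaving the handler drops it again. */
void model_irq_exit(void)
{
    assert(model_count & MODEL_HARDIRQ_MASK);   /* must be inside a handler */
    model_count -= MODEL_HARDIRQ_OFFSET;
}

bool model_in_hardirq(void)
{
    return (model_count & MODEL_HARDIRQ_MASK) != 0;
}

/* Preemption is allowed only when every field of the counter is zero. */
bool model_preemptible(void)
{
    return model_count == 0;
}
```

Because the interrupt nesting lives in the same word as the preempt_disable() depth, a single "is the counter zero?" check covers both cases, including nested interrupts.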
You have lots of other questions, but most of them can be answered by reading the available literature on the subject.

local_bh_disable, preempt_disable, local_irq_disable

local_bh_disable disables the processing of bottom halves (softirqs). Softirqs are processed either on the interrupt return path or by the per-CPU ksoftirqd thread, which is woken up if the system suffers from a heavy softirq load.
preempt_disable disables preemption, which means that while a thread is executing inside a preempt_disable <-> preempt_enable scope, it will not be put to sleep by the scheduler.
This means that if the system timer interrupt occurs while the current thread is inside that scope, it might update the accounting tables of the scheduler, but it will not switch context to another thread. This includes ksoftirqd.
local_irq_disable or local_irq_save disables interrupts for the local CPU. This means that the local CPU will not react to any IRQs, so it will not run any interrupt return paths and hence cannot run softirqs there.
Assuming my statements above are true (which I am not sure about), wouldn't it be redundant to call local_bh_disable AFTER calling preempt_disable and local_irq_save (while in process context)?

Yes. Once local_irq_save / disable has been called, no further protection is needed -- you won't be interrupted (except by an NMI or an exception in your code).
Often, however, you'll find bits of code that are designed to be callable from different contexts, so they may provide protection for some sub-operation that ends up being redundant in some paths.

A preempt_disable/enable scope ensures that the scheduler will not preempt you inside that scope (i.e. preemption is disabled). However, a softirq or an IRQ can still interrupt you.
Disabling IRQs only disables hard interrupts, just as disabling BHs (softirqs) only disables software interrupts; you need to explicitly specify which one you want to disable.
There are 4 levels: NMI, IRQ, softirq, process. An NMI (non-maskable interrupt) can interrupt an IRQ, a softirq, and a process; an IRQ can interrupt a softirq and a process; a softirq can interrupt a process.
Calling local_bh_disable() after local_irq_save() may be redundant (not sure), but calling local_bh_disable() after preempt_disable() is definitely needed if you want to disable BH.
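The four levels listed above can be written down as an ordering. This is a minimal sketch of exactly that rule and nothing more: the enum and function names are hypothetical, and treating two contexts of the same level as not interrupting each other is an assumption the answer does not address.

```c
#include <assert.h>
#include <stdbool.h>

/* Execution contexts from the answer, ordered from lowest to highest. */
enum model_level {
    MODEL_PROCESS = 0,
    MODEL_SOFTIRQ = 1,
    MODEL_IRQ     = 2,
    MODEL_NMI     = 3
};

/*
 * Returns true if `incoming` may interrupt `running` under the stated
 * rule: a context is interrupted only by a strictly higher level.
 */
bool model_can_interrupt(enum model_level incoming, enum model_level running)
{
    assert(incoming <= MODEL_NMI && running <= MODEL_NMI);
    return incoming > running;
}
```

For example, model_can_interrupt(MODEL_IRQ, MODEL_SOFTIRQ) is true, while model_can_interrupt(MODEL_SOFTIRQ, MODEL_IRQ) is false.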

Early bootup scheduling is extremely fragile

As per init/main.c: setup_kernel
/* Disable preemption - early bootup scheduling is extremely
fragile until we cpu_idle for the first time*/
Why is it called fragile? Is there a specific reason?
What is its dependency on cpu_idle?

Preemption in the kernel allows kernel code to be preempted before it finishes. At that point, while the scheduler is already starting, many parts of the kernel are not yet configured or set up, so start_kernel() ensures that preemption is disabled even when it starts the timer interrupt. This makes sure that the crucial setup tasks are not preempted before they finish.
Once cpu_idle task is running, if I read the source correctly, all necessary early initialization tasks are done and preemption can be reenabled.

How does a kernel return from the thread

I am doing some hardcore study on computers etc. so I can get started on my own mini Hello World OS.
I was looking at how kernels work and I was wondering how the kernel makes the current thread return to the kernel (so it can switch to another) even though the kernel isn't running and the thread has no instruction to do so.
Does it use some kind of CPU interrupt that goes back to the kernel after a few nanoseconds?

Does it use some kind of CPU interrupt that goes back to the kernel after a few nanoseconds?
It is during timer interrupts and (blocking) system calls that the kernel decides whether to keep executing the currently active thread(s) or switch to another thread. The timer interrupt handler updates resource usage, such as consumed system and user time, for the currently running process and calls the scheduler_tick() function, which decides whether a process/thread needs to be preempted.
See "Preemption and Context Switching" on page 62 of Linux Kernel Development book.
The kernel, however, must know when to call schedule(). If it called schedule() only when code explicitly did so, user-space programs could run indefinitely. Instead, the kernel provides the need_resched flag to signify whether a reschedule should be performed (see Table 4.1). This flag is set by scheduler_tick() when a process should be preempted, and by try_to_wake_up() when a process that has a higher priority than the currently running process is awakened. The kernel checks the flag, sees that it is set, and calls schedule() to switch to a new process. The flag is a message to the kernel that the scheduler should be invoked as soon as possible because another process deserves to run.
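The need_resched handshake quoted above can be sketched in user space. This is a simplified illustration, not the kernel's implementation: the fixed tick budget in ticks_left is an assumption made for the example, and struct model_task and the model_* functions are hypothetical names.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical per-task state for the example. */
struct model_task {
    int  ticks_left;     /* assumed remaining time budget, in timer ticks */
    bool need_resched;   /* "please call schedule() soon" flag */
};

/* Model of scheduler_tick(): runs from the timer interrupt, accounts
 * one tick, and only sets the flag. It does not switch tasks itself. */
void model_scheduler_tick(struct model_task *t)
{
    assert(t->ticks_left >= 0);
    if (t->ticks_left > 0)
        t->ticks_left--;
    if (t->ticks_left == 0)
        t->need_resched = true;
}

/* Model of the check made on the way back to the task: returns true
 * if schedule() would be called now, and consumes the request. */
bool model_should_call_schedule(struct model_task *t)
{
    if (!t->need_resched)
        return false;
    t->need_resched = false;
    return true;
}
```

The point of the split is that the interrupt handler only records the request; the actual switch happens later, at a point where the kernel checks the flag.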

Does it use some kind of CPU interrupt
Yes! Modern preemptive kernels are absolutely dependent upon interrupts from hardware to deliver good I/O performance. Keyboard, mouse, disk, NIC, USB, etc. drivers are all entered from interrupts and can make threads that are waiting on them ready/running when required (e.g., when data is available).
Threads can also change state as a result of making an OS call that changes the caller's own state or that of another thread.
The interrupt from the hardware timer is one of many interrupt sources and is only special in that many system operations have timeouts that are signaled by this interrupt. Other than that, the timer interrupt just causes a reschedule which, in most cases, changes nothing re. the ready/running state of threads. If the machine is grossly CPU-overloaded to the point where there are more ready threads than there are cores, there is a side-effect of the timer interrupt that causes CPU time to be shared amongst the ready threads.
Do not fixate on the timer interrupt—the other driver interrupts are absolutely essential. It is not impossible to build a functional preemptive multithreaded kernel with no timer interrupt at all.

Are there any difference between "kernel preemption" and "interrupt"?

I was just reading an article which says:
Reasons to control the interrupt system generally boil down to needing to provide synchronization. By disabling interrupts, you can guarantee that an interrupt handler will not preempt your current code. Moreover, disabling interrupts also disables kernel preemption. Neither disabling interrupt delivery nor disabling kernel preemption provides any protection from concurrent access from another processor, however.
So I just wonder the difference between interrupt and kernel preemption.
Or could we say disabling kernel preemption also disables interrupts?

When a process is interrupted, the kernel runs some code, which may not be related to what the process does.
When this is done, two things can happen:
1. The same process will get the CPU again.
2. A different process will get the CPU. The current process was preempted.
So preemption will only happen after an interrupt, but an interrupt doesn't always cause preemption.

They're different. Interrupts may occur outside even the context of the kernel, so changing the way the kernel handles preemption won't affect interrupts. It just appears that in the context of your article, kernel preemption depends on interrupts working (probably because it's implemented using a timer of some sort).
