Why can't schedule() be called directly from a hardware interrupt? - linux

Why can't schedule() be called directly from a hardware interrupt?
For example, why can't I call schedule() directly from scheduler_tick() and instead I have to use need_resched flag?
I tried looking for an answer but came up empty-handed. Any help would be much appreciated.

Consider a CPU that holds a spin lock and is now servicing an interrupt. If you call schedule() there and switch away, you violate the invariant that a spin lock owner never goes off the CPU. Note that for the most part spin locks DON'T disable interrupts. When a lock is also taken by interrupt handlers, spin_lock_irq and/or spin_lock_irqsave is used instead.

Scheduling happens on timer interrupts. The basic rule is that only one interrupt can be open at a time, so if you go to sleep in the "got data from device X" interrupt, the timer interrupt cannot run to schedule it out.
Interrupts also happen many times and overlap. If you put the "got data" interrupt to sleep, and then get more data, what happens? It's confusing (and fragile) enough that the catch-all rule is: no sleeping in interrupts. You will do it wrong.

Related

In which cases different interrupt handlers may be interrupted / preempted?

There are three types of interruptions:
External
Internal (software interrupts)
Syscall (based on internal)
I had got the question "can the interruptions be interrupted or preempted by scheduler (which is also interrupt by timer)?"
After some research I am totally confused:
Someone says interruptions have priorities and only an interruption with a higher priority can interrupt another one. Does that apply only to external interruptions? How is it arranged in a real OS, say x86-64 Linux? Is it used? And if some interruption has been interrupted, is it going to be resumed? Someone says interruptions don't have a "context", but as far as I know preempting a process or thread happens through timer interruptions, so it should be possible to switch context back to the interruption that occurred to preempt the process. Please correct me if I'm wrong.
Someone says there is a flag for interrupt handlers, "INTERRUPTIBLE", and that if, say, a syscall handler is executing with this flag set, it may be interrupted by a signal. Does that apply only to software interruptions? Don't external interruptions have that flag? What if the flag is not set but the timer interruption (for example) occurs to preempt the thread? Is it ignored?
Someone says it depends only on what the interruption handler is doing. For example, a read() syscall that is waiting for input sets particular flags so that the scheduler (timer interruption) can interrupt and preempt it. But if the handler is doing something crucial, it can forbid interruption or preemption, so it keeps the CPU to itself until it has finished. It seems there are a lot of mechanisms in x86, and I don't really understand which one is used and how it works in real life.
(Answering for Linux on x86, since those are your tags).
There are only two types of events that could interrupt your code. Interrupts, and exceptions. Syscalls are considered exceptions.
I had got the question "can the interruptions be interrupted or preempted by scheduler (which is also interrupt by timer)?"
The scheduler doesn't interrupt anything. It's just code; it doesn't run on its own. Only an interrupt or exception can interrupt your code. You can have a timer interrupt occur that then calls schedule().
If you're in an interrupt, interrupts are likely disabled. So no timer interrupt, thus no schedule(). If you're in an interrupt with interrupts enabled (allowing for nested interrupts), a timer interrupt could try to invoke the scheduler, but it wouldn't run because it would detect that preemption is disabled. Preemption is always disabled while you're in an interrupt, and this is tracked in the preempt_count field. The goal here is to prevent preemption in an interrupt context.
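The interplay between a pending reschedule request and preempt_count can be sketched as a small userspace model. This is only an illustration of the idea described above, not kernel code: every name below (model_irq_enter, model_timer_tick, and so on) is hypothetical, and the real kernel packs several counters into preempt_count rather than using one plain integer.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical userspace model; none of these are real kernel symbols. */
static int  model_preempt_count;  /* > 0 means "do not switch tasks now" */
static bool model_need_resched;   /* set by the timer tick, acted on later */
static int  model_switches;       /* number of context switches performed */

/* Switch tasks only if a reschedule was requested and nothing forbids it. */
static void model_maybe_schedule(void)
{
    if (model_need_resched && model_preempt_count == 0) {
        model_need_resched = false;
        model_switches++;
    }
}

void model_reset(void)
{
    model_preempt_count = 0;
    model_need_resched = false;
    model_switches = 0;
}

/* Entering any interrupt handler bumps the count, so preemption is off. */
void model_irq_enter(void) { model_preempt_count++; }

/* Leaving a handler drops the count; the outermost exit is the safe point. */
void model_irq_exit(void)
{
    assert(model_preempt_count > 0);
    model_preempt_count--;
    model_maybe_schedule();
}

/* The tick runs inside an interrupt: it only records the request. */
void model_timer_tick(void) { model_need_resched = true; }

/* Process-context critical sections use the same counter. */
void model_preempt_disable(void) { model_preempt_count++; }

void model_preempt_enable(void)
{
    assert(model_preempt_count > 0);
    model_preempt_count--;
    model_maybe_schedule();
}

int model_switch_count(void) { return model_switches; }
```

In this model a timer tick that nests inside a device interrupt leaves the switch count unchanged until the outermost model_irq_exit() brings the counter back to zero, which mirrors the reason for setting a flag instead of calling schedule() directly.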
You have lots of other questions, but most of them can be answered by reading the available literature on the subject.

Can the same timer interrupt occur in parallel?

I implemented one timer interrupt handler in kernel module.
This timer interrupt handler requires about 1000us to run.
And I want this timer to trigger up every 10us.
(In doing so, I hope the same handler will be performed in parallel.)
(I know that this can create a tremendous amount of interrupt overhead, but I want to implement it for some testing.)
But this handler does not seem to run in parallel.
Timer interrupt seems to wait until the handler in progress is finished.
Can the same timer interrupt occur in parallel?
If not, is there a kernel mechanism that can run the same handler in parallel?
If the timer triggers every 10us and the handler requires 1000us (1ms) to complete, you would need 100 dedicated CPUs just to keep up with the timer. The short answer is no, the interrupt system isn't going to support this. If an interrupt recursed, it would inevitably exhaust the interrupt handler stack.
Interrupts typically work by having a short bit of code be directly invoked when the interrupt asserts. If more work is to be done, this short bit would schedule a slower bit to follow on, and inhibit this source of interrupt. This is to minimize the latency caused by disparate devices seeking cpu attention. The slower bit, when it determines it has satiated the device request, can re-enable interrupts from this source.
[ In linux, the short bit is called the top half; the slower bit the bottom half. It is a bit confusing, because decades of kernel implementation pre-linux named it exactly the other way around. Best to avoid these terms. ]
One of many ways to get the effect you desire is to have this slow handler release a semaphore then re-enable the interrupt. You could then have an appropriate number of threads sit in a loop acquiring the semaphore then performing your task.
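The semaphore approach can be sketched in userspace with POSIX threads. This is a minimal sketch under stated assumptions: fake_timer_interrupt() stands in for the slow handler releasing the semaphore, the "task" is just a counter increment, and all function names are made up for illustration. A kernel module would use kernel primitives instead of pthreads.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <pthread.h>
#include <sched.h>
#include <semaphore.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>

#define MAX_WORKERS 16

static sem_t work_sem;             /* one count per outstanding work item */
static atomic_int jobs_posted;
static atomic_int jobs_done;
static atomic_bool stopping;
static pthread_t workers[MAX_WORKERS];
static int worker_count;

/* Each worker sits in a loop: acquire the semaphore, then do the task. */
static void *worker_main(void *arg)
{
    (void)arg;
    for (;;) {
        if (sem_wait(&work_sem) != 0)
            continue;                      /* interrupted wait: retry */
        if (atomic_load(&stopping))
            break;
        /* The slow (~1000us) task would run here. */
        atomic_fetch_add(&jobs_done, 1);
    }
    return NULL;
}

/* Start n workers; returns 0 on success, -1 on failure. */
int start_workers(int n)
{
    assert(n > 0 && n <= MAX_WORKERS);
    atomic_store(&jobs_posted, 0);
    atomic_store(&jobs_done, 0);
    atomic_store(&stopping, false);
    if (sem_init(&work_sem, 0, 0) != 0)
        return -1;
    for (worker_count = 0; worker_count < n; worker_count++) {
        if (pthread_create(&workers[worker_count], NULL, worker_main, NULL) != 0)
            return -1;
    }
    return 0;
}

/* Stand-in for the handler: it only releases the semaphore and returns. */
void fake_timer_interrupt(void)
{
    atomic_fetch_add(&jobs_posted, 1);
    sem_post(&work_sem);
}

/* Wait for outstanding work, stop the workers, return completed jobs. */
int stop_workers(void)
{
    while (atomic_load(&jobs_done) < atomic_load(&jobs_posted))
        sched_yield();
    atomic_store(&stopping, true);
    for (int i = 0; i < worker_count; i++)
        sem_post(&work_sem);               /* wake each worker so it can exit */
    for (int i = 0; i < worker_count; i++)
        pthread_join(workers[i], NULL);
    sem_destroy(&work_sem);
    return atomic_load(&jobs_done);
}
```

The point of the design is that the "interrupt" side stays short and never waits for the task, while parallelism comes from how many worker threads are waiting on the semaphore.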

local_bh_disable, preempt_disable, local_irq_disable

local_bh_disable disables the processing of bottom halves (softirqs). Softirqs are processed either on the interrupt return path or by the per-CPU ksoftirqd thread, which is woken up if the system is under heavy softirq load.
preempt_disable disables preemption, which means that while a thread is executing inside a preempt_disable <-> preempt_enable scope, it will not be put to sleep by the scheduler.
This means that if the system timer interrupt occurs while the current thread is inside that scope, it might update the accounting tables of the scheduler, but it will not switch context to another thread. This includes ksoftirqd.
local_irq_disable or local_irq_save disables interrupts for the local CPU. This means that the local CPU will not react to any IRQs, so it will not run any interrupt return paths and hence cannot run softirqs there.
Assuming my above statements are true (which I am not sure about), wouldn't it be redundant to call local_bh_disable AFTER you called preempt_disable and local_irq_save (while being in process context)?
Yes. Once local_irq_save / disable has been called, no further protection is needed -- you won't be interrupted (except by an NMI or an exception in your code).
Often, however, you'll find bits of code that are designed to be callable from different contexts, so they may provide protection for some sub-operation that ends up being redundant in some paths.
A preempt_disable/preempt_enable scope ensures that the thread is not preempted inside that scope, i.e. the scheduler will not switch it out involuntarily. However, a softirq or an IRQ can still interrupt you.
Disabling IRQs only disables hard interrupts, just as disabling BH (softirqs) only disables software interrupts; you need to explicitly specify which one you want to disable.
There are 4 levels: NMI, IRQ, softirq, process. An NMI (non-maskable interrupt) can interrupt an IRQ, a softirq, or a process; an IRQ can interrupt a softirq or a process; a softirq can interrupt a process.
Calling local_bh_disable() after local_irq_save() may be redundant (not sure), but calling local_bh_disable() after preempt_disable() is definitely needed if you want to disable BH.
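The four levels and the effect of the disable calls can be summarized as a small decision function. This is a simplified model of the rules stated in the answers above, not kernel code: the enum, the struct, and can_interrupt() are hypothetical names, and the assumption that softirqs cannot run while IRQs are off follows the interrupt-return-path argument made in the question. preempt_disable is deliberately absent because it masks none of these levels.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical model of the four levels, ordered from lowest to highest. */
enum ctx_level { CTX_PROCESS, CTX_SOFTIRQ, CTX_IRQ, CTX_NMI };

/* What the currently running code has masked on the local CPU. */
struct cpu_masks {
    bool irqs_off;  /* models local_irq_disable / local_irq_save */
    bool bh_off;    /* models local_bh_disable */
};

/* Can `incoming` interrupt code that is running at level `running`? */
bool can_interrupt(enum ctx_level incoming, enum ctx_level running,
                   struct cpu_masks m)
{
    assert((int)running <= (int)CTX_NMI);
    if ((int)incoming <= (int)running)
        return false;               /* only a higher level can interrupt */
    switch (incoming) {
    case CTX_NMI:
        return true;                /* non-maskable by definition */
    case CTX_IRQ:
        return !m.irqs_off;
    case CTX_SOFTIRQ:
        /* Assumption: with IRQs off there is no interrupt return path,
         * so softirqs do not get a chance to run either. */
        return !m.bh_off && !m.irqs_off;
    default:
        return false;
    }
}
```

Under this model, irqs_off already implies that no softirq runs, which is the sense in which local_bh_disable after local_irq_save is redundant, whereas bh_off alone still lets hard IRQs through.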

Pthread Concepts

I'm studying threads and I am not sure if I understand some concepts. What is the difference between preemption and yield? So far I know that preemption is a forced yield but I am not sure what it actually means.
Thanks for your help.
Preemption is when one thread stops another thread from running so that it may run.
To yield is when a thread voluntarily gives up processor time.
Have a gander at these...
http://en.wikipedia.org/wiki/Preemption_(computing)
http://en.wikipedia.org/wiki/Thread_(computing)
The difference is how the OS is entered.
'yield' is a software interrupt AKA system call, one of the many that may result in a change in the set of running threads, (there are lots of other system calls that can do this - blocking reads, synchronization calls). yield() is called from a running thread and may result in another ready, (but not running), thread of the same priority being run instead of the calling thread - if there is one.
The exact behaviour of yield() is somewhat hardware/OS/language-dependent. Unless you are developing low-level lock-free thread comms mechanisms, and you are very good at it, it's best to just forget about yield().
Preemption is the act of interrupting one thread and dispatching another in its place. It can only occur after a hardware interrupt. When hardware interrupts, its driver is entered. The driver may decide that it can usefully make a thread ready (e.g. a thread is blocked on a read() call to the driver and the driver has accumulated a nice, big buffer of data). The driver can do this by signaling a semaphore and exiting via the OS (which provides an entry point for just such a purpose). This driver exit path causes a reschedule and, probably, makes the read thread run instead of some other thread that was running before the interrupt: the other thread has been preempted. Essentially and simply, preemption occurs when the OS decides to interrupt-return to a different set of threads than the one that was interrupted.
Yield: The thread calls a function in the scheduler, which potentially "parks" that thread, and starts another one. The other thread is one which called yield earlier, and now appears to return from it. Many functions can have yielding semantics, such as reading from a device.
Preempt: an external event comes into the system: some kind of interrupt (clock, network data arriving, disk I/O completing ...). Whichever thread is running at that time is suspended, and the machine is running operating system code in the interrupt context. When the interrupt is serviced, and it's time to return from the interrupt, a scheduling decision can be made to keep the interrupted thread parked, and instead resume another one. That is a preemption. If/when that original thread gets to run again, the context which was saved by the interrupt will be activated and it will pick up exactly where it left off.
Scheduling systems which rely on yield exclusively are called "cooperative" or "cooperative multitasking" as opposed to "preemptive".
Traditional (read: old, 1970's and 80's) Unix is cooperatively multitasked in the kernel, with a preemptive user space. The kernel routines are trusted to yield in a reasonable time, and so preemption is disabled when running kernel code. This greatly simplifies kernel coding and improves reliability, at the expense of performance, especially when multiple processors are introduced. Linux was like this for many years.
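The "yield" half of this description, where a task parks itself and another task appears to return from its own earlier yield, can be sketched with the ucontext functions available in glibc. This is a toy cooperative scheduler under stated assumptions: two tasks, fixed-size static stacks, and hypothetical names (coop_yield, run_cooperative_demo). It is not how pthreads or the kernel implement yielding.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>
#include <ucontext.h>

#define STACK_SIZE (64 * 1024)

static ucontext_t main_ctx, task_ctx[2];
static char stacks[2][STACK_SIZE];
static int current_task;
static char trace[16];     /* records which task ran, in order */
static size_t trace_len;

/* Park the calling task and resume the other one where it last yielded. */
static void coop_yield(void)
{
    int prev = current_task;
    current_task = 1 - current_task;
    swapcontext(&task_ctx[prev], &task_ctx[current_task]);
}

/* Each task logs its letter three times, yielding after every step. */
static void task_body(int id)
{
    for (int i = 0; i < 3; i++) {
        assert(trace_len < sizeof trace - 1);
        trace[trace_len++] = (char)('A' + id);
        coop_yield();
    }
    /* Returning here switches to uc_link, i.e. back to main_ctx. */
}

/* Run both tasks until the first one finishes; returns the trace string. */
const char *run_cooperative_demo(void)
{
    memset(trace, 0, sizeof trace);
    trace_len = 0;
    for (int id = 0; id < 2; id++) {
        getcontext(&task_ctx[id]);
        task_ctx[id].uc_stack.ss_sp = stacks[id];
        task_ctx[id].uc_stack.ss_size = STACK_SIZE;
        task_ctx[id].uc_link = &main_ctx;
        makecontext(&task_ctx[id], (void (*)(void))task_body, 1, id);
    }
    current_task = 0;
    swapcontext(&main_ctx, &task_ctx[0]);
    /* Task B is still parked inside its last coop_yield() at this point. */
    return trace;
}
```

Nothing here can take the CPU away from a task that never calls coop_yield(), which is exactly what makes the scheme cooperative; preemption needs an outside event such as a timer interrupt to force the switch.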

When does OS check signals?

For simplicity, let's suppose it's on a single-core architecture.
The OS's main responsibility is to assign CPU time to different processes.
When does it check signals?
My bet is that it checks them when switching context (suspend process A and resume B), but I don't have any proof.
The answer, sadly, depends on the OS. On most, if not all, OS signals are event-driven entities. For example, in the case of a hardware interrupt, the hardware sends the signal to the interrupt handler, which then does its stuff, usually upon a context-switch (like you suggested).
It depends on the OS exactly, but in the case of a signal sent from a specific program, it usually happens when you context-switch a process to be executed. Signals are then checked. In the case of kill, the kill command is "tied" to the process, and the OS' interrupt handler takes care of it.
Operating systems have interrupt handlers that deal with that kind of thing. They periodically check, but it really depends on the OS. In the specific case of kill PID (I use this example because you used it in an above comment), it will check the next time PID is scheduled for continued execution.
Short but unsatisfying answer: it depends on the signal and on the OS.
Hope this helps!
N.S.
Sources: I've programmed operating systems before, and I've taken multiple concurrency classes.
It doesn't poll for them if that's what you mean. When someone asks the kernel to send a signal, it interrupts the program to handle it.
Segfaults are triggered by hardware exceptions (such as a page fault). The exception handler asks the kernel to pass the message along. Timeouts are similar.
It's all event-driven. Some events, however, quickly and simply leave messages around to be collected later (mouse movements etc.). What happens next is very system-dependent, but it's not a signal anymore.
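On a POSIX system, the difference between a signal being sent and a signal being handled can be observed from user space. The sketch below, with hypothetical helper names, blocks SIGUSR1, sends it to the current process, confirms that it is merely pending, and then unblocks it. It assumes a single-threaded process and relies on the POSIX rule that a pending signal is delivered before the unblocking sigprocmask() call returns.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <signal.h>
#include <string.h>

static volatile sig_atomic_t handler_runs;

/* The handler only counts how often it was invoked. */
static void on_usr1(int signo)
{
    (void)signo;
    handler_runs++;
}

/* Install the SIGUSR1 handler; returns 0 on success. */
int install_handler(void)
{
    struct sigaction sa;
    memset(&sa, 0, sizeof sa);
    sa.sa_handler = on_usr1;
    sigemptyset(&sa.sa_mask);
    handler_runs = 0;
    return sigaction(SIGUSR1, &sa, NULL);
}

/* Block SIGUSR1, send it to ourselves, and report whether it is
 * pending while the handler has not run yet (1 = yes, 0 = no). */
int raise_while_blocked(void)
{
    sigset_t block, pending;
    sigemptyset(&block);
    sigaddset(&block, SIGUSR1);
    int rc = sigprocmask(SIG_BLOCK, &block, NULL);
    assert(rc == 0);
    (void)rc;
    raise(SIGUSR1);
    sigpending(&pending);
    return sigismember(&pending, SIGUSR1) == 1 && handler_runs == 0;
}

/* Unblock SIGUSR1; returns how many times the handler has run so far. */
int unblock_and_count(void)
{
    sigset_t unblock;
    sigemptyset(&unblock);
    sigaddset(&unblock, SIGUSR1);
    int rc = sigprocmask(SIG_UNBLOCK, &unblock, NULL);
    assert(rc == 0);
    (void)rc;
    return handler_runs;
}
```

Calling install_handler(), then raise_while_blocked(), then unblock_and_count() shows the signal sitting in the pending set until the process is allowed to take it, which fits the event-driven picture: sending a signal records it, and the handler runs at the next point where delivery is permitted.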

Resources