linux bottom-half preemption

As far as I know, there are several mechanisms for implementing bottom halves in Linux:
softirq
tasklet
workqueue
threaded irq (request_threaded_irq())
Each has its own characteristics regarding schedulability.
What I cannot get from the literature is whether they can be preempted. What kinds of tasks can preempt the various bottom-half implementations?
More specifically, I am interested in threaded irqs and workqueues. How confident can one be that, once scheduled, a threaded irq or a workqueue item is not preempted before completion, i.e. runs in one shot? What types of tasks are able to preempt them?
For example, Linux Kernel Development by Robert Love states that only top halves can preempt softirqs, so I would say that softirqs complete in one shot most of the time (or, if they get preempted, it's only for a very short time).
My goal is to qualitatively assess the time between two operations in the same threaded irq or workqueue, in particular the time between an i2c data read and a read of the system clock.
Thanks.

Workqueues and threaded IRQ handlers run in process context and can be preempted. When they can actually be preempted depends on your kernel configuration (CONFIG_PREEMPT or CONFIG_PREEMPT_VOLUNTARY) and also on the realtime priorities you set on the handling thread.
You cannot assume that your workqueue or your bottom half will not get interrupted. This means that if you share resources with a top half, you have to use proper locking.
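To see which handling threads exist before touching their priorities, something like the following could be used. It is only a sketch from userspace: it relies on threaded IRQ handlers showing up as kernel threads whose name starts with "irq/", and the function name find_irq_threads and its proc_root parameter are made up for this example.

```python
import os


def find_irq_threads(proc_root="/proc"):
    """Return sorted (pid, name) pairs for threads whose name starts with 'irq/'."""
    found = []
    for entry in os.listdir(proc_root):
        if not entry.isdigit():
            continue  # skip non-process entries such as 'self' or 'meminfo'
        try:
            with open(os.path.join(proc_root, entry, "comm")) as f:
                name = f.read().strip()
        except OSError:
            continue  # the process exited while we were scanning
        if name.startswith("irq/"):
            found.append((int(entry), name))
    return sorted(found)
```

With a pid in hand, the scheduling policy and realtime priority can be inspected or changed with chrt, or from Python with os.sched_getscheduler() and os.sched_setscheduler() (changing them normally requires root). Raising the priority makes preemption by other tasks less likely, but hard interrupts can still interrupt the thread.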

Related

I/O bound vs CPU bound depend on kernel?

I saw this question online preparing for a job interview:
Given a non-preemptive kernel which type of process will get affected more
in terms of performance and why?
I/O bound
CPU bound
I know that a CPU-bound process gets long quanta but with low priority, whereas an I/O-bound process gets short quanta with high priority.
At first I thought the I/O-bound process would be affected more, since it must wait for the read from disk to finish (and not just ask the OS to wake it up when something is ready).
But I think this is wrong, since even in a non-preemptive kernel a process can decide by itself to finish its job and let another one work.
I am looking for detailed answer to deeply understand what I am missing here.

I think that is a trick question. The difference between a preemptive and a non-preemptive kernel is in the way that threads that are in kernel mode are scheduled. (See What is the difference between Non-preemptive, Preemptive and Selective Preemptive Kernel?)
This makes no difference to a CPU-bound thread, since such a thread will be executing in user mode the whole time.
It also makes little difference to a (normal) I/O bound thread. While the thread will be in kernel mode while blocked on I/O, the kernel thread will be de-scheduled when waiting for a physical I/O event to occur.
However, it does make a difference for real-time threads doing I/O. If a real-time (high priority) thread is waiting on an I/O event and the event happens, you want the current kernel thread (if any) to be preempted so that the high priority kernel thread can take over. A preemptive kernel allows that. A non-preemptive kernel doesn't, and the high-priority thread is held up until the low-priority one finishes what it is currently doing.
It might also make a difference to how different kinds of I/O are (effectively) prioritized; e.g. the "soft real-time" characteristics of I/O.
(Apparently ... Linux kernels only allow one kernel thread to be active at a time for thread-safety reasons.)
In your question, you are speculating about user-mode preemption. AFAIK, that is orthogonal to kernel-mode preemption and to preemptive / non-preemptive kernels.

How does interrupt polling perform context switching?

Consider a very old single-core CPU that does not support hardware interrupts, and let's say I want to write a multi-tasked operating system. Using a hardware timer, one can poll an IRQ line in order to determine whether the timer has elapsed, and if so then switch threads/processes.
However, in order to be able to poll, the kernel has to have execution attention by the CPU. For a CPU that supports hardware interrupts, an ISR is called upon an interrupt and (correct me if I'm wrong) if the interrupt is by the context-switch timer, the appropriate ISR calls the kernel code that handles context switching.
If a CPU does not support hardware interrupts (again, correct me if I'm wrong), then the kernel has to repeatedly check for interrupts and the appropriate ISR is called in kernel space.
But if a user thread is currently executing on this hypothetical processor, the thread has to manually yield execution to the kernel so that the kernel can check, through the appropriate IRQ line, whether a context switch is due according to the timer. This can be done by calling an appropriate kernel function.
Is there a way to implement non-cooperative multithreading on a single-core processor that only supports software interrupts? Are my thoughts correct, or am I missing something?

Well, you are generally correct that the kernel can't do multitasking until it gains control of the CPU. That happens via an interrupt or when user code makes a system call.
The timer interrupt, in particular, is used for preemptive time slicing. I think it would be pretty hard to find a whole CPU that didn't support a timer interrupt, that you didn't have to program with punch cards or switches. Interrupts are much older than multiple cores or virtual memory or DMA or anything fancy at all.
Some SoCs have real time sub-components that have this sort of restriction (like Beaglebone), and it might come up if you were coding a small CPU in an FPGA or something.
Without interrupts, you have to wait for system calls, which basically becomes cooperative multitasking.
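As a rough illustration of what cooperative multitasking looks like, here is a toy scheduler in Python where generators stand in for threads and each `yield` stands in for a system call that hands control back. The names task and run_cooperative are made up for this sketch; it is not how any real kernel is written.

```python
from collections import deque


def task(name, steps, log):
    """A toy thread: does one unit of work, then hands control back."""
    for i in range(steps):
        log.append(f"{name}{i}")
        yield  # voluntary yield; without it nothing else would ever run


def run_cooperative(tasks):
    """Round-robin over tasks; the scheduler only runs when a task yields."""
    ready = deque(tasks)
    while ready:
        current = ready.popleft()
        try:
            next(current)          # run until the task's next yield
            ready.append(current)  # still has work: back of the queue
        except StopIteration:
            pass                   # task finished, drop it


log = []
run_cooperative([task("A", 2, log), task("B", 3, log)])
print(log)  # ['A0', 'B0', 'A1', 'B1', 'B2']
```

A task that loops forever without reaching a `yield` would keep the "CPU" forever, which is exactly the weakness the question is worried about.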

How does the OS scheduler regain control of CPU?

I recently started to learn how the CPU and the operating system works, and I am a bit confused about the operation of a single-CPU machine with an operating system that provides multitasking.
Supposing my machine has a single CPU, this would mean that, at any given time, only one process could be running.
Now, I can only assume that the scheduler used by the operating system to control the access to the precious CPU time is also a process.
Thus, in this machine, either the user process or the scheduling system process is running at any given point in time, but not both.
So here's a question:
Once the scheduler gives up control of the CPU to another process, how can it regain CPU time to run itself again to do its scheduling work? I mean, if any given process currently running does not yield the CPU, how could the scheduler itself ever run again and ensure proper multitasking?
So far, I had been thinking, well, if the user process requests an I/O operation through a system call, then in the system call we could ensure the scheduler is allocated some CPU time again. But I am not even sure if this works in this way.
On the other hand, if the user process in question were inherently CPU-bound, then, from this point of view, it could run forever, never letting other processes, not even the scheduler run again.
Supposing time-sliced scheduling, I have no idea how the scheduler could slice the time for the execution of another process when it is not even running?
I would really appreciate any insight or references that you can provide in this regard.

The OS sets up a hardware timer (Programmable interval timer or PIT) that generates an interrupt every N milliseconds. That interrupt is delivered to the kernel and user-code is interrupted.
It works like any other hardware interrupt. For example, your disk will force a switch to the kernel when it has completed an I/O.
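A userspace analogy, assuming a Unix system and that the code runs in the main thread: an interval timer delivers SIGALRM periodically, and the handler gets to run even though the busy loop never gives up control on its own. The signal plays the role of the timer interrupt here; the names on_tick, busy_loop and run_with_timer are made up for this sketch.

```python
import signal
import time

ticks = 0


def on_tick(signum, frame):
    """Stand-in for the timer interrupt handler: just count the tick."""
    global ticks
    ticks += 1


def busy_loop(seconds):
    """CPU-bound work that never yields voluntarily."""
    end = time.monotonic() + seconds
    n = 0
    while time.monotonic() < end:
        n += 1
    return n


def run_with_timer(seconds=0.2, interval=0.01):
    """Run the busy loop with a periodic timer armed; return the tick count."""
    global ticks
    ticks = 0
    old_handler = signal.signal(signal.SIGALRM, on_tick)
    signal.setitimer(signal.ITIMER_REAL, interval, interval)
    try:
        busy_loop(seconds)
    finally:
        signal.setitimer(signal.ITIMER_REAL, 0)       # disarm the timer
        signal.signal(signal.SIGALRM, old_handler)    # restore the old handler
    return ticks
```

Calling run_with_timer() returns how many times the handler fired while the loop was spinning. In a real kernel the equivalent handler runs in kernel mode and can decide to switch to another process.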

Google "interrupts". Interrupts are at the centre of multithreading, preemptive kernels like Linux/Windows. With no interrupts, the OS will never do anything.
While investigating/learning, try to ignore any explanations that mention "timer interrupt", "round-robin" and "time-slice", or "quantum" in the first paragraph – they are dangerously misleading, if not actually wrong.
Interrupts, in OS terms, come in two flavours:
Hardware interrupts – those initiated by an actual hardware signal from a peripheral device. These can happen at (nearly) any time and switch execution from whatever thread might be running to code in a driver.
Software interrupts – those initiated by OS calls from currently running threads.
Either interrupt may request the scheduler to make threads that were waiting ready/running or cause threads that were waiting/running to be preempted.
The most important interrupts are those hardware interrupts from peripheral drivers – those that make threads ready that were waiting on IO from disks, NIC cards, mice, keyboards, USB etc. The overriding reason for using preemptive kernels, and all the problems of locking, synchronization, signaling etc., is that such systems have very good IO performance because hardware peripherals can rapidly make threads ready/running that were waiting for data from that hardware, without any latency resulting from threads that do not yield, or waiting for a periodic timer reschedule.
The hardware timer interrupt that causes periodic scheduling runs is important because many system calls have timeouts in case, say, a response from a peripheral takes longer than it should.
On multicore systems the OS has an interprocessor driver that can cause a hardware interrupt on other cores, allowing the OS to interrupt/schedule/dispatch threads onto multiple cores.
On seriously overloaded boxes, or those running CPU-intensive apps (a small minority), the OS can use the periodic timer interrupts, and the resulting scheduling, to cycle through a set of ready threads that is larger than the number of available cores, and allow each a share of available CPU resources. On most systems this happens rarely and is of little importance.
Every time I see "quantum", "give up the remainder of their time-slice", "round-robin" and similar, I just cringe...

To complement @usr's answer, quoting from Understanding the Linux Kernel:
The schedule( ) Function
schedule( ) implements the scheduler. Its objective is to find a
process in the runqueue list and then assign the CPU to it. It is
invoked, directly or in a lazy way, by several kernel routines.
[...]
Lazy invocation
The scheduler can also be invoked in a lazy way by setting the
need_resched field of current [process] to 1. Since a check on the value of this
field is always made before resuming the execution of a User Mode
process (see the section "Returning from Interrupts and Exceptions" in
Chapter 4), schedule( ) will definitely be invoked at some close
future time.
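The lazy invocation described in the quote can be modelled with a toy class. This is a simulation of the idea only, not kernel code: the Cpu class and its method names are invented for the sketch, and the run queue is a plain list.

```python
class Cpu:
    """Toy model of lazy scheduler invocation via a need_resched flag."""

    def __init__(self, runqueue):
        self.runqueue = list(runqueue)       # ready tasks, front runs next
        self.current = self.runqueue.pop(0)  # task currently on the CPU
        self.need_resched = False

    def timer_interrupt(self):
        # The handler does not switch tasks itself; it only sets the flag.
        self.need_resched = True

    def return_to_user(self):
        # The flag is checked just before user-mode execution resumes.
        if self.need_resched:
            self.need_resched = False
            self.schedule()

    def schedule(self):
        # Simplest possible policy: rotate to the next ready task.
        self.runqueue.append(self.current)
        self.current = self.runqueue.pop(0)


cpu = Cpu(["A", "B"])
cpu.timer_interrupt()   # flag is set, but "A" is still current
cpu.return_to_user()    # flag is noticed here and the switch happens
print(cpu.current)      # B
```

The point of the two-step design is that the interrupt handler stays short, and the actual switch happens at a well-defined place on the way back to user mode.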

Pthread Concepts

I'm studying threads and I am not sure if I understand some concepts. What is the difference between preemption and yield? So far I know that preemption is a forced yield but I am not sure what it actually means.
Thanks for your help.

Preemption is when one thread stops another thread from running so that it may run.
To yield is when a thread voluntarily gives up processor time.

Have a gander at these...
http://en.wikipedia.org/wiki/Preemption_(computing)
http://en.wikipedia.org/wiki/Thread_(computing)

The difference is how the OS is entered.
'yield' is a software interrupt AKA system call, one of the many that may result in a change in the set of running threads, (there are lots of other system calls that can do this - blocking reads, synchronization calls). yield() is called from a running thread and may result in another ready, (but not running), thread of the same priority being run instead of the calling thread - if there is one.
The exact behaviour of yield() is somewhat hardware/OS/language-dependent. Unless you are developing low-level lock-free thread comms mechanisms, and you are very good at it, it's best to just forget about yield().
Preemption is the act of interrupting one thread and dispatching another in its place. It can only occur after a hardware interrupt. When hardware interrupts, its driver is entered. The driver may decide that it can usefully make a thread ready (e.g. a thread is blocked on a read() call to the driver and the driver has accumulated a nice, big buffer of data). The driver can do this by signaling a semaphore and exiting via the OS (which provides an entry point for just such a purpose). This driver exit path causes a reschedule and, probably, makes the reading thread run instead of some other thread that was running before the interrupt: the other thread has been preempted. Essentially and simply, preemption occurs when the OS decides to interrupt-return to a different set of threads than the one that was interrupted.

Yield: The thread calls a function in the scheduler, which potentially "parks" that thread, and starts another one. The other thread is one which called yield earlier, and now appears to return from it. Many functions can have yielding semantics, such as reading from a device.
Preempt: an external event comes into the system: some kind of interrupt (clock, network data arriving, disk I/O completing ...). Whichever thread is running at that time is suspended, and the machine is running operating system code in the interrupt context. When the interrupt is serviced, and it's time to return from the interrupt, a scheduling decision can be made to keep the interrupted thread parked, and instead resume another one. That is a preemption. If/when that original thread gets to run again, the context which was saved by the interrupt will be activated and it will pick up exactly where it left off.
Scheduling systems which rely on yield exclusively are called "cooperative" or "cooperative multitasking" as opposed to "preemptive".
Traditional (read: old, 1970's and 80's) Unix is cooperatively multitasked in the kernel, with a preemptive user space. The kernel routines are trusted to yield in a reasonable time, and so preemption is disabled when running kernel code. This greatly simplifies kernel coding and improves reliability, at the expense of performance, especially when multiple processors are introduced. Linux was like this for many years.

Does Linux drop into the kernel on all cores?

For a multi-core computer running Linux 2.6.x, what happens when a thread makes a system call? Does it drop into the kernel only on the core that the thread is running on, or does it drop into the kernel on all cores (sorry if this is a newbie question).
Is this behaviour (whichever is the correct one) the same when receiving interrupts in general? If not, what are the differences?

Only the thread that does the syscall enters the kernel. All scheduling in Linux is done at thread granularity. As for interrupts, they are routed to one core, i.e. only one processor is interrupted for each given hardware event. Interrupts can also be manually assigned to specific cores. This is done with a mask in /proc/irq/IRQ-NUMBER/smp_affinity. You can see which CPUs receive which hardware interrupts in /proc/interrupts.
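The smp_affinity value is a hexadecimal bitmask with one bit per CPU (bit 0 is CPU 0); on larger machines it is printed in comma-separated 32-bit groups. A small sketch for converting between CPU lists and masks, with helper names that are made up for this example:

```python
def cpus_to_mask(cpus):
    """Build the hex mask string for a list of CPU numbers."""
    mask = 0
    for cpu in cpus:
        mask |= 1 << cpu  # set the bit that corresponds to this CPU
    return format(mask, "x")


def mask_to_cpus(text):
    """Parse a mask as read from smp_affinity into a list of CPU numbers."""
    mask = int(text.strip().replace(",", ""), 16)  # commas only group digits
    return [cpu for cpu in range(mask.bit_length()) if (mask >> cpu) & 1]


print(cpus_to_mask([0, 2]))  # 5
print(mask_to_cpus("f"))     # [0, 1, 2, 3]
```

So writing 5 to the file would ask for the interrupt to be delivered to CPU 0 or CPU 2. Writing requires root, and a daemon such as irqbalance, if running, may change the value again.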

Only one core handles a system call, and only one core handles an interrupt.
I don't have any references off hand for exactly how interrupts are routed - perhaps Intel's System Programming Guide would be helpful here.
But, imagine if all cores were interrupted by every system call or interrupt. Linux is designed to scale to many cores. This would kill that scalability - on a massive server every disk I/O, timer interrupt, etc., would effectively stall every single core in the system, preventing them from doing useful work.
