Early bootup scheduling is extremely fragile - linux

As per init/main.c, in start_kernel():
/* Disable preemption - early bootup scheduling is extremely
   fragile until we cpu_idle() for the first time. */
Why is it called fragile? Is there any specific reason?
What is its dependency on cpu_idle()?

Kernel preemption allows kernel code to be preempted before it finishes. At that point in boot, although the scheduler is already being set up, many parts of the kernel are not yet configured or initialized, so start_kernel() makes sure preemption stays disabled even once it starts the timer interrupt, which guarantees that the crucial setup tasks are not preempted before they finish.
Once the cpu_idle task is running, if I read the source correctly, all necessary early initialization tasks are done and preemption can be re-enabled.
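For orientation, here is a condensed sketch of that flow in init/main.c. This is a simplification of the 2.6-era source, not the verbatim code, and the exact call sequence varies between kernel versions:

```c
/* Simplified sketch of start_kernel(), not the verbatim source. */
asmlinkage void __init start_kernel(void)
{
	/*
	 * Disable preemption - early bootup scheduling is extremely
	 * fragile until we cpu_idle() for the first time.
	 */
	preempt_disable();
	sched_init();   /* run queues exist now, but scheduling is unsafe */
	init_IRQ();     /* interrupt controller setup */
	init_timers();
	time_init();    /* the timer interrupt starts ticking here */
	/* ... many more subsystem initializations ... */
	rest_init();    /* spawns init; this CPU eventually runs cpu_idle() */
}
```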


How does the scheduler know that a thread is blocked waiting for input?

When a thread executing user code is waiting for input, how does the scheduler know to interrupt it, or how does the thread know to call the scheduler? The average programmer of a simple single-threaded application is unlikely to insert sched_yield() everywhere. Does the compiler insert sched_yield() during optimisation, does the thread just spin until the timer interrupt set up by the scheduler fires, or does the user have to call wait() or sleep() explicitly for the context switch to happen?
This question is especially relevant if the scheduler is not preemptive, because then the thread has to invoke the scheduler when it is waiting for input for throughput to be acceptable, but I'm not sure how that happens.
Be careful not to confuse preemption with the ability of a process to sleep. Processes can sleep even with a non-preemptive scheduler. This is what happens when a process is waiting for I/O. The process makes a system call such as read() and the device determines no data is available. The kernel then internally puts the process to sleep by updating a data structure used by the scheduler. The scheduler then executes other processes until an interrupt or some other event occurs that wakes the original process. The awoken process then becomes eligible again for scheduling.
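A minimal user-space sketch of that first case: the blocking read() below needs no sched_yield(), because the kernel puts the process to sleep internally when no data is available:

```c
#include <stdio.h>
#include <unistd.h>

int main(void)
{
	char buf[64];

	/* read() is a blocking system call: if no input is available yet,
	 * the kernel marks this process as sleeping and runs other tasks.
	 * No sched_yield() appears anywhere in this program. */
	ssize_t n = read(STDIN_FILENO, buf, sizeof(buf));
	if (n > 0)
		printf("got %zd bytes\n", n);
	return 0;
}
```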
On the other hand preemption is the ability of an architecture's scheduler to stop execution of a process without its cooperation. The interruption can occur anywhere in the program's instruction stream. Control returns to the scheduler which can then execute other processes and return to the interrupted (preempted) process later. Most schedulers allocate time slices where a process is allowed to run for up to a predetermined amount of time, after which it is preempted if higher-priority processes need time slices.
Unless you're writing drivers or kernel code, you don't need to worry about the underlying mechanisms too much. When writing user-space applications the key concepts are (1) that some system calls may block which means your process is put to sleep until an event occurs, and (2) on preemptible systems (all mainstream modern operating systems) your program may be preempted at any time so that other processes can run.
* Note that on some platforms, such as Linux, a thread is really just another process, one that shares its virtual address space with the other threads of the same program. Processes and threads are therefore treated exactly the same by the scheduler.
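To make that footnote concrete, here is a sketch using the glibc clone() wrapper; pthread_create() ultimately uses a similar combination of flags (error handling kept minimal):

```c
#define _GNU_SOURCE
#include <sched.h>
#include <signal.h>
#include <stdio.h>
#include <sys/wait.h>

static int worker(void *arg)
{
	/* Runs in a "thread": same address space, separate task
	 * as far as the scheduler is concerned. */
	printf("child sees shared value: %d\n", *(int *)arg);
	return 0;
}

int main(void)
{
	static char stack[64 * 1024];
	int shared = 42;

	/* CLONE_VM shares this process's address space with the child;
	 * to the scheduler, both are simply tasks. */
	int pid = clone(worker, stack + sizeof(stack),
			CLONE_VM | SIGCHLD, &shared);
	if (pid < 0) {
		perror("clone");
		return 1;
	}
	waitpid(pid, NULL, 0);
	return 0;
}
```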
It is not clear to me whether your question is about theory or practice. In practice, in every modern operating system, I/O operations are privileged, meaning that in order for a user process or thread to access files, devices and so on, it must issue a system call.
The kernel then has the opportunity to do whatever it considers appropriate. For example, it can check whether the I/O operation will block and, if so, switch away from the running process (i.e. "call" the scheduler) after issuing the operation.
Note that this mechanism can work even when the kernel handles no timer interrupt. In general it depends on your system. For example, in an embedded system where no OS exists (or only a minimal one), it could be entirely the responsibility of the user's code to invoke the scheduler before issuing a blocking operation.
It is the kernel that can be preemptive, not the scheduler.
First, sched_yield() and wait() are forms of voluntary preemption, where the process itself gives up the CPU; this works even if the kernel is non-preemptive.
If the kernel has the ability to switch to another process when the time quantum has expired or a higher-priority process becomes runnable, then we are talking about involuntary preemption, i.e. a preemptive kernel, and it can happen at the different places explained below.
The difference is that with sched_yield() the process stays in the runnable TASK_RUNNING state but goes to the end of the run queue for its static priority. The process must wait to get the CPU again.
On the other hand, wait() puts the process into a sleeping TASK_(UN)INTERRUPTIBLE state on a wait queue, calls schedule() and waits for an event to occur. When the event occurs, the process is moved to the run queue again, but that doesn't mean it will get the CPU immediately.
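A minimal user-space sketch of voluntary yielding; the process stays runnable the whole time:

```c
#include <sched.h>
#include <stdio.h>

int main(void)
{
	for (int i = 0; i < 3; i++) {
		printf("iteration %d, yielding\n", i);
		/* Stays in TASK_RUNNING, but moves to the back of the
		 * run queue for its static priority; it runs again only
		 * when the scheduler picks it. */
		sched_yield();
	}
	return 0;
}
```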
Here is an explanation of when schedule() can be called after a process is woken up (the text below is quoted from a comment in the kernel's scheduler code):
Wakeups don't really cause entry into schedule(). They add a task to the run-queue and that's it.
If the new task added to the run-queue preempts the current task, then the wakeup sets TIF_NEED_RESCHED and schedule() gets called on the nearest possible occasion:
If the kernel is preemptible (CONFIG_PREEMPT=y):
- in syscall or exception context, at the next outmost preempt_enable() (this might be as soon as the wake_up()'s spin_unlock()!)
- in IRQ context, on return from the interrupt-handler to preemptible context
If the kernel is not preemptible (CONFIG_PREEMPT is not set), then at the next:
- cond_resched() call (see the sketch after this list)
- explicit schedule() call
- return from syscall or exception to user-space
- return from interrupt-handler to user-space
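To illustrate the cond_resched() case from the list above, here is a sketch of the usual pattern for long-running kernel loops; the item type and helper function are invented for illustration:

```c
#include <linux/sched.h>

struct item { int value; };              /* illustrative type */

static void handle_item(struct item *it) /* illustrative work */
{
	it->value++;
}

/* Long-running kernel loop with an explicit preemption point: on a
 * !CONFIG_PREEMPT kernel, cond_resched() is where a waiting
 * higher-priority task finally gets the CPU. */
static void process_items(struct item *items, int n)
{
	int i;

	for (i = 0; i < n; i++) {
		handle_item(&items[i]);
		cond_resched();
	}
}
```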

Why disabling interrupts disables kernel preemption and how spin lock disables preemption

I have been reading Linux Kernel Development recently, and I have a few questions related to disabling preemption.
In the "Interrupt Control" section of chapter 7, it says:
Moreover, disabling interrupts also disables kernel preemption.
I also read in the book that kernel preemption can occur in the following cases:
When an interrupt handler exits, before returning to kernel-space.
When kernel code becomes preemptible again.
If a task in the kernel explicitly calls schedule()
If a task in the kernel blocks (which results in a call to schedule())
But I can't relate disabling interrupts to any of these cases.
As far as I know, a spinlock would disable preemption with the preempt_disable() function.
The post What exactly are "spin-locks"?
says:
On a single core machine a spinlock is simply a "disable interrupts" or "raise IRQL" which prevents thread scheduling completely.
Does preempt_disable() disable preemption by disabling interrupts?
I am not a scheduler guru, but I would like to explain how I see it. There are several things here.
preempt_disable() doesn't disable IRQs. It just increments the thread_info->preempt_count variable.
Disabling interrupts also disables preemption, because the scheduler never gets invoked after that - but only on a single-CPU machine. On SMP it isn't enough, because when you shut off the interrupts on one CPU the others still keep doing things asynchronously.
The Big Lock (meaning: disabling interrupts on all CPUs at once) slowed the system down dramatically, which is why it is no longer used. This is also the reason why preempt_disable() doesn't disable IRQs.
You can see what preempt_disable() does for yourself. Try this:
1. Acquire a spinlock.
2. Call schedule().
In dmesg you will see something like "BUG: scheduling while atomic". This happens when the scheduler detects that your process is in an atomic (non-preemptible) context but schedules itself anyway.
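The following deliberately buggy module sketch reproduces that warning (illustrative module boilerplate; do not load it on a machine you care about):

```c
#include <linux/module.h>
#include <linux/sched.h>
#include <linux/spinlock.h>

static DEFINE_SPINLOCK(demo_lock);

static int __init demo_init(void)
{
	spin_lock(&demo_lock);   /* implies preempt_disable() */
	schedule();              /* triggers "BUG: scheduling while atomic" */
	spin_unlock(&demo_lock); /* implies preempt_enable() */
	return 0;
}

static void __exit demo_exit(void) { }

module_init(demo_init);
module_exit(demo_exit);
MODULE_LICENSE("GPL");
```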
Good luck.
In a test kernel module I wrote to monitor/profile a task, I tried disabling interrupts by:
1 - Using local_irq_save()
2 - Using spin_lock_irqsave()
3 - Manually calling disable_irq() on every IRQ listed in /proc/interrupts
In all 3 cases I could still use the hrtimer to measure time even though IRQs were disabled (and a task I was monitoring got preempted as well).
I find this very strange... I was expecting what Sebastian Mountaniol pointed out: no interrupts - no clock; no clock - no timers...
This is Linux kernel 2.6.32 on a single core, single CPU... Can anyone offer a better explanation?
preempt_disable() doesn't disable the interrupts. It does, however, increment the preempt counter. Say you call preempt_disable() n times in your code path: preemption will only be re-enabled at the nth preempt_enable().
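The nesting behaviour can be modelled in a few lines; this is a toy user-space model, not the real implementation (which lives in include/linux/preempt.h and works on per-thread state):

```c
/* Toy model of the preempt counter's nesting, not real kernel code. */
static int preempt_count;

static void preempt_disable(void)
{
	preempt_count++;        /* each disable nests; must be paired */
}

static void preempt_enable(void)
{
	preempt_count--;
	if (preempt_count == 0) {
		/* Only now is preemption possible again; the real kernel
		 * also checks TIF_NEED_RESCHED here and may schedule(). */
	}
}
```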
Disabling interrupts to prevent preemption is not a safe approach. It will certainly disable normal kernel preemption, because scheduler_tick() won't be called on the system tick (no interrupt handler is invoked). However, if the code triggers the schedule function directly, preemption will still occur unless preempt_disable() was invoked.
In Linux, raw_spin_lock() doesn't disable local interrupts, which may lead to deadlock. For instance, if an interrupt handler is invoked that tries to take an already-held spinlock, it can never acquire it: the lock can only be released by the interrupted process, which cannot run again until the interrupt handler returns.
So it's better to use raw_spin_lock_irq(), which disables interrupts.
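Here is a sketch of the standard pattern for data shared between process context and an interrupt handler; the lock, variable, and handler names are illustrative, and the request_irq() registration is omitted:

```c
#include <linux/interrupt.h>
#include <linux/spinlock.h>

static DEFINE_SPINLOCK(dev_lock);
static int dev_state;

/* Process context: disable local IRQs while holding the lock so the
 * handler below cannot interrupt us on this CPU and deadlock. */
static void update_state(int v)
{
	unsigned long flags;

	spin_lock_irqsave(&dev_lock, flags);
	dev_state = v;
	spin_unlock_irqrestore(&dev_lock, flags);
}

/* Interrupt context: a plain spin_lock suffices here, since local IRQs
 * stay off while the handler runs on this CPU. */
static irqreturn_t dev_irq(int irq, void *data)
{
	spin_lock(&dev_lock);
	dev_state++;
	spin_unlock(&dev_lock);
	return IRQ_HANDLED;
}
```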
Interrupt disabling disables some forms of kernel preemption, but there are other ways kernel preemption can happen. For this reason, disabling interrupts is not considered a safe way to prevent kernel preemption.
For instance, with interrupts disabled, cond_resched() would still cause preemption, but it wouldn't if preemption had been explicitly disabled.
This is why, with regard to your second question, spin locks don't rely on disabling interrupts to disable preemption. They explicitly call preempt_disable(), which increments preempt_count and closes off every path to preemption except an explicit call to schedule().

Linux process scheduling in kernel mode

Here is a description quoted from Wikipedia:
The Linux kernel provides preemptive scheduling under certain conditions. Until kernel version 2.4, only processes were preemptive, i.e. in addition to time quantum expiration, an execution of the current process in user mode would be interrupted if higher dynamic priority processes entered TASK_RUNNING state. Towards Linux 2.6, an ability to interrupt a task executing kernel code was added, although with that not all sections of the kernel code can be preempted.
Then it also says this:
Preemption improves latency, increases responsiveness, and makes Linux more suitable for desktop and real-time applications. Older versions of the kernel had a so-called big kernel lock for synchronization across the entire kernel. This was finally removed by Arnd Bergmann in 2011.
So does the statement that kernel preemption is conditional still hold for the current Linux kernel? For example, if a process traps into kernel mode by making a system call, will that process not be under preemptive scheduling?
Where can I find some up-to-date introductory articles or books about Linux scheduling in both user mode and kernel mode?
Of course kernel preemption is conditional. You would not want the kernel to switch tasks while holding an exclusive lock or while writing to time-sensitive hardware registers in a device driver.
However, the Linux kernel does its best to minimize these conditions in order to make preemption happen as quickly as it can.
Note that this in-kernel preemption is only compiled into the kernel when the compile option CONFIG_PREEMPT is set to yes. There is also CONFIG_PREEMPT_VOLUNTARY, which only switches tasks at points where the kernel explicitly checks whether a reschedule is due.
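For reference, the preemption model shows up in a kernel .config roughly like this (option names as defined in the kernel's Kconfig; a CONFIG_PREEMPT_VOLUNTARY build is shown):

```
# Preemption model: exactly one of these is chosen at build time.
# CONFIG_PREEMPT_NONE is not set
CONFIG_PREEMPT_VOLUNTARY=y
# CONFIG_PREEMPT is not set
```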
Kernel preemption comes with a cost. Rapidly switching tasks requires doing a lot of mostly wasted housekeeping work instead of actual work. This slows down the whole system and results in less work being done. That is why these compile options exist. A Linux kernel built for a database or web server should not use preemption at all. A kernel built for HPC is sometimes modified to only switch tasks once a second, or less.
That all changes for real-time tasks. These tasks rely on reacting quickly and within a reliable timeframe. The default Linux kernel is pretty good at this, but there is a patch set called the "-rt patches" that makes it really good. The patch set does all sorts of things like prioritize interrupt handlers and change kernel locks so that locks can be dropped and restarted later.
CPU scheduling decisions may take place when a process:
1. Switches from running to waiting state (e.g. I/O request)
2. Switches from running to ready state (e.g. Interrupt)
3. Switches from waiting to ready (e.g. I/O completion)
4. Terminates
Scheduling under 1 and 4 is non-preemptive; all other scheduling is preemptive and has to deal with the possibility that operations (system calls) may be incomplete.
Yes, Linux provides preemptive scheduling under certain conditions, unlike some Unix variants where the kernel runs to completion without preemption. In Linux 2.6 the kernel was made preemptive: a task running in kernel mode can be preempted as long as it is not holding a lock and it is safe to reschedule.
The quoted sentence, "Older versions of the kernel had a so-called big kernel lock for synchronization across the entire kernel", refers to a single global lock: while one CPU held it, no other CPU could execute the kernel code it protected, so large parts of the kernel effectively ran single-threaded.

How does a kernel return from the thread

I am doing some hardcore study on computers and related topics so I can get started on my own mini Hello World OS.
I was looking at how kernels work, and I was wondering how the kernel makes the current thread return control to the kernel (so it can switch to another thread) even though the kernel isn't running and the thread has no instruction to do so.
Does it use some kind of CPU interrupt that goes back to the kernel after a few nanoseconds?
Does it use some kind of CPU interrupt that goes back to the kernel after a few nanoseconds?
It is during timer interrupts and (blocking) system calls that the kernel decides whether to keep executing the currently active thread(s) or to switch to another thread. The timer interrupt handler updates resource usage, such as consumed system and user time, for the currently running process, and calls the scheduler_tick() function, which decides whether a process/thread needs to be preempted.
See "Preemption and Context Switching" on page 62 of Linux Kernel Development book.
The kernel, however, must know when to call schedule(). If it called schedule() only when code explicitly did so, user-space programs could run indefinitely. Instead, the kernel provides the need_resched flag to signify whether a reschedule should be performed (see Table 4.1). This flag is set by scheduler_tick() when a process should be preempted, and by try_to_wake_up() when a process that has a higher priority than the currently running process is awakened. The kernel checks the flag, sees that it is set, and calls schedule() to switch to a new process. The flag is a message to the kernel that the scheduler should be invoked as soon as possible because another process deserves to run.
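A pseudocode sketch of that check; the function name here is invented, since the real code sits in architecture-specific interrupt-return paths:

```c
/* Pseudocode model of the interrupt-return check, not real kernel code. */
void irq_return_path(struct task_struct *curr) /* hypothetical name */
{
	if (test_tsk_thread_flag(curr, TIF_NEED_RESCHED))
		schedule();     /* another task deserves the CPU */
	/* ...restore registers and return to the interrupted context... */
}
```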
Does it use some kind of CPU interrupt
Yes! Modern preemptive kernels are absolutely dependent upon interrupts from hardware to deliver good I/O performance. Keyboard, mouse, disk, NIC, USB, etc. drivers are all entered from interrupts and can make threads that are waiting on them ready/running when required (e.g., when data is available).
Threads can also change state as a result of making an OS call that changes the caller's own state or that of another thread.
The interrupt from the hardware timer is one of many interrupt sources and is only special in that many system operations have timeouts that are signaled by this interrupt. Other than that, the timer interrupt just causes a reschedule which, in most cases, changes nothing regarding the ready/running state of threads. If the machine is grossly CPU-overloaded, to the point where there are more ready threads than there are cores, there is a side effect of the timer interrupt that causes CPU time to be shared amongst the ready threads.
Do not fixate on the timer interrupt; the other driver interrupts are absolutely essential. It is not impossible to build a functional preemptive multithreaded kernel with no timer interrupt at all.

Is there any difference between "kernel preemption" and "interrupt"?

I was just reading an article which says:
Reasons to control the interrupt system generally boil down to needing to provide synchronization. By disabling interrupts, you can guarantee that an interrupt handler will not preempt your current code. Moreover, disabling interrupts also disables kernel preemption. Neither disabling interrupt delivery nor disabling kernel preemption provides any protection from concurrent access from another processor, however.
So I am wondering about the difference between an interrupt and kernel preemption.
Or could we say disabling kernel preemption also disables interrupts?
When a process is interrupted, the kernel runs some code, which may not be related to what the process does.
When this is done, two things can happen:
1. The same process will get the CPU again.
2. A different process will get the CPU. The current process was preempted.
So preemption will only happen after an interrupt, but an interrupt doesn't always cause preemption.
They're different. Interrupts may occur even outside the context of the kernel, so changing the way the kernel handles preemption won't affect interrupts. It just appears that, in the context of your article, kernel preemption depends on interrupts working (probably because it's implemented using a timer of some sort).
