Linux Kernel Multicore Issue

I have some doubts regarding Linux kernel scheduling.
1) Does the Linux kernel (the scheduler, to be specific) always run on CPU 0?
2) One scenario:
A kernel thread running on CPU 0 goes to sleep with interrupts disabled.
In this case, will the scheduler run on another CPU?
If yes, how is the selection made among the remaining cores as to which one will run
the scheduler? Is this decision made while disabling interrupts on CPU 0?

The scheduler is just a piece of code (in particular, the schedule() function).
Like most other parts of the kernel, it runs on whatever CPU it is called on.
The scheduler gets called when some thread wants to sleep or after an interrupt has been handled; this can happen on all CPUs.

1) Does the Linux kernel (the scheduler, to be specific) always run on CPU 0?
(No, the scheduler can run on any CPU core.)
2) One scenario:
A kernel thread running on CPU 0 goes to sleep with interrupts disabled.
In this case, will the scheduler run on another CPU?
(A thread running on CPU 0 that goes to sleep quits the CPU voluntarily. The sleep code calls the Linux scheduler, and the scheduler chooses another thread/process to run. This has nothing to do with interrupts. Disabling interrupts (e.g. the timer interrupt) prevents the thread from being interrupted and scheduled off the CPU against its will.)
If yes, how is the selection made among the remaining cores as to which one will run
the scheduler? Is this decision made while disabling interrupts on CPU 0?
(Hope this helps!)

Related

Is the scheduler built into the kernel a program or a process?

I looked up the CPU scheduler source code built into the kernel.
https://github.com/torvalds/linux/tree/master/kernel/sched
But I have a question. There are mixed opinions about the CPU scheduler on the Internet.
Some say the CPU scheduler is a process.
Question: If so, when running ps -ef on Linux, the scheduler process should be visible, but it was difficult to find the PID and name of any scheduler process. The PID of a CPU scheduler process is not on the Internet either. (The PID 0 swapper process is sometimes called sched, but in Linux PID 0 functions as the idle process.)
Others say the CPU scheduler is not a process: it is passive source code built into the kernel, and user processes frequently enter the kernel and execute that code.
Question: How does a user process execute the kernel's scheduler source code on its own?
What if you created a user program without adding any system call that uses the kernel's scheduler?
How does the user process invoke the scheduler in the kernel without such code?
You have 2 similar questions (The opinion that the scheduler built into the kernel is the program and the opinion that it is the process and I want to know how to implement the cpu scheduling process in Linux operating system) so I'll answer for both of these here.
The answer is that it doesn't work that way at all. The scheduler is not called by user-mode processes using system calls; the scheduler isn't a system call. There are timers programmed to raise interrupts after some time has elapsed. Timers are accessed through device registers mapped into the physical address space, often called memory-mapped I/O (MMIO). You write to an address specified by the ACPI tables (https://wiki.osdev.org/ACPI), and that lets you control devices in the CPU package or external PCI devices (nearly everything is PCI nowadays).
When the timer reaches 0, it triggers an interrupt. Interrupts are raised by hardware (the CPU). The CPU thus includes a special mechanism that lets the OS determine the position to jump to on an interrupt (https://wiki.osdev.org/Interrupt_Descriptor_Table). Interrupts are used by the hardware to notify the OS that an event happened. Without interrupts, the OS would have to reserve at least one core for a special kernel process that constantly polled the registers of peripherals and other devices; that would be impractical. Also, if user-mode processes invoked the scheduler themselves via a system call, the kernel would be at the mercy of user mode: it couldn't tell when a process was finished, and processes could be selfish with CPU time.
I didn't look at the source code, but I think the scheduler is also often called on I/O completion (also on an interrupt, but not always the timer interrupt). I am quite sure that the scheduler must not be preempted; that is, interrupts (and other things) are disabled while the schedule() function runs.
I don't think you can call the scheduler a process (not even a kernel thread). The scheduler can be called from kernel threads, for example those used for bottom-half processing of interrupts. In bottom-half processing, the top "half" of the interrupt handler runs fast and efficiently, while the bottom "half" is queued and runs when the scheduler decides to schedule it. The scheduler can thus be called from kernel threads, and not only from the bottom half of interrupts. There has to be a mechanism to call the scheduler without the scheduler having to be scheduled itself; otherwise, the kernel would stop functioning.

How one thread wakes up others inside Linux Kernel

My question is: how and when does one thread wake up other thread(s)?
I tried to look at the Linux kernel code, but didn't find what I was looking for.
For example, there is one thread waiting on a mutex, a condition variable, or a file descriptor (an eventfd, for example).
What work is performed by the thread that releases the mutex, and what work is performed by the other CPU core that is about to run the sleeping thread?
I have searched existing answers, but did not find details.
I have read that the scheduler can usually be called:
after a system call, before returning to userspace
after interrupt processing
after timer interrupt processing - for example, every 1 ms (HZ = 1000) or 4 ms (HZ = 250)
I believe that a thread that releases some resource calls, through some system call, the kernel function try_to_wake_up. This function picks some task(s) and sets their state to RUNNABLE. This work is performed by the signaling thread and takes some time. But how is the task actually started? If the system is busy, there may be no free CPUs to run this task. Some time in the future - for example, on a timer tick, when some other thread goes to sleep or exhausts its quantum - the scheduler is called on some CPU and picks up the runnable task. Maybe this task will preferably run on the CPU where it ran previously.
But there must be some other scenario. When there are idle CPUs, I believe the task is awakened immediately, without waiting up to 1 ms or even 4 ms (wake-up latency is typically a few microseconds, not milliseconds).
Also, for example, imagine a situation where some thread runs exclusively on one CPU core.
That core may be isolated from kernel threads and interrupt handlers, and only one user thread has its affinity set to run on that core and only that core. I believe that if there are enough free CPU cores, no other threads will normally be scheduled to run on that core (am I wrong?).
That core may also have the nohz_full option enabled, so when the user thread goes to sleep, the core goes to sleep too: no IRQs from devices, no timer IRQs are processed.
So there must be some way for one CPU to tell another CPU (through an interrupt) to start running, call the scheduler, and run the user thread that is ready to wake.
The scheduler must run not on the CPU that releases the resource, but on the other CPU, which should be awakened. Maybe this is performed via an IPI? Can you help me find the corresponding code in the kernel or describe how it works?
Thank you.

How does a kernel come to know that the CPU is idle?

I was studying operating systems and am stuck on a doubt: when the currently running process requests some I/O, the CPU becomes idle, and the scheduler then schedules another process to execute on the CPU. How does the kernel come to know that the CPU has become idle? Is there some kind of hardware interrupt sent by the processor?
The OS 'knows' that a CPU needs to become idle when it performs a scheduling run and has fewer ready threads than cores.
If the scheduler runs and has only two ready threads that can use the CPU, but has four actual cores available, then it will direct the 'surplus' cores to an 'idle' thread that is a loop around a 'HLT' or similar instruction, which causes the core to stop fetching and executing instructions until an interrupt is received.
In my opinion, the kernel is always running on the CPU, and the kernel decides which process or interrupt handler to schedule.

Which tasks correspond to the Linux kernel scheduler?

In Linux at the kernel level we have threads/tasks (belonging to the kernel and to users), e.g.,
swapper: a kernel thread (process 0), the ancestor of all processes, created from scratch during the initialization phase of Linux by the start_kernel() function. Also
init: an additional kernel thread, process 1 (the init process)
HelloWorld: a thread for a user program
My question is about the kernel scheduler, which performs the following jobs:
-Schedules tasks within a fixed amount of time (i.e. context switching)
-Calculates timeslices dynamically (short/long vs. priority based)
-Assigns process priorities dynamically (when needed)
-Monitors processes as they do their jobs
(does it include anything further?)
More specifically, my question becomes: which thread/task(s) at kernel level correspond to the scheduler? Should it be 'scheduler' etc., or does some other kernel task do its job?
P.S.:
"swapper" in the kernel is an idle thread/task with the lowest priority (halt) [1]. Does it do anything other than "step down"?
Does Linux create a dedicated instance of the scheduler for each core in a multicore system? If not, how does it work on multicore?
[1] Why do we need a swapper task in linux?
The Linux scheduler does not have a task or thread corresponding to it. The Linux scheduler code, mainly the schedule() function, is run when the timer used for scheduling issues an interrupt or when it is explicitly called in the kernel code (e.g. as part of a system call).
On multicore, the scheduler code is run independently on each core. The timer interrupt received on Core 0 is usually broadcast by using IPIs (Inter-Processor Interrupts) to the other cores. If the platform has per CPU timers, then Linux usually uses these to issue the interrupt required for scheduling instead of using IPIs.
I don't think there is a separate thread that runs the scheduler. schedule() is the function that does the scheduling. Some of the places the schedule() function may be called from:
1) A process (user space or kernel space) executing kernel code wants to wait for some event (it calls one of the wait functions)
2) A process calls schedule() to voluntarily yield the processor for some time
3) Every time a timer interrupt is generated
Beyond these, there are many other places where the schedule() function may be invoked.

Process Scheduling from Processor point of view

I understand that scheduling is done by the kernel. Let us suppose a
process (P1) in Linux is currently executing on the processor.
Since the current process doesn't know anything about the time slice,
and the kernel is not currently executing on the processor, how does the kernel schedule the next process to execute?
Is there some kind of interrupt that tells the processor to switch to executing the kernel, or some other mechanism for this purpose?
In brief, it is an interrupt that gives control back to the kernel. The interrupt may arrive for any reason.
Most of the time the kernel gets control due to the timer interrupt, or a key-press interrupt might wake up the kernel.
An interrupt signaling completion of I/O with a peripheral, or virtually anything that changes the system state, may
wake up the kernel.
More about interrupts:
Interrupt handling is divided into a top half and a bottom half. Bottom halves are for deferring work out of interrupt context.
Top half: runs with interrupts disabled, hence should be super fast and relinquish the CPU as soon as possible. It usually:
1) stores the interrupt state flag and disables interrupts (resets
some pin on the processor),
2) communicates with the hardware, stores state information, and
delegates the remaining responsibility to the bottom half,
3) restores the interrupt state flag and enables interrupts (sets
some pin on the processor).
Bottom half: handles the deferred work (delegated by the top half) and runs with interrupts enabled, hence it may take a while to complete.
Two mechanisms are used to implement bottom-half processing.
1) Tasklets
2) Work queues
If the timer is the interrupt that switches back to the kernel, is it a hardware interrupt?
The timer interrupt of interest in our context is the hardware timer interrupt.
Inside the kernel, "timer interrupt" may mean either (architecture-dependent) hardware timer interrupts or software timer interrupts.
Read this for a brief overview.
More about timers:
Remember that timers are an advanced topic and difficult to comprehend.
Is the interrupt a hardware interrupt? If it is, what is the frequency of the timer?
Read Chapter 10. Timers and Time Management
If the interval of the timer is shorter than the time slice, will the kernel give the CPU back to the same process that was running earlier?
It depends upon many factors, for example: the scheduler being used, the load on the system, process priorities, and so on.
The most popular, CFS, doesn't really depend on the notion of a time slice for preemption!
The next suitable process, as picked by CFS, will get the CPU time.
The relation between timer ticks, time slices, and context switching is not so straightforward.
Each process has its own (dynamically calculated) time slice. The kernel keeps track of the time slice used by the process.
On SMP, CPU-specific activities, such as monitoring the execution time of the currently running process, are driven by interrupts raised by the local APIC timer.
The local APIC timer sends interrupts only to its own processor.
However, the default time slice is defined in include/linux/sched/rt.h
Read this.
A few things could happen:
a. The current process (p1) finishes its timeslice, and then the
scheduler checks whether there is any other process that could be run.
If there is no other process, the scheduler puts the CPU into the
idle state. The scheduler will assign p1 to the CPU again if p1 is a
CPU-bound task that didn't leave the CPU voluntarily.
b. Another possibility: a high-priority task has jumped in. On every
scheduler tick, the scheduler checks whether there is any process that
needs the CPU badly and should preempt the current task.
In other words, a process can leave the CPU in two ways - voluntarily or involuntarily. In the first case, the process puts itself to sleep and thereby releases the CPU (case a). In the other case, the process is preempted by a higher-priority task.
(Note: This answer is based on the CFS task scheduler
of the current Linux kernel)
