Scheduler for Linux kernel threads - linux

Linux includes a few privileged processes called kernel threads. Is there any scheduler which runs/suspends them? If yes, is this scheduler the same as the system scheduler (I mean the one to schedule the whole system processes)?

The Linux scheduler is scheduling tasks. These can be
kernel threads (e.g. kswapd), or
single-threaded processes (e.g. bash), or
individual threads of a multi-threaded process (e.g. some browsers or servers)
The many threads of a multi-threaded process are tasks sharing a common address space (and other things, e.g. file descriptors).
AFAIK, the scheduler does not separate kernel threads from other tasks. But the scheduler do take into account scheduling policies (sched_setscheduler(2)) and priorities (setpriority(2)) (For most kernel threads, the priority is often very high). See sched(7)

Yes ! Let me clarify the system scheduler part here.
Every task is associated with a task_struct which contains the details of each task say its pid, its name, when it recently started, priority etc etc.http://lxr.free-electrons.com/source/include/linux/sched.h#L1224
Typically depending on the priority of the task either Fair scheduler or Real time scheduler kicks in and these co exist. Just to keep it simple and not to go into details, these are different scheduler algorithms that cater to different type of tasks.
Now Kernel threads also have an associated task_struct and as #Basile Starynkevitch pointed a couple of KPI's, we can use sched_setparam KPI's to modify the sched params and change the scheduler to which the task belongs to depening on what they are about to do.

Related

How does a process schedule its own threads

After the Kernel schedules a process that has threads, How does said process schedule its own threads during its time splice?
For most modern kernels, the kernel only schedules threads, and processes are mostly just a container for the threads to execute inside (e.g. a container that contains a virtual address space, however many threads, and a few other scraps like file handles).
For some kernels (mostly very old unix kernels that existed before threads were invented) the kernel schedules processes and then a user-space library emulates threads. For this to work properly all of the "blocking" system calls (e.g. write()) have to be replaced by asynchronous system calls (e.g. aio_write()) so that other threads in the process can be given CPU time; however I wouldn't want to assume it works properly (e.g. if any thread blocks, then maybe all threads in the process block).
Also it may not work when there's multiple CPUs (kernel gives a process one CPU, but then from the kernel's perspective that process is running and can't use a second CPU). There are sophisticated work-arounds for this (to support "M:N threading") but it's just easier and better to fix the scheduler so it works with threads. Fortunately/unfortunately this didn't matter much in the early days because very few computers had more than one CPU anyway.
Lastly; it doesn't work for thread priorities - e.g. one process might keep CPU busy executing an unimportant/low priority thread while another process doesn't get that CPU time when it desperately needs it for an important/high priority thread. This occurs because no process knows about threads belonging to other processes and the kernel only knows about processes and not threads.
Of course these are also the reasons why every kernel adopted "kernel schedules threads and not processes" (and those that didn't died).
It's down to jargon definitions, but threads are simply a bunch of processes sharing an address space. Older Unixes even called them Light Weight Processes.
With that classical understanding of threads, the answer is that, these days, it's the OS that does the scheduling and each thread gets its own timeslices.
Extras
Some OSes do things to "the whole process" - e.g. Windows will give the process that has mouse focus a priority boost (all it's threads get dynamically notched up a few priority places), to make that application appear to be more sprightly (this goes back to Windows 3).
Other operating systems will increase the priority of a thread dynamically, to solve priority inversion situations. This is where a low priority thread that has control of a resource (I/O, or perhaps a semaphore) is blocking a higher priority thread from running (because the resource is not available. This is the priority inversion, and it's solved by the OS boosting the priority of the blocking thread until it gives up the required resource.
Either the kernel schedules the threads or the kernel schedules processes simulates thread by scheduling it own threads.
Usually, the process schedules its own threads using a library that sets timers. When the timer handler saves the current "thread's" registers then loads a new set of registers from another "thread."

Can kernel schedule user level threads of same process on different cores?

As far as I know kernel doesn't know whether it is executing a user thread or user process because for kernel user threads are user process, it only schedules user processes and doesn't care which thread was running in that process.
I have one more question, Is there per core ready queue or a single ready queue for all the cores?
I was reading this paper and it is written that
In the stock Linux kernel the set of runnable threads is partitioned
into mostly-private per core scheduling queues; in the common case,
each core only reads, writes, and locks its own queue.
The linux kernel scheduler uses the "task" as its primary schedulable entity. This corresponds to a user-space thread. For a traditional simple Unix-style program, there is only a single thread in the process and so the distinction can be ignored. Other programs of course may have multiple threads. But in all cases, the kernel only schedules tasks (i.e. threads).
Your terminology above therefore doesn't really match the situation. The kernel doesn't really care whether the different threads it schedules are part of the same process or different processes: each thread can be scheduled independently. You can have multiple threads from the same process running on different processors/cores at the same time.
Yes, there are separate run queues for each core.
The paper you reference is, I think, slightly misleading in its phrasing. In particular, saying that the "set of runnable threads is partitioned into..." doesn't give quite the right meaning; that makes it sound like the threads are divided into multiple groups that are then assigned to different cores and can only be executed there. It would be more accurate to say that there is a separate run queue for each core containing a set of threads waiting to execute, and in common use, the scheduler doesn't need to reference the queues for other cores.
But in fact, threads can migrate from one core to another. For example, if there is a thread waiting to run on core A (hence in core A's run queue), but core A is already busy running some other thread, and there is another core that is not busy, the waiting thread may be migrated to that other core and executed there. (This is an oversimplification of course as there are other factors that go into deciding whether/when to migrate a thread.)

How Linux scheduler schedules processes on multi-core processors?

Multi-core processors exploits thread level parallelism, it means that multiple threads runs in parallel. Suppose, a process has only one thread then, do the other cores remain idle during the execution of this process? In linux system, scheduler consider processes and threads both as a task. It doesn't differentiate between process and thread while scheduling it. So, does this means that different cores executes different threads of different processes in parallel?
When context-switch happens, does this happen only for one core or for all the cores of the cpu?
You are right: processes and threads are the same from the Linux scheduler's point of view. These tasks are queued according to the scheduler's rules and wait for their turn.
There are scheduling rules such as priority or CPU affinity (to prevent a thread to migrate to another core and preserve cache data).
A context switch may happen on a core every fixed amount of time (a time slice) because the CPU automatically runs some kernel code periodically to permit preemption. Depending on the scheduler's rules, a task can be run for many time slices. A context switch can also occur when a thread calls functions that makes it unrunnable (eg. waiting for IO).
In some cases, if not all, there is one scheduling process per core which does all that.
There is also a similar question on superuser

Does linux schedule a process or a thread?

After reading this SO question I got a few doubts. Please help in understanding.
Scheduling involves deciding when to run a process and for what quantum of time.
Does linux kernel schedule a thread or a process? As process and thread are not differentiated inside kernel how a scheduler treats them?
How quantum for each thread is decided?
a. If a quantum of time (say 100us) is decided for a process is that getting shared between all the threads of the process? or
b. A quantum for each thread is decided by the scheduler?
Note: Questions 1 and 2 are related and may look the same but just wanted to be clear on how things are working posted them both here.
The Linux scheduler (on recent Linux kernels, e.g. 3.0 at least) is scheduling schedulable tasks or simply tasks.
A task may be :
a single-threaded process (e.g. created by fork without any thread library)
any thread inside a multi-threaded process (including its main thread), in particular Posix threads (pthreads)
kernel tasks, which are started internally in the kernel and stay in kernel land (e.g. kworker, nfsiod, kjournald , kauditd, kswapd etc etc...)
In other words, threads inside multi-threaded processes are scheduled like non-threaded -i.e. single threaded- processes.
The low-level clone(2) syscall creates user-land schedulable tasks (and can be used both for creating fork-ed process or for implementation of thread libraries, like pthread). Unless you are a low-level thread library implementor, you don't want to use clone directly.
AFAIK, for multi-threaded processes, the kernel is (almost) not scheduling the process, but each individual thread inside (including the main thread).
Actually, there is some notion of thread groups and affinity in the scheduling, but I don't know them well
These days, processors have generally more than one core, and each core is running a task (at some given instant) so you do have several tasks running in parallel.
CPU quantum times are given to tasks, not to processes
The NPTL implementation of POSIX thread specifications sees thread as a different process inside kernel, having unique task_struct (and therefore pid too) so each thread is schedulable in itself as mentioned. Therefore each thread gets its own timeslice and is scheduled just like processes as mentioned above.
Just to add, Currently Linux scheduler is also capable of scheduling not only single tasks ( a simple process), but groups of processes or even users ( all processes, belonging to a user) as a whole. This allows implementing of group scheduling, where CPU time is first divided between process groups and then distributed within those groups to single threads.
Linux threads does not directly operate on processes or threads, but works with schedulable entities. Represented by struct sched_entity.
It's fair to say that every process/thread is a sched_entity but the converse might not be true.
To know detailed process scheduling, refer here

Task in vxworks

When we doing taskSpawn a task is creating in vxworks. What is actually a task. Is there any relation with thread.
In my understanding vxworks is thread based Operating system.
Can some one please help me the real difference between task/thread/process in real scenario.
Somewhere I saw task is the execution of set of instruction. If it is like that then thread also have some set of instruction so can we call thread as task.
Please help
Thread is a concept typically used with an OS supporting process models (Unix/Linux/Windows) where you run a process.
This process could have a single thread of execution (like a simple C program). Or you could create multiple threads to perform certain operations in parallel within the current process memory space.
With older vxWorks, there was no process model. Everything would run in the same memory space. A vxWorks tasks provides the context where the system code would execute. All code (with the exception of interrupt handlers) will execute in the context of a Task.
Tasks are independent execution units. They can share resources, have common memory, etc... but the scheduler executes the tasks based on very specific criteria. Typically, the highest priority task in the system is the task that will be executing at any given time.
Once a task is done/sleeps/blocked waiting for resources, then the next highest priority task in the system will run.
For your purpose, you can probably think of the task as a thread.
A task is abstract concept in OS design. A task is a single context of execution. A task has a memory space it operates in where its data and code is stored. This memory space may or may not be shared with other tasks. A task has a state (e.g. running, stopped, killed...), it (usually) has a stack. A task has a priority over other tasks.
On example of such a task, is a VxWorks task. Another is a Linux thread.
In Linux (and I believe also in latest version of VxWorks btw), there exists a concept of a related group of tasks. Tasks belonging to the same group share memory space and several other resources (e.g. file handlers). A Linux process is such a group of tasks.
By an large, the OS scheduler schedules tasks and not processes. The process is a convective abstraction for the programmer to think about group of related threads together.
I hope that helped.
In vxWorks, tasks is a runnable unit.
Task have TCB (Task Control Block) with a unique task space and specific priority (as you defined in the taskSpawn function).
The vxWorks scheduler can run only task, this is the minimum runnable unit (the scheduler can run the kernel itself and interrupt can run in the system).
The decision which task to run will base on the task state (must be in READY), and the task priority (in vxWorks, the highest priority is the lower number).
Note that several tasks might be in the same priority and then the kernel will run different tasks according the scheme you configured (FIFO or round robin).
In vxWorks, all tasks have the same memory space (including Kernel memory space). This is the reason that WindRiver added the "Process like" mechanism from vxWorks 6.x. Process have its own "virtual memory space" that protected by MMU.
Just to summery it for you:
Tasks have the same memory space over the system.
Threads have the same memory space within their process.
Process memory space protected by MMU.
task and threads are similar to process. but the difference is threads dont have seperate memory space for their own they run under the pcb(stack) of the process itself.but whereas,task has its own stack area and is a light weighted process i.e.,tcb is much smaller when compared to pcb so context switching or task switching can happen faster .
since vxworks deals with rtos and switching latency should be very less ,it deals with tasks.
In addition to the existing anwers:
If you ever need to create POSIX threads on your VxWorks system (which is possible by including POSIX in the kernel config and calling pthread_create() ), you will notice that those threads will appear as tasks in you task list (type 'i' in C shell).
Hence, tasks and threads are very much alike. VxWorks even wraps POSIX threads as tasks so they can be handled in parallel to existing native tasks.

Resources