Linux process and threads scheduling

I'm playing around with Linux scheduling using sched.h and have bumped into some questions.
From what I know, the Linux scheduler does not treat threads and processes differently when scheduling. Threads are just like processes that share common resources.
OK, say I have Process-A, which is bound to CPU core 0 and has the SCHED_FIFO policy set, so it runs until a higher-priority task kicks in. If Process-A creates a new thread, will the thread inherit the same properties (i.e. be bound to CPU 0 with SCHED_FIFO), or will it get the default?
Thanks!

You can actually test this with a simple program, but from various man pages:
sched_setaffinity:
A child created via fork(2) inherits its parent's CPU affinity mask.
The affinity mask is preserved across an execve(2).
pthread_create:
The new thread inherits copies of the calling thread's capability sets (see capabilities(7)) and CPU affinity mask (see sched_setaffinity(2)).
sched_setscheduler:
Child processes inherit the scheduling policy and parameters across a fork(2).
The scheduling policy and parameters are preserved across execve(2).
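Taking up the "test it with a simple program" suggestion, here is a minimal sketch in Python rather than C, assuming Linux; Python's os module wraps sched_setaffinity(2), sched_getaffinity(2) and sched_setscheduler(2) directly. It assumes an unprivileged user, so SCHED_BATCH stands in for SCHED_FIFO (which needs root or CAP_SYS_NICE); the inheritance should look the same for SCHED_FIFO unless SCHED_RESET_ON_FORK is set. For threads, the relevant default is that a freshly initialized pthread attributes object uses PTHREAD_INHERIT_SCHED (see pthread_attr_setinheritsched(3)), so a thread created with default attributes should start with its creator's policy and priority.

```python
import os
import threading


def observe_in_thread():
    """Start a new thread and report the affinity and policy it sees."""
    seen = {}

    def worker():
        # For these calls, pid 0 means "the calling thread".
        seen["affinity"] = os.sched_getaffinity(0)
        seen["policy"] = os.sched_getscheduler(0)

    t = threading.Thread(target=worker)
    t.start()
    t.join()
    return seen


def observe_in_child():
    """fork() a child and send its affinity and policy back over a pipe."""
    r, w = os.pipe()
    pid = os.fork()
    if pid == 0:  # child process
        os.close(r)
        cpus = ",".join(str(c) for c in sorted(os.sched_getaffinity(0)))
        os.write(w, f"{cpus};{os.sched_getscheduler(0)}".encode())
        os._exit(0)
    os.close(w)
    data = os.read(r, 4096).decode()
    os.close(r)
    os.waitpid(pid, 0)
    cpus, policy = data.split(";")
    return {"affinity": {int(c) for c in cpus.split(",")}, "policy": int(policy)}


# Pin the calling (main) thread to one CPU it is already allowed to use.
cpu = min(os.sched_getaffinity(0))
os.sched_setaffinity(0, {cpu})

# With root or CAP_SYS_NICE you could instead use:
#   os.sched_setscheduler(0, os.SCHED_FIFO, os.sched_param(10))
# SCHED_BATCH needs no privileges and is inherited the same way.
os.sched_setscheduler(0, os.SCHED_BATCH, os.sched_param(0))

thread_view = observe_in_thread()
child_view = observe_in_child()
print("new thread:", thread_view)
print("fork child:", child_view)
```

If both views report the single pinned CPU and SCHED_BATCH, the new thread and the forked child each inherited both settings from their creator.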

Related

How to bind a process to a set of CPUs in golang?

I use the os/exec package to run a process. I want to check its CPU affinity and modify it to bind the process to a specific CPU set. I found
func SchedSetaffinity(pid int, set *CPUSet) error
This function is in the golang.org/x/sys/unix package. However, the documentation says it only binds a thread to a specific CPU, and I don't know whether it works on a process. I also wonder how to get the CPUSet. Is it a value I need to define?
Taskset: To make a process run on specific CPUs, you use the taskset command in Linux. Accordingly, you can build your logic around "taskset -p [mask] [pid]", where the mask represents the cores on which the process shall run, provided the whole program runs with GOMAXPROCS=1.
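Because sched_setaffinity(2) acts on a single thread ID, taskset -p only re-pins the thread whose ID you give it, while taskset -a -p applies the mask to all of the process's tasks. A minimal Python sketch of that "all tasks" behaviour, assuming Linux with /proc mounted; pin_all_threads is a hypothetical helper name, not a library function.

```python
import os
import threading


def pin_all_threads(pid, cpus):
    """Apply one CPU affinity mask to every thread of process `pid`.

    sched_setaffinity(2) takes a single thread ID, so, like `taskset -a -p`,
    walk /proc/<pid>/task and set the mask on each task in turn. Threads
    created later inherit the mask from whichever thread creates them.
    """
    tids = [int(name) for name in os.listdir(f"/proc/{pid}/task")]
    for tid in tids:
        try:
            os.sched_setaffinity(tid, cpus)
        except ProcessLookupError:
            pass  # the thread exited between listdir() and this call
    return tids


# Demo on our own process, with a helper thread that exists *before* pinning.
ready, go = threading.Event(), threading.Event()
seen = {}


def helper():
    ready.set()
    go.wait()
    seen["helper"] = os.sched_getaffinity(0)  # 0 = this helper thread


t = threading.Thread(target=helper)
t.start()
ready.wait()
cpu = min(os.sched_getaffinity(0))
pinned = pin_all_threads(os.getpid(), {cpu})
go.set()
t.join()
seen["main"] = os.sched_getaffinity(0)
print(len(pinned), "threads pinned:", seen)
```

Setting the mask on only the main thread's ID would have left the already-running helper thread on its old CPUs, which is why the per-task walk matters.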
pthread_setaffinity_np: You can use cgo and build logic that calls pthread_setaffinity_np, since Go uses pthreads in cgo mode. (The related pthread_attr_setaffinity_np() function sets the CPU affinity mask attribute of the thread attributes object referred to by attr to the value specified in cpuset.)
SchedSetaffinity: Go supports affinity control through SchedSetaffinity, which can confine a thread to specific cores. Accordingly, you can build logic around SchedSetaffinity(pid int, set *CPUSet), which sets the CPU affinity mask of the thread specified by pid. If pid is 0, the calling thread is used.
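For the original os/exec scenario, one option is to set the mask in the child before it execs the program, since the affinity mask is preserved across execve(2) and inherited by every thread the program creates. The sketch below shows the idea with Python's subprocess module, as a stand-in for os/exec; run_pinned is a hypothetical helper. In Go, a rough analogue would be calling unix.SchedSetaffinity on cmd.Process.Pid right after cmd.Start(), with the caveat that this re-pins only that one thread, not threads the program has already started; as far as I know, CPUSet is an ordinary value type you declare and fill with its Set method.

```python
import os
import subprocess
import sys


def run_pinned(argv, cpus):
    """Run argv with its CPU affinity restricted to `cpus`.

    The mask is set in the forked child just before execve(2). Because the
    mask survives execve and is inherited by every thread the new program
    creates, the whole process ends up confined to `cpus`.
    """
    def pin():
        os.sched_setaffinity(0, cpus)

    # preexec_fn runs in the child between fork and exec (POSIX only).
    return subprocess.run(argv, preexec_fn=pin, capture_output=True,
                          text=True, check=True)


cpu = min(os.sched_getaffinity(0))
# The child program simply prints the affinity mask it was started with.
child = "import os; print(sorted(os.sched_getaffinity(0)))"
result = run_pinned([sys.executable, "-c", child], {cpu})
print(result.stdout.strip())
```

Pinning before exec avoids the race where the program spawns worker threads before the parent gets around to re-pinning it.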
It should be noted that the GOMAXPROCS variable limits the number of operating system threads that can execute user-level Go code simultaneously. If it is > 1, you may use Go's runtime.LockOSThread, which pins the current goroutine to the thread it is currently running on. The calling goroutine will always execute in that thread, and no other goroutine will execute in it, until the calling goroutine has made as many calls to UnlockOSThread as to LockOSThread.
cgroups: There is also the option of using cgroups, which organize processes hierarchically and distribute system resources along the hierarchy in a controlled and configurable manner. cgroups include a subsystem called cpuset that enables assigning individual CPUs (on a multicore system) and memory nodes to the processes in a cgroup. The cpuset.cpus file lists the CPUs to be used by tasks within this cgroup, as comma-separated numbers or ranges. For example:
# cat cpuset.cpus
0-4,6,8-10
A process is confined to run only on the CPUs in the cpuset it belongs to, and to allocate memory only on the memory nodes in that cpuset. Note that a newly created process is put in the cgroup its parent belongs to at the time of creation, and a process can later be migrated to another cgroup. Migrating a process doesn't affect its already existing descendant processes.
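The cpuset.cpus example uses the kernel's CPU list format. The sketch below parses that comma-and-range form into a Python set, assuming Linux; it also reads the same format from the Cpus_allowed_list line of /proc/self/status, which should cover the CPUs sched_getaffinity(2) reports. parse_cpu_list and allowed_cpus_from_proc are hypothetical helpers, and they handle only plain numbers and ranges.

```python
import os


def parse_cpu_list(text):
    """Parse a CPU list such as "0-4,6,8-10" into a set of CPU numbers."""
    cpus = set()
    for part in text.strip().split(","):
        if not part:
            continue  # an empty list (e.g. an empty cpuset.cpus) means no CPUs
        if "-" in part:
            lo, hi = part.split("-")
            cpus.update(range(int(lo), int(hi) + 1))
        else:
            cpus.add(int(part))
    return cpus


def allowed_cpus_from_proc():
    """Read the Cpus_allowed_list line exposed in /proc/self/status."""
    with open("/proc/self/status") as f:
        for line in f:
            if line.startswith("Cpus_allowed_list:"):
                return parse_cpu_list(line.split(":", 1)[1])
    raise RuntimeError("Cpus_allowed_list not found")


print(sorted(parse_cpu_list("0-4,6,8-10")))  # [0, 1, 2, 3, 4, 6, 8, 9, 10]
# CPUs usable by this thread should all appear in the allowed list.
print(os.sched_getaffinity(0) <= allowed_cpus_from_proc())
```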

How does a process schedule its own threads

After the kernel schedules a process that has threads, how does that process schedule its own threads during its time slice?
For most modern kernels, the kernel only schedules threads, and processes are mostly just a container for the threads to execute inside (e.g. a container that contains a virtual address space, however many threads, and a few other scraps like file handles).
For some kernels (mostly very old Unix kernels that existed before threads were invented), the kernel schedules processes and a user-space library emulates threads. For this to work properly, all of the "blocking" system calls (e.g. write()) have to be replaced by asynchronous system calls (e.g. aio_write()) so that other threads in the process can still be given CPU time; however, I wouldn't want to assume it works properly (e.g. if any thread blocks, then maybe all threads in the process block).
It also may not work when there are multiple CPUs (the kernel gives a process one CPU, but then, from the kernel's perspective, that process is running and can't use a second CPU). There are sophisticated workarounds for this (to support "M:N threading"), but it's simply easier and better to fix the scheduler so that it works with threads. Fortunately/unfortunately, this didn't matter much in the early days, because very few computers had more than one CPU anyway.
Lastly, it doesn't work for thread priorities. For example, one process might keep a CPU busy executing an unimportant, low-priority thread while another process doesn't get that CPU time when it desperately needs it for an important, high-priority thread. This happens because no process knows about the threads belonging to other processes, and the kernel only knows about processes, not threads.
Of course, these are also the reasons why every kernel adopted "the kernel schedules threads, not processes" (and those that didn't died).
It's down to jargon definitions, but threads are simply a bunch of processes sharing an address space. Older Unixes even called them Light Weight Processes.
With that classical understanding of threads, the answer is that, these days, it's the OS that does the scheduling and each thread gets its own timeslices.
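On Linux this is easy to observe: every thread is a separate kernel task with its own ID, listed under /proc/<pid>/task, while all of them report the same process ID. A minimal Python sketch, assuming Linux and Python 3.8+ for threading.get_native_id().

```python
import os
import threading

N = 4
records = []                             # (pid, tid) reported by each thread
lock = threading.Lock()
all_started = threading.Barrier(N + 1)   # N workers + the main thread
may_exit = threading.Event()


def worker():
    with lock:
        # os.getpid() is the shared process (thread group) ID;
        # threading.get_native_id() is this thread's own kernel task ID.
        records.append((os.getpid(), threading.get_native_id()))
    all_started.wait()
    may_exit.wait()                      # stay alive until main has looked


threads = [threading.Thread(target=worker) for _ in range(N)]
for t in threads:
    t.start()
all_started.wait()
# Every live thread shows up as its own entry (task) under /proc/<pid>/task.
tasks = {int(name) for name in os.listdir(f"/proc/{os.getpid()}/task")}
may_exit.set()
for t in threads:
    t.join()

print("pids:", {pid for pid, _ in records})
print("tids:", sorted(tid for _, tid in records))
```

One shared PID with several distinct task IDs is the "threads are processes sharing an address space" view from the kernel's side.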
Extras
Some OSes do things to "the whole process". For example, Windows will give the process that has mouse focus a priority boost (all its threads get dynamically notched up a few priority places) to make that application appear more sprightly (this goes back to Windows 3).
Other operating systems will dynamically increase the priority of a thread to solve priority inversion situations. This is where a low-priority thread that holds a resource (I/O, or perhaps a semaphore) is blocking a higher-priority thread from running (because the resource is not available). This is priority inversion, and it's solved by the OS boosting the priority of the blocking thread until it gives up the required resource.
Either the kernel schedules the threads, or the kernel schedules processes and each process simulates threads by scheduling its own threads.
Usually, the process schedules its own threads using a library that sets timers. When the timer fires, the handler saves the current "thread's" registers and then loads a new set of registers from another "thread."
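A toy version of that user-space scheduling idea, in Python: generators play the role of user-level threads, and a run queue hands out turns round-robin. This is a swapped-in simplification; instead of timer interrupts and saved registers, each "thread" yields voluntarily, so it is cooperative rather than preemptive. The names green_thread and run_round_robin are hypothetical.

```python
from collections import deque


def green_thread(name, steps, log):
    """A user-level 'thread': each yield hands the CPU back to the scheduler."""
    for i in range(steps):
        log.append(f"{name}{i}")
        yield  # a timer-based library would force this switch instead


def run_round_robin(threads):
    """Run generator-based threads round-robin until all of them finish."""
    ready = deque(threads)
    while ready:
        current = ready.popleft()
        try:
            next(current)           # run it for one "time slice"
        except StopIteration:
            continue                # finished: drop it from the run queue
        ready.append(current)       # still runnable: back of the queue


log = []
run_round_robin([green_thread("A", 3, log), green_thread("B", 2, log)])
print(log)  # ['A0', 'B0', 'A1', 'B1', 'A2']
```

From the kernel's point of view this is a single task: if one green thread made a blocking call, none of the others would run until it returned, which is the weakness of process-only scheduling.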

Scheduler for Linux kernel threads

Linux includes a few privileged processes called kernel threads. Is there a scheduler that runs/suspends them? If yes, is this scheduler the same as the system scheduler (I mean the one that schedules all of the system's processes)?
The Linux scheduler schedules tasks. These can be:
kernel threads (e.g. kswapd), or
single-threaded processes (e.g. bash), or
individual threads of a multi-threaded process (e.g. some browsers or servers)
The many threads of a multi-threaded process are tasks sharing a common address space (and other things, e.g. file descriptors).
AFAIK, the scheduler does not separate kernel threads from other tasks. But the scheduler does take into account scheduling policies (sched_setscheduler(2)) and priorities (setpriority(2)); many kernel threads run at a very high priority. See sched(7).
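One way to see that kernel threads are ordinary scheduled tasks with their own policies is to list them from /proc. The sketch assumes Linux, that the PF_KTHREAD task flag is 0x00200000 (its value in include/linux/sched.h), and that kernel threads are visible at all; inside a container with its own PID namespace the list will normally be empty. The helper names are hypothetical.

```python
import os

PF_KTHREAD = 0x00200000  # "I am a kernel thread" flag, per include/linux/sched.h

POLICY_NAMES = {
    os.SCHED_OTHER: "SCHED_OTHER",
    os.SCHED_FIFO: "SCHED_FIFO",
    os.SCHED_RR: "SCHED_RR",
    os.SCHED_BATCH: "SCHED_BATCH",
    os.SCHED_IDLE: "SCHED_IDLE",
}


def task_flags(pid):
    """Return the `flags` field (field 9) of /proc/<pid>/stat."""
    with open(f"/proc/{pid}/stat") as f:
        stat = f.read()
    # Field 2 (comm) is parenthesised and may contain spaces, so only split
    # what follows the last ')'. fields[0] is then field 3 (state).
    fields = stat[stat.rindex(")") + 1:].split()
    return int(fields[6])


def is_kernel_thread(pid):
    return bool(task_flags(pid) & PF_KTHREAD)


def kernel_threads():
    """Return (pid, name, policy) for each kernel thread visible in /proc."""
    found = []
    for entry in os.listdir("/proc"):
        if not entry.isdigit():
            continue
        pid = int(entry)
        try:
            if not is_kernel_thread(pid):
                continue
            with open(f"/proc/{pid}/comm") as f:
                name = f.read().strip()
            # Mask off the SCHED_RESET_ON_FORK bit the kernel may OR in.
            policy = os.sched_getscheduler(pid) & ~os.SCHED_RESET_ON_FORK
        except (FileNotFoundError, ProcessLookupError, PermissionError):
            continue  # the task exited, or we may not inspect it
        found.append((pid, name, POLICY_NAMES.get(policy, str(policy))))
    return found


for pid, name, policy in kernel_threads()[:10]:
    print(pid, name, policy)
```

On a typical host you should see a mix of policies, which fits the point that the same scheduler applies per-task policies to kernel threads too.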
Yes! Let me clarify the system scheduler part here.
Every task is associated with a task_struct, which contains the details of the task, such as its pid, its name, when it was recently started, its priority, etc. (see http://lxr.free-electrons.com/source/include/linux/sched.h#L1224).
Typically, depending on the priority of the task, either the fair scheduler or the real-time scheduler kicks in; the two coexist. To keep it simple and not go into details, these are different scheduling algorithms that cater to different types of tasks.
Now, kernel threads also have an associated task_struct, and, as Basile Starynkevitch pointed out with a couple of KPIs, we can use sched_setparam and related KPIs to modify the scheduling parameters and change the scheduler a task belongs to, depending on what it is about to do.

Does linux schedule a process or a thread?

After reading this SO question, I have a few doubts. Please help me understand.
Scheduling involves deciding when to run a process and for what quantum of time.
1. Does the Linux kernel schedule a thread or a process? Since processes and threads are not differentiated inside the kernel, how does the scheduler treat them?
2. How is the quantum for each thread decided?
a. If a quantum of time (say 100 µs) is decided for a process, is it shared among all the threads of the process? Or
b. Is a quantum decided for each thread by the scheduler?
Note: Questions 1 and 2 are related and may look the same, but I posted them both here to be clear on how things work.
The Linux scheduler (on recent Linux kernels, e.g. 3.0 at least) schedules schedulable tasks, or simply tasks.
A task may be:
a single-threaded process (e.g. created by fork without any thread library)
any thread inside a multi-threaded process (including its main thread), in particular Posix threads (pthreads)
kernel tasks, which are started internally in the kernel and stay in kernel land (e.g. kworker, nfsiod, kjournald, kauditd, kswapd, etc.)
In other words, threads inside multi-threaded processes are scheduled like non-threaded (i.e. single-threaded) processes.
The low-level clone(2) syscall creates user-land schedulable tasks (and can be used both for creating fork-ed process or for implementation of thread libraries, like pthread). Unless you are a low-level thread library implementor, you don't want to use clone directly.
AFAIK, for multi-threaded processes, the kernel is (almost) not scheduling the process, but each individual thread inside (including the main thread).
Actually, there is some notion of thread groups and affinity in the scheduling, but I don't know it well.
These days, processors generally have more than one core, and each core is running a task (at any given instant), so you do have several tasks running in parallel.
CPU quantum times are given to tasks, not to processes.
The NPTL implementation of the POSIX threads specification sees each thread as a separate process inside the kernel, with its own task_struct (and therefore its own pid, which user space sees as the thread ID returned by gettid(2)), so each thread is schedulable by itself. Therefore, each thread gets its own timeslice and is scheduled just like a process, as mentioned above.
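Per-thread scheduling also shows up in accounting: the kernel tracks CPU time per task, so each thread has its own CPU clock. A small Python sketch, assuming Linux and Python 3.7+ for time.thread_time(); it illustrates per-thread accounting, not the length of a timeslice.

```python
import threading
import time

cpu_time = {}


def busy(seconds):
    """Spin on the CPU for roughly `seconds` of wall-clock time."""
    end = time.monotonic() + seconds
    while time.monotonic() < end:
        pass
    cpu_time["busy"] = time.thread_time()  # CPU time of *this* thread only


def idle(seconds):
    """Sleep, giving the CPU away without accumulating CPU time."""
    time.sleep(seconds)
    cpu_time["idle"] = time.thread_time()


threads = [threading.Thread(target=busy, args=(0.3,)),
           threading.Thread(target=idle, args=(0.3,))]
for t in threads:
    t.start()
for t in threads:
    t.join()
print(cpu_time)
```

Both threads lived in the same process for the same wall-clock time, yet their CPU clocks differ, because the kernel charges time to each task separately.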
Just to add: the Linux scheduler is currently capable of scheduling not only single tasks (a simple process) but also groups of processes, or even users (all processes belonging to a user), as a whole. This allows group scheduling, where CPU time is first divided between process groups and then distributed within those groups to single threads.
The Linux scheduler does not directly operate on processes or threads; it works with schedulable entities, represented by struct sched_entity.
It's fair to say that every process/thread is a sched_entity but the converse might not be true.

Process Priority vs Thread Priority

In Linux, a process is a set of threads. Each thread has its own priority! But does a process have a priority too? If so, how is it different from the thread priority? And when a new process is created, how are these values propagated?
Linux implements (kernel-level) threads essentially as processes, so you fall back to the good old process priorities there.
See NPTL and nice (to understand that processes were the first to have priorities). Mostly, defaults are applied; in the case of threads, the thread is a copy, so its priorities should be copied too. This will certainly vary between schedulers.
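On Linux, NPTL threads do not share a common nice value (pthreads(7) lists this as a departure from POSIX), so a "process priority" is really the priority of each task. A minimal Python sketch, assuming Linux, where setpriority(2) with PRIO_PROCESS accepts a thread ID: one thread lowers its own priority, the main thread is unaffected, and a thread it creates afterwards inherits the new value. tid_nice is a hypothetical helper.

```python
import os
import threading

observed = {}


def tid_nice():
    """Nice value of the calling thread (PRIO_PROCESS given a thread ID)."""
    return os.getpriority(os.PRIO_PROCESS, threading.get_native_id())


def inner():
    observed["grandchild_thread"] = tid_nice()


def worker(new_nice):
    # Raising one's own nice value (lowering priority) needs no privileges.
    os.setpriority(os.PRIO_PROCESS, threading.get_native_id(), new_nice)
    observed["worker_thread"] = tid_nice()
    t = threading.Thread(target=inner)  # created *after* the change
    t.start()
    t.join()


base = tid_nice()
target = min(base + 5, 19)  # 19 is the lowest priority (highest nice)
t = threading.Thread(target=worker, args=(target,))
t.start()
t.join()
observed["main_thread"] = tid_nice()
print(base, observed)
```

The new value is copied into threads created afterwards but never flows back to the creator, which matches the "the thread is a copy" description above.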

Resources