Linux Threads and process - CPU affinity

I have a few questions about thread and process scheduling.
1. When my process goes to sleep and wakes up again, will it always be scheduled on the same CPU it was scheduled on before?
2. When I create a thread from the process, will it also always be executed on the same CPU, even if other CPUs are free and idle?
I would like to know the mechanism in Linux specifically. I am creating the threads through the pthread library. I am facing a random hang that is not always reproducible, and I need this information to proceed in the right direction.

On single-processor/single-core systems:
1. Yes
2. Yes
On multi-processor/multi-core systems:
1. No.
2. No.
Use taskset to retrieve or set a process's CPU affinity on multicore systems. Setting the CPU affinity to one specific processor/core changes the answers to
1. Yes
2. Yes
on multicore systems as well.
From within an application you may use sched_setaffinity and/or sched_getaffinity to set or query the CPU affinity.
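As a rough illustration of the sched_getaffinity / sched_setaffinity calls mentioned above, here is a minimal sketch using Python's os wrappers around them. It assumes a Linux system and that the caller is permitted to narrow its own affinity mask; the helper name pin_to_one_cpu is made up for this example.

```python
import os

def pin_to_one_cpu():
    """Pin the caller to one allowed CPU, then restore the previous mask.

    Hypothetical helper for illustration; Linux-only.
    """
    original = os.sched_getaffinity(0)      # 0 means "the caller"
    target = min(original)                  # any CPU from the allowed set will do
    try:
        os.sched_setaffinity(0, {target})   # restrict scheduling to that one CPU
        pinned = os.sched_getaffinity(0)
    finally:
        os.sched_setaffinity(0, original)   # undo the pinning
    restored = os.sched_getaffinity(0)
    return original, pinned, restored

original, pinned, restored = pin_to_one_cpu()
print("allowed before:", sorted(original))
print("while pinned:  ", sorted(pinned))
print("after restore: ", sorted(restored))
```

While the mask contains a single CPU, the scheduler has no other CPU to migrate the task to, which is why pinning turns the multicore answers into "Yes".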
Edit: Additional details about how and when migrations between CPUs are managed, given the cache penalty of moving a task:
The Linux/SMP Scheduler: "... In order to achieve good system performance, Linux/SMP (2.4 kernel) adopts an empirical rule to solve the dilemma ..." Read the details in the linked reference, section The Linux/SMP Scheduler.
For the newer CFS (Completely Fair Scheduler) you'd look at sched_migration_cost. "...if the real runtime of the task is smaller than the values of this parameter then the scheduler assumes that it is still in the cache and tries to avoid moving the task to another CPU during the load balancing procedure ..." (e.g.: Completely Fair Scheduler and its tuning).

When a process goes to sleep and later wakes up, it will not necessarily be scheduled on the same CPU. In a multiprocessor environment the scheduler may place it on any CPU, according to its policy. A process goes to sleep for various reasons, for example because it is waiting for I/O or for some other resource. When the awaited event occurs, the process moves from the waiting state to the ready state, and at that point the scheduler can run it on whichever CPU is free. There is no guarantee that this is the CPU it ran on before.
For more information, read the scheduler source code in the Linux kernel source tree.

Related

How linux kernel scheduling works on multi core processor?

I recently started reading Robert Love's book "Linux Kernel Development 3rd edition" and dived into the scheduler part, which left me with lots of questions.
So first off, I understood there are 2 cases where the scheduler changes the task that currently runs (Correct me if I'm not precise), either by a task that willingly requested to re-schedule since it blocks on some I/O or sleeps, or a timer interrupt that caused the cpu to jump to scheduler code and preempt the current task if it's interruptible.
Does each core in a multicore processor get the interrupt that is related to re-scheduling? Do they each have a different timer, or say there is one interrupt that in some type of algorithm picks a specific core to handle it each time?
Assuming not only one core re-schedules each interrupt (since then I would imagine it might take a while to swap processes on all of the cores), what happens if two cores re-schedule at the same time? Because, I assume that when you run the schedule function the task-list must be locked, and then I'd imagine a few cores re-scheduling their current task simultaneously resulting in only one core actually doing scheduling work and all of the other cores waiting on the task-list lock.
Beyond the fact that the task-list lock is required to touch the actual task list (say, to change a task's state or the run-queue order), what if one core is in the middle of calculating which task should run next and meanwhile another core finishes scheduling successfully, so that the first core's calculation is now completely wrong because the successful re-scheduling has just heavily changed the system state?
I understood that in Linux, priority is divided into the "nice value", which ranges from -20 to 19 (higher means less priority and more "nice"), and the real-time priority (0-99). Real-time priority values matter only for a couple of scheduling policies, and each process can register for a different scheduling policy.
Do the real-time policies always beat processes that are not registered for real-time policies? Meaning, if I run a real-time process, will normal processes never get to execute? How do the "nice" values of normal processes and the real-time priority values of real-time processes work together in the scheduler algorithm?
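For anyone who wants to look at these values on a running system, here is a small sketch that queries the caller's scheduling policy, its nice value, and the static-priority range the kernel reports for each policy. It assumes Linux and uses Python's os wrappers; the function name describe_scheduling is made up for this example. As far as I know from the sched(7) man page, SCHED_OTHER tasks all have static priority 0 and are ordered among themselves by nice value, while SCHED_FIFO / SCHED_RR tasks have a higher static priority and are picked ahead of them.

```python
import os

def describe_scheduling():
    """Return (policy name, nice value, {policy name: (min, max) static priority}).

    Hypothetical helper for illustration; Linux-only.
    """
    names = {
        os.SCHED_OTHER: "SCHED_OTHER",   # the default time-sharing policy
        os.SCHED_FIFO: "SCHED_FIFO",     # real-time, first-in first-out
        os.SCHED_RR: "SCHED_RR",         # real-time, round-robin
    }
    ranges = {
        name: (os.sched_get_priority_min(policy), os.sched_get_priority_max(policy))
        for policy, name in names.items()
    }
    policy = os.sched_getscheduler(0)              # 0 means "the caller"
    nice = os.getpriority(os.PRIO_PROCESS, 0)      # nice value of this process
    return names.get(policy, str(policy)), nice, ranges

policy_name, nice, ranges = describe_scheduling()
print("policy:", policy_name, "nice:", nice)
for name, (lowest, highest) in ranges.items():
    print(f"{name}: static priority {lowest}..{highest}")
```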

What do we mean by "Non-preemptive Kernel"? [duplicate]

I read that the Linux kernel is preemptive, which is different from most Unix kernels. So, what does it really mean for a kernel to be preemptive?
Some analogies or examples would be better than a purely theoretical explanation.
ADD 1 -- 11:00 AM 12/7/2018
Preemptive is just one paradigm of multi-tasking. There are others like Cooperative Multi-tasking. A better understanding can be achieved by comparing them.

Prior to Linux kernel version 2.5.4, the Linux kernel was not preemptive, which means a process running in kernel mode could not be moved off the processor until it left kernel mode on its own or started waiting for an input/output operation to complete.
Generally, a process in user mode enters kernel mode through system calls. When the kernel was non-preemptive, a lower-priority process could effectively invert priorities, denying a higher-priority process access to the processor by repeatedly making system calls and remaining in kernel mode. Even if the lower-priority process's timeslice expired, it would continue running until it completed its work in the kernel or voluntarily relinquished control. If the higher-priority process waiting to run was a text editor in which the user is typing, or an MP3 player ready to refill its audio buffer, the result was poor interactive performance. This made the non-preemptive kernel a major drawback at that time.

Imagine the simple view of preemptive multi-tasking. We have two user tasks, both of which are running all the time without using any I/O or performing kernel calls. Those two tasks don't have to do anything special to be able to run on a multi-tasking operating system. The kernel, typically based on a timer interrupt, simply decides that it's time for one task to pause to let another one run. The task in question is completely unaware that anything happened.
However, most tasks make occasional requests of the kernel via syscalls. When this happens, the same user context exists, but the CPU is running kernel code on behalf of that task.
Older Linux kernels would never allow preemption of a task while it was busy running kernel code. (Note that I/O operations always voluntarily re-schedule. I'm talking about a case where the kernel code has some CPU-intensive operation like sorting a list.)
If the system allows that task to be preempted while it is running kernel code, then we have what is called a "preemptive kernel." Such a system is much less exposed to the unpredictable delays that can be encountered during syscalls, so it might be better suited for embedded or real-time tasks.
For example, without kernel preemption, if on a particular CPU there are two tasks available, and one makes a syscall that takes 5ms to complete, and the other is an MP3 player application that needs to feed the audio pipe every 2ms, you might hear stuttering audio.
The argument against preemption is that all kernel code that might be called in task context must be able to survive preemption-- there's a lot of poor device driver code, for example, that might be better off if it's always able to complete an operation before allowing some other task to run on that processor. (With multi-processor systems the rule rather than the exception these days, all kernel code must be re-entrant, so that argument isn't as relevant today.) Additionally, if the same goal could be met by improving the syscalls with bad latency, perhaps preemption is unnecessary.
A compromise is CONFIG_PREEMPT_VOLUNTARY, which allows a task-switch at certain points inside the kernel, but not everywhere. If there are only a small number of places where kernel code might get bogged down, this is a cheap way of reducing latency while keeping the complexity manageable.

Traditional Unix kernels had a single lock, which was held by a thread while kernel code was running. Therefore no other kernel code could interrupt that thread.
This made designing the kernel easier, since you knew that while one thread was using kernel resources, no other thread was. Therefore the different threads could not mess up each other's work.
In single processor systems this doesn't cause too many problems.
However in multiprocessor systems, you could have a situation where several threads on different processors or cores all wanted to run kernel code at the same time. This means that depending on the type of workload, you could have lots of processors, but all of them spend most of their time waiting for each other.
In Linux 2.6, the kernel resources were divided up into much smaller units, protected by individual locks, and the kernel code was reviewed to make sure that locks were only held while the corresponding resources were in use. So now different processors only have to wait for each other if they want access to the same resource (for example hardware resource).

Preemption allows the kernel to give the IMPRESSION of parallelism: you've got only one processor (let's say a decade ago), but you feel like all your processes are running simultaneously. That's because the kernel preempts (i.e., takes execution away from) one process to give it to the next one (maybe according to their priority).
EDIT Non-preemptive kernels wait for processes to hand control back (i.e., during syscalls), so if your process computes a lot of data and doesn't call any kind of yield function, the other processes won't be able to execute their calls. Such systems are said to be cooperative because they rely on the cooperation of the processes to ensure a fair share of execution time.
EDIT 2 The main goal of preemption is to improve the responsiveness of the system across multiple tasks, which is good for end users, whereas servers want to achieve the highest throughput, so they don't need it: (from the Linux kernel configuration)
Preemptible kernel (low-latency desktop)
Voluntary kernel preemption (desktop)
No forced preemption (server)

The Linux kernel is monolithic and gives a small slice of computing time to each running process in turn. This means that on a single CPU the processes (e.g. the programs) do not truly run concurrently; instead, each is regularly given a time slice in which to execute its logic. The main problem is that some logic can take a long time to finish and prevent the kernel from giving time to the next process. This results in system "lags".
A preemptive kernel has the ability to switch context. It means that it can stop a "hanging" process even if it is not finished, and give the computing time to the next process as expected. The "hanging" process will continue executing without any problem when its turn comes again.
Practically, it means that the kernel can get closer to real-time behavior, which is particularly interesting for audio recording and editing.
The Ubuntu Studio distribution packages a preemptive kernel as well as a bunch of quality free software devoted to audio and video editing.

It means that the operating system scheduler is free to suspend the execution of the running process and give the CPU to another process whenever it wants; the normal way to do this is to give each process that is waiting for the CPU a "quantum" of CPU time to run. After the quantum has expired, the scheduler takes back control (and the running process cannot avoid this) to give another quantum to another process.
This method is often compared with cooperative multitasking, in which processes keep the CPU for as long as they need, without being interrupted, and have to explicitly call some kind of "yield" function to let other applications run; naturally, to avoid giving the feeling of the system being stuck, well-behaved applications will yield the CPU often. Still, if there's a bug in an application (e.g. an infinite loop without yield calls), the whole system will hang, since the CPU is held entirely by the faulty program.
Almost all recent desktop OSes use preemptive multitasking, which, even if it's more expensive in terms of resources, is in general more stable (it's more difficult for a single faulty app to hang the whole system, since the OS is always in control). On the other hand, when resources are tight and the applications are expected to be well-behaved, cooperative multitasking is used. Windows 3 was a cooperative multitasking OS; a more recent example is RockBox, an open-source PMP firmware replacement.
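To make the cooperative model concrete, here is a toy sketch in Python where each task is a generator and the "yield" statement plays the role of the explicit yield call described above. This is only an analogy, not how any real OS is implemented; the names task and run_cooperative are made up for this example.

```python
from collections import deque

def task(name, steps, log):
    """A cooperative task: do one unit of work, then hand the CPU back."""
    for i in range(steps):
        log.append(f"{name}{i}")
        yield  # the explicit "yield" call; without it nobody else runs

def run_cooperative(tasks):
    """Round-robin loop that can only switch tasks when one yields or ends."""
    ready = deque(tasks)
    while ready:
        current = ready.popleft()
        try:
            next(current)           # run the task up to its next yield
            ready.append(current)   # it still has work: back of the queue
        except StopIteration:
            pass                    # the task finished and is dropped

log = []
run_cooperative([task("A", 2, log), task("B", 3, log)])
print(log)  # ['A0', 'B0', 'A1', 'B1', 'B2']
```

If one of the tasks looped forever between two yields, run_cooperative would never get control back, which is exactly the hang scenario mentioned above. A preemptive system does not depend on the task reaching a yield.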

I think everyone did a good job of explaining this, but I'm just going to add a little more info in the context of Linux IRQs, interrupts and the kernel scheduler.
The process scheduler is the component of the OS that is responsible for deciding whether the currently running job/process should continue to run and, if not, which process should run next.
A preemptive scheduler is one that allows a running process to be interrupted; the process then changes its state and another process is allowed to run (since the current one was interrupted).
On the other hand, a non-preemptive scheduler (a.k.a. cooperative) can't take the CPU away from a process.
FYI, the word "cooperative" can be confusing because its meaning does not clearly indicate what the scheduler actually does.
For example, older Windows versions such as 3.1 had cooperative schedulers.
Full credit to the wonderful article here

I think it became preemptive as of 2.6. Preemptive means that when a new process is ready to run, the CPU can be allocated to the new process; the running process does not need to be cooperative and give up the CPU.

Saying that the Linux kernel is preemptive means that the kernel supports preemption.
For example, there are two processes, P1 (higher priority) and P2 (lower priority), which are making read system calls and running in kernel mode. Suppose P2 is running in kernel mode and P1 becomes ready to run.
If kernel preemption is available, then preemption can happen at the kernel level, i.e. P2 can get preempted and put to sleep, and P1 can continue to run.
If kernel preemption is not available, since P2 is in kernel mode, system simply waits till P2 is complete and then

Process with multiple threads on multiprocessor system. How do they work?

So I was reading about Processes and Threads and I had a question. Following is the scenario.
Uniprocessor Environment
I understand that the OS rotates processes over the processor, each for a particular time period (quantum). I get how this works when the process is single-threaded, i.e. has just one path of execution: whenever it is assigned the processor, it continues its execution. Now let's say the process forks or just creates a new thread. How does the whole thing work then? Does the OS say to process P "Go on, continue with execution", and the process itself picks the new thread or the parent thread in rotation, so that if there are more than two threads the rotation seems fair to each thread? Or does the OS actually interact with the threads directly? (In that case I am not sure what happens.)
Multiprocessor Environment
Now say I have a multiprocessor environment. If there is just a single-threaded process, the OS will assign one of the processors to it and it will carry on with its execution. Now say there are multiple threads in the process. If the OS assigns one of the processors to the process and asks it to continue its execution, and the process has to pick one of its threads to run, then there will never be any parallel processing going on within that process, since the process can only put one of its threads on the processor at a time.
So how does it happen in both the cases?
Cheers.

Process Scheduling
Operating systems ultimately control this kind of thread scheduling.
Windows systems are priority-based and so will allow a process to consume more resources than others. This is why your machine can 'hang' if a process has been escalated to a high priority. Priorities range between 1 and 31 as far as I know.
Mac OS / Linux / Unix are time-based, aiming to give all processes of equal priority equal amounts of CPU time. Therefore loading more processes will slow your system down as they all share a smaller slice of execution time.
Uniprocessor Environment
The OS is ultimately responsible for this, but switching processes involves (I cannot guarantee accuracy here, it's just an indication):
1. Halting a process / thread
2. Storing the current stack (code location)
3. Storing the current registers of the CPU
4. Asking the kernel for the next process/thread to run
5. Kernel indicates which one has to be run
6. OS reloads the registers from the saved state
7. OS reloads the current stack for the next application
8. Resumes the process
Obviously the more threads and processes you have running, the slower it will become. The problem is that, with enough of them, the time taken to switch processes can actually become longer than the time allowed to execute each process.
Threads are much like lightweight child processes of a single process. For a single processor, they just look like additional work.
Multi-processor Environment
Multi-processor environments work differently, as memory and some levels of cache are shared amongst processors. I believe the cache levels are called L1 and L2 (L1 is typically private to each core, while higher levels may be shared). So the difference is that processor A can reload the state stored by processor B without conflicts. 'Hyper-threading' takes a similar approach, although this is processor-specific. Another difference is that a processor can be dedicated to a specific process; this is called 'CPU affinity'. It's not encouraged for every process, but it does allow an application to have a dedicated processor to work on.

This is OS-specific, of course, but most operating systems schedule at the thread level. A process is just a grouping of threads. For example, on Linux, threads are called "tasks" and each is scheduled independently. They are created with the clone call. What is typically called a thread is a task which shares its address space (and other resources such as file descriptors, mount points, etc.) with the creating task. Note that the clone call can also create what is typically called a process if the flags to enable sharing are not passed.
Considering the above, any thread may be scheduled at any time on any processor, no matter how many processors there are available. That said, most OSs also attempt to maintain some measure of processor affinity to avoid excessive cache misses, but usually if a thread is runnable and a different CPU is available, it will change CPUs. Often there is also a way to specify which CPUs a particular thread may execute upon.
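One way to see that each thread is a separate kernel-level task is to print the native thread IDs the kernel assigns. The sketch below assumes Linux and Python 3.8+ (for threading.get_native_id); the helper collect_native_ids is made up for this example. A barrier keeps all workers alive until every one has reported, so no ID can be reused while we look at them.

```python
import os
import threading

def collect_native_ids(n_threads=3):
    """Start n_threads workers and return the kernel thread ID of each.

    Hypothetical helper for illustration.
    """
    ids = []
    lock = threading.Lock()
    barrier = threading.Barrier(n_threads)

    def worker():
        tid = threading.get_native_id()   # ID the kernel uses for this thread
        with lock:
            ids.append(tid)
        barrier.wait()                    # stay alive until all have reported

    threads = [threading.Thread(target=worker) for _ in range(n_threads)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return ids

pid = os.getpid()
main_tid = threading.get_native_id()
worker_tids = collect_native_ids()
print("process ID:        ", pid)
print("calling thread ID: ", main_tid)
print("worker thread IDs: ", worker_tids)
```

All of these threads share one process ID, but the scheduler deals with each thread ID on its own, so different threads of the same process can run on different CPUs at the same time.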

It doesn't matter whether there are 1 or 128 processors. The OS manages access to resources to try to efficiently match up requests with availability, and that includes CPU execution. If a thread is running, it has already managed to get some CPU but, if it requests a resource that is not immediately available, it no longer needs any CPU until that other resource becomes free, and so the OS will remove CPU execution from it and, if there is another thread that is waiting for CPU, hand it over. When the requested resource does become available, the thread will be made ready again. If there is a core free, it will be made running 'immediately'; if not, the CPU scheduling algorithm decides whether to stop a currently running thread to free up a core or to leave the newly ready thread waiting.
It's better to try to ignore things like 'time-slice, quantum, priority', which cause much confusion and FUD. If a running thread wants something it cannot have yet, it doesn't need any more CPU cycles, and the OS will take them away and, if another thread needs them, apply them there. That is why preemptive multitaskers exist: to match up threads with resources in an attempt to maximize forward progress.

How does the OS scheduler regain control of CPU?

I recently started to learn how the CPU and the operating system works, and I am a bit confused about the operation of a single-CPU machine with an operating system that provides multitasking.
Supposing my machine has a single CPU, this would mean that, at any given time, only one process could be running.
Now, I can only assume that the scheduler used by the operating system to control the access to the precious CPU time is also a process.
Thus, in this machine, either the user process or the scheduling system process is running at any given point in time, but not both.
So here's a question:
Once the scheduler gives up control of the CPU to another process, how can it regain CPU time to run itself again to do its scheduling work? I mean, if any given process currently running does not yield the CPU, how could the scheduler itself ever run again and ensure proper multitasking?
So far, I had been thinking, well, if the user process requests an I/O operation through a system call, then in the system call we could ensure the scheduler is allocated some CPU time again. But I am not even sure if this works in this way.
On the other hand, if the user process in question were inherently CPU-bound, then, from this point of view, it could run forever, never letting other processes, not even the scheduler run again.
Supposing time-sliced scheduling, I have no idea how the scheduler could slice the time for the execution of another process when it is not even running?
I would really appreciate any insight or references that you can provide in this regard.

The OS sets up a hardware timer (Programmable interval timer or PIT) that generates an interrupt every N milliseconds. That interrupt is delivered to the kernel and user-code is interrupted.
It works like any other hardware interrupt. For example your disk will force a switch to the kernel when it has completed an IO.
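A user-space analogy of this, as a rough sketch: ask the kernel for a periodic timer signal and then spin in a loop that never yields. The handler still gets to run, because the timer interrupts the loop from outside. This assumes a Unix-like system and that the code runs in the main thread (Python only allows signal handlers to be installed there); count_ticks is a made-up name, and a signal handler is of course only a stand-in for a real interrupt handler inside the kernel.

```python
import signal
import time

def count_ticks(wanted=3, interval=0.01):
    """Spin without yielding until a periodic timer has fired `wanted` times.

    Hypothetical helper for illustration; returns (ticks seen, loop iterations).
    """
    ticks = 0

    def on_tick(signum, frame):
        nonlocal ticks
        ticks += 1                      # runs "on top of" the busy loop

    old_handler = signal.signal(signal.SIGALRM, on_tick)
    signal.setitimer(signal.ITIMER_REAL, interval, interval)  # fire every `interval` s
    spins = 0
    deadline = time.monotonic() + 5.0   # safety net so the sketch cannot hang
    try:
        while ticks < wanted and time.monotonic() < deadline:
            spins += 1                  # CPU-bound work that never yields
    finally:
        signal.setitimer(signal.ITIMER_REAL, 0)        # cancel the timer
        signal.signal(signal.SIGALRM, old_handler)     # put the old handler back
    return ticks, spins

ticks, spins = count_ticks()
print(f"loop interrupted {ticks} times during {spins} iterations")
```

The loop itself never checks a clock for the purpose of giving up control; the only reason on_tick runs is that the timer forces its way in, which is the same idea the kernel relies on to get the CPU back from a CPU-bound process.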

Google "interrupts". Interrupts are at the centre of multithreading, preemptive kernels like Linux/Windows. With no interrupts, the OS will never do anything.
While investigating/learning, try to ignore any explanations that mention "timer interrupt", "round-robin" and "time-slice", or "quantum" in the first paragraph – they are dangerously misleading, if not actually wrong.
Interrupts, in OS terms, come in two flavours:
Hardware interrupts – those initiated by an actual hardware signal from a peripheral device. These can happen at (nearly) any time and switch execution from whatever thread might be running to code in a driver.
Software interrupts – those initiated by OS calls from currently running threads.
Either interrupt may request the scheduler to make threads that were waiting ready/running or cause threads that were waiting/running to be preempted.
The most important interrupts are those hardware interrupts from peripheral drivers – those that make threads ready that were waiting on IO from disks, NIC cards, mice, keyboards, USB etc. The overriding reason for using preemptive kernels, and all the problems of locking, synchronization, signaling etc., is that such systems have very good IO performance because hardware peripherals can rapidly make threads ready/running that were waiting for data from that hardware, without any latency resulting from threads that do not yield, or waiting for a periodic timer reschedule.
The hardware timer interrupt that causes periodic scheduling runs is important because many system calls have timeouts in case, say, a response from a peripheral takes longer than it should.
On multicore systems the OS has an interprocessor driver that can cause a hardware interrupt on other cores, allowing the OS to interrupt/schedule/dispatch threads onto multiple cores.
On seriously overloaded boxes, or those running CPU-intensive apps (a small minority), the OS can use the periodic timer interrupts, and the resulting scheduling, to cycle through a set of ready threads that is larger than the number of available cores, and allow each a share of available CPU resources. On most systems this happens rarely and is of little importance.
Every time I see "quantum", "give up the remainder of their time-slice", "round-robin" and similar, I just cringe...

To complement @usr's answer, quoting from Understanding the Linux Kernel:
The schedule( ) Function
schedule( ) implements the scheduler. Its objective is to find a
process in the runqueue list and then assign the CPU to it. It is
invoked, directly or in a lazy way, by several kernel routines.
[...]
Lazy invocation
The scheduler can also be invoked in a lazy way by setting the
need_resched field of current [process] to 1. Since a check on the value of this
field is always made before resuming the execution of a User Mode
process (see the section "Returning from Interrupts and Exceptions" in
Chapter 4), schedule( ) will definitely be invoked at some close
future time.
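The lazy invocation described in the quote can be pictured with a toy model: the interrupt handler only sets a flag, and the actual switch happens later, at the point where the CPU is about to return to user mode. This is a simplified sketch with made-up names (ToyCPU and its methods), not kernel code.

```python
class ToyCPU:
    """Toy model of lazy rescheduling via a need_resched flag."""

    def __init__(self, runqueue):
        self.runqueue = list(runqueue)        # runnable task names, head runs next
        self.current = self.runqueue.pop(0)   # task currently on the CPU
        self.need_resched = False
        self.switches = []                    # history of (previous, next) switches

    def timer_interrupt(self):
        """Interrupt handler: do not switch here, only request a reschedule."""
        self.need_resched = True

    def schedule(self):
        """Put the current task at the back of the queue and run the next one."""
        self.runqueue.append(self.current)
        previous, self.current = self.current, self.runqueue.pop(0)
        self.switches.append((previous, self.current))
        self.need_resched = False

    def return_to_user_mode(self):
        """The flag is checked on the way back to user mode."""
        if self.need_resched:
            self.schedule()
        return self.current

cpu = ToyCPU(["A", "B", "C"])
first = cpu.return_to_user_mode()    # flag clear: A keeps the CPU
cpu.timer_interrupt()                # flag set, but A is still current
still = cpu.current
second = cpu.return_to_user_mode()   # flag seen: schedule() switches to B
print(first, still, second, cpu.switches)  # A A B [('A', 'B')]
```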
