Threads - Pre-emptive Multitasking vs. Priorities

In my understanding, pre-emptive multitasking means that the scheduler (of the OS) hands one thread to the CPU for a particular time-slice (e.g. 1 millisecond), then switches to another thread, executes it for 1 millisecond, then switches back to the first thread, and so on (assuming, for simplicity, that there are only two threads).
Reference: https://www.youtube.com/watch?v=hsERPf9k54U
In contrast to pre-emptive multitasking there is the concept of priorities: the OS assigns numeric priorities to applications (e.g. 1 to 39), on whatever basis - that is not the concern for now.
And the advantage of this is that if one application hangs, the time-slicer simply goes back to the other thread (let's say this thread belongs to a different application, while the first application has hung) and continues to work normally. Then you can close the hung app.
Reference: https://www.youtube.com/watch?v=hsERPf9k54U
Now I don't think this is particularly an advantage of this kind of multitasking. It should be the same thing in preemptive multitasking, shouldn't it?
Thank you in advance.

Preemption, multitasking, and priority (scheduling) are different aspects of OS design.
Preemptive, in the context of process scheduling, describes a strategy in which the OS can preempt (take back) the resources allocated to a process whenever it (the OS) needs to. In contrast, under a non-preemptive scheduling strategy, the OS cannot take back the resources until the process finishes using them and releases them.
A priority scheduling algorithm can be implemented with preemptive or non-preemptive strategy.
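To make that distinction concrete, here is a minimal sketch in C (a toy simulation, not any real OS scheduler; the task names, priorities, and one-tick work units are all made up). The same priority rule runs in either mode: in preemptive mode the scheduler re-decides which task runs on every tick, while in non-preemptive mode the current task keeps the CPU until it finishes.

#include <stdio.h>

typedef struct {
    const char *name;
    int priority;      /* higher number = higher priority */
    int remaining;     /* ticks of work left */
} task_t;

/* Pick the highest-priority task that still has work left. */
static task_t *pick_highest(task_t *tasks, int n) {
    task_t *best = NULL;
    for (int i = 0; i < n; i++)
        if (tasks[i].remaining > 0 &&
            (!best || tasks[i].priority > best->priority))
            best = &tasks[i];
    return best;
}

int main(void) {
    task_t tasks[] = { {"editor", 3, 2}, {"backup", 1, 3}, {"player", 2, 2} };
    int n = 3;
    int preemptive = 1;   /* flip to 0 for the non-preemptive variant */

    task_t *cur = NULL;
    for (int tick = 0; ; tick++) {
        if (preemptive || !cur || cur->remaining == 0)
            cur = pick_highest(tasks, n);   /* preemptive: re-decide every tick */
        if (!cur) break;                    /* all work done */
        printf("tick %d: running %s\n", tick, cur->name);
        cur->remaining--;
    }
    return 0;
}

In preemptive mode the highest-priority runnable task always wins the next tick; in non-preemptive mode a lower-priority task that already holds the CPU keeps it until it completes.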

Related

How Tasks are scheduled in a multi-core processor

I've got confused about how tasks are scheduled in a multi-core processor. Different sources have different opinions, and importantly there isn't much documentation about the task-scheduling mechanism in a multi-core processor. Therefore, I decided to ask you a question.
I depicted a process that contains a process kernel thread and two user-level threads, and provided pseudo-code for the processing logic.
The question is: how will this process be executed on a multi-core processing unit that contains 2 physical cores and 4 logical processors (2 per core), given that there are no waiting processes and the CPU is assigned to the process completely?
I guess it works like below:
Note: PKT_C1_LP1 means process kernel thread is assigned to core 1 and logical processor 1
|--PKT_C1_LP1--1s--| |--T1_C1_LP1--1s--| |--TSK1_C1_LP1--1s--|
|--T2_C1_LP2--2s-----------| |--TSK2_C1_LP2--1s--|
----------- timeline ----------->
Update
Seems like the answer(s) to your question(s) will depend a lot on what OS and scheduler your system is running.
Because there aren't any waiting processes and there are enough resources, I believe almost all scheduling algorithms in any OS will show insignificant differences. However, for simplicity, let's say it is:
non-preemptive FCFS scheduling
Here's a timing diagram of the code that each thread needs to execute. It imagines a maximal case where each task immediately spawns a new thread. The green sections are infinitesimally short pieces of code (think "not-to-scale") that are basically just scheduling operations, and the red sections are similarly short process EXIT and thread END scheduling operations. (I've omitted penalties associated with thread creation. Also notice that worker threads do not END; they just go idle and stay in a thread pool.)
Basic Timing Diagram
Now the first thing you'll notice is that, because of the way tasks work, the second task can be executed on the same thread that scheduled it: no more tasks are scheduled, and the thread is only going to await that task. This has nothing to do with thread scheduling and everything to do with how tasks efficiently manage their pool of worker threads. It is application-level code, not OS-level code, that accomplishes this. The diagram below requires one fewer thread thanks to tasks.
Timing Diagram with smarter tasks
Now we can look at what the scheduler needs to do. We are still dealing only with logical processors. (The details of which core will execute which thread are complicated, so let's leave that out for the moment.) Here we see that we can naively execute each of these threads on its own processor.
Greedy usage of processors
It will likely be more efficient to execute the worker thread on one of the previous processors. They are idle when worker thread 1 needs to execute, so it makes more sense to reuse one of the previously allocated processors. Here, task 1's code in worker thread 1 is shown executing on processor 2 (it could also have been assigned to processor 1, which is also free, but stay tuned for the next diagram and you'll see why I put it on processor 2).
Schedule thread to reuse a processor
And finally, we can construct the last version, which takes us to the most efficient scheduling. This hinges on optimizing the case where you create a thread and then immediately join it. Different operating systems try to optimize this case so that the newly created thread can run on the same processor: creating the thread doesn't immediately schedule the new thread on a free processor and burn the cost of a context switch back to the thread that scheduled it. Instead, the new thread is scheduled when we block in our Join operation, or when the next clock interrupt occurs. If we can get to our Join call before an interrupt triggers the scheduler (we're talking < 10 ms on a typical operating system for such things to be triggered by the clock chip), then the scheduling happens more efficiently, like this (below), where thread 2 can be scheduled to run on the same processor without a context switch. (Interestingly, Linux and Windows optimize this case differently.)
Final timing diagram
You'll notice (above) that this can now all execute on only two logical processors.
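The pattern being optimized is just the create-then-join sequence below, shown here as a minimal C/pthreads sketch (an assumption on my part; the answer above is not specific to pthreads, and whether the OS actually defers scheduling the new thread until the join, and on which processor it then runs, is entirely OS-specific):

#include <pthread.h>
#include <stdio.h>

static void *task(void *arg) {
    printf("worker running: %s\n", (const char *)arg);
    return NULL;
}

int main(void) {
    pthread_t t;
    pthread_create(&t, NULL, task, "task 1");
    /* Reaching pthread_join quickly (before the next clock interrupt
       invokes the scheduler) gives the OS the chance to run the new
       thread on this same processor while we block, avoiding an extra
       context switch. */
    pthread_join(t, NULL);
    return 0;
}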
Whether it is more efficient to run these on separate cores or on different logical processors of the same core is again a nuance of the operating system that depends highly on virtual memory usage and on the hardware specs of the processor and its caches. Different operating systems will do different things here, too, and the details matter greatly. A non-uniform memory architecture would affect the decision as well.
In the real world, the operating system may use heuristics to determine the best priority and placement for threads and processes. The real world answer is so much different and more nuanced than this "computer science" answer I've given and depends on the specific details.
Additional Reading/Viewing:
Windows and Linux: A Tale of Two Kernels - Tech-Ed 2004 (Older but excellent info)
Processes, Threads, and Jobs in the Windows Operating System
Scheduling: Introduction; and Multiprocessor Scheduling (Advanced)
Capacity Aware Scheduling

How do user level threads (ULTs) and kernel level threads (KLTs) differ with regards to concurrent execution?

Here's what I understand; please correct/add to it:
In pure ULTs, the multithreaded process itself does the thread scheduling. So the kernel essentially does not notice the difference and considers it a single-threaded process. If one thread makes a blocking system call, the entire process is blocked. Even on a multicore processor, only one thread of the process would be running at a time, unless the process is blocked. I'm not sure how ULTs are much help, though.
In pure KLTs, even if a thread is blocked, the kernel schedules another (ready) thread of the same process. (In case of pure KLTs, I'm assuming the kernel creates all the threads of the process.)
Also, using a combination of ULTs and KLTs, how are ULTs mapped into KLTs?
Your analysis is correct. The OS kernel has no knowledge of user-level threads. From its perspective, a process is an opaque black box that occasionally makes system calls. Consequently, if that program has 100,000 user-level threads but only one kernel thread, then the process can run only one user-level thread at a time, because there is only one kernel-level thread associated with it. On the other hand, if a process has multiple kernel-level threads, then it can execute multiple commands in parallel on a multicore machine.
A common compromise between these is to have a program request some fixed number of kernel-level threads, then have its own thread scheduler divvy up the user-level threads onto these kernel-level threads as appropriate. That way, multiple ULTs can execute in parallel, and the program can have fine-grained control over how threads execute.
As for how this mapping works - there are a bunch of different schemes. You could imagine that the user program uses any one of multiple different scheduling systems. In fact, if you do this substitution:
Kernel thread <---> Processor core
User thread <---> Kernel thread
Then any scheme the OS could use to map kernel threads onto cores could also be used to map user-level threads onto kernel-level threads.
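Here is a hedged sketch of that compromise in C (purely illustrative, not any real ULT library: a real one would also save and restore register context and per-thread stacks, which is omitted here, and every name is made up). A fixed number of kernel-level threads (pthreads) drain a shared queue of user-level "threads":

#include <pthread.h>
#include <stdio.h>
#include <stdlib.h>

/* A hypothetical "user-level thread": here just a function to run.
   A real ULT library would also keep a saved register context and stack. */
typedef struct ult {
    void (*entry)(int id);
    int id;
    struct ult *next;
} ult_t;

static ult_t *run_queue = NULL;
static pthread_mutex_t queue_lock = PTHREAD_MUTEX_INITIALIZER;

static void enqueue(ult_t *t) {
    pthread_mutex_lock(&queue_lock);
    t->next = run_queue;
    run_queue = t;
    pthread_mutex_unlock(&queue_lock);
}

static ult_t *dequeue(void) {
    pthread_mutex_lock(&queue_lock);
    ult_t *t = run_queue;
    if (t) run_queue = t->next;
    pthread_mutex_unlock(&queue_lock);
    return t;
}

static void work(int id) { printf("ULT %d running\n", id); }

/* Each kernel-level thread repeatedly picks a ULT and runs it,
   just as the OS picks kernel threads and runs them on cores. */
static void *klt_main(void *arg) {
    (void)arg;
    ult_t *t;
    while ((t = dequeue()) != NULL) {
        t->entry(t->id);
        free(t);
    }
    return NULL;
}

int main(void) {
    enum { NUM_KLT = 2, NUM_ULT = 8 };
    for (int i = 0; i < NUM_ULT; i++) {
        ult_t *t = malloc(sizeof *t);
        t->entry = work;
        t->id = i;
        enqueue(t);
    }
    pthread_t klts[NUM_KLT];
    for (int i = 0; i < NUM_KLT; i++)
        pthread_create(&klts[i], NULL, klt_main, NULL);
    for (int i = 0; i < NUM_KLT; i++)
        pthread_join(klts[i], NULL);
    return 0;
}

Here two KLTs run eight ULTs between them; the kernel only ever sees the two pthreads.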
Hope this helps!
Before anything else, templatetypedef's answer is beautiful; I simply wanted to extend his response a little.
There is one area which I felt the need to expand on a little: combinations of ULTs and KLTs. To understand the importance of what Wikipedia labels hybrid threading, consider the following examples:
Consider a multi-threaded program (multiple KLTs) where there are more KLTs than available logical cores. In order to use every core efficiently, as you mentioned, you want the scheduler to switch out KLTs that are blocking for ones that are in a ready state and not blocking. This reduces the amount of time the core sits idle. Unfortunately, switching KLTs is expensive for the scheduler and consumes a relatively large amount of CPU time.
This is one area where hybrid threading can be helpful. Consider a multi-threaded program with multiple KLTs and ULTs. Just as templatetypedef noted, only one ULT can be running at a time for each KLT. If a ULT is blocking, we still want to switch it out for one that is not blocking. Fortunately, ULTs are much more lightweight than KLTs, in the sense that fewer resources are assigned to a ULT and they require no interaction with the kernel scheduler. Essentially, it is almost always quicker to switch out ULTs than KLTs. As a result, we are able to significantly reduce a core's idle time relative to the first example.
Now, of course, all of this depends on the threading library being used to implement the ULTs. There are two ways (that I can come up with) of "mapping" ULTs to KLTs.
A collection of ULTs for all KLTs
This situation is ideal on a shared-memory system. There is essentially a "pool" of ULTs to which each KLT has access. Ideally, the threading library's scheduler would assign ULTs to each KLT upon request, as opposed to the KLTs accessing the pool individually. The latter could cause race conditions or deadlocks if not implemented with locks or something similar.
A collection of ULTs for each KLT (Qthreads)
This situation is ideal on a distributed-memory system. Each KLT has a collection of ULTs to run. The drawback is that the user (or the threading library) has to divide the ULTs between the KLTs. This can result in load imbalance, since it is not guaranteed that all ULTs will have the same amount of work to complete or take roughly the same amount of time. The solution is to allow ULT migration, that is, migrating ULTs between KLTs, as in the sketch below.
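Here is a hedged C sketch of that migration idea (a toy, not any particular library's implementation: ULTs are reduced to run-to-completion task IDs, and the steal policy is deliberately simplistic). Each KLT prefers its own queue and steals from a neighbour when its own queue runs dry:

#include <pthread.h>
#include <stdio.h>

enum { NUM_KLT = 2, NUM_ULT = 8 };

typedef struct {
    int ids[NUM_ULT];
    int count;
    pthread_mutex_t lock;
} queue_t;

static queue_t queues[NUM_KLT];

static int take(queue_t *q) {
    pthread_mutex_lock(&q->lock);
    int id = q->count > 0 ? q->ids[--q->count] : -1;
    pthread_mutex_unlock(&q->lock);
    return id;
}

static void *klt_main(void *arg) {
    int me = (int)(long)arg;
    for (;;) {
        int id = take(&queues[me]);
        if (id < 0)                       /* own queue empty: try to steal */
            id = take(&queues[(me + 1) % NUM_KLT]);
        if (id < 0) break;                /* nothing anywhere: done */
        printf("KLT %d runs ULT %d\n", me, id);
    }
    return NULL;
}

int main(void) {
    /* Deliberately imbalanced: all ULTs start on KLT 0's queue,
       so KLT 1 has to migrate work over by stealing. */
    for (int i = 0; i < NUM_KLT; i++) {
        queues[i].count = 0;
        pthread_mutex_init(&queues[i].lock, NULL);
    }
    for (int i = 0; i < NUM_ULT; i++)
        queues[0].ids[queues[0].count++] = i;

    pthread_t klts[NUM_KLT];
    for (long i = 0; i < NUM_KLT; i++)
        pthread_create(&klts[i], NULL, klt_main, (void *)i);
    for (int i = 0; i < NUM_KLT; i++)
        pthread_join(klts[i], NULL);
    return 0;
}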

Preemptive threads Vs Non Preemptive threads

Can someone please explain the difference between preemptive Threading model and Non Preemptive threading model?
As per my understanding:
Non-preemptive threading model: once a thread starts running, it cannot be stopped, and control cannot be transferred to other threads until the thread has completed its task.
Preemptive threading model: the runtime is allowed to step in and hand control from one thread to another at any time. Higher priority threads are given precedence over lower priority threads.
Can someone please:
Explain if the understanding is correct.
Explain the advantages and disadvantages of both models.
An example of when to use what will be really helpful.
If I create a thread in Linux (System V or Pthreads) without specifying any options (are there any?), is the threading model used preemptive by default?
No, your understanding isn't entirely correct. Non-preemptive (aka cooperative) threads typically yield control manually to let other threads run before they finish (it is up to each thread to call yield(), or whatever the equivalent is, to make that happen).
Preemptive threading is simpler to use. Cooperative threads have less overhead.
Normally use preemptive. If you find your design has a lot of thread-switching overhead, cooperative threads would be a possible optimization. In many (most?) situations, this will be a fairly large investment with minimal payoff though.
Yes, by default you'd get preemptive threading, though if you look around for the CThreads package, it supports cooperative threading. Few enough people want cooperative threads nowadays that I'm not sure it's been updated within the last decade, though...
Non-preemptive threads are also called cooperative threads. An example of these is POE (Perl). Another example is classic Mac OS (before OS X). Cooperative threads have exclusive use of the CPU until they give it up. The scheduler then picks another thread to run.
Preemptive threads can voluntarily give up the CPU just like cooperative ones, but when they don't, it will be taken from them, and the scheduler will start another thread. POSIX & SysV threads fall in this category.
Big advantages of cooperative threads are greater efficiency (on single-core machines, at least) and easier handling of concurrency: it only exists when you yield control, so locking isn't required.
Big advantages of preemptive threads are better fault tolerance: a single thread failing to yield doesn't stop all other threads from executing. They also normally work better on multi-core machines, since multiple threads can execute at once. Finally, you don't have to worry about making sure you're constantly yielding, which can be really annoying inside, e.g., a heavy number-crunching loop.
You can mix them, of course. A single preemptive thread can have many cooperative threads running inside it.
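To make the cooperative model concrete, here is a minimal sketch in C using the POSIX ucontext API (getcontext/makecontext/swapcontext; obsolescent in recent POSIX but still widely available, e.g. on Linux). The two made-up tasks run only when the other explicitly yields, which is exactly the cooperative contract described above:

#include <stdio.h>
#include <ucontext.h>

#define STACK_SIZE (64 * 1024)

static ucontext_t main_ctx, ctx_a, ctx_b;
static char stack_a[STACK_SIZE], stack_b[STACK_SIZE];

static void task_a(void) {
    for (int i = 0; i < 3; i++) {
        printf("A: step %d\n", i);
        swapcontext(&ctx_a, &ctx_b);    /* cooperative yield to B */
    }
}

static void task_b(void) {
    for (int i = 0; i < 3; i++) {
        printf("B: step %d\n", i);
        swapcontext(&ctx_b, &ctx_a);    /* cooperative yield to A */
    }
}

int main(void) {
    getcontext(&ctx_a);
    ctx_a.uc_stack.ss_sp = stack_a;
    ctx_a.uc_stack.ss_size = sizeof stack_a;
    ctx_a.uc_link = &main_ctx;          /* return here when A finishes */
    makecontext(&ctx_a, task_a, 0);

    getcontext(&ctx_b);
    ctx_b.uc_stack.ss_sp = stack_b;
    ctx_b.uc_stack.ss_size = sizeof stack_b;
    ctx_b.uc_link = &main_ctx;
    makecontext(&ctx_b, task_b, 0);

    /* Start A; control then ping-pongs A->B->A... entirely in user space.
       (B is simply abandoned mid-yield when the program exits.) */
    swapcontext(&main_ctx, &ctx_a);
    return 0;
}

Note what's missing: no timer, no scheduler interrupt. If task_a never called swapcontext, task_b would never run, which is precisely the fault-tolerance weakness of cooperative threading mentioned above.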
Using a non-preemptive model does not mean the process avoids context switches while it is waiting for I/O; the dispatcher will still choose another process according to the scheduling model. We simply have to trust each process to give up the CPU.
Non-preemptive:
Advantage:
Less context switching, so less overhead, which can be significant in a non-preemptive model
Easier to handle, since it can be managed even on a single-core processor
Preemptive:
Advantage:
Priorities give us more control over the running processes
Better concurrency is a bonus
System calls can be handled without blocking the entire system
Disadvantage:
Requires more complex synchronization algorithms, and critical-section handling is inevitable
Context-switching overhead
In cooperative (non-preemptive) models, once a thread is given control it continues to run until it explicitly yields control or it blocks.
In a preemptive model, the virtual machine is allowed to step in and hand control from one thread to another at any time. Both models have their advantages and disadvantages.
Java threads are generally preemptive between priorities. A higher priority thread takes precedence over a lower priority thread. If a higher priority thread goes to sleep or blocks, then a lower priority thread can run (assuming one is available and ready to run).
However, as soon as the higher priority thread wakes up or unblocks, it will interrupt the lower priority thread and run until it finishes, blocks again, or is preempted by an even higher priority thread.
The Java Language Specification occasionally allows VMs to run lower priority threads instead of a runnable higher priority thread, but in practice this is unusual.
However, nothing in the Java Language Specification specifies what is supposed to happen with equal priority threads. On some systems these threads will be time-sliced and the runtime will allot a certain amount of time to a thread. When that time is up, the runtime preempts the running thread and switches to the next thread with the same priority.
On other systems, a running thread will not be preempted in favor of a thread with the same priority. It will continue to run until it blocks, explicitly yields control, or is preempted by a higher priority thread.
As for the advantages and disadvantages, derobert and pooria have highlighted them quite clearly.

How, if at all, do Erlang Processes map to Kernel Threads?

Erlang is known for being able to support MANY lightweight processes; it can do this because these are not processes in the traditional sense, or even threads like in P-threads, but threads entirely in user space.
This is well and good (fantastic actually). But how then are Erlang threads executed in parallel in a multicore/multiprocessor environment? Surely they have to somehow be mapped to kernel threads in order to be executed on separate cores?
Assuming that that's the case, how is this done? Are many lightweight processes mapped to a single kernel thread?
Or is there another way around this problem?
The answer depends on which VM is used:
1) non-SMP: there is one scheduler (OS thread), which executes all Erlang processes, taken from the pool of runnable processes (i.e. those that are not blocked by e.g. receive).
2) SMP: there are K schedulers (OS threads, where K is usually the number of CPU cores), which execute Erlang processes from a shared process queue. It is a simple FIFO queue (with locks to allow simultaneous access from multiple OS threads).
3) SMP in R13B and newer: there are K schedulers (as before), which execute Erlang processes from multiple process queues. Each scheduler has its own queue, so process-migration logic moves processes from one scheduler to another. This improves performance by avoiding excessive locking on a shared process queue.
For more information see this document prepared by Kenneth Lundin, Ericsson AB, for Erlang User Conference, Stockholm, November 13, 2008.
I want to amend the previous answers.
Erlang, or rather the Erlang runtime system (erts), defaults the number of schedulers (OS threads) and the number of run queues to the number of processing elements on your platform, that is, processor cores or hardware threads. You can change these settings at runtime using:
erlang:system_flag(schedulers_online, NP) -> PrevNP
Erlang processes do not have affinity to any particular scheduler. The logic that balances processes between the schedulers follows two rules: 1) a starving scheduler steals work from another scheduler; 2) migration paths are set up to push processes from schedulers with many processes to schedulers with less work. This is done to ensure fairness in reduction count (execution time) for each process.
Schedulers can, however, be locked to specific processing elements. This is not done by default. To let erts set up scheduler-to-core affinity, use:
erlang:system_flag(scheduler_bind_type, default_bind) -> PrevBind
Several other bind types can be found in the documentation. Using affinity can greatly improve performance in heavy-load situations, especially in high-lock-contention situations. Also, the Linux kernel does not handle hyperthreads well, to say the least; if you have hyperthreads on your platform, you should really use this feature in Erlang.
I'm purely guessing here, but I'd imagine that there's a small number of threads, which pick processes from a common process pool for execution. Once a process hits a blocking operation, the thread executing it puts it aside and picks another. When a process being executed causes another process to become unblocked, that newly unblocked process gets placed into the pool. I suppose a thread might also stop execution of a process even when it's not blocked at certain points to serve other processes.
I would like to add some input to what was described in the accepted answer.
The Erlang scheduler is an essential part of the Erlang Runtime System, and it provides its own abstraction and implementation of lightweight processes atop OS threads.
Each scheduler runs within a single OS thread. Normally, there are as many schedulers as there are CPU cores on the hardware (this is configurable, though it naturally does not bring much value when the number of schedulers exceeds the number of hardware cores). The system can also be configured so that a scheduler will not jump between OS threads.
Now, when an Erlang process is created, it is entirely the responsibility of the ERTS and the scheduler to manage its life cycle, resource consumption, memory footprint, etc.
One of the core implementation details is that each process has a time budget of 2000 reductions available when the scheduler picks that process up from the run queue. Every piece of progress in the system (even I/O) is guaranteed to have a reduction budget. That is what actually makes ERTS a system with preemptive multitasking.
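As a hedged illustration of the reduction-budget idea, here is a toy scheduler loop in C (not how BEAM is actually implemented; "reductions" are abstracted to a plain work counter, and the process names are made up). A process runs until its budget is exhausted and is then preempted to the back of the run queue:

#include <stdio.h>

#define REDUCTION_BUDGET 2000

typedef struct proc {
    const char *name;
    long work_left;      /* total reductions this process still needs */
    struct proc *next;
} proc_t;

/* Simple FIFO run queue. */
static proc_t *head = NULL, *tail = NULL;

static void push(proc_t *p) {
    p->next = NULL;
    if (tail) tail->next = p; else head = p;
    tail = p;
}

static proc_t *pop(void) {
    proc_t *p = head;
    if (p) { head = p->next; if (!head) tail = NULL; }
    return p;
}

int main(void) {
    proc_t a = {"a", 5000, NULL}, b = {"b", 3000, NULL};
    push(&a); push(&b);

    proc_t *p;
    while ((p = pop()) != NULL) {
        /* Run the process, but only up to its reduction budget. */
        long slice = p->work_left < REDUCTION_BUDGET ? p->work_left
                                                     : REDUCTION_BUDGET;
        p->work_left -= slice;
        printf("ran %s for %ld reductions, %ld left\n",
               p->name, slice, p->work_left);
        if (p->work_left > 0)
            push(p);   /* budget exhausted: preempted, back of the queue */
    }
    return 0;
}

Because the budget is enforced by the runtime rather than volunteered by the process, no single process can monopolize a scheduler, which is the preemptive guarantee described above.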
I would recommend a great blog post on that topic by Jesper Louis Andersen http://jlouisramblings.blogspot.com/2013/01/how-erlang-does-scheduling.html
The short answer: Erlang processes are not OS threads and do not map onto them directly. Erlang schedulers are what run on the OS threads, and they provide a smart implementation of much more finely grained Erlang processes, hiding those details from the programmer.

Resources