What is the difference between user level threads and coroutines? - multithreading

User-level threading involves N user level threads that run on a single kernel thread. What are the details of the user-level threading and how does this differ from coroutines?

Wikipedia has a quite in-depth summary on the subject: Thread (computing).
With Green threads there's a VM executing instructions that typically decides between to switch thread in-between two instructions.
With coroutines the two functions yield to each other at specified points, possibly passing values along, and typically requiring special language support. E.g. a producer yielding to a consumer, passing along an item.

The idea behind user-level threads is to have multiple different logical threads running in the same program but to have the user program handle the mapping from logical threads to kernel threads (Which actually get scheduled) rather than having the OS handle the entire mapping. This can improve performance by letting the user program handle scheduling. Conceptually, user threads are one implementation of preemptive multitasking, where multiple jobs are run to completion in parallel by having the threads periodically stopped while other threads run.
Coroutines, on the other hand, are a generalization of standard function call and return ("subroutines") where functions pass control back and forth to one another, communicating values as they switch between routines. The switching back and forth between coroutines is under the control of the coroutines themselves; control only passes from one coroutine to another if one of the coroutines explicitly yields a value to another. This is an example of cooperative multitasking, where multiple jobs are completed in parallel by having the individual steps in the task manually coordinate who gets to run and when.
Hope this helps!

Related

Do Rust threads run at the same time in parallel? Documentation sounds like it does not [duplicate]

I want to know if a program can run two threads at the same time (that is basically what it is used for correct?). But if I were to do a system call in one function where it runs on thread A, and have some other tasks running in another function where it runs on thread B, would they both be able to run at the same time or would my second function wait until the system call finishes?
Add-on to my original question: Now would this process still be an uninterruptable process while the system call is going on? I am talking about using any system call on UNIX/LINUX.
Multi-threading and parallel processing are two completely different topics, each worthy of its own conversation, but for the sake of introduction...
Threading:
When you launch an executable, it is running in a thread within a process. When you launch another thread, call it thread 2, you now have 2 separately running execution chains (threads) within the same process. On a single core microprocessor (uP), it is possible to run multiple threads, but not in parallel. Although conceptually the threads are often said to run at the same time, they are actually running consecutively in time slices allocated and controlled by the operating system. These slices are interleaved with each other. So, the execution steps of thread 1 do not actually happen at the same time as the execution steps of thread 2. These behaviors generally extend to as many threads as you create, i.e. packets of execution chains all working within the same process and sharing time slices doled out by the operating system.
So, in your system call example, it really depends on what the system call is as to whether or not it would finish before allowing the execution steps of the other thread to proceed. Several factors play into what will happen: Is it a blocking call? Does one thread have more priority than the other. What is the duration of the time slices?
Links relevant to threading in C:
SO Example
POSIX
ANSI C
Parallel Processing:
When multi-threaded program execution occurs on a multiple core system (multiple uP, or multiple multi-core uP) threads can run concurrently, or in parallel as different threads may be split off to separate cores to share the workload. This is one example of parallel processing.
Again, conceptually, parallel processing and threading are thought to be similar in that they allow things to be done simultaneously. But that is concept only, they are really very different, in both target application and technique. Where threading is useful as a way to identify and split out an entire task within a process (eg, a TCP/IP server may launch a worker thread when a new connection is requested, then connects, and maintains that connection as long as it remains), parallel processing is typically used to send smaller components of the same task (eg. a complex set of computations that can be performed independently in separate locations) off to separate resources (cores, or uPs) to be completed simultaneously. This is where multiple core processors really make a difference. But parallel processing also takes advantage of multiple systems, popular in areas such as genetics and MMORPG gaming.
Links relevant to parallel processing in C:
OpenMP
More OpenMP (examples)
Gribble Labs - Introduction to OpenMP
CUDA Tookit from NVIDIA
Additional reading on the general topic of threading and architecture:
This summary of threading and architecture barely scratches the surface. There are many parts to the the topic. Books to address them would fill a small library, and there are thousands of links. Not surprisingly within the broader topic some concepts do not seem to follow reason. For example, it is not a given that simply having more cores will result in faster multi-threaded programs.
Yes, they would, at least potentially, run "at the same time", that's exactly what threads are for; of course there are many details, for example:
If both threads run system calls that e.g. write to the same file descriptor they might temporarily block each other.
If thread synchronisation primitives like mutexes are used then the parallel execution will be blocked.
You need a processor with at least two cores in order to have two threads truly run at the same time.
It's a very large and very complex subject.
If your computer has only a single CPU, you should know, how it can execute more than one thread at the same time.
In single-processor systems, only a single thread of execution occurs at a given instant. because Single-processor systems support logical concurrency, not physical concurrency.
On multiprocessor systems, several threads do, in fact, execute at the same time, and physical concurrency is achieved.
The important feature of multithreaded programs is that they support logical concurrency, not whether physical concurrency is actually achieved.
The basics are simple, but the details get complex real quickly.
You can break a program into multiple threads (if it makes sense to do so), and each thread will run "at its own pace", such that if one must wait for, eg, some file I/O that doesn't slow down the others.
On a single processor multiple threads are accommodated by "time slicing" the processor somehow -- either on a simple clock basis or by letting one thread run until it must wait (eg, for I/O) and then "switching" to the next thread. There is a whole art/science to doing this for maximum efficiency.
On a multi-processor (such as most modern PCs which have from 2 to 8 "cores") each thread is assigned to a separate processor, and if there are not enough processors then they are shared as in the single processor case.
The whole area of assuring "atomicity" of operations by a single thread, and assuring that threads don't somehow interfere with each other is incredibly complex. In general a there is a "kernel" or "nucleus" category of system call that will not be interrupted by another thread, but thats only a small subset of all system calls, and you have to consult the OS documentation to know which category a particular system call falls into.
They will run at the same time, for one thread is independent from another, even if you perform a system call.
It's pretty easy to test it though, you can create one thread that prints something to the console output and perform a system call at another thread, that you know will take some reasonable amount of time. You will notice that the messages will continue to be printed by the other thread.
Yes, A program can run two threads at the same time.
it is called Multi threading.
would they both be able to run at the same time or would my second function wait until the system call finishes?
They both are able to run at the same time.
if you want, you can make thread B wait until Thread A completes or reverse
Two thread can run concurrently only if it is running on multiple core processor system, but if it has only one core processor then two threads can not run concurrently. So only one thread run at a time and if it finishes its job then the next thread which is on queue take the time.

Understanding coroutine

From wikipedia the
paragraph Comparison with threads states:
... This means that coroutines provide concurrency but not parallelism ...
I understand that coroutine is lighter than thread, context switching is not involved, no critical sections so mutex is also not needed. What confuses me is that the way it works seems not to scale. According to wikipedia, coroutines provide concurrency, they work cooperatively. A program with coroutines still executes instructions sequentially, this is exactly the same as threads on a single core machine, but what about multicore machines? on which threads run in parallel, while coroutines work the same as on single core machines.
My question is how coroutines will perform better than threads on multicore machines?
...what about multicore machines?...
Coroutines are a model of concurrency (in which two or more stateful activities can be in-progress at the same time), but not a model of parallelism (in which the program would able to use more hardware resources than what a single, conventional CPU core can provide).
Threads can run independently of one another, and if your hardware supports it (i.e., if your machine has more than one core) then two or more threads can be performing their independent activities at the same instant in time.
But coroutines, by definition, are interdependent. A coroutine only runs when it is called by another coroutine, and the caller is suspended until the current coroutine calls it back. Only one coroutine from a set of coroutines can ever be actually running at any given instant in time.

Does multithreading actually work in uniprocessor environment

Suppose we have a process with multiple threads in a uniprocessor.
Now I know that if we have several processes, only one of them will be processed at a time in a uniprocessor and hence the processes are not concurrent.
If my understanding is correct, similarly each thread will be processed at a time and not concurrent in a uniprocessor. Is this statement true? If so then does multithreading mean having more than one thread in a process and does not mean running multiple threads at a time? And does that mean there's no benefit of creating user threads in a uniprocessor environment?
TL;DR: threads are switching more often than processes and in real time we have an effect of concurrency because it is happens really fast.
when you wrote:
each thread will be processed at a time and not concurrent in a uni processor
Notice the word "concurrent", there is no real concurrency in uni processor, there is only effect of that thanks to the multiple number of context switches between processes.
Let's clarify something here, the single core of the CPU can handle one thread at a given time, each process has a main thread and (if needed) more threads running together. If a process A is now running and it has 3 threads: A1(main thread), A2, A3 all three will be running as long as process A is being processed by the CPU core. When a context switch occur process A is no longer running and now process B will run with his threads.
About this statement:
there's no benefit of creating user threads in a uni processor environment
That is not true. there is a benefit in creating threads, they are easier to create ("spawn" as in the books) and shearing the process heap memory. Creating a sub process ("child" as in the books) is a overhead comparing to a thread because a process need to have his own memory. For example each google chrome tab is a process not a thread, but this tab has multiple threads running concurrency with little responsibility.
If you are still somehow running a computer with just one, single-core, CPU, then you would be correct to observe that only one thread can be physically executing at one time. But that does not negate the value of breaking up the application into multiple threads and/or processes.
The essential benefit is concurrency. When one thread is waiting (e.g. for an input/output operation to complete), there is something else for the CPU to be doing in the meantime: it can be running a different thread that isn't waiting. With a carefully designed application, you can get much better utilization of every part of the hardware, more parallelism, and thus, more throughput.
My favorite go-to example is a fast food restaurant. About a dozen workers, each one doing different things, cooperate to bring your order to you. Even if one of them (say, "the fry guy") is standing around, someone else always has something to do. Several orders are in-process at once. This overlap, this "concurrency," is what you are shooting for – regardless of how many CPUs you have.
Multithreading is also commonly used with GUI applications that also need to do some kind of "heavy lifting." One thread handles the GUI interaction (and has no other real responsibilities) while other threads, with a slightly inferior priority (or "niceness") do the lifting. When a GUI event comes in, the GUI thread pre-empts the others and responds to it immediately, then of course goes right back to sleep again. But in this way the GUI always remains very responsive – even though the other threads are doing "heavy lifting" things, GUI messages are still handled very promptly. (I scooped-up about a 25% performance improvement by re-tooling an older application to use this approach, because the application was no longer "polling" for GUI events.)
The first question I ask about any thread is, "what does it wait for?" To me, a thread is defined by what event it waits for and what it does when that event happens.
Threads were in wide-spread use for at least a decade before multi-processor computers became commercially available. They are useful when you want to write a program that has to respond to un-synchronized events that come from multiple different sources. There's a few different ways to model a program like that. One way is to have a different thread to wait on each different event source. The next most popular is an event driven architecture in which there's a main loop that waits for all events and calls different event handler functions for each of the different kinds of event.
The multi-threaded style of program often is easier to read* because there's usually different activities going on inside the program, and the state of each activity can be implicit in the context (i.e., registers and call stack) of the thread that's driving it, while in the event-driven model, each activity's state must be explicitly encoded in some object.
The implicit-in-the-context way of keeping the state is much closer to the procedural style of coding a single activity that we learn as beginners.
*Easier to read does not mean that the code is easy to write without making bad and non-obvious mistakes!!
The main impetus for developing threads was Ada compliance. Prior to that, different operating systems had their own ways of handing multiple things at once. In eunuchs, the way to do more than one thing was to spin off a new process. In VMS, software interrupts (aka Asynchronous System Traps or Asynchronous Procedure Calls in Windoze). In those days (1970's) multiprocessor systems were rare.
One of the goals of Ada was to have a system independent way of doing things. It adopted the "task" which is effectively a thread. In order to support Ada, compiler developers had to include task (thread) libraries.
With the rise of multiprocessors, operating systems started to make threads (rather than processes) the basic schedulable unit in a system.
Threads then give a way for programs to handle multiple things simultaneously, even if there is only one processor. Sadly, support for threads in programming languages has been woefully lacking. Ada is the only major language I can think of that has real support for threads (tasks). Thread support in Java, for example, is a complete, sick joke. The result is threads are not as effective in practice as they could be.

Running two threads at the same time

I want to know if a program can run two threads at the same time (that is basically what it is used for correct?). But if I were to do a system call in one function where it runs on thread A, and have some other tasks running in another function where it runs on thread B, would they both be able to run at the same time or would my second function wait until the system call finishes?
Add-on to my original question: Now would this process still be an uninterruptable process while the system call is going on? I am talking about using any system call on UNIX/LINUX.
Multi-threading and parallel processing are two completely different topics, each worthy of its own conversation, but for the sake of introduction...
Threading:
When you launch an executable, it is running in a thread within a process. When you launch another thread, call it thread 2, you now have 2 separately running execution chains (threads) within the same process. On a single core microprocessor (uP), it is possible to run multiple threads, but not in parallel. Although conceptually the threads are often said to run at the same time, they are actually running consecutively in time slices allocated and controlled by the operating system. These slices are interleaved with each other. So, the execution steps of thread 1 do not actually happen at the same time as the execution steps of thread 2. These behaviors generally extend to as many threads as you create, i.e. packets of execution chains all working within the same process and sharing time slices doled out by the operating system.
So, in your system call example, it really depends on what the system call is as to whether or not it would finish before allowing the execution steps of the other thread to proceed. Several factors play into what will happen: Is it a blocking call? Does one thread have more priority than the other. What is the duration of the time slices?
Links relevant to threading in C:
SO Example
POSIX
ANSI C
Parallel Processing:
When multi-threaded program execution occurs on a multiple core system (multiple uP, or multiple multi-core uP) threads can run concurrently, or in parallel as different threads may be split off to separate cores to share the workload. This is one example of parallel processing.
Again, conceptually, parallel processing and threading are thought to be similar in that they allow things to be done simultaneously. But that is concept only, they are really very different, in both target application and technique. Where threading is useful as a way to identify and split out an entire task within a process (eg, a TCP/IP server may launch a worker thread when a new connection is requested, then connects, and maintains that connection as long as it remains), parallel processing is typically used to send smaller components of the same task (eg. a complex set of computations that can be performed independently in separate locations) off to separate resources (cores, or uPs) to be completed simultaneously. This is where multiple core processors really make a difference. But parallel processing also takes advantage of multiple systems, popular in areas such as genetics and MMORPG gaming.
Links relevant to parallel processing in C:
OpenMP
More OpenMP (examples)
Gribble Labs - Introduction to OpenMP
CUDA Tookit from NVIDIA
Additional reading on the general topic of threading and architecture:
This summary of threading and architecture barely scratches the surface. There are many parts to the the topic. Books to address them would fill a small library, and there are thousands of links. Not surprisingly within the broader topic some concepts do not seem to follow reason. For example, it is not a given that simply having more cores will result in faster multi-threaded programs.
Yes, they would, at least potentially, run "at the same time", that's exactly what threads are for; of course there are many details, for example:
If both threads run system calls that e.g. write to the same file descriptor they might temporarily block each other.
If thread synchronisation primitives like mutexes are used then the parallel execution will be blocked.
You need a processor with at least two cores in order to have two threads truly run at the same time.
It's a very large and very complex subject.
If your computer has only a single CPU, you should know, how it can execute more than one thread at the same time.
In single-processor systems, only a single thread of execution occurs at a given instant. because Single-processor systems support logical concurrency, not physical concurrency.
On multiprocessor systems, several threads do, in fact, execute at the same time, and physical concurrency is achieved.
The important feature of multithreaded programs is that they support logical concurrency, not whether physical concurrency is actually achieved.
The basics are simple, but the details get complex real quickly.
You can break a program into multiple threads (if it makes sense to do so), and each thread will run "at its own pace", such that if one must wait for, eg, some file I/O that doesn't slow down the others.
On a single processor multiple threads are accommodated by "time slicing" the processor somehow -- either on a simple clock basis or by letting one thread run until it must wait (eg, for I/O) and then "switching" to the next thread. There is a whole art/science to doing this for maximum efficiency.
On a multi-processor (such as most modern PCs which have from 2 to 8 "cores") each thread is assigned to a separate processor, and if there are not enough processors then they are shared as in the single processor case.
The whole area of assuring "atomicity" of operations by a single thread, and assuring that threads don't somehow interfere with each other is incredibly complex. In general a there is a "kernel" or "nucleus" category of system call that will not be interrupted by another thread, but thats only a small subset of all system calls, and you have to consult the OS documentation to know which category a particular system call falls into.
They will run at the same time, for one thread is independent from another, even if you perform a system call.
It's pretty easy to test it though, you can create one thread that prints something to the console output and perform a system call at another thread, that you know will take some reasonable amount of time. You will notice that the messages will continue to be printed by the other thread.
Yes, A program can run two threads at the same time.
it is called Multi threading.
would they both be able to run at the same time or would my second function wait until the system call finishes?
They both are able to run at the same time.
if you want, you can make thread B wait until Thread A completes or reverse
Two thread can run concurrently only if it is running on multiple core processor system, but if it has only one core processor then two threads can not run concurrently. So only one thread run at a time and if it finishes its job then the next thread which is on queue take the time.

How do user level threads (ULTs) and kernel level threads (KLTs) differ with regards to concurrent execution?

Here's what I understand; please correct/add to it:
In pure ULTs, the multithreaded process itself does the thread scheduling. So, the kernel essentially does not notice the difference and considers it a single-thread process. If one thread makes a blocking system call, the entire process is blocked. Even on a multicore processor, only one thread of the process would running at a time, unless the process is blocked. I'm not sure how ULTs are much help though.
In pure KLTs, even if a thread is blocked, the kernel schedules another (ready) thread of the same process. (In case of pure KLTs, I'm assuming the kernel creates all the threads of the process.)
Also, using a combination of ULTs and KLTs, how are ULTs mapped into KLTs?
Your analysis is correct. The OS kernel has no knowledge of user-level threads. From its perspective, a process is an opaque black box that occasionally makes system calls. Consequently, if that program has 100,000 user-level threads but only one kernel thread, then the process can only one run user-level thread at a time because there is only one kernel-level thread associated with it. On the other hand, if a process has multiple kernel-level threads, then it can execute multiple commands in parallel if there is a multicore machine.
A common compromise between these is to have a program request some fixed number of kernel-level threads, then have its own thread scheduler divvy up the user-level threads onto these kernel-level threads as appropriate. That way, multiple ULTs can execute in parallel, and the program can have fine-grained control over how threads execute.
As for how this mapping works - there are a bunch of different schemes. You could imagine that the user program uses any one of multiple different scheduling systems. In fact, if you do this substitution:
Kernel thread <---> Processor core
User thread <---> Kernel thread
Then any scheme the OS could use to map kernel threads onto cores could also be used to map user-level threads onto kernel-level threads.
Hope this helps!
Before anything else, templatetypedef's answer is beautiful; I simply wanted to extend his response a little.
There is one area which I felt the need for expanding a little: combinations of ULT's and KLT's. To understand the importance (what Wikipedia labels hybrid threading), consider the following examples:
Consider a multi-threaded program (multiple KLT's) where there are more KLT's than available logical cores. In order to efficiently use every core, as you mentioned, you want the scheduler to switch out KLT's that are blocking with ones that in a ready state and not blocking. This ensures the core is reducing its amount of idle time. Unfortunately, switching KLT's is expensive for the scheduler and it consumes a relatively large amount of CPU time.
This is one area where hybrid threading can be helpful. Consider a multi-threaded program with multiple KLT's and ULT's. Just as templatetypedef noted, only one ULT can be running at one time for each KLT. If a ULT is blocking, we still want to switch it out for one which is not blocking. Fortunately, ULT's are much more lightweight than KLT's, in the sense that there less resources assigned to a ULT and they require no interaction with the kernel scheduler. Essentially, it is almost always quicker to switch out ULT's than it is to switch out KLT's. As a result, we are able to significantly reduce a cores idle time relative to the first example.
Now, of course, all of this depends on the threading library being used for implementing ULT's. There are two ways (which I can come up with) for "mapping" ULT's to KLT's.
A collection of ULT's for all KLT's
This situation is ideal on a shared memory system. There is essentially a "pool" of ULT's to which each KLT has access. Ideally, the threading library scheduler would assign ULT's to each KLT upon request as opposed to the KLT's accessing the pool individually. The later could cause race conditions or deadlocks if not implemented with locks or something similar.
A collection of ULT's for each KLT (Qthreads)
This situation is ideal on a distributed memory system. Each KLT would have a collection of ULT's to run. The draw back is that the user (or the threading library) would have to divide the ULT's between the KLT's. This could result in load imbalance since it is not guaranteed that all ULT's will have the same amount of work to complete and complete roughly the same amount of time. The solution to this is allowing for ULT migration; that is, migrating ULT's between KLT's.

Resources