Are coroutines single-threaded by nature?

According to Wikipedia, coroutines are based on cooperative multitasking, which makes them less resource-hungry than threads: no context switches, no blocking, no expensive system calls, no critical sections, and so on.
In other words, all those coroutine benefits seem to come from disallowing multithreading in the first place. This makes coroutines single-threaded by nature: concurrency is achieved, but no true parallelism.
Is it true? Is it possible to implement coroutines by using multiple threads instead?

Coroutines allow multitasking without multithreading, but they don't disallow multithreading.
In languages that support both, a coroutine that is put to sleep can be re-awakened in a different thread.
The usual arrangement for CPU-bound tasks is to have a thread pool with about twice as many threads as you have CPU cores. This thread pool is then used to execute maybe thousands of coroutines simultaneously. The threads share a queue of coroutines ready to execute, and whenever a thread's current coroutine blocks, it just gets another one to work on from the queue.
In this situation you have enough busy threads to keep your CPU busy, and you still have thread context switches, but not enough of them to waste significant resources. The number of coroutine context switches is thousands of times higher.
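As a sketch of that arrangement (here in Kotlin with kotlinx.coroutines, purely for illustration), a coroutine that suspends on one pool thread may well resume on another:

```kotlin
import kotlinx.coroutines.*

fun main() = runBlocking {
    repeat(3) { i ->
        launch(Dispatchers.Default) {
            println("coroutine $i starts on ${Thread.currentThread().name}")
            delay(100) // suspension point: the thread is handed back to the pool
            println("coroutine $i resumes on ${Thread.currentThread().name}")
        }
    }
}
```

Running this a few times typically shows the same coroutine before and after delay() on different worker threads, because the pool hands the resumed coroutine to whichever thread is free.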

Multiple coroutines can be mapped to a single OS thread. But a single OS thread can only utilize 1 CPU. So you need multiple OS threads to utilize multiple CPUs.
So if a coroutine scheduler needs to utilize multiple CPUs (very likely), it needs to make use of multiple OS threads.
Have a look at the Go scheduler and search for "M:N scheduler".
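For a rough illustration of M:N mapping (a sketch in Kotlin rather than Go, assuming kotlinx.coroutines), here M = 1,000 coroutines are multiplexed onto N = 4 OS threads:

```kotlin
import kotlinx.coroutines.*
import java.util.concurrent.ConcurrentHashMap
import java.util.concurrent.Executors

fun main() = runBlocking {
    // N = 4 OS threads backing the dispatcher
    val pool = Executors.newFixedThreadPool(4).asCoroutineDispatcher()
    val threadsUsed = ConcurrentHashMap.newKeySet<String>()
    // M = 1,000 coroutines multiplexed onto those 4 threads
    List(1_000) {
        launch(pool) {
            threadsUsed += Thread.currentThread().name
            delay(10) // suspension lets the 4 threads interleave all coroutines
        }
    }.joinAll()
    println("1000 coroutines ran on ${threadsUsed.size} OS threads")
    pool.close()
}
```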

Related

What is the benefit of using multi-threaded coroutines?

I have worked with coroutines for quite a long time, but I still don't completely understand why I should prefer multi-threaded coroutines over single-threaded coroutines.
I can clearly see the benefit of multi-threaded coroutines when their count is less than or equal to the physical thread count. But if we have more tasks than physical threads, why wouldn't we rather use only one coroutine thread?
To clarify the final question: why are 10 threads running coroutines better than a single thread with many coroutines?
Coroutines are units of computation (like tasks). The way they are dispatched onto actual threads is orthogonal to how many coroutines you have. You can use a single-threaded dispatcher or a multi-threaded dispatcher, and depending on this your coroutines will be scheduled differently.
Multi-threaded coroutines don't mean one thread per coroutine. You can dispatch 100 coroutines on 8 threads.
But if we have more tasks than physical threads, why wouldn't we rather use only one coroutine thread?
There are multiple parts in this question.
First, if you have more tasks than logical cores, you can still dispatch all those tasks onto just the right number of threads; you don't have to give up on multithreading entirely. This is exactly what Dispatchers.Default is about: dispatching as many coroutines as you want onto a limited number of threads equal to the number of hardware threads (logical cores) that you have. The point is to use all the hardware as much as possible without wasting threads (and thus memory).
Second, not every task is CPU-bound. Some I/O operations block threads (network calls, disk reads/writes etc.). When a thread is blocked on I/O, it doesn't use the CPU. If you have 8 logical cores, using only 8 threads for I/O would be suboptimal, because while some threads are blocked, the CPU cannot run other tasks. With more threads, it can (at the cost of some memory). This is the point of Dispatchers.IO, which can create more threads as needed and can exceed the number of logical cores (within a reasonable limit).
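A minimal Kotlin sketch of that split (the thread counts mentioned are kotlinx.coroutines defaults: Dispatchers.Default is sized to the core count, Dispatchers.IO can grow to 64 threads by default):

```kotlin
import kotlinx.coroutines.*

fun main() = runBlocking {
    val cpuBound = List(100) {
        launch(Dispatchers.Default) {        // sized for CPU-bound work
            var acc = 0L
            repeat(1_000_000) { acc += it }  // keeps a core busy
        }
    }
    val ioBound = List(100) {
        launch(Dispatchers.IO) {             // may exceed the core count
            Thread.sleep(50)                 // stands in for a blocking call
        }
    }
    (cpuBound + ioBound).joinAll()
}
```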
Why are 10 threads running coroutines better than a single thread with many coroutines?
Let's assume you have 100 coroutines to dispatch.
Using only one thread to run those coroutines implies that at most one core is doing the work at a given time, so nothing happens in parallel. This means all the other cores are idle, which is suboptimal. Worse, any blocking I/O operation performed by a coroutine blocks this single thread and prevents the CPU from doing anything while we wait on I/O.
Using 10 threads, you can literally execute 10 coroutines at the same time if your hardware is sufficient, which can be up to 10x faster (if your coroutines don't have inter-dependencies).
Using 100 threads would not be that beneficial if your coroutines are CPU-bound, but might be useful if you have a bunch of I/O tasks (as we've seen). That said, the more threads you use, the more memory is consumed. So even with a ton of I/O operations, you have to find a balance between throughput and memory; you don't want to spawn millions of threads.
In short, multithreading has the same advantages with or without coroutines: it lets you make use of your hardware resources as much as possible. Coroutines are just an easier way to define tasks, dispatch them onto threads, express dependencies, avoid blocking threads unnecessarily, etc.
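To make the 1-thread-vs-10-threads point concrete, a rough Kotlin sketch (timings are obviously hardware-dependent):

```kotlin
import kotlinx.coroutines.*
import java.util.concurrent.Executors
import kotlin.system.measureTimeMillis

fun cpuWork(): Long {
    var acc = 0L
    repeat(20_000_000) { acc += it }
    return acc
}

fun main() = runBlocking {
    val single = Executors.newSingleThreadExecutor().asCoroutineDispatcher()
    val oneThread = measureTimeMillis {
        List(100) { launch(single) { cpuWork() } }.joinAll()
    }
    val manyThreads = measureTimeMillis {
        List(100) { launch(Dispatchers.Default) { cpuWork() } }.joinAll()
    }
    println("1 thread: ${oneThread}ms, pool: ${manyThreads}ms")
    single.close()
}
```

On a multicore machine the pooled run should finish roughly "core count" times faster, since the tasks have no inter-dependencies.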

Are Coroutines preemptive or just blocking the thread that picked the Runnable?

After digging a bit into the implementations of coroutine dispatchers such as "Default" and "IO",
I see they just contain a Java executor (which is a simple thread pool) and a queue of Runnables, which are the coroutine logic blocks.
Let's take an example scenario where I launch 10,000 coroutines on the same coroutine context, the "Default" dispatcher for example, whose executor has 512 real threads in its pool.
Those coroutines will be added to the dispatcher queue (in case the number of in-flight coroutines exceeds the maximum threshold).
Let's assume, for example, that the first 512 coroutines I launched out of the 10,000 are really slow and heavy.
Will the rest of my coroutines be blocked until at least one of my real threads finishes,
or is there some time-slicing mechanism in those "user-space threads"?
Coroutines are scheduled cooperatively, not preemptively, so a context switch is possible only at suspension points. This is by design, and it makes execution much faster, because coroutines don't fight each other and the number of context switches is lower than in preemptive scheduling.
But as you noticed, this has drawbacks. When performing long CPU-intensive calculations, it is advisable to invoke yield() from time to time, which frees the thread for other coroutines. Another solution is to create a distinct thread pool for our calculations, to separate them from other parts of the application. That has a drawback similar to preemptive scheduling: it makes coroutines/threads fight for access to the CPU cores.
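A sketch of the yield() advice: runBlocking below runs everything on a single thread, yet the heavy coroutines still interleave because they suspend periodically.

```kotlin
import kotlinx.coroutines.*

suspend fun crunch(id: Int) {
    repeat(1_000) { step ->
        // ... heavy computation for this step ...
        if (step % 100 == 0) yield() // suspension point: lets other coroutines run
    }
    println("coroutine $id done")
}

fun main() = runBlocking { // single-threaded on purpose
    List(4) { id -> launch { crunch(id) } }.joinAll()
}
```

Without the yield() call, each coroutine would hold the thread until it finished, exactly as described above.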
Once a coroutine starts executing it will continue to do so until it hits a suspension point, which is introduced by a call to suspendCoroutine or suspendCancellableCoroutine.
Suspension is the fundamental idea
This, however, is by design, because suspension is fundamental to the performance gains introduced by coroutines. The whole point of coroutines is: why keep blocking a thread when it's doing nothing but waiting (e.g. on synchronous I/O)? Why not use this thread to do something else?
Without suspension you lose much of the performance gain
So in order to identify the switch in your particular case, you will have to define the terms slow and heavy. A CPU-intensive task such as generating a prime number can be slow and heavy, and an API call that performs a complex computation on the server and then returns a result can also be slow and heavy. If the first 512 coroutines have no suspension points, the others will have to wait for them to complete, which actually defeats the whole point of using coroutines: you are effectively using coroutines as a replacement for threads, but with added overhead.
If you have to execute a bunch of non-suspending operations in parallel, you should instead use a service like an Executor, since in this case coroutines do nothing but add a useless layer of abstraction.
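A sketch of that suggestion: purely blocking, non-suspending tasks submitted straight to a plain executor, with no coroutine machinery in between.

```kotlin
import java.util.concurrent.Executors
import java.util.concurrent.TimeUnit

fun main() {
    val pool = Executors.newFixedThreadPool(8)
    repeat(512) { i ->
        pool.submit {
            // long, non-suspending work; nothing for a coroutine to suspend on
            println("task $i on ${Thread.currentThread().name}")
        }
    }
    pool.shutdown()
    pool.awaitTermination(1, TimeUnit.MINUTES)
}
```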

Understanding coroutine

From Wikipedia, the paragraph Comparison with threads states:
... This means that coroutines provide concurrency but not parallelism ...
I understand that coroutines are lighter than threads: no context switching is involved, and there are no critical sections, so mutexes are not needed either. What confuses me is that the way they work doesn't seem to scale. According to Wikipedia, coroutines provide concurrency; they work cooperatively. A program with coroutines still executes instructions sequentially. This is exactly the same as threads on a single-core machine, but what about multicore machines, where threads run in parallel while coroutines work the same as on a single-core machine?
My question is: how will coroutines perform better than threads on multicore machines?
...what about multicore machines?...
Coroutines are a model of concurrency (in which two or more stateful activities can be in progress at the same time), but not a model of parallelism (in which the program would be able to use more hardware resources than a single, conventional CPU core can provide).
Threads can run independently of one another, and if your hardware supports it (i.e., if your machine has more than one core) then two or more threads can be performing their independent activities at the same instant in time.
But coroutines, by definition, are interdependent. A coroutine only runs when it is called by another coroutine, and the caller is suspended until the current coroutine calls it back. Only one coroutine from a set of coroutines can ever be actually running at any given instant in time.
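Kotlin's sequence builder is a handy way to see that hand-off in code: the generator coroutine runs only while the consumer asks for a value, and exactly one side is running at any instant.

```kotlin
fun main() {
    val squares = sequence {
        var n = 1
        while (true) {
            yield(n * n) // suspend the generator, resume the consumer
            n++
        }
    }
    // The consumer drives the generator one value at a time.
    for (x in squares.take(5)) println(x) // 1 4 9 16 25
}
```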

What is the difference between threads and lightweight threads?

I don't quite understand the difference between threads and lightweight threads. From an API perspective both types of threads are identical, so where exactly does the difference come in? Is it at the implementation level, where a lightweight thread is managed by a higher-level runtime than the OS thread scheduler, or is it something else? Also, is there a set of heuristics that people use to decide which type of thread to use in specific scenarios?
Depending on the context, lightweight threads can refer to threads implemented by a library. For example, threads can be simulated in a library by switching between lightweight threads at an event-handling layer; these lightweight threads are queued up and processed by a single OS thread. The advantage is that, since context switching is handled in the library, a switch can occur when the processing of data is complete, so the data does not need to be loaded back into the CPU's cache the next time this lightweight thread becomes active.
Lightweight threads can also refer to cooperative threads (or fibers). These are threads where you have to explicitly yield to give other lightweight threads a chance to run. This has the same advantage: the context switch can occur at a point where you know you have finished processing some data, so you know it will not be needed again.
Alternatively, lightweight threads can mean normal OS threads, while non-lightweight threads mean processes. Processes contain at least one thread and also have their own memory and other resources. They are more expensive than threads because data cannot be shared between processes as easily, and creating a process can be a more expensive operation for the OS.

Preemptive threads vs. non-preemptive threads

Can someone please explain the difference between the preemptive threading model and the non-preemptive threading model?
As per my understanding:
Non-preemptive threading model: once a thread is started, it cannot be stopped, and control cannot be transferred to other threads until the thread has completed its task.
Preemptive threading model: the runtime is allowed to step in and hand control from one thread to another at any time. Higher-priority threads are given precedence over lower-priority threads.
Can someone please:
Explain if the understanding is correct.
Explain the advantages and disadvantages of both models.
Give an example of when to use which; that would be really helpful.
If I create a thread in Linux (System V or pthreads) without specifying any options (are there any?), is the threading model preemptive by default?
No, your understanding isn't entirely correct. Non-preemptive (a.k.a. cooperative) threads typically yield control manually to let other threads run before they finish (though it is up to each thread to call yield(), or the equivalent, to make that happen).
Preemptive threading is simpler to use. Cooperative threads have less overhead.
Normally use preemptive. If you find your design has a lot of thread-switching overhead, cooperative threads would be a possible optimization. In many (most?) situations, this will be a fairly large investment with minimal payoff though.
Yes, by default you'd get preemptive threading, though if you look around for the CThreads package, it supports cooperative threading. Few enough people (now) want cooperative threads that I'm not sure it's been updated within the last decade though...
Non-preemptive threads are also called cooperative threads. An example of these is POE (Perl). Another example is classic Mac OS (before OS X). Cooperative threads have exclusive use of the CPU until they give it up. The scheduler then picks another thread to run.
Preemptive threads can voluntarily give up the CPU just like cooperative ones, but when they don't, it will be taken from them, and the scheduler will start another thread. POSIX & SysV threads fall in this category.
Big advantages of cooperative threads are greater efficiency (on single-core machines, at least) and easier handling of concurrency: it only exists when you yield control, so locking isn't required.
Big advantages of preemptive threads are better fault tolerance: a single thread failing to yield doesn't stop all other threads from executing. Also normally works better on multi-core machines, since multiple threads execute at once. Finally, you don't have to worry about making sure you're constantly yielding. That can be really annoying inside, e.g., a heavy number crunching loop.
You can mix them, of course. A single preemptive thread can have many cooperative threads running inside it.
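On the JVM that mix looks roughly like the sketch below: the OS schedules the two threads preemptively, while the coroutines inside each thread cooperate at explicit yield() calls (assuming kotlinx.coroutines).

```kotlin
import kotlinx.coroutines.*
import kotlin.concurrent.thread

fun main() {
    val workers = List(2) { t ->
        thread(name = "os-thread-$t") { // preemptive: the OS decides when it runs
            runBlocking {               // cooperative world inside the thread
                launch { repeat(2) { println("$t/A step $it"); yield() } }
                launch { repeat(2) { println("$t/B step $it"); yield() } }
            }
        }
    }
    workers.forEach { it.join() }
}
```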
Using a non-preemptive model does not mean that a process avoids context switching while it is waiting for I/O; the dispatcher will still choose another process to run, according to the scheduling model. We simply have to trust the process to yield.
Non-preemptive:
Less context switching, and thus less of the overhead that can be noticeable in a preemptive model.
Easier to handle, since it can be managed with a single-core processor.
Preemptive:
Advantages:
Priorities give us more control over the running processes.
Better concurrency is a bonus.
System calls can be handled without blocking the entire system.
Disadvantages:
Requires more complex algorithms for synchronization, and critical-section handling is inevitable.
The overhead that comes with it.
In cooperative (non-preemptive) models, once a thread is given control it continues to run until it explicitly yields control or it blocks.
In a preemptive model, the virtual machine is allowed to step in and hand control from one thread to another at any time. Both models have their advantages and disadvantages.
Java threads are generally preemptive between priorities. A higher priority thread takes precedence over a lower priority thread. If a higher priority thread goes to sleep or blocks, then a lower priority thread can run (assuming one is available and ready to run).
However, as soon as the higher priority thread wakes up or unblocks, it will interrupt the lower priority thread and run until it finishes, blocks again, or is preempted by an even higher priority thread.
The Java Language Specification occasionally allows VMs to run lower-priority threads instead of a runnable higher-priority thread, but in practice this is unusual.
However, nothing in the Java Language Specification specifies what is supposed to happen with equal priority threads. On some systems these threads will be time-sliced and the runtime will allot a certain amount of time to a thread. When that time is up, the runtime preempts the running thread and switches to the next thread with the same priority.
On other systems, a running thread will not be preempted in favor of a thread with the same priority. It will continue to run until it blocks, explicitly yields control, or is preempted by a higher priority thread.
As for the advantages, both derobert and pooria have highlighted them quite clearly.
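For completeness, setting a JVM thread's priority looks like the sketch below; whether the higher-priority thread actually preempts the lower one is platform-dependent, as noted above.

```kotlin
fun main() {
    val low = Thread {
        repeat(3) { println("low-priority step $it") }
    }.apply { priority = Thread.MIN_PRIORITY }
    val high = Thread {
        repeat(3) { println("high-priority step $it") }
    }.apply { priority = Thread.MAX_PRIORITY }
    low.start(); high.start()
    low.join(); high.join()
}
```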

Resources