Is a Task lightweight compared to a Thread? - multithreading

I overheard a coworker saying that a Task is basically a lightweight thread. Coming from a C++ background (where threads where the lightest weight processing unit), this seems counter-intuitive to me.
Aren't Tasks just as heavy as Threads?

You need to distinguish between a unit of work (Tasks) from the underlying process used to host/execute them. It isn't even necessary for Tasks to run on other threads. For example, Tasks can be executed in a single threaded application that periodically yields control to the task pool.
Even when Tasks are executed on separate threads, there is usually not a 1 to 1 relationship between Task and Thread. The threads are preallocated as part of a pool, and then tasks are scheduled to run on these threads as available. Creating a new task does not require the overhead of creating a thread, it only requires the cost of an enque in a task queue.
This makes tasks inherently more scalable. I can have millions of tasks throughout the lifetime of my application, but only ever actually use some constant number of threads.

Typically a "thread" implies mandatory concurrency. Starting up a thread requires allocating a stack and internal OS data structures for it. In contrast, a "task" often refers to a piece of work for which concurrency is optional, hence a parallel framework (such as OpenMP, Cilk Plus, TBB, PPL) can use the same thread to execute many tasks, by serializing the tasks, and converting optional parallelism to real parallelism only as necessary to keep the machine busy.

You are right - everything runs on a thread under the covers.
The reason people say that a Task is more lightweight than a Thread is that Microsoft put a lot of thought into having Tasks make efficient use of Threads, and the implementation is probably much lighter weight than what the average developer would come up with on their own using the Thread class.
EDIT
A more clear explanation is that a Task object is lighter weight than a Thread object, and while each Task is eventually run on a Thread, creating N Task objects concurrently leads to less than N concurrent Thread objects being used, for large N.

Related

How is ReactiveX different from multithreaded programming

Reading about ReaactiveX(like here), it states something like:
An advantage of this approach is that when you have a bunch of tasks that are not dependent on each
other, you can start them all at the same time rather than waiting for each one to finish before
starting the next one — that way, your entire bundle of tasks only takes as long to complete as the
longest task in the bundle.
Are not we all doing this already using multi threading programming? So how are two things different actually?
This is a broader topic about light-weight async tasks in general vs threads.
A big difference is in cost and speed. Threads are expensive, and OSs generally limit the number that you can create. Every thread has room for an entire full stack set aside in case it's needed (you get a StackOverflow if it wasn't enough). If you have more threads than processors, then switching tasks means saving off all the current thread info and loading the new thread into registers, etc.
ReactiveX libraries work with callbacks, so the only memory needed is the object with the callback data. Switching ReactiveX tasks is just a method call.
You can have many millions of ReactiveX tasks in progress at once, not so much with threads.
Most slow tasks (like file or network IO) actually do a lot of waiting. Why allocate an entire thread just to do nothing but wait?
With ReactiveX, the tasks are just simple objects that are waiting, just sitting in a queue.
Now, ReactiveX is built on top of threading. Those millions of tasks (just callback object in memory) when actually running are running on some thread. And ReactiveX, tasks aren't all really "running" at the same time (only a thread running on a core can actually do something). Most tasks do a lot of waiting so really those millions of ReactiveX tasks, are really just all "waiting" at the same time by hanging out in a queue.
Also, consider a scenario like Javascript, which is a single threaded environment. Multi-threading just isn't an option there. Even if you can create threads, avoiding concurrency or simplifying UI code that needs thread affinity can be nice when many tasks are all managed on a single thread.
Even in multiple thread scenarios, ReactiveX can be really helpful since it's API guarantees synchronous event callback for a particular stream, even if that steam is using many threads to generate the data.

Use of the terms "queues", "multicore", and "threads" in Grand Central Dispatch

I am trying to get my head around the concepts of Grand Central Dispatch. I want to understand these quotes from Vandad's book on Concurrent Programming.
The real use for GCD is to dispatch tasks to multiple cores, without making you the programmer, worry about which core is executing which task.
and
At the heart of GCD are dispatch queues. Dispatch queues are pools of threads.
and finally
You will not be working with these threads directly. You will just work with dispatch queues, dispatching tasks to these queues and asking queues to invoke your task.
I have bolded the key terms.
Are multiple cores the same as queues? Does a queue consist of many threads? Does each thread perform a task?
So multiple cores are the same as queues?
Not really. A queue is a programming abstraction, a core is a physical resource in your processor. There is no unique relationship between a queue and a core, although at any given point in time it can be said that a given queue is executing a given task on a given core.
A queue consists of many threads?
A queue consists of tasks. Tasks are assigned to threads by the queue managing system when it comes the time to execute that task. Threads are OS resources and are allocated to cores, which effectively run them and have no notion of what a task is (except for Hyper-Threading CPUs).
If you do not account for hardware-multithreading (e.g., Hyper-threading), at any given point in time a core is running a specific thread; when it comes the time to run a different thread, a context-switch occurs in that core. If you account for hardware-multithreading, you can have multiple threads running on virtual cores hosted in the same physical core.
The relationship between queues and threads is opaque. A queue could manage several threads at once, or several threads once at a time, or just one all the time; in the first case, you have a parallel queue, able to execute parallel tasks on simultaneous threads; in the second and third case, you have a serial queue.
Each thread performs a task?
At any given point in time, a thread is performing a task. You can have threads that are spawn, execute their task, and die; or you can have long running threads (i.e., the main thread) that execute several tasks.
Maybe it is pretty puzzling at start, you might need some reading about Operating Systems and maybe high-level Processor Architectures to fully understand this.
GCD aims at letting you reason exclusively in abstract terms: i.e., in terms of tasks and queues, and forget about threads and cores, that are seen as a sort of "implementation means", or low-level details that you can leave to the system to use efficiently.
Queues are just list of tasks to execute, cores depend on the processor, you can have 1 or many cores.
Queues are configurable and you can decide if tasks can be executed concurently or not, if you allow concurency in your queue, tasks in the queue can be executed at the same time in different cores.
I'm not sure those quotes really do GCD justice. For example, to take each quote in turn:
GCD is more than useable (and useful) even if you have only a single core available, since multi-threading certain tasks have their place in computer science regardless of the number of physical CPU cores available. Better to think of it as an alternative to managing threads explicitly - GCD will do the thread management so you don't have to, you (as the programmer) just have to think in terms of queues and whether certain related tasks must be done serially or can be done concurrently.
Dispatch queues are not "pools of threads". Dispatch queues are "units of work aggregation" and should be thought of that way. How that work is physically performed, by one thread or multiple threads, is not the programmer's concern and, in fact, the less assumptions the programmer makes about that the better since GCD tries very hard to be efficient and use as few threads as possible while still effectively utilizing hardware resources.
The third quote is good - that is the appropriate idiom to embrace. Just submit your work (be it blocks or function/context tuples) to the appropriate queue, creating queues as necessary to associate with resources that require synchronization, and you've got the gist of GCD.

What is the difference between threads and lightweight threads?

I don't quite understand the difference between threads and lightweight threads. From an API perspective both types of threads are identical so where exactly does the difference come in. Is it at the implementation level where a lightweight thread is managed by a higher level runtime than the OS thread scheduler or is it something else? Also, is there set of heuristics that people use to decide which type of thread to use in specific scenarios?
In what context, lightweight threads could represent threads which are implemented by a library, for example threads can be simulated in a library by switching between lightweight threads at an event handling layer, these lightweight threads are queued up and processed by a singe OS thread, the advantage of this is that since context switching is handled in the library switching can occur when the processing of data is complete and so the data does not need to be loaded back into the CPU's cache next time this lightweight thread becomes active.
Lightweight threads could also refer to co-operative threads (or fibers), these are threads where you have to explicitly yield to give other lightweight threads a chance, this has the same advantage in that the context switching can occur at a place you know you have finished processing some data and so you know it will not be need again.
Alternativly Lightweight threads could mean normal OS threads and the non-lightweight threads could mean processes, process have at least one thread within them and also have there own memory and other resources, they are more expensive than threads because you can not share data between thread easily and it can be a more expensive operation for the OS to create processes.

Can thread creation within OS internals run concurrently?

Suppose we have a dual-core machine with a mainstream, modern OS capable to utilize both the cores.
If I have two threads, P1 and Q1 within the same process, and they happen to commence creating child threads, say, P2 and Q2, at approximately the same machine cycle, will OS perform the thread creation concurrently?
I heard thread creation is expensive, so the question came forth...
Thanks in advance.
Any reasonably well designed OS can have multiple processors executing kernel code at the same time. Therefore some of the tasks involved in a thread creation can be happening concurrently. But there will be some necessary serialization to manipulate some shared data structures (e.g. allocating memory, inserting a newly created threat structure into a global list). The processors could contend for the same lock thereby reducing concurrency.
Systems/applications which make new threads so often that the overhead of thread creation actually matters are probably designed wrong (doing too little useful work in a thread relative to the startup time, and not taking advantage of the obvious optimization of reusing short-lived threads from a pool).
It will be sorta-concurrently. There are aspects of thread-creation that cannot proceed in parallel - it would be unfortunate if the kernel memory-manager allocated both threads the same stack!
Thread creation is sufficiently expensive that it's worth while avoiding doing it at all during an app. run, hence the popularity of thread pools. Long-running tasks that block can be threaded off and left for the life of the app - often this means that explicit thread termination, (awkward at best, almost impossible at worst, from user code), is not necessary.
I think developers continually start and stop threads because they like to think of them as 'functions', where you 'pass parameters' in at the start and 'return' results when the thread ends. Ths is not the best way of conceptualizing threads.

Thread Pool vs Thread Spawning

Can someone list some comparison points between Thread Spawning vs Thread Pooling, which one is better? Please consider the .NET framework as a reference implementation that supports both.
Thread pool threads are much cheaper than a regular Thread, they pool the system resources required for threads. But they have a number of limitations that may make them unfit:
You cannot abort a threadpool thread
There is no easy way to detect that a threadpool completed, no Thread.Join()
There is no easy way to marshal exceptions from a threadpool thread
You cannot display any kind of UI on a threadpool thread beyond a message box
A threadpool thread should not run longer than a few seconds
A threadpool thread should not block for a long time
The latter two constraints are a side-effect of the threadpool scheduler, it tries to limit the number of active threads to the number of cores your CPU has available. This can cause long delays if you schedule many long running threads that block often.
Many other threadpool implementations have similar constraints, give or take.
A "pool" contains a list of available "threads" ready to be used whereas "spawning" refers to actually creating a new thread.
The usefulness of "Thread Pooling" lies in "lower time-to-use": creation time overhead is avoided.
In terms of "which one is better": it depends. If the creation-time overhead is a problem use Thread-pooling. This is a common problem in environments where lots of "short-lived tasks" need to be performed.
As pointed out by other folks, there is a "management overhead" for Thread-Pooling: this is minimal if properly implemented. E.g. limiting the number of threads in the pool is trivial.
For some definition of "better", you generally want to go with a thread pool. Without knowing what your use case is, consider that with a thread pool, you have a fixed number of threads which can all be created at startup or can be created on demand (but the number of threads cannot exceed the size of the pool). If a task is submitted and no thread is available, it is put into a queue until there is a thread free to handle it.
If you are spawning threads in response to requests or some other kind of trigger, you run the risk of depleting all your resources as there is nothing to cap the amount of threads created.
Another benefit to thread pooling is reuse - the same threads are used over and over to handle different tasks, rather than having to create a new thread each time.
As pointed out by others, if you have a small number of tasks that will run for a long time, this would negate the benefits gained by avoiding frequent thread creation (since you would not need to create a ton of threads anyway).
My feeling is that you should start just by creating a thread as needed... If the performance of this is OK, then you're done. If at some point, you detect that you need lower latency around thread creation you can generally drop in a thread pool without breaking anything...
All depends on your scenario. Creating new threads is resource intensive and an expensive operation. Most very short asynchronous operations (less than a few seconds max) could make use of the thread pool.
For longer running operations that you want to run in the background, you'd typically create (spawn) your own thread. (Ab)using a platform/runtime built-in threadpool for long running operations could lead to nasty forms of deadlocks etc.
Thread pooling is usually considered better, because the threads are created up front, and used as required. Therefore, if you are using a lot of threads for relatively short tasks, it can be a lot faster. This is because they are saved for future use and are not destroyed and later re-created.
In contrast, if you only need 2-3 threads and they will only be created once, then this will be better. This is because you do not gain from caching existing threads for future use, and you are not creating extra threads which might not be used.
It depends on what you want to execute on the other thread.
For short task it is better to use a thread pool, for long task it may be better to spawn a new thread as it could starve the thread pool for other tasks.
The main difference is that a ThreadPool maintains a set of threads that are already spun-up and available for use, because starting a new thread can be expensive processor-wise.
Note however that even a ThreadPool needs to "spawn" threads... it usually depends on workload - if there is a lot of work to be done, a good threadpool will spin up new threads to handle the load based on configuration and system resources.
There is little extra time required for creating/spawning thread, where as thread poll already contains created threads which are ready to be used.
This answer is a good summary but just in case, here is the link to Wikipedia:
http://en.wikipedia.org/wiki/Thread_pool_pattern
For Multi threaded execution combined with getting return values from the execution, or an easy way to detect that a threadpool has completed, java Callables could be used.
See https://blogs.oracle.com/CoreJavaTechTips/entry/get_netbeans_6 for more info.
Assuming C# and Windows 7 and up...
When you create a thread using new Thread(), you create a managed thread that becomes backed by a native OS thread when you call Start – a one to one relationship. It is important to know only one thread runs on a CPU core at any given time.
An easier way is to call ThreadPool.QueueUserWorkItem (i.e. background thread), which in essence does the same thing, except those background threads aren’t forever tied to a single native thread. The .NET scheduler will simulate multitasking between managed threads on a single native thread. With say 4 cores, you’ll have 4 native threads each running multiple managed threads, determined by .NET. This offers lighter-weight multitasking since switching between managed threads happens within the .NET VM not in the kernel. There is some overhead associated with crossing from user mode to kernel mode, and the .NET scheduler minimizes such crossing.
It may be important to note that heavy multitasking might benefit from pure native OS threads in a well-designed multithreading framework. However, the performance benefits aren’t that much.
With using the ThreadPool, just make sure the minimum worker thread count is high enough or ThreadPool.QueueUserWorkItem will be slower than new Thread(). In a benchmark test looping 512 times calling new Thread() left ThreadPool.QueueUserWorkItem in the dust with default minimums. However, first setting the minimum worker thread count to 512, in this test, made new Thread() and ThreadPool.QueueUserWorkItem perform similarly.
A side effective of setting a high worker thread count is that new Task() (or Task.Factory.StartNew) also performed similarly as new Thread() and ThreadPool.QueueUserWorkItem.

Resources