Thread Pool vs Thread Spawning - multithreading

Can someone list some comparison points between Thread Spawning vs Thread Pooling, which one is better? Please consider the .NET framework as a reference implementation that supports both.

Thread pool threads are much cheaper than a regular Thread, they pool the system resources required for threads. But they have a number of limitations that may make them unfit:
You cannot abort a threadpool thread
There is no easy way to detect that a threadpool completed, no Thread.Join()
There is no easy way to marshal exceptions from a threadpool thread
You cannot display any kind of UI on a threadpool thread beyond a message box
A threadpool thread should not run longer than a few seconds
A threadpool thread should not block for a long time
The latter two constraints are a side-effect of the threadpool scheduler, it tries to limit the number of active threads to the number of cores your CPU has available. This can cause long delays if you schedule many long running threads that block often.
Many other threadpool implementations have similar constraints, give or take.

A "pool" contains a list of available "threads" ready to be used whereas "spawning" refers to actually creating a new thread.
The usefulness of "Thread Pooling" lies in "lower time-to-use": creation time overhead is avoided.
In terms of "which one is better": it depends. If the creation-time overhead is a problem use Thread-pooling. This is a common problem in environments where lots of "short-lived tasks" need to be performed.
As pointed out by other folks, there is a "management overhead" for Thread-Pooling: this is minimal if properly implemented. E.g. limiting the number of threads in the pool is trivial.

For some definition of "better", you generally want to go with a thread pool. Without knowing what your use case is, consider that with a thread pool, you have a fixed number of threads which can all be created at startup or can be created on demand (but the number of threads cannot exceed the size of the pool). If a task is submitted and no thread is available, it is put into a queue until there is a thread free to handle it.
If you are spawning threads in response to requests or some other kind of trigger, you run the risk of depleting all your resources as there is nothing to cap the amount of threads created.
Another benefit to thread pooling is reuse - the same threads are used over and over to handle different tasks, rather than having to create a new thread each time.
As pointed out by others, if you have a small number of tasks that will run for a long time, this would negate the benefits gained by avoiding frequent thread creation (since you would not need to create a ton of threads anyway).

My feeling is that you should start just by creating a thread as needed... If the performance of this is OK, then you're done. If at some point, you detect that you need lower latency around thread creation you can generally drop in a thread pool without breaking anything...

All depends on your scenario. Creating new threads is resource intensive and an expensive operation. Most very short asynchronous operations (less than a few seconds max) could make use of the thread pool.
For longer running operations that you want to run in the background, you'd typically create (spawn) your own thread. (Ab)using a platform/runtime built-in threadpool for long running operations could lead to nasty forms of deadlocks etc.

Thread pooling is usually considered better, because the threads are created up front, and used as required. Therefore, if you are using a lot of threads for relatively short tasks, it can be a lot faster. This is because they are saved for future use and are not destroyed and later re-created.
In contrast, if you only need 2-3 threads and they will only be created once, then this will be better. This is because you do not gain from caching existing threads for future use, and you are not creating extra threads which might not be used.

It depends on what you want to execute on the other thread.
For short task it is better to use a thread pool, for long task it may be better to spawn a new thread as it could starve the thread pool for other tasks.

The main difference is that a ThreadPool maintains a set of threads that are already spun-up and available for use, because starting a new thread can be expensive processor-wise.
Note however that even a ThreadPool needs to "spawn" threads... it usually depends on workload - if there is a lot of work to be done, a good threadpool will spin up new threads to handle the load based on configuration and system resources.

There is little extra time required for creating/spawning thread, where as thread poll already contains created threads which are ready to be used.

This answer is a good summary but just in case, here is the link to Wikipedia:
http://en.wikipedia.org/wiki/Thread_pool_pattern

For Multi threaded execution combined with getting return values from the execution, or an easy way to detect that a threadpool has completed, java Callables could be used.
See https://blogs.oracle.com/CoreJavaTechTips/entry/get_netbeans_6 for more info.

Assuming C# and Windows 7 and up...
When you create a thread using new Thread(), you create a managed thread that becomes backed by a native OS thread when you call Start – a one to one relationship. It is important to know only one thread runs on a CPU core at any given time.
An easier way is to call ThreadPool.QueueUserWorkItem (i.e. background thread), which in essence does the same thing, except those background threads aren’t forever tied to a single native thread. The .NET scheduler will simulate multitasking between managed threads on a single native thread. With say 4 cores, you’ll have 4 native threads each running multiple managed threads, determined by .NET. This offers lighter-weight multitasking since switching between managed threads happens within the .NET VM not in the kernel. There is some overhead associated with crossing from user mode to kernel mode, and the .NET scheduler minimizes such crossing.
It may be important to note that heavy multitasking might benefit from pure native OS threads in a well-designed multithreading framework. However, the performance benefits aren’t that much.
With using the ThreadPool, just make sure the minimum worker thread count is high enough or ThreadPool.QueueUserWorkItem will be slower than new Thread(). In a benchmark test looping 512 times calling new Thread() left ThreadPool.QueueUserWorkItem in the dust with default minimums. However, first setting the minimum worker thread count to 512, in this test, made new Thread() and ThreadPool.QueueUserWorkItem perform similarly.
A side effective of setting a high worker thread count is that new Task() (or Task.Factory.StartNew) also performed similarly as new Thread() and ThreadPool.QueueUserWorkItem.

Related

Nested threads in C++ 11 multithread programming

At first, I create four threads and each of them will call a GPU function. However, within each of the four, I also want to create two threads. One is to read data from the disk and the other is to do the computation. I am not sure if I can create a nested thread in C++. I think this is not a neat code. Can I have another way to solve the problem?
In general it should be no problem to create a new thread from a running thread.
Like you assume it's not the best solution, because creating/destroying threads often isn't cheap and the more threads you have the more context switches you have which is or can be also a performance penalty.
So you could create a thread pool which has given number of threads and let the thread pool threads work on reading data from disk and do computations. You would avoid massive creation and destroying of threads.
If you also create/destroy often the threads who are calling GPU functions you could create two threadpools one for the threads calling the GPU functions and one threadpool for reading from disk and computations.
You could use std::async and do away with thread management entirely. Or use a hybrid approach where you have the 4 core threads which I assume will never die and then in those functions where you wish to perform more asynchronous work you can use std::async.
https://solarianprogrammer.com/2012/10/17/cpp-11-async-tutorial/
It's not clear whether async tasks are using thread pools. If you want to ensure high performance, which you probably care about since you're using GPUs, you should use a thread pool.
http://roar11.com/2016/01/a-platform-independent-thread-pool-using-c14/

Is a Task lightweight compared to a Thread?

I overheard a coworker saying that a Task is basically a lightweight thread. Coming from a C++ background (where threads where the lightest weight processing unit), this seems counter-intuitive to me.
Aren't Tasks just as heavy as Threads?
You need to distinguish between a unit of work (Tasks) from the underlying process used to host/execute them. It isn't even necessary for Tasks to run on other threads. For example, Tasks can be executed in a single threaded application that periodically yields control to the task pool.
Even when Tasks are executed on separate threads, there is usually not a 1 to 1 relationship between Task and Thread. The threads are preallocated as part of a pool, and then tasks are scheduled to run on these threads as available. Creating a new task does not require the overhead of creating a thread, it only requires the cost of an enque in a task queue.
This makes tasks inherently more scalable. I can have millions of tasks throughout the lifetime of my application, but only ever actually use some constant number of threads.
Typically a "thread" implies mandatory concurrency. Starting up a thread requires allocating a stack and internal OS data structures for it. In contrast, a "task" often refers to a piece of work for which concurrency is optional, hence a parallel framework (such as OpenMP, Cilk Plus, TBB, PPL) can use the same thread to execute many tasks, by serializing the tasks, and converting optional parallelism to real parallelism only as necessary to keep the machine busy.
You are right - everything runs on a thread under the covers.
The reason people say that a Task is more lightweight than a Thread is that Microsoft put a lot of thought into having Tasks make efficient use of Threads, and the implementation is probably much lighter weight than what the average developer would come up with on their own using the Thread class.
EDIT
A more clear explanation is that a Task object is lighter weight than a Thread object, and while each Task is eventually run on a Thread, creating N Task objects concurrently leads to less than N concurrent Thread objects being used, for large N.

What is the difference between threads and lightweight threads?

I don't quite understand the difference between threads and lightweight threads. From an API perspective both types of threads are identical so where exactly does the difference come in. Is it at the implementation level where a lightweight thread is managed by a higher level runtime than the OS thread scheduler or is it something else? Also, is there set of heuristics that people use to decide which type of thread to use in specific scenarios?
In what context, lightweight threads could represent threads which are implemented by a library, for example threads can be simulated in a library by switching between lightweight threads at an event handling layer, these lightweight threads are queued up and processed by a singe OS thread, the advantage of this is that since context switching is handled in the library switching can occur when the processing of data is complete and so the data does not need to be loaded back into the CPU's cache next time this lightweight thread becomes active.
Lightweight threads could also refer to co-operative threads (or fibers), these are threads where you have to explicitly yield to give other lightweight threads a chance, this has the same advantage in that the context switching can occur at a place you know you have finished processing some data and so you know it will not be need again.
Alternativly Lightweight threads could mean normal OS threads and the non-lightweight threads could mean processes, process have at least one thread within them and also have there own memory and other resources, they are more expensive than threads because you can not share data between thread easily and it can be a more expensive operation for the OS to create processes.

Can thread creation within OS internals run concurrently?

Suppose we have a dual-core machine with a mainstream, modern OS capable to utilize both the cores.
If I have two threads, P1 and Q1 within the same process, and they happen to commence creating child threads, say, P2 and Q2, at approximately the same machine cycle, will OS perform the thread creation concurrently?
I heard thread creation is expensive, so the question came forth...
Thanks in advance.
Any reasonably well designed OS can have multiple processors executing kernel code at the same time. Therefore some of the tasks involved in a thread creation can be happening concurrently. But there will be some necessary serialization to manipulate some shared data structures (e.g. allocating memory, inserting a newly created threat structure into a global list). The processors could contend for the same lock thereby reducing concurrency.
Systems/applications which make new threads so often that the overhead of thread creation actually matters are probably designed wrong (doing too little useful work in a thread relative to the startup time, and not taking advantage of the obvious optimization of reusing short-lived threads from a pool).
It will be sorta-concurrently. There are aspects of thread-creation that cannot proceed in parallel - it would be unfortunate if the kernel memory-manager allocated both threads the same stack!
Thread creation is sufficiently expensive that it's worth while avoiding doing it at all during an app. run, hence the popularity of thread pools. Long-running tasks that block can be threaded off and left for the life of the app - often this means that explicit thread termination, (awkward at best, almost impossible at worst, from user code), is not necessary.
I think developers continually start and stop threads because they like to think of them as 'functions', where you 'pass parameters' in at the start and 'return' results when the thread ends. Ths is not the best way of conceptualizing threads.

What are multi-threading DOs and DONTs? [closed]

As it currently stands, this question is not a good fit for our Q&A format. We expect answers to be supported by facts, references, or expertise, but this question will likely solicit debate, arguments, polling, or extended discussion. If you feel that this question can be improved and possibly reopened, visit the help center for guidance.
Closed 10 years ago.
I am applying my new found knowledge of threading everywhere and getting lots of surprises
Example:
I used threads to add numbers in an
array. And outcome was different every
time. The problem was that all of my
threads were updating the same
variable and were not synchronized.
What are some known thread issues?
What care should be taken while using
threads?
What are good multithreading resources.
Please provide examples.
sidenote:(I renamed my program thread_add.java to thread_random_number_generator.java:-)
In a multithreading environment you have to take care of synchronization so two threads doesn't clobber the state by simultaneously performing modifications. Otherwise you can have race conditions in your code (for an example see the infamous Therac-25 accident.) You also have to schedule the threads to perform various tasks. You then have to make sure that your synchronization and scheduling doesn't cause a deadlock where multiple threads will wait for each other indefinitely.
Synchronization
Something as simple as increasing a counter requires synchronization:
counter += 1;
Assume this sequence of events:
counter is initialized to 0
thread A retrieves counter from memory to cpu (0)
context switch
thread B retrieves counter from memory to cpu (0)
thread B increases counter on cpu
thread B writes back counter from cpu to memory (1)
context switch
thread A increases counter on cpu
thread A writes back counter from cpu to memory (1)
At this point the counter is 1, but both threads did try to increase it. Access to the counter has to be synchronized by some kind of locking mechanism:
lock (myLock) {
counter += 1;
}
Only one thread is allowed to execute the code inside the locked block. Two threads executing this code might result in this sequence of events:
counter is initialized to 0
thread A acquires myLock
context switch
thread B tries to acquire myLock but has to wait
context switch
thread A retrieves counter from memory to cpu (0)
thread A increases counter on cpu
thread A writes back counter from cpu to memory (1)
thread A releases myLock
context switch
thread B acquires myLock
thread B retrieves counter from memory to cpu (1)
thread B increases counter on cpu
thread B writes back counter from cpu to memory (2)
thread B releases myLock
At this point counter is 2.
Scheduling
Scheduling is another form of synchronization and you have to you use thread synchronization mechanisms like events, semaphores, message passing etc. to start and stop threads. Here is a simplified example in C#:
AutoResetEvent taskEvent = new AutoResetEvent(false);
Task task;
// Called by the main thread.
public void StartTask(Task task) {
this.task = task;
// Signal the worker thread to perform the task.
this.taskEvent.Set();
// Return and let the task execute on another thread.
}
// Called by the worker thread.
void ThreadProc() {
while (true) {
// Wait for the event to become signaled.
this.taskEvent.WaitOne();
// Perform the task.
}
}
You will notice that access to this.task probably isn't synchronized correctly, that the worker thread isn't able to return results back to the main thread, and that there is no way to signal the worker thread to terminate. All this can be corrected in a more elaborate example.
Deadlock
A common example of deadlock is when you have two locks and you are not careful how you acquire them. At one point you acquire lock1 before lock2:
public void f() {
lock (lock1) {
lock (lock2) {
// Do something
}
}
}
At another point you acquire lock2 before lock1:
public void g() {
lock (lock2) {
lock (lock1) {
// Do something else
}
}
}
Let's see how this might deadlock:
thread A calls f
thread A acquires lock1
context switch
thread B calls g
thread B acquires lock2
thread B tries to acquire lock1 but has to wait
context switch
thread A tries to acquire lock2 but has to wait
context switch
At this point thread A and B are waiting for each other and are deadlocked.
There are two kinds of people that do not use multi threading.
1) Those that do not understand the concept and have no clue how to program it.
2) Those that completely understand the concept and know how difficult it is to get it right.
I'd make a very blatant statement:
DON'T use shared memory.
DO use message passing.
As a general advice, try to limit the amount of shared state and prefer more event-driven architectures.
I can't give you examples besides pointing you at Google. Search for threading basics, thread synchronisation and you'll get more hits than you know.
The basic problem with threading is that threads don't know about each other - so they will happily tread on each others toes, like 2 people trying to get through 1 door, sometimes they will pass though one after the other, but sometimes they will both try to get through at the same time and will get stuck. This is difficult to reproduce, difficult to debug, and sometimes causes problems. If you have threads and see "random" failures, this is probably the problem.
So care needs to be taken with shared resources. If you and your friend want a coffee, but there's only 1 spoon you cannot both use it at the same time, one of you will have to wait for the other. The technique used to 'synchronise' this access to the shared spoon is locking. You make sure you get a lock on the shared resource before you use it, and let go of it afterwards. If someone else has the lock, you wait until they release it.
Next problem comes with those locks, sometimes you can have a program that is complex, so much that you get a lock, do something else then access another resource and try to get a lock for that - but some other thread has that 2nd resource, so you sit and wait... but if that 2nd thread is waiting for the lock you hold for the 1st resource.. it's going to sit and wait. And your app just sits there. This is called deadlock, 2 threads both waiting for each other.
Those 2 are the vast majority of thread issues. The answer is generally to lock for as short a time as possible, and only hold 1 lock at a time.
I notice you are writing in java and that nobody else mentioned books so Java Concurrency In Practice should be your multi-threaded bible.
-- What are some known thread issues? --
Race conditions.
Deadlocks.
Livelocks.
Thread starvation.
-- What care should be taken while using threads? --
Using multi-threading on a single-processor machine to process multiple tasks where each task takes approximately the same time isn’t always very effective.For example, you might decide to spawn ten threads within your program in order to process ten separate tasks. If each task takes approximately 1 minute to process, and you use ten threads to do this processing, you won’t have access to any of the task results for the whole 10 minutes. If instead you processed the same tasks using just a single thread, you would see the first result in 1 minute, the next result 1 minute later, and so on. If you can make use of each result without having to rely on all of the results being ready simultaneously, the single
thread might be the better way of implementing the program.
If you launch a large number of threads within a process, the overhead of thread housekeeping and context switching can become significant. The processor will spend considerable time in switching between threads, and many of the threads won’t be able to make progress. In addition, a single process with a large number of threads means that threads in other processes will be scheduled less frequently and won’t receive a reasonable share of processor time.
If multiple threads have to share many of the same resources, you’re unlikely to see performance benefits from multi-threading your application. Many developers see multi-threading as some sort of magic wand that gives automatic performance benefits. Unfortunately multi-threading isn’t the magic wand that it’s sometimes perceived to be. If you’re using multi-threading for performance reasons, you should measure your application’s performance very closely in several different situations, rather than just relying on some non-existent magic.
Coordinating thread access to common data can be a big performance killer. Achieving good performance with multiple threads isn’t easy when using a coarse locking plan, because this leads to low concurrency and threads waiting for access. Alternatively, a fine-grained locking strategy increases the complexity and can also slow down performance unless you perform some sophisticated tuning.
Using multiple threads to exploit a machine with multiple processors sounds like a good idea in theory, but in practice you need to be careful. To gain any significant performance benefits, you might need to get to grips with thread balancing.
-- Please provide examples. --
For example, imagine an application that receives incoming price information from
the network, aggregates and sorts that information, and then displays the results
on the screen for the end user.
With a dual-core machine, it makes sense to split the task into, say, three threads. The first thread deals with storing the incoming price information, the second thread processes the prices, and the final thread handles the display of the results.
After implementing this solution, suppose you find that the price processing is by far the longest stage, so you decide to rewrite that thread’s code to improve its performance by a factor of three. Unfortunately, this performance benefit in a single thread may not be reflected across your whole application. This is because the other two threads may not be able to keep pace with the improved thread. If the user interface thread is unable to keep up with the faster flow of processed information, the other threads now have to wait around for the new bottleneck in the system.
And yes, this example comes directly from my own experience :-)
DONT use global variables
DONT use many locks (at best none at all - though practically impossible)
DONT try to be a hero, implementing sophisticated difficult MT protocols
DO use simple paradigms. I.e share the processing of an array to n slices of the same size - where n should be equal to the number of processors
DO test your code on different machines (using one, two, many processors)
DO use atomic operations (such as InterlockedIncrement() and the like)
YAGNI
The most important thing to remember is: do you really need multithreading?
I agree with pretty much all the answers so far.
A good coding strategy is to minimise or eliminate the amount of data that is shared between threads as much as humanly possible. You can do this by:
Using thread-static variables (although don't go overboard on this, it will eat more memory per thread, depending on your O/S).
Packaging up all state used by each thread into a class, then guaranteeing that each thread gets exactly one state class instance to itself. Think of this as "roll your own thread-static", but with more control over the process.
Marshalling data by value between threads instead of sharing the same data. Either make your data transfer classes immutable, or guarantee that all cross-thread calls are synchronous, or both.
Try not to have multiple threads competing for the exact same I/O "resource", whether it's a disk file, a database table, a web service call, or whatever. This will cause contention as multiple threads fight over the same resource.
Here's an extremely contrived OTT example. In a real app you would cap the number of threads to reduce scheduling overhead:
All UI - one thread.
Background calcs - one thread.
Logging errors to a disk file - one thread.
Calling a web service - one thread per unique physical host.
Querying the database - one thread per independent group of tables that need updating.
Rather than guessing how to do divvy up the tasks, profile your app and isolate those bits that are (a) very slow, and (b) could be done asynchronously. Those are good candidates for a separate thread.
And here's what you should avoid:
Calcs, database hits, service calls, etc - all in one thread, but spun up multiple times "to improve performance".
Don't start new threads unless you really need to. Starting threads is not cheap and for short running tasks starting the thread may actually take more time than executing the task itself. If you're on .NET take a look at the built in thread pool, which is useful in a lot of (but not all) cases. By reusing the threads the cost of starting threads is reduced.
EDIT: A few notes on creating threads vs. using thread pool (.NET specific)
Generally try to use the thread pool. Exceptions:
Long running CPU bound tasks and blocking tasks are not ideal run on the thread pool cause they will force the pool to create additional threads.
All thread pool threads are background threads, so if you need your thread to be foreground, you have to start it yourself.
If you need a thread with different priority.
If your thread needs more (or less) than the standard 1 MB stack space.
If you need to be able to control the life time of the thread.
If you need different behavior for creating threads than that offered by the thread pool (e.g. the pool will throttle creating of new threads, which may or may not be what you want).
There are probably more exceptions and I am not claiming that this is the definitive answer. It is just what I could think of atm.
I am applying my new found knowledge of threading everywhere
[Emphasis added]
DO remember that a little knowledge is dangerous. Knowing the threading API of your platform is the easy bit. Knowing why and when you need to use synchronisation is the hard part. Reading up on "deadlocks", "race-conditions", "priority inversion" will start you in understanding why.
The details of when to use synchronisation are both simple (shared data needs synchronisation) and complex (atomic data types used in the right way don't need synchronisation, which data is really shared): a lifetime of learning and very solution specific.
An important thing to take care of (with multiple cores and CPUs) is cache coherency.
I am surprised that no one has pointed out Herb Sutter's Effective Concurrency columns yet. In my opinion, this is a must read if you want to go anywhere near threads.
a) Always make only 1 thread responsible for a resource's lifetime. That way thread A won't delete a resource thread B needs - if B has ownership of the resource
b) Expect the unexpected
DO think about how you will test your code and set aside plenty of time for this. Unit tests become more complicated. You may not be able to manually test your code - at least not reliably.
DO think about thread lifetime and how threads will exit. Don't kill threads. Provide a mechanism so that they exit gracefully.
DO add some kind of debug logging to your code - so that you can see that your threads are behaving correctly both in development and in production when things break down.
DO use a good library for handling threading rather than rolling your own solution (if you can). E.g. java.util.concurrency
DON'T assume a shared resource is thread safe.
DON'T DO IT. E.g. use an application container that can take care of threading issues for you. Use messaging.
In .Net one thing that surprised me when I started trying to get into multi-threading is that you cannot straightforwardly update the UI controls from any thread other than the thread that the UI controls were created on.
There is a way around this, which is to use the Control.Invoke method to update the control on the other thread, but it is not 100% obvious the first time around!
Don't be fooled into thinking you understand the difficulties of concurrency until you've split your head into a real project.
All the examples of deadlocks, livelocks, synchronization, etc, seem simple, and they are. But they will mislead you, because the "difficulty" in implementing concurrency that everyone is talking about is when it is used in a real project, where you don't control everything.
While your initial differences in sums of numbers are, as several respondents have pointed out, likely to be the result of lack of synchronisation, if you get deeper into the topic, be aware that, in general, you will not be able to reproduce exactly the numeric results you get on a serial program with those from a parallel version of the same program. Floating-point arithmetic is not strictly commutative, associative, or distributive; heck, it's not even closed.
And I'd beg to differ with what, I think, is the majority opinion here. If you are writing multi-threaded programs for a desktop with one or more multi-core CPUs, then you are working on a shared-memory computer and should tackle shared-memory programming. Java has all the features to do this.
Without knowing a lot more about the type of problem you are tackling, I'd hesitate to write that 'you should do this' or 'you should not do that'.

Resources