Nested threads in C++ 11 multithread programming - multithreading

At first, I create four threads and each of them will call a GPU function. However, within each of the four, I also want to create two threads. One is to read data from the disk and the other is to do the computation. I am not sure if I can create a nested thread in C++. I think this is not a neat code. Can I have another way to solve the problem?

In general it should be no problem to create a new thread from a running thread.
Like you assume it's not the best solution, because creating/destroying threads often isn't cheap and the more threads you have the more context switches you have which is or can be also a performance penalty.
So you could create a thread pool which has given number of threads and let the thread pool threads work on reading data from disk and do computations. You would avoid massive creation and destroying of threads.
If you also create/destroy often the threads who are calling GPU functions you could create two threadpools one for the threads calling the GPU functions and one threadpool for reading from disk and computations.

You could use std::async and do away with thread management entirely. Or use a hybrid approach where you have the 4 core threads which I assume will never die and then in those functions where you wish to perform more asynchronous work you can use std::async.
https://solarianprogrammer.com/2012/10/17/cpp-11-async-tutorial/
It's not clear whether async tasks are using thread pools. If you want to ensure high performance, which you probably care about since you're using GPUs, you should use a thread pool.
http://roar11.com/2016/01/a-platform-independent-thread-pool-using-c14/

Related

Bind multiple threads to multiple CPU?

I have multiple threads that are accessing the same data and it is too painful to make them thread safe. Therefore, they are now forced to run only on one CPU core using CPU affinity and only one thread can be running at the same time.
I was wondering if it is possible to group these threads and let them float to other CPU cores all together ? In this way, I don't have to spare one CPU core for these threads.
This is based on Unix/BSD platform
There is no way to do this on Windows. I don't know about Unix/Linux but I doubt it's possible.
Note, that this does not make your system thread-safe. Even on uni-processor machines thread-safety is a concern.
i++
is not atomic. Two thread can both read i, then compute i+1, then write i. That results in a lost update.
You need to throw this approach away. Probably, you should be using a global lock that you hold around accesses to shared mutable state. That makes all these concerns go away and is reasonably simple to implement.
Either make the code thread safe or use just one thread.
The simplest solution is probably the "one big lock" model. With this model, one lock protects all the data that's shared among the threads and not handled in a thread-safe way. The threads start out always holding the lock. Then you identify all the points where the threads can block and release the lock during that block.

What is the difference between threads and lightweight threads?

I don't quite understand the difference between threads and lightweight threads. From an API perspective both types of threads are identical so where exactly does the difference come in. Is it at the implementation level where a lightweight thread is managed by a higher level runtime than the OS thread scheduler or is it something else? Also, is there set of heuristics that people use to decide which type of thread to use in specific scenarios?
In what context, lightweight threads could represent threads which are implemented by a library, for example threads can be simulated in a library by switching between lightweight threads at an event handling layer, these lightweight threads are queued up and processed by a singe OS thread, the advantage of this is that since context switching is handled in the library switching can occur when the processing of data is complete and so the data does not need to be loaded back into the CPU's cache next time this lightweight thread becomes active.
Lightweight threads could also refer to co-operative threads (or fibers), these are threads where you have to explicitly yield to give other lightweight threads a chance, this has the same advantage in that the context switching can occur at a place you know you have finished processing some data and so you know it will not be need again.
Alternativly Lightweight threads could mean normal OS threads and the non-lightweight threads could mean processes, process have at least one thread within them and also have there own memory and other resources, they are more expensive than threads because you can not share data between thread easily and it can be a more expensive operation for the OS to create processes.

Can thread creation within OS internals run concurrently?

Suppose we have a dual-core machine with a mainstream, modern OS capable to utilize both the cores.
If I have two threads, P1 and Q1 within the same process, and they happen to commence creating child threads, say, P2 and Q2, at approximately the same machine cycle, will OS perform the thread creation concurrently?
I heard thread creation is expensive, so the question came forth...
Thanks in advance.
Any reasonably well designed OS can have multiple processors executing kernel code at the same time. Therefore some of the tasks involved in a thread creation can be happening concurrently. But there will be some necessary serialization to manipulate some shared data structures (e.g. allocating memory, inserting a newly created threat structure into a global list). The processors could contend for the same lock thereby reducing concurrency.
Systems/applications which make new threads so often that the overhead of thread creation actually matters are probably designed wrong (doing too little useful work in a thread relative to the startup time, and not taking advantage of the obvious optimization of reusing short-lived threads from a pool).
It will be sorta-concurrently. There are aspects of thread-creation that cannot proceed in parallel - it would be unfortunate if the kernel memory-manager allocated both threads the same stack!
Thread creation is sufficiently expensive that it's worth while avoiding doing it at all during an app. run, hence the popularity of thread pools. Long-running tasks that block can be threaded off and left for the life of the app - often this means that explicit thread termination, (awkward at best, almost impossible at worst, from user code), is not necessary.
I think developers continually start and stop threads because they like to think of them as 'functions', where you 'pass parameters' in at the start and 'return' results when the thread ends. Ths is not the best way of conceptualizing threads.

How to deal with OpenMP thread pool contention

I'm working on an application that uses both coarse and fine grained multi-threading. That is, we manage scheduling of large work units on a pool of threads manually, and then within those work units certain functions utilize OpenMP for finer grain multithreading.
We have realized gains by selectively using OpenMP in our costliest loops, but are concerned about creating contention for the OpenMP worker pool as we add OpenMP blocks to cheaper loops. Is there a way to signal to OpenMP that a block of code should use the pool if it is available, and if not it should process the loop serially?
you can use omp_set_num_threads(int) to set no. of threads inside a pool. Then the compiler will try to create a pool of threads if possible and schedule them. if it is not possible to create pool, then it will create as many threads as possible and run others in serial fashion.
for more info try this link
You may be able to do what you want by clever use of omp_get_num_threads, omp_set_num_threads, and if and num_threads clauses on parallel directives. OpenMP 3.0 also provides tasks which might be useful.

Thread Pool vs Thread Spawning

Can someone list some comparison points between Thread Spawning vs Thread Pooling, which one is better? Please consider the .NET framework as a reference implementation that supports both.
Thread pool threads are much cheaper than a regular Thread, they pool the system resources required for threads. But they have a number of limitations that may make them unfit:
You cannot abort a threadpool thread
There is no easy way to detect that a threadpool completed, no Thread.Join()
There is no easy way to marshal exceptions from a threadpool thread
You cannot display any kind of UI on a threadpool thread beyond a message box
A threadpool thread should not run longer than a few seconds
A threadpool thread should not block for a long time
The latter two constraints are a side-effect of the threadpool scheduler, it tries to limit the number of active threads to the number of cores your CPU has available. This can cause long delays if you schedule many long running threads that block often.
Many other threadpool implementations have similar constraints, give or take.
A "pool" contains a list of available "threads" ready to be used whereas "spawning" refers to actually creating a new thread.
The usefulness of "Thread Pooling" lies in "lower time-to-use": creation time overhead is avoided.
In terms of "which one is better": it depends. If the creation-time overhead is a problem use Thread-pooling. This is a common problem in environments where lots of "short-lived tasks" need to be performed.
As pointed out by other folks, there is a "management overhead" for Thread-Pooling: this is minimal if properly implemented. E.g. limiting the number of threads in the pool is trivial.
For some definition of "better", you generally want to go with a thread pool. Without knowing what your use case is, consider that with a thread pool, you have a fixed number of threads which can all be created at startup or can be created on demand (but the number of threads cannot exceed the size of the pool). If a task is submitted and no thread is available, it is put into a queue until there is a thread free to handle it.
If you are spawning threads in response to requests or some other kind of trigger, you run the risk of depleting all your resources as there is nothing to cap the amount of threads created.
Another benefit to thread pooling is reuse - the same threads are used over and over to handle different tasks, rather than having to create a new thread each time.
As pointed out by others, if you have a small number of tasks that will run for a long time, this would negate the benefits gained by avoiding frequent thread creation (since you would not need to create a ton of threads anyway).
My feeling is that you should start just by creating a thread as needed... If the performance of this is OK, then you're done. If at some point, you detect that you need lower latency around thread creation you can generally drop in a thread pool without breaking anything...
All depends on your scenario. Creating new threads is resource intensive and an expensive operation. Most very short asynchronous operations (less than a few seconds max) could make use of the thread pool.
For longer running operations that you want to run in the background, you'd typically create (spawn) your own thread. (Ab)using a platform/runtime built-in threadpool for long running operations could lead to nasty forms of deadlocks etc.
Thread pooling is usually considered better, because the threads are created up front, and used as required. Therefore, if you are using a lot of threads for relatively short tasks, it can be a lot faster. This is because they are saved for future use and are not destroyed and later re-created.
In contrast, if you only need 2-3 threads and they will only be created once, then this will be better. This is because you do not gain from caching existing threads for future use, and you are not creating extra threads which might not be used.
It depends on what you want to execute on the other thread.
For short task it is better to use a thread pool, for long task it may be better to spawn a new thread as it could starve the thread pool for other tasks.
The main difference is that a ThreadPool maintains a set of threads that are already spun-up and available for use, because starting a new thread can be expensive processor-wise.
Note however that even a ThreadPool needs to "spawn" threads... it usually depends on workload - if there is a lot of work to be done, a good threadpool will spin up new threads to handle the load based on configuration and system resources.
There is little extra time required for creating/spawning thread, where as thread poll already contains created threads which are ready to be used.
This answer is a good summary but just in case, here is the link to Wikipedia:
http://en.wikipedia.org/wiki/Thread_pool_pattern
For Multi threaded execution combined with getting return values from the execution, or an easy way to detect that a threadpool has completed, java Callables could be used.
See https://blogs.oracle.com/CoreJavaTechTips/entry/get_netbeans_6 for more info.
Assuming C# and Windows 7 and up...
When you create a thread using new Thread(), you create a managed thread that becomes backed by a native OS thread when you call Start – a one to one relationship. It is important to know only one thread runs on a CPU core at any given time.
An easier way is to call ThreadPool.QueueUserWorkItem (i.e. background thread), which in essence does the same thing, except those background threads aren’t forever tied to a single native thread. The .NET scheduler will simulate multitasking between managed threads on a single native thread. With say 4 cores, you’ll have 4 native threads each running multiple managed threads, determined by .NET. This offers lighter-weight multitasking since switching between managed threads happens within the .NET VM not in the kernel. There is some overhead associated with crossing from user mode to kernel mode, and the .NET scheduler minimizes such crossing.
It may be important to note that heavy multitasking might benefit from pure native OS threads in a well-designed multithreading framework. However, the performance benefits aren’t that much.
With using the ThreadPool, just make sure the minimum worker thread count is high enough or ThreadPool.QueueUserWorkItem will be slower than new Thread(). In a benchmark test looping 512 times calling new Thread() left ThreadPool.QueueUserWorkItem in the dust with default minimums. However, first setting the minimum worker thread count to 512, in this test, made new Thread() and ThreadPool.QueueUserWorkItem perform similarly.
A side effective of setting a high worker thread count is that new Task() (or Task.Factory.StartNew) also performed similarly as new Thread() and ThreadPool.QueueUserWorkItem.

Resources