Which boost multithreading design pattern should I use?

Which boost multithreading design pattern should I use? - multithreading

I'm getting started with boost for multi-threading to port my program to window ( from pthread of linux ) . Is there anyone familiar with it ? Any suggestion on which pattern should I use ?
Here are the requirements:
I have many threads most of the time running the same thing with different parameters,
All threads shared a memory location called "critical memory" (an array)
Synchronization has to be done with a "barrier" at certain iteration
requires highest parallelization if possible i.e good scheduling with same priority for all threads ( currently I let CPU does the job, but I find out that boost has threadpool with thread.schedule() not sure if i should use )
For pthread, every thing is function, so I'm not sure if I should convert it to object, what's the advantage then ?. A little bit of confusion after reading this tutorial http://antonym.org/2009/05/threading-with-boost---part-i-creating-threads.html so many options to use...
Thanks in advance

porting should be quite straightforward:
I have many threads most of the time
running the same thing with different
parameters,
create required number of threads with functor that binds your different parameters, like:
boost::thread thr1(boost::bind(your_thread_func, arg1, arg2));
All threads shared a memory location
called "critical memory" (an array)
nothing special here, just use boost::mutex to synchronize access (or another mutex type if you have special requirements)
Synchronization has to be done with a
"barrier" at certain iteration
use boost::barrier: http://www.boost.org/doc/libs/1_45_0/doc/html/thread/synchronization.html#thread.synchronization.barriers
requires highest parallelization if possible i.e good scheduling with same priority for all threads ( currently I let CPU does the job, but I find out that boost has threadpool with thread.schedule() not sure if i should use )
CPU? probably you meant OS scheduler. it's the simplest possible solution and in most cases satisfactory one. threadpool is not a part of boost and tbh I'm not familiar with it. boost thread doesn't have scheduler.
I don't know anything about your task and its parallelization potential, so I'll suppose it can be parallelized to more threads than you have cores. theoretically, to receive highest performance you need to smoothly distribute your work among thread number = number of your cores (including virtual ones). it's not the easiest task and you can use ready-for-use solutions. e.g. Intel Threading Building Blocks (GPL license) or even Boost Asio. Despite its main purpose is network communication, Asio has its dispatcher and you can use it as a thread pool. Just create optimal number of threads (number of cores?), boost::asio::io_service object and run it from all threads. post work to thread pool by io_service::post()

In my opinion, to use win32 port of pthread is more straightforward way to accomplish such tasks.
EDITED:
Last month I converted 1 year old project source code to pthread-w32 from Boost.Thread. Boost.Thread provides lots of good thing but if you are working on the old fashion thread routines Boost.Thread could be too mush hassle.

Related

User level threads vs Kernel level threads

I'm aware that User Level threads are created on the User Mode( no privileges) and Kernel threads are created in the Kernel Mode( privileged).
I am also aware that Processor threads are hardware threads that operate on Kernel Threads( I hope I am correct by putting it in this way)
Here is my confusion:-
User Level threads are not recognized by the OS as they are created, maintained and destroyed on the User Level. The OS doesn't see a multithreaded process from the User Mode as being multithreaded. It treats it as a single threaded process. Therefore, this program cannot take advantage of Multiprocessing, I guess it cannot take advantage of hyperthreading as well since it appears as single threaded in the OS.
So what's the use of Multithreading in this case? I mean the computation time will still be the same🤷‍♂️.
The last question is, do POSIX thread API and OPenMP create user level threads or Kernel threads?
I know what both libraries are, please don't explain about that.
If none creates Kernel threads then how do we create a multithreaded program that takes advantage of multiprocessing?

...what's the use of Multithreading in this case?
Multithreading is older than multiprocessing. Multithreading is one model of concurrent computing. That is to say, it's a way to write a computer program in which different activities are allowed to happen independently from each other. A classic example is a multi-user network server that creates a new thread for each connected client. Each thread can talk to its own client in a simple, synchronous way even though there may be no synchrony between what the different clients want to do. You don't need to have multiple CPUs for that.
When multi-CPU computers were invented, using multiple threads to exploit them for parallel processing was a natural and obvious choice.
I mean the computation time [for a green-threaded program that cannot exploit multiple CPUs] will still be the same.
That is true, but depending on what the different activities are that the program performs concurrently, the multi-threaded version of it may be easier to read and understand* than a program that's built around a different model of concurrency.
The reason is, we all were taught to write single-threaded, synchronous code when we were beginners. We understood that we were writing instructions that "the computer" would follow. We now say "a thread" instead of saying "the computer," but otherwise, the code executed by each thread can be mostly similar to the style of code that we wrote as beginners.
Part of what makes it so simple is, that the state of each of the concurrent activities can be mostly implicit in the contexts and the local variables (i.e., the stacks) of the different threads. If you choose a different model of concurrency (e.g., an event driven model) then you may have to explicitly represent more of that state with (maybe complex) data structures.
* Easier to read but not necessarily easier to write without making subtle mistakes. But, when I started working with large teams of software developers, they taught me that I'd be reading about ten lines of code for every one line that I wrote, so "easier to read but harder to write" turns out to be a win in the long run.

Pure user level threads are (as you pointed out) not a lot of use because they don't allow you to exploit the processing capability of multiple cores within a process.
The flip-side is that pure kernel level threads will typically incur substantial overheads when switch between threads. (There are things that you can do to deal with that, but ... that's another topic.) But the upshot is that the overheads make it inefficient to preform small tasks (units of work) using kernel level threads.
Another alternative to both is a hybrid of user level and kernel level threads. For example, suppose:
each process has one kernel level thread for each physical core,
each kernel level thread can switch between a bunch of user level threads and,
switching between a user level threads is handled by a scheduler in user space.
The Java Loom project is developing a new threading model (roughly) along those lines. Classic Java threads are still kernel level threads. New virtual threads are user level threads. A Java program gets to choose whether it uses classic or virtual threads ... or both.
There is a lot of material on Loom on the web; e.g.
https://blogs.oracle.com/javamagazine/post/java-loom-virtual-threads-platform-threads
https://www.infoq.com/news/2022/05/virtual-threads-for-jdk19/
https://wiki.openjdk.org/display/loom/Main
Loom is likely to be part of the next Java release: Java 19.
I'm pretty sure that (C / C++) POSIX threads are kernel level. I don't know about OpenMPI threads, but I'd expect they are kernel level too. (They wouldn't be fit for purpose as pure user level threads.)
I have heard of hybrid threading models for C / C++, though I don't know about actual implementations. Look for articles, etcetera that talk about Threads vs Fibres.

Can two threads within the same process run simultaneously on two processors? Why / Why not?

I don't understand. Isn't this the whole idea of multi-threading?
Edit: Question modified from "Why two threads within the same process cannot run simultaneously on two processors?".

In the article you link to, it lists this as a limitation of user-level threads (that are implemented by an application itself, without being backed by OS-level threads).
That's correct, but it does not apply to "real" threads. The OS is free to schedule them across multiple processors.
Now that most operating systems have robust support for multithreading, I believe that those user-level threads are a thing of the past.
So, yes, the whole point of multi-threading is to be able to run code in parallel on as many CPU as you want to assign to it. And "user-level threads" were a workaround for platforms without proper native thread support, and it was limited in the way you describe (no multiple CPU for a single application process).

Openmp thread divergence?

The term thread divergence is used in CUDA; from my understanding it's a situation where different threads are assigned to do different tasks and this results in a big performance hit.
I was wondering, is there a similar penalty for doing this in openmp? For example, say I have a 6 core processor and a program with 6 threads. If I have a conditional that makes 3 threads perform a certain task, and then have the other three threads perform a completely different task, will there be a big performance hit? I guess in essence it's sort of using openmp to do MIMD.
Basically, I'm writing a program with openmp and CUDA. I want two threads to run a CUDA kernel while the other left over threads run C code. Thanks.

No, there is no performance hit for diverging threads using OpenMP. It is a problem in CUDA because of the way instructions are broadcast simultaneously to a set of cores. When an OpenMP thread targets a CPU core, each CPU core has its own independent set of instructions to follow, and it runs just like any other single-threaded program would.
You may see some of your cores being underutilized if you have synchronization barriers following thread divergence, because that would force faster threads to wait for the slower threads to catch up.

When speaking about CPU parallelism, there's no intrinsic performance hit from using a certain threading design pattern. Not at the theoretical level at least.
The only problem I see is that since the threads are doing different things which may have varying completion times, some of the threads may sit idle after finishing their work, waiting for the others to finish a longer task.

The term thread divergence in CUDA refers to the situation when not all threads of a bock evaluate a conditional with the same outcome. Such threads are said to diverge. If diverging threads are in the same warp then such threads may perform work serially which leads to performance loss.
I am not sure that OpenMP has the same issue, though. When different threads perform different work then load balancing may be used by the runtime perhaps, but it doesn't lead to work serialization necessarily.

there is no this kind of problem in openmp because every openmp thread has its own PC.

TPL Tasks, Threads, etc

Could someone clear up to me how these things correlate:
Task
Thread
ThreadPool's thread
Paraller.For/ForEach/Invoke
I.e. when I create a Task and run it, where does it get a thread to execute on? And when I call Parallel.* what is really going on under the covers?
Any links to articles, blogposts, etc are also very welcomed!

The ideal state of a system is to have 1 actively running thread per CPU core. By defining work in more general terms of "tasks", the TPL can dynamically decide how many threads to use and which tasks to do on each one in order to come closest to achieving that ideal state. These are decisions that are almost always best made dynamically at runtime because when writing the code you can't know for sure how many CPU cores will be available to your application, how busy they are with other work, etc.

Thread: is a real OS thread, has handle and ID.
ThreadPool: is a collection of already-created OS Threads. These threads are owned/maintained by the runtime, and your code is only allowed to "borrow" them for a while, you can only do work short-termed work in these threads, and you can't modify any thread state, nor delete these threads.
Best guesses on these two:
Task: might get run on a pre-created thread in the thread pool, or might get run as part of user-mode scheduling, this is all depending on what the runtime thinks is best. Another guess: with TPL, the user-mode scheduling is NOT based on OS Fibers, but is its own complete (and working) implementation).
Parallel.For: actually, no clue how this is implemented. The runtime might create new threads to do the parallel bits, or much more likely use the thread pool's threads for the parallelism.

Why might threads be considered "evil"?

I was reading the SQLite FAQ, and came upon this passage:
Threads are evil. Avoid them.
I don't quite understand the statement "Thread are evil". If that is true, then what is the alternative?
My superficial understanding of threads is:
Threads make concurrence happen. Otherwise, the CPU horsepower will be wasted, waiting for (e.g.) slow I/O.
But the bad thing is that you must synchronize your logic to avoid contention and you have to protect shared resources.
Note: As I am not familiar with threads on Windows, I hope the discussion will be limited to Linux/Unix threads.

When people say that "threads are evil", the usually do so in the context of saying "processes are good". Threads implicitly share all application state and handles (and thread locals are opt-in). This means that there are plenty of opportunities to forget to synchronize (or not even understand that you need to synchronize!) while accessing that shared data.
Processes have separate memory space, and any communication between them is explicit. Furthermore, primitives used for interprocess communication are often such that you don't need to synchronize at all (e.g. pipes). And you can still share state directly if you need to, using shared memory, but that is also explicit in every given instance. So there are fewer opportunities to make mistakes, and the intent of the code is more explicit.

Simple answer the way I understand it...
Most threading models use "shared state concurrency," which means that two execution processes can share the same memory at the same time. If one thread doesn't know what the other is doing, it can modify the data in a way that the other thread doesn't expect. This causes bugs.
Threads are "evil" because you need to wrap your mind around n threads all working on the same memory at the same time, and all of the fun things that go with it (deadlocks, racing conditions, etc).
You might read up about the Clojure (immutable data structures) and Erlang (message passsing) concurrency models for alternative ideas on how to achieve similar ends.

What makes threads "evil" is that once you introduce more than one stream of execution into your program, you can no longer count on your program to behave in a deterministic manner.
That is to say: Given the same set of inputs, a single-threaded program will (in most cases) always do the same thing.
A multi-threaded program, given the same set of inputs, may well do something different every time it is run, unless it is very carefully controlled. That is because the order in which the different threads run different bits of code is determined by the OS's thread scheduler combined with a system timer, and this introduces a good deal of "randomness" into what the program does when it runs.
The upshot is: debugging a multi-threaded program can be much harder than debugging a single-threaded program, because if you don't know what you are doing it can be very easy to end up with a race condition or deadlock bug that only appears (seemingly) at random once or twice a month. The program will look fine to your QA department (since they don't have a month to run it) but once it's out in the field, you'll be hearing from customers that the program crashed, and nobody can reproduce the crash.... bleah.
To sum up, threads aren't really "evil", but they are strong juju and should not be used unless (a) you really need them and (b) you know what you are getting yourself into. If you do use them, use them as sparingly as possible, and try to make their behavior as stupid-simple as you possibly can. Especially with multithreading, if anything can go wrong, it (sooner or later) will.

I would interpret it another way. It's not that threads are evil, it's that side-effects are evil in a multithreaded context (which is a lot less catchy to say).
A side effect in this context is something that affects state shared by more than one thread, be it global or just shared. I recently wrote a review of Spring Batch and one of the code snippets used is:
private static Map<Long, JobExecution> executionsById = TransactionAwareProxyFactory.createTransactionalMap();
private static long currentId = 0;
public void saveJobExecution(JobExecution jobExecution) {
Assert.isTrue(jobExecution.getId() == null);
Long newId = currentId++;
jobExecution.setId(newId);
jobExecution.incrementVersion();
executionsById.put(newId, copy(jobExecution));
}
Now there are at least three serious threading issues in less than 10 lines of code here. An example of a side effect in this context would be updating the currentId static variable.
Functional programming (Haskell, Scheme, Ocaml, Lisp, others) tend to espouse "pure" functions. A pure function is one with no side effects. Many imperative languages (eg Java, C#) also encourage the use of immutable objects (an immutable object is one whose state cannot change once created).
The reason for (or at least the effect of) both of these things is largely the same: they make multithreaded code much easier. A pure function by definition is threadsafe. An immutable object by definition is threadsafe.
The advantage processes have is that there is less shared state (generally). In traditional UNIX C programming, doing a fork() to create a new process would result in shared process state and this was used as a means of IPC (inter-process communication) but generally that state is replaced (with exec()) with something else.
But threads are much cheaper to create and destroy and they take less system resources (in fact, the operating itself may have no concept of threads yet you can still create multithreaded programs). These are called green threads.

The paper you linked to seems to explain itself very well. Did you read it?
Keep in mind that a thread can refer to the programming-language construct (as in most procedural or OOP languages, you create a thread manually, and tell it to executed a function), or they can refer to the hardware construct (Each CPU core executes one thread at a time).
The hardware-level thread is obviously unavoidable, it's just how the CPU works. But the CPU doesn't care how the concurrency is expressed in your source code. It doesn't have to be by a "beginthread" function call, for example. The OS and the CPU just have to be told which instruction threads should be executed.
His point is that if we used better languages than C or Java with a programming model designed for concurrency, we could get concurrency basically for free. If we'd used a message-passing language, or a functional one with no side-effects, the compiler would be able to parallelize our code for us. And it would work.

Threads aren't any more "evil" than hammers or screwdrivers or any other tools; they just require skill to utilize. The solution isn't to avoid them; it's to educate yourself and up your skill set.

Creating a lot of threads without constraint is indeed evil.. using a pooling mechanisme (threadpool) will mitigate this problem.
Another way threads are 'evil' is that most framework code is not designed to deal with multiple threads, so you have to manage your own locking mechanisme for those datastructures.
Threads are good, but you have to think about how and when you use them and remember to measure if there really is a performance benefit.

A thread is a bit like a light weight process. Think of it as an independent path of execution within an application. The thread runs in the same memory space as the application and therefore has access to all the same resources, global objects and global variables.
The good thing about them: you can parallelise a program to improve performance. Some examples, 1) In an image editing program a thread may run the filter processing independently of the GUI. 2) Some algorithms lend themselves to multiple threads.
Whats bad about them? if a program is poorly designed they can lead to deadlock issues where both threads are waiting on each other to access the same resource. And secondly, program design can me more complex because of this. Also, some class libraries don't support threading. e.g. the c library function "strtok" is not "thread safe". In other words, if two threads were to use it at the same time they would clobber each others results. Fortunately, there are often thread safe alternatives... e.g. boost library.
Threads are not evil, they can be very useful indeed.
Under Linux/Unix, threading hasn't been well supported in the past although I believe Linux now has Posix thread support and other unices support threading now via libraries or natively. i.e. pthreads.
The most common alternative to threading under Linux/Unix platforms is fork. Fork is simply a copy of a program including it's open file handles and global variables. fork() returns 0 to the child process and the process id to the parent. It's an older way of doing things under Linux/Unix but still well used. Threads use less memory than fork and are quicker to start up. Also, inter process communications is more work than simple threads.

In a simple sense you can think of a thread as another instruction pointer in the current process. In other words it points the IP of another processor to some code in the same executable. So instead of having one instruction pointer moving through the code there are two or more IP's executing instructions from the same executable and address space simultaneously.
Remember the executable has it's own address space with data / stack etc... So now that two or more instructions are being executed simultaneously you can imagine what happens when more than one of the instructions wants to read/write to the same memory address at the same time.
The catch is that threads are operating within the process address space and are not afforded protection mechanisms from the processor that full blown processes are. (Forking a process on UNIX is standard practice and simply creates another process.)
Out of control threads can consume CPU cycles, chew up RAM, cause execeptions etc.. etc.. and the only way to stop them is to tell the OS process scheduler to forcibly terminate the thread by nullifying it's instruction pointer (i.e. stop executing). If you forcibly tell a CPU to stop executing a sequence of instructions what happens to the resources that have been allocated or are being operated on by those instructions? Are they left in a stable state? Are they properly freed? etc...
So, yes, threads require more thought and responsibility than executing a process because of the shared resources.

For any application that requires stable and secure execution for long periods of time without failure or maintenance, threads are always a tempting mistake. They invariably turn out to be more trouble than they are worth. They produce rapid results and prototypes that seem to be performing correctly but after a couple weeks or months running you discover that they have critical flaws.
As mentioned by another poster, once you use even a single thread in your program you have now opened a non-deterministic path of code execution that can produce an almost infinite number of conflicts in timing, memory sharing and race conditions. Most expressions of confidence in solving these problems are expressed by people who have learned the principles of multithreaded programming but have yet to experience the difficulties in solving them.
Threads are evil. Good programmers avoid them wherever humanly possible. The alternative of forking was offered here and it is often a good strategy for many applications. The notion of breaking your code down into separate execution processes which run with some form of loose coupling often turns out to be an excellent strategy on platforms that support it. Threads running together in a single program is not a solution. It is usually the creation of a fatal architectural flaw in your design that can only be truly remedied by rewriting the entire program.
The recent drift towards event oriented concurrency is an excellent development innovation. These kinds of programs usually prove to have great endurance after they are deployed.
I've never met a young engineer who didn't think threads were great. I've never met an older engineer who didn't shun them like the plague.

Being an older engineer, I heartily agree with the answer by Texas Arcane.
Threads are very evil because they cause bugs that are extremely difficult to solve. I have literally spent months solving sporadic race-conditions. One example caused trams to suddenly stop about once a month in the middle of the road and block traffic until towed away. Luckily I didn't create the bug, but I did get to spend 4 months full-time to solve it...
It's a tad late to add to this thread, but I would like to mention a very interesting alternative to threads: asynchronous programming with co-routines and event loops. This is being supported by more and more languages, and does not have the problem of race conditions like multi-threading has.
It can replace multi-threading in cases where it is used to wait on events from multiple sources, but not where calculations need to be performed in parallel on multiple CPU cores.

Develop Reference

node.js excel linux python-3.x azure haskell apache-spark rust .htaccess string