How do I use tbb::parallel_invoke to run a large number of tasks in parallel?

My problem is similar to the producer-consumer problem. For example, I need to run 999 producers and 1 consumer in parallel. All 999 producers do basically the same task.

Parallel frameworks such as TBB (and Cilk Plus and PPL) focus on optional concurrency, which lets them harvest just enough parallelism to keep the machine busy without overloading it.
If concurrency among producers is required then most TBB constructs are inappropriate. For example, tbb::parallel_for makes no promise that it will run anything in parallel. It just uses parallelism if it is available at the moment. For mandatory concurrency, you will need a separate std::thread for each producer. With 999 threads running, do not expect much speedup unless you have a machine with 999 hardware threads, or are keeping most of the threads sleeping most of the time (e.g. by using condition variables).
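A minimal sketch of the one-std::thread-per-producer approach, assuming plain C++11 and no TBB. The function name run_producers_and_consumer and its parameters are made up for illustration; the consumer sleeps on a condition variable whenever the queue is empty, which is the "keep threads sleeping" idea mentioned above.

```cpp
#include <condition_variable>
#include <cstddef>
#include <mutex>
#include <queue>
#include <thread>
#include <vector>

// Hypothetical helper: starts `producer_count` producer threads and one
// consumer thread, and returns how many items the consumer received.
std::size_t run_producers_and_consumer(std::size_t producer_count,
                                       std::size_t items_per_producer) {
    std::mutex m;
    std::condition_variable cv;
    std::queue<int> items;                        // shared buffer, guarded by m
    std::size_t producers_left = producer_count;  // guarded by m
    std::size_t consumed = 0;

    // The consumer sleeps on the condition variable while there is nothing to do.
    std::thread consumer([&] {
        std::unique_lock<std::mutex> lock(m);
        for (;;) {
            cv.wait(lock, [&] { return !items.empty() || producers_left == 0; });
            while (!items.empty()) {
                items.pop();
                ++consumed;
            }
            if (producers_left == 0) return;  // queue drained, no producers left
        }
    });

    // One std::thread per producer, so every producer really exists as a thread.
    std::vector<std::thread> producers;
    producers.reserve(producer_count);
    for (std::size_t p = 0; p < producer_count; ++p) {
        producers.emplace_back([&, p] {
            for (std::size_t i = 0; i < items_per_producer; ++i) {
                std::lock_guard<std::mutex> lock(m);
                items.push(static_cast<int>(p));
                cv.notify_one();
            }
            std::lock_guard<std::mutex> lock(m);
            --producers_left;
            cv.notify_one();
        });
    }

    for (auto& t : producers) t.join();
    consumer.join();
    return consumed;
}
```

Calling run_producers_and_consumer(999, 10) would match the numbers in the question, provided the system allows that many threads. Whether it runs faster than a handful of threads depends on the hardware, as noted above.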

Related

Do Rust threads run at the same time in parallel? Documentation sounds like it does not [duplicate]

I want to know if a program can run two threads at the same time (that is basically what threads are used for, correct?). But if I make a system call in one function running on thread A, and have some other tasks running in another function on thread B, would they both be able to run at the same time, or would my second function wait until the system call finishes?
Add-on to my original question: would this process still be an uninterruptible process while the system call is going on? I am talking about using any system call on UNIX/Linux.

Multi-threading and parallel processing are two completely different topics, each worthy of its own conversation, but for the sake of introduction...
Threading:
When you launch an executable, it is running in a thread within a process. When you launch another thread, call it thread 2, you now have 2 separately running execution chains (threads) within the same process. On a single core microprocessor (uP), it is possible to run multiple threads, but not in parallel. Although conceptually the threads are often said to run at the same time, they are actually running consecutively in time slices allocated and controlled by the operating system. These slices are interleaved with each other. So, the execution steps of thread 1 do not actually happen at the same time as the execution steps of thread 2. These behaviors generally extend to as many threads as you create, i.e. packets of execution chains all working within the same process and sharing time slices doled out by the operating system.
So, in your system call example, whether the call finishes before the other thread's execution steps proceed really depends on what the system call is. Several factors play into what will happen: Is it a blocking call? Does one thread have higher priority than the other? What is the duration of the time slices?
Links relevant to threading in C:
SO Example
POSIX
ANSI C
Parallel Processing:
When multi-threaded program execution occurs on a multiple core system (multiple uP, or multiple multi-core uP) threads can run concurrently, or in parallel as different threads may be split off to separate cores to share the workload. This is one example of parallel processing.
Again, conceptually, parallel processing and threading are thought to be similar in that they allow things to be done simultaneously. But that is concept only, they are really very different, in both target application and technique. Where threading is useful as a way to identify and split out an entire task within a process (eg, a TCP/IP server may launch a worker thread when a new connection is requested, then connects, and maintains that connection as long as it remains), parallel processing is typically used to send smaller components of the same task (eg. a complex set of computations that can be performed independently in separate locations) off to separate resources (cores, or uPs) to be completed simultaneously. This is where multiple core processors really make a difference. But parallel processing also takes advantage of multiple systems, popular in areas such as genetics and MMORPG gaming.
Links relevant to parallel processing in C:
OpenMP
More OpenMP (examples)
Gribble Labs - Introduction to OpenMP
CUDA Toolkit from NVIDIA
Additional reading on the general topic of threading and architecture:
This summary of threading and architecture barely scratches the surface. There are many parts to the topic. Books addressing them would fill a small library, and there are thousands of links. Not surprisingly, within the broader topic some concepts do not seem to follow reason. For example, it is not a given that simply having more cores will result in faster multi-threaded programs.
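The parallel processing idea above (sending independent pieces of one computation to separate cores) can be sketched with standard C++ threads. This is only an illustration, not OpenMP or CUDA; the function name parallel_sum is hypothetical, and the chunking scheme is one simple choice among many.

```cpp
#include <algorithm>
#include <cstddef>
#include <numeric>
#include <thread>
#include <vector>

// Hypothetical helper: sums `data` by giving each worker thread one
// contiguous chunk, then combining the partial results.
long long parallel_sum(const std::vector<int>& data) {
    // hardware_concurrency() may return 0 when the value is unknown.
    unsigned hw = std::thread::hardware_concurrency();
    std::size_t workers = std::max<std::size_t>(1, hw);
    workers = std::min(workers, std::max<std::size_t>(1, data.size()));

    std::vector<long long> partial(workers, 0);  // one slot per worker, no sharing
    std::vector<std::thread> threads;
    std::size_t chunk = (data.size() + workers - 1) / workers;

    for (std::size_t w = 0; w < workers; ++w) {
        std::size_t begin = std::min(data.size(), w * chunk);
        std::size_t end = std::min(data.size(), begin + chunk);
        threads.emplace_back([&data, &partial, w, begin, end] {
            partial[w] = std::accumulate(data.begin() + begin,
                                         data.begin() + end, 0LL);
        });
    }
    for (auto& t : threads) t.join();
    return std::accumulate(partial.begin(), partial.end(), 0LL);
}
```

Each worker writes only to its own slot in partial, so no locking is needed. For small inputs the cost of starting threads can easily outweigh the gain, which is one reason more cores do not automatically mean a faster program.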

Yes, they would, at least potentially, run "at the same time", that's exactly what threads are for; of course there are many details, for example:
If both threads run system calls that e.g. write to the same file descriptor they might temporarily block each other.
If thread synchronisation primitives like mutexes are used, a thread that needs a lock held by another thread will block until the lock is released, so those sections do not run in parallel.
You need a processor with at least two cores in order to have two threads truly run at the same time.
It's a very large and very complex subject.
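To illustrate the mutex point, here is a small sketch in C++ (the function name locked_increment is made up). Several threads increment one counter, and the std::mutex forces them to take turns in the critical section, so that part of the work is serialized even on a multi-core machine.

```cpp
#include <mutex>
#include <thread>
#include <vector>

// Hypothetical helper: `thread_count` threads each add `per_thread`
// increments to one shared counter protected by a mutex.
long locked_increment(int thread_count, long per_thread) {
    std::mutex m;
    long counter = 0;  // guarded by m

    std::vector<std::thread> threads;
    for (int t = 0; t < thread_count; ++t) {
        threads.emplace_back([&] {
            for (long i = 0; i < per_thread; ++i) {
                // Only one thread at a time gets past this line;
                // the others block until the lock is released.
                std::lock_guard<std::mutex> lock(m);
                ++counter;
            }
        });
    }
    for (auto& th : threads) th.join();
    return counter;
}
```

With the lock in place the result is always thread_count * per_thread. Without it the increments would race and the program would have undefined behaviour.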

If your computer has only a single CPU, you should know how it can execute more than one thread at the same time.
In single-processor systems, only a single thread of execution runs at any given instant, because single-processor systems support logical concurrency, not physical concurrency.
On multiprocessor systems, several threads do, in fact, execute at the same time, and physical concurrency is achieved.
The important feature of multithreaded programs is that they support logical concurrency, not whether physical concurrency is actually achieved.

The basics are simple, but the details get complex real quickly.
You can break a program into multiple threads (if it makes sense to do so), and each thread will run "at its own pace", such that if one must wait for, eg, some file I/O that doesn't slow down the others.
On a single processor multiple threads are accommodated by "time slicing" the processor somehow -- either on a simple clock basis or by letting one thread run until it must wait (eg, for I/O) and then "switching" to the next thread. There is a whole art/science to doing this for maximum efficiency.
On a multi-processor (such as most modern PCs which have from 2 to 8 "cores") each thread is assigned to a separate processor, and if there are not enough processors then they are shared as in the single processor case.
The whole area of assuring "atomicity" of operations by a single thread, and assuring that threads don't somehow interfere with each other, is incredibly complex. In general there is a "kernel" or "nucleus" category of system call that will not be interrupted by another thread, but that's only a small subset of all system calls, and you have to consult the OS documentation to know which category a particular system call falls into.

They will run at the same time, for one thread is independent from another, even if you perform a system call.
It's pretty easy to test it though, you can create one thread that prints something to the console output and perform a system call at another thread, that you know will take some reasonable amount of time. You will notice that the messages will continue to be printed by the other thread.
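The test described above might look roughly like this in C++. It is a sketch under assumptions: the function name is hypothetical, the sleep merely stands in for a slow, blocking system call, and thread B counts loop iterations instead of printing to the console.

```cpp
#include <atomic>
#include <chrono>
#include <thread>

// Hypothetical helper: returns how many times thread B made progress
// while thread A was blocked.
long count_progress_while_blocked(std::chrono::milliseconds block_time) {
    std::atomic<bool> a_done{false};
    std::atomic<long> ticks{0};

    std::thread a([&] {
        std::this_thread::sleep_for(block_time);  // thread A blocks here
        a_done.store(true);
    });

    std::thread b([&] {
        while (!a_done.load()) {
            ++ticks;                     // thread B keeps working meanwhile
            std::this_thread::yield();
        }
    });

    a.join();
    b.join();
    return ticks.load();
}
```

If the second thread had to wait for the blocking call, the returned count would be zero. In practice it is large, even on a single core, because the OS schedules thread B while thread A sleeps.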

Yes, a program can run two threads at the same time.
It is called multithreading.
would they both be able to run at the same time or would my second function wait until the system call finishes?
They are both able to run at the same time.
If you want, you can make thread B wait until thread A completes, or the reverse.

Two threads can run in parallel only on a system with a multi-core processor (or multiple processors). With a single core, two threads cannot run in parallel: only one thread runs at a time, and when it finishes its job or its time slice ends, the next thread in the queue takes its turn.

What is the benefit of using multi-threaded coroutines?

I have worked with coroutines for a long time, but I still don't completely understand why I should prefer multi-threaded coroutines over single-threaded coroutines.
I can clearly see the benefit of multi-threaded coroutines when their count is less than or equal to the physical thread count. But if we have more tasks than physical threads, why wouldn't we rather use only one coroutine thread?
I'll clarify the final question: Why is 10 threads of coroutines better than only one thread with many coroutines?

Coroutines are units of computation (like tasks). The way they are dispatched onto actual threads is orthogonal to how many coroutines you have. You can use a single-threaded dispatcher or a multi-threaded dispatcher, and depending on this your coroutines will be scheduled differently.
Multi-threaded coroutines don't mean 1 thread per coroutine. You can dispatch 100 coroutines on 8 threads.
But if we have more tasks than physical threads, why wouldn't we rather use only one coroutine thread?
There are multiple parts in this question.
First, if you have more tasks than logical cores, you could still dispatch all those tasks onto just the right number of threads. You don't have to completely give up on multithreading. This is actually exactly what Dispatchers.Default is about: dispatching as many coroutines as you want onto a limited number of threads equal to the number of hardware threads (logical cores) that you have. The point is to make use of all the hardware as much as possible without wasting threads (and thus memory).
Second, not every task is CPU-bound. Some I/O operations block threads (network calls, disk reads/writes etc.). When a thread is blocked on I/O, it doesn't use the CPU. If you have 8 logical cores, using only 8 threads for I/O would be suboptimal, because while some threads are blocked, the CPU cannot run other tasks. With more threads, it can (at the cost of some memory). This is the point of Dispatchers.IO, which can create more threads as needed and can exceed the number of logical cores (within a reasonable limit).
Why is 10 threads of coroutines better than only one thread with many coroutines?
Let's assume you have 100 coroutines to dispatch.
Using only one thread to run those coroutines implies that at most 1 core is doing the work at a given time, so nothing happens in parallel. This means all the other cores are idle, which is suboptimal. Worse, any blocking I/O operation done by a coroutine blocks this single thread and prevents it from running any other coroutine while we're waiting on I/O.
Using 10 threads, you can literally execute 10 coroutines at the same time if your hardware is sufficient, which can be 10x faster (if your coroutines don't have inter-dependencies).
Using 100 threads would not be that beneficial if your coroutines are CPU-bound, but might be useful if you have a bunch of I/O tasks (as we've seen). That said, the more threads you use, the more memory is consumed. So even with a ton of I/O operations, you have to find a balance between throughput and memory, you don't want to spawn millions of threads.
In short, using multi-threading still has the same advantages with or without coroutines: it lets you make use of your hardware resources as much as possible. Using coroutines is just an easier way to define tasks, dispatch them onto threads, express dependencies, avoid blocking threads unnecessarily, etc.
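The "100 coroutines on 8 threads" idea is not specific to Kotlin. As a rough analogy only (this is C++, not Dispatchers.Default, and every name below is hypothetical), a fixed set of worker threads can drain a larger set of small tasks:

```cpp
#include <atomic>
#include <cstddef>
#include <mutex>
#include <set>
#include <thread>
#include <vector>

struct PoolStats {
    std::size_t tasks_run;     // how many tasks were executed in total
    std::size_t threads_used;  // how many distinct threads executed them
};

// Hypothetical helper: runs `task_count` trivial tasks on `thread_count`
// worker threads and reports what happened.
PoolStats run_tasks_on_pool(std::size_t task_count, std::size_t thread_count) {
    std::atomic<std::size_t> next{0};      // index of the next task to claim
    std::atomic<std::size_t> tasks_run{0};
    std::mutex m;
    std::set<std::thread::id> ids;         // guarded by m

    auto worker = [&] {
        // Each worker keeps claiming task indices until none are left.
        while (next.fetch_add(1) < task_count) {
            ++tasks_run;                   // stands in for the real work
            std::lock_guard<std::mutex> lock(m);
            ids.insert(std::this_thread::get_id());
        }
    };

    std::vector<std::thread> workers;
    for (std::size_t i = 0; i < thread_count; ++i) workers.emplace_back(worker);
    for (auto& t : workers) t.join();
    return PoolStats{tasks_run.load(), ids.size()};
}
```

With 100 tasks and 8 workers, all 100 tasks run, but never on more than 8 threads. How the tasks are spread across the workers is up to the scheduler and can differ from run to run.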

Use of the terms "queues", "multicore", and "threads" in Grand Central Dispatch

I am trying to get my head around the concepts of Grand Central Dispatch. I want to understand these quotes from Vandad's book on Concurrent Programming.
The real use for GCD is to dispatch tasks to multiple cores, without making you the programmer, worry about which core is executing which task.
and
At the heart of GCD are dispatch queues. Dispatch queues are pools of threads.
and finally
You will not be working with these threads directly. You will just work with dispatch queues, dispatching tasks to these queues and asking queues to invoke your task.
I have bolded the key terms.
Are multiple cores the same as queues? Does a queue consist of many threads? Does each thread perform a task?

So multiple cores are the same as queues?
Not really. A queue is a programming abstraction, a core is a physical resource in your processor. There is no unique relationship between a queue and a core, although at any given point in time it can be said that a given queue is executing a given task on a given core.
A queue consists of many threads?
A queue consists of tasks. Tasks are assigned to threads by the queue managing system when the time comes to execute that task. Threads are OS resources and are allocated to cores, which effectively run them and have no notion of what a task is (except for Hyper-Threading CPUs).
If you do not account for hardware multithreading (e.g., Hyper-Threading), at any given point in time a core is running a specific thread; when the time comes to run a different thread, a context switch occurs in that core. If you account for hardware multithreading, you can have multiple threads running on virtual cores hosted in the same physical core.
The relationship between queues and threads is opaque. A queue could manage several threads at once, or several threads one at a time, or just one all the time; in the first case, you have a parallel queue, able to execute parallel tasks on simultaneous threads; in the second and third case, you have a serial queue.
Each thread performs a task?
At any given point in time, a thread is performing a task. You can have threads that are spawned, execute their task, and die; or you can have long-running threads (e.g., the main thread) that execute several tasks.
Maybe it is pretty puzzling at the start; you might need some reading about operating systems and maybe high-level processor architectures to fully understand this.
GCD aims at letting you reason exclusively in abstract terms: i.e., in terms of tasks and queues, and forget about threads and cores, that are seen as a sort of "implementation means", or low-level details that you can leave to the system to use efficiently.
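To make the "tasks and queues, not threads" idea concrete, here is a sketch of a serial queue in C++. It is not GCD and does not reflect how GCD is implemented; SerialQueue and dispatch are hypothetical names. The caller only hands over tasks, and a hidden worker thread runs them one at a time in submission order.

```cpp
#include <condition_variable>
#include <functional>
#include <mutex>
#include <queue>
#include <thread>
#include <utility>

// Hypothetical class: a FIFO queue of tasks drained by one hidden thread.
class SerialQueue {
public:
    SerialQueue() : worker_([this] { run(); }) {}

    // Lets the worker finish the remaining tasks, then stops it.
    ~SerialQueue() {
        {
            std::lock_guard<std::mutex> lock(m_);
            stopping_ = true;
        }
        cv_.notify_one();
        worker_.join();
    }

    // The caller submits a task and never touches the thread.
    void dispatch(std::function<void()> task) {
        {
            std::lock_guard<std::mutex> lock(m_);
            tasks_.push(std::move(task));
        }
        cv_.notify_one();
    }

private:
    void run() {
        for (;;) {
            std::function<void()> task;
            {
                std::unique_lock<std::mutex> lock(m_);
                cv_.wait(lock, [this] { return stopping_ || !tasks_.empty(); });
                if (tasks_.empty()) return;  // only reached when stopping
                task = std::move(tasks_.front());
                tasks_.pop();
            }
            task();  // run outside the lock so dispatch() is not held up
        }
    }

    std::mutex m_;
    std::condition_variable cv_;
    std::queue<std::function<void()>> tasks_;
    bool stopping_ = false;
    std::thread worker_;  // declared last so the other members exist first
};
```

A concurrent queue would differ mainly in having several workers draining the same task list, which gives up the ordering guarantee in exchange for parallelism.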

Queues are just lists of tasks to execute. Cores depend on the processor; you can have one or many cores.
Queues are configurable, and you can decide whether tasks can be executed concurrently or not. If you allow concurrency in your queue, tasks in the queue can be executed at the same time on different cores.

I'm not sure those quotes really do GCD justice. For example, to take each quote in turn:
GCD is more than usable (and useful) even if you have only a single core available, since multi-threading certain tasks has its place in computer science regardless of the number of physical CPU cores available. Better to think of it as an alternative to managing threads explicitly: GCD will do the thread management so you don't have to, and you (as the programmer) just have to think in terms of queues and whether certain related tasks must be done serially or can be done concurrently.
Dispatch queues are not "pools of threads". Dispatch queues are "units of work aggregation" and should be thought of that way. How that work is physically performed, by one thread or multiple threads, is not the programmer's concern and, in fact, the fewer assumptions the programmer makes about that the better, since GCD tries very hard to be efficient and use as few threads as possible while still effectively utilizing hardware resources.
The third quote is good - that is the appropriate idiom to embrace. Just submit your work (be it blocks or function/context tuples) to the appropriate queue, creating queues as necessary to associate with resources that require synchronization, and you've got the gist of GCD.

Is a Task lightweight compared to a Thread?

I overheard a coworker saying that a Task is basically a lightweight thread. Coming from a C++ background (where threads were the lightest-weight processing unit), this seems counter-intuitive to me.
Aren't Tasks just as heavy as Threads?

You need to distinguish between a unit of work (a Task) and the underlying process used to host/execute it. It isn't even necessary for Tasks to run on other threads. For example, Tasks can be executed in a single-threaded application that periodically yields control to the task pool.
Even when Tasks are executed on separate threads, there is usually not a 1-to-1 relationship between Task and Thread. The threads are preallocated as part of a pool, and then tasks are scheduled to run on these threads as available. Creating a new task does not require the overhead of creating a thread; it only requires the cost of an enqueue in a task queue.
This makes tasks inherently more scalable. I can have millions of tasks throughout the lifetime of my application, but only ever actually use some constant number of threads.
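Since the question comes from a C++ background, the point that a task need not run on another thread can be shown with a C++ analogy. This uses std::async rather than the Task type under discussion, and the function name is made up. With std::launch::deferred no thread is created at all, and the work runs on whichever thread asks for the result.

```cpp
#include <future>
#include <thread>

// Hypothetical helper: reports whether a deferred task ran on the
// thread that requested its result.
bool deferred_task_runs_on_caller() {
    const std::thread::id caller = std::this_thread::get_id();

    // std::launch::deferred creates no thread; the task body is
    // executed inside get(), on the calling thread.
    std::future<std::thread::id> task = std::async(
        std::launch::deferred, [] { return std::this_thread::get_id(); });

    return task.get() == caller;
}
```

The function returns true: the unit of work existed as an object, but it never needed a thread of its own.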

Typically a "thread" implies mandatory concurrency. Starting up a thread requires allocating a stack and internal OS data structures for it. In contrast, a "task" often refers to a piece of work for which concurrency is optional, hence a parallel framework (such as OpenMP, Cilk Plus, TBB, PPL) can use the same thread to execute many tasks, by serializing the tasks, and converting optional parallelism to real parallelism only as necessary to keep the machine busy.

You are right - everything runs on a thread under the covers.
The reason people say that a Task is more lightweight than a Thread is that Microsoft put a lot of thought into having Tasks make efficient use of Threads, and the implementation is probably much lighter weight than what the average developer would come up with on their own using the Thread class.
EDIT
A clearer explanation is that a Task object is lighter weight than a Thread object, and while each Task is eventually run on a Thread, creating N Task objects concurrently leads to fewer than N concurrent Thread objects being used, for large N.
