OpenMPI vs OpenMP execution efficiency - multithreading

I've been studying the efficiency of parallelizing Dijkstra's algorithm using both OpenMPI and OpenMP. When I use OpenMP, the execution time turns out to be higher than with OpenMPI. This is a bit strange to me, since as far as I know threads are supposed to be faster than processes: OpenMPI starts a separate process for each rank, while OpenMP creates threads inside a single process. My question is: is my finding implementation dependent? In other words, can we say that parallelizing with OpenMP is not always faster than OpenMPI, because it depends on the implementation?

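As always, it all depends on your circumstances. OpenMP works only within a single machine (shared memory), whereas OpenMPI, an implementation of the MPI standard, can also run its processes on several nodes connected over a network. As long as your work is split across no more threads than you have local CPU cores, OpenMP should be faster, because there is less messaging overhead. In larger-scale deployments OpenMPI has the advantage, because the work can be distributed across several systems, which may also have better individual computation speed. That said, the result does depend on the implementation: an OpenMP version that synchronizes frequently, or whose threads contend for the same data, can easily run slower than an MPI version whose processes work on private data.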

Related

Are Tcl threads multi-process/multi-core?

I'm new to using threads in Tcl, but I thought they would be a nice way to solve a problem I'm having.
I have been reading through the Tcl thread documentation, but I can't quite figure out whether Tcl spreads threads across multiple CPU cores or keeps all of them on the core where the master process was started.
Tcl's threads are threads as supported by the operating system's standard libraries (e.g., they're normal POSIX threads on Linux and OSX), and so are entirely capable of running over as many cores as the OS allows.
Tcl takes care to limit the use of locks in its implementation as much as possible, so as to make multi-core operation as efficient as possible; this came from experience supporting high-performance application servers in the 1990s, where it turned out that reducing the sharing of resources was a big win as hardware scaled up the number of cores.
It also means that you've got a non-shared memory model based on structured message passing; it scales well, but it was very different to what most programmers knew at the time. It's a little bit more mainstream now because shared-memory parallelism remains annoyingly troublesome on modern hardware.

Multi-thread code with single-core processor and single-thread code with multi-core processor

I'm new to multi-threaded programming. I have been reading some articles, but there are two main points I'm not completely sure about.
1) If I run single-threaded (sequential) code on a multi-core processor, will the OS try to divide the thread into multiple threads (while taking care of dependencies) to take advantage of the multi-core processor?
2) If I run multi-threaded code on a single-core processor, will the OS time-share the processor between the different threads (the same way it does with multiple processes)?
1) No
If an application makes use of, for example, the Intel maths libraries and has been compiled with the right switches, routines like FFTs will at runtime be split out into separate threads matching the number of cores in the machine. Your source code remains 'single threaded', but the library is creating and destroying threads behind your back.
Similarly, some compilers (e.g. Intel's icc, Sun's C compiler) may turn some loops into separate threads, each tackling a share of the iterations. Again, the source code looks single-threaded, but the compiler generates threaded code on your behalf. It's a bit like automatically applying some OpenMP to your source code.
OSes cannot second-guess what an application is going to do, so they cannot intervene like this. Libraries and compilers know what is about to happen, so they can.
Libraries and compiler tricks like this have been developed so as to make it easy for programmers to extract higher performance from 'single' threaded code. Intel started adding features like that to their maths library around about the same time they started heading towards multi-core CPUs. The idea was to create (from the programmer's point of view) the impression of better 'single' thread performance, whilst the speed was actually being delivered by multiple cores. Similarly with Sun when they started doing multi-processor computers.
And with everyone more or less giving up on making significant improvements to the performance of a single core, this is the only way ahead.
2) Yes. How else would it do it?
No, the operating system does not have enough information to do that. Parallelization requires considering the dependencies between operations. Some compilers try to do it, because they have more information about the intent of the code, but even they often fail to do it effectively.
Yes, for example the Linux scheduler does not even distinguish between threads and processes.

Is multithreading what I'm looking for?

I'm working on a program that would benefit from using multiple CPU cores. In the past, while working on similar programs, my CPU would max out at ~25% on my quad-core processor. Will threads be distributed to the other available cores? I'm a newbie when it comes to multithreading, so excuse me if something stated above makes absolutely no sense.
Yes, you seem to have correct understanding of the problem.
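A single-threaded program cannot be shared or split among multiple cores; it runs on one core at a time, which is why it tops out at about 25% on a quad-core machine. A multi-threaded program can make use of multiple cores, provided the work can actually be divided between its threads.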

OpenMP thread divergence?

The term thread divergence is used in CUDA; from my understanding it's a situation where different threads are assigned to do different tasks and this results in a big performance hit.
I was wondering: is there a similar penalty for doing this in OpenMP? For example, say I have a 6-core processor and a program with 6 threads. If a conditional makes 3 threads perform one task and the other 3 threads perform a completely different task, will there be a big performance hit? I guess in essence this is using OpenMP to do MIMD.
Basically, I'm writing a program with OpenMP and CUDA. I want two threads to run a CUDA kernel while the remaining threads run C code.
No, there is no performance hit for diverging threads using OpenMP. It is a problem in CUDA because of the way instructions are broadcast simultaneously to a set of cores. When an OpenMP thread targets a CPU core, each CPU core has its own independent set of instructions to follow, and it runs just like any other single-threaded program would.
You may see some of your cores being underutilized if you have synchronization barriers following thread divergence, because that would force faster threads to wait for the slower threads to catch up.
When speaking about CPU parallelism, there's no intrinsic performance hit from using a certain threading design pattern. Not at the theoretical level at least.
The only problem I see is that since the threads are doing different things which may have varying completion times, some of the threads may sit idle after finishing their work, waiting for the others to finish a longer task.
The term thread divergence in CUDA refers to the situation where not all threads of a block evaluate a conditional with the same outcome. Such threads are said to diverge. If diverging threads are in the same warp, they may execute their branches serially, which leads to a loss of performance.
I am not sure that OpenMP has the same issue, though. When different threads perform different work, the runtime may do some load balancing, but that does not necessarily serialize the work.
There is no such problem in OpenMP, because every OpenMP thread has its own program counter (PC).

Parallel coding vs. Multithreading (on a single CPU)

Can the terms "parallel coding" and "multithreaded coding" be used interchangeably on a single CPU?
I don't have much experience with either,
but I want to shift my coding style towards one of them.
Since many single-threaded applications seem to be becoming obsolete nowadays, which would be the better career choice for the future of the software industry?
There is definitely overlap between multithreading and parallel coding/computing, with the main differences in the target processing architecture.
Multithreading has been used to exploit the benefits of concurrency within a single process on a single CPU with shared memory. Running the same programs on a machine with multiple CPUs may result in significant speedup, but is often a bonus rather than intended (until recently). Many OSes have threading models (e.g. pthreads), which benefit from but do not require multiple CPUs.
Multiprocessing is the standard model for parallel programming targeting multiple CPUs, from early SMP machines with many CPUs on a big machine, then to cluster computing across many machines, and now back to many CPUs/cores on a single computer. MPI is a standard that can work across many different architectures.
Of course, one can program a parallel design using threads with language frameworks like OpenMP. I've heard of multicomponent GUIs/applications that rely on separate processing that could theoretically run anywhere. Practically, there's more of the former than the latter.
Probably the main distinction is when the program runs across multiple machines, where it's not practical to use multithreading, and existing applications that share memory will not work.
Parallel coding is the concept of executing multiple actions in parallel (at the same time).
Multi-threaded Programming on a Single Processor
Multi-threading on a single processor gives the illusion of running in parallel. Behind the scenes, the operating system's scheduler switches the processor between threads according to how the threads have been prioritized.
Multi-threaded Programming on Multiple Processors
Multi-threading on multiple processor cores is truly parallel. Each core can run a different thread at the same time, so multiple parallel, concurrent tasks really are happening at once.
The question is a bit confusing: you can perform parallel operations in multiple threads, but not all multithreaded applications use parallel computing.
In parallel code, you typically have many "workers" that consume a set of data to return results asynchronously. But multithread is used in a broader scope, like GUI, blocking I/O and networking.
Being on a single or many CPU does not change much, as the management depends on how your OS can handle threads and processes.
Multithreading will be useful everywhere; parallel computing is not an everyday paradigm, so it might be more of a niche as a career choice.
In some .NET 4.0 demos I saw, the parallel code looked easier to write than code that manages threads directly. There are new constructs such as Parallel.For for loops, and other features that support parallel processing, so there is a difference.
I think in the future you will do both, but I think the Parallel support will be better and easier. You still need threads for background operations and other things.
The fact is that you cannot achieve "real" parallelism on a single-core CPU. There are several libraries (such as MPI, commonly used from C) that help a bit in this area, but parallelism is not widely used among developers working on mainstream applications.
Multithreading is common these days thanks to the introduction of multiple cores on a single CPU. It's easy and almost transparent to use in every language thanks to thread libraries and thread-safe types, methods, classes and so on. This way you can simulate parallelism.
Anyway, if you're just starting out, begin by reading about concurrency and threading. And of course, threads and parallelism work well together.
I'm not sure what you think "parallel coding" is, but as I understand it, parallel coding refers to producing code that the CPU executes in parallel, and multithreaded code therefore falls within that description.
In that way, obviously you can use them interchangeably (as one falls inside the other).
Nonetheless, I'd suggest you take it slowly and start learning from the basics. Understand WHY multithreading is becoming important, what the differences are between processes, threads and fibers, how you synchronize each of them, and so on.
Keep in mind that parallel coding, as you call it, is quite complex, especially compared to sequential coding, so be prepared. Also, don't rush into it. Just because you use 3 threads instead of one doesn't mean your program will be faster; it can even make it slower. You need to understand the hows and the whys. Not everything can be made parallel, and not everything that can, should.
In simple terms:
multithreading is supported by the CPU itself, and
parallel programming is an explicit task, done either by the compiler or by constructs written by programmers, such as "#pragma" directives.
