Parallel software? - multithreading

What is the meaning of "parallel software", and what are the differences between "parallel software" and "regular software"?
What are its advantages and disadvantages?
Does writing "parallel software" require specific hardware or a specific programming language?

Are the "parallel software" requires a specified hardware or programing language ?
Yes and Yes.
The first one is trivially easy. Most modern CPUs (say, anything newer than the M6800) have hardware features that make it possible to do more than one thing at a time, though not necessarily both at the same instant. For instance, when a timer interrupt goes off, a CPU can save what it's doing and start doing something else. Those tasks run concurrently.
Even without that, you could just get two machines with some sort of connection to each other (like a simple serial link through a null modem adapter) and they can both work on the same task in parallel.
Most new (not just modern but genuinely recent) CPUs have parallel computing resources built in. These multi-core CPUs can actually be working on two or more tasks at the same time, one task per core, and have special features that make it a bit more efficient for those tasks to cooperate.
The second one, requiring special software tools such as a parallel enabled language, is in some ways the hardest part of parallel computing. If you're the only person in the kitchen, it's pretty easy to cook a meal, by following each recipe from start to finish, one after the next, until all dishes are cooked. If you want to speed that up by adding more cooks, you have to be a bit more careful to not step on each other's toes.
The simplest way this is handled is by using a threading library that offers tools so that multiple tasks can arrange not to clobber each other. This is not as easy as flagging a program as parallel and letting the system take care of the rest; rather, you have to write each task to communicate with every other task at every place where there is the possibility of them interfering.
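As a minimal sketch of that coordination, here is a hedged example using POSIX threads (one common threading library); the worker function and shared counter are illustrative inventions, not anything from the discussion above:

    #include <pthread.h>
    #include <stdio.h>

    static long counter = 0;                    /* shared state */
    static pthread_mutex_t lock = PTHREAD_MUTEX_INITIALIZER;

    /* Each "cook" increments the shared counter; the mutex keeps
     * them from clobbering each other's updates. */
    static void *worker(void *arg)
    {
        (void)arg;
        for (int i = 0; i < 100000; i++) {
            pthread_mutex_lock(&lock);   /* only one thread in here at a time */
            counter++;
            pthread_mutex_unlock(&lock);
        }
        return NULL;
    }

    int main(void)
    {
        pthread_t t1, t2;
        pthread_create(&t1, NULL, worker, NULL);
        pthread_create(&t2, NULL, worker, NULL);
        pthread_join(t1, NULL);
        pthread_join(t2, NULL);
        printf("counter = %ld\n", counter);     /* always 200000 with the lock */
        return 0;
    }

Without the mutex, the two workers would interleave their read-modify-write updates and lose counts; the lock is exactly the "don't step on each other's toes" arrangement from the kitchen analogy.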

http://en.wikipedia.org/wiki/Thread_(computer_science)
In computer science, a thread of execution results from a fork of a computer program into two or more concurrently running tasks. The implementation of threads and processes differs from one operating system to another, but in most cases, a thread is contained inside a process. Multiple threads can exist within the same process and share resources such as memory, while different processes do not share these resources.
Most modern programming languages support multithreading in one way or another (even JavaScript in its newest versions). :-)
Advantages and disadvantages can depend on the task. If you have a lot of processing to do, then multithreading can help you break it up into smaller units of work that each CPU can work on independently at the same time. However, multithreaded code will generally be more complex to write and maintain than single-threaded code.
You can still write/run multithreaded code on a machine that has only one processor. Although there will be only one processor to execute the tasks, the operating system will make them appear to run simultaneously by rapidly switching context and executing a few instructions for each thread at a time.
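To make that concrete, here is a rough sketch (POSIX threads again; the array, thread count, and slicing are made up for illustration) of breaking work into independent units. On a multi-core machine the slices are summed in parallel; on a single-core machine the OS simply time-slices the threads and the program produces the same answer:

    #include <pthread.h>
    #include <stdio.h>

    #define N 1000000
    #define NTHREADS 4

    static int data[N];
    static long partial[NTHREADS];

    /* Each thread sums its own disjoint slice; no locking is needed
     * because no two threads touch the same memory. */
    static void *sum_slice(void *arg)
    {
        long id = (long)arg;
        long lo = id * (N / NTHREADS), hi = lo + N / NTHREADS;
        long s = 0;
        for (long i = lo; i < hi; i++) s += data[i];
        partial[id] = s;
        return NULL;
    }

    int main(void)
    {
        for (long i = 0; i < N; i++) data[i] = 1;
        pthread_t t[NTHREADS];
        for (long i = 0; i < NTHREADS; i++)
            pthread_create(&t[i], NULL, sum_slice, (void *)i);
        long total = 0;
        for (long i = 0; i < NTHREADS; i++) {
            pthread_join(t[i], NULL);
            total += partial[i];
        }
        printf("total = %ld\n", total);   /* 1000000, on one core or many */
        return 0;
    }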
Some specialized hardware you may be familiar with that performs parallel tasks is the GPU, which can be found in most new computers. In this video, the Mythbusters demonstrate the difference between drawing with a single-threaded CPU and a massively parallel GPU:
http://www.youtube.com/watch?v=XtGf0HaW7x4&feature=player_embedded

Parallel software can natively take advantage of multiple cores/CPUs on a computer, or sometimes multiple computers. Examples include graphics rendering software and circuit design software.
I'm not so sure about disadvantages, other than that multi-processor-aware software tends to be a CPU hog.

Related

Context switch: what decides when?

I'm looking for some background explanations about context switch in modern personal computers with mainstream architecture (say x64).
While the context switch is mainly done by the hardware, I wonder what in the computer decides on task scheduling and context switching when running multiple threads and/or multiple processes. Is it the CPU itself, the operating system, the compiler/virtual machine...?
I'd like to have an idea of what strategies are used to decide when to switch. For example, if I start a hundred threads doing independent dummy additions in endless loops, when will the context switches happen?
This is a complex subject that I can't do justice to in a simple response here, but let me hit some high points. I am further going to assume modern OSes like Windows or the various Unix derivatives, and ignore embedded real-time systems.
The context switch is not performed in hardware. It is critical to understand this. It is performed in software by an OS subsystem known as the scheduler. The scheduler is essentially a glorified interrupt handler that fires many times a second and decides which thread will execute next. The algorithms for doing so are numerous and the subject of many a PhD thesis. A good overview I found quickly is here: http://www.studytonight.com/operating-system/cpu-scheduling
Good Operating Systems books will go over this in detail. There are too many to note so pick your poison.
One last point: to grasp at a complete level how scheduling is performed, it really helps to understand how virtual addressing schemes work, as that is truly what differentiates processes from threads. Threads are what is critical in terms of the scheduler, but processes encapsulate threads and the virtual memory space.
I'm not sure this helps but I was at least able to correct one misconception and point you at a simple article on OS thread scheduling.

What's the point of multi-threading on a single core?

I've been playing with the Linux kernel recently and diving back into the days of OS courses from college.
Just like back then, I'm playing around with threads and the like. All this time I had been assuming that threads were automatically running concurrently on multiple cores but I've recently discovered that you actually have to explicitly code for handling multiple cores.
So what's the point of multi-threading on a single core? The only example I can think of is from college when writing a client/server program but that seems like a weak point.
All this time I had been assuming that threads were automatically running concurrently on multiple cores but I've recently discovered that you actually have to explicitly code for handling multiple cores.
The above is incorrect for any widely used, modern OS. All of Linux's schedulers, for example, will automatically schedule threads on different cores and even automatically move threads from one core to another when necessary to maximize core usage. There are some APIs that allow you to modify the schedulers' behavior, but these APIs are generally used to disable automatic thread-to-core scheduling, not to enable it.
So what's the point of multi-threading on a single core?
Imagine you have a GUI program whose purpose is to execute an expensive computation (for example, render a 3D image or a Mandelbrot set) and then display the result. Let's say this computation takes 30 seconds to complete on this particular CPU. If you implement that program the obvious way, and use only a single thread, then the user's GUI controls will be unresponsive for 30 seconds while the calculation is executing -- the user will be unable to do anything with your program, and possibly unable to do anything with his computer at all. Since users expect GUI controls to be responsive at all times, that would be a poor user experience.
If you implement that program with two threads (one GUI thread and one rendering thread), on the other hand, the user will be able to click buttons, resize the window, quit the program, choose menu items, etc, even while the computation is executing, because the OS is able to wake up the GUI thread and allow it to handle mouse/keyboard events when necessary.
Of course, it is possible to write this program with a single thread and keep its GUI responsive, by writing your single thread to do just a few milliseconds worth of computation, then check to see if there are GUI events available to process, handling them, then going back to do a bit more computation, etc. But if you code your app this way, you are essentially writing your own (very primitive) thread scheduler inside your app anyway, so why reinvent the wheel?
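A toy sketch of the two-thread structure described above, assuming POSIX threads and C11 atomics; the sleep stands in for the expensive 30-second render, and the print loop stands in for a real GUI event loop:

    #include <pthread.h>
    #include <stdatomic.h>
    #include <stdio.h>
    #include <unistd.h>

    static atomic_int done = 0;

    /* Stand-in for the expensive render computation. */
    static void *render(void *arg)
    {
        (void)arg;
        sleep(3);                 /* pretend to compute for a while */
        atomic_store(&done, 1);
        return NULL;
    }

    int main(void)
    {
        pthread_t worker;
        pthread_create(&worker, NULL, render, NULL);

        /* "GUI" loop: stays responsive while the render runs. */
        while (!atomic_load(&done)) {
            printf("handling events...\n");   /* buttons, resize, menus */
            usleep(500 * 1000);
        }
        pthread_join(worker, NULL);
        printf("render finished, display result\n");
        return 0;
    }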
The first versions of MacOS were designed to run on a single core and had no real concept of multithreading. This forced every application developer to implement cooperative multitasking by hand -- even if their app did not have any extended computations, they had to explicitly indicate when they were done using the CPU, e.g. by calling WaitNextEvent. This lack of multithreading made early (pre-MacOS-X) versions of MacOS famously unreliable at multitasking, since just one poorly written application could bring the whole computer to a grinding halt.
First, a program not only computes; it also waits for input/output, which is effectively carried out by separate I/O hardware. So even a single-core machine is in effect a multi-processor machine, and employing multithreading is justified.
Second, a task can be divided into several threads for the sake of modularity.
Multithreading is not only for taking advantage of multiple cores.
You need multiple processes for multitasking. For similar reasons you are allowed to have multiple threads, which are lightweight compared with processes.
You probably don't want to spawn processes all the time for things like blocking I/O. That may be overkill.
And there are fibers, which are even more lightweight. So we have processes, threads, and fibers for different levels of need.
Well, when you say multithreading on a single core, there are things you need to consider. For example, the thread API that you are using: is it user-level or kernel-level? From your question I believe you are most probably using user-level threads.
Now, user-level threads, depending upon the host OS or the API itself, may map to a single kernel thread or to multiple ones. Many mappings are possible: 1:1, many:1, or many:many.
Now, if there is a single core, your OS can still provide several kernel-level threads, which behave like multiple processes from the CPU's point of view. In that case, the OS will time-slice (and multi-program) the kernel threads, leading to very fast context switches, and via the user-level API your code will appear to have multithreaded features.
Also note that even though your processor is a single core, depending on the make it can be hyper-threaded and have deep pipelines, allowing kernel threads to run concurrently with very low overhead.
For references: check the Intel/AMD architecture documentation and how various OSes provide kernel threads.

Multi-thread code with single-core processor and single-thread code with multi-core processor

I'm new to multi-threaded programming. I have been reading some articles, but two main points I'm not completely sure about.
If I have single-threaded (sequential) code and I run it on a multi-core processor, will the OS try to divide the thread into multiple threads (while taking care of dependencies) to take advantage of the multi-core processor?
If I have multi-threaded code and I run it on a single-core processor, will the OS time-share between the different threads (the same way it does with multiple processes)?
1) No
If an application makes use of, for example, the Intel maths libraries and has been compiled with the right switches, routines like FFTs will at runtime be split out into separate threads matching the number of cores in the machine. Your source code remains 'single threaded', but the library is creating and destroying threads behind your back.
Similarly some compilers (e.g. Intel's icc, Sun's C compiler) may turn some loops into separate threads, each tackling a share of the iterations. Again the source code looks single-threaded, but the compiler generates threaded code on your behalf. It's a bit like automatically applying some OpenMP to your source code.
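For flavour, here is roughly what "applying some OpenMP" looks like when done by hand; a single pragma asks the compiler to split the loop's iterations across threads (compile with an OpenMP-capable compiler, e.g. gcc -fopenmp; the loop itself is an arbitrary example, not one from the libraries mentioned above):

    #include <stdio.h>

    int main(void)
    {
        double sum = 0.0;

        /* The source still reads like a single-threaded loop; the
         * pragma tells the compiler to share iterations among threads
         * and combine the per-thread sums at the end. */
        #pragma omp parallel for reduction(+:sum)
        for (int i = 1; i <= 1000000; i++)
            sum += 1.0 / i;

        printf("sum = %f\n", sum);
        return 0;
    }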
OSes cannot second guess what an application is going to do, so they cannot intervene like this. Libraries and compilers know what is about to happen, so they can.
Libraries and compiler tricks like this have been developed so as to make it easy for programmers to extract higher performance from 'single' threaded code. Intel started adding features like that to their maths library around about the same time they started heading towards multi-core CPUs. The idea was to create (from the programmer's point of view) the impression of better 'single' thread performance, whilst the speed was actually being delivered by multiple cores. Similarly with Sun when they started doing multi-processor computers.
And with everyone more or less giving up on making significant improvements to the performance of a single core, this is the only way ahead.
2) Yes. How else would it do it?
No, the operating system does not have enough information to do that. For parallelization you need to consider the dependencies between operations. Some compilers try to do that, since they have more information about the intent of the code, but even they often fail to do it effectively.
Yes, for example the Linux scheduler does not even distinguish between threads and processes.

What is the difference between multicore and concurrent programming

Can anyone help me out? I am working on a presentation and would like to include a bit about 'the difference between multicore and concurrent programming'. I have googled a bit but am not turning up many good descriptions; any help appreciated! :)
Thanks,
Eamonn
Concurrent (occurring or existing simultaneously) implies that different code MAY execute at the exact same cycle. It means that things can possibly happen in parallel if multiple processors or a processor with multiple cores is available and the program is crafted correctly. Just adding threads does not imply concurrent execution.
The reason I say MAY and possibly is that any time the program's separate threads need to share volatile/mutable state, other threads that need access to that state cannot continue executing and will have to wait their turn, and things start happening serially again.
Typically this is implemented in a single program as more than one thread executing code at the exact same cycle as another thread, given that there are no resource contentions as listed above. This requires multiple physical processors or cores. Other models run multiple heavyweight OS processes that can execute concurrently.
Concurrent programming is very hard to do correctly with mutable shared state.
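A small hedged illustration of why it is hard (POSIX threads; the counter and loop counts are invented for the example): the unsynchronized increment below is a data race, and the program usually loses updates when the threads really do run at the same cycle on separate cores:

    #include <pthread.h>
    #include <stdio.h>

    static long counter = 0;   /* mutable state shared by both threads */

    /* No lock here: the read-modify-write inside counter++ can
     * interleave between threads, silently losing increments. */
    static void *worker(void *arg)
    {
        (void)arg;
        for (int i = 0; i < 1000000; i++)
            counter++;                   /* data race */
        return NULL;
    }

    int main(void)
    {
        pthread_t t1, t2;
        pthread_create(&t1, NULL, worker, NULL);
        pthread_create(&t2, NULL, worker, NULL);
        pthread_join(t1, NULL);
        pthread_join(t2, NULL);
        /* Expected 2000000, but typically prints less on a multi-core box. */
        printf("counter = %ld\n", counter);
        return 0;
    }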
You can write a concurrent program that runs serially on a single-core processor, but scales up to execute more things at the same time when more processors, more cores, or even multiple processors with multiple cores are present.
You can also make single-threaded programs appear concurrent on a multi-core / multi-processor system if they can operate on independent ranges of input data at the same time. Example: on a dual-core machine, a single-threaded 3D rendering program can run as 2 separate instances, the first rendering all the odd frames and the second rendering all the even frames, as long as they don't try to share any mutable resources.
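Sketched with fork() under the stated assumption that the instances share nothing; render_frame here is a hypothetical placeholder for real rendering work:

    #include <stdio.h>
    #include <sys/wait.h>
    #include <unistd.h>

    /* Hypothetical stand-in for rendering one frame. */
    static void render_frame(int n) { printf("frame %d (pid %d)\n", n, getpid()); }

    int main(void)
    {
        pid_t pid = fork();
        int start = (pid == 0) ? 1 : 0;   /* child: odd frames, parent: even */

        for (int f = start; f < 10; f += 2)
            render_frame(f);              /* two processes, no shared state */

        if (pid > 0) wait(NULL);          /* parent waits for the child */
        return 0;
    }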
Multi-core means that a single CPU has multiple Processor cores that can execute threads or processes concurrently and typically appears as multiple processors to mainstream operating systems.
It does NOT imply that programs that are single threaded gain any concurrency behaviors or benefits from the additional processor cores available.
Concurrent programming is broader - it just refers to writing software that will run "concurrently" - i.e. more than one thing will happen at a time.
"Multi-core" programming is really referring to a specific subset of concurrent programming, in which you are targetting multiple available CPU cores on a specific machine. This is the most common form of concurrent programming (typically single process running on a single computer), but still only one form of concurrent programming.
You can do concurrent programming on a machine that has only a single CPU core. The operating system provides the illusion that more than one thread is running at the same time by rapidly switching back and forth between them.
A machine with multiple cores simply needs to do this context switching less often, since two threads can run at the same time on two cores. It is only a bit special because threading bugs can make your life difficult much more quickly: the odds that two threads access a shared memory location at the same time are much higher.
At a high level, multi-core is an attribute of the processor chip in your computer. Multi-core means it has multiple processing cores. There are several types of multi-processor computers: the old-style supercomputers with thousands of computers connected via Ethernet, systems with more than one processor (like 2 Pentium 4s), and contemporary multi-core systems where every processor package has multiple processing cores (like Intel's i7). The third type is often called multi-core or Chip Multiprocessor (CMP).
Concurrent programming is an attribute of software. Concurrent programming is about writing code which is split into multiple tasks that can execute concurrently if processors are available. While concurrent programs do leverage multi-core, concurrent programming is broader in two dimensions:
Concurrent programs can run on a single core or multiple cores.
Concurrent programs can be used on any of the multi-processor types I mentioned above.
Thus, to summarize:
Concurrent programming is about software that can use multiple processors if available. Those processors can be on the same chip (multi-core or Chip Multiprocessor) or on different chips (often known as SMP). You can have systems where you put two multi-core chips in the same system, making it a CMP and an SMP at the same time. Concurrent programming will work for that as well.
Concurrent programming regards operations that appear to overlap and is primarily concerned with the complexity that arises due to non-deterministic control flow. The quantitative costs associated with concurrent programs are typically both throughput and latency. Concurrent programs are often IO bound but not always, e.g. concurrent garbage collectors are entirely on-CPU. The pedagogical example of a concurrent program is a web crawler. This program initiates requests for web pages and accepts the responses concurrently as the results of the downloads become available, accumulating a set of pages that have already been visited. Control flow is non-deterministic because the responses are not necessarily received in the same order each time the program is run. This characteristic can make it very hard to debug concurrent programs. Some applications are fundamentally concurrent, e.g. web servers must handle client connections concurrently. Erlang, F# asynchronous workflows and Scala's Akka library are perhaps the most promising approaches to highly concurrent programming.
Multicore programming is a special case of parallel programming. Parallel programming concerns operations that are overlapped for the specific goal of improving throughput. The difficulties of concurrent programming are evaded by making control flow deterministic. Typically, programs spawn sets of child tasks that run in parallel and the parent task only continues once every subtask has finished. This makes parallel programs much easier to debug than concurrent programs. The hard part of parallel programming is performance optimization with respect to issues such as granularity and communication. The latter is still an issue in the context of multicores because there is a considerable cost associated with transferring data from one cache to another. Dense matrix-matrix multiply is a pedagogical example of parallel programming and it can be solved efficiently by using Strassen's divide-and-conquer algorithm and attacking the sub-problems in parallel. Cilk is perhaps the most promising approach for high-performance parallel programming on multicores and it has been adopted in both Intel's Threading Building Blocks and Microsoft's Task Parallel Library (in .NET 4).
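As a rough sketch of that fork-join style, here is the parent/child-task pattern written with OpenMP tasks (a stand-in for Cilk's spawn/sync, and summing an array rather than multiplying matrices, purely for brevity): the parent spawns a child task, works on the other half itself, and joins before combining, so control flow stays deterministic:

    #include <stdio.h>

    /* Divide-and-conquer sum: fork a child task, join, combine. */
    static long psum(const int *a, long lo, long hi)
    {
        if (hi - lo < 10000) {            /* small problem: run serially */
            long s = 0;
            for (long i = lo; i < hi; i++) s += a[i];
            return s;
        }
        long mid = lo + (hi - lo) / 2, left, right;
        #pragma omp task shared(left)
        left = psum(a, lo, mid);          /* child task runs in parallel */
        right = psum(a, mid, hi);         /* parent keeps working */
        #pragma omp taskwait              /* join: wait for the child */
        return left + right;
    }

    int main(void)
    {
        static int a[1000000];
        for (long i = 0; i < 1000000; i++) a[i] = 1;
        long total;
        #pragma omp parallel
        #pragma omp single                /* one thread starts the recursion */
        total = psum(a, 0, 1000000);
        printf("total = %ld\n", total);
        return 0;
    }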

Parallel coding vs multithreading (on a single CPU)

Can we use "parallel coding" and "multithreaded coding" interchangeably on a single CPU?
I am not very experienced in either,
but I want to shift my coding style to one of the above.
As I've found that many single-threaded applications are becoming obsolete nowadays, which would be better for a career prospect in the future software industry?
There is definitely overlap between multithreading and parallel coding/computing, with the main differences in the target processing architecture.
Multithreading has been used to exploit the benefits of concurrency within a single process on a single CPU with shared memory. Running the same programs on a machine with multiple CPUs may result in significant speedup, but is often a bonus rather than intended (until recently). Many OSes have threading models (e.g. pthreads), which benefit from but do not require multiple CPUs.
Multiprocessing is the standard model for parallel programming targeting multiple CPUs, from early SMP machines with many CPUs on a big machine, then to cluster computing across many machines, and now back to many CPUs/cores on a single computer. MPI is a standard that can work across many different architectures.
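For flavour, a minimal MPI sketch in C (assuming an MPI installation; compile with mpicc and run with e.g. mpirun -np 4): each process sums its own stripe of the work and MPI_Reduce combines the pieces, the same way whether the ranks are cores in one box or nodes in a cluster:

    #include <mpi.h>
    #include <stdio.h>

    int main(int argc, char **argv)
    {
        MPI_Init(&argc, &argv);

        int rank, size;
        MPI_Comm_rank(MPI_COMM_WORLD, &rank);   /* this process's id */
        MPI_Comm_size(MPI_COMM_WORLD, &size);   /* how many processes */

        /* Each process sums its own stripe of 1..1000000. */
        long long local = 0, total = 0;
        for (long long i = rank + 1; i <= 1000000; i += size)
            local += i;

        /* Combine the partial sums on rank 0. */
        MPI_Reduce(&local, &total, 1, MPI_LONG_LONG, MPI_SUM, 0, MPI_COMM_WORLD);

        if (rank == 0)
            printf("total = %lld\n", total);

        MPI_Finalize();
        return 0;
    }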
Of course, one can program a parallel design using threads with language frameworks like OpenMP. I've heard of multicomponent GUIs/applications that rely on separate processing that could theoretically run anywhere. Practically, there's more of the former than the latter.
Probably the main distinction is when the program runs across multiple machines, where it's not practical to use multithreading, and existing applications that share memory will not work.
Parallel coding is the concept of executing multiple actions in parallel (at the same time).
Multi-threaded Programming on a Single Processor
Multi-threading on a single processor gives the illusion of running in parallel. Behind the scenes, the processor is switching between threads depending on how threads have been prioritized.
Multi-threaded Programming on Multiple Processors
Multi-threading on multiple processor cores is truly parallel. Each core runs its own thread. Consequently, there are multiple parallel, concurrent tasks happening at once.
The question is a bit confusing, as you can perform parallel operations in multiple threads, but not all multi-threaded applications use parallel computing.
In parallel code, you typically have many "workers" that consume a set of data and return results asynchronously. But multithreading is used in a broader scope, like GUIs, blocking I/O, and networking.
Being on a single CPU or many does not change much, as the management depends on how your OS handles threads and processes.
Multithreading will be useful everywhere; parallelism is not an everyday computing paradigm, so it might be a "niche" as a career prospect.
In some demos I saw of .NET 4.0, the parallel code changes seem easier than doing threads directly. There are new constructs for for-loops (e.g. Parallel.For) and other things to support parallel processing. So there is a difference.
I think in the future you will do both, but I think the Parallel support will be better and easier. You still need threads for background operations and other things.
The fact is that you cannot achieve "real" parallelism on a single-core CPU. There are several libraries (such as MPI, commonly used from C) that help a little bit in this area. But the concept of parallelism is not that widely used among developers working on popular solutions.
Multithreading is common these days thanks to the introduction of multiple cores on a single CPU; it's easy and almost transparent to implement in every language thanks to thread libs and thread-safe types, methods, classes and so on. This way you can simulate parallelism.
Anyway, if you're starting with this, start by reading about concurrency and threading topics. And of course, threads + parallelism work well together.
I'm not sure what you think "parallel coding" is, but as I understand it, it refers to producing code which is executed in parallel by the CPU, and therefore multithreaded code falls inside that description.
In that way, obviously you can use them interchangeably (as one falls inside the other).
Nonetheless I'll suggest you take it slowly and start learning from the basics. Understand WHY multithreading is becoming important, what the difference is between processes, threads and fibers, how you synchronize them and so on.
Keep in mind that parallel coding, as you call it, is quite complex, especially compared to sequential coding, so be prepared. Also don't just rush into it: using 3 threads instead of one won't necessarily make your program faster; it can even make it slower. You need to understand the hows and the whys. Not everything can be made parallel, and not everything that can, should.
In simple language:
multithreading is available from the CPU and OS by themselves, and
parallel programming is an explicit task, done either by the compiler or by constructs written by programmers (e.g. "#pragma" directives).

Resources