Related
What is overhead in term of parallel and concurrent programming (Haskell)?
However, even in a purely functional language, automatic parallelization is thwarted by an age-old problem: To make the program faster, we have to gain more from parallelism than we lose due to the overhead of adding it, and compile-time analysis cannot make good judgments in this area. An alternative approach is to use runtime profiling to find good candidates for parallelization and to feed this information back into the compiler. Even this, however, has not been terribly successful in practice.
(quoted from Simon Marlow's book Parallel and Concurrent Programming in Haskell)
What are some examples in Haskell?
In any system, a thread takes resources. You have to store the state of that thread somewhere. It takes time to create the thread and set it running. Now GHC uses lightweight "green threads", which are much less expensive than OS threads. But they still cost something.
If you were to (for example) spawn a new thread for every single add, subtract, multiply and divide... well, the work to spawn a new thread has to be at least several dozen machine instructions, whereas a trivial arithmetic operation is probably a single instruction. Queuing the work as sparks takes even less work than spawning a whole new thread, but even that isn't as cheap as just doing the operation on the current thread.
Basically the cost of the work you want to do in parallel has to exceed the cost of arranging to do it in parallel. (Whether that's launching an OS thread or a green thread or queuing a spark or whatever.) GHC has all sorts of stuff to lower the cost, but it's still not free.
You have to understand that threads are resources. They do not come for free. In other words: when you create a thread (independent of the language) then you have to make system calls, the OS has to create a thread instance, and so on. Threads have state - which changes over time; so some kind of thread management happens in the background.
And of course, when you end up with more threads than the underlying hardware can support - then the system will have to switch threads from time to time. Of course, that is not as expensive as switching full blown processes, but it still means that registers need to be saved (or restored), your hardware caches might be affected, and so on.
Many platforms promote asynchrony and parallelism as means for improving responsiveness. I understand the difference generally, but often find it difficult to articulate in my own mind, as well as for others.
I am a workaday programmer and use async & callbacks fairly often. Parallelism feels exotic.
But I feel like they are easily conflated, especially at the language design level. Would love a clear description of how they relate (or don't), and the classes of programs where each is best applied.
When you run something asynchronously it means it is non-blocking, you execute it without waiting for it to complete and carry on with other things. Parallelism means to run multiple things at the same time, in parallel. Parallelism works well when you can separate tasks into independent pieces of work.
Take for example rendering frames of a 3D animation. To render the animation takes a long time so if you were to launch that render from within your animation editing software you would make sure it was running asynchronously so it didn't lock up your UI and you could continue doing other things. Now, each frame of that animation can also be considered as an individual task. If we have multiple CPUs/Cores or multiple machines available, we can render multiple frames in parallel to speed up the overall workload.
I believe the main distinction is between concurrency and parallelism.
Async and Callbacks are generally a way (tool or mechanism) to express concurrency i.e. a set of entities possibly talking to each other and sharing resources.
In the case of async or callback communication is implicit while sharing of resources is optional (consider RMI where results are computed in a remote machine).
As correctly noted this is usually done with responsiveness in mind; to not wait for long latency events.
Parallel programming has usually throughput as the main objective while latency, i.e. the completion time for a single element, might be worse than a equivalent sequential program.
To better understand the distinction between concurrency and parallelism I am going to quote from Probabilistic models for concurrency of Daniele Varacca which is a good set of notes for theory of concurrency:
A model of computation is a model for concurrency when it is able to represent systems as composed of independent autonomous components, possibly communicating with each other. The notion of concurrency should not be confused with the notion of parallelism. Parallel computations usually involve a central control which distributes the work among several processors. In concurrency we stress the independence of the components, and the fact that they communicate with each other. Parallelism is like ancient Egypt, where the Pharaoh decides and the slaves work. Concurrency is like modern Italy, where everybody does what they want, and all use mobile phones.
In conclusion, parallel programming is somewhat a special case of concurrency where separate entities collaborate to obtain high performance and throughput (generally).
Async and Callbacks are just a mechanism that allows the programmer to express concurrency.
Consider that well-known parallel programming design patterns such as master/worker or map/reduce are implemented by frameworks that use such lower level mechanisms (async) to implement more complex centralized interactions.
This article explains it very well: http://urda.cc/blog/2010/10/04/asynchronous-versus-parallel-programming
It has this about asynchronous programming:
Asynchronous calls are used to prevent “blocking” within an application. [Such a] call will spin-off in an already existing thread (such as an I/O thread) and do its task when it can.
this about parallel programming:
In parallel programming you still break up work or tasks, but the key differences is that you spin up new threads for each chunk of work
and this in summary:
asynchronous calls will use threads already in use by the system and parallel programming requires the developer to break the work up, spinup, and teardown threads needed.
async: Do this by yourself somewhere else and notify me when you complete(callback). By the time i can continue to do my thing.
parallel: Hire as many guys(threads) as you wish and split the job to them to complete quicker and let me know(callback) when you complete. By the time i might continue to do my other stuff.
the main difference is parallelism mostly depends on hardware.
My basic understanding is:
Asynchonous programming solves the problem of waiting around for an expensive operation to complete before you can do anything else. If you can get other stuff done while you're waiting for the operation to complete then that's a good thing. Example: keeping a UI running while you go and retrieve more data from a web service.
Parallel programming is related but is more concerned with breaking a large task into smaller chunks that can be computed at the same time. The results of the smaller chunks can then be combined to produce the overall result. Example: ray-tracing where the colour of individual pixels is essentially independent.
It's probably more complicated than that, but I think that's the basic distinction.
I tend to think of the difference in these terms:
Asynchronous: Go away and do this task, when you're finished come back and tell me and bring the results. I'll be getting on with other things in the mean time.
Parallel: I want you to do this task. If it makes it easier, get some folks in to help. This is urgent though, so I'll wait here until you come back with the results. I can do nothing else until you come back.
Of course an asynchronous task might make use of parallelism, but the differentiation - to my mind at least - is whether you get on with other things while the operation is being carried out or if you stop everything completely until the results are in.
It is a question of order of execution.
If A is asynchronous with B, then I cannot predict beforehand when subparts of A will happen with respect to subparts of B.
If A is parallel with B, then things in A are happening at the same time as things in B. However, an order of execution may still be defined.
Perhaps the difficulty is that the word asynchronous is equivocal.
I execute an asynchronous task when I tell my butler to run to the store for more wine and cheese, and then forget about him and work on my novel until he knocks on the study door again. Parallelism is happening here, but the butler and I are engaged in fundamentally different tasks and of different social classes, so we don't apply that label here.
My team of maids is working in parallel when each of them is washing a different window.
My race car support team is asynchronously parallel in that each team works on a different tire and they don't need to communicate with each other or manage shared resources while they do their job.
My football (aka soccer) team does parallel work as each player independently processes information about the field and moves about on it, but they are not fully asynchronous because they must communicate and respond to the communication of others.
My marching band is also parallel as each player reads music and controls their instrument, but they are highly synchronous: they play and march in time to each other.
A cammed gatling gun could be considered parallel, but everything is 100% synchronous, so it is as though one process is moving forward.
Why Asynchronous ?
With today's application's growing more and more connected and also potentially
long running tasks or blocking operations such as Network I/O or Database Operations.So it's very important to hide the latency of these operations by starting them in background and returning back to the user interface quickly as possible. Here Asynchronous come in to the picture, Responsiveness.
Why parallel programming?
With today's data sets growing larger and computations growing more complex. So it's very important to reduce the execution time of these CPU-bound operations, in this case, by dividing the workload into chunks and then executing those chunks simultaneously. We can call this as "Parallel" .
Obviously it will give high Performance to our application.
Asynchronous
Let's say you are the point of contact for your client and you need to be responsive i.e. you need to share status, complexity of operation, resources required etc whenever asked. Now you have a time-consuming operation to be done and hence cannot take this up as you need to be responsive to the client 24/7. Hence, you delegate the time-consuming operation to someone else so that you can be responsive. This is asynchronous.
Parallel programming
Let's say you have a task to read, say, 100 lines from a text file, and reading one line takes 1 second. Hence, you'll require 100 seconds to read the text file. Now you're worried that the client must wait for 100 seconds for the operation to finish. Hence you create 9 more clones and make each of them read 10 lines from the text file. Now the time taken is only 10 seconds to read 100 lines. Hence you have better performance.
To sum up, asynchronous coding is done to achieve responsiveness and parallel programming is done for performance.
Asynchronous: Running a method or task in background, without blocking. May not necessorily run on a separate thread. Uses Context Switching / time scheduling.
Parallel Tasks: Each task runs parallally. Does not use context switching / time scheduling.
I came here fairly comfortable with the two concepts, but with something not clear to me about them.
After reading through some of the answers, I think I have a correct and helpful metaphor to describe the difference.
If you think of your individual lines of code as separate but ordered playing cards (stop me if I am explaining how old-school punch cards work), then for each separate procedure written, you will have a unique stack of cards (don't copy & paste!) and the difference between what normally goes on when run code normally and asynchronously depends on whether you care or not.
When you run the code, you hand the OS a set of single operations (that your compiler or interpreter broke your "higher" level code into) to be passed to the processor. With one processor, only one line of code can be executed at any one time. So, in order to accomplish the illusion of running multiple processes at the same time, the OS uses a technique in which it sends the processor only a few lines from a given process at a time, switching between all the processes according to how it sees fit. The result is multiple processes showing progress to the end user at what seems to be the same time.
For our metaphor, the relationship is that the OS always shuffles the cards before sending them to the processor. If your stack of cards doesn't depend on another stack, you don't notice that your stack stopped getting selected from while another stack became active. So if you don't care, it doesn't matter.
However, if you do care (e.g., there are multiple processes - or stacks of cards - that do depend on each other), then the OS's shuffling will screw up your results.
Writing asynchronous code requires handling the dependencies between the order of execution regardless of what that ordering ends up being. This is why constructs like "call-backs" are used. They say to the processor, "the next thing to do is tell the other stack what we did". By using such tools, you can be assured that the other stack gets notified before it allows the OS to run any more of its instructions. ("If called_back == false: send(no_operation)" - not sure if this is actually how it is implemented, but logically, I think it is consistent.)
For parallel processes, the difference is that you have two stacks that don't care about each other and two workers to process them. At the end of the day, you may need to combine the results from the two stacks, which would then be a matter of synchronicity but, for execution, you don't care again.
Not sure if this helps but, I always find multiple explanations helpful. Also, note that asynchronous execution is not constrained to an individual computer and its processors. Generally speaking, it deals with time, or (even more generally speaking) an order of events. So if you send dependent stack A to network node X and its coupled stack B to Y, the correct asynchronous code should be able to account for the situation as if it was running locally on your laptop.
Generally, there are only two ways you can do more than one thing each time. One is asynchronous, the other is parallel.
From the high level, like the popular server NGINX and famous Python library Tornado, they both fully utilize asynchronous paradigm which is Single thread server could simultaneously serve thousands of clients (some IOloop and callback). Using ECF(exception control follow) which could implement the asynchronous programming paradigm. so asynchronous sometimes doesn't really do thing simultaneous, but some io bound work, asynchronous could really promotes the performance.
The parallel paradigm always refers multi-threading, and multiprocessing. This can fully utilize multi-core processors, do things really simultaneously.
Summary of all above answers
parallel computing:
▪ solves throughput issue.
Concerned with breaking a large task into smaller chunks
▪ is machine related (multi machine/core/cpu/processor needed), eg: master slave, map reduce.
Parallel computations usually involve a central control which distributes the work among several processors
asynchronous:
▪ solves latency issue
ie, the problem of 'waiting around' for an expensive operation to complete before you can do anything else
▪ is thread related (multi thread needed)
Threading (using Thread, Runnable, Executor) is one fundamental way to perform asynchronous operations in Java
Normally it is said that multi threaded programs are non-deterministic, meaning that if it crashes it will be next to impossible to recreate the error that caused the condition. One doesn't ever really know what thread is going to run next, and when it will be preempted again.
Of course this has to do with the OS thread scheduling algorithm and the fact that one doesn't know what thread is going to be run next, and how long it will effectively run.
Program execution order also plays a role as well, etc...
But what if you had the algorithm used for thread scheduling and what if you could know when what thread is running, could a multi threaded program then become "deterministic", as in, you'll be able to reproduce a crash?
Knowing the algorithm will not actually allow you to predict what will happen when. All kinds of delays that happen in the execution of a program or thread are dependent on environmental conditions such as: available memory, swapping, incoming interrupts, other busy tasks, etc.
If you were to map your multi-threaded program to a sequential execution, and your threads in themselves behave deterministically, then your whole program could be deterministic and 'concurrency' issues could be made reproducible. Of course, at that point they would not be concurrency issues any more.
If you would like to learn more, http://en.wikipedia.org/wiki/Process_calculus is very interesting reading.
My opinion is: technically no (but mathematically yes). You can write deterministic threading algorithm, but it will be extremely hard to predict state of the application after some sensible amount of time that you can treat it is non-deterministic.
There are some tools (in development) that will try to create race-conditions in a somewhat predictable manner but this is about forward-looking testing, not about reconstructing a 'bug in the wild'.
CHESS is an example.
It would be possible to run a program on a virtual multi-threaded machine where the allocation of virtual cycles to each thread was done via some entirely deterministic process, possibly using a pseudo-random generator (which could be seeded with a constant before each program run). Another, possibly more interesting, possibility would be to have a virtual machine which would alternate between running threads in 'splatter' mode (where almost any variable they touch would have its value become 'unknown' to other threads) and 'cleanup' mode (where results of operations with known operands would be visible and known to other threads). I would expect the situation would probably be somewhat analogous to hardware simulation: if the output of every gate is regarded as "unknown" between its minimum and maximum propagation times, but the simulation works anyway, that's a good indication the design is robust, but there are many useful designs which could not be constructed to work in such simulations (the states would be essentially guaranteed to evolve into a valid combination, though one could not guarantee which one). Still, it might be an interesting avenue of exploration, since large parts of many programs could be written to work correctly even in a 'splatter mode' VM.
I don't think it is practicable. To enforce a specific thread interleaving we require to place locks on shared variables, forcing the threads to access them in a specific order. This would cause severe performance degradation.
Replaying concurrency bugs is usually handled by record&replay systems. Since the recording of such large amounts of information also degrades performance, the most recent systems do partial logging and later complete the thread interleavings using SMT solving. I believe that the most recent advance in this type of systems is Symbiosis (published in this year's PLDI conference). Tou can find open source implementations in this URL:
http://www.gsd.inesc-id.pt/~nmachado/software/Symbiosis_Tutorial.html
This is actually a valid requirement in many systems today which want to execute tasks parallelly but also want some determinism from time to time.
For example, a mobile company would want to process subscription events of multiple users parallelly but would want to execute events of a single user one at a time.
One solution is to of course write everything to get executed on a single thread. Another solution is deterministic threading. I have written a simple library in Java that can be used to achieve the behavior I have described in the above example. Take a look at this- https://github.com/mukulbansal93/deterministic-threading.
Now, having said that, the actual allocation of CPU to a thread or process is in the hands of the OS. So, it is possible that the threads get the CPU cycles in a different order every time you run the same program. So, you cannot achieve the determinism in the order the threads are allocated CPU cycles. However, by delegating tasks effectively amongst threads such that sequential tasks are assigned to a single thread, you can achieve determinism in overall task execution.
Also, to answer your question about the simulation of a crash. All modern CPU scheduling algorithms are free from starvation. So, each and every thread is bound to get guaranteed CPU cycles. Now, it is possible that your crash was a result of the execution of a certain sequence of threads on a single CPU. There is no way to rerun that same execution order or rather the same CPU cycle allocation order. However, the combination of modern CPU scheduling algorithms being starvation-free and Murphy's law will help you simulate the error if you run your code enough times.
PS, the definition of enough times is quite vague and depends on a lot of factors like execution cycles need by the entire program, number of threads, etc. Mathematically speaking, a crude way to calculate the probability of simulating the same error caused by the same execution sequence is on a single processor is-
1/Number of ways to execute all atomic operations of all defined threads
For instance, a program with 2 threads with 2 atomic instructions each can be allocated CPU cycles in 4 different ways on a single processor. So probability would be 1/4.
Lots of crashes in multithreaded programs have nothing to do with the multithreading itself (or the associated resource contention).
Normally it is said that multi threaded programs are non-deterministic, meaning that if it crashes it will be next to impossible to recreate the error that caused the condition.
I disagree with this entirely, sure multi-threaded programs are non-deterministic, but then so are single-threaded ones, considering user input, message pumps, mouse/keyboard handling, and many other factors. A multi-threaded program usually makes it more difficult to reproduce the error, but definitely not impossible. For whatever reasons, program execution is not completely random, there is some sort of repeatability (but not predictability), I can usually reproduce multi-threaded bugs rather quickly in my apps, but then I have lots of verbose logging in my apps, for the end users' actions.
As an aside, if you are getting crashes, can't you also get crash logs, with call stack info? That will greatly aid in the debugging process.
I don’t want to make this subjective...
If I/O and other input/output-related bottlenecks are not of concern, then do we need to write multithreaded code? Theoretically the single threaded code will fare better since it will get all the CPU cycles. Right?
Would JavaScript or ActionScript have fared any better, had they been multithreaded?
I am just trying to understand the real need for multithreading.
I don't know if you have payed any attention to trends in hardware lately (last 5 years) but we are heading to a multicore world.
A general wake-up call was this "The free lunch is over" article.
On a dual core PC, a single-threaded app will only get half the CPU cycles. And CPUs are not getting faster anymore, that part of Moores law has died.
In the words of Herb Sutter The free lunch is over, i.e. the future performance path for computing will be in terms of more cores not higher clockspeeds. The thing is that adding more cores typically does not scale the performance of software that is not multithreaded, and even then it depends entirely on the correct use of multithreaded programming techniques, hence multithreading is a big deal.
Another obvious reason is maintaining a responsive GUI, when e.g. a click of a button initiates substantial computations, or I/O operations that may take a while, as you point out yourself.
The primary reason I use multithreading these days is to keep the UI responsive while the program does something time-consuming. Sure, it's not high-tech, but it keeps the users happy :-)
Most CPUs these days are multi-core. Put simply, that means they have several processors on the same chip.
If you only have a single thread, you can only use one of the cores - the other cores will either idle or be used for other tasks that are running. If you have multiple threads, each can run on its own core. You can divide your problem into X parts, and, assuming each part can run indepedently, you can finish the calculations in close to 1/Xth of the time it would normally take.
By definition, the fastest algorithm running in parallel will spend at least as much CPU time as the fastest sequential algorithm - that is, parallelizing does not decrease the amount of work required - but the work is distributed across several independent units, leading to a decrease in the real-time spent solving the problem. That means the user doesn't have to wait as long for the answer, and they can move on quicker.
10 years ago, when multi-core was unheard of, then it's true: you'd gain nothing if we disregard I/O delays, because there was only one unit to do the execution. However, the race to increase clock speeds has stopped; and we're instead looking at multi-core to increase the amount of computing power available. With companies like Intel looking at 80-core CPUs, it becomes more and more important that you look at parallelization to reduce the time solving a problem - if you only have a single thread, you can only use that one core, and the other 79 cores will be doing something else instead of helping you finish sooner.
Much of the multithreading is done just to make the programming model easier when doing blocking operations while maintaining concurrency in the program - sometimes languages/libraries/apis give you little other choice, or alternatives makes the programming model too hard and error prone.
Other than that the main benefit of multi threading is to take advantage of multiple CPUs/cores - one thread can only run at one processor/core at a time.
No. You can't continue to gain the new CPU cycles, because they exist on a different core and the core that your single-threaded app exists on is not going to get any faster. A multi-threaded app, on the other hand, will benefit from another core. Well-written parallel code can go up to about 95% faster- on a dual core, which is all the new CPUs in the last five years. That's double that again for a quad core. So while your single-threaded app isn't getting any more cycles than it did five years ago, my quad-threaded app has four times as many and is vastly outstripping yours in terms of response time and performance.
Your question would be valid had we only had single cores. The things is though, we mostly have multicore CPU's these days. If you have a quadcore and write a single threaded program, you will have three cores which is not used by your program.
So actually you will have at most 25% of the CPU cycles and not 100%. Since the technology today is to add more cores and less clockspeed, threading will be more and more crucial for performance.
That's kind of like asking whether a screwdriver is necessary if I only need to drive this nail. Multithreading is another tool in your toolbox to be used in situations that can benefit from it. It isn't necessarily appropriate in every programming situation.
Here are some answers:
You write "If input/output related problems are not bottlenecks...". That's a big "if". Many programs do have issues like that, remembering that networking issues are included in "IO", and in those cases multithreading is clearly worthwhile. If you are writing one of those rare apps that does no IO and no communication then multithreading might not be an issue
"The single threaded code will get all the CPU cycles". Not necessarily. A multi-threaded code might well get more cycles than a single threaded app. These days an app is hardly ever the only app running on a system.
Multithreading allows you to take advantage of multicore systems, which are becoming almost universal these days.
Multithreading allows you to keep a GUI responsive while some action is taking place. Even if you don't want two user-initiated actions to be taking place simultaneously you might want the GUI to be able to repaint and respond to other events while a calculation is taking place.
So in short, yes there are applications that don't need multithreading, but they are fairly rare and becoming rarer.
First, modern processors have multiple cores, so a single thraed will never get all the CPU cycles.
On a dualcore system, a single thread will utilize only half the CPU. On a 8-core CPU, it'll use only 1/8th.
So from a plain performance point of view, you need multiple threads to utilize the CPU.
Beyond that, some tasks are also easier to express using multithreading.
Some tasks are conceptually independent, and so it is more natural to code them as separate threads running in parallel, than to write a singlethreaded application which interleaves the two tasks and switches between them as necessary.
For example, you typically want the GUI of your application to stay responsive, even if pressing a button starts some CPU-heavy work process that might go for several minutes. In that time, you still want the GUI to work. The natural way to express this is to put the two tasks in separate threads.
Most of the answers here make the conclusion multicore => multithreading look inevitable. However, there is another way of utilizing multiple processors - multi-processing. On Linux especially, where, AFAIK, threads are implemented as just processes perhaps with some restrictions, and processes are cheap as opposed to Windows, there are good reasons to avoid multithreading. So, there are software architecture issues here that should not be neglected.
Of course, if the concurrent lines of execution (either threads or processes) need to operate on the common data, threads have an advantage. But this is also the main reason for headache with threads. Can such program be designed such that the pieces are as much autonomous and independent as possible, so we can use processes? Again, a software architecture issue.
I'd speculate that multi-threading today is what memory management was in the days of C:
it's quite hard to do it right, and quite easy to mess up.
thread-safety bugs, same as memory leaks, are nasty and hard to find
Finally, you may find this article interesting (follow this first link on the page). I admit that I've read only the abstract, though.
I've been investigating different scheduling algorithms for a thread pool I am implementing. Due to the nature of the problem I am solving I can assume that the tasks being run in parallel are independent and do not spawn any new tasks. The tasks can be of varying sizes.
I went immediately for the most popular scheduling algorithm "work stealing" using lock-free deques for the local job queues, and I am relatively happy with this approach. However I'm wondering whether there are any common cases where work-stealing is not the best approach.
For this particular problem I have a good estimate of the size of each individual task. Work-stealing does not make use of this information and I'm wondering if there is any scheduler which will give better load-balancing than work-stealing with this information (obviously with the same efficiency).
NB. This question ties up with a previous question.
I'd distribute the tasks upfront. With the information of their estimated running time you can distribute them into individual queues, for each thread one.
Distributing the tasks is basically the knapsack problem, each queue should take the same amount of time.
You should add some logic to modify the queues while they run. For example a re-distribution should occur after the estimated running time differs by a certain amount from the real running time.
It is true that work-stealing scheduler does not use that information, but it is because it does not depend on it to provide the theoretical limits it does (for example, the memory it uses, the expected total communication among workers and also the expected time to execute a fully strict computation as you can read here: http://supertech.csail.mit.edu/papers/steal.pdf)
One interesting paper (that I hope you can access: http://dl.acm.org/citation.cfm?id=2442538) actually uses bounded execution times to provide formal proofs (that try to be as close to the original work-stealing bounds as possible).
And yes, there are cases in which work-stealing does not perform optimally (for example, unbalanced tree searches and other particular cases). But for those cases, optimizations have been made (for example by allowing the steal of half of the victim's deque, instead of taking only one task: http://dl.acm.org/citation.cfm?id=571876).