Example problems for concurrent computation - programming-languages

There's a plethora of paradigms and methods for concurrent programming in use today. Software transactional memory, actors, shared state concurrency, tuple spaces and many, many more.
What I find lacking, however, is a library of interesting test problems for concurrency. One well known example is the "Dining Philosophers Problem", which is neither a complex enough nor motivating nor realistic one. Then there are many parallel algorithms (matrix multiplication, rendering, general nested data parallelism) that just require distribution of work, but no real concurrency with communication between threads of execution.
So, can anyone point me to some interesting sets of problems that require real concurrency in an interactive, perhaps even distributed environment, that are simple enough to use as examples for concurrency paradigms? Ideally, I want to find a set of problems to serve as a "lackmus-test" for concurrency paradigms (or to highlight their differences, as every paradigm has its strengths and weaknesses).
Any help is much appreciated :)

I've previously considered this exact issue, having previously proposed some concurrent programing paradigms myself :p
The conclusion I reached then is that such a test set doesn't seem to really exist in a language-independent manner. While it might be helpful for it to exist, there seem to be some fairly good reasons it doesn't (to the best of my knowledge).
Most of the focus within concurrent programming tends to be on data parallelism, such that the same operation is applied in parallel to different pieces of the same data set. The kinds of task-level parallelism (i.e. different tasks being performed in parallel, possibly sharing data) that I think you're talking about is actually not done very much. I think this is because it's kinda hard. But I think it's also kinda hard because most problems don't lend themselves particularly well to this kind of concurrency. Describing a distributed system in terms of concurrency primitives may be helpful, but these systems tend to be decoupled such that there is a protocol (written or implied) moderating their communication. People tend not to think of these kinds of systems as obviously "concurrent" programming situations, even though they are when viewed within the right frame (i.e. considering the "client" and "server" as agents operating in parallel with synchronisation at some points).
The only places I think you could find some sources of inspiration would be within individual implementations. Erlang, Occam (and Occam-pi), Alice, CML, Concurrent Haskell etc all are likely to have small test corpuses, but both the problems and their implementations are going to be biased towards being implementable within a specific language (because they obviously are implementable within that language!). Perhaps you could also look to the communities working on multi-party session types, and various process calculi such as pi-calculus, CCS and CSP to see what kinds of systems they are using as example models. The idea of a standard language-agnostic set of problems for describing concurrent communicating systems is appealing, but somewhat elusive at this point, I think.

Related

How Haskell handles parallel computing on a multicore machine/cluster

I'm considering a new language to learn those days to be used in high performance computing on a cluster of computers we have, among those languages, I'm considering Haskell.
I have read some about Haskell, but still have questions about using Haskell in high performance and distributed computing, which the language is known for, but I read some debates about Haskell is not good for those type of systems due to laziness, I can summarize my questions in the following lines:
Haskell uses green threads, which is great for handling big number of concurrent connections, but what happens when one of tasks takes longer than average and blocks the rest, does the whole thread block (Node.js style), forward the next task to another processor/thread (Golang), use reductions technique (Erlang), which kicks the task out of processing context after a pre-determined number of ticks, or else?
In a distributed computing environment, what happens to lazily-evaluated functions, do they have to be forced strict?
If one function/module requires strict evaluation, but it depends on other lazy functions/modules, shall I modify the code of other functions/modules to make them strict as well, or the compiler will handle this to me and force everything in that chain to strict or lazy.
When processing a very large sequence of data, how does Haskell handle parallel processing, is it by following some kind of implicit map-reduce technique, or I have do it by myself.
Is there a clustering abstract in the language, that handles the computing power for me, that automatically forwards the next task to the free processor wherever it is, be it on the same computer or another computer in the same cluster.
How does Haskell ensure fair-share of work is evenly distributed to all the available cores on the same computer or on the available cluster.
GHC uses a pool of available work (called sparks) and a work-stealing system: when a thread runs out of work, it will look for work in the pool or on the work queues of other threads that it can steal.
There is no built-in support for distributed computing as there is in (say) Erlang. The semantics are whatever your implementation defines. There are existing implementations like Cloud Haskell that you can look at for examples.
Neither. Haskell will automatically do whatever work is necessary to provide a value that is demanded and no more.
Haskell (and GHC in particular) does not do anything to automatically parallelize evaluation because there is no known universal strategy for parallelizing that is strictly better than not parallelizing. See Why is there no implicit parallelism in Haskell? for more info.
No. See (2).
For the same machine, it uses the pool of sparks and the work-stealing system described above. There is no notion of "clustering".
For an overview of parallel and concurrent programming in Haskell, see the free book of the same name by Simon Marlow, a primary author of GHC's runtime system.
Multithreading
As far as SMP parallelism† is concerned, Haskell is very effective. It's not quite automatic, but the parallel library makes it really easy to parallelise just about anything. Because the sparks are so cheap, you can be pretty careless and just ask for lots of parallelism; the runtime will then know what to do.
Unlike in most other languages, it is not a big problem if you have highly branched data structures, tricky dynamic algorithms etc. – thanks to the purely functional paradigm, parallel Haskell never needs to worry about locks when accessed data is shared.
I think the biggest caveat is memory: GHC's garbage collector is not concurrent, and the functional style is rather allocation-happy.
Apart from that, it's possible to write programs that look like they're parallel, but really don't do any work at all but just start and immediately return because of laziness. Some testing and experience is still necessary. But laziness and parallelism are not incompatible; at least not if you make sure you have big enough “chunks” of strictness in it. And forcing something strict is largely trivial.
Simpler, common parallelism tasks (which could be expressed in a map-reduce manner, or the classic array-vector stuff – the ones which are also easy in many languages) can generally be handled even easier in Haskell with libraries that parallelise the data structures; the best-known of these is repa.
Distributed computing
There has been quite some work on Cloud Haskell, which is basically Erlang in library form. This kind of task is less straightforward: the idea of any explicit message sending is a bit against Haskell's grain, and many aspects of the workflow become more cumbersome if the language is so heavily focused on its strong static typing (which is in Haskell otherwise often a huge bonus that doesn't just improve safety and performance but also makes it easier to write).
I think it's not far off to use Haskell in a distributed concurrent manner, but we can't say it's mature in that role yet. For distributed concurrent tasks, Erlang itself is certainly the way to go.
Clusters
Honestly, Haskell won't help you much at all here. A cluster is of course in principle a special case of a distributed setup, so you could employ Cloud Haskell; but in practice the needs are very different. The HPC world today (and probably quite some time into the future) hinges on MPI, and though there is a bit of existing work on MPI bindings, I haven't found them usable, at least not just like that.
MPI is definitely also quite against Haskell's grain, what with it's FORTRAN-oriented array centrism, weird ways of handling types and so on. But unless you go nuts with Haskell's cool features (though often it is so tempting!) there is no reason you couldn't write typical number-crunching code also in Haskell. The only problem is again support/maturity, but it's a considerable problem; so for cluster computing I'd recommend C++, Python or Julia instead.
An interesting alternative is to generate MPI-parallelised C or C++ code from Haskell. Paraiso is one nice project that does this.
Pipe dreams
I have often though about what could be done to make the distributed computing feasible in idiomatic Haskell. In principle I believe laziness could be a big help there. The model I'd envision is to let all machines compute independently the same program, but make use of the fact that Haskell evaluation has generally no predetermined order. The order would be randomised on each machine. Also the runtime would track how long some computation branch took to complete, and how big the result is. If a result is deemed both expensive and compact enough to warrant it, it would then be broadcast to the other nodes, together with some suitable hash that would allow them to shortcut that computation.
Such a system would never be quite as efficient as a hand-optimised MPI application, but it could at least offer the same asymptotics in many cases. And it could handle vastly more complex algorithms with ease.
But again, that's totally just my vague hopes for the not-so-near future.
†You said concurrency (which isn't so much about computation as about interaction), but it seems your question is in essence about pure computations?

Distributed Haskell state of the art in 2011?

I've read a lot of articles about distributed Haskell. Much work has been done but seems to be in the area of distributing computations. I saw the remote package which seems to implement Erlang-style messaging passing but it is 0.1 and early stage.
I'd like to implement a system where there are many separate processes that provide distinct services, and are tied together by several main processes. This seems to be a natural fit for Erlang, but not so for Haskell. But I like Haskell's type safety.
Has there been any recent adoption of Erlang-style process management in Haskell?
If you want to learn more about the remote package, a.k.a CloudHaskell, see the paper as well as Jeff Epstein's thesis. It aims to provide precisely the actor abstraction you want, but as you say it is in the early stages. There is active discussion regarding improvements on the parallel-haskell mailing list, so if you have specific needs that remote doesn't provide, we'd be happy for you to jump in and help us decide its future directions.
More mature but lower-level than remote is the haskell-mpi package. If you stick to the Simple interface, messages can be sent containing arbitrary Serialize instances, but the abstraction is still way lower than remote.
There are some experimental systems, such as described in Implementing a High-level Distributed-Memory Parallel Haskell in Haskell (Patrick Maier and Phil Trinder, IFL 2011, can't find a pdf online). It blends a monad-par approach of deterministic dataflow parallelism with a limited ability to make the I-structures serializable over the network. These sorts of abstraction have promise for doing distributed computation, but since the focus is on computing purely-functional values rather than providing Erlang-style processes, they probably wouldn't be a good fit for your application.
Also, for completeness, I should point out the Haskell wiki page on cloud and HPC Haskell, which covers what I describe here, as well as the subsection on distributed Haskell, which seems in need of a refresh.
I frequently get the feeling that IPC and actors are an oversold feature. There are plenty of attractive messaging systems out there that have Haskell bindings e.g. MessagePack, 0MQ or Thrift. IMHO the only thing you have to add is proper addressing of processes and decide who/what is managing this addressing capability.
By the way: a number of coders adopt e.g. 0MQ into their Erlang environments, simply because it offers the possibility to structure messaging via message brokers rather then relying on pure process to process messaging in super scale.
In a "massively multicore world" I personally assume that shared memory approaches will eventually be outperforming messaging. Someone can then always come and argue with asynchrony of course. But already when you write that you want to "tie together" your processes by "several main processes" you in fact speak about synchronization. Also, you can of course challenge whether a single function, process or thread is the right level of parallelization.
In short: I would probably see whether MessagePack or 0MQ could fit my needs in Haskell and care for the rest in my code.

Threading paradigm?

Are there any paradigm that give you a different mindset or have a different take to writing multi thread applications? Perhaps something that feels vastly different like procedural programming to function programming.
Concurrency has many different models for different problems. The Wikipedia page for concurrency lists a few models and there's also a page for concurrency patterns which has some good starting point for different kinds of ways to approach concurrency.
The approach you take is very dependent on the problem at hand. Different models solve various different issues that can arise in concurrent applications, and some build on others.
In class I was taught that concurrency uses mutual exclusion and synchronization together to solve concurrency issues. Some solutions only require one, but with both you should be able to solve any concurrency issue.
For a vastly different concept you could look at immutability and concurrency. If all data is immutable then the conventional approaches to concurrency aren't even required. This article explores that topic.
I don't really understand the question, but if you start doing some coding using CUDA give you some different way of thinking about multi-threading applications.
It differs from general multi-threading technics, like Semaphores, Monitors, etc. because you have thousands of threads concurrently. So the problem of parallelism in CUDA resides more in partitioning your data and mixing the chunks of data later.
Just a small example of a complete rethinking of a common serial problem is the SCAN algorithm. It is as simple as:
Given a SET {a,b,c,d,e}
I want the following set:
{a, a+b, a+b+c, a+b+c+d, a+b+c+d+e}
Where the symbol '+' in this case is any Commutattive operator (not only plus, you can do multiplication also).
How to do this in parallel? It's a complete rethink of the problem, it is described in this paper.
Many more implementations of different algorithms in CUDA can be found in the NVIDIA website
Well, a very conservative paradigm shift is from thread-centric concurrency (share everything) towards process-centric concurrency (address-space separation). This way one can avoid unintended data sharing and it's easier to enforce a communication policy between different sub-systems.
This idea is old and was propagated (among others) by the Micro-Kernel OS community to build more reliable operating systems. Interestingly, the Singularity OS prototype by Microsoft Research shows that traditional address spaces are not even required when working with this model.
The relatively new idea I like best is transactional memory: avoid concurrency issues by making sure updates are always atomic.
Have a looksee at OpenMP for an interesting variation.

Sample Problems for Multithreading Practice

I'm about to tackle what I see as a hard problem, I think. I need to multi-thread a pipeline of producers and consumers.
So I want to start small. What are some practice problems, in varying levels of difficulty, that would be good for multi-threading practice? (And not contrived, impractical examples you see in books not dedicated to concurrency).
What books or references would you recommend that focus on concurrency and give in-depth problems and cases?
(I'd rather not focus on the problem I want to solve. I just want to ask for good references and sample problems. This would be more useful to other users. I'm not stuck on the problem.)
The little book of semaphores is a good free book. The author takes a unique approach of first asking a problem and then presenting hints before answering. The problems increase in difficulty level gradually, and the book isn't written for any language in particular but covers general multithreading concepts.
If you have enough time to invest I would recommend the book "Concurrency: State Models & Java Programs, 2nd Edition" by Jeff Magee and Jeff Kramer, John Wiley&Sons 2006
You can ignore the Java part if you are using some other language
There's a language used to model processes and concurrent processes called FSP. It needs some time and energy to be invested in order to be proficient in the language. There's a tool (LTSA, both are free and supported by an Eclipse plugin or stand alone app) which verifies your models and make you pretty shure that your model is correct from the standpoint of concurrent execution.
Translating this models to your language constructs is then just a question of programming technique and few design patterns.
Most text book problems, like readers-writers, producers-consumers or dinning philosophers are all illustrations of the mutex. I would prefer to model a prototype which is a simplistic approximation the bigger problem and go ahead.
I have some times seen situations where dead-lock avoidance is what is needed and dead-lock prevention measures are being used. It is always a good idea to analyse if Banker's algorithm would suit the case or not.
Completely ignoring your request, I'll suggest that you should look at SEDA (staged event driven architecture) as a way to think about setting up a multi-threaded pipeline of producers and consumers.
I'm not sure what you are looking for. But in real world enterprise situation, we usually use some kind of messaging framework when doing producers consumers stuff. Tipically in Java, that's JMS. And you can use the excellent Spring Framework to help you along.
If you're working with Java at all (and possibly even if you're not), you should definitely read Java Concurrency In Practice.
To be honest, many real-world multithreading programs are not doing much more than reading/writing some value (whether string or int) -- circular buffers (as a network connection might need), readers/writers of log files, etc.
In fact, I'd say that if you implement (or find) a solid (and generic) circular buffer, and then run all thread-to-thread communication through those buffers as the only contact point, that'll cover a very large portion of any multithread syncing you might need to do. (Unless you're working in a buzzword-compliant environment, and need to tack "enterprise", "messaging", or whatever onto the buzzword list... or you're writing a database or operating system.)
(Note that "circular buffer" is a fairly C-centric term, being rooted in the relatively direct manipulation of a block of memory. Python's Queue class implements the same basic principle in a list-centric way, and I'm sure that numerous other languages have conceptually similar constructs under slightly different names...)

switch to parallel coding

we all writing code for single processor.
i wonder when we all are able to write code on multi processors?
what do we need (software tools, logic, algorithms) for this switching?
edit: in my view, as we do many task parallely, same way we need to convert those real life solutions(algorithms) to computer lang. just as OOPs coding did for procedural coding. OOPs is more real life coding style than procedural one. so i hope for that kind of solutions.
I think the most important requirement is a good language that has native constructs that support parallelism or one that can automatically generate parallel code. There are quite a few languages that fit that description, but none of them is popular enough to really be considered for mainstream use. That, in turn is caused by several things:
By their very nature, these languages are very different from today's imperative languages, and are therefor harder to learn (or at least seem that way).
They often lack good tools and libraries, making them unusable for any "real" project.
Of course, if it were more popular more people would be willing to learn it and there would be more support, so it's a kind of cycle that's pretty hard to break out of. I guess all we can do is hope. :)
An example of a language designed with heavy parallelization in mind is Erlang - and it's actually used in commercial projects.
What we need are natural abstractions for highly-concurrent algorithms. Actors (think: Erlang) go a long way in this direction, but they aren't a one-size-fits-all solution. Some more specific abstractions like fork/join or map/reduce can be even easier to apply to common problems.
The trick with all of these concurrency abstractions is they require functional-style programming. Concurrency doesn't mesh well with shared mutable state. As they say, "Locks considered harmful". Since most developers come from a strictly imperative background, switching to a shared-nothing continuation passing approach is often extremely challenging.
Incidentally, with respect to concurrency abstractions, Clojure has some very interesting features in this direction. Not only does it have sort-of actors, but it also defines a transactional memory model (think: databases) along with a global, atomic references mechanism. These two features allow concurrent operations to share "mutable" state without ever having to worry about locking or race conditions.
In the end, it comes down to education. Much of the needed theoretical work into concurrency abstractions has already been done, we just need to accept it. Unfortunately, as Erlang and Haskell prove, sometimes the best ideas remain relegated to an extremely fringe demographic. Hopefully efforts like Scala and Clojure will succeed in bringing the more advanced abstractions into the mainstream by sneaking them onto an existing, well-supported platform (the JVM).
Unfortunately for massive concurrent programming - unless there is a breakthrough in compilers to help, we will be throwing out a lot of what we know about algorithms (I think Don Knuth even said that). Read about Erlang for a glimpse of this possible future.
There are several tools/languages that are popular or are gaining popularity. If you use FORTRAN, C, or C++, you can use OpenMP (not too hard to implement) or the Message Passing Interface (MPI) libraries (powerful and greatest speedup potential, but also complex and difficult). OpenMP uses preprocessor directives to mark areas that can be parallelized, especially loops. MPI uses messages that pass data back and forth between processes, and the greatest difficulty is keeping everything synchronized without hitting bottlenecks and keeping processes waiting. I would say MPI is definitely on the way out, however. It's become clear in the scientific/high-performance computing communities that the speedup is rarely worth the additional development time.
As for up and coming languages, check out Fortress. It's still being designed, but the goal is to create a language even easier for scientific computing than FORTRAN. Programs will be specified in a very high level mathematical syntax. Additionally, parallelism will be implicit; the programmer will have to work to do things in serial. Plus, it's being championed by Sun and is based on java, so it will be portable.
There is no simple answer, and in many ways even the complex answers are currently inadequate or incomplete. You'll get a better answer if you are more specific about the replies you want: pointers to dev libraries and tools, instructional materials, pointers to current research projects and issues in this area, or something else?
The most important requirement is to be able to split your problem into smaller problems that can be solved independently of each other. Once you've worked out how you're going to do that, everything else is easier to think about and further questions of implementation (e.g. "parts of my calculation depend on other parts - how do I wait for them to have finished?") become concrete, specific things you can research or ask here about.
for java you can now look to Parallel Java Library or DPJ(deterministic Parallel Java!)
It will offer you great help in extracting parallelism from codes!!

Resources