Non-deterministic programming languages - multithreading

I know in Prolog you can do something like
someFunction(List) :-
someOtherFunction(X, List)
doSomethingWith(X)
% and so on
This will not iterate over every element in List; instead, it will branch off into different "machines" (by using multiple threads, backtracking on a single thread, creating parallel universes or what have you), with a separate execution for every possible value of X that causes someOtherFunction(X, List) to return true!
(I have no idea how it does this, but that's not important to the question)
My question is: What other non-deterministic programming languages are out there? It seems like non-determinism is the simplest and most logical way to implement multi-threading in a language with immutable variables, but I've never seen this done before - Why isn't this technique more popular?

Prolog is actually deterministic—the order of evaluation is prescribed, and order matters.
Why isn't nondeterminism more popular?
Nondeterminism is unpopular because it makes it harder to reason about the outcomes of your programs, and truly nondeterministic executions (as opposed to semantics) are hard to implement.
The only nondeterministic languages I'm aware of are
Dijkstra's calculus of guarded commands, which he wanted never to be implemented
Concurrent ML, in which communications may be synchronized nondeterministically
Gerard Holzmann's Promela language, which is the language of the model checker SPIN
SPIN does actually use the nondeterminism and explores the entire state space when it can.
And of course any multithreaded language behaves nondeterministically if the threads are not synchronized, but that's exactly the sort of thing that's difficult to reason about—and why it's so hard to implement efficient, correct lock-free data structures.
Incidentally, if you are looking to achieve parallelism, you can achieve the same thing by a simple map function in a pure functional language like Haskell. There's a reason Google MapReduce is based on functional languages.

The Wikipedia article points to Amb which is a Scheme-derivative with capacities for non-deterministic programming.
As far as I understand, the main reason why programming languages do not do that is because running a non-deterministic program on a deterministic machine (as are all existing computers) is inherently expensive. Basically, a non-deterministic Turing machine can solve complex problems in polynomial time, for which no polynomial algorithm for a deterministic Turing machine is known. In other words, non-deterministic programming fails to capture the essence of algorithmics in the context of existing computers.
The same problem impacts Prolog. Any efficient, or at least not-awfully-inefficient Prolog application must use the "cut" operator to avoid exploring an exponential number of paths. That operator works only as long as the programmer has a good mental view of how the Prolog interpreter will explore the possible paths, in a deterministic and very procedural way. Things which are very procedural do not mix well with functional programming, since the latter is mostly an effort of not thinking procedurally at all.
As a side note, in between deterministic and non-deterministic Turing machines, there is the "quantum computing" model. A quantum computer, assuming that one exists, does not do everything that a non-deterministic Turing machine can do, but it can do more than a deterministic Turing machine. There are people who are currently designing programming languages for the quantum computer (assuming that a quantum computer will ultimately be built). Some of those new languages are functional. You may find a host of useful links on this Wikipedia page. Apparently, designing a quantum programming language, functional or not, and using it, is not easy and certainly not "simple".

One example of a non-deterministic language is Occam, based on CSP theory. The combination of the PAR and ALT constructs can give rise to non-deterministic behaviour in multiprocessor systems, implementing fine grain parallel programs.
When using soft channels, i.e. channels between processes on the same processor, the implementation of ALT will make the behaviour close to deterministic†, but as soon as you start using hard channels (physical off-processor communication links) any illusion of determinism vanishes. Different remote processors are not expected to be synchronised in any way and they may not even have the same core or clock speed.
†The ALT construct is often implemented with a PRI ALT, so you have to explicitly code in fairness if you need it to be fair.
Non-determinism is seen as a disadvantage when it comes to reasoning about and proving programs correct, but in many ways once you've accepted it, you are freed from many of the constraints that determinism forces on your reasoning.
As long as the sequencing of communication doesn't lead to deadlock, which can be done by applying CSP techniques, then the precise order in which things are done should matter much less than whether you get the results that you want in time.
It was arguably this lack of determinism which was a major factor in preventing the adoption of Occam and Transputer systems in military projects, dominated by Ada at the time, where knowing precisely what a CPU was doing at every clock cycle was considered essential to proving a system correct. Without this constraint, Occam and the Transputer systems it ran on (the only CPUs at the time with a formally proven IEEE floating point implementation) would have been a perfect fit for hard real-time military systems needing high levels of processing functionality in a small space.

In Prolog you can have both non-determinism and concurrency. Non-determinism is what you described in your question concerning the example code. You can imagine that a Prolog clause is full of implicit amb statements. It is less known that concurrency is also supported by logic-programming.
History says:
The first concurrent logic programming language was the Relational
Language of Clark and Gregory, which was an offshoot of IC-Prolog.
Later versions of concurrent logic programming include Shapiro's
Concurrent Prolog and Ueda's Guarded Horn Clause language GHC.
https://en.wikipedia.org/wiki/Concurrent_logic_programming
But today we might just go with treads inside logic programming. Here is an example to implement a findall via threads. This can also be modded to perform all kinds of tasks on the collection, or maybe even produce agent networks towards distributed artificial intelligence.

I believe Haskell has the capability to construct and non-deterministic machine. Haskell at first may seem too difficult and abstract for practical use, but it's actually very powerful.

There is a programming language for non-deterministic problems which is called as "control network programming". If you want more information go to http://controlnetworkprogramming.com. This site is still in progress but you can read some info about it.

Java 2K
Note: Before you click the link and being disappointed: This is an esoteric language and has nothing to do with parallelism.

The Sly programming language under development at IBM Research is an attempt to include the non-determinism inherent in multi-threaded execution in the execution of certain types of algorithms. Looks to be very much a work in progress though.

Related

How Haskell handles parallel computing on a multicore machine/cluster

I'm considering a new language to learn those days to be used in high performance computing on a cluster of computers we have, among those languages, I'm considering Haskell.
I have read some about Haskell, but still have questions about using Haskell in high performance and distributed computing, which the language is known for, but I read some debates about Haskell is not good for those type of systems due to laziness, I can summarize my questions in the following lines:
Haskell uses green threads, which is great for handling big number of concurrent connections, but what happens when one of tasks takes longer than average and blocks the rest, does the whole thread block (Node.js style), forward the next task to another processor/thread (Golang), use reductions technique (Erlang), which kicks the task out of processing context after a pre-determined number of ticks, or else?
In a distributed computing environment, what happens to lazily-evaluated functions, do they have to be forced strict?
If one function/module requires strict evaluation, but it depends on other lazy functions/modules, shall I modify the code of other functions/modules to make them strict as well, or the compiler will handle this to me and force everything in that chain to strict or lazy.
When processing a very large sequence of data, how does Haskell handle parallel processing, is it by following some kind of implicit map-reduce technique, or I have do it by myself.
Is there a clustering abstract in the language, that handles the computing power for me, that automatically forwards the next task to the free processor wherever it is, be it on the same computer or another computer in the same cluster.
How does Haskell ensure fair-share of work is evenly distributed to all the available cores on the same computer or on the available cluster.
GHC uses a pool of available work (called sparks) and a work-stealing system: when a thread runs out of work, it will look for work in the pool or on the work queues of other threads that it can steal.
There is no built-in support for distributed computing as there is in (say) Erlang. The semantics are whatever your implementation defines. There are existing implementations like Cloud Haskell that you can look at for examples.
Neither. Haskell will automatically do whatever work is necessary to provide a value that is demanded and no more.
Haskell (and GHC in particular) does not do anything to automatically parallelize evaluation because there is no known universal strategy for parallelizing that is strictly better than not parallelizing. See Why is there no implicit parallelism in Haskell? for more info.
No. See (2).
For the same machine, it uses the pool of sparks and the work-stealing system described above. There is no notion of "clustering".
For an overview of parallel and concurrent programming in Haskell, see the free book of the same name by Simon Marlow, a primary author of GHC's runtime system.
Multithreading
As far as SMP parallelism† is concerned, Haskell is very effective. It's not quite automatic, but the parallel library makes it really easy to parallelise just about anything. Because the sparks are so cheap, you can be pretty careless and just ask for lots of parallelism; the runtime will then know what to do.
Unlike in most other languages, it is not a big problem if you have highly branched data structures, tricky dynamic algorithms etc. – thanks to the purely functional paradigm, parallel Haskell never needs to worry about locks when accessed data is shared.
I think the biggest caveat is memory: GHC's garbage collector is not concurrent, and the functional style is rather allocation-happy.
Apart from that, it's possible to write programs that look like they're parallel, but really don't do any work at all but just start and immediately return because of laziness. Some testing and experience is still necessary. But laziness and parallelism are not incompatible; at least not if you make sure you have big enough “chunks” of strictness in it. And forcing something strict is largely trivial.
Simpler, common parallelism tasks (which could be expressed in a map-reduce manner, or the classic array-vector stuff – the ones which are also easy in many languages) can generally be handled even easier in Haskell with libraries that parallelise the data structures; the best-known of these is repa.
Distributed computing
There has been quite some work on Cloud Haskell, which is basically Erlang in library form. This kind of task is less straightforward: the idea of any explicit message sending is a bit against Haskell's grain, and many aspects of the workflow become more cumbersome if the language is so heavily focused on its strong static typing (which is in Haskell otherwise often a huge bonus that doesn't just improve safety and performance but also makes it easier to write).
I think it's not far off to use Haskell in a distributed concurrent manner, but we can't say it's mature in that role yet. For distributed concurrent tasks, Erlang itself is certainly the way to go.
Clusters
Honestly, Haskell won't help you much at all here. A cluster is of course in principle a special case of a distributed setup, so you could employ Cloud Haskell; but in practice the needs are very different. The HPC world today (and probably quite some time into the future) hinges on MPI, and though there is a bit of existing work on MPI bindings, I haven't found them usable, at least not just like that.
MPI is definitely also quite against Haskell's grain, what with it's FORTRAN-oriented array centrism, weird ways of handling types and so on. But unless you go nuts with Haskell's cool features (though often it is so tempting!) there is no reason you couldn't write typical number-crunching code also in Haskell. The only problem is again support/maturity, but it's a considerable problem; so for cluster computing I'd recommend C++, Python or Julia instead.
An interesting alternative is to generate MPI-parallelised C or C++ code from Haskell. Paraiso is one nice project that does this.
Pipe dreams
I have often though about what could be done to make the distributed computing feasible in idiomatic Haskell. In principle I believe laziness could be a big help there. The model I'd envision is to let all machines compute independently the same program, but make use of the fact that Haskell evaluation has generally no predetermined order. The order would be randomised on each machine. Also the runtime would track how long some computation branch took to complete, and how big the result is. If a result is deemed both expensive and compact enough to warrant it, it would then be broadcast to the other nodes, together with some suitable hash that would allow them to shortcut that computation.
Such a system would never be quite as efficient as a hand-optimised MPI application, but it could at least offer the same asymptotics in many cases. And it could handle vastly more complex algorithms with ease.
But again, that's totally just my vague hopes for the not-so-near future.
†You said concurrency (which isn't so much about computation as about interaction), but it seems your question is in essence about pure computations?

Is natural language Turing complete?

I'm pretty sure a human language (e.g. English) is powerful enough to simulate a Turing machine, which would make it Turing complete. However, that would imply natural languages are no more or less expressive than programming languages, which seems questionable.
Is natural language Turing complete?
First of all "Is language X Turing complete" is only a well-defined question given a well-defined semantics for language X. It is nearly impossible to define one for natural languages due to natural languages' complex nature and reliance on context and intuition. Most (all?) natural languages don't even have a well-defined syntax.
That aside, your main confusion is based on the assumption that it's not possible for a computational model to be strictly more powerful than a Turing machine, i.e. be able to simulate a Turing machine, but also to express computations that a Turing machine can not. This is not true. For example we can extend Turing machines with oracles and we get a computational model that's strictly more powerful than plain Turing machines.
In the same vein we could define a programming language MagicLang that can do everything an ordinary programming language can do plus solve the halting problem. Defining a semantics for such a language is easy: just take the semantics of the language we used as a basis and add a function bool halts(string src, string input) with the semantics "returns true if the program described by the source code src successfully terminates after a finite amount of time when given the input input". So that's easy. What's hard, or rather impossible, is implementing this language.
Now one may argue that natural language can also describe the halting problem and our brain can "execute" natural language, i.e. it can answer the question "does this program halt". So if we could build a computer that could do everything our brain can do, it should be able to do this as well. But the thing is our brain can't solve the halting problem with 100% accuracy. Our brain can't even execute regular programs with 100% accuracy. Just remember how often you've stepped through a program in your head and came up with a different result than reality. Our brain is very good at learning, making intuitive connections and applying heuristics, but those things always come with the risk of giving the wrong result.
So could a computer do the same thing? Yes, we can use heuristics and machine learning to approach otherwise unsolvable problems and with that normal programming languages can attempt to solve every problem that can be described in natural language (even the undecidable ones). But just like the brain, those programs will sometimes give wrong results. In fact they will give wrong results much more often as our machine learning algorithms and heuristics aren't nearly as advanced as those of the human brain.
If a software language is sufficiently complex that it can be used to define arbitrary extensions to itself (such as defining arbitrary new functions), then it's clearly Turing-complete.
Using natural language I can, given sufficient time, teach another human terminology and concepts to extend their understanding and ability to discuss arbitrary subjects that they previously couldn't -- I could teach them copyright law, or astrophysics, for example (if they didn't already know them). So, while this may be more of an analogy than an exact identity, there does seem to be a Turing-completeness-like property to natural languages: they can be used to define and transmit arbitrary extensions to themselves. (Admittedly, not every human is really cut out to learn astrophysics -- but then any non-idealized Turning machine has only some finite amount of memory, so it's always possible to define a program that it can't run because it doesn't have enough memory.)

What does "powerful" mean, when discussing programming languages?

In the context of programming language discussion/comparison, what does the term "power" mean?
Does it have a well defined meaning? Even a poorly defined meaning?
Say if someone says "language X is more powerful than language Y" or asks the same as a question, what do they mean - or what information are they trying to find out?
It does not have a well-defined meaning. In these types of discussions, "language X is more powerful than language Y" usually means little more than "I like language X more than language Y." On the other end of the spectrum, you'll also usually have someone chime in about how any Turing-complete language can accomplish the same tasks as any other Turing-complete language, so that neither is strictly more powerful than the other.
I think a good meaning for it is expressivity. When a language is highly expressive, it means less code is required to express concepts. To me, this doesn't just mean that you have to write less code to accomplish the same tasks, but also that the code is easily readable by humans. Of course, generally (to a point), having fewer lines of code to read makes the task of reading and understanding easier for humans.
Having a "powerful" standard library comes into play here along the same lines. If a language comes equipped with thorough, complete libraries, then idiomatic code in that language will be able to benefit from the existing library code and not have to repeat or reinvent common functionality in application code. The end result is, again, having to write and read less code to accomplish the same tasks.
I keep saying "generally" and "to a point", because once a language gets too terse, it gets more difficult for humans to decipher. I suppose at this extreme, a language may still be considered "more powerful" (or even "too powerful"). So I guess I'm saying my personal interpretation of "powerful" includes some aspects of "useful" and "readable" in it as well.
C is powerful, because it is low level and gives you access to hardware. Python is powerful because you can prototype quickly. Lisp is powerful because its REPL gives you fantastic debugging opportunities. SQL is powerful because you say what you want and the DMBS will figure out the best way to do it for you. Haskell is powerful because each function can be tested in isolation. C++ is powerful because it has ten times the number of syntactic constructs that any one person ever needs or uses. APL is powerful since it can squeeze a ten-screen program into ten characters. Hell, COBOL is powerful because... why else would all the banks be using it? :)
"Powerful" has no real technical meaning, but lots of people have made proposals.
A couple of the more interesting ones:
Paul Graham wants to call a language "more powerful" if you can write the same programs in fewer lines of code (or some other sane, sensible measure of program size).
Matthias Felleisen has written a very serious theoretical study called On the Expressive Power of Programming Language.
As someone who knows and uses many programming languages, I believe that there are real differences between languages, and that "power" can be a convenient shorthand to describe ways in which one language might be better than another. Nevertheless, whenever I hear a discussion or claim that one language is more powerful than another, I tend to keep one hand firmly on my wallet.
The only meaningful way to describe "power" in a programming language is "can do what I require with the least amount of resources" where "resources" is defined as "whatever costs I'd rather not pay" and could, thus, be development time, CPU time, memory space, money, etc.
So basically the definition of "power" is purely subjective and rendered meaningless in any objective discussion.
Powerful means "high in power". "Power" is something that increases your ability to do things. "Things" vary in shape, size and other things. Loosely speaking therefore, "powerful" when applied to a programming language means that it helps you to do perform your tasks quickly and efficiently.
This makes "powerful" somewhat well defined but not constant across domains. A language powerful in one domain might be crippling in another eg. C is very powerful if you want to do systems level programming since it gives you direct access to the machine and hardware and structures that let you code much faster than you would in assembly. C compilers also produce tight code that runs fast. However, once you move to web applications, C can become very "unpowerful" and crippling since it's so much effort to get something up and running and you have to worry about a lot of extraneous details like memory etc.
Sometimes, languages are "powerful" in multiple domains. This gives them a general "powerful" tag (or badge since were are on SO here). PG's claim is that with LISP, this is the case. That might be true or might not be.
At the end of the day, "powerful" is a loaded word so you should evaluate who is saying it, why he's saying it and what it means to to your work.
There are really only two meanings people are worried about:
"Powerful" in the sense of "takes less resources (time, money, programmers, LOC, etc.) to achieve the same/better result", and "powerful" in the sense of "is capable of doing a wide range of tasks".
Some languages are extrememly resource-effective for a small range of tasks. Others are not so resource-effective but can be applied to a wide range of tasks (e.g. C, which is often used in OS development, creation of compilers and runtime libraries, and work with microcontrollers).
Which of these two meanings someone has in mind when they use the term "powerful" depends on the context (and even then is not always clear). Indeed often it is a bit of both.
Typically there are two distinct meanings:
Expressive, meaning the code tends to be very short and understandable
Low level, meaning you have very fine-grained control over the hardware.
For the most languages, these two definitions are at opposite ends of the spectrum: Python is very expressive but not very low level; C is very low level but not very expressive. Depending on which definition you pick, either language is powerful or not powerful.
nothing absolutely nothing.
To high level programmers it might mean alot of available datatypes built in. Or maybe abstractions to easily create or follow Design Patterns.
Paul Graham is a very high level guy here is what he has to say:
http://www.paulgraham.com/avg.html
Java guys might tell you something about portability, the power to reach every platform.
C/UNIX programmers may tell you that its speed and efficiency, complete control over every inch of memory.
VHDL/Verilog programmers will tell you its complete control over every clock and gate so as to not waste any electricity or time.
But in my opinion a "powerful language" supports all of the features for you to complete your task. Documentation may be important, or perhaps it is portability, or the ability to do graphics. It could be anything, writing a gui from Assembly is just stupid, so is trying to design an embedded processor in flash.
Choosing a language that suits your needs perfectly will always feel like power.
I view the term as marketing fluff, no one well-defined meaning.
If you consider, say, Assembler, C, and C++. On occasions one drops from C++ "down" to C for particualr needs, and in turn from C down to assembler. So that make assembler the most powerful because it's the only language that can do everything. Or, to argue the other way, a single line of C++ code can replace several of C (hiding polymorphic dispatch via function pointers for example) and a single line of C replaces many of assembler. So C++ is more powerful because one line does "more".
I think the term had some currency when products such as early databases and spreadsheets had in-built languages, some quite restricted. So vendors would tout their language as being "powerful" because it was less restricted.
It can have several meanings. In the very basic sense there's power as far as what is computable. In that sense the most powerful languages are Turing Complete which includes pretty much every general purpose programming language (as opposed to most markup languages and domain specific languages which are often not Turing complete).
In a more pragmatic sense it often refers to how concisely (and readably) you can do certain things. Basically how easy is it to do certain tasks in one language compared to another.
What language is more powerful (besides being somewhat subjective) depends heavily on what you're trying to do. If your requirements are to get something running on a small device with 64k of memory you're likely not going to be using Java. Most likely the right language would be C or C++ (or if you're really hard core assembly). If you need a very simple CRUD app done in 1 day, maybe something like Ruby On Rails would be the way to go (I know Rails is a framework and Ruby is the language, but these days what libraries and frameworks are available factor greatly into picking a language)
I think that, perhaps coincidentally, the physics definition of power is relevant here: "The rate at which work is performed."
Of course, a toaster does not perform very quickly the work of putting out fires. Similarly, the power of a programming language is not universal, but specific to the domain or task to which it is being applied. C is a powerful language for writing device drivers or implementations of higher-level languages; Python is a powerful language for writing general-purpose applications; XPath is a powerful language for writing queries on structured data sets.
So given a problem domain, the power of a language can be said to be the rate at which a competent programmer is able to use it to solve problems in that domain.
A precise answer can be tried to reach, by not assuming that the elements that define "powerful" (in the context of languages) come from so many dimensions.
See how many could be, and a lot will be missing:
runtime speed
code size
expressiveness
supported paradigms
development / debugging time
domain specialization
standard libs
codebase
toolchain ecosystem
portability
community
support / documentation
popularity
(add more here)
These and more parameters draw together X picture of how "programming in some language" would be like at X level. That will be only the definition, though, the only real knowledge comes with the actual practice of using the language, but i digress.
The question comes down to which parameter will represent the intrinsic quality of a language. If you refer to a language in itself, its ultimate, intrinsic purpose is "express things", and thus the most representative parameter is rightfully expressiveness, and is also one that resonates frequently when someone talks about how powerful a language is.
At the moment you try to widen the question/answer to cover more than the expressiveness of the language "as a language, as a tongue", you are more talking about different kinds of "environment", social environment, development environment, commercial environment, etc.
Depending of the complexity of the environment to be defined you'll have to mix more parameters that come from multiple, vast, overlapping and sometimes contradictory dimensions, and eventually the point of getting the definition will be lost or the question will have to be narrowed.
This approximation still won't answer "what is an expressive language", but, again, a common understanding are the definitions that Vineet well points out in its answer, and Forest remarks in the comments. I agree, for me "expression" is "conveying meaning".
I remember many instructors in college calling whatever language they were teaching "powerful".
Leads me to think:
Powerful = a relative term comparing the latest way to code something vs. the original or previous way.
I find it useless to use the word "powerful" in regards to discussing anything software related. Every time my professor in college would introduce a new concept such as polymorphism he would say "so this is a really powerful feature". After a while I got annoyed. If everything is powerful then nothing is. It's all the same. You can write code to do anything. Does is really matter how much code is required to do it? You can say it's short or efficient but powerful is just useless. Nuclear energy is powerful. Code is words.
I think that power would normally refer to how quickly it can process data, for example I found that in python as soon as a list exceeds a length of approx. 2000 it becomes unbearably slow whereas in C++ a list can easily contain 20,000 entries without doing so.

Why are functional languages considered a boon for multi threaded environments?

I hear a lot about functional languages, and how they scale well because there is no state around a function; and therefore that function can be massively parallelized.
However, this makes little sense to me because almost all real-world practical programs need/have state to take care of. I also find it interesting that most major scaling libraries, i.e. MapReduce, are typically written in imperative languages like C or C++.
I'd like to hear from the functional camp where this hype I'm hearing is coming from..
It's important to add one word: "there's no shared state".
Any meaningful program (in any language) changes the state of the world. But (some) functional languages make it impossible to access the same resource from multiple threads simultaneously. The absence of shared state makes multithreading safe.
Functional languages such as Haskell, Scheme and others have what are called "pure functions". A pure function is a function with no side effects. It doesn't modify any other state in the program. This is by definition threadsafe.
Of course you can write pure functions in imperative languages. You also find multi-paradigm languages like Python, Ruby and even C# where you can do imperative programming, functional programming or both.
But the point of Haskell (etc) is that you can't write a non-pure function. Well that's not strictly true but it's mostly true.
Similarly, many imperative languages have immutable objects for much the same reason. An immutable object is one whose state doesn't change once created. Again by definition an immutable object is threadsafe.
You're talking about two different things and don't realize it.
Yes, most real-world programs have state somewhere, but if you want to do multithreading, that state should not be everywhere, and in fact, the fewer places it's in, the better. In functional programs, the default is not to have state, and you can introduce state exactly where you need it and nowhere else. Those parts that are dealing with state will not be as easily multithreaded, but since all the rest of your program is free of side-effects and thus it doesn't matter what order those parts are executed in, it removes a huge barrier to parallelization.
However, this makes little sense to me because almost all real-world
practical programs need/have state to take care of.
You'd be surprised! Yes, all programs need some state (I/O in particular) but often you don't need much more. Just because most programs have heaps of state doesn't mean they need it.
Programming in a functional language encourages you to use less state, and thus your programs become easier to parallelise.
Many functional languages are "impure" which means they allow some state. Haskell doesn't, but Haskell has monads which basically let you get something from nothing: you get state using stateless constructs. Monads are a bit fiddly to work with which is why Haskell gives you a strong incentive to restrict state to as small a part of your program as possible.
I also find it interesting that most major scaling libraries, i.e.
MapReduce, are typically written in imperative languages like C or C++.
Programming concurrent applications is "hard" in C/C++. That's why it's best to do all the dangerous stuff in a library which is heavily tested and inspected. But you still get the flexibility and performance of C/C++.
Higher order functions. Consider a simple reduction operation, summing the elements of an array. In an imperative language, programmers typically write themselves a loop and perform reductions one element at a time.
But that code isn't easy to make multi-threaded. When you write a loop you're assuming an order of operations and you have to spell out how to get from one element to the next. You'd really like to just say "sum the array" and have the compiler, or runtime, or whatever, make the decision about how to work through the array, dividing up the task as necessary between multiple cores, and combining those results together. So instead of writing a loop, with some addition code embedded inside it, an alternative is to pass something representing "addition" into a function that can do the divvying. As soon as you do that, you're writing functionally. You're passing a function (addition) into another function (the reducer). If you write this way then it not only makes more readable code, but when you change architecture, or want to write for heterogeneous architecture, you don't have to change the summer, just the reducer. In practice you might have many different algorithms that all share one reducer so this is a big payoff.
This is just a simple example. You may want to build on this. Functions to apply other functions on 2D arrays, functions to apply functions to tree structures, functions to combine functions to apply functions (eg. if you have a hierarchical structure with trees above and arrays below) and so on.

How difficult is Haskell multi-threading?

I have heard that in Haskell, creating a multi-threaded application is as easy as taking a standard Haskell application and compiling it with the -threaded flag. Other cases, however, have described the use of a par command within the actual source code.
What is the state of Haskell multi-threading? How easy is it to introduce into programs? Is there a good multi-threading tutorial that goes over these different commands and their uses?
What is the state of Haskell multi-threading?
Mature. The implementation is around 15 years old, with transactional memory for 5 years. GHC is a widely used compiler, with large open source support, and commercial backing.
How easy is it to introduce into programs?
This depends on the algorithm. Sometimes it can be a one line use of par to gain parallelism. Sometimes new algorithms must be developed. In general it will be easier to introduce safe parallelism and concurrency in Haskell, than in typical languages, and performance is good.
Is there a good multi-threading tutorial that goes over these different commands and their uses?
There are 3 major parallel and concurrent programming models in Haskell.
implicit parallelism via par
explicit concurrency and parallelism via forkIO / MVars and software transactional memory
data parallelism via the DPH libraries
These are the main things. In all cases you compile with -threaded to use the multicore runtime, but how easy it is to parallelise a particular problem depends on the algorithm you use, and the parallel programming model you adopt from that list.
Here is an introduction to the main parallel programming models in Haskell, and how to achieve speedups.
I think Chapter 24 of Real World Haskell is a good tutorial.
There is also concurrency term.
Without any changes in code your haskell rts will try to use them for some internal process, but to use in your application you should give a hint that's done by par b (f a b) which forces Haskell to be not so lazy on caculation of b even if f will not require it for result.
One of the reason to not do that for every function which require its all its arguments (like a+b) is that synchronization (scheduling calculations and waiting for results) gives some overhead and you probably don't want to spend extra-ticks for (2*3)+(3*4) just because you can calculate multiplications in parallel. And you probably will loose some cache-hits or something like this or optimizations which is done when you do that on single processor (i.e. you'll need to pass result from one processor to another anyway).
Of course code which is uses par is ugly and when you fold list or some other data structures with light sub-elements you'll probably want to calculate some chunks of that light elements to make sure that overhead/calc will be really small. To resolve that you can look at parallel.
There is also Data Parallel Haskell (DPH).
If your program is more about IO monad than you definitely need many changes. See forkIO, Software Transactional Memory (STM) and many others from Concurrency category

Resources