Non-Trivial Lazy Evaluation

Non-Trivial Lazy Evaluation - haskell

I'm currently digesting the nice presentation Why learn Haskell? by Keegan McAllister. There he uses the snippet
minimum = head . sort
as an illustration of Haskell's lazy evaluation by stating that minimum has time-complexity O(n) in Haskell. However, I think the example is kind of academic in nature. I'm therefore asking for a more practical example where it's not trivially apparent that most of the intermediate calculations are thrown away.

Have you ever written an AI? Isn't it annoying that you have to thread pruning information (e.g. maximum depth, the minimum cost of an adjacent branch, or other such information) through the tree traversal function? This means you have to write a new tree traversal every time you want to improve your AI. That's dumb. With lazy evaluation, this is no longer a problem: write your tree traversal function once, to produce a huge (maybe even infinite!) game tree, and let your consumer decide how much of it to consume.
Writing a GUI that shows lots of information? Want it to run fast anyway? In other languages, you might have to write code that renders only the visible scenes. In Haskell, you can write code that renders the whole scene, and then later choose which pixels to observe. Similarly, rendering a complicated scene? Why not compute an infinite sequence of scenes at various detail levels, and pick the most appropriate one as the program runs?
You write an expensive function, and decide to memoize it for speed. In other languages, this requires building a data structure that tracks which inputs for the function you know the answer to, and updating the structure as you see new inputs. Remember to make it thread safe -- if we really need speed, we need parallelism, too! In Haskell, you build an infinite data structure, with an entry for each possible input, and evaluate the parts of the data structure that correspond to the inputs you care about. Thread safety comes for free with purity.
Here's one that's perhaps a bit more prosaic than the previous ones. Have you ever found a time when && and || weren't the only things you wanted to be short-circuiting? I sure have! For example, I love the <|> function for combining Maybe values: it takes the first one of its arguments that actually has a value. So Just 3 <|> Nothing = Just 3; Nothing <|> Just 7 = Just 7; and Nothing <|> Nothing = Nothing. Moreover, it's short-circuiting: if it turns out that its first argument is a Just, it won't bother doing the computation required to figure out what its second argument is.
And <|> isn't built in to the language; it's tacked on by a library. That is: laziness allows you to write brand new short-circuiting forms. (Indeed, in Haskell, even the short-circuiting behavior of (&&) and (||) aren't built-in compiler magic: they arise naturally from the semantics of the language plus their definitions in the standard libraries.)
In general, the common theme here is that you can separate the production of values from the determination of which values are interesting to look at. This makes things more composable, because the choice of what is interesting to look at need not be known by the producer.

Here's a well-known example I posted to another thread yesterday. Hamming numbers are numbers that don't have any prime factors larger than 5. I.e. they have the form 2^i*3^j*5^k. The first 20 of them are:
[1,2,3,4,5,6,8,9,10,12,15,16,18,20,24,25,27,30,32,36]
The 500000th one is:
1962938367679548095642112423564462631020433036610484123229980468750
The program that printed the 500000th one (after a brief moment of computation) is:
merge xxs#(x:xs) yys#(y:ys) =
case (x`compare`y) of
LT -> x:merge xs yys
EQ -> x:merge xs ys
GT -> y:merge xxs ys
hamming = 1 : m 2 `merge` m 3 `merge` m 5
where
m k = map (k *) hamming
main = print (hamming !! 499999)
Computing that number with reasonable speed in a non-lazy language takes quite a bit more code and head-scratching. There are a lot of examples here

Consider generating and consuming the first n elements of an infinite sequence. Without lazy evaluation, the naive encoding would run forever in the generation step, and never consume anything. With lazy evaluation, only as many elements are generated as the code tries to consume.

Related

how lazy is haskell when dealing with complicated program flow

I was wondering how smart/lazy Haskell is. Can I always be sure that Haskell will only do what is necessary to generate a certain output?

No.
Haskell specifies a denotational semantics for its core lambda-like calculus, and you can rely on that semantics. Additionally, there is a metatheory proof that a particular reduction order -- known colloquially as "lazy evaluation" -- realizes that semantics; and so many people use that as their mental model of how Haskell programs behave.
There are two broad categories of ways that a Haskell program may end up evaluating more than necessary:
The Haskell implementation may choose to evaluate more. GHC uses lazy evaluation in most places, but I believe it will use other evaluation orders for efficiency in some cases. You could also look at the Eager Haskell project, which is attempting to use another implementation strategy. In principle, an implementation would be within its rights to choose to speculatively fork some computations to another thread (and then throw away the results if they weren't needed). And so on and so forth -- there are many possible variations on this theme.
The denotational semantics specified may demand more evaluation than "necessary". For example, one that occasionally trips up beginners:
primes :: [Int]
primes = 2 : filter prime [3,5..]
prime :: Int -> Bool
prime x = and [x `mod` p /= 0 | p <- primes, p < x]
When checking whether 3 should be in the list primes, it is in fact not necessary to check any of the elements of primes past 2, because the sequence is strictly monotonically increasing. But Haskell is not (does not try to be) smart enough to notice that; it will go straight on trying to check the rest of the primes and end up in an infinite loop instead of giving the list of primes.
An even smaller example: you could think that x && False is always False, but x will typically be evaluated anyway, because the semantics says this should be an infinite loop if x is. (Contrast False && x, which typically does not result in evaluating x.)
That said, when you say "complex structure", one thing that comes to mind is: does Haskell do the laziness thing even with custom data types that I define? That is, do complex structures like hash maps and balanced trees and k-d trees and so forth get treated lazily? The answer there is yes, at least for GHC; there is nothing fundamentally special about any of the types in the Prelude except IO. Lists, booleans, Maybes, and so forth are lazy not because the compiler knows special things about them, but simply as a consequence of the lazy evaluation reduction rules specified by Haskell and their declarations as types.
Of course, there are ways to opt-out of laziness. Some of the structures you will see on Hackage do that in various ways; but don't worry, usually this will be declared in their documentation.

Is there a way to "preserve" results in Haskell?

I'm new to Haskell and understand that it is (basically) a pure functional language, which has the advantage that results to functions will not change across multiple evaluations. Given this, I'm puzzled by why I can't easily mark a function in such a way that its remembers the results of its first evaluation, and does not have to be evaluated again each time its value is required.
In Mathematica, for example, there is a simple idiom for accomplishing this:
f[x_]:=f[x]= ...
but in Haskell, the closest things I've found is something like
f' = (map f [0 ..] !!)
where f 0 = ...
f n = f' ...
which in addition to being far less clear (and apparently limited to Int arguments?) does not (seem to) preserve results within an interactive session.
Admittedly (and clearly), I don't understand exactly what's going on here; but naively, it seems like Haskel should have some way, at the function definition level, of
taking advantage of the fact that its functions are functions and skipping re-computation of their results once they have been computed, and
indicating a desire to do this at the function definition level with a simple and clean idiom.
Is there a way to accomplish this in Haskell that I'm missing? I understand (sort of) that Haskell can't store the evaluations as "state", but why can't it simply (in effect) redefine evaluated functions to be their computed value?
This grows out of this question, in which lack of this feature results in terrible performance.

Use a suitable library, such as MemoTrie.
import Data.MemoTrie
f' = memo f
where f 0 = ...
f n = f' ...
That's hardly less nice than the Mathematica version, is it?
Regarding
“why can't it simply (in effect) redefine evaluated functions to be their computed value?”
Well, it's not so easy in general. These values have to be stored somewhere. Even for an Int-valued function, you can't just allocate an array with all possible values – it wouldn't fit in memory. The list solution only works because Haskell is lazy and therefore allows infinite lists, but that's not particularly satisfying since lookup is O(n).
For other types it's simply hopeless – you'd need to somehow diagonalise an over-countably infinite domain.
You need some cleverer organisation. I don't know how Mathematica does this, but it probably uses a lot of “proprietary magic”. I wouldn't be so sure that it does really work the way you'd like, for any inputs.
Haskell fortunately has type classes, and these allow you to express exactly what a type needs in order to be quickly memoisable. HasTrie is such a class.

Benefit of avoiding multiple list traversals

I've seen many examples in functional languages about processing a list and constructing a function to do something with its elements after receiving some additional value (usually not present at the time the function was generated), such as:
Calculating the difference between each element and the average
(the last 2 examples under "Lazy Evaluation")
Staging a list append in strict functional languages such as ML/OCaml, to avoid traversing the first list more than once
(the section titled "Staging")
Comparing a list to another with foldr (i.e. generating a function to compare another list to the first)
listEq a b = foldr comb null a b
where comb x frec [] = False
comb x frec (e:es) = x == e && frec es
cmp1To10 = listEq [1..10]
In all these examples, the authors generally remark the benefit of traversing the original list only once. But I can't keep myself from thinking "sure, instead of traversing a list of N elements, you are traversing a chain of N evaluations, so what?". I know there must be some benefit to it, could someone explain it please?
Edit: Thanks to both for the answers. Unfortunately, that's not what I wanted to know. I'll try to clarify my question, so it's not confused with the (more common) one about creating intermediate lists (which I already read about in various places). Also thanks for correcting my post formatting.
I'm interested in the cases where you construct a function to be applied to a list, where you don't yet have the necessary value to evaluate the result (be it a list or not). Then you can't avoid generating references to each list element (even if the list structure is not referenced anymore). And you have the same memory accesses as before, but you don't have to deconstruct the list (pattern matching).
For example, see the "staging" chapter in the mentioned ML book. I've tried it in ML and Racket, more specifically the staged version of "append" which traverses the first list and returns a function to insert the second list at the tail, without traversing the first list many times. Surprisingly for me, it was much faster even considering it still had to copy the list structure as the last pointer was different on each case.
The following is a variant of map which after applied to a list, it should be faster when changing the function. As Haskell is not strict, I would have to force the evaluation of listMap [1..100000] in cachedList (or maybe not, as after the first application it should still be in memory).
listMap = foldr comb (const [])
where comb x rest = \f -> f x : rest f
cachedList = listMap [1..100000]
doubles = cachedList (2*)
squares = cachedList (\x -> x*x)
-- print doubles and squares
-- ...
I know in Haskell it doesn't make a difference (please correct me if I'm wrong) using comb x rest f = ... vs comb x rest = \f -> ..., but I chose this version to emphasize the idea.
Update: after some simple tests, I couldn't find any difference in execution times in Haskell. The question then is only about strict languages such as Scheme (at least the Racket implementation, where I tested it) and ML.

Executing a few extra arithmetic instructions in your loop body is cheaper than executing a few extra memory fetches, basically.
Traversals mean doing lots of memory access, so the less you do, the better. Fusion of traversals reduces memory traffic, and increases the straight line compute load, so you get better performance.
Concretely, consider this program to compute some math on a list:
go :: [Int] -> [Int]
go = map (+2) . map (^3)
Clearly, we design it with two traversals of the list. Between the first and the second traversal, a result is stored in an intermediate data structure. However, it is a lazy structure, so only costs O(1) memory.
Now, the Haskell compiler immediately fuses the two loops into:
go = map ((+2) . (^3))
Why is that? After all, both are O(n) complexity, right?
The difference is in the constant factors.
Considering this abstraction: for each step of the first pipeline we do:
i <- read memory -- cost M
j = i ^ 3 -- cost A
write memory j -- cost M
k <- read memory -- cost M
l = k + 2 -- cost A
write memory l -- cost M
so we pay 4 memory accesses, and 2 arithmetic operations.
For the fused result we have:
i <- read memory -- cost M
j = (i ^ 3) + 2 -- cost 2A
write memory j -- cost M
where A and M are the constant factors for doing math on the ALU and memory access.
There are other constant factors as well (two loop branches) instead of one.
So unless memory access is free (it is not, by a long shot) then the second version is always faster.
Note that compilers that operate on immutable sequences can implement array fusion, the transformation that does this for you. GHC is such a compiler.

There is another very important reason. If you traverse a list only once, and you have no other reference to it, the GC can release the memory claimed by the list elements as you traverse them. Moreover, if the list is generated lazily, you always have only a constant memory consumption. For example
import Data.List
main = do
let xs = [1..10000000]
sum = foldl' (+) 0 xs
len = foldl' (\_ -> (+ 1)) 0 xs
print (sum / len)
computes sum, but needs to keep the reference to xs and the memory it occupies cannot be released, because it is needed to compute len later. (Or vice versa.) So the program consumes a considerable amount of memory, the larger xs the more memory it needs.
However, if we traverse the list only once, it is created lazily and the elements can be GC immediately, so no matter how big the list is, the program takes only O(1) memory.
{-# LANGUAGE BangPatterns #-}
import Data.List
main = do
let xs = [1..10000000]
(sum, len) = foldl' (\(!s,!l) x -> (s + x, l + 1)) (0, 0) xs
print (sum / len)

Sorry in advance for a chatty-style answer.
That's probably obvious, but if we're talking about the performance, you should always verify hypotheses by measuring.
A couple of years ago I was thinking about the operational semantics of GHC, the STG machine. And I asked myself the same question — surely the famous "one-traversal" algorithms are not that great? It only looks like one traversal on the surface, but under the hood you also have this chain-of-thunks structure which is usually quite similar to the original list.
I wrote a few versions (varying in strictness) of the famous RepMin problem — given a tree filled with numbers, generate the tree of the same shape, but replace every number with the minimum of all the numbers. If my memory is right (remember — always verify stuff yourself!), the naive two-traversal algorithm performed much faster than various clever one-traversal algorithms.
I also shared my observations with Simon Marlow (we were both at an FP summer school during that time), and he said that they use this approach in GHC. But not to improve performance, as you might have thought. Instead, he said, for a big AST (such as Haskell's one) writing down all the constructors takes much space (in terms of lines of code), and so they just reduce the amount of code by writing down just one (syntactic) traversal.
Personally I avoid this trick because if you make a mistake, you get a loop which is a very unpleasant thing to debug.

So the answer to your question is, partial compilation. Done ahead of time, it makes it so that there's no need to traverse the list to get to the individual elements - all the references are found in advance and stored inside the pre-compiled function.
As to your concern about the need for that function to be traversed too, it would be true in interpreted languages. But compilation eliminates this problem.
In the presence of laziness this coding trick may lead to the opposite results. Having full equations, e.g. Haskell GHC compiler is able to perform all kinds of optimizations, which essentially eliminate the lists completely and turn the code into an equivalent of loops. This happens when we compile the code with e.g. -O2 switch.
Writing out the partial equations may prevent this compiler optimization and force the actual creation of functions - with drastic slowdown of the resulting code. I tried your cachedList code and saw a 0.01s execution time turn into 0.20s (don't remember right now the exact test I did).

ST Monad == code smell?

I'm working on implementing the UCT algorithm in Haskell, which requires a fair amount of data juggling. Without getting into too much detail, it's a simulation algorithm where, at each "step," a leaf node in the search tree is selected based on some statistical properties, a new child node is constructed at that leaf, and the stats corresponding to the new leaf and all of its ancestors are updated.
Given all that juggling, I'm not really sharp enough to figure out how to make the whole search tree a nice immutable data structure à la Okasaki. Instead, I've been playing around with the ST monad a bit, creating structures composed of mutable STRefs. A contrived example (unrelated to UCT):
import Control.Monad
import Control.Monad.ST
import Data.STRef
data STRefPair s a b = STRefPair { left :: STRef s a, right :: STRef s b }
mkStRefPair :: a -> b -> ST s (STRefPair s a b)
mkStRefPair a b = do
a' <- newSTRef a
b' <- newSTRef b
return $ STRefPair a' b'
derp :: (Num a, Num b) => STRefPair s a b -> ST s ()
derp p = do
modifySTRef (left p) (\x -> x + 1)
modifySTRef (right p) (\x -> x - 1)
herp :: (Num a, Num b) => (a, b)
herp = runST $ do
p <- mkStRefPair 0 0
replicateM_ 10 $ derp p
a <- readSTRef $ left p
b <- readSTRef $ right p
return (a, b)
main = print herp -- should print (10, -10)
Obviously this particular example would be much easier to write without using ST, but hopefully it's clear where I'm going with this... if I were to apply this sort of style to my UCT use case, is that wrong-headed?
Somebody asked a similar question here a couple years back, but I think my question is a bit different... I have no problem using monads to encapsulate mutable state when appropriate, but it's that "when appropriate" clause that gets me. I'm worried that I'm reverting to an object-oriented mindset prematurely, where I have a bunch of objects with getters and setters. Not exactly idiomatic Haskell...
On the other hand, if it is a reasonable coding style for some set of problems, I guess my question becomes: are there any well-known ways to keep this kind of code readable and maintainable? I'm sort of grossed out by all the explicit reads and writes, and especially grossed out by having to translate from my STRef-based structures inside the ST monad to isomorphic but immutable structures outside.

I don't use ST much, but sometimes it is just the best solution. This can be in many scenarios:
There are already well-known, efficient ways to solve a problem. Quicksort is a perfect example of this. It is known for its speed and in-place behavior, which cannot be imitated by pure code very well.
You need rigid time and space bounds. Especially with lazy evaluation (and Haskell doesn't even specify whether there is lazy evaluation, just that it is non-strict), the behavior of your programs can be very unpredictable. Whether there is a memory leak could depend on whether a certain optimization is enabled. This is very different from imperative code, which has a fixed set of variables (usually) and defined evaluation order.
You've got a deadline. Although the pure style is almost always better practice and cleaner code, if you are used to writing imperatively and need the code soon, starting imperative and moving to functional later is a perfectly reasonable choice.
When I do use ST (and other monads), I try to follow these general guidelines:
Use Applicative style often. This makes the code easier to read and, if you do switch to an immutable version, much easier to convert. Not only that, but Applicative style is much more compact.
Don't just use ST. If you program only in ST, the result will be no better than a huge C program, possibly worse because of the explicit reads and writes. Instead, intersperse pure Haskell code where it applies. I often find myself using things like STRef s (Map k [v]). The map itself is being mutated, but much of the heavy lifting is done purely.
Don't remake libraries if you don't have to. A lot of code written for IO can be cleanly, and fairly mechanically, converted to ST. Replacing all the IORefs with STRefs and IOs with STs in Data.HashTable was much easier than writing a hand-coded hash table implementation would have been, and probably faster too.
One last note - if you are having trouble with the explicit reads and writes, there are ways around it.

Algorithms which make use of mutation and algorithms which do not are different algorithms. Sometimes there is a strightforward bounds-preserving translation from the former to the latter, sometimes a difficult one, and sometimes only one which does not preserve complexity bounds.
A skim of the paper reveals to me that I don't think it makes essential use of mutation -- and so I think a potentially really nifty lazy functional algorithm could be developed. But it would be a different but related algorithm to that described.
Below, I describe one such approach -- not necessarily the best or most clever, but pretty straightforward:
Here's the setup a I understand it -- A) a branching tree is constructed B) payoffs are then pushed back from the leafs to the root which then indicates the best choice at any given step. But this is expensive, so instead, only portions of the tree are explored to the leafs in a nondeterministic manner. Furthermore, each further exploration of the tree is determined by what's been learned in previous explorations.
So we build code to describe the "stage-wise" tree. Then, we have another data structure to define a partially explored tree along with partial reward estimates. We then have a function of randseed -> ptree -> ptree that given a random seed and a partially explored tree, embarks on one further exploration of the tree, updating the ptree structure as we go. Then, we can just iterate this function over an empty seed ptree to get a list of increasingly more sampled spaces in the ptree. We then can walk this list until some specified cutoff condition is met.
So now we've gone from one algorithm where everything is blended together to three distinct steps -- 1) building the whole state tree, lazily, 2) updating some partial exploration with some sampling of a structure and 3) deciding when we've gathered enough samples.

It's can be really difficult to tell when using ST is appropriate. I would suggest you do it with ST and without ST (not necessarily in that order). Keep the non-ST version simple; using ST should be seen as an optimization, and you don't want to do that until you know you need it.

I have to admit that I cannot read the Haskell code. But if you use ST for mutating the tree, then you can probably replace this with an immutable tree without losing much because:
Same complexity for mutable and immutable tree
You have to mutate every node above the new leaf. An immutable tree has to replace all nodes above the modified node. So in both cases the touched nodes are the same, thus you don't gain anything in complexity.
For e.g. Java object creation is more expensive than mutation, so maybe you can gain a bit here in Haskell by using mutation. But this I don't know for sure. But a small gain does not buy you much because of the next point.
Updating the tree is presumably not the bottleneck
The evaluation of the new leaf will probably be much more expensive than updating the tree. At least this is the case for UCT in computer Go.

Use of the ST monad is usually (but not always) as an optimization. For any optimization, I apply the same procedure:
Write the code without it,
Profile and identify bottlenecks,
Incrementally rewrite the bottlenecks and test for improvements/regressions,
The other use case I know of is as an alternative to the state monad. The key difference being that with the state monad the type of all of the data stored is specified in a top-down way, whereas with the ST monad it is specified bottom-up. There are cases where this is useful.

What's the fuss about Haskell? [closed]

As it currently stands, this question is not a good fit for our Q&A format. We expect answers to be supported by facts, references, or expertise, but this question will likely solicit debate, arguments, polling, or extended discussion. If you feel that this question can be improved and possibly reopened, visit the help center for guidance.
Closed 11 years ago.
I know a few programmers who keep talking about Haskell when they are among themselves, and here on SO everyone seems to love that language. Being good at Haskell seems somewhat like the hallmark of a genius programmer.
Can someone give a few Haskell examples that show why it is so elegant / superior?

This is the example that convinced me to learn Haskell (and boy am I glad I did).
-- program to copy a file --
import System.Environment
main = do
--read command-line arguments
[file1, file2] <- getArgs
--copy file contents
str <- readFile file1
writeFile file2 str
OK, it's a short, readable program. In that sense it's better than a C program. But how is this so different from (say) a Python program with a very similar structure?
The answer is lazy evaluation. In most languages (even some functional ones), a program structured like the one above would result in the entire file being loaded into memory, and then written out again under a new name.
Haskell is "lazy". It doesn't calculate things until it needs to, and by extension doesn't calculate things it never needs. For instance, if you were to remove the writeFile line, Haskell wouldn't bother reading anything from the file in the first place.
As it is, Haskell realises that the writeFile depends on the readFile, and so is able to optimise this data path.
While the results are compiler-dependent, what will typically happen when you run the above program is this: the program reads a block (say 8KB) of the first file, then writes it to the second file, then reads another block from the first file, and writes it to the second file, and so on. (Try running strace on it!)
... which looks a lot like what the efficient C implementation of a file copy would do.
So, Haskell lets you write compact, readable programs - often without sacrificing a lot of performance.
Another thing I must add is that Haskell simply makes it difficult to write buggy programs. The amazing type system, lack of side-effects, and of course the compactness of Haskell code reduces bugs for at least three reasons:
Better program design. Reduced complexity leads to fewer logic errors.
Compact code. Fewer lines for bugs to exist on.
Compile errors. Lots of bugs just aren't valid Haskell.
Haskell isn't for everyone. But everyone should give it a try.

The way it was pitched to me, and what I think is true after having worked on learning on Haskell for a month now, is the fact that functional programming twists your brain in interesting ways: it forces you to think about familiar problems in different ways: instead of loops, think in maps and folds and filters, etc. In general, if you have more than one perspective on a problem, it makes you better enabled to reason about this problem, and switch viewpoints as necessary.
The other really neat thing about Haskell is its type system. It's strictly typed, but the type inference engine makes it feel like a Python program that magically tells you when you've done a stupid type-related mistake. Haskell's error messages in this regard are somewhat lacking, but as you get more acquainted with the language you'll say to yourself: this is what typing is supposed to be!

You are kind of asking the wrong question.
Haskell is not a language where you go look at a few cool examples and go "aha, I see now, that's what makes it good!"
It's more like, we have all these other programming languages, and they're all more or less similar, and then there's Haskell which is totally different and wacky in a way that's totally awesome once you get used to the wackiness. But the problem is, it takes quite a while to acclimate to the wackiness. Things that set Haskell apart from almost any other even-semi-mainstream language:
Lazy evaluation
No side effects (everything is pure, IO/etc happens via monads)
Incredibly expressive static type system
as well as some other aspects that are different from many mainstream languages (but shared by some):
functional
significant whitespace
type inferred
As some other posters have answered, the combination of all these features means that you think about programming in an entirely different way. And so it's hard to come up with an example (or set of examples) that adequately communicates this to Joe-mainstream-programmer. It's an experiential thing. (To make an analogy, I can show you photos of my 1970 trip to China, but after seeing the photos, you still won't know what it was like to have lived there during that time. Similarly, I can show you a Haskell 'quicksort', but you still won't know what it means to be a Haskeller.)

What really sets Haskell apart is the effort it goes to in its design to enforce functional programming. You can program in a functional style in pretty much any language, but it's all too easy to abandon at the first convenience. Haskell does not allow you to abandon functional programming, so you must take it to its logical conclusion, which is a final program that is easier to reason about, and sidesteps a whole class of the thorniest types of bugs.
When it comes to writing a program for real world use, you may find Haskell lacking in some practical fashion, but your final solution will be better for having known Haskell to begin with. I'm definitely not there yet, but so far learning Haskell has been much more enlightening than say, Lisp was in college.

Part of the fuss is that purity and static typing enable for parallelism combined with aggressive optimisations. Parallel languages are hot now with multicore being a bit disruptive.
Haskell gives you more options for parallelism than pretty much any general purpose language, along with a fast, native code compiler. There is really no competition with this kind of support for parallel styles:
semi-implicit parallelism via thread sparks
explicit threads
data parallel arrays
actors and message passing
transactional memory
So if you care about making your multicore work, Haskell has something to say.
A great place to start is with Simon Peyton Jones' tutorial on parallel and concurrent programming in Haskell.

I've spent the last year learning Haskell and writing a reasonably large and complex project in it. (The project is an automated options trading system, and everything from the trading algorithms to the parsing and handling of low-level, high-speed market data feeds is done in Haskell.) It's considerably more concise and easier to understand (for those with appropriate background) than a Java version would be, as well as extremely robust.
Possibly the biggest win for me has been the ability to modularize control flow through things such as monoids, monads, and so on. A very simple example would be the Ordering monoid; in an expression such as
c1 `mappend` c2 `mappend` c3
where c1 and so on return LT, EQ or GT, c1 returning EQ causes the expression to continue, evaluating c2; if c2 returns LT or GT that's the value of the whole, and c3 is not evaluated. This sort of thing gets considerably more sophisticated and complex in things like monadic message generators and parsers where I may be carrying around different types of state, have varying abort conditions, or may want to be able to decide for any particular call whether abort really means "no further processing" or means, "return an error at the end, but carry on processing to collect further error messages."
This is all stuff it takes some time and probably quite some effort to learn, and thus it can be hard to make a convincing argument for it for those who don't already know these techniques. I think that the All About Monads tutorial gives a pretty impressive demonstration of one facet of this, but I wouldn't expect that anybody not familiar with the material already would "get it" on the first, or even the third, careful reading.
Anyway, there's lots of other good stuff in Haskell as well, but this is a major one that I don't see mentioned so often, probably because it's rather complex.

Software Transactional Memory is a pretty cool way to deal with concurrency. It's much more flexible than message passing, and not deadlock prone like mutexes. GHC's implementation of STM is considered one of the best.

For an interesting example you can look at:
http://en.literateprograms.org/Quicksort_(Haskell)
What is interesting is to look at the implementation in various languages.
What makes Haskell so interesting, along with other functional languages, is the fact that you have to think differently about how to program. For example, you will generally not use for or while loops, but will use recursion.
As is mentioned above, Haskell and other functional languages excel with parallel processing and writing applications to work on multi-cores.

I couldn't give you an example, I'm an OCaml guy, but when I'm in such a situation as yourself, curiosity just takes hold and I have to download a compiler/interpreter and give it a go. You'll likely learn far more that way about the strengths and weaknesses of a given functional language.

One thing I find very cool when dealing with algorithms or mathematical problems is Haskell's inherent lazy evaluation of computations, which is only possible due to its strict functional nature.
For example, if you want to calculate all primes, you could use
primes = sieve [2..]
where sieve (p:xs) = p : sieve [x | x<-xs, x `mod` p /= 0]
and the result is actually an infinite list. But Haskell will evaluate it left from right, so as long as you don't try to do something that requires the entire list, you can can still use it without the program getting stuck in infinity, such as:
foo = sum $ takeWhile (<100) primes
which sums all primes less than 100. This is nice for several reasons. First of all, I only need to write one prime function that generates all primes and then I'm pretty much ready to work with primes. In an object-oriented programming language, I would need some way to tell the function how many primes it should compute before returning, or emulate the infinite list behavior with an object. Another thing is that in general, you end up writing code that expresses what you want to compute and not in which order to evaluate things - instead the compiler does that for you.
This is not only useful for infinite lists, in fact it gets used without you knowing it all the time when there is no need to evaluate more than necessary.

I find that for certain tasks I am incredibly productive with Haskell.
The reason is because of the succinct syntax and the ease of testing.
This is what the function declaration syntax is like:
foo a = a + 5
That's is simplest way I can think of defining a function.
If I write the inverse
inverseFoo a = a - 5
I can check that it is an inverse for any random input by writing
prop_IsInverse :: Double -> Bool
prop_IsInverse a = a == (inverseFoo $ foo a)
And calling from the command line
jonny#ubuntu: runhaskell quickCheck +names fooFileName.hs
Which will check that all the properties in my file are held, by randomly testing inputs a hundred times of so.
I don't think Haskell is the perfect language for everything, but when it comes to writing little functions and testing, I haven't seen anything better. If your programming has a mathematical component this is very important.

I agree with others that seeing a few small examples is not the best way to show off Haskell. But I'll give some anyway. Here's a lightning-fast solution to Euler Project problems 18 and 67, which ask you to find the maximum-sum path from the base to the apex of a triangle:
bottomUp :: (Ord a, Num a) => [[a]] -> a
bottomUp = head . bu
where bu [bottom] = bottom
bu (row : base) = merge row $ bu base
merge [] [_] = []
merge (x:xs) (y1:y2:ys) = x + max y1 y2 : merge xs (y2:ys)
Here is a complete, reusable implementation of the BubbleSearch algorithm by Lesh and Mitzenmacher. I used it to pack large media files for archival storage on DVD with no waste:
data BubbleResult i o = BubbleResult { bestResult :: o
, result :: o
, leftoverRandoms :: [Double]
}
bubbleSearch :: (Ord result) =>
([a] -> result) -> -- greedy search algorithm
Double -> -- probability
[a] -> -- list of items to be searched
[Double] -> -- list of random numbers
[BubbleResult a result] -- monotone list of results
bubbleSearch search p startOrder rs = bubble startOrder rs
where bubble order rs = BubbleResult answer answer rs : walk tries
where answer = search order
tries = perturbations p order rs
walk ((order, rs) : rest) =
if result > answer then bubble order rs
else BubbleResult answer result rs : walk rest
where result = search order
perturbations :: Double -> [a] -> [Double] -> [([a], [Double])]
perturbations p xs rs = xr' : perturbations p xs (snd xr')
where xr' = perturb xs rs
perturb :: [a] -> [Double] -> ([a], [Double])
perturb xs rs = shift_all p [] xs rs
shift_all p new' [] rs = (reverse new', rs)
shift_all p new' old rs = shift_one new' old rs (shift_all p)
where shift_one :: [a] -> [a] -> [Double] -> ([a]->[a]->[Double]->b) -> b
shift_one new' xs rs k = shift new' [] xs rs
where shift new' prev' [x] rs = k (x:new') (reverse prev') rs
shift new' prev' (x:xs) (r:rs)
| r <= p = k (x:new') (prev' `revApp` xs) rs
| otherwise = shift new' (x:prev') xs rs
revApp xs ys = foldl (flip (:)) ys xs
I'm sure this code looks like random gibberish. But if you read Mitzenmacher's blog entry and understand the algorithm, you'll be amazed that it's possible to package the algorithm into code without saying anything about what you're searching for.
Having given you some examples as you asked for, I will say that the best way to start to appreciate Haskell is to read the paper that gave me the ideas I needed to write the DVD packer: Why Functional Programming Matters by John Hughes. The paper actually predates Haskell, but it brilliantly explains some of the ideas that make people like Haskell.

For me, the attraction of Haskell is the promise of compiler guaranteed correctness. Even if it is for pure parts of the code.
I have written a lot of scientific simulation code, and have wondered so many times if there was a bug in my prior codes, which could invalidate a lot of current work.

it has no loop constructs. not many languages have this trait.

If you can wrap your head around the type system in Haskell I think that in itself is quite an accomplishment.

I agree with those that said that functional programming twists your brain into seeing programming from a different angle. I've only used it as a hobbyist, but I think it fundamentally changed the way I approach a problem. I don't think I would have been nearly as effective with LINQ without having been exposed to Haskell (and using generators and list comprehensions in Python).

To air a contrarian view: Steve Yegge writes that Hindely-Milner languages lack the flexibility required to write good systems:
H-M is very pretty, in a totally
useless formal mathematical sense. It
handles a few computation constructs
very nicely; the pattern matching
dispatch found in Haskell, SML and
OCaml is particularly handy.
Unsurprisingly, it handles some other
common and highly desirable constructs
awkwardly at best, but they explain
those scenarios away by saying that
you're mistaken, you don't actually
want them. You know, things like, oh,
setting variables.
Haskell is worth learning, but it has its own weaknesses.

Develop Reference

node.js excel linux python-3.x azure haskell apache-spark rust .htaccess string