I'm an intermediate Haskell programmer with tons of experience in strict FP and non-FP languages. Most of my Haskell code analyzes moderately large datasets (10^6..10^9 things), so laziness is always lurking. I have a reasonably good understanding of thunks, WHNF, pattern matching, and sharing, and I've been able to fix leaks with bang patterns and seq, but this profile-and-pray approach feels sordid and wrong.
I want to know how experienced Haskell programmers approach laziness at design time. I'm not asking about easy items like Data.ByteString.Lazy or foldl'; rather, I want to know how you think about the lower-level lazy machinery that causes runtime memory problems and tricky debugging.
How do you think about thunks, pattern matching, and sharing during design time?
What design patterns and idioms do you use to avoid leaks?
How did you learn these patterns and idioms, and do you have some good refs?
How do you avoid premature optimization of non-leaking non-problems?
(Amended 2014-05-15 for time budgeting):
Do you budget substantial project time for finding and fixing memory problems?
Or, do your design skills typically circumvent memory problems, and you get the expected memory consumption very early in the development cycle?
I think most of the trouble with "strictness leaks" happens because people don't have a good conceptual model. Haskellers without a good conceptual model tend to have and propagate the superstition that stricter is better. Perhaps this intuition comes from toying with small examples and tight loops. But it is incorrect. It's just as important to be lazy at the right times as to be strict at the right times.
There are two camps of data types, usually referred to as "data" and "codata". It is essential to respect the patterns of each one.
Operations which produce "data" (Int, ByteString, ...) must be forced close to where they occur. If I add a number to an accumulator, I am careful to make sure that it will be forced before I add another one. A good understanding of laziness is very important here, especially its conditional nature (i.e. strictness propositions don't take the form "X gets evaluated" but rather "when Y is evaluated, so is X").
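For example, a hand-rolled accumulator loop (a minimal sketch; sumStrict is a made-up name) keeps the running sum forced at every step, so the loop runs in constant space:

{-# LANGUAGE BangPatterns #-}

-- The accumulator is "data" (an Int), so it is forced on every step:
-- when a cons cell is consumed, the pending addition happens
-- immediately instead of piling up as a chain of (+) thunks.
sumStrict :: [Int] -> Int
sumStrict = go 0
  where
    go !acc []     = acc
    go !acc (x:xs) = go (acc + x) xs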
Operations which produce and consume "codata" (lists most of the time, trees, most other recursive types) must do so incrementally. Usually codata -> codata transformation should produce some information for each bit of information they consume (modulo skipping like filter). Another important piece for codata is that you use it linearly whenever possible -- i.e. use the tail of a list exactly once; use each branch of a tree exactly once. This ensures that the GC can collect pieces as they are consumed.
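To illustrate the linearity point (a sketch with made-up names, assuming the list is produced lazily and not retained elsewhere): computing a mean by consuming the same list twice pins the whole spine in memory, while a single linear pass lets each cell be collected as soon as it has been consumed.

{-# LANGUAGE BangPatterns #-}

-- Non-linear use of codata: the spine is walked twice, so none of it
-- can be collected until both passes are finished.
meanLeaky :: [Double] -> Double
meanLeaky xs = sum xs / fromIntegral (length xs)

-- Linear use: one pass, with both "data" accumulators kept forced.
meanOnePass :: [Double] -> Double
meanOnePass = go 0 (0 :: Int)
  where
    go !s !n []     = s / fromIntegral n
    go !s !n (x:xs) = go (s + x) (n + 1) xs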
Special care is needed when you have codata that contains data. E.g. iterate (+1) 0 !! 1000 will end up producing a size-1000 thunk before evaluating it. You need to think about conditional strictness again -- the way to prevent this case is to ensure that when a cons of the list is consumed, the addition of its element occurs. iterate violates this, so we need a better version.
iterate' :: (a -> a) -> a -> [a]
iterate' f x = x : (x `seq` iterate' f (f x))
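A quick way to see the difference, using the iterate' above (both expressions print the same value; the second just avoids building the size-1000 thunk):

main :: IO ()
main = do
  print (iterate  (+1) 0 !! 1000)  -- forces a chain of 1000 pending (+1) thunks at the end
  print (iterate' (+1) 0 !! 1000)  -- each (+1) is forced as the spine is unfolded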
As you start composing things, of course it gets harder to tell when bad cases happen. In general it is hard to make efficient data structures / functions that work equally well on data and codata, and it's important to keep in mind which is which (even in a polymorphic setting where it's not guaranteed, you should have one in mind and try to respect it).
Sharing is tricky, and I think I approach it mostly on a case-by-case basis. Because it's tricky, I try to keep it localized, choosing not to expose large data structures to module users in general. This can usually be done by exposing combinators for generating the thing in question, and then producing and consuming it all in one go (the codensity transformation on monads is an example of this).
My design goal is to get every function to be respectful of the data / codata patterns of my types. I can usually hit it (though sometimes it requires some heavy thought -- it has become natural over the years), and I seldom have leak problems when I do. But I don't claim that it's easy -- it requires experience with the canonical libraries and patterns of the language. These decisions are not made in isolation, and everything has to be right at once for it to work well. One poorly tuned instrument can ruin the whole concert (which is why "optimization by random perturbation" almost never works for these kinds of issues).
Apfelmus's Space Invariants article is helpful for developing your space/thunk intuition further. Also see Edward Kmett's comment below.
I'm struggling with what Super Combinators are:
A supercombinator is either a constant, or a combinator which contains only supercombinators as subexpressions.
And also with what Constant Applicative Forms are:
Any super combinator which is not a lambda abstraction. This includes truly constant expressions such as 12, ((+) 1 2), [1,2,3] as well as partially applied functions such as ((+) 4). Note that this last example is equivalent under eta abstraction to \ x -> (+) 4 x which is not a CAF.
This is just not making any sense to me! Isn't ((+) 4) just as "truly constant" as 12? CAFs sound like values to my simple mind.
These Haskell wiki pages you reference are old, and I think unfortunately written. Particularly unfortunate is that they mix up CAFs and supercombinators. Supercombinators are interesting but unrelated to GHC. CAFs are still very much a part of GHC, and can be understood without reference to supercombinators.
So let's start with supercombinators. Combinators derive from combinatory logic, and, in the usage here, consist of functions which only apply the values passed in to one another in one or another form -- i.e. they combine their arguments. The most famous set of combinators are S, K, and I, which taken together are Turing-complete. Supercombinators, in this context, are functions built only of values passed in, combinators, and other supercombinators. Hence any supercombinator can be expanded, through substitution, into a plain old combinator.
Some compilers for functional languages (not GHC!) use combinators and supercombinators as intermediate steps in compilation. As with any similar compiler technology, the reason for doing this is to admit optimization analysis that is more easily performed in such a simplified, minimal language. One such core language built on supercombinators is Edwin Brady's epic.
Constant Applicative Forms are something else entirely. They're a bit more subtle, and have a few gotchas. The way to think of them is as an aspect of compiler implementation with no separate semantic meaning but with a potentially profound effect on runtime performance. The following may not be a perfect description of a CAF, but I'll try to convey my intuition of what one is, since I haven't seen a really good description anywhere else to crib from. The clean "authoritative" description in the GHC Commentary Wiki reads as follows:
Constant Applicative Forms, or CAFs for short, are top-level values defined in a program. Essentially, they are objects that are not allocated dynamically at run-time but, instead, are part of the static data of the program.
That's a good start. Pure, functional, lazy languages can be thought of in some sense as a graph reduction machine. The first time you demand the value of a node, that forces its evaluation, which in turn can demand the values of subnodes, etc. Once a node is evaluated, the resultant value sticks around (although it does not have to stick around -- since this is a pure language we could always keep the subnodes live and recalculate with no semantic effect). A CAF is indeed just a value. But, in this context, a special kind of value -- one which the compiler can determine has a meaning entirely dependent on its subnodes. That is to say:
foo x = ...
  where thisIsACaf = [1..10::Int]

        thisIsNotACaf = [1..x::Int]

        thisIsAlsoNotACaf :: Num a => [a]
        thisIsAlsoNotACaf = [1..10] -- oops, polymorphic! the "num" dictionary is implicitly a parameter.

        thisCouldBeACaf = const [1..10::Int] x -- requires a sufficiently smart compiler
        thisAlsoCouldBeACaf _ = [1..10::Int] -- also requires a sufficiently smart compiler
So why do we care if things are CAFs? Basically because sometimes we really really don't want to recompute something (for example, a memotable!) and so want to make sure it is shared properly. Other times we really do want to recompute something (e.g. a huge boring easy to generate list -- such as the naturals -- which we're just walking over) and not have it stick around in memory forever. A combination of naming things and binding them under lets or writing them inline, etc. typically lets us specify these sorts of things in a natural, intuitive way. Occasionally, however, the compiler is smarter or dumber than we expect, and something we think should only be computed once is always recomputed, or something we don't want to hang on to gets lifted out as a CAF. Then, we need to think things through more carefully. See this discussion to get an idea about some of the trickiness involved: A good way to avoid "sharing"?
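To make the tradeoff concrete, here is a hedged sketch of both directions (fibs and naturalsUpTo are illustrative names):

-- A CAF: a top-level value with no arguments. It is computed on first
-- demand and then retained for the life of the program -- exactly what
-- you want for a memo table.
fibs :: [Integer]
fibs = 0 : 1 : zipWith (+) fibs (tail fibs)

-- Not a CAF: it takes an argument, so each call builds a fresh list
-- that can be walked and garbage-collected incrementally.
naturalsUpTo :: Integer -> [Integer]
naturalsUpTo n = [0 .. n]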
[By the way, I don't feel up to it, but anyone that wants to should feel free to take as much of this answer as they want to try and integrate it with the existing Haskell Wiki pages and improve/update them]
Matt is right in that the definition is confusing. It is even contradictory. A CAF is defined as:
Any super combinator which is not a lambda abstraction. This includes truly constant expressions such as 12, ((+) 1 2), [1,2,3] as well as partially applied functions such as ((+) 4).
Hence, ((+) 4) is seen as a CAF. But in the very next sentence we're told it is equivalent to something that is not a CAF:
this last example is equivalent under eta abstraction to \ x -> (+) 4 x which is not a CAF.
It would be cleaner to rule out partially applied functions on the ground that they are equivalent to lambda abstractions.
I'm currently digesting the nice presentation Why learn Haskell? by Keegan McAllister. There he uses the snippet
minimum = head . sort
as an illustration of Haskell's lazy evaluation by stating that minimum has time-complexity O(n) in Haskell. However, I think the example is kind of academic in nature. I'm therefore asking for a more practical example where it's not trivially apparent that most of the intermediate calculations are thrown away.
Have you ever written an AI? Isn't it annoying that you have to thread pruning information (e.g. maximum depth, the minimum cost of an adjacent branch, or other such information) through the tree traversal function? This means you have to write a new tree traversal every time you want to improve your AI. That's dumb. With lazy evaluation, this is no longer a problem: write your tree traversal function once, to produce a huge (maybe even infinite!) game tree, and let your consumer decide how much of it to consume.
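A minimal sketch of that shape (GameTree, gameTree, and prune are made-up names):

-- A possibly enormous (even infinite) game tree, produced once...
data GameTree pos = Node pos [GameTree pos]

gameTree :: (pos -> [pos]) -> pos -> GameTree pos
gameTree moves p = Node p (map (gameTree moves) (moves p))

-- ...and pruned by the consumer: only the part the AI actually
-- inspects is ever built.
prune :: Int -> GameTree pos -> GameTree pos
prune 0 (Node p _)  = Node p []
prune d (Node p ts) = Node p (map (prune (d - 1)) ts)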
Writing a GUI that shows lots of information? Want it to run fast anyway? In other languages, you might have to write code that renders only the visible scenes. In Haskell, you can write code that renders the whole scene, and then later choose which pixels to observe. Similarly, rendering a complicated scene? Why not compute an infinite sequence of scenes at various detail levels, and pick the most appropriate one as the program runs?
You write an expensive function, and decide to memoize it for speed. In other languages, this requires building a data structure that tracks which inputs for the function you know the answer to, and updating the structure as you see new inputs. Remember to make it thread safe -- if we really need speed, we need parallelism, too! In Haskell, you build an infinite data structure, with an entry for each possible input, and evaluate the parts of the data structure that correspond to the inputs you care about. Thread safety comes for free with purity.
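A minimal sketch of the trick (memoFib is a made-up name; the table conceptually has an entry for every non-negative input, but only the entries you index into are ever evaluated):

memoFib :: Int -> Integer
memoFib = (table !!)
  where
    -- One lazily evaluated entry per possible input, shared across calls.
    table = map fib [0 ..]
    fib 0 = 0
    fib 1 = 1
    fib n = memoFib (n - 1) + memoFib (n - 2)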
Here's one that's perhaps a bit more prosaic than the previous ones. Have you ever found a time when && and || weren't the only things you wanted to be short-circuiting? I sure have! For example, I love the <|> function for combining Maybe values: it takes the first one of its arguments that actually has a value. So Just 3 <|> Nothing = Just 3; Nothing <|> Just 7 = Just 7; and Nothing <|> Nothing = Nothing. Moreover, it's short-circuiting: if it turns out that its first argument is a Just, it won't bother doing the computation required to figure out what its second argument is.
And <|> isn't built in to the language; it's tacked on by a library. That is: laziness allows you to write brand new short-circuiting forms. (Indeed, in Haskell, even the short-circuiting behavior of (&&) and (||) aren't built-in compiler magic: they arise naturally from the semantics of the language plus their definitions in the standard libraries.)
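For instance, the Prelude's (&&) is essentially just an ordinary function definition, and laziness in the second argument gives it short-circuiting for free (redefined here, hiding the Prelude version, purely for illustration):

import Prelude hiding ((&&))

-- The second argument is only evaluated if the first is True.
(&&) :: Bool -> Bool -> Bool
True  && x = x
False && _ = False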
In general, the common theme here is that you can separate the production of values from the determination of which values are interesting to look at. This makes things more composable, because the choice of what is interesting to look at need not be known by the producer.
Here's a well-known example I posted to another thread yesterday. Hamming numbers are numbers that don't have any prime factors larger than 5. I.e. they have the form 2^i*3^j*5^k. The first 20 of them are:
[1,2,3,4,5,6,8,9,10,12,15,16,18,20,24,25,27,30,32,36]
The 500000th one is:
1962938367679548095642112423564462631020433036610484123229980468750
The program that printed the 500000th one (after a brief moment of computation) is:
merge xxs@(x:xs) yys@(y:ys) =
    case (x `compare` y) of
        LT -> x : merge xs  yys
        EQ -> x : merge xs  ys
        GT -> y : merge xxs ys

hamming = 1 : m 2 `merge` m 3 `merge` m 5
  where
    m k = map (k *) hamming

main = print (hamming !! 499999)
Computing that number with reasonable speed in a non-lazy language takes quite a bit more code and head-scratching. There are a lot of examples here.
Consider generating and consuming the first n elements of an infinite sequence. Without lazy evaluation, the naive encoding would run forever in the generation step, and never consume anything. With lazy evaluation, only as many elements are generated as the code tries to consume.
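A tiny example of that pattern:

-- The source list is conceptually infinite, but only the first ten
-- squares are ever computed, because take stops demanding more.
firstTenSquares :: [Integer]
firstTenSquares = take 10 (map (^ 2) [1 ..])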
I know a few programmers who keep talking about Haskell when they are among themselves, and here on SO everyone seems to love that language. Being good at Haskell seems somewhat like the hallmark of a genius programmer.
Can someone give a few Haskell examples that show why it is so elegant / superior?
This is the example that convinced me to learn Haskell (and boy am I glad I did).
-- program to copy a file --
import System.Environment
main = do
  --read command-line arguments
  [file1, file2] <- getArgs

  --copy file contents
  str <- readFile file1
  writeFile file2 str
OK, it's a short, readable program. In that sense it's better than a C program. But how is this so different from (say) a Python program with a very similar structure?
The answer is lazy evaluation. In most languages (even some functional ones), a program structured like the one above would result in the entire file being loaded into memory, and then written out again under a new name.
Haskell is "lazy". It doesn't calculate things until it needs to, and by extension doesn't calculate things it never needs. For instance, if you were to remove the writeFile line, Haskell wouldn't bother reading anything from the file in the first place.
As it is, the string returned by readFile is produced lazily, and writeFile consumes it as it is produced, so the two effectively stream the data from one file to the other.
While the results are compiler-dependent, what will typically happen when you run the above program is this: the program reads a block (say 8KB) of the first file, then writes it to the second file, then reads another block from the first file, and writes it to the second file, and so on. (Try running strace on it!)
... which looks a lot like what the efficient C implementation of a file copy would do.
So, Haskell lets you write compact, readable programs - often without sacrificing a lot of performance.
Another thing I must add is that Haskell simply makes it difficult to write buggy programs. The amazing type system, lack of side effects, and of course the compactness of Haskell code reduce bugs for at least three reasons:
Better program design. Reduced complexity leads to fewer logic errors.
Compact code. Fewer lines for bugs to exist on.
Compile errors. Lots of bugs just aren't valid Haskell.
Haskell isn't for everyone. But everyone should give it a try.
The way it was pitched to me, and what I think is true after having worked on learning Haskell for a month now, is that functional programming twists your brain in interesting ways: it forces you to think about familiar problems in different ways: instead of loops, think in maps and folds and filters, etc. In general, if you have more than one perspective on a problem, you are better able to reason about it and switch viewpoints as necessary.
The other really neat thing about Haskell is its type system. It's statically typed, but the type inference engine makes it feel like a Python program that magically tells you when you've made a stupid type-related mistake. Haskell's error messages in this regard are somewhat lacking, but as you get more acquainted with the language you'll say to yourself: this is what typing is supposed to be!
You are kind of asking the wrong question.
Haskell is not a language where you go look at a few cool examples and go "aha, I see now, that's what makes it good!"
It's more like, we have all these other programming languages, and they're all more or less similar, and then there's Haskell which is totally different and wacky in a way that's totally awesome once you get used to the wackiness. But the problem is, it takes quite a while to acclimate to the wackiness. Things that set Haskell apart from almost any other even-semi-mainstream language:
Lazy evaluation
No side effects (everything is pure, IO/etc happens via monads)
Incredibly expressive static type system
as well as some other aspects that are different from many mainstream languages (but shared by some):
functional
significant whitespace
type inferred
As some other posters have answered, the combination of all these features means that you think about programming in an entirely different way. And so it's hard to come up with an example (or set of examples) that adequately communicates this to Joe-mainstream-programmer. It's an experiential thing. (To make an analogy, I can show you photos of my 1970 trip to China, but after seeing the photos, you still won't know what it was like to have lived there during that time. Similarly, I can show you a Haskell 'quicksort', but you still won't know what it means to be a Haskeller.)
What really sets Haskell apart is the effort it goes to in its design to enforce functional programming. You can program in a functional style in pretty much any language, but it's all too easy to abandon at the first convenience. Haskell does not allow you to abandon functional programming, so you must take it to its logical conclusion, which is a final program that is easier to reason about, and sidesteps a whole class of the thorniest types of bugs.
When it comes to writing a program for real world use, you may find Haskell lacking in some practical fashion, but your final solution will be better for having known Haskell to begin with. I'm definitely not there yet, but so far learning Haskell has been much more enlightening than say, Lisp was in college.
Part of the fuss is that purity and static typing allow for parallelism combined with aggressive optimisations. Parallel languages are hot now, with multicore being a bit disruptive.
Haskell gives you more options for parallelism than pretty much any general purpose language, along with a fast, native code compiler. There is really no competition with this kind of support for parallel styles:
semi-implicit parallelism via thread sparks
explicit threads
data parallel arrays
actors and message passing
transactional memory
So if you care about making your multicore work, Haskell has something to say.
A great place to start is with Simon Peyton Jones' tutorial on parallel and concurrent programming in Haskell.
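As a small taste of the first item, here is a hedged sketch of spark-based parallelism using par and pseq from the parallel package's Control.Parallel (it only pays off when compiled with -threaded and when each half is expensive enough to outweigh the spark overhead):

import Control.Parallel (par, pseq)

-- Spark the left half for evaluation on another core while this
-- thread evaluates the right half, then combine the results.
parSum :: [Int] -> Int
parSum xs = a `par` (b `pseq` (a + b))
  where
    (ys, zs) = splitAt (length xs `div` 2) xs
    a = sum ys
    b = sum zs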
I've spent the last year learning Haskell and writing a reasonably large and complex project in it. (The project is an automated options trading system, and everything from the trading algorithms to the parsing and handling of low-level, high-speed market data feeds is done in Haskell.) It's considerably more concise and easier to understand (for those with appropriate background) than a Java version would be, as well as extremely robust.
Possibly the biggest win for me has been the ability to modularize control flow through things such as monoids, monads, and so on. A very simple example would be the Ordering monoid; in an expression such as
c1 `mappend` c2 `mappend` c3
where c1 and so on return LT, EQ or GT, c1 returning EQ causes the expression to continue, evaluating c2; if c2 returns LT or GT that's the value of the whole, and c3 is not evaluated. This sort of thing gets considerably more sophisticated and complex in things like monadic message generators and parsers where I may be carrying around different types of state, have varying abort conditions, or may want to be able to decide for any particular call whether abort really means "no further processing" or means, "return an error at the end, but carry on processing to collect further error messages."
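A small, made-up example of the same idea: sorting people by last name, then first name, then age, where each later comparison is only evaluated when the earlier ones return EQ:

import Data.List (sortBy)
import Data.Ord (comparing)

data Person = Person { firstName :: String, lastName :: String, age :: Int }

comparePeople :: Person -> Person -> Ordering
comparePeople a b = comparing lastName  a b
          `mappend` comparing firstName a b
          `mappend` comparing age       a b

sortPeople :: [Person] -> [Person]
sortPeople = sortBy comparePeople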
This is all stuff it takes some time and probably quite some effort to learn, and thus it can be hard to make a convincing argument for it for those who don't already know these techniques. I think that the All About Monads tutorial gives a pretty impressive demonstration of one facet of this, but I wouldn't expect that anybody not familiar with the material already would "get it" on the first, or even the third, careful reading.
Anyway, there's lots of other good stuff in Haskell as well, but this is a major one that I don't see mentioned so often, probably because it's rather complex.
Software Transactional Memory is a pretty cool way to deal with concurrency. It's much more flexible than message passing, and not deadlock prone like mutexes. GHC's implementation of STM is considered one of the best.
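A hedged sketch of the flavour, using the stm package (transfer is a made-up name):

import Control.Concurrent.STM

-- Move an amount between two shared balances in one atomic
-- transaction: no locks to acquire, no ordering to get wrong.
transfer :: TVar Int -> TVar Int -> Int -> IO ()
transfer from to amount = atomically $ do
  modifyTVar' from (subtract amount)
  modifyTVar' to   (+ amount)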
For an interesting example you can look at:
http://en.literateprograms.org/Quicksort_(Haskell)
What is interesting is to look at the implementation in various languages.
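For reference, the short (not in-place) Haskell formulation discussed there looks roughly like this:

quicksort :: Ord a => [a] -> [a]
quicksort []     = []
quicksort (p:xs) = quicksort [x | x <- xs, x < p]
                   ++ [p]
                   ++ quicksort [x | x <- xs, x >= p]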
What makes Haskell so interesting, along with other functional languages, is the fact that you have to think differently about how to program. For example, you will generally not use for or while loops, but will use recursion.
As is mentioned above, Haskell and other functional languages excel with parallel processing and writing applications to work on multi-cores.
I couldn't give you an example, I'm an OCaml guy, but when I'm in such a situation as yourself, curiosity just takes hold and I have to download a compiler/interpreter and give it a go. You'll likely learn far more that way about the strengths and weaknesses of a given functional language.
One thing I find very cool when dealing with algorithms or mathematical problems is Haskell's inherent lazy evaluation of computations, which is only possible because of its purely functional nature.
For example, if you want to calculate all primes, you could use
primes = sieve [2..]
  where sieve (p:xs) = p : sieve [x | x <- xs, x `mod` p /= 0]
and the result is actually an infinite list. But Haskell will evaluate it from left to right, so as long as you don't try to do something that requires the entire list, you can still use it without the program getting stuck in infinity, such as:
foo = sum $ takeWhile (<100) primes
which sums all primes less than 100. This is nice for several reasons. First of all, I only need to write one prime function that generates all primes and then I'm pretty much ready to work with primes. In an object-oriented programming language, I would need some way to tell the function how many primes it should compute before returning, or emulate the infinite list behavior with an object. Another thing is that in general, you end up writing code that expresses what you want to compute and not in which order to evaluate things - instead the compiler does that for you.
This is not only useful for infinite lists; in fact, it gets used all the time without you knowing it, whenever there is no need to evaluate more than necessary.
I find that for certain tasks I am incredibly productive with Haskell.
The reason is because of the succinct syntax and the ease of testing.
This is what the function declaration syntax is like:
foo a = a + 5
That's the simplest way I can think of to define a function.
If I write the inverse
inverseFoo a = a - 5
I can check that it is an inverse for any random input by writing
prop_IsInverse :: Double -> Bool
prop_IsInverse a = a == (inverseFoo $ foo a)
And calling from the command line
jonny@ubuntu: runhaskell quickCheck +names fooFileName.hs
Which will check that all the properties in my file hold, by randomly testing inputs a hundred times or so.
I don't think Haskell is the perfect language for everything, but when it comes to writing little functions and testing, I haven't seen anything better. If your programming has a mathematical component this is very important.
I agree with others that seeing a few small examples is not the best way to show off Haskell. But I'll give some anyway. Here's a lightning-fast solution to Project Euler problems 18 and 67, which ask you to find the maximum-sum path from the base to the apex of a triangle:
bottomUp :: (Ord a, Num a) => [[a]] -> a
bottomUp = head . bu
  where bu [bottom]     = bottom
        bu (row : base) = merge row $ bu base
        merge [] [_] = []
        merge (x:xs) (y1:y2:ys) = x + max y1 y2 : merge xs (y2:ys)
Here is a complete, reusable implementation of the BubbleSearch algorithm by Lesh and Mitzenmacher. I used it to pack large media files for archival storage on DVD with no waste:
data BubbleResult i o = BubbleResult { bestResult :: o
                                     , result :: o
                                     , leftoverRandoms :: [Double]
                                     }

bubbleSearch :: (Ord result) =>
                ([a] -> result) ->      -- greedy search algorithm
                Double ->               -- probability
                [a] ->                  -- list of items to be searched
                [Double] ->             -- list of random numbers
                [BubbleResult a result] -- monotone list of results
bubbleSearch search p startOrder rs = bubble startOrder rs
  where bubble order rs = BubbleResult answer answer rs : walk tries
          where answer = search order
                tries  = perturbations p order rs
                walk ((order, rs) : rest) =
                    if result > answer then bubble order rs
                    else BubbleResult answer result rs : walk rest
                  where result = search order

perturbations :: Double -> [a] -> [Double] -> [([a], [Double])]
perturbations p xs rs = xr' : perturbations p xs (snd xr')
  where xr' = perturb xs rs
        perturb :: [a] -> [Double] -> ([a], [Double])
        perturb xs rs = shift_all p [] xs rs

shift_all p new' []  rs = (reverse new', rs)
shift_all p new' old rs = shift_one new' old rs (shift_all p)
  where shift_one :: [a] -> [a] -> [Double] -> ([a]->[a]->[Double]->b) -> b
        shift_one new' xs rs k = shift new' [] xs rs
          where shift new' prev' [x] rs = k (x:new') (reverse prev') rs
                shift new' prev' (x:xs) (r:rs)
                    | r <= p    = k (x:new') (prev' `revApp` xs) rs
                    | otherwise = shift new' (x:prev') xs rs

revApp xs ys = foldl (flip (:)) ys xs
I'm sure this code looks like random gibberish. But if you read Mitzenmacher's blog entry and understand the algorithm, you'll be amazed that it's possible to package the algorithm into code without saying anything about what you're searching for.
Having given you some examples as you asked for, I will say that the best way to start to appreciate Haskell is to read the paper that gave me the ideas I needed to write the DVD packer: Why Functional Programming Matters by John Hughes. The paper actually predates Haskell, but it brilliantly explains some of the ideas that make people like Haskell.
For me, the attraction of Haskell is the promise of compiler guaranteed correctness. Even if it is for pure parts of the code.
I have written a lot of scientific simulation code, and have wondered so many times if there was a bug in my prior codes, which could invalidate a lot of current work.
It has no loop constructs. Not many languages have this trait.
If you can wrap your head around the type system in Haskell I think that in itself is quite an accomplishment.
I agree with those that said that functional programming twists your brain into seeing programming from a different angle. I've only used it as a hobbyist, but I think it fundamentally changed the way I approach a problem. I don't think I would have been nearly as effective with LINQ without having been exposed to Haskell (and using generators and list comprehensions in Python).
To air a contrarian view: Steve Yegge writes that Hindley-Milner languages lack the flexibility required to write good systems:
H-M is very pretty, in a totally useless formal mathematical sense. It handles a few computation constructs very nicely; the pattern matching dispatch found in Haskell, SML and OCaml is particularly handy. Unsurprisingly, it handles some other common and highly desirable constructs awkwardly at best, but they explain those scenarios away by saying that you're mistaken, you don't actually want them. You know, things like, oh, setting variables.
Haskell is worth learning, but it has its own weaknesses.