I'm new to Haskell and understand that it is (basically) a pure functional language, which has the advantage that the results of functions will not change across multiple evaluations. Given this, I'm puzzled by why I can't easily mark a function in such a way that it remembers the result of its first evaluation, and does not have to be evaluated again each time its value is required.
In Mathematica, for example, there is a simple idiom for accomplishing this:
f[x_]:=f[x]= ...
but in Haskell, the closest thing I've found is something like
f' = (map f [0 ..] !!)
  where f 0 = ...
        f n = f' ...
which in addition to being far less clear (and apparently limited to Int arguments?) does not (seem to) preserve results within an interactive session.
Admittedly (and clearly), I don't understand exactly what's going on here; but naively, it seems like Haskell should have some way of
taking advantage of the fact that its functions are functions and skipping re-computation of their results once they have been computed, and
indicating a desire to do this at the function definition level with a simple and clean idiom.
Is there a way to accomplish this in Haskell that I'm missing? I understand (sort of) that Haskell can't store the evaluations as "state", but why can't it simply (in effect) redefine evaluated functions to be their computed value?
This grows out of this question, in which lack of this feature results in terrible performance.
Use a suitable library, such as MemoTrie.
import Data.MemoTrie
f' = memo f
  where f 0 = ...
        f n = f' ...
That's hardly less nice than the Mathematica version, is it?
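For instance, here's a complete runnable sketch (using Fibonacci as a stand-in for your f, and assuming the MemoTrie package is installed):

import Data.MemoTrie (memo)

-- Memoised Fibonacci: memo builds a lazy trie over the argument type,
-- so each fib n is computed at most once.
fib' :: Int -> Integer
fib' = memo fib
  where
    fib 0 = 0
    fib 1 = 1
    fib n = fib' (n - 1) + fib' (n - 2)

main :: IO ()
main = print (fib' 90)  -- fast; the naive recursion would take ages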
Regarding
“why can't it simply (in effect) redefine evaluated functions to be their computed value?”
Well, it's not so easy in general. These values have to be stored somewhere. Even for an Int-valued function, you can't just allocate an array with all possible values – it wouldn't fit in memory. The list solution only works because Haskell is lazy and therefore allows infinite lists, but that's not particularly satisfying since lookup is O(n).
For other types it's simply hopeless – you'd need to somehow diagonalise an uncountably infinite domain.
You need some cleverer organisation. I don't know how Mathematica does this, but it probably uses a lot of “proprietary magic”. I wouldn't be so sure that it does really work the way you'd like, for any inputs.
Haskell fortunately has type classes, and these allow you to express exactly what a type needs in order to be quickly memoisable. HasTrie is such a class.
I'm starting to learn functional programming languages like Haskell and ML, and most of the exercises show off things like:
foldr (+) 0 [ 1 ..10]
which is equivalent to
sum = 0
for (i in [1..10])
    sum += i
That leads me to wonder: why can't the compiler recognize that this is an arithmetic progression and use the O(1) formula to calculate it?
Especially in pure FP languages without side effects?
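For reference, the O(1) closed form I have in mind:

-- Sum of the arithmetic progression 1..n in constant time:
sumTo :: Integer -> Integer
sumTo n = n * (n + 1) `div` 2

-- sumTo 10 == 55 == foldr (+) 0 [1..10]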
The same applies for
sum reverse list == sum list
Given a + b = b + a
and definition of reverse, can compilers/languages prove it automatically?
Compilers generally don't try to prove this kind of thing automatically, because it's hard to implement.
As well as adding the logic to the compiler to transform one fragment of code into another, you have to be very careful that it only does so when it's actually safe - i.e. there are often lots of "side conditions" to worry about. For example, in your example above, someone might have written an instance of the type class Num (and hence the (+) operator) where a + b is not b + a.
However, GHC does have rewrite rules which you can add to your own source code and could be used to cover some relatively simple cases like the ones you list above, particularly if you're not too bothered about the side conditions.
For example, and I haven't tested this, you might use the following rule for one of your examples above:
{-# RULES
"sum/reverse" forall list . sum (reverse list) = sum list
#-}
Note the parentheses around reverse list - what you've written in your question actually means (sum reverse) list and wouldn't typecheck.
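Here's an untested sketch of how that rule might sit in a module. Note that rules only fire with optimisation enabled, and GHC's inlining can prevent a rule on a class method like sum from ever matching - one more of those "side conditions":

module SumReverse where

{-# RULES
"sum/reverse" forall xs. sum (reverse xs) = sum xs
  #-}

total :: [Int] -> Int
total xs = sum (reverse xs)

-- Compile with: ghc -O2 -ddump-rule-firings SumReverse.hs
-- to check whether "sum/reverse" actually fired.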
EDIT:
As you're looking for official sources and pointers to research, I've listed a few.
Obviously it's hard to prove a negative but the fact that no-one has given an example of a general-purpose compiler that does this kind of thing routinely is probably quite strong evidence in itself.
As others have pointed out, even simple arithmetic optimisations are surprisingly dangerous, particularly on floating point numbers, and compilers generally have flags to turn them off - for example Visual C++, gcc. Even integer arithmetic isn't always clear-cut and people occasionally have big arguments about how to deal with things like overflow.
As Joachim noted, integer variables in loops are one place where slightly more sophisticated optimisations are applied because there are actually significant wins to be had. Muchnick's book is probably the best general source on the topic but it's not that cheap. The wikipedia page on strength reduction is probably as good an introduction as any to one of the standard optimisations of this kind, and has some references to the relevant literature.
FFTW is an example of a library that does all kinds of mathematical optimization internally. Some of its code is generated by a customised compiler the authors wrote specifically for the purpose. It's worthwhile because the authors have domain-specific knowledge of optimizations that in the specific context of the library are both worth the effort and safe.
People sometimes use template metaprogramming to write "self-optimising libraries" that again might rely on arithmetic identities, see for example Blitz++. Todd Veldhuizen's PhD dissertation has a good overview.
If you descend into the realms of toy and academic compilers all sorts of things go. For example my own PhD dissertation is about writing inefficient functional programs along with little scripts that explain how to optimise them. Many of the examples (see Chapter 6) rely on applying arithmetic rules to justify the underlying optimisations.
Also, it's worth emphasising that the last few examples are of specialised optimisations being applied only to certain parts of the code (e.g. calls to specific libraries) where it is expected to be worthwhile. As other answers have pointed out, it's simply too expensive for a compiler to go searching for all possible places in an entire program where an optimisation might apply. The GHC rewrite rules that I mentioned above are a great example of a compiler exposing a generic mechanism for individual libraries to use in a way that's most appropriate for them.
The answer
No, compilers don’t do that kind of stuff.
One reason why
And for your examples, it would even be wrong: Since you did not give type annotations, the Haskell compiler will infer the most general type, which would be
foldr (+) 0 [ 1 ..10] :: Num a => a
and similar
(\list -> sum (reverse list)) :: Num a => [a] -> a
and the Num instance for the type that is being used might well not fulfil the mathematical laws required for the transformation you suggest. The compiler should, above all else, avoid changing the meaning (i.e. the semantics) of your program.
More pragmatically: The cases where the compiler could detect such large-scale transformations rarely occur in practice, so it would not be worth it to implement them.
An exception
Notable exceptions are linear transformations in loops. Most compilers will rewrite
for (int i = 0; i < n; i++) {
    ... 200 + 4 * i ...
}
to
for (int i = 0, j = 200; i < n; i++, j += 4) {
    ... j ...
}
or something similar, as that pattern often occurs in code working on arrays.
The optimizations you have in mind will probably not be done even in the presence of monomorphic types, because there are so many possibilities and so much knowledge is required. Consider this example:
sum list == sum (reverse list)
The compiler would need to know or take into account the following facts:
sum = foldl (+) 0
(+) is commutative
reverse list is a permutation of list
foldl x c l, where x is associative and commutative and c is a constant, yields the same result for all permutations of l.
This all seems trivial. Sure, the compiler can most probably look up the definition of sum and inline it. It could be required that (+) be commutative, but remember that + is just another symbol without attached meaning to the compiler. The third point would require the compiler to prove some nontrivial properties about reverse.
But the point is:
You don't want the compiler to perform those calculations with each and every expression. Remember, to make this really useful, you'd have to heap up a lot of knowledge about many, many standard functions and operators.
You still can't replace the expression above with True unless you can rule out the possibility that list or some list element is bottom. Usually, one cannot do this. You can't even do the following "trivial" optimization of f x == f x in all cases
f x `seq` True
For, consider
f x = (undefined :: Bool, x)
then
f x `seq` True ==> True
f x == f x ==> undefined
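As a runnable sketch of that counterexample:

-- f's result contains bottom, but its outermost constructor does not:
f :: a -> (Bool, a)
f x = (undefined, x)

main :: IO ()
main = do
  print (f () `seq` True)  -- True: seq forces only the pair constructor
  print (f () == f ())     -- undefined: (==) on pairs forces both components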
That being said, regarding your first example slightly modified for monomorphism:
f n = n * foldl (+) 0 [1..10] :: Int
it is imaginable to optimize the program by moving the expression out of its context and replace it with the name of a constant, like so:
const1 = foldl (+) 0 [1..10] :: Int
f n = n * const1
This is because the compiler can see that the expression must be constant.
What you're describing looks like super-compilation. In your case, if the expression had a monomorphic type like Int (as opposed to polymorphic Num a => a), the compiler could infer that the expression foldr (+) 0 [1 ..10] has no external dependencies, therefore it could be evaluated at compile time and replaced by 55. However, AFAIK no mainstream compiler currently does this kind of optimization.
(In functional programming "proving" is usually associated with something different. In languages with dependent types types are powerful enough to express complex proposition and then through the Curry-Howard correspondence programs become proofs of such propositions.)
As others have noted, it's unclear that your simplifications even hold in Haskell. For instance, I can define
newtype NInt = N Int

instance Num NInt where
  N a + _ = N a
  N b * _ = N b
  ... -- etc
and now sum . reverse :: Num a => [a] -> a does not equal sum :: Num a => [a] -> a, since I can specialize each to [NInt] -> NInt, where sum . reverse == sum clearly does not hold.
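Here's that counterexample fleshed out so it compiles; the remaining Num methods are filled in arbitrarily, since the point is only that (+) ignores its right argument:

newtype NInt = N Int deriving (Show, Eq)

instance Num NInt where
  N a + _      = N a             -- not commutative
  N a * _      = N a
  negate (N a) = N (negate a)
  abs (N a)    = N (abs a)
  signum (N a) = N (signum a)
  fromInteger  = N . fromInteger

-- With this (+), a right fold picks out one end of the list:
--   foldr (+) (N 0) [N 1, N 2]           = N 1 + (N 2 + N 0) = N 1
--   foldr (+) (N 0) (reverse [N 1, N 2]) = N 2 + (N 1 + N 0) = N 2
-- so the proposed rewrite changes the program's result at [NInt].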
This is one general tension that exists around optimizing "complex" operations: you actually need quite a lot of information in order to successfully prove that it's okay to optimize something. This is why the syntax-level compiler optimizations which do exist are usually monomorphic and related to the structure of programs; it's usually such a simplified domain that there's "no way" for the optimization to go wrong. Even that is often unsafe because the domain is never quite so simplified and well-known to the compiler.
As an example, a very popular "high-level" syntactic optimization is stream fusion. In this case the compiler is given enough information to know that stream fusion can occur and is basically safe, but even in this canonical example we have to skirt around notions of non-termination.
So what does it take to have \x -> sum [0..x] get replaced by \x -> x*(x + 1)/2? The compiler would need a theory of numbers and algebra built-in. This is not possible in Haskell or ML, but becomes possible in dependently typed languages like Coq, Agda, or Idris. There you could specify things like
revCommute :: (_+_ :: a -> a -> a)
           -> Commutative _+_
           -> foldr _+_ z (reverse as) == foldr _+_ z as
and then, theoretically, tell the compiler to rewrite according to revCommute. This would still be difficult and finicky, but at least we'd have enough information around. To be clear, I'm writing something very strange above, a dependent type. The type not only depends on the ability to introduce both a type and a name for the argument inline, but also the existence of the entire syntax of your language "at the type level".
There are a lot of differences between what I just wrote and what you'd do in Haskell, though. First, in order to form a basis where such promises can be taken seriously, we must throw away general recursion (and thus we already don't have to worry about questions of non-termination like stream-fusion does). We also must have enough structure around to create something like the promise Commutative _+_---this likely depends upon there being an entire theory of operators and mathematics built into the language's standard library else you would need to create that yourself. Finally, the richness of type system required to even express these kinds of theories adds a lot of complexity to the entire system and tosses out type inference as you know it today.
But, given all that structure, I'd never be able to create an obligation Commutative _+_ for the _+_ defined to work on NInts and so we could be certain that foldr (+) 0 . reverse == foldr (+) 0 actually does hold.
But now we'd need to tell the compiler how to actually perform that optimization. For stream-fusion, the compiler rules only kick in when we write something in exactly the right syntactic form to be "clearly" an optimization redex. The same kinds of restrictions would apply to our sum . reverse rule. In fact, already we're sunk because
foldr (+) 0 . reverse
foldr (+) 0 (reverse as)
don't match. They're "obviously" the same due to some rules we could prove about (.), but that means that now the compiler must invoke two built-in rules in order to perform our optimization.
At the end of the day, you need a very smart optimization search over the sets of known laws in order to achieve the kinds of automatic optimizations you're talking about.
So not only do we add a lot of complexity to the entire system, require a lot of base work to build in some useful algebraic theories, and lose Turing completeness (which might not be the worst thing), we also only get a finicky promise that our rule would even fire unless we perform an exponentially painful search during compilation.
Blech.
The compromise that exists today tends to be that sometimes we have enough control over what's being written to be mostly certain that a certain obvious optimization can be performed. This is the regime of stream fusion and it requires a lot of hidden types, carefully written proofs, exploitations of parametricity, and hand-waving before it's something the community trusts enough to run on their code.
And it doesn't even always fire. For an example of battling that problem take a look at the source of Vector for all of the RULES pragmas that specify all of the common circumstances where Vector's stream-fusion optimizations should kick in.
All of this is not at all a critique of compiler optimizations or dependent type theories. Both are really incredible. Instead it's just an amplification of the tradeoffs involved in introducing such an optimization. It's not to be done lightly.
Fun fact: given two arbitrary formulas, do they both give the same output for the same inputs? The answer to this simple-sounding question is not computable! In other words, it is mathematically impossible to write a computer program that always gives the correct answer in finite time.
Given this fact, it's perhaps not surprising that nobody has a compiler that can magically transform every possible computation into its most efficient form.
Also, isn't this the programmer's job? If you want the sum of an arithmetic sequence commonly enough that it's a performance bottleneck, why not just write some more efficient code yourself? Similarly, if you really want Fibonacci numbers (why?), use the O(1) algorithm.
I have been reading many articles trying to understand the difference between functional and logic programming, but the only deduction I have been able to make so far is that logic programming defines programs through mathematical expressions. But that does not seem to be what characterizes logic programming.
I would really appreciate some light being shed on the difference between functional and logic programming.
I wouldn't say that logic programming defines programs through mathematical expressions; that sounds more like functional programming. Logic programming uses logic expressions (well, eventually logic is math).
In my opinion, the major difference between functional and logic programming is the "building blocks": functional programming uses functions while logic programming uses predicates. A predicate is not a function; it does not have a return value. Depending on the value of its arguments it may be true or false; if some values are undefined it will try to find the values that would make the predicate true.
Prolog in particular uses a special form of logic clauses named Horn clauses that belong to first-order logic; Hilog uses clauses of higher-order logic.
When you write a Prolog predicate you are defining a Horn clause:
foo :- bar1, bar2, bar3. means that foo is true if bar1, bar2 and bar3 are true.
Note that I did not say if and only if; you can have multiple clauses for one predicate:
foo :-
    bar1.
foo :-
    bar2.
means that foo is true if bar1 is true or if bar2 is true.
Some say that logic programming is a superset of functional programming since each function could be expressed as a predicate:
foo(x,y) -> x+y.
could be written as
foo(X, Y, ReturnValue) :-
    ReturnValue is X + Y.
but I think that such statements are a bit misleading.
Another difference between logic and functional is backtracking. In functional programming once you enter the body of the function you cannot fail and move to the next definition. For example you can write
abs(x) ->
    if x > 0 x else -x
or even use guards:
abs(x) x>0 -> x;
abs(x) x=<0 -> -x.
but you cannot write
abs(x) ->
    x > 0,
    x;
abs(x) ->
    -x.
On the other hand, in Prolog you could write
abs(X, R) :-
    X > 0,
    R is X.
abs(X, R) :-
    R is -X.
If you then call abs(-3, R), Prolog will try the first clause and fail when execution reaches the -3 > 0 point, but you won't get an error; Prolog will try the second clause and return R = 3.
I do not think that it is impossible for a functional language to implement something similar (but I haven't used such a language).
All in all, although both paradigms are considered declarative, they are quite different; so different that comparing them feels like comparing functional and imperative styles. I would suggest trying a bit of logic programming; it should be a mind-boggling experience. However, you should try to understand the philosophy and not simply write programs; Prolog allows you to write in functional or even imperative style (with monstrous results).
In a nutshell:
In functional programming, your program is a set of function definitions. The return value for each function is evaluated as a mathematical expression, possibly making use of passed arguments and other defined functions. For example, you can define a factorial function, which returns a factorial of a given number:
factorial 0 = 1                      -- the factorial of 0 is 1
factorial n = n * factorial (n - 1)  -- the factorial of n is n times the factorial of n - 1
In logic programming, your program is a set of predicates. Predicates are usually defined as sets of clauses, where each clause can be defined using mathematical expressions, other defined predicates, and propositional calculus. For example, you can define a factorial predicate, which holds whenever the second argument is the factorial of the first:
factorial(0, 1).        % it is true that the factorial of 0 is 1
factorial(X, Y) :-      % it is true that the factorial of X is Y, when all of the following are true:
    X > 0,              % X is positive (without this guard, backtracking would recurse forever),
    X1 is X - 1,        % there is an X1, equal to X - 1,
    factorial(X1, Z),   % and it is true that the factorial of X1 is Z,
    Y is Z * X.         % and Y is Z * X
Both styles allow using mathematical expressions in the programs.
First, there are a lot of commonalities between functional and logic programming. That is, a lot of notions developed in one community can also be used in the other. Both paradigms started with rather crude implementations and strive towards purity.
But you want to know the differences.
So I will take Haskell on the one side and Prolog with constraints on the other. Practically all current Prolog systems offer constraints of some sort, like B, Ciao, ECLiPSe, GNU, IF, Scryer, SICStus, SWI, YAP, XSB. For the sake of the argument, I will use a very simple constraint dif/2 meaning inequality, which was present even in the very first Prolog implementation - so I will not use anything more advanced than that.
What functional programming is lacking
The most fundamental difference revolves around the notion of a variable. In functional programming a variable denotes a concrete value. This value need not be entirely defined, but only those parts that are defined can be used in computations. Consider in Haskell:
> let v = iterate (tail) [1..3]
> v
[[1,2,3],[2,3],[3],[],*** Exception: Prelude.tail: empty list
After the 4th element, the value is undefined. Nevertheless, you can use the first 4 elements safely:
> take 4 v
[[1,2,3],[2,3],[3],[]]
Note that the syntax of functional programs is cleverly restricted to ensure that a variable is never left undefined.
In logic programming, a variable does not need to refer to a concrete value. So, if we want a list of 3 elements, we might say:
?- length(Xs,3).
Xs = [_A,_B,_C].
In this answer, the elements of the list are variables. All possible instances of these variables are valid solutions, like Xs = [1,2,3]. Now, let's say that the first element should be different from the remaining elements:
?- length(Xs,3), Xs = [X|Ys], maplist(dif(X), Ys).
Xs = [X,_A,_B], Ys = [_A,_B], dif(X,_B), dif(X,_A).
Later on, we might demand that the elements in Xs are all equal. Before I write it out, I will try it alone:
?- maplist(=(_),Xs).
Xs = []
; Xs = [_A]
; Xs = [_A,_A]
; Xs = [_A,_A,_A]
; Xs = [_A,_A,_A,_A]
; ... .
See how the answers always contain the same variable? Now, I can combine both queries:
?- length(Xs,3), Xs = [X|Ys], maplist(dif(X), Ys), maplist(=(_),Xs).
false.
So what we have shown here is that there is no 3-element list where the first element is different from the other elements and all elements are equal.
This generality has permitted the development of several constraint languages, which are offered as libraries for Prolog systems; the most prominent are CLPFD and CHR.
There is no straightforward way to get similar functionality in functional programming. You can emulate things, but the emulation isn't quite the same.
What logic programming is lacking
But there are many things that are lacking in logic programming that make functional programming so interesting. In particular:
Higher-order programming: Functional programming has a very long tradition here and has developed a rich set of idioms. For Prolog, the first proposals date back to the early 1980s, but it is still not very common. At least ISO Prolog now has the homologue of apply, called call/2, call/3, ....
Lambdas: Again, it is possible to extend logic programming in that direction; the most prominent system is Lambda Prolog. More recently, lambdas have also been developed for ISO Prolog.
Type systems: There have been attempts, like Mercury, but it has not caught on that much. And there is no system with functionality comparable to type classes.
Purity: Haskell is entirely pure, a function Integer -> Integer is a function. No fine print lurking around. And still you can perform side effects. Comparable approaches are very slowly evolving.
There are many areas where functional and logic programming more or less overlap: for example, backtracking and laziness; list comprehensions; lazy evaluation and freeze/2, when/2, block; DCGs and monads. I will leave discussing these issues to others...
Logic programming and functional programming use different "metaphors" for computation. This often affects how you think about producing a solution, and sometimes means that different algorithms come naturally to a functional programmer than a logic programmer.
Both are based on mathematical foundations that provide more benefits for "pure" code; code that doesn't operate with side effects. There are languages for both paradigms that enforce purity, as well as languages that allow unconstrained side effects, but culturally the programmers for such languages tend to still value purity.
I'm going to consider append, a fairly basic operation in both logical and functional programming, for appending a list on to the end of another list.
In functional programming, we might consider append to be something like this:
append [] ys = ys
append (x:xs) ys = x : append xs ys
While in logic programming, we might consider append to be something like this:
append([], Ys, Ys).
append([X|Xs], Ys, [X|Zs]) :- append(Xs, Ys, Zs).
These implement the same algorithm, and even work basically the same way, but they "mean" something very different.
The functional append defines the list that results from appending ys onto the end of xs. We think of append as a function from two lists to another list, and the runtime system is designed to calculate the result of the function when we invoke it on two lists.
The logical append defines a relationship between three lists, which is true if the third list is the elements of the first list followed by the elements of the second list. We think of append as a predicate that is either true or false for any 3 given lists, and the runtime system is designed to find values that will make this predicate true when we invoke it with some arguments bound to specific lists and some left unbound.
The thing that makes logical append different is you can use it to compute the list that results from appending one list onto another, but you can also use it to compute the list you'd need to append onto the end of another to get a third list (or whether no such list exists), or to compute the list to which you need to append another to get a third list, or to give you two possible lists that can be appended together to get a given third (and to explore all possible ways of doing this).
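For instance, the single append definition above answers all of these queries (results shown as comments):

?- append([1,2], [3,4], Zs).     % Zs = [1,2,3,4]        - run "forwards"
?- append(Xs, [3,4], [1,2,3,4]). % Xs = [1,2]            - what was appended to?
?- append([1,2], Ys, [1,2,3,4]). % Ys = [3,4]            - what was appended?
?- append(Xs, Ys, [1,2]).        % Xs = [], Ys = [1,2]
                                 % ; Xs = [1], Ys = [2]
                                 % ; Xs = [1,2], Ys = []  - all splits, via backtracking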
While equivalent in that you can do anything you can do in one in the other, they lead to different ways of thinking about your programming task. To implement something in functional programming, you think about how to produce your result from the results of other function calls (which you may also have to implement). To implement something in logic programming, you think about what relationships between your arguments (some of which are input and some of which are output, and not necessarily the same ones from call to call) will imply the desired relationship.
Prolog, being a logical language, gives you backtracking for free, and it's pretty noticeable.
To elaborate (and to be clear, I'm in no way an expert in either paradigm), it seems to me that logic programming is better when it comes to solving things, because that's precisely what the language does (this becomes clear when backtracking is needed, for example).
I think the difference is this:
imperative programming = modelling actions
functional programming = modelling reasoning
logic programming = modelling knowledge
choose what fits your mind best
functional programming:
when 6PM, light on.
logic programming:
when dark, light on.
I'm currently digesting the nice presentation Why learn Haskell? by Keegan McAllister. There he uses the snippet
minimum = head . sort
as an illustration of Haskell's lazy evaluation by stating that minimum has time-complexity O(n) in Haskell. However, I think the example is kind of academic in nature. I'm therefore asking for a more practical example where it's not trivially apparent that most of the intermediate calculations are thrown away.
Have you ever written an AI? Isn't it annoying that you have to thread pruning information (e.g. maximum depth, the minimum cost of an adjacent branch, or other such information) through the tree traversal function? This means you have to write a new tree traversal every time you want to improve your AI. That's dumb. With lazy evaluation, this is no longer a problem: write your tree traversal function once, to produce a huge (maybe even infinite!) game tree, and let your consumer decide how much of it to consume.
Writing a GUI that shows lots of information? Want it to run fast anyway? In other languages, you might have to write code that renders only the visible scenes. In Haskell, you can write code that renders the whole scene, and then later choose which pixels to observe. Similarly, rendering a complicated scene? Why not compute an infinite sequence of scenes at various detail levels, and pick the most appropriate one as the program runs?
You write an expensive function, and decide to memoize it for speed. In other languages, this requires building a data structure that tracks which inputs for the function you know the answer to, and updating the structure as you see new inputs. Remember to make it thread safe -- if we really need speed, we need parallelism, too! In Haskell, you build an infinite data structure, with an entry for each possible input, and evaluate the parts of the data structure that correspond to the inputs you care about. Thread safety comes for free with purity.
Here's one that's perhaps a bit more prosaic than the previous ones. Have you ever found a time when && and || weren't the only things you wanted to be short-circuiting? I sure have! For example, I love the <|> function for combining Maybe values: it takes the first one of its arguments that actually has a value. So Just 3 <|> Nothing = Just 3; Nothing <|> Just 7 = Just 7; and Nothing <|> Nothing = Nothing. Moreover, it's short-circuiting: if it turns out that its first argument is a Just, it won't bother doing the computation required to figure out what its second argument is.
And <|> isn't built in to the language; it's tacked on by a library. That is: laziness allows you to write brand new short-circuiting forms. (Indeed, in Haskell, even the short-circuiting behavior of (&&) and (||) aren't built-in compiler magic: they arise naturally from the semantics of the language plus their definitions in the standard libraries.)
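A small sketch of that short-circuiting, where the error call stands in for an arbitrarily expensive (or failing) computation:

import Control.Applicative ((<|>))

firstHit :: Maybe Int
firstHit = Just 3 <|> error "never evaluated"

main :: IO ()
main = print firstHit  -- prints Just 3; the right-hand side is never forced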
In general, the common theme here is that you can separate the production of values from the determination of which values are interesting to look at. This makes things more composable, because the choice of what is interesting to look at need not be known by the producer.
Here's a well-known example I posted to another thread yesterday. Hamming numbers are numbers that don't have any prime factors larger than 5. I.e. they have the form 2^i*3^j*5^k. The first 20 of them are:
[1,2,3,4,5,6,8,9,10,12,15,16,18,20,24,25,27,30,32,36]
The 500000th one is:
1962938367679548095642112423564462631020433036610484123229980468750
The program that printed the 500000th one (after a brief moment of computation) is:
merge xxs@(x:xs) yys@(y:ys) =
  case x `compare` y of
    LT -> x : merge xs yys
    EQ -> x : merge xs ys
    GT -> y : merge xxs ys

hamming = 1 : m 2 `merge` m 3 `merge` m 5
  where
    m k = map (k *) hamming

main = print (hamming !! 499999)
Computing that number with reasonable speed in a non-lazy language takes quite a bit more code and head-scratching. There are a lot of examples here.
Consider generating and consuming the first n elements of an infinite sequence. Without lazy evaluation, the naive encoding would run forever in the generation step, and never consume anything. With lazy evaluation, only as many elements are generated as the code tries to consume.
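A minimal sketch of that pattern:

naturals :: [Integer]
naturals = [0 ..]               -- an infinite generator

main :: IO ()
main = print (take 5 naturals)  -- [0,1,2,3,4]; nothing past the fifth element is ever built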
We all know (or should know) that Haskell is lazy by default. Nothing is evaluated until it must be evaluated. So when must something be evaluated? There are points where Haskell must be strict. I call these "strictness points", although this particular term isn't as widespread as I had thought. My claim is this:
Reduction (or evaluation) in Haskell only occurs at strictness points.
So the question is: what, precisely, are Haskell's strictness points? My intuition says that main, seq / bang patterns, pattern matching, and any IO action performed via main are the primary strictness points, but I don't really know why I know that.
(Also, if they're not called "strictness points", what are they called?)
I imagine a good answer will include some discussion about WHNF and so on. I also imagine it might touch on lambda calculus.
Edit: additional thoughts about this question.
As I've reflected on this question, I think it would be clearer to add something to the definition of a strictness point. Strictness points can have varying contexts and varying depth (or strictness). Falling back to my definition that "reduction in Haskell only occurs at strictness points", let us add to that definition this clause: "a strictness point is only triggered when its surrounding context is evaluated or reduced."
So, let me try to get you started on the kind of answer I want. main is a strictness point. It is specially designated as the primary strictness point of its context: the program. When the program (main's context) is evaluated, the strictness point of main is activated. Main's depth is maximal: it must be fully evaluated. Main is usually composed of IO actions, which are also strictness points, whose context is main.
Now you try: discuss seq and pattern matching in these terms. Explain the nuances of function application: how is it strict? How is it not? What about deepseq? let and case statements? unsafePerformIO? Debug.Trace? Top-level definitions? Strict data types? Bang patterns? Etc. How many of these items can be described in terms of just seq or pattern matching?
A good place to start is by understanding this paper: A Natural Semantics for Lazy Evaluation (Launchbury). That will tell you when expressions are evaluated for a small language similar to GHC's Core. Then the remaining question is how to map full Haskell to Core, and most of that translation is given by the Haskell report itself. In GHC we call this process "desugaring", because it removes syntactic sugar.
Well, that's not the whole story, because GHC includes a whole raft of optimisations between desugaring and code generation, and many of these transformations will rearrange the Core so that things get evaluated at different times (strictness analysis in particular will cause things to be evaluated earlier). So to really understand how your program will be evaluated, you need to look at the Core produced by GHC.
Perhaps this answer seems a bit abstract to you (I didn't specifically mention bang patterns or seq), but you asked for something precise, and this is about the best we can do.
I would probably recast this question as, Under what circumstances will Haskell evaluate an expression? (Perhaps tack on a "to weak head normal form.")
To a first approximation, we can specify this as follows:
Executing IO actions will evaluate any expressions that they "need." (So you need to know whether the IO action is executed, e.g. its name is main, or it is called from main, AND you need to know what the action needs.)
An expression that is being evaluated (hey, that's a recursive definition!) will evaluate any expressions it needs.
From your intuitive list, main and IO actions fall into the first category, and seq and pattern matching fall into the second category. But I think that the first category is more in line with your idea of "strictness point", because that is in fact how we cause evaluation in Haskell to become observable effects for users.
Giving all of the details specifically is a large task, since Haskell is a large language. It's also quite subtle, because Concurrent Haskell may evaluate things speculatively, even though we end up not using the result in the end: this is a third breed of things that cause evaluation. The second category is quite well studied: you want to look at the strictness of the functions involved. The first category too can be thought to be a sort of "strictness", though this is a little dodgy because evaluate x and seq x $ return () are actually different things! You can treat it properly if you give some sort of semantics to the IO monad (explicitly passing a RealWorld# token works for simple cases), but I don't know if there's a name for this sort of stratified strictness analysis in general.
C has the concept of sequence points, which are guarantees for particular operations that one operand will be evaluated before the other. I think that's the closest existing concept, but the essentially equivalent term strictness point (or possibly force point) is more in line with Haskell thinking.
In practice Haskell is not a purely lazy language: for instance pattern matching is usually strict (so trying a pattern match forces evaluation to happen at least far enough to accept or reject the match).
…
Programmers can also use the seq primitive to force an expression to evaluate regardless of whether the result will ever be used.
$! is defined in terms of seq.
—Lazy vs. non-strict.
So your thinking about !/$! and seq is essentially right, but pattern matching is subject to subtler rules. You can always use ~ to force lazy pattern matching, of course. An interesting point from that same article:
The strictness analyzer also looks for cases where sub-expressions are always required by the outer expression, and converts those into eager evaluation. It can do this because the semantics (in terms of "bottom") don't change.
Let's continue down the rabbit hole and look at the docs for optimisations performed by GHC:
Strictness analysis is a process by which GHC attempts to determine, at compile-time, which data definitely will 'always be needed'. GHC can then build code to just calculate such data, rather than the normal (higher overhead) process for storing up the calculation and executing it later.
—GHC Optimisations: Strictness Analysis.
In other words, strict code may be generated anywhere as an optimisation, because creating thunks is unnecessarily expensive when the data will always be needed (and/or may only be used once).
…no more evaluation can be performed on the value; it is said to be in normal form. If we are at any of the intermediate steps so that we've performed at least some evaluation on a value, it is in weak head normal form (WHNF). (There is also a 'head normal form', but it's not used in Haskell.) Fully evaluating something in WHNF reduces it to something in normal form…
—Wikibooks Haskell: Laziness
(A term is in head normal form if there is no beta-redex in head position. A redex is a head redex if it is preceded only by lambda abstractors of non-redexes.) So when you start to force a thunk, you're working in WHNF; when there are no more thunks left to force, you're in normal form. Another interesting point:
…if at some point we needed to, say, print z out to the user, we'd need to fully evaluate it…
Which naturally implies that, indeed, any IO action performed from main does force evaluation, which should be obvious considering that Haskell programs do, in fact, do things. Anything that needs to go through the sequence defined in main must be in normal form and is therefore subject to strict evaluation.
C. A. McCann got it right in the comments, though: the only thing that's special about main is that main is defined as special; pattern matching on the constructor is sufficient to ensure the sequence imposed by the IO monad. In that respect only seq and pattern-matching are fundamental.
Haskell is AFAIK not a pure lazy language, but rather a non-strict language. This means that it does not necessarily evaluate terms at the last possible moment.
A good source for Haskell's model of "laziness" can be found here: http://en.wikibooks.org/wiki/Haskell/Laziness
Basically, it is important to understand the difference between a thunk and weak head normal form (WHNF).
My understanding is that Haskell pulls computations through backwards as compared to imperative languages. What this means is that in the absence of seq and bang patterns, it will ultimately be some kind of side effect that forces the evaluation of a thunk, which may cause prior evaluations in turn (true laziness).
As this would lead to a horrible space leak, the compiler then figures out how and when to evaluate thunks ahead of time to save space. The programmer can then support this process by giving strictness annotations (en.wikibooks.org/wiki/Haskell/Strictness , www.haskell.org/haskellwiki/Performance/Strictness) to further reduce space usage in the form of nested thunks.
I am not an expert in the operational semantics of Haskell, so I will just leave the link as a resource.
Some more resources:
http://www.haskell.org/haskellwiki/Performance/Laziness
http://www.haskell.org/haskellwiki/Haskell/Lazy_Evaluation
Lazy doesn't mean do nothing. Whenever your program pattern matches a case expression, it evaluates something -- just enough anyway. Otherwise it can't figure out which RHS to use. Don't see any case expressions in your code? Don't worry, the compiler is translating your code to a stripped down form of Haskell where they are hard to avoid using.
For a beginner, a basic rule of thumb is let is lazy, case is less lazy.
This is not a full answer aiming for karma, but just a piece of the puzzle -- to the extent that this is about semantics, bear in mind that there are multiple evaluation strategies that provide the same semantics. One good example here -- and the project also speaks to how we typically think of Haskell semantics -- was the Eager Haskell project, which radically altered evaluation strategies while maintaining the same semantics: http://csg.csail.mit.edu/pubs/haskell.html
The Glasgow Haskell Compiler translates your code into a lambda-calculus-like language called Core. In this language, something is going to be evaluated whenever you pattern match on it with a case statement. Thus if a function is called, the outermost constructor, and only it (unless there are strict fields), is going to be evaluated. Anything else is canned in a thunk. (Thunks are introduced by let bindings.)
Of course this is not exactly what happens in the real language. The compiler converts Haskell into Core in a very sophisticated way, keeping as many things as possible lazy and making anything that is always needed strict. Additionally, there are unboxed values and unboxed tuples, which are always strict.
If you try to evaluate a function by hand, you can basically think:
Try to evaluate the outermost constructor of the result.
Anything else that is needed to get that result (but only if it's really needed) is also going to be evaluated. The order doesn't matter.
In the case of IO you have to evaluate the results of all statements, from the first to the last. This is a bit more complicated, since the IO monad does some tricks to force evaluation in a specific order.
We all know (or should know) that Haskell is lazy by default. Nothing is evaluated until it must be evaluated.
No.
Haskell is not a lazy language
Haskell is a language in which evaluation order doesn't matter because there are no side effects.
It's not quite true that evaluation order doesn't matter, because the language allows for infinite loops. If you aren't careful, it's possible to get stuck in a cul-de-sac where you evaluate a subexpression forever when a different evaluation order would have led to termination in finite time. So it's more accurate to say:
Haskell implementations must evaluate the program in a way that terminates if there is any evaluation order that terminates. Only if every possible evaluation order fails to terminate can the implementation fail to terminate.
This still leaves implementations with a huge freedom in how they evaluate the program.
A Haskell program is a single expression, namely let {all top-level bindings} in Main.main. Evaluation can be understood as a sequence of reduction (small-)steps which change the expression (which represents the current state of the executing program).
You can divide reduction steps into two categories: those that are provably necessary (provably will be part of any terminating sequence of reductions), and those that aren't. You can divide the provably necessary reductions somewhat vaguely into two subcategories: those that are "obviously" necessary, and those that require some nontrivial analysis to prove them necessary.
Performing only obviously necessary reductions is what's called "lazy evaluation". I don't know whether a purely lazy evaluating implementation of Haskell has ever existed. Hugs may have been one. GHC definitely isn't.
GHC performs reduction steps at compile time that aren't provably necessary; for example, it will replace 1+2::Int with 3::Int even if it can't prove that the result will be used.
GHC may also perform not-provably-necessary reductions at run time in some circumstances. For example, when generating code to evaluate f (x+y), if x and y are of type Int and their values will be known at run time, but f can't be proven to use its argument, there is no reason not to compute x+y before calling f. It uses less heap space and less code space and is probably faster even if the argument isn't used. However, I don't know whether GHC actually takes these sorts of optimization opportunities.
GHC definitely performs evaluation steps at run time that are proven necessary only by fairly complex cross-module analyses. This is extremely common and may represent the bulk of the evaluation of realistic programs. Lazy evaluation is a last-resort fallback evaluation strategy; it isn't what happens as a rule.
There was an "optimistic evaluation" branch of GHC that did much more speculative evaluation at run time. It was abandoned because of its complexity and the ongoing maintenance burden, not because it didn't perform well. If Haskell was as popular as Python or C++ then I'm sure there would be implementations with much more sophisticated runtime evaluation strategies, maintained by deep-pocketed corporations. Non-lazy evaluation isn't a change to the language, it's just an engineering challenge.
Reduction is driven by top-level I/O, and nothing else
You can model interaction with the outside world by means of special side-effectful reduction rules like: "If the current program is of the form getChar >>= <expr>, then get a character from standard input and reduce the program to <expr> applied to the character you got."
The entire goal of the run time system is to evaluate the program until it has one of these side-effecting forms, then do the side effect, then repeat until the program has some form that implies termination, such as return ().
There are no other rules about what is reduced when. There are only rules about what can reduce to what.
For example, the only rules for if expressions are that if True then <expr1> else <expr2> can be reduced to <expr1>, if False then <expr1> else <expr2> can be reduced to <expr2>, and if <exc> then <expr1> else <expr2>, where <exc> is an exceptional value, can be reduced to an exceptional value.
If the expression representing the current state of your program is an if expression, you have no choice but to perform reductions on the condition until it's True or False or <exc>, because that's the only way you'll ever get rid of the if expression and have any hope of reaching a state that matches one of the I/O rules. But the language specification doesn't tell you to do that in so many words.
These sorts of implicit ordering constraints are the only way that evaluation can be "forced" to happen. This is a frequent source of confusion for beginners. For example, people sometimes try to make foldl more strict by writing foldl (\x y -> x `seq` x+y) instead of foldl (+). This doesn't work, and nothing like it can ever work, because no expression can make itself evaluate. The evaluation can only "come from above". seq is not special in any way in this regard.
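Here is a sketch of that misconception next to the thing that does work, foldl', which forces the accumulator inside the loop itself (with optimisation off, leaky builds the whole chain of thunks before anything fires):

import Data.List (foldl')

-- The seq here fires only when the final result is demanded from above,
-- so a chain of thunks is built regardless:
leaky :: Integer
leaky = foldl (\x y -> x `seq` x + y) 0 [1 .. 1000000]

-- foldl' evaluates the accumulator at each step of the loop:
strict :: Integer
strict = foldl' (+) 0 [1 .. 1000000]

main :: IO ()
main = print (leaky, strict)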
Reduction happens everywhere
Reduction (or evaluation) in Haskell only occurs at strictness points. [...] My intuition says that main, seq / bang patterns, pattern matching, and any IO action performed via main are the primary strictness points [...].
I don't see how to make sense of that statement. Every part of the program has some meaning, and that meaning is defined by reduction rules, so reduction happens everywhere.
To reduce a function application <expr1> <expr2>, you have to evaluate <expr1> until it has a form like (\x -> <expr1'>) or (getChar >>=) or something else that matches a rule. But for some reason function application doesn't tend to show up on lists of expressions that allegedly "force evaluation", while case always does.
You can see this misconception in a quote from the Haskell wiki, found in another answer:
In practice Haskell is not a purely lazy language: for instance pattern matching is usually strict
I don't understand what could qualify as a "purely lazy language" to whoever wrote that, except, perhaps, a language in which every program hangs because the runtime never does anything. If pattern matching is a feature of your language then you've got to actually do it at some point. To do it, you have to evaluate the scrutinee enough to determine whether it matches the pattern. That's the laziest way to match a pattern that is possible in principle.
~-prefixed patterns are often called "lazy" by programmers, but the language spec calls them "irrefutable". Their defining property is that they always match. Because they always match, you don't have to evaluate the scrutinee to determine whether they match or not, so a lazy implementation won't. The difference between regular and irrefutable patterns is what expressions they match, not what evaluation strategy you're supposed to use. The spec says nothing about evaluation strategies.
main is a strictness point. It is specially designated as the primary strictness point of its context: the program. When the program (main's context) is evaluated, the strictness point of main is activated. [...] Main is usually composed of IO actions, which are also strictness points, whose context is main.
I'm not convinced that any of that has any meaning.
Main's depth is maximal: it must be fully evaluated.
No, main only has to be evaluated "shallowly", to make I/O actions appear at the top level. main is the entire program, and the program isn't completely evaluated on every run because not all code is relevant to every run (in general).
discuss seq and pattern matching in these terms.
I already talked about pattern matching. seq can be defined by rules that are similar to case and application: for example, (\x -> <expr1>) `seq` <expr2> reduces to <expr2>. This "forces evaluation" in the same way that case and application do. WHNF is just a name for what these expressions "force evaluation" to.
Explain the nuances of function application: how is it strict? How is it not?
It's strict in its left expression just like case is strict in its scrutinee. It's also strict in the function body after substitution just like case is strict in the RHS of the selected alternative after substitution.
What about deepseq?
It's just a library function, not a builtin.
Incidentally, deepseq is semantically weird. It should take only one argument. I think that whoever invented it just blindly copied seq with no understanding of why seq needs two arguments. I count deepseq's name and specification as evidence that a poor understanding of Haskell evaluation is common even among experienced Haskell programmers.
let and case statements?
I talked about case. let, after desugaring and type checking, is just a way of writing an arbitrary expression graph in tree form. Here's a paper about it.
unsafePerformIO?
To an extent it can be defined by reduction rules. For example, case unsafePerformIO <expr> of <alts> reduces to unsafePerformIO (<expr> >>= \x -> return (case x of <alts>)), and at the top level only, unsafePerformIO <expr> reduces to <expr>.
This doesn't do any memoization. You could try to simulate memoization by rewriting every unsafePerformIO expression to explicitly memoize itself, and creating the associated IORefs... somewhere. But you could never reproduce GHC's memoization behavior, because it depends on unpredictable details of the optimization process, and because it isn't even type safe (as shown by the infamous polymorphic IORef example in the GHC documentation).
Debug.Trace?
Debug.Trace.trace is just a simple wrapper around unsafePerformIO.
Top-level definitions?
Top-level variable bindings are the same as nested let bindings. data, class, import, and such are a whole different ball game.
Strict data types? Bang patterns?
Just sugar for seq.
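A sketch of both desugarings:

{-# LANGUAGE BangPatterns #-}

data P = P !Int Int   -- constructing P forces the first field, as if via seq

f :: Int -> Int
f !x = x + 1          -- roughly:  f x = x `seq` (x + 1)

main :: IO ()
main = print (f 41)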