I was wondering how smart/lazy Haskell is. Can I always be sure that Haskell will only do what is necessary to generate a certain output?
No.
Haskell specifies a denotational semantics for its core lambda-like calculus, and you can rely on that semantics. Additionally, there is a metatheory proof that a particular reduction order -- known colloquially as "lazy evaluation" -- realizes that semantics; and so many people use that as their mental model of how Haskell programs behave.
There are two broad categories of ways that a Haskell program may end up evaluating more than necessary:
The Haskell implementation may choose to evaluate more. GHC uses lazy evaluation in most places, but I believe it will use other evaluation orders for efficiency in some cases. You could also look at the Eager Haskell project, which is attempting to use another implementation strategy. In principle, an implementation would be within its rights to choose to speculatively fork some computations to another thread (and then throw away the results if they weren't needed). And so on and so forth -- there are many possible variations on this theme.
The denotational semantics specified may demand more evaluation than "necessary". For example, one that occasionally trips up beginners:
primes :: [Int]
primes = 2 : filter prime [3,5..]
prime :: Int -> Bool
prime x = and [x `mod` p /= 0 | p <- primes, p < x]
When checking whether 3 should be in the list primes, it is in fact not necessary to check any of the elements of primes past 2, because the sequence is strictly monotonically increasing. But Haskell is not (does not try to be) smart enough to notice that; it will go straight on trying to check the rest of the primes and end up in an infinite loop instead of giving the list of primes.
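In fact, the usual workaround (a sketch, with primed names to avoid clashing with the definitions above) is to make the stopping condition explicit, using the p * p <= x bound so the comprehension never has to look further into the list than the primes that have already been produced:

primes' :: [Int]
primes' = 2 : filter prime' [3,5..]

prime' :: Int -> Bool
prime' x = and [x `mod` p /= 0 | p <- takeWhile (\p -> p * p <= x) primes']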
An even smaller example: you could think that x && False is always False, but x will typically be evaluated anyway, because the semantics says this should be an infinite loop if x is. (Contrast False && x, which typically does not result in evaluating x.)
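A quick way to check this (a sketch, assuming GHC and its Prelude definition of (&&), which pattern matches on its left operand):

import Control.Exception (SomeException, evaluate, try)

main :: IO ()
main = do
  print (False && undefined)                 -- prints False; the right operand is never forced
  r <- try (evaluate (undefined && False))   -- the left operand is forced, and it is bottom
  print (r :: Either SomeException Bool)     -- prints Left Prelude.undefined...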
That said, when you say "complex structure", one thing that comes to mind is: does Haskell do the laziness thing even with custom data types that I define? That is, do complex structures like hash maps and balanced trees and k-d trees and so forth get treated lazily? The answer there is yes, at least for GHC; there is nothing fundamentally special about any of the types in the Prelude except IO. Lists, booleans, Maybes, and so forth are lazy not because the compiler knows special things about them, but simply as a consequence of the lazy evaluation reduction rules specified by Haskell and their declarations as types.
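For instance, here is a minimal sketch (all names made up) showing that a user-defined structure is just as lazy as the Prelude's lists, with no special support from the compiler:

data Tree a = Leaf | Node (Tree a) a (Tree a)

-- An infinite tree of labels; none of it exists until something inspects it.
nats :: Tree Integer
nats = go 0
  where go n = Node (go (2 * n + 1)) n (go (2 * n + 2))

rootLabel :: Tree a -> Maybe a
rootLabel Leaf         = Nothing
rootLabel (Node _ x _) = Just x

main :: IO ()
main = print (rootLabel nats)   -- prints Just 0; only one Node is ever built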
Of course, there are ways to opt-out of laziness. Some of the structures you will see on Hackage do that in various ways; but don't worry, usually this will be declared in their documentation.
Because Prolog uses chronological backtracking (from the Prolog Wikipedia page) even after an answer is found (in this example where there can only be one solution), would this justify Prolog as using eager evaluation?
mother_child(trude, sally).
father_child(tom, sally).
father_child(tom, erica).
father_child(mike, tom).
sibling(X, Y) :- parent_child(Z, X), parent_child(Z, Y).
parent_child(X, Y) :- father_child(X, Y).
parent_child(X, Y) :- mother_child(X, Y).
With the following output:
?- sibling(sally, erica).
true ;
false.
To summarize the discussion with @WillNess below: yes, Prolog is strict. However, Prolog's execution model and semantics are substantially different from the languages that are usually labelled strict or non-strict. For more about this, see below.
I'm not sure the question really applies to Prolog, because it doesn't really have the kind of implicit evaluation ordering that other languages have. Where this really comes into play is in a language like Haskell, where you might have an expression like:
f (g x) (h y)
In a strict language like ML, there is a defined evaluation order: g x will be evaluated, then h y, and f (g x) (h y) last. In a language like Haskell, g x and h y will only be evaluated as required ("non-strict" is more accurate than "lazy"). But in Prolog,
f(g(X), h(Y))
does not have the same meaning, because it isn't using a function notation. The query would be broken down into three parts, g(X, A), h(Y, B), and f(A,B,C), and those constituents can be placed in any order. The evaluation strategy is strict in the sense that what comes earlier in a sequence will be evaluated before what comes next, but it is non-strict in the sense that there is no requirement that variables be instantiated to ground terms before evaluation can proceed. Unification is perfectly content to complete without having given you values for every variable. I am bringing this up because you have to break down a complex, nested expression in another language into several expressions in Prolog.
Backtracking has nothing to do with it, as far as I can tell. I don't think backtracking to the nearest choice point and resuming from there precludes a non-strict evaluation method, it just happens that Prolog's is strict.
That Prolog pauses after giving each of the several correct answers to a problem has nothing to do with laziness; it is a part of its user interaction protocol. Each answer is calculated eagerly.
Sometimes there will be only one answer but Prolog doesn't know that in advance, so it waits for us to press ; to continue search, in hopes of finding another solution. Sometimes it is able to deduce it in advance and will just stop right away, but only sometimes.
update:
Prolog does no evaluation on its own. All terms are unevaluated, as if "quoted" in Lisp.
Prolog will unfold your predicate definitions as written and is perfectly happy to keep your data structures full of unevaluated uninstantiated holes, if so entailed by your predicate definitions.
Haskell does not need any values; a user does, when requesting an output.
Similarly, Prolog produces solutions one-by-one, as per the user requests.
Prolog can even be seen as lazier than Haskell: in Haskell all arithmetic is strict, i.e. immediate, whereas in Prolog you have to explicitly request arithmetic evaluation, with is/2.
So perhaps the question is ill-posed. Prolog's operational model is just too different. There are no "results" nor "functions", for one; but viewed from another angle, everything is a result, and predicates are "multi"-functions.
As it stands, the question is not correct in what it states. Chronological backtracking does not mean that Prolog will necessarily backtrack "in an example where there can be only one solution".
Consider this:
foo(a, 1).
foo(b, 2).
foo(c, 3).
?- foo(b, X).
X = 2.
?- foo(X, 2).
X = b.
So this is an example that does have only one solution and Prolog recognizes that, and does not attempt to backtrack. There are cases in which you can implement a solution to a problem in a way that Prolog will not recognize that there is only one logical solution, but this is due to the implementation and is not inherent to Prolog's execution model.
You should read up on Prolog's execution model. From the Wikipedia article which you seem to cite, "Operationally, Prolog's execution strategy can be thought of as a generalization of function calls in other languages, one difference being that multiple clause heads can match a given call. In that case, [emphasis mine] the system creates a choice-point, unifies the goal with the clause head of the first alternative, and continues with the goals of that first alternative." Read Sterling and Shapiro's "The Art of Prolog" for a far more complete discussion of the subject.
From Wikipedia I got:
In eager evaluation, an expression is evaluated as soon as it is bound to a variable.
Then I think there are two levels: at the user level (our predicates) Prolog is not eager.
But it is at the 'system' level, because variables are implemented as efficiently as possible.
Indeed, attributed variables are implemented to be lazy, and are rather 'orthogonal' to 'logic' Prolog variables.
We all know (or should know) that Haskell is lazy by default. Nothing is evaluated until it must be evaluated. So when must something be evaluated? There are points where Haskell must be strict. I call these "strictness points", although this particular term isn't as widespread as I had thought. As I see it:
Reduction (or evaluation) in Haskell only occurs at strictness points.
So the question is: what, precisely, are Haskell's strictness points? My intuition says that main, seq / bang patterns, pattern matching, and any IO action performed via main are the primary strictness points, but I don't really know why I know that.
(Also, if they're not called "strictness points", what are they called?)
I imagine a good answer will include some discussion about WHNF and so on. I also imagine it might touch on lambda calculus.
Edit: additional thoughts about this question.
As I've reflected on this question, I think it would be clearer to add something to the definition of a strictness point. Strictness points can have varying contexts and varying depth (or strictness). Falling back to my definition that "reduction in Haskell only occurs at strictness points", let us add to that definition this clause: "a strictness point is only triggered when its surrounding context is evaluated or reduced."
So, let me try to get you started on the kind of answer I want. main is a strictness point. It is specially designated as the primary strictness point of its context: the program. When the program (main's context) is evaluated, the strictness point of main is activated. Main's depth is maximal: it must be fully evaluated. Main is usually composed of IO actions, which are also strictness points, whose context is main.
Now you try: discuss seq and pattern matching in these terms. Explain the nuances of function application: how is it strict? How is it not? What about deepseq? let and case statements? unsafePerformIO? Debug.Trace? Top-level definitions? Strict data types? Bang patterns? Etc. How many of these items can be described in terms of just seq or pattern matching?
A good place to start is by understanding this paper: A Natural Semantics for Lazy Evaluation (Launchbury). That will tell you when expressions are evaluated for a small language similar to GHC's Core. Then the remaining question is how to map full Haskell to Core, and most of that translation is given by the Haskell report itself. In GHC we call this process "desugaring", because it removes syntactic sugar.
Well, that's not the whole story, because GHC includes a whole raft of optimisations between desugaring and code generation, and many of these transformations will rearrange the Core so that things get evaluated at different times (strictness analysis in particular will cause things to be evaluated earlier). So to really understand how your program will be evaluated, you need to look at the Core produced by GHC.
Perhaps this answer seems a bit abstract to you (I didn't specifically mention bang patterns or seq), but you asked for something precise, and this is about the best we can do.
I would probably recast this question as, Under what circumstances will Haskell evaluate an expression? (Perhaps tack on a "to weak head normal form.")
To a first approximation, we can specify this as follows:
Executing IO actions will evaluate any expressions that they “need”. (So you need to know if the IO action is executed, e.g. its name is main or it is called from main, AND you need to know what the action needs.)
An expression that is being evaluated (hey, that's a recursive definition!) will evaluate any expressions it needs.
From your intuitive list, main and IO actions fall into the first category, and seq and pattern matching fall into the second category. But I think that the first category is more in line with your idea of "strictness point", because that is in fact how we cause evaluation in Haskell to become observable effects for users.
Giving all of the details specifically is a large task, since Haskell is a large language. It's also quite subtle, because Concurrent Haskell may evaluate things speculatively, even though we end up not using the result in the end: this is a third breed of things that cause evaluation. The second category is quite well studied: you want to look at the strictness of the functions involved. The first category too can be thought to be a sort of "strictness", though this is a little dodgy because evaluate x and seq x $ return () are actually different things! You can treat it properly if you give some sort of semantics to the IO monad (explicitly passing a RealWorld# token works for simple cases), but I don't know if there's a name for this sort of stratified strictness analysis in general.
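A minimal sketch of both categories at work:

main :: IO ()
main = do
  let xs = map (* 2) [1 .. 10 :: Int]   -- nothing evaluated yet; xs is a thunk
  print (head xs)                       -- executing print needs head xs, so just enough
                                        -- of xs (one cons cell, one multiplication) is
                                        -- evaluated; the rest of the list never is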
C has the concept of sequence points, which are guarantees for particular operations that one operand will be evaluated before the other. I think that's the closest existing concept, but the essentially equivalent term strictness point (or possibly force point) is more in line with Haskell thinking.
In practice Haskell is not a purely lazy language: for instance pattern matching is usually strict (so trying a pattern match forces evaluation to happen at least far enough to accept or reject the match).
…
Programmers can also use the seq primitive to force an expression to evaluate regardless of whether the result will ever be used.
$! is defined in terms of seq.
—Lazy vs. non-strict.
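For reference, ($!) can be written in terms of seq essentially as the Haskell Report's Prelude does:

import Prelude hiding (($!))

infixr 0 $!
($!) :: (a -> b) -> a -> b
f $! x = x `seq` f x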
So your thinking about !/$! and seq is essentially right, but pattern matching is subject to subtler rules. You can always use ~ to force lazy pattern matching, of course. An interesting point from that same article:
The strictness analyzer also looks for cases where sub-expressions are always required by the outer expression, and converts those into eager evaluation. It can do this because the semantics (in terms of "bottom") don't change.
Let's continue down the rabbit hole and look at the docs for optimisations performed by GHC:
Strictness analysis is a process by which GHC attempts to determine, at compile-time, which data definitely will 'always be needed'. GHC can then build code to just calculate such data, rather than the normal (higher overhead) process for storing up the calculation and executing it later.
—GHC Optimisations: Strictness Analysis.
In other words, strict code may be generated anywhere as an optimisation, because creating thunks is unnecessarily expensive when the data will always be needed (and/or may only be used once).
…no more evaluation can be performed on the value; it is said to be in normal form. If we are at any of the intermediate steps so that we've performed at least some evaluation on a value, it is in weak head normal form (WHNF). (There is also a 'head normal form', but it's not used in Haskell.) Fully evaluating something in WHNF reduces it to something in normal form…
—Wikibooks Haskell: Laziness
(A term is in head normal form if there is no beta-redex in head position [1]. A redex is a head redex if it is preceded only by lambda abstractors of non-redexes [2].) So when you start to force a thunk, you're working in WHNF; when there are no more thunks left to force, you're in normal form. Another interesting point:
…if at some point we needed to, say, print z out to the user, we'd need to fully evaluate it…
Which naturally implies that, indeed, any IO action performed from main does force evaluation, which should be obvious considering that Haskell programs do, in fact, do things. Anything that needs to go through the sequence defined in main must be in normal form and is therefore subject to strict evaluation.
C. A. McCann got it right in the comments, though: the only thing that's special about main is that main is defined as special; pattern matching on the constructor is sufficient to ensure the sequence imposed by the IO monad. In that respect only seq and pattern-matching are fundamental.
Haskell is AFAIK not a pure lazy language, but rather a non-strict language. This means that it does not necessarily evaluate terms at the last possible moment.
A good source for Haskell's model of "laziness" can be found here: http://en.wikibooks.org/wiki/Haskell/Laziness
Basically, it is important to understand the difference between a thunk and weak head normal form (WHNF).
My understanding is that Haskell pulls computations through backwards as compared to imperative languages. What this means is that in the absence of "seq" and bang patterns, it will ultimately be some kind of side effect that forces the evaluation of a thunk, which may cause prior evaluations in turn (true laziness).
As this would lead to a horrible space leak, the compiler then figures out how and when to evaluate thunks ahead of time to save space. The programmer can then support this process by giving strictness annotations (en.wikibooks.org/wiki/Haskell/Strictness , www.haskell.org/haskellwiki/Performance/Strictness) to further reduce space usage in form of nested thunks.
I am not an expert in the operational semantics of Haskell, so I will just leave the link as a resource.
Some more resources:
http://www.haskell.org/haskellwiki/Performance/Laziness
http://www.haskell.org/haskellwiki/Haskell/Lazy_Evaluation
Lazy doesn't mean do nothing. Whenever your program pattern matches with a case expression, it evaluates something -- just enough, anyway. Otherwise it can't figure out which RHS to use. Don't see any case expressions in your code? Don't worry, the compiler is translating your code to a stripped-down form of Haskell where they are hard to avoid.
For a beginner, a basic rule of thumb is let is lazy, case is less lazy.
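A small sketch of that rule of thumb (try these in GHCi):

letIsLazy :: Int
letIsLazy = let x = undefined :: Int in 5    -- 5; the binding is never forced

caseIsLessLazy :: Int
caseIsLessLazy = case (undefined :: Bool) of   -- matching against True/False forces
  True  -> 5                                   -- the scrutinee, so this throws
  False -> 6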
This is not a full answer aiming for karma, but just a piece of the puzzle -- to the extent that this is about semantics, bear in mind that there are multiple evaluation strategies that provide the same semantics. One good example here -- and the project also speaks to how we typically think of Haskell semantics -- was the Eager Haskell project, which radically altered evaluation strategies while maintaining the same semantics: http://csg.csail.mit.edu/pubs/haskell.html
The Glasgow Haskell compiler translates your code into a lambda-calculus-like language called Core. In this language, something is going to be evaluated whenever you pattern match on it with a case statement. Thus if a function is called, the outermost constructor, and only it (if there are no forced fields), is going to be evaluated. Anything else is stored away in a thunk. (Thunks are introduced by let bindings.)
Of course this is not exactly what happens in the real language. The compiler converts Haskell into Core in a very sophisticated way, making as many things as possible lazy and anything that is always needed strict. Additionally, there are unboxed values and unboxed tuples, which are always strict.
If you try to evaluate a function by hand, you can basically think:
Try to evaluate the outermost constructor of the return value.
Anything else that is needed to get the result (but only if it's really needed) is also going to be evaluated. The order doesn't matter.
In the case of IO you have to evaluate the results of all statements, from the first to the last. This is a bit more complicated, since the IO monad does some tricks to force evaluation in a specific order.
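For the pure part, a by-hand reduction in the spirit of those rules might look like this (a made-up example):

example :: [Int]
example = take 1 (map (+1) [1..])
-- take 1 (map (+1) [1..])
-- = take 1 ((1+1) : map (+1) [2..])   -- map exposes only its outermost (:)
-- = (1+1) : take 0 (map (+1) [2..])
-- = (1+1) : []                        -- take 0 _ = []
-- The thunk (1+1) is forced only if something later looks at the head.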
We all know (or should know) that Haskell is lazy by default. Nothing is evaluated until it must be evaluated.
No.
Haskell is not a lazy language
Haskell is a language in which evaluation order doesn't matter because there are no side effects.
It's not quite true that evaluation order doesn't matter, because the language allows for infinite loops. If you aren't careful, it's possible to get stuck in a cul-de-sac where you evaluate a subexpression forever when a different evaluation order would have led to termination in finite time. So it's more accurate to say:
Haskell implementations must evaluate the program in a way that terminates if there is any evaluation order that terminates. Only if every possible evaluation order fails to terminate can the implementation fail to terminate.
This still leaves implementations with a huge freedom in how they evaluate the program.
A Haskell program is a single expression, namely let {all top-level bindings} in Main.main. Evaluation can be understood as a sequence of reduction (small-)steps which change the expression (which represents the current state of the executing program).
You can divide reduction steps into two categories: those that are provably necessary (provably will be part of any terminating sequence of reductions), and those that aren't. You can divide the provably necessary reductions somewhat vaguely into two subcategories: those that are "obviously" necessary, and those that require some nontrivial analysis to prove them necessary.
Performing only obviously necessary reductions is what's called "lazy evaluation". I don't know whether a purely lazy evaluating implementation of Haskell has ever existed. Hugs may have been one. GHC definitely isn't.
GHC performs reduction steps at compile time that aren't provably necessary; for example, it will replace 1+2::Int with 3::Int even if it can't prove that the result will be used.
GHC may also perform not-provably-necessary reductions at run time in some circumstances. For example, when generating code to evaluate f (x+y), if x and y are of type Int and their values will be known at run time, but f can't be proven to use its argument, there is no reason not to compute x+y before calling f. It uses less heap space and less code space and is probably faster even if the argument isn't used. However, I don't know whether GHC actually takes these sorts of optimization opportunities.
GHC definitely performs evaluation steps at run time that are proven necessary only by fairly complex cross-module analyses. This is extremely common and may represent the bulk of the evaluation of realistic programs. Lazy evaluation is a last-resort fallback evaluation strategy; it isn't what happens as a rule.
There was an "optimistic evaluation" branch of GHC that did much more speculative evaluation at run time. It was abandoned because of its complexity and the ongoing maintenance burden, not because it didn't perform well. If Haskell were as popular as Python or C++ then I'm sure there would be implementations with much more sophisticated runtime evaluation strategies, maintained by deep-pocketed corporations. Non-lazy evaluation isn't a change to the language; it's just an engineering challenge.
Reduction is driven by top-level I/O, and nothing else
You can model interaction with the outside world by means of special side-effectful reduction rules like: "If the current program is of the form getChar >>= <expr>, then get a character from standard input and reduce the program to <expr> applied to the character you got."
The entire goal of the run time system is to evaluate the program until it has one of these side-effecting forms, then do the side effect, then repeat until the program has some form that implies termination, such as return ().
There are no other rules about what is reduced when. There are only rules about what can reduce to what.
For example, the only rules for if expressions are that if True then <expr1> else <expr2> can be reduced to <expr1>, if False then <expr1> else <expr2> can be reduced to <expr2>, and if <exc> then <expr1> else <expr2>, where <exc> is an exceptional value, can be reduced to an exceptional value.
If the expression representing the current state of your program is an if expression, you have no choice but to perform reductions on the condition until it's True or False or <exc>, because that's the only way you'll ever get rid of the if expression and have any hope of reaching a state that matches one of the I/O rules. But the language specification doesn't tell you to do that in so many words.
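For reference, the Report translates if/then/else into a two-alternative case on Bool, so the same "forced only because a branch must be chosen" reasoning applies (ifThenElse is a made-up name here):

ifThenElse :: Bool -> a -> a -> a
ifThenElse c t e = case c of
  True  -> t
  False -> e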
These sorts of implicit ordering constraints are the only way that evaluation can be "forced" to happen. This is a frequent source of confusion for beginners. For example, people sometimes try to make foldl more strict by writing foldl (\x y -> x `seq` x+y) instead of foldl (+). This doesn't work, and nothing like it can ever work, because no expression can make itself evaluate. The evaluation can only "come from above". seq is not special in any way in this regard.
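A sketch of that point (assuming GHC; Data.List.foldl' is the standard strict-accumulator fold):

import Data.List (foldl')

-- The seq inside the lambda only fires once the accumulated expression is
-- finally forced "from above", so a long chain of thunks is still built.
sumSeqInside :: [Int] -> Int
sumSeqInside = foldl (\x y -> x `seq` x + y) 0

-- The forcing has to come from outside each step of the fold, which is
-- exactly what foldl' does.
sumStrict :: [Int] -> Int
sumStrict = foldl' (+) 0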
Reduction happens everywhere
Reduction (or evaluation) in Haskell only occurs at strictness points. [...] My intuition says that main, seq / bang patterns, pattern matching, and any IO action performed via main are the primary strictness points [...].
I don't see how to make sense of that statement. Every part of the program has some meaning, and that meaning is defined by reduction rules, so reduction happens everywhere.
To reduce a function application <expr1> <expr2>, you have to evaluate <expr1> until it has a form like (\x -> <expr1'>) or (getChar >>=) or something else that matches a rule. But for some reason function application doesn't tend to show up on lists of expressions that allegedly "force evaluation", while case always does.
You can see this misconception in a quote from the Haskell wiki, found in another answer:
In practice Haskell is not a purely lazy language: for instance pattern matching is usually strict
I don't understand what could qualify as a "purely lazy language" to whoever wrote that, except, perhaps, a language in which every program hangs because the runtime never does anything. If pattern matching is a feature of your language then you've got to actually do it at some point. To do it, you have to evaluate the scrutinee enough to determine whether it matches the pattern. That's the laziest way to match a pattern that is possible in principle.
~-prefixed patterns are often called "lazy" by programmers, but the language spec calls them "irrefutable". Their defining property is that they always match. Because they always match, you don't have to evaluate the scrutinee to determine whether they match or not, so a lazy implementation won't. The difference between regular and irrefutable patterns is what expressions they match, not what evaluation strategy you're supposed to use. The spec says nothing about evaluation strategies.
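A small sketch of that difference:

forcesPair :: (a, b) -> Int
forcesPair (_, _) = 0      -- must force the scrutinee to its pair constructor,
                           -- so forcesPair undefined throws

ignoresPair :: (a, b) -> Int
ignoresPair ~(_, _) = 0    -- irrefutable: always matches without touching the
                           -- scrutinee, so ignoresPair undefined == 0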
main is a strictness point. It is specially designated as the primary strictness point of its context: the program. When the program (main's context) is evaluated, the strictness point of main is activated. [...] Main is usually composed of IO actions, which are also strictness points, whose context is main.
I'm not convinced that any of that has any meaning.
Main's depth is maximal: it must be fully evaluated.
No, main only has to be evaluated "shallowly", to make I/O actions appear at the top level. main is the entire program, and the program isn't completely evaluated on every run because not all code is relevant to every run (in general).
discuss seq and pattern matching in these terms.
I already talked about pattern matching. seq can be defined by rules that are similar to case and application: for example, (\x -> <expr1>) `seq` <expr2> reduces to <expr2>. This "forces evaluation" in the same way that case and application do. WHNF is just a name for what these expressions "force evaluation" to.
Explain the nuances of function application: how is it strict? How is it not?
It's strict in its left expression just like case is strict in its scrutinee. It's also strict in the function body after substitution just like case is strict in the RHS of the selected alternative after substitution.
What about deepseq?
It's just a library function, not a builtin.
Incidentally, deepseq is semantically weird. It should take only one argument. I think that whoever invented it just blindly copied seq with no understanding of why seq needs two arguments. I count deepseq's name and specification as evidence that a poor understanding of Haskell evaluation is common even among experienced Haskell programmers.
let and case statements?
I talked about case. let, after desugaring and type checking, is just a way of writing an arbitrary expression graph in tree form. Here's a paper about it.
unsafePerformIO?
To an extent it can be defined by reduction rules. For example, case unsafePerformIO <expr> of <alts> reduces to unsafePerformIO (<expr> >>= \x -> return (case x of <alts>)), and at the top level only, unsafePerformIO <expr> reduces to <expr>.
This doesn't do any memoization. You could try to simulate memoization by rewriting every unsafePerformIO expression to explicitly memoize itself, and creating the associated IORefs... somewhere. But you could never reproduce GHC's memoization behavior, because it depends on unpredictable details of the optimization process, and because it isn't even type safe (as shown by the infamous polymorphic IORef example in the GHC documentation).
Debug.Trace?
Debug.Trace.trace is just a simple wrapper around unsafePerformIO.
Top-level definitions?
Top-level variable bindings are the same as nested let bindings. data, class, import, and such are a whole different ball game.
Strict data types? Bang patterns?
Just sugar for seq.
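A sketch of what that sugar amounts to (GHC's actual desugaring differs in detail, but the semantics are as if these seqs were written out):

{-# LANGUAGE BangPatterns #-}

data P = P !Int Int    -- constructing  P a b  forces a first, as if a seq were written

f :: Int -> Int
f !x = 1               -- behaves like  f x = x `seq` 1

g :: Int -> Int
g x = let !y = x + 1 in 1   -- behaves like  let y = x + 1 in y `seq` 1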
I often read that lazy is not the same as non-strict but I find it hard to understand the difference. They seem to be used interchangeably but I understand that they have different meanings. I would appreciate some help understanding the difference.
I have a few questions which are scattered about this post. I will summarize those questions at the end of this post. I have a few example snippets; I did not test them, I only present them as concepts. I have added quotes to save you from looking them up. Maybe it will help someone later on with the same question.
Non-Strict Def:
A function f is said to be strict if, when applied to a nonterminating
expression, it also fails to terminate. In other words, f is strict
iff the value of f bot is bot. For most programming languages, all
functions are strict. But this is not so in Haskell. As a simple
example, consider const1, the constant 1 function, defined by:
const1 x = 1
The value of const1 bot in Haskell is 1. Operationally speaking, since
const1 does not "need" the value of its argument, it never attempts to
evaluate it, and thus never gets caught in a nonterminating
computation. For this reason, non-strict functions are also called
"lazy functions", and are said to evaluate their arguments "lazily",
or "by need".
-A Gentle Introduction To Haskell: Functions
I really like this definition. It seems the best one I could find for understanding strict. Is const1 x = 1 lazy as well?
Non-strictness means that reduction (the mathematical term for
evaluation) proceeds from the outside in,
so if you have (a+(b*c)) then first you reduce the +, then you reduce
the inner (b*c).
-Haskell Wiki: Lazy vs non-strict
The Haskell Wiki really confuses me. I understand what they are saying about order but I fail to see how (a+(b*c)) would evaluate non-strictly if it was passed _|_?
In non-strict evaluation, arguments to a function are not evaluated
unless they are actually used in the evaluation of the function body.
Under Church encoding, lazy evaluation of operators maps to non-strict
evaluation of functions; for this reason, non-strict evaluation is
often referred to as "lazy". Boolean expressions in many languages use
a form of non-strict evaluation called short-circuit evaluation, where
evaluation returns as soon as it can be determined that an unambiguous
Boolean will result — for example, in a disjunctive expression where
true is encountered, or in a conjunctive expression where false is
encountered, and so forth. Conditional expressions also usually use
lazy evaluation, where evaluation returns as soon as an unambiguous
branch will result.
-Wikipedia: Evaluation Strategy
Lazy Def:
Lazy evaluation, on the other hand, means only evaluating an
expression when its results are needed (note the shift from
"reduction" to "evaluation"). So when the evaluation engine sees an
expression it builds a thunk data structure containing whatever values
are needed to evaluate the expression, plus a pointer to the
expression itself. When the result is actually needed the evaluation
engine calls the expression and then replaces the thunk with the
result for future reference.
...
Obviously there is a strong correspondence between a thunk and a
partly-evaluated expression. Hence in most cases the terms "lazy" and
"non-strict" are synonyms. But not quite.
-Haskell Wiki: Lazy vs non-strict
This seems like a Haskell-specific answer. I take it that lazy means thunks and non-strict means partial evaluation. Is that comparison too simplified? Does lazy always mean thunks and non-strict always mean partial evaluation?
In programming language theory, lazy evaluation or call-by-need [1] is
an evaluation strategy which delays the evaluation of an expression
until its value is actually required (non-strict evaluation) and also
avoids repeated evaluations (sharing).
-Wikipedia: Lazy Evaluation
Imperative Examples
I know most people say forget imperative programming when learning a functional language. However, I would like to know if these qualify as non-strict, lazy, both or neither? At the very least it would provide something familiar.
Short Circuiting
f1() || f2()
C#, Python and other languages with "yield"
public static IEnumerable<int> Power(int number, int exponent)
{
    int counter = 0;
    int result = 1;
    while (counter++ < exponent)
    {
        result = result * number;
        yield return result;
    }
}
-MSDN: yield (c#)
Callbacks
int f1() { return 1;}
int f2() { return 2;}
int lazy(int (*cb1)(), int (*cb2)(), int x) {
    if (x == 0)
        return cb1();
    else
        return cb2();
}

int eager(int e1, int e2, int x) {
    if (x == 0)
        return e1;
    else
        return e2;
}
lazy(f1, f2, x);
eager(f1(), f2(), x);
Questions
I know the answer is right in front of me with all those resources, but I can't grasp it. It all seems like the definition is too easily dismissed as implied or obvious.
I know I have a lot of questions. Feel free to answer whatever questions you feel are relevant. I added those questions for discussion.
Is const1 x = 1 also lazy?
How is evaluating from "inward" non-strict? Is it because inward allows reductions of unnecessary expressions, like in const1 x = 1? Reductions seem to fit the definition of lazy.
Does lazy always mean thunks and non-strict always mean partial evaluation? Is this just a generalization?
Are the following imperative concepts Lazy, Non-Strict, Both or Neither?
Short Circuiting
Using yield
Passing Callbacks to delay or avoid execution
Is lazy a subset of non-strict or vice versa, or are they mutually exclusive? For example, is it possible to be non-strict without being lazy, or lazy without being non-strict?
Is Haskell's non-strictness achieved by laziness?
Thank you SO!
Non-strict and lazy, while informally interchangeable, apply to different domains of discussion.
Non-strict refers to semantics: the mathematical meaning of an expression. The world to which non-strict applies has no concept of the running time of a function, memory consumption, or even a computer. It simply talks about what kinds of values in the domain map to which kinds of values in the codomain. In particular, a strict function must map the value ⊥ ("bottom" -- see the semantics link above for more about this) to ⊥; a non-strict function is allowed not to do this.
Lazy refers to operational behavior: the way code is executed on a real computer. Most programmers think of programs operationally, so this is probably what you are thinking. Lazy evaluation refers to implementation using thunks -- pointers to code which are replaced with a value the first time they are executed. Notice the non-semantic words here: "pointer", "first time", "executed".
Lazy evaluation gives rise to non-strict semantics, which is why the concepts seem so close together. But as FUZxxl points out, laziness is not the only way to implement non-strict semantics.
If you are interested in learning more about this distinction, I highly recommend the link above. Reading it was a turning point in my conception of the meaning of computer programs.
An example of an evaluation model that is neither strict nor lazy: optimistic evaluation, which gives some speedup because it can avoid a lot of "easy" thunks:
Optimistic evaluation means that even if a subexpression may not be needed to evaluate the superexpression, we still evaluate some of it using some heuristics. If the subexpression doesn't terminate quickly enough, we suspend its evaluation until it's really needed. This gives us an advantage over lazy evaluation if the subexpression is needed later, as we don't need to generate a thunk. On the other hand, we don't lose too much if the expression doesn't terminate, as we can abort it quickly enough.
As you can see, this evaluation model is not strict: If something that yields _|_ is evaluated, but not needed, the function will still terminate, as the engine aborts the evaluation. On the other hand, it may be possible that more expressions than needed are evaluated, so it's not completely lazy.
Yes, there is some unclear use of terminology here, but the terms coincide in most cases regardless, so it's not too much of a problem.
One major difference is when terms are evaluated. There are multiple strategies for this, ranging on a spectrum from "as soon as possible" to "only at the last moment". The term eager evaluation is sometimes used for strategies leaning toward the former, while lazy evaluation properly refers to a family of strategies leaning heavily toward the latter. The distinction between "lazy evaluation" and related strategies tends to involve when and where the result of evaluating something is retained, vs. being tossed aside. The familiar memoization technique in Haskell of assigning a name to a data structure and indexing into it is based on this. In contrast, a language that simply spliced expressions into each other (as in "call-by-name" evaluation) might not support this.
The other difference is which terms are evaluated, ranging from "absolutely everything" to "as little as possible". Since any value actually used to compute the final result can't be ignored, the difference here is how many superfluous terms are evaluated. As well as reducing the amount of work the program has to do, ignoring unused terms means that any errors they would have generated won't occur. When a distinction is being drawn, strictness refers to the property of evaluating everything under consideration (in the case of a strict function, for instance, this means the terms it's applied to. It doesn't necessarily mean sub-expressions inside the arguments), while non-strict means evaluating only some things (either by delaying evaluation, or by discarding terms entirely).
It should be easy to see how these interact in complicated ways; decisions are not at all orthogonal, as the extremes tend to be incompatible. For instance:
Very non-strict evaluation precludes some amount of eagerness; if you don't know whether a term will be needed, you can't evaluate it yet.
Very strict evaluation makes non-eagerness somewhat irrelevant; if you're evaluating everything, the decision of when to do so is less significant.
Alternate definitions do exist, though. For instance, at least in Haskell, a "strict function" is often defined as one that forces its arguments sufficiently that the function will evaluate to _|_ ("bottom") whenever any argument does; note that by this definition, id is strict (in a trivial sense), because forcing the result of id x will have exactly the same behavior as forcing x alone.
This started out as an update but it started to get long.
Laziness / Call-by-need is a memoized version of call-by-name where, if the function argument is evaluated, that value is stored for subsequent uses. In a "pure" (effect-free) setting, this produces the same results as call-by-name; when the function argument is used two or more times, call-by-need is almost always faster.
Imperative Example - Apparently this is possible. There is an interesting article on Lazy Imperative Languages. It says there are two methods: one requires closures, the second uses graph reduction. Since C does not support closures, you would need to explicitly pass an argument to your iterator. You could wrap a map structure and, if the value does not exist, calculate it; otherwise return the stored value.
Note: Haskell implements this by "pointers to code which are replaced with a value the first time they are executed" - luqui.
This is non-strict call-by-name but with sharing/memoization of the results.
Call-By-Name - In call-by-name evaluation, the arguments to a function are not evaluated before the function is called — rather, they are substituted directly into the function body (using capture-avoiding substitution) and then left to be evaluated whenever they appear in the function. If an argument is not used in the function body, the argument is never evaluated; if it is used several times, it is re-evaluated each time it appears.
Imperative Example: callbacks
Note: This is non-strict as it avoids evaluation if not used.
Non-Strict = In non-strict evaluation, arguments to a function are not evaluated unless they are actually used in the evaluation of the function body.
Imperative Example: short-circuiting
Note: _|_ appears to be a way to test if a function is non-strict
So a function can be non-strict but not lazy. A function that is lazy is always non-strict.
Call-By-Need is partly defined by Call-By-Name which is partly defined by Non-Strict
An Excerpt from "Lazy Imperative Languages"
2.1. NON-STRICT SEMANTICS VS. LAZY EVALUATION
We must first clarify the distinction between "non-strict semantics" and "lazy evaluation". Non-strict semantics are those which specify that an expression is not evaluated until it is needed by a primitive operation. There may be various types of non-strict semantics. For instance, non-strict procedure calls do not evaluate the arguments until their values are required. Data constructors may have non-strict semantics, in which compound data are assembled out of unevaluated pieces. Lazy evaluation, also called delayed evaluation, is the technique normally used to implement non-strict semantics. In section 4, the two methods commonly used to implement lazy evaluation are very briefly summarized.
CALL BY VALUE, CALL BY LAZY, AND CALL BY NAME
"Call by value" is the general name used for procedure calls with strict semantics. In call by value languages, each argument to a procedure call is evaluated before the procedure call is made; the value is then passed to the procedure or enclosing expression. Another name for call by value is "eager" evaluation. Call by value is also known as "applicative order" evaluation, because all arguments are evaluated before the function is applied to them.
"Call by lazy" (using William Clinger's terminology in [8]) is the name given to procedure calls which use non-strict semantics. In languages with call by lazy procedure calls, the arguments are not evaluated before being substituted into the procedure body. Call by lazy evaluation is also known as "normal order" evaluation, because of the order (outermost to innermost, left to right) of evaluation of an expression.
"Call by name" is a particular implementation of call by lazy, used in Algol-60 [18]. The designers of Algol-60 intended that call-by-name parameters be physically substituted into the procedure body, enclosed by parentheses and with suitable name changes to avoid conflicts, before the body was evaluated.
CALL BY LAZY VS. CALL BY NEED
Call by need is an extension of call by lazy, prompted by the observation that a lazy evaluation could be optimized by remembering the value of a given delayed expression, once forced, so that the value need not be recalculated if it is needed again. Call by need evaluation, therefore, extends call by lazy methods by using memoization to avoid the need for repeated evaluation. Friedman and Wise were among the earliest advocates of call by need evaluation, proposing "suicidal suspensions" which self-destructed when they were first evaluated, replacing themselves with their values.
The way I understand it, "non-strict" means trying to reduce workload by reaching completion with less work.
Whereas "lazy evaluation" and similar techniques try to reduce overall workload by avoiding full completion (hopefully forever).
from your examples...
f1() || f2()
...short-circuiting this expression cannot possibly trigger any 'future work', and there's neither a speculative/amortized factor to the reasoning, nor any computational-complexity debt accruing.
Whereas in the C# example, "lazy" saves a function call in the overall view, but in exchange comes with those sorts of difficulties (at least from the point of call until possible full completion... in this code that's a negligible-distance pathway, but imagine those functions had some high-contention locks to put up with).
int f1() { return 1;}
int f2() { return 2;}
int lazy(int (*cb1)(), int (*cb2)(), int x) {
    if (x == 0)
        return cb1();
    else
        return cb2();
}

int eager(int e1, int e2, int x) {
    if (x == 0)
        return e1;
    else
        return e2;
}
lazy(f1, f2, x);
eager(f1(), f2(), x);
If we're talking general computer science jargon, then "lazy" and "non-strict" are generally synonyms -- they stand for the same overall idea, which expresses itself in different ways for different situations.
However, in a given particular specialized context, they may have acquired differing technical meanings. I don't think you can say anything accurate and universal about what the difference between "lazy" and "non-strict" is likely to be in the situation where there is a difference to make.
As far as I know, eager evaluation/applicative order evaluates all arguments to a function before applying it, on the other hand, lazy evaluation/normal order evaluates the arguments only when needed.
So, what are the differences between the pair of terms eager evaluation and applicative order, and lazy evaluation and normal order?
Thanks.
Lazy evaluation evaluates a term at most once, while normal order would evaluate it as often as it appears. So for example if you have f(x) = x+x and you call it as f(g(42)) then g(42) is called once under lazy evaluation or applicative order, but twice under normal order.
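A small sketch of the difference, assuming GHC (whose evaluation is call-by-need) and using Debug.Trace to make the evaluation visible:

import Debug.Trace (trace)

f :: Int -> Int
f x = x + x

main :: IO ()
main = print (f (trace "evaluating the argument" 21))
-- Under call-by-need the trace message appears once (the argument is shared
-- between the two uses of x); under textbook normal order it would be
-- evaluated, and traced, twice.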
Eager evaluation and applicative order are synonymous, at least when using the definition of applicative order found in Structure and Interpretation of Computer Programs, which seems to match yours. (Wikipedia defines applicative order a bit differently and has it as a special case of eager evaluation).
I'm reading SICP too, and I've been curious by the definition of normal order given by the authors. It seemed rather similar to Lazy evaluation to me, so I went looking for some more information regarding both.
I know this question was asked a long time ago, but I looked at the FAQ and found no mention of answering old questions, so I thought I'd leave what I've found here so other people could use it in the future.
This is what I've found, and I'm inclined to agree with those:
I would argue (as have others) that lazy evaluation and
NormalOrderEvaluation are two different things; the difference is
alluded to above. In lazy evaluation, evaluation of the argument is
deferred until it is needed, at which point the argument is evaluated
and its result saved (memoized). Further uses of the argument in the
function use the computed value. The C/C++ operators ||, &&, and ? :
are both examples of lazy evaluation. (Unless some newbie C/C++
programmer is daft enough to overload && or ||, in which case the
overloaded versions are evaluated in strict order; which is why the &&
and || operators should NEVER be overloaded in C++).
In other words, each argument is evaluated at most once, possibly not
at all.
NormalOrderEvaluation, on the other hand, re-evaluates the expression
each time it is used. Think of C macros, CallByName in languages which
support it, and the semantics of looping control structures, etc.
Normal-order evaluation can take much longer than applicative order
evaluation, and can cause side effects to happen more than once.
(Which is why, of course, statements with side effects generally ought
not be given as arguments to macros in C/C++)
If the argument is invariant and has no side effects, the only
difference between the two is performance. Indeed, in a purely
functional language, lazy eval can be viewed as an optimization of
normal-order evaluation. With side effects present, or expressions
which can return a different value when re-evaluated, the two have
different behavior; normal order eval in particular has a bad
reputation in procedural languages due to the difficulty of reasoning
about such programs without ReferentialTransparency
Should also be noted that strict-order evaluation (as well as lazy
evaluation) can be achieved in a language which supports normal-order
evaluation via explicit memoing. The opposite isn't true; it requires
passing in thunks, functions, or objects which can be called/messaged
in order to defer/repeat the evaluation.
And
Lazy evaluation combines normal-order evaluation and sharing:
• Never evaluate something until you have to (normal-order)
• Never evaluate something more than once (sharing)
http://c2.com/cgi/wiki?LazyEvaluation
http://cs.anu.edu.au/student/comp3610/lectures/Lazy.pdf
From Kevin Sookocheff's Normal, Applicative and Lazy Evaluation post (emphases, stylistic changes mine):
Lazy Evaluation
While normal-order evaluation may result in doing extra work by
requiring function arguments to be evaluated more than once,
applicative-order evaluation may result in programs that do not
terminate where their normal-order equivalents do. In practice, most
functional programming languages solve this problem using lazy
evaluation.
With lazy evaluation, we delay function evaluation in a way that avoids
multiple evaluations of the same function — thus combining the
benefits of normal-order and applicative-order evaluation.
With lazy
evaluation, we evaluate a value when it is needed, and after
evaluation all copies of that expression are updated with the new
value. In effect, a parameter passed into a function is stored in a
single location in memory so that the parameter need only be evaluated
once. That is, we remember all of the locations where a certain
argument will be used, and when we evaluate a function, we replace the
argument with the value.
As a result, with lazy evaluation, every
parameter is evaluated at most once.
This was too long to post as a comment beneath the question, and updating existing answers with it seemed inappropriate, hence this answer.