Eager evaluation/applicative order and lazy evaluation/normal order

Eager evaluation/applicative order and lazy evaluation/normal order - programming-languages

As far as I know, eager evaluation/applicative order evaluates all arguments to a function before applying it, on the other hand, lazy evaluation/normal order evaluates the arguments only when needed.
So, what are the differences between the pair of terms eager evaluation and applicative order, and lazy evaluation and normal order?
Thanks.

Lazy evaluation evaluates a term at most once, while normal order would evaluate it as often as it appears. So for example if you have f(x) = x+x and you call it as f(g(42)) then g(42) is called once under lazy evaluation or applicative order, but twice under normal order.
Eager evaluation and applicative order are synonymous, at least when using the definition of applicative order found in Structure and Interpretation of Computer Programs, which seems to match yours. (Wikipedia defines applicative order a bit differently and has it as a special case of eager evaluation).

I'm reading SICP too, and I've been curious by the definition of normal order given by the authors. It seemed rather similar to Lazy evaluation to me, so I went looking for some more information regarding both.
I know this question was asked a long time ago, but I looked at the FAQ and found no mention of answering old questions, so I thought I'd leave what I've found here so other people could use it in the future.
This is what I've found, and I'm inclined to agree with those:
I would argue (as have others) that lazy evaluation and
NormalOrderEvaluation are two different things; the difference is
alluded to above. In lazy evaluation, evaluation of the argument is
deferred until it is needed, at which point the argument is evaluated
and its result saved (memoized). Further uses of the argument in the
function use the computed value. The C/C++ operators ||, &&, and ? :
are both examples of lazy evaluation. (Unless some newbie C/C++
programmer is daft enough to overload && or ||, in which case the
overloaded versions are evaluated in strict order; which is why the &&
and || operators should NEVER be overloaded in C++).
In other words, each argument is evaluated at most once, possibly not
at all.
NormalOrderEvaluation, on the other hand, re-evaluates the expression
each time it is used. Think of C macros, CallByName in languages which
support it, and the semantics of looping control structures, etc.
Normal-order evaluation can take much longer than applicative order
evaluation, and can cause side effects to happen more than once.
(Which is why, of course, statements with side effects generally ought
not be given as arguments to macros in C/C++)
If the argument is invariant and has no side effects, the only
difference between the two is performance. Indeed, in a purely
functional language, lazy eval can be viewed as an optimization of
normal-order evaluation. With side effects present, or expressions
which can return a different value when re-evaluated, the two have
different behavior; normal order eval in particular has a bad
reputation in procedural languages due to the difficulty of reasoning
about such programs without ReferentialTransparency
Should also be noted that strict-order evaluation (as well as lazy
evaluation) can be achieved in a language which supports normal-order
evaluation via explicit memoing. The opposite isn't true; it requires
passing in thunks, functions, or objects which can be called/messaged
in order to defer/repeat the evaluation.
And
Lazy evaluation combines normal-order evaluation and sharing:
• Never evaluate something until you have to (normal-order)
• Never evaluate something more than once (sharing)
http://c2.com/cgi/wiki?LazyEvaluation
http://cs.anu.edu.au/student/comp3610/lectures/Lazy.pdf

From Kevin Sookocheff's Normal, Applicative and Lazy Evaluation post (emphases, stylistic changes mine):
Lazy Evaluation
While normal-order evaluation may result in doing extra work by
requiring function arguments to be evaluated more than once,
applicative-order evaluation may result in programs that do not
terminate where their normal-order equivalents do. In practise, most
functional programming languages solve this problem using lazy
evaluation.
With lazy evalution, we delay function evaluation in a way that avoids
multiple evaluations of the same function — thus combining the
benefits of normal-order and applicative-order evaluation.
With lazy
evaluation, we evaluate a value when it is needed, and after
evaluation all copies of that expression are updated with the new
value. In effect, a parameter passed into a function is stored in a
single location in memory so that the parameter need only be evaluated
once. That is, we remember all of the locations where we a certain
argument will be used, and when we evaluate a function, we replace the
argument with the value.
As a result, with lazy evaluation, every
parameter is evaluated at most once.
This was too long to post as a comment beneath the question, and updating existing answers with it seemed inappropriate, hence this answer.

Related

Can every functional language be lazy?

In a functional language, functions are first class citizens and thus calling them is not the only thing I can do. I can also store them away.
Now when I have a language, which is strict by default, then I am still not forced to evaluate a function call. I have the option to store the function and its parameters e.g. in a tuple for later evaluation.
So instead of
x = f a b c
I do something like
x = (f,a,b,c)
And later, I can evaluate this thing with something like
eval (f,a,b,c) = f a b c
Well, there is probably more to it, because I want to evaluate each unevaluated function call only once, but it seems to me, that this can also be solved with a data structure which is a bit fancier than just a tuple.
The inverse also seems to be the case, because e.g. in Haskell, which is lazy be default I can enforce evaluation with seq or BangPatterns.
So is it correct to say, that every functional language has the potential of being lazy, but most of them are just not lazy by default and thus require additional programming effort to call a function lazily, whereas haskell is lazy by default and requires additional programming effort to call a function in a strict way?
Should that be the case, what is more difficult for the programmer: writing lazy function calls in a strict language or writing strict function calls in a lazy language?
As a side note: was Simon P. Jone serious when he said: "the next version of haskell will be strict". I first thought that this was a joke. But now I think strict by default isn't all that bad as long as you can be lazy if required.

Lazy evaluation, at the low level, is implemented by a concept called a thunk, which comprises two things:
A closure that computes the value of the deferred computation
A set-once mutable storage cell for memoizing the result.
The first part, the closure, can be modeled in an even simpler way than your tuple with the function and its arguments. You can just use a function that accepts unit or no arguments (depending on how your language works), and in its body you apply the function to the arguments. To compute the result, you just invoke the function.
Paul Johnson mentions Scheme, which is a perfect language to demonstrate this. As a Scheme macro (pseudocode, untested):
(define-syntax delay
(syntax-rules ()
((delay expr ...)
;; `(delay expr)` evaluates to a lambda that closes over
;; two variables—one to store the result, one to record
;; whether the thunk has been forced.
(let ((value #f)
(done? #f))
(lambda ()
(unless done?
(set! value (begin expr ...))
(set! done? #t))
value)))))
(define (force thunk)
;; Thunks are procedures, so to force them you just invoke them.
(thunk))
But to get this back to the title of the question: does this mean that every functional language can be lazy? The answer is no. Eager languages can implement thunking and use it to provide opt-in delayed evaluation at user-selected spots, but this isn't the same as having pervasive lazy evaluation like Haskell implementations provide.

The answer is a qualified yes. Your intuition that laziness can be implemented in a strict language where functions are first-class objects is correct. But going into the details reveals a number of subtleties.
Let's take a functional language (by which I mean a language where functions can be constructed and manipulated as first-class objects, like in the lambda calculus), where function application is strict (i.e. the function¹ and its argument(s) are fully evaluated before the function is applied). I'll use the syntax of Standard ML, since this is a popular and historically important strict functional language. A strict application F A (where F and A are two expressions) can be delayed by encoding it as
Thunk (F, A)
This object contains a function and an argument is called a thunk. We can define a type of thunks:
datatype ('a, 'b) thunk = Thunk of ('a -> 'b) * 'a;
and a function to evaluate a thunk:
fun evaluate (Thunk (f, x)) = f x;
Nice and easy so far. But we have not, in fact, implemented lazy evaluation! What we've implemented is normal-order evaluation, also known as call-by-name. The difference is that if the value of the thunk is used more than once, it is calculated every time. Lazy evaluation (also known as call-by-need) requires evaluating the expression at most once.
In a pure, strict language, lazy evaluation is in fact impossible to implement. The reason is that evaluating a thunk modifies the state of the system: it goes from unevaluated to evaluated. Implementing lazy evaluation requires a way to change the state of the system.
There's a subtlety here: if the semantics of the language is defined purely in terms of the termination status of expressions and the value of terminating expressions, then, in a pure language, call-by-need and call-by-name are indistinguishable. Call-by-value (i.e. strict evaluation) is distinguishable because fewer expressions terminate — call-by-need and call-by-name hide any non-termination that happens in a thunk that is never evaluated. The equivalence of call-by-need and call-by-name allows lazy evaluation to be considered as an optimization of normal-order evaluation (which has nice theoretical properties). But in many programs, using call-by-name instead of call-by-value would blow up the running time by computing the value of the same expressions over and over again.
In a language with mutable state, lazy evaluation can be expressed by storing the value into the thunk when it is calculated.
datatype ('a, 'b) lazy_state = Lazy of ('a -> 'b) * 'a | Value of 'a;
type ('a, 'b) lazy_state = ('a, 'b) lazy_state ref;
let lazy (f, x) = ref (Lazy (f, x));
fun force r =
case !r of Value y => y
| Lazy (f, x) => let val y = f x in r := Value y; y end;
This code is not very complicated, so even in ML dialects that provide lazy evaluation as a library feature (possibly with syntactic sugar), it isn't used all that often in practice — often, the point at which the value will be needed is a known location in the programs, and programmers just use a function and pass it its argument at that location.
While this is getting into subjective territory, I would say that it's much easier to implement lazy evaluation like this, than to implement strict evaluation in a language like Haskell. Forcing strict evaluation in Haskell is basically impossible (except by wrapping everything into a state monad and essentially writing ML code in Haskell syntax). Of course, strict evaluation doesn't change the values calculated by the program, but it can have a significant impact on performance (and, in particular, it is sometimes much appreciated because it makes performance a lot more predictable — predicting the performance of a Haskell program can be very hard).
This evaluate-and-store mechanism is effectively the core of what a Haskell compiler does under the hood. Haskell is pure², so you cannot implemente this in the language itself! However, it's sound for the compiler to do it under the hood, because this particular use of side effects does not break the purity of the language, so it does not invalidate any program transformation. The reason storing the value of a thunk is sound is that it turns call-by-name evaluation into call-by-need, and as we saw above, this neither changes the values of terminating expressions, nor changes which expressions terminate.
This approach can be somewhat problematic in a language that combines purely functional local evaluation with a multithreaded environment and message passing between threads. (This is notably the model of Erlang.) If one thread starts evaluating a thunk, and another thread needs its value just then, what is going to happen? If no precautions are taken, then both threads will calculate the value and store it. In a pure language, this is harmless in the sense that both threads will calculate the same value anyway³. However this can hurt performance. To ensure that a thunk is evaluated only once, the calculation of the value must be wrapped in a lock; this helps with long calculations that are performed many times but hurts short calculations that are performed only once, as taking and releasing a lock takes some time.
¹ The function, not the function body of course.
² Or rather, the fragment of Haskell that doesn't use a side effect monad is pure.
³ It is necessary for the transition between a delayed thunk and a computed value to be atomic — concurrent threads must be able to read a lazy value and get either a valid delayed thunk or a valid computed value, not some mixture of the two that isn't a valid object. At the processor level, the transition from delayed thunk to computed value is usually a pointer assignment, which on most architectures is atomic, fortunately.

What you propose will work. The details for doing this in Scheme can be found in SICP. One could envisage a strict version of Haskell in which there is a "lazy" function which does the opposite of what "seq" does in Haskell. However adding this to a strict Haskell-like language would require compiler magic because otherwise the thunk gets forced before being passed to "lazy".
However if your language has uncontrolled effects then this can get hairy, because an effect happens whenever its enclosing function gets evaluated, and figuring out when figuring out when that is going to happen in a lazy langauge is difficult. Thats why Haskell has the IO monad.

What are Haskell's strictness points?

We all know (or should know) that Haskell is lazy by default. Nothing is evaluated until it must be evaluated. So when must something be evaluated? There are points where Haskell must be strict. I call these "strictness points", although this particular term isn't as widespread as I had thought. According to me:
Reduction (or evaluation) in Haskell only occurs at strictness points.
So the question is: what, precisely, are Haskell's strictness points? My intuition says that main, seq / bang patterns, pattern matching, and any IO action performed via main are the primary strictness points, but I don't really know why I know that.
(Also, if they're not called "strictness points", what are they called?)
I imagine a good answer will include some discussion about WHNF and so on. I also imagine it might touch on lambda calculus.
Edit: additional thoughts about this question.
As I've reflected on this question, I think it would be clearer to add something to the definition of a strictness point. Strictness points can have varying contexts and varying depth (or strictness). Falling back to my definition that "reduction in Haskell only occurs at strictness points", let us add to that definition this clause: "a strictness point is only triggered when its surrounding context is evaluated or reduced."
So, let me try to get you started on the kind of answer I want. main is a strictness point. It is specially designated as the primary strictness point of its context: the program. When the program (main's context) is evaluated, the strictness point of main is activated. Main's depth is maximal: it must be fully evaluated. Main is usually composed of IO actions, which are also strictness points, whose context is main.
Now you try: discuss seq and pattern matching in these terms. Explain the nuances of function application: how is it strict? How is it not? What about deepseq? let and case statements? unsafePerformIO? Debug.Trace? Top-level definitions? Strict data types? Bang patterns? Etc. How many of these items can be described in terms of just seq or pattern matching?

A good place to start is by understanding this paper: A Natural Semantics for Lazy Evalution (Launchbury). That will tell you when expressions are evaluated for a small language similar to GHC's Core. Then the remaining question is how to map full Haskell to Core, and most of that translation is given by the Haskell report itself. In GHC we call this process "desugaring", because it removes syntactic sugar.
Well, that's not the whole story, because GHC includes a whole raft of optimisations between desugaring and code generation, and many of these transformations will rearrange the Core so that things get evaluated at different times (strictness analysis in particular will cause things to be evaluated earlier). So to really understand how your
program will be evaluated, you need to look at the Core produced by GHC.
Perhaps this answer seems a bit abstract to you (I didn't specifically mention bang patterns or seq), but you asked for something precise, and this is about the best we can do.

I would probably recast this question as, Under what circumstances will Haskell evaluate an expression? (Perhaps tack on a "to weak head normal form.")
To a first approximation, we can specify this as follows:
Executing IO actions will evaluate any expressions that they “need.” (So you need to know if the IO action is executed, e.g. it's name is main, or it is called from main AND you need to know what the action needs.)
An expression that is being evaluated (hey, that's a recursive definition!) will evaluate any expressions it needs.
From your intuitive list, main and IO actions fall into the first category, and seq and pattern matching fall into the second category. But I think that the first category is more in line with your idea of "strictness point", because that is in fact how we cause evaluation in Haskell to become observable effects for users.
Giving all of the details specifically is a large task, since Haskell is a large language. It's also quite subtle, because Concurrent Haskell may evaluate things speculatively, even though we end up not using the result in the end: this is a third breed of things that cause evaluation. The second category is quite well studied: you want to look at the strictness of the functions involved. The first category too can be thought to be a sort of "strictness", though this is a little dodgy because evaluate x and seq x $ return () are actually different things! You can treat it properly if you give some sort of semantics to the IO monad (explicitly passing a RealWorld# token works for simple cases), but I don't know if there's a name for this sort of stratified strictness analysis in general.

C has the concept of sequence points, which are guarantees for particular operations that one operand will be evaluated before the other. I think that's the closest existing concept, but the essentially equivalent term strictness point (or possibly force point) is more in line with Haskell thinking.
In practice Haskell is not a purely lazy language: for instance pattern matching is usually strict (So trying a pattern match forces evaluation to happen at least far enough to accept or reject the match.
…
Programmers can also use the seq primitive to force an expression to evaluate regardless of whether the result will ever be used.
$! is defined in terms of seq.
—Lazy vs. non-strict.
So your thinking about !/$! and seq is essentially right, but pattern matching is subject to subtler rules. You can always use ~ to force lazy pattern matching, of course. An interesting point from that same article:
The strictness analyzer also looks for cases where sub-expressions are always required by the outer expression, and converts those into eager evaluation. It can do this because the semantics (in terms of "bottom") don't change.
Let's continue down the rabbit hole and look at the docs for optimisations performed by GHC:
Strictness analysis is a process by which GHC attempts to determine, at compile-time, which data definitely will 'always be needed'. GHC can then build code to just calculate such data, rather than the normal (higher overhead) process for storing up the calculation and executing it later.
—GHC Optimisations: Strictness Analysis.
In other words, strict code may be generated anywhere as an optimisation, because creating thunks is unnecessarily expensive when the data will always be needed (and/or may only be used once).
…no more evaluation can be performed on the value; it is said to be in normal form. If we are at any of the intermediate steps so that we've performed at least some evaluation on a value, it is in weak head normal form (WHNF). (There is also a 'head normal form', but it's not used in Haskell.) Fully evaluating something in WHNF reduces it to something in normal form…
—Wikibooks Haskell: Laziness
(A term is in head normal form if there is no beta-redex in head position1. A redex is a head redex if it is preceded only by lambda abstractors of non-redexes 2.) So when you start to force a thunk, you're working in WHNF; when there are no more thunks left to force, you're in normal form. Another interesting point:
…if at some point we needed to, say, print z out to the user, we'd need to fully evaluate it…
Which naturally implies that, indeed, any IO action performed from main does force evaluation, which should be obvious considering that Haskell programs do, in fact, do things. Anything that needs to go through the sequence defined in main must be in normal form and is therefore subject to strict evaluation.
C. A. McCann got it right in the comments, though: the only thing that's special about main is that main is defined as special; pattern matching on the constructor is sufficient to ensure the sequence imposed by the IO monad. In that respect only seq and pattern-matching are fundamental.

Haskell is AFAIK not a pure lazy language, but rather a non-strict language. This means that it does not necessarily evaluate terms at the last possible moment.
A good source for haskell's model of "lazyness" can be found here: http://en.wikibooks.org/wiki/Haskell/Laziness
Basically, it is important to understand the difference between a thunk and the weak header normal form WHNF.
My understanding is that haskell pulls computations through backwards as compared to imperative languages. What this means is that in the absence of "seq" and bang patterns, it will ultimately be some kind of side effect that forces the evaluation of a thunk, which may cause prior evaluations in turn (true lazyness).
As this would lead to a horrible space leak, the compiler then figures out how and when to evaluate thunks ahead of time to save space. The programmer can then support this process by giving strictness annotations (en.wikibooks.org/wiki/Haskell/Strictness , www.haskell.org/haskellwiki/Performance/Strictness) to further reduce space usage in form of nested thunks.
I am not an expert in the operational semantics of haskell, so I will just leave the link as a resource.
Some more resources:
http://www.haskell.org/haskellwiki/Performance/Laziness
http://www.haskell.org/haskellwiki/Haskell/Lazy_Evaluation

Lazy doesn't mean do nothing. Whenever your program pattern matches a case expression, it evaluates something -- just enough anyway. Otherwise it can't figure out which RHS to use. Don't see any case expressions in your code? Don't worry, the compiler is translating your code to a stripped down form of Haskell where they are hard to avoid using.
For a beginner, a basic rule of thumb is let is lazy, case is less lazy.

This is not a full answer aiming for karma, but just a piece of the puzzle -- to the extent that this is about semantics, bear in mind that there are multiple evaluation strategies that provide the same semantics. One good example here -- and the project also speaks to how we typically think of Haskell semantics -- was the Eager Haskell project, which radically altered evaluation strategies while maintaining the same semantics: http://csg.csail.mit.edu/pubs/haskell.html

The Glasgow Haskell compiler translates your code into a Lambda-calculus-like language called core. In this language, something is going to be evaluated, whenever you pattern match it by a case-statement. Thus if a function is called, the outermost constructor and only it (if there are no forced fields) is going to be evaluated. Anything else is canned in a thunk. (Thunks are introduced by let bindings).
Of course this is not exactly what happens in the real language. The compiler convert Haskell into Core in a very sophisticated way, making as many things as possibly lazy and anything that is always needed lazy. Additionally, there are unboxed values and tuples that are always strict.
If you try to evaluate a function by hand, you can basically think:
Try to evaluate the outermost constructor of the return.
If anything else is needed to get the result (but only if it's really needed) is also going to be evaluated. The order doesn't matters.
In case of IO you have to evaluate the results of all statements from the first to the last in that. This is a bit more complicated, since the IO monad does some tricks to force evaluation in a specific order.

We all know (or should know) that Haskell is lazy by default. Nothing is evaluated until it must be evaluated.
No.
Haskell is not a lazy language
Haskell is a language in which evaluation order doesn't matter because there are no side effects.
It's not quite true that evaluation order doesn't matter, because the language allows for infinite loops. If you aren't careful, it's possible to get stuck in a cul-de-sac where you evaluate a subexpression forever when a different evaluation order would have led to termination in finite time. So it's more accurate to say:
Haskell implementations must evaluate the program in a way that terminates if there is any evaluation order that terminates. Only if every possible evaluation order fails to terminate can the implementation fail to terminate.
This still leaves implementations with a huge freedom in how they evaluate the program.
A Haskell program is a single expression, namely let {all top-level bindings} in Main.main. Evaluation can be understood as a sequence of reduction (small-)steps which change the expression (which represents the current state of the executing program).
You can divide reduction steps into two categories: those that are provably necessary (provably will be part of any terminating sequence of reductions), and those that aren't. You can divide the provably necessary reductions somewhat vaguely into two subcategories: those that are "obviously" necessary, and those that require some nontrivial analysis to prove them necessary.
Performing only obviously necessary reductions is what's called "lazy evaluation". I don't know whether a purely lazy evaluating implementation of Haskell has ever existed. Hugs may have been one. GHC definitely isn't.
GHC performs reduction steps at compile time that aren't provably necessary; for example, it will replace 1+2::Int with 3::Int even if it can't prove that the result will be used.
GHC may also perform not-provably-necessary reductions at run time in some circumstances. For example, when generating code to evaluate f (x+y), if x and y are of type Int and their values will be known at run time, but f can't be proven to use its argument, there is no reason not to compute x+y before calling f. It uses less heap space and less code space and is probably faster even if the argument isn't used. However, I don't know whether GHC actually takes these sorts of optimization opportunities.
GHC definitely performs evaluation steps at run time that are proven necessary only by fairly complex cross-module analyses. This is extremely common and may represent the bulk of the evaluation of realistic programs. Lazy evaluation is a last-resort fallback evaluation strategy; it isn't what happens as a rule.
There was an "optimistic evaluation" branch of GHC that did much more speculative evaluation at run time. It was abandoned because of its complexity and the ongoing maintenance burden, not because it didn't perform well. If Haskell was as popular as Python or C++ then I'm sure there would be implementations with much more sophisticated runtime evaluation strategies, maintained by deep-pocketed corporations. Non-lazy evaluation isn't a change to the language, it's just an engineering challenge.
Reduction is driven by top-level I/O, and nothing else
You can model interaction with the outside world by means of special side-effectful reduction rules like: "If the current program is of the form getChar >>= <expr>, then get a character from standard input and reduce the program to <expr> applied to the character you got."
The entire goal of the run time system is to evaluate the program until it has one of these side-effecting forms, then do the side effect, then repeat until the program has some form that implies termination, such as return ().
There are no other rules about what is reduced when. There are only rules about what can reduce to what.
For example, the only rules for if expressions are that if True then <expr1> else <expr2> can be reduced to <expr1>, if False then <expr1> else <expr2> can be reduced to <expr2>, and if <exc> then <expr1> else <expr2>, where <exc> is an exceptional value, can be reduced to an exceptional value.
If the expression representing the current state of your program is an if expression, you have no choice but to perform reductions on the condition until it's True or False or <exc>, because that's the only way you'll ever get rid of the if expression and have any hope of reaching a state that matches one of the I/O rules. But the language specification doesn't tell you to do that in so many words.
These sorts of implicit ordering constraints are the only way that evaluation can be "forced" to happen. This is a frequent source of confusion for beginners. For example, people sometimes try to make foldl more strict by writing foldl (\x y -> x `seq` x+y) instead of foldl (+). This doesn't work, and nothing like it can ever work, because no expression can make itself evaluate. The evaluation can only "come from above". seq is not special in any way in this regard.
Reduction happens everywhere
Reduction (or evaluation) in Haskell only occurs at strictness points. [...] My intuition says that main, seq / bang patterns, pattern matching, and any IO action performed via main are the primary strictness points [...].
I don't see how to make sense of that statement. Every part of the program has some meaning, and that meaning is defined by reduction rules, so reduction happens everywhere.
To reduce a function application <expr1> <expr2>, you have to evaluate <expr1> until it has a form like (\x -> <expr1'>) or (getChar >>=) or something else that matches a rule. But for some reason function application doesn't tend to show up on lists of expressions that allegedly "force evaluation", while case always does.
You can see this misconception in a quote from the Haskell wiki, found in another answer:
In practice Haskell is not a purely lazy language: for instance pattern matching is usually strict
I don't understand what could qualify as a "purely lazy language" to whoever wrote that, except, perhaps, a language in which every program hangs because the runtime never does anything. If pattern matching is a feature of your language then you've got to actually do it at some point. To do it, you have to evaluate the scrutinee enough to determine whether it matches the pattern. That's the laziest way to match a pattern that is possible in principle.
~-prefixed patterns are often called "lazy" by programmers, but the language spec calls them "irrefutable". Their defining property is that they always match. Because they always match, you don't have to evaluate the scrutinee to determine whether they match or not, so a lazy implementation won't. The difference between regular and irrefutable patterns is what expressions they match, not what evaluation strategy you're supposed to use. The spec says nothing about evaluation strategies.
main is a strictness point. It is specially designated as the primary strictness point of its context: the program. When the program (main's context) is evaluated, the strictness point of main is activated. [...] Main is usually composed of IO actions, which are also strictness points, whose context is main.
I'm not convinced that any of that has any meaning.
Main's depth is maximal: it must be fully evaluated.
No, main only has to be evaluated "shallowly", to make I/O actions appear at the top level. main is the entire program, and the program isn't completely evaluated on every run because not all code is relevant to every run (in general).
discuss seq and pattern matching in these terms.
I already talked about pattern matching. seq can be defined by rules that are similar to case and application: for example, (\x -> <expr1>) `seq` <expr2> reduces to <expr2>. This "forces evaluation" in the same way that case and application do. WHNF is just a name for what these expressions "force evaluation" to.
Explain the nuances of function application: how is it strict? How is it not?
It's strict in its left expression just like case is strict in its scrutinee. It's also strict in the function body after substitution just like case is strict in the RHS of the selected alternative after substitution.
What about deepseq?
It's just a library function, not a builtin.
Incidentally, deepseq is semantically weird. It should take only one argument. I think that whoever invented it just blindly copied seq with no understanding of why seq needs two arguments. I count deepseq's name and specification as evidence that a poor understanding of Haskell evaluation is common even among experienced Haskell programmers.
let and case statements?
I talked about case. let, after desugaring and type checking, is just a way of writing an arbitrary expression graph in tree form. Here's a paper about it.
unsafePerformIO?
To an extent it can be defined by reduction rules. For example, case unsafePerformIO <expr> of <alts> reduces to unsafePerformIO (<expr> >>= \x -> return (case x of <alts>)), and at the top level only, unsafePerformIO <expr> reduces to <expr>.
This doesn't do any memoization. You could try to simulate memoization by rewriting every unsafePerformIO expression to explicitly memoize itself, and creating the associated IORefs... somewhere. But you could never reproduce GHC's memoization behavior, because it depends on unpredictable details of the optimization process, and because it isn't even type safe (as shown by the infamous polymorphic IORef example in the GHC documentation).
Debug.Trace?
Debug.Trace.trace is just a simple wrapper around unsafePerformIO.
Top-level definitions?
Top-level variable bindings are the same as nested let bindings. data, class, import, and such are a whole different ball game.
Strict data types? Bang patterns?
Just sugar for seq.

How does non-strict and lazy differ?

I often read that lazy is not the same as non-strict but I find it hard to understand the difference. They seem to be used interchangeably but I understand that they have different meanings. I would appreciate some help understanding the difference.
I have a few questions which are scattered about this post. I will summarize those questions at the end of this post. I have a few example snippets, I did not test them, I only presented them as concepts. I have added quotes to save you from looking them up. Maybe it will help someone later on with the same question.
Non-Strict Def:
A function f is said to be strict if, when applied to a nonterminating
expression, it also fails to terminate. In other words, f is strict
iff the value of f bot is |. For most programming languages, all
functions are strict. But this is not so in Haskell. As a simple
example, consider const1, the constant 1 function, defined by:
const1 x = 1
The value of const1 bot in Haskell is 1. Operationally speaking, since
const1 does not "need" the value of its argument, it never attempts to
evaluate it, and thus never gets caught in a nonterminating
computation. For this reason, non-strict functions are also called
"lazy functions", and are said to evaluate their arguments "lazily",
or "by need".
-A Gentle Introduction To Haskell: Functions
I really like this definition. It seems the best one I could find for understanding strict. Is const1 x = 1 lazy as well?
Non-strictness means that reduction (the mathematical term for
evaluation) proceeds from the outside in,
so if you have (a+(bc)) then first you reduce the +, then you reduce
the inner (bc).
-Haskell Wiki: Lazy vs non-strict
The Haskell Wiki really confuses me. I understand what they are saying about order but I fail to see how (a+(b*c)) would evaluate non-strictly if it was pass _|_?
In non-strict evaluation, arguments to a function are not evaluated
unless they are actually used in the evaluation of the function body.
Under Church encoding, lazy evaluation of operators maps to non-strict
evaluation of functions; for this reason, non-strict evaluation is
often referred to as "lazy". Boolean expressions in many languages use
a form of non-strict evaluation called short-circuit evaluation, where
evaluation returns as soon as it can be determined that an unambiguous
Boolean will result — for example, in a disjunctive expression where
true is encountered, or in a conjunctive expression where false is
encountered, and so forth. Conditional expressions also usually use
lazy evaluation, where evaluation returns as soon as an unambiguous
branch will result.
-Wikipedia: Evaluation Strategy
Lazy Def:
Lazy evaluation, on the other hand, means only evaluating an
expression when its results are needed (note the shift from
"reduction" to "evaluation"). So when the evaluation engine sees an
expression it builds a thunk data structure containing whatever values
are needed to evaluate the expression, plus a pointer to the
expression itself. When the result is actually needed the evaluation
engine calls the expression and then replaces the thunk with the
result for future reference.
...
Obviously there is a strong correspondence between a thunk and a
partly-evaluated expression. Hence in most cases the terms "lazy" and
"non-strict" are synonyms. But not quite.
-Haskell Wiki: Lazy vs non-strict
This seems like a Haskell specific answer. I take that lazy means thunks and non-strict means partial evaluation. Is that comparison too simplified? Does lazy always mean thunks and non-strict always mean partial evaluation.
In programming language theory, lazy evaluation or call-by-need1 is
an evaluation strategy which delays the evaluation of an expression
until its value is actually required (non-strict evaluation) and also
avoid repeated evaluations (sharing).
-Wikipedia: Lazy Evaluation
Imperative Examples
I know most people say forget imperative programming when learning a functional language. However, I would like to know if these qualify as non-strict, lazy, both or neither? At the very least it would provide something familiar.
Short Circuiting
f1() || f2()
C#, Python and other languages with "yield"
public static IEnumerable Power(int number, int exponent)
{
int counter = 0;
int result = 1;
while (counter++ < exponent)
{
result = result * number;
yield return result;
}
}
-MSDN: yield (c#)
Callbacks
int f1() { return 1;}
int f2() { return 2;}
int lazy(int (*cb1)(), int (*cb2)() , int x) {
if (x == 0)
return cb1();
else
return cb2();
}
int eager(int e1, int e2, int x) {
if (x == 0)
return e1;
else
return e2;
}
lazy(f1, f2, x);
eager(f1(), f2(), x);
Questions
I know the answer is right in front of me with all those resources, but I can't grasp it. It all seems like the definition is too easily dismissed as implied or obvious.
I know I have a lot of questions. Feel free to answer whatever questions you feel are relevant. I added those questions for discussion.
Is const1 x = 1 also lazy?
How is evaluating from "inward" non-strict? Is it because inward allows reductions of unnecessary expressions, like in const1 x = 1? Reductions seem to fit the definition of lazy.
Does lazy always mean thunks and non-strict always mean partial evaluation? Is this just a generalization?
Are the following imperative concepts Lazy, Non-Strict, Both or Neither?
Short Circuiting
Using yield
Passing Callbacks to delay or avoid execution
Is lazy a subset of non-strict or vice versa, or are they mutually exclusive. For example is it possible to be non-strict without being lazy, or lazy without being non-strict?
Is Haskell's non-strictness achieved by laziness?
Thank you SO!

Non-strict and lazy, while informally interchangeable, apply to different domains of discussion.
Non-strict refers to semantics: the mathematical meaning of an expression. The world to which non-strict applies has no concept of the running time of a function, memory consumption, or even a computer. It simply talks about what kinds of values in the domain map to which kinds of values in the codomain. In particular, a strict function must map the value ⊥ ("bottom" -- see the semantics link above for more about this) to ⊥; a non strict function is allowed not to do this.
Lazy refers to operational behavior: the way code is executed on a real computer. Most programmers think of programs operationally, so this is probably what you are thinking. Lazy evaluation refers to implementation using thunks -- pointers to code which are replaced with a value the first time they are executed. Notice the non-semantic words here: "pointer", "first time", "executed".
Lazy evaluation gives rise to non-strict semantics, which is why the concepts seem so close together. But as FUZxxl points out, laziness is not the only way to implement non-strict semantics.
If you are interested in learning more about this distinction, I highly recommend the link above. Reading it was a turning point in my conception of the meaning of computer programs.

An example for an evaluation model, that is neither strict nor lazy: optimistic evaluation, which gives some speedup as it can avoid a lot of "easy" thunks:
Optimistic evaluation means that even if a subexpression may not be needed to evaluate the superexpression, we still evaluate some of it using some heuristics. If the subexpression doesn't terminate quickly enough, we suspend its evaluation until it's really needed. This gives us an advantage over lazy evaluation if the subexpression is needed later, as we don't need to generate a thunk. On the other hand, we don't lose too much if the expression doesn't terminate, as we can abort it quickly enough.
As you can see, this evaluation model is not strict: If something that yields _|_ is evaluated, but not needed, the function will still terminate, as the engine aborts the evaluation. On the other hand, it may be possible that more expressions than needed are evaluated, so it's not completely lazy.

Yes, there is some unclear use of terminology here, but the terms coincide in most cases regardless, so it's not too much of a problem.
One major difference is when terms are evaluated. There are multiple strategies for this, ranging on a spectrum from "as soon as possible" to "only at the last moment". The term eager evaluation is sometimes used for strategies leaning toward the former, while lazy evaluation properly refers to a family of strategies leaning heavily toward the latter. The distinction between "lazy evaluation" and related strategies tend to involve when and where the result of evaluating something is retained, vs. being tossed aside. The familiar memoization technique in Haskell of assigning a name to a data structure and indexing into it is based on this. In contrast, a language that simply spliced expressions into each other (as in "call-by-name" evaluation) might not support this.
The other difference is which terms are evaluated, ranging from "absolutely everything" to "as little as possible". Since any value actually used to compute the final result can't be ignored, the difference here is how many superfluous terms are evaluated. As well as reducing the amount of work the program has to do, ignoring unused terms means that any errors they would have generated won't occur. When a distinction is being drawn, strictness refers to the property of evaluating everything under consideration (in the case of a strict function, for instance, this means the terms it's applied to. It doesn't necessarily mean sub-expressions inside the arguments), while non-strict means evaluating only some things (either by delaying evaluation, or by discarding terms entirely).
It should be easy to see how these interact in complicated ways; decisions are not at all orthogonal, as the extremes tend to be incompatible. For instance:
Very non-strict evaluation precludes some amount of eagerness; if you don't know whether a term will be needed, you can't evaluate it yet.
Very strict evaluation makes non-eagerness somewhat irrelevant; if you're evaluating everything, the decision of when to do so is less significant.
Alternate definitions do exist, though. For instance, at least in Haskell, a "strict function" is often defined as one that forces its arguments sufficiently that the function will evaluate to _|_ ("bottom") whenever any argument does; note that by this definition, id is strict (in a trivial sense), because forcing the result of id x will have exactly the same behavior as forcing x alone.

This started out as an update but it started to get long.
Laziness / Call-by-need is a memoized version of call-by-name where, if the function argument is evaluated, that value is stored for subsequent uses. In a "pure" (effect-free) setting, this produces the same results as call-by-name; when the function argument is used two or more times, call-by-need is almost always faster.
Imperative Example - Apparently this is possible. There is an interesting article on Lazy Imperative Languages. It says there are two methods. One requires closures the second uses graph reductions. Since C does not support closures you would need to explicitly pass an argument to your iterator. You could wrap a map structure and if the value does not exist calculate it otherwise return value.
Note: Haskell implements this by "pointers to code which are replaced with a value the first time they are executed" - luqui.
This is non-strict call-by-name but with sharing/memorization of the results.
Call-By-Name - In call-by-name evaluation, the arguments to a function are not evaluated before the function is called — rather, they are substituted directly into the function body (using capture-avoiding substitution) and then left to be evaluated whenever they appear in the function. If an argument is not used in the function body, the argument is never evaluated; if it is used several times, it is re-evaluated each time it appears.
Imperative Example: callbacks
Note: This is non-strict as it avoids evaluation if not used.
Non-Strict = In non-strict evaluation, arguments to a function are not evaluated unless they are actually used in the evaluation of the function body.
Imperative Example: short-circuiting
Note: _|_ appears to be a way to test if a function is non-strict
So a function can be non-strict but not lazy. A function that is lazy is always non-strict.
Call-By-Need is partly defined by Call-By-Name which is partly defined by Non-Strict
An Excerpt from "Lazy Imperative Languages"
2.1. NON-STRICT SEMANTICS VS. LAZY EVALUATION We must first clarify
the distinction between "non-strict semantics" and "lazy evaluation".
Non-strictsemantics are those which specify that an expression is not
evaluated until it is needed by a primitiveoperation. There may be
various types of non-strict semantics. For instance, non-strict
procedure calls donot evaluate the arguments until their values are
required. Data constructors may have non-strictsemantics, in which
compound data are assembled out of unevaluated pieces Lazy evaluation,
also called delayed evaluation, is the technique normally used to
implement non-strictsemantics. In section 4, the two methods commonly
used to implement lazy evaluation are very brieflysummarized.
CALL BY VALUE, CALL BY LAZY, AND CALL BY NAME "Call by value" is the
general name used for procedure calls with strict semantics. In call
by valuelanguages, each argument to a procedure call is evaluated
before the procedure call is made; the value isthen passed to the
procedure or enclosing expression. Another name for call by value is
"eager" evaluation.Call by value is also known as "applicative order"
evaluation, because all arguments are evaluated beforethe function is
applied to them."Call by lazy" (using William Clinger's terminology in
[8]) is the name given to procedure calls which usenon-strict
semantics. In languages with call by lazy procedure calls, the
arguments are not evaluatedbefore being substituted into the procedure
body. Call by lazy evaluation is also known as "normal
order"evaluation, because of the order (outermost to innermost, left
to right) of evaluation of an expression."Call by name" is a
particular implementation of call by lazy, used in Algol-60 [18]. The
designers ofAlgol-60 intended that call-by-name parameters be
physically substituted into the procedure body, enclosedby parentheses
and with suitable name changes to avoid conflicts, before the body was
evaluated.
CALL BY LAZY VS. CALL BY NEED Call by need is an
extension of call by lazy, prompted by the observation that a lazy
evaluation could beoptimized by remembering the value of a given
delayed expression, once forced, so that the value need notbe
recalculated if it is needed again. Call by need evaluation,
therefore, extends call by lazy methods byusing memoization to avoid
the need for repeated evaluation. Friedman and Wise were among the
earliestadvocates of call by need evaluation, proposing "suicidal
suspensions" which self-destructed when theywere first evaluated,
replacing themselves with their values.

The way I understand it, "non-strict" means trying to reduce workload by reaching completion in a lower amount of work.
Whereas "lazy evaluation" and similar try to reduce overall workload by avoiding full completion (hopefully forever).
from your examples...
f1() || f2()
...short circuiting from this expression would not possibly result in triggering 'future work', and there's neither a speculative/amortized factor to the reasoning, nor any computational complexity debt accruing.
Whereas in the C# example, "lazy" conserves a function call in the overall view, but in exchange, comes with those above sorts of difficulty (at least from point of call until possible full completion... in this code that's a negligible-distance pathway, but imagine those functions had some high-contention locks to put up with).
int f1() { return 1;}
int f2() { return 2;}
int lazy(int (*cb1)(), int (*cb2)() , int x) {
if (x == 0)
return cb1();
else
return cb2();
}
int eager(int e1, int e2, int x) {
if (x == 0)
return e1;
else
return e2;
}
lazy(f1, f2, x);
eager(f1(), f2(), x);

If we're talking general computer science jargon, then "lazy" and "non-strict" are generally synonyms -- they stand for the same overall idea, which expresses itself in different ways for different situations.
However, in a given particular specialized context, they may have acquired differing technical meanings. I don't think you can say anything accurate and universal about what the difference between "lazy" and "non-strict" is likely to be in the situation where there is a difference to make.

How about having a language provide both call-by-name and call-by-value?

Is it OK to have a language provide both call-by-need (CBN) and call-by-value (CBV) evaluation strategy? I mean without fixing it and simulating in one the other but let the user choose which when in need. For example, let the language has a eval function as in Scheme available which can accept one more argument from the user specifying which evaluation strategy he wants.

Combining call-by-need (laziness) and call-by-value (strictness) in one language implementation is certainly possible, as long as one takes care to avoid making computations with side effects lazy and making diverging computations strict.
Strictness analysis is used in lazy (CBN) functional languages to detect when functions can safely be evaluated using a CBV strategy. CBV evaluation is generally faster, but using this evaluation strategy for non-strict functions changes the semantics of the program.
Wadler describes how to combine lazy and strict computation in a functional language.
A lambda the ultimate thread also addresses the issue.
Scala has a keyword lazy for stating that certain computations are to be performed lazily. Other languages have similar constructs.

Will I develop good/bad habits because of lazy evaluation?

I'm looking to learn functional programming with either Haskell or F#.
Are there any programming habits (good or bad) that could form as a result Haskell's lazy evaluation? I like the idea of Haskell's functional programming purity for the purposes of understanding functional programming. I'm just a bit worried about two things:
I may misinterpret lazy-evaluation-based features as being part of the "functional paradigm".
I may develop thought patterns that work in a lazy world but not in a normal order/eager evaluation world.

There are habits that you get into when programming in a lazy language that don't work in a strict language. Some of these seem so natural to Haskell programmers that they don't think of them as lazy evaluation. A couple of examples off the top of my head:
f x y = if x > y then .. a .. b .. else c
where
a = expensive
b = expensive
c = expensive
here we define a bunch of subexpressions in a where clause, with complete disregard for which of them will ever be evaluated. It doesn't matter: the compiler will ensure that no unnecessary work is performed at runtime. Non-strict semantics means that the compiler is able to do this. Whenever I write in a strict language I trip over this a lot.
Another example that springs to mind is "numbering things":
pairs = zip xs [1..]
here we just want to associate each element in a list with its index, and zipping with the infinite list [1..] is the natural way to do it in Haskell. How do you write this without an infinite list? Well, the fold isn't too readable
pairs = foldr (\x xs -> \n -> (x,n) : xs (n+1)) (const []) xs 1
or you could write it with explicit recursion (too verbose, doesn't fuse). There are several other ways to write it, none of which are as simple and clear as the zip.
I'm sure there are many more. Laziness is surprisingly useful, when you get used to it.

You'll certainly learn about evaluation strategies. Non-strict evaluation strategies can be very powerful for particular kinds of programming problems, and once you're exposed to them, you may be frustrated that you can't use them in some language setting.
I may develop thought patterns that work in a lazy world but not in a normal order/eager evaluation world.
Right. You'll be a more rounded programmer. Abstractions that provide "delaying" mechanisms are fairly common now, so you'd be a worse programmer not to know them.

I may misinterpret lazy-evaluation-based features as being part of the "functional paradigm".
Lazy evaluation is an important part of the functional paradigm. It's not a requirement - you can program functionally with eager evaluation - but it's a tool that naturally fits functional programming.
You see people explicitly implement/invoke it (notably in the form of lazy sequences) in languages that don't make it the default; and while mixing it with imperative code requires caution, pure functional code allows safe use of laziness. And since laziness makes many constructs cleaner and more natural, it's a great fit!
(Disclaimer: no Haskell or F# experience)

To expand on Beni's answer: if we ignore operational aspects in terms of efficiency (and stick with a purely functional world for the moment), every terminating expression under eager evaluation is also terminating under non-strict evaluation, and the values of both (their denotations) coincide.
This is to say that lazy evaluation is strictly more expressive than eager evaluation. By allowing you to write more correct and useful expressions, it expands your "vocabulary" and ability to think functionally.
Here's one example of why:
A language can be lazy-by-default but with optional eagerness, or eager by default with optional laziness, but in fact its been shown (c.f. Okasaki for example) that there are certain purely functional data structures which can only achieve certain orders of performance if implemented in a language that provides laziness either optionally or by default.
Now when you do want to worry about efficiency, then the difference does matter, and sometimes you will want to be strict and sometimes you won't.
But worrying about strictness is a good thing, because very often the cleanest thing to do (and not only in a lazy-by-default language) is to use a thoughtful mix of lazy and eager evaluation, and thinking along these lines will be a good thing no matter which language you wind up using in the future.
Edit: Inspired by Simon's post, one additional point: many problems are most naturally thought about as traversals of infinite structures rather than basically recursive or iterative. (Although such traversals themselves will generally involve some sort of recursive call.) Even for finite structures, very often you only want to explore a small portion of a potentially large tree. Generally speaking, non-strict evaluation allows you to stop mixing up the operational issue of what the processor actually bothers to figure out with the semantic issue of the most natural way to represent the actual structure you're using.

Recently, i found myself doing Haskell-style programming in Python. I took over a monolithic function that extracted/computed/generated values and put them in a file sink, in one step.
I thought this was bad for understanding, reuse and testing. My plan was to separate value generation and value processing. In Haskell i would have generated a (lazy) list of those computed values in a pure function and would have done the post-processing in another (side-effect bearing) function.
Knowing that non-lazy lists in Python can be expensive, if they tend to get big, i thought about the next close Python solution. To me that was to use a generator for the value generation step.
The Python code got much better thanks to my lazy (pun intended) mindset.

I'd expect bad habits.
I saw one of my coworkers try to use (hand-coded) lazy evaluation in our .NET project. Unfortunately the consequence of lazy evaluation hid the bug where it would try remote invocations before the start of main executed, and thus outside the try/catch to handle the "Hey I can't connect to the internet" case.
Basically, the manner of something was hiding the fact that something really expensive was hiding behind a property read and so made it look like a good idea to do inside the type initializer.

Contextual information missing.
Laziness (or more specifically, the assumption of the availabilty of the purity and equational reasoning) is sometimes quite useful for specific problem domains, but not necessarily better in general. If you're talking about general-purpose language settings, relying on the lazy evaluation rules by default is considered harmful.
Analysis
Any languages has functional combination (or the applicable terms combination; i.e. function call expression, function-like macro invocation, FEXPRs, etc.) enforces rules on evaluation, implying the order of different parts of subcomputation therein. For convenience and the simplicity of the specification of the language, a language usually specify the rules in a flavor paired to the reduction strategy:
The strict evaluation, or the applicative-order reduction, which evaluates all subexpression first, before the subcomputation of the remaining evaluation of the hole combination.
The non-strict evaluation, or the normal-order reduction, which does not necessarily evaluate every subexpression at first.
The remaining subcomputation finally determines the result of the whole evaluation of the expression. (For program-defined constructs, this usually implies the substitution of the evaluated argument into something like a function body, and the subsequent evaluation of the result.)
Lazy evaluation, or the call-by-need strategy, is a typical concrete instance of the non-strict evaluation kind. To make it practically usable, subexpression evaluations are required to be pure (side-effect-free), so the reductions implementing the strategy can have the Church-Rosser property whatever the order of subexpression evaluation is actually adopted.
One significant merit of such design is the availability of the equational resoning: users can encode the equality of expression evaluation in the program, and optimizing implementation of the language can perform the transformation depending directly on such constructs.
However, there are many serious problems behind such design.
Equational reasoning is not important as it in the first glance in practice.
The encoding is not a separate feature. It has some specific requirements on the other features to carry the encoding. For a pure language, it is even more difficult to encode them elsewhere, so there is certain pressure to make the type system more expressive, hence more complicated typing and typechecking.
Whether the compiler uses the equational reasoning directly encoded in the program or not is an implementation detail. It is more of a taste of style to promote the importance.
Syntatic equations are not powerful enough to encode semantic conditions like cases of "unspecified behavior" in ISO C. It still needs some additional primitives to express non-determinism of such semantic equivalence classes to make optimization techniques based on such equivalence possible.
It is computationally inefficient at the very basic level by default, and not amendable by the programmer easily.
There is no systemic way to reduce the cost on equations which are known not required by the programmer.
One of the significance comes from the clash between lazily evaluated combinations and proper tail recursion over the combinations.
The unpredictable abuse of thunks to memoize the lazily evaluated expressions also makes troubles on the utilization of the machine resources (e.g. registers and the cache memory).
Purely functional languages like Haskell may declare the referential transparency is a good thingTM. However, this is faulty in certain contexts.
There are semantic gaps over the terminology itself. The purity is not the only aspect for the referential transparency; moreover, there are other kinds of such property not readily provided by the evaluation strategy.
In general, referential transparency should not be a goal about programming. Instead, it is an optional manner to implement the composable components of programs. Composability is essentially about the expected invariance on the interface of the components. There are many ways to keep the composability without the aid of any kinds of referential transparency. Whether the guarantee should be enforced by the language rules? It depends. At least, it should not depend totally on the language designers' point.
The lack of impure evaluations requires more syntax noises to encode many constructs simply expressible by mutable state cells in the traditional impure languages. The workarounds of the practical problems do make the solution more difficult and hard to reason by humans.
For example, I/O operations are side-effectful, thus not directly expressible in Haskell expressions under the usual non-strict evaluation rules, otherwise the order of effects will be non-deterministic.
To overcoming the shortcoming, some indirect conventional constructs like the IO monad to simulate the traditional imperative style are proposed. Such monadic constructs are in essential "indirect" in the sense similar to the continuation-passing style, which is considerably low-level and difficult to read. Even though monads can be "powerful" than continuations in expresiveness, it does not naturally powerful than more high-level alternatives (like algebraic effect systems) when the lazy evaluation strategy is not enforced by default.
Besides the intuition problem above, the necessity of using monadic constructs are often difficult to prove formally (if ever possible). As the result, they are very easily abused (just like the design patterns for "OOP" languages derived from Simula). The related syntax sugar, notably, the famous do-notation, is abused for a few decades before well-known by the Haskell community.
Simulating strict language constructs in languages like Haskell usually needs monadic constructs, while simulating non-strict constructs in strict languages are considerably simpler and easier to implement efficiently. For instance, there is SRFI-45.
The lazy evaluation strategy does not deal with many other non-strict constructs well.
For example, seq has to be a compiler magic in GHC. This is not easily expressible by other Haskell constructs without massive changes in the core Haskell language rules.
Although traditional strict languages also do not allow user programs to simulate the enforcement of the order easily so such sequential constructs are therefore primitive (examples: C-like ; is primitive; the derivation of Scheme's begin is relying on the primitive lambda which in turn implying an implicit evaluation order on expressions), it can be implementable reusing the applicative order rules without additional ad-hoc primitives, like the derivation of the$sequence operator in the Kernel language.
Concerns about specific questions
Lazy evaluation is not a must for the "functional paradigm", though as mentioned above, purely functional languages are likely have the lazy evaluation strategy by default. The common properties are the usability of first-class functions. Impure languages like Lisp and ML family are considered "functional", which use eager evaluation by default. Also note the popularity of "functional paradigm" came after the introducing of function-level programming. The latter is quite different, but still somewhat similar to "functional programming" on the treatment of first-classness.
As mentioned above, the way to simulate laziness in eager languages are well-known. Additionally, for pure programs, there may be no non-trivially semantic difference between call-by-need and normal order reduction. To figure out something really only work in a lazy world is actually not easy. (Do you want to implement the language?) Just go ahead.
Conclusion
Be careful to the problem domain. Lazy evaluation may work well for specific scenarios. However, making it by default is likely to be a bad idea in general, because users (whoever to use the language to program, or to derive a new dialect based on the current language) will likely have few chances to ignore all of the problems it will cause.

Well, try to think of something that would work if lazily evaluated, that wouldn't if eagerly evaluated. The most common category of these would be lazy logical operator evaluation used to hide a "side effect". I'll use C#-ish language to explain, but functional languages would have similar analogs.
Take the simple C# lambda:
(a,b) => a==0 || ++b < 20
In a lazy-evaluated language, if a==0, the expression ++b < 20 is not evaluated (because the entire expression evaluates to true either way), which means that b is not incremented. In both imperative and functional languages, this behavior (and similar behavior of the AND operator) can be used to "hide" logic containing side effects that should not be executed:
(a,b) => a==0 && save(b)
"a" in this case may be the number of validation errors. If there were validation errors, the first half fails and the second half is not evaluated. If there were no validation errors, the second half is evaluated (which would include the side effect of trying to save b) and the result (apparently true or false) is returned to be evaluated. If either side evaluates to false, the lambda returns false indicating that b was not successfully saved. If this were evaluated "eagerly", we would try to save regardless of the value of "a", which would probably be bad if a nonzero "a" indicated that we shouldn't.
Side effects in functional languages are generally considered a no-no. However, there are few non-trivial programs that do not require at least one side effect; there's generally no other way to make a functional algorithm integrate with non-functional code, or with peripherals like a data store, display, network channel, etc.

Develop Reference

node.js excel linux python-3.x azure haskell apache-spark rust .htaccess string