Context:
def fib(n):
    if n < 2: return 1
    return fib(n-1) + fib(n-2)
can be sped up by memoization:
fib_memo = {}
def fib(n):
    if n < 2: return 1
    if n not in fib_memo:
        fib_memo[n] = fib(n-1) + fib(n-2)
    return fib_memo[n]
This implementation technique of memoization is used widely in many
programming languages, but it can’t be applied directly to Haskell
because Haskell is pure and we don’t want to introduce impurity just
to memoize a function. Fortunately, it is possible to memoize a
function without side effects thanks to Haskell's lazy evaluation.
The following memoize function takes a function of type Int -> a
and returns a memoized version of the same function. The trick is to
turn a function into a value because, in Haskell, functions are not
memoized but values are.
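A minimal sketch of such a memoize function (this is an assumption on my part, using the standard infinite-list trick; the article's exact definition may differ), together with a memoized Fibonacci built on top of it:

-- The memo table is a lazily built infinite list; cell n holds f n and,
-- being a value, is evaluated at most once.
memoize :: (Int -> a) -> (Int -> a)
memoize f = (table !!)
  where table = map f [0 ..]

-- Tying the knot: the recursive calls go back through the memoized
-- version (assuming n >= 0).
fibMemo :: Int -> Integer
fibMemo = memoize fib'
  where fib' n | n < 2     = 1
               | otherwise = fibMemo (n - 1) + fibMemo (n - 2)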
Questions:
Aren't functions and values the same in functional programming?
How is caching not considered a side effect (in the context of pure functions)?
All functions are values, but not all values are functions.
This is really about operational semantics, which are sometimes tricky to talk about in Haskell because Haskell is only defined in terms of its denotational semantics -- that is, what value an expression evaluates to, rather than how that evaluation happens.

It's not a side-effect because the "stateful" nature of memoization is still hidden behind the abstraction of purity: while there is some internal state (represented in the partial graph-reduction of the program), there is no way for your program to observe that state in a way that would distinguish it from the non-memoized version.

A subtlety here is that these memoization strategies are not actually required to memoize -- all that is guaranteed is the result they will give after some unspecified finite amount of time.
There is no requirement for a Haskell implementation to memoize anything -- it could use pure call-by-name, for example, which doesn't memoize values, instead recomputing everything. Here's an example of call-by-name.
let f x = x * x in f (2 + 2)
= (2 + 2) * (2 + 2)
= 4 * (2 + 2)
= 4 * 4
= 16
Here 2 + 2 is evaluated twice. Most Haskell implementations (optimizations aside) would create a thunk so that it would be computed at most once (which is called call-by-need). But a call-by-name Haskell implementation that evaluated it twice would be technically conforming. Because Haskell is pure, there will be no difference in the result computed. In the real world though, this strategy ends up being much too expensive to be practical.
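A quick way to observe this sharing in GHC is to attach a trace to the shared expression (a sketch using Debug.Trace; the message is a debugging side channel, not part of the value):

import Debug.Trace (trace)

main :: IO ()
main = do
  let x = trace "evaluating 2 + 2" (2 + 2) :: Int
  -- Under call-by-need the message is printed once; under call-by-name
  -- it would be printed twice, but the result (16) is identical.
  -- (Compile without optimisations; GHC may otherwise constant-fold 2 + 2.)
  print (x * x)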
As for the choice not to memoize functions, the logic is the same. It is perfectly conforming for a compiler to aggressively memoize every function, using something like optimal evaluation. Haskell's purity means that there will be no difference in the result if this evaluation strategy were chosen. But again, in real-world applications, memoizing every function like this ends up taking a lot of memory, and the overhead of an optimal evaluator is too high to give good practical performance.
A Haskell compiler could also choose to memoize some functions but not others in order to maximize performance. This would be great--my understanding is that it is not really known how to do this reliably. It is very hard for a compiler to tell in advance which computations will be cheap and which will be expensive, and which computations will probably be reused and which will not.
So a balance is chosen, in which values, whose evaluated forms are usually smaller than their thunks, are memoized; whereas functions, whose memoized forms are usually larger than their definitions (since they need a whole memo table), are not. And then we get some techniques like those in the article to switch back and forth between these representations, according to our own judgment as programmers.
Related
I was wondering how smart/lazy Haskell is. Can I always be sure that Haskell will only do what is necessary to generate a certain output?
No.
Haskell specifies a denotational semantics for its core lambda-like calculus, and you can rely on that semantics. Additionally, there is a metatheory proof that a particular reduction order -- known colloquially as "lazy evaluation" -- realizes that semantics; and so many people use that as their mental model of how Haskell programs behave.
There are two broad categories of ways that a Haskell program may end up evaluating more than necessary:
The Haskell implementation may choose to evaluate more. GHC uses lazy evaluation in most places, but I believe it will use other evaluation orders for efficiency in some cases. You could also look at the Eager Haskell project, which is attempting to use another implementation strategy. In principle, an implementation would be within its rights to choose to speculatively fork some computations to another thread (and then throw away the results if they weren't needed). And so on and so forth -- there are many possible variations on this theme.
The denotational semantics specified may demand more evaluation than "necessary". For example, one that occasionally trips up beginners:
primes :: [Int]
primes = 2 : filter prime [3,5..]
prime :: Int -> Bool
prime x = and [x `mod` p /= 0 | p <- primes, p < x]
When checking whether 3 should be in the list primes, it is in fact not necessary to check any of the elements of primes past 2, because the sequence is strictly monotonically increasing. But Haskell is not (does not try to be) smart enough to notice that; it will go straight on trying to check the rest of the primes and end up in an infinite loop instead of giving the list of primes.
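For the curious, the usual repair is to make the cutoff explicit, so that only finitely many primes are ever demanded (a sketch reusing the primes definition above; not part of the original answer):

prime :: Int -> Bool
prime x = and [x `mod` p /= 0 | p <- takeWhile (\p -> p * p <= x) primes]

-- takeWhile tells the list consumer when to stop, which is exactly the
-- monotonicity fact the compiler could not discover on its own.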
An even smaller example: you could think that x && False is always False, but x will typically be evaluated anyway, because the semantics says this should be an infinite loop if x is. (Contrast False && x, which typically does not result in evaluating x.)
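This behaviour falls directly out of the Prelude definition of (&&), which pattern-matches only on its first argument:

(&&) :: Bool -> Bool -> Bool
True  && x = x
False && _ = False

-- Forcing x && False must force x to pick a clause, whereas
-- False && x returns False without ever looking at x.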
That said, when you say "complex structure", one thing that comes to mind is: does Haskell do the laziness thing even with custom data types that I define? That is, do complex structures like hash maps and balanced trees and k-d trees and so forth get treated lazily? The answer there is yes, at least for GHC; there is nothing fundamentally special about any of the types in the Prelude except IO. Lists, booleans, Maybes, and so forth are lazy not because the compiler knows special things about them, but simply as a consequence of Haskell's lazy-evaluation reduction rules applied to their declarations as ordinary data types.
Of course, there are ways to opt-out of laziness. Some of the structures you will see on Hackage do that in various ways; but don't worry, usually this will be declared in their documentation.
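A small sketch of both points: constructor fields of a user-defined type are lazy by default, and a ! on a field is one common way to opt out (hypothetical types for illustration):

data LazyPair   = LazyPair   Bool Bool    -- fields stay thunks until forced
data StrictPair = StrictPair !Bool !Bool  -- ! forces fields at construction

firstLazy :: LazyPair -> Bool
firstLazy (LazyPair x _) = x

main :: IO ()
main = do
  print (firstLazy (LazyPair True undefined))  -- prints True; the second
                                               -- field is never evaluated
  -- Building StrictPair True undefined instead would crash immediately:
  -- the strict field forces undefined as soon as the pair is constructed.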
In a functional language, functions are first class citizens and thus calling them is not the only thing I can do. I can also store them away.
Now when I have a language that is strict by default, I am still not forced to evaluate a function call. I have the option to store the function and its parameters, e.g. in a tuple, for later evaluation.
So instead of
x = f a b c
I do something like
x = (f,a,b,c)
And later, I can evaluate this thing with something like
eval (f,a,b,c) = f a b c
Well, there is probably more to it, because I want to evaluate each unevaluated function call only once, but it seems to me that this can also be solved with a data structure which is a bit fancier than just a tuple.
The inverse also seems to be the case, because e.g. in Haskell, which is lazy by default, I can enforce evaluation with seq or BangPatterns.
So is it correct to say that every functional language has the potential of being lazy, but most of them are just not lazy by default and thus require additional programming effort to call a function lazily, whereas Haskell is lazy by default and requires additional programming effort to call a function in a strict way?
Should that be the case, what is more difficult for the programmer: writing lazy function calls in a strict language or writing strict function calls in a lazy language?
As a side note: was Simon Peyton Jones serious when he said: "the next version of Haskell will be strict"? I first thought that this was a joke. But now I think strict by default isn't all that bad as long as you can be lazy if required.
Lazy evaluation, at the low level, is implemented by a concept called a thunk, which comprises two things:
A closure that computes the value of the deferred computation
A set-once mutable storage cell for memoizing the result.
The first part, the closure, can be modeled in an even simpler way than your tuple with the function and its arguments. You can just use a function that accepts unit or no arguments (depending on how your language works), and in its body you apply the function to the arguments. To compute the result, you just invoke the function.
Paul Johnson mentions Scheme, which is a perfect language to demonstrate this. As a Scheme macro (pseudocode, untested):
(define-syntax delay
  (syntax-rules ()
    ((delay expr ...)
     ;; `(delay expr)` evaluates to a lambda that closes over
     ;; two variables—one to store the result, one to record
     ;; whether the thunk has been forced.
     (let ((value #f)
           (done? #f))
       (lambda ()
         (unless done?
           (set! value (begin expr ...))
           (set! done? #t))
         value)))))

(define (force thunk)
  ;; Thunks are procedures, so to force them you just invoke them.
  (thunk))
But to get this back to the title of the question: does this mean that every functional language can be lazy? The answer is no. Eager languages can implement thunking and use it to provide opt-in delayed evaluation at user-selected spots, but this isn't the same as having pervasive lazy evaluation like Haskell implementations provide.
The answer is a qualified yes. Your intuition that laziness can be implemented in a strict language where functions are first-class objects is correct. But going into the details reveals a number of subtleties.
Let's take a functional language (by which I mean a language where functions can be constructed and manipulated as first-class objects, like in the lambda calculus), where function application is strict (i.e. the function¹ and its argument(s) are fully evaluated before the function is applied). I'll use the syntax of Standard ML, since this is a popular and historically important strict functional language. A strict application F A (where F and A are two expressions) can be delayed by encoding it as
Thunk (F, A)
This object, which contains a function and an argument, is called a thunk. We can define a type of thunks:
datatype ('a, 'b) thunk = Thunk of ('a -> 'b) * 'a;
and a function to evaluate a thunk:
fun evaluate (Thunk (f, x)) = f x;
Nice and easy so far. But we have not, in fact, implemented lazy evaluation! What we've implemented is normal-order evaluation, also known as call-by-name. The difference is that if the value of the thunk is used more than once, it is calculated every time. Lazy evaluation (also known as call-by-need) requires evaluating the expression at most once.
In a pure, strict language, lazy evaluation is in fact impossible to implement. The reason is that evaluating a thunk modifies the state of the system: it goes from unevaluated to evaluated. Implementing lazy evaluation requires a way to change the state of the system.
There's a subtlety here: if the semantics of the language is defined purely in terms of the termination status of expressions and the value of terminating expressions, then, in a pure language, call-by-need and call-by-name are indistinguishable. Call-by-value (i.e. strict evaluation) is distinguishable because fewer expressions terminate — call-by-need and call-by-name hide any non-termination that happens in a thunk that is never evaluated. The equivalence of call-by-need and call-by-name allows lazy evaluation to be considered as an optimization of normal-order evaluation (which has nice theoretical properties). But in many programs, using call-by-name instead of call-by-value would blow up the running time by computing the value of the same expressions over and over again.
In a language with mutable state, lazy evaluation can be expressed by storing the value into the thunk when it is calculated.
datatype ('a, 'b) lazy_state = Lazy of ('a -> 'b) * 'a | Value of 'b;
type ('a, 'b) lazy = ('a, 'b) lazy_state ref;
fun lazy (f, x) = ref (Lazy (f, x));
fun force r =
    case !r of Value y => y
             | Lazy (f, x) => let val y = f x in r := Value y; y end;
This code is not very complicated, so even in ML dialects that provide lazy evaluation as a library feature (possibly with syntactic sugar), it isn't used all that often in practice — often, the point at which the value will be needed is a known location in the program, and programmers just use a function and pass it its argument at that location.
While this is getting into subjective territory, I would say that it's much easier to implement lazy evaluation like this than to implement strict evaluation in a language like Haskell. Forcing strict evaluation in Haskell is basically impossible (except by wrapping everything into a state monad and essentially writing ML code in Haskell syntax). Of course, strict evaluation doesn't change the values calculated by the program, but it can have a significant impact on performance (and, in particular, it is sometimes much appreciated because it makes performance a lot more predictable — predicting the performance of a Haskell program can be very hard).
This evaluate-and-store mechanism is effectively the core of what a Haskell compiler does under the hood. Haskell is pure², so you cannot implement this in the language itself! However, it's sound for the compiler to do it under the hood, because this particular use of side effects does not break the purity of the language, so it does not invalidate any program transformation. The reason storing the value of a thunk is sound is that it turns call-by-name evaluation into call-by-need, and as we saw above, this neither changes the values of terminating expressions, nor changes which expressions terminate.
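For comparison, the same evaluate-and-store cell can be spelled out in Haskell's IO fragment (which, per footnote ², is precisely the impure part of the language). This is a sketch of the idea, not of how GHC actually represents thunks:

import Control.Exception (evaluate)
import Data.IORef (IORef, newIORef, readIORef, writeIORef)

data LazyState a b = Thunk (a -> b) a | Value b

delay :: (a -> b) -> a -> IO (IORef (LazyState a b))
delay f x = newIORef (Thunk f x)

force :: IORef (LazyState a b) -> IO b
force r = do
  s <- readIORef r
  case s of
    Value y   -> pure y
    Thunk f x -> do
      y <- evaluate (f x)      -- compute once, like ML's strict let
      writeIORef r (Value y)   -- overwrite the thunk with its value
      pure y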
This approach can be somewhat problematic in a language that combines purely functional local evaluation with a multithreaded environment and message passing between threads. (This is notably the model of Erlang.) If one thread starts evaluating a thunk, and another thread needs its value just then, what is going to happen? If no precautions are taken, then both threads will calculate the value and store it. In a pure language, this is harmless in the sense that both threads will calculate the same value anyway³. However this can hurt performance. To ensure that a thunk is evaluated only once, the calculation of the value must be wrapped in a lock; this helps with long calculations that are performed many times but hurts short calculations that are performed only once, as taking and releasing a lock takes some time.
¹ The function, not the function body of course.
² Or rather, the fragment of Haskell that doesn't use a side effect monad is pure.
³ It is necessary for the transition between a delayed thunk and a computed value to be atomic — concurrent threads must be able to read a lazy value and get either a valid delayed thunk or a valid computed value, not some mixture of the two that isn't a valid object. At the processor level, the transition from delayed thunk to computed value is usually a pointer assignment, which on most architectures is atomic, fortunately.
What you propose will work. The details for doing this in Scheme can be found in SICP. One could envisage a strict version of Haskell in which there is a "lazy" function which does the opposite of what "seq" does in Haskell. However adding this to a strict Haskell-like language would require compiler magic because otherwise the thunk gets forced before being passed to "lazy".
However, if your language has uncontrolled effects, this can get hairy, because an effect happens whenever its enclosing function gets evaluated, and figuring out when that is going to happen in a lazy language is difficult. That's why Haskell has the IO monad.
I'm new to Haskell and understand that it is (basically) a pure functional language, which has the advantage that results to functions will not change across multiple evaluations. Given this, I'm puzzled by why I can't easily mark a function in such a way that its remembers the results of its first evaluation, and does not have to be evaluated again each time its value is required.
In Mathematica, for example, there is a simple idiom for accomplishing this:
f[x_]:=f[x]= ...
but in Haskell, the closest things I've found is something like
f' = (map f [0 ..] !!)
  where f 0 = ...
        f n = f' ...
which in addition to being far less clear (and apparently limited to Int arguments?) does not (seem to) preserve results within an interactive session.
Admittedly (and clearly), I don't understand exactly what's going on here; but naively, it seems like Haskell should have some way, at the function definition level, of
taking advantage of the fact that its functions are functions and skipping re-computation of their results once they have been computed, and
indicating a desire to do this at the function definition level with a simple and clean idiom.
Is there a way to accomplish this in Haskell that I'm missing? I understand (sort of) that Haskell can't store the evaluations as "state", but why can't it simply (in effect) redefine evaluated functions to be their computed value?
This grows out of this question, in which lack of this feature results in terrible performance.
Use a suitable library, such as MemoTrie.
import Data.MemoTrie
f' = memo f
  where f 0 = ...
        f n = f' ...
That's hardly less nice than the Mathematica version, is it?
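Filling in the ellipses with Fibonacci gives a complete program (assuming the memotrie package is available; the recursion goes back through the memoized fib, so sub-results are shared):

import Data.MemoTrie (memo)

fib :: Int -> Integer
fib = memo fib'
  where
    fib' 0 = 0
    fib' 1 = 1
    fib' n = fib (n - 1) + fib (n - 2)  -- assumes n >= 0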
Regarding
“why can't it simply (in effect) redefine evaluated functions to be their computed value?”
Well, it's not so easy in general. These values have to be stored somewhere. Even for an Int-valued function, you can't just allocate an array with all possible values – it wouldn't fit in memory. The list solution only works because Haskell is lazy and therefore allows infinite lists, but that's not particularly satisfying since lookup is O(n).
For other types it's simply hopeless – you'd need to somehow diagonalise an uncountably infinite domain.
You need some cleverer organisation. I don't know how Mathematica does this, but it probably uses a lot of “proprietary magic”. I wouldn't be so sure that it does really work the way you'd like, for any inputs.
Haskell fortunately has type classes, and these allow you to express exactly what a type needs in order to be quickly memoisable. HasTrie is such a class.
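Roughly, the class looks like this (simplified from the memotrie package; the real class has more methods and many instances):

{-# LANGUAGE TypeFamilies, TypeOperators #-}

class HasTrie a where
  data (:->:) a :: * -> *            -- the memo-trie structure for keys of type a
  trie   :: (a -> b) -> (a :->: b)   -- tabulate a function into a trie
  untrie :: (a :->: b) -> (a -> b)   -- index back into the trie

-- Memoization is then a round trip through the trie:
memo :: HasTrie a => (a -> b) -> (a -> b)
memo = untrie . trie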
I often read that lazy is not the same as non-strict but I find it hard to understand the difference. They seem to be used interchangeably but I understand that they have different meanings. I would appreciate some help understanding the difference.
I have a few questions which are scattered about this post. I will summarize those questions at the end of this post. I have a few example snippets, I did not test them, I only presented them as concepts. I have added quotes to save you from looking them up. Maybe it will help someone later on with the same question.
Non-Strict Def:
A function f is said to be strict if, when applied to a nonterminating
expression, it also fails to terminate. In other words, f is strict
iff the value of f bot is bot. For most programming languages, all
functions are strict. But this is not so in Haskell. As a simple
example, consider const1, the constant 1 function, defined by:
const1 x = 1
The value of const1 bot in Haskell is 1. Operationally speaking, since
const1 does not "need" the value of its argument, it never attempts to
evaluate it, and thus never gets caught in a nonterminating
computation. For this reason, non-strict functions are also called
"lazy functions", and are said to evaluate their arguments "lazily",
or "by need".
-A Gentle Introduction To Haskell: Functions
I really like this definition. It seems the best one I could find for understanding strict. Is const1 x = 1 lazy as well?
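One can check the definition directly in GHC, with undefined standing in for bot (a quick sketch):

const1 :: a -> Integer
const1 x = 1

main :: IO ()
main = print (const1 undefined)  -- prints 1: the argument is never forced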
Non-strictness means that reduction (the mathematical term for
evaluation) proceeds from the outside in,
so if you have (a+(b*c)) then first you reduce the +, then you reduce
the inner (b*c).
-Haskell Wiki: Lazy vs non-strict
The Haskell Wiki really confuses me. I understand what they are saying about order, but I fail to see how (a+(b*c)) would evaluate non-strictly if it were passed _|_?
In non-strict evaluation, arguments to a function are not evaluated
unless they are actually used in the evaluation of the function body.
Under Church encoding, lazy evaluation of operators maps to non-strict
evaluation of functions; for this reason, non-strict evaluation is
often referred to as "lazy". Boolean expressions in many languages use
a form of non-strict evaluation called short-circuit evaluation, where
evaluation returns as soon as it can be determined that an unambiguous
Boolean will result — for example, in a disjunctive expression where
true is encountered, or in a conjunctive expression where false is
encountered, and so forth. Conditional expressions also usually use
lazy evaluation, where evaluation returns as soon as an unambiguous
branch will result.
-Wikipedia: Evaluation Strategy
Lazy Def:
Lazy evaluation, on the other hand, means only evaluating an
expression when its results are needed (note the shift from
"reduction" to "evaluation"). So when the evaluation engine sees an
expression it builds a thunk data structure containing whatever values
are needed to evaluate the expression, plus a pointer to the
expression itself. When the result is actually needed the evaluation
engine calls the expression and then replaces the thunk with the
result for future reference.
...
Obviously there is a strong correspondence between a thunk and a
partly-evaluated expression. Hence in most cases the terms "lazy" and
"non-strict" are synonyms. But not quite.
-Haskell Wiki: Lazy vs non-strict
This seems like a Haskell-specific answer. I take it that lazy means thunks and non-strict means partial evaluation. Is that comparison too simplified? Does lazy always mean thunks and non-strict always mean partial evaluation?
In programming language theory, lazy evaluation or call-by-need is
an evaluation strategy which delays the evaluation of an expression
until its value is actually required (non-strict evaluation) and which
also avoids repeated evaluations (sharing).
-Wikipedia: Lazy Evaluation
Imperative Examples
I know most people say forget imperative programming when learning a functional language. However, I would like to know if these qualify as non-strict, lazy, both or neither? At the very least it would provide something familiar.
Short Circuiting
f1() || f2()
C#, Python and other languages with "yield"
public static IEnumerable<int> Power(int number, int exponent)
{
    int counter = 0;
    int result = 1;
    while (counter++ < exponent)
    {
        result = result * number;
        yield return result;
    }
}
-MSDN: yield (c#)
Callbacks
int f1() { return 1;}
int f2() { return 2;}
int lazy(int (*cb1)(), int (*cb2)(), int x) {
    if (x == 0)
        return cb1();
    else
        return cb2();
}

int eager(int e1, int e2, int x) {
    if (x == 0)
        return e1;
    else
        return e2;
}

lazy(f1, f2, x);
eager(f1(), f2(), x);
Questions
I know the answer is right in front of me with all those resources, but I can't grasp it. It all seems like the definition is too easily dismissed as implied or obvious.
I know I have a lot of questions. Feel free to answer whatever questions you feel are relevant. I added those questions for discussion.
Is const1 x = 1 also lazy?
How is evaluating from the outside inward non-strict? Is it because moving inward allows skipping reductions of unnecessary expressions, like in const1 x = 1? Reductions seem to fit the definition of lazy.
Does lazy always mean thunks and non-strict always mean partial evaluation? Is this just a generalization?
Are the following imperative concepts Lazy, Non-Strict, Both or Neither?
Short Circuiting
Using yield
Passing Callbacks to delay or avoid execution
Is lazy a subset of non-strict or vice versa, or are they mutually exclusive? For example, is it possible to be non-strict without being lazy, or lazy without being non-strict?
Is Haskell's non-strictness achieved by laziness?
Thank you SO!
Non-strict and lazy, while informally interchangeable, apply to different domains of discussion.
Non-strict refers to semantics: the mathematical meaning of an expression. The world to which non-strict applies has no concept of the running time of a function, memory consumption, or even a computer. It simply talks about what kinds of values in the domain map to which kinds of values in the codomain. In particular, a strict function must map the value ⊥ ("bottom" -- see the semantics link above for more about this) to ⊥; a non-strict function is allowed not to do this.
Lazy refers to operational behavior: the way code is executed on a real computer. Most programmers think of programs operationally, so this is probably what you are thinking. Lazy evaluation refers to implementation using thunks -- pointers to code which are replaced with a value the first time they are executed. Notice the non-semantic words here: "pointer", "first time", "executed".
Lazy evaluation gives rise to non-strict semantics, which is why the concepts seem so close together. But as FUZxxl points out, laziness is not the only way to implement non-strict semantics.
If you are interested in learning more about this distinction, I highly recommend the link above. Reading it was a turning point in my conception of the meaning of computer programs.
An example of an evaluation model that is neither strict nor lazy: optimistic evaluation, which gives some speedup as it can avoid a lot of "easy" thunks:
Optimistic evaluation means that even if a subexpression may not be needed to evaluate the superexpression, we still evaluate some of it using some heuristics. If the subexpression doesn't terminate quickly enough, we suspend its evaluation until it's really needed. This gives us an advantage over lazy evaluation if the subexpression is needed later, as we don't need to generate a thunk. On the other hand, we don't lose too much if the expression doesn't terminate, as we can abort it quickly enough.
As you can see, this evaluation model is not strict: If something that yields _|_ is evaluated, but not needed, the function will still terminate, as the engine aborts the evaluation. On the other hand, it may be possible that more expressions than needed are evaluated, so it's not completely lazy.
Yes, there is some unclear use of terminology here, but the terms coincide in most cases regardless, so it's not too much of a problem.
One major difference is when terms are evaluated. There are multiple strategies for this, ranging on a spectrum from "as soon as possible" to "only at the last moment". The term eager evaluation is sometimes used for strategies leaning toward the former, while lazy evaluation properly refers to a family of strategies leaning heavily toward the latter. The distinction between "lazy evaluation" and related strategies tends to involve when and where the result of evaluating something is retained, versus being tossed aside. The familiar memoization technique in Haskell of assigning a name to a data structure and indexing into it is based on this. In contrast, a language that simply spliced expressions into each other (as in "call-by-name" evaluation) might not support this.
The other difference is which terms are evaluated, ranging from "absolutely everything" to "as little as possible". Since any value actually used to compute the final result can't be ignored, the difference here is how many superfluous terms are evaluated. As well as reducing the amount of work the program has to do, ignoring unused terms means that any errors they would have generated won't occur. When a distinction is being drawn, strictness refers to the property of evaluating everything under consideration (for a strict function, for instance, this means the terms it's applied to, though not necessarily sub-expressions inside the arguments), while non-strict means evaluating only some things (either by delaying evaluation, or by discarding terms entirely).
It should be easy to see how these interact in complicated ways; decisions are not at all orthogonal, as the extremes tend to be incompatible. For instance:
Very non-strict evaluation precludes some amount of eagerness; if you don't know whether a term will be needed, you can't evaluate it yet.
Very strict evaluation makes non-eagerness somewhat irrelevant; if you're evaluating everything, the decision of when to do so is less significant.
Alternate definitions do exist, though. For instance, at least in Haskell, a "strict function" is often defined as one that forces its arguments sufficiently that the function will evaluate to _|_ ("bottom") whenever any argument does; note that by this definition, id is strict (in a trivial sense), because forcing the result of id x will have exactly the same behavior as forcing x alone.
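To see why id qualifies, compare forcing id x with forcing x itself (a small sketch):

main :: IO ()
main = do
  print (id (42 :: Int) `seq` "forced fine")   -- prints "forced fine"
  -- print (id (undefined :: Int) `seq` "reached") would crash exactly as
  -- forcing undefined does: id maps _|_ to _|_, hence id is strict.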
This started out as an update but it started to get long.
Laziness / Call-by-need is a memoized version of call-by-name where, if the function argument is evaluated, that value is stored for subsequent uses. In a "pure" (effect-free) setting, this produces the same results as call-by-name; when the function argument is used two or more times, call-by-need is almost always faster.
Imperative Example - Apparently this is possible. There is an interesting article on Lazy Imperative Languages. It says there are two methods: one requires closures; the second uses graph reduction. Since C does not support closures, you would need to explicitly pass an argument to your iterator. You could instead wrap a map structure: if the value does not exist, calculate it; otherwise, return the stored value.
Note: Haskell implements this by "pointers to code which are replaced with a value the first time they are executed" - luqui.
This is non-strict call-by-name, but with sharing/memoization of the results.
Call-By-Name - In call-by-name evaluation, the arguments to a function are not evaluated before the function is called — rather, they are substituted directly into the function body (using capture-avoiding substitution) and then left to be evaluated whenever they appear in the function. If an argument is not used in the function body, the argument is never evaluated; if it is used several times, it is re-evaluated each time it appears.
Imperative Example: callbacks
Note: This is non-strict as it avoids evaluation if not used.
Non-Strict = In non-strict evaluation, arguments to a function are not evaluated unless they are actually used in the evaluation of the function body.
Imperative Example: short-circuiting
Note: _|_ appears to be a way to test if a function is non-strict
So a function can be non-strict but not lazy. A function that is lazy is always non-strict.
Call-By-Need is partly defined by Call-By-Name which is partly defined by Non-Strict
An Excerpt from "Lazy Imperative Languages"
2.1. NON-STRICT SEMANTICS VS. LAZY EVALUATION

We must first clarify the distinction between "non-strict semantics" and "lazy evaluation". Non-strict semantics are those which specify that an expression is not evaluated until it is needed by a primitive operation. There may be various types of non-strict semantics. For instance, non-strict procedure calls do not evaluate the arguments until their values are required. Data constructors may have non-strict semantics, in which compound data are assembled out of unevaluated pieces. Lazy evaluation, also called delayed evaluation, is the technique normally used to implement non-strict semantics. In section 4, the two methods commonly used to implement lazy evaluation are very briefly summarized.

CALL BY VALUE, CALL BY LAZY, AND CALL BY NAME

"Call by value" is the general name used for procedure calls with strict semantics. In call by value languages, each argument to a procedure call is evaluated before the procedure call is made; the value is then passed to the procedure or enclosing expression. Another name for call by value is "eager" evaluation. Call by value is also known as "applicative order" evaluation, because all arguments are evaluated before the function is applied to them.

"Call by lazy" (using William Clinger's terminology in [8]) is the name given to procedure calls which use non-strict semantics. In languages with call by lazy procedure calls, the arguments are not evaluated before being substituted into the procedure body. Call by lazy evaluation is also known as "normal order" evaluation, because of the order (outermost to innermost, left to right) of evaluation of an expression.

"Call by name" is a particular implementation of call by lazy, used in Algol-60 [18]. The designers of Algol-60 intended that call-by-name parameters be physically substituted into the procedure body, enclosed by parentheses and with suitable name changes to avoid conflicts, before the body was evaluated.

CALL BY LAZY VS. CALL BY NEED

Call by need is an extension of call by lazy, prompted by the observation that a lazy evaluation could be optimized by remembering the value of a given delayed expression, once forced, so that the value need not be recalculated if it is needed again. Call by need evaluation, therefore, extends call by lazy methods by using memoization to avoid the need for repeated evaluation. Friedman and Wise were among the earliest advocates of call by need evaluation, proposing "suicidal suspensions" which self-destructed when they were first evaluated, replacing themselves with their values.
The way I understand it, "non-strict" means trying to reduce the workload by reaching completion with a lower amount of work,
whereas "lazy evaluation" and similar strategies try to reduce the overall workload by avoiding full completion (hopefully forever).
from your examples...
f1() || f2()
...short-circuiting this expression cannot possibly trigger 'future work', and there's neither a speculative/amortized factor to the reasoning, nor any computational-complexity debt accruing.
Whereas in the C# example, "lazy" saves a function call in the overall view, but in exchange comes with the sorts of difficulties described above (at least from the point of call until possible full completion... in this code that's a negligible-distance pathway, but imagine those functions had some high-contention locks to put up with).
int f1() { return 1;}
int f2() { return 2;}
int lazy(int (*cb1)(), int (*cb2)(), int x) {
    if (x == 0)
        return cb1();
    else
        return cb2();
}

int eager(int e1, int e2, int x) {
    if (x == 0)
        return e1;
    else
        return e2;
}
lazy(f1, f2, x);
eager(f1(), f2(), x);
If we're talking general computer science jargon, then "lazy" and "non-strict" are generally synonyms -- they stand for the same overall idea, which expresses itself in different ways for different situations.
However, in a given particular specialized context, they may have acquired differing technical meanings. I don't think you can say anything accurate and universal about what the difference between "lazy" and "non-strict" is likely to be in the situation where there is a difference to make.
In a pure functional language with lazy semantics (such as Haskell), results of computations are memoized so that further evaluations of a function with the same inputs do not recompute the value but get it directly from the cache of memoized values.
I am wondering if these memoized values get recycled at some point in time?
If so, it means that the memoized values must be recomputed at a later time, and the benefits of memoization are not so exciting IMHO.
If not, then ok, this is clever to cache everything... but does it mean that a program - if run for a sufficiently long period of time - will always consume more and more memory?
Imagine a program performing intensive numerical analysis: for example, finding the roots of hundreds of thousands of mathematical functions using a dichotomy (bisection) algorithm.
Every time the program evaluates a mathematical function with a specific real number, the result will be memoized. But there is only a really small probability
that exactly the same real number will appear again during the algorithm, so this would lead to a memory leak (or at least really bad memory usage).
My idea is that maybe memoized values are simply "scoped" to something in the program (for example to the current continuation, call stack, etc.), but I was unable to find something practical on the subject.
I admit I haven't looked deeply at the Haskell compiler implementation (lazy?), but please, could someone explain to me how it works in practice?
EDIT: Ok, I understand my mistake from the first few answers: Pure semantics implies Referential Transparency which in turn does not imply automatic Memoization, but just guarantees that there will be no problem with it.
I think that a few articles on the web are misleading about this, because from a beginner's point of view, it seems that the Referential Transparency property is so cool because it allows implicit memoization.
Haskell does not automatically memoize function calls, precisely because this would quickly consume tons of memory. If you do memoziation yourself, you get to choose at what scope the function is memoized. For example, let's say you have the fibonacci function defined like this:
fib n = fibs !! n
  where fibs = 0 : 1 : zipWith (+) fibs (tail fibs)
Here the memoization is only done within one call to fib, whereas if you leave fibs at the top level
fib n = fibs !! n
fibs = 0 : 1 : zipWith (+) fibs (tail fibs)
then the memoized list is kept until the garbage collector can determine that there are no more ways to reach fibs from any part of your program.
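A usage sketch of the top-level version: because fibs is a top-level value (a CAF), its evaluated prefix survives between calls for as long as the garbage collector keeps it (assuming enough memory to hold it):

fib :: Int -> Integer
fib n = fibs !! n

fibs :: [Integer]
fibs = 0 : 1 : zipWith (+) fibs (tail fibs)

main :: IO ()
main = do
  print (fib 20000)  -- pays the cost of evaluating 20001 list cells
  print (fib 20000)  -- roughly free: the cells are already evaluated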
My idea is that maybe memoized values are simply "scoped" to something in the program (for example to the current continuation, call stack, etc.), but I was unable to find something practical on the subject.
This is correct. Specifically, when you see something like:
fun x y = let
    a = .....
    b = ....
    c = ....
  in ....
or an equivalent where clause, the values a, b and c may not be computed until actually used (or they may be computed right away because the strictness analyser can prove that the values would be evaluated later anyway). But when those values depend on the current function parameters (here x and y), the runtime will most likely not remember every combination of x and y and the resulting a, b and c.
Note also that in a pure language, it is also okay to not remember the values at all; remembering them is only practical insofar as memory is cheaper than CPU time.
So the answer to your question is: there is no such thing as lifetime of intermediary results in Haskell. All one can say is that the evaluated value will be there when needed.