This FAQ says that
The seq operator is
seq :: a -> b -> b
x `seq` y will evaluate x, enough to check that it is not bottom, then
discard the result and evaluate y. This might not seem useful, but it
means that x is guaranteed to be evaluated before y is considered.
That's awfully nice of Haskell, but does it mean that in
x `seq` f x
the cost of evaluating x will be paid twice ("discard the result")?
The seq function will discard the value of x, but since that value has been evaluated, all references to x are "updated" to point to the evaluated version rather than the unevaluated one. So even though seq evaluates and then discards x, the value remains evaluated for every other user of x, and no repeated evaluation occurs.
No, it's not compute-and-forget; it's compute, which forces caching.
For example, consider this code:
let x = 1 + 1
in x + 1
Since Haskell is lazy, this evaluates to ((1 + 1) + 1). A thunk, containing the sum of a thunk and one, the inner thunk being one plus one.
Let's use JavaScript, a non-lazy language, to show what this looks like:
function(){
    var x = function(){ return 1 + 1 };
    return x() + 1;
}
Chaining together thunks like this, if done repeatedly, can cause stack overflows, so seq comes to the rescue.
let x = 1 + 1
in x `seq` (x + 1)
I'm lying when I tell you this evaluates to (2 + 1), but that's almost true - it's just that the calculation of the 2 is forced to happen before the rest happens (but the 2 is still calculated lazily).
Going back to javascript:
function(){
    var x = function(){ return 1 + 1 };
    return (function(x){
        return x + 1;
    })( x() );
}
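To see how "seq to the rescue" plays out in practice, here is a minimal sketch (sumLazy and sumStrict are names made up for this example): the lazy sum chains thunks in the accumulator, while the seq'd version forces the accumulator at each step and runs in constant space.

sumLazy :: [Int] -> Int
sumLazy = go 0
  where
    go acc []     = acc
    go acc (n:ns) = go (acc + n) ns            -- acc grows as a chain of (+) thunks

sumStrict :: [Int] -> Int
sumStrict = go 0
  where
    go acc []     = acc
    go acc (n:ns) = acc `seq` go (acc + n) ns  -- force acc before recursing

main :: IO ()
main = print (sumStrict [1 .. 1000000])        -- constant space; the lazy version may overflow the stack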
I believe x will only be evaluated once (and the result retained for future use, as is typical for lazy operations). That behavior is what makes seq useful.
You can always check with unsafePerformIO or trace…
import System.IO.Unsafe (unsafePerformIO)
main = print (x `seq` f (x + x))
  where
    f = (+4)
    x = unsafePerformIO $ print "Batman!" >> return 3
Of course seq by itself does not "evaluate" anything. It just records the forcing order dependency. The forcing itself is triggered by pattern-matching. When seq x (f x) is forced, x will be forced first (memoizing the resulting value), and then f x will be forced. Haskell's lazy evaluation means it memoizes the results of forcing of expressions, so no repeat "evaluation" (scary quotes here) will be performed.
I put "evaluation" into scary quotes because it implies full evaluation. In the words of Haskell wikibook, "Haskell values are highly layered; 'evaluating' a Haskell value could mean evaluating down to any one of these layers."
Let me reiterate: seq by itself does not evaluate anything. seq x x does not evaluate x under any circumstance. seq x (f x) does not evaluate anything when f = id, contrary to what the report seems to have been saying.
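A quick way to see this for yourself (a minimal sketch using Debug.Trace; the names x and y are just for illustration): constructing x `seq` x forces nothing by itself, and forcing the result evaluates x exactly once.

import Debug.Trace (trace)

main :: IO ()
main = do
  let x = trace "x evaluated" (2 + 2 :: Int)
      y = x `seq` x        -- merely building this expression prints nothing
  putStrLn "before forcing y"
  print y                  -- forcing y forces x exactly once: "x evaluated", then 4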
Related
fix f = let {x = f x} in x
Talking about let, I thought that let P = Q in R would evaluate Q -> Q' then P is replaced by Q' in R, or: R[P -> Q'].
But in fix definition the Q depends on R, how to evaluate then?
I imagine that this is about lazy evaluation. Q' becomes a thunk, but I can't reason this in my head.
As a matter of context, I'm looking at Y combinator, it should find a fixed point of a function so if I have this function, one x = 1, then fix one == 1 must hold, right?
So fix one = let {x = one x} in x, but I can't see how 1 would emerge from that.
Talking about let, I thought that let P = Q in R would evaluate Q -> Q' then P is replaced by Q' in R, or: R[P -> Q'].
Morally, yes, but Q is not immediately evaluated; it is evaluated when the value of P is needed.
But in fix definition the Q depends on R, how to evaluate then?
Q does not depend on R, it depends on P. This makes P depend on itself, recursively. This can lead to several different outcomes. Roughly put:
If Q can not return any part of its result before evaluating P, then P represents an infinitely recursing computation, which does not terminate. For example,
let x = x + 1 in x -- loops forever with no result
-- (GHC is able to catch this specific case and raise an exception instead,
-- but it's an irrelevant detail)
If Q can instead return a part of its result before needing to evaluate P, it does so.
let x = 2 : x in x
-- x = 2 : .... can be generated immediately
-- This results in the infinite list 2:2:2:2:2:.....
let x = (32, 10 + fst x) in x
-- x = (32, ...) can be generated immediately
-- hence x = (32, 10 + fst (32, ...)) = (32, 10+32) = (32, 42)
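Both productive cases can be checked directly; a minimal runnable sketch (the binding names are just for illustration):

main :: IO ()
main = do
  let xs = 2 : xs                  -- the infinite list 2:2:2:...
  print (take 5 xs)                -- [2,2,2,2,2]
  let pair = (32, 10 + fst pair)   -- the tuple defined in terms of itself
  print pair                       -- (32,42)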
I imagine that this is about lazy evaluation. Q' becomes a thunk, but I can't reason this in my head.
P is associated with a thunk. What matters is whether this thunk calls itself before returning some part of the output or not.
As a matter of context, I'm looking at Y combinator, it should find a fixed point of a function so if I have this function, one x = 1, then fix one == 1 must hold, right?
Yes.
So fix one = let x = one x in x, but I can't see how 1 would emerge from that.
We can compute it like this:
fix one
= {- definition of fix -}
let x = one x in x
= {- definition of x -}
let x = one x in one x
= {- definition of one -}
let x = one x in 1
= {- x is now irrelevant -}
1
Just expand the definitions. Keep recursive definitions around in case you need them again.
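For completeness, a small runnable sketch (fix comes from Data.Function; facF is a helper made up here to show a more useful fixed point):

import Data.Function (fix)

one :: a -> Int
one _ = 1

-- a step function whose fixed point is the factorial function
facF :: (Int -> Int) -> Int -> Int
facF rec n = if n <= 1 then 1 else n * rec (n - 1)

main :: IO ()
main = do
  print (fix one)      -- 1, exactly as derived above
  print (fix facF 5)   -- 120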
Let's say we have this function:
foo n = let comp n = n * n * n + 10
            otherComp n = (comp n) + (comp n)
        in (otherComp n) + (otherComp n)
How many times will comp n get actually executed? 1 or 4? Does Haskell "store" function results in the scope of let?
In GHCi, without optimization, four times.
> import Debug.Trace
> :{
| f x = let comp n = trace "A" n
|           otherComp n = comp n + comp n
|       in otherComp x + otherComp x
| :}
> f 10
A
A
A
A
40
With optimization, GHC might be able to inline the functions and optimize everything. However, in the general case, I would not count on GHC to optimize multiple calls into one. That would require memoizing and/or CSE (common subexpression elimination), which is not always an optimization, hence GHC is quite conservative about it.
As a rule of thumb, when evaluating performance, expect each (evaluated) call in the code to correspond to an actual call at runtime.
The above discussion applies only to function bindings. For simple pattern bindings consisting of just a variable, like
let x = g 20
in x + x
then g 20 will be computed once, bound to x, and then x + x will reuse the same value twice. With one proviso: that x gets assigned a monomorphic type.
If x gets assigned a polymorphic type with a typeclass constraint, then it acts as a function in disguise.
> let x = trace "A" (200 * 350)
> :t x
x :: Num a => a
> x + x
A
A
140000
Above, 200 * 350 has been recomputed twice, since it got a polymorphic type.
This mostly only happens in GHCi. In regular Haskell source files, GHC uses the Dreaded Monomorphism Restriction to give x a monomorphic type, precisely to avoid recomputation of variables. If that cannot be done, and duplicate computation is needed, GHC prefers to raise an error rather than silently cause recomputation. (In GHCi, the DMR is disabled to make more code work as-is, so recomputation happens, as seen above.)
Summing up: variable bindings let x = ... should be fine in source code, and work as expected without duplicating computation. If you want to be completely sure, give x an explicit monomorphic type annotation.
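Following that advice, here is a minimal sketch (not from the original question) of foo rewritten so the shared work sits in a plain variable binding; the trace fires once per call instead of four times:

import Debug.Trace (trace)

foo :: Int -> Int
foo n = let c         = trace "A" (n * n * n + 10)   -- monomorphic, so shared
            otherComp = c + c
        in otherComp + otherComp

main :: IO ()
main = print (foo 10)   -- prints "A" once, then 4040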
I found this statement while studying Functional Reactive Programming, in "Plugging a Space Leak with an Arrow" by Hai Liu and Paul Hudak (page 5):
Suppose we wish to define a function that repeats its argument indefinitely:
repeat x = x : repeat x
or, in lambdas:
repeat = λx → x : repeat x
This requires O(n) space. But we can achieve O(1) space by writing instead:
repeat = λx → let xs = x : xs
              in xs
The difference here seems small, but it has a huge impact on space efficiency. Why and how does this happen? My best guess was to evaluate them by hand:
r = \x -> x: r x
r 3
-> 3: r 3
-> 3: 3: 3: ........
-> [3,3,3,......]
As above, we will need to create infinitely many new thunks for this recursion. Then I try to evaluate the second one:
r = \x -> let xs = x:xs in xs
r 3
-> let xs = 3:xs in xs
-> xs, according to the definition above:
-> 3:xs, where xs = 3:xs
-> 3:xs:xs, where xs = 3:xs
In the second form, xs appears and can be shared between all the places where it occurs, so I guess that's why we only require O(1) space rather than O(n). But I'm not sure whether I'm right or not.
BTW: the term "shared" comes from page 4 of the same paper:
The problem here is that the standard call-by-need evaluation rules
are unable to recognize that the function:
f = λdt → integralC (1 + dt) (f dt)
is the same as:
f = λdt → let x = integralC (1 + dt) x in x
The former definition causes work to be repeated in the recursive call
to f, whereas in the latter case the computation is shared.
It's easiest to understand by picturing the structures involved:
The first version
repeat x = x : repeat x
creates a chain of (:) constructors ending in a thunk which will replace itself with more constructors as you demand them. Thus, O(n) space.
The second version
repeat x = let xs = x : xs in xs
uses let to "tie the knot", creating a single (:) constructor which refers to itself.
Put simply, variables are shared, but function applications are not. In
repeat x = x : repeat x
it is a coincidence (from the language's perspective) that the (co)recursive call to repeat is with the same argument. So, without additional optimization (which is called static argument transformation), the function will be called again and again.
But when you write
repeat x = let xs = x : xs in xs
there are no recursive function calls. You take an x, and construct a cyclic value xs using it. All sharing is explicit.
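To make the comparison concrete, a minimal sketch (repeatPlain and repeatShared are names made up for the two definitions above):

-- the recursive-call version: a fresh (:) cell per demanded element
repeatPlain :: a -> [a]
repeatPlain x = x : repeatPlain x

-- the knot-tied version: one (:) cell whose tail points back to itself
repeatShared :: a -> [a]
repeatShared x = let xs = x : xs in xs

main :: IO ()
main = do
  print (take 5 (repeatPlain 3 :: [Int]))    -- [3,3,3,3,3]
  print (take 5 (repeatShared 3 :: [Int]))   -- [3,3,3,3,3]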
If you want to understand it more formally, you need to familiarize yourself with the semantics of lazy evaluation, such as A Natural Semantics for Lazy Evaluation.
Your intuition about xs being shared is correct. To restate the authors' example in terms of repeat instead of integralC, when you write:
repeat x = x : repeat x
the language does not recognize that the repeat x on the right is the same as the value produced by the expression x : repeat x. Whereas if you write
repeat x = let xs = x : xs in xs
you're explicitly creating a structure that when evaluated looks like this:
{hd: x, tl: *}
 ^          |
 \__________/
I'm a Haskell newbie and reading:
http://www.seas.upenn.edu/~cis194/spring13/lectures/01-intro.html
It states: "In Haskell one can always “replace equals by equals”, just like you learned in algebra class." What is meant by this, and what are its advantages?
I don't recall learning this in algebra but perhaps I do not recognise the terminology.
It means that if you know that A (an expression) is equal to B (another expression), then you may always replace A with B in any expression involving A, and vice versa.
For instance, we know that even = not . odd. Therefore
filter even
=
filter (not . odd)
On the other hand, we know that odd satisfies the following equation
odd = (1 ==) . (`mod` 2)
As such, we also know that
filter even
=
filter (not . odd)
=
filter (not . (1 ==) . (`mod` 2))
Moreover, you know that (`mod` 2) always returns 0 or 1. So, by case analysis on those two results, the following is valid.
not . (1 ==)
=
(0 ==)
Therefore, we can also say
filter even
=
filter ((0 ==) . (`mod` 2))
The advantage of being able to replace equals by equals is to design a program by massaging equation after equation until a suitable definition is found, like in typical solve for x kind of problems of Algebra.
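These rewrites can be sanity-checked at the REPL; a minimal sketch (the sample list is just for illustration):

main :: IO ()
main = do
  let xs = [1 .. 10] :: [Int]
  print (filter even xs)                   -- [2,4,6,8,10]
  print (filter (not . odd) xs)            -- [2,4,6,8,10]
  print (filter ((0 ==) . (`mod` 2)) xs)   -- [2,4,6,8,10]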
In its simplest form, substituting "equals by equals" means replacing a defined identifier with its definition. For instance
let x = f 1 in x + x
can be equivalently written as
f 1 + f 1
in the sense that the result will be the same. In GHC, you can expect the second one to re-compute f 1 twice, possibly degrading performance, but the result of the sum is the same.
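A minimal sketch with Debug.Trace (this f is made up for illustration, and assumes an unoptimized build, where GHC performs no CSE) makes the difference visible:

import Debug.Trace (trace)

f :: Int -> Int
f n = trace "f called" (n * 10)

main :: IO ()
main = do
  print (let x = f 1 in x + x)   -- "f called" printed once, then 20
  print (f 1 + f 1)              -- "f called" printed twice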
In impure languages, such as OCaml, the two snippets above are instead not equivalent. This is because side effects are allowed: evaluating f 1 can have observable effects. For instance, f could be defined as follows:
(* OCaml code *)
let f =
  let r = ref 0 in
  fun x -> r := !r + x ; !r
Using the above definition, f has an internal mutable state, which gets incremented by its argument every time it is called, before the new state is returned. Because of this,
f 1 + f 1
would evaluate to 1 + 2 since the state is incremented twice, while
let x = f 1 in x + x
would evaluate to 1 + 1, since only one increment of the state is performed.
The consequence is that, in OCaml, replacing x with its definition would not be a semantics-preserving program transformation. Of course, the same holds in any imperative language that allows side effects. Only in pure languages (Haskell, Agda, Coq, ...) is the transformation safe.
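For contrast, a minimal sketch of how the same stateful counter looks in Haskell: the mutation has to live in IO, which is exactly what keeps ordinary expressions replaceable by their values.

import Data.IORef (newIORef, readIORef, writeIORef)

main :: IO ()
main = do
  r <- newIORef (0 :: Int)
  let f x = do
        old <- readIORef r        -- the effect is explicit in IO
        writeIORef r (old + x)
        readIORef r
  a <- f 1
  b <- f 1
  print (a + b)   -- 1 + 2 = 3; the two calls are visibly distinct actions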
f x y z = [n | n <- z, n > x + y]
f 1 2 [3,4]
Would x + y be computed only once, so that successive uses are replaced by the value 3? Is GHC able to make this optimization, given that FP brings us the virtue of referential transparency?
How can I use trace to prove it?
I don't think the computed value will be reused.
The general problem with this kind of thing is, x + y is cheap, but you could instead have some operation there that produces an utterly vast result, which you probably don't want to keep in memory. Which is a wordy way of saying "this is a time/space tradeoff".
Because of this, GHC tends not to reuse work, in case the space lost doesn't make up for the time gained.
The way to find out for sure is to ask GHC to dump Core when it compiles your code. You can then see precisely what's going to get executed. (Be prepared for it to be very verbose though!) Oh, and make sure you turn on optimisations! (I.e., the -O2 flag.)
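For reference, a typical invocation looks something like this (using GHC's -ddump-simpl flag; -dsuppress-all trims the output to something readable, and YourModule.hs stands in for your own file):

ghc -O2 -ddump-simpl -dsuppress-all YourModule.hs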
If you rephrase your function as
f x y z = let s = x + y in [ n | n <- z, n > s ]
Now s will definitely be computed only once per call to f. (Each time you call f, it will still recompute s.)
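A minimal sketch of that rephrased version with a trace on s (assuming GHCi or an unoptimized build, where the trace output is a reliable signal):

import Debug.Trace (trace)

f :: Int -> Int -> [Int] -> [Int]
f x y z = let s = trace "s" (x + y) in [n | n <- z, n > s]

main :: IO ()
main = print (f 1 2 [3,4])   -- prints "s" once, then [4]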
Incidentally, if you're interested in saving already-computed results for the whole function, the search term you're looking for is "memoisation".
What will happen can depend on whether you are using ghci vs. ghc and then, if you are compiling the code, what optimization level is being used.
Here is one way to test the evaluations:
import Debug.Trace

f x y z = [n | n <- z, n > tx x + ty y]
  where tx = trace "x"
        ty = trace "y"

main = print $ f 1 2 [3,4]
With 7.8.3 I get the following results:
ghci: x y x y [4]
ghc (no optimization): x y x y [4]
ghc -O2: x y [4]
It is possible that the addition of the trace calls affects CSE optimization. But this does show that -O2 will hoist x+y out of the loop.