How often is an expanding list evaluated - haskell

Is fib evaluated from start for each element of cumfib?
fib = (1:1: zipWith (+) fib (tail fib))
cumfib = [ sum $ take i fib | i<-[1..]]
Or are the first i elements cached and reused for element (i+1) of cumfib?
I am more or less guessing that fib is used in the same lambda expression and hence is calculated only once.
Furthermore, does the implementation of fib matter regarding how often the i-th Fibonacci number is evaluated? My actual problem concerns prime numbers instead of Fibonacci numbers, which I wish to 'cache' to easily evaluate the prime factors of some number n. However, I only use
takeWhile (\x-> x*x<n) primes
of the primes. Since I evaluate the factors for small n first and later for bigger n, this subset of primes grows, and hence I wonder how often primes is evaluated if I do:
primes = ... some way of calculating primes ...
helpHandlePrimes ... = ... using primes ...
handlePrimes = ... using primes and helpHandlePrimes ...
Please let me know whether primes evaluates once, multiple times, or whether this cannot be determined from how I formulated the question.

A let-bound term is usually shared within its scope. In particular, a top-level term in a module is shared in the entire program. However, you have to be careful about the type of the term. If the term is a function, then sharing means that just the lambda abstraction is shared, so the function isn't memoized. An overloaded term is internally represented as a function, and therefore sharing is rather meaningless for an overloaded term as well.
So if you have a monomorphic list of numbers, then it's going to be shared. By default, a list such as fib as you've given will be monomorphic, because of the "monomorphism restriction" (actually here's a case where it's useful). However, these days it's in fashion to disable the monomorphism restriction, so in any case I recommend giving an explicit type signature such as
fib :: [Integer]
to be sure and make it clear to everyone that you're expecting this to be a monomorphic list.
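Applied to the primes from the question, that advice might look like the sketch below. The trial-division definition here is only illustrative, not the asker's actual one; the point is that with a monomorphic top-level signature, primes is a single shared constant, so helpHandlePrimes and handlePrimes both walk the same list and the elements of the takeWhile (\x -> x*x < n) primes prefix are computed at most once.
primes :: [Integer]
primes = 2 : filter isPrime [3,5..]
  where
    -- Illustrative trial division: a candidate is prime if no smaller prime
    -- up to its square root divides it. The recursive use of primes is safe
    -- because only an already-computed prefix is ever demanded.
    isPrime n = all (\p -> n `mod` p /= 0) (takeWhile (\p -> p * p <= n) primes)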

I'd like to add that, written this way, cumfib needlessly re-computes the sum of the first i elements of fib. It can be defined more efficiently as
cumfib = tail $ scanl (+) 0 fib
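As a quick sanity check (the fib from the question starts 1, 1, 2, 3, 5, so the partial sums are):
take 5 cumfib => [1,2,4,7,12]
take 5 [ sum $ take i fib | i <- [1..] ] => [1,2,4,7,12]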

Related

Does Haskell discard intermediary results during lazy evaluation?

If I define the Fibonacci sequence recursively:
fibo_lazy_list = 0 : 1 : zipWith (+) fibo_lazy_list (tail fibo_lazy_list)
Then ask for the first element above a given value, say:
print $ find (>100) fibo_lazy_list
I understand that Haskell evaluates only the elements which are required to get the printed result. But are they all kept in memory until the print? Since only two elements of the list are required to compute the last one, does Haskell release the left-most elements, or does the list keep growing in memory?
It depends.
This is actually one of the trickiest things to get right in real-world Haskell code: avoiding memory leaks caused by holding on to data that was only supposed to be intermediate but turns out to be a dependency of some not-yet-evaluated thunk, and therefore can't be garbage-collected.
In your example, the leading elements of fibo_lazy_list (BTW, please use camelCase rather than underscore_case in Haskell) will not be garbage-collected as long as fibo_lazy_list is referred to by something that could still be evaluated. As soon as it goes out of scope, that is no longer possible. So if you write it like this
print $ let fibo_lazy_list = 0 : 1 : zipWith (+) fibo_lazy_list (tail fibo_lazy_list)
        in  find (>100) fibo_lazy_list
then you can be pretty confident that the unused elements will be garbage collected, possibly before the one to be printed is even found.
If, however, fiboLazyList is defined at the top level and is a CAF (as it will be if its type is not polymorphic)
fiboLazyList :: [Integer]
fiboLazyList = 0 : 1 : zipWith (+) fiboLazyList (tail fiboLazyList)
main :: IO ()
main = do
  ...
  print $ find (>100) fiboLazyList
  ...
then you should expect all the leading elements to stay in memory even after the >100 element has been extracted.
Compiler optimisation may help here, and so can strictness annotations. But as I said, this is a bit of a pain in Haskell.
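One common workaround, shown as a sketch below with a hypothetical helper name firstFibAbove, is to keep the list local to the function that consumes it, so it never becomes a top-level CAF and can be collected once the call returns. Note that aggressive optimisation may still float such a binding out, as discussed in the memoization question further down.
import Data.List (find)

-- Hypothetical helper: fibs lives in a where clause, so it is rebuilt on
-- each call but also becomes garbage as soon as the call has finished.
firstFibAbove :: Integer -> Maybe Integer
firstFibAbove limit = find (> limit) fibs
  where fibs = 0 : 1 : zipWith (+) fibs (tail fibs)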

Does this Haskell example effectively demonstrate laziness?

I'm new to Haskell, and I'm writing a paper on it for my Programming Languages class. I want to demonstrate Haskell's laziness with some sample code, but I'm not sure if what I'm seeing is actually laziness.
doubleMe xs = [x*2 | x <- xs]
In ghci:
let lst = [1..10]
import Debug.Trace
trace (show lst) doubleMe (trace (show lst) doubleMe (trace (show lst) doubleMe lst))
Output:
[1,2,3,4,5,6,7,8,9,10]
[1,2,3,4,5,6,7,8,9,10]
[1,2,3,4,5,6,7,8,9,10]
[8,16,24,32,40,48,56,64,72,80]
Thanks for your time and help!
Your usage of trace here isn't particularly insightful, or in fact insightful at all. All you do is print out the input list at several points in the evaluation, which doesn't tell you anything about the actual state of the program. What actually happens here is that every trace is forced before the doubling calculation even starts (as soon as the result list is demanded to weak head normal form), which is pretty much what you would get in a language with fully strict evaluation.
To see some laziness, you could do something like
Prelude Debug.Trace> let doubleLsTracing xs = [trace("{Now doubling "++show x++"}")$ x*2 | x<-xs]
Prelude Debug.Trace> take 5 $ doubleLsTracing [1 .. 10]
{Now doubling 1}
{Now doubling 2}
{Now doubling 3}
{Now doubling 4}
{Now doubling 5}
[2,4,6,8,10]
where you can see that only five numbers are doubled, because only five results were requested; even though the list that doubleLsTracing was given has 10 entries.
Note that trace is generally not a great tool to monitor the "flow of execution"; it's merely a hack to allow "looking into" local variables to see what's going on in some function.
Infinite streams are always a good example. In most other languages you need special constructs to obtain them, but in Haskell they are completely natural.
One example is the fibonacci stream:
fib = 0 : 1 : zipWith (+) fib (tail fib)
take 10 fib => [0,1,1,2,3,5,8,13,21,34]
Another one is obtaining the stream of prime numbers by using the trial division method:
primes = sieve [2..]
  where sieve (x:xs) = x : filter (not . (== 0) . (`mod` x)) (sieve xs)
take 10 primes => [2,3,5,7,11,13,17,19,23,29]
Also, implementing backtracking in Haskell is very simple, giving you the ability to obtain the list of solutions lazily, on demand:
http://rosettacode.org/wiki/N-queens_problem#Haskell
A more complex example, showing how you can implement minimum, is the one found here:
Lazy Evaluation and Time Complexity
It basically shows how using Haskell's laziness you can obtain a very elegant definition of the minimum function (that finds the minimum in a list of elements):
minimum = head . sort
You could demonstrate Haskell's laziness by contrived examples. But I think it is far better to show how laziness helps you develop solutions to common problems that exhibit far greater modularity than in other languages.
The short answer is "no"; leftaroundabout explains that pretty well in his answer.
My suggestion is to:
Read and understand the definition of lazy evaluation.
Write a function where one of the arguments can diverge, an example that cannot work in your favorite strict (non-lazy) language (C, Python, Java). For example, sumIfFirstArgIsNonZero(x, y), which returns x+y if x != 0 and 0 otherwise.
For bonus points, define your own function ifThenElse that doesn't use Haskell's built-in if-then-else syntax, and explain why writing new control-flow structures in lazy languages is easy (a sketch of both functions follows below).
That should be easier than trying to wrap your head around infinite data streams or tying-the-knot tricks.
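A minimal sketch of both suggested functions, using the hypothetical names from the list above; in each case laziness means the unused argument or branch is never evaluated:
-- Returns x + y when x /= 0 and 0 otherwise. Because Haskell is lazy,
-- y is never forced when x == 0, so passing a diverging value is fine.
sumIfFirstArgIsNonZero :: Int -> Int -> Int
sumIfFirstArgIsNonZero x y
  | x /= 0    = x + y
  | otherwise = 0

-- A user-defined if-then-else: only the selected branch is ever evaluated,
-- which is why new control structures are easy to write in a lazy language.
ifThenElse :: Bool -> a -> a -> a
ifThenElse True  t _ = t
ifThenElse False _ e = e

-- sumIfFirstArgIsNonZero 0 undefined           => 0
-- ifThenElse True 42 (error "never evaluated") => 42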
The main point of laziness is that values that aren't needed aren't computed -- so to demonstrate this, you'll have to show things not being evaluated. Your example isn't the best to demonstrate laziness since all values are eventually computed.
Here's a small example, as a starting point:
someValueThatNeverTerminates = undefined -- for example, a divide-by-zero error

main = do
  let (greeting, _) = ("Hello terminating world!", someValueThatNeverTerminates)
  putStrLn greeting
This says hello immediately -- if it weren't lazy, the whole thing would break halfway through.

Haskell checking if number is from Fibonacci sequence

I'm a Haskell beginner. I have recently learnt about Fibonacci sequences, so I can create the Fib sequence. Now I'm wondering how to write a function which checks whether a number belongs to the Fib sequence.
I mean function:
belongToFib :: Int -> Bool
I don't really need code. Some hints on how to handle this would be enough. Thanks in advance.
I will give you some hints for a solution involving lazy evaluation:
Define the list of all fibonacci numbers.
Check whether your input number belongs to the sequence.
These are the signatures for the two things you'll need to define:
fib :: [Int]
belongToFib :: Int -> Bool
Of course you will need some tricks to make this work. Even though your list is a (theoretically) infinite sequence of numbers, if you make sure that you only need to work on a finite prefix, then thanks to laziness Haskell will generate only the part that is strictly needed, and your function will not loop forever. So, when checking whether your number is a member of fib, make sure you return False at some point.
Another possible solution is to try to find out whether your number is in the fibonacci sequence without actually generating it up to the input, but rather by relying on arithmetic only. As a hint for this, have a look at this thread.
On Wikipedia you'll find a number of other ways to check membership in the Fibonacci sequence.
edit: by the way, beware of overflows with Int. You may wish to switch to Integer instead.
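A minimal sketch of the lazy-list approach hinted at above, using Integer as suggested in the edit (one possible solution among several):
fib :: [Integer]
fib = 0 : 1 : zipWith (+) fib (tail fib)

-- dropWhile only forces a finite prefix of the infinite list, because fib
-- eventually exceeds any given n; if the first remaining element is not n
-- itself, the function returns False instead of looping forever.
belongToFib :: Integer -> Bool
belongToFib n = n == head (dropWhile (< n) fib)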
Here is a skeleton of a function that tests if a number occurs in an increasing list of numbers:
contains _ [] = False
contains n (x:xs)
  | n == x    = True
  | n < x     = ???
  | otherwise = ???
Think about what should happen in the cases I left open...
Or, if you are both lazy and allowed to use Prelude functions, you may have a look at dropWhile instead.

When is memoization automatic in GHC Haskell?

I can't figure out why m1 is apparently memoized while m2 is not in the following:
m1 = ((filter odd [1..]) !!)
m2 n = ((filter odd [1..]) !! n)
m1 10000000 takes about 1.5 seconds on the first call, and a fraction of that on subsequent calls (presumably it caches the list), whereas m2 10000000 always takes the same amount of time (rebuilding the list with each call). Any idea what's going on? Are there any rules of thumb as to if and when GHC will memoize a function? Thanks.
GHC does not memoize functions.
It does, however, compute any given expression in the code at most once per time that its surrounding lambda-expression is entered, or at most once ever if it is at top level. Determining where the lambda-expressions are can be a little tricky when you use syntactic sugar like in your example, so let's convert these to equivalent desugared syntax:
m1' = (!!) (filter odd [1..]) -- NB: See below!
m2' = \n -> (!!) (filter odd [1..]) n
(Note: The Haskell 98 report actually describes a left operator section like (a %) as equivalent to \b -> (%) a b, but GHC desugars it to (%) a. These are technically different because they can be distinguished by seq. I think I might have submitted a GHC Trac ticket about this.)
Given this, you can see that in m1', the expression filter odd [1..] is not contained in any lambda-expression, so it will only be computed once per run of your program, while in m2', filter odd [1..] will be computed each time the lambda-expression is entered, i.e., on each call of m2'. That explains the difference in timing you are seeing.
Actually, some versions of GHC, with certain optimization options, will share more values than the above description indicates. This can be problematic in some situations. For example, consider the function
f = \x -> let y = [1..30000000] in foldl' (+) 0 (y ++ [x])
GHC might notice that y does not depend on x and rewrite the function to
f = let y = [1..30000000] in \x -> foldl' (+) 0 (y ++ [x])
In this case, the new version is much less efficient because it will have to read about 1 GB from memory where y is stored, while the original version would run in constant space and fit in the processor's cache. In fact, under GHC 6.12.1, the function f is almost twice as fast when compiled without optimizations as when it is compiled with -O2.
m1 is computed only once because it is a Constant Applicative Form, while m2 is not a CAF, and so is computed for each evaluation.
See the GHC wiki on CAFs: http://www.haskell.org/haskellwiki/Constant_applicative_form
There is a crucial difference between the two forms: the monomorphism restriction applies to m1 but not m2, because m2 has explicitly given arguments. So m2's type is general but m1's is specific. The types they are assigned are:
m1 :: Int -> Integer
m2 :: (Integral a) => Int -> a
Most Haskell compilers and interpreters (all of them that I know of actually) do not memoize polymorphic structures, so m2's internal list is recreated every time it's called, where m1's is not.
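To make the behaviour independent of this subtlety, one common pattern (a sketch, not the only option) is to give the shared list its own monomorphic top-level name and index into it from both styles of definition:
-- Sketch: the list is a monomorphic top-level constant (CAF), so it is
-- built once and shared by every caller, regardless of how the lookup
-- function itself is written.
odds :: [Integer]
odds = filter odd [1..]

m1 :: Int -> Integer
m1 = (odds !!)

m2 :: Int -> Integer
m2 n = odds !! n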
I'm not sure, because I'm quite new to Haskell myself, but it appears to be because the second function is parametrized and the first one is not. The nature of a function is that its result depends on the input value, and in the functional paradigm it depends ONLY on the input. The obvious implication is that a function with no parameters always returns the same value over and over, no matter what.
Apparently there's an optimizing mechanism in the GHC compiler that exploits this fact to compute the value of such a function only once for the whole program runtime. It does it lazily, to be sure, but does it nonetheless. I noticed it myself when I wrote the following function:
primes = filter isPrime [2..]
  where isPrime n = null [factor | factor <- [2..n-1], factor `divides` n]
          where f `divides` n = (n `mod` f) == 0
Then to test it, I entered GHCi and wrote: primes !! 1000. It took a few seconds, but finally I got the answer: 7927. Then I called primes !! 1001 and got the answer instantly. Similarly, I got the result for take 1000 primes in an instant, because Haskell had already computed the whole thousand-element list in order to return the 1001st element earlier.
Thus if you can write your function such that it takes no parameters, you probably want it. ;)
