I was writing a Fibonacci sequence generator, and I was trying to understand the following Haskell code:
fibs = 1 : 1 : zipWith (+) fibs (tail fibs)
I understand what zipWith does, but I do not know exactly how the program executes or why it generates all the Fibonacci numbers. I was trying to understand why it does not terminate, using the environment concept from functional languages, as follows:
Initially, because of Haskell's lazy evaluation, the binding in the environment should be fibs : [1,1,x]. To evaluate fibs, the interpreter then evaluates x, which is zipWith (+) fibs (tail fibs) in this case. Evaluating zipWith gives fibs : [1,1,2,x], again because of lazy evaluation, and fibs in the environment is now bound to [1,1,2,x]. However, to fully evaluate fibs, it continues to evaluate x, and we go back to the previous steps.
Is this correct?
Besides, I noticed that when I evaluate fibs in ghci, it immediately starts printing the part of the Fibonacci sequence it has computed so far. Why? Shouldn't it print the result only once it finishes all the computation?
So, most of your reasoning is correct. In particular, you described correctly how each new element of the list is evaluated in terms of older ones. You are also correct that to fully evaluate fibs would require repeating the steps you outlined and would, in fact, loop forever.
The key ingredient you're missing is that we don't have to fully evaluate the list. A binding like fibs = ... just assigns a name to the expression; it does not require evaluating the whole list. Haskell will only evaluate as much of the list as it needs to run main. So, for example, if our main is
main = print $ fibs !! 100
Haskell will only calculate the first 101 elements of fibs (indices 0 through 100, following the steps you outlined); it will not need any more than that and will not loop forever.
Moreover, even if we are evaluating the whole thing (which will loop forever), we can use the parts we've calculated as we go along. This is exactly what's happening when you see the value of fibs in ghci: it prints as much as it can as each element is being calculated and does not have to wait until the whole list is ready.
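As a small illustration of this demand-driven evaluation (a sketch only; the names firstTen and belowHundred are made up for this example), each definition below forces just a finite prefix of the infinite list:

```haskell
-- The infinite list from the question, pinned to Integer.
fibs :: [Integer]
fibs = 1 : 1 : zipWith (+) fibs (tail fibs)

-- Forces only the first ten cells of fibs.
firstTen :: [Integer]
firstTen = take 10 fibs

-- Forces cells until the first element >= 100 is seen, then stops.
belowHundred :: [Integer]
belowHundred = takeWhile (< 100) fibs
```

Neither definition ever tries to reach the end of fibs, so both terminate.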
Seeing Strictness in GHCi
You can see how much of a list is evaluated in ghci using the :sprint command which will print a Haskell data structure with _ for the parts that haven't been evaluated yet (called "thunks"). You can use this to see how fibs gets evaluated in action:
Prelude> let fibs = 1 : 1 : zipWith (+) fibs (tail fibs)
Prelude> :sprint fibs
fibs = _
Prelude> print $ fibs !! 10
89
Prelude> :sprint fibs
fibs = _
Oops, that's not what we expected! In fact, this is a case where the lack of the monomorphism restriction (which GHCi turns off by default) is a problem! fibs gets a polymorphic type
Prelude> :t fibs
fibs :: Num a => [a]
which means it behaves like a function call each time you use it, not like a normal value. (In the background, GHC implements instantiating the Num type class as passing in a dictionary to fibs; it's implemented like a NumDictionary a -> [a] function.)
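To make the dictionary-passing picture concrete, here is a rough sketch of what such a function could look like if written by hand. NumDict, fibsD and integerDict are hypothetical names invented for this illustration; GHC's real dictionaries carry every Num method and are not spelled this way.

```haskell
-- Hypothetical stand-in for the Num dictionary GHC passes around;
-- only the two operations fibs needs are modelled here.
data NumDict a = NumDict
  { plus    :: a -> a -> a
  , fromInt :: Integer -> a
  }

-- A polymorphic fibs, written as the function it behaves like.
-- Within one call the list is shared (via the local name xs), but
-- every call to fibsD builds a fresh list, so nothing is cached
-- between separate uses.
fibsD :: NumDict a -> [a]
fibsD d = xs
  where
    one = fromInt d 1
    xs  = one : one : zipWith (plus d) xs (tail xs)

-- An example dictionary for Integer.
integerDict :: NumDict Integer
integerDict = NumDict { plus = (+), fromInt = id }
```

Under this reading, each use of fibs corresponds to a new call like fibsD integerDict, which is why :sprint keeps showing an unevaluated thunk.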
To really understand what's going on, we'll need to make fibs monomorphic explicitly. We can do this by loading it from a module where the restriction is active or by giving it an explicit type signature. Let's do the latter:
Prelude> let fibs :: [Integer]; fibs = 1 : 1 : zipWith (+) fibs (tail fibs)
Prelude> :sprint fibs
fibs = _
Prelude> print $ fibs !! 10
89
Prelude> :sprint fibs
fibs = 1 : 1 : 2 : 3 : 5 : 8 : 13 : 21 : 34 : 55 : 89 : _
And there you are: you can see which parts of the list needed to be evaluated and which ones didn't to get the element at index 10. You can play around with other lists or other lazy data structures to get a good feel for what's going on in the background.
Also, you can take a look at my blog post about this sort of laziness. It goes into greater detail about the fibs example (with diagrams!) and talks about how to use this approach for general memoization and dynamic programming.
In Haskell, the canonical zipWith implementation of the Fibonacci sequence is:
fibs :: [Integer]
fibs = 1 : 1 : zipWith (+) fibs (tail fibs)
I have difficulty analysing the time complexity of this (fibs !! n).
Trying to write it out on paper, at first I thought it was exponential.
Then O(n^2), but I have no clue how it happens to be linear.
Why I think it is linear: https://wiki.haskell.org/The_Fibonacci_sequence#Canonical_zipWith_implementation
Also, I wrote another version:
fibs :: [Integer]
fibs = inc (0,1) where inc (a,b) = a : inc (b,a+b)
This, I believe, is clearly O(n). But using the :set +s option in ghci, I see that the zipWith implementation clearly beats mine.
Note: I know that adding the nth and (n-1)th Fibonacci numbers itself takes O(n) time. So, while testing, I made the base case (the first two elements) 0 : 0, which keeps every element at 0 and every addition cheap.
The time complexities above are stated under the same assumption.
It would be great if I could get some help with tracing these function calls. I'm interested to know which function was called when, and maybe print something to let me know what is going on.
My unsuccessful attempt at this :
zipWith' = trace ("Called zipWith") (zipWith)
add' a b = trace (show a ++ " + " ++ (show b)) (add a b)
fibs = trace ("Called fibs") (1 : 1 : zipWith (+) fibs (tail fibs))
This does not work: each message is printed exactly once.
The exception is add', which, surprisingly, works fine.
I wish to know how many times and in what order these functions were called.
I believe your version is slow primarily because you're running it without optimization, and so you end up building a bunch of unnecessary tuples. The partially hand-optimized (and more idiomatic) version would be
fibs = inc 0 1
  where
    inc a b = a : inc b (a+b)
Let's look at the classic:
fibs = 1 : 1 : zipWith (+) fibs (tail fibs)
The initial representation in memory looks very much like that. It's a list cons pointing to the number 1 and a second list cons pointing to the number 1 and a thunk representing zipWith (+) fibs (tail fibs). What happens when that thunk is forced? Well, zipWith needs to inspect both of its list arguments. It does so, and, seeing that they're not null, produces a list cons pointing to a thunk representing 1+1 and a thunk representing zipWith (+) fibs' (tail fibs'), where fibs' is a pointer to the second cons in the sequence. There's no need to evaluate fibs again for each of the zipWith arguments or anything like that.
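The behaviour described above follows from how zipWith pattern matches on its arguments. Below is a hand-written copy, named zipWith' here only to avoid clashing with the Prelude; it is a sketch that should behave like the standard definition, not GHC's actual source.

```haskell
-- Pattern matching on both arguments forces each list only as far
-- as its first cons cell. The head f x y and the recursive call on
-- the tails are both left as thunks until something demands them.
zipWith' :: (a -> b -> c) -> [a] -> [b] -> [c]
zipWith' f (x:xs) (y:ys) = f x y : zipWith' f xs ys
zipWith' _ _      _      = []

-- The classic definition, using the hand-written zipWith'.
fibs :: [Integer]
fibs = 1 : 1 : zipWith' (+) fibs (tail fibs)
```

The recursive call receives xs and ys, which are pointers into the same fibs list one cell further along. That is the "sliding" of references described above.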
I'm new to Haskell, and was wondering what the difference between these two functions is.
fibs = 0 : 1 : zipWith (+) fibs (tail fibs)
fib1 = 1 : 1 : [a+b | (a,b) <- zip fib1 (tail fib1)]
I would like to know how they are working more clearly.
How I currently understand them is as follows:
I know that zipWith in the first function will apply the addition function to two lists in this case "fibs" and "tail fibs". However I'm confused on how the recursion is working. I know fibs returns a list and tail fibs would be everything except the head of the list. However I guess I'm confused on the intermediate steps and how zipWith is working with this recursively.
In fib1 I have the same question as above for "zip" but also how exactly is this getting applied to "a" and "b". Also why are "a" and "b" in a tuple?
I'm sorry if I haven't been more clear. I appreciate any help people could offer
Thanks in advance
In the first function, what you have is an infinite recursion that creates the Fibonacci series as a list. tail is used to refer to the next element in the sequence, so that the two can be added together. I think tracing the creation of the third element will help you understand what is happening:
zipWith (+) fibs (tail fibs) ->
zipWith (+) (0 : 1 : zipWith (+) fibs (tail fibs)) (1 : zipWith (+) fibs (tail fibs)) ->
(0 + 1) : zipWith (+) (1 : zipWith (+) fibs (tail fibs)) (zipWith (+) fibs (tail fibs))
You can see that each element in the fibs list is created by adding its two former numbers in the sequence.
In fib1 something similar is happening. You are pairing up adjacent numbers of the sequence in tuples and then declaring that each new element of the list is the sum of the two components of one tuple (you can "solve" it like I did above to better understand what is happening). Note that the tuple itself is not important; it is just a way to pass data around. You can have the same effect with a list or a user-defined type.
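One way to see that the two definitions build the same list is to rewrite the comprehension step by step. The names fib2 and fib3 below are introduced only for this comparison:

```haskell
-- The comprehension from the question.
fib1 :: [Integer]
fib1 = 1 : 1 : [a + b | (a, b) <- zip fib1 (tail fib1)]

-- The same list, with the comprehension written as a map over the pairs.
fib2 :: [Integer]
fib2 = 1 : 1 : map (\(a, b) -> a + b) (zip fib2 (tail fib2))

-- The same list again: zipWith combines the zip and the map in one
-- step, so no intermediate tuples appear in the source.
fib3 :: [Integer]
fib3 = 1 : 1 : zipWith (+) fib3 (tail fib3)
```

In each version, the tuple (a, b) or the two arguments of (+) are simply an element of the list and the element right after it.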
Please tell me if something is not clear.
Cheers
I have been trying to write a function that produces an infinite Fibonacci list and takes the first two values as parameters.
Without specifying the first two values it is possible like this
fib = 1 : 1 : zipWith (+) fib (tail fib)
Suppose I wanted to start the Fibonacci sequence with 5 and 6 instead of 1,1 or 0,1; then I would have to change the above code. But when trying to write a lazy list generator in which I can specify the first two values of the sequence, I am stumped. I came up with this, but it didn't work:
fib a b = a : b : zipWith (+) fib (tail fib)
The problem is obvious: fib is now a function rather than a list, so I can no longer use it the way the hard-coded version does. How can I solve that?
How about
fib a b = fibs where fibs = a : b : zipWith (+) fibs (tail fibs)
? Use the same method, but with your parameters in scope.
I should add that, in case you are tempted by
fib a b = a : b : zipWith (+) (fib a b) (tail (fib a b)) -- worth trying?
the where fibs version ensures that only one infinite stream is generated. The latter risks generating a fresh stream for each recursive invocation of fib. The compiler might be clever enough to spot the common subexpression, but it is not wise to rely on such luck. Try both versions in ghci and see how long it takes to compute the 1000th element.
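For reference, here are both variants side by side with type signatures added. The name fibNaive is made up here for the unshared version, and pinning the type to Integer is an assumption for the sake of a concrete example:

```haskell
-- Shared version: the local name fibs refers to one list, so each
-- element is computed once.
fib :: Integer -> Integer -> [Integer]
fib a b = fibs
  where
    fibs = a : b : zipWith (+) fibs (tail fibs)

-- Unshared version: each recursive mention of fibNaive a b is a new
-- function call that may rebuild the stream from scratch.
fibNaive :: Integer -> Integer -> [Integer]
fibNaive a b = a : b : zipWith (+) (fibNaive a b) (tail (fibNaive a b))
```

In ghci, take 6 (fib 5 6) gives [5,6,11,17,28,45]. Both functions produce the same elements; they differ only in how much work is repeated.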
The simplest way to do that is:
fib a b = a : fib b (a+b)
This stems from the inductive definition of the Fibonacci series: suppose we have a function that can produce a stream of Fibonacci numbers from $F_i$ onwards, given $F_i$ and $F_{i+1}$. What could that function look like? Well, $F_i$ is given, and the rest of the stream can be computed using this same function to produce a stream of Fibonacci numbers from $F_{i+1}$ onwards, if we can provide $F_{i+1}$ and $F_{i+2}$. $F_{i+1}$ is given, so we only need to work out $F_{i+2}$. The definition of the series gives us $F_{i+2} = F_i + F_{i+1}$, so, there.
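The same induction step can also be phrased with iterate from the Prelude. This is an alternative sketch, not the definition given above; step and fibFrom are names chosen for this example:

```haskell
-- Step from the pair (F_i, F_{i+1}) to (F_{i+1}, F_{i+2}).
step :: (Integer, Integer) -> (Integer, Integer)
step (a, b) = (b, a + b)

-- Keep the first component of every pair in the infinite iteration.
fibFrom :: Integer -> Integer -> [Integer]
fibFrom a b = map fst (iterate step (a, b))
```

Here take 6 (fibFrom 5 6) gives [5,6,11,17,28,45], the same prefix the directly recursive fib 5 6 produces.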
I am curious about the runtime performance of an infinite list like
the one below:
fibs = 1 : 1 : zipWith (+) fibs (tail fibs)
This will create an infinite list of the fibonacci sequence.
My question is that if I do the following:
takeWhile (<5) fibs
how many times does fibs evaluate each term in the list? It seems
that since takeWhile checks the predicate function for each item in
the list, the fibs list will evaluate each term multiple times. The
first 2 terms are given for free. When takeWhile wants to evaluate
(<5) on the 3rd element, we will get:
1 : 1 : zipWith (+) [(1, 1), (1)] => 1 : 1 : 2
Now, once takeWhile wants to evaluate (<5) on the 4th element: the
recursive nature of fibs will construct the list again like the
following:
1 : 1 : zipWith (+) [(1, 1), (1, 2)] => 1 : 1 : 2 : 3
It would seem that the 3rd element needs to be computed again when we
want to evaluate the value of the 4th element. Furthermore, if the
bound in the takeWhile predicate is large, it would indicate the function is
doing more work than is needed, since it is evaluating each preceding
element in the list multiple times. Is my analysis here correct, or is
Haskell doing some caching to prevent multiple evaluations here?
This is a self-referential, lazy data structure, where "later" parts of the structure refer to earlier parts by name.
Initially, the structure is just a computation with unevaluated pointers back to itself. As it is unfolded, values are created in the structure. Later references to already-computed parts of the structure are able to find the value already there waiting for them. No need to re-evaluate the pieces, and no extra work to do!
The structure in memory begins as just an unevaluated pointer. Once we look at the first two values, it looks like this:
> take 2 fibs
(a pointer to a cons cell, pointing at '1', and a tail holding the second '1', and a pointer to a function that holds references back to fibs, and the tail of fibs.)
Evaluating one more step expands the structure, and slides the references along:
And so we go unfolding the structure, each time yielding a new unevaluated tail, which is a closure holding references back to 1st and 2nd elements of the last step. This process can continue infinitely :)
And because we're referring to prior values by name, GHC happily retains them in memory for us, so each item is evaluated only once.
Illustration:
module TraceFibs where

import Debug.Trace

fibs :: [Integer]
fibs = 0 : 1 : zipWith tadd fibs (tail fibs)
  where
    -- Like (+), but reports each addition when its result is forced.
    tadd x y = let s = x+y
               in trace ("Adding " ++ show x ++ " and " ++ show y
                         ++ " to obtain " ++ show s)
                        s
Which produces
*TraceFibs> fibs !! 5
Adding 0 and 1 to obtain 1
Adding 1 and 1 to obtain 2
Adding 1 and 2 to obtain 3
Adding 2 and 3 to obtain 5
5
*TraceFibs> fibs !! 5
5
*TraceFibs> fibs !! 6
Adding 3 and 5 to obtain 8
8
*TraceFibs> fibs !! 16
Adding 5 and 8 to obtain 13
Adding 8 and 13 to obtain 21
Adding 13 and 21 to obtain 34
Adding 21 and 34 to obtain 55
Adding 34 and 55 to obtain 89
Adding 55 and 89 to obtain 144
Adding 89 and 144 to obtain 233
Adding 144 and 233 to obtain 377
Adding 233 and 377 to obtain 610
Adding 377 and 610 to obtain 987
987
*TraceFibs>
When something is evaluated in Haskell, it stays evaluated, as long as it's referenced by the same name [1].
In the following code, the list l is only evaluated once (which might be obvious):
let l = [1..10]
print l
print l -- None of the elements of the list are recomputed
Even if something is partially evaluated, that part stays evaluated:
let l = [1..10]
print $ take 5 l -- Evaluates l to [1, 2, 3, 4, 5, _]
print l -- 1 to 5 is already evaluated; only evaluates 6..10
In your example, when an element of the fibs list is evaluated, it stays evaluated. Since the arguments to zipWith reference the actual fibs list, it means that the zipping expression will use the already partially computed fibs list when computing the next elements in the list. This means that no element is evaluated twice.
[1] This is of course not strictly required by the language semantics, but in practice this is always the case.
Think of it this way. The variable fib is a pointer to a lazy value. (You can think of a lazy value underneath as a data structure like (not real syntax) Lazy a = IORef (Unevaluated (IO a) | Evaluated a); i.e. it starts out as unevaluated with a thunk; then when it is evaluated it "changes" to something that remembers the value.) Because the recursive expression uses the variable fib, they have a pointer to the same lazy value (they "share" the data structure). The first time someone evaluates fib, it runs the thunk to get the value and that value is remembered. And because the recursive expression points to the same lazy data structure, when they evaluate it, they will see the evaluated value already. As they traverse the lazy "infinite list", there will only be one "partial list" in memory; zipWith will have two pointers to "lists" which are simply pointers to previous members of the same "list", due to the fact that it started with pointers to the same list.
Note that this is not really "memoizing"; it's just a consequence of referring to the same variable. There is generally no "memoizing" of function results (the following will be inefficient):
fibs () = 0 : 1 : zipWith tadd (fibs ()) (tail (fibs ()))
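The sharing can also be made explicit with fix from Data.Function, which ties the list back to itself without relying on a top-level name. The sketch below uses plain (+) instead of tadd, and the name fibsUnshared is made up for the comparison:

```haskell
import Data.Function (fix)

-- fix f behaves like  let x = f x in x,  so xs below is one shared
-- list that refers to itself, just like the named top-level fibs.
fibs :: [Integer]
fibs = fix (\xs -> 0 : 1 : zipWith (+) xs (tail xs))

-- By contrast, this function rebuilds its list on every call, so the
-- recursive calls share nothing (shown only for comparison; an
-- optimizing compiler may or may not rescue it).
fibsUnshared :: () -> [Integer]
fibsUnshared () =
  0 : 1 : zipWith (+) (fibsUnshared ()) (tail (fibsUnshared ()))
```

Both produce the same elements. The difference is only in whether the already-computed prefix is reused.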
I'm using Project Euler to teach myself Haskell, and I'm having some trouble reasoning about how my code is being executed by Haskell. The second problem has me computing the sum of all even Fibonacci numbers up to 4 million. My script looks like this:
fibs :: [Integer]
fibs = 1 : 2 : [ a+b | (a,b) <- zip fibs (tail fibs)]
evens :: Integer -> Integer -> Integer
evens x sum | (even x) = x + sum
            | otherwise = sum
main = do
  print (foldr evens 0 (take 4000000 fibs))
Hugs gives the error "Garbage collection fails to reclaim sufficient space", which I assume means that the list entries are not released as they are consumed by foldr.
What do I need to do to fix this? I tried writing a tail-recursive (I think) version that used accumulators, but couldn't get that to work either.
Firstly, you shouldn't use Hugs. It is a toy for teaching purposes only.
GHC, however, is a fast, multicore-ready optimizing compiler for Haskell. Get it here. In particular, it does strictness analysis, and compiles to native code.
The main thing that stands out about your code is the use of foldr on a very large list. Probably you want a tail recursive loop. Like so:
import Data.List (foldl')
fibs = 0 : 1 : zipWith (+) fibs (tail fibs)
-- foldl' passes the accumulator first and the list element second
evens acc x | even x    = acc + x
            | otherwise = acc
-- sum of even fibs in first 4M fibs
main = print (foldl' evens 0 (take 4000000 fibs))
Besides all this, the first 4M fibs are very large numbers and will use a fair amount of space, so it'll take a while.
Here's the sum of the first 400k even fibs, to save you some time (21s). :-)
A number of observations / hints:
The x + sum thunks built by evens aren't getting evaluated until the very end
You're taking the first 4,000,000 fibs, not the fibs up to 4,000,000
There is an easier way to do this
Edit in response to comment
I'm not going to tell you what the easier way is, since that's the fun of Project Euler problems. But I will ask you a bunch of questions:
How many even fibs can you have in a row?
How long can you go without an even fib?
If you sum up all the even fibs and all the odd fibs (do this by hand), what do you notice about the sums?
You understood the problem wrong. The actual problem wants you to sum all the even Fibonacci numbers such that the Fibonacci number itself doesn't exceed 4 million (which happens to be only the first 33 Fibonacci numbers).
You are evaluating four million elements of fibs. Those numbers grow exponentially. I don't know how many bytes are required to represent the millionth Fibonacci number; just the one-thousandth Fibonacci number has 209 decimal digits, so that's going to take 22 32-bit words just to hold the digits, never mind whatever overhead gmp imposes. And because the numbers keep growing exponentially, the space each one needs grows with its index.
Exercise: calculate the amount of memory needed to hold four million Fibonacci numbers.
Have a look at the Prelude functions takeWhile, filter, even, and sum:
takeWhile (<40) [0..]
filter even $ takeWhile (<40) [0..]
Put 'em together:
ans = sum $ filter even $ takeWhile (< 4 * 10^6) fibs
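Put into a self-contained form, that pipeline might look like the sketch below. The function name sumEvenBelow and its limit parameter are introduced here for illustration, and the list starts at 1, 2 as in the question:

```haskell
-- Fibonacci numbers starting 1, 2, as in the question.
fibs :: [Integer]
fibs = 1 : 2 : zipWith (+) fibs (tail fibs)

-- Sum of the even terms that stay below the given limit.
-- takeWhile stops at the first term that reaches the limit, so only
-- a short prefix of the infinite list is ever evaluated.
sumEvenBelow :: Integer -> Integer
sumEvenBelow limit = sum (filter even (takeWhile (< limit) fibs))
```

As a small check, sumEvenBelow 100 adds up 2, 8 and 34 and gives 44. Calling sumEvenBelow (4 * 10^6) touches only a few dozen list cells, in contrast to take 4000000 fibs.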