Haskell script running out of space - haskell

I'm using project Euler to teach myself Haskell, and I'm having some trouble reasoning about how my code is being executed by haskell. The second problem has me computing the sum of all even Fibonacci numbers up to 4 million. My script looks like this:
fibs :: [Integer]
fibs = 1 : 2 : [ a+b | (a,b) <- zip fibs (tail fibs)]
evens :: Integer -> Integer -> Integer
evens x sum | (even x) = x + sum
| otherwise = sum
main = do
print (foldr evens 0 (take 4000000 fibs))
Hugs gives the error "Garbage collection fails to reclaim sufficient space", which I assume means that the list entries are not released as they are consumed by foldr.
What do I need to do to fix this? I tried writing a tail-recursive (I think) version that used accumulators, but couldn't get that to work either.

Firstly, you shouldn't use hugs. It is a toy for teaching purposes only.
GHC, however, is a fast, multicore-ready optimizing compiler for Haskell. Get it here. In particular, it does strictness analysis, and compiles to native code.
The main thing that stands out about your code is the use of foldr on a very large list. Probably you want a tail recursive loop. Like so:
import Data.List
fibs = 0 : 1 : zipWith (+) fibs (tail fibs)
evens x sum | even x = x + sum
| otherwise = sum
-- sum of even fibs in first 4M fibs
main = print (foldl' evens 0 (take 4000000 fibs))
Besides all this, the first 4M even fibs will use a fair amount of space, so it'll take a while.
Here's the sum of the first 400k even fibs, to save you some time (21s). :-)

A number of observations / hints:
the x + sums from even aren't getting evaluated until the very end
You're taking the first 4,000,000 fibs, not the fibs up to 4,000,000
There is an easier way to do this
Edit in response to comment
I'm not going to tell you what the easier way is, since that's the fun of Project Euler problems. But I will ask you a bunch of questions:
How many even fibs can you have in a row?
How long can you go without an even fib?
If you sum up all the even fibs and all the odd fibs (do this by hand), what do you notice about the sums?

You understood the problem wrong. The actual problem wants you to sum all the even Fibonacci numbers such that the Fibonacci number itself doesn't exceed 4 million (which happens to be only the first 33 Fibonacci numbers).

You are evaluating four million elements of fibs. Those numbers grow exponentially. I don't know how many bytes are required to represent the millionth Fibonacci number; just the one-thousandth Fibonacci number has 211 decimal digits, so that's going to take 22 32-bit words just to hold the digits, never mind whatever overhead gmp imposes. And these grow exponentially.
Exercise: calculuate the amount of memory needed to hold four million Fibonacci numbers.

have a look at the Prelude functions takeWhile, filter, even, and sum
takeWhile (<40) [0..]
filter even $ takeWhile (<40) [0..]
put 'em together:
ans = sum $ filter even $ takeWhile (< 4* 10^6) fibs

Related

Time complexity of zipWith fibonacci in Haskell

In Haskell, the canonical zipWith implementation of the fibonacci function is :
fibs :: [Integer]
fibs = 1 : 1 : zipWith (+) fibs (tail fibs)
I have difficulty analysing the time complexity of this (fibs !! n).
Trying to write it on paper, at first i thought it was exponential.
Then O(n^2) , but I have no clue how it happens to be linear.
Why i think it is linear : https://wiki.haskell.org/The_Fibonacci_sequence#Canonical_zipWith_implementation
Also, i wrote another code :
fibs :: [Integer]
fibs = inc (0,1) where inc (a,b) = a : inc (b,a+b)
This, I believe is clearly O(n). But using the :set +s option in ghci, I see that the zipWith implementation clearly beats mine.
Note: I know that it takes O(n) time for addition of nth and (n-1)th fibonacci number. Thus while testing, i made the base case, i.e. the first two elements 0 : 0 .
Time complexities are mentioned using the same assumption.
It would be great if i could get some help with tracing these function calls. I'm interested to know which function was called when and maybe print something to let me know what is going on.
My unsuccessful attempt at this :
zipWith' = trace ("Called zipWith") (zipWith)
add' a b = trace (show a ++ " + " ++ (show b)) (add a b)
fibs = trace ("Called fibs") (1 : 1 : zipWith (+) fibs (tail fibs))
This does not work. The statements are printed exactly one.
Except for add' which works fine, surprisingly.
I wish to know how many times and in what order these functions were called.
I believe your version is slow primarily because you're running it without optimization, and so you end up building a bunch of unnecessary tuples. The partially hand-optimized (and more idiomatic) version would be
fibs = inc 0 1
where
inc a b = a : inc b (a+b)
Let's look at the classic:
fibs = 1 : 1 : zipWith (+) fibs (tail fibs)
The initial representation in memory looks very much like that. It's a list cons pointing to the number 1 and a second list cons pointing to the number 1 and a thunk representing zipWith (+) fibs (tail fibs). What happens when that thunk is forced? Well zipWith needs to inspect both of it's list arguments. It does so, and, seeing that they're not null, produces a list cons pointing to a thunk representing 1+1 and a thunk representing zipWith (+) fibs' (tail fibs'), where fibs' is a pointer to the second cons in the sequence. There's no need to evaluate fibs again for each of the zipWith arguments or anything like that.

Digit Counter Fibonacci List Haskell trouble

So, for problem 25 in Project Euler, I have to find the position of the first number in the Fibonacci sequence with a thousand digits.
-- Lazily generates an infinite Fibonacci series
fib :: [Int]
fib = 1 : 1 : zipWith (+) fib (tail fib)
-- Checks if given number has a thousand digits.
checkDigits :: Int -> Bool
checkDigits number = length (show number) == 1000
-- Checks if position in Fibonacci series has a thousand digits.
checkFibDigits :: Int -> Bool
checkFibDigits pos = checkDigits (fib !! (pos - 1))
p25 = head (filter (\x -> checkFibDigits x == True) ([1..]))
For some reason, this approach seems to hang indefinitely. If I replace 1000 with 10, it spits out 45, which is the position of the first number with 10 digits.
Either my approach is crazy inefficient, or Haskell's doing something weird with big numbers. A similar approach in Python worked pretty flawlessly.
Thank you for your help!
Your immediate problem is using Int rather than Integer in the type of fib, which limits the values to never go above around 231, but beyond that, yes, the way you’re doing it is pretty inefficient. Namely, it’s O(n2) when it really ought to be O(n). The way you’re generating the Fibonacci sequence is fine, but when trying to find the first value that’s a thousand digits, you go:
Is the first element of the Fibonacci sequence greater than 1000 digits? No, move on…
Is the second element [which, oh wait, I need to get from this linked list, so I better follow the ‘next’ pointer some number of times] greater than 1000 digits? No, move on…
…
Is the 50th element [better start at the beginning of the linked list, follow the next pointer, follow the next pointer, follow the next pointer, …, and fetch the value at this element] greater than 1000 digits? No, move on…
…
Basically, you’re re-traversing the linked list each and every single time. A different approach might be to zip together the index and corresponding Fibonacci result:
ghci> take 10 $ zip [1..] fib
[(1,1),(2,1),(3,2),(4,3),(5,5),(6,8),(7,13),(8,21),(9,34),(10,55)]
Then you drop elements until the Fibonacci value is at least 1000 digits, and take the index of the first one left:
ghci> fst $ head $ dropWhile ((< 1000) . length . show . snd) $ zip [1..] fib
4782
Just change Int to Integer for fib and checkDigits, you will notice that the answer will appear instantaneously:
fib :: [Integer]
fib = 1 : 1 : zipWith (+) fib (tail fib)
checkDigits :: Integer -> Bool
checkDigits number = length (show number) == 1000
That's because Int has limited size whereas Integer has an arbitrary precision which is limited by your system memory.

How does (co)recursive definition work in Haskell?

I'm playing around with the language to start learning and I am puzzled beyond my wits about how a recursive definition works.
For example, let's take the sequence of the Triangular numbers (TN n = sum [1..n])
The solution provided was:
triangularNumbers = scanl1 (+) [1..]
So far, so good.
But the solution I did come up with was:
triangularNumbers = zipWith (+) [1..] $ 0 : triangularNumbers
Which is also correct.
Now my question is: how does this translate to a lower level implementation? What happens exactly behind the scene when such a recursive definition is met?
Here is a simple recursive function that gives you the nth Triangular number:
triag 0 = 0
triag n = n + triag (n-1)
Your solution triag' = zipWith (+) [1..] $ 0 : triag' is something more fancy: It's corecursive (click, click). Instead of calculating the nth number by reducing it to the value of smaller inputs, you define the whole infinite sequence of triangular numbers by recursively specifying the next value, given an initial segment.
How does Haskell handle such corecursion? When it encounters your definition, no calculation is actually performed, it is deferred until results are needed for further computations. When you access a particular element of your list triag', Haskell starts computing the elements of the list based on the definition, up to the element that gets accessed. For more detail, I found this article on lazy evaluation helpful. In summary, lazy evaluation is great to have, unless you need to predict memory usage.
Here is a similar SO question, with a step-by step explanation of the evaluation of fibs = 0 : 1 : zipWith (+) fibs (tail fibs), a corecursive definition of the Fibonacci sequence.

Fibonacci numbers with initial two values as parameters

I have been trying to make a infinite fibonacci list producing function that can take first 2 values as parameters.
Without specifying the first two values it is possible like this
fib = 1 : 1 : zipWith (+) fib (tail fib)
Suppose I wanted to start the fibonacci sequence with 5 and 6 instead of 1,1 or 0,1 then I will have to change the above code. But when trying to make a lazy list generator in which I can specify the first 2 values of fibonacci sequence I am stumped. I came up with this but that didn't work.
fib a b = a : b : zipWith (+) fib (tail fib)
The problem is obvious. I am trying to convert the use of list in the hard-coded one. How can I solve that?
How about
fib a b = fibs where fibs = a : b : zipWith (+) fibs (tail fibs)
? Use the same method, but with your parameters in scope.
I should add that, in case you are tempted by
fib a b = a : b : zipWith (+) (fib a b) (tail (fib a b)) -- worth trying?
the where fibs version ensures that only one infinite stream is generated. The latter risks generating a fresh stream for each recursive invocation of fib. The compiler might be clever enough to spot the common subexpression, but it is not wise to rely on such luck. Try both versions in ghci and see how long it takes to compute the 1000th element.
The simplest way to do that is:
fib a b = a: fib b (a+b)
This stems from the inductive definition of the Fibonacci series: suppose we have a function that can produce a stream of Fibonacci numbers from Fi onwards, given Fi and Fi+1. What could that function look like? Well, Fi is given, and the rest of the stream can be computed using this function to produce a stream of Fibonacci numbers from Fi+1 onwards, if we can provide Fi+1 and Fi+2. Fi+1 is given, so we only need to work out Fi+2. The definition of series gives us Fi+2=Fi+Fi+1, so, there.

Haskell inefficient fibonacci implementation

I am new to haskell and just learning the fun of functional programming. but have run into trouble right away with an implementation of the fibonacci function. Please find the code below.
--fibonacci :: Num -> [Num]
fibonacci 1 = [1]
fibonacci 2 = [1,1]
--fibonacci 3 = [2]
--fibonacci n = fibonacci n-1
fibonacci n = fibonacci (n-1) ++ [last(fibonacci (n-1)) + last(fibonacci (n-2))]
Rather awkward, I know. I can't find time to look up and write a better one. Though I wonder what makes this so inefficient. I know I should look it up, just hoping someone would feel the need to be pedagogic and spare me the effort.
orangegoat's answer and Sec Oe's answer contain a link to probably the best place to learn how to properly write the fibonacci sequence in Haskell, but here's some reasons why your code is inefficient (note, your code is not that different from the classic naive definition. Elegant? Sure. Efficient? Goodness, no):
Let's consider what happens when you call
fibonacci 5
That expands into
(fibonacci 4) ++ [(last (fibonacci 4)) + (last (fibonacci 3))]
In addition to concatenating two lists together with ++, we can already see that one place we're being inefficient is that we calculate fibonacci 4 twice (the two places we called fibonacci (n-1). But it gets worst.
Everywhere it says fibonacci 4, that expands into
(fibonacci 3) ++ [(last (fibonacci 3)) + (last (fibonacci 2))]
And everywhere it says fibonacci 3, that expands into
(fibonacci 2) ++ [(last (fibonacci 2)) + (last (fibonacci 1))]
Clearly, this naive definition has a lot of repeated computations, and it only gets worse when n gets bigger and bigger (say, 1000). fibonacci is not a list, it just returns lists, so it isn't going to magically memoize the results of the previous computations.
Additionally, by using last, you have to navigate through the list to get its last element, which adds on top of the problems with this recursive definition (remember, lists in Haskell don't support constant time random access--they aren't dynamic arrays, they are linked lists).
One example of a recursive definition (from the links mentioned) that does keep down on the computations is this:
fibs = 0 : 1 : zipWith (+) fibs (tail fibs)
Here, fibs is actually a list, and we can take advantage of Haskell's lazy evaluation to generate fibs and tail fibs as needed, while the previous computations are still stored inside of fibs. And to get the first five numbers, it's as simple as:
take 5 fibs -- [0,1,1,2,3]
(Optionally, you can replace the first 0 with a 1 if you want the sequence to start at 1).
All the ways to implement the fibonacci sequence in Haskell just follow the link
http://www.haskell.org/haskellwiki/The_Fibonacci_sequence
This implementation is inefficient because it makes three recursive calls. If we were to write a recurrence relation for computing fibonacci n to a normal form (note, pedantic readers: not whnf), it would look like:
T(1) = c
T(2) = c'
T(n) = T(n-1) + T(n-1) + T(n-2) + c''
(Here c, c', and c'' are some constants that we don't know.) Here's a recurrence which is smaller:
S(1) = min(c, c')
S(n) = 2 * S(n-1)
...but this recurrence has a nice easy closed form, namely S(n) = min(c, c') * 2^(n-1): it's exponential! Bad news.
I like the general idea of your implementation (that is, track the second-to-last and last terms of the sequence together), but you fell down by recursively calling fibonacci multiple times, when that's totally unnecessary. Here's a version that fixes that mistake:
fibonacci 1 = [1]
fibonacci 2 = [1,1]
fibonacci n = case fibonacci (n-1) of
all#(last:secondLast:_) -> (last + secondLast) : all
This version should be significantly faster. As an optimization, it produces the list in reverse order, but the most important optimization here was making only one recursive call, not building the list efficiently.
So even if you wouldn't know about the more efficient ways, how could you improve your solution?
First, looking at the signature it seems you don't want an infinite list, but a list of a given length. That's fine, the infinite stuff might be too crazy for you right now.
The second observation is that you need to access the end of the list quite often in your version, which is bad. So here is a trick which is often useful when working with lists: Write a version that work backwards:
fibRev 0 = []
fibRev 1 = [1]
fibRev 2 = [1,1]
fibRev n = let zs#(x:y:_) = fibRev (n-1) in (x+y) : zs
Here is how the last case works: We get the list which is one element shorter and call it zs. At the same time we match against the pattern (x:y:_) (this use of # is called an as-pattern). This gives us the first two elements of that list. To calculate the next value of the sequence, we have just to add these elements. We just put the sum (x+y) in front of the list zs we already got.
Now we have the fibonacci list, but it is backwards. No problem, just use reverse:
fibonacci :: Int -> [Int]
fibonacci n = reverse (fibRev n)
The reverse function isn't that expensive, and we call it here only one time.

Resources