Digit Counter Fibonacci List Haskell trouble - haskell

So, for problem 25 in Project Euler, I have to find the position of the first number in the Fibonacci sequence with a thousand digits.
-- Lazily generates an infinite Fibonacci series
fib :: [Int]
fib = 1 : 1 : zipWith (+) fib (tail fib)
-- Checks if given number has a thousand digits.
checkDigits :: Int -> Bool
checkDigits number = length (show number) == 1000
-- Checks if position in Fibonacci series has a thousand digits.
checkFibDigits :: Int -> Bool
checkFibDigits pos = checkDigits (fib !! (pos - 1))
p25 = head (filter (\x -> checkFibDigits x == True) ([1..]))
For some reason, this approach seems to hang indefinitely. If I replace 1000 with 10, it spits out 45, which is the position of the first number with 10 digits.
Either my approach is crazy inefficient, or Haskell's doing something weird with big numbers. A similar approach in Python worked pretty flawlessly.
Thank you for your help!

Your immediate problem is using Int rather than Integer in the type of fib, which caps the values at 2^31 - 1 on a 32-bit platform (2^63 - 1 on a 64-bit one); past that they silently overflow. Beyond that, yes, the way you’re doing it is pretty inefficient: it’s O(n^2) when it really ought to be O(n). The way you’re generating the Fibonacci sequence is fine, but when trying to find the first value that’s a thousand digits, you go:
Is the first element of the Fibonacci sequence greater than 1000 digits? No, move on…
Is the second element [which, oh wait, I need to get from this linked list, so I better follow the ‘next’ pointer some number of times] greater than 1000 digits? No, move on…
…
Is the 50th element [better start at the beginning of the linked list, follow the next pointer, follow the next pointer, follow the next pointer, …, and fetch the value at this element] greater than 1000 digits? No, move on…
…
Basically, you’re re-traversing the linked list each and every single time. A different approach might be to zip together the index and corresponding Fibonacci result:
ghci> take 10 $ zip [1..] fib
[(1,1),(2,1),(3,2),(4,3),(5,5),(6,8),(7,13),(8,21),(9,34),(10,55)]
Then you drop elements until the Fibonacci value is at least 1000 digits, and take the index of the first one left:
ghci> fst $ head $ dropWhile ((< 1000) . length . show . snd) $ zip [1..] fib
4782
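If you want the position without pairing indices by hand, Data.List.findIndex makes the same single pass over the list. A minimal sketch, assuming fib is switched to Integer and positions are 1-based as in the question; firstWithDigits is a made-up name:

```haskell
import Data.List (findIndex)

-- Infinite Fibonacci list as in the question, but with Integer so nothing overflows.
fib :: [Integer]
fib = 1 : 1 : zipWith (+) fib (tail fib)

-- Number of decimal digits of a non-negative Integer.
numDigits :: Integer -> Int
numDigits = length . show

-- 1-based position of the first Fibonacci number with at least n digits.
-- findIndex walks the list once, so there is no re-traversal per candidate.
firstWithDigits :: Int -> Maybe Int
firstWithDigits n = fmap (+ 1) (findIndex ((>= n) . numDigits) fib)
```

For example, firstWithDigits 10 should agree with the 45 from the question.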

Just change Int to Integer for fib and checkDigits, and you will notice that the answer appears almost instantly:
fib :: [Integer]
fib = 1 : 1 : zipWith (+) fib (tail fib)
checkDigits :: Integer -> Bool
checkDigits number = length (show number) == 1000
That's because Int has a fixed size, whereas Integer has arbitrary precision, limited only by your system's memory.
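To see the overflow concretely, here is a small sketch (fibInt, fibInteger and firstOverflow are made-up names) that builds the same list at both types and finds where the Int version first wraps around to a negative number. The exact index depends on the width of Int on your platform, so no particular value is assumed:

```haskell
import Data.List (findIndex)

-- The same definition at two types: Int is fixed-width, Integer is arbitrary precision.
fibInt :: [Int]
fibInt = 1 : 1 : zipWith (+) fibInt (tail fibInt)

fibInteger :: [Integer]
fibInteger = 1 : 1 : zipWith (+) fibInteger (tail fibInteger)

-- GHC's Int arithmetic wraps around on overflow rather than raising an error,
-- so the first overflowing sum shows up as a negative number.
firstOverflow :: Maybe Int
firstOverflow = findIndex (< 0) fibInt
```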

Related

Generating Cartesian products in Haskell

I am trying to generate all possible combinations of n numbers. For example if n = 3 I would want the following combinations:
(0,0,0), (0,0,1), (0,0,2)... (0,0,9), (0,1,0)... (9,9,9).
This post describes how to do so for n = 3:
[(a,b,c) | m <- [0..9], a <- [0..m], b <- [0..m], c <- [0..m] ]
Or to avoid duplicates (i.e. multiple copies of the same n-uple):
let l = 9; in [(a,b,c) | m <- [0..3*l],
    a <- [0..l], b <- [0..l], c <- [0..l],
    a + b + c == m ]
However following the same pattern would become very silly very quickly for n > 3. Say I wanted to find all of the combinations: (a, b, c, d, e, f, g, h, i, j), etc.
Can anyone point me in the right direction here? Ideally I'd rather not use a built-in function, as I am trying to learn Haskell and would rather take the time to understand a piece of code than just use a package written by someone else. A tuple is not required; a list would also work.
My other answer gave an arithmetic algorithm to enumerate all the combinations of digits. Here's an alternative solution which arises by generalising your example. It works for non-numbers, too, because it only uses the structure of lists.
First off, let's remind ourselves of how you might use a list comprehension for three-digit combinations.
threeDigitCombinations = [[x, y, z] | x <- [0..9], y <- [0..9], z <- [0..9]]
What's going on here? The list comprehension corresponds to nested loops. z counts from 0 to 9, then y goes up to 1 and z starts counting from 0 again. x ticks the slowest. As you note, the shape of the list comprehension changes (albeit in a uniform way) when you want a different number of digits. We're going to exploit that uniformity.
twoDigitCombinations = [[x, y] | x <- [0..9], y <- [0..9]]
We want to abstract over the number of variables in the list comprehension (equivalently, the nested-ness of the loop). Let's start playing around with it. First, I'm going to rewrite these list comprehensions as their equivalent monad comprehensions.
threeDigitCombinations = do
    x <- [0..9]
    y <- [0..9]
    z <- [0..9]
    return [x, y, z]
twoDigitCombinations = do
    x <- [0..9]
    y <- [0..9]
    return [x, y]
Interesting. It looks like threeDigitCombinations is roughly the same monadic action as twoDigitCombinations, but with an extra statement. Rewriting again...
zeroDigitCombinations = [[]] -- equivalently, `return []`
oneDigitCombinations = do
    z <- [0..9]
    empty <- zeroDigitCombinations
    return (z : empty)
twoDigitCombinations = do
    y <- [0..9]
    z <- oneDigitCombinations
    return (y : z)
threeDigitCombinations = do
    x <- [0..9]
    yz <- twoDigitCombinations
    return (x : yz)
It should be clear now what we need to parameterise:
combinationsOfDigits 0 = return []
combinationsOfDigits n = do
    x <- [0..9]
    xs <- combinationsOfDigits (n - 1)
    return (x : xs)
ghci> combinationsOfDigits 2
[[0,0],[0,1],[0,2],[0,3],[0,4],[0,5],[0,6],[0,7],[0,8],[0,9],[1,0],[1,1] ... [9,8],[9,9]]
It works, but we're not done yet. I want to show you that this is an instance of a more general monadic pattern. First I'm going to change the implementation of combinationsOfDigits so that it folds up a list of constants.
combinationsOfDigits n = foldUpList $ replicate n [0..9]
  where foldUpList [] = return []
        foldUpList (xs : xss) = do
            x <- xs
            ys <- foldUpList xss
            return (x : ys)
Looking at the definition of foldUpList :: [[a]] -> [[a]], we can see that it doesn't actually require the use of lists per se: it only uses the monad-y parts of lists. It could work on any monad, and indeed it does! It's in the standard library, and it's called sequence :: Monad m => [m a] -> m [a] (current versions of base generalise the outer list to any Traversable, but the list version is all we need here). If you're confused by that, replace m with [] and you should see that those types mean the same thing.
combinationsOfDigits n = sequence $ replicate n [0..9]
Finally, noting that sequence . replicate n is equivalent to replicateM n (from Control.Monad), we get it down to a very snappy one-liner.
combinationsOfDigits n = replicateM n [0..9]
To summarise, replicateM n gives the n-ary combinations of an input list. This works for any list, not just a list of numbers. Indeed, it works for any monad - though the "combinations" interpretation only makes sense when your monad represents choice.
This code is very terse indeed! So much so that I think it's not entirely obvious how it works, unlike the arithmetic version I showed you in my other answer. The list monad has always been one of the monads I find less intuitive, at least when you're using higher-order monad combinators and not do-notation.
On the other hand, it runs quite a lot faster than the number-crunching version. On my (high-spec) MacBook Pro, compiled with -O2, this version calculates the 5-digit combinations about 4 times faster than the version which crunches numbers. (If anyone can explain the reason for this I'm listening!)
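A minimal self-contained sketch of the final version (the import is needed because replicateM lives in Control.Monad, not the Prelude); twoLetterWords is a made-up example showing that it works on non-numbers too:

```haskell
import Control.Monad (replicateM)

-- All length-n lists whose elements are drawn from [0..9], in counting order.
combinationsOfDigits :: Int -> [[Int]]
combinationsOfDigits n = replicateM n [0 .. 9]

-- The same combinator on a list of characters: every two-letter word over "ab".
twoLetterWords :: [String]
twoLetterWords = replicateM 2 "ab"
```

Note that replicateM 0 gives [[]]: there is exactly one way to pick zero digits.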
What are all the combinations of three digits? Let's write a few out manually.
000, 001, 002 ... 009, 010, 011 ... 099, 100, 101 ... 998, 999
We ended up simply counting! We enumerated all the numbers between 0 and 999. For an arbitrary number of digits this generalises straightforwardly: the upper limit is 10^n (exclusive), where n is the number of digits.
Numbers are designed this way on purpose. It would be jolly strange if there was a possible combination of three digits which wasn't a valid number, or if there was a number below 1000 which couldn't be expressed by combining three digits!
This suggests a simple plan to me, which just involves arithmetic and doesn't require a deep understanding of Haskell*:
1. Generate a list of numbers between 0 and 10^n.
2. Turn each number into a list of digits.
Step 2 is the fun part. To extract the digits (in base 10) of a three-digit number, you do this:
1. Take the quotient and remainder of your number with respect to 100. The quotient is the first digit of the number.
2. Take the remainder from step 1 and take its quotient and remainder with respect to 10. The quotient is the second digit.
3. The remainder from step 2 is the third digit. This is the same as taking the quotient with respect to 1.
For an n-digit number, we take the quotient n times, starting with 10^(n-1) and ending with 1. Each time, we use the remainder from the last step as the input to the next step. This suggests that our function to turn a number into a list of digits should be implemented as a fold: we'll thread the remainder through the operation and build a list as we go. (I'll leave it to you to figure out how this algorithm changes if you're not in base 10!)
Now let's implement that idea. We want to calculate a specified number of digits of a given number, zero-padding when necessary. What should the type of digits be?
digits :: Int -> Int -> [Int]
Hmm, it takes in a number of digits and an integer, and produces a list of integers representing the digits of the input integer. The list will contain single-digit integers, each one of which will be one digit of the input number.
digits numberOfDigits theNumber = reverse $ fst $ foldr step ([], theNumber) powersOfTen
  where step exponent (digits, remainder) =
          let (digit, newRemainder) = remainder `divMod` exponent
          in (digit : digits, newRemainder)
        powersOfTen = [10^n | n <- [0..(numberOfDigits-1)]]
What's striking to me is that this code looks quite similar to my English description of the arithmetic we wanted to perform. We generate a powers-of-ten table by exponentiating numbers from 0 upwards. Then we fold that table back up; at each step we put the quotient on the list of digits and send the remainder to the next step. We have to reverse the output list at the end because of the right-to-left way it got built.
By the way, the pattern of generating a list, transforming it, and then folding it back up is an idiomatic thing to do in Haskell. It's even got its own high-falutin' mathsy name, hylomorphism. GHC knows about this pattern too and can compile it into a tight loop, optimising away the very existence of the list you're working with.
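The same remainder-threading can also be written with Data.List.mapAccumL, which walks the powers of ten from largest to smallest and emits each digit in order, so the final reverse disappears. This is a swapped-in alternative rather than the answer's code, and digitsAcc is a made-up name:

```haskell
import Data.List (mapAccumL)

-- Thread the remainder through the powers of ten, largest first. mapAccumL's
-- accumulator is the remainder and each emitted value is one digit, so the
-- digits come out most significant first and no reverse is needed.
digitsAcc :: Int -> Int -> [Int]
digitsAcc numberOfDigits theNumber = snd (mapAccumL step theNumber powersOfTen)
  where
    -- 10^(n-1), ..., 10, 1 (empty when numberOfDigits is 0).
    powersOfTen = [10 ^ k | k <- [numberOfDigits - 1, numberOfDigits - 2 .. 0]]
    step remainder power =
      let (digit, newRemainder) = remainder `divMod` power
      in (newRemainder, digit)
```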
Let's test it!
ghci> digits 3 123
[1,2,3]
ghci> digits 5 10101
[1,0,1,0,1]
ghci> digits 6 99
[0,0,0,0,9,9]
It works like a charm! (Well, it misbehaves when numberOfDigits is too small for theNumber, but never mind about that.) Now we just have to generate a counting list of numbers on which to use digits.
combinationsOfDigits :: Int -> [[Int]]
combinationsOfDigits numberOfDigits = map (digits numberOfDigits) [0..(10^numberOfDigits)-1]
... and we've finished!
ghci> combinationsOfDigits 2
[[0,0],[0,1],[0,2],[0,3],[0,4],[0,5],[0,6],[0,7],[0,8],[0,9],[1,0],[1,1] ... [9,7],[9,8],[9,9]]
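As a quick sanity check (not part of the original answer), the arithmetic version can be compared against the replicateM one-liner from the other answer. The digits function is reproduced with its where clause laid out so the block compiles on its own; agreesWithReplicateM is a hypothetical helper name:

```haskell
import Control.Monad (replicateM)

-- The arithmetic version from this answer.
digits :: Int -> Int -> [Int]
digits numberOfDigits theNumber = reverse $ fst $ foldr step ([], theNumber) powersOfTen
  where
    step power (ds, remainder) =
      let (digit, newRemainder) = remainder `divMod` power
      in (digit : ds, newRemainder)
    powersOfTen = [10 ^ n | n <- [0 .. numberOfDigits - 1]]

combinationsOfDigits :: Int -> [[Int]]
combinationsOfDigits numberOfDigits =
  map (digits numberOfDigits) [0 .. 10 ^ numberOfDigits - 1]

-- Both approaches should produce the same lists in the same (counting) order.
agreesWithReplicateM :: Int -> Bool
agreesWithReplicateM n = combinationsOfDigits n == replicateM n [0 .. 9]
```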
* For a version which does require a deep understanding of Haskell, see my other answer.
combos 1 list = map (\x -> [x]) list
combos n list = foldl (++) [] $ map (\x -> map (\y -> x:y) nxt) list
    where nxt = combos (n-1) list
In your case
combos 3 [0..9]
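Two small caveats with combos as written: combos 0 never terminates (it recurses to combos (-1) and beyond), and foldl (++) [] left-nests the appends, re-copying the growing prefix each time. A sketch of a variant that addresses both with a base case for 0 and concatMap (my change, not the original answer's code):

```haskell
-- All length-n lists drawn from the given list, in "odometer" order.
-- Assumes n >= 0; the base case makes combos 0 return the single empty list.
combos :: Int -> [a] -> [[a]]
combos 0 _ = [[]]
combos n list = concatMap (\x -> map (x :) rest) list
  where
    -- Shared by every x, so the shorter combinations are computed once.
    rest = combos (n - 1) list
```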

Non-pointfree style is substantially slower

I have the following, oft-quoted code for calculating the nth Fibonacci number in Haskell:
fibonacci :: Int -> Integer
fibonacci = (map fib [0..] !!)
    where fib 0 = 0
          fib 1 = 1
          fib n = fibonacci (n-2) + fibonacci (n-1)
Using this, I can do calls such as:
ghci> fibonacci 1000
and receive an almost instantaneous answer.
However, if I modify the above code so that it's not in pointfree style, i.e.
fibonacci :: Int -> Integer
fibonacci x = (map fib [0..] !!) x
    where fib 0 = 0
          fib 1 = 1
          fib n = fibonacci (n-2) + fibonacci (n-1)
it is substantially slower. To the extent that a call such as
ghci> fibonacci 1000
hangs.
My understanding was that the above two pieces of code were equivalent, but GHCi begs to differ. Does anyone have an explanation for this behaviour?
To observe the difference, you should probably look at Core. My guess is that this boils down to comparing (roughly)
let f = map fib [0..] in \x -> f !! x
to
\x -> let f = map fib [0..] in f !! x
The latter will recompute f from scratch on every invocation. The former does not, effectively caching the same f for each invocation.
It happens that in this specific case, GHC was able to optimize the second into the first, once optimization is enabled.
Note however that GHC does not always perform this transformation, since this is not always an optimization. The cache used by the first is kept in memory forever. This might lead to a waste of memory, depending on the function at hand.
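If you want the caching regardless of how fibonacci is written or which optimization level is used, one option (a sketch, with memo as a made-up name) is to lift the list to a top-level constant so that it is obviously shared. The memory trade-off described above still applies:

```haskell
-- The memo list is a top-level constant, so it is built at most once and
-- shared by every call to fibonacci, pointfree or not. It is never freed.
memo :: [Integer]
memo = map fib [0 ..]
  where
    fib 0 = 0
    fib 1 = 1
    fib n = fibonacci (n - 2) + fibonacci (n - 1)

-- Written in the "slow" non-pointfree style, but fast because memo is shared.
fibonacci :: Int -> Integer
fibonacci x = memo !! x
```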
I tried to find it but struck out. I think I have it on my PC at home.
What I read was that functions using fixed point were inherently faster.
There are other reasons for using fixed point. I encountered one in writing this iterative Fibonacci function. I wanted to see how an iterative version would perform, then realized I had no ready way to measure. I am a Haskell neophyte, but here is an iterative version for someone to test.
I could not get this to compile unless I used the composition dot after the first last function.
I could not reduce it further. The [0,1] seed is fixed and is not supplied as an argument.
Prelude> fib = last . flip take (iterate (\ls -> ls ++ [last ls + last (init ls)]) [0,1])
Prelude> fib 25
[0,1,1,2,3,5,8,13,21,34,55,89,144,233,377,610,987,1597,2584,4181,6765,10946,17711,28657,46368,75025]
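For comparison, here is a sketch of a strict accumulator loop (fibIter is a hypothetical name, and this is a different technique from the list-rebuilding version above). Each step of the iterate version appends with ++ and walks the list with last, so its cost grows with the list; this loop keeps only the last two numbers:

```haskell
-- n-th Fibonacci number with F(0) = 0, assuming n >= 0.
fibIter :: Int -> Integer
fibIter n = go n 0 1
  where
    -- go k a b: a and b are consecutive Fibonacci numbers, k steps remain.
    go :: Int -> Integer -> Integer -> Integer
    go 0 a _ = a
    -- seq forces each new sum so no chain of unevaluated (+) thunks builds up.
    go k a b = let c = a + b in c `seq` go (k - 1) b c
```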

Most efficient way to get digit count of arbitrarily big number

What is the most efficient way to get the digits of a number?
Let's begin with an example:
Imagine the Fibonacci sequence. Now let's say we want to know which Fibonacci number is the first to have 1000 digits (in base 10 representation). Up to 308 digits (1476th Fibonacci number) we can easily do this by using logBase 10 <number>. If the number is greater than the 1476th Fibonacci number, it no longer fits in a Double, so logBase returns Infinity and the calculation fails. The problem is that 308 is somewhat far away from 1000, which was our initial goal.
A possible solution is to convert the number to a string and use its length as the digit count. This is a little too inefficient for my purposes, because trying this with 10000 takes its sweet time.
The most efficient method shown in other questions is hardcoding all possible cases, which I really do not want to do, especially because my numbers have far more digits than the 10 or so those proposed solutions handle.
So to come back to my question: What is the best (most efficient) way to determine a base-10 number's digit count? Is it really converting it to a string and using its length, or are there any "hacker" tricks like 0x5f3759df?
Note: I appreciate solutions in any language, even if this is tagged "haskell".
Why not use div until it's less than 10?
digitCount :: Integer -> Int
digitCount = go 1 . abs
  where
    go ds n = if n >= 10 then go (ds + 1) (n `div` 10) else ds
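This performs O(n) divisions, where n is the number of digits (each division of a large Integer by 10 itself takes time proportional to its length, so the total work grows roughly quadratically). You could speed it up easily by checking against 1000, then 100, then 10, but this will probably be sufficient for most uses.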
For reference, on my not-so-great laptop running it only in GHCi and using the horribly inaccurate :set +s statistics flag:
> let x = 10 ^ 10000 :: Integer
> :force x
<prints out 10 ^ 10000>
> digitCount x
10001
it :: Int
(0.06 secs, 23759220 bytes)
So it seems pretty quick: it can churn through a 10001-digit number in less than a tenth of a second without optimizations.
If you really wanted O(log(n)) divisions, I would recommend writing your own version that halves the range of candidate digit counts each time (a binary search over powers of ten), but that one is a little more involved and trickier than dividing by 10. For your purposes this version will easily count up to about 20000 digits without problems.
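The speed-up suggested above (checking against 1000 before 10) might look like the following sketch; digitCount3 is a made-up name, and it only removes three digits per division rather than changing the overall complexity:

```haskell
-- The answer's version: one division by 10 per digit.
digitCount :: Integer -> Int
digitCount = go 1 . abs
  where
    go ds n = if n >= 10 then go (ds + 1) (n `div` 10) else ds

-- Strip three digits per division while the number is at least 1000,
-- then finish the last one to three digits with plain comparisons.
digitCount3 :: Integer -> Int
digitCount3 = go 1 . abs
  where
    go ds n
      | n >= 1000 = go (ds + 3) (n `div` 1000)
      | n >= 100 = ds + 2
      | n >= 10 = ds + 1
      | otherwise = ds
```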
If you just want to find the first number with at least digitCount digits in a list, you could test each number in O(1) by checking if fibBeingTested >= 10^(digitCount - 1). This works since 10^(digitCount - 1) is the lowest number with at least digitCount digits:
import Data.List (find)
fibs :: [Integer]
-- ...
findFib :: Int -> Integer
findFib digitCount =
    let Just solution = find (>= tenPower) fibs
    in
        solution
  where
    tenPower = 10 ^ (digitCount - 1)
We use digitCount - 1 because 10^1, for instance, is 10 which has two digits.
As a result of the O(1) complexity that this comparison has, you can find Fibonacci numbers very quickly. On my machine:
λ> :set +s
λ> findFib 10000
[... the first Fibonacci number with at least 10,000 digits ...]
(0.23 secs, 121255512 bytes)
If the list of fibs has already been computed up to the first Fibonacci number with 10,000 digits (for example, if you run findFib 10000 twice), it's even faster, which shows that more computation goes into calculating each Fibonacci number than into finding the one you're looking for:
λ> findFib 10000 -- Second run of findFib 10000
[... the first Fibonacci number with at least 10,000 digits ...]
(0.04 secs, 9922000 bytes)
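The answer leaves fibs elided; here is a self-contained sketch that fills it with one common definition (an assumption, starting at F(0) = 0 so that list index n holds F(n)) and adds a hypothetical findFibIndex for when you want the position, as Project Euler asks, rather than the value:

```haskell
import Data.List (find, findIndex)

-- Assumed definition: F(0) = 0, F(1) = 1, ...
fibs :: [Integer]
fibs = 0 : 1 : zipWith (+) fibs (tail fibs)

-- First Fibonacci number with at least digitCount digits (digitCount >= 1).
findFib :: Int -> Integer
findFib digitCount =
  case find (>= tenPower) fibs of
    Just solution -> solution
    Nothing -> error "unreachable: fibs is infinite"
  where
    -- Smallest number with digitCount digits.
    tenPower = 10 ^ (digitCount - 1)

-- Same comparison, returning the index n of F(n) instead of its value.
findFibIndex :: Int -> Maybe Int
findFibIndex digitCount = findIndex (>= 10 ^ (digitCount - 1)) fibs
```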
For just getting up to a Fibonacci number that has more than 1000 digits, length . show (on Integer) suffices.
GHCi> let fibs = Data.Function.fix $ (0:) . scanl (+) 1
GHCi> let digits = length . (show :: Integer -> String)
GHCi> :set +t +s
GHCi> fst . head . dropWhile ((1000>) . digits . snd) $ zip [0..] fibs
4782
it :: Integer
(0.10 secs, 149103264 bytes)
For floating-point numbers (so you can use logBase) outside the range of Double, look to the numbers package. They are downright slow, but you do have to pay something for that kind of accuracy.
You could always try binary search to find the number of digits of n: first find a k such that 10^(2^k) ≥ n, and then divide n successively by 10^(2^(k-1)), 10^(2^(k-2)), ..., 10^(2^0):
numDigits n = fst $ foldr step (1,n) tenToPow2s
  where
    pow2s = iterate (*2) 1
    tenToPow2s = zip pow2s . takeWhile (<=n) . iterate (^2) $ 10
    step (k,t) (d,n) = if n>=t then (d+k, n `div` t) else (d,n)
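To convince yourself the binary search is right, a quick check against length . show helps. Below, numDigits gets a type signature and the inner variable is renamed to avoid shadowing n; both are my additions, and non-negative input is assumed:

```haskell
numDigits :: Integer -> Int
numDigits n = fst $ foldr step (1, n) tenToPow2s
  where
    -- Pairs (2^k, 10^(2^k)) for every such power of ten that is <= n.
    pow2s = iterate (* 2) 1
    tenToPow2s = zip pow2s . takeWhile (<= n) . iterate (^ 2) $ 10
    -- foldr applies the largest power first; divide it out whenever it fits,
    -- adding its 2^k digits to the running count.
    step (k, t) (d, m) = if m >= t then (d + k, m `div` t) else (d, m)
```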
For the specific case of Fibonacci numbers you could also just try math: the n-th Fibonacci number F(n) lies between $(\varphi^n - 1)/\sqrt{5}$ and $(\varphi^n + 1)/\sqrt{5}$, so for the base-10 logarithm we have:
$$\log F(n) - n \log \varphi + \log \sqrt{5} \in \left[\log\left(1 - \varphi^{-n}\right),\ \log\left(1 + \varphi^{-n}\right)\right]$$
That interval gets tiny right away.
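Turning that bound into code gives a digit count without computing F(n) at all. A sketch, assuming Double precision is enough (it should be for moderate n, but rounding right next to a power of ten is a theoretical risk); fibDigits and firstFibWithDigits are hypothetical names:

```haskell
-- Digits of F(n) from floor (n * log10 phi - log10 (sqrt 5)) + 1, valid for n >= 2.
fibDigits :: Int -> Int
fibDigits n
  | n < 2 = 1 -- F(0) = 0 and F(1) = 1 both have one digit
  | otherwise = floor (fromIntegral n * logBase 10 phi - logBase 10 (sqrt 5)) + 1
  where
    phi = (1 + sqrt 5) / 2 :: Double

-- Index of the first Fibonacci number with at least d digits.
firstFibWithDigits :: Int -> Int
firstFibWithDigits d = head [n | n <- [0 ..], fibDigits n >= d]
```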

Converting Integer into list of digits without 'mod' and 'div'

I currently have the Haskell function below which converts an integer into a list of digits taken from the original integer. My question is thus: Is there a way to do this without using mod and div? For example, if I wanted to do the same thing with a string I could create a function utilising other functions such as head and tail etc.
I struggled with this problem for a while before finally coming to SO and finding the answer in another post. What got me asking this question is the fact that I would never have thought of using mod and div myself!
toDigits :: Integer -> [Integer]
toDigits n
    | n < 1 = []
    | otherwise = toDigits (n `div` 10) ++ [n `mod` 10]
You mentioned that you could do the same thing on strings with list operations. Indeed, that would be another way. You could convert the integer to a string and then convert each character to an integer:
import Data.Char (digitToInt)
toDigits :: Int -> [Int]
toDigits = map digitToInt . show
Here I used Int rather than Integer, but you can use Integer if you really want with a little more trouble:
toDigits :: Integer -> [Integer]
toDigits = map (fromIntegral . digitToInt) . show
@icktoofay's answer uses show, a generic way to convert a value to a String (in other words, to get its string representation). The value's type must be an instance of the Show typeclass. For example, Int is an instance of Show (enter :i Int in GHCi and look for the line instance Show Int -- Defined in `GHC.Show'). But a function isn't an instance of Show, so show (let f n = n in f) is a type error: how would you convert a function to a string? (See also: If functions as instances of the Show typeclass). Anyway, using the show function is idiomatic, so you can stick with it.
There is however a way to extract a digit from a number using logarithms, powers and integer divisions. Remember that you can remove digits from the left by finding a remainder, and remove digits from the right by integer division. In both cases, the right operand is some power of 10. For example:
*Main> 123 `mod` 10
3
*Main> 123 `div` 100
1
But how do you know which power of 10 to divide by? By finding the base-10 logarithm: the number of digits of N is $\lfloor \log_{10} N \rfloor + 1$, e.g. $\lfloor \log_{10} 12345 \rfloor = 4$. Unfortunately you can't use logBase, because it uses floating-point arithmetic, which is inexact. For example:
*Main> logBase 10 1000
2.9999999999999996
You can use a custom function iLogBase for integers; copy the code from the link into your source code. To find the first digit of a number this way, I use the following code:
firstDigit :: (Integral a) => a -> a
firstDigit n = n `div` (10^log)
  where log = fst $ iLogBase 10 n
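The linked iLogBase is not reproduced here, so as a stand-in, a hypothetical ilog10 can compute the floor of the base-10 logarithm exactly by counting divisions; this sketch then mirrors the firstDigit above:

```haskell
-- Hypothetical stand-in for iLogBase: floor (log10 n) for n >= 1,
-- using exact Integer arithmetic only, so there is no floating-point error.
ilog10 :: Integer -> Int
ilog10 n
  | n < 10 = 0
  | otherwise = 1 + ilog10 (n `div` 10)

-- Most significant digit: divide away every digit after the first.
firstDigit :: Integer -> Integer
firstDigit n = n `div` (10 ^ ilog10 n)
```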
Creating a more general function of finding an arbitrary digit of a number and converting a number into a list of digits is left to you as an exercise :).
Also, the code in your question is inefficient. List concatenation (++) has complexity O(n) in the length of its left argument: every time you append an element to the end of a list, it has to copy the left list element by element until it reaches the right list. Check out the source for (++): basically [1,2,3] ++ [4] becomes 1 : 2 : 3 : [4], which takes 3 cons (:) operations just to append one element. And since you append digits to the end repeatedly, that copying is repeated each time, so the overall complexity of your function is O(n^2).
On the other hand, (:) is instant, that is, it has complexity O(1): however long your list is, prepending an element to the beginning is cheap. So instead of adding each element to the end, I would recommend adding it to the beginning and then reversing the list once at the end (for information, Lisp people call this the push/nreverse idiom):
toDigits = reverse . go
  where go n = if n < 1 then [] else n `mod` 10 : go (n `div` 10)
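A further option (toDigitsAcc is a hypothetical name): an accumulating parameter builds the list front to back directly, so not even the final reverse is needed:

```haskell
-- Each step conses the last digit onto the front of the accumulator, so the
-- digits come out most significant first without (++) or reverse.
-- Like the question's version, it returns [] for n < 1.
toDigitsAcc :: Integer -> [Integer]
toDigitsAcc = go []
  where
    go acc n
      | n < 1 = acc
      | otherwise = go (n `mod` 10 : acc) (n `div` 10)
```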

Haskell script running out of space

I'm using Project Euler to teach myself Haskell, and I'm having some trouble reasoning about how my code is being executed by Haskell. The second problem has me computing the sum of all even Fibonacci numbers up to 4 million. My script looks like this:
fibs :: [Integer]
fibs = 1 : 2 : [ a+b | (a,b) <- zip fibs (tail fibs)]
evens :: Integer -> Integer -> Integer
evens x sum | (even x) = x + sum
            | otherwise = sum
main = do
    print (foldr evens 0 (take 4000000 fibs))
Hugs gives the error "Garbage collection fails to reclaim sufficient space", which I assume means that the list entries are not released as they are consumed by foldr.
What do I need to do to fix this? I tried writing a tail-recursive (I think) version that used accumulators, but couldn't get that to work either.
Firstly, you shouldn't use Hugs. It is a toy for teaching purposes only.
GHC, however, is a fast, multicore-ready optimizing compiler for Haskell. In particular, it does strictness analysis and compiles to native code.
The main thing that stands out about your code is the use of foldr on a very large list. Probably you want a tail-recursive loop, like so:
import Data.List
fibs = 0 : 1 : zipWith (+) fibs (tail fibs)
evens x sum | even x = x + sum
            | otherwise = sum
-- sum of even fibs in first 4M fibs
main = print (foldl' evens 0 (take 4000000 fibs))
Besides all this, the first 4M fibs are enormous numbers and will use a fair amount of space, so it'll take a while.
Here's the sum of the first 400k even fibs, to save you some time (21s). :-)
A number of observations / hints:
the x + sum thunks built by evens aren't evaluated until the very end
You're taking the first 4,000,000 fibs, not the fibs up to 4,000,000
There is an easier way to do this
Edit in response to comment
I'm not going to tell you what the easier way is, since that's the fun of Project Euler problems. But I will ask you a bunch of questions:
How many even fibs can you have in a row?
How long can you go without an even fib?
If you sum up all the even fibs and all the odd fibs (do this by hand), what do you notice about the sums?
You misunderstood the problem. The actual problem wants you to sum all the even Fibonacci numbers such that the Fibonacci number itself doesn't exceed 4 million (which happens to be only the first 33 Fibonacci numbers).
You are evaluating four million elements of fibs. Those numbers grow exponentially, so the number of digits in each grows linearly. I don't know how many bytes are required to represent the millionth Fibonacci number; just the one-thousandth Fibonacci number has 209 decimal digits, so that's going to take 22 32-bit words just to hold the digits, never mind whatever overhead GMP imposes.
Exercise: calculate the amount of memory needed to hold four million Fibonacci numbers.
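A rough order-of-magnitude answer, assuming F(n) needs about $n \log_2 \varphi \approx 0.694\,n$ bits and ignoring GMP and heap overhead entirely:

$$\sum_{n=1}^{N} n \log_2 \varphi \approx \frac{N^2}{2} \log_2 \varphi \approx \frac{(4 \times 10^6)^2}{2} \times 0.694 \approx 5.6 \times 10^{12} \text{ bits} \approx 700 \text{ GB}$$

That is far beyond ordinary RAM, which is consistent with the garbage-collection failure in the question.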
Have a look at the Prelude functions takeWhile, filter, even, and sum
takeWhile (<40) [0..]
filter even $ takeWhile (<40) [0..]
put 'em together:
ans = sum $ filter even $ takeWhile (< 4 * 10^6) fibs
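Put together as a small self-contained module (the fibs definition is the one from the question):

```haskell
-- Fibonacci terms as in the question, starting 1, 2.
fibs :: [Integer]
fibs = 1 : 2 : zipWith (+) fibs (tail fibs)

-- Sum of the even-valued terms below four million. takeWhile cuts the
-- infinite list off by value rather than by count, so only 32 terms are built.
ans :: Integer
ans = sum $ filter even $ takeWhile (< 4 * 10 ^ 6) fibs
```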
