Why is this Haskell expression so slow?

I'm working on Project Euler Problem 14. Here's my solution.
import Data.List
collatzLength :: Int->Int
collatzLength 1 = 1
collatzLength n | odd n = 1 + collatzLength (3 * n + 1)
                | even n = 1 + collatzLength (n `quot` 2)
maxTuple :: (Int, Int)->(Int, Int)->Ordering
maxTuple (x1, x2) (y1, y2) | x1 > y1 = GT
                           | x1 < y1 = LT
                           | otherwise = EQ
I'm running the following from GHCi:
maximumBy maxTuple [(collatzLength x, x) | x <- [1..1000000]]
I know that if Haskell evaluated strictly, the time on this would be something like O(n^3). Since Haskell evaluates lazily, though, it seems like this should be some constant multiple of n. This has been running for nearly an hour now, which seems very unreasonable. Does anyone have any idea why?

You're assuming that the collatzLength function will be memoized. Haskell does not do automatic memoization. You'll need to do that yourself. Here's an example using the data-memocombinators package.
import Data.List
import Data.Ord
import qualified Data.MemoCombinators as Memo
collatzLength :: Integer -> Integer
collatzLength = Memo.arrayRange (1,1000000) collatzLength'
  where
    collatzLength' 1 = 1
    collatzLength' n | odd n = 1 + collatzLength (3 * n + 1)
                     | even n = 1 + collatzLength (n `quot` 2)
main = print $ foldl1' max $ [(collatzLength n, n) | n <- [1..1000000]]
This runs in about 1 second when compiled with -O2.

To find the maximum of a list, the whole list needs to be evaluated.
So it will calculate collatzLength for every number from 1 to 1000000, and collatzLength is recursive. Worse, your definition of collatzLength is not even tail-recursive.
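A minimal sketch of what a tail-recursive version could look like. The names collatzLen and longestUpTo are hypothetical, and the sketch assumes a 64-bit Int, since chains for starts below 1,000,000 climb well past 2^31. The running count is an accumulator forced by a bang pattern, so the loop does not build up a stack of pending additions; note that it still recomputes every chain from scratch, so it removes stack growth but is not a substitute for memoization.

```haskell
{-# LANGUAGE BangPatterns #-}
import Data.List (foldl')

-- Hypothetical name: number of terms in the Collatz chain starting at n,
-- counting both n and the final 1. The accumulator is forced on every call,
-- and each recursive call is a tail call.
collatzLen :: Int -> Int
collatzLen = go 1
  where
    go :: Int -> Int -> Int
    go !acc 1 = acc
    go !acc n
      | even n    = go (acc + 1) (n `quot` 2)
      | otherwise = go (acc + 1) (3 * n + 1)

-- Hypothetical name: (chain length, start) of the longest chain for starts
-- in [1 .. limit], folded strictly instead of first building a list of pairs.
longestUpTo :: Int -> (Int, Int)
longestUpTo limit = foldl' step (1, 1) [2 .. limit]
  where
    step best@(bestLen, _) n =
      let len = collatzLen n
      in if len > bestLen then (len, n) else best
```

For example, collatzLen 13 is 10 and longestUpTo 10 is (20, 9).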

cL is short for collatzLength
cL!!n stands for collatzLength n
cL :: [Int]
cL = 1 : 1 : [ 1 + (if odd n then cL!!(3*n+1) else cL!!(n `div` 2)) | n <- [2..]]
Simple test:
ghci> cL !! 13
10
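Indexing a list with !! takes time linear in the index, so cL !! n gets slower as n grows. A common alternative, sketched here under the assumption that only starting values up to a fixed bound need caching, is a lazy boxed array whose elements refer back to the array itself; successors that land outside the bound are computed directly. The names limit, table, chainLen and cached are hypothetical.

```haskell
import Data.Array (Array, listArray, bounds, inRange, (!))

-- Hypothetical upper bound on the starting values that get cached.
limit :: Int
limit = 1000000

-- Lazy boxed array: each element starts as an unevaluated thunk,
-- is computed the first time it is demanded, and is shared afterwards.
table :: Array Int Int
table = listArray (1, limit) (map chainLen [1 .. limit])

-- Chain length (number of terms, including n and the final 1).
chainLen :: Int -> Int
chainLen 1 = 1
chainLen n = 1 + cached next
  where
    next = if even n then n `quot` 2 else 3 * n + 1

-- Reads the table when the argument is inside its bounds,
-- otherwise falls back to computing the chain directly.
cached :: Int -> Int
cached m
  | inRange (bounds table) m = table ! m
  | otherwise = chainLen m
```

Usage mirrors the list version: cached 13 gives 10, and maximum [(cached n, n) | n <- [1 .. 1000000]] answers the original question with every chain length computed at most once inside the bound.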

Related

Summing a finite prefix of an infinite series

The number π can be calculated with the following infinite series sum:
$$\pi = \sum_{k=0}^{\infty} \frac{2^{k+1}(k!)^2}{(2k+1)!}$$
I want to define a Haskell function roughlyPI that, given a natural number k, calculates the partial sum of the series from 0 up to k.
Example: roughlyPI 1000 (or whatever) => 3.1415926535897922
What I did was this (in VS Code):
roughlyPI :: Double -> Double
roughlyPI 0 = 2
roughlyPI n = e1/e2 + (roughlyPI (n-1))
  where
    e1 = 2**(n+1)*(factorial n)**2
    e2 = factorial (2*n +1)
    factorial 0 = 1
    factorial n = n * factorial (n-1)
but it doesn't really work....
*Main> roughlyPI 100
NaN
I don't know what's wrong. I'm new to Haskell, by the way.
All I really want is to be able to type in a number that will give me PI at the end. It can't be that hard...
As mentioned in the comments, we need to avoid large divisions and instead intersperse smaller divisions within the factorials. We use Double for representing PI but even Double has its limits. For instance 1 / 0 == Infinity and (1 / 0) / (1 / 0) == Infinity / Infinity == NaN.
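To see exactly where the NaN in the question comes from, here is a small sketch that rebuilds the question's term in Double (factorialD and termD are hypothetical names). Any factorial above 170! overflows to Infinity; once both the numerator and the denominator of a term are infinite, the quotient is NaN, and adding a single NaN term makes the whole sum NaN.

```haskell
-- Factorial in Double, mirroring the question's factorial.
factorialD :: Double -> Double
factorialD 0 = 1
factorialD n = n * factorialD (n - 1)

-- k-th series term 2^(k+1) * (k!)^2 / (2k+1)!, computed the same way as in the question.
termD :: Double -> Double
termD k = 2 ** (k + 1) * factorialD k ** 2 / factorialD (2 * k + 1)
```

In GHCi, isInfinite (factorialD 171) is True, while termD 100 is NaN because it evaluates Infinity / Infinity.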
Luckily, we can use algebra to simplify the formula and hopefully delay the blowup of our Doubles. By dividing inside the factorial computation, we keep the numbers from growing unwieldy too quickly.
This solution will calculate roughlyPI 1000, but it fails on 1023 with NaN because 2 ^ 1024 :: Double == Infinity. Note how each iteration of fac has a division as well as a multiplication to help keep the numbers from blowing up. If you are trying to approximate PI with a computer, I believe there are better algorithms, but I tried to keep it as conceptually close to your attempt as possible.
roughlyPI :: Integer -> Double
roughlyPI 0 = 2
roughlyPI k = e + roughlyPI (k - 1)
  where
    k' = fromIntegral k
    e = 2 ** (k' + 1) * fac k / (2 * k' + 1)
      where
        fac 1 = 1 / (k' + 1)
        fac p = (fromIntegral p / (k' + fromIntegral p)) * fac (p - 1)
We can do better than having a blowup of Double after 1000 by doing the computations with Rationals and then converting to Double with realToFrac (credit to @leftaroundabout):
import Data.Ratio ((%))
roughlyPI' :: Integer -> Double
roughlyPI' = realToFrac . go
  where
    go 0 = 2
    go k = e + go (k - 1)
      where
        e = 2 ^ (k + 1) * fac k / (2 * fromIntegral k + 1)
          where
            fac 1 = 1 % (k + 1)
            fac p = (p % (k + p)) * fac (p - 1)
For further reference, see the Wikipedia page on approximations of π.
P.S. Sorry for the bulky equations; Stack Overflow does not support LaTeX.
First note that your code actually works:
*Main> roughlyPI 91
3.1415926535897922
The problem, as was already said, is that when you try to make the approximation better, the factorial terms become too big to be representable in double-precision floats. The simplest (albeit somewhat brute-force) way to fix that is to do all the computation in rational arithmetic instead. Because numerical operations in Haskell are polymorphic, this works with almost the same code as you have; only the ** operator can't be used, since it allows fractional exponents (which in general give irrational results). Instead, you should use integer exponents, which is conceptually the right thing anyway. That requires a few fromIntegral calls:
roughlyPI :: Integer -> Rational
roughlyPI 0 = 2
roughlyPI n = e1/e2 + (roughlyPI (n-1))
  where
    e1 = 2^(n+1)*fromIntegral (factorial n^2)
    e2 = fromIntegral . factorial $ 2*n + 1
    factorial 0 = 1
    factorial n = n * factorial (n-1)
This now works also for much higher degrees of approximation, although it takes a long time to carry around the giant fractions involved:
*Main> realToFrac $ roughlyPI 1000
3.141592653589793
The way to go in such cases is to calculate the ratio of consecutive terms and calculate the terms by rolling multiplications of the ratios:
-- 1. -------------
pi1 n = Sum { k = 0 .. n } T(k)
where
T(k) = 2^(k+1)(k!)^2 / (2k+1)!
-- 2. -------------
ts2 = [ 2^(k+1)*(k!)^2 / (2k+1)! | k <- [0..] ]
pis2 = scanl1 (+) ts2
pi2 n = pis2 !! n
-- 3. -------------
T(k) = 2^(k+1)(k!)^2 / (2k+1)!
T(k+1) = 2^(k+2)((k+1)!)^2 / (2(k+1)+1)!
= T(k) 2 (k+1)^2 / ((2k+2) (2k+3))
= T(k) (k+1)^2 / ((k+1) (2k+3))
= T(k) (k+1) / (k+1 + k+2)
= T(k) / (1 + (k+2)/(k+1))
= T(k) / (2 + 1 /(k+1))
-- 4. -------------
ts4 = scanl (/) 2 [ 2 + 1/(k+1) | k <- [0..]] :: [Double]
pis4 = scanl1 (+) ts4
pi4 n = pis4 !! n
This way we share and reuse the calculations as much as possible. This leads to the most efficient code, hopefully leading to the smallest cumulative numerical error. The formula also turned out to be exceptionally simple, and could even be simplified further as ts5 = scanl (/) 2 [ 2 + recip k | k <- [1..]].
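The algebra in step 3 can be checked mechanically. This sketch (termDirect and termsByRatio are hypothetical names) computes the first terms both from the closed formula and from the ratio recurrence, using exact Rational arithmetic so that any disagreement would be a real algebra mistake rather than rounding.

```haskell
import Data.Ratio ((%))

-- T(k) = 2^(k+1) * (k!)^2 / (2k+1)!, computed directly with exact rationals.
termDirect :: Integer -> Rational
termDirect k = (2 ^ (k + 1) * fact k ^ 2) % fact (2 * k + 1)
  where
    fact m = product [1 .. m]

-- The same terms from the recurrence T(k+1) = T(k) / (2 + 1/(k+1)), starting at T(0) = 2.
termsByRatio :: [Rational]
termsByRatio = scanl (/) 2 [2 + 1 % (k + 1) | k <- [0 ..]]
```

For instance, the first three terms are 2, 2/3 and 4/15 either way, so the third partial sum is 44/15.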
Trying it out:
> pis2 = scanl1 (+) $ [ fromIntegral (2^(k+1))*fromIntegral (product[1..k])^2 / fromIntegral (product[1..(2*k+1)]) | k <- [0..] ] :: [Double]
> take 8 $ drop 30 pis2
[3.1415926533011587,3.141592653447635,3.141592653519746,3.1415926535552634,
3.141592653572765,3.1415926535813923,3.141592653585647,3.141592653587746]
> take 8 $ drop 90 pis2
[3.1415926535897922,3.1415926535897922,NaN,NaN,NaN,NaN,NaN,NaN]
> take 8 $ drop 30 pis4
[3.1415926533011587,3.141592653447635,3.141592653519746,3.1415926535552634,
3.141592653572765,3.1415926535813923,3.141592653585647,3.141592653587746]
> take 8 $ drop 90 pis4
[3.1415926535897922,3.1415926535897922,3.1415926535897922,3.1415926535897922,
3.1415926535897922,3.1415926535897922,3.1415926535897922,3.1415926535897922]
> pis4 !! 1000
3.1415926535897922
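Tying this back to the interface the question asked for, a hedged sketch: roughlyPI k returns the partial sum of terms 0 through k, built on the simplified ts5 recurrence. Every term comes from a division of the previous one, so the terms shrink towards zero instead of overflowing, and large k gives a finite result rather than NaN.

```haskell
-- Terms T(0), T(1), ... of the series; each is the previous one divided
-- by 2 + 1/k (the ts5 form of the recurrence).
terms :: [Double]
terms = scanl (/) 2 [2 + recip k | k <- [1 ..]]

-- Partial sum of terms 0 through k, so roughlyPI 0 == 2.
roughlyPI :: Int -> Double
roughlyPI k = sum (take (k + 1) terms)
```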

What's wrong with my Fibonacci implementation?

I'm trying to transform my recursive Fibonacci function into an iterative solution. I tried the following:
fib_itt :: Int -> Int
fib_itt x = fib_itt' x 0
  where
    fib_itt' 0 y = 0
    fib_itt' 1 y = y + 1
    fib_itt' x y = fib_itt' (x-1) (y + ((x - 1) + (x - 2)))
I want to save the result in the variable y and return it when the arguments match the pattern 1 y, but it doesn't work as expected. For fib_itt 0 and fib_itt 1 it works correctly, but for n > 1 it doesn't. For example, the recursive version fib_rek 2 returns 1 and fib_rek 3 returns 2, which is what fib_itt should return as well.
Your algorithm is wrong: in y + (x-1) + (x-2) you only add up consecutive integers, not the numbers in the Fibonacci sequence.
It seems you tried some kind of pair-based approach, and that is a good idea; it can be done like this:
fib :: Int -> Int
fib k = snd $ fibIt k (0, 1)
fibIt :: Int -> (Int, Int) -> (Int, Int)
fibIt 0 x = x
fibIt k (n,n') = fibIt (k-1) (n',n+n')
As you can see, this passes the two needed parts (the last and the second-to-last number) around as a pair and keeps track of the iteration with k.
Then fib just returns the second component of this tuple. (If you use the first component you will get 0,1,1,2,3,..., but of course you can also adjust the initial tuple if you like: fib k = fst $ fibIt k (1, 1).)
By the way, this idea leads directly to this nice definition of the Fibonacci sequence if you factor the iteration out into iterate ;)
fibs :: [Int]
fibs = map fst $ iterate next (1,1)
  where
    next (n,n') = (n',n+n')
fib :: Int -> Int
fib k = fibs !! k
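One caveat worth knowing: the tuple passed around by fibIt is lazy, so for large k it accumulates a long chain of unevaluated additions before anything is forced. A sketch of a strict variant (fibStrict is a hypothetical name) that forces both accumulators at every step and, as a deliberate change from the answer, returns Integer so large results do not overflow Int. It keeps the same numbering as fib above, so fibStrict 0 == 1.

```haskell
{-# LANGUAGE BangPatterns #-}

-- Same pair-passing iteration as fibIt, but both accumulators are forced
-- on every step, so no chain of (+) thunks builds up.
fibStrict :: Int -> Integer
fibStrict k = go k 0 1
  where
    go :: Int -> Integer -> Integer -> Integer
    go 0 _ !b = b
    go i !a !b = go (i - 1) b (a + b)
```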

Memoization not functioning correctly

I have the following code:
pB :: [(Integer, Integer, Integer)] -> Integer -> Integer -> [(Integer, Integer, Integer)]
pB lst x y
  | screenList lst x y /= -1 = lst
  | abs x > y = lst++[(x, y, 0)]
  | y == 1 = lst++[(x, y, 1)]
  | otherwise = lst++newEls
  where
    newEls = (pB lst x (y-1))++(pB lst (x-1) (y-1))++(pB lst (x+1) (y-1))
getFirst :: (Integer, Integer, Integer) -> Integer
getFirst (x, _, _) = x
getSecond :: (Integer, Integer, Integer) -> Integer
getSecond (_, y, _) = y
getThird :: (Integer, Integer, Integer) -> Integer
getThird (_, _, z) = z
screenList :: [(Integer, Integer, Integer)] -> Integer -> Integer -> Integer
screenList [] _ _ = -1
screenList lst x y
  | getFirst leader == x && getSecond leader == y = getThird leader
  | otherwise = screenList (tail lst) x y
  where
    leader = head lst
An inefficient solution (i.e. one that didn't keep track of values that had already been computed) returned the value 51 for input x = 0, y = 5. Now, running this with input [] 0 5, I should be able to find (0,5,51) in the output, which unfortunately I don't.
I have been looking at it for a few hours, but can't seem to understand where I'm going wrong.
Does anybody have any suggestions?
EDIT: Inefficient version:
nPB :: Integer -> Integer -> Integer
nPB x y
  | abs x > y = 0
  | y == 1 = 1
  | otherwise = (nPB x (y-1)) + (nPB (x-1) (y-1)) + (nPB (x+1) (y-1))
Administrivia
It is rather hard to tell what you are asking, but I gather that you have a function that is terribly slow and you have tried to manually memoize this function. I don't think anyone is trying to understand your attempt, so if this question is primarily about manually memoizing a function and/or fixing your code then please submit another question that more clearly outlines its design.
In the remainder of this answer I will show you how to use monad-memo and MemoTrie to memoize the function you've named nPB.
Memoizing nPB with monad-memo
The nPB function is a prime target for memoization, as is readily apparent from its three recursive calls. The small benchmark below takes 1 second to run; let's see if we can do better.
nPB :: Integer -> Integer -> Integer
nPB x y
  | abs x > y = 0
  | y == 1 = 1
  | otherwise = (nPB x (y-1)) + (nPB (x-1) (y-1)) + (nPB (x+1) (y-1))
main = print (nPB 10 20)
In a previous answer I used the monad-memo package. Using monad-memo involves making your function monadic, which is syntactically more invasive than the other packages I know of, but I've always had good performance with it.
To use the package you simply:
Make sure to call one of the memo functions with the target function as the first parameter.
Be sure to return your final result.
Adjust your type signatures to include a MonadMemo constraint and change the result type to some monad m.
Run the function with startEvalMemo.
The code is:
{-# LANGUAGE FlexibleContexts #-}
import Control.Monad.Memo
nPB :: (MonadMemo (Integer,Integer) Integer m) => Integer -> Integer -> m Integer
nPB x y
  | abs x > y = return 0
  | y == 1 = return 1
  | otherwise = do
      t1 <- for2 memo nPB x (y-1)
      t2 <- for2 memo nPB (x-1) (y-1)
      t3 <- for2 memo nPB (x+1) (y-1)
      return (t1+t2+t3)
main = print (startEvalMemo $ nPB 10 20)
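Roughly speaking, a memo monad threads a cache from one call to the next. A dependency-free sketch of doing that by hand with Data.Map (Cache, nPBMemo and the wrapper are hypothetical names, not monad-memo's API) makes the data flow explicit: each recursive call receives the cache returned by the previous one, and the new result is inserted before returning. By contrast, the question's pB passes the same lst to all three recursive calls and never adds an entry for (x, y) itself in the otherwise case, which is why (0,5,51) never shows up in its output.

```haskell
import qualified Data.Map.Strict as M

-- Cache from (x, y) to an already-computed result.
type Cache = M.Map (Integer, Integer) Integer

-- Returns the result together with the updated cache.
nPBMemo :: Integer -> Integer -> Cache -> (Integer, Cache)
nPBMemo x y cache
  | abs x > y = (0, cache)
  | y == 1 = (1, cache)
  | Just v <- M.lookup (x, y) cache = (v, cache)
  | otherwise =
      let (a, c1) = nPBMemo x (y - 1) cache        -- first call sees the incoming cache
          (b, c2) = nPBMemo (x - 1) (y - 1) c1     -- later calls see earlier results
          (c, c3) = nPBMemo (x + 1) (y - 1) c2
          v = a + b + c
      in (v, M.insert (x, y) v c3)                 -- record (x, y) before returning

-- Convenience wrapper that starts from an empty cache.
nPB :: Integer -> Integer -> Integer
nPB x y = fst (nPBMemo x y M.empty)
```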
Memoizing nPB with MemoTrie
The most common Haskell memoization package in use is MemoTrie. It is also a syntactically cleaner memoization package, as it does not require any sort of monad, but it currently suffers from a slight performance issue when using Integer, as we shall soon see (a bug has been reported; Int and other types seem fine).
There is much less to do to use MemoTrie, just replace your recursive calls with memoN where N is the number of arguments:
import Data.MemoTrie
nPB :: Integer -> Integer -> Integer
nPB x y
  | abs x > y = 0
  | y == 1 = 1
  | otherwise = (memo2 nPB x (y-1)) + (memo2 nPB (x-1) (y-1)) + (memo2 nPB (x+1) (y-1))
main = print (nPB 10 20)
Performance
Using a type of Integer the performance is:
$ ghc original.hs -O2 && time ./original
8533660
real 0m1.047s
$ ghc monad-memo.hs -O2 && time ./monad-memo
8533660
real 0m0.002s
$ ghc memotrie.hs -O2 && time ./memotrie
8533660
real 0m0.331s
And using Int:
$ ghc original.hs -O2 && time ./original
8533660
real 0m0.190s
$ ghc monad-memo.hs -O2 && time ./monad-memo
8533660
real 0m0.002s
$ ghc memotrie.hs -O2 && time ./memotrie
8533660
real 0m0.002s
I guess this question is about memoization. I'm not sure how you are trying to implement this, but there are two "standard" ways of memoizing functions: use one of the libraries, or explicitly memoize the data yourself.
import Data.Function.Memoize (memoize)
import Data.MemoTrie (memo2)
import Data.Map (fromList, (!))
import System.Environment
test0 :: Integer -> Integer -> Integer
test0 x y
  | abs x > y = 0
  | y == 1 = 1
  | otherwise = (test0 x (y-1)) + (test0 (x-1) (y-1)) + (test0 (x+1) (y-1))
test1 :: Integer -> Integer -> Integer
test1 = memoize test0
test2 :: Integer -> Integer -> Integer
test2 = memo2 test0
But it doesn't look like the memo libraries I tried are able to handle this, or I did something wrong; I've never really used these libraries. (The test code is at the bottom; these results are for x,y = 0,18.)
test0 : Total time 9.06s
test1 : Total time 9.08s
test2 : Total time 32.78s
So let's try manual memoization. The principle is simple: construct your domain in such a way that later elements only require the values of earlier elements. This is very simple here, since your function always recurses on y-1, so you just need to build the domain moving up the rows. Then write a function that looks up earlier values in a table (here I use Data.Map.Map), and map it over the domain:
test3 :: Integer -> Integer -> Integer
test3 x' y' = m ! (x', y')
  where
    xs = concat [ map (flip (,) y) [-x + x' .. x + x'] | (x, y) <- zip [y', y' - 1 .. 1] [1..]]
    m = fromList [ ((x,y), go x y) | (x,y) <- xs]
    go x y
      | abs x > y = 0
      | y == 1 = 1
      | otherwise = m ! (x, y-1) + m ! (x-1, y-1) + m ! (x+1, y-1)
For simplicity I actually construct a domain that is much larger than needed, but the performance penalty is small since the extra domain is all 0 anyway. Looking at the performance, it is almost instant (Total time 0.02s). Even with x,y=0,1000 it still only takes 7 seconds, although with large inputs you end up spending a lot of time on GC.
-- usage: ghc --make -O2 -rtsopts Main.hs && ./Main n x y +RTS -sstderr
main = do
    [n, x, y] <- getArgs
    -- x and y arrive as Strings, so they must be read as Integers too
    print $ (d !! read n) (read x) (read y)
  where d = [test0, test1, test2, test3]
Here is the version written with memoFix2. It performs better than any of the other versions.
test4 :: Integer -> Integer -> Integer
test4 = memoFix2 go where
  go r x y
    | abs x > y = 0
    | y == 1 = 1
    | otherwise = (r x (y-1)) + (r (x-1) (y-1)) + (r (x+1) (y-1))
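memoFix2 can help where wrapping test0 did not because go never calls itself directly: every recursive call goes through r, so the recursion can be routed back into a memoized function. The same open-recursion idea can be sketched without any library by tying the knot through a lazy array over a bounded domain. nPBStep and memoNPB are hypothetical names, and the bound maxY is an assumption the caller has to supply; arguments outside the box simply fall back to the step function.

```haskell
import Data.Array (Array, listArray, range, inRange, (!))

-- Open recursion: the body calls `self` instead of calling itself directly,
-- the same shape as the `go r x y` handed to memoFix2.
nPBStep :: (Integer -> Integer -> Integer) -> Integer -> Integer -> Integer
nPBStep self x y
  | abs x > y = 0
  | y == 1 = 1
  | otherwise = self x (y - 1) + self (x - 1) (y - 1) + self (x + 1) (y - 1)

-- Ties the knot through a lazy array covering -maxY <= x <= maxY and
-- 1 <= y <= maxY; each cell is computed at most once and then shared.
memoNPB :: Integer -> Integer -> Integer -> Integer
memoNPB maxY = f
  where
    bnds = ((-maxY, 1), (maxY, maxY))
    table :: Array (Integer, Integer) Integer
    table = listArray bnds [nPBStep f x y | (x, y) <- range bnds]
    f x y
      | inRange bnds (x, y) = table ! (x, y)
      | otherwise = nPBStep f x y
```

For example, memoNPB 20 10 20 computes the benchmark value with a 41 by 20 table.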

Application ($) operator acting unexpectedly

I'm writing a function that generates a Collatz chain based on a starting number, but I've run into an unexpected problem.
Here's the code:
-- original, works
collatzA :: Integer -> [Integer]
collatzA 1 = [1]
collatzA n
  | even n = n:collatzA (n `div` 2)
  | odd n = n:collatzA (n * 3 + 1)
-- what I'm trying to do, won't compile, gives nasty errors
collatzB :: Integer -> [Integer]
collatzB 1 = [1]
collatzB n
  | even n = n:collatzB $ n `div` 2
  | odd n = n:collatzB $ n * 3 + 1
-- attempted solution, works but re-adds the parentheses I tried to get rid of
collatzC :: Integer -> [Integer]
collatzC 1 = [1]
collatzC n
  | even n = n: (collatzC $ n `div` 2)
  | odd n = n: (collatzC $ n * 3 + 1)
So why is it that collatzA and collatzC work, but collatzB doesn't?
This problem is due to operator precedence or fixity.
For example (taken from RWH, which I highly recommend) (+) is declared as left-associative with fixity 6 and (*) is declared as left-associative with fixity 7. This means the expression
8 + 7 + 6 * 5 * 4
is parsed as
(8 + 7) + ((6 * 5) * 4)
Similarly in your example, the cons operator (:) is right-associative and has fixity 5, while the application operator ($) is right-associative and has fixity 0.
Since ($) has a lower fixity than (:), the recursive call to collatzB is "grabbed" by (:)
| even n = (n:collatzB) $ (n `div` 2)
This link contains the fixity information for the Prelude functions, and you can also see this post for more information.
The problem is that f $ g gets viewed as (f) $ (g) by the compiler. If you have f $ g $ h, the compiler sees it as (f) $ ((g) $ (h)), and you can extend this pattern in general. So when you have
n : collatzB $ n `div` 2
the compiler sees this as
(n : collatzB) $ (n `div` 2)
And (n : collatzB) doesn't type check.
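If the goal is to keep the $ style, one Prelude-only option is to turn the cons into an operator section, (n :), so that it is itself a function on the left of $. Since ($) is right-associative, (n :) $ collatzD $ n `div` 2 parses as (n :) $ (collatzD $ (n `div` 2)). This is a sketch (collatzD is a hypothetical name); it still needs one pair of parentheses, around the section, but none around the recursive call.

```haskell
-- (:) is infixr 5 and ($) is infixr 0, so ($) splits the expression first.
-- Writing the cons as the section (n :) makes it the outermost function.
collatzD :: Integer -> [Integer]
collatzD 1 = [1]
collatzD n
  | even n = (n :) $ collatzD $ n `div` 2
  | otherwise = (n :) $ collatzD $ n * 3 + 1
```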
This is due to the fixity of $ (infixr 0), which binds more loosely than (:).
If the parens bother you that much (which they shouldn't), you could define a new operator as
infixr 1 $:
($:) :: a -> (b -> [a]) -> b -> [a]
a $: f = \x -> a : f x
collatzB :: Integer -> [Integer]
collatzB 1 = [1]
collatzB n
  | even n = n $: collatzB $ n `div` 2
  | odd n = n $: collatzB $ n * 3 + 1
But this honestly would cause more confusion than it's worth. I would just stick with parens personally.

Fibonacci's closed-form expression, the ST monad, and Haskell

Two recent questions about Fibonacci's closed-form expression (here and here) as well as the HaskellWiki's page about the ST monad motivated me to try and compare two ways of calculating Fibonacci numbers.
The first implementation uses the closed-form expression together with rationals as seen in hammar's answer here (where Fib is a datatype abstracting numbers of the form a+b*√5):
fibRational :: Integer -> Integer
fibRational n = divSq5 $ phi^n - (1-phi)^n
  where
    phi = Fib (1/2) (1/2)
    divSq5 (Fib 0 b) = numerator b
The second implementation is from the HaskellWiki's page about the ST monad, with some added strictness that was necessary in order to avoid a stack overflow:
fibST :: Integer -> Integer
fibST n | n < 2 = n
fibST n = runST $ do
    x <- newSTRef 0
    y <- newSTRef 1
    fibST' n x y
  where
    fibST' 0 x _ = readSTRef x
    fibST' !n x y = do
      x' <- readSTRef x
      y' <- readSTRef y
      y' `seq` writeSTRef x y'
      x' `seq` writeSTRef y (x'+y')
      fibST' (n-1) x y
For reference, here's also the full code that I used for testing:
{-# LANGUAGE BangPatterns #-}
import Data.Ratio
import Data.STRef.Strict
import Control.Monad.ST.Strict
import System.Environment
data Fib =
  Fib !Rational !Rational
  deriving (Eq, Show)
instance Num Fib where
  negate (Fib a b) = Fib (-a) (-b)
  (Fib a b) + (Fib c d) = Fib (a+c) (b+d)
  (Fib a b) * (Fib c d) = Fib (a*c+5*b*d) (a*d+b*c)
  fromInteger i = Fib (fromInteger i) 0
  abs = undefined
  signum = undefined
fibRational :: Integer -> Integer
fibRational n = divSq5 $ phi^n - (1-phi)^n
  where
    phi = Fib (1/2) (1/2)
    divSq5 (Fib 0 b) = numerator b
fibST :: Integer -> Integer
fibST n | n < 2 = n
fibST n = runST $ do
    x <- newSTRef 0
    y <- newSTRef 1
    fibST' n x y
  where
    fibST' 0 x _ = readSTRef x
    fibST' !n x y = do
      x' <- readSTRef x
      y' <- readSTRef y
      y' `seq` writeSTRef x y'
      x' `seq` writeSTRef y (x'+y')
      fibST' (n-1) x y
main = do
  (m:n:_) <- getArgs
  let n' = read n
      st = fibST n'
      rt = fibRational n'
  case m of
    "st" -> print st
    "rt" -> print rt
    "cm" -> print (st == rt)
Now it turns out that the ST version is significantly slower than the closed-form version, although I'm not a hundred percent sure why:
# time ./fib rt 1000000 >/dev/null
./fib rt 1000000 > /dev/null 0.23s user 0.00s system 99% cpu 0.235 total
# time ./fib st 1000000 >/dev/null
./fib st 1000000 > /dev/null 11.35s user 0.06s system 99% cpu 11.422 total
So my question is: Can someone help me understand why the first implementation is so much faster? Is it algorithmic complexity, overhead or something else entirely? (I checked that both functions yield the same result). Thanks!
You are comparing very different versions here. To make it fair, here is an implementation that is equivalent to the ST solution you give, but in pure Haskell:
fibIt :: Integer -> Integer
fibIt n | n < 2 = n
fibIt n = go 1 1 (n-2)
  where go !_x !y 0 = y
        go !x !y i = go y (x+y) (i-1)
This one seems to perform exactly as well (or as badly) as the ST version (both 10s here). The runtime is most likely dominated by all the Integer additions, so the overhead is too small to measure.
First, the two implementations use two very different algorithms with different asymptotic complexity (well, depending on what the complexity of the Integer operations is).
Second, the st implementation is using references. References are (comparatively) slow in ghc. (Because updating a reference needs a GC write barrier due to the generational garbage collector.)
So you're comparing two functions that differ both in algorithm and in implementation technique.
You should rewrite the second one not to use references, that way you can compare just algorithms. Or rewrite the first one to use references. But why use references when it's the wrong thing? :)
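You can compare the algorithmic complexities.
The first needs only O(log n) multiplications, because (^) uses repeated squaring;
the second needs O(n) additions.
(Both counts ignore the growing cost of each Integer operation as the numbers get larger.)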
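To separate the algorithm from the representation, here is a sketch of a third option, the fast-doubling method, which is a different technique from both versions in the question: it stays in Integer like the ST loop but, like raising phi to the n-th power, needs only a logarithmic number of multiplications. It relies on the standard identities F(2k) = F(k)(2F(k+1) - F(k)) and F(2k+1) = F(k)^2 + F(k+1)^2, and assumes n >= 0. The names fibPair and fibDoubling are hypothetical.

```haskell
-- Fast doubling: fibPair n returns (F(n), F(n+1)).
-- Each call halves n, so the recursion depth is about log2 n.
fibPair :: Integer -> (Integer, Integer)
fibPair 0 = (0, 1)
fibPair n =
  let (a, b) = fibPair (n `div` 2)   -- a = F(k), b = F(k+1) with k = n `div` 2
      c = a * (2 * b - a)             -- F(2k)
      d = a * a + b * b               -- F(2k+1)
  in if even n then (c, d) else (d, c + d)

-- F(n) with F(0) = 0, matching fibRational and fibST.
fibDoubling :: Integer -> Integer
fibDoubling = fst . fibPair
```

Timing it with the same ./fib harness would show how much of the gap is due to the algorithm alone.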
