Haskell dynamic program efficiency

Haskell dynamic program efficiency - haskell

I am learning more about dynamic programming and was trying to implement it in haskell. I was running tests with different ways to write the algorithms and found that one is faster than the other. Here it is in the fibonacci problem
fib1 :: [Integer]
fib1 = 0:1:zipWith (+) fib1 (tail fib1)
fib2 :: [Integer]
fib2 = 0:1:[(fib2 !! (n-1)) + (fib2 !! (n-2)) | n <- [2..]]
fib1 is much faster than fib2 but I can't tell why. fib2 seems intuitive, the nth number is (n-1)st plus the (n-2)nd.
And I get fib1, but it looks like it is zipping over the whole list everytime so wouldn't that take longer than. Just calculating the next index?

Lists in Haskell are lazy. They're being calculated as they're being used, but not further.
The function fib1 indeed calculates the whole list, but only does it once, and only up to the index you're asking for.
The function fib2 does a lot of extra work: it potentially calculates elements many, many times over.
Just try to do it with pen and paper. For example, in the case of fib2 !! 5, the list needs to be expanded up to index 5. Calculating fib2 !! 0 and fib2 !! 1 takes little time, as they are constants. The next element, fib2 !! 2 is calculated by adding fib2 !! 0 and fib2 !! 1, and then fib2 !! 3 = fib2 !! 1 + fib2 !! 2, and so on.
BUT.
The most important thing to note here, is that the compiler and/or runtime does not memoize the fib2 function, meaning: it does not remembers previous calculations. So every time the code hits a fib2 !! n it starts calculating it all over again, it doesn't matter how many time this has been done before, even if this happened in the very same (recursive) function call.
Regarding computational efficiency, your fib2 implementation is equivalent to this:
fib3' :: Integer -> Integer
fib3' 0 = 0
fib3' 1 = 1
fib3' n = fib3' (n - 2) + fib3' (n - 1)
fib3 :: [Integer]
fib3 = [fib3' n | n <- [0..]]
which suffers from the same inefficiency, I merely factored out the list part.
On the other hand, fib1 takes advantage of the previous calculations, by using them to avoid re-calculating them. And that's the core idea behind dynamic programming: use a data structure that can be used to store and retrieve results of previous calculations to exchange a potentially expensive recursive function call to a - hpefully - much cheap lookup.

#netom sorry but I don't think that's what's happening. I ran some tests on time and to calculate the 10000th number took 0.7 seconds. In the same run it was instant to calculate the 10000th + 9999th (the 10001th number) showing it remembered.
I then tested the time it took to freshly calculate the 10001st and it took the same time to calculate the 10001st as if it calculated the 10000 and remembered all the rest. To calculate the 10001st, it does not calculate for 10000 and 9999 (in separate recursions) it behaves like you'd expect if it just indexed the remembered list.
The recursive function however takes almost twice as long! So they're both using dynamic programming correctly. But as I've found, the fib2 takes O(n) each step to access the array but fib1 zips it in O(1) each step.

Related

How to make this Haskell program run faster

So I've been trying to learn Haskell by solving some problems on Codeforce.
And I am getting a lot of TLE (Time Limit Exceed) even though I think my time complexity is optimal.
My question is: is the way I wrote this program that makes it slow?
For example, here is the problem.
Basically the answer is to find an for a given n , where
an = 2*an-1 + D(n) and D(n) = the difference of the number of divisors between n and n-1.
(update: the top limit for n is 106).
Below is my program.
import qualified Data.Map.Strict as Map
main = do t <- read <$> getLine
putStrLn . show $ solve t
solve :: Integer -> Integer
solve 0 = 1
solve 1 = 1
solve n = (2*(solve (n-1)) + (fact n) - (fact (n-1))) `mod` 998244353
where fact n = foldl (\s -> \t -> s*(snd t + 1)) 1 (Map.toList . factorization $ n)
--the number of divisors of a number
--copied from Internet,infinite prime list
primes :: [Integer]
primes = 2: 3: sieve (tail primes) [5,7..]
where
sieve (p:ps) xs = h ++ sieve ps [x | x <- t, x `rem` p /= 0]
where (h,~(_:t)) = span (< p*p) xs
--make factorization of a number
factorization :: Integer -> Map.Map Integer Integer
factorization 1 = Map.fromList []
factorization x = Map.insertWith (+) factor 1 (factorization (x `div` factor))
where factor = head $ filter (\s -> (x `mod` s) == 0) ls
ls = primes
This program failed to solve in the time limit.
So could anyone point me out where did I do wrong and how to fix it?
Or it just impossible to solve this problem using Haskell in time limit?

There are many ways in which your time complexity is not optimal. The most obvious one is a prime finder using trial division instead of, e.g., a sieve. Maybe it's fine because you only compute the primes once, but it does not inspire confidence.
factorization also has at least one glaring problem. Consider factoring a number like 78893012641, whose prime factorization is 280879^2. You will search each prime number up to 280879: expensive, but pretty much unavoidable. However, at this point you divide by 280879 and then try to factorize 280879, starting from 2 and scanning all the small primes again even though you just found out none of them are a factor!
As Li-yao Xia says in a comment, I would also be suspicious of the multiplication of very large Integers before taking their modulus, instead of taking a modulus after each multiplication.

You haven't copied the right piece of code from the "Internet". You should've instead copied primesTMWE for the primes list, but more importantly, primeFactors for the factorization algorithm.
Your foldl based calculation of the number of divisors from a number's factorization is perfectly fine, except perhaps foldl' should be used instead.
Notice that both solve n and solve (n-1) calculate fact (n-1), so better precalculate all of them..... perhaps a better algorithm exists to find the numbers of divisors for all numbers from 1 to n than calculating it for each number separately.
I suspect even with the right algorithms (which I link above) it's going to be tough, time-wise, if you're going to factorize each number independently (O(n) numbers, O(n1/2)) time to factorize each... each prime, at least).
Perhaps the thing to try here is the smallest-factor sieve which can be built in O(n log log n) time as usual with the sieve of Eratosthenes, and once it's built it lets you find the factorization of each number in O(log log n) time (it's the average number of prime factors for a number). It will have to be built up to n though (you can special-case the evens to halve the space requirements of course; or 6-coprimes to save another 1/6th). Probably as an STUArray (that link is an example; better codes can be found here on SO).
The smallest-factor sieve is just like the sieve of Eratosthenes, except it uses the smallest factor, not just a Boolean, as a mark.
To find a number's factorization then we just repeatedly delete by a number's smallest factor, n / sf(n) =: n1, repeating for n1 / sf(n1) =: n2, then n2, etc. until we hit a prime (which is any number which has itself as the smallest factor).
Since you only use those factors to calculate the number's total number of divisors, you can fuse the two calculations together into one joined loop, for extra efficiency.

Non-pointfree style is substantially slower

I have the following, oft-quoted code for calculating the nth Fibonacci number in Haskell:
fibonacci :: Int -> Integer
fibonacci = (map fib [0..] !!)
where fib 0 = 0
fib 1 = 1
fib n = fibonacci (n-2) + fibonacci (n-1)
Using this, I can do calls such as:
ghci> fibonacci 1000
and receive an almost instantaneous answer.
However, if I modify the above code so that it's not in pointfree style, i.e.
fibonacci :: Int -> Integer
fibonacci x = (map fib [0..] !!) x
where fib 0 = 0
fib 1 = 1
fib n = fibonacci (n-2) + fibonacci (n-1)
it is substantially slower. To the extent that a call such as
ghci> fibonacci 1000
hangs.
My understanding was that the above two pieces of code were equivalent, but GHCi begs to differ. Does anyone have an explanation for this behaviour?

To observe the difference, you should probably look at Core. My guess that this boils down to comparing (roughly)
let f = map fib [0..] in \x -> f !! x
to
\x -> let f = map fib [0..] in f !! x
The latter will recompute f from scratch on every invocation. The former does not, effectively caching the same f for each invocation.
It happens that in this specific case, GHC was able to optimize the second into the first, once optimization is enabled.
Note however that GHC does not always perform this transformation, since this is not always an optimization. The cache used by the first is kept in memory forever. This might lead to a waste of memory, depending on the function at hand.

I tried to find it but struck out. I think I have it on my PC at home.
What I read was that functions using fixed point were inherently faster.
There are other reasons for using fixed point. I encountered one in writing this iterative Fibonacci function. I wanted to see how an iterative version would perform then I realized I had no ready way to measure. I am a Haskell neophyte. But here is an iterative version for someone to test.
I could not get this to define unless I used the dot after the first last function.
I could not reduce it further. the [0,1] parameter is fixed and not to be supplied as a parameter value.
Prelude> fib = last . flip take (iterate (\ls -> ls ++ [last ls + last (init ls)]) [0,1])
Prelude> fib 25
[0,1,1,2,3,5,8,13,21,34,55,89,144,233,377,610,987,1597,2584,4181,6765,10946,17711,28657,46368,75025]

Using non-deterministic list monad to find long Collatz sequences

I wrote the following code to solve Project Euler's No. 14:
The following iterative (Collatz) sequence is defined for the set of positive integers:
n → n/2 (n is even)
n → 3n + 1 (n is odd)
Q: Which starting number, under one million, produces the longest chain?
And my code:
collatz :: Integer -> [Integer]
collatz 1 = [1]
collatz n =
filter (< 1000000) prev >>= poss
where prev = collatz (n - 1)
poss :: Integer -> [Integer]
poss prev
| even prev && prev `mod` 3 == 1 && (prev - 1) `div` 3 > 1 = [2 * prev, (prev - 1) `div` 3]
| otherwise = [2 * prev]
Where collatz n returns a list of numbers that will generate a Collatz chain of length n. The problem is, I can only either not restrict the result or restrict the whole chain, instead of only the seed number, to be under 1000,000. Is it possible to use this model to solve the problem at all?

I think that this approach - while interesting - is fundamentally doomed. Suppose I discover that all the seeds which result in a chain of length 500 are above 2,000,000. How can I know that I won't find that in three more steps there's a seed under 1,000,000 that gets me there? I see no way to know when you're done.
The only viable approach I see to this problem is to compute the collatz length for every number from 1 to 999,999 and then do something like:
main :: IO ()
main = do
let collatzMax = maximumBy (compare `on` collatzLength) [1..999999]
print collatzMax
On the other hand, this provides a great opportunity to learn about CAFs since the function collatzLength could be naively defined as:
collatzLength 1 = 1
collatzLength n | n `mod` 2 == 0 = 1 + collatzLength (n `div` 2)
collatzLength n = 1 + collatzLength (3 * n + 1)
And that kind of recursion screams out for a CAF.
Sure, there are memoization modules that will go and build the CAF for you, but building one yourself is a useful exercise. It's a whole little mini-course in lazy infinitely-recursive data structures.
If that defeats you, you can glance at this spoiler of how to use a CAF and then rewrite it using a different data structure. (what about a 10-way tree instead of a binary tree? What about traversing the tree in a different order? Can you remove the call to showIntAtBase?)

Your idea is interesting, although not the most efficient one. It could be worth trying, although it'll be probably memory intensive. Some thoughts:
As some chains can go over 1000000, so you can't just filter out everything less in collatz. You need to keep all the numbers in each pass.
Calling collatz this way is inefficient, as it computes the sets all over again. Making it an infinite list that shares values would be more efficient:
collatz :: [[Integer]]
collatz = [1] : map (>>= poss) collatz
You need to figure out when you're done. For this you'd need to go through the number lists generated by collatz and count how many of them are below 1000000. When you have seen all the numbers below the limit, the last list will contain the numbers with the longest chain.
That said, I'm afraid this approach isn't computationally feasible. In particular, you'll generate exponentially many numbers and exponentially large ones. For example, if the longest chain would be 500, the result of collatz in that step would contain numbers up to 2^500. And as mentioned, there is no way to tell which of these huge numbers might be the one leading to the solution, so you can't just discard them.

Haskell inefficient fibonacci implementation

I am new to haskell and just learning the fun of functional programming. but have run into trouble right away with an implementation of the fibonacci function. Please find the code below.
--fibonacci :: Num -> [Num]
fibonacci 1 = [1]
fibonacci 2 = [1,1]
--fibonacci 3 = [2]
--fibonacci n = fibonacci n-1
fibonacci n = fibonacci (n-1) ++ [last(fibonacci (n-1)) + last(fibonacci (n-2))]
Rather awkward, I know. I can't find time to look up and write a better one. Though I wonder what makes this so inefficient. I know I should look it up, just hoping someone would feel the need to be pedagogic and spare me the effort.

orangegoat's answer and Sec Oe's answer contain a link to probably the best place to learn how to properly write the fibonacci sequence in Haskell, but here's some reasons why your code is inefficient (note, your code is not that different from the classic naive definition. Elegant? Sure. Efficient? Goodness, no):
Let's consider what happens when you call
fibonacci 5
That expands into
(fibonacci 4) ++ [(last (fibonacci 4)) + (last (fibonacci 3))]
In addition to concatenating two lists together with ++, we can already see that one place we're being inefficient is that we calculate fibonacci 4 twice (the two places we called fibonacci (n-1). But it gets worst.
Everywhere it says fibonacci 4, that expands into
(fibonacci 3) ++ [(last (fibonacci 3)) + (last (fibonacci 2))]
And everywhere it says fibonacci 3, that expands into
(fibonacci 2) ++ [(last (fibonacci 2)) + (last (fibonacci 1))]
Clearly, this naive definition has a lot of repeated computations, and it only gets worse when n gets bigger and bigger (say, 1000). fibonacci is not a list, it just returns lists, so it isn't going to magically memoize the results of the previous computations.
Additionally, by using last, you have to navigate through the list to get its last element, which adds on top of the problems with this recursive definition (remember, lists in Haskell don't support constant time random access--they aren't dynamic arrays, they are linked lists).
One example of a recursive definition (from the links mentioned) that does keep down on the computations is this:
fibs = 0 : 1 : zipWith (+) fibs (tail fibs)
Here, fibs is actually a list, and we can take advantage of Haskell's lazy evaluation to generate fibs and tail fibs as needed, while the previous computations are still stored inside of fibs. And to get the first five numbers, it's as simple as:
take 5 fibs -- [0,1,1,2,3]
(Optionally, you can replace the first 0 with a 1 if you want the sequence to start at 1).

All the ways to implement the fibonacci sequence in Haskell just follow the link
http://www.haskell.org/haskellwiki/The_Fibonacci_sequence

This implementation is inefficient because it makes three recursive calls. If we were to write a recurrence relation for computing fibonacci n to a normal form (note, pedantic readers: not whnf), it would look like:
T(1) = c
T(2) = c'
T(n) = T(n-1) + T(n-1) + T(n-2) + c''
(Here c, c', and c'' are some constants that we don't know.) Here's a recurrence which is smaller:
S(1) = min(c, c')
S(n) = 2 * S(n-1)
...but this recurrence has a nice easy closed form, namely S(n) = min(c, c') * 2^(n-1): it's exponential! Bad news.
I like the general idea of your implementation (that is, track the second-to-last and last terms of the sequence together), but you fell down by recursively calling fibonacci multiple times, when that's totally unnecessary. Here's a version that fixes that mistake:
fibonacci 1 = [1]
fibonacci 2 = [1,1]
fibonacci n = case fibonacci (n-1) of
all#(last:secondLast:_) -> (last + secondLast) : all
This version should be significantly faster. As an optimization, it produces the list in reverse order, but the most important optimization here was making only one recursive call, not building the list efficiently.

So even if you wouldn't know about the more efficient ways, how could you improve your solution?
First, looking at the signature it seems you don't want an infinite list, but a list of a given length. That's fine, the infinite stuff might be too crazy for you right now.
The second observation is that you need to access the end of the list quite often in your version, which is bad. So here is a trick which is often useful when working with lists: Write a version that work backwards:
fibRev 0 = []
fibRev 1 = [1]
fibRev 2 = [1,1]
fibRev n = let zs#(x:y:_) = fibRev (n-1) in (x+y) : zs
Here is how the last case works: We get the list which is one element shorter and call it zs. At the same time we match against the pattern (x:y:_) (this use of # is called an as-pattern). This gives us the first two elements of that list. To calculate the next value of the sequence, we have just to add these elements. We just put the sum (x+y) in front of the list zs we already got.
Now we have the fibonacci list, but it is backwards. No problem, just use reverse:
fibonacci :: Int -> [Int]
fibonacci n = reverse (fibRev n)
The reverse function isn't that expensive, and we call it here only one time.

Haskell script running out of space

I'm using project Euler to teach myself Haskell, and I'm having some trouble reasoning about how my code is being executed by haskell. The second problem has me computing the sum of all even Fibonacci numbers up to 4 million. My script looks like this:
fibs :: [Integer]
fibs = 1 : 2 : [ a+b | (a,b) <- zip fibs (tail fibs)]
evens :: Integer -> Integer -> Integer
evens x sum | (even x) = x + sum
| otherwise = sum
main = do
print (foldr evens 0 (take 4000000 fibs))
Hugs gives the error "Garbage collection fails to reclaim sufficient space", which I assume means that the list entries are not released as they are consumed by foldr.
What do I need to do to fix this? I tried writing a tail-recursive (I think) version that used accumulators, but couldn't get that to work either.

Firstly, you shouldn't use hugs. It is a toy for teaching purposes only.
GHC, however, is a fast, multicore-ready optimizing compiler for Haskell. Get it here. In particular, it does strictness analysis, and compiles to native code.
The main thing that stands out about your code is the use of foldr on a very large list. Probably you want a tail recursive loop. Like so:
import Data.List
fibs = 0 : 1 : zipWith (+) fibs (tail fibs)
evens x sum | even x = x + sum
| otherwise = sum
-- sum of even fibs in first 4M fibs
main = print (foldl' evens 0 (take 4000000 fibs))
Besides all this, the first 4M even fibs will use a fair amount of space, so it'll take a while.
Here's the sum of the first 400k even fibs, to save you some time (21s). :-)

A number of observations / hints:
the x + sums from even aren't getting evaluated until the very end
You're taking the first 4,000,000 fibs, not the fibs up to 4,000,000
There is an easier way to do this
Edit in response to comment
I'm not going to tell you what the easier way is, since that's the fun of Project Euler problems. But I will ask you a bunch of questions:
How many even fibs can you have in a row?
How long can you go without an even fib?
If you sum up all the even fibs and all the odd fibs (do this by hand), what do you notice about the sums?

You understood the problem wrong. The actual problem wants you to sum all the even Fibonacci numbers such that the Fibonacci number itself doesn't exceed 4 million (which happens to be only the first 33 Fibonacci numbers).

You are evaluating four million elements of fibs. Those numbers grow exponentially. I don't know how many bytes are required to represent the millionth Fibonacci number; just the one-thousandth Fibonacci number has 211 decimal digits, so that's going to take 22 32-bit words just to hold the digits, never mind whatever overhead gmp imposes. And these grow exponentially.
Exercise: calculuate the amount of memory needed to hold four million Fibonacci numbers.

have a look at the Prelude functions takeWhile, filter, even, and sum
takeWhile (<40) [0..]
filter even $ takeWhile (<40) [0..]
put 'em together:
ans = sum $ filter even $ takeWhile (< 4* 10^6) fibs

Develop Reference

node.js excel linux python-3.x azure haskell apache-spark rust .htaccess string

Haskell dynamic program efficiency - haskell

Related

How to make this Haskell program run faster

Non-pointfree style is substantially slower

Using non-deterministic list monad to find long Collatz sequences

Haskell inefficient fibonacci implementation

Haskell script running out of space

Categories

Resources