After failing to construct my own memoization table, I turned to the memoize package and tried using it to speed up a doubly recursive definition of the Fibonacci sequence:
fib :: Int -> Int
fib 0 = 0
fib 1 = 1
fib n = fib (n-1) + fib (n-2)
I have tried using the memoize function of the class in several ways, but even the construction below seems about as slow (eating 10 seconds at fib 33):
fib :: Int -> Int
fib 0 = 0
fib 1 = 1
fib n = memoize fib (n-1) + memoize fib (n-2)
fib' :: Int -> Int
fib' n = memoize fib n
I have tried distributing memoize in other ways, but the performance doesn't seem to improve.
I know there are other ways to compute this particular problem more efficiently, but for my original problem I would like to make use of the memoize package. So my question is: how does one improve performance with the memoize function from this package?
Obviously, memoisation is only useful if you do it precisely once and then call the memoised version multiple times. In your approach you keep memoising the function over and over again. That's not the idea!
fib' n = memoize fib n
is the right start, yet won't work as desired for a subtle reason: it's not a constant applicative form because it explicitly mentions its argument.
fib' = memoize fib
is correct. Now, for this to actually speed up the fib function you must refer back to the memoised version.
fib n = fib' (n-1) + fib' (n-2)
You see: just adhering to DRY strictly and avoiding as much boilerplate as possible gets you to the correct version!
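Putting those two pieces together, here is a minimal complete sketch of the corrected version (using memoize from Data.Function.Memoize, as in the question):
import Data.Function.Memoize

fib :: Int -> Int
fib 0 = 0
fib 1 = 1
fib n = fib' (n-1) + fib' (n-2)  -- recursive calls go through the memoised version

-- a constant applicative form: the memo table is built once and shared by all calls
fib' :: Int -> Int
fib' = memoize fib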
As others have said, the problem is that you are memoizing the top-level calls to the function, but you are not using that information to avoid recomputing the recursive calls. Let's see how we can make sure that the recursive calls are cached as well.
First, we have the obvious import.
import Data.Function.Memoize
And then we are going to describe the call graph of the function fib. To do that, we write a higher order function which uses its argument instead of a recursive call:
fib_rec :: (Int -> Int) -> Int -> Int
fib_rec f 0 = 0
fib_rec f 1 = 1
fib_rec f n = f (n - 1) + f (n - 2)
Now, what we want is an operator which takes such a higher-order function and somehow "ties the knot", making sure that the recursive calls are indeed the function we are interested in. We could write fix:
fix :: ((a -> b) -> (a -> b)) -> (a -> b)
fix f = f (fix f)
but then we are back to an inefficient solution: we never memoize anything. The alternative solution is to write something that looks like fix but makes sure that memoization happens all over the place. Let's call it memoized_fix:
memoized_fix :: Memoizable a => ((a -> b) -> (a -> b)) -> (a -> b)
memoized_fix = memoize . go
  where go f = f (memoized_fix f)
And now you have your efficient function fib_mem:
fib_mem :: Int -> Int
fib_mem = memoized_fix fib_rec
You don't even have to write memoized_fix yourself, it's part of the memoize package.
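If I recall the name correctly, the package exports it as memoFix, so with fib_rec from above the whole thing collapses to a sketch like:
import Data.Function.Memoize

fib_mem :: Int -> Int
fib_mem = memoFix fib_rec  -- ties the knot and memoizes, like memoized_fix above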
The memoization package won't magically turn your function into a fast version. What it will do is avoid recomputing anything it has already computed:
import Data.Function.Memoize
fib :: Int -> Int
fib 0 = 0
fib 1 = 1
fib n = fib (n-1) + fib (n-2)
fib_mem = memoize fib
val = 40
main = do
  print $ fib val     -- slow
  print $ fib val     -- slow
  print $ fib_mem val -- slow
  print $ fib_mem val -- fast!
What you need is a way to avoid recomputing any value in the recursive calls. A simple way to do that is to compute the Fibonacci sequence as an infinite list and take the nth element:
fibs :: [Int]
fibs = 0:1:(zipWith (+) fibs (tail fibs))
fib_mem n = fibs !! n
A more general technique is to carry a Map Int Int and insert the results inside.
fib :: (Map Int Int) -> Int -> (Int, Map Int Int)
-- implementation left as exercise :-)
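In case the exercise is puzzling, here is one possible way to fill it in (the name fibMap is mine, to avoid clashing with the fib above; the base cases match it):
import Data.Map (Map)
import qualified Data.Map as Map

fibMap :: Map Int Int -> Int -> (Int, Map Int Int)
fibMap m n
  | n < 2 = (n, m)                       -- fib 0 = 0, fib 1 = 1: nothing to cache
  | otherwise = case Map.lookup n m of
      Just v  -> (v, m)                  -- already computed: reuse it
      Nothing ->
        let (a, m1) = fibMap m  (n - 1)
            (b, m2) = fibMap m1 (n - 2)  -- reuse everything learned while computing n-1
            v       = a + b
        in  (v, Map.insert n v m2)

-- e.g. fst (fibMap Map.empty 40)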
I am reading the book https://www.packtpub.com/application-development/haskell-high-performance-programming and trying to figure out the difference between these two functions:
This function does memoize the intermediate numbers:
fib_mem :: Int -> Integer
fib_mem = (map fib [0..] !!)
  where fib 0 = 1
        fib 1 = 1
        fib n = fib_mem (n-2) + fib_mem (n-1)
and this one does not:
fib_mem_arg :: Int -> Integer
fib_mem_arg x = map fib [0..] !! x
  where fib 0 = 1
        fib 1 = 1
        fib n = fib_mem_arg (n-2) + fib_mem_arg (n-1)
The author tries to explain it as follows:
Running fib_mem_arg with anything but very small arguments, one can confirm it does no memoization. Even though we can see that map fib [0..] does not depend on the argument number and could be memoized, it will not be, because applying an argument to a function will create a new expression that cannot implicitly have pointers to expressions from previous function applications.
What does he mean by the part marked in bold (that applying an argument to a function creates a new expression that cannot implicitly have pointers to expressions from previous function applications)? Could someone provide a simple example?
Why is fib_mem a constant applicative form?
Not fib_mem, but (map fib [0..] !!). It is a CAF because it is a partially applied function (!!). As such it is subject to memory retention.
(see also: What are super combinators and constant applicative forms?)
Since the type is monomorphic, it is retained in memory even between calls to fib_mem, in effect as if map fib [0..] had been "floated" to the top level, as if it were defined as
fib_mem_m :: Int -> Integer
fib_mem_m = (the_list !!)
  where fib 0 = 1
        fib 1 = 1
        fib n = (the_list !! (n-2)) + (the_list !! (n-1))
        the_list = map fib [0..]
If the type were polymorphic, the floating to top level wouldn't be possible, but it would still be retained for the duration of each call to fib_mem, as if defined as
fib_mem_p :: Num a => Int -> a
fib_mem_p = (the_list !!)
  where fib 0 = 1
        fib 1 = 1
        fib n = (the_list !! (n-2)) + (the_list !! (n-1))
        the_list = map fib [0..]
To see the difference, evaluate fib_mem_m 10000 twice at the GHCi prompt. The second attempt will take 0 seconds. But fib_mem_p 10000 will take the same amount of time each time it is called. It will still be as fast as the first one, so there is still memoization going on, it's just not retained between calls.
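A possible GHCi session to try this yourself (enable timing with :set +s; the exact numbers will vary, the pattern is what matters):
ghci> :set +s
ghci> fib_mem_m 10000 `seq` ()   -- first call: takes a while, the list gets built
ghci> fib_mem_m 10000 `seq` ()   -- second call: practically instant, the list is retained
ghci> fib_mem_p 10000 `seq` ()   -- fast: memoized within this call
ghci> fib_mem_p 10000 `seq` ()   -- takes the same time again: nothing retained between calls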
With this style of definition, the full application as in fib_mem_arg will be memoized too -- and just as above, not between calls to fib_mem_arg, but only during each call.
fib_mem_arg :: Int -> Integer   -- monomorphic or polymorphic, makes no difference here
fib_mem_arg x = the_list !! x
  where fib 0 = 1
        fib 1 = 1
        fib n = (the_list !! (n-2)) + (the_list !! (n-1))
        the_list = map fib [0..]
I have a map operation (that is actually run in parallel using parMap from Control.Parallel.Strategies) that takes quite a while. Given that I know how many times the function is applied (n in this context), how can I easily display, every once in a while, how many of the n applications have been evaluated?
The obvious solution would be to make the map a mapM with some putStr inside the mapping function, but that would:
cost an unnecessary amount of efficiency
not sample the status every once in a while, but on every single application
basically remove all the good things about a deterministic algorithm in the context of parallelism
So, is there a way to keep track of this information, that I'm missing, that avoids these problems?
In production you probably shouldn't use trace and will be forced to deal with the complications of needing IO, but for tests you could modify the definition of parMap to take another parameter telling it when to emit a count:
import Control.Monad (sequence)
import Control.Parallel.Strategies (Strategy, using, rseq, rparWith, parMap)
import Debug.Trace (traceShow)
import System.IO (hFlush, hSetBuffering, BufferMode(NoBuffering), stdout)
evalList' :: Integer -> Strategy a -> Strategy [a]
evalList' t s as = sequence $ foldr f [] $ zip as [1..]
  where f (a, n) ss | n `mod` t == 0 = s (traceShow n a) : ss
                    | otherwise      = s a : ss
parList' :: Integer -> Strategy a -> Strategy [a]
parList' t s = evalList' t (rparWith s)
parMap' :: Integer -> Strategy b -> (a -> b) -> [a] -> [b]
parMap' t s f xs = map f xs `using` parList' t s
-- some work to do
fib :: Integer -> Integer
fib 0 = 1
fib 1 = 1
fib n = fib (n-1) + fib (n-2)
main = do hSetBuffering stdout NoBuffering
          print $ sum (parMap' 1000 rseq (fib . (+20) . (`mod` 5)) ([1..10000] :: [Integer]))
If the work packages given by each list element become too small, you could adapt parListChunk accordingly instead.
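For example, a chunked variant could look like the sketch below (the helper chunk and the name parListChunk' are mine, mirroring how parListChunk is built; evalList would also need to be added to the Control.Parallel.Strategies import):
chunk :: Int -> [a] -> [[a]]
chunk _ [] = []
chunk n xs = let (as, bs) = splitAt n xs in as : chunk n bs

-- the count is now emitted once per t chunks rather than once per t elements
parListChunk' :: Integer -> Int -> Strategy a -> Strategy [a]
parListChunk' t n strat xs = concat <$> parList' t (evalList strat) (chunk n xs)
It would then be used just like parList' above, e.g. map f xs `using` parListChunk' 10 100 rseq.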
One could try to craft this behaviour using timeout.
import Control.Concurrent (threadDelay)
import Control.Exception (evaluate)
import System.Timeout (timeout)

seconds :: Int
seconds = 1000000

progress :: [a] -> IO ()
progress [] = return ()
progress l@(x:xs) =
  do r <- timeout (5 * seconds) (evaluate x)  -- evaluate turns forcing x into an IO action, so timeout can give it 5s
     threadDelay (2 * seconds)                -- 2s more delay between samples
     case r of
       Nothing -> progress l                  -- not done within the timeout: retry
       Just _  -> do putStrLn "one done!"
                     progress xs
Be careful, since I fear that timeout aborts the computation. If there is another thread that evaluates x, that should be fine, but if this is the only thread doing so, it could cause a livelock if 5 seconds are not enough.
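A rough usage sketch under those assumptions (slowSquare and the example list are mine; progress is the function above): fork the real consumer and let progress poll the very same shared list, so the elements it forces are the ones the consumer needs anyway:
import Control.Concurrent (forkIO)

-- some artificially slow work, purely for demonstration
slowSquare :: Integer -> Integer
slowSquare n = sum (replicate 2000000 1) `seq` n * n

main :: IO ()
main = do
  let results = map slowSquare [1 .. 10]  -- one shared list of lazy elements
  _ <- forkIO (print (sum results))       -- the real consumer, forcing the elements
  progress results                        -- the main thread just reports progress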
There are two functions written in the Haskell wiki website:
Function 1
fib = (map fib' [0 ..] !!)
  where
    fib' 0 = 0
    fib' 1 = 1
    fib' n = fib (n - 1) + fib (n - 2)
Function 2
fib x = map fib' [0 ..] !! x
  where
    fib' 0 = 0
    fib' 1 = 1
    fib' n = fib (n - 1) + fib (n - 2)
What does the "!!" mean?
This is actually more difficult to read than it would seem at first, because operators in Haskell are more generic than in other languages.
The first thing we all want to tell you is to go look this up yourself. If you do not already know about Hoogle, this is the time to become familiar with it. You can ask it what a function does by name, or (and this is even cooler) you can give it the type of a function and it will suggest functions that implement that type.
Here is what Hoogle tells you about this function (operator):
(!!) :: [a] -> Int -> a
List index (subscript) operator, starting from 0. It is an instance of the more general genericIndex, which takes an index of any integral type.
Let us assume that you need help reading this. The first line tells us that (!!) is a function that takes a list of things ([a]) and an Int, then gives you back one of the things in the list (a). The description tells you what it does: it gives you the element of the list indexed by the Int. So xs !! i works like xs[i] would in Java, C or Ruby.
Now we need to talk about how operators work in Haskell. I'm not going to give you the whole story here, but I will at least let you know that there is more going on than what you would encounter in other programming languages. Operators "always" take two arguments and return something (a -> b -> c). You can use them just like a normal function:
add x y
(+) x y -- same as above
But, by default, you can also use them between expressions (the word for this is 'infix'). You can also make a normal function work like an operator with backticks:
x + y
x `add` y -- same as above
What makes the first code example you gave confusing (especially for new Haskell coders) is that the !! operator is used as a function rather than in the typical operator (infix) position. Let me add some bindings so it is clearer:
-- return the ith Fibonacci number
fib :: Int -> Int   -- (actually more general than this, but don't worry about it)
fib i = fibs !! i
  where
    fibs :: [Int]
    fibs = map fib' [0 ..]

    fib' :: Int -> Int
    fib' 0 = 0
    fib' 1 = 1
    fib' n = fib (n - 1) + fib (n - 2)
You can now work your way back to example 1. Make sure you understand what map fib' [0 ..] means.
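To get a feel for it, here is a possible GHCi check. map fib' [0 ..] is the infinite list of Fibonacci numbers, and since fib just indexes into that list, map fib [0 ..] has exactly the same values:
ghci> take 10 (map fib [0 ..])
[0,1,1,2,3,5,8,13,21,34]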
I'm sorry your question got down-voted, because if you understood what was going on the answer would have been easy to look up, but if you don't know about operators as they exist in Haskell it is very hard to mentally parse the code above.
(!!) :: [a] -> Int -> a
List index (subscript) operator, starting from 0. It is an instance of the more general genericIndex, which takes an index of any integral type.
See here: http://www.haskell.org/ghc/docs/latest/html/libraries/base/Data-List.html#g:16
After reading a memoization introduction I reimplemented the Fibonacci example by using a more general memoize function (only for learning purposes):
memoizer :: (Int -> Integer) -> Int -> Integer
memoizer f = (map f [0 ..] !!)
memoized_fib :: Int -> Integer
memoized_fib = memoizer fib
  where fib 0 = 0
        fib 1 = 1
        fib n = memoized_fib (n-2) + memoized_fib (n-1)
This works, but when I just change the last line to the following code, memoization suddenly does not work as I expected (the program becomes slow again):
fib n = memoizer fib (n-2) + memoizer fib (n-1)
Where is the crucial difference with respect to memoization?
It is about explicit vs. implicit sharing. When you explicitly name a thing, it naturally can be shared, i.e. exist as a separate entity in memory, and be reused. (Of course sharing is not part of the language per se; we can only nudge the compiler ever so slightly towards sharing certain things.)
But when you write the same expression twice or thrice, you rely on the compiler to replace the common sub-expressions with one explicitly shared entity. That might or might not happen.
Your first variant is equivalent to
memoized_fib :: Int -> Integer
memoized_fib = (map fib [0 ..] !!) where
    fib 0 = 0
    fib 1 = 1
    fib n = memoized_fib (n-2) + memoized_fib (n-1)
Here you specifically name an entity, and refer to it by that name. But that is a function. To make the reuse even more certain, we can name the actual list of values that gets shared here, explicitly:
memoized_fib :: Int -> Integer
memoized_fib = (fibs !!) where
    fibs = map fib [0 ..]
    fib 0 = 0
    fib 1 = 1
    fib n = memoized_fib (n-2) + memoized_fib (n-1)
The last line can be made yet more visually apparent, with explicit reference to the actual entity which is shared here - the list fibs which we just named in the step above:
fib n = fibs !! (n-2) + fibs !! (n-1)
Your second variant is equivalent to this:
memoized_fib :: Int -> Integer
memoized_fib = (map fib [0 ..] !!) where
    fib 0 = 0
    fib 1 = 1
    fib n = (map fib [0 ..] !!) (n-2) + (map fib [0 ..] !!) (n-1)
Here we have three seemingly independent map expressions, which might or might not get shared by a compiler. Compiling it with ghc -O2 seems to reintroduce sharing, and with it the speed.
memoized_fib = ... is a simple top-level definition. It can be read as a constant lazy value (no additional arguments need to be bound before it can be expanded). That is the "source" of your memoized values.
When you write memoizer fib (n-2), each use creates a new source of values that has no relation to memoized_fib, so nothing is reused. You also put a lot of load on the GC here, since the second variant produces many separate map fib [0 ..] lists.
Consider also a simpler example:
f = \n -> sq !! n where sq = [x*x | x <- [0 ..]]
g n = sq !! n where sq = [x*x | x <- [0 ..]]
The first will generate a single f and the sq associated with it, because there is no n at the head of the declaration. The second will produce a fresh list for each different value of n passed to g and index into it to get the value, so nothing is shared between calls.
I am trying to understand the Haskell approach to memoization, but I don't get how it works:
memoized_fib :: Int -> Integer
memoized_fib = (map fib [0..] !!)
  where fib 0 = 0
        fib 1 = 1
        fib n = memoized_fib (n - 2) + memoized_fib (n - 1)
First of all, I don't even understand why the map function appears to get three parameters (the function fib, the list [0..], and !!) rather than the two it should take.
Updated:
I have tried to rewrite the code, but I get a different result:
f' :: (Int -> Int) -> Int -> Int
f' mf 0 = 0
f' mf 1 = 1
f' mf n = mf(n - 2) + mf(n - 1)
f'_list :: [Int]
f'_list = map (f' faster_f') [0..]
faster_f' :: Int -> Int
faster_f' n = f'_list !! n
Why? Is there any error in my reasoning?
First: Haskell supports operator sections. So (+ 2) is equal to \ x -> x + 2. This means the expression with map is equal to \ x -> map fib [0..] !! x.
Secondly, how this works: this is taking advantage of Haskell's call-by-need evaluation strategy (its laziness).
Initially, the list which results from the map is not evaluated. Then, when you need to get to some particular index, all the elements up to that point get evaluated. However, once an element is evaluated, it does not get evaluated again (as long as you're referring to the same element). This is what gives you memoization.
Basically, Haskell's lazy evaluation strategy involves memoizing forced values. This memoized fib function just relies on that behavior.
Here "forcing" a value means evaluating a deferred expression called a thunk. So the list is basically initially stored as a "promise" of a list, and forcing it turns that "promise" into an actual value, and a "promise" for more values. The "promises" are just thunks, but I hope calling it a promise makes more sense.
I'm simplifying a bit, but this should clarify how the actual memoization works.
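To see that behaviour directly, here is a small standalone sketch (the names xs and expensive are mine) that uses Debug.Trace to announce when an element is actually forced:
import Debug.Trace (trace)

-- every element of xs is a thunk that reports when it gets evaluated
xs :: [Int]
xs = map expensive [0 ..]
  where expensive n = trace ("computing " ++ show n) (n * n)

main :: IO ()
main = do
  print (xs !! 5)   -- traces "computing 5" once, then prints 25
  print (xs !! 5)   -- no trace this time: element 5 is already evaluated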
map does not take three parameters here.
(map fib [0..] !!)
partially applies (slices) the function (!!) with map fib [0..], a list, as its first (left-hand) argument.
Maybe it's clearer written it as:
memoized_fib n = (map fib [0..]) !! n
so it's just taking the nth element from the list, and the list is evaluated lazily.
This operator section stuff is exactly the same as normal partial application, but for infix operators. In fact, if we write the same form with a regular function instead of the !! infix operator, see how it looks:
import Data.List (genericIndex)
memoized_fib :: Int -> Integer
memoized_fib = genericIndex (map fib [0..])
  where fib 0 = 0
        fib 1 = 1
        fib n = memoized_fib (n - 2) + memoized_fib (n - 1)