I am reading the book https://www.packtpub.com/application-development/haskell-high-performance-programming and trying to figure out, what is the difference between those two functions:
This functions does memoize the intermediate numbers:
fib_mem :: Int -> Integer
fib_mem = (map fib [0..] !!)
where fib 0 = 1
fib 1 = 1
fib n = fib_mem (n-2) + fib_mem (n-1)
and this not:
fib_mem_arg :: Int -> Integer
fib_mem_arg x = map fib [0..] !! x
where fib 0 = 1
fib 1 = 1
fib n = fib_mem_arg (n-2) + fib_mem_arg (n-1)
The author tries to explain as following:
Running fib_mem_arg with anything but very small arguments, one can
confirm it does no memoization. Even though we can see that map fib
[0..] does not depend on the argument number and could be memorized,
it will not be, because applying an argument to a function will create
a new expression that cannot implicitly have pointers to expressions
from previous function applications.
What does he mean with the sentence, that is bold marked? Could someone provide me a simple example?
Why fib_mem is a constant applicative form?
Why fib_mem is a constant applicative form?
Not fib_mem, but (map fib [0..] !!). It is a CAF because it is a partially applied function (!!). As such it is subject to memory retention.
(see also: What are super combinators and constant applicative forms?)
Since the type is monomorphic, it is retained in memory even between calls to fib_mem, in effect as if having map fib [0..] "floated" to the top level, as if defined as
fib_mem_m :: Int -> Integer
fib_mem_m = (the_list !!)
where fib 0 = 1
fib 1 = 1
fib n = (the_list !! (n-2)) + (the_list !! (n-1))
the_list = map fib [0..]
If the type were polymorphic, the floating to top level wouldn't be possible, but it would still be retained for the duration of each call to fib_mem, as if defined as
fib_mem_p :: Num a => Int -> a
fib_mem_p = (the_list !!)
where fib 0 = 1
fib 1 = 1
fib n = (the_list !! (n-2)) + (the_list !! (n-1))
the_list = map fib [0..]
To see the difference, evaluate fib_mem_m 10000 twice, at the GHCi propt. The second attempt will take 0 seconds. But fib_mem_p 10000 will take same amount of time each time it is called. It will still be as fast as the first one, so there is still memoization going on there, it's just not retained between calls.
With this style of definition, the full application as in fib_mem_arg will too be memoized -- and just as the one above, not between the calls to fib_mem_arg, but only during each call.
fib_mem_arg :: Num a => Int -> Integer -- or polymorphic, makes no difference
fib_mem_arg x = the_list !! x
where fib 0 = 1
fib 1 = 1
fib n = (the_list !! (n-2)) + (the_list !! (n-1))
the_list = map fib [0..]
After failing to construct my own memoization table I turned to said class and tried using it to speed up a double recursive definition of the Fibonacci sequence:
fib :: Int -> Int
fib 0 = 0
fib 1 = 1
fib n = fib (n-1) + fib (n-2)
I have tried using the memoize function of the class in several ways, but even the construction below seems about as slow (eating 10 seconds at fib 33):
fib :: Int -> Int
fib 0 = 0
fib 1 = 1
fib n = memoize fib (n-1) + memoize fib (n-2)
fib' :: Int -> Int
fib' n = memoize fib n
I have tried distributing memoize in other ways, but the performance doesn't seem to improve.
I know there are other ways to have this problem in particular computed more efficiently, but for my original problem I would like to make use of the Memoize package. So my question is, how does one improve performance with the memoize function in this package?
Obviously memoisation is only useful if you do it precisely once, and then call to that multiple times. Whereas in your approach you keep memoising the function over and over again. That's not the idea!
fib' n = memoize fib n
is the right start, yet won't work as desired for a subtle reason: it's not a constant applicative form because it explicitly mentions its argument.
fib' = memoize fib
is correct. Now, for this to actually speed up the fib function you must refer back to the memoised version.
fib n = fib' (n-1) + fib' (n-2)
You see: just adhering to DRY strictly and avoiding as much boilerplate as possible gets you to the correct version!
As other said, the problem is that you are memoizing the top-level calls to the function but you are not using that information to avoid recomputing recursive calls. Let's see how we could make sure that the recursive calls are also cached.
First, we have the obvious import.
import Data.Function.Memoize
And then we are going to describe the call graph of the function fib. To do that, we write a higher order function which uses its argument instead of a recursive call:
fib_rec :: (Int -> Int) -> Int -> Int
fib_rec f 0 = 0
fib_rec f 1 = 1
fib_rec f n = f (n - 1) + f (n - 2)
Now, what we want is an operator which takes such an higher order function and somehow "ties the knot" making sure that the recursive calls are indeed the function we are interested in. We could write fix:
fix :: ((a -> b) -> (a -> b)) -> (a -> b)
fix f = f (fix f)
but then we are back to an inefficient solution: we never memoize anything. The alternative solution is to write something that looks like fix but makes sure that memoization happens all over the place. Let's call it memoized_fix:
memoized_fix :: Memoizable a => ((a -> b) -> (a -> b)) -> (a -> b)
memoized_fix = memoize . go
where go f = f (memoized_fix f)
And now you have your efficient function fib_mem:
fib_mem :: Int -> Int
fib_mem = memoized_fix fib_rec
You don't even have to write memoized_fix yourself, it's part of the memoize package.
The memoization package won't turn your function magically in a fast version. What will it do is avoid recomputing any old computation:
import Data.Function.Memoize
fib :: Int -> Int
fib 0 = 0
fib 1 = 1
fib n = fib (n-1) + fib (n-2)
fib_mem = memoize fib
val = 40
main = do
print $ fib val -- slow
print $ fib val -- slow
print $ fib_mem val -- slow
print $ fib_mem val -- fast!
what you need is a way to avoid recomputing any value in the recursive calls. A simple way to do that would be to compute the fibonacci sequence as an infinite list, an take the nth element:
fibs :: [Int]
fibs = 0:1:(zipWith (+) fibs (tail fibs))
fib_mem n = fibs !! n
A more general technique is to carry a Map Int Int and insert the results inside.
fib :: (Map Int Int) -> Int -> (Int, Map Int Int)
-- implementation left as exercise :-)
There are two functions written in the Haskell wiki website:
Function 1
fib = (map fib' [0 ..] !!)
where
fib' 0 = 0
fib' 1 = 1
fib' n = fib (n - 1) + fib (n - 2)
Function 2
fib x = map fib' [0 ..] !! x
where
fib' 0 = 0
fib' 1 = 1
fib' n = fib (n - 1) + fib (n - 2)
What does the "!!" mean?
This is actually more difficult to read then it would seem at first as operators in haskell are more generic then in other languages.
The first thing that we are all thinking of telling you is to go look this up yourself. If you do not already know about hoogle this is the time to become familiar with it. You can ask it either to tell you what a function does by name or (and this is even more cool) you can give it the type of a function and it can offer suggestions on which function implement that type.
Here is what hoogle tells you about this function (operator):
(!!) :: [a] -> Int -> a
List index (subscript) operator, starting from 0. It is an
instance of the more general genericIndex, which takes an index
of any integral type.
Let us assume that you need help reading this. The first line tell us that (!!) is a function that takes a list of things ([a]) and an Int then gives you back one of the thing in the list (a). The descriptions tells you what it does. It will give you the element of the list indexed by the Int. So, xs !! i works like xs[i] would in Java, C or Ruby.
Now we need to talk about how operators work in haskell. I'm not going to give you the whole thing here, but I will at least let you know that there is something more here then what you would encounter in other programming languages. Operators "always" take two arguments and return something (a -> b -> c). You can use them just like a normal function:
add x y
(+) x y -- same as above
But, by default, you can also use them between expression (the word for this is 'infix'). You can also make a normal function work like an operator with backtics:
x + y
x `add` y -- same as above
What makes the first code example you gave (especially for new haskell coders) is that the !! operator is used as a function rather then in a typical operator (infix) position. Let me add some binding so it is clearer:
-- return the ith Fibonacci number
fib :: Int -> Int -- (actually more general than this but do't worry about it)
fib i = fibs !! i
where
fibs :: [Int]
fibs = map fib' [0 ..]
fib' :: Int -> Int
fib' 0 = 0
fib' 1 = 1
fib' n = fib (n - 1) + fib (n - 2)
You can now work your way back to example 1. Make sure you understand what map fib' [0 ..] means.
I'm sorry your question got down-voted because if you understood what was going on the answer would have been easy to look up, but if you don't know about operators as the exist in haskell it is very hard to mentally parse the code above.
(!!) :: [a] -> Int -> a
List index (subscript) operator, starting from 0. It is an instance of the more general genericIndex, which takes an index of any integral type.
See here: http://www.haskell.org/ghc/docs/latest/html/libraries/base/Data-List.html#g:16
After reading a memoization introduction I reimplemented the Fibonacci example by using a more general memoize function (only for learning purposes):
memoizer :: (Int -> Integer) -> Int -> Integer
memoizer f = (map f [0 ..] !!)
memoized_fib :: Int -> Integer
memoized_fib = memoizer fib
where fib 0 = 0
fib 1 = 1
fib n = memoized_fib (n-2) + memoized_fib (n-1)
This works, but when I just change the last line to the following code, memoization suddenly does not work as I expected (the program becomes slow again):
fib n = memoizer fib (n-2) + memoizer fib (n-1)
Where is the crucial difference w.r.t. to memoization?
It is about explicit vs. implicit sharing. When you explicitly name a thing, it naturally can be shared, i.e. exist as separate entity in memory, and reused. (Of course sharing is not part of the language per se, we can only nudge the compiler ever so slightly towards sharing certain things).
But when you write same expression twice or thrice, you rely on compiler to replace the common sub-expressions with one explicitly shared entity. That might or might not happen.
Your first variant is equivalent to
memoized_fib :: Int -> Integer
memoized_fib = (map fib [0 ..] !!) where
fib 0 = 0
fib 1 = 1
fib n = memoized_fib (n-2) + memoized_fib (n-1)
Here you specifically name an entity, and refer to it by that name. But that is a function. To make the reuse even more certain, we can name the actual list of values that gets shared here, explicitly:
memoized_fib :: Int -> Integer
memoized_fib = (fibs !!) where
fibs = map fib [0 ..]
fib 0 = 0
fib 1 = 1
fib n = memoized_fib (n-2) + memoized_fib (n-1)
The last line can be made yet more visually apparent, with explicit reference to the actual entity which is shared here - the list fibs which we just named in the step above:
fib n = fibs !! (n-2) + fibs !! (n-1)
Your second variant is equivalent to this:
memoized_fib :: Int -> Integer
memoized_fib = (map fib [0 ..] !!) where
fib 0 = 0
fib 1 = 1
fib n = (map fib [0 ..] !!) (n-2) + (map fib [0 ..] !!) (n-1)
Here we have three seemingly independent map expressions, which might or might not get shared by a compiler. Compiling it with ghc -O2 seems to reintroduce sharing, and with it the speed.
momoized_fib = ... - that's top-level simple definition. it might be read as a constant lazy value (without any additional arguments required to be bound before expanding it. That's kinda "source" of your memoized values.
When you use (memoizer fib) (n-2) creates new source of values which have no relation with memoized_fib and thus it isn't reused. Actually you move a lot of load on GC here since you produce a lot (map fib [0 ..]) sequences in second variant.
Consider also more simple example:
f = \n -> sq !! n where sq = [x*x | x <- [0 ..]]
g n = sq !! n where sq = [x*x | x <- [0 ..]]
First will generate single f and associated with it sq because there is no n in head of declaration. Second will produce family of lists for each different value of f n and move over it (without bounding down to actual values) to get value.
I'm currently stuck on setting upper limits in list comprehensions.
What I'm trying to do is to find all Fibonacci numbers below one million.
For this I had designed a rather simple recursive Fibonacci function
fib :: Int -> Integer
fib n
n == 0 = 0
n == 1 = 1
otherwise = fib (n-1) + fib (n-2)
The thing where I'm stuck on is defining the one million part. What I've got now is:
[ fib x | x <- [0..35], fib x < 1000000 ]
This because I know that the 35th number in the Fibonacci sequence is a high enough number.
However, what I'd like to have is to find that limit via a function and set it that way.
[ fib x | x <- [0..], fib x < 1000000 ]
This does give me the numbers, but it simply doesn't stop. It results in Haskell trying to find Fibonacci numbers below one million further in the sequence, which is rather fruitless.
Could anyone help me out with this? It'd be much appreciated!
The check fib x < 1000000 in the list comprehension filters away the fib x values that are less than 1000000; but the list comprehension has no way of knowing that greater values of x imply greater value of fib x and hence must continue until all x have been checked.
Use takeWhile instead:
takeWhile (< 1000000) [ fib x | x <- [0..35]]
A list comprehension is guaranteed to look at every element of the list. You want takeWhile :: (a -> Bool) -> [a] -> [a]. With it, your list is simply takeWhile (< 1000000) $ map fib [1..]. The takeWhile function simply returns the leading portion of the list which satisfies the given predicate; there's also a similar dropWhile function which drops the leading portion of the list which satisfies the given predicate, as well as span :: (a -> Bool) -> [a] -> ([a], [a]), which is just (takeWhile p xs, dropWhile p xs), and the similar break, which breaks the list in two when the predicate is true (and is equivalent to span (not . p). Thus, for instance:
takeWhile (< 3) [1,2,3,4,5,4,3,2,1] == [1,2]
dropWhile (< 3) [1,2,3,4,5,4,3,2,1] == [3,4,5,4,3,2,1]
span (< 3) [1,2,3,4,5,4,3,2,1] == ([1,2],[3,4,5,4,3,2,1])
break (> 3) [1,2,3,4,5,4,3,2,1] == ([1,2,3],[4,5,4,3,2,1])
It should be mentioned that for such a task the "canonical" (and faster) way is to define the numbers as an infinite stream, e.g.
fibs = 0 : 1 : zipWith (+) fibs (tail fibs)
takeWhile (<100) fibs
--[0,1,1,2,3,5,8,13,21,34,55,89]
The recursive definition may look scary (or even "magic") at first, but if you "think lazy", it will make sense.
A "loopy" (and in a sense more "imperative") way to define such an infinite list is:
fibs = map fst $ iterate (\(a,b) -> (b,a+b)) (0,1)
[Edit]
For an efficient direct calculation (without infinite list) you can use matrix multiplication:
fib n = second $ (0,1,1,1) ** n where
p ** 0 = (1,0,0,1)
p ** 1 = p
p ** n | even n = (p `x` p) ** (n `div` 2)
| otherwise = p `x` (p ** (n-1))
(a,b,c,d) `x` (q,r,s,t) = (a*q+b*s, a*r+b*t,c*q+d*s,c*r+d*t)
second (_,f,_,_) = f
(That was really fun to write, but I'm always grateful for suggestions)
The simplest thing I can think of is:
[ fib x | x <- [1..1000000] ]
Since fib n > n for all n > 3.