Partial application versus pattern matching: why do these Haskell functions behave differently? - haskell

I'm trying to understand something about Haskell functions.
First, here is a Fibonacci function defined in the typical "slow" way (i.e. recursive with no memoization, and no infinite-list tricks)
slowfib :: Int -> Integer
slowfib 0 = 0
slowfib 1 = 1
slowfib n = slowfib (n-2) + slowfib (n-1)
Next, a canonical memoizing version of the same. (Only slightly different from typical examples in tutorials/books/etc, because I prefer the prefix version of the !! operator.)
memfib = (!!) (map fib [0..])
  where
    fib 0 = 0
    fib 1 = 1
    fib k = memfib(k-2) + memfib(k-1)
The above solution uses partial application of the !! operator, which makes sense: we want memfib to end up as a function that takes a parameter, and we are defining it without including a parameter in the definition.
So far so good. Now, I thought I could write an equivalent memoizing function that does include a parameter in the definition, so I did this:
memfib_wparam n = ((!!) (map fib [0..])) n
  where
    fib 0 = 0
    fib 1 = 1
    fib k = memfib_wparam(k-2) + memfib_wparam(k-1)
(In lambda calculus terms, memfib and memfib_wparam are just eta-conversions of each other. I think???)
This works, but the memoization is gone. In fact, memfib_wparam behaves even worse than slowfib: not only is it slower, but its memory usage is more than double.
*Main> slowfib 30
832040
(1.81 secs, 921,581,768 bytes)
*Main> memfib 30
832040
(0.00 secs, 76,624 bytes)
*Main> memfib_wparam 30
832040
(2.01 secs, 2,498,274,008 bytes)
What's going on here? More importantly, what is my broader understanding of Haskell function definitions getting wrong? I was assuming the syntax I used in memfib_wparam was just syntactic sugar for what I did in memfib, but clearly it isn't.

The difference is in when your fib function is bound.
where-bound definitions have access to the outer function's parameters (i.e. the parameters are "in scope" within where). This means that fib should have access to n, which in turn means that fib is defined after n is passed, which means it's a different fib for every n, which means it's a different call to map fib [0..] for every n.
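To make the scoping concrete, here is a rough sketch of how the where-clause of memfib_wparam can be read once the parameter is written as an explicit lambda. The name memfibDesugared is hypothetical, and this is a hand-written approximation of the scoping, not actual compiler output.

```haskell
-- Hand-desugared reading of memfib_wparam (hypothetical name, not GHC output).
-- The where-bindings scope over n, so they end up inside the lambda and
-- are set up again on every call, including the list built by map.
memfibDesugared :: Int -> Integer
memfibDesugared = \n ->
  let fib 0 = 0
      fib 1 = 1
      fib k = memfibDesugared (k - 2) + memfibDesugared (k - 1)
  in map fib [0 ..] !! n
```

It returns the same values as memfib; only the amount of repeated work differs.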
If you wanted to eta-expand your memfib, this would be the "right" way to do it (i.e. without unduly expanding the scope of n):
memfib = \n -> theCache !! n
  where
    theCache = map fib [0..]
    fib 0 = 0
    fib 1 = 1
    fib k = memfib(k-2) + memfib(k-1)
If you're comparing with lambda calculus, the key difference is that eta-reduction/expansion doesn't say anything about performance, it just guarantees that the result of the program stays the same logically. Which it does.

Related

How does haskell manage memory of recursive function calls

I have been working on a problem that benefits a lot from caching the results of my functions, and in my research I came across this article. I am stunned at how simple the core of the "Memoization with recursion" section is, namely:
memoized_fib :: Int -> Integer
memoized_fib = (map fib [0 ..] !!)
  where fib 0 = 0
        fib 1 = 1
        fib n = memoized_fib (n-2) + memoized_fib (n-1)
I feel like I understand how it works, but do correct me if I'm wrong: this function saves a list which is populated using the same function.
What bothers me is that I don't understand why this works. Originally I was under the impression that once Haskell evaluates a function it releases the memory that was used to store variables inside this function, but here it seems that if part of the list was evaluated by one call of this function, those values are still available to another call of the same function.
Just typing this up makes my head hurt, because I don't understand why a value used in the calculation of fib 2 should be available in the calculation of fib 3 or, better yet, fib 100.
My gut feeling tells me that this behavior has two problems (I'm probably wrong, but again not sure why):
purity of this function: we are evaluating one call using a variable that did not arrive through the parameters of this function
memory leaks: I am no longer sure when Haskell will release the memory used by this list
I think it's easier to understand if you compare your definition to this:
not_memoized_fib :: Int -> Integer
not_memoized_fib m = map fib [0 ..] !! m
  where fib 0 = 0
        fib 1 = 1
        fib n = not_memoized_fib (n-2) + not_memoized_fib (n-1)
The definition above is essentially the same as yours, except that it takes an explicit argument m. It is a so-called eta-expansion of the previous function, and is semantically equivalent to it. Yet, operationally, this has drastically worse performance, since memoization here does not take place.
Why? Well, your function defines the list map fib [0..] before taking the (implicit) input parameter m, so there is only one list around, for all m we may pass later on as arguments. Instead, in not_memoized_fib we first take m as input, and then define the list, making the function create a list for every call to not_memoized_fib, destroying performance.
It is even easier to see if we use let and lambdas instead of where. Compare
memoized :: Int -> Integer
memoized = let
       list = map fib [0..]
       fib 0 = 0
       fib 1 = 1
       fib n = memoized (n-1) + memoized (n-2)
     in \m -> list !! m
--      ^^ here we take m, after everything is defined,
with its let over lambda (*) code structure, to
not_memoized :: Int -> Integer
not_memoized = \m -> let
--             ^^ here we take m, before everything is defined, so
--                we define local bindings {list,fib} at every call
       list = map fib [0..]
       fib 0 = 0
       fib 1 = 1
       fib n = not_memoized (n-1) + not_memoized (n-2)
     in list !! m
with the let inside the lambda.
In the former case, there is only one list around, while in the latter there is one list for each call.
(*) a searchable term.
The list defined by map fib [0..] is defined as part of the definition of the function, rather than being created each time the function is called. Due to laziness, though, the list is only "realized" as necessary for any given call.
Say your first call is memoized_fib 10. This will cause the list elements at indices 0 through 10 to actually be computed and stored in memory, and they will stay in memory for the duration of the program. Subsequent calls with a smaller argument don't need to compute anything; subsequent calls with larger arguments need only compute those elements that occur later in the list than the largest existing element.
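One way to watch this sharing happen is to count how often an element of the list is actually computed. The sketch below is an instrumented variant, not the code from the question: the names calls, tick, memoFib and callsAfter are hypothetical, and unsafePerformIO is used purely for counting, which is not something to do in real code. The cache is bound in a where-clause outside the lambda so that it is shared across calls.

```haskell
import Control.Exception (evaluate)
import Data.IORef (IORef, modifyIORef', newIORef, readIORef)
import System.IO.Unsafe (unsafePerformIO)

-- Global counter of how many list elements have been computed so far.
-- (Instrumentation only; an assumption of this sketch, not part of the technique.)
calls :: IORef Int
calls = unsafePerformIO (newIORef 0)
{-# NOINLINE calls #-}

-- Bump the counter whenever the wrapped value is first forced.
tick :: Integer -> Integer
tick x = unsafePerformIO (modifyIORef' calls (+ 1) >> return x)
{-# NOINLINE tick #-}

-- Memoized Fibonacci; the cache is defined outside the lambda.
memoFib :: Int -> Integer
memoFib = \n -> cache !! n
  where
    cache = map fib [0 ..]
    fib 0 = tick 0
    fib 1 = tick 1
    fib k = tick (memoFib (k - 2) + memoFib (k - 1))

-- Force memoFib n, then report the total number of computed elements.
callsAfter :: Int -> IO Int
callsAfter n = do
  _ <- evaluate (memoFib n)
  readIORef calls
```

In a fresh session, callsAfter 10 should report 11 (indices 0 through 10), a following callsAfter 5 should still report 11, and callsAfter 12 should report 13, since only elements 11 and 12 are new.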

Create a new expression without pointer from previous one

I am reading the book https://www.packtpub.com/application-development/haskell-high-performance-programming and trying to figure out, what is the difference between those two functions:
This function does memoize the intermediate numbers:
fib_mem :: Int -> Integer
fib_mem = (map fib [0..] !!)
  where fib 0 = 1
        fib 1 = 1
        fib n = fib_mem (n-2) + fib_mem (n-1)
and this one does not:
fib_mem_arg :: Int -> Integer
fib_mem_arg x = map fib [0..] !! x
  where fib 0 = 1
        fib 1 = 1
        fib n = fib_mem_arg (n-2) + fib_mem_arg (n-1)
The author tries to explain it as follows:
Running fib_mem_arg with anything but very small arguments, one can
confirm it does no memoization. Even though we can see that map fib
[0..] does not depend on the argument number and could be memorized,
it will not be, because applying an argument to a function will create
a new expression that cannot implicitly have pointers to expressions
from previous function applications.
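What does he mean by the last part of that sentence, that applying an argument to a function creates a new expression that cannot implicitly have pointers to expressions from previous function applications? Could someone provide me a simple example?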
Why is fib_mem a constant applicative form?
Why is fib_mem a constant applicative form?
Not fib_mem, but (map fib [0..] !!). It is a CAF because it is a partially applied function (!!). As such it is subject to memory retention.
(see also: What are super combinators and constant applicative forms?)
Since the type is monomorphic, it is retained in memory even between calls to fib_mem, in effect as if having map fib [0..] "floated" to the top level, as if defined as
fib_mem_m :: Int -> Integer
fib_mem_m = (the_list !!)
  where fib 0 = 1
        fib 1 = 1
        fib n = (the_list !! (n-2)) + (the_list !! (n-1))
        the_list = map fib [0..]
If the type were polymorphic, the floating to top level wouldn't be possible, but it would still be retained for the duration of each call to fib_mem, as if defined as
fib_mem_p :: Num a => Int -> a
fib_mem_p = (the_list !!)
  where fib 0 = 1
        fib 1 = 1
        fib n = (the_list !! (n-2)) + (the_list !! (n-1))
        the_list = map fib [0..]
To see the difference, evaluate fib_mem_m 10000 twice at the GHCi prompt. The second attempt will take 0 seconds. But fib_mem_p 10000 will take the same amount of time each time it is called. It will still be as fast as the first one, so there is still memoization going on there, it's just not retained between calls.
With this style of definition, the full application as in fib_mem_arg will also be memoized, and just as with the one above, not between the calls to fib_mem_arg, but only during each call.
fib_mem_arg :: Int -> Integer -- or polymorphic, makes no difference
fib_mem_arg x = the_list !! x
  where fib 0 = 1
        fib 1 = 1
        fib n = (the_list !! (n-2)) + (the_list !! (n-1))
        the_list = map fib [0..]

Non-pointfree style is substantially slower

I have the following, oft-quoted code for calculating the nth Fibonacci number in Haskell:
fibonacci :: Int -> Integer
fibonacci = (map fib [0..] !!)
  where fib 0 = 0
        fib 1 = 1
        fib n = fibonacci (n-2) + fibonacci (n-1)
Using this, I can do calls such as:
ghci> fibonacci 1000
and receive an almost instantaneous answer.
However, if I modify the above code so that it's not in pointfree style, i.e.
fibonacci :: Int -> Integer
fibonacci x = (map fib [0..] !!) x
  where fib 0 = 0
        fib 1 = 1
        fib n = fibonacci (n-2) + fibonacci (n-1)
it is substantially slower. To the extent that a call such as
ghci> fibonacci 1000
hangs.
My understanding was that the above two pieces of code were equivalent, but GHCi begs to differ. Does anyone have an explanation for this behaviour?
To observe the difference, you should probably look at Core. My guess is that this boils down to comparing (roughly)
let f = map fib [0..] in \x -> f !! x
to
\x -> let f = map fib [0..] in f !! x
The latter will recompute f from scratch on every invocation. The former does not, effectively caching the same f for each invocation.
It happens that in this specific case, GHC was able to optimize the second into the first, once optimization is enabled.
Note however that GHC does not always perform this transformation, since this is not always an optimization. The cache used by the first is kept in memory forever. This might lead to a waste of memory, depending on the function at hand.
I tried to find it but struck out. I think I have it on my PC at home.
What I read was that functions using fixed point were inherently faster.
There are other reasons for using fixed point. I encountered one in writing this iterative Fibonacci function. I wanted to see how an iterative version would perform then I realized I had no ready way to measure. I am a Haskell neophyte. But here is an iterative version for someone to test.
I could not get this to define unless I used the dot after the first last function.
I could not reduce it further. the [0,1] parameter is fixed and not to be supplied as a parameter value.
Prelude> fib = last . flip take (iterate (\ls -> ls ++ [last ls + last (init ls)]) [0,1])
Prelude> fib 25
[0,1,1,2,3,5,8,13,21,34,55,89,144,233,377,610,987,1597,2584,4181,6765,10946,17711,28657,46368,75025]

Haskell: Why is my implementation of the Fibonacci sequence inefficient?

I have written the following Fibonacci play program as part of learning Haskell:
fibonacci 0 = [0]
fibonacci 1 = [0,1]
fibonacci n = let
                foo'1 = last (fibonacci (n-1))
                foo'2 = last (fibonacci (n-2))
              in reverse((foo'1 + foo'2):reverse (fibonacci (n-1)))
The program works:
ghci>fibonacci 6
[0,1,1,2,3,5,8]
But, the performance goes down exponentially with n. If I give it an argument of 30 it takes about a minute to run as opposed to running instantaneously at 6. It seems the lazy execution is burning me and fibonacci is getting run once for every element in the final list.
Am I doing something silly or missing something?
(I already got rid of the ++ thinking that might be doing it)
As pointed out in the comments, your approach is a tad overcomplicated. In particular, you don't need to use recursive calls, or even the reverse function, in order to generate the Fibonacci sequence.
A linear-time implementation
In addition to your own answer, here is a textbook one-liner, which uses memoization:
fibs :: [Integer]
fibs = 0 : 1 : zipWith (+) fibs (tail fibs)
Once you have fibs, writing your fib function is trivial:
fib :: Int -> [Integer]
fib n
| n < 0 = error "fib: negative argument"
| otherwise = take (n+1) fibs
This implementation of fib has complexity Θ(n), which is obviously much better than Θ(exp(n)).
Test in GHCi
λ> :set +s
λ> fib 6
[0,1,1,2,3,5,8]
(0.02 secs, 7282592 bytes)
λ> fib 30
[0,1,1,2,3,5,8,13,21,34,55,89,144,233,377,610,987,1597,2584,4181,6765,10946,17711,28657,46368,75025,121393,196418,317811,514229,832040]
(0.01 secs, 1035344 bytes)
As you can see, fib 30 is evaluated in well under one minute on my machine.
Further reading
For a much more comprehensive treatment of how to generate the Fibonacci sequence in Haskell, I refer you to this haskell.org wiki
Here is the answer to the question using @icktoofay's pointer to memoization. The answer included a function that quickly returned a given Fibonacci number, so I used their example to create a solution to my original problem: creating a list of the Fibonacci numbers up to the requested number.
This solution runs pretty much instantaneously (the page has the additional benefit of referring to my approach as "naive")
memoized_fib :: Int -> Integer
memoized_fib = (map fib [0 ..] !!)
  where fib 0 = 0
        fib 1 = 1
        fib n = memoized_fib (n-2) + memoized_fib (n-1)
fib 0 = [0]
fib 1 = [0,1]
fib n = reverse ((memoized_fib (n-2) + memoized_fib(n-1)) : reverse (fib (n-1)))
You don't need to add memoization to your function - it already has all the previous results, producing a list as it does. You just need to stop ignoring those results, as you do right now using last.
First of all, if it's more natural to build the list in reverse order, there's no reason not to:
revFib 0 = [0]
revFib 1 = [1,0]
revFib n | n > 0 = let f1 = head (revFib (n-1))
                       f2 = head (revFib (n-2))
                   in f1 + f2 : revFib (n-1)
This is still slow, as we're still ignoring all the previous results except the very last one, situated at the head of the list. We can stop doing that,
revFib 0 = [0]
revFib 1 = [1,0]
revFib n | n > 0 = let f1 = head (revFib (n-1))
                       f2 = head (tail (revFib (n-1)))
                   in f1 + f2 : revFib (n-1)
and then we'll name the common subexpression, so that it is shared among its uses, and is only calculated once:
revFib 0 = [0]
revFib 1 = [1,0]
revFib n | n > 0 = let prevs = revFib (n-1)
                       [f1,f2] = take 2 prevs
                   in f1 + f2 : prevs
and suddenly it's linear instead of exponential.
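The same idea of carrying the previous results along instead of recomputing them can also be written front to back, without any reversing. This is only a sketch of an alternative, not the approach from the answer above; the names fibPrefix and step are hypothetical.

```haskell
-- Hypothetical helper: the first n+1 Fibonacci numbers, built front to back.
-- Each step only looks at the pair produced by the previous step.
fibPrefix :: Int -> [Integer]
fibPrefix n = take (n + 1) (map fst (iterate step (0, 1)))
  where
    -- (current, next) becomes (next, current + next)
    step (a, b) = (b, a + b)
```

For example, fibPrefix 6 gives [0,1,1,2,3,5,8], matching the output of the original fibonacci 6.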

What does "!!" mean in haskell?

There are two functions written in the Haskell wiki website:
Function 1
fib = (map fib' [0 ..] !!)
  where
    fib' 0 = 0
    fib' 1 = 1
    fib' n = fib (n - 1) + fib (n - 2)
Function 2
fib x = map fib' [0 ..] !! x
  where
    fib' 0 = 0
    fib' 1 = 1
    fib' n = fib (n - 1) + fib (n - 2)
What does the "!!" mean?
This is actually more difficult to read than it would seem at first, as operators in Haskell are more generic than in other languages.
The first thing that we are all thinking of telling you is to go look this up yourself. If you do not already know about Hoogle, this is the time to become familiar with it. You can ask it either to tell you what a function does by name or (and this is even more cool) you can give it the type of a function and it can offer suggestions on which functions implement that type.
Here is what hoogle tells you about this function (operator):
(!!) :: [a] -> Int -> a
List index (subscript) operator, starting from 0. It is an
instance of the more general genericIndex, which takes an index
of any integral type.
Let us assume that you need help reading this. The first line tells us that (!!) is a function that takes a list of things ([a]) and an Int, then gives you back one of the things in the list (a). The description tells you what it does. It will give you the element of the list indexed by the Int. So, xs !! i works like xs[i] would in Java, C or Ruby.
Now we need to talk about how operators work in Haskell. I'm not going to give you the whole thing here, but I will at least let you know that there is something more here than what you would encounter in other programming languages. Operators "always" take two arguments and return something (a -> b -> c). You can use them just like a normal function:
add x y
(+) x y -- same as above
But, by default, you can also use them between expressions (the word for this is 'infix'). You can also make a normal function work like an operator with backticks:
x + y
x `add` y -- same as above
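Applied to the operator from the question, the same value can be written in infix form, in prefix form, or as a section with one argument already supplied. A small sketch, with hypothetical names chosen only for illustration:

```haskell
letters :: String
letters = "haskell"

infixForm, prefixForm, leftSection, rightSection :: Char
infixForm    = letters !! 2      -- operator between its arguments
prefixForm   = (!!) letters 2    -- operator used as an ordinary function
leftSection  = (letters !!) 2    -- list fixed, still waiting for the index
rightSection = (!! 2) letters    -- index fixed, still waiting for the list
```

All four evaluate to 's', the element at index 2. The left section is the shape used in Function 1, where the list is supplied and the index is left open.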
What makes the first code example you gave confusing (especially for new Haskell coders) is that the !! operator is used as a function rather than in a typical operator (infix) position. Let me add some bindings so it is clearer:
-- return the ith Fibonacci number
fib :: Int -> Int -- (actually more general than this but don't worry about it)
fib i = fibs !! i
  where
    fibs :: [Int]
    fibs = map fib' [0 ..]
    fib' :: Int -> Int
    fib' 0 = 0
    fib' 1 = 1
    fib' n = fib (n - 1) + fib (n - 2)
You can now work your way back to example 1. Make sure you understand what map fib' [0 ..] means.
I'm sorry your question got down-voted, because if you understood what was going on the answer would have been easy to look up, but if you don't know about operators as they exist in Haskell it is very hard to mentally parse the code above.
(!!) :: [a] -> Int -> a
List index (subscript) operator, starting from 0. It is an instance of the more general genericIndex, which takes an index of any integral type.
See here: http://www.haskell.org/ghc/docs/latest/html/libraries/base/Data-List.html#g:16
