How and why does the Haskell Cont monad work? - haskell

This is how the Cont monad is defined:
newtype Cont r a = Cont { runCont :: (a -> r) -> r }
instance Monad (Cont r) where
    return a = Cont ($ a)
    m >>= k = Cont $ \c -> runCont m $ \a -> runCont (k a) c
Could you explain how and why this works? What is it doing?

The first thing to realize about the continuation monad is that, fundamentally, it's not really doing anything at all. It's true!
The basic idea of a continuation in general is that it represents the rest of a computation. Say we have an expression like this: foo (bar x y) z. Now, extract just the parenthesized portion, bar x y--this is part of the total expression, but it's not just a function we can apply. Instead, it's something we need to apply a function to. So, we can talk about the "rest of the computation" in this case as being \a -> foo a z, which we can apply to bar x y to reconstruct the complete form.
Now, it happens that this concept of "the rest of the computation" is useful, but it's awkward to work with, since it's something outside of the subexpression we're considering. To make things work better, we can turn things inside-out: extract the subexpression we're interested in, then wrap it in a function that takes an argument representing the rest of the computation: \k -> k (bar x y).
This modified version gives us a lot of flexibility--not only does it extract a subexpression from its context, but it lets us manipulate that outer context within the subexpression itself. We can think of it as a sort of suspended computation, giving us explicit control over what happens next. Now, how could we generalize this? Well, the subexpression is pretty much unchanged, so let's just replace it with a parameter to the inside-out function, giving us \x k -> k x--in other words, nothing more than function application, reversed. We could just as easily write flip ($), or add a bit of an exotic foreign language flavor and define it as an operator |>.
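For instance, here's that reversed-application operator as a small standalone sketch (the definition is ours, matching the description above):
(|>) :: a -> (a -> r) -> r
x |> k = k x
-- bar x y |> \a -> foo a z  is the same as  foo (bar x y) z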
Now, it would be simple, albeit tedious and horribly obfuscating, to translate every piece of an expression to this form. Fortunately, there's a better way. As Haskell programmers, when we think about building a computation within a background context, the next thing we ask is: is this a monad? And in this case the answer is yes, yes it is.
To turn this into a monad, we start with two basic building blocks:
For a monad m, a value of type m a represents having access to a value of type a within the context of the monad.
The core of our "suspended computations" is flipped function application.
What does it mean to have access to something of type a within this context? It just means that, for some value x :: a, we've applied flip ($) to x, giving us a function that takes a function which takes an argument of type a, and applies that function to x. Let's say we have a suspended computation holding a value of type Bool. What type does this give us?
> :t flip ($) True
flip ($) True :: (Bool -> b) -> b
So for suspended computations, the type m a works out to (a -> b) -> b... which is perhaps an anticlimax, since we already knew the signature for Cont, but humor me for now.
An interesting thing to note here is that a sort of "reversal" also applies to the monad's type: Cont b a represents a function that takes a function a -> b and evaluates to b. As a continuation represents "the future" of a computation, so the type a in the signature represents in some sense "the past".
So, replacing (a -> b) -> b with Cont b a, what's the monadic type for our basic building block of reverse function application? a -> (a -> b) -> b translates to a -> Cont b a... the same type signature as return and, in fact, that's exactly what it is.
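Spelled out, modulo the newtype wrapper (this just restates the definition from the top of the page):
return :: a -> Cont r a
return x = Cont ($ x)   -- i.e. Cont (flip ($) x)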
From here on out, everything pretty much falls directly out from the types: There's essentially no sensible way to implement >>= besides the actual implementation. But what is it actually doing?
At this point we come back to what I said initially: the continuation monad isn't really doing much of anything. Something of type Cont r a is trivially equivalent to something of just type a, simply by supplying id as the argument to the suspended computation. This might lead one to ask whether, if Cont r a is a monad but the conversion is so trivial, shouldn't a alone also be a monad? Of course that doesn't work as is, since there's no type constructor to define as a Monad instance, but say we add a trivial wrapper, like data Id a = Id a. This is indeed a monad, namely the identity monad.
What does >>= do for the identity monad? The type signature is Id a -> (a -> Id b) -> Id b, which is equivalent to a -> (a -> b) -> b, which is just simple function application again. Having established that Cont r a is trivially equivalent to Id a, we can deduce that in this case as well, (>>=) is just function application.
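Written out in full, the wrapper looks like this (a sketch including the Functor and Applicative instances modern GHC requires; the standard library's version is Identity in Data.Functor.Identity):
data Id a = Id a
instance Functor Id where
    fmap f (Id x) = Id (f x)
instance Applicative Id where
    pure = Id
    Id f <*> Id x = Id (f x)
instance Monad Id where
    Id x >>= f = f x   -- bind is just function application under the wrapper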
Of course, Cont r a is a crazy inverted world where everyone has goatees, so what actually happens involves shuffling things around in confusing ways in order to chain two suspended computations together into a new suspended computation, but in essence, there isn't actually anything unusual going on! Applying functions to arguments, ho hum, another day in the life of a functional programmer.

Here's fibonacci:
fib 0 = 0
fib 1 = 1
fib n = fib (n-1) + fib (n-2)
Imagine you have a machine without a call stack - it only allows tail recursion. How would you execute fib on that machine? You could easily rewrite the function to work in linear instead of exponential time, but that requires a tiny bit of insight and is not mechanical.
The obstacle to making it tail recursive is the third line, where there are two recursive calls. We can only make a single call, which also has to give the result. Here's where continuations enter.
We'll make fib (n-1) take an additional parameter, which will be a function specifying what should be done after computing its result; call that result x. The function will be adding fib (n-2) to it, of course. So: to compute fib n, you compute fib (n-1); after that, if you call the result x, you compute fib (n-2); after that, if you call the result y, you return x+y.
In other words, you have to answer the question:
How to do the following computation: "fib' n c = compute fib n and apply c to the result"?
The answer is that you do the following: "compute fib (n-1) and apply d to the result", where d x means "compute fib (n-2) and apply e to the result", where e y means c (x+y). In code:
fib' 0 c = c 0
fib' 1 c = c 1
fib' n c = fib' (n-1) d
  where d x = fib' (n-2) e
          where e y = c (x+y)
Equivalently, we can use lambdas:
fib' 0 = \c -> c 0
fib' 1 = \c -> c 1
fib' n = \c -> fib' (n-1) $ \x ->
               fib' (n-2) $ \y ->
               c (x+y)
To get the actual Fibonacci number, use the identity function: fib' n id. You can think of the line fib' (n-1) $ ... as passing its result x on to the next one.
The last three lines smell like a do block, and in fact
fib' 0 = return 0
fib' 1 = return 1
fib' n = do x <- fib' (n-1)
            y <- fib' (n-2)
            return (x+y)
is the same, up to newtypes, by the definition of the Cont monad. Note the differences: there's \c -> at the beginning, instead of x <- ... there's ... $ \x ->, and c instead of return.
Try writing factorial n = n * factorial (n-1) in a tail recursive style using CPS.
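(For reference, here's one way the factorial exercise can come out; the base case factorial 0 = 1 is our own addition:)
factorial' :: Integer -> (Integer -> r) -> r
factorial' 0 c = c 1
factorial' n c = factorial' (n-1) $ \x -> c (n * x)
-- factorial' 5 id == 120, and the recursive call is a tail call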
How does >>= work? m >>= k is equivalent to
do a <- m
   t <- k a
   return t
Making the translation back, in the same style as in fib', you get
\c -> m $ \a ->
      k a $ \t ->
      c t
Simplifying \t -> c t to c:
m >>= k = \c -> m $ \a -> k a c
Adding newtypes you get
m >>= k = Cont $ \c -> runCont m $ \a -> runCont (k a) c
which is at the top of this page. It's complex, but if you know how to translate between do notation and direct use, you don't need to know the exact definition of >>=! The continuation monad is much clearer if you look at do-blocks.
Monads and continuations
If you look at this usage of the list monad...
do x <- [10, 20]
   y <- [3,5]
   return (x+y)
which desugars to
[10,20] >>= \x ->
  [3,5] >>= \y ->
    return (x+y)
or, written with operator sections,
([10,20] >>=) $ \x ->
  ([3,5] >>=) $ \y ->
    return (x+y)
that looks like a continuation! In fact, (>>=) with its first argument applied has type (a -> m b) -> m b, which is Cont (m b) a. See sigfpe's "The Mother of all Monads" for an explanation. I'd regard it as a good continuation monad tutorial, though it probably wasn't meant as one.
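Concretely, any monadic value can be embedded this way; a minimal sketch using the Cont definition from the top of this page (the helper name toCont is our own):
toCont :: Monad m => m a -> Cont (m b) a
toCont m = Cont (m >>=)
-- runCont (toCont [10,20] >>= \x -> toCont [3,5] >>= \y -> return (x+y)) return
--   == [13,15,23,25]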
Since continuations and monads are so strongly related in both directions, I think what applies to monads applies to continuations: only hard work will teach you them, and not reading some burrito metaphor or analogy.

EDIT: Article migrated to the link below.
I've written up a tutorial directly addressing this topic that I hope you will find useful. (It certainly helped cement my understanding!) It's a bit too long to fit comfortably in a Stack Overflow topic, so I've migrated it to the Haskell Wiki.
Please see: MonadCont under the hood

I think the easiest way to get a grip on the Cont monad is to understand how to use its constructor. I'm going to assume the following definition for now, although the realities of the transformers package are slightly different:
newtype Cont r a = Cont { runCont :: (a -> r) -> r }
This gives:
Cont :: ((a -> r) -> r) -> Cont r a
so to build a value of type Cont r a, we need to give a function to Cont:
value = Cont $ \k -> ...
Now, k itself has type a -> r, and the body of the lambda needs to have type r. An obvious thing to do would be to apply k to a value of type a, and get a value of type r. We can do that, yes, but that's really only one of many things we can do. Remember that value need not be polymorphic in r; it might be of type Cont String Integer or something else concrete. So:
We could apply k to several values of type a, and combine the results somehow.
We could apply k to a value of type a, observe the result, and then decide to apply k to something else based on that.
We could ignore k altogether and just produce a value of type r ourselves.
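For instance, at the concrete type Cont String Integer, those three options might look like this (the example values and names are ours):
twice, inspect, constant :: Cont String Integer
twice    = Cont $ \k -> k 1 ++ k 2                               -- combine several results
inspect  = Cont $ \k -> if length (k 1) > 3 then k 100 else k 2  -- observe, then decide
constant = Cont $ \_ -> "never called k"                         -- ignore k entirely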
But what does all this mean? What does k end up being? Well, in a do-block, we might have something looking like this:
flip runCont id $ do
    v <- thing1
    thing2 v
    x <- Cont $ \k -> ...
    thing3 x
    thing4
Here's the fun part: we can, in our minds and somewhat informally, split the do-block in two at the occurrence of the Cont constructor, and think of the rest of the entire computation after it as a value in itself. But hold on, what it is depends on what x is, so it's really a function from a value x of type a to some result value:
restOfTheComputation x = do
    thing3 x
    thing4
In fact, this restOfTheComputation is roughly speaking what k ends up being. In other words, you call k with a value which becomes the result x of your Cont computation, the rest of the computation runs, and then the r produced winds its way back into your lambda as the result of the call to k. So:
if you called k multiple times, the rest of the computation will get run multiple times, and the results may be combined however you wish.
if you didn't call k at all, the rest of the entire computation will be skipped, and the enclosing runCont call will just give you back whatever value of type r you managed to synthesise. That is, unless some other part of the computation is calling you from their k, and messing about with the result...
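Two tiny concrete cases showing both behaviours (the example values are ours):
runCont (Cont $ \k -> k 1 ++ k 2) (\x -> [x, x*10])   -- [1,10,2,20]: the rest ran twice
runCont (Cont $ \_ -> [])         (\x -> [x, x*10])   -- []: the rest was skipped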
If you're still with me at this point, it should be easy to see that this could be quite powerful. To make the point a little more concrete, let's implement some standard type classes.
instance Functor (Cont r) where
    fmap f (Cont c) = Cont $ \k -> ...
We're given a Cont value with bind result x of type a, and a function f :: a -> b, and we want to make a Cont value with bind result f x of type b. Well, to set the bind result, just call k...
fmap f (Cont c) = Cont $ \k -> k (f ...
Wait, where do we get x from? Well, it's going to involve c, which we haven't used yet. Remember how c works: it gets given a function, and then calls that function with its bind result. We want to call our function with f applied to that bind result. So:
fmap f (Cont c) = Cont $ \k -> c (\x -> k (f x))
Tada! Next up, Applicative:
instance Applicative (Cont r) where
    pure x = Cont $ \k -> ...
This one's simple. We want the bind result to be the x we get.
pure x = Cont $ \k -> k x
Now, <*>:
Cont cf <*> Cont cx = Cont $ \k -> ...
This is a little trickier, but it uses essentially the same ideas as in fmap: first get the function from the first Cont, by making a lambda for it to call:
Cont cf <*> Cont cx = Cont $ \k -> cf (\fn -> ...
Then get the value x from the second, and make fn x the bind result:
Cont cf <*> Cont cx = Cont $ \k -> cf (\fn -> cx (\x -> k (fn x)))
Monad is much the same, although it requires runCont or a case or let to unpack the newtype.
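For completeness, here's how it might look in the same style (a sketch; it matches the definition at the top of the page):
instance Monad (Cont r) where
    return x = Cont $ \k -> k x
    Cont c >>= f = Cont $ \k -> c (\x -> runCont (f x) k)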
This answer is already quite long, so I won't go into ContT (in short: it is exactly the same as Cont! The only difference is in the kind of the type constructor, the implementations of everything are identical) or callCC (a useful combinator that provides a convenient way to ignore k, implementing early exit from a sub-block).
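Just to give a taste of callCC anyway, here's a hypothetical snippet (assuming the Cont, callCC and runCont provided by mtl's Control.Monad.Cont):
import Control.Monad (when)
import Control.Monad.Cont
divSafe :: Int -> Int -> Cont r (Maybe Int)
divSafe n d = callCC $ \exit -> do
    when (d == 0) (exit Nothing)   -- calling exit skips the rest of this block
    return (Just (n `div` d))
-- runCont (divSafe 10 2) id == Just 5
-- runCont (divSafe 10 0) id == Nothing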
For a simple and plausible application, try Edward Z. Yang's blog post implementing labelled break and continue in for loops.

Trying to complement the other answers:
Nested lambdas are horrible for readability. This is exactly the reason why let... in... and ... where ... exist, to get rid of nested lambdas by using intermediate variables. Using those, the bind implementation can be refactored into:
newtype Cont r a = Cont { runCont :: (a -> r) -> r }
instance Monad (Cont r) where
    return a = Cont ($ a)
    m >>= k = k a
      where a = runCont m id
Which hopefully makes what is happening clearer. The return implementation boxes the value with a lazy apply. Using runCont ... id applies id to the boxed value, which returns the original value.
For any monad where any boxed value can simply be unboxed, there is generally a trivial implementation of bind, which is to simply unbox the value and apply a monadic function to it.
To get the obfuscated implementation in the original question, first replace k a with Cont $ runCont (k a), which in turn can be replaced with Cont $ \c -> runCont (k a) c.
Now, we can move the where into a subexpression, so that we are left with
Cont $ \c -> (runCont (k a) c where a = runCont m id)
The expression within parentheses can be desugared into (\a -> runCont (k a) c) $ runCont m id.
To finish, we use the property of runCont, f (runCont m g) = runCont m (f . g), and we are back to the original obfuscated expression.

Related

Doing greatest common divisor function with Haskell using until

I need help programming a function in Haskell to calculate the greatest common divisor. The problem is that I need to use the until function, which I haven't been able to make work. I tried the following code (which didn't work):
mygcd a b = until (==0) (`mod` (a b)) b
Thanks for your help!
I would try something like this (pseudocode)
mygcd a b = finalize (until endState nextState (a,b))
  where
    finalize  (x,y) = ...
    endState  (x,y) = some condition here
    nextState (x,y) = (x',y') computed in some way using mod
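Filling in that skeleton one plausible way (our own completion, using Euclid's algorithm):
mygcd :: Integral a => a -> a -> a
mygcd a b = finalize (until endState nextState (a, b))
  where
    finalize  (x, _) = x
    endState  (_, y) = y == 0
    nextState (x, y) = (y, x `mod` y)
-- mygcd 12 18 == 6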
mygcd a b = until (==0) (`mod` (a b)) b
> :t until
until :: (a -> Bool) -> (a -> a) -> a -> a
So we have our predicate (==0), our function (`mod` (a b)), and our starting value b. My first issue with this, however: this won't terminate until the predicate holds true, so it can never produce anything but 0. What does that tell us? We also see that a must be of type Integral a => a -> a, since it's applied to b to get something we can use in b `mod` x. And repeating the step won't yield a different result, so the expression either produces 0 immediately or after one mod, or never finishes.
I think, to get something useful out of until, you need a more complicated predicate than an equality check. Perhaps a question like "what's the lowest multiple of 13 over 100", or you might make a a tuple and check one part of it. It looks a bit to me like:
until p f v = head (dropWhile (not . p) (iterate f v))
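For example, answering the "lowest multiple of 13 over 100" question with it:
> until (> 100) (+ 13) 0
104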

Understanding `ap` in a point-free function in Haskell

I am able to understand the basics of point-free functions in Haskell:
addOne x = 1 + x
As x appears at the end of both sides of the equation, we can simplify it away:
addOne = (+ 1)
Incredibly, it turns out that even functions where the same argument is used twice in different places can be written point-free!
Let me take as a basic example the average function written as:
average xs = realToFrac (sum xs) / genericLength xs
It may seem impossible to simplify xs, but http://pointfree.io/ comes out with:
average = ap ((/) . realToFrac . sum) genericLength
That works.
As far as I understand, this states that average is the same as calling ap on two functions: the composition (/) . realToFrac . sum, and genericLength.
Unfortunately the ap function makes no sense whatsoever to me; the docs at http://hackage.haskell.org/package/base-4.8.1.0/docs/Control-Monad.html#v:ap state:
ap :: Monad m => m (a -> b) -> m a -> m b
In many situations, the liftM operations can be replaced by uses of ap,
which promotes function application.
return f `ap` x1 `ap` ... `ap` xn
is equivalent to
liftMn f x1 x2 ... xn
But writing:
let average = liftM2 ((/) . realToFrac . sum) genericLength
does not work, (gives a very long type error message, ask and I'll include it), so I do not understand what the docs are trying to say.
How does the expression ap ((/) . realToFrac . sum) genericLength work? Could you explain ap in simpler terms than the docs?
Any lambda term can be rewritten to an equivalent term that uses just a set of suitable combinators and no lambda abstractions. This process is called abstraction elimination. During the process you want to remove lambda abstractions from the inside out. So at one step you have λx.M, where M is already free of lambda abstractions, and you want to get rid of x.
If M is x, you replace λx.x with id (id is usually denoted by I in combinatory logic).
If M doesn't contain x, you replace the term with const M (const is usually denoted by K in combinatory logic).
If M is PQ, that is, the term is λx.PQ, you want to "push" x inside both parts of the function application so that you can recursively process both parts. This is accomplished by using the S combinator, defined as λfgx.(fx)(gx); that is, it takes two functions, passes x to both of them, and applies the results together. You can easily verify that λx.PQ is equivalent to S(λx.P)(λx.Q), and we can recursively process both subterms.
As described in the other answers, the S combinator is available in Haskell as ap (or <*>) specialized to the reader monad.
The appearance of the reader monad isn't accidental: replacing λx.M with an equivalent function is basically lifting M :: a to the reader monad r -> a (actually the reader Applicative part is enough), where r is the type of x. If we revisit the process above:
The only case that is actually connected with the reader monad is when M is x. Then we "lift" x to id, to get rid of the variable. The other cases below are just mechanical applications of lifting an expression to an applicative functor:
In the other case, λx.M where M doesn't contain x, it's just lifting M to the reader applicative, which is pure M. Indeed, for (->) r, pure is equivalent to const.
In the last case, <*> :: f (a -> b) -> f a -> f b is function application lifted to a monad/applicative. And this is exactly what we do: We lift both parts P and Q to the reader applicative and then use <*> to bind them together.
The process can be further improved by adding more combinators, which allows the resulting term to be shorter. Most often, the combinators B and C are used, which in Haskell correspond to the functions (.) and flip. And again, (.) is just fmap/<$> for the reader applicative. (I'm not aware of such a built-in function for expressing flip, but it could be viewed as a specialization of f (a -> b) -> a -> f b to the reader applicative.)
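To make the correspondence concrete, here is the whole combinator dictionary as a sketch, specialized to the reader applicative (->) r (the lowercase names are ours):
s :: (r -> a -> b) -> (r -> a) -> r -> b
s = (<*>)   -- S is ap / <*>
k :: a -> r -> a
k = const   -- K is const / pure
i :: a -> a
i = id      -- I is id
b :: (a -> b) -> (r -> a) -> r -> b
b = (.)     -- B is composition / fmap
c :: (r -> a -> b) -> a -> r -> b
c = flip    -- C is flip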
Some time ago I wrote a short article about this: The Monad Reader Issue 17, The Reader Monad and Abstraction Elimination.
When the monad m is (->) a, as in your case, you can define ap as follows:
ap f g = \x -> f x (g x)
We can see that this indeed "works" in your pointfree example.
average = ap ((/) . realToFrac . sum) genericLength
average = \x -> ((/) . realToFrac . sum) x (genericLength x)
average = \x -> (/) (realToFrac (sum x)) (genericLength x)
average = \x -> realToFrac (sum x) / genericLength x
We can also derive ap from the general law
ap f g = do ff <- f ; gg <- g ; return (ff gg)
that is, desugaring the do-notation
ap f g = f >>= \ff -> g >>= \gg -> return (ff gg)
If we substitute the definitions of the monad methods
m >>= f = \x -> f (m x) x
return x = \_ -> x
we get the previous definition of ap back (for our specific monad (->) a). Indeed:
ap f g
= f >>= \ff -> g >>= \gg -> return (ff gg)
= f >>= \ff -> g >>= \gg -> \_ -> ff gg
= f >>= \ff -> g >>= \gg _ -> ff gg
= f >>= \ff -> \x -> (\gg _ -> ff gg) (g x) x
= f >>= \ff -> \x -> (\_ -> ff (g x)) x
= f >>= \ff -> \x -> ff (g x)
= f >>= \ff x -> ff (g x)
= \y -> (\ff x -> ff (g x)) (f y) y
= \y -> (\x -> f y (g x)) y
= \y -> f y (g y)
The Simple Bit: fixing liftM2
The problem in the original example is that ap works a bit differently from the liftM functions. ap takes a function wrapped up in a monad, and applies it to an argument wrapped up in a monad. But the liftMn functions take a "normal" function (one which is not wrapped up in a monad) and apply it to argument(s) that are wrapped up in monads.
I'll explain more about what that means below, but the upshot is that if you want to use liftM2, then you have to pull (/) out and make it a separate argument at the beginning. (So in this case (/) is the "normal" function.)
let average = liftM2 ((/) . realToFrac . sum) genericLength -- does not work
let average = liftM2 (/) (realToFrac . sum) genericLength -- works
As posted in the original question, calling liftM2 should involve three arguments: liftM2 f x1 x2. Here the f is (/), x1 is (realToFrac . sum) and x2 is genericLength.
The version posted in the question (the one which doesn't work) was trying to call liftM2 with only two arguments.
The explanation
I'll build this up in a few stages. I'll start with some specific values, and build up to a function that can take any set of values. Jump to the last section for the TL;DR.
In this example, let's assume the list of numbers is [1,2,3,4]. The sum of these numbers is 10, and the length of the list is 4. The average is 10/4 or 2.5.
To shoe-horn this into the right form for ap, we're going to break this into a function, an input, and a result.
ourFunction = (10/) -- "divide 10 by"
ourInput = 4
ourResult = 2.5
Three kinds of Function Application
ap and liftM both involve monads. At this point in the explanation, you can think of a monad as something that a value can be "wrapped up in". I'll give a better definition below.
Normal function application applies a normal function to a normal input. liftM applies a normal function to an input wrapped in a monad, and ap applies a function wrapped in a monad to an input wrapped in a monad.
(10/) 4 -- returns 2.5
liftM (10/) monad(4) -- returns monad(2.5)
ap monad(10/) monad(4) -- returns monad(2.5)
(Note that this is pseudocode. monad(4) is not actually valid Haskell).
(Note that liftM is a different function from liftM2, which was used earlier. liftM takes a function and only one argument, which is a better fit for the pattern I'm describing.)
In the average function defined above, the monads were functions, but "functions-as-monads" can be hard to talk about, so I'll start with simpler examples.
So what's a monad?
A better description of a monad is "something which contains a value, or produces a value, or which you can somehow extract a value from, but which also has something more complicated going on."
That's a really vague description, but it kind of has to be, because the "something more complicated" can be a lot of different things.
Monads can be confusing, but the point of them is that when you use monad operations (like ap and liftM) they will take care of the "something more complicated" for you, so you can just concentrate on the values.
That's probably still not very clear, so let's do some examples:
The Maybe monad
ap (Just (10/)) (Just 4) -- result is (Just 2.5)
One of the simplest monads is 'Maybe'. The value is whatever is contained inside a Just. So if we call ap and give it (Just ourFunction) and (Just ourInput) then we get back (Just ourResult).
The "something more complicated" is the fact that there might not be a value there at all, and you have to allow for the Nothing case.
As mentioned, the point of using a function like ap is that it takes care of these extra complications for us. With the Maybe monad, ap handles this by returning Nothing if either the Maybe-function or the Maybe-input were Nothing.
ap (Just (10/)) Nothing -- result is Nothing
ap Nothing (Just 4) -- result is Nothing
The List Monad
ap [(10/)] [4] -- result is [2.5]
With the list monad, the value is whatever is inside the list. So ap [ourFunction] [ourInput] returns [ourResult].
The "something more complicated" is that there may be more than one thing inside the list (or exactly one thing, or nothing at all).
With lists, that means ap takes a list of zero or more functions, and a list of zero or more inputs. It handles that by returning a list of zero or more results: one result for every possible combination of function and input.
ap [(10/), (100/)] [5,4,2] -- result is [2.0, 2.5, 5.0, 20.0, 25.0, 50.0]
Functions as Monads
A function like genericLength is considered a Monad because it has a value (the function's output), and it has a "something more complicated" (the fact that you have to supply an input before you can get the value).
This is where it gets a little confusing, because we're dealing with multiple functions, multiple inputs, and multiple results. It is all well defined, it's just hard to describe, so we have to be careful with our terminology.
Let's start with the list [1,2,3,4], and call that our "original input". That's the list we're trying to find the average of. It's the xs argument in the original average function.
If we give our original input ([1,2,3,4]) to genericLength then we get a value of '4'.
Our other function is ((/) . realToFrac . sum). It takes our list [1,2,3,4] and finds the sum (10), turns that into a fractional value, and then feeds it as the first argument to (/). The result is an incomplete division function that is waiting for another argument. That is, it takes [1,2,3,4] as an input, and produces (10/) as its output.
This all fits with the way ap is defined for functions. With functions, ap takes two things. The first is a function that reads the original input and produces a new function. The second is a function that reads the original input and produces a new input. The final result is a function that takes the original input, and returns the same thing you would get if you applied the new function to the new input.
You might have to read that a few times to make sense of it. Alternatively, here it is in pseudocode:
average =
    ap
        (functionThatTakes [1,2,3,4] and returns "(10/)" )
        (functionThatTakes [1,2,3,4] and returns " 4 " )
-- which means:
average =
    (functionThatTakes [1,2,3,4] and returns "2.5" )
If you compare this to the simpler examples above, you'll see that it still has our function (10/), our input 4 and our result 2.5. And each of them is once again wrapped up in the "something more complicated". In this case, the "something more complicated" is the "function that takes [1,2,3,4] and returns...".
Of course, since they're functions, they don't have to take [1,2,3,4] as their input. If they took a different list of integers (eg [1,2,3,4,5]) then we would get different results (e.g. new function: (15/), new input 5 and new value 3).
Other examples
minPlusMax = ap ((+) . minimum) maximum
-- a function that adds the minimum element of a list, to the maximum element
upperAndLower = ap ((,) . toUpper) toLower
-- a function that takes a Char and returns a tuple, with the upper case and lower case versions of a character
These could all also be defined using liftM2.
average = liftM2 (/) sum genericLength
minPlusMax = liftM2 (+) minimum maximum
upperAndLower = liftM2 (,) toUpper toLower
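A quick check of these in GHCi (a hypothetical session; liftM2 comes from Control.Monad, genericLength from Data.List, toUpper and toLower from Data.Char):
> average [1,2,3,4]
2.5
> minPlusMax [1,2,3,4]
5
> upperAndLower 'x'
('X','x')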

Haskell equivalent of -rectypes

What is the GHC equivalent of OCaml's -rectypes for allowing recursive types? I don't see one in the documentation. Is it a hidden feature?
There isn't one, unfortunately; all recursion must go through a data type. However, if you're willing to put up with a bit of a headache, you can still write recursive types pretty easily.
newtype RecArr b a = RecArr {unArr :: RecArr b a -> b}
unfold = unArr
fold = RecArr
Now we can fold and unfold our RecArr to unroll our recursion to our heart's content. This is a little painful because it's manual, but completely workable. As a demonstration, here's the y combinator written using fold and unfold.
y f = (\x -> f (unfold x x)) $ fold (\x -> f (unfold x x))
factorial f n = if n == 0 then 1 else n * f (n-1)
main = print (y factorial 5) -- prints 120
There is none. All recursion has to go through nominal types. That is, you have to define a data type.

Inverting a fold

Suppose for a minute that we think the following is a good idea:
data Fold x y = Fold {start :: y, step :: x -> y -> y}
fold :: Fold x y -> [x] -> y
Under this scheme, functions such as length or sum can be implemented by calling fold with the appropriate Fold object as argument.
Now, suppose you want to do clever optimisation tricks. In particular, suppose you want to write
unFold :: ([x] -> y) -> Fold x y
It should be relatively easy to write a RULES pragma such that fold . unFold = id. But the interesting question is... can we actually implement unFold?
Obviously you can use RULES to apply arbitrary code transformations, whether or not they preserve the original meaning of the code. But can you really write an unFold implementation which actually does what its type signature suggests?
No, it's not possible. Proof: let
f :: [()] -> Bool
f []   = False
f [()] = False
f _    = True
First we must, for f' = unFold f, have start f' = False, because when folding over the empty list we directly get the start value. Then we must require step f' () False = False to achieve fold f' [()] = False. But when now evaluating fold f' [(),()], we would again only get a call step f' () False, which we had to define as False, leading to fold f' [(),()] ≡ False, whereas f [(),()] ≡ True. So there exists no unFold f that fulfills fold $ unFold f ≡ f. □
You can, but you need to make a slight modification to Fold in order to pull it off.
All functions on lists can be expressed as a fold, but sometimes to accomplish this, extra bookkeeping is needed. Suppose we add an additional type parameter to your Fold type, which passes along this additional contextual information.
data Fold a c r = Fold { _start :: (c, r), _step :: a -> (c,r) -> (c,r) }
Now we can implement fold like so
fold :: Fold a c r -> [a] -> r
fold (Fold start step) = snd . foldr step start
Now what happens when we try to go the other way?
unFold :: ([a] -> r) -> Fold a c r
Where does the c come from? Functions are opaque values, so it's hard to know how to inspect a function and know which contextual information it relies on. So, let's cheat a little. We're going to have the "contextual information" be the entire list, so then when we get to the leftmost element, we can just apply the function to the original list, ignoring the prior cumulative results.
unFold :: ([a] -> r) -> Fold a [a] r
unFold f = Fold { _start = ([], f [])
                , _step  = \a (c, _r) -> let c' = a:c in (c', f c') }
Now, sadly, this does not necessarily compose with fold, because it requires that c must be [a]. Let's fix that by hiding c with existential quantification.
{-# LANGUAGE ExistentialQuantification #-}
data Fold a r = forall c. Fold
    { _start :: (c,r)
    , _step  :: a -> (c,r) -> (c,r) }
fold :: Fold a r -> [a] -> r
fold (Fold start step) = snd . foldr step start
unFold :: ([a] -> r) -> Fold a r
unFold f = Fold start step where
    start = ([], f [])
    step a (c, _r) = let c' = a:c in (c', f c')
Now, it should always be true that fold . unFold = id. And, given a relaxed notion of equality for the Fold data type, you could also say that unFold . fold = id. You can even provide a smart constructor that acts like the old Fold constructor:
makeFold :: r -> (a -> r -> r) -> Fold a r
makeFold start step = Fold start' step' where
    start' = ((), start)
    step' a ((), r) = ((), step a r)
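As a quick sanity check of the round trip (a hypothetical GHCi session):
> fold (unFold sum) [1,2,3]
6
> fold (makeFold 0 (+)) [1,2,3]
6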
tl;dr:
Conclusion 1: you can't
What you asked for originally isn't possible, at least not by any version of what you wanted I can come up with. (See below.)
If you change your data type to allow me to store intermediate calculations, I think I'll be fine, but even then,
the function unFold would be rather inefficient, which seems to run counter to your clever optimisation tricks agenda!
Conclusion 2: I don't think it achieves what you want, even if you work around it by changing the types
Any optimisation of the list algorithm would be subject to the problem that you've calculated the step function using the original unoptimised function, and quite probably in a complicated way.
Since there's no equality on functions, optimising step to something efficient isn't possible. I think you need a human to do unFold, not a compiler.
Anyway, back to the original question:
Could fold . unFold = id ?
No. Suppose we have
isSingleton :: [a] -> Bool
isSingleton [x] = True
isSingleton _ = False
then if we had unFold :: ([x] -> y) -> Fold x y, and foldSingleton was the same as unFold isSingleton, we would need to have
foldSingleton = Fold {start = False , step = ???}
Where step takes an element of the list and updates the result.
Now, since isSingleton "a" == True, we need
step False = True
and because isSingleton "ab" == False, we need
step True = False
so step = not would do so far, but also isSingleton "abc" == False so we also need
step False = False
Since there are functions ([x] -> y) that cannot be represented by a value of type Fold x y, there cannot exist a function unFold :: ([x] -> y) -> Fold x y such that fold . unFold = id, because id is a total function.
Edit:
It turns out you're not convinced by this, because you only expected unFold to work on functions that had a representation as a fold, so maybe you meant unFold.fold = id.
Could unFold . fold = id ?
No.
Even if you just want unFold to work on functions ([x] -> y) that can be obtained using fold :: Fold x y -> ([x] -> y), I don't think it's possible. Let's address the question by assuming now we have defined
combine :: X -> Y -> Y
initial :: Y
folded :: [X] -> Y
folded = fold $ Fold initial combine
Recovering the value initial is trivial: initial = folded [].
Recovery of the original combine is not, because there's no way to go from a function that gives you some values of Y to one which combines arbitrary values of Y.
For an example, if we had X = Y = Int and I defined
combine x y | y < 0     = -10
            | otherwise = y + 1
initial = 0
then since combine just adds one to y every time you use it on a non-negative y, and the initial value is 0, folded is indistinguishable from length in terms of its output. Notice that since folded xs is never negative, it's also impossible to define a function unFold :: ([x] -> y) -> Fold x y that ever recovers our combine function. This boils down to the fact that fold is not injective; it carries different values of type Fold x y to the same value of type [x] -> y.
Thus I've proved two things: if unFold :: ([x] -> y) -> Fold x y then both fold . unFold /= id and now also unFold . fold /= id.
I bet you're not convinced by this either, because you don't really care whether you got Fold 0 (\_ y -> y+1) or Fold 0 combine back from unFold folded, seeing as they have the same value when refolded! Let's narrow the goalposts one more time. Perhaps you want unFold to work whenever the function is obtainable via fold, and you're happy for it not to give you inconsistent answers as long as when you fold the result again, you get the same function. I can summarise that with this next question:
Could fold . unFold . fold = fold ?
i.e. Could you define unFold so that fold.unFold is the identity on the set of functions obtainable via fold?
I'm really convinced this isn't possible, because it's not a tractable problem to calculate the step function without retaining extra information about intermediate values on sublists.
Suppose we had
unFold f = Fold {start = f [], step = recoverstep f}
we need
recoverstep f x1 initial == f [x1]
so if there's an Eq instance for x (ring the alarm bells!), then recoverstep must have the same effect as
recoverstep f x1 y | y == initial = f [x1]
also we need
recoverstep f x2 (f [x1]) == f [x1,x2]
so if there's an Eq instance for x, then recoverstep must have the same effect as
recoverstep f x2 y | y == (f [x1]) = f [x1,x2]
but there's a massive problem here: the variable x1 is free in the right hand side of this equation. This means that logically, we can't tell what value the step function should have on an x unless we already know what values it has been used on. We would need to store the values of f [x1], f [x1,x2] etc. in the Fold data type to make it work, and this is the clincher as to why we can't define unFold. If you change the data type Fold to allow us to store information about intermediate lists, I can see it would work, but as it stands it's impossible to recover the context.
Similar to Dan's answer, but using a slightly different approach. Instead of pairing the accumulator with partial results which will be thrown away at the end, we add a "post-processing" function which will convert from the accumulator type to the final result.
The same "cheat" for unFold just does all the work in the post-processing step:
{-# LANGUAGE ExistentialQuantification #-}
data Fold a r = forall c. Fold
    { _start  :: c
    , _step   :: a -> c -> c
    , _result :: c -> r }
fold :: Fold a r -> [a] -> r
fold (Fold start step result) = result . foldr step start
unFold :: ([a] -> r) -> Fold a r
unFold f = Fold [] (:) f
makeFold :: r -> (a -> r -> r) -> Fold a r
makeFold start step = Fold start step id
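And a quick check of this version (a hypothetical GHCi session):
> fold (unFold length) "abc"
3
> fold (makeFold 1 (*)) [1,2,3,4]
24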

CPS in curried languages

How does CPS in curried languages like lambda calculus or OCaml even make sense? Technically, all functions have one argument. So say we have a CPS version of addition in one such language:
cps-add k n m = k ((+) n m)
And we call it like
(cps-add random-continuation 1 2)
This is then the same as:
(((cps-add random-continuation) 1) 2)
I already see two calls there that aren't tail calls, and in reality a complexly nested expression: (cps-add random-continuation) returns a value, namely a function that consumes a number, which then returns a function that consumes another number and then delivers the sum of both to that random-continuation. But we can't work around this value-returning by simply translating it into CPS again, because we can only give each function one argument. We need at least two to make room for the continuation and the 'actual' argument.
Or am I missing something completely?
Since you've tagged this with Haskell, I'll answer in that regard: In Haskell, the equivalent of doing a CPS transform is working in the Cont monad, which transforms a value x into a higher-order function that takes one argument and applies it to x.
So, to start with, here's 1 + 2 in regular Haskell: (1 + 2) And here it is in the continuation monad:
contAdd x y = do x' <- x
                 y' <- y
                 return $ x' + y'
...not terribly informative. To see what's going on, let's disassemble the monad. First, removing the do notation:
contAdd x y = x >>= (\x' -> y >>= (\y' -> return $ x' + y'))
The return function lifts a value into the monad, and in this case is implemented as \x k -> k x, or using an infix operator section as \x -> ($ x).
contAdd x y = x >>= (\x' -> y >>= (\y' -> ($ x' + y')))
The (>>=) operator (read "bind") chains together computations in the monad, and in this case is implemented as \m f k -> m (\x -> f x k). Changing the bind function to prefix form and substituting in the lambda, plus some renaming for clarity:
contAdd x y = (\m1 f1 k1 -> m1 (\a1 -> f1 a1 k1)) x (\x' -> (\m2 f2 k2 -> m2 (\a2 -> f2 a2 k2)) y (\y' -> ($ x' + y')))
Reducing some function applications:
contAdd x y = (\k1 -> x (\a1 -> (\x' -> (\k2 -> y (\a2 -> (\y' -> ($ x' + y')) a2 k2))) a1 k1))
contAdd x y = (\k1 -> x (\a1 -> y (\a2 -> ($ a1 + a2) k1)))
And a bit of final rearranging and renaming:
contAdd x y = \k -> x (\x' -> y (\y' -> k $ x' + y'))
In other words: The arguments to the function have been changed from numbers, into functions that take a number and return the final result of the entire expression, just as you'd expect.
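To see it run, here's a minimal self-contained check of the derived definition (the monomorphic type signature is ours, for illustration):
contAdd :: ((Int -> r) -> r) -> ((Int -> r) -> r) -> (Int -> r) -> r
contAdd x y = \k -> x (\x' -> y (\y' -> k (x' + y')))
main :: IO ()
main = print (contAdd ($ 1) ($ 2) id)   -- prints 3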
Edit: A commenter points out that contAdd itself still takes two arguments in curried style. This is sensible because it doesn't use the continuation directly, but not necessary. To do otherwise, you'd need to first break the function apart between the arguments:
contAdd x = x >>= (\x' -> return (\y -> y >>= (\y' -> return $ x' + y')))
And then use it like this:
foo = do f <- contAdd (return 1)
         r <- f (return 2)
         return r
Note that this is really no different from the earlier version; it's simply packaging the result of each partial application as taking a continuation, not just the final result. Since functions are first-class values, there's no significant difference between a CPS expression holding a number vs. one holding a function.
Keep in mind that I'm writing things out in very verbose fashion here to make explicit all the steps where something is in continuation-passing style.
Addendum: You may notice that the final expression looks very similar to the de-sugared version of the monadic expression. This is not a coincidence, as the inward-nesting nature of monadic expressions that lets them change the structure of the computation based on previous values is closely related to continuation-passing style; in both cases, you have in some sense reified a notion of causality.
Short answer: of course it makes sense. You can apply a CPS transform directly; you will just end up with lots of cruft, because each argument will have, as you noticed, its own attached continuation.
In your example, I will consider that there is a +(x,y) uncurried primitive, and that you're asking what is the translation of
let add x y = +(x,y)
(This add faithfully represent OCaml's (+) operator)
add is syntactically equivalent to
let add = fun x -> (fun y -> +(x, y))
So you apply a CPS transform¹ and get
let add_cps = fun x kx -> kx (fun y ky -> ky +(x,y))
If you want translated code that looks more like something you could have willingly written, you can devise a finer transformation that actually considers known-arity functions as non-curried functions, and treats all parameters as a whole (as you have in non-curried languages, and as functional compilers already do for obvious performance reasons).
¹: I wrote "a CPS transform" because there is no "one true CPS translation". Different translations have been devised, producing more or less continuation-related garbage. The formal CPS translations are usually defined directly on the lambda calculus, so I suppose you have a less formal, more hand-made CPS transform in mind.
The good properties of CPS (as a style that program respect, and not a specific transformation into this style) are that the order of evaluation is completely explicit, and that all calls are tail-calls. As long as you respect those, you're relatively free in what you can do. Handling curryfied functions specifically is thus perfectly fine.
Remark: Your (cps-add k 1 2) version can also be considered tail-recursive if you assume the compiler detects and optimizes the fact that cps-add always takes 3 arguments, and doesn't build intermediate closures. That may seem far-fetched, but it's the exact same assumption we use when reasoning about tail calls in non-CPS programs in those languages.
Yes, technically all functions can be decomposed into functions taking one argument. However, when you use CPS, all you are saying is that at a certain point in the computation, you run the continuation.
Using your example, let's have a look. To make things a little easier, let's deconstruct cps-add into its normal form, where it is a function only taking one argument.
(cps-add k) -> n -> m = k ((+) n m)
Note at this point that the continuation, k, is not being evaluated (Could this be the point of confusion for you?).
Here cps-add receives a function, k, as an argument and then returns a function that takes another argument, n.
((cps-add k) n) -> m = k ((+) n m)
Now we have a function that takes an argument, m.
So I suppose what I am trying to point out is that currying does not get in the way of CPS style programming. Hope that helps in some way.
