While trying to define a Haskell function which corresponds to repeated evaluation, I ran into an "Equations give different arities" error. This is my code:
nEvals :: Int -> (a -> a) -> a -> a
nEvals 1 = ($) --equivalent to nEvals 1 f x = f x
nEvals k f = (nEvals (k - 1) f) . f --equivalent to nEvals k f x = nEvals (k - 1) f (f x)
I don't quite understand why Haskell is giving this error, since I'm using pattern matching, not explicitly assigning a value for a given set of inputs. Perhaps Haskell only allows pattern matching on a fixed number of parameters? If so, could someone explain why that decision was made?
You should specify the same number of parameters in all the clauses of your function definition. This means you can implement it as:
nEvals :: Int -> (a -> a) -> a -> a
nEvals 1 f = f -- added f as parameter
nEvals k f = (nEvals (k - 1) f) . f
We can also reduce the number of parameters with:
nEvals :: Int -> (a -> a) -> a -> a
nEvals 1 = id
nEvals k = (.) =<< nEvals (k - 1)
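Either definition can be sanity-checked quickly, for example in GHCi:

> nEvals 3 (+ 1) 0
3
> nEvals 2 (* 2) 5
20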
The Haskell Report section 4.4.3.1 requires it:
A function binding binds a variable to a function value. The general form of a function binding for variable x is:
x p11 … p1k match1
…
x pn1 … pnk matchn
where each pij is a pattern, [...]
Note that all clauses defining a function must be contiguous, and the number of patterns in each clause must be the same.
It doesn't explicitly say why, but it does state that:
Translation: The general binding form for functions is semantically equivalent to the equation (i.e. simple pattern binding):
x = \ x1 … xk -> case (x1, …, xk) of
(p11, …, p1k) match1
…
(pn1, …, pnk) matchn
This translation would not be possible with different numbers of patterns.
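For instance, the fixed two-argument nEvals from the previous answer falls under this scheme; a sketch of its translated form:

nEvals :: Int -> (a -> a) -> a -> a
nEvals = \n f -> case (n, f) of
  (1, g) -> g
  (k, g) -> nEvals (k - 1) g . g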
It might have been possible to instead define the behavior in terms of e.g.
case (x11, …, x1k) of
(p11, …, p1k) match1
_ -> case (x21, …, x2k) of
(p21, …, p2k) match2
_ -> ...
but it seems like a mess that would be harder for the compiler to work with efficiently, for very little benefit.
Is there a way to compare two functions for equality? For example, (λx.2*x) == (λx.x+x) should return true, because those are obviously equivalent.
It's pretty well-known that function equality is undecidable in general, so you'll have to pick a subset of the problem that you're interested in. You might consider some of these partial solutions:
Presburger arithmetic is a decidable fragment of first-order logic + arithmetic.
The universe package offers function equality tests for total functions with finite domain.
You can check that your functions are equal on a whole bunch of inputs and treat that as evidence for equality on the untested inputs; check out QuickCheck (see the sketch after this list).
SMT solvers make a best effort, sometimes responding "don't know" instead of "equal" or "not equal". There are several bindings to SMT solvers on Hackage; I don't have enough experience to suggest a best one, but Thomas M. DuBuisson suggests sbv.
There's a fun line of research on deciding function equality and other things on compact functions; the basics of this research is described in the blog post Seemingly impossible functional programs. (Note that compactness is a very strong and very subtle condition! It's not one that most Haskell functions satisfy.)
If you know your functions are linear, you can find a basis for the source space; then every function has a unique matrix representation.
You could attempt to define your own expression language, prove that equivalence is decidable for this language, and then embed that language in Haskell. This is the most flexible but also the most difficult way to make progress.
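For example, here is roughly what the QuickCheck suggestion above might look like for the two functions in the question (a minimal sketch; prop_doubleEq is my own name, and a passing run is only evidence, not proof):

import Test.QuickCheck

-- Compare the two functions pointwise on randomly generated inputs.
prop_doubleEq :: Integer -> Bool
prop_doubleEq x = (\y -> 2 * y) x == (\y -> y + y) x

main :: IO ()
main = quickCheck prop_doubleEq  -- expect "+++ OK, passed 100 tests."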
This is undecidable in general, but for a suitable subset, you can indeed do it today effectively using SMT solvers:
$ ghci
GHCi, version 8.0.1: http://www.haskell.org/ghc/ :? for help
Prelude> :m Data.SBV
Prelude Data.SBV> (\x -> 2 * x) === (\x -> x + x :: SInteger)
Q.E.D.
Prelude Data.SBV> (\x -> 2 * x) === (\x -> 1 + x + x :: SInteger)
Falsifiable. Counter-example:
s0 = 0 :: Integer
For details, see: https://hackage.haskell.org/package/sbv
In addition to the practical examples given in the other answers, let us pick the subset of functions expressible in typed lambda calculus; we can also allow product and sum types. Although checking whether two such functions are equal can be as simple as applying them to a variable and comparing the results, we cannot build the equality function within the programming language itself.
ETA: λProlog is a logic programming language for manipulating (typed lambda calculus) functions.
2 years have passed, but I want to add a little remark to this question. Originally, I asked if there is any way to tell if (λx.2*x) is equal to (λx.x+x). Addition and multiplication on the λ-calculus can be defined as:
add = (λa b c d . (a b (c b d)))
mul = (λa b c . (a (b c)))
Now, if you normalize the following terms:
add_x_x = (λx . (add x x))
mul_x_2 = (mul (λf x . (f (f x))))
You get:
result = (λa b c . (a b (a b c)))
for both programs. Since their normal forms are equal, the two programs are equal. While this doesn't work in general, it does work for many terms in practice: (λx . (mul 2 (mul 3 x))) and (λx . (mul 6 x)) have the same normal form, for example.
In a language with symbolic computation like Mathematica, this comparison can be done directly on symbolic expressions. It also works in C# with a computer algebra library:
MathObject f(MathObject x) => x + x;
MathObject g(MathObject x) => 2 * x;

var x = new Symbol("x");

Console.WriteLine(f(x) == g(x));
The above displays 'True' at the console.
Proving two functions equal is undecidable in general but one can still prove functional equality in special cases as in your question.
Here's a sample proof in Lean
def foo : (λ x, 2 * x) = (λ x, x + x) :=
begin
apply funext, intro x,
cases x,
{ refl },
{ simp,
dsimp [has_mul.mul, nat.mul],
have zz : ∀ a : nat, 0 + a = a := by simp,
rw zz }
end
One can do the same in other dependently typed languages such as Coq, Agda, and Idris.
The above is a tactic style proof. The actual definition of foo (the proof) that gets generated is quite a mouthful to be written by hand:
def foo : (λ (x : ℕ), 2 * x) = λ (x : ℕ), x + x :=
funext
(λ (x : ℕ),
nat.cases_on x (eq.refl (2 * 0))
(λ (a : ℕ),
eq.mpr
(id_locked
((λ (a a_1 : ℕ) (e_1 : a = a_1) (a_2 a_3 : ℕ) (e_2 : a_2 = a_3), congr (congr_arg eq e_1) e_2)
(2 * nat.succ a)
(nat.succ a * 2)
(mul_comm 2 (nat.succ a))
(nat.succ a + nat.succ a)
(nat.succ a + nat.succ a)
(eq.refl (nat.succ a + nat.succ a))))
(id_locked
(eq.mpr
(id_locked
(eq.rec (eq.refl (0 + nat.succ a + nat.succ a = nat.succ a + nat.succ a))
(eq.mpr
(id_locked
(eq.trans
(forall_congr_eq
(λ (a : ℕ),
eq.trans
((λ (a a_1 : ℕ) (e_1 : a = a_1) (a_2 a_3 : ℕ) (e_2 : a_2 = a_3),
congr (congr_arg eq e_1) e_2)
(0 + a)
a
(zero_add a)
a
a
(eq.refl a))
(propext (eq_self_iff_true a))))
(propext (implies_true_iff ℕ))))
trivial
(nat.succ a))))
(eq.refl (nat.succ a + nat.succ a))))))
The Agda manual on Inductive Data Types and Pattern Matching states:
To ensure normalisation, inductive occurrences must appear in strictly positive positions. For instance, the following datatype is not allowed:
data Bad : Set where
bad : (Bad → Bad) → Bad
since there is a negative occurrence of Bad in the argument to the constructor.
Why is this requirement necessary for inductive data types?
The data type you gave is special in that it is an embedding of the untyped lambda calculus.
data Bad : Set where
bad : (Bad → Bad) → Bad
unbad : Bad → (Bad → Bad)
unbad (bad f) = f
Let's see how. Recall, the untyped lambda calculus has these terms:
e := x | \x. e | e e'
We can define a translation [[e]] from untyped lambda calculus terms to Agda terms of type Bad (though not in Agda):
[[x]] = x
[[\x. e]] = bad (\x -> [[e]])
[[e e']] = unbad [[e]] [[e']]
Now you can use your favorite non-terminating untyped lambda term to produce a non-terminating term of type Bad. For example, we could translate (\x. x x) (\x. x x) to the non-terminating expression of type Bad below:
unbad (bad (\x -> unbad x x)) (bad (\x -> unbad x x))
Although the type happened to be a particularly convenient form for this argument, it can be generalized with a bit of work to any data type with negative occurrences of recursion.
An example of how such a data type allows us to inhabit any type is given in Turner, D.A. (2004), "Total Functional Programming", sect. 3.1, page 758, Rule 2: "Type recursion must be covariant."
Let's make a more elaborate example using Haskell. We'll start with a "bad" recursive data type
data Bad a = C (Bad a -> a)
and construct the Y combinator from it without any other form of recursion. This means that having such a data type allows us to construct any kind of recursion, or inhabit any type by an infinite recursion.
The Y combinator in the untyped lambda calculus is defined as
Y = λf.(λx.f (x x)) (λx.f (x x))
The key to it is that we apply x to itself in x x. In typed languages this is not directly possible, because there is no valid type x could possibly have. But our Bad data type allows this modulo adding/removing the constructor:
selfApp :: Bad a -> a
selfApp x@(C x') = x' x
Taking x :: Bad a, we can unwrap its constructor and apply the function inside to x itself. Once we know how to do this, it's easy to construct the Y combinator:
yc :: (a -> a) -> a
yc f = let fxx = C (\x -> f (selfApp x)) -- this is the λx.f (x x) part of Y
in selfApp fxx
Note that neither selfApp nor yc is recursive; there is no recursive call of a function to itself. Recursion appears only in our recursive data type.
We can check that the constructed combinator indeed does what it should. We can make an infinite loop:
loop :: a
loop = yc id
or compute, let's say, GCD:
gcd' :: Int -> Int -> Int
gcd' = yc gcd0
where
gcd0 :: (Int -> Int -> Int) -> (Int -> Int -> Int)
gcd0 rec a b | c == 0 = b
| otherwise = rec b c
where c = a `mod` b
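A quick sanity check, in a hypothetical GHCi session:

> gcd' 18 12
6
> yc (\rec n -> if n == 0 then 1 else n * rec (n - 1)) (5 :: Integer)
120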
How does CPS in curried languages like lambda calculus or OCaml even make sense? Technically, all functions have one argument. So say we have a CPS version of addition in one such language:
cps-add k n m = k ((+) n m)
And we call it like
(cps-add random-continuation 1 2)
This is then the same as:
(((cps-add random-continuation) 1) 2)
I already see two calls there that aren't tail calls, and in reality a complexly nested expression: (cps-add random-continuation) returns a value, namely a function that consumes a number, which in turn returns a function that consumes another number and then delivers the sum of both to that random-continuation. But we can't work around this value-returning by simply translating it into CPS again, because we can only give each function one argument. We need at least two to make room for the continuation and the 'actual' argument.
Or am I missing something completely?
Since you've tagged this with Haskell, I'll answer in that regard: In Haskell, the equivalent of doing a CPS transform is working in the Cont monad, which transforms a value x into a higher-order function that takes one argument and applies it to x.
So, to start with, here's 1 + 2 in regular Haskell: (1 + 2). And here it is in the continuation monad:
contAdd x y = do x' <- x
y' <- y
return $ x' + y'
...not terribly informative. To see what's going on, let's disassemble the monad. First, removing the do notation:
contAdd x y = x >>= (\x' -> y >>= (\y' -> return $ x' + y'))
The return function lifts a value into the monad, and in this case is implemented as \x k -> k x, or using an infix operator section as \x -> ($ x).
contAdd x y = x >>= (\x' -> y >>= (\y' -> ($ x' + y')))
The (>>=) operator (read "bind") chains together computations in the monad, and in this case is implemented as \m f k -> m (\x -> f x k). Changing the bind function to prefix form and substituting in the lambda, plus some renaming for clarity:
contAdd x y = (\m1 f1 k1 -> m1 (\a1 -> f1 a1 k1)) x (\x' -> (\m2 f2 k2 -> m2 (\a2 -> f2 a2 k2)) y (\y' -> ($ x' + y')))
Reducing some function applications:
contAdd x y = (\k1 -> x (\a1 -> (\x' -> (\k2 -> y (\a2 -> (\y' -> ($ x' + y')) a2 k2))) a1 k1))
contAdd x y = (\k1 -> x (\a1 -> y (\a2 -> ($ a1 + a2) k1)))
And a bit of final rearranging and renaming:
contAdd x y = \k -> x (\x' -> y (\y' -> k $ x' + y'))
In other words: The arguments to the function have been changed from numbers, into functions that take a number and return the final result of the entire expression, just as you'd expect.
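To see the result run on its own, we can take that final form as a standalone definition (a sketch; cpsAdd is my own name for it):

-- The derived CPS form: both arguments are suspended computations,
-- and k is the continuation receiving the final sum.
cpsAdd :: ((Integer -> r) -> r) -> ((Integer -> r) -> r) -> (Integer -> r) -> r
cpsAdd x y = \k -> x (\x' -> y (\y' -> k (x' + y')))

main :: IO ()
main = print (cpsAdd ($ 1) ($ 2) id)  -- prints 3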
Edit: A commenter points out that contAdd itself still takes two arguments in curried style. This is sensible because it doesn't use the continuation directly, but not necessary. To do otherwise, you'd need to first break the function apart between the arguments:
contAdd x = x >>= (\x' -> return (\y -> y >>= (\y' -> return $ x' + y')))
And then use it like this:
foo = do f <- contAdd (return 1)
r <- f (return 2)
return r
Note that this is really no different from the earlier version; it's simply packaging the result of each partial application as taking a continuation, not just the final result. Since functions are first-class values, there's no significant difference between a CPS expression holding a number vs. one holding a function.
Keep in mind that I'm writing things out in very verbose fashion here to make explicit all the steps where something is in continuation-passing style.
Addendum: You may notice that the final expression looks very similar to the de-sugared version of the monadic expression. This is not a coincidence, as the inward-nesting nature of monadic expressions that lets them change the structure of the computation based on previous values is closely related to continuation-passing style; in both cases, you have in some sense reified a notion of causality.
Short answer: of course it makes sense; you can apply a CPS transform directly. You will just get lots of cruft, because each argument will have, as you noticed, its own attached continuation.
In your example, I will consider that there is a +(x,y) uncurried primitive, and that you're asking what is the translation of
let add x y = +(x,y)
(This add faithfully represents OCaml's (+) operator.)
add is syntactically equivalent to
let add = fun x -> (fun y -> +(x, y))
So you apply a CPS transform¹ and get
let add_cps = fun x kx -> kx (fun y ky -> ky +(x,y))
If you want translated code that looks more like something you could have willingly written, you can devise a finer transformation that treats known-arity functions as non-curried functions and handles all parameters as a whole (as you have in non-curried languages, and as functional compilers already do for obvious performance reasons).
¹: I wrote "a CPS transform" because there is no "one true CPS translation". Different translations have been devised, producing more or less continuation-related garbage. The formal CPS translations are usually defined directly on lambda-calculus, so I suppose you're having a less formal, more hand-made CPS transform in mind.
The good properties of CPS (as a style that programs respect, not a specific transformation into this style) are that the order of evaluation is completely explicit, and that all calls are tail calls. As long as you respect those, you're relatively free in what you can do. Handling curried functions specially is thus perfectly fine.
Remark: your (cps-add k 1 2) version can also be considered tail-recursive if you assume the compiler detects and optimizes the fact that cps-add always takes 3 arguments, and doesn't build intermediate closures. That may seem far-fetched, but it's the exact same assumption we use when reasoning about tail calls in non-CPS programs in those languages.
Yes, technically all functions can be decomposed into functions of one argument; however, when you use CPS, all you are saying is that at a certain point in the computation, the continuation should be run.
Using your example, let's have a look. To make things a little easier, let's deconstruct cps-add into its normal form, where it is a function taking only one argument.
(cps-add k) -> n -> m = k ((+) n m)
Note at this point that the continuation, k, is not being evaluated (Could this be the point of confusion for you?).
Here we have a function, cps-add, that receives the continuation k as an argument and then returns a function that takes another argument, n.
((cps-add k) n) -> m = k ((+) n m)
Now we have a function that takes an argument, m.
So I suppose what I am trying to point out is that currying does not get in the way of CPS style programming. Hope that helps in some way.
This is how the Cont monad is defined:
newtype Cont r a = Cont { runCont :: (a -> r) -> r }
instance Monad (Cont r) where
return a = Cont ($ a)
m >>= k = Cont $ \c -> runCont m $ \a -> runCont (k a) c
Could you explain how and why this works? What is it doing?
The first thing to realize about the continuation monad is that, fundamentally, it's not really doing anything at all. It's true!
The basic idea of a continuation in general is that it represents the rest of a computation. Say we have an expression like this: foo (bar x y) z. Now, extract just the parenthesized portion, bar x y--this is part of the total expression, but it's not just a function we can apply. Instead, it's something we need to apply a function to. So, we can talk about the "rest of the computation" in this case as being \a -> foo a z, which we can apply to bar x y to reconstruct the complete form.
Now, it happens that this concept of "the rest of the computation" is useful, but it's awkward to work with, since it's something outside of the subexpression we're considering. To make things work better, we can turn things inside-out: extract the subexpression we're interested in, then wrap it in a function that takes an argument representing the rest of the computation: \k -> k (bar x y).
This modified version gives us a lot of flexibility--not only does it extract a subexpression from its context, but it lets us manipulate that outer context within the subexpression itself. We can think of it as a sort of suspended computation, giving us explicit control over what happens next. Now, how could we generalize this? Well, the subexpression is pretty much unchanged, so let's just replace it with a parameter to the inside-out function, giving us \x k -> k x--in other words, nothing more than function application, reversed. We could just as easily write flip ($), or add a bit of an exotic foreign language flavor and define it as an operator |>.
Now, it would be simple, albeit tedious and horribly obfuscating, to translate every piece of an expression to this form. Fortunately, there's a better way. As Haskell programmers, when we think "building a computation within a background context", the next thing we think is "say, is this a monad?", and in this case the answer is yes, yes it is.
To turn this into a monad, we start with two basic building blocks:
For a monad m, a value of type m a represents having access to a value of type a within the context of the monad.
The core of our "suspended computations" is flipped function application.
What does it mean to have access to something of type a within this context? It just means that, for some value x :: a, we've applied flip ($) to x, giving us a function that takes a function which takes an argument of type a, and applies that function to x. Let's say we have a suspended computation holding a value of type Bool. What type does this give us?
> :t flip ($) True
flip ($) True :: (Bool -> b) -> b
So for suspended computations, the type m a works out to (a -> b) -> b... which is perhaps an anticlimax, since we already knew the signature for Cont, but humor me for now.
An interesting thing to note here is that a sort of "reversal" also applies to the monad's type: Cont b a represents a function that takes a function a -> b and evaluates to b. As a continuation represents "the future" of a computation, so the type a in the signature represents in some sense "the past".
So, replacing (a -> b) -> b with Cont b a, what's the monadic type for our basic building block of reverse function application? a -> (a -> b) -> b translates to a -> Cont b a... the same type signature as return and, in fact, that's exactly what it is.
From here on out, everything pretty much falls directly out from the types: There's essentially no sensible way to implement >>= besides the actual implementation. But what is it actually doing?
At this point we come back to what I said initially: the continuation monad isn't really doing much of anything. Something of type Cont r a is trivially equivalent to something of just type a, simply by supplying id as the argument to the suspended computation. This might lead one to ask whether, if Cont r a is a monad but the conversion is so trivial, shouldn't a alone also be a monad? Of course that doesn't work as is, since there's no type constructor to define as a Monad instance, but say we add a trivial wrapper, like data Id a = Id a. This is indeed a monad, namely the identity monad.
What does >>= do for the identity monad? The type signature is Id a -> (a -> Id b) -> Id b, which is equivalent to a -> (a -> b) -> b, which is just simple function application again. Having established that Cont r a is trivially equivalent to Id a, we can deduce that in this case as well, (>>=) is just function application.
Of course, Cont r a is a crazy inverted world where everyone has goatees, so what actually happens involves shuffling things around in confusing ways in order to chain two suspended computations together into a new suspended computation, but in essence, there isn't actually anything unusual going on! Applying functions to arguments, ho hum, another day in the life of a functional programmer.
Here's fibonacci:
fib 0 = 0
fib 1 = 1
fib n = fib (n-1) + fib (n-2)
Imagine you have a machine without a call stack - it only allows tail recursion. How would you execute fib on that machine? You could easily rewrite the function to work in linear instead of exponential time, but that requires a tiny bit of insight and is not mechanical.
The obstacle to making it tail recursive is the third line, where there are two recursive calls. We can only make a single call, which also has to give the result. Here's where continuations enter.
We'll make fib (n-1) take an additional parameter, which will be a function specifying what should be done after computing its result; call that result x. What should be done is adding fib (n-2) to it, of course. So: to compute fib n, you compute fib (n-1); after that, calling the result x, you compute fib (n-2); after that, calling the result y, you return x+y.
In other words you have to tell:
How to do the following computation: "fib' n c = compute fib n and apply c to the result"?
The answer is that you do the following: "compute fib (n-1) and apply d to the result", where d x means "compute fib (n-2) and apply e to the result", where e y means c (x+y). In code:
fib' 0 c = c 0
fib' 1 c = c 1
fib' n c = fib' (n-1) d
where d x = fib' (n-2) e
where e y = c (x+y)
Equivalently, we can use lambdas:
fib' 0 = \c -> c 0
fib' 1 = \c -> c 1
fib' n = \c -> fib' (n-1) $ \x ->
fib' (n-2) $ \y ->
c (x+y)
To get the actual Fibonacci number, use the identity function: fib' n id. You can think of the line fib' (n-1) $ ... as passing its result x to the next one.
The last three lines smell like a do block, and in fact
fib' 0 = return 0
fib' 1 = return 1
fib' n = do x <- fib' (n-1)
y <- fib' (n-2)
return (x+y)
is the same, up to newtypes, by the definition of the Cont monad. Note the differences: there's \c -> at the beginning; instead of x <- ... there's ... $ \x ->; and c appears in place of return.
Try writing factorial n = n * factorial (n-1) in a tail recursive style using CPS.
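One possible solution to that exercise, as a sketch (note every recursive call is now a tail call, with the pending multiplication moved into the continuation):

factorial' :: Integer -> (Integer -> r) -> r
factorial' 0 c = c 1
factorial' n c = factorial' (n - 1) (\x -> c (n * x))

-- factorial' 5 id == 120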
How does >>= work? m >>= k is equivalent to
do a <- m
t <- k a
return t
Making the translation back, in the same style as in fib', you get
\c -> m $ \a ->
k a $ \t ->
c t
simplifying \t -> c t to c
m >>= k = \c -> m $ \a -> k a c
Adding newtypes you get
m >>= k = Cont $ \c -> runCont m $ \a -> runCont (k a) c
which is on top of this page. It's complex, but if you know how to translate between do notation and direct use, you don't need to know the exact definition of >>=! The continuation monad is much clearer if you look at do-blocks.
Monads and continuations
If you look at this usage of the list monad - written first as a do-block, then desugared into binds, then with each bind turned into an operator section...
do x <- [10, 20]
y <- [3,5]
return (x+y)
[10,20] >>= \x ->
[3,5] >>= \y ->
return (x+y)
([10,20] >>=) $ \x ->
([3,5] >>=) $ \y ->
return (x+y)
that looks like a continuation! In fact, (>>=) with its first argument applied has type (a -> m b) -> m b, which is Cont (m b) a. See sigfpe's The Mother of all Monads for an explanation. I'd regard it as a good continuation monad tutorial, though it probably wasn't meant as one.
Since continuations and monads are so strongly related in both directions, I think what applies to monads applies to continuations: only hard work will teach you them, and not reading some burrito metaphor or analogy.
EDIT: Article migrated to the link below.
I've written up a tutorial directly addressing this topic that I hope you will find useful. (It certainly helped cement my understanding!) It's a bit too long to fit comfortably in a Stack Overflow topic, so I've migrated it to the Haskell Wiki.
Please see: MonadCont under the hood
I think the easiest way to get a grip on the Cont monad is to understand how to use its constructor. I'm going to assume the following definition for now, although the realities of the transformers package are slightly different:
newtype Cont r a = Cont { runCont :: (a -> r) -> r }
This gives:
Cont :: ((a -> r) -> r) -> Cont r a
so to build a value of type Cont r a, we need to give a function to Cont:
value = Cont $ \k -> ...
Now, k itself has type a -> r, and the body of the lambda needs to have type r. An obvious thing to do would be to apply k to a value of type a, and get a value of type r. We can do that, yes, but that's really only one of many things we can do. Remember that value need not be polymorphic in r, it might be of type Cont String Integer or something else concrete. So:
We could apply k to several values of type a, and combine the results somehow.
We could apply k to a value of type a, observe the result, and then decide to apply k to something else based on that.
We could ignore k altogether and just produce a value of type r ourselves.
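For instance, with the Cont constructor above (a couple of sketches; twice and abort are my own names):

-- Call the continuation twice and combine the results.
twice :: Cont String Integer
twice = Cont $ \k -> k 1 ++ k 2

-- Ignore the continuation entirely and synthesise the final String ourselves.
abort :: Cont String Integer
abort = Cont $ \_ -> "gave up"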
But what does all this mean? What does k end up being? Well, in a do-block, we might have something looking like this:
flip runCont id $ do
v <- thing1
thing2 v
x <- Cont $ \k -> ...
thing3 x
thing4
Here's the fun part: we can, in our minds and somewhat informally, split the do-block in two at the occurrence of the Cont constructor, and think of the rest of the entire computation after it as a value in itself. But hold on, what it is depends on what x is, so it's really a function from a value x of type a to some result value:
restOfTheComputation x = do
thing3 x
thing4
In fact, this restOfTheComputation is roughly speaking what k ends up being. In other words, you call k with a value which becomes the result x of your Cont computation, the rest of the computation runs, and then the r produced winds its way back into your lambda as the result of the call to k. So:
if you called k multiple times, the rest of the computation will get run multiple times, and the results may be combined however you wish.
if you didn't call k at all, the rest of the entire computation will be skipped, and the enclosing runCont call will just give you back whatever value of type r you managed to synthesise. That is, unless some other part of the computation is calling you from their k, and messing about with the result...
If you're still with me at this point it should be easy to see this could be quite powerful. To make the point a little, let's implement some standard type classes.
instance Functor (Cont r) where
fmap f (Cont c) = Cont $ \k -> ...
We're given a Cont value with bind result x of type a, and a function f :: a -> b, and we want to make a Cont value with bind result f x of type b. Well, to set the bind result, just call k...
fmap f (Cont c) = Cont $ \k -> k (f ...
Wait, where do we get x from? Well, it's going to involve c, which we haven't used yet. Remember how c works: it gets given a function, and then calls that function with its bind result. We want to call our function with f applied to that bind result. So:
fmap f (Cont c) = Cont $ \k -> c (\x -> k (f x))
Tada! Next up, Applicative:
instance Applicative (Cont r) where
pure x = Cont $ \k -> ...
This one's simple. We want the bind result to be the x we get.
pure x = Cont $ \k -> k x
Now, <*>:
Cont cf <*> Cont cx = Cont $ \k -> ...
This a little trickier, but uses essentially the same ideas as in fmap: first get the function from the first Cont, by making a lambda for it to call:
Cont cf <*> Cont cx = Cont $ \k -> cf (\fn -> ...
Then get the value x from the second, and make fn x the bind result:
Cont cf <*> Cont cx = Cont $ \k -> cf (\fn -> cx (\x -> k (fn x)))
Monad is much the same, although it requires runCont or a case or let to unpack the newtype.
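For reference, here is a sketch of what that instance can look like, unpacking the newtype by pattern matching (it mirrors the definition quoted in the question):

instance Monad (Cont r) where
  return = pure
  Cont c >>= f = Cont $ \k -> c (\x -> runCont (f x) k)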
This answer is already quite long, so I won't go into ContT (in short: it is exactly the same as Cont! The only difference is in the kind of the type constructor, the implementations of everything are identical) or callCC (a useful combinator that provides a convenient way to ignore k, implementing early exit from a sub-block).
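As a tiny illustration of that early-exit idea, here is a sketch using callCC from Control.Monad.Cont in the mtl/transformers packages (safeDiv is my own name):

import Control.Monad (when)
import Control.Monad.Cont

-- Bail out with 0 as soon as we see a zero divisor; otherwise divide.
safeDiv :: Int -> Int -> Int
safeDiv n d = flip runCont id $ callCC $ \exit -> do
  when (d == 0) (exit 0)
  return (n `div` d)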
For a simple and plausible application, try Edward Z. Yang's blog post implementing labelled break and continue in for loops.
Trying to complement the other answers:
Nested lambdas are horrible for readability. This is exactly why let ... in ... and ... where ... exist: to get rid of nested lambdas by using intermediate variables. Using those, the bind implementation can be refactored into:
newtype Cont r a = Cont { runCont :: (a -> r) -> r }
instance Monad (Cont r) where
return a = Cont ($ a)
m >>= k = k a
where a = runCont m id
Which hopefully makes what is happening clearer. The return implementation boxes the value with a lazy apply. Using runCont m id applies id to the boxed value, which returns the original value.
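Concretely, return 5 is Cont ($ 5), and runCont (Cont ($ 5)) id reduces to ($ 5) id = id 5 = 5.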
For any monad where any boxed value can simply be unboxed, there is generally a trivial implementation of bind, which is to simply unbox the value and apply a monadic function to it.
To get the obfuscated implementation in the original question, first replace k a with Cont $ runCont (k a), which in turn can be replaced with Cont $ \c -> runCont (k a) c.
Now, we can move the where into a subexpression, so that we are left with
Cont $ \c-> ( runCont (k a) c where a = runCont m id )
The expression within parentheses can be desugared into (\a -> runCont (k a) c) $ runCont m id.
To finish, we use the property of runCont that f (runCont m g) = runCont m (f . g), and we are back to the original obfuscated expression.