Combining the state monad with the costate comonad

How to combine the state monad S -> (A, S) with the costate comonad (E->A, E)?
I tried with both obvious combinations S -> ((E->A, E), S) and (E->S->(A, S), E) but then in either case I do not know how to define the operations (return, extract, ... and so on) for the combination.

Combining two monads O and I yields a monad if either O or I is copointed, i.e. has an extract method. Every comonad is copointed. If both O and I are copointed, then you have two different "natural" ways to obtain a monad, which are presumably not equivalent.
You have:
unit_O :: a -> O a
join_O :: O (O a) -> O a
unit_I :: a -> I a
join_I :: I (I a) -> I a
Here I've added _O and _I suffixes for clarity; in actual Haskell code, they would not be there, since the type checker figures this out on its own.
Your goal is to show that the composition O (I a) is a monad. Let's assume that O is copointed, i.e. that there is a function extract_O :: O a -> a.
Then we have:
unit :: a -> O (I a)
unit = unit_O . unit_I
join :: O (I (O (I a))) -> O (I a)
The problem, of course, is in implementing join. We follow this strategy:
fmap over the outer O
use extract_O to get rid of the inner O
use join_I to combine the two I monads
This leads us to
join = fmap_O $ join_I . fmap_I extract_O
To make this work, you'll also need to define
newtype MCompose o i a = MCompose (o (i a))
and add the respective type constructors and deconstructors into the definitions above.
The other alternative uses extract_I instead of extract_O. This version is even simpler:
join = join_O . fmap_O extract_I
This defines a new monad. I assume you can define a new comonad in the same way, but I haven't attempted this.
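To make the second (extract_I) variant concrete, here is a hedged sketch of mine wrapped in the MCompose newtype above; the Copointed class is my stand-in for "has an extract" and is not part of the original answer:

class Functor w => Copointed w where
  extractC :: w a -> a

instance (Functor o, Functor i) => Functor (MCompose o i) where
  fmap f (MCompose x) = MCompose (fmap (fmap f) x)

instance (Monad o, Monad i, Copointed i) => Applicative (MCompose o i) where
  pure = MCompose . return . return          -- unit_O . unit_I
  mf <*> mx = mf >>= \f -> fmap f mx

instance (Monad o, Monad i, Copointed i) => Monad (MCompose o i) where
  -- join = join_O . fmap_O extract_I, expressed through (>>=) of the outer monad
  MCompose x >>= f =
    MCompose (x >>= \ia -> let MCompose oib = f (extractC ia) in oib)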

As the other answer demonstrates, both of the combinations S -> ((E->A, E), S) and (E->S->(A, S), E) have Monad and Comonad instances simultaneously. In fact, giving a Monad/Comonad instance is equivalent to giving a monoid structure to its points ∀r. r -> f(r) or to its copoints ∀r. f(r) -> r, respectively, at least in a classical, non-constructive sense (I don't know the constructive answer). This fact suggests that a Functor f has a very good chance of being both a Monad and a Comonad, provided its points and copoints are non-trivial.
The real question, however, is whether the Monad/Comonad instances constructed as such have natural computational/categorical meanings. In this particular case I would say "no", because you don't seem to have a priori knowledge about how to compose them in a way that suits your computational needs.
The standard categorical way to compose two (co)monads is via adjunctions. Let me summarize your situation:
        Fₑ          Fₛ
       -->         -->
Hask    ⊣    Hask   ⊣    Hask
       <--         <--
        Gₑ          Gₛ
Fₜ(a) = (a,t)
Gₜ(a) = (t->a)
Proof of Fₜ ⊣ Gₜ:
Fₜ(x) -> y ≃
(x,t) -> y ≃
x -> (t->y) ≃
x -> Gₜ(y)
Now you can see that the state monad (s->(a,s)) ≃ (s->a,s->s) is the composition GₛFₛ and the costate comonad
is FₑGₑ. This adjunction says that Hask can be interpreted as a model of the (co)state (co)algebras.
Now, 'adjunctions compose.' For example,
FₛFₑ(x) -> y ≃
Fₑ(x) -> Gₛ(y) ≃
x -> GₑGₛ(y)
So FₛFₑ ⊣ GₑGₛ. This gives a pair of a monad and a comonad, namely
T(a) = GₑGₛFₛFₑ(a)
= GₑGₛFₛ(a,e)
= GₑGₛ(a,e,s)
= Gₑ(s->(a,e,s))
= e->s->(a,e,s)
= ((e,s)->a, (e,s)->(e,s))
G(a) = FₛFₑGₑGₛ(a)
= FₛFₑGₑ(s->a)
= FₛFₑ(e->s->a)
= Fₛ(e->s->a,e)
= (e->s->a,e,s)
= ((e,s)->a, (e,s))
T is simply the state monad with the state (e,s), G is the costate comonad with the costate (e,s), so
these do have very natural meanings.
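In Haskell terms (a small rendering of my own, with type names I've made up), T is literally the state monad over the paired state and G the costate/store comonad over the paired costate:

-- the composed monad: state monad with state (e, s)
newtype T e s a = T ((e, s) -> (a, (e, s)))

-- the composed comonad: costate (store) comonad with costate (e, s)
newtype G e s a = G ((e, s) -> a, (e, s))

instance Functor (T e s) where
  fmap f (T m) = T (\es -> let (a, es') = m es in (f a, es'))
instance Applicative (T e s) where
  pure a = T (\es -> (a, es))
  tf <*> tx = tf >>= \f -> fmap f tx
instance Monad (T e s) where
  T m >>= k = T (\es -> let (a, es') = m es; T m' = k a in m' es')

-- the usual store-comonad operations for G
extractG :: G e s a -> a
extractG (G (peek, es)) = peek es

duplicateG :: G e s a -> G e s (G e s a)
duplicateG (G (peek, es)) = G (\es' -> G (peek, es'), es)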
Composing adjunctions is a natural, frequent mathematical operation. For example, a geometric morphism between topoi (a kind of cartesian closed category which admits complex, non-free constructions at the 'type level') is defined as an adjoint pair of functors, requiring only that the left adjoint be left exact (i.e. preserve finite limits). If those topoi are sheaves on topological spaces, composing the adjunctions simply corresponds to composing the (unique) continuous base change maps (in the opposite direction), which has a very natural meaning.
On the other hand, composing monads/comonads directly seems to be a very rare practice in Mathematics.
This is because often a (co)monad is thought of as a carrier of an (co)algebraic theory, rather than as a
model. In this interpretation the corresponding adjunctions are the models, not the monad. The problem
is that composing two theories requires another theory, a theory about how to compose them. For example,
imagine composing two theories of monoids. Then you may get at least two new theories,
namely the theory of lists of lists, or ring-like algebras where two kinds of binary operations distribute.
Neither is a priori better/more natural than the other. This is the meaning of "monads don't compose": it doesn't say the composition cannot be a monad, but it does say you will need another theory of how to compose them.
In contrast, composing adjunctions naturally results in another adjunction simply because by doing so you are
implicitly specifying the rules of composing two given theories. So by taking the monad of the composed adjunction you get the theory that also specifies the rules of composition.

How are free objects constructed?

So I understand that a free object is defined as being the left-hand side of an adjunction. But how does that lead you to the Haskell definition of such objects?
More concretely: given a "forgetful functor" from the category of monads to the category of endofunctors,
newtype Forget m a = Forget (m a)

instance Monad m => Functor (Forget m) where
  fmap f (Forget x) = Forget (liftM f x)
then the free monad Free :: (* -> *) -> (* -> *) is a type admitting (a Monad instance and) the following isomorphism:
type f ~> g = forall x. f x -> g x
fwd :: (Functor f, Monad m) => (f ~> Forget m) -> (Free f ~> m)
bwd :: (Functor f, Monad m) => (Free f ~> m) -> (f ~> Forget m)
fwd . bwd = id = bwd . fwd
If we drop the Forgets, for the free monad in Control.Monad.Free we have fwd = foldFree and bwd = (. liftF) (I think?)
But how do these laws lead to the construction found in Control.Monad.Free? How do you come up with data Free f a = Return a | Free (f (Free f a))? Surely you don't just guess until you come up with something that satisfies the laws? Same question goes for the free category of a graph, the free monoid of a set, and any other free object you care to name.
I don't think the notion of "free" is as well-defined as you seem to believe. While I do think the general consensus is that it is indeed a left adjoint of a forgetful functor, the issue lies in what "forgetful" means. There are clear definitions in some broad-ranging cases, particularly for concrete categories.
Universal algebra provides a broad-ranging approach which covers almost all "algebraic" structures (over sets). The upshot is: given a "signature", which consists of sorts, operations, and equations, you build a term algebra (i.e. an AST) of the operations and then quotient it by the equivalence relation generated by the equations. This is the free algebra generated from that signature. For example, we usually talk about monoids as being a set equipped with an associative multiplication and unit. In code, the free algebra before quotienting would be:
data PreFreeMonoid a
  = Unit
  | Var a
  | Mul (PreFreeMonoid a) (PreFreeMonoid a)
We would then quotient by the equivalence relation generated from the equations:
Mul Unit x = x
Mul x Unit = x
Mul (Mul x y) z = Mul x (Mul y z)
But you can show that the resulting quotient type is isomorphic to lists. In the multi-sorted case, we'd have a family of term algebras, one for each sort.
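As a quick sanity check (my own code, not part of the original answer), the quotient map to lists can be written directly: it sends a term to the list of its variables in order, and the three equations above all become equalities of lists.

normalize :: PreFreeMonoid a -> [a]
normalize Unit      = []
normalize (Var a)   = [a]
normalize (Mul t u) = normalize t ++ normalize u

-- e.g. normalize (Mul Unit (Mul (Var 'x') (Var 'y'))) == "xy"
--      normalize (Mul (Var 'x') (Mul (Var 'y') Unit)) == "xy"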
One way to recast this categorically is to use the notion of a (slightly generalized) Lawvere theory. Given a signature with a set of sorts, S, we can build a small category, call it T, whose objects are lists of elements of S. These small categories will be called theories in general. Operations get mapped to arrows whose source and target correspond to the appropriate arities. We freely add "tupling" and "projection" arrows so that e.g. [A,B,A] becomes the product [A]×[B]×[A]. Finally, we add commutative diagrams (i.e. equations between arrows) corresponding to each equation in the signature. At this point, T essentially represents the term algebra(s). In fact, an actual interpretation or model of this term algebra is just a finite-product-preserving functor T → Set; write Mod(T) for the category of finite-product-preserving functors T → Set.
In the single-sorted case, we'd have an underlying set functor, but in general we get an S-indexed family of sets, i.e. we have a functor U : Mod(T) → Set^S, where we're viewing S as a discrete category here. U is simply U(m)(s) = m([s]). We can actually calculate the left adjoint. Suppose we have a family of sets indexed by elements of S, call it G. Then we need to build a finite-product-preserving functor T → Set, but any functor into Set (i.e. any copresheaf) is a colimit of representables, which, in this case, means it's a quotient of the following (dependent) sum type:
Free(G)(s) = Σt:T.T(t,s)×Free(G)(t)
If Free(G) is finite product preserving then in the t = [A,B] case, for example, we'd have:
T([A,B],s)×Free(G)([A,B]) = T([A,B],s)×Free(G)([A])×Free(G)([B])
and we simply define Free(G)([A]) = G(A) for each A in S producing:
T([A,B],s)×Free(G)([A])×Free(G)([B]) = T([A,B],s)×G(A)×G(B)
Altogether this says that an element of Free(G)([A]) consists of an arrow of T into [A] and a list of elements of the appropriate sets corresponding to the source of that arrow, i.e. the arity of the term, modulo equations that make it behave sensibly and obey the equations from the signature, which I'm not going to elaborate on. For the multiplication of a monoid, we'd have an arrow m : [A,A] → [A], and this leads to tuples (m, x, y) where x and y are elements of G(A), corresponding to a term like m(x, y). Recasting this definition as a recursive one requires looking at the equations we're quotienting by.
There are other things to verify to show that Free ⊣ U, but it isn't too hard. Once that's done, U∘Free is a monad on Set^S.
The nice thing about the Lawvere theory approach is that it is easy to generalize in multiple ways. One straightforward way is to replace Set by some other topos E. It's actually the case that the category of directed multigraphs forms a topos, but I don't believe you can (easily) view categories as theories over Graph. A different direction in which to extend Lawvere theories is to consider doctrines other than finite-product-preserving functors; in particular, finite-limit-preserving functors, aka left exact or lex functors, are an interesting case. Both small categories and directed multigraphs (which categorists sometimes call quivers) can be viewed as models of a category with finite limits. There's a straightforward inclusion of the theory of directed multigraphs into the theory of small categories. This, contravariantly, induces a functor Cat → Graph simply by precomposition. The left adjoint of this is then (almost) the left Kan extension along that inclusion. These left Kan extensions are computed in Set, so ultimately they are just colimits, which are just quotients of (dependent) sum types. (Technically, you need to verify that the resulting Kan extensions are finite-limit-preserving. We're also helped by the fact that the models of the theory of graphs are essentially arbitrary functors from the theory of graphs. This happens because the theory of graphs consists only of unary operations.)
None of this helps for free monads, though. However, it turns out that one construction subsumes all of these, including free monads. Returning to universal algebra, it's the case that every signature with no equations gives rise to a (polynomial) functor whose initial algebra is the free term algebra. Lambek's lemma suggests, and it's easy to prove, that the initial algebra is just the colimit of repeated applications of the functor. The above general result is based on a similar approach, and the relevant case for free monads is the unpointed endofunctor case, in which you start to see the definition of Free that you gave, but actually working it out fully requires unfolding many constructions.
Frankly, though, what I'm pretty sure actually happened in the FP world is the following. If you look at PreFreeMonoid, it's actually a free monad. PreFreeMonoid Void is the initial algebra for the functor the monoid signature (minus the equations) would give rise to. If you are familiar with using functors for initial algebras and you even start thinking about universal algebra, you are almost certainly going to end up defining a type like data Term f a = Var a | Op (f (Term f a)). It's easy to verify this is a monad once you think to ask the question. If you're even vaguely familiar with the relationship monads have to algebraic structures or to term substitution, then you may ask the question quite quickly. The same construction can be stumbled upon from a programming language implementation perspective. If you just directly set your goal to be deriving the free monad construction in Haskell, there are several intuitive ways to arrive at the right definition especially combined with some equational/parametricity-driven reasoning. In fact, the "monoid object in the category of endofunctors" one is quite suggestive.
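For reference, here is that Term type with its Monad instance spelled out (a routine sketch of the 'easy to verify' step; bind is substitution for the variables):

data Term f a = Var a | Op (f (Term f a))

instance Functor f => Functor (Term f) where
  fmap g (Var a)  = Var (g a)
  fmap g (Op fts) = Op (fmap (fmap g) fts)

instance Functor f => Applicative (Term f) where
  pure = Var
  tf <*> tx = tf >>= \g -> fmap g tx

instance Functor f => Monad (Term f) where
  Var a  >>= k = k a                     -- substitute at a variable
  Op fts >>= k = Op (fmap (>>= k) fts)   -- push the substitution under the operation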
(I really wish this StackExchange had MathJax support.)

To what extent are Applicative/Monad instances uniquely determined?

As described in this question and its answers, Functor instances are uniquely determined, if they exist.
For lists, there are two well-known Applicative instances: [] and ZipList. So Applicative isn't unique (see also Can GHC derive Functor and Applicative instances for a monad transformer? and Why is there no -XDeriveApplicative extension?). However, ZipList needs infinite lists, as its pure repeats a given element indefinitely.
Are there other, perhaps better examples of data structures that have at least two Applicative instances?
Are there any such examples that only involve finite data structures? That is, like if hypothetically Haskell's type system distinguished inductive and coinductive data types, would it be possible to uniquely determine Applicative?
Going further, if we could extend both [] and ZipList to a Monad, we'd have an example where a monad isn't uniquely determined by the data type and its Functor. Alas, ZipList has a Monad instance only if we restrict ourselves to infinite lists (streams).
And return for [] creates a single-element list, so it requires finite lists. Therefore:
Are Monad instances uniquely determined by the data type? Or is there an example of a data type that can have two distinct Monad instances?
In case there is an example with two or more distinct instances, an obvious question arises of whether they must or can have the same Applicative instance:
Are Monad instances uniquely determined by the Applicative instance, or is there an example of an Applicative that can have two distinct Monad instances?
Is there an example of a data type with two distinct Monad instances, each having a different Applicative super-instance?
And finally we can ask the same question for Alternative/MonadPlus. This is complicated by the fact that there are two distinct sets of MonadPlus laws. Assuming we accept one of the sets of laws (and for Applicative we accept right/left distributivity/absorption, see also this question),
is Alternative uniquely determined by Applicative, and MonadPlus by Monad, or are there any counter-examples?
If any of the above are unique, I'd be interested in knowing why, to have a hint of a proof. If not, a counter-example.
First, since Monoids are not unique, neither are Writer Monads or Applicatives. Consider
data M a = M Int a
then you can give it Applicative and Monad instances isomorphic to either of:
Writer (Sum Int)
Writer (Product Int)
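To spell the first example out (a sketch of mine using the M type above): the Functor part is shared, and only the unit value and the way the Ints combine differ. Since Haskell allows only one Monad instance per type in a module, the two candidate structures are written here as plain functions.

instance Functor M where
  fmap f (M n a) = M n (f a)

-- the Writer (Sum Int)-like structure: unit 0, combine with (+)
sumReturn :: a -> M a
sumReturn = M 0

sumBind :: M a -> (a -> M b) -> M b
sumBind (M m a) k = let M n b = k a in M (m + n) b

-- the Writer (Product Int)-like structure: unit 1, combine with (*)
productReturn :: a -> M a
productReturn = M 1

productBind :: M a -> (a -> M b) -> M b
productBind (M m a) k = let M n b = k a in M (m * n) b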
Given a Monoid instance for a type s, another isomorphic pair with different Applicative/Monad instances is:
ReaderT s (Writer s)
State s
As for having one Applicative instance extend to two different Monads, I cannot remember any example. However, back when I tried to convince myself completely about whether ZipList really cannot be made a Monad, I found the following pretty strong restriction that holds for any Monad:
join (fmap (\x -> fmap (\y -> f x y) ys) xs) = f <$> xs <*> ys
That doesn't give join for all values though: in the case of lists the restricted values are the ones where all elements have the same length, i.e. lists of lists with "rectangular" shape.
(For Reader monads, where the "shape" of monadic values doesn't vary, these are in fact all the m (m x) values, so those do have unique extension. EDIT: Come to think of it, Either, Maybe and Writer also have only "rectangular" m (m x) values, so their extension from Applicative to Monad is also unique.)
I wouldn't be surprised if an Applicative with two Monads exists, though.
For Alternative/MonadPlus, I cannot recall any law that would pin the instance down; for instances using the Left Distribution law instead of Left Catch, I see nothing preventing you from just swapping (<|>) with flip (<|>). I don't know if there's a less trivial variation.
ADDENDUM: I suddenly remembered I had found an example of an Applicative with two Monads. Namely, finite lists. There's the usual Monad [] instance, but you can then replace its join by the following function (essentially making empty lists "infectious"):
ljoin :: [[a]] -> [a]
ljoin xs
  | any null xs = []
  | otherwise   = concat xs
(Alas, the lists need to be finite because otherwise the null check will never finish, and that would ruin the join . fmap return == id monad law.)
This has the same value as join/concat on rectangular lists of lists, so will give the same Applicative. As I recall, it turns out that the first two monad laws are automatic from that, and you just need to check ljoin . ljoin == ljoin . fmap ljoin.
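A tiny check of that claim (my own example): the two joins agree on a rectangular input and differ on a ragged one that contains an empty inner list.

-- assuming the ljoin defined above is in scope
rectangularAgree :: Bool
rectangularAgree = ljoin [[1,2],[3,4 :: Int]] == concat [[1,2],[3,4]]   -- True

raggedDiffer :: ([Int], [Int])
raggedDiffer = (ljoin [[1,2],[]], concat [[1,2],[]])                    -- ([], [1,2])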
Given that every Applicative has a Backwards counterpart,
newtype Backwards f x = Backwards {backwards :: f x}

instance Functor f => Functor (Backwards f) where
  fmap g (Backwards fx) = Backwards (fmap g fx)

instance Applicative f => Applicative (Backwards f) where
  pure x = Backwards (pure x)
  Backwards ff <*> Backwards fs = Backwards (flip ($) <$> fs <*> ff)
it's unusual for Applicative to be uniquely determined, just as (and this is very far from unrelated) many sets extend to monoids in multiple ways.
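For example (my own illustration, not part of the answer), on lists the ordinary instance and its backwards counterpart produce the same pairs in different orders, so they are genuinely distinct instances on the same functor:

forwardPairs, backwardPairs :: [(Int, Char)]

-- the standard list Applicative
forwardPairs  = (,) <$> [1, 2] <*> "ab"
-- [(1,'a'),(1,'b'),(2,'a'),(2,'b')]

-- what Backwards [] computes, written out on plain lists
backwardPairs = flip ($) <$> "ab" <*> ((,) <$> [1, 2])
-- [(1,'a'),(2,'a'),(1,'b'),(2,'b')]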
In this answer, I set the exercise of finding at least four distinct valid Applicative instances for nonempty lists: I won't spoil it here, but I will give a big hint on how to hunt.
Meanwhile, in some wonderful recent work (which I saw at a summer school a few months ago), Tarmo Uustalu showed a rather neat way to get a handle on this problem, at least when the underlying functor is a container, in the sense of Abbott, Altenkirch and Ghani.
Warning: Dependent types ahead!
What is a container? If you have dependent types to hand, you can present container-like functors F uniformly, as being determined by two components
a set of shapes, S : Set
an S-indexed set of positions, P : S -> Set
Up to isomorphism, container data structures in F X are given by the dependent pair of some shape s : S, and some function e : P s -> X, which tells you the element located at each position. That is, we define the extension of a container
(S <| P) X = (s : S) * (P s -> X)
(which, by the way, looks a lot like a generalized power series if you read -> as reversed exponentiation). The triangle is supposed to remind you of a tree node sideways, with an element s : S labelling the apex, and the baseline representing the position set P s. We say that some functor is a container if it is isomorphic to some S <| P.
In Haskell, you can easily take S = F (), but constructing P can take quite a bit of type-hackery. But that is something you can try at home. You'll find that containers are closed under all the usual polynomial type-forming operations, as well as identity,
Id ~= () <| \ _ -> ()
composition, where a whole shape is made from just one outer shape and an inner shape for each outer position,
(S0 <| P0) . (S1 <| P1) ~= ((S0 <| P0) S1) <| \ (s0, e0) -> (p0 : P0 s0) * P1 (e0 p0)
and some other things, notably the tensor, where there is one outer and one inner shape (so "outer" and "inner" are interchangeable)
(S0 <| P0) (X) (S1 <| P1) = ((S0, S1) <| \ (s0, s1) -> (P0 s0, P1 s1))
so that F (X) G means "F-structures of G-structures-all-the-same-shape", e.g., [] (X) [] means rectangular lists-of-lists. But I digress
Polymorphic functions between containers
Every polymorphic function
m : forall X. (S0 <| P0) X -> (S1 <| P1) X
can be implemented by a container morphism, constructed from two components in a very particular way.
a function f : S0 -> S1 mapping input shapes to output shapes;
a function g : (s0 : S0) -> P1 (f s0) -> P0 s0 mapping output positions to input positions.
Our polymorphic function is then
\ (s0, e0) -> (f s0, e0 . g s0)
where the output shape is computed from the input shape, then the output positions are filled up by picking elements from input positions.
(If you're Peter Hancock, you have a whole other metaphor for what's going on. Shapes are Commands; Positions are Responses; a container morphism is a device driver, translating commands one way, then responses the other.)
Every container morphism gives you a polymorphic function, but the reverse is also true. Given such an m, we may take
(f s, g s) = m (s, id)
That is, we have a representation theorem, saying that every polymorphic function between two containers is given by such an f, g-pair.
What about Applicative? We kind of got a bit lost along the way, building all this machinery. But it has been worth it. When the underlying functors for monads and applicatives are containers, the polymorphic functions pure and <*>, return and join must be representable by the relevant notion of container morphism.
Let's take applicatives first, using their monoidal presentation. We need
unit : () -> (S <| P) ()
mult : forall X, Y. ((S <| P) X, (S <| P) Y) -> (S <| P) (X, Y)
The left-to-right maps for shapes require us to deliver
unitS : () -> S
multS : (S, S) -> S
so it looks like we might need a monoid. And when you check the applicative laws, you find we need exactly a monoid. Equipping a container with applicative structure is exactly refining the monoid structures on its shapes with suitable position-respecting operations. There's nothing to do for unit (because there is no choice of source position), but for mult, we need that whenever
multS (s0, s1) = s
we have
multP (s0, s1) : P s -> (P s0, P s1)
satisfying appropriate identity and associativity conditions. If we switch to Hancock's interpretation, we're defining a monoid (skip, semicolon) for commands, where there is no way to look at the response to the first command before choosing the second, like commands are a deck of punch cards. We have to be able to chop up responses to combined commands into the individual responses to the individual commands.
So, every monoid on the shapes gives us a potential applicative structure. For lists, shapes are numbers (lengths), and there are a great many monoids from which to choose. Even if shapes live in Bool, we have quite a bit of choice.
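A concrete illustration of the monoid-on-shapes idea for lists (my own example): the ordinary instance multiplies lengths, the monoid (ℕ, ×, 1), while ZipList takes the minimum, whose unit is the 'infinite' length, which is exactly why its pure must be an infinite list.

import Control.Applicative (ZipList (..))

lengthsMultiply :: Bool
lengthsMultiply =
  length ([(+ 1), (* 2)] <*> [10, 20, 30 :: Int]) == 2 * 3        -- True

lengthsTakeMinimum :: Bool
lengthsTakeMinimum =
  length (getZipList (ZipList [(+ 1), (* 2)] <*> ZipList [10, 20, 30 :: Int]))
    == min 2 3                                                    -- True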
What about Monad? Meanwhile, for monads M with M ~= S <| P, we need
return : Id -> M
join : M . M -> M
Looking at shapes first, that means we need a sort-of lopsided monoid.
return_f : () -> S
join_f : (S <| P) S -> S -- (s : S, P s -> S) -> S
It's lopsided because we get a bunch of shapes on the right, not just one. If we switch to Hancock's interpretation, we're defining a kind of sequential composition for commands, where we do let the second command be chosen on the basis of the first response, like we're interacting at a teletype. More geometrically, we're explaining how to glom two layers of a tree into one. It would be very surprising if such compositions were unique.
Again, for the positions, we have to map single output positions to pairs in a coherent way. This is trickier for monads: we first choose an outer position (response), then we have to choose an inner position (response) appropriate to the shape (command) found at the first position (chosen after the first response).
I'd love to link to Tarmo's work for the details, but it doesn't seem to have hit the streets yet. He has actually used this analysis to enumerate all possible monad structures for several choices of underlying container. I'm looking forward to the paper!
Edit. By way of doing honour to the other answer, I should observe that when everywhere P s = (), then (S <| P) X ~= (S, X) and the monad/applicative structures coincide exactly with each other and with the monoid structures on S. That is, for writer monads, we need only choose the shape-level operations, because there is exactly one position for a value in every case.

Monads as adjunctions

I've been reading about monads in category theory. One definition of monads uses a pair of adjoint functors. A monad is defined by a round-trip using those functors. Apparently adjunctions are very important in category theory, but I haven't seen any explanation of Haskell monads in terms of adjoint functors. Has anyone given it a thought?
Edit: Just for fun, I'm going to do this right. Original answer preserved below
The current adjunction code for category-extras now is in the adjunctions package: http://hackage.haskell.org/package/adjunctions
I'm just going to work through the state monad explicitly and simply. This code uses Data.Functor.Compose from the transformers package, but is otherwise self-contained.
An adjunction between f (D -> C) and g (C -> D), written f -| g, can be characterized in a number of ways. We'll use the counit/unit (epsilon/eta) description, which gives two natural transformations (morphisms between functors).
class (Functor f, Functor g) => Adjoint f g where
  counit :: f (g a) -> a
  unit   :: a -> g (f a)
Note that the "a" in counit is really the identity functor in C, and the "a" in unit is really the identity functor in D.
We can also recover the hom-set adjunction definition from the counit/unit definition.
phiLeft :: Adjoint f g => (f a -> b) -> (a -> g b)
phiLeft f = fmap f . unit
phiRight :: Adjoint f g => (a -> g b) -> (f a -> b)
phiRight f = counit . fmap f
In any case, we can now define a Monad from our unit/counit adjunction like so:
instance Adjoint f g => Monad (Compose g f) where
  return x = Compose $ unit x
  x >>= f  = Compose . fmap counit . getCompose $ fmap (getCompose . f) x
Now we can implement the classic adjunction between (a,) and (a ->):
instance Adjoint ((,) a) ((->) a) where
  -- counit :: (a, a -> b) -> b
  counit (x, f) = f x
  -- unit :: b -> (a -> (a, b))
  unit x = \y -> (y, x)
And now a type synonym
type State s = Compose ((->) s) ((,) s)
And if we load this up in ghci, we can confirm that State is precisely our classic state monad. Note that we can take the opposite composition and get the Costate Comonad (aka the store comonad).
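Here is a small usage sketch of my own against this Compose-based State; get', put' and runState' are names I am introducing, and note that on a modern GHC the Monad instance above would also need Functor/Applicative boilerplate, since it predates the Applicative-Monad change.

get' :: State s s
get' = Compose (\s -> (s, s))

put' :: s -> State s ()
put' s = Compose (\_ -> (s, ()))

runState' :: State s a -> s -> (s, a)
runState' = getCompose

-- runState' get' 7     == (7, 7)
-- runState' (put' 5) 0 == (5, ())
-- and, with the Monad instance above:
-- runState' (put' 5 >> get') 0 == (5, 5)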
There are a bunch of other adjunctions we can make into monads in this fashion (such as (Bool,) Pair), but they're sort of strange monads. Unfortunately we can't do the adjunctions that induce Reader and Writer directly in Haskell in a pleasant way. We can do Cont, but as copumpkin describes, that requires an adjunction from an opposite category, so it actually uses a different "form" of the "Adjoint" typeclass that reverses some arrows. That form is also implemented in a different module in the adjunctions package.
This material is covered in a different way by Derek Elkins' article in The Monad Reader 13 -- Calculating Monads with Category Theory: http://www.haskell.org/wikiupload/8/85/TMR-Issue13.pdf
Also, Hinze's recent Kan Extensions for Program Optimization paper walks through the construction of the list monad from the adjunction between Mon and Set: http://www.cs.ox.ac.uk/ralf.hinze/Kan.pdf
Old answer:
Two references.
1) Category-extras delivers, as always, with a representation of adjunctions and how monads arise from them. As usual, it's good to think with, but pretty light on documentation: http://hackage.haskell.org/packages/archive/category-extras/0.53.5/doc/html/Control-Functor-Adjunction.html
2) Haskell-Cafe also delivers with a promising but brief discussion on the role of adjunction. Some of which may help in interpreting category-extras: http://www.haskell.org/pipermail/haskell-cafe/2007-December/036328.html
Derek Elkins was showing me recently over dinner how the Cont Monad arises from composing the (_ -> k) contravariant functor with itself, since it happens to be self-adjoint. That's how you get (a -> k) -> k out of it. Its counit, however, leads to double negation elimination, which can't be written in Haskell.
For some Agda code that illustrates and proves this, please see http://hpaste.org/68257.
This is an old thread, but I found the question interesting,
so I did some calculations myself. Hopefully Bartosz is still there
and might read this..
In fact, the Eilenberg-Moore construction does give a very clear picture in this case.
(I will use CWM notation with Haskell like syntax)
Let T be the list monad < T,eta,mu > (eta = return and mu = concat)
and consider a T-algebra h:T a -> a.
(Note that T a = [a] is a free monoid <[a],[],(++)>, that is, identity [] and multiplication (++).)
By definition, h must satisfy h . T h == h . mu a and h . eta a == id.
Now, some easy diagram chasing proves that h actually induces a monoid structure on a (defined by x*y = h[x,y] ),
and that h becomes a monoid homomorphism for this structure.
Conversely, any monoid structure < a,a0,* > defined in Haskell is naturally defined as a T-algebra,
namely h = foldr (*) a0, a function that 'replaces' (:) with (*) and maps [] to a0, the identity.
So, in this case, the category of T-algebras is just the category of monoid structures definable in Haskell, HaskMon.
(Please check that the morphisms in T-algebras are actually monoid homomorphisms.)
It also characterizes lists as universal objects in HaskMon, just like free products in Grp, polynomial rings in CRng, etc.
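In code, the correspondence described above looks like this (my sketch): every Haskell Monoid yields a structure map for the list monad, and the two T-algebra laws are exactly the monoid laws.

-- every monoid < a, mempty, (<>) > gives a T-algebra h :: T a -> a
alg :: Monoid a => [a] -> a
alg = foldr (<>) mempty          -- i.e. mconcat

-- the algebra laws, instantiated:
--   alg . concat == alg . map alg     (h . mu  == h . T h)
--   alg . (:[])  == id                (h . eta == id)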
The adjunction corresponding to the above construction is < F,G,eta,epsilon >
where
F : Hask -> HaskMon, which takes a type a to the 'free monoid generated by a', that is, [a],
G : HaskMon -> Hask, the forgetful functor (forget the multiplication),
eta : 1 -> GF, the natural transformation defined by \x::a -> [x],
epsilon : FG -> 1, the natural transformation defined by the folding function above
(the 'canonical surjection' from a free monoid to its quotient monoid).
Next, there is another 'Kleisli category' and the corresponding adjunction.
You can check that it is just the category of Haskell types with morphisms a -> T b,
whose composition is given by the so-called 'Kleisli composition' (>=>).
A typical Haskell programmer will find this category more familiar.
Finally, as is illustrated in CWM, the category of T-algebras
(resp. the Kleisli category) becomes the terminal (resp. initial) object in the category
of adjunctions that define the list monad T, in a suitable sense.
I suggest doing a similar calculation for the binary tree functor T a = L a | B (T a) (T a) to check your understanding.
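For that exercise, here is a sketch (mine) of the corresponding monad: return wraps a leaf and bind grafts a tree onto every leaf, and its T-algebras are essentially types equipped with an arbitrary binary operation.

data T a = L a | B (T a) (T a)

instance Functor T where
  fmap f (L a)   = L (f a)
  fmap f (B l r) = B (fmap f l) (fmap f r)

instance Applicative T where
  pure = L
  tf <*> tx = tf >>= \f -> fmap f tx

instance Monad T where
  L a   >>= k = k a                    -- substitute at each leaf
  B l r >>= k = B (l >>= k) (r >>= k)  -- graft in both subtrees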
I've found a standard construction of adjoint functors for any monad by Eilenberg-Moore, but I'm not sure if it adds any insight to the problem. The second category in the construction is a category of T-algebras. A T-algebra adds a "product" to the initial category.
So how would it work for a list monad? The functor in the list monad consists of a type constructor, e.g., Int->[Int] and a mapping of functions (e.g., standard application of map to lists). An algebra adds a mapping from lists to elements. One example would be adding (or multiplying) all the elements of a list of integers. The functor F takes any type, e.g., Int, and maps it into the algebra defined on the lists of Int, where the product is defined by monadic join (or vice versa, join is defined as the product). The forgetful functor G takes an algebra and forgets the product. The pair F, G, of adjoint functors is then used to construct the monad in the usual way.
I must say I'm none the wiser.
If you are interested, here are some thoughts of a non-expert
on the role of monads and adjunctions in programming languages:
First of all, there exists for a given monad T a unique adjunction to the Kleisli category of T. In Haskell, the use of monads is primarily confined to operations in this category (which is essentially a category of free algebras, no quotients). In fact, all one can do with a Haskell Monad is to compose some Kleisli morphisms of type a -> T b through the use of do expressions, (>>=), etc., to create a new morphism. In this context, the role of monads is restricted to just the economy of notation. One exploits the associativity of morphisms to be able to write (say) [0,1,2] instead of (Cons 0 (Cons 1 (Cons 2 Nil))), that is, you can write a sequence as a sequence, not as a tree.
Even the use of IO monads is non-essential, for the current Haskell type system is powerful enough to realize data encapsulation (existential types). This is my answer to your original question, but I'm curious what Haskell experts have to say about this.
On the other hand, as we have noted, there's also a 1-1 correspondence between monads and
adjunctions to (T-)algebras. Adjoints, in MacLane's terms, are 'a way
to express equivalences of categories.'
In a typical setting of adjunctions <F,G> : X -> A, where F is some sort of 'free algebra generator' and G a 'forgetful functor', the corresponding monad will (through the use of T-algebras) describe how (and when) the algebraic structure of A is constructed on the objects of X.
In the case of Hask and the list monad T, the structure which T introduces is that of a monoid, and this can help us to establish properties (including the correctness) of code through the algebraic methods that the theory of monoids provides. For example, the function foldr (*) e :: [a] -> a can readily be seen as an associative operation as long as <a,(*),e> is a monoid, a fact which could be exploited by the compiler to optimize the computation (e.g. by parallelism).
Another application is to identify and classify 'recursion patterns' in functional programming using categorical methods, in the hope of (partially) disposing of 'the goto of functional programming', Y (the arbitrary recursion combinator).
Apparently, this kind of application is one of the primary motivations of the creators of Category Theory (MacLane, Eilenberg, etc.), namely, to establish natural equivalences of categories, and to transfer a well-known method from one category to another (e.g. homological methods to topological spaces, algebraic methods to programming, etc.). Here, adjoints and monads are indispensable tools to exploit this connection of categories. (Incidentally, the notion of monads (and its dual, comonads) is so general that one can even go so far as to define 'cohomologies' of Haskell types. But I have not given it much thought yet.)
As for the non-deterministic functions you mentioned, I have much less to say...
But note that if an adjunction <F,G> : Hask -> A for some category A defines the list monad T,
there must be a unique 'comparison functor' K : A -> HaskMon (the category of monoids definable in Haskell), see CWM.
This means, in effect, that your category of interest must be a category of monoids in some restricted form (e.g. it may lack some quotients but not free algebras) in order to define the list monad.
Finally, some remarks:
The binary tree functor I mentioned in my last posting easily generalizes to an arbitrary data type
T a1 .. an = T1 T11 .. T1m | ....
Namely, any data type in Haskell naturally defines a monad (together with the corresponding category of algebras and the Kleisli category), which is just a consequence of every data constructor in Haskell being total.
This is another reason why I consider Haskell's Monad class to be not much more than syntactic sugar
(which is pretty important in practice, of course).

Monad theory and Haskell

Most tutorials seem to give a lot of examples of monads (IO, state, list and so on) and then expect the reader to be able to abstract the overall principle, and then they mention category theory. I don't tend to learn very well by trying to generalise from examples, and I would like to understand from a theoretical point of view why this pattern is so important.
Judging from this thread:
Can anyone explain Monads?
this is a common problem, and I've tried looking at most of the tutorials suggested (except the Brian Beck videos which won't play on my linux machine):
Does anyone know of a tutorial that starts from category theory and explains the IO, state, and list monads in those terms? The following is my unsuccessful attempt to do so:
As I understand it a monad consists of a triple: an endo-functor and two natural transformations.
The functor is usually shown with the type:
(a -> b) -> (m a -> m b)
I included the second bracket just to emphasise the symmetry.
But, this is an endofunctor, so shouldn't the domain and codomain be the same like this?:
(a -> b) -> (a -> b)
I think the answer is that the domain and codomain both have a type of:
(a -> b) | (m a -> m b) | (m m a -> m m b) and so on ...
But I'm not really sure if that works or fits in with the definition of the functor given?
When we move on to the natural transformation it gets even worse. If I understand correctly a natural transformation is a second order functor (with certain rules) that is a functor from one functor to another one. So since we have defined the functor above the general type of the natural transformations would be:
((a -> b) -> (m a -> m b)) -> ((a -> b) -> (m a -> m b))
But the actual natural transformations we are using have type:
a -> m a
m a -> (a -> m b) -> m b
Are these subsets of the general form above? and why are they natural transformations?
Martin
A quick disclaimer: I'm a little shaky on category theory in general, while I get the impression you have at least some familiarity with it. Hopefully I won't make too much of a hash of this...
Does anyone know of a tutorial that starts from category theory and explains IO, state, list monads in those terms?
First of all, ignore IO for now, it's full of dark magic. It works as a model of imperative computations for the same reasons that State works for modelling stateful computations, but unlike the latter IO is a black box with no way to deduce the monadic structure from the outside.
The functor is usually shown with the type: (a -> b) -> (m a -> m b) I included the second bracket just to emphasise the symmetry.
But, this is an endofunctor, so shouldn't the domain and codomain be the same like this?:
I suspect you are misinterpreting how type variables in Haskell relate to the category theory concepts.
First of all, yes, that specifies an endofunctor, on the category of Haskell types. A type variable such as a is not anything in this category, however; it's a variable that is (implicitly) universally quantified over all objects in the category. Thus, the type (a -> b) -> (a -> b) describes only endofunctors that map every object to itself.
Type constructors describe an injective function on objects, where the elements of the constructor's codomain cannot be described by any means except as an application of the type constructor. Even if two type constructors produce isomorphic results, the resulting types remain distinct. Note that type constructors are not, in the general case, functors.
The type variable m in the functor signature, then, represents a one-argument type constructor. Out of context this would normally be read as universal quantification, but that's incorrect in this case since no such function can exist. Rather, the type class definition binds m and allows the definition of such functions for specific type constructors.
The resulting function, then, says that for any type constructor m which has fmap defined, for any two objects a and b and a morphism between them, we can find a morphism between the types given by applying m to a and b.
Note that while the above does, of course, define an endofunctor on Hask, it is not even remotely general enough to describe all such endofunctors.
But the actual natural transformations we are using have type:
a -> m a
m a -> (a -> m b) -> m b
Are these subsets of the general form above? and why are they natural transformations?
Well, no, they aren't. A natural transformation is roughly a function (not a functor) between functors. The two natural transformations that specify a monad M look like I -> M where I is the identity functor, and M ∘ M -> M where ∘ is functor composition. In Haskell, we have no good way of working directly with either a true identity functor or with functor composition. Instead, we discard the identity functor to get just (Functor m) => a -> m a for the first, and write out nested type constructor application as (Functor m) => m (m a) -> m a for the second.
The first of these is obviously return; the second is a function called join, which is not part of the type class. However, join can be written in terms of (>>=), and the latter is more often useful in day-to-day programming.
As far as specific monads go, if you want a more mathematical description, here's a quick sketch of an example:
For some fixed type S, consider two functors F and G where F(x) = (S, x) and G(x) = S -> x (It should hopefully be obvious that these are indeed valid functors).
These functors are also adjoints; consider natural transformations unit :: x -> G (F x) and counit :: F (G x) -> x. Expanding the definitions gives us unit :: x -> (S -> (S, x)) and counit :: (S, S -> x) -> x. The types suggest uncurried function application and tuple construction; feel free to verify that those work as expected.
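Writing those out in Haskell (a quick sketch of the verification invited above, with names of my choosing):

-- F x = (S, x),  G x = S -> x, for a fixed state type s
unitSG :: x -> (s -> (s, x))        -- unit   :: x -> G (F x)
unitSG x = \s -> (s, x)

counitSG :: (s, s -> x) -> x        -- counit :: F (G x) -> x
counitSG (s, f) = f s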
An adjunction gives rise to a monad by composition of the functors, so taking G ∘ F and expanding the definition, we get G (F x) = S -> (S, x), which is the definition of the State monad. The unit for the adjunction is of course return; and you should be able to use counit to define join.
This page does exactly that. I think your main confusion is that the class doesn't really make the Type a functor, but it defines a functor from the category of Haskell types into the category of that type.
Following the notation of the link, assuming F is a Haskell Functor, it means that there is a functor from the category of Hask to the category of F.
Roughly speaking, Haskell does its category theory all in just one category, whose objects are Haskell types and whose arrows are functions between these types. It's definitely not a general-purpose language for modelling category theory.
A (mathematical) functor is an operation turning things in one category into things in another, possibly entirely different, category. An endofunctor is then a functor which happens to have the same source and target categories. In Haskell, a functor is an operation turning things in the category of Haskell types into other things also in the category of Haskell types, so it is always an endofunctor.
[If you're following the mathematical literature, technically, the operation '(a->b)->(m a -> m b)' is just the arrow part of the endofunctor m, and 'm' is the object part]
When Haskellers talk about working 'in a monad' they really mean working in the Kleisli category of the monad. The Kleisli category of a monad is a thoroughly confusing beast at first, and normally needs at least two colours of ink to give a good explanation, so take the following attempt for what it is and check out some references (unfortunately Wikipedia is useless here for all but the straight definitions).
Suppose you have a monad 'm' on the category C of Haskell types. Its Kleisli category Kl(m) has the same objects as C, namely Haskell types, but an arrow a ~(f)~> b in Kl(m) is an arrow a -(f)-> mb in C. (I've used a squiggly line in my Kleisli arrow to distinguish the two). To reiterate: the objects and arrows of Kl(m) are also objects and arrows of C, but the arrows point to different objects in Kl(m) than in C. If this doesn't strike you as odd, read it again more carefully!
Concretely, consider the Maybe monad. Its Kleisli category is just the collection of Haskell types, and its arrows a ~(f)~> b are functions a -(f)-> Maybe b. Or consider the (State s) monad whose arrows a ~(f)~> b are functions a -(f)-> (State s b) == a -(f)-> (s->(s,b)). In any case, you're always writing a squiggly arrow as a shorthand for doing something to the type of the codomain of your functions.
[Note that State is not a monad, because the kind of State is * -> * -> *, so you need to supply one of the type parameters to turn it into a mathematical monad.]
So far so good, hopefully, but suppose you want to compose arrows a ~(f)~> b and b ~(g)~> c. These are really Haskell functions a -(f)-> mb and b -(g)-> mc which you cannot compose because the types don't match. The mathematical solution is to use the 'multiplication' natural transformation u:mm->m of the monad as follows: a ~(f)~> b ~(g)~> c == a -(f)-> mb -(mg)-> mmc -(u_c)-> mc to get an arrow a->mc which is a Kleisli arrow a ~(f;g)~> c as required.
Perhaps a concrete example helps here. In the Maybe monad, you cannot compose functions f : a -> Maybe b and g : b -> Maybe c directly, but by lifting g to
Maybe_g :: Maybe b -> Maybe (Maybe c)
Maybe_g Nothing = Nothing
Maybe_g (Just a) = Just (g a)
and using the 'obvious'
u :: Maybe (Maybe c) -> Maybe c
u Nothing = Nothing
u (Just Nothing) = Nothing
u (Just (Just c)) = Just c
you can form the composition u . Maybe_g . f which is the function a -> Maybe c that you wanted.
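For a fully self-contained variant, here are made-up concrete stand-ins for f and g (safeHead and positive), composed with the same u as above:
safeHead :: [Int] -> Maybe Int
safeHead (x:_) = Just x
safeHead []    = Nothing

positive :: Int -> Maybe Int
positive n = if n > 0 then Just n else Nothing

headThenPositive :: [Int] -> Maybe Int
headThenPositive = u . fmap positive . safeHead   -- fmap positive plays the role of Maybe_g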
In the (State s) monad, it's similar but messier: Given two monadic functions a ~(f)~> b and b ~(g)~> c which are really a -(f)-> (s->(s,b)) and b -(g)-> (s->(s,c)) under the hood, you compose them by lifting g into
State_s_g :: (s->(s,b)) -> (s->(s,(s->(s,c))))
State_s_g p s1 = let (s2, b) = p s1 in (s2, g b)
then you apply the 'multiplication' natural transformation u, which is
u :: (s->(s,(s->(s,c)))) -> (s->(s,c))
u p1 s1 = let (s2, p2) = p1 s1 in p2 s2
which (sort of) plugs the final state of f into the initial state of g.
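As a concrete, self-contained illustration (tick and describe are made-up Kleisli arrows over an Int state, written here in their raw s -> (s, r) form):
tick :: () -> Int -> (Int, Int)          -- a ~(f)~> b with a = (), b = Int
tick _ n = (n + 1, n)

describe :: Int -> Int -> (Int, String)  -- b ~(g)~> c with c = String
describe b n = (n, "saw " ++ show b)

tickThenDescribe :: () -> Int -> (Int, String)
tickThenDescribe a = u (lift (tick a))
  where
    lift p s1 = let (s2, b) = p s1 in (s2, describe b)   -- the State_s_g step
    u p1 s1   = let (s2, p2) = p1 s1 in p2 s2            -- the multiplication
Running tickThenDescribe () 0 yields (1, "saw 0"): tick advances the state and returns the old counter, which describe then reports.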
In Haskell, this turns out to be a bit of an unnatural way to work so instead there's the (>>=) function which basically does the same thing as u but in a way that makes it easier to implement and use. This is important: (>>=) is not the natural transformation 'u'. You can define each in terms of the other, so they're equivalent, but they're not the same thing. The Haskell version of 'u' is written join.
The other thing missing from this definition of Kleisli categories is the identity on each object: a ~(1_a)~> a which is really a -(n_a)-> ma where n is the 'unit' natural transformation. This is written return in Haskell, and doesn't seem to cause as much confusion.
I learned category theory before I came to Haskell, and I too have had difficulty with the mismatch between what mathematicians call a monad and what they look like in Haskell. It's no easier from the other direction!
Not sure I understand what the question was, but yes, you are right: a monad in Haskell is defined as a triple:
m :: * -> * -- this is an endofunctor from Haskell types to Haskell types!
return :: a -> m a
(>>=) :: m a -> (a -> m b) -> m b
but the common definition from category theory is a different triple:
m :: * -> *
return :: a -> m a
join :: m (m a) -> m a
It is slightly confusing, but it's not so hard to show that these two definitions are equivalent.
To do that we need to define join in terms of (>>=) (and vice versa).
First step:
join :: m (m a) -> m a
join x = ?
This gives us x :: m (m a).
All we can do with something that has type m _ is to apply (>>=) to it:
(x >>=) :: (m a -> m b) -> m b
Now we need something as a second argument for (>>=), and also,
from the type of join we have the constraint (x >>= y) :: m a.
So y here must have type y :: m a -> m a, and id :: a -> a (specialized to m a -> m a) fits it very well:
join x = x >>= id
The other way
(>>=) :: m a -> (a -> m b) -> m b
(>>=) x y = ?
Where x :: m a and y :: a -> m b.
To get m b from x and y we need something of type a.
Unfortunately, we can't extract an a from m a. But we can turn the a inside it into something else (remember, a monad is also a functor):
fmap :: (a -> b) -> m a -> m b
fmap y x :: m (m b)
And it fits perfectly as an argument for join: (>>=) x y = join (fmap y x).
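Putting both directions side by side as compilable code (joinM and bindM are made-up names, to avoid clashing with the Prelude):
joinM :: Monad m => m (m a) -> m a
joinM x = x >>= id

bindM :: Monad m => m a -> (a -> m b) -> m b
bindM x y = joinM (fmap y x)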
The best way to look at monads and computational effects is to start with where Haskell got the notion of monads for computational effects from, and then look at Haskell after you understand that. See this paper in particular: Notions of Computation and Monads, by E. Moggi.
See also Moggi's earlier paper which shows how monads work for the lambda calculus alone: http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.26.2787
The fact that monads capture substitution, among other things (http://blog.sigfpe.com/2009/12/where-do-monads-come-from.html), and substitution is key to the lambda calculus, should give a good clue as to why they have so much expressive power.
While monads originally came from category theory, this doesn't mean that category theory is the only abstract context in which you can view them. A different viewpoint is given by operational semantics. For an introduction, have a look at my Operational Monad Tutorial.
One way to look at IO is to consider it as a strange kind of state monad. Remember that the state monad looks like:
data State s a = State (s -> (s, a))
where the "s" argument is the data type you want to thread through the computation. Also, this version of "State" doesn't have "get" and "put" actions and we don't export the constructor.
Now imagine a type
data RealWorld = RealWorld ......
This has no real definition, but notionally a value of type RealWorld holds the state of the entire universe outside the computer. Of course we can never have a value of type RealWorld, but you can imagine something like:
getChar :: RealWorld -> (RealWorld, Char)
In other words the "getChar" function takes a state of the universe before the keyboard button has been pressed, and returns the key pressed plus the state of the universe after the key has been pressed. Of course the problem is that the previous state of the world is still available to be referenced, which can't happen in reality.
But now we write this:
type IO = State RealWorld
getChar :: IO Char
Notionally, all we have done is wrap the previous version of "getChar" as a state action. But by doing this we can no longer access the "RealWorld" values because they are wrapped up inside the State monad.
So when a Haskell program wants to change a lightbulb it takes hold of the bulb and applies a "rotate" function to the RealWorld value inside IO.
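To make the threading explicit, here is a purely notional sketch (this is not how GHC really implements IO, and getCharRW/putCharRW are hypothetical names with the world-passing shape described above):
getCharRW :: RealWorld -> (RealWorld, Char)
getCharRW = undefined   -- notional: no RealWorld value can actually be produced

putCharRW :: Char -> RealWorld -> (RealWorld, ())
putCharRW = undefined   -- notional

echoOnce :: RealWorld -> (RealWorld, ())
echoOnce world0 =
  let (world1, c) = getCharRW world0   -- the world before and after the keypress
  in  putCharRW c world1               -- the world after the character is echoed
Wrapping these in the State monad is exactly what hides the world values and forces them to be threaded in order.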
For me, so far, the explanation that comes closest to tying together monads in category theory and monads in Haskell is that a monad is a monoid whose elements have the type a -> m b. I can see that these elements are very close to an endofunctor, and so the composition of such functions is related to an imperative sequence of program statements. Also, functions which return IO actions are valid in pure functional code until the inner function is called from outside.
The identity element is 'a -> m a', which fits in very well, but the multiplication element is function composition, which should be:
(>=>) :: Monad m => (a -> m b) -> (b -> m c) -> (a -> m c)
This is not quite function composition, but it is close enough (I think that to get true function composition we would need a complementary function which turns m b back into b, and then we would get ordinary function composition if we applied these in pairs?). I'm not quite sure how to get from that to this:
(>>=) :: Monad m => m a -> (a -> m b) -> m b
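One possible bridge, as a sketch (bindFromKleisli and kleisliFromBind are made-up names): a value of type m a can be viewed as a Kleisli arrow () -> m a, so (>>=) is (>=>) with a trivial first argument, and conversely (>=>) is just (>>=) under a lambda.
import Control.Monad ((>=>))

bindFromKleisli :: Monad m => m a -> (a -> m b) -> m b
bindFromKleisli x f = (const x >=> f) ()

kleisliFromBind :: Monad m => (a -> m b) -> (b -> m c) -> (a -> m c)
kleisliFromBind f g = \a -> f a >>= g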
I've got a feeling I may have seen an explanation of this in all the stuff that I read, without understanding its significance the first time through, so I will do some re-reading to try to (re)find an explanation of this.
The other thing I would like to do is link together all the different category theory explanations: endofunctor plus two natural transformations, the Kleisli category, a monoid whose objects are monoids, and so on. For me the thing that seems to link all these explanations is that they are two-level. That is, normally we treat category objects as black boxes whose properties we infer from their outside interactions, but here there seems to be a need to go one level inside the objects to see what’s going on? We can explain monads without this, but only if we accept apparently arbitrary constructions.
Martin
See this question: is chaining operations the only thing that the monad class solves?
In it, I explain my view that we must differentiate between the Monad class and the individual types that solve individual problems. The Monad class, by itself, only solves the important problem of "chaining operations with choice", and makes this solution available to types that are instances of it (by means of "inheritance").
On the other hand, if a given type that solves a given problem also faces the problem of "chaining operations with choice", then it should be made an instance of (inherit from) the Monad class.
The fact is that problems do not get solved merely by something being a Monad. It would be like saying that "wheels" solve many problems, when actually "wheels" only solve one problem, and things with wheels solve many different problems.
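A tiny illustration of "chaining operations with choice" (the names are made up; Maybe is the instance doing the chaining):
lookupAge :: String -> Maybe Int
lookupAge name = lookup name [("ada", 36), ("grace", 85)]

halfIfEven :: Int -> Maybe Int
halfIfEven n = if even n then Just (n `div` 2) else Nothing

chained :: String -> Maybe Int
chained name = lookupAge name >>= halfIfEven
Each step may decline to continue; (>>=) supplies the plumbing, and Maybe supplies the "choice".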

Resources