How are free objects constructed? - haskell

So I understand that a free object is defined as being the left-hand side of an adjunction. But how does that lead you to the Haskell definition of such objects?
More concretely: given a "forgetful functor" from the category of monads to the category of endofunctors,
newtype Forget m a = Forget (m a)
instance Monad m => Functor (Forget m) where
  fmap f (Forget x) = Forget (liftM f x)
then the free monad Free :: (* -> *) -> (* -> *) is a type admitting (a Monad instance and) the following isomorphism:
type f ~> g = forall x. f x -> g x
fwd :: (Functor f, Monad m) => (f ~> Forget m) -> (Free f ~> m)
bwd :: (Functor f, Monad m) => (Free f ~> m) -> (f ~> Forget m)
fwd . bwd = id = bwd . fwd
If we drop the Forgets, for the free monad in Control.Monad.Free we have fwd = foldFree and bwd = (. liftF) (I think?)
But how do these laws lead to the construction found in Control.Monad.Free? How do you come up with data Free f a = Return a | Free (f (Free f a))? Surely you don't just guess until you come up with something that satisfies the laws? Same question goes for the free category of a graph, the free monoid of a set, and any other free object you care to name.

I don't think the notion of "free" is as well-defined as you seem to believe. While I do think the general consensus is that it is indeed a left adjoint of a forgetful functor, the issue lies in what "forgetful" means. There are clear definitions in some broad-ranging cases, particularly for concrete categories.
Universal algebra provides a broad-ranging approach which covers almost all "algebraic" structures (over sets). The upshot is: given a "signature", which consists of sorts, operations, and equations, you build a term algebra (i.e. an AST) of the operations and then quotient it by the equivalence relation generated by the equations. This is the free algebra generated from that signature. For example, we usually talk about monoids as being a set equipped with an associative multiplication and unit. In code, the free algebra before quotienting would be:
data PreFreeMonoid a
  = Unit
  | Var a
  | Mul (PreFreeMonoid a) (PreFreeMonoid a)
We would then quotient by the equivalence relation generated from the equations:
Mul Unit x = x
Mul x Unit = x
Mul (Mul x y) z = Mul x (Mul y z)
But you can show that the resulting quotient type is isomorphic to lists. In the multi-sorted case, we'd have a family of term algebras, one for each sort.
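To make that isomorphism concrete, here is a hedged sketch: flattening a term picks a canonical representative of each equivalence class, and both sides of every equation above flatten to the same list (the name normalize is mine, not from the answer).

-- Flattening picks the canonical representative of a term's
-- equivalence class; e.g. Mul Unit x and x both flatten to
-- normalize x, witnessing the quotient as [a].
normalize :: PreFreeMonoid a -> [a]
normalize Unit      = []
normalize (Var a)   = [a]
normalize (Mul x y) = normalize x ++ normalize y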
One way to recast this categorically is to use the notion of a (slightly generalized) Lawvere theory. Given a signature with a set of sorts, S, we can build a small category, call it T, whose objects are lists of elements of S. These small categories will be called theories in general. Operations get mapped to arrows whose source and target correspond to the appropriate arities. We freely add "tupling" and "projection" arrows so that e.g. [A,B,A] becomes the product [A]×[B]×[A]. Finally, we add commutative diagrams (i.e. equations between arrows) corresponding to each equation in the signature. At this point, T essentially represents the term algebra(s). In fact, an actual interpretation or model of this term algebra is just a finite product preserving functor T → Set; write Mod(T) for the category of finite product preserving functors T → Set. In the single-sorted case, we'd have an underlying set functor, but in general we get an S-indexed family of sets, i.e. we have a functor U : Mod(T) → Set^S, where we're viewing S as a discrete category here. U is simply U(m)(s) = m([s]). We can actually calculate the left adjoint. First, we have a family of sets indexed by elements of S, call it G. Then we need to build a finite product preserving functor T → Set, but any functor into Set (i.e. copresheaf) is a colimit of representables which, in this case, means it's a quotient of the following (dependent) sum type:
Free(G)(s) = Σt:T.T(t,s)×Free(G)(t)
If Free(G) is finite product preserving then in the t = [A,B] case, for example, we'd have:
T([A,B],s)×Free(G)([A,B]) = T([A,B],s)×Free(G)([A])×Free(G)([B])
and we simply define Free(G)([A]) = G(A) for each A in S producing:
T([A,B],s)×Free(G)([A])×Free(G)([B]) = T([A,B],s)×G(A)×G(B)
Altogether this says that an element of Free(G)([A]) consists of an arrow of T into [A] and a list of elements of the appropriate sets corresponding to the source of that arrow, i.e. the arity of the term, modulo equations that make it behave sensibly and obey the equations from the signature, but which I'm not going to elaborate on. For the multiplication of a monoid, we'd have an arrow m : [A,A] → [A] and this would lead to tuples (m, x, y) where x and y are elements of G(A), corresponding to a term like m(x, y). Recasting this definition as a recursive one takes looking at the equations we're quotienting by.
There are other things to verify to show that Free ⊣ U, but it isn't too hard. Once that's done, U∘Free is a monad on Set^S.
The nice thing about the Lawvere theory approach is that it is easy to generalize in multiple ways. One straightforward way is to replace Set by some other topos E. It's actually the case that the category of directed multigraphs forms a topos, but I don't believe you can (easily) view categories as theories over Graph. A different direction to extend Lawvere theories is to consider doctrines other than finite product preserving functors; in particular, finite limit preserving functors, aka left exact or lex functors, are an interesting case. Both small categories and directed multigraphs (which categorists sometimes call quivers) can be viewed as models of a category with finite limits. There's a straightforward inclusion of the theory of directed multigraphs into the theory of small categories. This, contravariantly, induces a functor Cat → Graph simply by precomposition. The left adjoint of this is then (almost) the left Kan extension along that inclusion. These left Kan extensions will occur in Set so ultimately they are just colimits which are just quotients of (dependent) sum types. (Technically, you need to verify that the resulting Kan extensions are finite limit preserving. We're also helped by the fact that the models of the theory of graphs are essentially arbitrary functors from the theory of graphs. This happens because the theory of graphs consists only of unary operations.)
None of this helps for free monads though. However, it turns out that one construction subsumes all of these, including free monads. Returning to universal algebra, it's the case that every signature with no equations gives rise to a (polynomial) functor whose initial algebra is the free term algebra. Lambek's lemma suggests, and it's easy to prove, that the initial algebra is just the colimit of repeated applications of the functor. The above general result is based on a similar approach, and the relevant case for free monads is the unpointed endofunctor case, in which you start to see the definition of Free that you gave; but actually working it out fully requires unfolding many constructions.
Frankly, though, what I'm pretty sure actually happened in the FP world is the following. If you look at PreFreeMonoid, it's actually a free monad. PreFreeMonoid Void is the initial algebra for the functor the monoid signature (minus the equations) would give rise to. If you are familiar with using functors for initial algebras and you even start thinking about universal algebra, you are almost certainly going to end up defining a type like data Term f a = Var a | Op (f (Term f a)). It's easy to verify this is a monad once you think to ask the question. If you're even vaguely familiar with the relationship monads have to algebraic structures or to term substitution, then you may ask the question quite quickly. The same construction can be stumbled upon from a programming language implementation perspective. If you directly set your goal to be deriving the free monad construction in Haskell, there are several intuitive ways to arrive at the right definition, especially combined with some equational/parametricity-driven reasoning. In fact, the "monoid object in the category of endofunctors" one is quite suggestive.
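To show just how easy that verification is, here is a minimal sketch (the instance bodies are mine; modern GHC also requires the Applicative instance):

data Term f a = Var a | Op (f (Term f a))

instance Functor f => Functor (Term f) where
  fmap g (Var a)  = Var (g a)
  fmap g (Op fts) = Op (fmap (fmap g) fts)

instance Functor f => Applicative (Term f) where
  pure = Var
  tf <*> ta = tf >>= \g -> fmap g ta

instance Functor f => Monad (Term f) where
  Var a  >>= k = k a                    -- substitute at a variable
  Op fts >>= k = Op (fmap (>>= k) fts)  -- push substitution under the operations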
(I really wish this StackExchange had MathJax support.)


The fixed point functors of Free and Cofree

To make that clear, I'm not talking about how the free monad looks a lot like a fixpoint combinator applied to a functor, i.e. how Free f is basically a fixed point of f. (Not that this isn't interesting!)
What I'm talking about are fixpoints of Free, Cofree :: (*->*) -> (*->*), i.e. functors f such that Free f is isomorphic to f itself.
Background: today, to firm up my rather lacking grasp on free monads, I decided to just write a few of them out for different simple functors, both for Free and for Cofree and see what better-known [co]monads they'd be isomorphic to. What intrigued me particularly was the discovery that Cofree Empty is isomorphic to Empty (meaning, Const Void, the functor that maps any type to the uninhabited). Ok, perhaps this is just stupid – I've discovered that if you put empty garbage in you get empty garbage out, yeah! – but hey, this is category theory, where whole universes rise up from seeming trivialities... right?
The immediate question is, if Cofree has such a fixed point, what about Free? Well, it certainly can't be Empty as that's not a monad. The quick suspect would be something nearby like Const () or Identity, but no:
Free (Const ()) ~~ Either () ~~ Maybe
Free Identity ~~ (Nat,) ~~ Writer Nat
Indeed, the fact that Free always adds an extra constructor suggests that the structure of any functor that's a fixed point would have to be already infinite. But it seems odd that, if Cofree has such a simple fixed point, Free should only have a much more complex one (like the fix-by-construction FixFree a = C (Free FixFree a) that Reid Barton brings up in the comments).
Is the boring truth just that Free has no “accidental fixed point” and it's a mere coincidence that Cofree has one, or am I missing something?
Your observation that Empty is a fixed point of Cofree (which is not really true in Haskell, but I guess you want to work in some model that ignores ⊥, like Set) boils down to the fact that
there is a set E (the empty set) such that for every set X, the projection p₂ : X × E -> E is an isomorphism.
We could say in this situation that E is an absorbing object for the product. We can replace the word “set” by “object of C” for any category C with products, and we get a statement about C that may or may not be true. For Set, it happens to be true.
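In Haskell terms (still ignoring ⊥), the absorbing-object statement has a direct witness using Data.Void; a quick sketch:

import Data.Void (Void, absurd)

-- p₂ and its inverse: (x, Void) ≅ Void for every x
proj2 :: (x, Void) -> Void
proj2 = snd

proj2Inv :: Void -> (x, Void)
proj2Inv = absurd   -- from Void you can build anything, vacuously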
If we pick C = Set^op, which also has products (because Set has coproducts), and then dualize the language to talk about sets again, we get the statement
there is a set F such that for every set Y, the inclusion i₂ : F -> Y + F is an isomorphism.
Obviously, this statement is not true for any set F (we can pick any non-empty set Y as a counterexample for any F). No surprise there, after all Set^op is a different category from Set.
So, we won't get a "trivial fixed point" of Free in the same way we got one for Cofree, because Set^op is qualitatively different from Set. The initial object of Set is an absorbing object for the product, but the terminal object of Set is not an absorbing object for the coproduct.
If I may get on my soapbox for a moment:
There is much discussion among Haskell programmers about which constructions are the “duals” of which other constructions. Most of this is in a formal sense meaningless, because in category theory dualizing a construction works like this:
Suppose I have a construction which I can perform on any category C (or any category with certain extra structure and/or properties). Then the dual construction on a category C is the original construction on the opposite category Cop (which had better have the extra structure and properties we needed, if any).
For example: The notion of products makes sense in any category C (though products might not always exist), via the universal property defining products. To get a dual notion of coproducts in C we should ask what are the products in Cop, and we have just defined what products are in any category, so this notion makes sense.
The trouble with applying duality to the setting of Haskell is that the Haskell language prefers overwhelmingly to talk about just one category, Hask, in which we do our constructions. This causes two problems for talking about duality:
To obtain the dual of a construction as described above, I am supposed to be able to do the construction in any category, or at least any category of a particular form. So we must first generalize the construction that, typically, we have only done in the category Hask to a larger class of categories. (And having done so, there are plenty of other interesting categories we could potentially interpret the resulting notion in besides Hask^op, such as Kleisli categories of monads.)
The category Hask enjoys many special properties which can be summarized by saying that (ignoring ⊥) Hask is a cartesian closed category. For example, this implies that the initial object is an absorbing object for the product. Hask^op does not have these properties, which means that the generalized notion may not make sense in Hask^op; and it can also mean that two notions which happened to be equivalent in Hask are distinct in general, and have different duals.
For an example of the latter, take lenses. In Hask they can be constructed in a number of ways; two ways are in terms of getter/setter pairs and as coalgebras for the costate comonad. The former generalizes to categories with products and the latter to categories enriched in a particular way over Hask. If we apply the former construction to Hask^op then we get out prisms, but if we apply the latter construction to Hask^op then we get algebras for the state monad, and these are not the same thing.
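For concreteness, the usual concrete presentations of the two structures mentioned look like this (a sketch; the precise dualization is subtler than these surface forms suggest):

-- getter/setter pairs
data Lens s a = Lens
  { view :: s -> a
  , set  :: s -> a -> s
  }

-- the usual concrete presentation of a prism
data Prism s a = Prism
  { match  :: s -> Either s a  -- try to extract the focus
  , review :: a -> s           -- rebuild the whole from a focus
  }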
A more familiar example might be comonads: starting from the Haskell-centric presentation
return :: a -> m a
(>>=) :: m a -> (a -> m b) -> m b
some insight seems to be needed to determine which arrows to reverse to obtain
extract :: w a -> a
extend :: w a -> (w a -> b) -> w b
The point is that it would have been much easier to start from join :: m (m a) -> m a instead of (>>=); but finding this alternative presentation (equivalent due to special features of Hask) is a creative process, not a mechanical one.
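Once the join presentation is in hand, though, the reversal really is mechanical; a hedged sketch (primed names avoid clashing with base):

-- monad, join-based presentation
class Functor m => Monad' m where
  return' :: a -> m a
  join'   :: m (m a) -> m a

-- reverse every arrow: comonad
class Functor w => Comonad' w where
  extract'   :: w a -> a
  duplicate' :: w a -> w (w a)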
In a question like yours, and many others like it, where it is pretty clear what sense of dual is intended, there's still absolutely no reason to expect a priori that the dual construction will actually exist or have the same properties as the original, because Hask^op qualitatively behaves quite differently from Hask. A slogan might be
the theory of categories is self-dual, but the theory of any particular category is not!
Since you asked about the structure of the fixed points of Free, I'm going to sketch an informal argument that Free only has one fixed point which is a Functor, namely the type
newtype FixFree a = C (Free FixFree a)
that Reid Barton described. Indeed, I make a somewhat stronger claim. Let's start with a few pieces:
newtype Fix f a = Fix (f (Fix f) a)

instance Functor (f (Fix f)) => Functor (Fix f) where
  fmap f (Fix x) = Fix (fmap f x)

-- This is basically `MFunctor` from `Control.Monad.Morph`
class FFunctor (g :: (* -> *) -> * -> *) where
  hoistF :: Functor f => (forall a. f a -> f' a) -> g f b -> g f' b
Notably,
instance FFunctor Free where
  hoistF _f (Pure a)    = Pure a
  hoistF f  (Free fffa) = Free . f . fmap (hoistF f) $ fffa
Then
fToFixG :: (Functor f, FFunctor g) => (forall a. f a -> g f a) -> f a -> Fix g a
fToFixG fToG fa = Fix $ hoistF (fToFixG fToG) $ fToG fa

fixGToF :: forall f b (g :: (* -> *) -> * -> *).
           (FFunctor g, Functor (g (Fix g)))
        => (forall a. g f a -> f a) -> Fix g b -> f b
fixGToF gToF (Fix ga) = gToF $ hoistF (fixGToF gToF) ga
If I'm not mistaken (which I could be), passing each side of an isomorphism between f and g f to each of these functions will yield each side of an isomorphism between f and Fix g. Substituting Free for g will demonstrate the claim. This argument is very hand-wavey, of course, because Haskell is inconsistent.

Combining the state monad with the costate comonad

How to combine the state monad S -> (A, S) with the costate comonad (E->A, E)?
I tried with both obvious combinations S -> ((E->A, E), S) and (E->S->(A, S), E) but then in either case I do not know how to define the operations (return, extract, ... and so on) for the combination.
Combining two monads O and I yields a monad if either O or I is copointed, i.e., has an extract method. Every comonad is copointed. If both O and I are copointed, then you have two different "natural" ways to obtain a monad, which are presumably not equivalent.
You have:
unit_O :: a -> O a
join_O :: O (O a) -> O a
unit_I :: a -> I a
join_I :: I (I a) -> I a
Here I've added _O and _I suffixes for clarity; in actual Haskell code, they would not be there, since the type checker figures this out on its own.
Your goal is to show that the composition O (I a) is a monad. Let's assume that O is copointed, i.e. that there is a function extract_O :: O a -> a.
Then we have:
unit :: a -> O (I a)
unit = unit_O . unit_I
join :: O (I (O (I a))) -> O (I a)
The problem, of course, is in implementing join. We follow this strategy:
fmap over the outer O
use extract_O to get rid of the inner O
use join_I to combine the two I monads
This leads us to
join = fmap_O $ join_I . fmap_I extract_O
To make this work, you'll also need to define
newtype MCompose o i a = MCompose (o (i a))
and add the respective type constructors and deconstructors into the definitions above.
The other alternative uses extract_I instead of extract_O. This version is even simpler:
join = join_O . fmap_O extract_I
This defines a new monad. I assume you can define a new comonad in the same way, but I haven't attempted this.
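Putting the pieces together, here is a hedged sketch of both combinations written as plain functions rather than class instances (MCompose repeated for completeness; the rank-2 arguments stand in for the copointedness assumptions and need RankNTypes):

{-# LANGUAGE RankNTypes #-}
import Control.Monad (join)

newtype MCompose o i a = MCompose { getMCompose :: o (i a) }

-- the unit, shared by both variants
unitC :: (Monad o, Monad i) => a -> MCompose o i a
unitC = MCompose . return . return

-- first variant: assumes the outer monad o is copointed
joinUsingOuter :: (Monad o, Monad i)
               => (forall x. o x -> x)          -- extract_O
               -> MCompose o i (MCompose o i a) -> MCompose o i a
joinUsingOuter extractO =
  MCompose . fmap (join . fmap (extractO . getMCompose)) . getMCompose

-- second variant: assumes the inner monad i is copointed
joinUsingInner :: (Monad o, Monad i)
               => (forall x. i x -> x)          -- extract_I
               -> MCompose o i (MCompose o i a) -> MCompose o i a
joinUsingInner extractI =
  MCompose . join . fmap (extractI . fmap getMCompose) . getMCompose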
As the other answer demonstrates, both of the combinations S -> ((E->A, E), S) and (E->S->(A, S), E)
have Monad and Comonad instances simultaneously. In fact, giving a Monad/Comonad instance is equivalent to giving a monoid structure to, respectively, its points ∀r. r -> f(r) or its copoints ∀r. f(r) -> r, at least in the classical, non-constructive sense (I don't know the constructive answer). This fact suggests that a Functor f actually has a very good chance of being both a Monad and a Comonad, provided its points and copoints are non-trivial.
The real question, however, is whether the Monad/Comonad instances constructed as such do have natural
computational/categorical meanings. In this particular case I would say "no", because you don't seem to have
a priori knowledge about how to compose them in a way that suit your computational needs.
The standard categorical way to compose two (co)monads is via adjunctions. Let me summarize your situation:
        Fₑ              Fₛ
     ------->        ------->
Hask    ⊣     Hask      ⊣     Hask
     <-------        <-------
        Gₑ              Gₛ
Fₜ(a) = (a,t)
Gₜ(a) = (t->a)
Proof of Fₜ ⊣ Gₜ:
Fₜ(x) -> y ≃
(x,t) -> y ≃
x -> (t->y) ≃
x -> Gₜ(y)
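In Haskell, this hom-set isomorphism is literally (un)currying; a quick sketch:

-- Fₜ(x) -> y  ≅  x -> Gₜ(y), i.e. ((x, t) -> y) ≅ (x -> t -> y)
phi :: ((x, t) -> y) -> (x -> t -> y)
phi f x t = f (x, t)          -- essentially curry

phiInv :: (x -> t -> y) -> ((x, t) -> y)
phiInv g (x, t) = g x t       -- essentially uncurry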
Now you can see that the state monad (s->(a,s)) ≃ (s->a,s->s) is the composition GₛFₛ and the costate comonad
is FₑGₑ. This adjunction says that Hask can be interpreted as a model of the (co)state (co)algebras.
Now, 'adjunctions compose.' For example,
FₛFₑ(x) -> y ≃
Fₑ(x) -> Gₛ(y) ≃
x -> GₑGₛ(y)
So FₛFₑ ⊣ GₑGₛ. This gives a pair of a monad and a comonad, namely
T(a) = GₑGₛFₛFₑ(a)
= GₑGₛFₛ(a,e)
= GₑGₛ(a,e,s)
= Gₑ(s->(a,e,s))
= e->s->(a,e,s)
= ((e,s)->a, (e,s)->(e,s))
G(a) = FₛFₑGₑGₛ(a)
= FₛFₑGₑ(s->a)
= FₛFₑ(e->s->a)
= Fₛ(e->s->a,e)
= (e->s->a,e,s)
= ((e,s)->a, (e,s))
T is simply the state monad with the state (e,s), G is the costate comonad with the costate (e,s), so
these do have very natural meanings.
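Spelled out as Haskell type synonyms, the identification reads as follows (a sketch; the names are mine):

-- the composed monad and comonad, concretely
type T e s a = e -> s -> (a, e, s)     -- ≅ State (e, s) a
type G e s a = (e -> s -> a, e, s)     -- ≅ Store (e, s) a

-- witnessing the first identification:
toState :: (e -> s -> (a, e, s)) -> ((e, s) -> (a, (e, s)))
toState t (e, s) = let (a, e', s') = t e s in (a, (e', s'))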
Composing adjunctions is a natural, frequent mathematical operation. For example, a geometric morphism between topoi (a kind of Cartesian Closed Category which admits complex (non-free) constructions at the 'type level') is defined as a pair of adjoint functors, only requiring the left adjoint to be left exact (i.e. to preserve finite limits). If those topoi are sheaves on topological spaces, composing the adjunctions simply corresponds to composing the (unique) continuous base change maps (in the opposite direction), having a very natural meaning.
On the other hand, composing monads/comonads directly seems to be a very rare practice in Mathematics.
This is because often a (co)monad is thought of as a carrier of a (co)algebraic theory, rather than as a
model. In this interpretation the corresponding adjunctions are the models, not the monad. The problem
is that composing two theories requires another theory, a theory about how to compose them. For example,
imagine composing two theories of monoids. Then you may get at least two new theories,
namely the theory of lists of lists, or ring-like algebras where two kinds of binary operations distribute.
Neither is a priori better/more natural than the other. This is the meaning of "monads don't compose"; it doesn't say the composition cannot be a monad, but it does say you will need another theory of how to compose them.
In contrast, composing adjunctions naturally results in another adjunction simply because by doing so you are
implicitly specifying the rules of composing two given theories. So by taking the monad of the composed adjunction you get the theory that also specifies the rules of composition.

A little category theory [duplicate]

Who first said the following?
A monad is just a monoid in the
category of endofunctors, what's the
problem?
And on a less important note, is this true and if so could you give an explanation (hopefully one that can be understood by someone who doesn't have much Haskell experience)?
That particular phrasing is by James Iry, from his highly entertaining Brief, Incomplete and Mostly Wrong History of Programming Languages, in which he fictionally attributes it to Philip Wadler.
The original quote is from Saunders Mac Lane in Categories for the Working Mathematician, one of the foundational texts of Category Theory. Here it is in context, which is probably the best place to learn exactly what it means.
But, I'll take a stab. The original sentence is this:
All told, a monad in X is just a monoid in the category of endofunctors of X, with product × replaced by composition of endofunctors and unit set by the identity endofunctor.
X here is a category. Endofunctors are functors from a category to itself (which is usually all Functors as far as functional programmers are concerned, since they're mostly dealing with just one category; the category of types - but I digress). But you could imagine another category which is the category of "endofunctors on X". This is a category in which the objects are endofunctors and the morphisms are natural transformations.
And of those endofunctors, some of them might be monads. Which ones are monads? Exactly the ones which are monoidal in a particular sense. Instead of spelling out the exact mapping from monads to monoids (since Mac Lane does that far better than I could hope to), I'll just put their respective definitions side by side and let you compare:
A monoid is...
A set, S
An operation, • : S × S → S
An element of S, e : 1 → S
...satisfying these laws:
(a • b) • c = a • (b • c), for all a, b and c in S
e • a = a • e = a, for all a in S
A monad is...
An endofunctor, T : X → X (in Haskell, a type constructor of kind * -> * with a Functor instance)
A natural transformation, μ : T × T → T, where × means functor composition (μ is known as join in Haskell)
A natural transformation, η : I → T, where I is the identity endofunctor on X (η is known as return in Haskell)
...satisfying these laws:
μ ∘ Tμ = μ ∘ μT
μ ∘ Tη = μ ∘ ηT = 1 (the identity natural transformation)
With a bit of squinting you might be able to see that both of these definitions are instances of the same abstract concept.
First, the extensions and libraries that we're going to use:
{-# LANGUAGE RankNTypes, TypeOperators #-}
import Control.Monad (join)
Of these, RankNTypes is the only one that's absolutely essential to the below. I once wrote an explanation of RankNTypes that some people seem to have found useful, so I'll refer to that.
Quoting Tom Crockett's excellent answer, we have:
A monad is...
An endofunctor, T : X -> X
A natural transformation, μ : T × T -> T, where × means functor composition
A natural transformation, η : I -> T, where I is the identity endofunctor on X
...satisfying these laws:
μ(μ(T × T) × T) = μ(T × μ(T × T))
μ(η(T)) = T = μ(T(η))
How do we translate this to Haskell code? Well, let's start with the notion of a natural transformation:
-- | A natural transformation between two 'Functor' instances. Law:
--
-- > fmap f . eta g == eta g . fmap f
--
-- Neat fact: the type system actually guarantees this law.
--
newtype f :-> g =
  Natural { eta :: forall x. f x -> g x }
A type of the form f :-> g is analogous to a function type, but instead of thinking of it as a function between two types (of kind *), think of it as a morphism between two functors (each of kind * -> *). Examples:
listToMaybe :: [] :-> Maybe
listToMaybe = Natural go
  where go []    = Nothing
        go (x:_) = Just x

maybeToList :: Maybe :-> []
maybeToList = Natural go
  where go Nothing  = []
        go (Just x) = [x]

reverse' :: [] :-> []
reverse' = Natural reverse
Basically, in Haskell, natural transformations are functions from some type f x to another type g x such that the x type variable is "inaccessible" to the caller. So for example, sort :: Ord a => [a] -> [a] cannot be made into a natural transformation, because it's "picky" about which types we may instantiate for a. One intuitive way I often use to think of this is the following:
A functor is a way of operating on the content of something without touching the structure.
A natural transformation is a way of operating on the structure of something without touching or looking at the content.
Now, with that out of the way, let's tackle the clauses of the definition.
The first clause is "an endofunctor, T : X -> X." Well, every Functor in Haskell is an endofunctor in what people call "the Hask category," whose objects are Haskell types (of kind *) and whose morphisms are Haskell functions. This sounds like a complicated statement, but it's actually a very trivial one. All it means is that a Functor f :: * -> * gives you the means of constructing a type f a :: * for any a :: * and a function fmap f :: f a -> f b out of any f :: a -> b, and that these obey the functor laws.
Second clause: the Identity functor in Haskell (which comes with the Platform, so you can just import it) is defined this way:
newtype Identity a = Identity { runIdentity :: a }

instance Functor Identity where
  fmap f (Identity a) = Identity (f a)
So the natural transformation η : I -> T from Tom Crockett's definition can be written this way for any Monad instance t:
return' :: Monad t => Identity :-> t
return' = Natural (return . runIdentity)
Third clause: The composition of two functors in Haskell can be defined this way (which also comes with the Platform):
newtype Compose f g a = Compose { getCompose :: f (g a) }

-- | The composition of two 'Functor's is also a 'Functor'.
instance (Functor f, Functor g) => Functor (Compose f g) where
  fmap f (Compose fga) = Compose (fmap (fmap f) fga)
So the natural transformation μ : T × T -> T from Tom Crockett's definition can be written like this:
join' :: Monad t => Compose t t :-> t
join' = Natural (join . getCompose)
The statement that this is a monoid in the category of endofunctors then means that Compose (partially applied to just its first two parameters) is associative, and that Identity is its identity element. I.e., that the following isomorphisms hold:
Compose f (Compose g h) ~= Compose (Compose f g) h
Compose f Identity ~= f
Compose Identity g ~= g
These are very easy to prove because Compose and Identity are both defined as newtype, and the Haskell Reports define the semantics of newtype as an isomorphism between the type being defined and the type of the argument to the newtype's data constructor. So for example, let's prove Compose f Identity ~= f:
Compose f Identity a
~= f (Identity a) -- newtype Compose f g a = Compose (f (g a))
~= f a -- newtype Identity a = Identity a
Q.E.D.
The answers here do an excellent job in defining both monoids and monads, however, they still don't seem to answer the question:
And on a less important note, is this true and if so could you give an explanation (hopefully one that can be understood by someone who doesn't have much Haskell experience)?
The crux of the matter missing here is a different notion of "monoid" -- more precisely, its so-called categorification: a monoid in a monoidal category. Sadly, Mac Lane's book itself makes it very confusing:
All told, a monad in X is just a monoid in the category of endofunctors of X, with product × replaced by composition of endofunctors and unit set by the identity endofunctor.
Main confusion
Why is this confusing? Because it does not define what a "monoid in the category of endofunctors" of X is. Instead, this sentence suggests taking a monoid inside the set of all endofunctors, together with functor composition as the binary operation and the identity functor as the monoidal unit. That reading works perfectly fine, and it turns any subset of endofunctors that contains the identity functor and is closed under functor composition into a monoid.
Yet this is not the correct interpretation, which the book fails to make clear at that stage. A Monad f is a fixed endofunctor, not a subset of endofunctors closed under composition. A common construction is to use f to generate a monoid by taking the set of all k-fold compositions f^k = f(f(...)) of f with itself, including k = 0, which corresponds to the identity f^0 = id. And now the set S of all these powers for all k >= 0 is indeed a monoid "with product × replaced by composition of endofunctors and unit set by the identity endofunctor".
And yet:
This monoid S can be defined for any functor f or even literally for any self-map of X. It is the monoid generated by f.
The monoidal structure of S given by the functor composition and the identity functor has nothing to do with f being or not being a monad.
And to make things more confusing, the definition of "monoid in monoidal category" comes later in the book as you can see from the table of contents. And yet understanding this notion is absolutely critical to understanding the connection with monads.
(Strict) monoidal categories
Going to Chapter VII on Monoids (which comes later than Chapter VI on Monads), we find the definition of the so-called strict monoidal category as a triple (B, *, e), where B is a category, * : B × B -> B a bifunctor (functorial in each component with the other component fixed) and e is a unit object in B, satisfying the associativity and unit laws:
(a * b) * c = a * (b * c)
a * e = e * a = a
for any objects a,b,c of B, and the same identities for any morphisms a,b,c with e replaced by id_e, the identity morphism of e. It is now instructive to observe that in our case of interest, where B is the category of endofunctors of X with natural transformations as morphisms, * the functor composition and e the identity functor, all these laws are satisfied, as can be directly verified.
What comes after in the book is the definition of the "relaxed" monoidal category, where the laws only hold modulo some fixed natural transformations satisfying so-called coherence relations, which is however not important for our cases of the endofunctor categories.
Monoids in monoidal categories
Finally, in section 3 "Monoids" of Chapter VII, the actual definition is given:
A monoid c in a monoidal category (B, *, e) is an object of B with two arrows (morphisms)
mu: c * c -> c
nu: e -> c
making 3 diagrams commutative. Recall that in our case, these are morphisms in the category of endofunctors, which are natural transformations corresponding to precisely join and return for a monad. The connection becomes even clearer when we make the composition * more explicit, replacing c * c by c^2, where c is our monad.
Finally, notice that the 3 commutative diagrams (in the definition of a monoid in monoidal category) are written for general (non-strict) monoidal categories, while in our case all natural transformations arising as part of the monoidal category are actually identities. That will make the diagrams exactly the same as the ones in the definition of a monad, making the correspondence complete.
Conclusion
In summary, any monad is by definition an endofunctor, hence an object in the category of endofunctors, where the monadic join and return operators satisfy the definition of a monoid in that particular (strict) monoidal category. Vice versa, any monoid in the monoidal category of endofunctors is by definition a triple (c, mu, nu) consisting of an object and two arrows, e.g. natural transformations in our case, satisfying the same laws as a monad.
Finally, note the key difference between the (classical) monoids and the more general monoids in monoidal categories. The two arrows mu and nu above are no longer a binary operation and a unit in a set. Instead, you have one fixed endofunctor c. The functor composition * and the identity functor alone do not provide the complete structure needed for the monad, despite that confusing remark in the book.
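Packaging the triple (c, mu, nu) in code, with * = Compose and e = Identity as in the earlier answer, gives the following sketch (the class name MonoidObject is mine):

import Control.Monad (join)
import Data.Functor.Compose (Compose (..))
import Data.Functor.Identity (Identity (..))

-- a monoid object in (endofunctors of Hask, Compose, Identity):
-- one fixed endofunctor c plus its two structure arrows
class Functor c => MonoidObject c where
  mu :: Compose c c a -> c a   -- mu : c * c -> c
  nu :: Identity a -> c a      -- nu : e -> c

-- e.g. any Monad qualifies, with mu = join and nu = return:
instance MonoidObject Maybe where
  mu = join . getCompose
  nu = Just . runIdentity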
Another approach would be to compare with the standard monoid C of all self-maps of a set A, where the binary operation is composition, which maps the standard cartesian product C × C into C. Passing to the categorified monoid, we replace the cartesian product × with the functor composition *, and the binary operation gets replaced with the natural transformation mu from c * c to c, that is, a collection of the join operators
join: c(c(T))->c(T)
for every object T (type in programming). And the identity elements in classical monoids, which can be identified with images of maps from a fixed one-point-set, get replaced with the collection of the return operators
return: T->c(T)
But now there are no more cartesian products, so no pairs of elements and thus no binary operations.
I came to this post by way of trying to better understand the inference behind the infamous quote from Mac Lane's Categories for the Working Mathematician.
In describing what something is, it's often equally useful to describe what it's not.
Because Mac Lane uses the description to describe a Monad, one might infer that it describes something unique to monads. Bear with me. To develop a broader understanding of the statement, I believe it needs to be made clear that he is not describing something that is unique to monads; the statement equally describes Applicative and Arrows, among others. For the same reason we can have two monoids on Int (Sum and Product), we can have several monoids on X in the category of endofunctors. But there is even more to the similarities.
Both Monad and Applicative meet the criteria:
endo => any arrow, or morphism that starts and ends in the same place
functor => any arrow, or morphism between two Categories (e.g., in day to day Tree a -> List b, but in Category Tree -> List)
monoid => single object; i.e., a single type, but in this context, only in regards to the external layer; so, we can't have Tree -> List, only List -> List.
The statement uses "Category of..." This defines the scope of the statement. As an example, the Functor Category describes the scope of f * -> g *, i.e., Any functor -> Any functor, e.g., Tree * -> List * or Tree * -> Tree *.
What a Categorical statement does not specify describes where anything and everything is permitted.
In this case, inside the functors, * -> * aka a -> b is not specified which means Anything -> Anything including Anything else. As my imagination jumps to Int -> String, it also includes Integer -> Maybe Int, or even Maybe Double -> Either String Int where a :: Maybe Double; b :: Either String Int.
So the statement comes together as follows:
functor scope :: f a -> g b (i.e., any parameterized type to any parameterized type)
endo + functor :: f a -> f b (i.e., any one parameterized type to the same parameterized type) ... said differently,
a monoid in the category of endofunctors
So, where is the power of this construct? To appreciate the full dynamics, I needed to see that the typical drawings of a monoid (single object with what looks like an identity arrow, :: single object -> single object), fails to illustrate that I'm permitted to use an arrow parameterized with any number of monoid values, from the one type object permitted in Monoid. The endo, ~ identity arrow definition of equivalence ignores the functor's type value and both the type and value of the most inner, "payload" layer. Thus, equivalence returns true in any situation where the functorial types match (e.g., Nothing -> Just * -> Nothing is equivalent to Just * -> Just * -> Just * because they are both Maybe -> Maybe -> Maybe).
Sidebar: ~ outside is conceptual, but is the left most symbol in f a. It also describes what "Haskell" reads-in first (big picture); so Type is "outside" in relation to a Type Value. The relationship between layers (a chain of references) in programming is not easy to relate in Category. The Category of Set is used to describe Types (Int, Strings, Maybe Int etc.) which includes the Category of Functor (parameterized Types). The reference chain: Functor Type, Functor values (elements of that Functor's set, e.g., Nothing, Just), and in turn, everything else each functor value points to. In Category the relationship is described differently, e.g., return :: a -> m a is considered a natural transformation from one Functor to another Functor, different from anything mentioned thus far.
Back to the main thread, all in all, for any defined tensor product and a neutral value, the statement ends up describing an amazingly powerful computational construct born from its paradoxical structure:
on the outside it appears as a single object (e.g., :: List); static
but inside, permits a lot of dynamics
any number of values of the same type (e.g., Empty | ~NonEmpty) as fodder to functions of any arity. The tensor product will reduce any number of inputs to a single value... for the external layer (~fold that says nothing about the payload)
infinite range of both the type and values for the inner most layer
In Haskell, clarifying the applicability of the statement is important. The power and versatility of this construct has absolutely nothing to do with a monad per se. In other words, the construct does not rely on what makes a monad unique.
When trying to figure out whether to build code with a shared context to support computations that depend on each other, versus computations that can be run in parallel, this infamous statement, with as much as it describes, is not a contrast between the choice of Applicative, Arrows and Monads, but rather is a description of how much they are the same. For the decision at hand, the statement is moot.
This is often misunderstood. The statement goes on to describe join :: m (m a) -> m a as the tensor product for the monoidal endofunctor. However, it does not articulate how, in the context of this statement, (<*>) could also have been chosen. It truly is an example of 'six of one, half a dozen of the other'. The logic for combining values is exactly alike; the same input generates the same output from each (unlike the Sum and Product monoids for Int, because they generate different results when combining Ints).
So, to recap: A monoid in the category of endofunctors describes:
~t :: m * -> m * -> m *
and a neutral value for m *
(<*>) and (>>=) both provide simultaneous access to the two m values in order to compute the single return value. The logic used to compute the return value is exactly the same. If it were not for the different shapes of the functions they parameterize (f :: a -> b versus k :: a -> m b) and the position of the parameter with the same return type of the computation (i.e., a -> b -> b versus b -> a -> b for each respectively), I suspect we could have parameterized the monoidal logic, the tensor product, for reuse in both definitions. As an exercise to make the point, try and implement ~t, and you end up with (<*>) and (>>=) depending on how you decide to define it forall a b.
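A hedged sketch of that exercise: both operators factor through the same join-of-fmap "tensor", differing only in the shape of the function they are handed (the function names are mine):

import Control.Monad (join)

apViaJoin :: Monad m => m (a -> b) -> m a -> m b
apViaJoin mf ma = join (fmap (\f -> fmap f ma) mf)   -- behaves like (<*>)

bindViaJoin :: Monad m => m a -> (a -> m b) -> m b
bindViaJoin ma k = join (fmap k ma)                  -- behaves like (>>=)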
If my last point is at minimum conceptually true, it then explains the precise, and only computational difference between Applicative and Monad: the functions they parameterize. In other words, the difference is external to the implementation of these type classes.
In conclusion, in my own experience, Mac Lane's infamous quote provided a great "goto" meme, a guidepost for me to reference while navigating my way through Category to better understand the idioms used in Haskell. It succeeds at capturing the scope of a powerful computing capacity made wonderfully accessible in Haskell.
However, there is irony in how I first misunderstood the statement's applicability outside of the monad, and in what I hope I've conveyed here. Everything it describes turns out to be what is similar between Applicative and Monads (and Arrows among others). What it doesn't say is precisely the small but useful distinction between them.
Note: No, this isn't true. At some point there was a comment on this answer from Dan Piponi himself saying that the cause and effect here was exactly the opposite, that he wrote his article in response to James Iry's quip. But it seems to have been removed, perhaps by some compulsive tidier.
Below is my original answer.
It's quite possible that Iry had read From Monoids to Monads, a post in which Dan Piponi (sigfpe) derives monads from monoids in Haskell, with much discussion of category theory and explicit mention of "the category of endofunctors on Hask". In any case, anyone who wonders what it means for a monad to be a monoid in the category of endofunctors might benefit from reading this derivation.

Monads as adjunctions

I've been reading about monads in category theory. One definition of monads uses a pair of adjoint functors. A monad is defined by a round-trip using those functors. Apparently adjunctions are very important in category theory, but I haven't seen any explanation of Haskell monads in terms of adjoint functors. Has anyone given it a thought?
Edit: Just for fun, I'm going to do this right. Original answer preserved below
The current adjunction code for category-extras now is in the adjunctions package: http://hackage.haskell.org/package/adjunctions
I'm just going to work through the state monad explicitly and simply. This code uses Data.Functor.Compose from the transformers package, but is otherwise self-contained.
An adjunction between f (D -> C) and g (C -> D), written f -| g, can be characterized in a number of ways. We'll use the counit/unit (epsilon/eta) description, which gives two natural transformations (morphisms between functors).
class (Functor f, Functor g) => Adjoint f g where
  counit :: f (g a) -> a
  unit   :: a -> g (f a)
Note that the "a" in counit is really the identity functor in C, and the "a" in unit is really the identity functor in D.
We can also recover the hom-set adjunction definition from the counit/unit definition.
phiLeft :: Adjoint f g => (f a -> b) -> (a -> g b)
phiLeft f = fmap f . unit
phiRight :: Adjoint f g => (a -> g b) -> (f a -> b)
phiRight f = counit . fmap f
In any case, we can now define a Monad from our unit/counit adjunction like so:
instance Adjoint f g => Monad (Compose g f) where
  return x = Compose $ unit x
  x >>= f  = Compose . fmap counit . getCompose $ fmap (getCompose . f) x
Now we can implement the classic adjunction between (a,) and (a ->):
instance Adjoint ((,) a) ((->) a) where
  -- counit :: (a, a -> b) -> b
  counit (x, f) = f x
  -- unit :: b -> (a -> (a, b))
  unit x = \y -> (y, x)
And now a type synonym
type State s = Compose ((->) s) ((,) s)
And if we load this up in ghci, we can confirm that State is precisely our classic state monad. Note that we can take the opposite composition and get the Costate Comonad (aka the store comonad).
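As a quick check of that identification, the classic operations can be written directly against this representation (a sketch; get' and put' are my names):

get' :: State s s
get' = Compose $ \s -> (s, s)

put' :: s -> State s ()
put' s' = Compose $ \_ -> (s', ())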
There are a bunch of other adjunctions we can make into monads in this fashion (such as (Bool,) Pair), but they're sort of strange monads. Unfortunately we can't do the adjunctions that induce Reader and Writer directly in Haskell in a pleasant way. We can do Cont, but as copumpkin describes, that requires an adjunction from an opposite category, so it actually uses a different "form" of the "Adjoint" typeclass that reverses some arrows. That form is also implemented in a different module in the adjunctions package.
this material is covered in a different way by Derek Elkins' article in The Monad Reader 13 -- Calculating Monads with Category Theory: http://www.haskell.org/wikiupload/8/85/TMR-Issue13.pdf
Also, Hinze's recent Kan Extensions for Program Optimization paper walks through the construction of the list monad from the adjunction between Mon and Set: http://www.cs.ox.ac.uk/ralf.hinze/Kan.pdf
Old answer:
Two references.
1) Category-extras delivers, as always, with a representation of adjunctions and how monads arise from them. As usual, it's good to think with, but pretty light on documentation: http://hackage.haskell.org/packages/archive/category-extras/0.53.5/doc/html/Control-Functor-Adjunction.html
2) Haskell-Cafe also delivers with a promising but brief discussion on the role of adjunctions. Some of which may help in interpreting category-extras: http://www.haskell.org/pipermail/haskell-cafe/2007-December/036328.html
Derek Elkins was showing me recently over dinner how the Cont Monad arises from composing the (_ -> k) contravariant functor with itself, since it happens to be self-adjoint. That's how you get (a -> k) -> k out of it. Its counit, however, leads to double negation elimination, which can't be written in Haskell.
For some Agda code that illustrates and proves this, please see http://hpaste.org/68257.
This is an old thread, but I found the question interesting,
so I did some calculations myself. Hopefully Bartosz is still there
and might read this..
In fact, the Eilenberg-Moore construction does give a very clear picture in this case.
(I will use CWM notation with Haskell-like syntax)
Let T be the list monad <T, eta, mu> (eta = return and mu = concat)
and consider a T-algebra h : T a -> a.
(Note that T a = [a] is the free monoid <[a], [], (++)>, that is, identity [] and multiplication (++).)
By definition, h must satisfy h . T h == h . mu a and h . eta a == id.
Now, some easy diagram chasing proves that h actually induces a monoid structure on a (defined by x * y = h [x, y]),
and that h becomes a monoid homomorphism for this structure.
Conversely, any monoid structure <a, a0, *> defined in Haskell naturally defines a T-algebra,
namely h = foldr (*) a0, a function that 'replaces' (:) with (*) and maps [] to a0, the identity.
So, in this case, the category of T-algebras is just the category of monoid structures definable in Haskell, HaskMon.
(Please check that the morphisms in T-algebras are actually monoid homomorphisms.)
It also characterizes lists as universal objects in HaskMon, just like free products in Grp, polynomial rings in CRng, etc.
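The two directions of this correspondence are short enough to sketch in code (the names are mine):

-- a Haskell monoid gives a T-algebra: h = foldr (*) a0
algebraFromMonoid :: Monoid a => [a] -> a
algebraFromMonoid = foldr (<>) mempty

-- a T-algebra h induces a monoid multiplication: x * y = h [x, y]
mulFromAlgebra :: ([a] -> a) -> (a -> a -> a)
mulFromAlgebra h x y = h [x, y]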
The adjunction corresponding to the above construction is <F, G, eta, epsilon>
where
F : Hask -> HaskMon, which takes a type a to the 'free monoid generated by a', that is, [a],
G : HaskMon -> Hask, the forgetful functor (forget the multiplication),
eta : 1 -> GF, the natural transformation defined by \x :: a -> [x],
epsilon : FG -> 1, the natural transformation defined by the folding function above
(the 'canonical surjection' from a free monoid to its quotient monoid).
Next, there is another 'Kleisli category' and the corresponding adjunction.
You can check that it is just the category of Haskell types with morphisms a -> T b,
where composition is given by the so-called 'Kleisli composition' (>=>).
A typical Haskell programmer will find this category more familiar.
Finally, as is illustrated in CWM, the category of T-algebras
(resp. the Kleisli category) becomes the terminal (resp. initial) object in the category
of adjunctions that define the list monad T, in a suitable sense.
I suggest doing a similar calculation for the binary tree functor T a = L a | B (T a) (T a) to check your understanding.
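For reference while doing that exercise, the binary tree functor with its free-monad-style Monad instance looks like this (a sketch; modern GHC also wants the Functor and Applicative instances):

data T a = L a | B (T a) (T a)

instance Functor T where
  fmap f (L a)   = L (f a)
  fmap f (B l r) = B (fmap f l) (fmap f r)

instance Applicative T where
  pure = L
  tf <*> ta = tf >>= \f -> fmap f ta

instance Monad T where
  L a   >>= k = k a                    -- substitute at a leaf
  B l r >>= k = B (l >>= k) (r >>= k)  -- push substitution into both branches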
I've found a standard construction of adjoint functors for any monad, the Eilenberg-Moore construction, but I'm not sure if it adds any insight to the problem. The second category in the construction is a category of T-algebras. A T-algebra adds a "product" to the initial category.
So how would it work for a list monad? The functor in the list monad consists of a type constructor, e.g., Int->[Int] and a mapping of functions (e.g., standard application of map to lists). An algebra adds a mapping from lists to elements. One example would be adding (or multiplying) all the elements of a list of integers. The functor F takes any type, e.g., Int, and maps it into the algebra defined on the lists of Int, where the product is defined by monadic join (or vice versa, join is defined as the product). The forgetful functor G takes an algebra and forgets the product. The pair F, G, of adjoint functors is then used to construct the monad in the usual way.
I must say I'm none the wiser.
If you are interested, here are some thoughts of a non-expert
on the role of monads and adjunctions in programming languages:
First of all, there exists for a given monad T a unique adjunction to the Kleisli category of T.
In Haskell, the use of monads is primarily confined to operations in this category
(which is essentially a category of free algebras, no quotients).
In fact, all one can do with a Haskell Monad is to compose some Kleisli morphisms of
type a -> T b through the use of do expressions, (>>=), etc., to create a new
morphism. In this context, the role of monads is restricted to just the economy
of notation. One exploits associativity of morphisms to be able to write (say) [0,1,2]
instead of (Cons 0 (Cons 1 (Cons 2 Nil))), that is, you can write a sequence as a sequence,
not as a tree.
Even the use of IO monads is non-essential, for the current Haskell type system is powerful
enough to realize data encapsulation (existential types).
This is my answer to your original question,
but I'm curious what Haskell experts have to say about this.
On the other hand, as we have noted, there's also a 1-1 correspondence between monads and
adjunctions to (T-)algebras. Adjoints, in Mac Lane's terms, are 'a way
to express equivalences of categories.'
In a typical setting of adjunctions <F, G> : X -> A, where F is some sort
of 'free algebra generator' and G a 'forgetful functor', the corresponding monad
will (through the use of T-algebras) describe how (and when) the algebraic structure of A is constructed on the objects of X.
In the case of Hask and the list monad T, the structure which T introduces is that
of a monoid, and this can help us establish properties (including the correctness) of code through algebraic
methods that the theory of monoids provides. For example, the function foldr (*) e :: [a] -> a can
readily be seen to be an associative operation as long as <a, (*), e> is a monoid,
a fact which could be exploited by the compiler to optimize the computation (e.g. by parallelism).
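For instance, associativity licenses splitting the fold at any point, which is the essence of the parallel evaluation just mentioned (a hedged sketch):

-- mconcat xs == mconcat l <> mconcat r for any split of xs,
-- because (<>) is associative and mempty is a unit
foldSplit :: Monoid a => [a] -> a
foldSplit xs = mconcat l <> mconcat r   -- the two halves could run in parallel
  where (l, r) = splitAt (length xs `div` 2) xs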
Another application is to identify and classify 'recursion patterns' in functional programming using categorical
methods, in the hope of (partially) disposing of 'the goto of functional programming', Y (the arbitrary recursion combinator).
Apparently, this kind of application is one of the primary motivations of the creators of Category Theory (Mac Lane, Eilenberg, etc.),
namely, to establish natural equivalences of categories, and to transfer a well-known method in one category
to another (e.g. homological methods to topological spaces, algebraic methods to programming, etc.).
Here, adjoints and monads are indispensable tools to exploit this connection of categories.
(Incidentally, the notion of monads (and its dual, comonads) is so general that one can even go so far as to define 'cohomologies' of
Haskell types. But I have not given it much thought yet.)
As for the non-deterministic functions you mentioned, I have much less to say...
But note that, if an adjunction <F, G> : Hask -> A for some category A defines the list monad T,
there must be a unique 'comparison functor' K : A -> HaskMon (the category of monoids definable in Haskell), see CWM.
This means, in effect, that your category of interest must be a category of monoids in some restricted form (e.g. it may lack some quotients but not free algebras) in order to define the list monad.
Finally, some remarks:
The binary tree functor I mentioned in my last posting easily generalizes to an arbitrary data type
T a1 .. an = T1 T11 .. T1m | ....
Namely, any data type in Haskell naturally defines a monad (together with the corresponding category of algebras and the Kleisli category),
which is just the result of any data constructor in Haskell being total.
This is another reason why I consider Haskell's Monad class to be not much more than syntactic sugar
(which is pretty important in practice, of course).

A monad is just a monoid in the category of endofunctors, what's the problem?

Who first said the following?
A monad is just a monoid in the
category of endofunctors, what's the
problem?
And on a less important note, is this true and if so could you give an explanation (hopefully one that can be understood by someone who doesn't have much Haskell experience)?
That particular phrasing is by James Iry, from his highly entertaining Brief, Incomplete and Mostly Wrong History of Programming Languages, in which he fictionally attributes it to Philip Wadler.
The original quote is from Saunders Mac Lane in Categories for the Working Mathematician, one of the foundational texts of Category Theory. Here it is in context, which is probably the best place to learn exactly what it means.
But, I'll take a stab. The original sentence is this:
All told, a monad in X is just a monoid in the category of endofunctors of X, with product × replaced by composition of endofunctors and unit set by the identity endofunctor.
X here is a category. Endofunctors are functors from a category to itself (which is usually all Functors as far as functional programmers are concerned, since they're mostly dealing with just one category; the category of types - but I digress). But you could imagine another category which is the category of "endofunctors on X". This is a category in which the objects are endofunctors and the morphisms are natural transformations.
And of those endofunctors, some of them might be monads. Which ones are monads? Exactly the ones which are monoidal in a particular sense. Instead of spelling out the exact mapping from monads to monoids (since Mac Lane does that far better than I could hope to), I'll just put their respective definitions side by side and let you compare:
A monoid is...
A set, S
An operation, • : S × S → S
An element of S, e : 1 → S
...satisfying these laws:
(a • b) • c = a • (b • c), for all a, b and c in S
e • a = a • e = a, for all a in S
A monad is...
An endofunctor, T : X → X (in Haskell, a type constructor of kind * -> * with a Functor instance)
A natural transformation, μ : T × T → T, where × means functor composition (μ is known as join in Haskell)
A natural transformation, η : I → T, where I is the identity endofunctor on X (η is known as return in Haskell)
...satisfying these laws:
μ ∘ Tμ = μ ∘ μT
μ ∘ Tη = μ ∘ ηT = 1 (the identity natural transformation)
With a bit of squinting you might be able to see that both of these definitions are instances of the same abstract concept.
First, the extensions and libraries that we're going to use:
{-# LANGUAGE RankNTypes, TypeOperators #-}
import Control.Monad (join)
Of these, RankNTypes is the only one that's absolutely essential to the below. I once wrote an explanation of RankNTypes that some people seem to have found useful, so I'll refer to that.
Quoting Tom Crockett's excellent answer, we have:
A monad is...
An endofunctor, T : X -> X
A natural transformation, μ : T × T -> T, where × means functor composition
A natural transformation, η : I -> T, where I is the identity endofunctor on X
...satisfying these laws:
μ(μ(T × T) × T)) = μ(T × μ(T × T))
μ(η(T)) = T = μ(T(η))
How do we translate this to Haskell code? Well, let's start with the notion of a natural transformation:
-- | A natural transformations between two 'Functor' instances. Law:
--
-- > fmap f . eta g == eta g . fmap f
--
-- Neat fact: the type system actually guarantees this law.
--
newtype f :-> g =
Natural { eta :: forall x. f x -> g x }
A type of the form f :-> g is analogous to a function type, but instead of thinking of it as a function between two types (of kind *), think of it as a morphism between two functors (each of kind * -> *). Examples:
listToMaybe :: [] :-> Maybe
listToMaybe = Natural go
where go [] = Nothing
go (x:_) = Just x
maybeToList :: Maybe :-> []
maybeToList = Natural go
where go Nothing = []
go (Just x) = [x]
reverse' :: [] :-> []
reverse' = Natural reverse
Basically, in Haskell, natural transformations are functions from some type f x to another type g x such that the x type variable is "inaccessible" to the caller. So for example, sort :: Ord a => [a] -> [a] cannot be made into a natural transformation, because it's "picky" about which types we may instantiate for a. One intuitive way I often use to think of this is the following:
A functor is a way of operating on the content of something without touching the structure.
A natural transformation is a way of operating on the structure of something without touching or looking at the content.
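The naturality law from the comment on :-> can be checked by hand for listToMaybe; a quick sketch (checkNaturality is my own name):
-- Both sides evaluate to Just 2, as naturality demands.
checkNaturality :: Bool
checkNaturality =
  (fmap (+ 1) . eta listToMaybe) [1, 2, 3]
    == (eta listToMaybe . fmap (+ 1)) [1, 2, 3]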
Now, with that out of the way, let's tackle the clauses of the definition.
The first clause is "an endofunctor, T : X -> X." Well, every Functor in Haskell is an endofunctor in what people call "the Hask category," whose objects are Haskell types (of kind *) and whose morphisms are Haskell functions. This sounds like a complicated statement, but it's actually a very trivial one. All it means is that a Functor f :: * -> * gives you the means of constructing a type f a :: * for any a :: * and a function fmap g :: f a -> f b out of any g :: a -> b, and that these obey the functor laws.
Second clause: the Identity functor in Haskell (which ships with base, in Data.Functor.Identity, so you can just import it) is defined this way:
newtype Identity a = Identity { runIdentity :: a }
instance Functor Identity where
fmap f (Identity a) = Identity (f a)
So the natural transformation η : I -> T from Tom Crockett's definition can be written this way for any Monad instance t:
return' :: Monad t => Identity :-> t
return' = Natural (return . runIdentity)
Third clause: the composition of two functors in Haskell can be defined this way (it also ships with base, in Data.Functor.Compose):
newtype Compose f g a = Compose { getCompose :: f (g a) }
-- | The composition of two 'Functor's is also a 'Functor'.
instance (Functor f, Functor g) => Functor (Compose f g) where
fmap f (Compose fga) = Compose (fmap (fmap f) fga)
So the natural transformation μ : T × T -> T from Tom Crockett's definition can be written like this:
join' :: Monad t => Compose t t :-> t
join' = Natural (join . getCompose)
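Both transformations can be tried out directly; a small sketch of what to expect (the example names are mine):
example1 :: Maybe Int
example1 = eta return' (Identity 3)             -- Just 3

example2 :: Maybe Int
example2 = eta join' (Compose (Just (Just 3)))  -- Just 3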
The statement that this is a monoid in the category of endofunctors then means that Compose (partially applied to just its first two parameters) is associative, and that Identity is its identity element. I.e., that the following isomorphisms hold:
Compose f (Compose g h) ~= Compose (Compose f g) h
Compose f Identity ~= f
Compose Identity g ~= g
These are very easy to prove because Compose and Identity are both defined as newtype, and the Haskell Reports define the semantics of newtype as an isomorphism between the type being defined and the type of the argument to the newtype's data constructor. So for example, let's prove Compose f Identity ~= f:
Compose f Identity a
~= f (Identity a) -- newtype Compose f g a = Compose (f (g a))
~= f a -- newtype Identity a = Identity a
Q.E.D.
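The same isomorphism can also be written out as a pair of mutually inverse functions; a sketch (the names toPlain and fromPlain are mine):
-- Witnesses for Compose f Identity ~= f:
toPlain :: Functor f => Compose f Identity a -> f a
toPlain = fmap runIdentity . getCompose

fromPlain :: Functor f => f a -> Compose f Identity a
fromPlain = Compose . fmap Identity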
The answers here do an excellent job of defining both monoids and monads; however, they still don't seem to answer the question:
And on a less important note, is this true and if so could you give an explanation (hopefully one that can be understood by someone who doesn't have much Haskell experience)?
The crux of the matter missing here is the different notion of "monoid", more precisely its so-called categorification: the notion of a monoid in a monoidal category. Sadly, Mac Lane's book itself makes it very confusing:
All told, a monad in X is just a monoid in the category of endofunctors of X, with product × replaced by composition of endofunctors and unit set by the identity endofunctor.
Main confusion
Why is this confusing? Because it does not define what "a monoid in the category of endofunctors" of X is. Instead, the sentence suggests taking a monoid inside the set of all endofunctors, with functor composition as the binary operation and the identity functor as the unit. That works perfectly well, and it turns any subset of endofunctors that contains the identity functor and is closed under functor composition into a monoid.
Yet this is not the correct interpretation, which the book fails to make clear at that stage. A Monad f is a fixed endofunctor, not a subset of endofunctors closed under composition. A common construction is to use f to generate a monoid by taking the set of all k-fold compositions f^k = f ∘ f ∘ ... ∘ f of f with itself, including k = 0, which corresponds to the identity f^0 = id. The set S of all these powers, for all k ≥ 0, is then indeed a monoid "with product × replaced by composition of endofunctors and unit set by the identity endofunctor".
And yet:
This monoid S can be defined for any functor f, or even literally for any self-map of X. It is the monoid generated by f.
The monoidal structure on S, given by functor composition and the identity functor, has nothing to do with f being or not being a monad (see the type-level sketch after this list).
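Here is the promised type-level sketch of the "powers of f" monoid, using Compose and Identity from base (the aliases F0 through F3 are my own names); note that nothing here requires f to be a Monad:
import Data.Functor.Compose (Compose)
import Data.Functor.Identity (Identity)

-- The powers of f under composition, with Identity playing the role of f^0:
type F0 f = Identity
type F1 f = f
type F2 f = Compose f f
type F3 f = Compose f (Compose f f)
-- ... and so on, for any type constructor f whatsoever.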
And to make things more confusing, the definition of "monoid in a monoidal category" comes later in the book, as you can see from the table of contents. And yet understanding this notion is absolutely critical to understanding the connection with monads.
(Strict) monoidal categories
Going to Chapter VII on Monoids (which comes after Chapter VI on Monads), we find the definition of the so-called strict monoidal category as a triple (B, *, e), where B is a category, * : B × B → B is a bifunctor (a functor in each component with the other component fixed), and e is a unit object in B, satisfying the associativity and unit laws:
(a * b) * c = a * (b * c)
a * e = e * a = a
for any objects a, b, c of B, and the same identities for any morphisms a, b, c, with e replaced by id_e, the identity morphism of e. It is now instructive to observe that in our case of interest, where B is the category of endofunctors of X with natural transformations as morphisms, * is functor composition, and e is the identity functor, all these laws are satisfied, as can be directly verified.
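In Haskell terms, "directly verified" amounts to writing the coercions that mediate the laws; a sketch for associativity (assoc is my own name):
import Data.Functor.Compose (Compose (..))

-- Reassociating composition, using only newtype (un)wrapping and fmap:
assoc :: Functor f => Compose f (Compose g h) a -> Compose (Compose f g) h a
assoc = Compose . Compose . fmap getCompose . getCompose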
What comes after in the book is the definition of the "relaxed" monoidal category, where the laws only hold modulo some fixed natural transformations satisfying so-called coherence relations, which is however not important for our cases of the endofunctor categories.
Monoids in monoidal categories
Finally, in section 3 "Monoids" of Chapter VII, the actual definition is given:
A monoid c in a monoidal category (B, *, e) is an object of B with two arrows (morphisms)
mu: c * c -> c
nu: e -> c
making three diagrams commute. Recall that in our case these are morphisms in the category of endofunctors, i.e. natural transformations, corresponding precisely to join and return for a monad. The connection becomes even clearer when we make the composition * explicit, replacing c * c by c^2, where c is our monad.
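That definition can be sketched as a Haskell type class (the class and method names here are my own, not a standard library class):
import Control.Monad (join)
import Data.Functor.Compose (Compose (..))
import Data.Functor.Identity (Identity (..))

-- A monoid object in the strict monoidal category of endofunctors,
-- with * = Compose and e = Identity:
class Functor c => MonoidObject c where
  mu :: Compose c c a -> c a  -- c * c -> c, i.e. join
  nu :: Identity a -> c a     -- e -> c,     i.e. return

-- Any Monad gives such a monoid object; Maybe, for instance:
instance MonoidObject Maybe where
  mu = join . getCompose
  nu = Just . runIdentity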
Finally, notice that the three commutative diagrams (in the definition of a monoid in a monoidal category) are written for general (non-strict) monoidal categories, while in our case all the natural transformations arising as part of the monoidal structure are actually identities. That makes the diagrams exactly the same as the ones in the definition of a monad, completing the correspondence.
Conclusion
In summary, any monad is by definition an endofunctor, hence an object in the category of endofunctors, where the monadic join and return operators satisfy the definition of a monoid in that particular (strict) monoidal category. Conversely, any monoid in the monoidal category of endofunctors is by definition a triple (c, mu, nu) consisting of an object and two arrows, i.e. natural transformations in our case, satisfying the same laws as a monad.
Finally, note the key difference between (classical) monoids and the more general monoids in monoidal categories. The two arrows mu and nu above are no longer a binary operation and a unit element of a set. Instead, you have one fixed endofunctor c. Functor composition * and the identity functor alone do not provide the complete structure needed for the monad, despite that confusing remark in the book.
Another approach is to compare with the standard monoid C of all self-maps of a set A, where the binary operation is composition, which can be seen as a map from the standard cartesian product C × C into C. Passing to the categorified monoid, we replace the cartesian product × with the functor composition *, and the binary operation gets replaced with the natural transformation mu from
c * c to c, that is, a collection of join operators
join : c(c(T)) -> c(T)
for every object T (a type, in programming). And the identity elements of classical monoids, which can be identified with images of maps from a fixed one-point set, get replaced with the collection of return operators
return : T -> c(T)
But now there are no cartesian products anymore, so no pairs of elements, and thus no binary operations.
I came to this post by way of trying to better understand the inference behind the infamous quote from Mac Lane's Categories for the Working Mathematician.
In describing what something is, it's often equally useful to describe what it's not.
Because Mac Lane uses the description to describe a Monad, one might assume that it describes something unique to monads. Bear with me. To develop a broader understanding of the statement, I believe it needs to be made clear that he is not describing something unique to monads; the statement equally describes Applicative and Arrow, among others. For the same reason we can have two monoids on Int (Sum and Product), we can have several monoids on X in the category of endofunctors. But there is even more to the similarities.
Both Monad and Applicative meet the criteria:
endo => any arrow, or morphism, that starts and ends in the same place
functor => any arrow, or morphism, between two categories (e.g., in day-to-day Haskell, Tree a -> List b, but in Category, Tree -> List)
monoid => a single object; i.e., a single type, but in this context, only in regard to the external layer; so, we can't have Tree -> List, only List -> List.
The statement uses "Category of..." This defines the scope of the statement. As an example, the Functor Category describes the scope of f * -> g *, i.e., Any functor -> Any functor, e.g., Tree * -> List * or Tree * -> Tree *.
What a Categorical statement does not specify describes where anything and everything is permitted.
In this case, inside the functors, * -> * (a.k.a. a -> b) is not specified, which means Anything -> Anything. As my imagination jumps to Int -> String, it also includes Integer -> Maybe Int, or even Maybe Double -> Either String Int, where a :: Maybe Double; b :: Either String Int.
So the statement comes together as follows:
functor scope :: f a -> g b (i.e., any parameterized type to any parameterized type)
endo + functor :: f a -> f b (i.e., any one parameterized type to the same parameterized type) ... said differently,
a monoid in the category of endofunctors
So, where is the power of this construct? To appreciate the full dynamics, I needed to see that the typical drawings of a monoid (a single object with what looks like an identity arrow, :: single object -> single object) fail to illustrate that I'm permitted to use an arrow parameterized with any number of monoid values, from the one type object permitted in Monoid. The endo, ~ identity arrow definition of equivalence ignores the functor's type value and both the type and value of the innermost, "payload" layer. Thus, equivalence returns true in any situation where the functorial types match (e.g., Nothing -> Just * -> Nothing is equivalent to Just * -> Just * -> Just * because they are both Maybe -> Maybe -> Maybe).
Sidebar: the ~outside is conceptual, but it is the left-most symbol in f a. It also describes what Haskell reads in first (the big picture); so a Type is "outside" in relation to a Type Value. The relationship between layers (a chain of references) in programming is not easy to relate in Category. The category of Set is used to describe Types (Int, String, Maybe Int, etc.), which includes the category of Functors (parameterized Types). The reference chain: Functor Type, Functor values (elements of that Functor's set, e.g., Nothing, Just), and, in turn, everything else each functor value points to. In Category the relationship is described differently, e.g., return :: a -> m a is considered a natural transformation from one Functor to another Functor, different from anything mentioned thus far.
Back to the main thread, all in all, for any defined tensor product and a neutral value, the statement ends up describing an amazingly powerful computational construct born from its paradoxical structure:
on the outside it appears as a single object (e.g., :: List); static
but inside, permits a lot of dynamics
any number of values of the same type (e.g., Empty | ~NonEmpty) as fodder to functions of any arity. The tensor product will reduce any number of inputs to a single value... for the external layer (~fold that says nothing about the payload)
infinite range of both the type and values for the inner most layer
In Haskell, clarifying the applicability of the statement is important. The power and versatility of this construct have absolutely nothing to do with a monad per se. In other words, the construct does not rely on what makes a monad unique.
When trying to figure out whether to build code with a shared context to support computations that depend on each other, versus computations that can be run in parallel, this infamous statement, with as much as it describes, is not a contrast between the choice of Applicative, Arrows and Monads, but rather is a description of how much they are the same. For the decision at hand, the statement is moot.
This is often misunderstood. The statement goes on to describe join :: m (m a) -> m a as the tensor product for the monoidal endofunctor. However, it does not articulate how, in the context of this statement, (<*>) could also have been chosen. It truly is a case of six of one, half a dozen of the other. The logic for combining values is exactly alike; the same input generates the same output from each (unlike the Sum and Product monoids for Int, which generate different results when combining Ints).
So, to recap: A monoid in the category of endofunctors describes:
~t :: m * -> m * -> m *
and a neutral value for m *
(<*>) and (>>=) both provide simultaneous access to the two m values in order to compute the single return value. The logic used to compute the return value is exactly the same. If it were not for the different shapes of the functions they parameterize (f :: a -> b versus k :: a -> m b) and the position of the parameter with the same return type of the computation (i.e., a -> b -> b versus b -> a -> b, respectively), I suspect we could have parameterized the monoidal logic, the tensor product, for reuse in both definitions. As an exercise to make the point, try to implement ~t, and you end up with (<*>) or (>>=) depending on how you decide to define it forall a b.
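A hedged sketch of that exercise (tAp and tBind are my own names): both combiners collapse two layers of m into one via join, differing only in the shape of the function they accept.
import Control.Monad (join)

-- (<*>) written so the shared join-based shape is visible:
tAp :: Monad m => m (a -> b) -> m a -> m b
tAp mf ma = join (fmap (\f -> fmap f ma) mf)

-- (>>=) written the same way:
tBind :: Monad m => m a -> (a -> m b) -> m b
tBind ma k = join (fmap k ma)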
If my last point is at minimum conceptually true, it then explains the precise, and only computational difference between Applicative and Monad: the functions they parameterize. In other words, the difference is external to the implementation of these type classes.
In conclusion, in my own experience, Mac Lane's infamous quote provided a great "goto" meme, a guidepost for me to reference while navigating my way through Category to better understand the idioms used in Haskell. It succeeds at capturing the scope of a powerful computing capacity made wonderfully accessible in Haskell.
However, there is irony in how I first misunderstood the statement's applicability outside of the monad, and in what I hope I have conveyed here. Everything it describes turns out to be what is similar between Applicative and Monad (and Arrows, among others). What it doesn't say is precisely the small but useful distinction between them.
Note: No, this isn't true. At some point there was a comment on this answer from Dan Piponi himself saying that the cause and effect here was exactly the opposite, that he wrote his article in response to James Iry's quip. But it seems to have been removed, perhaps by some compulsive tidier.
Below is my original answer.
It's quite possible that Iry had read From Monoids to Monads, a post in which Dan Piponi (sigfpe) derives monads from monoids in Haskell, with much discussion of category theory and explicit mention of "the category of endofunctors on Hask". In any case, anyone who wonders what it means for a monad to be a monoid in the category of endofunctors might benefit from reading this derivation.
