Is Last a free monoid? - haskell

Free monoids are often regarded as "list monoids". Yet I am interested in other possible structures which might also give us free monoids.
Firstly, let us go over the definition of free monoids. I have never quite understood how it is possible to define a free monoid as a structure which abides by the monoid laws and nothing else. How do we prove that something abides by no rules other than those stated above? Or is this just an intuition?
Anyway, we are going to speak of functors. If some monoid is free, we get it with a free functor. It is obvious that a list comes in quite handy here:
free :: Set -> Mon
free a = ([a], (++), [])
Yet, one might come up with several others. For example, here is one for Last of Data.Monoid:
freeLast :: Set -> Mon
freeLast a = (Last a, (<>) :: Last a -> Last a -> Last a, Last Nothing)
So, does this functor make Last a free monoid? More generally, if there is a law-abiding instance for Monoid (T a), is T a free monoid?

Here's one way to understand a free monoid: If somebody gives you a value, how much can you deduce about how it was created? Consider an additive monoid of natural numbers. I give you a 7 and ask you how I got it. I could have added 4+3, or 3+4, or 2+5, etc. There are many possibilities. This information was lost. If, on the other hand, I give you a list [4, 3], you know it was created from singletons [4] and [3]. Except that maybe there was a unit [] involved. Maybe it was [4]<>[3]<>[] or [4]<>[]<>[]<>[3]. But it definitely wasn't [3]<>[4].
With a longer list, [1, 2, 3], you have additional options ([1]<>[2]) <> [3], or [1] <> ([2]<>[3]), plus all possible insertions of the empty list. So the information you lose follows the unit laws and associativity, but nothing else. A free monoid value remembers how it was created, modulo unit laws and associativity.

For the sake of example, let's take the non-negative integers, i.e. 0, 1, 2, .... How many monoids can we make?
Define mempty = 0 and (<>) = (+). You can easily prove that this is a monoid.
Define mempty = 1 and (<>) = (*). Again, this is a monoid (prove it, it is easy).
The two monoids defined above are called the additive and multiplicative monoids over the natural numbers. They differ in structure: for example, the element 0 in the multiplicative monoid (which absorbs everything) behaves totally differently from any element of the additive monoid, hence there is something intrinsic to the natural numbers that makes these monoids different (hold this assertion until the next paragraph).
There exists a third monoid we can create; let's call it the concatenation monoid.
Define mempty = no-action and (<>) = glue one integer beside the other.
As an example, 3 <> mempty = 3 and 3 <> 2 = 32. Notice that the fact that the elements are natural numbers is not relevant here. If instead of naturals we take rationals, or whatever symbols you like, the monoid would be exactly the same thing (* read footnote). Hence, there is nothing intrinsic to the underlying set that makes this monoid different from the others. That's why the monoid is free: it doesn't depend on the arithmetic rules of the naturals, nor on any rule aside from the monoid ones.
And this is the only way to build a monoid freely, without depending on the inner rules of the underlying set. Of course, concatenation is expressed as lists in Haskell.
Note: The only important bit is that the underlying sets share the same number of elements. For example, the free monoid with three elements a, b and c would be arbitrary concatenations of those three, but you could choose whatever other symbols: 1, 2, 3 or α, β, γ ... and the monoid would be the very same thing.
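To make this concrete in Haskell, here is a small sketch of the three monoids, using Sum and Product from Data.Monoid, and plain lists for the concatenation monoid:

import Data.Monoid (Sum(..), Product(..))

-- additive:        getSum     (Sum 3     <> Sum 2)     == 5
-- multiplicative:  getProduct (Product 3 <> Product 2) == 6
-- concatenation:   [3] <> [2] == [3,2]  -- "glue one integer beside the other"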

Firstly, let us go over the definition of free monoids. I have never quite understood how it is possible to define a free monoid as a structure which abides by the monoid laws and nothing else. How do we prove that something abides by no rules other than those stated above? Or is this just an intuition?
Let me illustrate the purpose of free monoids.
If I tell you there is a monoid, with some elements a, b, c, what can you deduce from that?
We can find more elements of that monoid by writing expressions involving the generators a, b, c and the monoid operations (+) and 0 (aka. (<>) and mempty). (cf. Definition 1, in the second half of this answer.)
We can use the monoid laws to prove that some expressions denote the same element: we can prove equations such as ((a + 0) + b) = (a + b). (Definition 2.) In fact, equations we can prove with just that knowledge are equations which hold in any monoid, for any values a, b, c. (Theorem 1.)
What about equations we can't prove from just the monoid laws? For example, we can't prove (a + b) = (b + a). But we can't prove its negation either, (a + b) /= (b + a), if we only know the monoid laws. What does that mean? It turns out that that equation holds in some monoids (e.g., commutative monoids), but not in others: for example, pick a monoid where x + y = y for almost all x and y (this is the Last monoid in Haskell), if we choose distinct a and b, then (a + b) /= (b + a).
But that was just one example. What can we say in general about equations that we cannot prove from just the monoid laws? The free monoid offers a definitive answer, in fact, a universal counterexample: unprovable equations are false in the free monoid (generated by a, b, c). In other words, we can prove an equation e = f using just the monoid laws if and only if it is true in the free monoid (emphasis on "if"). (Theorem 2.) This corresponds to the intuition that the free monoid "only abides by the monoid laws and nothing else".
So, does this functor make Last a free monoid? More generally, if there is a law-abiding instance for Monoid (T a), is T a free monoid?
The Last monoid is not free because it makes more equations true than what you can actually prove purely from the monoid laws. See the other answer:
forall (t :: Type) (x, y :: t).
Last (Just x) <> Last (Just y) === Last (Just y)
Here's a sketch of how to formalize the above.
Definition 1. The set of monoidal expressions generated by (some atomic symbols) A, B, C is defined by the grammar:
e ::=
| A | B | C -- generators
| e + e -- binary operation (<>)
| 0 -- identity (mempty)
Given any "suitable monoid", that is to say, a monoid (M, (+), 0) with some chosen elements a, b, c in M (which don't have to be distinct), an expression e denotes an element eval e in M.
Definition 2. An equation is a pair of expressions, written e ~ f. The set of provable equations is the smallest set of equations ("smallest" when ordered by inclusion) satisfying the following:
It includes the monoid laws: (e + 0) ~ e, (0 + e) ~ e, ((e + f) + g) ~ (e + (f + g)) are provable.
It is an equivalence relation (viewing a set of tuples as a relation): for example, for reflexivity, e ~ e is provable.
It is a congruence relation: if e ~ f is provable then (g + e) ~ (g + f) and (e + g) ~ (f + g) are provable.
(The idea of that definition is that the assertion "e ~ f is provable" holds if and only if it can be deduced by "applying" those rules. "Smallest set" is a conventional method to formalize that.)
The definition of "provable equations" may seem arbitrary. Are those the right rules to define "provability"? Why these three rules in particular? Notably, the congruence rule may not be obvious in a first attempt at giving such a definition. This is the point of the following theorems, soundness and completeness. Add a (non-redundant) rule, and we lose soundness. Remove a rule, and we lose completeness.
Theorem 1. (Soundness) If e ~ f is provable, then eval e = eval f in any "suitable monoid" M.
Theorem 2. (Completeness) If e ~ f is not provable, then their denotations differ in F, eval e /= eval f, where F is the free monoid generated by A, B, C.
(Soundness is much easier to prove than completeness. Exercises for the reader.)
This completeness theorem is a characterization of the free monoid: any other monoid F which keeps the statement of the theorem true is isomorphic to the free monoid (technically, this requires both completeness and an assumption that the denotation function eval : Expr -> M is surjective). That is why we may say "the free monoid" instead of "the monoid of lists"; that practice is most accurate in contexts where the representation does not matter ("up to isomorphism").
In fact, completeness is trivial if you define "the free monoid" as the quotient of monoidal expressions by the equivalence relation "_ ~ _ is provable". The hard work actually resides in a separate proof, that this monoid is isomorphic to the monoid of lists.
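Continuing the Expr sketch above (again my own code, as an illustration): the quotient has an obvious normal form, the left-to-right list of generators, which is one way to see that the quotient is isomorphic to the monoid of lists.

normalize :: Expr -> String
normalize A         = "A"
normalize B         = "B"
normalize C         = "C"
normalize Zero      = ""
normalize (e :+: f) = normalize e ++ normalize f

-- e ~ f is provable iff normalize e == normalize f.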

Here is another law that Last satisfies:
forall (t :: Type) (x, y :: t).
Last (Just x) <> Last (Just y) === Last (Just y)
Since it satisfies another law, it must not be the free Monoid.
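As a quick GHCi sanity check of that law:

Prelude Data.Monoid> Last (Just 1) <> Last (Just 2)
Last {getLast = Just 2}
Prelude Data.Monoid> Last (Just 2) <> Last (Just 1)
Last {getLast = Just 1}

The result depends only on the last Just, so information about the left operand is discarded, which never happens with lists.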

Related

What is the main difference between Free Monoid and Monoid?

Looks like I have a pretty clear understanding of what a Monoid is in Haskell, but recently I heard about something called a free monoid.
What is a free monoid and how does it relate to a monoid?
Can you provide an example in Haskell?
As you already know, a monoid is a set with an element e and an operation <> satisfying
e <> x = x <> e = x (identity)
(x<>y)<>z = x<>(y<>z) (associativity)
Now, a free monoid, intuitively, is a monoid which satisfies only those equations above, and, obviously, all their consequences.
For instance, the Haskell list monoid ([a], [], (++)) is free.
By contrast, the Haskell sum monoid (Sum Int, Sum 0, \(Sum x) (Sum y) -> Sum (x+y)) is not free, since it also satisfies additional equations. For instance, it's commutative
x<>y = y<>x
and this does not follow from the first two equations.
Note that it can be proved mathematically that all free monoids are isomorphic to the list monoid [a]. So, "free monoid" in programming is only a fancy term for any data structure which 1) can be converted to a list, and back, with no loss of information, and 2) vice versa, a list can be converted to it, and back, with no loss of information.
In Haskell, you can mentally substitute "free monoid" with "list-like type".
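For instance, Data.Sequence.Seq is "list-like" in exactly this sense; a sketch of the two lossless conversions:

import qualified Data.Sequence as Seq
import Data.Foldable (toList)

-- toList (Seq.fromList xs) == xs   and   Seq.fromList (toList s) == s,
-- and both directions are monoid homomorphisms:
-- toList (s <> t) == toList s ++ toList t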
In a programming context, I usually translate free monoid to [a]. In his excellent series of articles about category theory for programmers, Bartosz Milewski describes free monoids in Haskell as the list monoid (assuming one ignores some problems with infinite lists).
The identity element is the empty list, and the binary operation is list concatenation:
Prelude Data.Monoid> mempty :: [Int]
[]
Prelude Data.Monoid> [1..3] <> [7..10]
[1,2,3,7,8,9,10]
Intuitively, I think of this monoid as being 'free' because it is a monoid that you can always apply, regardless of the type of value you want to work with (just like the free monad is a monad you can always create from any functor).
Additionally, when more than one monoid exists for a type, the free monoid defers the decision on which specific monoid to use. For example, for integers, infinitely many monoids exist, but the most common are addition and multiplication.
If you have two (or more) integers, and you know that you may want to aggregate them, but you haven't yet decided which type of aggregation you want to apply, you can instead 'aggregate' them using the free monoid - practically, this means putting them in a list:
Prelude Data.Monoid> [3,7]
[3,7]
If you later decide that you want to add them together, then that's possible:
Prelude Data.Monoid> getSum $ mconcat $ Sum <$> [3,7]
10
If, instead, you wish to multiply them, you can do that as well:
Prelude Data.Monoid> getProduct $ mconcat $ Product <$> [3,7]
21
In these two examples, I've deliberately chosen to elevate each number to a type (Sum, Product) that embodies a more specific monoid, and then use mconcat to perform the aggregation.
For addition and multiplication, there are more succinct ways to do this, but I did it that way to illustrate how you can use a more specific monoid to interpret the free monoid.
A free monoid is a specific type of monoid. Specifically, it’s the monoid you get by taking some fixed set of elements as characters and then forming all possible strings from those elements. Those strings, with the underlying operation being string concatenation, form a monoid, and that monoid is called the free monoid.
A monoid (M,•,1) is a mathematical structure such that:
1. M is a set
2. 1 is a member of M
3. • : M * M -> M
4. a•1 = a = 1•a
5. Given elements a, b and c in M, we have a•(b•c) = (a•b)•c.
A free monoid on a set M is a monoid (M',•,0) and a function e : M -> M' such that, for any monoid (N,*,1) and any (set) map f : M -> N, we can extend f uniquely to a monoid morphism f' : (M',•,0) -> (N,*,1), i.e.
f a = f' (e a)
f' 0 = 1
f' (a•b) = (f' a) * (f' b)
In other words, it is a monoid that does nothing special.
An example monoid is the integers with the operation being addition and the identity being 0. Another monoid is sequences of integers with the operation being concatenation and the identity being the empty sequence. Now the integers under addition are not a free monoid on the integers. Consider the map into sequences of integers taking n to (n). For this to exhibit freeness, we would need to extend it to a monoid morphism taking n + m to (n,m); in particular it must take 0 = 0 + 0 to (0), but also to (0,0), to (0,0,0), and so on, which is impossible.
On the other hand, if we try to look at sequences of integers as a free monoid on the integers, we see that it seems to work in this case. The extension of the map into the integers with addition is the one that takes the sum of a sequence (with the sum of () being 0).
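In Haskell terms, a sketch of that extension (sum is the monoid morphism agreeing with the identity on singletons):

sumAlg :: [Integer] -> Integer
sumAlg = sum
-- sumAlg [n] == n,  sumAlg [] == 0,
-- sumAlg (xs ++ ys) == sumAlg xs + sumAlg ys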
So what is the free monoid on a set S? Well one thing we could try is just arbitrary binary trees of S. In a Haskell type this would look like:
data T a = Unit | Single a | Conc (T a) (T a)
And it would have an identity of Unit, e = Single and (•) = Conc.
And we can write a function to show how it is free:
-- here the second argument represents a monoid structure on b
free :: (a -> b) -> (b -> b -> b, b) -> T a -> b
free f ((*), zero) = f' where
  f' (Single a) = f a
  f' Unit       = zero
  f' (Conc a b) = f' a * f' b
It should be quite obvious that this satisfies the required laws for a free monoid on a. Except for one thing: T a is not a monoid, because it does not quite satisfy laws 4 or 5.
So now we should ask if we can make this into a simpler free monoid, i.e. one that is an actual monoid. The answer is yes. One way is to observe that Conc Unit a, Conc a Unit and Single a should all be the same. So let's make the first two unrepresentable:
data TInner a = Single a | Conc (TInner a) (TInner a)
data T a = Unit | Inner (TInner a)
A second observation we can make is that there should be no difference between Conc (Conc a b) c and Conc a (Conc b c). This is due to law 5 above. We can then flatten our tree:
data TInner a = Single a | Conc (a,TInner a)
data T a = Unit | Inner (TInner a)
The strange construction with Conc forces us to only have a single way to represent Single a and Unit. But we see we can merge these all together: change the definition of Conc to Conc [a] and then we can change Single x to Conc [x], and Unit to Conc [] so we have:
data T a = Conc [a]
Or we can just write:
type T a = [a]
And the operations are:
unit = []
e a = [a]
(•) = (++)
free f ((*), zero) = f' where
  f' []     = zero
  f' (x:xs) = f x * f' xs
So in Haskell, the list type is called the free monoid.
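A small usage sketch of the final, list-based free (my own examples, not from the answer):

total :: [Int] -> Int
total = free id ((+), 0)        -- total [3,7]    == 10

digits :: [Int] -> String
digits = free show ((++), "")   -- digits [1,2,3] == "123"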

The fixed point functors of Free and Cofree

To make that clear, I'm not talking about how the free monad looks a lot like a fixpoint combinator applied to a functor, i.e. how Free f is basically a fixed point of f. (Not that this isn't interesting!)
What I'm talking about are fixpoints of Free, Cofree :: (*->*) -> (*->*), i.e. functors f such that Free f is isomorphic to f itself.
Background: today, to firm up my rather lacking grasp on free monads, I decided to just write a few of them out for different simple functors, both for Free and for Cofree, and see what better-known [co]monads they'd be isomorphic to. What intrigued me particularly was the discovery that Cofree Empty is isomorphic to Empty (meaning Const Void, the functor that maps any type to the uninhabited one). Ok, perhaps this is just stupid – I've discovered that if you put empty garbage in you get empty garbage out, yeah! – but hey, this is category theory, where whole universes rise up from seeming trivialities... right?
The immediate question is, if Cofree has such a fixed point, what about Free? Well, it certainly can't be Empty as that's not a monad. The quick suspect would be something nearby like Const () or Identity, but no:
Free (Const ()) ~~ Either () ~~ Maybe
Free Identity ~~ (Nat,) ~~ Writer Nat
Indeed, the fact that Free always adds an extra constructor suggests that the structure of any functor that's a fixed point would have to be already infinite. But it seems odd that, if Cofree has such a simple fixed point, Free should only have a much more complex one (like the fix-by-construction FixFree a = C (Free FixFree a) that Reid Barton brings up in the comments).
Is the boring truth just that Free has no “accidental fixed point” and it's a mere coincidence that Cofree has one, or am I missing something?
Your observation that Empty is a fixed point of Cofree (which is not really true in Haskell, but I guess you want to work in some model that ignores ⊥, like Set) boils down to the fact that
there is a set E (the empty set) such that for every set X, the projection p₂ : X × E -> E is an isomorphism.
We could say in this situation that E is an absorbing object for the product. We can replace the word “set” by “object of C” for any category C with products, and we get a statement about C that may or may not be true. For Set, it happens to be true.
If we pick C = Set^op, which also has products (because Set has coproducts), and then dualize the language to talk about sets again, we get the statement
there is a set F such that for every set Y, the inclusion i₂ : F -> Y + F is an isomorphism.
Obviously, this statement is not true for any set F (we can pick any non-empty set Y as a counterexample for any F). No surprise there; after all, Set^op is a different category from Set.
So, we won't get a “trivial fixed point” of Free in the same way we got one for Cofree, because Set^op is qualitatively different from Set. The initial object of Set is an absorbing object for the product, but the terminal object of Set is not an absorbing object for the coproduct.
If I may get on my soapbox for a moment:
There is much discussion among Haskell programmers about which constructions are the “duals” of which other constructions. Most of this is in a formal sense meaningless, because in category theory dualizing a construction works like this:
Suppose I have a construction which I can perform on any category C (or any category with certain extra structure and/or properties). Then the dual construction on a category C is the original construction on the opposite category C^op (which had better have the extra structure and properties we needed, if any).
For example: The notion of products makes sense in any category C (though products might not always exist), via the universal property defining products. To get a dual notion of coproducts in C we should ask what are the products in Cop, and we have just defined what products are in any category, so this notion makes sense.
The trouble with applying duality to the setting of Haskell is that the Haskell language prefers overwhelmingly to talk about just one category, Hask, in which we do our constructions. This causes two problems for talking about duality:
To obtain the dual of a construction as described above, I am supposed to be able to do the construction in any category, or at least any category of a particular form. So we must first generalize the construction that, typically, we have only done in the category Hask to a larger class of categories. (And having done so, there are plenty of other interesting categories we could potentially interpret the resulting notion in besides Hask^op, such as Kleisli categories of monads.)
The category Hask enjoys many special properties which can be summarized by saying that (ignoring ⊥) Hask is a cartesian closed category. For example, this implies that the initial object is an absorbing object for the product. Hask^op does not have these properties, which means that the generalized notion may not make sense in Hask^op; and it can also mean that two notions which happened to be equivalent in Hask are distinct in general, and have different duals.
For an example of the latter, take lenses. In Hask they can be constructed in a number of ways; two ways are in terms of getter/setter pairs and as coalgebras for the costate comonad. The former generalizes to categories with products and the second to categories enriched in a particular way over Hask. If we apply the former construction to Hask^op then we get out prisms, but if we apply the latter construction to Hask^op then we get algebras for the state monad, and these are not the same thing.
A more familiar example might be comonads: starting from the Haskell-centric presentation
return :: a -> m a
(>>=) :: m a -> (a -> m b) -> m b
some insight seems to be needed to determine which arrows to reverse to obtain
extract :: w a -> a
extend :: w a -> (w a -> b) -> w b
The point is that it would have been much easier to start from join :: m (m a) -> m a instead of (>>=); but finding this alternative presentation (equivalent due to special features of Hask) is a creative process, not a mechanical one.
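For concreteness, a minimal sketch of that mechanical step (standard definitions, nothing specific to this answer):

import Control.Monad (join)

bindFromJoin :: Monad m => m a -> (a -> m b) -> m b
bindFromJoin m k = join (fmap k m)

-- Dually, with duplicate :: w a -> w (w a) as the mirror of join,
-- the flipped extend above is \w k -> fmap k (duplicate w).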
In a question like yours, and many others like it, where it is pretty clear what sense of dual is intended, there's still absolutely no reason to expect a priori that the dual construction will actually exist or have the same properties as the original, because Hask^op qualitatively behaves quite differently from Hask. A slogan might be
the theory of categories is self-dual, but the theory of any particular category is not!
Since you asked about the structure of the fixed points of Free, I'm going to sketch an informal argument that Free only has one fixed point which is a Functor, namely the type
newtype FixFree a = C (Free FixFree a)
that Reid Barton described. Indeed, I make a somewhat stronger claim. Let's start with a few pieces:
newtype Fix f a = Fix (f (Fix f) a)

instance Functor (f (Fix f)) => Functor (Fix f) where
  fmap f (Fix x) = Fix (fmap f x)

-- This is basically `MFunctor` from `Control.Monad.Morph`
class FFunctor (g :: (* -> *) -> * -> *) where
  hoistF :: Functor f => (forall a . f a -> f' a) -> g f b -> g f' b
Notably,
instance FFunctor Free where
  hoistF _f (Pure a) = Pure a
  hoistF f (Free fffa) = Free . f . fmap (hoistF f) $ fffa
Then
fToFixG :: (Functor f, FFunctor g) => (forall a . f a -> g f a) -> f a -> Fix g a
fToFixG fToG fa = Fix $ hoistF (fToFixG fToG) $ fToG fa

fixGToF :: forall f b (g :: (* -> *) -> * -> *) .
           (FFunctor g, Functor (g (Fix g)))
        => (forall a . g f a -> f a) -> Fix g b -> f b
fixGToF gToF (Fix ga) = gToF $ hoistF (fixGToF gToF) ga
If I'm not mistaken (which I could be), passing each side of an isomorphism between f and g f to each of these functions will yield each side of an isomorphism between f and Fix g. Substituting Free for g will demonstrate the claim. This argument is very hand-wavey, of course, because Haskell is inconsistent.

A little category theory [duplicate]

Who first said the following?
A monad is just a monoid in the
category of endofunctors, what's the
problem?
And on a less important note, is this true and if so could you give an explanation (hopefully one that can be understood by someone who doesn't have much Haskell experience)?
That particular phrasing is by James Iry, from his highly entertaining Brief, Incomplete and Mostly Wrong History of Programming Languages, in which he fictionally attributes it to Philip Wadler.
The original quote is from Saunders Mac Lane in Categories for the Working Mathematician, one of the foundational texts of Category Theory. Here it is in context, which is probably the best place to learn exactly what it means.
But, I'll take a stab. The original sentence is this:
All told, a monad in X is just a monoid in the category of endofunctors of X, with product × replaced by composition of endofunctors and unit set by the identity endofunctor.
X here is a category. Endofunctors are functors from a category to itself (which is usually all Functors as far as functional programmers are concerned, since they're mostly dealing with just one category; the category of types - but I digress). But you could imagine another category which is the category of "endofunctors on X". This is a category in which the objects are endofunctors and the morphisms are natural transformations.
And of those endofunctors, some of them might be monads. Which ones are monads? Exactly the ones which are monoidal in a particular sense. Instead of spelling out the exact mapping from monads to monoids (since Mac Lane does that far better than I could hope to), I'll just put their respective definitions side by side and let you compare:
A monoid is...
A set, S
An operation, • : S × S → S
An element of S, e : 1 → S
...satisfying these laws:
(a • b) • c = a • (b • c), for all a, b and c in S
e • a = a • e = a, for all a in S
A monad is...
An endofunctor, T : X → X (in Haskell, a type constructor of kind * -> * with a Functor instance)
A natural transformation, μ : T × T → T, where × means functor composition (μ is known as join in Haskell)
A natural transformation, η : I → T, where I is the identity endofunctor on X (η is known as return in Haskell)
...satisfying these laws:
μ ∘ Tμ = μ ∘ μT
μ ∘ Tη = μ ∘ ηT = 1 (the identity natural transformation)
With a bit of squinting you might be able to see that both of these definitions are instances of the same abstract concept.
First, the extensions and libraries that we're going to use:
{-# LANGUAGE RankNTypes, TypeOperators #-}
import Control.Monad (join)
Of these, RankNTypes is the only one that's absolutely essential to the below. I once wrote an explanation of RankNTypes that some people seem to have found useful, so I'll refer to that.
Quoting Tom Crockett's excellent answer, we have:
A monad is...
An endofunctor, T : X -> X
A natural transformation, μ : T × T -> T, where × means functor composition
A natural transformation, η : I -> T, where I is the identity endofunctor on X
...satisfying these laws:
μ(μ(T × T) × T) = μ(T × μ(T × T))
μ(η(T)) = T = μ(T(η))
How do we translate this to Haskell code? Well, let's start with the notion of a natural transformation:
-- | A natural transformation between two 'Functor' instances. Law:
--
-- > fmap f . eta g == eta g . fmap f
--
-- Neat fact: the type system actually guarantees this law.
--
newtype f :-> g =
  Natural { eta :: forall x. f x -> g x }
A type of the form f :-> g is analogous to a function type, but instead of thinking of it as a function between two types (of kind *), think of it as a morphism between two functors (each of kind * -> *). Examples:
listToMaybe :: [] :-> Maybe
listToMaybe = Natural go
  where go []    = Nothing
        go (x:_) = Just x

maybeToList :: Maybe :-> []
maybeToList = Natural go
  where go Nothing  = []
        go (Just x) = [x]

reverse' :: [] :-> []
reverse' = Natural reverse
Basically, in Haskell, natural transformations are functions from some type f x to another type g x such that the x type variable is "inaccessible" to the caller. So for example, sort :: Ord a => [a] -> [a] cannot be made into a natural transformation, because it's "picky" about which types we may instantiate for a. One intuitive way I often use to think of this is the following:
A functor is a way of operating on the content of something without touching the structure.
A natural transformation is a way of operating on the structure of something without touching or looking at the content.
Now, with that out of the way, let's tackle the clauses of the definition.
The first clause is "an endofunctor, T : X -> X." Well, every Functor in Haskell is an endofunctor in what people call "the Hask category," whose objects are Haskell types (of kind *) and whose morphisms are Haskell functions. This sounds like a complicated statement, but it's actually a very trivial one. All it means is that a Functor f :: * -> * gives you the means of constructing a type f a :: * for any a :: * and a function fmap f :: f a -> f b out of any f :: a -> b, and that these obey the functor laws.
Second clause: the Identity functor in Haskell (which comes with the Platform, so you can just import it) is defined this way:
newtype Identity a = Identity { runIdentity :: a }

instance Functor Identity where
  fmap f (Identity a) = Identity (f a)
So the natural transformation η : I -> T from Tom Crockett's definition can be written this way for any Monad instance t:
return' :: Monad t => Identity :-> t
return' = Natural (return . runIdentity)
Third clause: The composition of two functors in Haskell can be defined this way (which also comes with the Platform):
newtype Compose f g a = Compose { getCompose :: f (g a) }

-- | The composition of two 'Functor's is also a 'Functor'.
instance (Functor f, Functor g) => Functor (Compose f g) where
  fmap f (Compose fga) = Compose (fmap (fmap f) fga)
So the natural transformation μ : T × T -> T from Tom Crockett's definition can be written like this:
join' :: Monad t => Compose t t :-> t
join' = Natural (join . getCompose)
The statement that this is a monoid in the category of endofunctors then means that Compose (partially applied to just its first two parameters) is associative, and that Identity is its identity element. I.e., that the following isomorphisms hold:
Compose f (Compose g h) ~= Compose (Compose f g) h
Compose f Identity ~= f
Compose Identity g ~= g
These are very easy to prove because Compose and Identity are both defined as newtype, and the Haskell Reports define the semantics of newtype as an isomorphism between the type being defined and the type of the argument to the newtype's data constructor. So for example, let's prove Compose f Identity ~= f:
Compose f Identity a
  ~= f (Identity a)   -- newtype Compose f g a = Compose (f (g a))
  ~= f a              -- newtype Identity a = Identity a
Q.E.D.
The answers here do an excellent job in defining both monoids and monads, however, they still don't seem to answer the question:
And on a less important note, is this true and if so could you give an explanation (hopefully one that can be understood by someone who doesn't have much Haskell experience)?
The crux of the matter missing here is the different notion of "monoid" -- more precisely, its so-called categorification: the notion of a monoid in a monoidal category. Sadly, Mac Lane's book itself makes this very confusing:
All told, a monad in X is just a monoid in the category of endofunctors of X, with product × replaced by composition of endofunctors and unit set by the identity endofunctor.
Main confusion
Why is this confusing? Because it does not define what is "monoid in the category of endofunctors" of X. Instead, this sentence suggests taking a monoid inside the set of all endofunctors together with the functor composition as binary operation and the identity functor as a monoidal unit. Which works perfectly fine and turns into a monoid any subset of endofunctors that contains the identity functor and is closed under functor composition.
Yet this is not the correct interpretation, which the book fails to make clear at that stage. A Monad f is a fixed endofunctor, not a subset of endofunctors closed under composition. A common construction is to use f to generate a monoid by taking the set of all k-fold compositions f^k = f(f(...)) of f with itself, including k=0 that corresponds to the identity f^0 = id. And now the set S of all these powers for all k>=0 is indeed a monoid "with product × replaced by composition of endofunctors and unit set by the identity endofunctor".
And yet:
This monoid S can be defined for any functor f or even literally for any self-map of X. It is the monoid generated by f.
The monoidal structure of S given by the functor composition and the identity functor has nothing to do with f being or not being a monad.
And to make things more confusing, the definition of "monoid in monoidal category" comes later in the book as you can see from the table of contents. And yet understanding this notion is absolutely critical to understanding the connection with monads.
(Strict) monoidal categories
Going to Chapter VII on Monoids (which comes later than Chapter VI on Monads), we find the definition of the so-called strict monoidal category as a triple (B, *, e), where B is a category, * : B x B -> B a bifunctor (a functor in each component with the other component fixed) and e is a unit object in B, satisfying the associativity and unit laws:
(a * b) * c = a * (b * c)
a * e = e * a = a
for any objects a,b,c of B, and the same identities for any morphisms a,b,c with e replaced by id_e, the identity morphism of e. It is now instructive to observe that in our case of interest, where B is the category of endofunctors of X with natural transformations as morphisms, * the functor composition and e the identity functor, all these laws are satisfied, as can be directly verified.
What comes after in the book is the definition of the "relaxed" monoidal category, where the laws only hold modulo some fixed natural transformations satisfying so-called coherence relations, which is however not important for our cases of the endofunctor categories.
Monoids in monoidal categories
Finally, in section 3 "Monoids" of Chapter VII, the actual definition is given:
A monoid c in a monoidal category (B, *, e) is an object of B with two arrows (morphisms)
mu: c * c -> c
nu: e -> c
making 3 diagrams commutative. Recall that in our case, these are morphisms in the category of endofunctors, which are natural transformations corresponding to precisely join and return for a monad. The connection becomes even clearer when we make the composition * more explicit, replacing c * c by c^2, where c is our monad.
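In Haskell, reusing the (:->), Compose and Identity types from the earlier answer, the two arrows of this monoid are exactly (a sketch):

mu :: Monad c => Compose c c :-> c   -- join
mu = Natural (join . getCompose)

nu :: Monad c => Identity :-> c      -- return
nu = Natural (return . runIdentity)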
Finally, notice that the 3 commutative diagrams (in the definition of a monoid in monoidal category) are written for general (non-strict) monoidal categories, while in our case all natural transformations arising as part of the monoidal category are actually identities. That will make the diagrams exactly the same as the ones in the definition of a monad, making the correspondence complete.
Conclusion
In summary, any monad is by definition an endofunctor, hence an object in the category of endofunctors, where the monadic join and return operators satisfy the definition of a monoid in that particular (strict) monoidal category. Vice versa, any monoid in the monoidal category of endofunctors is by definition a triple (c, mu, nu) consisting of an object and two arrows, e.g. natural transformations in our case, satisfying the same laws as a monad.
Finally, note the key difference between (classical) monoids and the more general monoids in monoidal categories. The two arrows mu and nu above are no longer a binary operation and a unit in a set. Instead, you have one fixed endofunctor c. The functor composition * and the identity functor alone do not provide the complete structure needed for the monad, despite that confusing remark in the book.
Another approach would be to compare with the standard monoid C of all self-maps of a set A, where the binary operation is composition, which can be seen to map the standard cartesian product C x C into C. Passing to the categorified monoid, we replace the cartesian product x with the functor composition *, and the binary operation gets replaced with the natural transformation mu from c * c to c, that is, a collection of the join operators
join: c(c(T)) -> c(T)
for every object T (type in programming). And the identity elements in classical monoids, which can be identified with images of maps from a fixed one-point set, get replaced with the collection of the return operators
return: T -> c(T)
But now there are no more cartesian products, so no pairs of elements and thus no binary operations.
I came to this post by way of trying to better understand the inference behind the infamous quote from Mac Lane's Categories for the Working Mathematician.
In describing what something is, it's often equally useful to describe what it's not.
Because Mac Lane uses the description to describe a Monad, one might infer that it describes something unique to monads. Bear with me. To develop a broader understanding of the statement, I believe it needs to be made clear that he is not describing something unique to monads; the statement equally describes Applicative and Arrows, among others. For the same reason we can have two monoids on Int (Sum and Product), we can have several monoids on X in the category of endofunctors. But there is even more to the similarities.
Both Monad and Applicative meet the criteria:
endo => any arrow, or morphism that starts and ends in the same place
functor => any arrow, or morphism between two Categories (e.g., in day to day Tree a -> List b, but in Category Tree -> List)
monoid => single object; i.e., a single type, but in this context, only in regards to the external layer; so, we can't have Tree -> List, only List -> List.
The statement uses "Category of..." This defines the scope of the statement. As an example, the Functor Category describes the scope of f * -> g *, i.e., Any functor -> Any functor, e.g., Tree * -> List * or Tree * -> Tree *.
What a Categorical statement does not specify describes where anything and everything is permitted.
In this case, inside the functors, * -> * aka a -> b is not specified which means Anything -> Anything including Anything else. As my imagination jumps to Int -> String, it also includes Integer -> Maybe Int, or even Maybe Double -> Either String Int where a :: Maybe Double; b :: Either String Int.
So the statement comes together as follows:
functor scope :: f a -> g b (i.e., any parameterized type to any parameterized type)
endo + functor :: f a -> f b (i.e., any one parameterized type to the same parameterized type) ... said differently,
a monoid in the category of endofunctors
So, where is the power of this construct? To appreciate the full dynamics, I needed to see that the typical drawings of a monoid (a single object with what looks like an identity arrow, :: single object -> single object) fail to illustrate that I'm permitted to use an arrow parameterized with any number of monoid values, from the one type object permitted in Monoid. The endo, ~ identity arrow definition of equivalence ignores the functor's type value and both the type and value of the innermost, "payload" layer. Thus, equivalence returns true in any situation where the functorial types match (e.g., Nothing -> Just * -> Nothing is equivalent to Just * -> Just * -> Just * because they are both Maybe -> Maybe -> Maybe).
Sidebar: ~ outside is conceptual, but is the left most symbol in f a. It also describes what "Haskell" reads-in first (big picture); so Type is "outside" in relation to a Type Value. The relationship between layers (a chain of references) in programming is not easy to relate in Category. The Category of Set is used to describe Types (Int, Strings, Maybe Int etc.) which includes the Category of Functor (parameterized Types). The reference chain: Functor Type, Functor values (elements of that Functor's set, e.g., Nothing, Just), and in turn, everything else each functor value points to. In Category the relationship is described differently, e.g., return :: a -> m a is considered a natural transformation from one Functor to another Functor, different from anything mentioned thus far.
Back to the main thread, all in all, for any defined tensor product and a neutral value, the statement ends up describing an amazingly powerful computational construct born from its paradoxical structure:
on the outside it appears as a single object (e.g., :: List); static
but inside, permits a lot of dynamics
any number of values of the same type (e.g., Empty | ~NonEmpty) as fodder to functions of any arity. The tensor product will reduce any number of inputs to a single value... for the external layer (~fold that says nothing about the payload)
infinite range of both the type and values for the inner most layer
In Haskell, clarifying the applicability of the statement is important. The power and versatility of this construct, has absolutely nothing to do with a monad per se. In other words, the construct does not rely on what makes a monad unique.
When trying to figure out whether to build code with a shared context to support computations that depend on each other, versus computations that can be run in parallel, this infamous statement, with as much as it describes, is not a contrast between the choice of Applicative, Arrows and Monads, but rather is a description of how much they are the same. For the decision at hand, the statement is moot.
This is often misunderstood. The statement goes on to describe join :: m (m a) -> m a as the tensor product for the monoidal endofunctor. However, it does not articulate how, in the context of this statement, (<*>) could also have been chosen. It truly is a case of 'six of one, half a dozen of the other'. The logic for combining values is exactly alike; the same input generates the same output from each (unlike the Sum and Product monoids for Int, because they generate different results when combining Ints).
So, to recap: A monoid in the category of endofunctors describes:
~t :: m * -> m * -> m *
and a neutral value for m *
(<*>) and (>>=) both provide simultaneous access to the two m values in order to compute the single return value. The logic used to compute the return value is exactly the same. If it were not for the different shapes of the functions they parameterize (f :: a -> b versus k :: a -> m b) and the position of the parameter with the same return type of the computation (i.e., a -> b -> b versus b -> a -> b for each respectively), I suspect we could have parameterized the monoidal logic, the tensor product, for reuse in both definitions. As an exercise to make the point, try and implement ~t, and you end up with (<*>) and (>>=) depending on how you decide to define it forall a b.
If my last point is at minimum conceptually true, it then explains the precise, and only computational difference between Applicative and Monad: the functions they parameterize. In other words, the difference is external to the implementation of these type classes.
In conclusion, in my own experience, Mac Lane's infamous quote provided a great "goto" meme, a guidepost for me to reference while navigating my way through Category to better understand the idioms used in Haskell. It succeeds at capturing the scope of a powerful computing capacity made wonderfully accessible in Haskell.
However, there is irony in how I first misunderstood the statement's applicability outside of the monad, and what I hope conveyed here. Everything that it describes turns out to be what is similar between Applicative and Monads (and Arrows among others). What it doesn't say is precisely the small but useful distinction between them.
Note: No, this isn't true. At some point there was a comment on this answer from Dan Piponi himself saying that the cause and effect here was exactly the opposite, that he wrote his article in response to James Iry's quip. But it seems to have been removed, perhaps by some compulsive tidier.
Below is my original answer.
It's quite possible that Iry had read From Monoids to Monads, a post in which Dan Piponi (sigfpe) derives monads from monoids in Haskell, with much discussion of category theory and explicit mention of "the category of endofunctors on Hask". In any case, anyone who wonders what it means for a monad to be a monoid in the category of endofunctors might benefit from reading this derivation.

Monads as adjunctions

I've been reading about monads in category theory. One definition of monads uses a pair of adjoint functors. A monad is defined by a round-trip using those functors. Apparently adjunctions are very important in category theory, but I haven't seen any explanation of Haskell monads in terms of adjoint functors. Has anyone given it a thought?
Edit: Just for fun, I'm going to do this right. Original answer preserved below
The current adjunction code for category-extras now is in the adjunctions package: http://hackage.haskell.org/package/adjunctions
I'm just going to work through the state monad explicitly and simply. This code uses Data.Functor.Compose from the transformers package, but is otherwise self-contained.
An adjunction between f (D -> C) and g (C -> D), written f -| g, can be characterized in a number of ways. We'll use the counit/unit (epsilon/eta) description, which gives two natural transformations (morphisms between functors).
class (Functor f, Functor g) => Adjoint f g where
  counit :: f (g a) -> a
  unit   :: a -> g (f a)
Note that the "a" in counit is really the identity functor in C, and the "a" in unit is really the identity functor in D.
We can also recover the hom-set adjunction definition from the counit/unit definition.
phiLeft :: Adjoint f g => (f a -> b) -> (a -> g b)
phiLeft f = fmap f . unit
phiRight :: Adjoint f g => (a -> g b) -> (f a -> b)
phiRight f = counit . fmap f
In any case, we can now define a Monad from our unit/counit adjunction like so:
instance Adjoint f g => Monad (Compose g f) where
  return x = Compose $ unit x
  x >>= f  = Compose . fmap counit . getCompose $ fmap (getCompose . f) x
Now we can implement the classic adjunction between (a,) and (a ->):
instance Adjoint ((,) a) ((->) a) where
  -- counit :: (a, a -> b) -> b
  counit (x, f) = f x
  -- unit :: b -> (a -> (a, b))
  unit x = \y -> (y, x)
And now a type synonym
type State s = Compose ((->) s) ((,) s)
And if we load this up in ghci, we can confirm that State is precisely our classic state monad. Note that we can take the opposite composition and get the Costate Comonad (aka the store comonad).
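For example, here is a small usage sketch (my own helper names; modern GHC also wants the Functor and Applicative boilerplate for Compose g f):

get' :: State s s
get' = Compose $ \s -> (s, s)

put' :: s -> State s ()
put' s = Compose $ \_ -> (s, ())

runState' :: State s a -> s -> (s, a)   -- note: (s, a), the classic (a, s) flipped
runState' = getCompose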
There are a bunch of other adjunctions we can make into monads in this fashion (such as (Bool,) Pair), but they're sort of strange monads. Unfortunately we can't do the adjunctions that induce Reader and Writer directly in Haskell in a pleasant way. We can do Cont, but as copumpkin describes, that requires an adjunction from an opposite category, so it actually uses a different "form" of the "Adjoint" typeclass that reverses some arrows. That form is also implemented in a different module in the adjunctions package.
This material is covered in a different way by Derek Elkins' article in The Monad Reader 13, Calculating Monads with Category Theory: http://www.haskell.org/wikiupload/8/85/TMR-Issue13.pdf
Also, Hinze's recent Kan Extensions for Program Optimization paper walks through the construction of the list monad from the adjunction between Mon and Set: http://www.cs.ox.ac.uk/ralf.hinze/Kan.pdf
Old answer:
Two references.
1) Category-extras delivers, as always, with a representation of adjunctions and how monads arise from them. As usual, it's good to think with, but pretty light on documentation: http://hackage.haskell.org/packages/archive/category-extras/0.53.5/doc/html/Control-Functor-Adjunction.html
2) Haskell-Cafe also delivers with a promising but brief discussion on the role of adjunctions. Some of it may help in interpreting category-extras: http://www.haskell.org/pipermail/haskell-cafe/2007-December/036328.html
Derek Elkins was showing me recently over dinner how the Cont Monad arises from composing the (_ -> k) contravariant functor with itself, since it happens to be self-adjoint. That's how you get (a -> k) -> k out of it. Its counit, however, leads to double negation elimination, which can't be written in Haskell.
For some Agda code that illustrates and proves this, please see http://hpaste.org/68257.
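A sketch of just the shape involved (my own names; not the full contravariant-adjunction machinery):

unitCont :: a -> ((a -> k) -> k)
unitCont a f = f a
-- The counit would need ((a -> k) -> k) -> a, i.e. double negation
-- elimination, which, as noted above, can't be written in Haskell.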
This is an old thread, but I found the question interesting, so I did some calculations myself. Hopefully Bartosz is still around and might read this.

In fact, the Eilenberg-Moore construction does give a very clear picture in this case. (I will use CWM notation with Haskell-like syntax.)

Let T be the list monad < T, eta, mu > (eta = return and mu = concat) and consider a T-algebra h : T a -> a. (Note that T a = [a] is a free monoid < [a], [], (++) >, that is, identity [] and multiplication (++).) By definition, h must satisfy h . T h == h . mu a and h . eta a == id.

Now, some easy diagram chasing proves that h actually induces a monoid structure on a (defined by x * y = h [x, y]), and that h becomes a monoid homomorphism for this structure. Conversely, any monoid structure < a, a0, * > defined in Haskell is naturally defined as a T-algebra, namely h = foldr (*) a0, the function that 'replaces' (:) with (*) and maps [] to a0, the identity.

So, in this case, the category of T-algebras is just the category of monoid structures definable in Haskell, HaskMon. (Please check that the morphisms of T-algebras are actually monoid homomorphisms.)
It also characterizes lists as universal objects in HaskMon, just like free products in Grp, polynomial rings in CRng, etc.
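A sketch of that correspondence in Haskell code (h here is just mconcat):

alg :: Monoid a => [a] -> a
alg = foldr (<>) mempty
-- alg . return == id              (alg [x] == x <> mempty == x)
-- alg . fmap alg == alg . concat  (generalized associativity)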
The adjunction corresponding to the above construction is < F, G, eta, epsilon > where
F : Hask -> HaskMon, which takes a type a to the 'free monoid generated by a', that is, [a],
G : HaskMon -> Hask, the forgetful functor (forget the multiplication),
eta : 1 -> GF, the natural transformation defined by \x :: a -> [x],
epsilon : FG -> 1, the natural transformation defined by the folding function above (the 'canonical surjection' from a free monoid to its quotient monoid).
Next, there is another construction, the 'Kleisli category', and the corresponding adjunction. You can check that it is just the category of Haskell types with morphisms a -> T b, where compositions are given by the so-called 'Kleisli composition' (>=>). A typical Haskell programmer will find this category more familiar.
Finally, as is illustrated in CWM, the category of T-algebras (resp. the Kleisli category) becomes the terminal (resp. initial) object in the category of adjunctions that define the list monad T, in a suitable sense.
I suggest doing a similar calculation for the binary tree functor T a = L a | B (T a) (T a) to check your understanding.
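For reference, a sketch of that exercise's functor together with its monad instance (substitution at the leaves; by the same argument as above, its T-algebras should correspond to types with an arbitrary binary operation, i.e. magmas):

data Tree a = L a | B (Tree a) (Tree a)

instance Functor Tree where
  fmap f (L a)   = L (f a)
  fmap f (B l r) = B (fmap f l) (fmap f r)

instance Applicative Tree where
  pure = L
  L f   <*> t = fmap f t
  B l r <*> t = B (l <*> t) (r <*> t)

instance Monad Tree where
  L a   >>= f = f a
  B l r >>= f = B (l >>= f) (r >>= f)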
I've found a standard construction of adjoint functors for any monad by Eilenberg-Moore, but I'm not sure it adds any insight to the problem. The second category in the construction is a category of T-algebras. A T-algebra adds a "product" to the initial category.
So how would it work for a list monad? The functor in the list monad consists of a type constructor, e.g., Int -> [Int], and a mapping of functions (e.g., the standard application of map to lists). An algebra adds a mapping from lists to elements. One example would be adding (or multiplying) all the elements of a list of integers. The functor F takes any type, e.g., Int, and maps it into the algebra defined on lists of Int, where the product is defined by monadic join (or vice versa, join is defined as the product). The forgetful functor G takes an algebra and forgets the product. The pair F, G of adjoint functors is then used to construct the monad in the usual way.
I must say I'm none the wiser.
If you are interested, here are some thoughts of a non-expert on the role of monads and adjunctions in programming languages:

First of all, there exists, for a given monad T, a unique adjunction to the Kleisli category of T. In Haskell, the use of monads is primarily confined to operations in this category (which is essentially a category of free algebras, no quotients). In fact, all one can do with a Haskell Monad is to compose some Kleisli morphisms of type a -> T b through the use of do expressions, (>>=), etc., to create a new morphism. In this context, the role of monads is restricted to just the economy of notation. One exploits associativity of morphisms to be able to write (say) [0,1,2] instead of (Cons 0 (Cons 1 (Cons 2 Nil))), that is, you can write a sequence as a sequence, not as a tree. Even the use of IO monads is non-essential, for the current Haskell type system is powerful enough to realize data encapsulation (existential types).

This is my answer to your original question, but I'm curious what Haskell experts have to say about this.
On the other hand, as we have noted, there's also a 1-1 correspondence between monads and adjunctions to (T-)algebras. Adjoints, in MacLane's terms, are 'a way to express equivalences of categories.' In a typical setting of adjunctions <F,G> : X -> A, where F is some sort of 'free algebra generator' and G a 'forgetful functor', the corresponding monad will (through the use of T-algebras) describe how (and when) the algebraic structure of A is constructed on the objects of X.

In the case of Hask and the list monad T, the structure which T introduces is that of a monoid, and this can help us establish properties (including the correctness) of code through the algebraic methods that the theory of monoids provides. For example, the function foldr (*) e :: [a] -> a can readily be seen as an associative operation as long as <a,(*),e> is a monoid, a fact which could be exploited by the compiler to optimize the computation (e.g. by parallelism). Another application is to identify and classify 'recursion patterns' in functional programming using categorical methods, in the hope to (partially) dispose of 'the goto of functional programming', Y (the arbitrary recursion combinator).

Apparently, this kind of application is one of the primary motivations of the creators of Category Theory (MacLane, Eilenberg, etc.), namely, to establish natural equivalences of categories, and to transfer a well-known method in one category to another (e.g. homological methods to topological spaces, algebraic methods to programming, etc.). Here, adjoints and monads are indispensable tools to exploit this connection of categories. (Incidentally, the notion of monads (and its dual, comonads) is so general that one can even go so far as to define 'cohomologies' of Haskell types. But I have not given it a thought yet.)
As for the non-deterministic functions you mentioned, I have much less to say... But note that, if an adjunction <F,G> : Hask -> A for some category A defines the list monad T, there must be a unique 'comparison functor' K : A -> MonHask (the category of monoids definable in Haskell); see CWM. This means, in effect, that your category of interest must be a category of monoids in some restricted form (e.g. it may lack some quotients but not free algebras) in order to define the list monad.
Finally, some remarks:

The binary tree functor I mentioned in my last posting easily generalizes to an arbitrary data type T a1 .. an = T1 T11 .. T1m | .... Namely, any data type in Haskell naturally defines a monad (together with the corresponding category of algebras and the Kleisli category), which is just the result of any data constructor in Haskell being total. This is another reason why I consider Haskell's Monad class not much more than syntactic sugar (which is pretty important in practice, of course).

A monad is just a monoid in the category of endofunctors, what's the problem?

Who first said the following?
A monad is just a monoid in the
category of endofunctors, what's the
problem?
And on a less important note, is this true and if so could you give an explanation (hopefully one that can be understood by someone who doesn't have much Haskell experience)?
That particular phrasing is by James Iry, from his highly entertaining Brief, Incomplete and Mostly Wrong History of Programming Languages, in which he fictionally attributes it to Philip Wadler.
The original quote is from Saunders Mac Lane in Categories for the Working Mathematician, one of the foundational texts of Category Theory. Here it is in context, which is probably the best place to learn exactly what it means.
But, I'll take a stab. The original sentence is this:
All told, a monad in X is just a monoid in the category of endofunctors of X, with product × replaced by composition of endofunctors and unit set by the identity endofunctor.
X here is a category. Endofunctors are functors from a category to itself (which is usually all Functors as far as functional programmers are concerned, since they're mostly dealing with just one category; the category of types - but I digress). But you could imagine another category which is the category of "endofunctors on X". This is a category in which the objects are endofunctors and the morphisms are natural transformations.
And of those endofunctors, some of them might be monads. Which ones are monads? Exactly the ones which are monoidal in a particular sense. Instead of spelling out the exact mapping from monads to monoids (since Mac Lane does that far better than I could hope to), I'll just put their respective definitions side by side and let you compare:
A monoid is...
A set, S
An operation, • : S × S → S
An element of S, e : 1 → S
...satisfying these laws:
(a • b) • c = a • (b • c), for all a, b and c in S
e • a = a • e = a, for all a in S
A monad is...
An endofunctor, T : X → X (in Haskell, a type constructor of kind * -> * with a Functor instance)
A natural transformation, μ : T × T → T, where × means functor composition (μ is known as join in Haskell)
A natural transformation, η : I → T, where I is the identity endofunctor on X (η is known as return in Haskell)
...satisfying these laws:
μ ∘ Tμ = μ ∘ μT
μ ∘ Tη = μ ∘ ηT = 1 (the identity natural transformation)
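For Haskell readers (my own gloss, not part of the quoted definition), these laws are often written in terms of join and return, and can be spot-checked on a known lawful Monad such as lists:

import Control.Monad (join)

-- Associativity: join . fmap join == join . join
-- Unit laws:     join . return == id == join . fmap return
lawsHoldForLists :: Bool
lawsHoldForLists =
    (join . fmap join) xs == (join . join) xs
        && (join . return) ys == ys
        && (join . fmap return) ys == ys
  where
    xs = [[[1,2],[3]],[[4]]] :: [[[Int]]]
    ys = [1,2,3] :: [Int]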
With a bit of squinting you might be able to see that both of these definitions are instances of the same abstract concept.
First, the extensions and libraries that we're going to use:
{-# LANGUAGE RankNTypes, TypeOperators #-}
import Control.Monad (join)
Of these, RankNTypes is the only one that's absolutely essential to the below. I once wrote an explanation of RankNTypes that some people seem to have found useful, so I'll refer to that.
Quoting Tom Crockett's excellent answer, we have:
A monad is...
An endofunctor, T : X -> X
A natural transformation, μ : T × T -> T, where × means functor composition
A natural transformation, η : I -> T, where I is the identity endofunctor on X
...satisfying these laws:
μ(μ(T × T) × T) = μ(T × μ(T × T))
μ(η(T)) = T = μ(T(η))
How do we translate this to Haskell code? Well, let's start with the notion of a natural transformation:
-- | A natural transformation between two 'Functor' instances. Law:
--
-- > fmap f . eta g == eta g . fmap f
--
-- Neat fact: the type system actually guarantees this law.
--
newtype f :-> g =
    Natural { eta :: forall x. f x -> g x }
A type of the form f :-> g is analogous to a function type, but instead of thinking of it as a function between two types (of kind *), think of it as a morphism between two functors (each of kind * -> *). Examples:
listToMaybe :: [] :-> Maybe
listToMaybe = Natural go
    where go []    = Nothing
          go (x:_) = Just x

maybeToList :: Maybe :-> []
maybeToList = Natural go
    where go Nothing  = []
          go (Just x) = [x]

reverse' :: [] :-> []
reverse' = Natural reverse
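For instance (my own usage sketch), unwrapping with eta applies these transformations:

demo :: (Maybe Int, [Int], [Int])
demo = ( eta listToMaybe [1, 2, 3]    -- Just 1
       , eta maybeToList (Just 5)     -- [5]
       , eta reverse' [1, 2, 3]       -- [3, 2, 1]
       )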
Basically, in Haskell, natural transformations are functions from some type f x to another type g x such that the x type variable is "inaccessible" to the caller. So for example, sort :: Ord a => [a] -> [a] cannot be made into a natural transformation, because it's "picky" about which types we may instantiate for a. One intuitive way I often use to think of this is the following:
A functor is a way of operating on the content of something without touching the structure.
A natural transformation is a way of operating on the structure of something without touching or looking at the content.
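A quick check of the naturality law for reverse' (my own example): transforming the structure first and mapping afterwards agrees with mapping first and transforming afterwards, for any choice of function:

naturalityCheck :: Bool
naturalityCheck =
    (fmap show . eta reverse') [1, 2, 3 :: Int]
        == (eta reverse' . fmap show) [1, 2, 3 :: Int]   -- True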
Now, with that out of the way, let's tackle the clauses of the definition.
The first clause is "an endofunctor, T : X -> X." Well, every Functor in Haskell is an endofunctor in what people call "the Hask category," whose objects are Haskell types (of kind *) and whose morphisms are Haskell functions. This sounds like a complicated statement, but it's actually a very trivial one. All it means is that a Functor f :: * -> * gives you the means of constructing a type f a :: * for any a :: * and a function fmap g :: f a -> f b out of any g :: a -> b, and that these obey the functor laws.
Second clause: the Identity functor in Haskell (nowadays available from Data.Functor.Identity in base, so you can just import it) is defined this way:
newtype Identity a = Identity { runIdentity :: a }

instance Functor Identity where
    fmap f (Identity a) = Identity (f a)
So the natural transformation η : I -> T from Tom Crockett's definition can be written this way for any Monad instance t:
return' :: Monad t => Identity :-> t
return' = Natural (return . runIdentity)
Third clause: The composition of two functors in Haskell can be defined this way (it also ships with base, as Data.Functor.Compose):
newtype Compose f g a = Compose { getCompose :: f (g a) }

-- | The composition of two 'Functor's is also a 'Functor'.
instance (Functor f, Functor g) => Functor (Compose f g) where
    fmap f (Compose fga) = Compose (fmap (fmap f) fga)
So the natural transformation μ : T × T -> T from Tom Crockett's definition can be written like this:
join' :: Monad t => Compose t t :-> t
join' = Natural (join . getCompose)
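A small usage sketch (mine), picking t = Maybe and t = [] respectively:

nested :: Maybe Int
nested = eta join' (Compose (Just (Just 3)))   -- Just 3

lifted :: [Int]
lifted = eta return' (Identity 7)              -- [7]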
The statement that this is a monoid in the category of endofunctors then means that Compose (partially applied to just its first two parameters) is associative, and that Identity is its identity element. I.e., that the following isomorphisms hold:
Compose f (Compose g h) ~= Compose (Compose f g) h
Compose f Identity ~= f
Compose Identity g ~= g
These are very easy to prove because Compose and Identity are both defined as newtype, and the Haskell Reports define the semantics of newtype as an isomorphism between the type being defined and the type of the argument to the newtype's data constructor. So for example, let's prove Compose f Identity ~= f:
Compose f Identity a
    ~= f (Identity a)   -- newtype Compose f g a = Compose (f (g a))
    ~= f a              -- newtype Identity a = Identity a
Q.E.D.
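The proof can also be witnessed by a pair of mutually inverse functions (my own sketch, with hypothetical names):

toF :: Functor f => Compose f Identity a -> f a
toF = fmap runIdentity . getCompose

fromF :: Functor f => f a -> Compose f Identity a
fromF = Compose . fmap Identity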
The answers here do an excellent job of defining both monoids and monads; however, they still don't seem to answer the question:
And on a less important note, is this true and if so could you give an explanation (hopefully one that can be understood by someone who doesn't have much Haskell experience)?
The crux of the matter missing here is a different notion of "monoid" -- more precisely, its so-called categorification: the notion of a monoid in a monoidal category. Sadly, Mac Lane's book itself makes it very confusing:
All told, a monad in X is just a monoid in the category of endofunctors of X, with product × replaced by composition of endofunctors and unit set by the identity endofunctor.
Main confusion
Why is this confusing? Because it does not define what a "monoid in the category of endofunctors" of X is. Instead, this sentence suggests taking a monoid inside the set of all endofunctors, with functor composition as the binary operation and the identity functor as the monoidal unit. That works perfectly well and turns any subset of endofunctors that contains the identity functor and is closed under functor composition into a monoid.
Yet this is not the correct interpretation, which the book fails to make clear at that stage. A Monad f is a fixed endofunctor, not a subset of endofunctors closed under composition. A common construction is to use f to generate a monoid by taking the set of all k-fold compositions f^k = f(f(...)) of f with itself, including k = 0, which corresponds to the identity f^0 = id. And now the set S of all these powers for all k >= 0 is indeed a monoid "with product × replaced by composition of endofunctors and unit set by the identity endofunctor".
And yet:
This monoid S can be defined for any functor f, or even literally for any self-map of X. It is the monoid generated by f.
The monoidal structure of S, given by functor composition and the identity functor, has nothing to do with f being or not being a monad (see the value-level sketch below).
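A value-level analogue in Haskell (my own illustration, with hypothetical names): the monoid generated by a single self-map under composition is just the monoid of exponents, and its existence requires nothing of the map being iterated:

newtype Power = Power Int deriving (Eq, Show)

instance Semigroup Power where
    Power j <> Power k = Power (j + k)   -- f^j . f^k = f^(j+k)

instance Monoid Power where
    mempty = Power 0                     -- f^0 = id

-- Interpret an exponent as the actual k-fold composition of any self-map.
apply :: (a -> a) -> Power -> (a -> a)
apply f (Power k) = foldr (.) id (replicate k f)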
And to make things more confusing, the definition of "monoid in a monoidal category" comes later in the book, as you can see from the table of contents. And yet understanding this notion is absolutely critical to understanding the connection with monads.
(Strict) monoidal categories
Going to Chapter VII on Monoids (which comes later than Chapter VI on Monads), we find the definition of the so-called strict monoidal category as a triple (B, *, e), where B is a category, * : B x B -> B a bifunctor (a functor in each component with the other component fixed), and e a unit object in B, satisfying the associativity and unit laws:
(a * b) * c = a * (b * c)
a * e = e * a = a
for any objects a, b, c of B, and the same identities for any morphisms a, b, c with e replaced by id_e, the identity morphism of e. It is now instructive to observe that in our case of interest, where B is the category of endofunctors of X with natural transformations as morphisms, * is functor composition, and e is the identity functor, all these laws are satisfied, as can be directly verified.
What comes next in the book is the definition of the "relaxed" monoidal category, where the laws hold only modulo some fixed natural transformations satisfying so-called coherence conditions; this is, however, not important for our case of endofunctor categories.
Monoids in monoidal categories
Finally, in section 3 "Monoids" of Chapter VII, the actual definition is given:
A monoid c in a monoidal category (B, *, e) is an object of B with two arrows (morphisms)
mu: c * c -> c
nu: e -> c
making three diagrams commute. Recall that in our case, these are morphisms in the category of endofunctors, i.e. natural transformations, corresponding precisely to join and return for a monad. The connection becomes even clearer when we make the composition * more explicit, replacing c * c by c^2, where c is our monad.
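Tying this back to the Haskell encoding given earlier in this thread (my own cross-reference):

-- In the earlier Haskell encoding, these two arrows are exactly
--   mu  ~ join'   :: Monad t => Compose t t :-> t
--   nu  ~ return' :: Monad t => Identity :-> t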
Finally, notice that the three commutative diagrams (in the definition of a monoid in a monoidal category) are written for general (non-strict) monoidal categories, while in our case all natural transformations arising as part of the monoidal structure are actually identities. That makes the diagrams exactly the same as the ones in the definition of a monad, completing the correspondence.
Conclusion
In summary, any monad is by definition an endofunctor, hence an object in the category of endofunctors, whose monadic join and return operators satisfy the definition of a monoid in that particular (strict) monoidal category. Vice versa, any monoid in the monoidal category of endofunctors is by definition a triple (c, mu, nu) consisting of an object and two arrows, i.e. natural transformations in our case, satisfying the same laws as a monad.
Finally, note the key difference between (classical) monoids and the more general monoids in monoidal categories. The two arrows mu and nu above are no longer a binary operation and a unit in a set. Instead, you have one fixed endofunctor c. Functor composition * and the identity functor alone do not provide the complete structure needed for the monad, despite that confusing remark in the book.
Another approach is to compare with the standard monoid C of all self-maps of a set A, where the binary operation is composition, which can be seen as mapping the standard cartesian product C x C into C. Passing to the categorified monoid, we replace the cartesian product x with the functor composition *, and the binary operation gets replaced with the natural transformation mu from c * c to c, that is, a collection of join operators
join : c(c(T)) -> c(T)
for every object T (a type, in programming). And the identity elements of classical monoids, which can be identified with images of maps from a fixed one-point set, get replaced with the collection of return operators
return : T -> c(T)
But now there are no more cartesian products, so no pairs of elements and thus no binary operations.
I came to this post by way of trying to better understand the inference behind the infamous quote from Mac Lane's Categories for the Working Mathematician.
In describing what something is, it's often equally useful to describe what it's not.
Because Mac Lane uses the description to describe a Monad, one might infer that it describes something unique to monads. Bear with me. To develop a broader understanding of the statement, I believe it needs to be made clear that he is not describing something unique to monads; the statement equally describes Applicative and Arrow, among others. For the same reason that we can have two monoids on Int (Sum and Product), we can have several monoids on X in the category of endofunctors. But there is even more to the similarities.
Both Monad and Applicative meet the criteria:
endo => any arrow, or morphism, that starts and ends in the same place
functor => any arrow, or morphism, between two categories (e.g., in day-to-day Haskell, Tree a -> List b, but in Category terms, Tree -> List)
monoid => a single object; i.e., a single type, but in this context, only in regard to the external layer; so we can't have Tree -> List, only List -> List.
The statement uses "Category of..." This defines the scope of the statement. As an example, the Functor Category describes the scope of f * -> g *, i.e., Any functor -> Any functor, e.g., Tree * -> List * or Tree * -> Tree *.
What a Categorical statement does not specify describes where anything and everything is permitted.
In this case, inside the functors, * -> * aka a -> b is not specified which means Anything -> Anything including Anything else. As my imagination jumps to Int -> String, it also includes Integer -> Maybe Int, or even Maybe Double -> Either String Int where a :: Maybe Double; b :: Either String Int.
So the statement comes together as follows:
functor scope :: f a -> g b (i.e., any parameterized type to any parameterized type)
endo + functor :: f a -> f b (i.e., any one parameterized type to the same parameterized type) ... said differently,
a monoid in the category of endofunctors
So, where is the power of this construct? To appreciate the full dynamics, I needed to see that the typical drawings of a monoid (a single object with what looks like an identity arrow, :: single object -> single object) fail to illustrate that I'm permitted to use an arrow parameterized with any number of monoid values, from the one type object permitted in Monoid. The endo, ~ identity arrow definition of equivalence ignores the functor's type value and both the type and value of the innermost, "payload" layer. Thus, equivalence returns true in any situation where the functorial types match (e.g., Nothing -> Just * -> Nothing is equivalent to Just * -> Just * -> Just * because they are both Maybe -> Maybe -> Maybe).
Sidebar: ~ outside is conceptual, but is the leftmost symbol in f a. It also describes what "Haskell" reads in first (big picture); so Type is "outside" in relation to a Type Value. The relationship between layers (a chain of references) in programming is not easy to relate in Category. The Category of Set is used to describe Types (Int, String, Maybe Int, etc.), which includes the Category of Functor (parameterized Types). The reference chain: Functor Type, Functor values (elements of that Functor's set, e.g., Nothing, Just), and in turn, everything else each functor value points to. In Category the relationship is described differently, e.g., return :: a -> m a is considered a natural transformation from one Functor to another Functor, different from anything mentioned thus far.
Back to the main thread, all in all, for any defined tensor product and a neutral value, the statement ends up describing an amazingly powerful computational construct born from its paradoxical structure:
on the outside it appears as a single object (e.g., :: List); static
but inside, permits a lot of dynamics
any number of values of the same type (e.g., Empty | ~NonEmpty) as fodder to functions of any arity. The tensor product will reduce any number of inputs to a single value... for the external layer (~fold that says nothing about the payload)
an infinite range of both types and values for the innermost layer
In Haskell, clarifying the applicability of the statement is important. The power and versatility of this construct have absolutely nothing to do with a monad per se. In other words, the construct does not rely on what makes a monad unique.
When trying to figure out whether to build code with a shared context to support computations that depend on each other, versus computations that can be run in parallel, this infamous statement, for all that it describes, is not a contrast between the choice of Applicative, Arrows, and Monads, but rather a description of how much they are the same. For the decision at hand, the statement is moot.
This is often misunderstood. The statement goes on to describe join :: m (m a) -> m a as the tensor product for the monoidal endofunctor. However, it does not articulate how, in the context of this statement, (<*>) could also have been chosen. It truly is a case of 'six of one, half a dozen of the other'. The logic for combining values is exactly alike; the same input generates the same output from each (unlike the Sum and Product monoids for Int, which generate different results when combining Ints).
So, to recap: A monoid in the category of endofunctors describes:
~t :: m * -> m * -> m *
and a neutral value for m *
(<*>) and (>>=) both provide simultaneous access to the two m values in order to compute the single return value. The logic used to compute the return value is exactly the same. If it were not for the different shapes of the functions they parameterize (f :: a -> b versus k :: a -> m b) and the position of the parameter with the same return type as the computation (i.e., a -> b -> b versus b -> a -> b, respectively), I suspect we could have parameterized the monoidal logic, the tensor product, for reuse in both definitions. As an exercise to make the point, try to implement ~t; you end up with (<*>) or (>>=) depending on how you decide to define it forall a b.
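One way to make that exercise concrete (my own sketch, with hypothetical names): whichever packaging you pick, join and (>>=) are interdefinable, so the same monoidal "combining" logic underlies both:

-- Derive join from (>>=): collapsing a layer is binding with the identity.
joinFromBind :: Monad m => m (m a) -> m a
joinFromBind mma = mma >>= id

-- Derive (>>=) from join: map the Kleisli arrow, then collapse the layer.
bindFromJoin :: Monad m => m a -> (a -> m b) -> m b
bindFromJoin ma k = joinFromBind (fmap k ma)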
If my last point is at minimum conceptually true, it then explains the precise, and only computational difference between Applicative and Monad: the functions they parameterize. In other words, the difference is external to the implementation of these type classes.
In conclusion, in my own experience, Mac Lane's infamous quote provided a great "goto" meme, a guidepost for me to reference while navigating my way through Category to better understand the idioms used in Haskell. It succeeds at capturing the scope of a powerful computing capacity made wonderfully accessible in Haskell.
However, there is irony in how I first misunderstood the statement's applicability outside of the monad, which is what I hope I have conveyed here. Everything it describes turns out to be what is similar between Applicative and Monad (and Arrow, among others). What it doesn't say is precisely the small but useful distinction between them.
Note: No, this isn't true. At some point there was a comment on this answer from Dan Piponi himself saying that the cause and effect here was exactly the opposite: he wrote his article in response to James Iry's quip. But it seems to have been removed, perhaps by some compulsive tidier.
Below is my original answer.
It's quite possible that Iry had read From Monoids to Monads, a post in which Dan Piponi (sigfpe) derives monads from monoids in Haskell, with much discussion of category theory and explicit mention of "the category of endofunctors on Hask". In any case, anyone who wonders what it means for a monad to be a monoid in the category of endofunctors might benefit from reading this derivation.

Resources