Is my understanding of monoid valid?


So, I'm learning Haskell at the moment, and I would like to confirm or debunk my understanding of monoid.
What I figured out from reading the CIS194 course is that a monoid is basically an "API" for defining a custom binary operation on a custom set.
Then I went to read up on it some more and stumbled upon a massive amount of very confusing tutorials trying to clarify the thing, so I'm not so sure anymore.
I have a decent mathematical background, but I got confused by all the metaphors and am looking for a clear yes/no answer on my understanding of monoids.

From Wikipedia:
In abstract algebra, a branch of mathematics, a monoid is an algebraic structure with a single associative binary operation and an identity element.
I think your understanding is correct. From a programming perspective, Monoid is an interface with two "methods" that must be implemented.
The only piece that seems to be missing from your description is the "identity", without which you are describing a Semigroup.
Anything that has a "zero" or an "empty" and a way of combining two values can be a Monoid. One thing to note is that it may be possible for a set/type to be made a Monoid in more than one way, for example numbers via addition with identity 0, or multiplication with identity 1.

from Wolfram:
A monoid is a set that is closed under an associative binary operation and has an identity element I in S such that for all a in S, Ia=aI=a.
from Wiki:
In abstract algebra, a branch of mathematics, a monoid is an algebraic structure with a single associative binary operation and an identity element.
so your intuition is more or less right.
You should only keep in mind that it's not defined for a "custom set" in Haskell but a type. The distinction is small (because types in type theory are very similar to sets in set theory) but the types for which you can define a Monoid instance need not be types that represent mathematical sets.
In other words: a type describes the set of all values that are of that type. Monoid is an "interface" that states that any type that claims to adhere to that interface must provide an identity value, a binary operation combining two values of that type, and there are some equations these should satisfy in order for all generic Monoid operations to work as intended (such as the generic summation of a list of monoid values) and not produce illogical/inconsistent results.
Also, note that the existence of an identity element in that set (type) is required for a type to be an instance of the Monoid class.
For example, natural numbers form a Monoid under both addition (identity = 0):
0 + n = n
n + 0 = n
as well as multiplication (identity = 1):
1 * n = n
n * 1 = n
also lists form a monoid under ++ (identity = []):
[] ++ xs = xs
xs ++ [] = xs
also, functions of type a -> a form a monoid under composition (identity = id)
id . f = f
f . id = f
so it's important to keep in mind that Monoid isn't about types that represent sets but about types when viewed as sets, so to say.
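The four examples above can be tried out with the wrappers that base already ships. This is only a sketch: the names addExample, mulExample, listExample and endoExample are made up for illustration, while Sum, Product and Endo come from Data.Monoid.

```haskell
import Data.Monoid (Sum (..), Product (..), Endo (..))

-- Numbers under addition; mempty is Sum 0.
addExample :: Int
addExample = getSum (Sum 3 <> mempty <> Sum 4)

-- Numbers under multiplication; mempty is Product 1.
mulExample :: Int
mulExample = getProduct (Product 3 <> mempty <> Product 4)

-- Lists under (++); mempty is [].
listExample :: [Int]
listExample = [1, 2] <> mempty <> [3]

-- Functions a -> a under composition; mempty is Endo id.
-- Endo f <> Endo g applies g first, then f.
endoExample :: Int
endoExample = appEndo (Endo (+ 1) <> mempty <> Endo (* 2)) 5
```

In each case inserting mempty in the middle changes nothing, which is exactly the identity law.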
as an example of a malconstructed Monoid instance, consider:
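import Data.Monoid
newtype MyInt = MyInt Int deriving Show
-- GHC 8.4 and later require a Semigroup instance; mappend defaults to (<>)
instance Semigroup MyInt where
  MyInt a <> MyInt b = MyInt (a * b)
instance Monoid MyInt where
  mempty = MyInt 0  -- deliberately wrong: the identity of (*) is 1, not 0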
if you now try to mconcat a list of MyInt values, you'll always get MyInt 0 as the result because the identity value 0 and binary operation * don't play well together:
λ> mconcat [MyInt 1, MyInt 2]
MyInt 0
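One way to catch an instance like this is to test the laws directly at a few values. The sketch below is self-contained and uses a hypothetical BadInt type with the same mistake; leftIdentity, rightIdentity and associative are illustrative helper names, not library functions.

```haskell
import Data.Monoid (Sum (..))

-- Hypothetical type repeating the mistake: (*) paired with 0.
newtype BadInt = BadInt Int deriving (Show, Eq)

instance Semigroup BadInt where
  BadInt a <> BadInt b = BadInt (a * b)

instance Monoid BadInt where
  mempty = BadInt 0 -- not an identity for (*)

-- The two identity laws, checked at a single value.
leftIdentity, rightIdentity :: (Eq m, Monoid m) => m -> Bool
leftIdentity x = mempty <> x == x
rightIdentity x = x <> mempty == x

-- Associativity, checked at a single triple of values.
associative :: (Eq m, Semigroup m) => m -> m -> m -> Bool
associative x y z = (x <> y) <> z == x <> (y <> z)
```

Here `leftIdentity (BadInt 5)` is False while `leftIdentity (Sum 5)` is True. Note that associativity still holds for BadInt, so it is a valid Semigroup; only the choice of mempty is broken.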

At a basic level you're right - it's just an API for a binary operator we denote by <>.
However, the value of the monoid concept is in its relationship to other types and classes. Culturally we've decided that <> is the natural way of joining/appending two things of the same type together.
Consider this example:
{-# LANGUAGE OverloadedStrings #-}
import Data.Monoid
greet x = "Hello, " <> x
The function greet is extremely polymorphic - x can be a String, ByteString or Text just to name a few possibilities. Moreover, in each of these cases it does basically what you expect it to - it appends x to the string "Hello, ".
Additionally, there are lots of algorithms which will work on anything that can be accumulated, and those are good candidates for generalization to a Monoid. For example consider the foldMap function from the Foldable class:
foldMap :: (Foldable t, Monoid m) => (a -> m) -> t a -> m
Not only does foldMap generalize the idea of folding over a structure, but I can generalize how the accumulation is performed by substituting the right Monoid instance.
If I have a foldable structure t containing Ints, I can use foldMap with the Sum monoid to get the sum of the Ints, or with Product to get the product, etc.
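As a rough sketch of that idea, the same foldMap call can be reused with different wrappers from Data.Monoid. The function names total, multiplied and anyEven are made up for this example.

```haskell
import Data.Monoid (Sum (..), Product (..), Any (..))

-- Accumulate with addition.
total :: (Foldable t, Num a) => t a -> a
total = getSum . foldMap Sum

-- Accumulate with multiplication.
multiplied :: (Foldable t, Num a) => t a -> a
multiplied = getProduct . foldMap Product

-- Accumulate with (||): is any element even?
anyEven :: (Foldable t, Integral a) => t a -> Bool
anyEven = getAny . foldMap (Any . even)
```

Only the wrapper changes between the three functions; the traversal of the structure is the same foldMap every time, and it works for any Foldable (lists, Maybe, and so on).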
Finally, using <> affords convenience. For instance, there is an abundance of different Set implementations, but for all of them s <> t is always the union of two sets s and t (of the same type). This enables me to write code which is agnostic of the underlying implementation of the set thereby simplifying my code. The same can be said for a lot of other data structures, e.g. sequences, trees, maps, priority queues, etc.

Related

How to write length function for all Monoids

I'm playing around with rewriting simple functions in different ways and I clearly misunderstand some core concepts. Is there a better way to work with limited types like these?
mlength :: Monoid m => m -> Int
mlength mempty = 0
mlength (l <> r) = mlength l + mlength r
It fails compilation with the following error:
Parse error in pattern: l <> r
I can see that my usage of <> is misguided because there are multiple correct matches for l and r. Even though it looks like it doesn't matter which value is assigned, a value still has to be assigned in the end. Maybe there's a way for me to assert this decision for specific Monoid instances?
"ab" == "" <> "ab"
"ab" == "a" <> "b"
"ab" == "ab" <> ""

A monoid, in the general case, has no notion of length. Take for instance Sum Int, which is Int equipped with addition for its monoidal operation. We have
Sum 3 <> Sum 4 = Sum 7 = Sum (-100) <> Sum 7 <> Sum (100)
What should be its "length"? There is no real notion of length here, since the underlying type is Int, which is not a list-like type.
Another example: Endo Int which is Int -> Int equipped with composition. E.g.
Endo (\x -> x+1) <> Endo (\x -> x*2) = Endo (\x -> 2*x+1)
Again, no meaningful "length" can be defined here.
You can browse Data.Monoid and see other examples where there is no notion of "length".
Const a is also a (boring) monoid with no length.
Now, it is true that lists [a] form a monoid (the free monoid over a), and length can indeed be defined there. Still, this is only a particular case, which does not generalize.

The Semigroup and Monoid interfaces provide a means to build up values, (<>). They don't, however, give us a way to break down or otherwise extract information from values. That being so, a length generalised beyond some specific type requires a different abstraction.
As discussed in the comments to chi's answer, while Data.Foldable offers a generalised length :: Foldable t => t a -> Int, it isn't quite what you were aiming at -- in particular, the connection between Foldable and Monoid is that foldable structures can be converted to lists/the free monoid, and not that foldables themselves are necessarily monoids.
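To make that connection concrete, here is a small sketch of how a length can be recovered from Foldable by going through a monoid. lengthVia and lengthViaList are hypothetical names; the real Data.Foldable.length may well be implemented differently.

```haskell
import Data.Monoid (Sum (..))

-- Count elements by mapping each one to Sum 1 and combining the results.
lengthVia :: Foldable t => t a -> Int
lengthVia = getSum . foldMap (const (Sum 1))

-- Convert to the free monoid (a list) first, then take the list length.
lengthViaList :: Foldable t => t a -> Int
lengthViaList = length . foldMap (\x -> [x])
```

Notice that the constraint is on the container t, not on a Monoid instance for the value being measured, which is why this does not answer the original mlength question.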
One other possibility, which is somewhat obscure but closer to the spirit of your question, is the Factorial class from the monoid-subclasses package, a subclass of Semigroup. It is built around factors :: Factorial m => m -> [m], which splits a value into irreducible factors, undoing what sconcat or mconcat do. A generalised length :: Factorial m => m -> Int can then be defined as the length of the list of factors. In any case, note that we still end up needing a further abstraction on the top of Semigroup/Monoid.
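The shape of that idea can be sketched without the package. The class below is a toy stand-in written for this answer, not the actual API of monoid-subclasses; the names Factors and flength are hypothetical.

```haskell
-- Toy stand-in for the idea behind Factorial: a semigroup whose values
-- can be split back into irreducible pieces.
class Semigroup m => Factors m where
  -- Intended law (for monoids): mconcat (factors x) == x
  factors :: m -> [m]

-- For lists, the irreducible pieces are the singleton lists.
instance Factors [a] where
  factors = map (\x -> [x])

-- A length defined purely in terms of the extra abstraction.
flength :: Factors m => m -> Int
flength = length . factors
```

With this, `flength "abc"` is 3, but only because lists were given a Factors instance; a type like Sum Int has no obvious way to implement factors.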

(ML) Modules vs (Haskell) Type Classes

According to Harper (https://existentialtype.wordpress.com/2011/04/16/modules-matter-most/), it seems that Type Classes simply do not offer the same level of abstraction that Modules offer and I'm having a hard time exactly figuring out why. And there are no examples in that link, so it's hard for me to see the key differences. There are also other papers on how to translate between Modules and Type Classes (http://www.cse.unsw.edu.au/~chak/papers/modules-classes.pdf), but this doesn't really have anything to do with the implementation in the programmer's perspective (it just says that there isn't something one can do that the other can't emulate).
Specifically, in the first link:
The first is that they insist that a type can implement a type class in exactly one way. For example, according to the philosophy of type classes, the integers can be ordered in precisely one way (the usual ordering), but obviously there are many orderings (say, by divisibility) of interest. The second is that they confound two separate issues: specifying how a type implements a type class and specifying when such a specification should be used during type inference.
I don't understand either. A type can implement a type class in more than one way in ML? How would you have the integers ordered by divisibility, for example, without creating a new type? In Haskell, you would have to do something like use data and have the instance Ord to offer an alternative ordering.
As for the second one, aren't the two distinct in Haskell?
Specifying "when such a specification should be used during type inference" can be done by something like this:
blah :: BlahType b => ...
where BlahType is the class being used during the type inference and NOT the implementing class. Whereas, "how a type implements a type class" is done using instance.
Can someone explain what the link is really trying to say? I'm just not quite understanding why Modules would be less restrictive than Type Classes.

To understand what the article is saying, take a moment to consider the Monoid typeclass in Haskell. A monoid is any type, T, which has a function mappend :: T -> T -> T and identity element mempty :: T for which the following holds.
a `mappend` (b `mappend` c) == (a `mappend` b) `mappend` c
a `mappend` mempty == mempty `mappend` a == a
There are many Haskell types which fit this definition. One example that springs immediately to mind is the integers, for which we can define the following.
instance Monoid Integer where
  mappend = (+)
  mempty = 0
You can confirm that all of the requirements hold.
a + (b + c) == (a + b) + c
a + 0 == 0 + a == a
Indeed, those conditions hold for all numbers over addition, so we can define the following as well.
instance Num a => Monoid a where
  mappend = (+)
  mempty = 0
So now, in GHCi, we can do the following.
> mappend 3 5
8
> mempty
0
Particularly observant readers (or those with a background in mathematics) will probably have noticed by now that we can also define a Monoid instance for numbers over multiplication.
instance Num a => Monoid a where
  mappend = (*)
  mempty = 1
a * (b * c) == (a * b) * c
a * 1 == 1 * a == a
But now the compiler encounters a problem. Which definition of mappend should it use for numbers? Does mappend 3 5 equal 8 or 15? There is no way for it to decide. This is why Haskell does not allow multiple instances of a single typeclass for the same type. However, the issue still stands. Which Monoid instance of Num should we use? Both are perfectly valid and make sense in certain circumstances. The solution is to use neither. If you look up Monoid on Hackage, you will see that there is no Monoid instance of Num, or Integer, Int, Float, or Double for that matter. Instead, there are Monoid instances of Sum and Product. Sum and Product are defined as follows.
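newtype Sum a = Sum { getSum :: a }
newtype Product a = Product { getProduct :: a }
-- Since GHC 8.4 the operation lives in Semigroup; mappend defaults to (<>)
instance Num a => Semigroup (Sum a) where
  Sum a <> Sum b = Sum $ a + b
instance Num a => Monoid (Sum a) where
  mempty = Sum 0
instance Num a => Semigroup (Product a) where
  Product a <> Product b = Product $ a * b
instance Num a => Monoid (Product a) where
  mempty = Product 1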
Now, if you want to use a number as a Monoid you have to wrap it in either a Sum or Product type. Which type you use determines which Monoid instance is used. This is the essence of what the article was trying to describe. There is no mechanism built into Haskell's typeclass system which allows you to choose between multiple instances. Instead you have to jump through hoops by wrapping and unwrapping them in skeleton types. Now whether or not you consider this a problem is a large part of what determines whether you prefer Haskell or ML.
ML gets around this by allowing multiple "instances" of the same class and type to be defined in different modules. Then, which module you import determines which "instance" you use. (Strictly speaking, ML doesn't have classes and instances, but it does have signatures and structures, which can act almost the same. For a more in-depth comparison, read this paper).
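A rough way to get a feel for the ML approach from within Haskell is to pass the "instance" around as an ordinary value. This is only an approximation of what ML structures give you, and every name below (MonoidDict, additive, multiplicative, concatWith) is made up for the illustration.

```haskell
-- An explicit "instance": the identity and the operation bundled as a record.
data MonoidDict a = MonoidDict
  { unit    :: a
  , combine :: a -> a -> a
  }

-- Two different monoid structures on the same number types, no wrappers needed.
additive, multiplicative :: Num a => MonoidDict a
additive = MonoidDict { unit = 0, combine = (+) }
multiplicative = MonoidDict { unit = 1, combine = (*) }

-- The caller chooses which structure to use by passing it explicitly.
concatWith :: MonoidDict a -> [a] -> a
concatWith d = foldr (combine d) (unit d)
```

With this, `concatWith additive [3, 5]` gives 8 and `concatWith multiplicative [3, 5]` gives 15. The price is that the compiler no longer picks the structure for you, which is exactly the trade-off the article is talking about.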

Why doesn't Haskell have a stronger alternative to Eq?

The reason why Set is not a functor is given here. It seems to boil down to the fact that a == b && f a /= f b is possible. So, why doesn't Haskell have as standard an alternative to Eq, something like
class Eq a => StrongEq a where
  (===) :: a -> a -> Bool
  (/==) :: a -> a -> Bool
  x /== y = not (x === y)
  x === y = not (x /== y)
for which instances are supposed to obey the laws
∀a,b,f. not (a === b) || (f a === f b)
∀a. a === a
∀a,b. (a === b) == (b === a)
and maybe some others? Then we could have:
instance StrongEq a => Functor (Set a) where
-- ...
Or am I missing something?
Edit: my problem is not “Why are there types without an Eq instance?”, like some of you seem to have answered. It's the opposite: “Why are there instances of Eq that aren't extensionally equal? Why are there too many Eq instances?”, combined with “If a == b does imply extensional equality, why is Set not an instance of Functor?”.
Also, my instance declaration is rubbish (thanks @n.m.). I should have said:
newtype StrongSet a = StrongSet (Set a)
instance Functor StrongSet where
  fmap :: (StrongEq a, StrongEq b) => (a -> b) -> StrongSet a -> StrongSet b
  fmap f (StrongSet s) = StrongSet (map f s)

instance StrongEq a => Functor (Set a) where
This makes sense neither in Haskell nor in the grand mathematical/categorical scheme of things, regardless of what StrongEq means.
In Haskell, Functor requires a type constructor of kind * -> *. The arrow reflects the fact that in category theory, a functor is a kind of mapping. [] and (the hypothetical) Set are such type constructors. [a] and Set a have kind * and cannot be functors.
In Haskell, it is hard to define Set such that it can be made into a Functor because equality cannot be sensibly defined for some types no matter what. You cannot compare two things of type Integer->Integer, for example.
Let's suppose there is a function
goedel :: Integer -> Integer -> Integer
goedel x y = -- compute the result of a function with
-- Goedel number x, applied to y
Suppose you have a value s :: Set Integer. What should fmap goedel s look like? How do you eliminate duplicates?
In your typical set theory equality is magically defined for everything, including functions, so Set (or Powerset to be precise) is a functor, no problem with that.

Since I'm not a category theorist, I'll try to write a more concrete/practical explanation (i.e., one I can understand):
The key point is the one that @leftaroundabout made in a comment:
== is supposed to
witness "equivalent by all observable means" (that doesn't necessarily
require a == b must hold only for identical implementations; but
anything you can "officially" do with a and b should again yield
equivalent results. So unAlwaysEq should never be exposed in the first
place). If you can't ensure this for some type, you shouldn't give it
an Eq instance.
That is, there should be no need for your StrongEq because that's what Eq is supposed to be already.
Haskell values are often intended to represent some sort of mathematical or "real-life" value. Many times, this representation is one-to-one. For example, consider the type
data PlatonicSolid = Tetrahedron | Cube |
                     Octahedron | Dodecahedron | Icosahedron
This type contains exactly one representation of each Platonic solid. We can take advantage of this by adding deriving Eq to the declaration, and it will produce the correct instance.
In many cases, however, the same abstract value may be represented by more than one Haskell value. For example, the red-black trees Node B (Node R Leaf 1 Leaf) 2 Leaf and Node B Leaf 1 (Node R Leaf 2 Leaf) can both represent the set {1,2}. If we added deriving Eq to our declaration, we would get an instance of Eq that distinguishes things we want to be considered the same (outside of the implementation of the set operations).
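Red-black trees are a bit long for an example, so here is the same situation with a simpler, hypothetical representation: a set stored as an unordered list that may contain duplicates. ListSet and member are names invented for this sketch.

```haskell
import Data.List (nub, sort)

-- A set stored as a list in arbitrary order, possibly with duplicates.
-- Many different lists represent the same abstract set.
newtype ListSet a = ListSet [a]

-- Hand-written Eq: compare the abstract sets, not the representations.
-- A derived instance would compare the underlying lists instead.
instance Ord a => Eq (ListSet a) where
  ListSet xs == ListSet ys = canonical xs == canonical ys
    where canonical = sort . nub

-- The only "official" observation; it cannot tell representations apart.
member :: Eq a => a -> ListSet a -> Bool
member x (ListSet xs) = x `elem` xs
```

Here `ListSet [1, 2] == ListSet [2, 1, 1]` is True, and as long as the constructor stays hidden behind operations like member, no caller can observe a difference between the two.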
It's important to make sure that types are only made instances of Eq (and Ord) when appropriate! It's very tempting to make something an instance of Ord just so you can stick it in a data structure that requires ordering, but if the ordering is not truly a total ordering of the abstract values, all manner of breakage may ensue. Unless the documentation absolutely guarantees it, for example, a function called sort :: Ord a => [a] -> [a] may not only be an unstable sort, but may not even produce a list containing all the Haskell values that go into it. sort [Bad 1 "Bob", Bad 1 "James"] can reasonably produce [Bad 1 "Bob", Bad 1 "James"], [Bad 1 "James", Bad 1 "Bob"], [Bad 1 "James", Bad 1 "James"], or [Bad 1 "Bob", Bad 1 "Bob"]. All of these are perfectly legitimate. A function that uses unsafePerformIO in the back room to implement a Las Vegas-style randomized algorithm or to race threads against each other to get an answer from the fastest may even give different results at different times, as long as they're == to each other.
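The Bad type is not spelled out above, so here is one plausible definition, assumed for illustration: the instances compare only the number and ignore the name.

```haskell
import Data.List (nub)

-- Assumed definition of Bad: Eq and Ord look at the Int only.
data Bad = Bad Int String deriving Show

instance Eq Bad where
  Bad m _ == Bad n _ = m == n

instance Ord Bad where
  compare (Bad m _) (Bad n _) = compare m n

-- An "official" observation that the instances above ignore.
name :: Bad -> String
name (Bad _ s) = s

bob, james :: Bad
bob = Bad 1 "Bob"
james = Bad 1 "James"

-- (==) says the two values are the same...
sameByEq :: Bool
sameByEq = bob == james

-- ...yet an exposed function can tell them apart.
differentByName :: Bool
differentByName = name bob /= name james

-- Library code is entitled to drop either one.
deduplicated :: [Bad]
deduplicated = nub [bob, james]
```

Both sameByEq and differentByName are True, and deduplicated has a single element. Which of the two names survives is up to the library, and that is the kind of breakage the paragraph above warns about.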
tl;dr: Making something an instance of Eq is a way of making a very strong statement to the world; don't make that statement if you don't mean it.

Your second Functor instance also doesn't make any sense. The biggest reason why Set can't be a Functor in Haskell is fmap can't have constraints. Inventing different notions of equality as StrongEq doesn't change the fact that you can't write those constraints on fmap in your Set instance.
fmap in general shouldn't have the constraints you need. It makes perfect sense to have functors of functions, for example (without it the whole notion of using Applicative to apply functions inside a functor breaks down), and functions can't be members of Eq or your StrongEq in general.
fmap can't have extra constraints on only some instances, because of code like this:
fmapBoth :: (Functor f, Functor g) => (a -> b, c -> d) -> (f a, g c) -> (f b, g d)
fmapBoth (h, j) (x, y) = (fmap h x, fmap j y)
This code claims to work regardless of the functors f and g, and regardless of the functions h and j. It has no way of checking whether one of the functors is a special one that has extra constraints on fmap, nor any way of checking whether one of the functions it's applying would violate those constraints.
Saying that Set is a Functor in Haskell is saying that there is a (lawful) operation fmap :: (a -> b) -> Set a -> Set b, with that exact type. That is precisely what Functor means. fmap :: (Eq a, Eq b) => (a -> b) -> Set a -> Set b is not an example of such an operation.
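For comparison, the containers package does offer Set.map :: Ord b => (a -> b) -> Set a -> Set b, with the constraint in the type. The sketch below uses it to show the situation from the question (a == b while f a /= f b) breaking the composition law. Keyed, f, g, twoSteps and oneStep are hypothetical names for this example.

```haskell
import qualified Data.Set as Set

-- Hypothetical type whose Eq/Ord look only at the key and ignore the label.
data Keyed = Keyed Int String

instance Eq Keyed where
  Keyed m _ == Keyed n _ = m == n

instance Ord Keyed where
  compare (Keyed m _) (Keyed n _) = compare m n

-- g gives every input the same key, so all results are (==) to each other.
g :: Int -> Keyed
g x = Keyed 0 (show x)

-- f observes the label, which (==) ignores.
f :: Keyed -> String
f (Keyed _ label) = label

-- Mapping in two steps collapses the intermediate set to one element...
twoSteps :: Set.Set String
twoSteps = Set.map f (Set.map g (Set.fromList [1, 2]))

-- ...while mapping the composed function keeps both results.
oneStep :: Set.Set String
oneStep = Set.map (f . g) (Set.fromList [1, 2])
```

Set.size twoSteps is 1 and Set.size oneStep is 2, so map f . map g and map (f . g) disagree for this (badly behaved) Ord instance.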
It is possible, I understand, to use the ConstraintKinds GHC extension to write a different Functor class that permits constraints on the values which vary by Functor (and what you actually need is an Ord constraint, not just Eq). This blog post talks about doing so to make a new Monad class which can have an instance for Set. I've never played around with code like this, so I don't know much more than that the technique exists. It wouldn't help you hand off Sets to existing code that needs Functors, but you should be able to use it instead of Functor in your own code if you wish.
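A minimal sketch of what such a class might look like, assuming the usual associated-constraint encoding. RFunctor, Elem and rmap are names chosen here and are not taken from the blog post or from any particular library.

```haskell
{-# LANGUAGE ConstraintKinds, TypeFamilies #-}

import Data.Kind (Constraint)
import qualified Data.Set as Set

-- A Functor-like class where each instance chooses the constraint
-- that its element types must satisfy.
class RFunctor f where
  type Elem f a :: Constraint
  rmap :: (Elem f a, Elem f b) => (a -> b) -> f a -> f b

-- Sets need Ord on their elements.
instance RFunctor Set.Set where
  type Elem Set.Set a = Ord a
  rmap = Set.map

-- Lists need nothing: the empty constraint.
instance RFunctor [] where
  type Elem [] a = ()
  rmap = map
```

Code written against RFunctor works for both instances, but, as said above, it is a separate class, so existing functions that ask for Functor still won't accept a Set.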

This notion of StrongEq is tough. In general, equality is a place where computer science becomes significantly more rigorous than typical mathematics in the kind of way which makes things challenging.
In particular, typical mathematics likes to talk about objects as though they exist in a set and can be uniquely identified. Computer programs usually deal with types which are not always computable (as a simple counterexample, tell me what the set corresponding to the type data U = U (U -> U) is). This means that it may be undecidable as to whether two values are identifiable.
This becomes an enormous topic in dependently typed languages since typechecking requires identifying like types and dependently typed languages may have arbitrary values in their types and thus need a way to project equality.
So, StrongEq could be defined over a restricted part of Haskell containing only the types which can be decidably compared for equality. We can consider this a category with the arrows as computable functions and then see Set as an endofunctor from types to the type of sets of values of that type. Unfortunately, these restrictions have taken us far from standard Haskell and make defining StrongEq or Functor (Set a) a little less than practical.

A little category theory [duplicate]

Who first said the following?
A monad is just a monoid in the category of endofunctors, what's the problem?
And on a less important note, is this true and if so could you give an explanation (hopefully one that can be understood by someone who doesn't have much Haskell experience)?

That particular phrasing is by James Iry, from his highly entertaining Brief, Incomplete and Mostly Wrong History of Programming Languages, in which he fictionally attributes it to Philip Wadler.
The original quote is from Saunders Mac Lane in Categories for the Working Mathematician, one of the foundational texts of Category Theory. Here it is in context, which is probably the best place to learn exactly what it means.
But, I'll take a stab. The original sentence is this:
All told, a monad in X is just a monoid in the category of endofunctors of X, with product × replaced by composition of endofunctors and unit set by the identity endofunctor.
X here is a category. Endofunctors are functors from a category to itself (which is usually all Functors as far as functional programmers are concerned, since they're mostly dealing with just one category; the category of types - but I digress). But you could imagine another category which is the category of "endofunctors on X". This is a category in which the objects are endofunctors and the morphisms are natural transformations.
And of those endofunctors, some of them might be monads. Which ones are monads? Exactly the ones which are monoidal in a particular sense. Instead of spelling out the exact mapping from monads to monoids (since Mac Lane does that far better than I could hope to), I'll just put their respective definitions side by side and let you compare:
A monoid is...
A set, S
An operation, • : S × S → S
An element of S, e : 1 → S
...satisfying these laws:
(a • b) • c = a • (b • c), for all a, b and c in S
e • a = a • e = a, for all a in S
A monad is...
An endofunctor, T : X → X (in Haskell, a type constructor of kind * -> * with a Functor instance)
A natural transformation, μ : T × T → T, where × means functor composition (μ is known as join in Haskell)
A natural transformation, η : I → T, where I is the identity endofunctor on X (η is known as return in Haskell)
...satisfying these laws:
μ ∘ Tμ = μ ∘ μT
μ ∘ Tη = μ ∘ ηT = 1 (the identity natural transformation)
With a bit of squinting you might be able to see that both of these definitions are instances of the same abstract concept.
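In Haskell terms, the two monad laws above can be written with join and return and spot-checked at particular values. This is a sketch, not a proof; assocLaw and unitLaw are hypothetical names.

```haskell
import Control.Monad (join)

-- μ ∘ Tμ = μ ∘ μT: flattening three layers inner-first or outer-first agrees.
assocLaw :: (Monad m, Eq (m a)) => m (m (m a)) -> Bool
assocLaw x = join (fmap join x) == join (join x)

-- μ ∘ Tη = μ ∘ ηT = 1: wrapping an extra layer (inside or outside)
-- and then flattening gets you back where you started.
unitLaw :: (Monad m, Eq (m a)) => m a -> Bool
unitLaw x = join (fmap return x) == x && join (return x) == x
```

Compare these with the monoid laws: join plays the role of •, return plays the role of e, and the "elements" being combined are layers of the functor rather than values in a set.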

First, the extensions and libraries that we're going to use:
{-# LANGUAGE RankNTypes, TypeOperators #-}
import Control.Monad (join)
Of these, RankNTypes is the only one that's absolutely essential to the below. I once wrote an explanation of RankNTypes that some people seem to have found useful, so I'll refer to that.
Quoting Tom Crockett's excellent answer, we have:
A monad is...
An endofunctor, T : X -> X
A natural transformation, μ : T × T -> T, where × means functor composition
A natural transformation, η : I -> T, where I is the identity endofunctor on X
...satisfying these laws:
μ(μ(T × T) × T) = μ(T × μ(T × T))
μ(η(T)) = T = μ(T(η))
How do we translate this to Haskell code? Well, let's start with the notion of a natural transformation:
-- | A natural transformation between two 'Functor' instances. Law:
--
-- > fmap f . eta g == eta g . fmap f
--
-- Neat fact: the type system actually guarantees this law.
--
newtype f :-> g =
  Natural { eta :: forall x. f x -> g x }
A type of the form f :-> g is analogous to a function type, but instead of thinking of it as a function between two types (of kind *), think of it as a morphism between two functors (each of kind * -> *). Examples:
listToMaybe :: [] :-> Maybe
listToMaybe = Natural go
  where go [] = Nothing
        go (x:_) = Just x
maybeToList :: Maybe :-> []
maybeToList = Natural go
  where go Nothing = []
        go (Just x) = [x]
reverse' :: [] :-> []
reverse' = Natural reverse
Basically, in Haskell, natural transformations are functions from some type f x to another type g x such that the x type variable is "inaccessible" to the caller. So for example, sort :: Ord a => [a] -> [a] cannot be made into a natural transformation, because it's "picky" about which types we may instantiate for a. One intuitive way I often use to think of this is the following:
A functor is a way of operating on the content of something without touching the structure.
A natural transformation is a way of operating on the structure of something without touching or looking at the content.
Now, with that out of the way, let's tackle the clauses of the definition.
The first clause is "an endofunctor, T : X -> X." Well, every Functor in Haskell is an endofunctor in what people call "the Hask category," whose objects are Haskell types (of kind *) and whose morphisms are Haskell functions. This sounds like a complicated statement, but it's actually a very trivial one. All it means is that a Functor f :: * -> * gives you the means of constructing a type f a :: * for any a :: * and a function fmap f :: f a -> f b out of any f :: a -> b, and that these obey the functor laws.
Second clause: the Identity functor in Haskell (which comes with the Platform, so you can just import it) is defined this way:
newtype Identity a = Identity { runIdentity :: a }
instance Functor Identity where
  fmap f (Identity a) = Identity (f a)
So the natural transformation η : I -> T from Tom Crockett's definition can be written this way for any Monad instance t:
return' :: Monad t => Identity :-> t
return' = Natural (return . runIdentity)
Third clause: The composition of two functors in Haskell can be defined this way (which also comes with the Platform):
newtype Compose f g a = Compose { getCompose :: f (g a) }
-- | The composition of two 'Functor's is also a 'Functor'.
instance (Functor f, Functor g) => Functor (Compose f g) where
  fmap f (Compose fga) = Compose (fmap (fmap f) fga)
So the natural transformation μ : T × T -> T from Tom Crockett's definition can be written like this:
join' :: Monad t => Compose t t :-> t
join' = Natural (join . getCompose)
The statement that this is a monoid in the category of endofunctors then means that Compose (partially applied to just its first two parameters) is associative, and that Identity is its identity element. I.e., that the following isomorphisms hold:
Compose f (Compose g h) ~= Compose (Compose f g) h
Compose f Identity ~= f
Compose Identity g ~= g
These are very easy to prove because Compose and Identity are both defined as newtype, and the Haskell Reports define the semantics of newtype as an isomorphism between the type being defined and the type of the argument to the newtype's data constructor. So for example, let's prove Compose f Identity ~= f:
Compose f Identity a
~= f (Identity a) -- newtype Compose f g a = Compose (f (g a))
~= f a -- newtype Identity a = Identity a
Q.E.D.
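The isomorphisms can also be written out as pairs of functions that undo each other. This is a sketch using the Compose and Identity types from base (Data.Functor.Compose and Data.Functor.Identity); the function names are made up here.

```haskell
import Data.Functor.Compose (Compose (..))
import Data.Functor.Identity (Identity (..))

-- Compose f Identity a ~= f a
toRight :: Functor f => f a -> Compose f Identity a
toRight = Compose . fmap Identity

fromRight :: Functor f => Compose f Identity a -> f a
fromRight = fmap runIdentity . getCompose

-- Compose Identity g a ~= g a
toLeft :: g a -> Compose Identity g a
toLeft = Compose . Identity

fromLeft :: Compose Identity g a -> g a
fromLeft = runIdentity . getCompose

-- Compose f (Compose g h) a ~= Compose (Compose f g) h a
assocTo :: Functor f => Compose f (Compose g h) a -> Compose (Compose f g) h a
assocTo = Compose . Compose . fmap getCompose . getCompose

assocFrom :: Functor f => Compose (Compose f g) h a -> Compose f (Compose g h) a
assocFrom = Compose . fmap Compose . getCompose . getCompose
```

Each pair only wraps and unwraps newtype constructors, so composing a function with its partner gives back the original value.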

The answers here do an excellent job in defining both monoids and monads, however, they still don't seem to answer the question:
And on a less important note, is this true and if so could you give an explanation (hopefully one that can be understood by someone who doesn't have much Haskell experience)?
The crux of the matter that is missing here is a different notion of "monoid": its so-called categorification, or more precisely, the notion of a monoid in a monoidal category. Sadly, Mac Lane's book itself makes this very confusing:
All told, a monad in X is just a monoid in the category of endofunctors of X, with product × replaced by composition of endofunctors and unit set by the identity endofunctor.
Main confusion
Why is this confusing? Because it does not define what is "monoid in the category of endofunctors" of X. Instead, this sentence suggests taking a monoid inside the set of all endofunctors together with the functor composition as binary operation and the identity functor as a monoidal unit. Which works perfectly fine and turns into a monoid any subset of endofunctors that contains the identity functor and is closed under functor composition.
Yet this is not the correct interpretation, which the book fails to make clear at that stage. A Monad f is a fixed endofunctor, not a subset of endofunctors closed under composition. A common construction is to use f to generate a monoid by taking the set of all k-fold compositions f^k = f(f(...)) of f with itself, including k=0 that corresponds to the identity f^0 = id. And now the set S of all these powers for all k>=0 is indeed a monoid "with product × replaced by composition of endofunctors and unit set by the identity endofunctor".
And yet:
This monoid S can be defined for any functor f or even literally for any self-map of X. It is the monoid generated by f.
The monoidal structure of S given by the functor composition and the identity functor has nothing to do with f being or not being a monad.
And to make things more confusing, the definition of "monoid in monoidal category" comes later in the book as you can see from the table of contents. And yet understanding this notion is absolutely critical to understanding the connection with monads.
(Strict) monoidal categories
Going to Chapter VII on Monoids (which comes later than Chapter VI on Monads), we find the definition of the so-called strict monoidal category as a triple (B, *, e), where B is a category, *: B x B -> B is a bifunctor (a functor with respect to each component with the other component fixed) and e is a unit object in B, satisfying the associativity and unit laws:
(a * b) * c = a * (b * c)
a * e = e * a = a
for any objects a,b,c of B, and the same identities for any morphisms a,b,c with e replaced by id_e, the identity morphism of e. It is now instructive to observe that in our case of interest, where B is the category of endofunctors of X with natural transformations as morphisms, * the functor composition and e the identity functor, all these laws are satisfied, as can be directly verified.
What comes after in the book is the definition of the "relaxed" monoidal category, where the laws only hold modulo some fixed natural transformations satisfying so-called coherence relations, which is however not important for our cases of the endofunctor categories.
Monoids in monoidal categories
Finally, in section 3 "Monoids" of Chapter VII, the actual definition is given:
A monoid c in a monoidal category (B, *, e) is an object of B with two arrows (morphisms)
mu: c * c -> c
nu: e -> c
making 3 diagrams commutative. Recall that in our case, these are morphisms in the category of endofunctors, which are natural transformations corresponding to precisely join and return for a monad. The connection becomes even clearer when we make the composition * more explicit, replacing c * c by c^2, where c is our monad.
Finally, notice that the 3 commutative diagrams (in the definition of a monoid in monoidal category) are written for general (non-strict) monoidal categories, while in our case all natural transformations arising as part of the monoidal category are actually identities. That will make the diagrams exactly the same as the ones in the definition of a monad, making the correspondence complete.
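For reference, here is my own transcription of what the three diagrams say in the strict case, using the names mu, nu, c and e from above (with e * c = c = c * e). Treat it as a sketch and check it against the book.

```latex
\begin{align*}
  \mu \circ (\mu * 1_c) &= \mu \circ (1_c * \mu) && : c * c * c \to c \\
  \mu \circ (\nu * 1_c) &= 1_c                   && : c \to c \\
  \mu \circ (1_c * \nu) &= 1_c                   && : c \to c
\end{align*}
```

When * is functor composition and c = T, the arrows mu * 1_c and 1_c * mu become the two whiskerings of μ usually written μT and Tμ (which is which depends on the composition convention), so the first line is the associativity law μ ∘ Tμ = μ ∘ μT and the other two are the unit laws.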
Conclusion
In summary, any monad is by definition an endofunctor, hence an object in the category of endofunctors, where the monadic join and return operators satisfy the definition of a monoid in that particular (strict) monoidal category. Vice versa, any monoid in the monoidal category of endofunctors is by definition a triple (c, mu, nu) consisting of an object and two arrows, i.e. natural transformations in our case, satisfying the same laws as a monad.
Finally, note the key difference between the (classical) monoids and the more general monoids in monoidal categories. The two arrows mu and nu above are not anymore a binary operation and a unit in a set. Instead, you have one fixed endofunctor c. The functor composition * and the identity functor alone do not provide the complete structure needed for the monad, despite that confusing remark in the book.
Another approach would be to compare with the standard monoid C of all self-maps of a set A, where the binary operation is composition, which can be seen as a map from the standard cartesian product C x C into C. Passing to the categorified monoid, we replace the cartesian product x with the functor composition *, and the binary operation gets replaced with the natural transformation mu from c * c to c, that is, a collection of the join operators
join: c(c(T))->c(T)
for every object T (type in programming). And the identity elements in classical monoids, which can be identified with images of maps from a fixed one-point-set, get replaced with the collection of the return operators
return: T->c(T)
But now there are no more cartesian products, so no pairs of elements and thus no binary operations.
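To make the correspondence concrete, here is a minimal sketch in Haskell, assuming we specialise c to the list functor. The names mu and nu are only labels borrowed from the text (they are not library functions); in Haskell they are join and return. The two checker functions restate the associativity and unit diagrams as equations on values.

```haskell
import Control.Monad (join)

-- mu and nu are hypothetical names matching the text; in Haskell they are
-- join and return, specialised here to the list monad for concreteness.
mu :: [[a]] -> [a]
mu = join

nu :: a -> [a]
nu = return

-- Associativity: flattening the inner layers first or the outer layers first
-- gives the same result.
assocHolds :: [[[Int]]] -> Bool
assocHolds xsss = mu (fmap mu xsss) == mu (mu xsss)

-- Unit laws: wrapping with nu (inside or outside) and then flattening
-- gives back the original value.
unitHolds :: [Int] -> Bool
unitHolds xs = mu (fmap nu xs) == xs && mu (nu xs) == xs
```

Note that mu takes a single nested value c(c(T)), not a pair of values, which is exactly the point made above about there being no binary operation on elements.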

I came to this post by way of better understanding the inference of the infamous quote from Mac Lane's Categories for the Working Mathematician.
In describing what something is, it's often equally useful to describe what it's not.
Because Mac Lane uses the description to describe a monad, one might infer that it describes something unique to monads. Bear with me. To develop a broader understanding of the statement, I believe it needs to be made clear that he is not describing something that is unique to monads; the statement equally describes Applicative and Arrows among others. For the same reason we can have two monoids on Int (Sum and Product), we can have several monoids on X in the category of endofunctors. But there is even more to the similarities.
Both Monad and Applicative meet the criteria:
endo => any arrow, or morphism that starts and ends in the same place
functor => any arrow, or morphism between two Categories (e.g., in day to day Tree a -> List b, but in Category Tree -> List)
monoid => single object; i.e., a single type, but in this context, only in regards to the external layer; so, we can't have Tree -> List, only List -> List.
The statement uses "Category of..." This defines the scope of the statement. As an example, the Functor Category describes the scope of f * -> g *, i.e., Any functor -> Any functor, e.g., Tree * -> List * or Tree * -> Tree *.
What a Categorical statement does not specify describes where anything and everything is permitted.
In this case, inside the functors, * -> * aka a -> b is not specified which means Anything -> Anything including Anything else. As my imagination jumps to Int -> String, it also includes Integer -> Maybe Int, or even Maybe Double -> Either String Int where a :: Maybe Double; b :: Either String Int.
So the statement comes together as follows:
functor scope :: f a -> g b (i.e., any parameterized type to any parameterized type)
endo + functor :: f a -> f b (i.e., any one parameterized type to the same parameterized type) ... said differently,
a monoid in the category of endofunctors
So, where is the power of this construct? To appreciate the full dynamics, I needed to see that the typical drawings of a monoid (single object with what looks like an identity arrow, :: single object -> single object), fails to illustrate that I'm permitted to use an arrow parameterized with any number of monoid values, from the one type object permitted in Monoid. The endo, ~ identity arrow definition of equivalence ignores the functor's type value and both the type and value of the most inner, "payload" layer. Thus, equivalence returns true in any situation where the functorial types match (e.g., Nothing -> Just * -> Nothing is equivalent to Just * -> Just * -> Just * because they are both Maybe -> Maybe -> Maybe).
Sidebar: ~ outside is conceptual, but is the left most symbol in f a. It also describes what "Haskell" reads-in first (big picture); so Type is "outside" in relation to a Type Value. The relationship between layers (a chain of references) in programming is not easy to relate in Category. The Category of Set is used to describe Types (Int, Strings, Maybe Int etc.) which includes the Category of Functor (parameterized Types). The reference chain: Functor Type, Functor values (elements of that Functor's set, e.g., Nothing, Just), and in turn, everything else each functor value points to. In Category the relationship is described differently, e.g., return :: a -> m a is considered a natural transformation from one Functor to another Functor, different from anything mentioned thus far.
Back to the main thread, all in all, for any defined tensor product and a neutral value, the statement ends up describing an amazingly powerful computational construct born from its paradoxical structure:
on the outside it appears as a single object (e.g., :: List); static
but inside, permits a lot of dynamics
any number of values of the same type (e.g., Empty | ~NonEmpty) as fodder to functions of any arity. The tensor product will reduce any number of inputs to a single value... for the external layer (~fold that says nothing about the payload)
infinite range of both the type and values for the inner most layer
In Haskell, clarifying the applicability of the statement is important. The power and versatility of this construct has absolutely nothing to do with a monad per se. In other words, the construct does not rely on what makes a monad unique.
When trying to figure out whether to build code with a shared context to support computations that depend on each other, versus computations that can be run in parallel, this infamous statement, with as much as it describes, is not a contrast between the choice of Applicative, Arrows and Monads, but rather is a description of how much they are the same. For the decision at hand, the statement is moot.
This is often misunderstood. The statement goes on to describe join :: m (m a) -> m a as the tensor product for the monoidal endofunctor. However, it does not articulate how, in the context of this statement, (<*>) could also have been chosen. It truly is an example of 'six of one, half a dozen of the other'. The logic for combining values is exactly alike; the same input generates the same output from each (unlike the Sum and Product monoids for Int, which generate different results when combining Ints).
So, to recap: A monoid in the category of endofunctors describes:
~t :: m * -> m * -> m *
and a neutral value for m *
(<*>) and (>>=) both provide simultaneous access to the two m values in order to compute the single return value. The logic used to compute the return value is exactly the same. If it were not for the different shapes of the functions they parameterize (f :: a -> b versus k :: a -> m b) and the position of the parameter with the same return type of the computation (i.e., a -> b -> b versus b -> a -> b for each respectively), I suspect we could have parameterized the monoidal logic, the tensor product, for reuse in both definitions. As an exercise to make the point, try to implement ~t, and you end up with (<*>) and (>>=) depending on how you decide to define it forall a b.
If my last point is at minimum conceptually true, it then explains the precise, and only computational difference between Applicative and Monad: the functions they parameterize. In other words, the difference is external to the implementation of these type classes.
In conclusion, in my own experience, Mac Lane's infamous quote provided a great "goto" meme, a guidepost for me to reference while navigating my way through Category to better understand the idioms used in Haskell. It succeeds at capturing the scope of a powerful computing capacity made wonderfully accessible in Haskell.
However, there is irony in how I first misunderstood the statement's applicability outside of the monad, and what I hope conveyed here. Everything that it describes turns out to be what is similar between Applicative and Monads (and Arrows among others). What it doesn't say is precisely the small but useful distinction between them.

Note: No, this isn't true. At some point there was a comment on this answer from Dan Piponi himself saying that the cause and effect here was exactly the opposite, that he wrote his article in response to James Iry's quip. But it seems to have been removed, perhaps by some compulsive tidier.
Below is my original answer.
It's quite possible that Iry had read From Monoids to Monads, a post in which Dan Piponi (sigfpe) derives monads from monoids in Haskell, with much discussion of category theory and explicit mention of "the category of endofunctors on Hask". In any case, anyone who wonders what it means for a monad to be a monoid in the category of endofunctors might benefit from reading this derivation.

Confused by the meaning of the 'Alternative' type class and its relationship to other type classes

I've been going through the Typeclassopedia to learn the type classes. I'm stuck understanding Alternative (and MonadPlus, for that matter).
The problems I'm having:
the 'pedia says that "the Alternative type class is for Applicative functors which also have a monoid structure." I don't get this -- doesn't Alternative mean something totally different from Monoid? i.e. I understood the point of the Alternative type class as picking between two things, whereas I understood Monoids as being about combining things.
why does Alternative need an empty method/member? I may be wrong, but it seems to not be used at all ... at least in the code I could find. And it seems not to fit with the theme of the class -- if I have two things, and need to pick one, what do I need an 'empty' for?
why does the Alternative type class need an Applicative constraint, and why does it need a kind of * -> *? Why not just have <|> :: a -> a -> a? All of the instances could still be implemented in the same way ... I think (not sure). What value does it provide that Monoid doesn't?
what's the point of the MonadPlus type class? Can't I unlock all of its goodness by just using something as both a Monad and Alternative? Why not just ditch it? (I'm sure I'm wrong, but I don't have any counterexamples)
Hopefully all those questions are coherent ... !
Bounty update: @Antal's answer is a great start, but Q3 is still open: what does Alternative provide that Monoid doesn't? I find this answer unsatisfactory since it lacks concrete examples, and a specific discussion of how the higher-kindedness of Alternative distinguishes it from Monoid.
If it's to combine applicative's effects with Monoid's behavior, why not just:
liftA2 mappend
This is even more confusing for me because many Monoid instances are exactly the same as the Alternative instances.
That's why I'm looking for specific examples that show why Alternative is necessary, and how it's different -- or means something different -- from Monoid.

To begin with, let me offer short answers to each of these questions. I will then expand each into a longer detailed answer, but these short ones will hopefully help in navigating those.
No, Alternative and Monoid don’t mean different things; Alternative is for types which have the structure both of Applicative and of Monoid. “Picking” and “combining” are two different intuitions for the same broader concept.
Alternative contains empty as well as <|> because the designers thought this would be useful, and because this gives rise to a monoid. In terms of picking, empty corresponds to making an impossible choice.
We need both Alternative and Monoid because the former obeys (or should) more laws than the latter; these laws relate the monoidal and applicative structure of the type constructor. Additionally, Alternative can’t depend on the inner type, while Monoid can.
MonadPlus is slightly stronger than Alternative, as it must obey more laws; these laws relate the monoidal structure to the monadic structure in addition to the applicative structure. If you have instances of both, they should coincide.
Doesn’t Alternative mean something totally different from Monoid?
Not really! Part of the reason for your confusion is that the Haskell Monoid class uses some pretty bad (well, insufficiently general) names. This is how a mathematician would define a monoid (being very explicit about it):
Definition. A monoid is a set M equipped with a distinguished element ε ∈ M and a binary operator · : M × M → M, denoted by juxtaposition, such that the following two conditions hold:
ε is the identity: for all m ∈ M, mε = εm = m.
· is associative: for all m₁,m₂,m₃ ∈ M, (m₁m₂)m₃ = m₁(m₂m₃).
That’s it. In Haskell, ε is spelled mempty and · is spelled mappend (or, these days, <>), and the set M is the type M in instance Monoid M where ....
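As a small sketch of the definition above, the two laws can be written down as ordinary Haskell predicates. The helper names identityLaw and associativityLaw are made up for this illustration; Sum and Product are the standard newtypes from Data.Monoid and show one carrier (numbers) carrying two different monoid structures.

```haskell
import Data.Monoid (Product (..), Sum (..))

-- Hypothetical helper names; they just restate the two monoid laws.
identityLaw :: (Eq m, Monoid m) => m -> Bool
identityLaw m = (m <> mempty) == m && (mempty <> m) == m

associativityLaw :: (Eq m, Monoid m) => m -> m -> m -> Bool
associativityLaw a b c = ((a <> b) <> c) == (a <> (b <> c))

-- The same carrier (numbers) with two different monoid structures.
sumExample :: Sum Int
sumExample = Sum 2 <> Sum 3 -- addition, identity is Sum 0

productExample :: Product Int
productExample = Product 2 <> Product 3 -- multiplication, identity is Product 1
```

In GHCi, identityLaw "abc" and associativityLaw "a" "b" "c" both evaluate to True, while sumExample is Sum 5 and productExample is Product 6. Checking a few values is of course not a proof of the laws, only a sanity check.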
Looking at this definition, we see that it says nothing about “combining” (or about “picking,” for that matter). It says things about · and about ε, but that’s it. Now, it’s certainly true that combining things works well with this structure: ε corresponds to having no things, and m₁m₂ says that if I glom m₁ and m₂’s stuff together, I can get a new thing containing all their stuff. But here’s an alternative intuition: ε corresponds to no choices at all, and m₁m₂ corresponds to a choice between m₁ and m₂. This is the “picking” intuition. Note that both obey the monoid laws:
Having nothing at all and having no choice are both the identity.
If I have no stuff and glom it together with some stuff, I end up with that same stuff again.
If I have a choice between no choice at all (something impossible) and some other choice, I have to pick the other (possible) choice.
Glomming collections together and making a choice are both associative.
If I have three collections of things, it doesn’t matter if I glom the first two together and then the third, or the last two together and then the first; either way, I end up with the same total glommed collection.
If I have a choice between three things, it doesn’t matter if I (a) first choose between first-or-second and third and then, if I need to, between first and second, or (b) first choose between first and second-or-third and then, if I need to, between second and third. Either way, I can pick what I want.
(Note: I’m playing fast and loose here; that’s why it’s intuition. For instance, it’s important to remember that · need not be commutative, which the above glosses over: it’s perfectly possible that m₁m₂ ≠ m₂m₁.)
Behold: both these sorts of things (and many others—is multiplying numbers really either “combining” or “picking”?) obey the same rules. Having an intuition is important to develop understanding, but it’s the rules and definitions that determine what’s actually going on.
And the best part is that both of these intuitions can be interpreted by the same carrier! Let M be some set of sets (not a set of all sets!) containing the empty set, let ε be the empty set ∅, and let · be set union ∪. It is easy to see that ∅ is an identity for ∪, and that ∪ is associative, so we can conclude that (M,∅,∪) is a monoid. Now:
If we think about sets as being collections of things, then ∪ corresponds to glomming them together to get more things—the “combining” intuition.
If we think about sets as representing possible actions, then ∪ corresponds to increasing your pool of possible actions to pick from—the “picking” intuition.
And this is exactly what’s going on with [] in Haskell: [a] is a Monoid for all a, and [] as an applicative functor (and monad) is used to represent nondeterminism. Both the combining and the picking intuitions coincide at the same type: mempty = empty = [] and mappend = (<|>) = (++).
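A quick way to see the coincidence for lists is to write the same combination once through Monoid and once through Alternative. This is only a sketch with made-up value names; the operators are the standard ones from base.

```haskell
import Control.Applicative (Alternative (..))

-- "Combining": the Monoid operation on lists.
viaMonoid :: [Int]
viaMonoid = [1, 2] <> [3]

-- "Picking": the Alternative operation on lists.
viaAlternative :: [Int]
viaAlternative = [1, 2] <|> [3]

-- The two identities are the same value as well.
emptiesAgree :: Bool
emptiesAgree = (mempty :: [Int]) == (empty :: [Int])
```

Both viaMonoid and viaAlternative evaluate to [1,2,3], since both operators are (++) at this type.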
So the Alternative class is just there to represent objects which (a) are applicative functors, and (b) when instantiated at a type, have a value and a binary function on them which follow some rules. Which rules? The monoid rules. Why? Because it turns out to be useful :-)
Why does Alternative need an empty method/member?
Well, the snarky answer is “because Alternative represents a monoid structure.” But the real question is: why a monoid structure? Why not just a semigroup, a monoid without ε? One answer is to claim that monoids are just more useful. I think many people (but perhaps not Edward Kmett) would agree with this; almost all of the time, if you have a sensible (<|>)/mappend/·, you’ll be able to define a sensible empty/mempty/ε. On the other hand, having the extra generality is nice, since it lets you place more things under the umbrella.
You also want to know how this meshes with the “picking” intuition. Keeping in mind that, in some sense, the right answer is “know when to abandon the ‘picking’ intuition,” I think you can unify the two. Consider [], the applicative functor for nondeterminism. If I combine two values of type [a] with (<|>), that corresponds to nondeterministically picking either an action from the left or an action from the right. But sometimes, you’re going to have no possible actions on one side—and that’s fine. Similarly, if we consider parsers, (<|>) represents a parser which parses either what’s on the left or what’s on the right (it “picks”). And if you have a parser which always fails, that ends up being an identity: if you pick it, you immediately reject that pick and try the other one.
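The parser case can be sketched with a deliberately tiny parser type. Parser, runParser and char below are hypothetical names written for this illustration and do not come from any particular parsing library; the point is only that the always-failing parser behaves as the identity for (<|>).

```haskell
import Control.Applicative (Alternative (..))

-- A deliberately tiny, hypothetical parser type (not from any library).
newtype Parser a = Parser { runParser :: String -> Maybe (a, String) }

instance Functor Parser where
  fmap f (Parser p) = Parser $ \s -> case p s of
    Nothing        -> Nothing
    Just (a, rest) -> Just (f a, rest)

instance Applicative Parser where
  pure a = Parser $ \s -> Just (a, s)
  Parser pf <*> Parser pa = Parser $ \s -> case pf s of
    Nothing        -> Nothing
    Just (f, rest) -> case pa rest of
      Nothing         -> Nothing
      Just (a, rest') -> Just (f a, rest')

instance Alternative Parser where
  -- The parser that always fails: the "impossible choice".
  empty = Parser (const Nothing)
  Parser p <|> Parser q = Parser $ \s -> case p s of
    Nothing -> q s    -- left failed: try the right
    result  -> result -- left succeeded: keep it

-- Parse one specific character.
char :: Char -> Parser Char
char c = Parser $ \s -> case s of
  (x : rest) | x == c -> Just (c, rest)
  _                   -> Nothing
```

With these definitions, runParser (empty <|> char 'a') "abc" and runParser (char 'a' <|> empty) "abc" both give Just ('a',"bc"), the same as char 'a' alone.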
All this said, remember that it would be entirely possible to have a class almost like Alternative, but lacking empty. That would be perfectly valid—it could even be a superclass of Alternative—but happens not to be what Haskell did. Presumably this is out of a guess as to what’s useful.
Why does the Alternative type class need an Applicative constraint, and why does it need a kind of * -> *? … Why not just [use] liftA2 mappend?
Well, let’s consider each of these three proposed changes: getting rid of the Applicative constraint for Alternative; changing the kind of Alternative’s argument; and using liftA2 mappend instead of <|> and pure mempty instead of empty. We’ll look at this third change first, since it’s the most different. Suppose we got rid of Alternative entirely, and replaced the class with two plain functions:
fempty :: (Applicative f, Monoid a) => f a
fempty = pure mempty
(>|<) :: (Applicative f, Monoid a) => f a -> f a -> f a
(>|<) = liftA2 mappend
We could even keep the definitions of some and many. And this does give us a monoid structure, it’s true. But it seems like it gives us the wrong one. Should Just fst >|< Just snd fail, since (a,a) -> a isn’t an instance of Monoid? No, but that’s what the above code would result in. The monoid instance we want is one that’s inner-type agnostic (to borrow terminology from Matthew Farkas-Dyck in a very related haskell-cafe discussion which asks some very similar questions); the Alternative structure is about a monoid determined by f’s structure, not the structure of f’s argument.
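To see the difference on actual values, here is a sketch that repeats the two definitions of fempty and (>|<) and compares them with the real (<|>) at Maybe. The value names (lifted, chosen and so on) are invented for the example.

```haskell
import Control.Applicative (Alternative (..), liftA2)

fempty :: (Applicative f, Monoid a) => f a
fempty = pure mempty

(>|<) :: (Applicative f, Monoid a) => f a -> f a -> f a
(>|<) = liftA2 mappend

-- The lifted monoid combines the payloads ...
lifted :: Maybe String
lifted = Just "ab" >|< Just "cd"

-- ... whereas Alternative chooses between the two Maybe values.
chosen :: Maybe String
chosen = Just "ab" <|> Just "cd"

-- A failure on either side makes the lifted version fail.
liftedWithNothing :: Maybe String
liftedWithNothing = Nothing >|< Just "cd"

chosenWithNothing :: Maybe String
chosenWithNothing = Nothing <|> Just "cd"

-- The lifted identity is a successful empty payload, not a failure.
femptyMaybe :: Maybe String
femptyMaybe = fempty

-- Fine with (<|>); with (>|<) it would need Monoid ((Int, Int) -> Int).
pickFunction :: Maybe ((Int, Int) -> Int)
pickFunction = Just fst <|> Just snd
```

Here lifted is Just "abcd" and chosen is Just "ab"; liftedWithNothing is Nothing while chosenWithNothing is Just "cd"; and femptyMaybe is Just "" rather than Nothing. So the two structures differ both in the operation and in the identity.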
Now that we think we want to leave Alternative as some sort of type class, let’s look at the two proposed ways to change it. If we change the kind, we have to get rid of the Applicative constraint; Applicative only talks about things of kind * -> *, and so there’s no way to refer to it. That leaves two possible changes; the first, more minor, change is to get rid of the Applicative constraint but leave the kind alone:
class Alternative' f where
  empty' :: f a
  (<||>) :: f a -> f a -> f a
The other, larger, change is to get rid of the Applicative constraint and change the kind:
class Alternative'' a where
  empty'' :: a
  (<|||>) :: a -> a -> a
In both cases, we have to get rid of some/many, but that’s OK; we can define them as standalone functions with the type (Applicative f, Alternative' f) => f a -> f [a] or (Applicative f, Alternative'' (f [a])) => f a -> f [a].
Now, in the second case, where we change the kind of the type variable, we see that our class is exactly the same as Monoid (or, if you still want to remove empty'', Semigroup), so there’s no advantage to having a separate class. And in fact, even if we leave the kind variable alone but remove the Applicative constraint, Alternative just becomes forall a. Monoid (f a), although we can’t write these quantified constraints in Haskell, not even with all the fancy GHC extensions. (Note that this expresses the inner-type–agnosticism mentioned above.) Thus, if we can make either of these changes, then we have no reason to keep Alternative (except for being able to express that quantified constraint, but that hardly seems compelling).
So the question boils down to “is there a relationship between the Alternative parts and the Applicative parts of an f which is an instance of both?” And while there’s nothing in the documentation, I’m going to take a stand and say yes—or at the very least, there ought to be. I think that Alternative is supposed to obey some laws relating to Applicative (in addition to the monoid laws); in particular, I think those laws are something like
Right distributivity (of <*>): (f <|> g) <*> a = (f <*> a) <|> (g <*> a)
Right absorption (for <*>): empty <*> a = empty
Left distributivity (of fmap): f <$> (a <|> b) = (f <$> a) <|> (f <$> b)
Left absorption (for fmap): f <$> empty = empty
These laws appear to be true for [] and Maybe, and (pretending its MonadPlus instance is an Alternative instance) IO, but I haven’t done any proofs or exhaustive testing. (For instance, I originally thought that left distributivity held for <*>, but this “performs the effects” in the wrong order for [].) By way of analogy, though, it is true that MonadPlus is expected to obey similar laws (although there is apparently some ambiguity about which). I had originally wanted to claim a third law, which seems natural:
Left absorption (for <*>): a <*> empty = empty
However, although I believe [] and Maybe obey this law, IO doesn’t, and I think (for reasons that will become apparent in the next couple of paragraphs) it’s best not to require it.
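The proposed laws can be spot-checked on concrete values. The sketch below restates each law as a Boolean function; all of the function names are hypothetical, and a handful of passing examples is of course not a proof. The last definition tries the rejected law a <*> (b <|> c) = (a <*> b) <|> (a <*> c) on lists.

```haskell
import Control.Applicative (Alternative (..))

-- Hypothetical checker names; each one restates a proposed law from the text.
rightDistributivity
  :: (Alternative f, Eq (f b)) => f (a -> b) -> f (a -> b) -> f a -> Bool
rightDistributivity f g a = ((f <|> g) <*> a) == ((f <*> a) <|> (g <*> a))

rightAbsorption :: (Alternative f, Eq (f a)) => f a -> Bool
rightAbsorption a = (empty <*> a) == (empty `asTypeOf` a)

leftDistributivity
  :: (Alternative f, Eq (f b)) => (a -> b) -> f a -> f a -> Bool
leftDistributivity f a b = (f <$> (a <|> b)) == ((f <$> a) <|> (f <$> b))

-- The second argument is only a witness that fixes the functor f.
leftAbsorption :: (Alternative f, Eq (f b)) => (a -> b) -> f b -> Bool
leftAbsorption f witness = (f <$> empty) == (empty `asTypeOf` witness)

-- The rejected law a <*> (b <|> c) = (a <*> b) <|> (a <*> c), tried on lists.
leftDistributivityForAp :: Bool
leftDistributivityForAp =
  (fs <*> ([1] <|> [2])) == ((fs <*> [1]) <|> (fs <*> [2]))
  where
    fs = [(+ 1), (* 2)] :: [Int -> Int]
```

For lists, leftDistributivityForAp is False: the left-hand side is [2,3,2,4] while the right-hand side is [2,2,3,4], which is the "wrong order of effects" mentioned above.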
And indeed, it appears that Edward Kmett has some slides where he espouses a similar view; to get into that, we’ll need to take a brief digression involving some more mathematical jargon. The final slide, “I Want More Structure,” says that “A Monoid is to an Applicative as a Right Seminearring is to an Alternative,” and “If you throw away the argument of an Applicative, you get a Monoid, if you throw away the argument of an Alternative you get a RightSemiNearRing.”
Right seminearrings? “How did right seminearrings get into it?” I hear you cry. Well,
Definition. A right near-semiring (also right seminearring, but the former seems to be used more on Google) is a quadruple (R,+,·,0) where (R,+,0) is a monoid, (R,·) is a semigroup, and the following two conditions hold:
· is right-distributive over +: for all r,s,t ∈ R, (s + t)r = sr + tr.
0 is right-absorbing for ·: for all r ∈ R, 0r = 0.
A left near-semiring is defined analogously.
Now, this doesn’t quite work, because <*> is not truly associative or a binary operator—the types don’t match. I think this is what Edward Kmett is getting at when he talks about “throw[ing] away the argument.” Another option might be to say (I’m unsure if this is right) that we actually want (f a, <|>, <*>, empty) to form a right near-semiringoid, where the “-oid” suffix indicates that the binary operators can only be applied to specific pairs of elements (à la groupoids). And we’d also want to say that (f a, <|>, <$>, empty) was a left near-semiringoid, although this could conceivably follow from the combination of the Applicative laws and the right near-semiringoid structure. But now I’m getting in over my head, and this isn’t deeply relevant anyway.
At any rate, these laws, being stronger than the monoid laws, mean that perfectly valid Monoid instances would become invalid Alternative instances. There are (at least) two examples of this in the standard library: Monoid a => (a,) and Maybe. Let’s look at each of them quickly.
Given any two monoids, their product is a monoid; consequently, tuples can be made an instance of Monoid in the obvious way (reformatting the base package’s source):
instance (Monoid a, Monoid b) => Monoid (a,b) where
  mempty = (mempty, mempty)
  (a1,b1) `mappend` (a2,b2) = (a1 `mappend` a2, b1 `mappend` b2)
Similarly, we can make tuples whose first component is an element of a monoid into an instance of Applicative by accumulating the monoid elements (reformatting the base package’s source):
instance Monoid a => Applicative ((,) a) where
  pure x = (mempty, x)
  (u, f) <*> (v, x) = (u `mappend` v, f x)
However, tuples aren’t an instance of Alternative, because they can’t be: the monoidal structure over Monoid a => (a,b) isn’t present for all types b, and Alternative’s monoidal structure must be inner-type agnostic. Not only must b be a monoid; to be able to express (f <> g) <*> a, we also need to use the Monoid instance for functions, which is for functions of the form Monoid b => a -> b. And even in the case where we have all the necessary monoidal structure, it violates all four of the Alternative laws. To see this, let ssf n = (Sum n, (<> Sum n)) and let ssn n = (Sum n, Sum n). Then, writing (<>) for mappend, we get the following results (which can be checked in GHCi, with the occasional type annotation):
Right distributivity:
(ssf 1 <> ssf 1) <*> ssn 1 = (Sum 3, Sum 4)
(ssf 1 <*> ssn 1) <> (ssf 1 <*> ssn 1) = (Sum 4, Sum 4)
Right absorption:
mempty <*> ssn 1 = (Sum 1, Sum 0)
mempty = (Sum 0, Sum 0)
Left distributivity:
(<> Sum 1) <$> (ssn 1 <> ssn 1) = (Sum 2, Sum 3)
((<> Sum 1) <$> ssn 1) <> ((<> Sum 1) <$> ssn 1) = (Sum 2, Sum 4)
Left absorption:
(<> Sum 1) <$> mempty = (Sum 0, Sum 1)
mempty = (Sum 0, Sum 0)
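For anyone who wants to reproduce these in GHCi, here is a sketch with the type annotations filled in. ssf and ssn are as defined in the text, fixed to Int for concreteness; the remaining names are made up for the example.

```haskell
import Data.Monoid (Sum (..))

ssf :: Int -> (Sum Int, Sum Int -> Sum Int)
ssf n = (Sum n, (<> Sum n))

ssn :: Int -> (Sum Int, Sum Int)
ssn n = (Sum n, Sum n)

-- Right distributivity: the two sides differ in the first component.
rightDistLhs, rightDistRhs :: (Sum Int, Sum Int)
rightDistLhs = (ssf 1 <> ssf 1) <*> ssn 1
rightDistRhs = (ssf 1 <*> ssn 1) <> (ssf 1 <*> ssn 1)

-- Right absorption: mempty on the left does not absorb.
rightAbsLhs :: (Sum Int, Sum Int)
rightAbsLhs = mempty <*> ssn 1

-- Left distributivity (of fmap): the two sides differ in the second component.
leftDistLhs, leftDistRhs :: (Sum Int, Sum Int)
leftDistLhs = (<> Sum 1) <$> (ssn 1 <> ssn 1)
leftDistRhs = ((<> Sum 1) <$> ssn 1) <> ((<> Sum 1) <$> ssn 1)

-- Left absorption (for fmap): mapping over mempty changes it.
leftAbsLhs :: (Sum Int, Sum Int)
leftAbsLhs = (<> Sum 1) <$> mempty
```

Each left-hand side differs from what the corresponding law would require, and the value of mempty at type (Sum Int, Sum Int) is (Sum 0, Sum 0).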
Next, consider Maybe. As it stands, Maybe’s Monoid and Alternative instances disagree. (Although the haskell-cafe discussion I mention at the beginning of this section proposes changing this, there’s an Option newtype from the semigroups package which would produce the same effect.) As a Monoid, Maybe lifts semigroups into monoids by using Nothing as the identity; since the base package doesn’t have a semigroup class, it just lifts monoids, and so we get (reformatting the base package’s source):
instance Monoid a => Monoid (Maybe a) where
  mempty = Nothing
  Nothing `mappend` m = m
  m `mappend` Nothing = m
  Just m1 `mappend` Just m2 = Just (m1 `mappend` m2)
On the other hand, as an Alternative, Maybe represents prioritized choice with failure, and so we get (again reformatting the base package’s source):
instance Alternative Maybe where
  empty = Nothing
  Nothing <|> r = r
  l <|> _ = l
And it turns out that only the latter satisfies the Alternative laws. The Monoid instance fails less badly than (,)’s; it does obey the laws with respect to <*>, although almost by accident: it comes from the behavior of the only instance of Monoid for functions, which (as mentioned above) lifts functions that return monoids into the reader applicative functor. If you work it out (it’s all very mechanical), you’ll find that right distributivity and right absorption for <*> both hold for both the Alternative and Monoid instances, as does left absorption for fmap. And left distributivity for fmap does hold for the Alternative instance, as follows:
f <$> (Nothing <|> b)
= f <$> b by the definition of (<|>)
= Nothing <|> (f <$> b) by the definition of (<|>)
= (f <$> Nothing) <|> (f <$> b) by the definition of (<$>)
f <$> (Just a <|> b)
= f <$> Just a by the definition of (<|>)
= Just (f a) by the definition of (<$>)
= Just (f a) <|> (f <$> b) by the definition of (<|>)
= (f <$> Just a) <|> (f <$> b) by the definition of (<$>)
However, it fails for the Monoid instance; writing (<>) for mappend, we have:
(<> Sum 1) <$> (Just (Sum 0) <> Just (Sum 0)) = Just (Sum 1)
((<> Sum 1) <$> Just (Sum 0)) <> ((<> Sum 1) <$> Just (Sum 0)) = Just (Sum 2)
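The same comparison can be run side by side for the two instances. This is a sketch; bump and the four value names are invented here, and everything else is from base.

```haskell
import Control.Applicative (Alternative (..))
import Data.Monoid (Sum (..))

-- The function being mapped in the example above.
bump :: Sum Int -> Sum Int
bump = (<> Sum 1)

-- Left distributivity of fmap, using the Monoid instance of Maybe.
monoidLhs, monoidRhs :: Maybe (Sum Int)
monoidLhs = bump <$> (Just (Sum 0) <> Just (Sum 0))
monoidRhs = (bump <$> Just (Sum 0)) <> (bump <$> Just (Sum 0))

-- The same law, using the Alternative instance of Maybe.
altLhs, altRhs :: Maybe (Sum Int)
altLhs = bump <$> (Just (Sum 0) <|> Just (Sum 0))
altRhs = (bump <$> Just (Sum 0)) <|> (bump <$> Just (Sum 0))
```

The Monoid version gives Just (Sum 1) on the left and Just (Sum 2) on the right, while both Alternative sides are Just (Sum 1).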
Now, there is one caveat to this example. If you only require that Alternatives be compatible with <*>, and not with <$>, then Maybe is fine. Edward Kmett’s slides, mentioned above, don’t make reference to <$>, but I think it seems reasonable to require laws with respect to it as well; nevertheless, I can’t find anything to back me up on this.
Thus, we can conclude that being an Alternative is a stronger requirement than being a Monoid, and so it requires a different class. The purest example of this would be a type with an inner-type agnostic Monoid instance and an Applicative instance which were incompatible with each other; however, there aren’t any such types in the base package, and I can’t think of any. (It’s possible none exist, although I’d be surprised.) Nevertheless, these inner-type gnostic examples demonstrate why the two type classes must be different.
What’s the point of the MonadPlus type class?
MonadPlus, like Alternative, is a strengthening of Monoid, but with respect to Monad instead of Applicative. According to Edward Kmett in his answer to the question “Distinction between typeclasses MonadPlus, Alternative, and Monoid?”, MonadPlus is also stronger than Alternative: the law empty <*> a = empty, for instance, doesn’t imply that empty >>= f = empty. AndrewC provides two examples of this: Maybe and its dual. The issue is complicated by the fact that there are two potential sets of laws for MonadPlus. It is universally agreed that MonadPlus is supposed to form a monoid with mplus and mzero, and it’s supposed to satisfy the left zero law, mzero >>= f = mzero. However, some MonadPlusses satisfy left distribution, mplus a b >>= f = mplus (a >>= f) (b >>= f); and others satisfy left catch, mplus (return a) b = return a. (Note that left zero and left distribution for MonadPlus are analogous to right absorption and right distributivity for Alternative, respectively; (<*>) is more analogous to (=<<) than (>>=).) Left distribution is probably “better,” so any MonadPlus instance which satisfies left catch, such as Maybe, is an Alternative but not the first kind of MonadPlus. And since left catch relies on ordering, you can imagine a newtype wrapper for Maybe whose Alternative instance is right-biased instead of left-biased: a <|> Just b = Just b. This will satisfy neither left distribution nor left catch, but will be a perfectly valid Alternative.
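The split between the two families of laws shows up already on Maybe versus []. The sketch below uses two hypothetical Kleisli arrows, halveEvenM and halveEvenL, chosen so that the first alternative fails only after binding; the Boolean names are also invented for the example.

```haskell
import Control.Monad (MonadPlus (..))

-- Hypothetical Kleisli arrows: succeed only on even inputs.
halveEvenM :: Int -> Maybe Int
halveEvenM n = if even n then Just (n `div` 2) else Nothing

halveEvenL :: Int -> [Int]
halveEvenL n = if even n then [n `div` 2] else []

-- Left distribution: mplus a b >>= f  =  mplus (a >>= f) (b >>= f)
leftDistMaybe :: Bool
leftDistMaybe =
  (mplus (Just 1) (Just 2) >>= halveEvenM)
    == mplus (Just 1 >>= halveEvenM) (Just 2 >>= halveEvenM)

leftDistList :: Bool
leftDistList =
  (mplus [1] [2] >>= halveEvenL)
    == mplus ([1] >>= halveEvenL) ([2] >>= halveEvenL)

-- Left catch: mplus (return a) b  =  return a
leftCatchMaybe :: Bool
leftCatchMaybe = mplus (return 1) (Just 2) == (return 1 :: Maybe Int)

leftCatchList :: Bool
leftCatchList = mplus (return 1) [2] == (return 1 :: [Int])

-- Left zero holds for both; shown here for Maybe.
leftZeroMaybe :: Bool
leftZeroMaybe = (mzero >>= halveEvenM) == mzero
```

For Maybe, left catch holds and left distribution fails (the left side commits to Just 1, which then fails in halveEvenM, while the right side recovers with Just 1 from the second branch). For lists it is the other way round.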
However, since any type which is a MonadPlus ought to have its instance coincide with its Alternative instance (I believe this is required in the same way that it is required that ap and (<*>) are equal for Monads that are Applicatives), you could imagine defining the MonadPlus class instead as
class (Monad m, Alternative m) => MonadPlus' m
The class doesn’t need to declare new functions; it’s just a promise about the laws obeyed by empty and (<|>) for the given type. This design technique isn’t used in the Haskell standard libraries, but is used in some more mathematically-minded packages for similar purposes; for instance, the lattices package uses it to express the idea that a lattice is just a join semilattice and a meet semilattice over the same type which are linked by absorption laws.
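As a sketch of what using such a method-less class could look like: MonadPlus' is the hypothetical class from above, and guardWith is an invented helper that needs both the Monad and the Alternative structure.

```haskell
import Control.Applicative (Alternative (..))

-- A class with no methods: an instance is only a promise that the
-- Monad and Alternative structures obey the extra laws.
class (Monad m, Alternative m) => MonadPlus' m

instance MonadPlus' Maybe
instance MonadPlus' []

-- Hypothetical helper: keep only the results satisfying a predicate,
-- using (>>=) from Monad and empty from Alternative.
guardWith :: MonadPlus' m => (a -> Bool) -> m a -> m a
guardWith p ma = ma >>= \a -> if p a then pure a else empty
```

For example, guardWith even [1, 2, 3, 4] is [2,4] and guardWith even (Just 3) is Nothing. Nothing in the compiler checks the promised laws; that remains the instance author's responsibility.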
The reason you can’t do the same for Alternative, even if you wanted to guarantee that Alternative and Monoid always coincided, is because of the kind mismatch. The desired class declaration would have the form
class (Applicative f, forall a. Monoid (f a)) => Alternative''' f
but (as mentioned far above) not even GHC Haskell supports quantified constraints.
Also, note that having Alternative be a superclass of MonadPlus would require Applicative being a superclass of Monad, so good luck getting that to happen. If you run into that problem, there’s always the WrappedMonad newtype, which turns any Monad into an Applicative in the obvious way; there’s an instance MonadPlus m => Alternative (WrappedMonad m) where ... which does exactly what you’d expect.

import Data.Monoid
import Control.Applicative
Let's trace through an example of how Monoid and Alternative interact with the Maybe functor and the ZipList functor, but let's start from scratch, partly to get all the definitions fresh in our minds, partly to stop from switching tabs to bits of hackage all the time, but mainly so I can run this past ghci to correct my typos!
(<>) :: Monoid a => a -> a -> a
(<>) = mappend -- I'll be using <> freely instead of `mappend`.
Here's the Maybe clone:
data Perhaps a = Yes a | No deriving (Eq, Show)
instance Functor Perhaps where
  fmap f (Yes a) = Yes (f a)
  fmap f No = No
instance Applicative Perhaps where
  pure a = Yes a
  No <*> _ = No
  _ <*> No = No
  Yes f <*> Yes x = Yes (f x)
and now ZipList:
data Zip a = Zip [a] deriving (Eq,Show)
instance Functor Zip where
  fmap f (Zip xs) = Zip (map f xs)
instance Applicative Zip where
  Zip fs <*> Zip xs = Zip (zipWith id fs xs) -- zip them up, applying the fs to the xs
  pure a = Zip (repeat a) -- infinite so that when you zip with something, lengths don't change
Structure 1: combining elements: Monoid
Maybe clone
First let's look at Perhaps String. There are two ways of combining them. Firstly concatenation
(<++>) :: Perhaps String -> Perhaps String -> Perhaps String
Yes xs <++> Yes ys = Yes (xs ++ ys)
Yes xs <++> No = Yes xs
No <++> Yes ys = Yes ys
No <++> No = No
Concatenation works inherently at the String level, not really the Perhaps level, by treating No as if it were Yes []. It agrees with liftA2 (++) when both arguments are Yes, but liftA2 (++) would give No as soon as either argument is No. It's sensible and useful, but maybe we could generalise from just using ++ to using any way of combining - any Monoid then!
(<++>) :: Monoid a => Perhaps a -> Perhaps a -> Perhaps a
Yes xs <++> Yes ys = Yes (xs `mappend` ys)
Yes xs <++> No = Yes xs
No <++> Yes ys = Yes ys
No <++> No = No
This monoid structure for Perhaps tries to work as much as possible at the a level. Notice the Monoid a constraint, telling us we're using structure from the a level. This isn't an Alternative structure, it's a derived (lifted) Monoid structure.
instance Monoid a => Monoid (Perhaps a) where
  mappend = (<++>)
  mempty = No
Here I used the structure of the data a to add structure to the whole thing. If I were combining Sets, I'd be able to add an Ord a context instead.
ZipList clone
So how should we combine elements with a zipList? What should these zip to if we're combining them?
Zip ["HELLO","MUM","HOW","ARE","YOU?"]
<> Zip ["this", "is", "fun"]
= Zip ["HELLO" ? "this", "MUM" ? "is", "HOW" ? "fun"]
mempty = ["","","","",..] -- sensible zero element for zipping with ?
But what should we use for ?. I say the only sensible choice here is ++. Actually, for lists, (<>) = (++)
Zip [Just 1, Nothing, Just 3, Just 4]
<> Zip [Just 40, Just 70, Nothing]
= Zip [Just 1 ? Just 40, Nothing ? Just 70, Just 3 ? Nothing]
mempty = [Nothing, Nothing, Nothing, .....] -- sensible zero element
But what can we use for ? I say that we're meant to be combining elements, so we should use the element-combining operator from Monoid again: <>.
instance Monoid a => Monoid (Zip a) where
  Zip as `mappend` Zip bs = Zip (zipWith (<>) as bs) -- zipWith the internal mappend
  mempty = Zip (repeat mempty) -- repeat the internal mempty
This is the only sensible way of combining the elements using a zip - so it's the only sensible monoid instance.
Interestingly, that doesn't work for the Maybe example above, because Haskell doesn't know how to combine Ints - should it use + or *? To get a Monoid instance on numerical data, you wrap them in Sum or Product to tell it which monoid to use.
Zip [Just (Sum 1), Nothing, Just (Sum 3), Just (Sum 4)] <>
Zip [Just (Sum 40), Just (Sum 70), Nothing]
= Zip [Just (Sum 41),Just (Sum 70), Just (Sum 3)]
Zip [Product 5,Product 10,Product 15]
<> Zip [Product 3, Product 4]
= Zip [Product 15,Product 40]
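For readers on a recent compiler, here is a hedged sketch of the same Zip monoid. It assumes GHC 8.4 or later, where Semigroup is a superclass of Monoid and (<>) comes from the Prelude, so the combining operation moves into a Semigroup instance. The value names sums and products are just labels for the two worked examples above.

```haskell
import Data.Monoid (Product (..), Sum (..))

data Zip a = Zip [a] deriving (Eq, Show)

-- The combining operation lives in Semigroup on modern GHC (assumption: base >= 4.11).
instance Semigroup a => Semigroup (Zip a) where
  Zip xs <> Zip ys = Zip (zipWith (<>) xs ys) -- zipWith the internal (<>)

instance Monoid a => Monoid (Zip a) where
  mempty = Zip (repeat mempty) -- infinite list of the internal mempty

-- The Maybe (Sum Int) example: Nothing acts as the identity inside each slot.
sums :: Zip (Maybe (Sum Int))
sums =
  Zip [Just (Sum 1), Nothing, Just (Sum 3), Just (Sum 4)]
    <> Zip [Just (Sum 40), Just (Sum 70), Nothing]

-- The Product example: the result is as long as the shorter list.
products :: Zip (Product Int)
products = Zip [Product 5, Product 10, Product 15] <> Zip [Product 3, Product 4]
```

Note that mempty here is an infinite list, so printing it in GHCi will not terminate; combine it with a finite Zip first.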
Key point
Notice the fact that the type in a Monoid has kind * is exactly what allows us to put the Monoid a context here - we could also add Eq a or Ord a. In a Monoid, the raw elements matter. A Monoid instance is designed to let you manipulate and combine the data inside the structure.
Structure 2: higher-level choice: Alternative
A choice operator is similar, but also different.
Maybe clone
(<||>) :: Perhaps String -> Perhaps String -> Perhaps String
Yes xs <||> Yes ys = Yes xs -- if we can have both, choose the left one
Yes xs <||> No = Yes xs
No <||> Yes ys = Yes ys
No <||> No = No
Here there's no concatenation - we didn't use ++ at all - this combination works purely at the Perhaps level, so let's change the type signature to
(<||>) :: Perhaps a -> Perhaps a -> Perhaps a
Yes xs <||> Yes ys = Yes xs -- if we can have both, choose the left one
Yes xs <||> No = Yes xs
No <||> Yes ys = Yes ys
No <||> No = No
Notice there's no constraint - we're not using the structure from the a level, just structure at the Perhaps level. This is an Alternative structure.
instance Alternative Perhaps where
  (<|>) = (<||>)
  empty = No
ZipList clone
How should we choose between two ziplists?
Zip [1,3,4] <|> Zip [10,20,30,40] = ????
It would be very tempting to use <|> on the elements, but we can't because the type of the elements isn't available to us. Let's start with the empty. It can't use an element because we don't know the type of the elements when defining an Alternative, so it has to be Zip []. We need it to be a left (and preferably right) identity for <|>, so
Zip [] <|> Zip ys = Zip ys
Zip xs <|> Zip [] = Zip xs
There are two sensible choices for Zip [1,3,4] <|> Zip [10,20,30,40]:
Zip [1,3,4] because it's first - consistent with Maybe
Zip [10,20,30,40] because it's longest - consistent with Zip [] being discarded
Well that's easy to decide: since pure x = Zip (repeat x), both lists might be infinite, so comparing them for length might never terminate, so it has to be pick the first one. Thus the only sensible Alternative instance is:
instance Alternative Zip where
  empty = Zip []
  Zip [] <|> x = x
  Zip xs <|> _ = Zip xs
This is the only sensible Alternative we could have defined. Notice how different it is from the Monoid instance, because we couldn't mess with the elements, we couldn't even look at them.
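Putting the Zip pieces together, the following self-contained sketch can be loaded into GHCi to try the Alternative instance, including the infinite case that rules out "pick the longest". The names takeZip, firstWins, and emptySkipped are made up for this example.

```haskell
import Control.Applicative (Alternative (..))

data Zip a = Zip [a] deriving (Eq, Show)

instance Functor Zip where
  fmap f (Zip xs) = Zip (map f xs)

instance Applicative Zip where
  pure a = Zip (repeat a) -- infinite, so zipping never shortens the other list
  Zip fs <*> Zip xs = Zip (zipWith id fs xs)

instance Alternative Zip where
  empty = Zip []
  Zip [] <|> x = x      -- an empty left argument is discarded
  Zip xs <|> _ = Zip xs -- otherwise the first list wins, unexamined

-- Hypothetical helper for inspecting a possibly infinite Zip.
takeZip :: Int -> Zip a -> [a]
takeZip n (Zip xs) = take n xs

firstWins, emptySkipped :: Zip Int
firstWins = Zip [1, 3, 4] <|> Zip [10, 20, 30, 40]
emptySkipped = empty <|> Zip [10, 20]
```

Because (<|>) only checks whether the left list is empty, takeZip 3 (pure 'x' <|> Zip "abc") terminates even though the left argument is infinite.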
Key Point
Notice that because Alternative takes a constructor of kind * -> * there is no possible way to add an Ord a or Eq a or Monoid a context. An Alternative is not allowed to use any information about the data inside the structure. You cannot, no matter how much you would like to, do anything to the data, except possibly throw it away.
Key point: What's the difference between Alternative and Monoid?
Not a lot - they're both monoids, but to summarise the last two sections:
Monoid * instances make it possible to combine internal data. Alternative (* -> *) instances make it impossible. Monoid provides flexibility, Alternative provides guarantees. The kinds * and (* -> *) are the main drivers of this difference. Having them both allows you to use both sorts of operations.
This is the right thing, and our two flavours are both appropriate. The Monoid instance for Perhaps String represents putting together all characters, the Alternative instance represents a choice between Strings.
There is nothing wrong with the Monoid instance for Maybe - it's doing its job, combining data.
There's nothing wrong with the Alternative instance for Maybe - it's doing its job, choosing between things.
The Monoid instance for Zip combines its elements. The Alternative instance for Zip is forced to choose one of the lists - the first non-empty one.
It's good to be able to do both.
What's the Applicative context any use for?
There's some interaction between choosing and applying. See Antal S-Z's laws stated in his question or in the middle of his answer here.
From a practical point of view, it's useful because Alternative is something that is used for some Applicative Functors to choose. The functionality was being used for Applicatives, and so a general interface class was invented. Applicative Functors are good for representing computations that produce values (IO, Parser, Input UI element,...) and some of them have to handle failure - Alternative is needed.
Why does Alternative have empty?
why does Alternative need an empty method/member? I may be wrong, but it seems to not be used at all ... at least in the code I could find. And it seems not to fit with the theme of the class -- if I have two things, and need to pick one, what do I need an 'empty' for?
That's like asking why addition needs a 0 - if you want to add stuff, what's the point in having something that doesn't add anything? The answer is that 0 is the crucial pivotal number around which everything revolves in addition, just like 1 is crucial for multiplication, [] is crucial for lists (and y=e^x is crucial for calculus). In practical terms, you use these do-nothing elements to start your building:
sum = foldr (+) 0
concat = foldr (++) []
mconcat = foldr mappend mempty -- any Monoid
whichEverWorksFirst = foldr (<|>) empty -- any Alternative
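As a quick check of the last fold, here is a minimal sketch using Maybe as the Alternative. The value names firstHit and noHit are made up; asum from Data.Foldable is the library function that plays the same role.

```haskell
import Control.Applicative (Alternative (..))
import Data.Foldable (asum)

-- The fold from the text: empty is what you get when there is nothing to choose from.
whichEverWorksFirst :: Alternative f => [f a] -> f a
whichEverWorksFirst = foldr (<|>) empty

firstHit, noHit :: Maybe Int
firstHit = whichEverWorksFirst [Nothing, Just 3, Just 4]
noHit = whichEverWorksFirst []

-- The same choice made with the library function.
viaAsum :: Maybe Int
viaAsum = asum [Nothing, Just 3, Just 4]
```

Without empty, the empty-list case would need its own special handling, which is exactly the edge case the identity element removes.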
Can't we replace MonadPlus with Monad+Alternative?
what's the point of the MonadPlus type class? Can't I unlock all of its goodness by just using something as both a Monad and Alternative? Why not just ditch it? (I'm sure I'm wrong, but I don't have any counterexamples)
You're not wrong, there aren't any counterexamples!
Your interesting question has got Antal S-Z, Petr Pudlák and me delving into what the relationship between MonadPlus and Applicative really is. The answer, here and here, is that anything that's a MonadPlus (in the left distribution sense - follow links for details) is also an Alternative, but not the other way around.
This means that if you make an instance of Monad and MonadPlus, it satisfies the conditions for Applicative and Alternative anyway. This means if you follow the rules for MonadPlus (with left dist), you may as well have made your Monad an Applicative and used Alternative.
If we remove the MonadPlus class, though, we remove a sensible place for the rules to be documented, and you lose the ability to specify that something's Alternative without being MonadPlus (which technically we ought to have done for Maybe). These are theoretical reasons. The practical reason is that it would break existing code. (Which is also why neither Applicative nor Functor are superclasses of Monad.)
Aren't Alternative and Monoid the same? Aren't Alternative and Monoid completely different?
the 'pedia says that "the Alternative type class is for Applicative functors which also have a monoid structure." I don't get this -- doesn't Alternative mean something totally different from Monoid? i.e. I understood the point of the Alternative type class as picking between two things, whereas I understood Monoids as being about combining things.
Monoid and Alternative are two ways of getting one object from two in a sensible way. Maths doesn't care whether you're choosing, combining, mixing or blowing up your data, which is why Alternative was referred to as a Monoid for Applicative. You seem to be at home with that concept now, but you now say
for types that have both an Alternative and a Monoid instance, the instances are intended to be the same
I disagree with this, and I think my Maybe and ZipList examples are carefully explained as to why they're different. If anything, I think it should be rare that they're the same. I can only think of one example, plain lists, where this is appropriate. That's because lists are a fundamental example of a monoid with ++, but also lists are used in some contexts as an indeterminate choice of elements, so <|> should also be ++.
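The contrast can be seen directly with the standard instances. This is a small sketch assuming GHC 8.4 or later (so that (<>) is in the Prelude); the value names are made up.

```haskell
import Control.Applicative ((<|>))

-- For plain lists the two structures coincide: both are (++).
listCombined, listChosen :: [Int]
listCombined = [1, 2] <> [3]
listChosen = [1, 2] <|> [3]

-- For Maybe they differ: Monoid combines the contents, Alternative picks the first.
maybeCombined, maybeChosen :: Maybe String
maybeCombined = Just "ab" <> Just "cd"
maybeChosen = Just "ab" <|> Just "cd"
```

Here listCombined and listChosen are equal, while maybeCombined concatenates the two strings and maybeChosen keeps only the first.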

Summary
We need to define (instances that provide the same operations as) Monoid instances for some applicative functors, that genuinely combine at the applicative functor level, and not just lifting lower level monoids. The example error below from litvar = liftA2 mappend literal variable shows that <|> cannot in general be defined as liftA2 mappend; <|> works in this case by combining parsers, not their data.
If we used Monoid directly, we'd need language extensions to define the instances. Alternative is higher kinded so you can make these instances without requiring language extensions.
Example: Parsers
Let's imagine we're parsing some declarations, so we import everything we're going to need
import Text.Parsec
import Text.Parsec.String
import Control.Applicative ((<$>),(<*>),liftA2,empty)
import Data.Monoid
import Data.Char
and think about how we'll parse a type. We choose simplistic:
data Type = Literal String | Variable String deriving Show
examples = [Literal "Int",Variable "a"]
Now let's write a parser for literal types:
literal :: Parser Type
literal = fmap Literal $ (:) <$> upper <*> many alphaNum
Meaning: parse an uppercase character, then many alphaNumeric characters, combine the results into a single String with the pure function (:). Afterwards, apply the pure function Literal to turn those Strings into Types. We'll parse variable types exactly the same way, except for starting with a lowercase letter:
variable :: Parser Type
variable = fmap Variable $ (:) <$> lower <*> many alphaNum
That's great, and parseTest literal "Bool" == Literal "Bool" exactly as we'd hoped.
Question 3a: If it's to combine applicative's effects with Monoid's behavior, why not just liftA2 mappend
Edit:Oops - forgot to actually use <|>!
Now let's combine these two parsers using Alternative:
types :: Parser Type
types = literal <|> variable
This can parse any Type: parseTest types "Int" == Literal "Int" and parseTest types "a" == Variable "a".
This combines the two parsers, not the two values. That's the sense in which it works at the Applicative Functor level rather than the data level.
However, if we try:
litvar = liftA2 mappend literal variable
that would be asking the compiler to combine the two values that they generate, at the data level.
We get
No instance for (Monoid Type)
arising from a use of `mappend'
Possible fix: add an instance declaration for (Monoid Type)
In the first argument of `liftA2', namely `mappend'
In the expression: liftA2 mappend literal variable
In an equation for `litvar':
litvar = liftA2 mappend literal variable
So we found out the first thing; the Alternative class does something genuinely different to liftA2 mappend, because it combines objects at a different level - it combines the parsers, not the parsed data. If you like to think of it this way, it's combination at the genuinely higher-kind level, not merely a lift. I don't like saying it that way, because Parser Type has kind *, but it is true to say we're combining the Parsers, not the Types.
(Even for types with a Monoid instance, liftA2 mappend won't give you the same parser as <|>. If you try it on Parser String you'll get liftA2 mappend which parses one after the other then concatenates, versus <|> which will try the first parser and default to the second if it failed.)
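To see that difference without installing Parsec, here is a sketch built on a tiny hand-rolled parser. The type P and every name below are hypothetical stand-ins, not Parsec's API; the point is only to contrast liftA2 mappend (run both, concatenate) with <|> (try the first, fall back to the second).

```haskell
import Control.Applicative (Alternative (..), liftA2)
import Data.Char (isLower, isUpper)

-- Hypothetical minimal parser: consume a prefix of the input or fail.
newtype P a = P {runP :: String -> Maybe (a, String)}

instance Functor P where
  fmap f (P p) = P $ \s -> case p s of
    Nothing -> Nothing
    Just (a, rest) -> Just (f a, rest)

instance Applicative P where
  pure a = P $ \s -> Just (a, s)
  P pf <*> P pa = P $ \s -> case pf s of
    Nothing -> Nothing
    Just (f, rest) -> case pa rest of
      Nothing -> Nothing
      Just (a, rest') -> Just (f a, rest')

instance Alternative P where
  empty = P (const Nothing)
  P p <|> P q = P $ \s -> case p s of
    Nothing -> q s -- first parser failed: try the second on the same input
    result -> result

-- Parse one character satisfying a predicate.
satisfy :: (Char -> Bool) -> P Char
satisfy ok = P $ \s -> case s of
  (c : cs) | ok c -> Just (c, cs)
  _ -> Nothing

upperRun, lowerRun :: P String
upperRun = some (satisfy isUpper)
lowerRun = some (satisfy isLower)

-- Choice between the parsers versus sequencing them and combining their results.
chosen, sequenced :: P String
chosen = upperRun <|> lowerRun
sequenced = liftA2 mappend upperRun lowerRun
```

On the input "ABcd", chosen stops after the uppercase run and leaves "cd" unconsumed, while sequenced consumes everything and returns the concatenation. On "cd", chosen succeeds and sequenced fails.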
Question 3b: In what way does Alternative's <|> :: f a -> f a -> f a differ from Monoid's mappend :: b -> b -> b?
Firstly, you're right to note that it doesn't provide new functionality over a Monoid instance.
Secondly, however, there's an issue with using Monoid directly:
Let's try to use mappend on parsers, at the same time as showing it's the same structure as Alternative:
instance Monoid (Parser a) where
  mempty = empty
  mappend = (<|>)
Oops! We get
Illegal instance declaration for `Monoid (Parser a)'
(All instance types must be of the form (T t1 ... tn)
where T is not a synonym.
Use -XTypeSynonymInstances if you want to disable this.)
In the instance declaration for `Monoid (Parser a)'
So if you have an applicative functor f, the Alternative instance shows that f a is a monoid, but you could only declare that as a Monoid with a language extension.
Once we add {-# LANGUAGE TypeSynonymInstances #-} at the top of the file, we're fine and can define
typeParser = literal `mappend` variable
and to our delight, it works: parseTest typeParser "Yes" == Literal "Yes" and parseTest typeParser "a" == Variable "a".
Even if you don't have any synonyms (Parser and String are synonyms, so they're out), you'll still need {-# LANGUAGE FlexibleInstances #-} to define an instance like this one:
data MyMaybe a = MyJust a | MyNothing deriving Show
instance Monoid (MyMaybe Int) where
  mempty = MyNothing
  mappend MyNothing x = x
  mappend x MyNothing = x
  mappend (MyJust a) (MyJust b) = MyJust (a + b)
(The monoid instance for Maybe gets around this by lifting the underlying monoid.)
Making a standard library unnecessarily dependent on language extensions is clearly undesirable.
So there you have it. Alternative is just Monoid for Applicative Functors (and isn't just a lift of a Monoid). It needs the higher-kinded type f a -> f a -> f a so you can define one without language extensions.
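One way to see the "Alternative is a monoid on f a" claim in code, without any language extensions, is the Alt newtype from Data.Monoid in base, whose Monoid instance is defined in terms of Alternative. The sketch below assumes a reasonably recent base; firstJust and joined are made-up names.

```haskell
import Data.Monoid (Alt (..))

-- Wrapping each Maybe in Alt makes foldMap combine them with (<|>),
-- starting from empty.
firstJust :: [Maybe Int] -> Maybe Int
firstJust = getAlt . foldMap Alt

-- For lists, the Alternative-derived monoid is concatenation.
joined :: [Int]
joined = getAlt (Alt [1, 2] <> Alt [3])
```

The wrapper is what lets one general Monoid instance be written for every Alternative at once, which a direct instance on f a could not do without extensions.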
Your other Questions, for completeness:
Why does Alternative need an empty method/member?
Because having an identity for an operation is sometimes useful.
For example, you can define anyA = foldr (<|>) empty without using tedious edge cases.
what's the point of the MonadPlus type class? Can't I unlock all of its goodness by just using something as both a Monad and Alternative?
No. I refer you back to the question you linked to:
Moreover, even if Applicative was a superclass of Monad, you'd wind up needing the MonadPlus class anyways, because obeying empty <*> m = empty isn't strictly enough to prove that empty >>= f = empty.
....and I've come up with an example: Maybe. I explain in detail, with proof in this answer to Antal's question. For the purposes of this answer, it's worth noting that I was able to use >>= to make the MonadPlus instance that broke the Alternative laws.
Monoid structure is useful. Alternative is the best way of providing it for Applicative Functors.

I won't cover MonadPlus because there is disagreement about its laws.
After trying and failing to find any meaningful examples in which the structure of an Applicative leads naturally to an Alternative instance that disagrees with its Monoid instance*, I finally came up with this:
Alternative's laws are more strict than Monoid's, because the result cannot depend on the inner type. This excludes a large number of Monoid instances from being Alternatives.
These datatypes allow partial (meaning that they only work for some inner types) Monoid instances which are forbidden by the extra 'structure' of the * -> * kind. Examples:
the standard Maybe instance for Monoid assumes that the inner type is Monoid => not an Alternative
ZipLists, tuples, and functions can all be made Monoids, if their inner types are Monoids => not Alternatives
sequences that have at least one element -- cannot be Alternatives because there's no empty:
data Seq a
  = End a
  | Cons a (Seq a)
  deriving (Show, Eq, Ord)
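To illustrate the last item, the non-empty sequence type does admit an associative append, but there is no value that could serve as its identity. A hedged sketch, assuming GHC 8.4 or later for the Semigroup class in the Prelude (the instance and the value name appended are not from the original answer):

```haskell
data Seq a
  = End a
  | Cons a (Seq a)
  deriving (Show, Eq, Ord)

-- Appending two non-empty sequences always gives a non-empty sequence.
instance Semigroup (Seq a) where
  End a <> ys = Cons a ys
  Cons a xs <> ys = Cons a (xs <> ys)

appended :: Seq Int
appended = End 1 <> Cons 2 (End 3)
```

Every Seq value contains at least one element, so no candidate for mempty (or for Alternative's empty) exists that would leave the other argument unchanged.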
On the other hand, some data types cannot be made Alternatives because they're *-kinded:
unit -- ()
Ordering
numbers, booleans
My inferred conclusion: for types that have both an Alternative and a Monoid instance, the instances are intended to be the same. See also this answer.
* excluding Maybe, which I argue doesn't count because its standard instance should not require Monoid for the inner type, in which case it would be identical to Alternative

I understood the point of the Alternative type class as picking between two things, whereas I understood Monoids as being about combining things.
If you think about this for a moment, they are the same.
The + combines things (usually numbers), and its type signature is Int -> Int -> Int (or whatever).
The <|> operator selects between alternatives, and its type signature has the same shape: take two matching things and return a combined thing.
