What is the point of having a lazy/strict version of Writer? - haskell

Why are there two different Writer-type monads in Haskell? Intuitively to me, reading "strict writer monad" means that the <> is strict, so that there's no thunk buildup in the log. However, looking at the source code, it turns out that that isn't the case:
-- Lazy Writer
instance (Monoid w, Monad m) => Monad (WriterT w m) where
    -- ...
    m >>= k = WriterT $ do
        ~(a, w) <- runWriterT m
        ~(b, w') <- runWriterT (k a)
        return (b, w <> w')
In the strict version the patterns aren't irrefutable, i.e. the ~ are missing. So what happens above is that m and k a are not evaluated, but stored as thunks. In the strict version, they are evaluated to check whether they match the tuple patterns, and the result is fed to <>. In both cases, the >>= isn't evaluated until something actually demands the resulting value.
So the way I understand it is that both the lazy and strict versions do the same thing, except that they have the thunk in a different place inside the definition of >>=: lazy produces runWriterT thunks, strict produces <> thunks.
This leaves me with two questions:
Is the above right, or do I misunderstand evaluation here?
Can I accomplish strict <> without writing my own wrapper and instance?

Your first observation is correct, but this distinction between which thunks get created is important.
Lazy and Strict aren't about the strictness in the log type, but instead about the strictness in the pair.
These arise because a pair in Haskell has two possible ways to update it.
bimap f g (a,b) = (f a, g b)
or
bimap f g ~(a,b) = (f a, g b)
The latter is the same as
bimap f g p = (f (fst p), g (snd p))
The difference between these two is that when you pass the args to bimap in the first case, the pair is forced immediately.
In the latter case the pair is not immediately forced, but I instead hand you a (,) back filled with two non-strict computations.
This means that
fmap f _|_ = _|_
in the first case but
fmap f _|_ = (_|_, _|_)
in the second lazier pair case!
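A minimal sketch of the two definitions side by side, assuming only base. The names bimapStrict, bimapLazy and isPair are made up for this illustration, and undefined stands in for _|_. With the lazy version the outer (,) constructor is available even though the argument is bottom; with the strict version, evaluating the result would hit the bottom instead.

```haskell
-- Strict in the pair: the pattern match forces the argument to a (,) constructor.
bimapStrict :: (a -> c) -> (b -> d) -> (a, b) -> (c, d)
bimapStrict f g (a, b) = (f a, g b)

-- Lazy in the pair: the irrefutable pattern defers the match until a or b is used.
bimapLazy :: (a -> c) -> (b -> d) -> (a, b) -> (c, d)
bimapLazy f g ~(a, b) = (f a, g b)

-- Forces only the outer constructor of its argument, never the components.
isPair :: (a, b) -> Bool
isPair (_, _) = True

-- The lazy version hands back a (,) even when given bottom.
lazyResultIsPair :: Bool
lazyResultIsPair = isPair (bimapLazy id id (undefined :: (Int, Int)))
```

Replacing bimapLazy with bimapStrict in lazyResultIsPair would make its evaluation throw, because the pattern (a, b) has to inspect undefined first.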
Both are correct under different interpretations of the concept of a pair. One is forced on you by pretending a pair is a pair in the categorical sense, that it doesn't have any interesting _|_'s in its own right. On the other hand, interpreting the domain as being as non-strict as possible, so that as many programs as possible terminate, pushes you to the Lazy version.
(,) e is a perfectly admissible Writer, so this characterizes the problem.
The reason the distinction is made is that it matters for the termination of many exotic programs that take a fixed point through the monad. You can answer questions about certain circular programs involving state or writer, so long as they are Lazy.
Note, in neither case is this strict in the 'log' argument. Once you incur strictness in that you lose proper associativity and cease technically to be a Monad. =/
Because this isn't a monad, we don't supply it in the mtl!
With that, we can address your second question:
There are some workarounds though. You can construct a fake Writer on top of State. Basically, pretend you aren't handed a state argument and just mappend into the state as you would tell. Now you can do this strictly, because it isn't happening behind your back as part of every bind. The State is just passing the state through unmodified between actions.
shout :: (Monad m, Monoid s) => s -> Strict.StateT s m ()
shout s' = do
    s <- get
    put $! s <> s'
This does, however, mean that you force your entire State monad to get the output, and cannot produce parts of the Monoid lazily, but you get something that is operationally closer to what a strict programmer would expect. Interestingly, this works even with just a Semigroup, because the only use of mempty is effectively at the start when you runState.
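For a complete module to load, here is a hedged sketch of shout in use. It assumes the transformers package (Control.Monad.Trans.State.Strict) rather than mtl, and the helper totalTo is a hypothetical name introduced only for the example. The constraint is written as Semigroup to reflect the remark that mempty is needed only for the initial state.

```haskell
import qualified Control.Monad.Trans.State.Strict as Strict
import Data.Monoid (Sum (..))

-- Append to the "log" held in the state, forcing the combined value to WHNF.
shout :: (Monad m, Semigroup s) => s -> Strict.StateT s m ()
shout s' = do
  s <- Strict.get
  Strict.put $! s <> s'

-- Hypothetical helper: log every number from 1 to n as a Sum and read off the total.
-- mempty appears only here, as the initial state.
totalTo :: Int -> Int
totalTo n = getSum (Strict.execState (mapM_ (shout . Sum) [1 .. n]) mempty)
```

Note that $! only forces the accumulated value to weak head normal form, which is enough for a flat accumulator such as Sum Int but not for a log that is itself a lazy structure.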

Related

Order of execution with Haskell's `mapM`

Consider the following Haskell statement:
mapM print ["1", "2", "3"]
Indeed, this prints "1", "2", and "3" in order.
Question: How do you know that mapM will first print "1", and then print "2", and finally print "3". Is there any guarantee that it will do this? Or is it a coincidence of how it is implemented deep within GHC?
If you evaluate mapM print ["1", "2", "3"] by expanding the definition of mapM you will arrive at (ignoring some irrelevant details)
print "1" >> print "2" >> print "3"
You can think of print and >> as abstract constructors of IO actions that cannot be evaluated any further, just as a data constructor like Just cannot be evaluated any further.
The interpretation of print s is the action of printing s, and the interpretation of a >> b is the action that first performs a and then performs b. So, the interpretation of
mapM print ["1", "2", "3"] = print "1" >> print "2" >> print "3"
is to first print 1, then print 2, and finally print 3.
How this is actually implemented in GHC is entirely a different matter which you shouldn't worry about for a long time.
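One way to observe the order of effects without a terminal is to swap IO for the pair monad ((,) [String]) from base, whose only effect is appending to a log. This is a sketch, not what print does; logPrint is a hypothetical stand-in invented for the example.

```haskell
-- Hypothetical stand-in for print: records its argument in a log instead of doing IO.
logPrint :: String -> ([String], ())
logPrint s = ([s], ())

-- The log produced by mapM, read off the first component of the pair.
loggedOrder :: [String]
loggedOrder = fst (mapM logPrint ["1", "2", "3"])

-- The log produced by the hand-expanded chain of (>>).
loggedOrderExpanded :: [String]
loggedOrderExpanded = fst (logPrint "1" >> logPrint "2" >> logPrint "3")
```

Both logs come out as ["1", "2", "3"], which matches the reading of a >> b as "first a, then b" regardless of how the thunks happen to be evaluated.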
There is no guarantee on the order of the evaluation but there is a guarantee on the order of the effects. For more information see this answer that discusses forM.
You need to learn to make the following, tricky distinction:
The order of evaluation
The order of effects (a.k.a. "actions")
What forM, sequence and similar functions promise is that the effects will be ordered from left to right. So for example, the following is guaranteed to print characters in the same order that they occur in the string...
Note: "forM is mapM with its arguments flipped. For a version that ignores the results see forM_."
Preliminary note: The answers by Reid Barton and Dair are entirely correct and fully cover your practical concerns. I mention that because partway through this answer one might have the impression that it contradicts them, which is not the case, as will be clear by the time we get to the end. That being clear, it is time to indulge in some language lawyering.
Is there any guarantee that [mapM print] will [print the list elements in order]?
Yes, there is, as explained by the other answers. Here, I will discuss what might justify this guarantee.
In this day and age, mapM is, by default, merely traverse specialised to monads:
traverse
    :: (Traversable t, Applicative f) => (a -> f b) -> t a -> f (t b)
mapM
    :: (Traversable t, Monad m) => (a -> m b) -> t a -> m (t b)
That being so, in what follows I will be primarily concerned with traverse, and how our expectations about the sequencing of effects relate to the Traversable class.
As far as the production of effects is concerned, traverse generates an Applicative effect for each value in the traversed container and combines all such effects through the relevant Applicative instance. This second part is clearly reflected by the type of sequenceA, through which the applicative context is, so to say, factored out of the container:
sequenceA :: (Traversable t, Applicative f) => t (f a) -> f (t a)
-- sequenceA and traverse are interrelated by:
traverse f = sequenceA . fmap f
sequenceA = traverse id
The Traversable instance for lists, for example, is:
instance Traversable [] where
    {-# INLINE traverse #-} -- so that traverse can fuse
    traverse f = List.foldr cons_f (pure [])
      where cons_f x ys = (:) <$> f x <*> ys
It is plain to see that the combining, and therefore the sequencing, of effects is done through (<*>), so let's focus on it for a moment. Picking the IO applicative functor as an illustrative example, we can see (<*>) sequencing effects from left to right:
GHCi> -- Superfluous parentheses added for emphasis.
GHCi> ((putStrLn "Type something:" >> return reverse) <*> getLine) >>= putStrLn
Type something:
Whatever
revetahW
(<*>), however, sequences effects from left-to-right by convention, and not for any inherent reason. As witnessed by the Backwards wrapper from transformers, it is, in principle, always possible to implement (<*>) with right-to-left sequencing and still get a lawful Applicative instance. Without using the wrapper, it is also possible to take advantage of (<**>) from Control.Applicative to invert the sequencing:
(<**>) :: Applicative f => f a -> f (a -> b) -> f b
GHCi> import Control.Applicative
GHCi> (getLine <**> (putStrLn "Type something:" >> return reverse)) >>= putStrLn
Whatever
Type something:
revetahW
Given that it is so easy to flip the sequencing of Applicative effects, one might wonder whether this trick might transfer to Traversable. For instance, let's say we implement...
esrevart :: Applicative f => (a -> f b) -> [a] -> f [b]
... so that it is just like traverse for lists save for using Backwards or (<**>) to flip the sequencing of effects (I will leave that as an exercise for the reader). Would esrevart be a legal implementation of traverse? While we might figure it out by trying to prove the identity and composition laws of Traversable hold, that is actually not necessary: given that Backwards f for any applicative f is also applicative, an esrevart patterned after any lawful traverse will also follow the Traversable laws. The Reverse wrapper, also part of transformers, offers a general implementation of this reversal.
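For readers who would rather check their answer to the exercise, here is one possible esrevart, sketched with the Backwards wrapper from transformers. The logging function logAndDouble and the pair monad ((,) [Int]) are assumptions made for the demonstration, chosen so that the order of effects shows up as the order of a list.

```haskell
import Control.Applicative.Backwards (Backwards (..))

-- One possible esrevart: run the ordinary list traverse in the Backwards applicative.
esrevart :: Applicative f => (a -> f b) -> [a] -> f [b]
esrevart f = forwards . traverse (Backwards . f)

-- Hypothetical effectful function: logs its argument and returns it doubled.
logAndDouble :: Int -> ([Int], Int)
logAndDouble x = ([x], x * 2)

-- First component: the log of effects. Second component: the traversed results.
forwardRun, backwardRun :: ([Int], [Int])
forwardRun = traverse logAndDouble [1, 2, 3]
backwardRun = esrevart logAndDouble [1, 2, 3]
```

Only the log is reversed: forwardRun is ([1,2,3],[2,4,6]) while backwardRun is ([3,2,1],[2,4,6]), so the shape and order of the results are unaffected.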
We have thus concluded that there can be legal Traversable instances that differ in the sequencing of effects. In particular, a list traverse that sequences effects from tail to head is conceivable. That doesn't make the possibility any less strange, though. To avoid utter bewilderment, Traversable instances are conventionally implemented with plain (<*>) and following the natural order in which the constructors are used to build the traversable container, which in the case of lists amounts to the expected head-to-tail sequencing of effects. One place where this convention shows up is in the automatic generation of instances by the DeriveTraversable extension.
A final, historical note. Couching this discussion, which is ultimately about mapM, in terms of the Traversable class would be a move of dubious relevance in a not so distant past. mapM was effectively subsumed by traverse only last year, but it has existed for much longer. For instance, the Haskell Report 1.3 from 1996, years before Applicative and Traversable came into being (not even ap is there, in fact), provides the following specification for mapM:
accumulate :: Monad m => [m a] -> m [a]
accumulate = foldr mcons (return [])
    where mcons p q = p >>= \x -> q >>= \y -> return (x:y)
mapM :: Monad m => (a -> m b) -> [a] -> m [b]
mapM f as = accumulate (map f as)
The sequencing of effects, here enforced through (>>=), is left-to-right, for no other reason than it being the sensible thing to do.
P.S.: It is worth emphasising that, while it is possible to write a right-to-left mapM in terms of the Monad operations (in the Report 1.3 implementation quoted here, for instance, it merely requires exchanging p and q in the right-hand side of mcons), there is no such thing as a general Backwards for monads. Since f in x >>= f is a Monad m => a -> m b function which creates effects from values, the effects associated with f depend on x. As a consequence, a simple inversion of sequencing like that possible with (<*>) is not even guaranteed to be meaningful, let alone lawful.
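The exchange of p and q mentioned above can be sketched as follows. The Report 1.3 definitions are renamed (mapMLtoR, mapMRtoL, accumulateRtoL are names invented here) to avoid clashing with the Prelude, and the pair monad with the hypothetical note function is used only to make the order of effects visible.

```haskell
-- Report 1.3 style: the effect of p runs before the effect of q.
accumulate :: Monad m => [m a] -> m [a]
accumulate = foldr mcons (return [])
  where mcons p q = p >>= \x -> q >>= \y -> return (x : y)

-- Same definition with p and q exchanged on the right-hand side of mcons.
accumulateRtoL :: Monad m => [m a] -> m [a]
accumulateRtoL = foldr mcons (return [])
  where mcons p q = q >>= \y -> p >>= \x -> return (x : y)

mapMLtoR, mapMRtoL :: Monad m => (a -> m b) -> [a] -> m [b]
mapMLtoR f as = accumulate (map f as)
mapMRtoL f as = accumulateRtoL (map f as)

-- Hypothetical effect: log the argument and return it unchanged.
note :: Int -> ([Int], Int)
note x = ([x], x)
```

Here mapMLtoR note [1, 2, 3] logs 1, 2, 3 while mapMRtoL note [1, 2, 3] logs 3, 2, 1; both return the results in the original order.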

Are Monad instances uniquely determined by their Applicative instances? [duplicate]

As described in this question and its answers, Functor instances are uniquely determined, if they exist.
For lists, there are two well-known Applicative instances: [] and ZipList. So Applicative isn't unique (see also Can GHC derive Functor and Applicative instances for a monad transformer? and Why is there no -XDeriveApplicative extension?). However, ZipList needs infinite lists, as its pure repeats a given element indefinitely.
Are there other, perhaps better examples of data structures that have at least two Applicative instances?
Are there any such examples that only involve finite data structures? That is, like if hypothetically Haskell's type system distinguished inductive and coinductive data types, would it be possible to uniquely determine Applicative?
Going further, if we could extend both [] and ZipList to a Monad, we'd have an example where a monad isn't uniquely determined by the data type and its Functor. Alas, ZipList has a Monad instance only if we restrict ourselves to infinite lists (streams).
And return for [] creates a single-element list, so it requires finite lists. Therefore:
Are Monad instances uniquely determined by the data type? Or is there an example of a data type that can have two distinct Monad instances?
In the case there is an example with two or more distinct instances, an obvious question arises, if they must/can have the same Applicative instance:
Are Monad instances uniquely determined by the Applicative instance, or is there an example of an Applicative that can have two distinct Monad instances?
Is there an example of a data type with two distinct Monad instances, each having a different Applicative super-instance?
And finally we can ask the same question for Alternative/MonadPlus. This is complicated by the fact that there are two distinct sets of MonadPlus laws. Assuming we accept one of the sets of laws (and for Applicative we accept right/left distributivity/absorption, see also this question),
is Alternative uniquely determined by Applicative, and MonadPlus by Monad, or are there any counter-examples?
If any of the above are unique, I'd be interested in knowing why, to have a hint of a proof. If not, a counter-example.
First, since Monoids are not unique, neither are Writer Monads or Applicatives. Consider
data M a = M Int a
then you can give it Applicative and Monad instances isomorphic to either of:
Writer (Sum Int)
Writer (Product Int)
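Since one type can carry only one instance of a class, a sketch of this has to use two copies of M. The names MSum and MProd are invented for the illustration; each is the same data as M Int a, with the Int combined by addition in one and by multiplication in the other.

```haskell
-- Two copies of "data M a = M Int a", one per monoid on Int.
data MSum a = MSum Int a deriving (Eq, Show)
data MProd a = MProd Int a deriving (Eq, Show)

instance Functor MSum where
  fmap f (MSum n a) = MSum n (f a)

instance Applicative MSum where
  pure = MSum 0
  MSum m f <*> MSum n a = MSum (m + n) (f a)

-- Behaves like Writer (Sum Int): the Int components are added.
instance Monad MSum where
  MSum m a >>= k = case k a of
    MSum n b -> MSum (m + n) b

instance Functor MProd where
  fmap f (MProd n a) = MProd n (f a)

instance Applicative MProd where
  pure = MProd 1
  MProd m f <*> MProd n a = MProd (m * n) (f a)

-- Behaves like Writer (Product Int): the Int components are multiplied.
instance Monad MProd where
  MProd m a >>= k = case k a of
    MProd n b -> MProd (m * n) b
```

Binding 2-tagged and 3-tagged values gives a tag of 5 in MSum and 6 in MProd, so the two structures are observably different on the same underlying data.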
Given a Monoid instance for a type s, another isomorphic pair with different Applicative/Monad instances is:
ReaderT s (Writer s)
State s
As for having one Applicative instance extend to two different Monads, I cannot remember any example. However, back when I tried to convince myself completely about whether ZipList really cannot be made a Monad, I found the following pretty strong restriction that holds for any Monad:
join (fmap (\x -> fmap (\y -> f x y) ys) xs) = f <$> xs <*> ys
That doesn't give join for all values though: in the case of lists the restricted values are the ones where all elements have the same length, i.e. lists of lists with "rectangular" shape.
(For Reader monads, where the "shape" of monadic values doesn't vary, these are in fact all the m (m x) values, so those do have unique extension. EDIT: Come to think of it, Either, Maybe and Writer also have only "rectangular" m (m x) values, so their extension from Applicative to Monad is also unique.)
I wouldn't be surprised if an Applicative with two Monads exists, though.
For Alternative/MonadPlus, I cannot recall any law that would stop you, for instances using the Left Distribution law instead of Left Catch, from just swapping (<|>) with flip (<|>). I don't know if there's a less trivial variation.
ADDENDUM: I suddenly remembered I had found an example of an Applicative with two Monads. Namely, finite lists. There's the usual Monad [] instance, but you can then replace its join by the following function (essentially making empty lists "infectious"):
ljoin xs
    | any null xs = []
    | otherwise = concat xs
(Alas, the lists need to be finite because otherwise the null check will never finish, and that would ruin the join . fmap return == id monad law.)
This has the same value as join/concat on rectangular lists of lists, so will give the same Applicative. As I recall, it turns out that the first two monad laws are automatic from that, and you just need to check ljoin . ljoin == ljoin . fmap ljoin.
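A small sketch of the claim, on finite lists only. The helper apVia is a made-up name: it builds the applicative combination from a given join using the restriction stated earlier, so it can be instantiated with either concat or ljoin. This checks a couple of sample values and is not a proof of the monad laws.

```haskell
-- The alternative join: any empty inner list empties the whole result.
ljoin :: [[a]] -> [a]
ljoin xs
  | any null xs = []
  | otherwise = concat xs

-- Hypothetical helper: the applicative combination induced by a given join,
-- following join (fmap (\x -> fmap (\y -> f x y) ys) xs) = f <$> xs <*> ys.
apVia :: ([[c]] -> [c]) -> (a -> b -> c) -> [a] -> [b] -> [c]
apVia joinFn f xs ys = joinFn (fmap (\x -> fmap (\y -> f x y) ys) xs)

-- On the rectangular list of lists built by apVia, both joins agree.
rectangular :: Bool
rectangular = apVia ljoin (,) [1 :: Int, 2] "ab" == apVia concat (,) [1, 2] "ab"

-- A non-rectangular input on which the two joins differ.
ragged :: [[Int]]
ragged = [[1], [], [2]]
```

On ragged, concat gives [1, 2] while ljoin gives [], so the two joins are distinct even though they induce the same (<*>).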
Given that every Applicative has a Backwards counterpart,
newtype Backwards f x = Backwards {backwards :: f x}
instance Applicative f => Applicative (Backwards f) where
    pure x = Backwards (pure x)
    Backwards ff <*> Backwards fs = Backwards (flip ($) <$> fs <*> ff)
it's unusual for Applicative to be uniquely determined, just as (and this is very far from unrelated) many sets extend to monoids in multiple ways.
In this answer, I set the exercise of finding at least four distinct valid Applicative instances for nonempty lists: I won't spoil it here, but I will give a big hint on how to hunt.
Meanwhile, in some wonderful recent work (which I saw at a summer school a few months ago), Tarmo Uustalu showed a rather neat way to get a handle on this problem, at least when the underlying functor is a container, in the sense of Abbott, Altenkirch and Ghani.
Warning: Dependent types ahead!
What is a container? If you have dependent types to hand, you can present container-like functors F uniformly, as being determined by two components
a set of shapes, S : Set
an S-indexed set of positions, P : S -> Set
Up to isomorphism, container data structures in F X are given by the dependent pair of some shape s : S, and some function e : P s -> X, which tells you the element located at each position. That is, we define the extension of a container
(S <| P) X = (s : S) * (P s -> X)
(which, by the way, looks a lot like a generalized power series if you read -> as reversed exponentiation). The triangle is supposed to remind you of a tree node sideways, with an element s : S labelling the apex, and the baseline representing the position set P s. We say that some functor is a container if it is isomorphic to some S <| P.
In Haskell, you can easily take S = F (), but constructing P can take quite a bit of type-hackery. But that is something you can try at home. You'll find that containers are closed under all the usual polynomial type-forming operations, as well as identity,
Id ~= () <| \ _ -> ()
composition, where a whole shape is made from just one outer shape and an inner shape for each outer position,
(S0 <| P0) . (S1 <| P1) ~= ((S0 <| P0) S1) <| \ (s0, e0) -> (p0 : P0 s0, P1 (e0 p0))
and some other things, notably the tensor, where there is one outer and one inner shape (so "outer" and "inner" are interchangeable)
(S0 <| P0) (X) (S1 <| P1) = ((S0, S1) <| \ (s0, s1) -> (P0 s0, P1 s1))
so that F (X) G means "F-structures of G-structures-all-the-same-shape", e.g., [] (X) [] means rectangular lists-of-lists. But I digress.
Polymorphic functions between containers. Every polymorphic function
m : forall X. (S0 <| P0) X -> (S1 <| P1) X
can be implemented by a container morphism, constructed from two components in a very particular way.
a function f : S0 -> S1 mapping input shapes to output shapes;
a function g : (s0 : S0) -> P1 (f s0) -> P0 s0 mapping output positions to input positions.
Our polymorphic function is then
\ (s0, e0) -> (f s0, e0 . g s0)
where the output shape is computed from the input shape, then the output positions are filled up by picking elements from input positions.
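Without dependent types, the list container can only be approximated in Haskell. The sketch below is such an approximation under stated assumptions: shapes are lengths, positions are plain Int indices (so P s is not enforced by the type checker and out-of-range positions are simply never asked for), and all names (ListC, fromList, toListC, morph, reverseC, tailC) are invented for the example.

```haskell
-- Approximate list container: a shape (length) and a function from positions to elements.
data ListC a = ListC Int (Int -> a)

fromList :: [a] -> ListC a
fromList xs = ListC (length xs) (xs !!)

toListC :: ListC a -> [a]
toListC (ListC n e) = map e [0 .. n - 1]

-- A container morphism: f maps input shapes to output shapes,
-- g maps output positions back to input positions.
morph :: (Int -> Int) -> (Int -> Int -> Int) -> ListC a -> ListC a
morph f g (ListC s e) = ListC (f s) (e . g s)

-- reverse keeps the shape and reads position i from position n - 1 - i.
reverseC :: ListC a -> ListC a
reverseC = morph id (\n i -> n - 1 - i)

-- A total tail: one fewer position (never negative), each read from one step to the right.
tailC :: ListC a -> ListC a
tailC = morph (\n -> max 0 (n - 1)) (\_ i -> i + 1)
```

For instance, toListC (reverseC (fromList "abc")) is "cba" and toListC (tailC (fromList "abc")) is "bc"; neither morphism ever inspects an element, which is the sense in which they are polymorphic.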
(If you're Peter Hancock, you have a whole other metaphor for what's going on. Shapes are Commands; Positions are Responses; a container morphism is a device driver, translating commands one way, then responses the other.)
Every container morphism gives you a polymorphic function, but the reverse is also true. Given such an m, we may take
(f s, g s) = m (s, id)
That is, we have a representation theorem, saying that every polymorphic function between two containers is given by such an f, g-pair.
What about Applicative? We kind of got a bit lost along the way, building all this machinery. But it has been worth it. When the underlying functors for monads and applicatives are containers, the polymorphic functions pure and <*>, return and join must be representable by the relevant notion of container morphism.
Let's take applicatives first, using their monoidal presentation. We need
unit : () -> (S <| P) ()
mult : forall X, Y. ((S <| P) X, (S <| P) Y) -> (S <| P) (X, Y)
The left-to-right maps for shapes require us to deliver
unitS : () -> S
multS : (S, S) -> S
so it looks like we might need a monoid. And when you check the applicative laws, you find we need exactly a monoid. Equipping a container with applicative structure is exactly refining the monoid structures on its shapes with suitable position-respecting operations. There's nothing to do for unit (because there is no choice of source position), but for mult, we need that whenever
multS (s0, s1) = s
we have
multP (s0, s1) : P s -> (P s0, P s1)
satisfying appropriate identity and associativity conditions. If we switch to Hancock's interpretation, we're defining a monoid (skip, semicolon) for commands, where there is no way to look at the response to the first command before choosing the second, like commands are a deck of punch cards. We have to be able to chop up responses to combined commands into the individual responses to the individual commands.
So, every monoid on the shapes gives us a potential applicative structure. For lists, shapes are numbers (lengths), and there are a great many monoids from which to choose. Even if shapes live in Bool, we have quite a bit of choice.
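Two of these monoids on lengths can be seen in base, as a quick sketch: the usual list applicative multiplies lengths, while ZipList takes their minimum. The function names shapeOfProduct and shapeOfZip are made up for the illustration.

```haskell
import Control.Applicative (ZipList (..), liftA2)

-- Shape of the combination under the usual list applicative: lengths multiply.
shapeOfProduct :: [a] -> [b] -> Int
shapeOfProduct xs ys = length (liftA2 (,) xs ys)

-- Shape of the combination under ZipList: lengths combine by minimum.
shapeOfZip :: [a] -> [b] -> Int
shapeOfZip xs ys = length (getZipList (liftA2 (,) (ZipList xs) (ZipList ys)))
```

With inputs of length 3 and 2, shapeOfProduct gives 6 and shapeOfZip gives 2. The unit shapes differ accordingly: length 1 for multiplication, and an infinite list for minimum, which is why ZipList's pure repeats its argument forever.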
What about Monad? Meanwhile, for monads M with M ~= S <| P, we need
return : Id -> M
join : M . M -> M
Looking at shapes first, that means we need a sort-of lopsided monoid.
return_f : () -> S
join_f : (S <| P) S -> S -- (s : S, P s -> S) -> S
It's lopsided because we get a bunch of shapes on the right, not just one. If we switch to Hancock's interpretation, we're defining a kind of sequential composition for commands, where we do let the second command be chosen on the basis of the first response, like we're interacting at a teletype. More geometrically, we're explaining how to glom two layers of a tree into one. It would be very surprising if such compositions were unique.
Again, for the positions, we have to map single output positions to pairs in a coherent way. This is trickier for monads: we first choose an outer position (response), then we have to choose an inner position (response) appropriate to the shape (command) found at the first position (chosen after the first response).
I'd love to link to Tarmo's work for the details, but it doesn't seem to have hit the streets yet. He has actually used this analysis to enumerate all possible monad structures for several choices of underlying container. I'm looking forward to the paper!
Edit. By way of doing honour to the other answer, I should observe that when everywhere P s = (), then (S <| P) X ~= (S, X) and the monad/applicative structures coincide exactly with each other and with the monoid structures on S. That is, for writer monads, we need only choose the shape-level operations, because there is exactly one position for a value in every case.

To what extent are Applicative/Monad instances uniquely determined?

As described this question/answers, Functor instances are uniquely determined, if they exists.
For lists, there are two well know Applicative instances: [] and ZipList. So Applicative isn't unique (see also Can GHC derive Functor and Applicative instances for a monad transformer? and Why is there no -XDeriveApplicative extension?). However, ZipList needs infinite lists, as its pure repeats a given element indefinitely.
Are there other, perhaps better examples of data structures that have at least two Applicative instances?
Are there any such examples that only involve finite data structures? That is, like if hypothetically Haskell's type system distinguished inductive and coinductive data types, would it be possible to uniquely determine Applicative?
Going further, if we could extend both [] and ZipList to a Monad, we'd have an example where a monad isn't uniquely determined by the data type and its Functor. Alas, ZipList has a Monad instance only if we restrict ourselves to infinite lists (streams).
And return for [] creates a single-element list, so it requires finite lists. Therefore:
Are Monad instances uniquely determined by the data type? Or is there an example of a data type that can have two distinct Monad instances?
In the case there is an example with two or more distinct instances, an obvious question arises, if they must/can have the same Applicative instance:
Are Monad instances uniquely determined by the Applicative instance, or is there an example of an Applicative that can have two distinct Monad instances?
Is there an example of a data type with two distinct Monad instances, each having a different Applicative super-instance?
And finally we can ask the same question for Alternative/MonadPlus. This is complicated by the fact that there are two distinct set of MonadPlus laws. Assuming we accept one of the set of laws (and for Applicative we accept right/left distributivity/absorption, see also this question),
is Alternative uniquely determined by Applicative, and MonadPlus by Monad, or are there any counter-examples?
If any of the above are unique, I'd be interested in knowing why, to have a hint of a proof. If not, an counter-example.
First, since Monoids are not unique, neither are Writer Monads or Applicatives. Consider
data M a = M Int a
then you can give it Applicative and Monad instances isomorphic to either of:
Writer (Sum Int)
Writer (Product Int)
Given a Monoid instance for a type s, another isomorphic pair with different Applicative/Monad instances is:
ReaderT s (Writer s)
State s
As for having one Applicative instance extend to two different Monads, I cannot remember any example. However, back when I tried to convince myself completely about whether ZipList really cannot be made a Monad, I found the following pretty strong restriction that holds for any Monad:
join (fmap (\x -> fmap (\y -> f x y) ys) xs) = f <$> xs <*> ys
That doesn't give join for all values though: in the case of lists the restricted values are the ones where all elements have the same length, i.e. lists of lists with "rectangular" shape.
(For Reader monads, where the "shape" of monadic values doesn't vary, these are in fact all the m (m x) values, so those do have unique extension. EDIT: Come to think of it, Either, Maybe and Writer also have only "rectangular" m (m x) values, so their extension from Applicative to Monad is also unique.)
I wouldn't be surprised if an Applicative with two Monads exists, though.
For Alternative/MonadPlus, I cannot recall any law for instances using the Left Distribution law instead of Left Catch, I see nothing preventing you from just swapping (<|>) with flip (<|>). I don't know if there's a less trivial variation.
ADDENDUM: I suddenly remembered I had found an example of an Applicative with two Monads. Namely, finite lists. There's the usual Monad [] instance, but you can then replace its join by the following function (essentially making empty lists "infectious"):
ljoin xs
| any null xs = []
| otherwise = concat xs
(Alas, the lists need to be finite because otherwise the null check will never finish, and that would ruin the join . fmap return == id monad law.)
This has the same value as join/concat on rectangular lists of lists, so will give the same Applicative. As I recall, it turns out that the first two monad laws are automatic from that, and you just need to check ljoin . ljoin == ljoin . fmap ljoin.
Given that every Applicative has a Backwards counterpart,
newtype Backwards f x = Backwards {backwards :: f x}
instance Applicative f => Applicative (Backwards f) where
pure x = Backwards (pure x)
Backwards ff <*> Backwards fs = Backwards (flip ($) <$> fs <*> ff)
it's unusual for Applicative to be uniquely determined, just as (and this is very far from unrelated) many sets extend to monoids in multiple ways.
In this answer, I set the exercise of finding at least four distinct valid Applicative instances for nonempty lists: I won't spoil it here, but I will give a big hint on how to hunt.
Meanwhile, in some wonderful recent work (which I saw at a summer school a few months ago), Tarmo Uustalu showed a rather neat way to get a handle on this problem, at least when the underlying functor is a container, in the sense of Abbott, Altenkirch and Ghani.
Warning: Dependent types ahead!
What is a container? If you have dependent types to hand, you can present container-like functors F uniformly, as being determined by two components
a set of shapes, S : Set
an S-indexed set of positions, P : S -> Set
Up to isomorphism, container data structures in F X are given by the dependent pair of some shape s : S, and some function e : P s -> X, which tells you the element located at each position. That is, we define the extension of a container
(S <| P) X = (s : S) * (P s -> X)
(which, by the way, looks a lot like a generalized power series if you read -> as reversed exponentiation). The triangle is supposed to remind you of a tree node sideways, with an element s : S labelling the apex, and the baseline representing the position set P s. We say that some functor is a container if it is isomorphic to some S <| P.
In Haskell, you can easily take S = F (), but constructing P can take quite a bit of type-hackery. But that is something you can try at home. You'll find that containers are closed under all the usual polynomial type-forming operations, as well as identity,
Id ~= () <| \ _ -> ()
composition, where a whole shape is made from just one outer shape and an inner shape for each outer position,
(S0 <| P0) . (S1 <| P1) ~= ((S0 <| P0) S1) <| \ (s0, e0) -> (p0 : P0, P1 (e0 p0))
and some other things, notably the tensor, where there is one outer and one inner shape (so "outer" and "inner" are interchangeable)
(S0 <| P0) (X) (S1 <| P1) = ((S0, S1) <| \ (s0, s1) -> (P0 s0, P1 s1))
so that F (X) G means "F-structures of G-structures-all-the-same-shape", e.g., [] (X) [] means rectangular lists-of-lists. But I digress
Polymorphic functions between containers Every polymorphic function
m : forall X. (S0 <| P0) X -> (S1 <| P1) X
can be implemented by a container morphism, constructed from two components in a very particular way.
a function f : S0 -> S1 mapping input shapes to output shapes;
a function g : (s0 : S0) -> P1 (f s0) -> P0 s0 mapping output positions to input positions.
Our polymorphic function is then
\ (s0, e0) -> (f s0, e0 . g s0)
where the output shape is computed from the input shape, then the output positions are filled up by picking elements from input positions.
(If you're Peter Hancock, you have a whole other metaphor for what's going on. Shapes are Commands; Positions are Responses; a container morphism is a device driver, translating commands one way, then responses the other.)
Every container morphism gives you a polymorphic function, but the reverse is also true. Given such an m, we may take
(f s, g s) = m (s, id)
That is, we have a representation theorem, saying that every polymorphic function between two containers is given by such an f, g-pair.
What about Applicative? We kind of got a bit lost along the way, building all this machinery. But it has been worth it. When the underlying functors for monads and applicatives are containers, the polymorphic functions pure and <*>, return and join must be representable by the relevant notion of container morphism.
Let's take applicatives first, using their monoidal presentation. We need
unit : () -> (S <| P) ()
mult : forall X, Y. ((S <| P) X, (S <| P) Y) -> (S <| P) (X, Y)
The left-to-right maps for shapes require us to deliver
unitS : () -> S
multS : (S, S) -> S
so it looks like we might need a monoid. And when you check that the applicative laws, you find we need exactly a monoid. Equipping a container with applicative structure is exactly refining the monoid structures on its shapes with suitable position-respecting operations. There's nothing to do for unit (because there is no chocie of source position), but for mult, we need that whenenver
multS (s0, s1) = s
we have
multP (s0, s1) : P s -> (P s0, P s1)
satisfying appropriate identity and associativity conditions. If we switch to Hancock's interpretation, we're defining a monoid (skip, semicolon) for commands, where there is no way to look at the response to the first command before choosing the second, like commands are a deck of punch cards. We have to be able to chop up responses to combined commands into the individual responses to the individual commands.
So, every monoid on the shapes gives us a potential applicative structure. For lists, shapes are numbers (lengths), and there are a great many monoids from which to choose. Even if shapes live in Bool, we have quite a bit of choice.
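As a hedged illustration of one such choice for lists, the sketch below takes the monoid (1, *) on lengths and splits a combined position with div and mod. Cont, fromList and toList are illustrative names, not library functions; unitS, multS and multP follow the signatures above, specialised to lists:

```haskell
-- Lists as a container: a length plus an indexing function.
data Cont x = Cont Int (Int -> x)

fromList :: [x] -> Cont x
fromList xs = Cont (length xs) (xs !!)

toList :: Cont x -> [x]
toList (Cont n at) = map at [0 .. n - 1]

-- Shape level: the monoid (1, *) on lengths.
unitS :: Int
unitS = 1

multS :: (Int, Int) -> Int
multS (s0, s1) = s0 * s1

-- Position level: split a position in the combined shape
-- into one position for each factor.
multP :: (Int, Int) -> Int -> (Int, Int)
multP (_, s1) i = (i `div` s1, i `mod` s1)

unit :: Cont ()
unit = Cont unitS (const ())

mult :: (Cont x, Cont y) -> Cont (x, y)
mult (Cont s0 at0, Cont s1 at1) =
  Cont (multS (s0, s1)) (\i -> let (i0, i1) = multP (s0, s1) i in (at0 i0, at1 i1))
```

With this choice, toList (mult (fromList "ab", fromList [1,2,3])) lists every pairing in the same order as the usual list applicative. A different monoid on lengths, with a matching multP, would give a different applicative structure.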
What about Monad? Meanwhile, for monads M with M ~= S <| P, we need
return : Id -> M
join : M . M -> M
Looking at shapes first, that means we need a sort-of lopsided monoid.
return_f : () -> S
join_f : (S <| P) S -> S -- (s : S, P s -> S) -> S
It's lopsided because we get a bunch of shapes on the right, not just one. If we switch to Hancock's interpretation, we're defining a kind of sequential composition for commands, where we do let the second command be chosen on the basis of the first response, like we're interacting at a teletype. More geometrically, we're explaining how to glom two layers of a tree into one. It would be very surprising if such compositions were unique.
Again, for the positions, we have to map single output positions to pairs in a coherent way. This is trickier for monads: we first choose an outer position (response), then we have to choose an inner position (response) appropriate to the shape (command) found at the first position (chosen after the first response).
I'd love to link to Tarmo's work for the details, but it doesn't seem to have hit the streets yet. He has actually used this analysis to enumerate all possible monad structures for several choices of underlying container. I'm looking forward to the paper!
Edit. By way of doing honour to the other answer, I should observe that when everywhere P s = (), then (S <| P) X ~= (S, X) and the monad/applicative structures coincide exactly with each other and with the monoid structures on S. That is, for writer monads, we need only choose the shape-level operations, because there is exactly one position for a value in every case.

Confused by the meaning of the 'Alternative' type class and its relationship to other type classes

I've been going through the Typeclassopedia to learn the type classes. I'm stuck understanding Alternative (and MonadPlus, for that matter).
The problems I'm having:
the 'pedia says that "the Alternative type class is for Applicative functors which also have a monoid structure." I don't get this -- doesn't Alternative mean something totally different from Monoid? i.e. I understood the point of the Alternative type class as picking between two things, whereas I understood Monoids as being about combining things.
why does Alternative need an empty method/member? I may be wrong, but it seems to not be used at all ... at least in the code I could find. And it seems not to fit with the theme of the class -- if I have two things, and need to pick one, what do I need an 'empty' for?
why does the Alternative type class need an Applicative constraint, and why does it need a kind of * -> *? Why not just have <|> :: a -> a -> a? All of the instances could still be implemented in the same way ... I think (not sure). What value does it provide that Monoid doesn't?
what's the point of the MonadPlus type class? Can't I unlock all of its goodness by just using something as both a Monad and Alternative? Why not just ditch it? (I'm sure I'm wrong, but I don't have any counterexamples)
Hopefully all those questions are coherent ... !
Bounty update: @Antal's answer is a great start, but Q3 is still open: what does Alternative provide that Monoid doesn't? I find this answer unsatisfactory since it lacks concrete examples, and a specific discussion of how the higher-kindedness of Alternative distinguishes it from Monoid.
If it's to combine applicative's effects with Monoid's behavior, why not just:
liftA2 mappend
This is even more confusing for me because many Monoid instances are exactly the same as the Alternative instances.
That's why I'm looking for specific examples that show why Alternative is necessary, and how it's different -- or means something different -- from Monoid.
To begin with, let me offer short answers to each of these questions. I will then expand each into a longer detailed answer, but these short ones will hopefully help in navigating those.
No, Alternative and Monoid don’t mean different things; Alternative is for types which have the structure both of Applicative and of Monoid. “Picking” and “combining” are two different intuitions for the same broader concept.
Alternative contains empty as well as <|> because the designers thought this would be useful, and because this gives rise to a monoid. In terms of picking, empty corresponds to making an impossible choice.
We need both Alternative and Monoid because the former obeys (or should) more laws than the latter; these laws relate the monoidal and applicative structure of the type constructor. Additionally, Alternative can’t depend on the inner type, while Monoid can.
MonadPlus is slightly stronger than Alternative, as it must obey more laws; these laws relate the monoidal structure to the monadic structure in addition to the applicative structure. If you have instances of both, they should coincide.
Doesn’t Alternative mean something totally different from Monoid?
Not really! Part of the reason for your confusion is that the Haskell Monoid class uses some pretty bad (well, insufficiently general) names. This is how a mathematician would define a monoid (being very explicit about it):
Definition. A monoid is a set M equipped with a distinguished element ε ∈ M and a binary operator · : M × M → M, denoted by juxtaposition, such that the following two conditions hold:
ε is the identity: for all m ∈ M, mε = εm = m.
· is associative: for all m₁,m₂,m₃ ∈ M, (m₁m₂)m₃ = m₁(m₂m₃).
That’s it. In Haskell, ε is spelled mempty and · is spelled mappend (or, these days, <>), and the set M is the type M in instance Monoid M where ....
Looking at this definition, we see that it says nothing about “combining” (or about “picking,” for that matter). It says things about · and about ε, but that’s it. Now, it’s certainly true that combining things works well with this structure: ε corresponds to having no things, and m₁m₂ says that if I glom m₁ and m₂’s stuff together, I can get a new thing containing all their stuff. But here’s an alternative intuition: ε corresponds to no choices at all, and m₁m₂ corresponds to a choice between m₁ and m₂. This is the “picking” intuition. Note that both obey the monoid laws:
Having nothing at all and having no choice are both the identity.
If I have no stuff and glom it together with some stuff, I end up with that same stuff again.
If I have a choice between no choice at all (something impossible) and some other choice, I have to pick the other (possible) choice.
Glomming collections together and making a choice are both associative.
If I have three collections of things, it doesn’t matter if I glom the first two together and then the third, or the last two together and then the first; either way, I end up with the same total glommed collection.
If I have a choice between three things, it doesn’t matter if I (a) first choose between first-or-second and third and then, if I need to, between first and second, or (b) first choose between first and second-or-third and then, if I need to, between second and third. Either way, I can pick what I want.
(Note: I’m playing fast and loose here; that’s why it’s intuition. For instance, it’s important to remember that · need not be commutative, which the above glosses over: it’s perfectly possible that m₁m₂ ≠ m₂m₁.)
Behold: both these sorts of things (and many others—is multiplying numbers really either “combining” or “picking”?) obey the same rules. Having an intuition is important to develop understanding, but it’s the rules and definitions that determine what’s actually going on.
And the best part is that both of these intuitions can be interpreted by the same carrier! Let M be some set of sets (not a set of all sets!) containing the empty set, let ε be the empty set ∅, and let · be set union ∪. It is easy to see that ∅ is an identity for ∪, and that ∪ is associative, so we can conclude that (M,∅,∪) is a monoid. Now:
If we think about sets as being collections of things, then ∪ corresponds to glomming them together to get more things—the “combining” intuition.
If we think about sets as representing possible actions, then ∪ corresponds to increasing your pool of possible actions to pick from—the “picking” intuition.
And this is exactly what’s going on with [] in Haskell: [a] is a Monoid for all a, and [] as an applicative functor (and monad) is used to represent nondeterminism. Both the combining and the picking intuitions coincide at the same type: mempty = empty = [] and mappend = (<|>) = (++).
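A quick way to see the coincidence for lists is to compare the two pairs of operations directly. This is only a spot check at one element type (the names choiceLikeAppend and emptyLikeMempty are made up here), not a proof:

```haskell
import Control.Applicative (Alternative (..))

-- For lists, choosing with (<|>) and combining with (<>) give the same result.
choiceLikeAppend :: Bool
choiceLikeAppend = ([1, 2] <|> [3]) == ([1, 2] <> [3 :: Int])

-- Likewise, the Alternative identity and the Monoid identity are both [].
emptyLikeMempty :: Bool
emptyLikeMempty = (empty :: [Int]) == mempty
```

Both values evaluate to True.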
So the Alternative class is just there to represent objects which (a) are applicative functors, and (b) when instantiated at a type, have a value and a binary function on them which follow some rules. Which rules? The monoid rules. Why? Because it turns out to be useful :-)
Why does Alternative need an empty method/member?
Well, the snarky answer is “because Alternative represents a monoid structure.” But the real question is: why a monoid structure? Why not just a semigroup, a monoid without ε? One answer is to claim that monoids are just more useful. I think many people (but perhaps not Edward Kmett) would agree with this; almost all of the time, if you have a sensible (<|>)/mappend/·, you’ll be able to define a sensible empty/mempty/ε. On the other hand, having the extra generality is nice, since it lets you place more things under the umbrella.
You also want to know how this meshes with the “picking” intuition. Keeping in mind that, in some sense, the right answer is “know when to abandon the ‘picking’ intuition,” I think you can unify the two. Consider [], the applicative functor for nondeterminism. If I combine two values of type [a] with (<|>), that corresponds to nondeterministically picking either an action from the left or an action from the right. But sometimes, you’re going to have no possible actions on one side—and that’s fine. Similarly, if we consider parsers, (<|>) represents a parser which parses either what’s on the left or what’s on the right (it “picks”). And if you have a parser which always fails, that ends up being an identity: if you pick it, you immediately reject that pick and try the other one.
All this said, remember that it would be entirely possible to have a class almost like Alternative, but lacking empty. That would be perfectly valid—it could even be a superclass of Alternative—but happens not to be what Haskell did. Presumably this is out of a guess as to what’s useful.
Why does the Alternative type class need an Applicative constraint, and why does it need a kind of * -> *? … Why not just [use] liftA2 mappend?
Well, let’s consider each of these three proposed changes: getting rid of the Applicative constraint for Alternative; changing the kind of Alternative’s argument; and using liftA2 mappend instead of <|> and pure mempty instead of empty. We’ll look at this third change first, since it’s the most different. Suppose we got rid of Alternative entirely, and replaced the class with two plain functions:
fempty :: (Applicative f, Monoid a) => f a
fempty = pure mempty
(>|<) :: (Applicative f, Monoid a) => f a -> f a -> f a
(>|<) = liftA2 mappend
We could even keep the definitions of some and many. And this does give us a monoid structure, it’s true. But it seems like it gives us the wrong one. Should Just fst >|< Just snd fail, since (a,a) -> a isn’t an instance of Monoid? No, but that’s what the above code would result in. The monoid instance we want is one that’s inner-type agnostic (to borrow terminology from Matthew Farkas-Dyck in a very related haskell-cafe discussion which asks some very similar questions); the Alternative structure is about a monoid determined by f’s structure, not the structure of f’s argument.
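To see how the lifted monoid differs from the one Alternative provides, here is a small sketch that runs both on Maybe. The definitions of fempty and (>|<) are the ones proposed above; the example values are made up for illustration:

```haskell
import Control.Applicative (liftA2, (<|>))

fempty :: (Applicative f, Monoid a) => f a
fempty = pure mempty

(>|<) :: (Applicative f, Monoid a) => f a -> f a -> f a
(>|<) = liftA2 mappend

-- The lifted operation combines the contents when both sides succeed...
bothPresent :: Maybe String
bothPresent = Just "ab" >|< Just "cd"

-- ...but a failure on either side makes the whole thing fail.
oneMissing :: Maybe String
oneMissing = Just "ab" >|< Nothing

-- Alternative's (<|>) falls back to the side that succeeded.
chosen :: Maybe String
chosen = Just "ab" <|> Nothing

-- Does not type-check at a non-monoid element type such as Int:
-- broken = Just fst >|< Just snd :: Maybe ((Int, Int) -> Int)
```

Here bothPresent is Just "abcd", oneMissing is Nothing, and chosen is Just "ab". Note also that fempty at Maybe String is Just "", not Nothing.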
Now that we think we want to leave Alternative as some sort of type class, let’s look at the two proposed ways to change it. If we change the kind, we have to get rid of the Applicative constraint; Applicative only talks about things of kind * -> *, and so there’s no way to refer to it. That leaves two possible changes; the first, more minor, change is to get rid of the Applicative constraint but leave the kind alone:
class Alternative' f where
empty' :: f a
(<||>) :: f a -> f a -> f a
The other, larger, change is to get rid of the Applicative constraint and change the kind:
class Alternative'' a where
empty'' :: a
(<|||>) :: a -> a -> a
In both cases, we have to get rid of some/many, but that’s OK; we can define them as standalone functions with the type (Applicative f, Alternative' f) => f a -> f [a] or (Applicative f, Alternative'' (f [a])) => f a -> f [a].
Now, in the second case, where we change the kind of the type variable, we see that our class is exactly the same as Monoid (or, if you still want to remove empty'', Semigroup), so there’s no advantage to having a separate class. And in fact, even if we leave the kind variable alone but remove the Applicative constraint, Alternative just becomes forall a. Monoid (f a), although we can’t write these quantified constraints in Haskell, not even with all the fancy GHC extensions. (Note that this expresses the inner-type–agnosticism mentioned above.) Thus, if we can make either of these changes, then we have no reason to keep Alternative (except for being able to express that quantified constraint, but that hardly seems compelling).
So the question boils down to “is there a relationship between the Alternative parts and the Applicative parts of an f which is an instance of both?” And while there’s nothing in the documentation, I’m going to take a stand and say yes—or at the very least, there ought to be. I think that Alternative is supposed to obey some laws relating to Applicative (in addition to the monoid laws); in particular, I think those laws are something like
Right distributivity (of <*>): (f <|> g) <*> a = (f <*> a) <|> (g <*> a)
Right absorption (for <*>): empty <*> a = empty
Left distributivity (of fmap): f <$> (a <|> b) = (f <$> a) <|> (f <$> b)
Left absorption (for fmap): f <$> empty = empty
These laws appear to be true for [] and Maybe, and (pretending its MonadPlus instance is an Alternative instance) IO, but I haven’t done any proofs or exhaustive testing. (For instance, I originally thought that left distributivity held for <*>, but this “performs the effects” in the wrong order for [].) By way of analogy, though, it is true that MonadPlus is expected to obey similar laws (although there is apparently some ambiguity about which). I had originally wanted to claim a third law, which seems natural:
Left absorption (for <*>): a <*> empty = empty
However, although I believe [] and Maybe obey this law, IO doesn’t, and I think (for reasons that will become apparent in the next couple of paragraphs) it’s best not to require it.
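These candidate laws are easy to spot-check on small inputs. The sketch below is only a test at a few hand-picked values, not a proof, and the helper names rightDistributes and leftDistributes are made up for this example:

```haskell
import Control.Applicative (Alternative (..))

-- Right distributivity of <*> over <|>, checked at given values.
rightDistributes :: (Alternative f, Eq (f b)) => f (a -> b) -> f (a -> b) -> f a -> Bool
rightDistributes f g a = ((f <|> g) <*> a) == ((f <*> a) <|> (g <*> a))

-- The left-handed variant for <*>, which is not among the proposed laws.
leftDistributes :: (Alternative f, Eq (f b)) => f (a -> b) -> f a -> f a -> Bool
leftDistributes f a b = (f <*> (a <|> b)) == ((f <*> a) <|> (f <*> b))

listRight, listLeft, maybeRight :: Bool
listRight = rightDistributes [(+ 1)] [(* 2)] [1, 2 :: Int]
listLeft = leftDistributes [(+ 1), (* 2)] [1] [2 :: Int]
maybeRight = rightDistributes (Just (+ 1)) Nothing (Just (1 :: Int))
```

listRight and maybeRight are True. listLeft is False: the left side yields [2,3,2,4] while the right side yields [2,2,3,4], which is the effect-ordering problem for [] mentioned above.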
And indeed, it appears that Edward Kmett has some slides where he espouses a similar view; to get into that, we’ll need to take a brief digression involving some more mathematical jargon. The final slide, “I Want More Structure,” says that “A Monoid is to an Applicative as a Right Seminearring is to an Alternative,” and “If you throw away the argument of an Applicative, you get a Monoid, if you throw away the argument of an Alternative you get a RightSemiNearRing.”
Right seminearrings? “How did right seminearrings get into it?” I hear you cry. Well,
Definition. A right near-semiring (also right seminearring, but the former seems to be used more on Google) is a quadruple (R,+,·,0) where (R,+,0) is a monoid, (R,·) is a semigroup, and the following two conditions hold:
· is right-distributive over +: for all r,s,t ∈ R, (s + t)r = sr + tr.
0 is right-absorbing for ·: for all r ∈ R, 0r = 0.
A left near-semiring is defined analogously.
Now, this doesn’t quite work, because <*> is not truly associative or a binary operator—the types don’t match. I think this is what Edward Kmett is getting at when he talks about “throw[ing] away the argument.” Another option might be to say (I’m unsure if this is right) that we actually want (f a, <|>, <*>, empty) to form a right near-semiringoid, where the “-oid” suffix indicates that the binary operators can only be applied to specific pairs of elements (à la groupoids). And we’d also want to say that (f a, <|>, <$>, empty) was a left near-semiringoid, although this could conceivably follow from the combination of the Applicative laws and the right near-semiringoid structure. But now I’m getting in over my head, and this isn’t deeply relevant anyway.
At any rate, these laws, being stronger than the monoid laws, mean that perfectly valid Monoid instances would become invalid Alternative instances. There are (at least) two examples of this in the standard library: Monoid a => (a,) and Maybe. Let’s look at each of them quickly.
Given any two monoids, their product is a monoid; consequently, tuples can be made an instance of Monoid in the obvious way (reformatting the base package’s source):
instance (Monoid a, Monoid b) => Monoid (a,b) where
mempty = (mempty, mempty)
(a1,b1) `mappend` (a2,b2) = (a1 `mappend` a2, b1 `mappend` b2)
Similarly, we can make tuples whose first component is an element of a monoid into an instance of Applicative by accumulating the monoid elements (reformatting the base package’s source):
instance Monoid a => Applicative ((,) a) where
pure x = (mempty, x)
(u, f) <*> (v, x) = (u `mappend` v, f x)
However, tuples aren’t an instance of Alternative, because they can’t be: the monoidal structure over Monoid a => (a,b) isn’t present for all types b, and Alternative’s monoidal structure must be inner-type agnostic. Not only must b be a monoid; to be able to express (f <> g) <*> a, we also need to use the Monoid instance for functions, which is for functions of the form Monoid b => a -> b. And even in the case where we have all the necessary monoidal structure, it violates all four of the Alternative laws. To see this, let ssf n = (Sum n, (<> Sum n)) and let ssn n = (Sum n, Sum n). Then, writing (<>) for mappend, we get the following results (which can be checked in GHCi, with the occasional type annotation):
Right distributivity:
(ssf 1 <> ssf 1) <*> ssn 1 = (Sum 3, Sum 4)
(ssf 1 <*> ssn 1) <> (ssf 1 <*> ssn 1) = (Sum 4, Sum 4)
Right absorption:
mempty <*> ssn 1 = (Sum 1, Sum 0)
mempty = (Sum 0, Sum 0)
Left distributivity:
(<> Sum 1) <$> (ssn 1 <> ssn 1) = (Sum 2, Sum 3)
((<> Sum 1) <$> ssn 1) <> ((<> Sum 1) <$> ssn 1) = (Sum 2, Sum 4)
Left absorption:
(<> Sum 1) <$> mempty = (Sum 0, Sum 1)
mempty = (Sum 0, Sum 0)
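For anyone who wants to reproduce two of these checks, here is a sketch with the type annotations filled in. It assumes a GHC recent enough that (<>) is available from the Prelude; the names rightDistLhs and so on are made up for this example:

```haskell
import Data.Monoid (Sum (..))

ssf :: Int -> (Sum Int, Sum Int -> Sum Int)
ssf n = (Sum n, (<> Sum n))

ssn :: Int -> (Sum Int, Sum Int)
ssn n = (Sum n, Sum n)

-- Right distributivity: the two sides differ in the first component.
rightDistLhs, rightDistRhs :: (Sum Int, Sum Int)
rightDistLhs = (ssf 1 <> ssf 1) <*> ssn 1
rightDistRhs = (ssf 1 <*> ssn 1) <> (ssf 1 <*> ssn 1)

-- Left absorption for fmap: mapping over mempty does not give mempty back.
leftAbsorbLhs, leftAbsorbRhs :: (Sum Int, Sum Int)
leftAbsorbLhs = (<> Sum 1) <$> mempty
leftAbsorbRhs = mempty
```

Evaluating these gives (Sum 3, Sum 4) against (Sum 4, Sum 4), and (Sum 0, Sum 1) against (Sum 0, Sum 0).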
Next, consider Maybe. As it stands, Maybe’s Monoid and Alternative instances disagree. (Although the haskell-cafe discussion I mention at the beginning of this section proposes changing this, there’s an Option newtype from the semigroups package which would produce the same effect.) As a Monoid, Maybe lifts semigroups into monoids by using Nothing as the identity; since the base package doesn’t have a semigroup class, it just lifts monoids, and so we get (reformatting the base package’s source):
instance Monoid a => Monoid (Maybe a) where
mempty = Nothing
Nothing `mappend` m = m
m `mappend` Nothing = m
Just m1 `mappend` Just m2 = Just (m1 `mappend` m2)
On the other hand, as an Alternative, Maybe represents prioritized choice with failure, and so we get (again reformatting the base package’s source):
instance Alternative Maybe where
empty = Nothing
Nothing <|> r = r
l <|> _ = l
And it turns out that only the latter satisfies the Alternative laws. The Monoid instance fails less badly than (,)’s; it does obey the laws with respect to <*>, although almost by accident: it comes from the behavior of the only instance of Monoid for functions, which (as mentioned above) lifts functions that return monoids into the reader applicative functor. If you work it out (it’s all very mechanical), you’ll find that right distributivity and right absorption for <*> both hold for both the Alternative and Monoid instances, as does left absorption for fmap. And left distributivity for fmap does hold for the Alternative instance, as follows:
f <$> (Nothing <|> b)
= f <$> b by the definition of (<|>)
= Nothing <|> (f <$> b) by the definition of (<|>)
= (f <$> Nothing) <|> (f <$> b) by the definition of (<$>)
f <$> (Just a <|> b)
= f <$> Just a by the definition of (<|>)
= Just (f a) by the definition of (<$>)
= Just (f a) <|> (f <$> b) by the definition of (<|>)
= (f <$> Just a) <|> (f <$> b) by the definition of (<$>)
However, it fails for the Monoid instance; writing (<>) for mappend, we have:
(<> Sum 1) <$> (Just (Sum 0) <> Just (Sum 0)) = Just (Sum 1)
((<> Sum 1) <$> Just (Sum 0)) <> ((<> Sum 1) <$> Just (Sum 0)) = Just (Sum 2)
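The same comparison can be run side by side for both instances. This is a sketch at one particular value, with bump and the four result names invented for the example:

```haskell
import Control.Applicative ((<|>))
import Data.Monoid (Sum (..))

bump :: Sum Int -> Sum Int
bump = (<> Sum 1)

-- Monoid instance: mapping before or after combining gives different results.
monoidLhs, monoidRhs :: Maybe (Sum Int)
monoidLhs = bump <$> (Just (Sum 0) <> Just (Sum 0))
monoidRhs = (bump <$> Just (Sum 0)) <> (bump <$> Just (Sum 0))

-- Alternative instance: both orders agree.
altLhs, altRhs :: Maybe (Sum Int)
altLhs = bump <$> (Just (Sum 0) <|> Just (Sum 0))
altRhs = (bump <$> Just (Sum 0)) <|> (bump <$> Just (Sum 0))
```

The Monoid pair evaluates to Just (Sum 1) and Just (Sum 2), while both Alternative values are Just (Sum 1).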
Now, there is one caveat to this example. If you only require that Alternatives be compatible with <*>, and not with <$>, then Maybe is fine. Edward Kmett’s slides, mentioned above, don’t make reference to <$>, but I think it seems reasonable to require laws with respect to it as well; nevertheless, I can’t find anything to back me up on this.
Thus, we can conclude that being an Alternative is a stronger requirement than being a Monoid, and so it requires a different class. The purest example of this would be a type with an inner-type agnostic Monoid instance and an Applicative instance which were incompatible with each other; however, there aren’t any such types in the base package, and I can’t think of any. (It’s possible none exist, although I’d be surprised.) Nevertheless, these inner-type gnostic examples demonstrate why the two type classes must be different.
What’s the point of the MonadPlus type class?
MonadPlus, like Alternative, is a strengthening of Monoid, but with respect to Monad instead of Applicative. According to Edward Kmett in his answer to the question “Distinction between typeclasses MonadPlus, Alternative, and Monoid?”, MonadPlus is also stronger than Alternative: the law empty <*> a = empty, for instance, doesn’t imply that empty >>= f = empty. AndrewC provides two examples of this: Maybe and its dual. The issue is complicated by the fact that there are two potential sets of laws for MonadPlus. It is universally agreed that MonadPlus is supposed to form a monoid with mplus and mzero, and it’s supposed to satisfy the left zero law, mzero >>= f = mzero. However, some MonadPlusses satisfy left distribution, mplus a b >>= f = mplus (a >>= f) (b >>= f); and others satisfy left catch, mplus (return a) b = return a. (Note that left zero/distribution for MonadPlus are analogous to right absorption/distributivity for Alternative; (<*>) is more analogous to (=<<) than (>>=).) Left distribution is probably “better,” so any MonadPlus instance which satisfies left catch, such as Maybe, is an Alternative but not the first kind of MonadPlus. And since left catch relies on ordering, you can imagine a newtype wrapper for Maybe whose Alternative instance is right-biased instead of left-biased: a <|> Just b = Just b. This will satisfy neither left distribution nor left catch, but will be a perfectly valid Alternative.
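The difference between the two law sets shows up with a very small example. The sketch below uses a made-up helper keepEven and compares both sides of left distribution, once for Maybe and once for lists:

```haskell
import Control.Monad (MonadPlus (..))

-- Succeeds with x when x is even, fails otherwise.
keepEven :: MonadPlus m => Int -> m Int
keepEven x = if even x then return x else mzero

-- The two sides of left distribution, with a = return 1 and b = return 2.
distLhs, distRhs :: MonadPlus m => m Int
distLhs = mplus (return 1) (return 2) >>= keepEven
distRhs = mplus (return 1 >>= keepEven) (return 2 >>= keepEven)
```

At type Maybe Int, distLhs is Nothing (mplus has already committed to Just 1, as left catch demands) while distRhs is Just 2, so left distribution fails. At type [Int], both sides are [2].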
However, since any type which is a MonadPlus ought to have its instance coincide with its Alternative instance (I believe this is required in the same way that it is required that ap and (<*>) are equal for Monads that are Applicatives), you could imagine defining the MonadPlus class instead as
class (Monad m, Alternative m) => MonadPlus' m
The class doesn’t need to declare new functions; it’s just a promise about the laws obeyed by empty and (<|>) for the given type. This design technique isn’t used in the Haskell standard libraries, but is used in some more mathematically-minded packages for similar purposes; for instance, the lattices package uses it to express the idea that a lattice is just a join semilattice and a meet semilattice over the same type which are linked by absorption laws.
The reason you can’t do the same for Alternative, even if you wanted to guarantee that Alternative and Monoid always coincided, is because of the kind mismatch. The desired class declaration would have the form
class (Applicative f, forall a. Monoid (f a)) => Alternative''' f
but (as mentioned far above) not even GHC Haskell supports quantified constraints.
Also, note that having Alternative be a superclass of MonadPlus would require Applicative being a superclass of Monad, so good luck getting that to happen. If you run into that problem, there’s always the WrappedMonad newtype, which turns any Monad into an Applicative in the obvious way; there’s an instance MonadPlus m => Alternative (WrappedMonad m) where ... which does exactly what you’d expect.
import Data.Monoid
import Control.Applicative
Let's trace through an example of how Monoid and Alternative interact with the Maybe functor and the ZipList functor, but let's start from scratch, partly to get all the definitions fresh in our minds, partly to stop from switching tabs to bits of hackage all the time, but mainly so I can run this past ghci to correct my typos!
(<>) :: Monoid a => a -> a -> a
(<>) = mappend -- I'll be using <> freely instead of `mappend`.
Here's the Maybe clone:
data Perhaps a = Yes a | No deriving (Eq, Show)
instance Functor Perhaps where
fmap f (Yes a) = Yes (f a)
fmap f No = No
instance Applicative Perhaps where
pure a = Yes a
No <*> _ = No
_ <*> No = No
Yes f <*> Yes x = Yes (f x)
and now ZipList:
data Zip a = Zip [a] deriving (Eq,Show)
instance Functor Zip where
fmap f (Zip xs) = Zip (map f xs)
instance Applicative Zip where
Zip fs <*> Zip xs = Zip (zipWith id fs xs) -- zip them up, applying the fs to the xs
pure a = Zip (repeat a) -- infinite so that when you zip with something, lengths don't change
Structure 1: combining elements: Monoid
Maybe clone
First let's look at Perhaps String. There are two ways of combining them. Firstly concatenation
(<++>) :: Perhaps String -> Perhaps String -> Perhaps String
Yes xs <++> Yes ys = Yes (xs ++ ys)
Yes xs <++> No = Yes xs
No <++> Yes ys = Yes ys
No <++> No = No
Concatenation works inherently at the String level, not really the Perhaps level, by treating No as if it were Yes []. It agrees with liftA2 (++) when both arguments are Yes, but where liftA2 (++) would return No as soon as either side is No, this keeps the other side. It's sensible and useful, but maybe we could generalise from just using ++ to using any way of combining - any Monoid then!
(<++>) :: Monoid a => Perhaps a -> Perhaps a -> Perhaps a
Yes xs <++> Yes ys = Yes (xs `mappend` ys)
Yes xs <++> No = Yes xs
No <++> Yes ys = Yes ys
No <++> No = No
This monoid structure for Perhaps tries to work as much as possible at the a level. Notice the Monoid a constraint, telling us we're using structure from the a level. This isn't an Alternative structure, it's a derived (lifted) Monoid structure.
instance Monoid a => Monoid (Perhaps a) where
mappend = (<++>)
mempty = No
Here I used the structure of the data a to add structure to the whole thing. If I were combining Sets, I'd be able to add an Ord a context instead.
ZipList clone
So how should we combine elements with a zipList? What should these zip to if we're combining them?
Zip ["HELLO","MUM","HOW","ARE","YOU?"]
<> Zip ["this", "is", "fun"]
= Zip ["HELLO" ? "this", "MUM" ? "is", "HOW" ? "fun"]
mempty = ["","","","",..] -- sensible zero element for zipping with ?
But what should we use for ? here? I say the only sensible choice is ++. Actually, for lists, (<>) = (++)
Zip [Just 1, Nothing, Just 3, Just 4]
<> Zip [Just 40, Just 70, Nothing]
= Zip [Just 1 ? Just 40, Nothing ? Just 70, Just 3 ? Nothing]
mempty = [Nothing, Nothing, Nothing, .....] -- sensible zero element
But what can we use for ? I say that we're meant to be combining elements, so we should use the element-combining operator from Monoid again: <>.
instance Monoid a => Monoid (Zip a) where
Zip as `mappend` Zip bs = Zip (zipWith (<>) as bs) -- zipWith the internal mappend
mempty = Zip (repeat mempty) -- repeat the internal mempty
This is the only sensible way of combining the elements using a zip - so it's the only sensible monoid instance.
Interestingly, that doesn't work for the Maybe example above, because Haskell doesn't know how to combine Ints - should it use + or *? To get a Monoid instance on numerical data, you wrap them in Sum or Product to tell it which monoid to use.
Zip [Just (Sum 1), Nothing, Just (Sum 3), Just (Sum 4)] <>
Zip [Just (Sum 40), Just (Sum 70), Nothing]
= Zip [Just (Sum 41),Just (Sum 70), Just (Sum 3)]
Zip [Product 5,Product 10,Product 15]
<> Zip [Product 3, Product 4]
= Zip [Product 15,Product 40]
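Here is a standalone sketch of the Zip monoid that reproduces the two results above. It assumes a recent GHC, where Semigroup is a superclass of Monoid, so the combining operation lives in a Semigroup instance rather than in mappend; the names sums and products are made up for the example:

```haskell
import Data.Monoid (Product (..), Sum (..))

data Zip a = Zip [a] deriving (Eq, Show)

-- Combine two zips position by position using the elements' own (<>).
instance Semigroup a => Semigroup (Zip a) where
  Zip xs <> Zip ys = Zip (zipWith (<>) xs ys)

-- The identity repeats the elements' mempty forever, so zipping with it
-- never shortens the other list.
instance Monoid a => Monoid (Zip a) where
  mempty = Zip (repeat mempty)

sums :: Zip (Maybe (Sum Int))
sums =
  Zip [Just (Sum 1), Nothing, Just (Sum 3), Just (Sum 4)]
    <> Zip [Just (Sum 40), Just (Sum 70), Nothing]

products :: Zip (Product Int)
products = Zip [Product 5, Product 10, Product 15] <> Zip [Product 3, Product 4]
```

The result is always as long as the shorter argument, because zipWith stops there.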
Key point
Notice the fact that the type in a Monoid has kind * is exactly what allows us to put the Monoid a context here - we could also add Eq a or Ord a. In a Monoid, the raw elements matter. A Monoid instance is designed to let you manipulate and combine the data inside the structure.
Structure 2: higher-level choice: Alternative
A choice operator is similar, but also different.
Maybe clone
(<||>) :: Perhaps String -> Perhaps String -> Perhaps String
Yes xs <||> Yes ys = Yes xs -- if we can have both, choose the left one
Yes xs <||> No = Yes xs
No <||> Yes ys = Yes ys
No <||> No = No
Here there's no concatenation - we didn't use ++ at all - this combination works purely at the Perhaps level, so let's change the type signature to
(<||>) :: Perhaps a -> Perhaps a -> Perhaps a
Yes xs <||> Yes ys = Yes xs -- if we can have both, choose the left one
Yes xs <||> No = Yes xs
No <||> Yes ys = Yes ys
No <||> No = No
Notice there's no constraint - we're not using the structure from the a level, just structure at the Perhaps level. This is an Alternative structure.
instance Alternative Perhaps where
(<|>) = (<||>)
empty = No
ZipList clone
How should we choose between two ziplists?
Zip [1,3,4] <|> Zip [10,20,30,40] = ????
It would be very tempting to use <|> on the elements, but we can't because the type of the elements isn't available to us. Let's start with the empty. It can't use an element because we don't know the type of the elements when defining an Alternative, so it has to be Zip []. We need it to be a left (and preferably right) identity for <|>, so
Zip [] <|> Zip ys = Zip ys
Zip xs <|> Zip [] = Zip xs
There are two sensible choices for Zip [1,3,4] <|> Zip [10,20,30,40]:
Zip [1,3,4] because it's first - consistent with Maybe
Zip [10,20,30,40] because it's longest - consistent with Zip [] being discarded
Well that's easy to decide: since pure x = Zip (repeat x), both lists might be infinite, so comparing them for length might never terminate, so we have to pick the first one. Thus the only sensible Alternative instance is:
instance Alternative Zip where
empty = Zip []
Zip [] <|> x = x
Zip xs <|> _ = Zip xs
This is the only sensible Alternative we could have defined. Notice how different it is from the Monoid instance, because we couldn't mess with the elements, we couldn't even look at them.
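Putting the pieces of the Zip clone together gives a self-contained sketch that can be loaded into GHCi. The three example values at the end are made up for illustration:

```haskell
import Control.Applicative (Alternative (..))

data Zip a = Zip [a] deriving (Eq, Show)

instance Functor Zip where
  fmap f (Zip xs) = Zip (map f xs)

instance Applicative Zip where
  pure a = Zip (repeat a)
  Zip fs <*> Zip xs = Zip (zipWith id fs xs)

-- Choice looks only at whether the left list is empty, never at the elements.
instance Alternative Zip where
  empty = Zip []
  Zip [] <|> ys = ys
  xs <|> _ = xs

firstWins :: Zip Int
firstWins = Zip [1, 3, 4] <|> Zip [10, 20, 30, 40]

fallsBack :: Zip Int
fallsBack = empty <|> Zip [10, 20, 30, 40]

-- Works even when the left list is infinite, since only its first
-- constructor is inspected.
infiniteLeft :: [Int]
infiniteLeft = case pure 7 <|> Zip [1, 2, 3] of
  Zip xs -> take 3 xs
```

firstWins is Zip [1,3,4], fallsBack is Zip [10,20,30,40], and infiniteLeft is [7,7,7].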
Key Point
Notice that because Alternative takes a constructor of kind * -> * there is no possible way to add an Ord a or Eq a or Monoid a context. An Alternative is not allowed to use any information about the data inside the structure. You cannot, no matter how much you would like to, do anything to the data, except possibly throw it away.
Key point: What's the difference between Alternative and Monoid?
Not a lot - they're both monoids, but to summarise the last two sections:
Monoid * instances make it possible to combine internal data. Alternative (* -> *) instances make it impossible. Monoid provides flexibility, Alternative provides guarantees. The kinds * and (* -> *) are the main drivers of this difference. Having them both allows you to use both sorts of operations.
This is the right thing, and our two flavours are both appropriate. The Monoid instance for Perhaps String represents putting together all characters, the Alternative instance represents a choice between Strings.
There is nothing wrong with the Monoid instance for Maybe - it's doing its job, combining data.
There's nothing wrong with the Alternative instance for Maybe - it's doing its job, choosing between things.
The Monoid instance for Zip combines its elements. The Alternative instance for Zip is forced to choose one of the lists - the first non-empty one.
It's good to be able to do both.
What's the Applicative context any use for?
There's some interaction between choosing and applying. See Antal S-Z's laws stated in his question or in the middle of his answer here.
From a practical point of view, it's useful because Alternative is something that is used for some Applicative Functors to choose. The functionality was being used for Applicatives, and so a general interface class was invented. Applicative Functors are good for representing computations that produce values (IO, Parser, Input UI element,...) and some of them have to handle failure - Alternative is needed.
Why does Alternative have empty?
why does Alternative need an empty method/member? I may be wrong, but it seems to not be used at all ... at least in the code I could find. And it seems not to fit with the theme of the class -- if I have two things, and need to pick one, what do I need an 'empty' for?
That's like asking why addition needs a 0 - if you want to add stuff, what's the point in having something that doesn't add anything? The answer is that 0 is the crucial pivotal number around which everything revolves in addition, just like 1 is crucial for multiplication, [] is crucial for lists (and y=e^x is crucial for calculus). In practical terms, you use these do-nothing elements to start your building:
sum = foldr (+) 0
concat = foldr (++) []
mconcat = foldr mappend mempty -- any Monoid
whichEverWorksFirst = foldr (<|>) empty -- any Alternative
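As a small sketch of the last line in use (the name firstSuccess is made up for illustration), the fold needs empty precisely so that it has something to return for an empty list:

```haskell
import Control.Applicative (Alternative, empty, (<|>))

-- Fold a list of alternatives with <|>, starting from the identity.
firstSuccess :: Alternative f => [f a] -> f a
firstSuccess = foldr (<|>) empty
```

On Maybe, firstSuccess [Nothing, Just 2, Just 3] is Just 2 and firstSuccess [] is Nothing. The asum function in Data.Foldable in base performs the same fold.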
Can't we replace MonadPlus with Monad+Alternative?
what's the point of the MonadPlus type class? Can't I unlock all of its goodness by just using something as both a Monad and Alternative? Why not just ditch it? (I'm sure I'm wrong, but I don't have any counterexamples)
You're not wrong, there aren't any counterexamples!
Your interesting question got Antal S-Z, Petr Pudlák and me delving into what the relationship between MonadPlus and Applicative really is. The answer, here and here, is that anything that's a MonadPlus (in the left distribution sense - follow links for details) is also an Alternative, but not the other way around.
This means that if you make an instance of Monad and MonadPlus, it satisfies the conditions for Applicative and Alternative anyway. This means if you follow the rules for MonadPlus (with left dist), you may as well have made your Monad an Applicative and used Alternative.
If we remove the MonadPlus class, though, we remove a sensible place for the rules to be documented, and you lose the ability to specify that something's Alternative without being MonadPlus (which technically we ought to have done for Maybe). These are theoretical reasons. The practical reason is that it would break existing code. (Which is also why, at the time of writing, neither Applicative nor Functor was a superclass of Monad; since GHC 7.10, Applicative is one.)
Aren't Alternative and Monoid the same? Aren't Alternative and Monoid completely different?
the 'pedia says that "the Alternative type class is for Applicative functors which also have a monoid structure." I don't get this -- doesn't Alternative mean something totally different from Monoid? i.e. I understood the point of the Alternative type class as picking between two things, whereas I understood Monoids as being about combining things.
Monoid and Alternative are two ways of getting one object from two in a sensible way. Maths doesn't care whether you're choosing, combining, mixing or blowing up your data, which is why Alternative was referred to as a Monoid for Applicative. You seem to be at home with that concept now, but you now say
for types that have both an Alternative and a Monoid instance, the instances are intended to be the same
I disagree with this, and I think my Maybe and ZipList examples are carefully explained as to why they're different. If anything, I think it should be rare that they're the same. I can only think of one example, plain lists, where this is appropriate. That's because lists are a fundamental example of a monoid with ++, but also lists are used in some contexts as an indeterminate choice of elements, so <|> should also be ++.
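A quick sketch of that difference using only instances from base (the four names are made up for illustration):

```haskell
import Control.Applicative ((<|>))

combinedMaybe, chosenMaybe :: Maybe String
combinedMaybe = Just "ab" `mappend` Just "cd"  -- Monoid: combines the contents
chosenMaybe   = Just "ab" <|> Just "cd"        -- Alternative: keeps the first Just

combinedList, chosenList :: [Int]
combinedList = [1, 2] `mappend` [3]  -- Monoid on lists is (++)
chosenList   = [1, 2] <|> [3]        -- Alternative on lists is also (++)
```

Here combinedMaybe is Just "abcd" while chosenMaybe is Just "ab", whereas the two list results are both [1,2,3].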
Summary
We need to define (instances that provide the same operations as) Monoid instances for some applicative functors, that genuinely combine at the applicative functor level, and not just lifting lower level monoids. The example error below from litvar = liftA2 mappend literal variable shows that <|> cannot in general be defined as liftA2 mappend; <|> works in this case by combining parsers, not their data.
If we used Monoid directly, we'd need language extensions to define the instances. Alternative is higher kinded so you can make these instances without requiring language extensions.
Example: Parsers
Let's imagine we're parsing some declarations, so we import everything we're going to need
import Text.Parsec
import Text.Parsec.String
import Control.Applicative ((<$>),(<*>),liftA2,empty)
import Data.Monoid
import Data.Char
and think about how we'll parse a type. We choose simplistic:
data Type = Literal String | Variable String deriving Show
examples = [Literal "Int",Variable "a"]
Now let's write a parser for literal types:
literal :: Parser Type
literal = fmap Literal $ (:) <$> upper <*> many alphaNum
Meaning: parse an uppercase character, then many alphaNumeric characters, combine the results into a single String with the pure function (:). Afterwards, apply the pure function Literal to turn those Strings into Types. We'll parse variable types exactly the same way, except for starting with a lowercase letter:
variable :: Parser Type
variable = fmap Variable $ (:) <$> lower <*> many alphaNum
That's great, and parseTest literal "Bool" == Literal "Bool" exactly as we'd hoped.
Question 3a: If it's to combine applicative's effects with Monoid's behavior, why not just liftA2 mappend
Edit:Oops - forgot to actually use <|>!
Now let's combine these two parsers using Alternative:
types :: Parser Type
types = literal <|> variable
This can parse any Type: parseTest types "Int" == Literal "Int" and parseTest types "a" == Variable "a".
This combines the two parsers, not the two values. That's the sense in which it works at the Applicative Functor level rather than the data level.
However, if we try:
litvar = liftA2 mappend literal variable
that would be asking the compiler to combine the two values that they generate, at the data level.
We get
No instance for (Monoid Type)
arising from a use of `mappend'
Possible fix: add an instance declaration for (Monoid Type)
In the first argument of `liftA2', namely `mappend'
In the expression: liftA2 mappend literal variable
In an equation for `litvar':
litvar = liftA2 mappend literal variable
So we found out the first thing; the Alternative class does something genuinely different to liftA2 mappend, because it combines objects at a different level - it combines the parsers, not the parsed data. If you like to think of it this way, it's combination at the genuinely higher-kind level, not merely a lift. I don't like saying it that way, because Parser Type has kind *, but it is true to say we're combining the Parsers, not the Types.
(Even for types with a Monoid instance, liftA2 mappend won't give you the same parser as <|>. If you try it on Parser String you'll get liftA2 mappend which parses one after the other then concatenates, versus <|> which will try the first parser and default to the second if it failed.)
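To make the parenthetical concrete without depending on Parsec, here is a sketch using a deliberately tiny hand-rolled parser. The type P and the names run, upperWord, lowerWord, sequenced and choice are all hypothetical; the behaviour shown is that of this toy parser, not of Parsec itself.

```haskell
import Control.Applicative (Alternative (..), liftA2)
import Data.Char (isLower, isUpper)

-- A toy parser: consume a prefix of the input or fail.
newtype P a = P { runP :: String -> Maybe (a, String) }

instance Functor P where
  fmap f (P p) = P $ \s -> case p s of
    Nothing        -> Nothing
    Just (a, rest) -> Just (f a, rest)

-- Sequencing: run the first parser, then the second on the leftovers.
instance Applicative P where
  pure x = P $ \s -> Just (x, s)
  P pf <*> P pa = P $ \s -> case pf s of
    Nothing        -> Nothing
    Just (f, rest) -> case pa rest of
      Nothing         -> Nothing
      Just (a, rest') -> Just (f a, rest')

-- Choice: try the first parser, fall back to the second on failure.
instance Alternative P where
  empty = P (const Nothing)
  P p <|> P q = P $ \s -> case p s of
    Nothing -> q s
    r       -> r

-- Parse a non-empty run of characters satisfying the predicate.
run :: (Char -> Bool) -> P String
run ok = P $ \s -> case span ok s of
  ([], _)   -> Nothing
  (w, rest) -> Just (w, rest)

upperWord, lowerWord :: P String
upperWord = run isUpper
lowerWord = run isLower

sequenced, choice :: P String
sequenced = liftA2 mappend upperWord lowerWord  -- one after the other, then concatenate
choice    = upperWord <|> lowerWord             -- first one that succeeds
```

On the input "ABcd", sequenced consumes everything and yields "ABcd", while choice yields only "AB" and leaves "cd" unconsumed. On "cd", sequenced fails and choice falls back to lowerWord.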
Question 3b: In what way does Alternative's <|> :: f a -> f a -> f a differ from Monoid's mappend :: b -> b -> b?
Firstly, you're right to note that it doesn't provide new functionality over a Monoid instance.
Secondly, however, there's an issue with using Monoid directly:
Let's try to use mappend on parsers, at the same time as showing it's the same structure as Alternative:
instance Monoid (Parser a) where
mempty = empty
mappend = (<|>)
Oops! We get
Illegal instance declaration for `Monoid (Parser a)'
(All instance types must be of the form (T t1 ... tn)
where T is not a synonym.
Use -XTypeSynonymInstances if you want to disable this.)
In the instance declaration for `Monoid (Parser a)'
So if you have an applicative functor f, the Alternative instance shows that f a is a monoid, but you could only declare that as a Monoid with a language extension.
Once we add {-# LANGUAGE TypeSynonymInstances #-} at the top of the file, we're fine and can define
typeParser = literal `mappend` variable
and to our delight, it works: parseTest typeParser "Yes" == Literal "Yes" and parseTest typeParser "a" == Variable "a".
Even if you don't have any synonyms (Parser and String are synonyms, so they're out), you'll still need {-# LANGUAGE FlexibleInstances #-} to define an instance like this one:
data MyMaybe a = MyJust a | MyNothing deriving Show
instance Monoid (MyMaybe Int) where
mempty = MyNothing
mappend MyNothing x = x
mappend x MyNothing = x
mappend (MyJust a) (MyJust b) = MyJust (a + b)
(The monoid instance for Maybe gets around this by lifting the underlying monoid.)
Making a standard library unnecessarily dependent on language extensions is clearly undesirable.
So there you have it. Alternative is just Monoid for Applicative Functors (and isn't just a lift of a Monoid). It needs the higher-kinded type f a -> f a -> f a so you can define one without language extensions.
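As a side note, base also provides a newtype wrapper, Alt in Data.Monoid, whose Monoid instance is defined by empty and <|>. Because the instance is declared on the wrapper applied to type variables, it needs no extensions. A minimal sketch (firstJust is a made-up name):

```haskell
import Data.Monoid (Alt (..))

-- Wrap each element in Alt, combine with the Monoid instance
-- (which is <|> underneath), then unwrap.
firstJust :: [Maybe a] -> Maybe a
firstJust = getAlt . foldMap Alt
```

For example, firstJust [Nothing, Just 1, Just 2] is Just 1, the same answer foldr (<|>) empty would give.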
Your other Questions, for completeness:
Why does Alternative need an empty method/member?
Because having an identity for an operation is sometimes useful.
For example, you can define anyA = foldr (<|>) empty without using tedious edge cases.
what's the point of the MonadPlus type class? Can't I unlock all of its goodness by just using something as both a Monad and Alternative?
No. I refer you back to the question you linked to:
Moreover, even if Applicative was a superclass of Monad, you'd wind up needing the MonadPlus class anyways, because obeying empty <*> m = empty isn't strictly enough to prove that empty >>= f = empty.
....and I've come up with an example: Maybe. I explain in detail, with proof in this answer to Antal's question. For the purposes of this answer, it's worth noting that I was able to use >>= to make the MonadPlus instance that broke the Alternative laws.
Monoid structure is useful. Alternative is the best way of providing it for Applicative Functors.
I won't cover MonadPlus because there is disagreement about its laws.
After trying and failing to find any meaningful examples in which the structure of an Applicative leads naturally to an Alternative instance that disagrees with its Monoid instance*, I finally came up with this:
Alternative's laws are more strict than Monoid's, because the result cannot depend on the inner type. This excludes a large number of Monoid instances from being Alternatives.
These datatypes allow partial (meaning that they only work for some inner types) Monoid instances which are forbidden by the extra 'structure' of the * -> * kind. Examples:
the standard Maybe instance for Monoid assumes that the inner type is Monoid => not an Alternative
ZipLists, tuples, and functions can all be made Monoids, if their inner types are Monoids => not Alternatives
sequences that have at least one element -- cannot be Alternatives because there's no empty:
data Seq a
= End a
| Cons a (Seq a)
deriving (Show, Eq, Ord)
On the other hand, some data types cannot be made Alternatives because they're *-kinded:
unit -- ()
Ordering
numbers, booleans
My inferred conclusion: for types that have both an Alternative and a Monoid instance, the instances are intended to be the same. See also this answer.
* excluding Maybe, which I argue doesn't count because its standard instance should not require Monoid for the inner type, in which case it would be identical to Alternative
I understood the point of the Alternative type class as picking between two things, whereas I understood Monoids as being about combining things.
If you think about this for a moment, they are the same.
The + combines things (usually numbers), and its type signature is Int -> Int -> Int (or whatever).
The <|> operator selects between alternatives, and its type signature is also the same: take two matching things and return a combined thing.

Monads with Join() instead of Bind()

Monads are usually explained in terms of return and bind. However, I gather you can also implement bind in terms of join (and fmap?)
In programming languages lacking first-class functions, bind is excruciatingly awkward to use. join, on the other hand, looks quite easy.
I'm not completely sure I understand how join works, however. Obviously, it has the [Haskell] type
join :: Monad m => m (m x) -> m x
For the list monad, this is trivially and obviously concat. But for a general monad, what, operationally, does this method actually do? I see what it does to the type signatures, but I'm trying to figure out how I'd write something like this in, say, Java or similar.
(Actually, that's easy: I wouldn't. Because generics is broken. ;-) But in principle the question still stands...)
Oops. It looks like this has been asked before:
Monad join function
Could somebody sketch out some implementations of common monads using return, fmap and join? (I.e., not mentioning >>= at all.) I think perhaps that might help it to sink in to my dumb brain...
Without plumbing the depths of metaphor, might I suggest to read a typical monad m as "strategy to produce a", so the type m value is a first class "strategy to produce a value". Different notions of computation or external interaction require different types of strategy, but the general notion requires some regular structure to make sense:
if you already have a value, then you have a strategy to produce a value (return :: v -> m v) consisting of nothing other than producing the value that you have;
if you have a function which transforms one sort of value into another, you can lift it to strategies (fmap :: (v -> u) -> m v -> m u) just by waiting for the strategy to deliver its value, then transforming it;
if you have a strategy to produce a strategy to produce a value, then you can construct a strategy to produce a value (join :: m (m v) -> m v) which follows the outer strategy until it produces the inner strategy, then follows that inner strategy all the way to a value.
Let's have an example: leaf-labelled binary trees...
data Tree v = Leaf v | Node (Tree v) (Tree v)
...represent strategies to produce stuff by tossing a coin. If the strategy is Leaf v, there's your v; if the strategy is Node h t, you toss a coin and continue by strategy h if the coin shows "heads", t if it's "tails".
instance Monad Tree where
return = Leaf
A strategy-producing strategy is a tree with tree-labelled leaves: in place of each such leaf, we can just graft in the tree which labels it...
join (Leaf tree) = tree
join (Node h t) = Node (join h) (join t)
...and of course we have fmap which just relabels leaves.
instance Functor Tree where
fmap f (Leaf x) = Leaf (f x)
fmap f (Node h t) = Node (fmap f h) (fmap f t)
Here's a strategy to produce a strategy to produce an Int.
Toss a coin: if it's "heads", toss another coin to decide between two strategies (producing, respectively, "toss a coin for producing 0 or producing 1" or "produce 2"); if it's "tails" produce a third ("toss a coin for producing 3 or tossing a coin for 4 or 5").
That clearly joins up to make a strategy producing an Int.
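The fragments above can be assembled into one file that GHC accepts. This is a sketch under a few assumptions: join is not a method of Haskell's Monad class, so it appears as a standalone function with the made-up name joinTree; modern GHC also demands an Applicative instance, written here in terms of joinTree; and nested is my reading of the coin-tossing description above.

```haskell
data Tree v = Leaf v | Node (Tree v) (Tree v)
  deriving (Show, Eq)

-- fmap just relabels leaves.
instance Functor Tree where
  fmap f (Leaf x)   = Leaf (f x)
  fmap f (Node h t) = Node (fmap f h) (fmap f t)

-- Graft each tree-labelled leaf's tree in place of the leaf.
joinTree :: Tree (Tree v) -> Tree v
joinTree (Leaf tree) = tree
joinTree (Node h t)  = Node (joinTree h) (joinTree t)

-- Required by GHC before Monad; defined via fmap and joinTree.
instance Applicative Tree where
  pure = Leaf
  tf <*> tx = joinTree (fmap (\f -> fmap f tx) tf)

-- Bind is recovered from fmap and joinTree.
instance Monad Tree where
  t >>= k = joinTree (fmap k t)

-- A strategy to produce a strategy to produce an Int.
nested :: Tree (Tree Int)
nested =
  Node
    (Node (Leaf (Node (Leaf 0) (Leaf 1))) (Leaf (Leaf 2)))
    (Leaf (Node (Leaf 3) (Node (Leaf 4) (Leaf 5))))

-- The joined-up strategy producing an Int.
flattened :: Tree Int
flattened = joinTree nested
```

Evaluating flattened gives Node (Node (Node (Leaf 0) (Leaf 1)) (Leaf 2)) (Node (Leaf 3) (Node (Leaf 4) (Leaf 5))), and nested >>= id produces the same tree.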
What we're making use of is the fact that a "strategy to produce a value" can itself be seen as a value. In Haskell, the embedding of strategies as values is silent, but in English, I use quotation marks to distinguish using a strategy from just talking about it. The join operator expresses the strategy "somehow produce then follow a strategy", or "if you are told a strategy, you may then use it".
(Meta. I'm not sure whether this "strategy" approach is a suitably generic way to think about monads and the value/computation distinction, or whether it's just another crummy metaphor. I do find leaf-labelled tree-like types a useful source of intuition, which is perhaps not a surprise as they're the free monads, with just enough structure to be monads at all, but no more.)
PS The type of "bind"
(>>=) :: m v -> (v -> m w) -> m w
says "if you have a strategy to produce a v, and for each v a follow-on strategy to produce a w, then you have a strategy to produce a w". How can we capture that in terms of join?
mv >>= v2mw = join (fmap v2mw mv)
We can relabel our v-producing strategy by v2mw, producing instead of each v value the w-producing strategy which follows on from it — ready to join!
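For any existing Monad instance the two directions can be written down generically. In this sketch the names bindViaJoin and joinViaBind are made up, and join is the library function from Control.Monad:

```haskell
import Control.Monad (join)

-- Bind from fmap and join: relabel, then flatten.
bindViaJoin :: Monad m => m v -> (v -> m w) -> m w
bindViaJoin mv v2mw = join (fmap v2mw mv)

-- Join from bind: follow whatever strategy the outer one produces.
joinViaBind :: Monad m => m (m v) -> m v
joinViaBind mmv = mmv >>= id
```

For lists, bindViaJoin [1,2,3] (\x -> [x, x * 10]) gives [1,10,2,20,3,30], exactly what >>= gives.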
join = concat -- []
join f = \x -> f x x -- (e ->)
join f = \s -> let (f', s') = f s in f' s' -- State
join (Just (Just a)) = Just a; join _ = Nothing -- Maybe
join (Identity (Identity a)) = Identity a -- Identity
join (Right (Right a)) = Right a; join (Right (Left e)) = Left e;
join (Left e) = Left e -- Either
join ((a, m), m') = (a, m' `mappend` m) -- Writer
-- N.B. there is a non-newtype-wrapped Monad instance for tuples that
-- behaves like the Writer instance, but with the tuple order swapped
join f = \k -> f (\f' -> f' k) -- Cont
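Several of these equations can be checked against the instances that ship with base, using join from Control.Monad. The names below are made up for illustration. Note that the tuple instance keeps the log in the first component, as the N.B. above says.

```haskell
import Control.Monad (join)

listJoin :: [Int]
listJoin = join [[1, 2], [3]]             -- concat

functionJoin :: Int
functionJoin = join (+) 21                -- (e ->): the argument is supplied twice

maybeJoin :: Maybe Int
maybeJoin = join (Just (Just 5))

eitherJoin :: Either String Int
eitherJoin = join (Right (Left "inner failure"))

-- Tuple monad from base: (log, value), outer log first.
tupleJoin :: (String, Int)
tupleJoin = join ("outer ", ("inner", 7))
```

These evaluate to [1,2,3], 42, Just 5, Left "inner failure" and ("outer inner",7) respectively.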
Calling fmap (f :: a -> m b) (x :: m a) produces values (y :: m (m b)) so it is a very natural thing to use join to get back values (z :: m b).
Then bind is defined simply as bind ma f = join (fmap f ma), thus achieving the Kleisli compositionality of functions of (:: a -> m b) variety, which is what it is really all about:
ma `bind` (f >=> g) = (ma `bind` f) `bind` g -- bind = (>>=)
= (`bind` g) . (`bind` f) $ ma
= join . fmap g . join . fmap f $ ma
And so, with flip bind = (=<<), we have
((g <=< f) =<<) = (g =<<) . (f =<<) = join . (g <$>) . join . (f <$>)
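A small sanity check of that chain of equalities on Maybe, as a sketch. The function halve and the names viaKleisli and viaJoin are hypothetical:

```haskell
import Control.Monad (join, (>=>))

-- Succeeds only on even numbers.
halve :: Int -> Maybe Int
halve n
  | even n    = Just (n `div` 2)
  | otherwise = Nothing

-- Bind with a Kleisli composition...
viaKleisli :: Maybe Int -> Maybe Int
viaKleisli ma = ma >>= (halve >=> halve)

-- ...versus the same pipeline spelled with fmap and join only.
viaJoin :: Maybe Int -> Maybe Int
viaJoin = join . fmap halve . join . fmap halve
```

Both functions send Just 12 to Just 3 and Just 6 to Nothing (because 3 is odd), and they agree on every other input as well.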
OK, so it's not really good form to answer your own question, but I'm going to note down my thinking in case it enlightens anybody else. (I doubt it...)
If a monad can be thought of as a "container", then both return and join have pretty obvious semantics. return generates a 1-element container, and join turns a container of containers into a single container. Nothing hard about that.
So let us focus on monads which are more naturally thought of as "actions". In that case, m x is some sort of action which yields a value of type x when you "execute" it. return x does nothing special, and then yields x. fmap f takes an action that yields an x, and constructs an action that computes x and then applies f to it, and returns the result. So far, so good.
It's fairly obvious that if f itself generates an action, then what you end up with is m (m x). That is, an action that computes another action. In a way, that's maybe even simpler to wrap your mind around than the >>= function which takes an action and a "function that produces an action" and so on.
So, logically speaking, it seems join would run the first action, take the action it produces, and then run that. (Or rather, join would return an action that does what I just described, if you want to split hairs.)
That seems to be the central idea. To implement join, you want to run an action, which then gives you another action, and then you run that. (Whatever "run" happens to mean for this particular monad.)
Given this insight, I can take a stab at writing some join implementations:
join Nothing = Nothing
join (Just mx) = mx
If the outer action is Nothing, return Nothing, else return the inner action. Then again, Maybe is more of a container than an action, so let's try something else...
newtype Reader s x = Reader (s -> x)
join (Reader f) = Reader (\ s -> let Reader g = f s in g s)
That was... painless. A Reader is really just a function that takes a global state and only then returns its result. So to unstack, you apply the global state to the outer action, which returns a new Reader. You then apply the state to this inner function as well.
In a way, it's perhaps easier than the usual way:
Reader f >>= g = Reader (\ s -> let Reader h = g (f s) in h s)
Now, which one is the reader function, and which one is the function that computes the next reader...?
Now let's try the good old State monad. Here every function takes an initial state as input but also returns a new state along with its output.
data State s x = State (s -> (s, x))
join (State f) = State (\ s0 -> let (s1, State g) = f s0 in g s1)
That wasn't too hard. It's basically run followed by run.
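Here are the Reader and State versions as one loadable sketch. The type definitions follow the ones above (including the (s, x) ordering for State). The run functions and the examples addEnvTwice, tick and tickTwice are made-up additions so that there is something to evaluate.

```haskell
newtype Reader s x = Reader (s -> x)

runReader :: Reader s x -> s -> x
runReader (Reader f) = f

-- Hand the same environment to the outer and then the inner reader.
joinReader :: Reader s (Reader s x) -> Reader s x
joinReader (Reader f) = Reader (\s -> let Reader g = f s in g s)

data State s x = State (s -> (s, x))

runState :: State s x -> s -> (s, x)
runState (State f) = f

-- Run the outer action, then run the action it returned
-- on the state the outer action left behind.
joinState :: State s (State s x) -> State s x
joinState (State f) = State (\s0 -> let (s1, State g) = f s0 in g s1)

-- Both layers see the same environment.
addEnvTwice :: Reader Int Int
addEnvTwice = joinReader (Reader (\outer -> Reader (\inner -> outer + inner)))

-- Return the current counter and increment it.
tick :: State Int Int
tick = State (\n -> (n + 1, n))

-- The outer action increments the counter and hands back tick.
tickTwice :: State Int Int
tickTwice = joinState (State (\n -> (n + 1, tick)))
```

With these, runReader addEnvTwice 5 is 10 and runState tickTwice 0 is (2,1): the inner tick starts from the state left by the outer action.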
I'm going to stop typing now. Feel free to point out all the glitches and typos in my examples... :-/
I've found many explanations of monads that say "you don't have to know anything about category theory, really, just think of monads as burritos / space suits / whatever".
Really, the article that demystified monads for me just said what categories were, described monads (including join and bind) in terms of categories, and didn't bother with any bogus metaphors:
http://en.wikibooks.org/wiki/Haskell/Category_theory
I think the article is very readable without much math knowledge required.
Asking what a type signature in Haskell does is rather like asking what an interface in Java does.
It, in some literal sense, "doesn't". (Though, of course, you will typically have some sort of purpose associated with it, that's mostly in your mind, and mostly not in the implementation.)
In both cases you are declaring legal sequences of symbols in the language which will be used in later definitions.
Of course, in Java, I suppose you could say that an interface corresponds to a type signature which is going to be implemented literally in the VM. You can get some polymorphism this way -- you can define a name that accepts an interface, and you can provide a different definition for the name which accepts a different interface. Something similar happens in Haskell, where you can provide a declaration for a name which accepts one type and then another declaration for that name which treats a different type.
This is Monad explained in one picture. The 2 functions in the green category are not composable, when being mapped to the blue category with join . fmap (strictly speaking, they are one category), they become composable. Monad is about turning a function of type T -> Monad<U> into a function of type Monad<T> -> Monad<U>.

Resources