How do operator associativity, the associative law and value dependencies of monads fit together?

On the one hand the monadic bind operator >>= is left associative (AFAIK). On the other hand the monad law demands associativity, i.e. evaluation order doesn't matter (like with monoids). Besides, monads encode a value dependency by making the next effect depend on the result of the previous one, i.e. monads effectively determine an evaluation order. This sounds contradictory to me, which clearly implies that my mental representation of the involved concepts is wrong. How does it all fit together?

On the one hand the monadic bind operator >>= is left associative
Yes.
Prelude> :i >>=
class Applicative m => Monad (m :: * -> *) where
  (>>=) :: m a -> (a -> m b) -> m b
  ...
        -- Defined in ‘GHC.Base’
infixl 1 >>=
That's just the way it's defined. + is left-associative too, even though the group laws for addition demand associativity.
Prelude> :i +
class Num a where
  (+) :: a -> a -> a
  ...
        -- Defined in ‘GHC.Num’
infixl 6 +
All an infixl declaration means is that the compiler will parse a+b+c as (a+b)+c; whether or not that happens to be equal to a+(b+c) is another matter.
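Subtraction shows concretely that infixl says nothing about the associative law: (-) is also declared infixl 6, yet grouping matters for it:

Prelude> 1 - 2 - 3    -- parsed as (1 - 2) - 3
-4
Prelude> 1 - (2 - 3)
2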
the monad law demands associativity
Well, >>= is actually not associative. The associative operator is >=>. For >>=, the type alone shows that it can't be associative: its second argument must be a function, while its first argument is not.
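For reference, here is the type of (>=>) from Control.Monad, together with the monad associativity law stated in terms of it:

(>=>) :: Monad m => (a -> m b) -> (b -> m c) -> (a -> m c)

-- associativity law, in Kleisli form:
-- (f >=> g) >=> h  =  f >=> (g >=> h)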
Besides, monads encode a value dependency by making the next effect depend on the result of the previous one
Yes, but this doesn't contradict associativity of >=>. Example:
teeAndInc :: String -> Int -> IO Int
teeAndInc name val = do
  putStrLn $ name ++ "=" ++ show val
  return $ val + 1
Prelude Control.Monad> ((teeAndInc "a" >=> teeAndInc "b") >=> teeAndInc "c") 37
a=37
b=38
c=39
40
Prelude Control.Monad> (teeAndInc "a" >=> (teeAndInc "b" >=> teeAndInc "c")) 37
a=37
b=38
c=39
40
Flipping around the parens does not change the order / dependency between the actions (that would be a commutativity law, not an associativity one), it just changes the grouping of the actions.


cannot construct the infinite type

I want to use an applicative function and tried the following:
*ReaderExercise Control.Applicative> (+4) <*> (+3)
then got following error message:
<interactive>:51:11: error:
    * Occurs check: cannot construct the infinite type: a ~ a -> b
      Expected type: (a -> b) -> a
        Actual type: a -> a
    * In the second argument of `(<*>)', namely `(+ 3)'
      In the expression: (+ 4) <*> (+ 3)
      In an equation for `it': it = (+ 4) <*> (+ 3)
    * Relevant bindings include
        it :: (a -> b) -> b (bound at <interactive>:51:1)
What I expect is a returned function with one argument.
What does it mean an infinite type?
The error "Occurs check: cannot construct [an] infinite type" results when Haskell determines that a type variable (explicitly given by the programmer or implicitly introduced by Haskell) must satisfy a condition that implies it would need to be recursively defined in terms of itself in a way that would lead to an infinitely "deep" type (i.e., the type variable "occurs" in its own definition).
It normally results from either a typo or conceptual error on the part of the programmer related to confusing two different "levels of structure" in a program.
As a simple example, a list of ints (type [Int]) is a valid Haskell type, and so is a list of lists of ints ([[Int]]) or a list of lists of lists of lists of lists of ints ([[[[[Int]]]]]) but only a finite number of list levels are allowed. You can't have a list of lists of lists of lists of lists, etc. all the way down -- that would be an infinite type. If Haskell thinks you want it to construct such a type, it'll give you an "occurs check" error.
The following definition:
yuck (x:xs) = x == xs
gives this error for exactly this reason. Haskell knows from the left-hand side that yuck takes a list of some unknown element type a, where variable x is the head of type a and variable xs is the tail of type [a]. From the RHS, the operator (==) forces x and xs to have the same type -- in other words, it implies the constraint a ~ [a], where the tilde indicates "type equality". No finite type (no type with a finite number of list levels) has this property; only the invalid infinite type [[[[...forever...]]]] could allow you to remove the outer list level and still have the same type left over, so you get the error.
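Loading this definition indeed triggers the occurs check; GHC reports something along these lines (exact wording varies by version):

* Occurs check: cannot construct the infinite type: a ~ [a]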
The issue here is that the programmer has confused two levels of structure: the list xs and an element x.
In your specific example, the reason for the error is similar, but harder to explain. The operator:
(<*>) :: (Applicative f) => f (a -> b) -> f a -> f b
takes two applicative actions with different underlying types: the left-hand side has type given by the applicative functor f applied to the underlying type a -> b; the right-hand side has type given by the same applicative functor f applied to the underlying type a.
You haven't told Haskell which applicative functor f you meant to use, so Haskell tries to infer it. Because the LHS has type:
(+4) :: (Num n) => n -> n
Haskell tries to match the type n -> n with f (a -> b). It may be clearer to write these types using the prefix form of the (->) type operator: Haskell is trying to match (->) n n with f ((->) a b) where f is an applicative functor.
Fortunately, there's an applicative functor instance for (->) t for any type t. So, Haskell reasons that the applicative functor you want is f = (->) n, and it successfully matches (->) n n = f n to f ((->) a b). This implies that n is equal to ((->) a b). Haskell then tries to match the types on the RHS, matching (->) n n = f n with (->) n a = f a. This works, and it implies that n is equal to a.
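For reference, the instance being used here is defined in base essentially as follows:

instance Applicative ((->) r) where
    pure = const
    f <*> g = \x -> f x (g x)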
Now we have a problem. n is simultaneously equal to a -> b (from the LHS) and a (from the RHS). This implies creation of an infinite function type, something that looks like:
(((... forever ...)->b)->b)->b)->b
which is the only way you could remove an outer ...->b and be left with the same type. This is an impossible infinite type, so you get the error.
The underlying problem is that you've made a conceptual error. Given that you are working on a ReaderExercise, I think you intended to use the (->) n applicative functor instance, so you and Haskell are in agreement on this point. In this context:
(+4) :: (Num n) => n -> n
is a reader action that reads a number from the reader and adds four to it. Similarly (+3) is a reader action that reads a number from the reader and adds three to it.
However, (<*>) is an operator that takes a reader action on the LHS that reads from the reader to produce a function (not a number!) that is then applied to the result of using the RHS to read from the reader to produce a number. For example, if you defined:
multiplyByReader :: (Num n) => n -> n -> n
multiplyByReader readerNum input = readerNum * input
then:
multiplyByReader <*> (+4)
or the simpler version:
(*) <*> (+4)
would make sense. The intended meaning would be: construct a reader action that (1) uses the LHS to read a number from the reader, creating a function that multiplies by the number read; and then (2) applies this function to the number that results from applying the RHS to the reader.
This would be equivalent to \r -> r * (r + 4), as you can see:
> ((*) <*> (+4)) 5 -- same as 5 * (5 + 4)
45
>
When you write (+4) <*> (+3), you're mixing up two different structural levels: the LHS reader yields a number, but it should instead yield a function that can be applied to a number.
My best guess is that you want to create a reader action that applies (+4) to the reader to get a number and then applies (+3) to that result. In this case, (+3) isn't a reader action; it's just a function you want to apply to the result of the reader action (+4), which is equivalent to fmapping over the reader action:
(+3) <$> (+4)
Of course, you could equivalently write it directly as:
(+3) . (+4)
Both are composite reader actions that add seven to the number read:
> ((+3) <$> (+4)) 5
12
> ((+3) . (+4)) 5
12
>

What are Applicative left and right star sequencing operators expected to do?

I looked up the implementation and it's even more mysterious:
-- | Sequence actions, discarding the value of the first argument.
(*>) :: f a -> f b -> f b
a1 *> a2 = (id <$ a1) <*> a2
-- This is essentially the same as liftA2 (flip const), but if the
-- Functor instance has an optimized (<$), it may be better to use
-- that instead. Before liftA2 became a method, this definition
-- was strictly better, but now it depends on the functor. For a
-- functor supporting a sharing-enhancing (<$), this definition
-- may reduce allocation by preventing a1 from ever being fully
-- realized. In an implementation with a boring (<$) but an optimizing
-- liftA2, it would likely be better to define (*>) using liftA2.

-- | Sequence actions, discarding the value of the second argument.
(<*) :: f a -> f b -> f a
(<*) = liftA2 const
I don't even understand why <$ deserves a place in a typeclass. It looks like there is some sharing-enhancing effect which fmap . const might not have, and that a1 might not be "fully realized". How is that related to the meaning of the Applicative sequencing operators?
These operators sequence two applicative actions and provide the result of the action that the arrow points to. For example,
> Just 1 *> Just 2
Just 2
> Just 1 <* Just 2
Just 1
Another example in writing parser combinators is
brackets p = char '(' *> p <* char ')'
which will be a parser that matches p contained in brackets and gives the result of parsing p.
In fact, (*>) is the same as (>>) but only requires an Applicative constraint instead of a Monad constraint.
I don't even understand why <$ deserves a place in a typeclass.
The answer is given by the Functor documentation: (<$) can sometimes have more efficient implementations than its default, which is fmap . const.
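Spelled out, that default is:

(<$) :: Functor f => a -> f b -> f a
(<$) = fmap . const

An instance is free to override it with something cheaper than mapping const over the whole structure.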
How is that related to the meaning of Applicative sequencing operators?
In cases where (<$) is more efficient, you want to maintain that efficiency in the definition of (*>).

Order of execution with Haskell's `mapM`

Consider the following Haskell statement:
mapM print ["1", "2", "3"]
Indeed, this prints "1", "2", and "3" in order.
Question: How do you know that mapM will first print "1", and then print "2", and finally print "3". Is there any guarantee that it will do this? Or is it a coincidence of how it is implemented deep within GHC?
If you evaluate mapM print ["1", "2", "3"] by expanding the definition of mapM you will arrive at (ignoring some irrelevant details)
print "1" >> print "2" >> print "3"
You can think of print and >> as abstract constructors of IO actions that cannot be evaluated any further, just as a data constructor like Just cannot be evaluated any further.
The interpretation of print s is the action of printing s, and the interpretation of a >> b is the action that first performs a and then performs b. So, the interpretation of
mapM print ["1", "2", "3"] = print "1" >> print "2" >> print "3"
is to first print 1, then print 2, and finally print 3.
How this is actually implemented in GHC is entirely a different matter which you shouldn't worry about for a long time.
There is no guarantee on the order of the evaluation but there is a guarantee on the order of the effects. For more information see this answer that discusses forM.
You need to learn to make the following, tricky distinction:
The order of evaluation
The order of effects (a.k.a. "actions")
What forM, sequence and similar functions promise is that the effects will be ordered from left to right. So for example, the following is guaranteed to print characters in the same order that they occur in the string...
Note: "forM is mapM with its arguments flipped. For a version that ignores the results see forM_."
Preliminary note: The answers by Reid Barton and Dair are entirely correct and fully cover your practical concerns. I mention that because partway through this answer one might have the impression that it contradicts them, which is not the case, as will be clear by the time we get to the end. That being clear, it is time to indulge in some language lawyering.
Is there any guarantee that [mapM print] will [print the list elements in order]?
Yes, there is, as explained by the other answers. Here, I will discuss what might justify this guarantee.
In this day and age, mapM is, by default, merely traverse specialised to monads:
traverse :: (Traversable t, Applicative f) => (a -> f b) -> t a -> f (t b)
mapM     :: (Traversable t, Monad m)       => (a -> m b) -> t a -> m (t b)
That being so, in what follows I will be primarily concerned with traverse, and how our expectations about the sequencing of effects relate to the Traversable class.
As far as the production of effects is concerned, traverse generates an Applicative effect for each value in the traversed container and combines all such effects through the relevant Applicative instance. This second part is clearly reflected by the type of sequenceA, through which the applicative context is, so to say, factored out of the container:
sequenceA :: (Traversable t, Applicative f) => t (f a) -> f (t a)
-- sequenceA and traverse are interrelated by:
traverse f = sequenceA . fmap f
sequenceA = traverse id
The Traversable instance for lists, for example, is:
instance Traversable [] where
    {-# INLINE traverse #-} -- so that traverse can fuse
    traverse f = List.foldr cons_f (pure [])
      where cons_f x ys = (:) <$> f x <*> ys
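Unfolding that definition for a two-element list makes the role of (<*>) explicit:

traverse f [x, y] = (:) <$> f x <*> ((:) <$> f y <*> pure [])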
It is plain to see that the combining, and therefore the sequencing, of effects is done through (<*>), so let's focus on it for a moment. Picking the IO applicative functor as an illustrative example, we can see (<*>) sequencing effects from left to right:
GHCi> -- Superfluous parentheses added for emphasis.
GHCi> ((putStrLn "Type something:" >> return reverse) <*> getLine) >>= putStrLn
Type something:
Whatever
revetahW
(<*>), however, sequences effects from left-to-right by convention, and not for any inherent reason. As witnessed by the Backwards wrapper from transformers, it is, in principle, always possible to implement (<*>) with right-to-left sequencing and still get a lawful Applicative instance. Without using the wrapper, it is also possible to take advantage of (<**>) from Control.Applicative to invert the sequencing:
(<**>) :: Applicative f => f a -> f (a -> b) -> f b
GHCi> import Control.Applicative
GHCi> (getLine <**> (putStrLn "Type something:" >> return reverse)) >>= putStrLn
Whatever
Type something:
revetahW
Given that it is so easy to flip the sequencing of Applicative effects, one might wonder whether this trick might transfer to Traversable. For instance, let's say we implement...
esrevart :: Applicative f => (a -> f b) -> [a] -> f [b]
... so that it is just like traverse for lists save for using Backwards or (<**>) to flip the sequencing of effects (I will leave that as an exercise for the reader). Would esrevart be a legal implementation of traverse? While we might figure it out by trying to prove the identity and composition laws of Traversable hold, that is actually not necessary: given that Backwards f for any applicative f is also applicative, an esrevart patterned after any lawful traverse will also follow the Traversable laws. The Reverse wrapper, also part of transformers, offers a general implementation of this reversal.
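For completeness, here is a minimal sketch of that exercise in terms of the Backwards wrapper (keeping the made-up name esrevart):

import Control.Applicative.Backwards (Backwards (..))

esrevart :: Applicative f => (a -> f b) -> [a] -> f [b]
esrevart f = forwards . traverse (Backwards . f)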
We have thus concluded that there can be legal Traversable instances that differ in the sequencing of effects. In particular, a list traverse that sequences effects from tail to head is conceivable. That doesn't make the possibility any less strange, though. To avoid utter bewilderment, Traversable instances are conventionally implemented with plain (<*>) and following the natural order in which the constructors are used to build the traversable container, which in the case of lists amounts to the expected head-to-tail sequencing of effects. One place where this convention shows up is in the automatic generation of instances by the DeriveTraversable extension.
A final, historical note. Couching this discussion, which is ultimately about mapM, in terms of the Traversable class would be a move of dubious relevance in a not so distant past. mapM was effectively subsumed by traverse only last year, but it has existed for much longer. For instance, the Haskell Report 1.3 from 1996, years before Applicative and Traversable came into being (not even ap is there, in fact), provides the following specification for mapM:
accumulate :: Monad m => [m a] -> m [a]
accumulate = foldr mcons (return [])
  where mcons p q = p >>= \x -> q >>= \y -> return (x:y)

mapM :: Monad m => (a -> m b) -> [a] -> m [b]
mapM f as = accumulate (map f as)
The sequencing of effects, here enforced through (>>=), is left-to-right, for no other reason than it being the sensible thing to do.
P.S.: It is worth emphasising that, while it is possible to write a right-to-left mapM in terms of the Monad operations (in the Report 1.3 implementation quoted here, for instance, it merely requires exchanging p and q in the right-hand side of mcons), there is no such thing as a general Backwards for monads. Since f in x >>= f is a Monad m => a -> m b function which creates effects from values, the effects associated with f depend on x. As a consequence, a simple inversion of sequencing like that possible with (<*>) is not even guaranteed to be meaningful, let alone lawful.

What advantage does Monad give us over an Applicative?

I've read this article, but didn't understand the last section.
The author says that Monad gives us context sensitivity, but it's possible to achieve the same result using only an Applicative instance:
let maybeAge = (\futureYear birthYear ->
        if futureYear < birthYear
            then yearDiff birthYear futureYear
            else yearDiff futureYear birthYear)
        <$> readMay futureYearString
        <*> readMay birthYearString
It's uglier for sure without do-syntax, but besides that I don't see why we need Monad. Can anyone clear this up for me?
Here's a couple of functions that use the Monad interface.
ifM :: Monad m => m Bool -> m a -> m a -> m a
ifM c x y = c >>= \z -> if z then x else y
whileM :: Monad m => (a -> m Bool) -> (a -> m a) -> a -> m a
whileM p step x = ifM (p x) (step x >>= whileM p step) (return x)
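For instance, running whileM in the Maybe monad, counting up until the predicate fails:

*Main> whileM (\n -> Just (n < 5)) (\n -> Just (n + 1)) (0 :: Int)
Just 5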
You can't implement them with the Applicative interface. But for the sake of enlightenment, let's try and see where things go wrong. How about..
import Control.Applicative
ifA :: Applicative f => f Bool -> f a -> f a -> f a
ifA c x y = (\c' x' y' -> if c' then x' else y') <$> c <*> x <*> y
Looks good! It has the right type, it must be the same thing! Let's just check to make sure..
*Main> ifM (Just True) (Just 1) (Just 2)
Just 1
*Main> ifM (Just True) (Just 1) (Nothing)
Just 1
*Main> ifA (Just True) (Just 1) (Just 2)
Just 1
*Main> ifA (Just True) (Just 1) (Nothing)
Nothing
And there's your first hint at the difference. You can't write a function using just the Applicative interface that replicates ifM.
If you divide this up into thinking about values of the form f a as being about "effects" and "results" (both of which are very fuzzy approximate terms that are the best terms available, but not very good), you can improve your understanding here. In the case of values of type Maybe a, the "effect" is success or failure, as a computation. The "result" is a value of type a that might be present when the computation completes. (The meanings of these terms depends heavily on the concrete type, so don't think this is a valid description of anything other than Maybe as a type.)
Given that setting, we can look at the difference in a bit more depth. The Applicative interface allows the "result" control flow to be dynamic, but it requires the "effect" control flow to be static. If your expression involves 3 computations that can fail, the failure of any one of them causes the failure of the whole computation. The Monad interface is more flexible. It allows the "effect" control flow to depend on the "result" values. ifM chooses which argument's "effects" to include in its own "effects" based on its first argument. This is the huge fundamental difference between ifA and ifM.
There's something even more serious going on with whileM. Let's try to make whileA and see what happens.
whileA :: Applicative f => (a -> f Bool) -> (a -> f a) -> a -> f a
whileA p step x = ifA (p x) (whileA p step <*> step x) (pure x)
Well.. What happens is a compile error. (<*>) doesn't have the right type there. whileA p step has the type a -> f a and step x has the type f a. (<*>) isn't the right shape to fit them together. For it to work, the function type would need to be f (a -> a).
You can try lots more things - but you'll eventually find that whileA has no implementation that works anything even close to the way whileM does. I mean, you can implement the type, but there's just no way to make it both loop and terminate.
Making it work requires either join or (>>=). (Well, or one of the many equivalents of one of those.) And those are the extra things you get out of the Monad interface.
With monads, subsequent effects can depend on previous values. For example, you can have:
main = do
    b <- readLn :: IO Bool
    if b
        then fireMissiles
        else return ()
You can't do that with Applicatives - the result value of one effectful computation can't determine what effect will follow.
Somewhat related:
Why can applicative functors have side effects, but functors can't?
Good examples of Not a Functor/Functor/Applicative/Monad?
As Stephen Tetley said in a comment, that example doesn't actually use context-sensitivity. One way to think about context-sensitivity is that it lets us choose which actions to take depending on monadic values. Applicative computations must always have the same "shape", in a certain sense, regardless of the values involved; monadic computations need not. I personally think this is easier to understand with a concrete example, so let's look at one. Here are two versions of a simple program which ask you to enter a password, check that you entered the right one, and print out a response depending on whether or not you did.
import Control.Applicative
checkPasswordM :: IO ()
checkPasswordM = do putStrLn "What's the password?"
                    pass <- getLine
                    if pass == "swordfish"
                      then putStrLn "Correct. The secret answer is 42."
                      else putStrLn "INTRUDER ALERT! INTRUDER ALERT!"

checkPasswordA :: IO ()
checkPasswordA =   if' . (== "swordfish")
               <$> (putStrLn "What's the password?" *> getLine)
               <*> putStrLn "Correct. The secret answer is 42."
               <*> putStrLn "INTRUDER ALERT! INTRUDER ALERT!"

if' :: Bool -> a -> a -> a
if' True  t _ = t
if' False _ f = f
Let's load this into GHCi and check what happens with the monadic version:
*Main> checkPasswordM
What's the password?
swordfish
Correct. The secret answer is 42.
*Main> checkPasswordM
What's the password?
zvbxrpl
INTRUDER ALERT! INTRUDER ALERT!
So far, so good. But if we use the applicative version:
*Main> checkPasswordA
What's the password?
hunter2
Correct. The secret answer is 42.
INTRUDER ALERT! INTRUDER ALERT!
We entered the wrong password, but we still got the secret! And an intruder alert! This is because <$> and <*>, or equivalently liftAn/liftMn, always execute the effects of all their arguments. The applicative version translates, in do notation, to
do pass  <- putStrLn "What's the password?" *> getLine
   unit1 <- putStrLn "Correct. The secret answer is 42."
   unit2 <- putStrLn "INTRUDER ALERT! INTRUDER ALERT!"
   pure $ if' (pass == "swordfish") unit1 unit2
And it should be clear why this has the wrong behavior. In fact, every use of applicative functors is equivalent to monadic code of the form
do val1 <- app1
   val2 <- app2
   ...
   valN <- appN
   pure $ f val1 val2 ... valN
(where some of the appI are allowed to be of the form pure xI). And equivalently, any monadic code in that form can be rewritten as
f <$> app1 <*> app2 <*> ... <*> appN
or equivalently as
liftAN f app1 app2 ... appN
To think about this, consider Applicative's methods:
pure :: a -> f a
(<$>) :: (a -> b) -> f a -> f b
(<*>) :: f (a -> b) -> f a -> f b
And then consider what Monad adds:
(=<<) :: (a -> m b) -> m a -> m b
join :: m (m a) -> m a
(Remember that you only need one of those.)
Handwaving a lot, if you think about it, the only way we can put together the applicative functions is to construct chains of the form f <$> app1 <*> ... <*> appN, and possibly nest those chains (e.g., f <$> (g <$> x <*> y) <*> z). However, (=<<) (or (>>=)) allows us to take a value and produce different monadic computations depending on that value, that could be constructed on the fly. This is what we use to decide whether to compute "print out the secret", or compute "print out an intruder alert", and why we can't make that decision with applicative functors alone; none of the types for applicative functions allow you to consume a plain value.
You can think about join in concert with fmap in a similar way: as I mentioned in a comment, you can do something like
checkPasswordFn :: String -> IO ()
checkPasswordFn pass = if pass == "swordfish"
                         then putStrLn "Correct. The secret answer is 42."
                         else putStrLn "INTRUDER ALERT! INTRUDER ALERT!"

checkPasswordA' :: IO (IO ())
checkPasswordA' = checkPasswordFn <$> (putStrLn "What's the password?" *> getLine)
This is what happens when we want to pick a different computation depending on the value, but only have applicative functionality available to us. We can pick two different computations to return, but they're wrapped inside the outer layer of the applicative functor. To actually use the computation we've picked, we need join:
checkPasswordM' :: IO ()
checkPasswordM' = join checkPasswordA'
And this does the same thing as the previous monadic version (as long as we import Control.Monad first, to get join):
*Main> checkPasswordM'
What's the password?
12345
INTRUDER ALERT! INTRUDER ALERT!
On the other hand, here's a practical example of the Applicative/Monad divide where Applicatives have an advantage: error handling! We clearly have a Monad implementation of Either that carries along errors, but it always terminates early.
Left e1 >> Left e2 === Left e1
You can think of this as an effect of intermingling values and contexts. Since (>>=) will try to pass the result of the Either e a value to a function like a -> Either e b, it must fail immediately if the input Either is Left.
Applicatives only pass their values to the final pure computation after running all of the effects. This means they can delay accessing the values for longer and we can write this.
{-# LANGUAGE DeriveFunctor #-}

data AllErrors e a = Error e | Pure a deriving (Functor)

instance Monoid e => Applicative (AllErrors e) where
  pure = Pure
  (Pure f)   <*> (Pure x)   = Pure (f x)
  (Error e)  <*> (Pure _)   = Error e
  (Pure _)   <*> (Error e)  = Error e
  -- This is the non-Monadic case
  (Error e1) <*> (Error e2) = Error (e1 <> e2)
It's impossible to write a Monad instance for AllErrors such that ap matches (<*>) because (<*>) takes advantage of running both the first and second contexts before using any values in order to get both errors and (<>) them together. Monadic (>>=) and (join) can only access contexts interwoven with their values. That's why Either's Applicative instance is left-biased, so that it can also have a harmonious Monad instance.
> Left "a" <*> Left "b"
Left "a"
> Error "a" <*> Error "b"
Error "ab"
With Applicative, the sequence of effectful actions to be performed is fixed at compile-time. With Monad, it can be varied at run-time based on the results of effects.
For example, with an Applicative parser, the sequence of parsing actions is fixed for all time. That means that you can potentially perform "optimisations" on it. On the other hand, I can write a Monadic parser which parses a BNF grammar description, dynamically constructs a parser for that grammar, and then runs that parser over the rest of the input. Every time you run this parser, it potentially constructs a brand new parser to parse the second portion of the input. Applicative has no hope of doing such a thing - and there is no chance of performing compile-time optimisations on a parser that doesn't exist yet...
As you can see, sometimes the "limitation" of Applicative is actually beneficial - and sometimes the extra power offered by Monad is required to get the job done. This is why we have both.
If you try to convert the type signatures of Monad's bind and Applicative's <*> to natural language, you will find that:
bind: I will give you the contained value and you will return me a new packaged value.
<*>: You give me a packaged function that accepts a contained value and returns a value, and I will use it to create a new packaged value based on my rules.
Now as you can see from the above description, bind gives you more control as compared to <*>.
If you work with Applicatives, the "shape" of the result is already determined by the "shape" of the input, e.g. if you call [f,g,h] <*> [a,b,c,d,e], your result will be a list of 15 elements, regardless which values the variables have. You don't have this guarantee/limitation with monads. Consider [x,y,z] >>= join replicate: For [0,0,0] you'll get the result [], for [1,2,3] the result [1,2,2,3,3,3].
Now that the ApplicativeDo extension has become a pretty common thing, the difference between Monad and Applicative can be illustrated using a simple code snippet.
With Monad you can do
do
  r1 <- act1
  if r1
    then act2
    else act3
but with only an Applicative do-block, you can't use if on things you've pulled out with <-.
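For contrast, a do-block that stays within Applicative (and that ApplicativeDo can desugar using only <$> and <*>) has the shape

do
  r1 <- act1
  r2 <- act2
  pure (f r1 r2)

where neither action depends on a previously bound result; only the final pure combines them.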

Explanation of Monad laws

From a gentle introduction to Haskell, there are the following monad laws. Can anyone intuitively explain what they mean?
return a >>= k = k a
m >>= return = m
xs >>= return . f = fmap f xs
m >>= (\x -> k x >>= h) = (m >>= k) >>= h
Here is my attempted explanation:
We expect the return function to wrap a so that its monadic nature is trivial. When we bind it to a function, there are no monadic effects, it should just pass a to the function.
The unwrapped output of m is passed to return that rewraps it. The monadic nature remains the same. So it is the same as the original monad.
The unwrapped value is passed to f then rewrapped. The monadic nature remains the same. This is the behavior expected when we transform a normal function into a monadic function.
I don't have an explanation for this law. This does say that the monad must be "almost associative" though.
Your descriptions seem pretty good. Generally people speak of three monad laws, which you have as 1, 2, and 4. Your third law is slightly different, and I'll get to that later.
For the three monad laws, I find it much easier to get an intuitive understanding of what they mean when they're re-written using Kleisli composition:
-- defined in Control.Monad
(>=>) :: Monad m => (a -> m b) -> (b -> m c) -> a -> m c
mf >=> n = \x -> mf x >>= n
Now the laws can be written as:
1) return >=> mf = mf -- left identity
2) mf >=> return = mf -- right identity
4) (f >=> g) >=> h = f >=> (g >=> h) -- associativity
1) Left Identity Law - returning a value doesn't change the value and doesn't do anything in the monad.
2) Right Identity Law - returning a value doesn't change the value and doesn't do anything in the monad.
4) Associativity - monadic composition is associative (I like KennyTM's answer for this)
The two identity laws basically say the same thing, but they're both necessary because return should have identity behavior on both sides of the bind operator.
Now for the third law. This law essentially says that both the Functor instance and your Monad instance behave the same way when lifting a function into the monad, and that neither does anything monadic. If I'm not mistaken, it's the case that when a monad obeys the other three laws and the Functor instance obeys the functor laws, then this statement will always be true.
A lot of this comes from the Haskell Wiki. The Typeclassopedia is a good reference too.
No disagreements with the other answers, but it might help to think of the monad laws as actually describing two sets of properties. As John says, the third law you mention is slightly different, but here's how the others can be split apart:
Functions that you bind to a monad compose just like regular functions.
As in John's answer, what's called a Kleisli arrow for a monad is a function with type a -> m b. Think of return as id and (<=<) as (.), and the monad laws are the translations of these:
id . f is equivalent to f
f . id is equivalent to f
(f . g) . h is equivalent to f . (g . h)
Sequences of monadic effects append like lists.
For the most part, you can think of the extra monadic structure as a sequence of extra behaviors associated with a monadic value; e.g. Maybe being "give up" for Nothing and "keep going" for Just. Combining two monadic actions then essentially concatenates the sequences of behaviors they held.
In this sense, return is again an identity--the null action, akin to an empty list of behaviors--and (>=>) is concatenation. So, the monad laws are translations of these:
[] ++ xs is equivalent to xs
xs ++ [] is equivalent to xs
(xs ++ ys) ++ zs is equivalent to xs ++ (ys ++ zs)
These three laws describe a ridiculously common pattern, which Haskell unfortunately can't quite express in full generality. If you're interested, Control.Category gives a generalization of "things that look like function composition", while Data.Monoid generalizes the latter case where no type parameters are involved.
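For the curious, base already packages the first of these patterns: the Kleisli wrapper in Control.Arrow carries a Category instance that is essentially

import Prelude hiding (id, (.))
import Control.Category (Category (..))
import Control.Monad ((>=>))

newtype Kleisli m a b = Kleisli { runKleisli :: a -> m b }

instance Monad m => Category (Kleisli m) where
    id = Kleisli return
    Kleisli f . Kleisli g = Kleisli (g >=> f)

so the identity and associativity laws of Category are exactly the monad laws in Kleisli form.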
In terms of do notation, rule 4 means we can add an extra do block to group a sequence of monadic operations.
do x <- m
   y <- k x
   h y

is equivalent to

do y <- do x <- m
           k x
   h y
This allows functions that return a monadic value to work properly.
The first three laws say that "return" only wraps a value and does nothing else. So you can eliminate "return" calls without changing the semantics.
The last law is associativity for bind. It means that you take something like:
do
  x <- foo
  bar x
  z <- baz
and turn it into
do
  do
    x <- foo
    bar x
  z <- baz
without changing the meaning. Of course you wouldn't do exactly this, but you might want to put the inner "do" clause in an "if" statement and want it to mean the same when the "if" is true.
Sometimes monads don't exactly follow these laws, particularly when some kind of bottom value occurs. That's OK as long as it's documented and is "morally correct" (i.e. the laws are followed for non-bottom values, or the results are considered equivalent in some other way).
