Regardless of composition, can I say monadic if there is only a way to change m(m a) to m a - haskell

A monad has the following components:
1. a and m a, where m a is obtained from a by a functor
2. an arrow a -> m b, and a way to change m (m a) to m a
3. composition of arrows
If a structure has 1 and 2, regardless of composition, can I call it monadic?
Sure, if a structure has 1 and 2, it can be composed. But does composition have to be a concern or not?
I thought that associativity and identity are about composition. If a structure has all of these exactly, it's just a monad. My question is about "monadic". The most important part of a monad is mu (join), so if a structure has only mu, could it roughly be called a monadic thing?
Monadic is just an adjective for Monad.
"Monadic" thing means that It's a factor of a Monad.
Monadic structure of a Monad
Monadic type m a of a Monad,
Monadic arrow a -> m b of a Monad,
Monadic composition >=> of a Monad

This looks like two questions: a question about terminology, and a question about whether the monad laws relate to "composition" only.
For the terminology question...
As pointed out by @cstml in a comment, there's a formal definition of a "monadic functor" in category theory, but I don't think Haskell programmers typically use "monadic" in this formal sense, and it doesn't look like you're asking about that.
When I use the term "monadic", I typically use it in the sense of "things pertaining to a monad". So a monadic action is an expression of type m a for a Monad m, and when I engage in monadic programming, I'm using monads.
I don't typically use it in the sense of "monad-like", but I could see maybe saying something like "arrows have some monadic behavior, but they aren't necessarily monads". On the other hand, if you say "type X is monadic" without qualification, that sounds like just an unusual way of saying "type X is a monad".
For the monad laws question...
Even though the monad laws for Haskell are usually expressed in terms of return and >>=, an alternative set of laws that only involve return, fmap, and join can be formulated. That is, the join operator all by itself satisfies identity and associativity laws that are more or less equivalent to the laws satisfied by >>=.
Specifically, any monad obeys the obvious laws:
fmap f . return = return . f
join . fmap return = id
join . return = id
If you have an object that supports operations:
return :: a -> m a -- convert from `a` by functor
join :: m (m a) -> m a -- change `m m` to `m`
but doesn't satisfy any of these laws, then it doesn't seem very "monadic".
If it does satisfy the above laws, then the composition defined by:
x >>= f = join (fmap f x)
will automatically satisfy the usual Haskell left and right identity laws:
return a >>= k = k a
m >>= return = m
so these laws aren't "just about composition".
The only usual Haskell law not implied by the above is associativity. Like the others, it can be expressed in terms of join and fmap instead of >>= and it takes the form:
join . fmap h . join . fmap k = join . fmap (join . fmap h . k)
Basically, the question of associativity can be expressed either as the associativity of the composition of monadic arrows or as the associativity of the join operation acting on the type m (m (m a)). In other words, does it matter whether you join the outer two layers first or the inner two layers first?
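To make the join-based reading concrete, here is a minimal sketch that spot-checks the unit laws and the "outer first versus inner first" question on ordinary lists. The names bindViaJoin, joinOuterFirst, joinInnerFirst and sample are made up for this illustration, and checking a few values is of course not a proof of the laws.

```haskell
import Control.Monad (join)

-- Bind recovered from fmap and join, as in x >>= f = join (fmap f x).
bindViaJoin :: Monad m => m a -> (a -> m b) -> m b
bindViaJoin x f = join (fmap f x)

-- Two ways to flatten three layers down to one.
joinOuterFirst :: Monad m => m (m (m a)) -> m a
joinOuterFirst = join . join

joinInnerFirst :: Monad m => m (m (m a)) -> m a
joinInnerFirst = join . fmap join

-- Hypothetical sample value with three list layers.
sample :: [[[Int]]]
sample = [[[1, 2], [3]], [[4]], []]

-- Spot checks of the two unit laws on a list.
unitLawsHold :: Bool
unitLawsHold =
  join (fmap return [1, 2, 3 :: Int]) == [1, 2, 3]
    && join (return [1, 2, 3 :: Int]) == [1, 2, 3]

-- Spot check of associativity of join on the sample value.
associativityHolds :: Bool
associativityHolds = joinOuterFirst sample == joinInnerFirst sample
```

Loaded into GHCi, both unitLawsHold and associativityHolds should evaluate to True for the list monad.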


How exactly does the `(<*>) = ap` Applicative/Monad law relate the two classes?

ap doesn't have a documented spec, and its source carries a comment pointing out it could be <*>, but isn't for practical reasons:
ap :: (Monad m) => m (a -> b) -> m a -> m b
ap m1 m2 = do { x1 <- m1; x2 <- m2; return (x1 x2) }
-- Since many Applicative instances define (<*>) = ap, we
-- cannot define ap = (<*>)
So I assume the ap in the (<*>) = ap law is shorthand for "right-hand side of ap" and the law actually expresses a relationship between >>= , return and <*> right? Otherwise the law is meaningless.
The context is me thinking about Validation and how unsatisfying it is that it can't seem to have a lawful Monad instance. I'm also thinking about ApplicativeDo and how that transformation sort of lets us recover from the practical effects of a Monad instance for Validation; what I most often want to do is accumulate errors as far as possible, but still be able to use bind when necessary. We actually export a bindV function which we need to use just about everywhere; it's all kind of absurd. The only practical consequence of the lawlessness that I can think of is that we accumulate different or fewer errors depending on what sort of composition we use (or how our program might theoretically be transformed by rewrite rules, though I'm not sure why applicative composition would ever get converted to monadic).
EDIT: The documentation for the same laws in Monad is more extensive:
Furthermore, the Monad and Applicative operations should relate as follows:
pure = return
(<*>) = ap
The above laws imply:
fmap f xs = xs >>= return . f
(>>) = (*>)
"The above laws imply"... so is the idea here that these are the real laws we care about?
But now I'm left trying to understand these in the context of Validation. The first law would hold. The second could obviously be made to hold if we just define (>>) = (*>).
But the documentation for Monad surprisingly says nothing at all (unless I'm just missing it) about how >> should relate. Presumably we want that
a >> b = a >>= \_ -> b
...and (>>) is included in the class so that it can be overridden for efficiency, and this just never quite made it into the docs.
So if that's the case, then I guess the way Monad and Applicative relate is actually something like:
return = pure
xs >>= return . f = fmap f xs
a >>= \_ -> b = fmap (const id) a <*> b
Every Monad gives rise to an Applicative, and for that induced Applicative, <*> = ap will hold definitionally. But given two structures - Monad m and Applicative m - there is no guarantee that these structures agree without the two laws <*> = ap and pure = return. For example, take the 'regular' Monad instance for lists, and the zip-list Applicative instance. While there is nothing fundamentally 'wrong' about a Monad and Applicative instance disagreeing, it would probably be confusing to most users, and so it's prohibited by the Monad laws.
tl;dr The laws in question serve to ensure that Monad and Applicative agree in an intuitively obvious way.
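As a small illustration of the disagreement mentioned above, the sketch below compares ap (built from the list Monad) with the ZipList Applicative on the same data. The names fs, xs, viaMonad, viaListApplicative and viaZipList are invented for the example.

```haskell
import Control.Applicative (ZipList (..))
import Control.Monad (ap)

-- Hypothetical sample data.
fs :: [Int -> Int]
fs = [(+ 1), (* 2)]

xs :: [Int]
xs = [10, 20]

-- ap is defined through >>= and return, so it uses the list Monad:
-- every function is applied to every argument.
viaMonad :: [Int]
viaMonad = fs `ap` xs

-- The standard list Applicative agrees with ap.
viaListApplicative :: [Int]
viaListApplicative = fs <*> xs

-- ZipList pairs functions and arguments positionally instead.
viaZipList :: [Int]
viaZipList = getZipList (ZipList fs <*> ZipList xs)
```

Here viaMonad and viaListApplicative are both [11,21,20,40], while viaZipList is [11,40]. That difference is why ZipList lives in its own newtype rather than sharing a type with the list Monad.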
So I assume the ap in the (<*>) = ap law is shorthand for "right-hand side of ap" and the law actually expresses a relationship between >>=, return and <*> right?
It seems to me (<*>) = ap doesn't strictly imply anything (at least post-AMP). Presumably it's trying to express some relationship between <*> and the right-hand side of ap. Maybe I'm being pedantic.
Speaking pedantically, I'd say the opposite: because ap is definitionally equal to its right-hand side, saying (<*>) = ap is exactly the same as saying m1 <*> m2 = do { x1 <- m1; x2 <- m2; return (x1 x2) }. It's just the normal first step of dealing with equalities like that: expanding the definitions.
Reply to the comment:
Right, but the definition is free to change.
Then the law would change or be removed too. Just as when/if join is added to Monad the current definition will become a law instead.
it wouldn't have been possible to define it literally as ap = <*>
Do you mean it would be impossible to define ap or the law in this way?
If ap, then you are correct: it would have the wrong type. But stating the law like this would be fine.

Is it better to define Functor in terms of Applicative in terms of Monad, or vice versa?

This is a general question, not tied to any one piece of code.
Say you have a type T a that can be given an instance of Monad. Since every monad is an Applicative by assigning pure = return and (<*>) = ap, and then every applicative is a Functor via fmap f x = pure f <*> x, is it better to define your instance of Monad first, and then trivially give T instances of Applicative and Functor?
It feels a bit backward to me. If I were doing math instead of programming, I would think that I would first show that my object is a functor, and then continue adding restrictions until I have also shown it to be a monad. I know Haskell is merely inspired by Category Theory and obviously the techniques one would use when constructing a proof aren't the techniques one would use when writing a useful program, but I'd like to get an opinion from the Haskell community. Is it better to go from Monad down to Functor? or from Functor up to Monad?
I tend to write and see written the Functor instance first. Doubly so because if you use the LANGUAGE DeriveFunctor pragma then data Foo a = Foo a deriving ( Functor ) works most of the time.
The tricky bits are around agreement of instances when your Applicative can be more general than your Monad. For instance, here's an Err data type
data Err e a = Err [e] | Ok a deriving ( Functor )
instance Applicative (Err e) where
  pure = Ok
  Err es <*> Err es' = Err (es ++ es')
  Err es <*> _ = Err es
  _ <*> Err es = Err es
  Ok f <*> Ok x = Ok (f x)
instance Monad (Err e) where
  return = pure
  Err es >>= _ = Err es
  Ok a >>= f = f a
Above I defined the instances in Functor-to-Monad order and, taken in isolation, each instance is correct. Unfortunately, the Applicative and Monad instances do not align: ap and (<*>) are observably different as are (>>) and (*>).
Err "hi" <*> Err "bye" == Err "hibye"
Err "hi" `ap` Err "bye" == Err "hi"
For sensibility purposes, especially once the Applicative/Monad Proposal is in everyone's hands, these should align. If you defined instance Applicative (Err e) where { pure = return; (<*>) = ap } then they will align.
But then, finally, you may be capable of carefully teasing apart the differences in Applicative and Monad so that they behave differently in benign ways, such as having a lazier or more efficient Applicative instance. This actually occurs fairly frequently and I feel the jury is still a little bit out on what "benign" means and under what kinds of "observation" your instances should align. Perhaps one of the most egregious uses of this is in the Haxl project at Facebook, where the Applicative instance is more parallelized than the Monad instance, and thus is far more efficient at the cost of some fairly severe "unobserved" side effects.
In any case, if they differ, document it.
I often choose a reverse approach as compared to the one in Abrahamson's answer. I manually define only the Monad instance and define the Applicative and Functor in terms of it with the help of already defined functions in the Control.Monad, which renders those instances the same for absolutely any monad, i.e.:
instance Applicative SomeMonad where
  pure = return
  (<*>) = ap
instance Functor SomeMonad where
  fmap = liftM
While this way the definition of Functor and Applicative is always "brain-free" and very easy to reason about, I must note that this is not the ultimate solution, since there are cases, when the instances can be implemented more efficiently or even provide new features. E.g., the Applicative instance of Concurrently executes things ... concurrently, while the Monad instance can only execute them sequentially due to the nature of monads.
Functor instances are typically very simple to define, I'd normally do those by hand.
For Applicative and Monad, it depends. pure and return are usually similarly easy, and it really doesn't matter in which class you put the expanded definition. For bind, it is sometimes beneficial to go the "category way", i.e. define a specialised join' :: M (M x) -> M x first and then a >>= b = join' $ fmap b a (which of course wouldn't work if you had defined fmap in terms of >>=). Then it's probably useful to just re-use (>>=) for the Applicative instance.
Other times, the Applicative instance can be written quite easily or is more efficient than the generic Monad-derived implementation. In that case, you should definitely define <*> separately.
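A minimal sketch of that join-first route, assuming a made-up Logged type (a value plus a list of log messages): Functor is written by hand, a specialised joinLogged flattens two layers, bind is defined through it, and Applicative re-uses the Monad instance via ap. None of these names come from a library.

```haskell
import Control.Monad (ap)

-- Hypothetical example type: a value together with a log.
data Logged a = Logged [String] a
  deriving (Eq, Show)

-- Functor is defined by hand, independently of >>=.
instance Functor Logged where
  fmap f (Logged w a) = Logged w (f a)

-- The specialised join: the outer log comes first, then the inner one.
joinLogged :: Logged (Logged a) -> Logged a
joinLogged (Logged outer (Logged inner a)) = Logged (outer ++ inner) a

-- Applicative re-uses the Monad instance.
instance Applicative Logged where
  pure = Logged []
  (<*>) = ap

-- Bind is join after fmap.
instance Monad Logged where
  m >>= k = joinLogged (fmap k m)

-- A small Kleisli arrow to try it out.
step :: Int -> Logged Int
step n = Logged ["saw " ++ show n] (n + 1)
```

For instance, pure 1 >>= step >>= step evaluates to Logged ["saw 1","saw 2"] 3.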
The magic here is that Haskell uses the Kleisli-triple presentation of a monad, which is the more convenient one if you want to use monads as imperative-programming-like tools.
I asked the same question, and the answer came after a while: if you look at the definitions of Functor, Applicative, and Monad in Haskell, you miss one link, which is the original definition of a monad that contains only the join operation; it can be found on the HaskellWiki.
From this point of view you will see how Haskell monads are built up: functors, applicative functors, monads, and the Kleisli triple.
A rough explanation can be found here:
And other with the same ideas here:
I think you mis-understand how sub-classes work in Haskell. They aren't like OO sub-classes! Instead, a sub-class constraint, like
class Applicative m => Monad m
says "any type with a canonical Monad structure must also have a canonical Applicative structure". There are two basic reasons why you would place a constraint like that:
The sub-class structure induces a super-class structure.
The super-class structure is a natural subset of the sub-class structure.
For example, consider:
class Vector v where
  (.^) :: Double -> v -> v
  (+^) :: v -> v -> v
  negateV :: v -> v
class Metric a where
  distance :: a -> a -> Double
class (Vector v, Metric v) => Norm v where
  norm :: v -> Double
The first super-class constraint on Norm arises because the concept of a normed space is really weak unless you also assume a vector space structure; the second arises because (given a vector space) a Norm induces a Metric, which you can prove by observing that
instance Metric V where
  distance v0 v1 = norm (v0 +^ negateV v1)
is a valid Metric instance for any V with a valid Vector instance and a valid norm function. We say that the norm induces a metric.
The Functor and Applicative super-classes on Monad are like Metric, not like Vector: the return and >>= functions from Monad induce Functor and Applicative structures:
fmap: can be defined as fmap f a = a >>= return . f, which was liftM in the Haskell 98 standard library.
pure: is the same operation as return; the two names are a legacy from when Applicative wasn't a super-class of Monad.
<*>: can be defined as af <*> ax = af >>= \ f -> ax >>= \ x -> return (f x), which was liftM2 ($) in the Haskell 98 standard library.
join: can be defined as join aa = aa >>= id.
So it's perfectly sensible, mathematically, to define the Functor and Applicative operations in terms of Monad.
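The induced operations listed above can be written down once for any Monad and compared against the library versions. This is only a sketch; fmapM, apM and joinM are names chosen here to avoid clashing with the Prelude.

```haskell
-- fmap induced by >>= and return (liftM in the standard library).
fmapM :: Monad m => (a -> b) -> m a -> m b
fmapM f a = a >>= return . f

-- <*> induced by >>= and return (ap in the standard library).
apM :: Monad m => m (a -> b) -> m a -> m b
apM af ax = af >>= \f -> ax >>= \x -> return (f x)

-- join induced by >>=.
joinM :: Monad m => m (m a) -> m a
joinM aa = aa >>= id
```

For a lawful instance these should agree with the class methods, e.g. fmapM (+ 1) [1, 2, 3] with fmap (+ 1) [1, 2, 3], and apM (Just (+ 1)) (Just 2) with Just (+ 1) <*> Just 2.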

Are monads just ways of composing functions which would otherwise not compose?

The bind function seems remarkably similar to a composition function. And it helps in composing functions which return monads.
Is there anything more enlightening about monads than this idea?
Is there anything more enlightening about monads than this idea?
Yes, very much so!
Monadic binding is a way of composing functions where something else is happening over and above the application of a function to an input. What the something else is depends on the monad under consideration.
The Maybe monad is function composition with the possibility that one of the functions in the chain might fail, in which case the failure is automatically propagated to the end of the chain. The expression return x >>= f >>= g applies f to the value x. If the result is Nothing (i.e. failure) then the entire expression returns Nothing, with no other work taking place. Otherwise, g is applied to the value wrapped inside f x and its result is returned.
The Either e monad, where e is some type, is function composition with the possibility of failure with an error of type e. This is conceptually similar to the Maybe monad, but we get some more information about how and where the failure occurred.
The List monad is function composition with the possibility of returning multiple values. If f and g are functions that return a list of outputs, then return x >>= f >>= g applies f to x, and then applies g to every output of f, collecting all of the outputs of these applications together into one big list.
Other monads represent function composition in various other contexts. Very briefly:
The Writer w monad is function composition with a value of type w being accumulated on the side. For example, often w = [String] (a list of strings) which is useful for logging.
The Reader r monad is function composition where each of the functions is also allowed to depend on a value of type r. This is useful when building evaluators for domain-specific languages, when r might be a map from variable names to values in the language - this allows simple implementation of lexical closures, for example.
The State s monad is a bit like a combination of reader and writer. It is function composition where each function is allowed to depend on, and modify, a value of type s.
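The Maybe and List descriptions above can be tried directly with base alone. The following sketch uses made-up functions safeRecip, safeSqrt and neighbours as the f and g of the chain return x >>= f >>= g.

```haskell
-- Two functions that may fail (hypothetical examples).
safeRecip :: Double -> Maybe Double
safeRecip 0 = Nothing
safeRecip x = Just (1 / x)

safeSqrt :: Double -> Maybe Double
safeSqrt x
  | x < 0 = Nothing
  | otherwise = Just (sqrt x)

-- Maybe: a failure anywhere in the chain makes the whole chain Nothing.
maybeChain :: Double -> Maybe Double
maybeChain x = return x >>= safeRecip >>= safeSqrt

-- A function with several results (hypothetical example).
neighbours :: Int -> [Int]
neighbours n = [n - 1, n + 1]

-- List: the second function is applied to every output of the first,
-- and all results are collected into one list.
listChain :: Int -> [Int]
listChain x = return x >>= neighbours >>= neighbours
```

With these definitions, maybeChain 4 is Just 0.5, maybeChain 0 is Nothing, and listChain 0 is [-2,0,0,2].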
The composition point of view is in fact quite enlightening in itself.
Monads can be seen as a kind of "funky composition" between functions of the form a -> M b. You can compose f :: a -> M b and g :: b -> M c into something of type a -> M c via the monad operations (just bind the return value of f into g).
This turns functions of the form a -> M b into the arrows of a category, termed the Kleisli category of M.
If M were not a monad but just a functor, you would be only able to compose fmap g and f into something (fmap g) . f :: a -> M (M c). Monads have join :: M (M a) -> M a that I let you define as an (easy and useful) exercise using only monad operations (for mathematicians, join is usually part of the definition of a monad). Then join . (fmap g) . f provides the composition for the Kleisli category.
All the funk of monadic composition can thus be seen to happen inside join, join represents the composition of side effects: for IO it sequences the effects, for List it concatenates lists, for Maybe it "stops a computation" when a result is Nothing, for Writer it sequences the writes, for State it sequences operations on the state, etc. It can be seen as an "overloadable semicolon" if you know C-like languages. It is very instructive to think about monads this way.
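The composition join . fmap g . f from this answer can be written as a function and compared with the library's (>=>). This is a sketch: composeK and halve are names invented for the example.

```haskell
import Control.Monad (join, (>=>))

-- Kleisli composition built only from fmap and join.
composeK :: Monad m => (a -> m b) -> (b -> m c) -> a -> m c
composeK f g = join . fmap g . f

-- A hypothetical Kleisli arrow for Maybe: succeeds only on even numbers.
halve :: Int -> Maybe Int
halve n
  | even n = Just (n `div` 2)
  | otherwise = Nothing

-- composeK should agree with the library operator (>=>).
agreesWithLibrary :: Int -> Bool
agreesWithLibrary n = composeK halve halve n == (halve >=> halve) n
```

For example, composeK halve halve 12 is Just 3, while composeK halve halve 6 is Nothing because the second halve sees the odd number 3.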
Of course, Dan Piponi explains this much better than I do, and here is some post of his that you may find enlightening:

Monads as adjunctions

I've been reading about monads in category theory. One definition of monads uses a pair of adjoint functors. A monad is defined by a round-trip using those functors. Apparently adjunctions are very important in category theory, but I haven't seen any explanation of Haskell monads in terms of adjoint functors. Has anyone given it a thought?
Edit: Just for fun, I'm going to do this right. Original answer preserved below
The current adjunction code for category-extras now is in the adjunctions package:
I'm just going to work through the state monad explicitly and simply. This code uses Data.Functor.Compose from the transformers package, but is otherwise self-contained.
An adjunction between f (D -> C) and g (C -> D), written f -| g, can be characterized in a number of ways. We'll use the counit/unit (epsilon/eta) description, which gives two natural transformations (morphisms between functors).
class (Functor f, Functor g) => Adjoint f g where
  counit :: f (g a) -> a
  unit :: a -> g (f a)
Note that the "a" in counit is really the identity functor in C, and the "a" in unit is really the identity functor in D.
We can also recover the hom-set adjunction definition from the counit/unit definition.
phiLeft :: Adjoint f g => (f a -> b) -> (a -> g b)
phiLeft f = fmap f . unit
phiRight :: Adjoint f g => (a -> g b) -> (f a -> b)
phiRight f = counit . fmap f
In any case, we can now define a Monad from our unit/counit adjunction like so:
instance Adjoint f g => Monad (Compose g f) where
  return x = Compose $ unit x
  x >>= f = Compose . fmap counit . getCompose $ fmap (getCompose . f) x
Now we can implement the classic adjunction between (a,) and (a ->):
instance Adjoint ((,) a) ((->) a) where
  -- counit :: (a,a -> b) -> b
  counit (x, f) = f x
  -- unit :: b -> (a -> (a,b))
  unit x = \y -> (y, x)
And now a type synonym
type State s = Compose ((->) s) ((,) s)
And if we load this up in ghci, we can confirm that State is precisely our classic state monad. Note that we can take the opposite composition and get the Costate Comonad (aka the store comonad).
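For readers who want to load the whole construction at once, here is a self-contained sketch. It departs from the text in one respect: on a GHC where Applicative is a superclass of Monad, the existing Applicative instance of Compose gets in the way, so the sketch assumes a local newtype called Round in place of Compose. The names Round, getS, putS, runS and counter are all invented here.

```haskell
{-# LANGUAGE MultiParamTypeClasses, FlexibleInstances #-}

-- Unit/counit presentation of an adjunction f -| g, as in the text.
class (Functor f, Functor g) => Adjoint f g where
  counit :: f (g a) -> a
  unit :: a -> g (f a)

instance Adjoint ((,) s) ((->) s) where
  counit (x, f) = f x
  unit x = \y -> (y, x)

-- Local stand-in for Compose (an assumption of this sketch), so that we
-- can give it our own Applicative and Monad instances.
newtype Round g f a = Round { runRound :: g (f a) }

instance (Functor g, Functor f) => Functor (Round g f) where
  fmap h (Round x) = Round (fmap (fmap h) x)

instance Adjoint f g => Applicative (Round g f) where
  pure = Round . unit
  mf <*> mx = mf >>= \h -> fmap h mx

instance Adjoint f g => Monad (Round g f) where
  Round x >>= k = Round (fmap (counit . fmap (runRound . k)) x)

-- The state monad obtained from the round trip; note the (state, value) order.
type State s = Round ((->) s) ((,) s)

getS :: State s s
getS = Round (\s -> (s, s))

putS :: s -> State s ()
putS s = Round (\_ -> (s, ()))

runS :: State s a -> s -> (s, a)
runS = runRound

-- Increment the state and return twice the old value.
counter :: State Int Int
counter = do
  n <- getS
  putS (n + 1)
  return (n * 2)
```

Running runS counter 5 gives (6,10): the new state comes first because the pair functor here is ((,) s).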
There are a bunch of other adjunctions we can make into monads in this fashion (such as (Bool,) Pair), but they're sort of strange monads. Unfortunately we can't do the adjunctions that induce Reader and Writer directly in Haskell in a pleasant way. We can do Cont, but as copumpkin describes, that requires an adjunction from an opposite category, so it actually uses a different "form" of the "Adjoint" typeclass that reverses some arrows. That form is also implemented in a different module in the adjunctions package.
This material is covered in a different way by Derek Elkins' article in The Monad Reader 13 -- Calculating Monads with Category Theory:
Also, Hinze's recent Kan Extensions for Program Optimization paper walks through the construction of the list monad from the adjunction between Mon and Set:
Old answer:
Two references.
1) Category-extras delivers, as always, with a representation of adjunctions and how monads arise from them. As usual, it's good to think with, but pretty light on documentation:
2) Haskell-Cafe also delivers with a promising but brief discussion on the role of adjunction. Some of which may help in interpreting category-extras:
Derek Elkins was showing me recently over dinner how the Cont Monad arises from composing the (_ -> k) contravariant functor with itself, since it happens to be self-adjoint. That's how you get (a -> k) -> k out of it. Its counit, however, leads to double negation elimination, which can't be written in Haskell.
For some Agda code that illustrates and proves this, please see
This is an old thread, but I found the question interesting,
so I did some calculations myself. Hopefully Bartosz is still there
and might read this..
In fact, the Eilenberg-Moore construction does give a very clear picture in this case.
(I will use CWM notation with Haskell like syntax)
Let T be the list monad < T,eta,mu > (eta = return and mu = concat)
and consider a T-algebra h:T a -> a.
(Note that T a = [a] is a free monoid <[a],[],(++)>, that is, identity [] and multiplication (++).)
By definition, h must satisfy h . T h == h . mu a and h . eta a == id.
Now, some easy diagram chasing proves that h actually induces a monoid structure on a (defined by x*y = h[x,y] ),
and that h becomes a monoid homomorphism for this structure.
Conversely, any monoid structure < a,a0,* > defined in Haskell naturally defines a T-algebra,
namely h = foldr (*) a0, a function that 'replaces' (:) with (*) and maps [] to a0, the identity.
So, in this case, the category of T-algebras is just the category of monoid structures definable in Haskell, HaskMon.
(Please check that the morphisms in T-algebras are actually monoid homomorphisms.)
It also characterizes lists as universal objects in HaskMon, just like free products in Grp, polynomial rings in CRng, etc.
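As a sketch of this correspondence in code, take any Monoid instance as the monoid structure and define the structure map by folding. The two standard T-algebra conditions (h . T h == h . mu and h . eta == id) then become properties that can be spot-checked. The names algebra, lawMultiplication, lawUnit, mult and sampleSums are invented for this illustration.

```haskell
import Data.Monoid (Sum (..))

-- The structure map h : [a] -> a for a monoid, i.e. foldr (*) a0.
algebra :: Monoid a => [a] -> a
algebra = foldr (<>) mempty

-- h . T h == h . mu, with T h = map h and mu = concat.
lawMultiplication :: (Eq a, Monoid a) => [[a]] -> Bool
lawMultiplication xss = algebra (map algebra xss) == algebra (concat xss)

-- h . eta == id, with eta x = [x].
lawUnit :: (Eq a, Monoid a) => a -> Bool
lawUnit x = algebra [x] == x

-- The multiplication recovered from h, as in x*y = h [x,y].
mult :: Monoid a => a -> a -> a
mult x y = algebra [x, y]

-- Hypothetical sample data in the additive monoid of Ints.
sampleSums :: [[Sum Int]]
sampleSums = [[Sum 1, Sum 2], [Sum 3], []]
```

For example, lawMultiplication sampleSums and lawUnit "abc" should both be True, and mult "ab" "cd" gives back "abcd", the original monoid operation on strings.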
The adjunction corresponding to the above construction is < F,G,eta,epsilon >
F:Hask -> HaskMon, which takes a type a to the 'free monoid generated by a',that is, [a],
G:HaskMon -> Hask, the forgetful functor (forget the multiplication),
eta:1 -> GF , the natural transformation defined by \x::a -> [x],
epsilon: FG -> 1 , the natural transformation defined by the folding function above
(the 'canonical surjection' from a free monoid to its quotient monoid)
Next, there is another 'Kleisli category' and the corresponding adjunction.
You can check that it is just the category of Haskell types with morphisms a -> T b,
where its compositions are given by the so-called 'Kleisli composition' (>=>).
A typical Haskell programmer will find this category more familiar.
Finally, as is illustrated in CWM, the category of T-algebras
(resp. Kleisli category) becomes the terminal (resp. initial) object in the category
of adjunctions that define the list monad T in a suitable sense.
I suggest doing a similar calculation for the binary tree functor T a = L a | B (T a) (T a) to check your understanding.
I've found a standard construction of adjoint functors for any monad by Eilenberg-Moore, but I'm not sure if it adds any insight to the problem. The second category in the construction is a category of T-algebras. A T-algebra adds a "product" to the initial category.
So how would it work for a list monad? The functor in the list monad consists of a type constructor, e.g., Int->[Int] and a mapping of functions (e.g., standard application of map to lists). An algebra adds a mapping from lists to elements. One example would be adding (or multiplying) all the elements of a list of integers. The functor F takes any type, e.g., Int, and maps it into the algebra defined on the lists of Int, where the product is defined by monadic join (or vice versa, join is defined as the product). The forgetful functor G takes an algebra and forgets the product. The pair F, G, of adjoint functors is then used to construct the monad in the usual way.
I must say I'm none the wiser.
If you are interested, here are some thoughts of a non-expert
on the role of monads and adjunctions in programming languages:
First of all, there exists for a given monad T a unique adjunction to the Kleisli category of T.
In Haskell, the use of monads is primarily confined to operations in this category
(which is essentially a category of free algebras, no quotients).
In fact, all one can do with a Haskell Monad is to compose some Kleisli morphisms of
type a -> T b through the use of do expressions, (>>=), etc., to create a new
morphism. In this context, the role of monads is restricted to just the economy
of notation. One exploits associativity of morphisms to be able to write (say) [0,1,2]
instead of (Cons 0 (Cons 1 (Cons 2 Nil))), that is, you can write a sequence as a sequence,
not as a tree.
Even the use of IO monads is non-essential, for the current Haskell type system is powerful
enough to realize data encapsulation (existential types).
This is my answer to your original question,
but I'm curious what Haskell experts have to say about this.
On the other hand, as we have noted, there's also a 1-1 correspondence between monads and
adjunctions to (T-)algebras. Adjoints, in MacLane's terms, are 'a way
to express equivalences of categories.'
In a typical setting of adjunctions <F,G>:X->A where F is some sort
of 'free algebra generator' and G a 'forgetful functor', the corresponding monad
will (through the use of T-algebras) describe how (and when) the algebraic structure of A is constructed on the objects of X.
In the case of Hask and the list monad T, the structure which T introduces is that
of a monoid, and this can help us to establish properties (including the correctness) of code through algebraic
methods that the theory of monoids provides. For example, the function foldr (*) e::[a]->a can
readily be seen as an associative operation as long as <a,(*),e> is a monoid,
a fact which could be exploited by the compiler to optimize the computation (e.g. by parallelism).
Another application is to identify and classify 'recursion patterns' in functional programming using categorical
methods in the hope to (partially) dispose of 'the goto of functional programming', Y (the arbitrary recursion combinator).
Apparently, this kind of applications is one of the primary motivations of the creators of Category Theory (MacLane, Eilenberg, etc.),
namely, to establish natural equivalence of categories, and transfer a well-known method in one category
to another (e.g. homological methods to topological spaces,algebraic methods to programming, etc.).
Here, adjoints and monads are indispensable tools to exploit this connection of categories.
(Incidentally, the notion of monads (and its dual, comonads) is so general that one can even go so far as to define 'cohomologies' of
Haskell types. But I have not given it much thought yet.)
As for the non-deterministic functions you mentioned, I have much less to say...
But note that if an adjunction <F,G>:Hask->A for some category A defines the list monad T,
there must be a unique 'comparison functor' K:A->HaskMon (the category of monoids definable in Haskell), see CWM.
This means, in effect, that your category of interest must be a category of monoids in some restricted form (e.g. it may lack some quotients but not free algebras) in order to define the list monad.
Finally, some remarks:
The binary tree functor I mentioned in my last posting easily generalizes to an arbitrary data type
T a1 .. an = T1 T11 .. T1m | ....
Namely, any data type in Haskell naturally defines a monad (together with the corresponding category of algebras and the Kleisli category),
which is just the result of any data constructor in Haskell being total.
This is another reason why I consider Haskell's Monad class to be not much more than syntactic sugar
(which is pretty important in practice, of course).

Explanation of Monad laws

From a gentle introduction to Haskell, there are the following monad laws. Can anyone intuitively explain what they mean?
1) return a >>= k = k a
2) m >>= return = m
3) xs >>= return . f = fmap f xs
4) m >>= (\x -> k x >>= h) = (m >>= k) >>= h
Here is my attempted explanation:
1) We expect the return function to wrap a so that its monadic nature is trivial. When we bind it to a function, there are no monadic effects, it should just pass a to the function.
2) The unwrapped output of m is passed to return that rewraps it. The monadic nature remains the same. So it is the same as the original monad.
3) The unwrapped value is passed to f then rewrapped. The monadic nature remains the same. This is the behavior expected when we transform a normal function into a monadic function.
4) I don't have an explanation for this law. This does say that the monad must be "almost associative" though.
Your descriptions seem pretty good. Generally people speak of three monad laws, which you have as 1, 2, and 4. Your third law is slightly different, and I'll get to that later.
For the three monad laws, I find it much easier to get an intuitive understanding of what they mean when they're re-written using Kleisli composition:
-- defined in Control.Monad
(>=>) :: Monad m => (a -> m b) -> (b -> m c) -> a -> m c
mf >=> n = \x -> mf x >>= n
Now the laws can be written as:
1) return >=> mf = mf -- left identity
2) mf >=> return = mf -- right identity
4) (f >=> g) >=> h = f >=> (g >=> h) -- associativity
1) Left Identity Law - returning a value doesn't change the value and doesn't do anything in the monad.
2) Right Identity Law - returning a value doesn't change the value and doesn't do anything in the monad.
4) Associativity - monadic composition is associative (I like KennyTM's answer for this)
The two identity laws basically say the same thing, but they're both necessary because return should have identity behavior on both sides of the bind operator.
Now for the third law. This law essentially says that both the Functor instance and your Monad instance behave the same way when lifting a function into the monad, and that neither does anything monadic. If I'm not mistaken, it's the case that when a monad obeys the other three laws and the Functor instance obeys the functor laws, then this statement will always be true.
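The Kleisli form of the laws, plus the third law from the question, can be spot-checked on the list monad. This is only a sketch with made-up arrows f, g and h; a handful of passing cases illustrates the laws but does not prove them.

```haskell
import Control.Monad ((>=>))

-- Hypothetical Kleisli arrows in the list monad.
f, g, h :: Int -> [Int]
f n = [n, n + 1]
g n = [n * 2]
h n = if even n then [n] else []

-- 1) return >=> f = f
leftIdentity :: Int -> Bool
leftIdentity n = (return >=> f) n == f n

-- 2) f >=> return = f
rightIdentity :: Int -> Bool
rightIdentity n = (f >=> return) n == f n

-- 4) (f >=> g) >=> h = f >=> (g >=> h)
associativity :: Int -> Bool
associativity n = ((f >=> g) >=> h) n == (f >=> (g >=> h)) n

-- 3) xs >>= return . k = fmap k xs, here with k = (* 3)
functorAgreement :: [Int] -> Bool
functorAgreement xs = (xs >>= return . (* 3)) == fmap (* 3) xs
```

In GHCi, all leftIdentity [-3 .. 3], all rightIdentity [-3 .. 3], all associativity [-3 .. 3] and functorAgreement [1, 2, 3] should each be True.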
A lot of this comes from the Haskell Wiki. The Typeclassopedia is a good reference too.
No disagreements with the other answers, but it might help to think of the monad laws as actually describing two sets of properties. As John says, the third law you mention is slightly different, but here's how the others can be split apart:
Functions that you bind to a monad compose just like regular functions.
As in John's answer, what's called a Kleisli arrow for a monad is a function with type a -> m b. Think of return as id and (<=<) as (.), and the monad laws are the translations of these:
id . f is equivalent to f
f . id is equivalent to f
(f . g) . h is equivalent to f . (g . h)
Sequences of monadic effects append like lists.
For the most part, you can think of the extra monadic structure as a sequence of extra behaviors associated with a monadic value; e.g. Maybe being "give up" for Nothing and "keep going" for Just. Combining two monadic actions then essentially concatenates the sequences of behaviors they held.
In this sense, return is again an identity--the null action, akin to an empty list of behaviors--and (>=>) is concatenation. So, the monad laws are translations of these:
[] ++ xs is equivalent to xs
xs ++ [] is equivalent to xs
(xs ++ ys) ++ zs is equivalent to xs ++ (ys ++ zs)
These three laws describe a ridiculously common pattern, which Haskell unfortunately can't quite express in full generality. If you're interested, Control.Category gives a generalization of "things that look like function composition", while Data.Monoid generalizes the latter case where no type parameters are involved.
In terms of do notation, rule 4 means we can add an extra do block to group a sequence of monadic operations.
do                     do
                         y <- do
  x <- m                   x <- m
  y <- k x     <=>         k x
  h y                    h y
This allows functions that return a monadic value to work properly.
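The regrouping that rule 4 permits can be checked on a concrete monad. Below is a minimal sketch in the list monad, where m, k and h are placeholder definitions chosen for the example: a flat do block and one with an inner do block give the same result.

```haskell
-- Placeholder definitions for m, k and h in the list monad.
m :: [Int]
m = [1, 2]

k :: Int -> [Int]
k x = [x, x * 10]

h :: Int -> [String]
h y = [show y]

-- The flat version: m >>= (\x -> k x >>= h).
flat :: [String]
flat = do
  x <- m
  y <- k x
  h y

-- The grouped version: (m >>= k) >>= h.
nested :: [String]
nested = do
  y <- do
    x <- m
    k x
  h y
```

Both flat and nested evaluate to ["1","10","2","20"].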
The first three laws say that "return" only wraps a value and does nothing else. So you can eliminate "return" calls without changing the semantics.
The last law is associativity for bind. It means that you take something like:
do
  x <- foo
  bar x
  z <- baz
and turn it into
do
  do
    x <- foo
    bar x
  z <- baz
without changing the meaning. Of course you wouldn't do exactly this, but you might want to put the inner "do" clause in an "if" statement and want it to mean the same when the "if" is true.
Sometimes monads don't exactly follow these laws, particularly when some kind of bottom value occurs. That's OK as long as it's documented and is "morally correct" (i.e. the laws are followed for non-bottom values, or the results are considered equivalent in some other way).
