Haskell monad return arbitrary data type

Haskell monad return arbitrary data type - haskell

I am having trouble defining the return over a custom defined recursive data type.
The data type is as follows:
data A a = B a | C (A a) (A a)
However, I don't know how to define the return statement since I can't figure out when to return B value and when to recursively return C.
Any help is appreciated!

One way to define a Monad instance for this type is to treat it as a free monad. In effect, this takes A a to be a little syntax with one binary operator C, and variables represented by values of type a embedded by the B constructor. That makes return the B constructor, embedding variables, and >>= the operator which performs subsitution.
instance Monad A where
return = B
B x >>= f = f x
C l r >>= f = C (l >>= f) (r >>= f)
It's not hard to see that (>>= B) performs the identity substitution, and that composition of substitutions is associative.
Another, more "imperative" way to see this monad is that it captures the idea of computations that can flip coins (or read a bitstream or otherwise have some access to a sequence of binary choices).
data Coin = Heads | Tails
Any computation which can flip coins must either stop flipping and be a value (with B), or flip a coin and carry on (with C) in one way if the coin comes up Heads and another if Tails. The monadic operation which flips a coin and tells you what came up is
coin :: A Coin
coin = C (B Heads) (B Tails)
The >>= of A can now be seen as sequencing coin-flipping computations, allowing the choice of a subsequent computation to depend on the value delivered by an earlier computation.
If you have an infinite stream of coins, then (apart from your extraordinary good fortune) you're also lucky enough to be able to run any A-computation to its value, as follows
data Stream x = x :> Stream x -- actually, I mean "codata"
flipping :: Stream Coin -> A v -> v
flipping _ (B v) = v
flipping (Heads :> cs) (C h t) = flipping cs h
flipping (Tails :> cs) (C h t) = flipping cs t
The general pattern in this sort of monad is to have one constructor for returning a value (B here) and a bunch of others which represent the choice of possible operations and the different ways computations can continue given the result of an operation. Here C has no non-recursive parameters and two subtrees, so I could tell that there must be just one operation and that it must have just two possible outcomes, hence flipping a coin.
So, it's substitution for a syntax with variables and one binary operator, or it's a way of sequencing computations that flip coins. Which view is better? Well... they're two sides of the same coin.

A good rule of thumb for return is to make it the simplest possible thing which could work (of course, any definition that satisfies the monad laws is fine, but usually you want something with minimal structure). In this case it's as simple as return = B (now write a (>>=) to match!).
By the way, this is an example of a free monad -- in fact, it's the example given in the documentation, so I'll let the documentation speak for itself.

Related

Monads with Join() instead of Bind()

Monads are usually explained in turns of return and bind. However, I gather you can also implement bind in terms of join (and fmap?)
In programming languages lacking first-class functions, bind is excruciatingly awkward to use. join, on the other hand, looks quite easy.
I'm not completely sure I understand how join works, however. Obviously, it has the [Haskell] type
join :: Monad m => m (m x) -> m x
For the list monad, this is trivially and obviously concat. But for a general monad, what, operationally, does this method actually do? I see what it does to the type signatures, but I'm trying to figure out how I'd write something like this in, say, Java or similar.
(Actually, that's easy: I wouldn't. Because generics is broken. ;-) But in principle the question still stands...)
Oops. It looks like this has been asked before:
Monad join function
Could somebody sketch out some implementations of common monads using return, fmap and join? (I.e., not mentioning >>= at all.) I think perhaps that might help it to sink in to my dumb brain...

Without plumbing the depths of metaphor, might I suggest to read a typical monad m as "strategy to produce a", so the type m value is a first class "strategy to produce a value". Different notions of computation or external interaction require different types of strategy, but the general notion requires some regular structure to make sense:
if you already have a value, then you have a strategy to produce a value (return :: v -> m v) consisting of nothing other than producing the value that you have;
if you have a function which transforms one sort of value into another, you can lift it to strategies (fmap :: (v -> u) -> m v -> m u) just by waiting for the strategy to deliver its value, then transforming it;
if you have a strategy to produce a strategy to produce a value, then you can construct a strategy to produce a value (join :: m (m v) -> m v) which follows the outer strategy until it produces the inner strategy, then follows that inner strategy all the way to a value.
Let's have an example: leaf-labelled binary trees...
data Tree v = Leaf v | Node (Tree v) (Tree v)
...represent strategies to produce stuff by tossing a coin. If the strategy is Leaf v, there's your v; if the strategy is Node h t, you toss a coin and continue by strategy h if the coin shows "heads", t if it's "tails".
instance Monad Tree where
return = Leaf
A strategy-producing strategy is a tree with tree-labelled leaves: in place of each such leaf, we can just graft in the tree which labels it...
join (Leaf tree) = tree
join (Node h t) = Node (join h) (join t)
...and of course we have fmap which just relabels leaves.
instance Functor Tree where
fmap f (Leaf x) = Leaf (f x)
fmap f (Node h t) = Node (fmap f h) (fmap f t)
Here's an strategy to produce a strategy to produce an Int.
Toss a coin: if it's "heads", toss another coin to decide between two strategies (producing, respectively, "toss a coin for producing 0 or producing 1" or "produce 2"); if it's "tails" produce a third ("toss a coin for producing 3 or tossing a coin for 4 or 5").
That clearly joins up to make a strategy producing an Int.
What we're making use of is the fact that a "strategy to produce a value" can itself be seen as a value. In Haskell, the embedding of strategies as values is silent, but in English, I use quotation marks to distinguish using a strategy from just talking about it. The join operator expresses the strategy "somehow produce then follow a strategy", or "if you are told a strategy, you may then use it".
(Meta. I'm not sure whether this "strategy" approach is a suitably generic way to think about monads and the value/computation distinction, or whether it's just another crummy metaphor. I do find leaf-labelled tree-like types a useful source of intuition, which is perhaps not a surprise as they're the free monads, with just enough structure to be monads at all, but no more.)
PS The type of "bind"
(>>=) :: m v -> (v -> m w) -> m w
says "if you have a strategy to produce a v, and for each v a follow-on strategy to produce a w, then you have a strategy to produce a w". How can we capture that in terms of join?
mv >>= v2mw = join (fmap v2mw mv)
We can relabel our v-producing strategy by v2mw, producing instead of each v value the w-producing strategy which follows on from it — ready to join!

join = concat -- []
join f = \x -> f x x -- (e ->)
join f = \s -> let (f', s') = f s in f' s' -- State
join (Just (Just a)) = Just a; join _ = Nothing -- Maybe
join (Identity (Identity a)) = Identity a -- Identity
join (Right (Right a)) = Right a; join (Right (Left e)) = Left e;
join (Left e) = Left e -- Either
join ((a, m), m') = (a, m' `mappend` m) -- Writer
-- N.B. there is a non-newtype-wrapped Monad instance for tuples that
-- behaves like the Writer instance, but with the tuple order swapped
join f = \k -> f (\f' -> f' k) -- Cont

Calling fmap (f :: a -> m b) (x ::ma) produces values (y ::m(m b)) so it is a very natural thing to use join to get back values (z :: m b).
Then bind is defined simply as bind ma f = join (fmap f ma), thus achieving the Kleisly compositionality of functions of (:: a -> m b) variety, which is what it is really all about:
ma `bind` (f >=> g) = (ma `bind` f) `bind` g -- bind = (>>=)
= (`bind` g) . (`bind` f) $ ma
= join . fmap g . join . fmap f $ ma
And so, with flip bind = (=<<), we have
((g <=< f) =<<) = (g =<<) . (f =<<) = join . (g <$>) . join . (f <$>)

OK, so it's not really good form to answer your own question, but I'm going to note down my thinking in case it enlightens anybody else. (I doubt it...)
If a monad can be thought of as a "container", then both return and join have pretty obvious semantics. return generates a 1-element container, and join turns a container of containers into a single container. Nothing hard about that.
So let us focus on monads which are more naturally thought of as "actions". In that case, m x is some sort of action which yields a value of type x when you "execute" it. return x does nothing special, and then yields x. fmap f takes an action that yields an x, and constructs an action that computes x and then applies f to it, and returns the result. So far, so good.
It's fairly obvious that if f itself generates an action, then what you end up with is m (m x). That is, an action that computes another action. In a way, that's maybe even simpler to wrap your mind around than the >>= function which takes an action and a "function that produces an action" and so on.
So, logically speaking, it seems join would run the first action, take the action it produces, and then run that. (Or rather, join would return an action that does what I just described, if you want to split hairs.)
That seems to be the central idea. To implement join, you want to run an action, which then gives you another action, and then you run that. (Whatever "run" happens to mean for this particular monad.)
Given this insight, I can take a stab at writing some join implementations:
join Nothing = Nothing
join (Just mx) = mx
If the outer action is Nothing, return Nothing, else return the inner action. Then again, Maybe is more of a container than an action, so let's try something else...
newtype Reader s x = Reader (s -> x)
join (Reader f) = Reader (\ s -> let Reader g = f s in g s)
That was... painless. A Reader is really just a function that takes a global state and only then returns its result. So to unstack, you apply the global state to the outer action, which returns a new Reader. You then apply the state to this inner function as well.
In a way, it's perhaps easier than the usual way:
Reader f >>= g = Reader (\ s -> let x = f s in g x)
Now, which one is the reader function, and which one is the function that computes the next reader...?
Now let's try the good old State monad. Here every function takes an initial state as input but also returns a new state along with its output.
data State s x = State (s -> (s, x))
join (State f) = State (\ s0 -> let (s1, State g) = f s0 in g s1)
That wasn't too hard. It's basically run followed by run.
I'm going to stop typing now. Feel free to point out all the glitches and typos in my examples... :-/

I've found many explanations of monads that say "you don't have to know anything about category theory, really, just think of monads as burritos / space suits / whatever".
Really, the article that demystified monads for me just said what categories were, described monads (including join and bind) in terms of categories, and didn't bother with any bogus metaphors:
http://en.wikibooks.org/wiki/Haskell/Category_theory
I think the article is very readable without much math knowledge required.

Asking what a type signature in Haskell does is rather like asking what an interface in Java does.
It, in some literal sense, "doesn't". (Though, of course, you will typically have some sort of purpose associated with it, that's mostly in your mind, and mostly not in the implementation.)
In both cases you are declaring legal sequences of symbols in the language which will be used in later definitions.
Of course, in Java, I suppose you could say that an interface corresponds to a type signature which is going to be implemented literally in the VM. You can get some polymorphism this way -- you can define a name that accepts an interface, and you can provide a different definition for the name which accepts a different interface. Something similar happens in Haskell, where you can provide a declaration for a name which accepts one type and then another declaration for that name which treats a different type.

This is Monad explained in one picture. The 2 functions in the green category are not composable, when being mapped to the blue category with join . fmap (strictly speaking, they are one category), they become composable. Monad is about turning a function of type T -> Monad<U> into a function of type Monad<T> -> Monad<U>.

Haskell Monad bind operator confusion

Okay, so I am not a Haskell programmer, but I am absolutely intrigued by a lot of the ideas behind Haskell and am looking into learning it. But I'm stuck at square one: I can't seem to wrap my head around Monads, which seem to be fairly fundamental. I know there are a million questions on SO asking to explain Monads, so I'm going to be a little more specific about what's bugging me:
I read this excellent article (an introduction in Javascript), and thought that I understood Monads completely. Then I read the Wikipedia entry on Monads, and saw this:
A binding operation of polymorphic type (M t)→(t→M u)→(M u), which Haskell represents by the infix operator >>=. Its first argument is a value in a monadic type, its second argument is a function that maps from the underlying type of the first argument to another monadic type, and its result is in that other monadic type.
Okay, in the article that I cited, bind was a function which took only one argument. Wikipedia says two. What I thought I understood about Monads was the following:
A Monad's purpose is to take a function with different input and output types and to make it composable. It does this by wrapping the input and output types with a single monadic type.
A Monad consists of two interrelated functions: bind and unit. Bind takes a non-composable function f and returns a new function g that accepts the monadic type as input and returns the monadic type. g is composable. The unit function takes an argument of the type that f expected, and wraps it in the monadic type. This can then be passed to g, or to any composition of functions like g.
But there must be something wrong, because my concept of bind takes one argument: a function. But (according to Wikipedia) Haskell's bind actually takes two arguments! Where is my mistake?

You are not making a mistake. The key idea to understand here is currying - that a Haskell function of two arguments can be seen in two ways. The first is as simply a function of two arguments. If you have, for example, (+), this is usually seen as taking two arguments and adding them. The other way to see it is as a addition machine producer. (+) is a function that takes a number, say x, and makes a function that will add x.
(+) x = \y -> x + y
(+) x y = (\y -> x + y) y = x + y
When dealing with monads, sometimes it is probably better, as ephemient mentioned above, to think of =<<, the flipped version of >>=. There are two ways to look at this:
(=<<) :: (a -> m b) -> m a -> m b
which is a function of two arguments, and
(=<<) :: (a -> m b) -> (m a -> m b)
which transforms the input function to an easily composed version as the article mentioned. These are equivalent just like (+) as I explained before.

Allow me to tear down your beliefs about Monads. I sincerely hope you realize that I am not trying to be rude; I'm simply trying to avoid mincing words.
A Monad's purpose is to take a function with different input and output types and to make it composable. It does this by wrapping the input and output types with a single monadic type.
Not exactly. When you start a sentence with "A Monad's purpose", you're already on the wrong foot. Monads don't necessarily have a "purpose". Monad is simply an abstraction, a classification which applies to certain types and not to others. The purpose of the Monad abstraction is simply that, abstraction.
A Monad consists of two interrelated functions: bind and unit.
Yes and no. The combination of bind and unit are sufficient to define a Monad, but the combination of join, fmap, and unit is equally sufficient. The latter is, in fact, the way that Monads are typically described in Category Theory.
Bind takes a non-composable function f and returns a new function g that accepts the monadic type as input and returns the monadic type.
Again, not exactly. A monadic function f :: a -> m b is perfectly composable, with certain types. I can post-compose it with a function g :: m b -> c to get g . f :: a -> c, or I can pre-compose it with a function h :: c -> a to get f . h :: c -> m b.
But you got the second part absolutely right: (>>= f) :: m a -> m b. As others have noted, Haskell's bind function takes the arguments in the opposite order.
g is composable.
Well, yes. If g :: m a -> m b, then you can pre-compose it with a function f :: c -> m a to get g . f :: c -> m b, or you can post-compose it with a function h :: m b -> c to get h . g :: m a -> c. Note that c could be of the form m v where m is a Monad. I suppose when you say "composable" you mean to say "you can compose arbitrarily long chains of functions of this form", which is sort of true.
The unit function takes an argument of the type that f expected, and wraps it in the monadic type.
A roundabout way of saying it, but yes, that's about right.
This [the result of applying unit to some value] can then be passed to g, or to any composition of functions like g.
Again, yes. Although it is generally not idiomatic Haskell to call unit (or in Haskell, return) and then pass that to (>>= f).
-- instead of
return x >>= f >>= g
-- simply go with
f x >>= g
-- instead of
\x -> return x >>= f >>= g
-- simply go with
f >=> g
-- or
g <=< f

The article you link is based on sigfpe's article, which uses a flipped definition of bind:
The first thing is that I've flipped the definition of bind and written it as the word 'bind' whereas it's normally written as the operator >>=. So bind f x is normally written as x >>= f.
So, the Haskell bind takes a value enclosed in a monad, and returns a function, which takes a function and then calls it with the extracted value. I might be using non-precise terminology, so maybe better with code.
You have:
sine x = (sin x, "sine was called.")
cube x = (x * x * x, "cube was called.")
Now, translating your JS bind (Haskell does automatic currying, so calling bind f returns a function that takes a tuple, and then pattern matching takes care of unpacking it into x and s, I hope that's understandable):
bind f (x, s) = (y, s ++ t)
where (y, t) = f x
You can see it working:
*Main> :t sine
sine :: Floating t => t -> (t, [Char])
*Main> :t bind sine
bind sine :: Floating t1 => (t1, [Char]) -> (t1, [Char])
*Main> (bind sine . bind cube) (3, "")
(0.956375928404503,"cube was called.sine was called.")
Now, let's reverse arguments of bind:
bind' (x, s) f = (y, s ++ t)
where (y, t) = f x
You can clearly see it's still doing the same thing, but with a bit different syntax:
*Main> bind' (bind' (3, "") cube) sine
(0.956375928404503,"cube was called.sine was called.")
Now, Haskell has a syntax trick that allows you to use any function as an infix operator. So you can write:
*Main> (3, "") `bind'` cube `bind'` sine
(0.956375928404503,"cube was called.sine was called.")
Now rename bind' to >>= ((3, "") >>= cube >>= sine) and you've got what you were looking for. As you can see, with this definition, you can effectively get rid of the separate composition operator.
Translating the new thing back into JavaScript would yield something like this (notice that again, I only reverse the argument order):
var bind = function(tuple) {
return function(f) {
var x = tuple[0],
s = tuple[1],
fx = f(x),
y = fx[0],
t = fx[1];
return [y, s + t];
};
};
// ugly, but it's JS, after all
var f = function(x) { return bind(bind(x)(cube))(sine); }
f([3, ""]); // [0.956375928404503, "cube was called.sine was called."]
Hope this helps, and not introduces more confusion — the point is that those two bind definitions are equivalent, only differing in call syntax.

What are arrows, and how can I use them?

I tried to learn the meaning of arrows, but I didn't understand them.
I used the Wikibooks tutorial. I think Wikibook's problem is mainly that it seems to be written for somebody who already understands the topic.
Can somebody explain what arrows are and how I can use them?

I don't know a tutorial, but I think it's easiest to understand arrows if you look at some concrete examples. The biggest problem I had learning how to use arrows was that none of the tutorials or examples actually show how to use arrows, just how to compose them. So, with that in mind, here's my mini-tutorial. I'll examine two different arrows: functions and a user-defined arrow type MyArr.
-- type representing a computation
data MyArr b c = MyArr (b -> (c,MyArr b c))
1) An Arrow is a calculation from input of a specified type to output of a specified type. The arrow typeclass takes three type arguments: the arrow type, the input type, and the output type. Looking at the instance head for arrow instances we find:
instance Arrow (->) b c where
instance Arrow MyArr b c where
The Arrow (either (->) or MyArr) is an abstraction of a computation.
For a function b -> c, b is the input and c is the output.
For a MyArr b c, b is the input and c is the output.
2) To actually run an arrow computation, you use a function specific to your arrow type. For functions you simply apply the function to an argument. For other arrows, there needs to be a separate function (just like runIdentity, runState, etc. for monads).
-- run a function arrow
runF :: (b -> c) -> b -> c
runF = id
-- run a MyArr arrow, discarding the remaining computation
runMyArr :: MyArr b c -> b -> c
runMyArr (MyArr step) = fst . step
3) Arrows are frequently used to process a list of inputs. For functions these can be done in parallel, but for some arrows output at any given step depends upon previous inputs (e.g. keeping a running total of inputs).
-- run a function arrow over multiple inputs
runFList :: (b -> c) -> [b] -> [c]
runFList f = map f
-- run a MyArr over multiple inputs.
-- Each step of the computation gives the next step to use
runMyArrList :: MyArr b c -> [b] -> [c]
runMyArrList _ [] = []
runMyArrList (MyArr step) (b:bs) = let (this, step') = step b
in this : runMyArrList step' bs
This is one reason Arrows are useful. They provide a computation model that can implicitly make use of state without ever exposing that state to the programmer. The programmer can use arrowized computations and combine them to create sophisticated systems.
Here's a MyArr that keeps count of the number of inputs it has received:
-- count the number of inputs received:
count :: MyArr b Int
count = count' 0
where
count' n = MyArr (\_ -> (n+1, count' (n+1)))
Now the function runMyArrList count will take a list length n as input and return a list of Ints from 1 to n.
Note that we still haven't used any "arrow" functions, that is either Arrow class methods or functions written in terms of them.
4) Most of the code above is specific to each Arrow instance[1]. Everything in Control.Arrow (and Control.Category) is about composing arrows to make new arrows. If we pretend that Category is part of Arrow instead of a separate class:
-- combine two arrows in sequence
>>> :: Arrow a => a b c -> a c d -> a b d
-- the function arrow instance
-- >>> :: (b -> c) -> (c -> d) -> (b -> d)
-- this is just flip (.)
-- MyArr instance
-- >>> :: MyArr b c -> MyArr c d -> MyArr b d
The >>> function takes two arrows and uses the output of the first as input to the second.
Here's another operator, commonly called "fanout":
-- &&& applies two arrows to a single input in parallel
&&& :: Arrow a => a b c -> a b c' -> a b (c,c')
-- function instance type
-- &&& :: (b -> c) -> (b -> c') -> (b -> (c,c'))
-- MyArr instance type
-- &&& :: MyArr b c -> MyArr b c' -> MyArr b (c,c')
-- first and second omitted for brevity, see the accepted answer from KennyTM's link
-- for further details.
Since Control.Arrow provides a means to combine computations, here's one example:
-- function that, given an input n, returns "n+1" and "n*2"
calc1 :: Int -> (Int,Int)
calc1 = (+1) &&& (*2)
I've frequently found functions like calc1 useful in complicated folds, or functions that operate on pointers for example.
The Monad type class provides us with a means to combine monadic computations into a single new monadic computation using the >>= function. Similarly, the Arrow class provides us with means to combine arrowized computations into a single new arrowized computation using a few primitive functions (first, arr, and ***, with >>> and id from Control.Category). Also similar to Monads, the question of "What does an arrow do?" can't be generally answered. It depends on the arrow.
Unfortunately I don't know of many examples of arrow instances in the wild. Functions and FRP seem to be the most common applications. HXT is the only other significant usage that comes to mind.
[1] Except count. It's possible to write a count function that does the same thing for any instance of ArrowLoop.

From a glance at your history on Stack Overflow, I'm going to assume that you're comfortable with some of the other standard type classes, particularly Functor and Monoid, and start with a brief analogy from those.
The single operation on Functor is fmap, which serves as a generalized version of map on lists. This is pretty much the entire purpose of the type class; it defines "things that you can map over". So, in a sense Functor represents a generalization of that particular aspect of lists.
The operations for Monoid are generalized versions of the empty list and (++), and it defines "things that can be combined associatively, with a particular thing that's an identity value". Lists are pretty much the simplest thing that fits that description, and Monoid represents a generalization of that aspect of lists.
In the same way as the above two, the operations on the Category type class are generalized versions of id and (.), and it defines "things connecting two types in a particular direction, that can be connected head-to-tail". So, this represents a generalization of that aspect of functions. Notably not included in the generalization are currying or function application.
The Arrow type class builds off of Category, but the underlying concept is the same: Arrows are things that compose like functions and have an "identity arrow" defined for any type. The additional operations defined on the Arrow class itself just define a way to lift an arbitrary function to an Arrow and a way to combine two arrows "in parallel" as a single arrow between tuples.
So, the first thing to keep in mind here is that expressions building Arrows are, essentially, elaborate function composition. The combinators like (***) and (>>>) are for writing "pointfree" style, while proc notation gives a way to assign temporary names to inputs and outputs while wiring things up.
A useful thing to note here is that, even though Arrows are sometimes described as being "the next step" from Monads, there's really not a terribly meaningful relationship there. For any Monad you can work with Kleisli arrows, which are just functions with a type like a -> m b. The (<=<) operator in Control.Monad is arrow composition for these. On the other hand, Arrows don't get you a Monad unless you also include the ArrowApply class. So there's no direct connection as such.
The key difference here is that whereas Monads can be used to sequence computations and do things step-by-step, Arrows are in some sense "timeless" just like regular functions. They can include extra machinery and functionality that gets spliced in by (.), but it's more like building a pipeline, not accumulating actions.
The other related type classes add additional functionality to an arrow, such as being able to combine arrows with Either as well as (,).
My favorite example of an Arrow is stateful stream transducers, which look something like this:
data StreamTrans a b = StreamTrans (a -> (b, StreamTrans a b))
A StreamTrans arrow converts an input value to an output and an "updated" version of itself; consider the ways that this differs from a stateful Monad.
Writing instances for Arrow and its related type classes for the above type may be a good exercise for understanding how they work!
I also wrote a similar answer previously that you may find helpful.

I'd like to add that arrows in Haskell are far simpler than they might appear
based on the literature. They are simply abstractions of functions.
To see how this is practically useful, consider that you have a bunch of
functions you want to compose, where some of them are pure and some are
monadic. For example, f :: a -> b, g :: b -> m1 c, and h :: c -> m2 d.
Knowing each of the types involved, I could build a composition by hand, but
the output type of the composition would have to reflect the intermediate
monad types (in the above case, m1 (m2 d)). What if I just wanted to treat
the functions as if they were just a -> b, b -> c, and c -> d? That is,
I want to abstract away the presence of monads and reason only about the
underlying types. I can use arrows to do exactly this.
Here is an arrow which abstracts away the presence of IO for functions in the
IO monad, such that I can compose them with pure functions without the
composing code needing to know that IO is involved. We start by defining an
IOArrow to wrap IO functions:
data IOArrow a b = IOArrow { runIOArrow :: a -> IO b }
instance Category IOArrow where
id = IOArrow return
IOArrow f . IOArrow g = IOArrow $ f <=< g
instance Arrow IOArrow where
arr f = IOArrow $ return . f
first (IOArrow f) = IOArrow $ \(a, c) -> do
x <- f a
return (x, c)
Then I make some simple functions that I want to compose:
foo :: Int -> String
foo = show
bar :: String -> IO Int
bar = return . read
And use them:
main :: IO ()
main = do
let f = arr (++ "!") . arr foo . IOArrow bar . arr id
result <- runIOArrow f "123"
putStrLn result
Here I am calling IOArrow and runIOArrow, but if I were passing these arrows
around in a library of polymorphic functions, they would only need to accept
arguments of type "Arrow a => a b c". None of the library code would need to
be made aware that a monad was involved. Only the creator and end user of the
arrow needs to know.
Generalizing IOArrow to work for functions in any Monad is called the "Kleisli
arrow", and there is already a builtin arrow for doing just that:
main :: IO ()
main = do
let g = arr (++ "!") . arr foo . Kleisli bar . arr id
result <- runKleisli g "123"
putStrLn result
You could of course also use arrow composition operators, and proc syntax, to
make it a little clearer that arrows are involved:
arrowUser :: Arrow a => a String String -> a String String
arrowUser f = proc x -> do
y <- f -< x
returnA -< y
main :: IO ()
main = do
let h = arr (++ "!")
<<< arr foo
<<< Kleisli bar
<<< arr id
result <- runKleisli (arrowUser h) "123"
putStrLn result
Here it should be clear that although main knows the IO monad is involved,
arrowUser does not. There would be no way of "hiding" IO from arrowUser
without arrows -- not without resorting to unsafePerformIO to turn the
intermediate monadic value back into a pure one (and thus losing that context
forever). For example:
arrowUser' :: (String -> String) -> String -> String
arrowUser' f x = f x
main' :: IO ()
main' = do
let h = (++ "!") . foo . unsafePerformIO . bar . id
result = arrowUser' h "123"
putStrLn result
Try writing that without unsafePerformIO, and without arrowUser' having to
deal with any Monad type arguments.

There are John Hughes's Lecture Notes from an AFP (Advanced Functional Programming) workshop. Note they were written before the Arrow classes were changed in the Base libraries:
http://www.cse.chalmers.se/~rjmh/afp-arrows.pdf

When I began exploring Arrow compositions (essentially Monads), my approach was to break out of the functional syntax and composition it is most commonly associated with and begin by understanding its principles using a more declarative approach. With this in mind, I find the following breakdown more intuitive:
function(x) {
func1result = func1(x)
if(func1result == null) {
return null
} else {
func2result = func2(func1result)
if(func2result == null) {
return null
} else {
func3(func2result)
}
So, essentially, for some value x, call one function first that we assume may return null (func1), another that may retun null or be assigned to null interchangably, finally, a third function that may also return null. Now given the value x, pass x to func3, only then, if it does not return null, pass this value into func2, and only if this value is not null, pass this value into func1. It's more deterministic and the control flow allows you to construct more sophisticated exception handling.
Here we can utilise the arrow composition: (func3 <=< func2 <=< func1) x.

What is a monad?

Having briefly looked at Haskell recently, what would be a brief, succinct, practical explanation as to what a monad essentially is?
I have found most explanations I've come across to be fairly inaccessible and lacking in practical detail.

First: The term monad is a bit vacuous if you are not a mathematician. An alternative term is computation builder which is a bit more descriptive of what they are actually useful for.
They are a pattern for chaining operations. It looks a bit like method chaining in object-oriented languages, but the mechanism is slightly different.
The pattern is mostly used in functional languages (especially Haskell which uses monads pervasively) but can be used in any language which support higher-order functions (that is, functions which can take other functions as arguments).
Arrays in JavaScript support the pattern, so let’s use that as the first example.
The gist of the pattern is we have a type (Array in this case) which has a method which takes a function as argument. The operation supplied must return an instance of the same type (i.e. return an Array).
First an example of method chaining which does not use the monad pattern:
[1,2,3].map(x => x + 1)
The result is [2,3,4]. The code does not conform to the monad pattern, since the function we are supplying as an argument returns a number, not an Array. The same logic in monad form would be:
[1,2,3].flatMap(x => [x + 1])
Here we supply an operation which returns an Array, so now it conforms to the pattern. The flatMap method executes the provided function for every element in the array. It expects an array as result for each invocation (rather than single values), but merges the resulting set of arrays into a single array. So the end result is the same, the array [2,3,4].
(The function argument provided to a method like map or flatMap is often called a "callback" in JavaScript. I will call it the "operation" since it is more general.)
If we chain multiple operations (in the traditional way):
[1,2,3].map(a => a + 1).filter(b => b != 3)
Results in the array [2,4]
The same chaining in monad form:
[1,2,3].flatMap(a => [a + 1]).flatMap(b => b != 3 ? [b] : [])
Yields the same result, the array [2,4].
You will immediately notice that the monad form is quite a bit uglier than the non-monad! This just goes to show that monads are not necessarily “good”. They are a pattern which is sometimes beneficial and sometimes not.
Do note that the monad pattern can be combined in a different way:
[1,2,3].flatMap(a => [a + 1].flatMap(b => b != 3 ? [b] : []))
Here the binding is nested rather than chained, but the result is the same. This is an important property of monads as we will see later. It means two operations combined can be treated the same as a single operation.
The operation is allowed to return an array with different element types, for example transforming an array of numbers into an array of strings or something else; as long as it still an Array.
This can be described a bit more formally using Typescript notation. An array has the type Array<T>, where T is the type of the elements in the array. The method flatMap() takes a function argument of the type T => Array<U> and returns an Array<U>.
Generalized, a monad is any type Foo<Bar> which has a "bind" method which takes a function argument of type Bar => Foo<Baz> and returns a Foo<Baz>.
This answers what monads are. The rest of this answer will try to explain through examples why monads can be a useful pattern in a language like Haskell which has good support for them.
Haskell and Do-notation
To translate the map/filter example directly to Haskell, we replace flatMap with the >>= operator:
[1,2,3] >>= \a -> [a+1] >>= \b -> if b == 3 then [] else [b]
The >>= operator is the bind function in Haskell. It does the same as flatMap in JavaScript when the operand is a list, but it is overloaded with different meaning for other types.
But Haskell also has a dedicated syntax for monad expressions, the do-block, which hides the bind operator altogether:
do
a <- [1,2,3]
b <- [a+1]
if b == 3 then [] else [b]
This hides the "plumbing" and lets you focus on the actual operations applied at each step.
In a do-block, each line is an operation. The constraint still holds that all operations in the block must return the same type. Since the first expression is a list, the other operations must also return a list.
The back-arrow <- looks deceptively like an assignment, but note that this is the parameter passed in the bind. So, when the expression on the right side is a List of Integers, the variable on the left side will be a single Integer – but will be executed for each integer in the list.
Example: Safe navigation (the Maybe type)
Enough about lists, lets see how the monad pattern can be useful for other types.
Some functions may not always return a valid value. In Haskell this is represented by the Maybe-type, which is an option that is either Just value or Nothing.
Chaining operations which always return a valid value is of course straightforward:
streetName = getStreetName (getAddress (getUser 17))
But what if any of the functions could return Nothing? We need to check each result individually and only pass the value to the next function if it is not Nothing:
case getUser 17 of
Nothing -> Nothing
Just user ->
case getAddress user of
Nothing -> Nothing
Just address ->
getStreetName address
Quite a lot of repetitive checks! Imagine if the chain was longer. Haskell solves this with the monad pattern for Maybe:
do
user <- getUser 17
addr <- getAddress user
getStreetName addr
This do-block invokes the bind-function for the Maybe type (since the result of the first expression is a Maybe). The bind-function only executes the following operation if the value is Just value, otherwise it just passes the Nothing along.
Here the monad-pattern is used to avoid repetitive code. This is similar to how some other languages use macros to simplify syntax, although macros achieve the same goal in a very different way.
Note that it is the combination of the monad pattern and the monad-friendly syntax in Haskell which result in the cleaner code. In a language like JavaScript without any special syntax support for monads, I doubt the monad pattern would be able to simplify the code in this case.
Mutable state
Haskell does not support mutable state. All variables are constants and all values immutable. But the State type can be used to emulate programming with mutable state:
add2 :: State Integer Integer
add2 = do
-- add 1 to state
x <- get
put (x + 1)
-- increment in another way
modify (+1)
-- return state
get
evalState add2 7
=> 9
The add2 function builds a monad chain which is then evaluated with 7 as the initial state.
Obviously this is something which only makes sense in Haskell. Other languages support mutable state out of the box. Haskell is generally "opt-in" on language features - you enable mutable state when you need it, and the type system ensures the effect is explicit. IO is another example of this.
IO
The IO type is used for chaining and executing “impure” functions.
Like any other practical language, Haskell has a bunch of built-in functions which interface with the outside world: putStrLine, readLine and so on. These functions are called “impure” because they either cause side effects or have non-deterministic results. Even something simple like getting the time is considered impure because the result is non-deterministic – calling it twice with the same arguments may return different values.
A pure function is deterministic – its result depends purely on the arguments passed and it has no side effects on the environment beside returning a value.
Haskell heavily encourages the use of pure functions – this is a major selling point of the language. Unfortunately for purists, you need some impure functions to do anything useful. The Haskell compromise is to cleanly separate pure and impure, and guarantee that there is no way that pure functions can execute impure functions, directly or indirect.
This is guaranteed by giving all impure functions the IO type. The entry point in Haskell program is the main function which have the IO type, so we can execute impure functions at the top level.
But how does the language prevent pure functions from executing impure functions? This is due to the lazy nature of Haskell. A function is only executed if its output is consumed by some other function. But there is no way to consume an IO value except to assign it to main. So if a function wants to execute an impure function, it has to be connected to main and have the IO type.
Using monad chaining for IO operations also ensures that they are executed in a linear and predictable order, just like statements in an imperative language.
This brings us to the first program most people will write in Haskell:
main :: IO ()
main = do
putStrLn ”Hello World”
The do keyword is superfluous when there is only a single operation and therefore nothing to bind, but I keep it anyway for consistency.
The () type means “void”. This special return type is only useful for IO functions called for their side effect.
A longer example:
main = do
putStrLn "What is your name?"
name <- getLine
putStrLn ("hello" ++ name)
This builds a chain of IO operations, and since they are assigned to the main function, they get executed.
Comparing IO with Maybe shows the versatility of the monad pattern. For Maybe, the pattern is used to avoid repetitive code by moving conditional logic to the binding function. For IO, the pattern is used to ensure that all operations of the IO type are sequenced and that IO operations cannot "leak" to pure functions.
Summing up
In my subjective opinion, the monad pattern is only really worthwhile in a language which has some built-in support for the pattern. Otherwise it just leads to overly convoluted code. But Haskell (and some other languages) have some built-in support which hides the tedious parts, and then the pattern can be used for a variety of useful things. Like:
Avoiding repetitive code (Maybe)
Adding language features like mutable state or exceptions for delimited areas of the program.
Isolating icky stuff from nice stuff (IO)
Embedded domain-specific languages (Parser)
Adding GOTO to the language.

Explaining "what is a monad" is a bit like saying "what is a number?" We use numbers all the time. But imagine you met someone who didn't know anything about numbers. How the heck would you explain what numbers are? And how would you even begin to describe why that might be useful?
What is a monad? The short answer: It's a specific way of chaining operations together.
In essence, you're writing execution steps and linking them together with the "bind function". (In Haskell, it's named >>=.) You can write the calls to the bind operator yourself, or you can use syntax sugar which makes the compiler insert those function calls for you. But either way, each step is separated by a call to this bind function.
So the bind function is like a semicolon; it separates the steps in a process. The bind function's job is to take the output from the previous step, and feed it into the next step.
That doesn't sound too hard, right? But there is more than one kind of monad. Why? How?
Well, the bind function can just take the result from one step, and feed it to the next step. But if that's "all" the monad does... that actually isn't very useful. And that's important to understand: Every useful monad does something else in addition to just being a monad. Every useful monad has a "special power", which makes it unique.
(A monad that does nothing special is called the "identity monad". Rather like the identity function, this sounds like an utterly pointless thing, yet turns out not to be... But that's another story™.)
Basically, each monad has its own implementation of the bind function. And you can write a bind function such that it does hoopy things between execution steps. For example:
If each step returns a success/failure indicator, you can have bind execute the next step only if the previous one succeeded. In this way, a failing step aborts the whole sequence "automatically", without any conditional testing from you. (The Failure Monad.)
Extending this idea, you can implement "exceptions". (The Error Monad or Exception Monad.) Because you're defining them yourself rather than it being a language feature, you can define how they work. (E.g., maybe you want to ignore the first two exceptions and only abort when a third exception is thrown.)
You can make each step return multiple results, and have the bind function loop over them, feeding each one into the next step for you. In this way, you don't have to keep writing loops all over the place when dealing with multiple results. The bind function "automatically" does all that for you. (The List Monad.)
As well as passing a "result" from one step to another, you can have the bind function pass extra data around as well. This data now doesn't show up in your source code, but you can still access it from anywhere, without having to manually pass it to every function. (The Reader Monad.)
You can make it so that the "extra data" can be replaced. This allows you to simulate destructive updates, without actually doing destructive updates. (The State Monad and its cousin the Writer Monad.)
Because you're only simulating destructive updates, you can trivially do things that would be impossible with real destructive updates. For example, you can undo the last update, or revert to an older version.
You can make a monad where calculations can be paused, so you can pause your program, go in and tinker with internal state data, and then resume it.
You can implement "continuations" as a monad. This allows you to break people's minds!
All of this and more is possible with monads. Of course, all of this is also perfectly possible without monads too. It's just drastically easier using monads.

Actually, contrary to common understanding of Monads, they have nothing to do with state. Monads are simply a way to wrapping things and provide methods to do operations on the wrapped stuff without unwrapping it.
For example, you can create a type to wrap another one, in Haskell:
data Wrapped a = Wrap a
To wrap stuff we define
return :: a -> Wrapped a
return x = Wrap x
To perform operations without unwrapping, say you have a function f :: a -> b, then you can do this to lift that function to act on wrapped values:
fmap :: (a -> b) -> (Wrapped a -> Wrapped b)
fmap f (Wrap x) = Wrap (f x)
That's about all there is to understand. However, it turns out that there is a more general function to do this lifting, which is bind:
bind :: (a -> Wrapped b) -> (Wrapped a -> Wrapped b)
bind f (Wrap x) = f x
bind can do a bit more than fmap, but not vice versa. Actually, fmap can be defined only in terms of bind and return. So, when defining a monad.. you give its type (here it was Wrapped a) and then say how its return and bind operations work.
The cool thing is that this turns out to be such a general pattern that it pops up all over the place, encapsulating state in a pure way is only one of them.
For a good article on how monads can be used to introduce functional dependencies and thus control order of evaluation, like it is used in Haskell's IO monad, check out IO Inside.
As for understanding monads, don't worry too much about it. Read about them what you find interesting and don't worry if you don't understand right away. Then just diving in a language like Haskell is the way to go. Monads are one of these things where understanding trickles into your brain by practice, one day you just suddenly realize you understand them.

But, You could have invented Monads!
sigfpe says:
But all of these introduce monads as something esoteric in need of explanation. But what I want to argue is that they aren't esoteric at all. In fact, faced with various problems in functional programming you would have been led, inexorably, to certain solutions, all of which are examples of monads. In fact, I hope to get you to invent them now if you haven't already. It's then a small step to notice that all of these solutions are in fact the same solution in disguise. And after reading this, you might be in a better position to understand other documents on monads because you'll recognise everything you see as something you've already invented.
Many of the problems that monads try to solve are related to the issue of side effects. So we'll start with them. (Note that monads let you do more than handle side-effects, in particular many types of container object can be viewed as monads. Some of the introductions to monads find it hard to reconcile these two different uses of monads and concentrate on just one or the other.)
In an imperative programming language such as C++, functions behave nothing like the functions of mathematics. For example, suppose we have a C++ function that takes a single floating point argument and returns a floating point result. Superficially it might seem a little like a mathematical function mapping reals to reals, but a C++ function can do more than just return a number that depends on its arguments. It can read and write the values of global variables as well as writing output to the screen and receiving input from the user. In a pure functional language, however, a function can only read what is supplied to it in its arguments and the only way it can have an effect on the world is through the values it returns.

A monad is a datatype that has two operations: >>= (aka bind) and return (aka unit). return takes an arbitrary value and creates an instance of the monad with it. >>= takes an instance of the monad and maps a function over it. (You can see already that a monad is a strange kind of datatype, since in most programming languages you couldn't write a function that takes an arbitrary value and creates a type from it. Monads use a kind of parametric polymorphism.)
In Haskell notation, the monad interface is written
class Monad m where
return :: a -> m a
(>>=) :: forall a b . m a -> (a -> m b) -> m b
These operations are supposed to obey certain "laws", but that's not terrifically important: the "laws" just codify the way sensible implementations of the operations ought to behave (basically, that >>= and return ought to agree about how values get transformed into monad instances and that >>= is associative).
Monads are not just about state and I/O: they abstract a common pattern of computation that includes working with state, I/O, exceptions, and non-determinism. Probably the simplest monads to understand are lists and option types:
instance Monad [ ] where
[] >>= k = []
(x:xs) >>= k = k x ++ (xs >>= k)
return x = [x]
instance Monad Maybe where
Just x >>= k = k x
Nothing >>= k = Nothing
return x = Just x
where [] and : are the list constructors, ++ is the concatenation operator, and Just and Nothing are the Maybe constructors. Both of these monads encapsulate common and useful patterns of computation on their respective data types (note that neither has anything to do with side effects or I/O).
You really have to play around writing some non-trivial Haskell code to appreciate what monads are about and why they are useful.

You should first understand what a functor is. Before that, understand higher-order functions.
A higher-order function is simply a function that takes a function as an argument.
A functor is any type construction T for which there exists a higher-order function, call it map, that transforms a function of type a -> b (given any two types a and b) into a function T a -> T b. This map function must also obey the laws of identity and composition such that the following expressions return true for all p and q (Haskell notation):
map id = id
map (p . q) = map p . map q
For example, a type constructor called List is a functor if it comes equipped with a function of type (a -> b) -> List a -> List b which obeys the laws above. The only practical implementation is obvious. The resulting List a -> List b function iterates over the given list, calling the (a -> b) function for each element, and returns the list of the results.
A monad is essentially just a functor T with two extra methods, join, of type T (T a) -> T a, and unit (sometimes called return, fork, or pure) of type a -> T a. For lists in Haskell:
join :: [[a]] -> [a]
pure :: a -> [a]
Why is that useful? Because you could, for example, map over a list with a function that returns a list. Join takes the resulting list of lists and concatenates them. List is a monad because this is possible.
You can write a function that does map, then join. This function is called bind, or flatMap, or (>>=), or (=<<). This is normally how a monad instance is given in Haskell.
A monad has to satisfy certain laws, namely that join must be associative. This means that if you have a value x of type [[[a]]] then join (join x) should equal join (map join x). And pure must be an identity for join such that join (pure x) == x.

[Disclaimer: I am still trying to fully grok monads. The following is just what I have understood so far. If it’s wrong, hopefully someone knowledgeable will call me on the carpet.]
Arnar wrote:
Monads are simply a way to wrapping things and provide methods to do operations on the wrapped stuff without unwrapping it.
That’s precisely it. The idea goes like this:
You take some kind of value and wrap it with some additional information. Just like the value is of a certain kind (eg. an integer or a string), so the additional information is of a certain kind.
E.g., that extra information might be a Maybe or an IO.
Then you have some operators that allow you to operate on the wrapped data while carrying along that additional information. These operators use the additional information to decide how to change the behaviour of the operation on the wrapped value.
E.g., a Maybe Int can be a Just Int or Nothing. Now, if you add a Maybe Int to a Maybe Int, the operator will check to see if they are both Just Ints inside, and if so, will unwrap the Ints, pass them the addition operator, re-wrap the resulting Int into a new Just Int (which is a valid Maybe Int), and thus return a Maybe Int. But if one of them was a Nothing inside, this operator will just immediately return Nothing, which again is a valid Maybe Int. That way, you can pretend that your Maybe Ints are just normal numbers and perform regular math on them. If you were to get a Nothing, your equations will still produce the right result – without you having to litter checks for Nothing everywhere.
But the example is just what happens for Maybe. If the extra information was an IO, then that special operator defined for IOs would be called instead, and it could do something totally different before performing the addition. (OK, adding two IO Ints together is probably nonsensical – I’m not sure yet.) (Also, if you paid attention to the Maybe example, you have noticed that “wrapping a value with extra stuff” is not always correct. But it’s hard to be exact, correct and precise without being inscrutable.)
Basically, “monad” roughly means “pattern”. But instead of a book full of informally explained and specifically named Patterns, you now have a language construct – syntax and all – that allows you to declare new patterns as things in your program. (The imprecision here is all the patterns have to follow a particular form, so a monad is not quite as generic as a pattern. But I think that’s the closest term that most people know and understand.)
And that is why people find monads so confusing: because they are such a generic concept. To ask what makes something a monad is similarly vague as to ask what makes something a pattern.
But think of the implications of having syntactic support in the language for the idea of a pattern: instead of having to read the Gang of Four book and memorise the construction of a particular pattern, you just write code that implements this pattern in an agnostic, generic way once and then you are done! You can then reuse this pattern, like Visitor or Strategy or Façade or whatever, just by decorating the operations in your code with it, without having to re-implement it over and over!
So that is why people who understand monads find them so useful: it’s not some ivory tower concept that intellectual snobs pride themselves on understanding (OK, that too of course, teehee), but actually makes code simpler.

After much striving, I think I finally understand the monad. After rereading my own lengthy critique of the overwhelmingly top voted answer, I will offer this explanation.
There are three questions that need to be answered to understand monads:
Why do you need a monad?
What is a monad?
How is a monad implemented?
As I noted in my original comments, too many monad explanations get caught up in question number 3, without, and before really adequately covering question 2, or question 1.
Why do you need a monad?
Pure functional languages like Haskell are different from imperative languages like C, or Java in that, a pure functional program is not necessarily executed in a specific order, one step at a time. A Haskell program is more akin to a mathematical function, in which you may solve the "equation" in any number of potential orders. This confers a number of benefits, among which is that it eliminates the possibility of certain kinds of bugs, particularly those relating to things like "state".
However, there are certain problems that are not so straightforward to solve with this style of programming. Some things, like console programming, and file i/o, need things to happen in a particular order, or need to maintain state. One way to deal with this problem is to create a kind of object that represents the state of a computation, and a series of functions that take a state object as input, and return a new modified state object.
So let's create a hypothetical "state" value, that represents the state of a console screen. exactly how this value is constructed is not important, but let's say it's an array of byte length ascii characters that represents what is currently visible on the screen, and an array that represents the last line of input entered by the user, in pseudocode. We've defined some functions that take console state, modify it, and return a new console state.
consolestate MyConsole = new consolestate;
So to do console programming, but in a pure functional manner, you would need to nest a lot of function calls inside eachother.
consolestate FinalConsole = print(input(print(myconsole, "Hello, what's your name?")),"hello, %inputbuffer%!");
Programming in this way keeps the "pure" functional style, while forcing changes to the console to happen in a particular order. But, we'll probably want to do more than just a few operations at a time like in the above example. Nesting functions in that way will start to become ungainly. What we want, is code that does essentially the same thing as above, but is written a bit more like this:
consolestate FinalConsole = myconsole:
print("Hello, what's your name?"):
input():
print("hello, %inputbuffer%!");
This would indeed be a more convenient way to write it. How do we do that though?
What is a monad?
Once you have a type (such as consolestate) that you define along with a bunch of functions designed specifically to operate on that type, you can turn the whole package of these things into a "monad" by defining an operator like : (bind) that automatically feeds return values on its left, into function parameters on its right, and a lift operator that turns normal functions, into functions that work with that specific kind of bind operator.
How is a monad implemented?
See other answers, that seem quite free to jump into the details of that.

After giving an answer to this question a few years ago, I believe I can improve and simplify that response with...
A monad is a function composition technique that externalizes treatment for some input scenarios using a composing function, bind, to pre-process input during composition.
In normal composition, the function, compose (>>), is use to apply the composed function to the result of its predecessor in sequence. Importantly, the function being composed is required to handle all scenarios of its input.
(x -> y) >> (y -> z)
This design can be improved by restructuring the input so that relevant states are more easily interrogated. So, instead of simply y the value can become Mb such as, for instance, (is_OK, b) if y included a notion of validity.
For example, when the input is only possibly a number, instead of returning a string which can dutifully contain a number or not, you could restructure the type into a bool indicating the presence of a valid number and a number in tuple such as, bool * float. The composed functions would now no longer need to parse an input string to determine whether a number exists but could merely inspect the bool portion of a tuple.
(Ma -> Mb) >> (Mb -> Mc)
Here, again, composition occurs naturally with compose and so each function must handle all scenarios of its input individually, though doing so is now much easier.
However, what if we could externalize the effort of interrogation for those times where handling a scenario is routine. For example, what if our program does nothing when the input is not OK as in when is_OK is false. If that were done then composed functions would not need to handle that scenario themselves, dramatically simplifying their code and effecting another level of reuse.
To achieve this externalization we could use a function, bind (>>=), to perform the composition instead of compose. As such, instead of simply transferring values from the output of one function to the input of another Bind would inspect the M portion of Ma and decide whether and how to apply the composed function to the a. Of course, the function bind would be defined specifically for our particular M so as to be able to inspect its structure and perform whatever type of application we want. Nonetheless, the a can be anything since bind merely passes the a uninspected to the the composed function when it determines application necessary. Additionally, the composed functions themselves no longer need to deal with the M portion of the input structure either, simplifying them. Hence...
(a -> Mb) >>= (b -> Mc) or more succinctly Mb >>= (b -> Mc)
In short, a monad externalizes and thereby provides standard behaviour around the treatment of certain input scenarios once the input becomes designed to sufficiently expose them. This design is a shell and content model where the shell contains data relevant to the application of the composed function and is interrogated by and remains only available to the bind function.
Therefore, a monad is three things:
an M shell for holding monad relevant information,
a bind function implemented to make use of this shell information in its application of the composed functions to the content value(s) it finds within the shell, and
composable functions of the form, a -> Mb, producing results that include monadic management data.
Generally speaking, the input to a function is far more restrictive than its output which may include such things as error conditions; hence, the Mb result structure is generally very useful. For instance, the division operator does not return a number when the divisor is 0.
Additionally, monads may include wrap functions that wrap values, a, into the monadic type, Ma, and general functions, a -> b, into monadic functions, a -> Mb, by wrapping their results after application. Of course, like bind, such wrap functions are specific to M. An example:
let return a = [a]
let lift f a = return (f a)
The design of the bind function presumes immutable data structures and pure functions others things get complex and guarantees cannot be made. As such, there are monadic laws:
Given...
M_
return = (a -> Ma)
f = (a -> Mb)
g = (b -> Mc)
Then...
Left Identity : (return a) >>= f === f a
Right Identity : Ma >>= return === Ma
Associative : Ma >>= (f >>= g) === Ma >>= ((fun x -> f x) >>= g)
Associativity means that bind preserves the order of evaluation regardless of when bind is applied. That is, in the definition of Associativity above, the force early evaluation of the parenthesized binding of f and g will only result in a function that expects Ma in order to complete the bind. Hence the evaluation of Ma must be determined before its value can become applied to f and that result in turn applied to g.

A monad is, effectively, a form of "type operator". It will do three things. First it will "wrap" (or otherwise convert) a value of one type into another type (typically called a "monadic type"). Secondly it will make all the operations (or functions) available on the underlying type available on the monadic type. Finally it will provide support for combining its self with another monad to produce a composite monad.
The "maybe monad" is essentially the equivalent of "nullable types" in Visual Basic / C#. It takes a non nullable type "T" and converts it into a "Nullable<T>", and then defines what all the binary operators mean on a Nullable<T>.
Side effects are represented simillarly. A structure is created that holds descriptions of side effects alongside a function's return value. The "lifted" operations then copy around side effects as values are passed between functions.
They are called "monads" rather than the easier-to-grasp name of "type operators" for several reasons:
Monads have restrictions on what they can do (see the definiton for details).
Those restrictions, along with the fact that there are three operations involved, conform to the structure of something called a monad in Category Theory, which is an obscure branch of mathematics.
They were designed by proponents of "pure" functional languages
Proponents of pure functional languages like obscure branches of mathematics
Because the math is obscure, and monads are associated with particular styles of programming, people tend to use the word monad as a sort of secret handshake. Because of this no one has bothered to invest in a better name.

(See also the answers at What is a monad?)
A good motivation to Monads is sigfpe (Dan Piponi)'s You Could Have Invented Monads! (And Maybe You Already Have). There are a LOT of other monad tutorials, many of which misguidedly try to explain monads in "simple terms" using various analogies: this is the monad tutorial fallacy; avoid them.
As DR MacIver says in Tell us why your language sucks:
So, things I hate about Haskell:
Let’s start with the obvious. Monad tutorials. No, not monads. Specifically the tutorials. They’re endless, overblown and dear god are they tedious. Further, I’ve never seen any convincing evidence that they actually help. Read the class definition, write some code, get over the scary name.
You say you understand the Maybe monad? Good, you're on your way. Just start using other monads and sooner or later you'll understand what monads are in general.
[If you are mathematically oriented, you might want to ignore the dozens of tutorials and learn the definition, or follow lectures in category theory :)
The main part of the definition is that a Monad M involves a "type constructor" that defines for each existing type "T" a new type "M T", and some ways for going back and forth between "regular" types and "M" types.]
Also, surprisingly enough, one of the best introductions to monads is actually one of the early academic papers introducing monads, Philip Wadler's Monads for functional programming. It actually has practical, non-trivial motivating examples, unlike many of the artificial tutorials out there.

Monads are to control flow what abstract data types are to data.
In other words, many developers are comfortable with the idea of Sets, Lists, Dictionaries (or Hashes, or Maps), and Trees. Within those data types there are many special cases (for instance InsertionOrderPreservingIdentityHashMap).
However, when confronted with program "flow" many developers haven't been exposed to many more constructs than if, switch/case, do, while, goto (grr), and (maybe) closures.
So, a monad is simply a control flow construct. A better phrase to replace monad would be 'control type'.
As such, a monad has slots for control logic, or statements, or functions - the equivalent in data structures would be to say that some data structures allow you to add data, and remove it.
For example, the "if" monad:
if( clause ) then block
at its simplest has two slots - a clause, and a block. The if monad is usually built to evaluate the result of the clause, and if not false, evaluate the block. Many developers are not introduced to monads when they learn 'if', and it just isn't necessary to understand monads to write effective logic.
Monads can become more complicated, in the same way that data structures can become more complicated, but there are many broad categories of monad that may have similar semantics, but differing implementations and syntax.
Of course, in the same way that data structures may be iterated over, or traversed, monads may be evaluated.
Compilers may or may not have support for user-defined monads. Haskell certainly does. Ioke has some similar capabilities, although the term monad is not used in the language.

My favorite Monad tutorial:
http://www.haskell.org/haskellwiki/All_About_Monads
(out of 170,000 hits on a Google search for "monad tutorial"!)
#Stu: The point of monads is to allow you to add (usually) sequential semantics to otherwise pure code; you can even compose monads (using Monad Transformers) and get more interesting and complicated combined semantics, like parsing with error handling, shared state, and logging, for example. All of this is possible in pure code, monads just allow you to abstract it away and reuse it in modular libraries (always good in programming), as well as providing convenient syntax to make it look imperative.
Haskell already has operator overloading[1]: it uses type classes much the way one might use interfaces in Java or C# but Haskell just happens to also allow non-alphanumeric tokens like + && and > as infix identifiers. It's only operator overloading in your way of looking at it if you mean "overloading the semicolon" [2]. It sounds like black magic and asking for trouble to "overload the semicolon" (picture enterprising Perl hackers getting wind of this idea) but the point is that without monads there is no semicolon, since purely functional code does not require or allow explicit sequencing.
This all sounds much more complicated than it needs to. sigfpe's article is pretty cool but uses Haskell to explain it, which sort of fails to break the chicken and egg problem of understanding Haskell to grok Monads and understanding Monads to grok Haskell.
[1] This is a separate issue from monads but monads use Haskell's operator overloading feature.
[2] This is also an oversimplification since the operator for chaining monadic actions is >>= (pronounced "bind") but there is syntactic sugar ("do") that lets you use braces and semicolons and/or indentation and newlines.

I am still new to monads, but I thought I would share a link I found that felt really good to read (WITH PICTURES!!):
http://www.matusiak.eu/numerodix/blog/2012/3/11/monads-for-the-layman/
(no affiliation)
Basically, the warm and fuzzy concept that I got from the article was the concept that monads are basically adapters that allow disparate functions to work in a composable fashion, i.e. be able to string up multiple functions and mix and match them without worrying about inconsistent return types and such. So the BIND function is in charge of keeping apples with apples and oranges with oranges when we're trying to make these adapters. And the LIFT function is in charge of taking "lower level" functions and "upgrading" them to work with BIND functions and be composable as well.
I hope I got it right, and more importantly, hope that the article has a valid view on monads. If nothing else, this article helped whet my appetite for learning more about monads.

I've been thinking of Monads in a different way, lately. I've been thinking of them as abstracting out execution order in a mathematical way, which makes new kinds of polymorphism possible.
If you're using an imperative language, and you write some expressions in order, the code ALWAYS runs exactly in that order.
And in the simple case, when you use a monad, it feels the same -- you define a list of expressions that happen in order. Except that, depending on which monad you use, your code might run in order (like in IO monad), in parallel over several items at once (like in the List monad), it might halt partway through (like in the Maybe monad), it might pause partway through to be resumed later (like in a Resumption monad), it might rewind and start from the beginning (like in a Transaction monad), or it might rewind partway to try other options (like in a Logic monad).
And because monads are polymorphic, it's possible to run the same code in different monads, depending on your needs.
Plus, in some cases, it's possible to combine monads together (with monad transformers) to get multiple features at the same time.

tl;dr
{-# LANGUAGE InstanceSigs #-}
newtype Id t = Id t
instance Monad Id where
return :: t -> Id t
return = Id
(=<<) :: (a -> Id b) -> Id a -> Id b
f =<< (Id x) = f x
Prologue
The application operator $ of functions
forall a b. a -> b
is canonically defined
($) :: (a -> b) -> a -> b
f $ x = f x
infixr 0 $
in terms of Haskell-primitive function application f x (infixl 10).
Composition . is defined in terms of $ as
(.) :: (b -> c) -> (a -> b) -> (a -> c)
f . g = \ x -> f $ g x
infixr 9 .
and satisfies the equivalences forall f g h.
f . id = f :: c -> d Right identity
id . g = g :: b -> c Left identity
(f . g) . h = f . (g . h) :: a -> d Associativity
. is associative, and id is its right and left identity.
The Kleisli triple
In programming, a monad is a functor type constructor with an instance of the monad type class. There are several equivalent variants of definition and implementation, each carrying slightly different intuitions about the monad abstraction.
A functor is a type constructor f of kind * -> * with an instance of the functor type class.
{-# LANGUAGE KindSignatures #-}
class Functor (f :: * -> *) where
map :: (a -> b) -> (f a -> f b)
In addition to following statically enforced type protocol, instances of the functor type class must obey the algebraic functor laws forall f g.
map id = id :: f t -> f t Identity
map f . map g = map (f . g) :: f a -> f c Composition / short cut fusion
Functor computations have the type
forall f t. Functor f => f t
A computation c r consists in results r within context c.
Unary monadic functions or Kleisli arrows have the type
forall m a b. Functor m => a -> m b
Kleisi arrows are functions that take one argument a and return a monadic computation m b.
Monads are canonically defined in terms of the Kleisli triple forall m. Functor m =>
(m, return, (=<<))
implemented as the type class
class Functor m => Monad m where
return :: t -> m t
(=<<) :: (a -> m b) -> m a -> m b
infixr 1 =<<
The Kleisli identity return is a Kleisli arrow that promotes a value t into monadic context m. Extension or Kleisli application =<< applies a Kleisli arrow a -> m b to results of a computation m a.
Kleisli composition <=< is defined in terms of extension as
(<=<) :: Monad m => (b -> m c) -> (a -> m b) -> (a -> m c)
f <=< g = \ x -> f =<< g x
infixr 1 <=<
<=< composes two Kleisli arrows, applying the left arrow to results of the right arrow’s application.
Instances of the monad type class must obey the monad laws, most elegantly stated in terms of Kleisli composition: forall f g h.
f <=< return = f :: c -> m d Right identity
return <=< g = g :: b -> m c Left identity
(f <=< g) <=< h = f <=< (g <=< h) :: a -> m d Associativity
<=< is associative, and return is its right and left identity.
Identity
The identity type
type Id t = t
is the identity function on types
Id :: * -> *
Interpreted as a functor,
return :: t -> Id t
= id :: t -> t
(=<<) :: (a -> Id b) -> Id a -> Id b
= ($) :: (a -> b) -> a -> b
(<=<) :: (b -> Id c) -> (a -> Id b) -> (a -> Id c)
= (.) :: (b -> c) -> (a -> b) -> (a -> c)
In canonical Haskell, the identity monad is defined
newtype Id t = Id t
instance Functor Id where
map :: (a -> b) -> Id a -> Id b
map f (Id x) = Id (f x)
instance Monad Id where
return :: t -> Id t
return = Id
(=<<) :: (a -> Id b) -> Id a -> Id b
f =<< (Id x) = f x
Option
An option type
data Maybe t = Nothing | Just t
encodes computation Maybe t that not necessarily yields a result t, computation that may “fail”. The option monad is defined
instance Functor Maybe where
map :: (a -> b) -> (Maybe a -> Maybe b)
map f (Just x) = Just (f x)
map _ Nothing = Nothing
instance Monad Maybe where
return :: t -> Maybe t
return = Just
(=<<) :: (a -> Maybe b) -> Maybe a -> Maybe b
f =<< (Just x) = f x
_ =<< Nothing = Nothing
a -> Maybe b is applied to a result only if Maybe a yields a result.
newtype Nat = Nat Int
The natural numbers can be encoded as those integers greater than or equal to zero.
toNat :: Int -> Maybe Nat
toNat i | i >= 0 = Just (Nat i)
| otherwise = Nothing
The natural numbers are not closed under subtraction.
(-?) :: Nat -> Nat -> Maybe Nat
(Nat n) -? (Nat m) = toNat (n - m)
infixl 6 -?
The option monad covers a basic form of exception handling.
(-? 20) <=< toNat :: Int -> Maybe Nat
List
The list monad, over the list type
data [] t = [] | t : [t]
infixr 5 :
and its additive monoid operation “append”
(++) :: [t] -> [t] -> [t]
(x : xs) ++ ys = x : xs ++ ys
[] ++ ys = ys
infixr 5 ++
encodes nonlinear computation [t] yielding a natural amount 0, 1, ... of results t.
instance Functor [] where
map :: (a -> b) -> ([a] -> [b])
map f (x : xs) = f x : map f xs
map _ [] = []
instance Monad [] where
return :: t -> [t]
return = (: [])
(=<<) :: (a -> [b]) -> [a] -> [b]
f =<< (x : xs) = f x ++ (f =<< xs)
_ =<< [] = []
Extension =<< concatenates ++ all lists [b] resulting from applications f x of a Kleisli arrow a -> [b] to elements of [a] into a single result list [b].
Let the proper divisors of a positive integer n be
divisors :: Integral t => t -> [t]
divisors n = filter (`divides` n) [2 .. n - 1]
divides :: Integral t => t -> t -> Bool
(`divides` n) = (== 0) . (n `rem`)
then
forall n. let { f = f <=< divisors } in f n = []
In defining the monad type class, instead of extension =<<, the Haskell standard uses its flip, the bind operator >>=.
class Applicative m => Monad m where
(>>=) :: forall a b. m a -> (a -> m b) -> m b
(>>) :: forall a b. m a -> m b -> m b
m >> k = m >>= \ _ -> k
{-# INLINE (>>) #-}
return :: a -> m a
return = pure
For simplicity's sake, this explanation uses the type class hierarchy
class Functor f
class Functor m => Monad m
In Haskell, the current standard hierarchy is
class Functor f
class Functor p => Applicative p
class Applicative m => Monad m
because not only is every monad a functor, but every applicative is a functor and every monad is an applicative, too.
Using the list monad, the imperative pseudocode
for a in (1, ..., 10)
for b in (1, ..., 10)
p <- a * b
if even(p)
yield p
roughly translates to the do block,
do a <- [1 .. 10]
b <- [1 .. 10]
let p = a * b
guard (even p)
return p
the equivalent monad comprehension,
[ p | a <- [1 .. 10], b <- [1 .. 10], let p = a * b, even p ]
and the expression
[1 .. 10] >>= (\ a ->
[1 .. 10] >>= (\ b ->
let p = a * b in
guard (even p) >> -- [ () | even p ] >>
return p
)
)
Do notation and monad comprehensions are syntactic sugar for nested bind expressions. The bind operator is used for local name binding of monadic results.
let x = v in e = (\ x -> e) $ v = v & (\ x -> e)
do { r <- m; c } = (\ r -> c) =<< m = m >>= (\ r -> c)
where
(&) :: a -> (a -> b) -> b
(&) = flip ($)
infixl 0 &
The guard function is defined
guard :: Additive m => Bool -> m ()
guard True = return ()
guard False = fail
where the unit type or “empty tuple”
data () = ()
Additive monads that support choice and failure can be abstracted over using a type class
class Monad m => Additive m where
fail :: m t
(<|>) :: m t -> m t -> m t
infixl 3 <|>
instance Additive Maybe where
fail = Nothing
Nothing <|> m = m
m <|> _ = m
instance Additive [] where
fail = []
(<|>) = (++)
where fail and <|> form a monoid forall k l m.
k <|> fail = k
fail <|> l = l
(k <|> l) <|> m = k <|> (l <|> m)
and fail is the absorbing/annihilating zero element of additive monads
_ =<< fail = fail
If in
guard (even p) >> return p
even p is true, then the guard produces [()], and, by the definition of >>, the local constant function
\ _ -> return p
is applied to the result (). If false, then the guard produces the list monad’s fail ( [] ), which yields no result for a Kleisli arrow to be applied >> to, so this p is skipped over.
State
Infamously, monads are used to encode stateful computation.
A state processor is a function
forall st t. st -> (t, st)
that transitions a state st and yields a result t. The state st can be anything. Nothing, flag, count, array, handle, machine, world.
The type of state processors is usually called
type State st t = st -> (t, st)
The state processor monad is the kinded * -> * functor State st. Kleisli arrows of the state processor monad are functions
forall st a b. a -> (State st) b
In canonical Haskell, the lazy version of the state processor monad is defined
newtype State st t = State { stateProc :: st -> (t, st) }
instance Functor (State st) where
map :: (a -> b) -> ((State st) a -> (State st) b)
map f (State p) = State $ \ s0 -> let (x, s1) = p s0
in (f x, s1)
instance Monad (State st) where
return :: t -> (State st) t
return x = State $ \ s -> (x, s)
(=<<) :: (a -> (State st) b) -> (State st) a -> (State st) b
f =<< (State p) = State $ \ s0 -> let (x, s1) = p s0
in stateProc (f x) s1
A state processor is run by supplying an initial state:
run :: State st t -> st -> (t, st)
run = stateProc
eval :: State st t -> st -> t
eval = fst . run
exec :: State st t -> st -> st
exec = snd . run
State access is provided by primitives get and put, methods of abstraction over stateful monads:
{-# LANGUAGE MultiParamTypeClasses, FunctionalDependencies #-}
class Monad m => Stateful m st | m -> st where
get :: m st
put :: st -> m ()
m -> st declares a functional dependency of the state type st on the monad m; that a State t, for example, will determine the state type to be t uniquely.
instance Stateful (State st) st where
get :: State st st
get = State $ \ s -> (s, s)
put :: st -> State st ()
put s = State $ \ _ -> ((), s)
with the unit type used analogously to void in C.
modify :: Stateful m st => (st -> st) -> m ()
modify f = do
s <- get
put (f s)
gets :: Stateful m st => (st -> t) -> m t
gets f = do
s <- get
return (f s)
gets is often used with record field accessors.
The state monad equivalent of the variable threading
let s0 = 34
s1 = (+ 1) s0
n = (* 12) s1
s2 = (+ 7) s1
in (show n, s2)
where s0 :: Int, is the equally referentially transparent, but infinitely more elegant and practical
(flip run) 34
(do
modify (+ 1)
n <- gets (* 12)
modify (+ 7)
return (show n)
)
modify (+ 1) is a computation of type State Int (), except for its effect equivalent to return ().
(flip run) 34
(modify (+ 1) >>
gets (* 12) >>= (\ n ->
modify (+ 7) >>
return (show n)
)
)
The monad law of associativity can be written in terms of >>= forall m f g.
(m >>= f) >>= g = m >>= (\ x -> f x >>= g)
or
do { do { do {
r1 <- do { x <- m; r0 <- m;
r0 <- m; = do { = r1 <- f r0;
f r0 r1 <- f x; g r1
}; g r1 }
g r1 }
} }
Like in expression-oriented programming (e.g. Rust), the last statement of a block represents its yield. The bind operator is sometimes called a “programmable semicolon”.
Iteration control structure primitives from structured imperative programming are emulated monadically
for :: Monad m => (a -> m b) -> [a] -> m ()
for f = foldr ((>>) . f) (return ())
while :: Monad m => m Bool -> m t -> m ()
while c m = do
b <- c
if b then m >> while c m
else return ()
forever :: Monad m => m t
forever m = m >> forever m
Input/Output
data World
The I/O world state processor monad is a reconciliation of pure Haskell and the real world, of functional denotative and imperative operational semantics. A close analogue of the actual strict implementation:
type IO t = World -> (t, World)
Interaction is facilitated by impure primitives
getChar :: IO Char
putChar :: Char -> IO ()
readFile :: FilePath -> IO String
writeFile :: FilePath -> String -> IO ()
hSetBuffering :: Handle -> BufferMode -> IO ()
hTell :: Handle -> IO Integer
. . . . . .
The impurity of code that uses IO primitives is permanently protocolized by the type system. Because purity is awesome, what happens in IO, stays in IO.
unsafePerformIO :: IO t -> t
Or, at least, should.
The type signature of a Haskell program
main :: IO ()
main = putStrLn "Hello, World!"
expands to
World -> ((), World)
A function that transforms a world.
Epilogue
The category whiches objects are Haskell types and whiches morphisms are functions between Haskell types is, “fast and loose”, the category Hask.
A functor T is a mapping from a category C to a category D; for each object in C an object in D
Tobj : Obj(C) -> Obj(D)
f :: * -> *
and for each morphism in C a morphism in D
Tmor : HomC(X, Y) -> HomD(Tobj(X), Tobj(Y))
map :: (a -> b) -> (f a -> f b)
where X, Y are objects in C. HomC(X, Y) is the homomorphism class of all morphisms X -> Y in C. The functor must preserve morphism identity and composition, the “structure” of C, in D.
Tmor Tobj
T(id) = id : T(X) -> T(X) Identity
T(f) . T(g) = T(f . g) : T(X) -> T(Z) Composition
The Kleisli category of a category C is given by a Kleisli triple
<T, eta, _*>
of an endofunctor
T : C -> C
(f), an identity morphism eta (return), and an extension operator * (=<<).
Each Kleisli morphism in Hask
f : X -> T(Y)
f :: a -> m b
by the extension operator
(_)* : Hom(X, T(Y)) -> Hom(T(X), T(Y))
(=<<) :: (a -> m b) -> (m a -> m b)
is given a morphism in Hask’s Kleisli category
f* : T(X) -> T(Y)
(f =<<) :: m a -> m b
Composition in the Kleisli category .T is given in terms of extension
f .T g = f* . g : X -> T(Z)
f <=< g = (f =<<) . g :: a -> m c
and satisfies the category axioms
eta .T g = g : Y -> T(Z) Left identity
return <=< g = g :: b -> m c
f .T eta = f : Z -> T(U) Right identity
f <=< return = f :: c -> m d
(f .T g) .T h = f .T (g .T h) : X -> T(U) Associativity
(f <=< g) <=< h = f <=< (g <=< h) :: a -> m d
which, applying the equivalence transformations
eta .T g = g
eta* . g = g By definition of .T
eta* . g = id . g forall f. id . f = f
eta* = id forall f g h. f . h = g . h ==> f = g
(f .T g) .T h = f .T (g .T h)
(f* . g)* . h = f* . (g* . h) By definition of .T
(f* . g)* . h = f* . g* . h . is associative
(f* . g)* = f* . g* forall f g h. f . h = g . h ==> f = g
in terms of extension are canonically given
eta* = id : T(X) -> T(X) Left identity
(return =<<) = id :: m t -> m t
f* . eta = f : Z -> T(U) Right identity
(f =<<) . return = f :: c -> m d
(f* . g)* = f* . g* : T(X) -> T(Z) Associativity
(((f =<<) . g) =<<) = (f =<<) . (g =<<) :: m a -> m c
Monads can also be defined in terms not of Kleislian extension, but a natural transformation mu, in programming called join. A monad is defined in terms of mu as a triple over a category C, of an endofunctor
T : C -> C
f :: * -> *
and two natural tranformations
eta : Id -> T
return :: t -> f t
mu : T . T -> T
join :: f (f t) -> f t
satisfying the equivalences
mu . T(mu) = mu . mu : T . T . T -> T . T Associativity
join . map join = join . join :: f (f (f t)) -> f t
mu . T(eta) = mu . eta = id : T -> T Identity
join . map return = join . return = id :: f t -> f t
The monad type class is then defined
class Functor m => Monad m where
return :: t -> m t
join :: m (m t) -> m t
The canonical mu implementation of the option monad:
instance Monad Maybe where
return = Just
join (Just m) = m
join Nothing = Nothing
The concat function
concat :: [[a]] -> [a]
concat (x : xs) = x ++ concat xs
concat [] = []
is the join of the list monad.
instance Monad [] where
return :: t -> [t]
return = (: [])
(=<<) :: (a -> [b]) -> ([a] -> [b])
(f =<<) = concat . map f
Implementations of join can be translated from extension form using the equivalence
mu = id* : T . T -> T
join = (id =<<) :: m (m t) -> m t
The reverse translation from mu to extension form is given by
f* = mu . T(f) : T(X) -> T(Y)
(f =<<) = join . map f :: m a -> m b
Philip Wadler: Monads for functional programming
Simon L Peyton Jones, Philip Wadler: Imperative functional programming
Jonathan M. D. Hill, Keith Clarke: An introduction to category theory, category theory monads, and their relationship to functional programming
´
Kleisli category
Eugenio Moggi: Notions of computation and monads
What a monad is not
But why should a theory so abstract be of any use for programming?
The answer is simple: as computer scientists, we value abstraction! When we design the interface to a software component, we want it to reveal as little as possible about the implementation. We want to be able to replace the implementation with many alternatives, many other ‘instances’ of the same ‘concept’. When we design a generic interface to many program libraries, it is even more important that the interface we choose have a variety of implementations. It is the generality of the monad concept which we value so highly, it is because category theory is so abstract that its concepts are so useful for programming.
It is hardly suprising, then, that the generalisation of monads that we present below also has a close connection to category theory. But we stress that our purpose is very practical: it is not to ‘implement category theory’, it is to find a more general way to structure combinator libraries. It is simply our good fortune that mathematicians have already done much of the work for us!
from Generalising Monads to Arrows by John Hughes

Monads Are Not Metaphors, but a practically useful abstraction emerging from a common pattern, as Daniel Spiewak explains.

In addition to the excellent answers above, let me offer you a link to the following article (by Patrick Thomson) which explains monads by relating the concept to the JavaScript library jQuery (and its way of using "method chaining" to manipulate the DOM):
jQuery is a Monad
The jQuery documentation itself doesn't refer to the term "monad" but talks about the "builder pattern" which is probably more familiar. This doesn't change the fact that you have a proper monad there maybe without even realizing it.

A monad is a way of combining computations together that share a common context. It is like building a network of pipes. When constructing the network, there is no data flowing through it. But when I have finished piecing all the bits together with 'bind' and 'return' then I invoke something like runMyMonad monad data and the data flows through the pipes.

In practice, monad is a custom implementation of function composition operator that takes care of side effects and incompatible input and return values (for chaining).

The two things that helped me best when learning about there were:
Chapter 8, "Functional Parsers," from Graham Hutton's book Programming in Haskell. This doesn't mention monads at all, actually, but if you can work through chapter and really understand everything in it, particularly how a sequence of bind operations is evaluated, you'll understand the internals of monads. Expect this to take several tries.
The tutorial All About Monads. This gives several good examples of their use, and I have to say that the analogy in Appendex I worked for me.

Monoid appears to be something that ensures that all operations defined on a Monoid and a supported type will always return a supported type inside the Monoid. Eg, Any number + Any number = A number, no errors.
Whereas division accepts two fractionals, and returns a fractional, which defined division by zero as Infinity in haskell somewhy(which happens to be a fractional somewhy)...
In any case, it appears Monads are just a way to ensure that your chain of operations behaves in a predictable way, and a function that claims to be Num -> Num, composed with another function of Num->Num called with x does not say, fire the missiles.
On the other hand, if we have a function which does fire the missiles, we can compose it with other functions which also fire the missiles, because our intent is clear -- we want to fire the missiles -- but it won't try printing "Hello World" for some odd reason.
In Haskell, main is of type IO (), or IO [()], the distiction is strange and I will not discuss it but here's what I think happens:
If I have main, I want it to do a chain of actions, the reason I run the program is to produce an effect -- usually though IO. Thus I can chain IO operations together in main in order to -- do IO, nothing else.
If I try to do something which does not "return IO", the program will complain that the chain does not flow, or basically "How does this relate to what we are trying to do -- an IO action", it appears to force the programmer to keep their train of thought, without straying off and thinking about firing the missiles, while creating algorithms for sorting -- which does not flow.
Basically, Monads appear to be a tip to the compiler that "hey, you know this function that returns a number here, it doesn't actually always work, it can sometimes produce a Number, and sometimes Nothing at all, just keep this in mind". Knowing this, if you try to assert a monadic action, the monadic action may act as a compile time exception saying "hey, this isn't actually a number, this CAN be a number, but you can't assume this, do something to ensure that the flow is acceptable." which prevents unpredictable program behavior -- to a fair extent.
It appears monads are not about purity, nor control, but about maintaining an identity of a category on which all behavior is predictable and defined, or does not compile. You cannot do nothing when you are expected to do something, and you cannot do something if you are expected to do nothing (visible).
The biggest reason I could think of for Monads is -- go look at Procedural/OOP code, and you will notice that you do not know where the program starts, nor ends, all you see is a lot of jumping and a lot of math,magic,and missiles. You will not be able to maintain it, and if you can, you will spend quite a lot of time wrapping your mind around the whole program before you can understand any part of it, because modularity in this context is based on interdependant "sections" of code, where code is optimized to be as related as possible for promise of efficiency/inter-relation. Monads are very concrete, and well defined by definition, and ensure that the flow of program is possible to analyze, and isolate parts which are hard to analyze -- as they themselves are monads. A monad appears to be a "comprehensible unit which is predictable upon its full understanding" -- If you understand "Maybe" monad, there's no possible way it will do anything except be "Maybe", which appears trivial, but in most non monadic code, a simple function "helloworld" can fire the missiles, do nothing, or destroy the universe or even distort time -- we have no idea nor have any guarantees that IT IS WHAT IT IS. A monad GUARANTEES that IT IS WHAT IT IS. which is very powerful.
All things in "real world" appear to be monads, in the sense that it is bound by definite observable laws preventing confusion. This does not mean we have to mimic all the operations of this object to create classes, instead we can simply say "a square is a square", nothing but a square, not even a rectangle nor a circle, and "a square has area of the length of one of it's existing dimensions multiplied by itself. No matter what square you have, if it's a square in 2D space, it's area absolutely cannot be anything but its length squared, it's almost trivial to prove. This is very powerful because we do not need to make assertions to make sure that our world is the way it is, we just use implications of reality to prevent our programs from falling off track.
Im pretty much guaranteed to be wrong but I think this could help somebody out there, so hopefully it helps somebody.

In the context of Scala you will find the following to be the simplest definition. Basically flatMap (or bind) is 'associative' and there exists an identity.
trait M[+A] {
def flatMap[B](f: A => M[B]): M[B] // AKA bind
// Pseudo Meta Code
def isValidMonad: Boolean = {
// for every parameter the following holds
def isAssociativeOn[X, Y, Z](x: M[X], f: X => M[Y], g: Y => M[Z]): Boolean =
x.flatMap(f).flatMap(g) == x.flatMap(f(_).flatMap(g))
// for every parameter X and x, there exists an id
// such that the following holds
def isAnIdentity[X](x: M[X], id: X => M[X]): Boolean =
x.flatMap(id) == x
}
}
E.g.
// These could be any functions
val f: Int => Option[String] = number => if (number == 7) Some("hello") else None
val g: String => Option[Double] = string => Some(3.14)
// Observe these are identical. Since Option is a Monad
// they will always be identical no matter what the functions are
scala> Some(7).flatMap(f).flatMap(g)
res211: Option[Double] = Some(3.14)
scala> Some(7).flatMap(f(_).flatMap(g))
res212: Option[Double] = Some(3.14)
// As Option is a Monad, there exists an identity:
val id: Int => Option[Int] = x => Some(x)
// Observe these are identical
scala> Some(7).flatMap(id)
res213: Option[Int] = Some(7)
scala> Some(7)
res214: Some[Int] = Some(7)
NOTE Strictly speaking the definition of a Monad in functional programming is not the same as the definition of a Monad in Category Theory, which is defined in turns of map and flatten. Though they are kind of equivalent under certain mappings. This presentations is very good: http://www.slideshare.net/samthemonad/monad-presentation-scala-as-a-category

This answer begins with a motivating example, works through the example, derives an example of a monad, and formally defines "monad".
Consider these three functions in pseudocode:
f(<x, messages>) := <x, messages "called f. ">
g(<x, messages>) := <x, messages "called g. ">
wrap(x) := <x, "">
f takes an ordered pair of the form <x, messages> and returns an ordered pair. It leaves the first item untouched and appends "called f. " to the second item. Same with g.
You can compose these functions and get your original value, along with a string that shows which order the functions were called in:
f(g(wrap(x)))
= f(g(<x, "">))
= f(<x, "called g. ">)
= <x, "called g. called f. ">
You dislike the fact that f and g are responsible for appending their own log messages to the previous logging information. (Just imagine for the sake of argument that instead of appending strings, f and g must perform complicated logic on the second item of the pair. It would be a pain to repeat that complicated logic in two -- or more -- different functions.)
You prefer to write simpler functions:
f(x) := <x, "called f. ">
g(x) := <x, "called g. ">
wrap(x) := <x, "">
But look at what happens when you compose them:
f(g(wrap(x)))
= f(g(<x, "">))
= f(<<x, "">, "called g. ">)
= <<<x, "">, "called g. ">, "called f. ">
The problem is that passing a pair into a function does not give you what you want. But what if you could feed a pair into a function:
feed(f, feed(g, wrap(x)))
= feed(f, feed(g, <x, "">))
= feed(f, <x, "called g. ">)
= <x, "called g. called f. ">
Read feed(f, m) as "feed m into f". To feed a pair <x, messages> into a function f is to pass x into f, get <y, message> out of f, and return <y, messages message>.
feed(f, <x, messages>) := let <y, message> = f(x)
in <y, messages message>
Notice what happens when you do three things with your functions:
First: if you wrap a value and then feed the resulting pair into a function:
feed(f, wrap(x))
= feed(f, <x, "">)
= let <y, message> = f(x)
in <y, "" message>
= let <y, message> = <x, "called f. ">
in <y, "" message>
= <x, "" "called f. ">
= <x, "called f. ">
= f(x)
That is the same as passing the value into the function.
Second: if you feed a pair into wrap:
feed(wrap, <x, messages>)
= let <y, message> = wrap(x)
in <y, messages message>
= let <y, message> = <x, "">
in <y, messages message>
= <x, messages "">
= <x, messages>
That does not change the pair.
Third: if you define a function that takes x and feeds g(x) into f:
h(x) := feed(f, g(x))
and feed a pair into it:
feed(h, <x, messages>)
= let <y, message> = h(x)
in <y, messages message>
= let <y, message> = feed(f, g(x))
in <y, messages message>
= let <y, message> = feed(f, <x, "called g. ">)
in <y, messages message>
= let <y, message> = let <z, msg> = f(x)
in <z, "called g. " msg>
in <y, messages message>
= let <y, message> = let <z, msg> = <x, "called f. ">
in <z, "called g. " msg>
in <y, messages message>
= let <y, message> = <x, "called g. " "called f. ">
in <y, messages message>
= <x, messages "called g. " "called f. ">
= feed(f, <x, messages "called g. ">)
= feed(f, feed(g, <x, messages>))
That is the same as feeding the pair into g and feeding the resulting pair into f.
You have most of a monad. Now you just need to know about the data types in your program.
What type of value is <x, "called f. ">? Well, that depends on what type of value x is. If x is of type t, then your pair is a value of type "pair of t and string". Call that type M t.
M is a type constructor: M alone does not refer to a type, but M _ refers to a type once you fill in the blank with a type. An M int is a pair of an int and a string. An M string is a pair of a string and a string. Etc.
Congratulations, you have created a monad!
Formally, your monad is the tuple <M, feed, wrap>.
A monad is a tuple <M, feed, wrap> where:
M is a type constructor.
feed takes a (function that takes a t and returns an M u) and an M t and returns an M u.
wrap takes a v and returns an M v.
t, u, and v are any three types that may or may not be the same. A monad satisfies the three properties you proved for your specific monad:
Feeding a wrapped t into a function is the same as passing the unwrapped t into the function.
Formally: feed(f, wrap(x)) = f(x)
Feeding an M t into wrap does nothing to the M t.
Formally: feed(wrap, m) = m
Feeding an M t (call it m) into a function that
passes the t into g
gets an M u (call it n) from g
feeds n into f
is the same as
feeding m into g
getting n from g
feeding n into f
Formally: feed(h, m) = feed(f, feed(g, m)) where h(x) := feed(f, g(x))
Typically, feed is called bind (AKA >>= in Haskell) and wrap is called return.

I will try to explain Monad in the context of Haskell.
In functional programming, function composition is important. It allows our program to consist of small, easy-to-read functions.
Let's say we have two functions: g :: Int -> String and f :: String -> Bool.
We can do (f . g) x, which is just the same as f (g x), where x is an Int value.
When doing composition/applying the result of one function to another, having the types match up is important. In the above case, the type of the result returned by g must be the same as the type accepted by f.
But sometimes values are in contexts, and this makes it a bit less easy to line up types. (Having values in contexts is very useful. For example, the Maybe Int type represents an Int value that may not be there, the IO String type represents a String value that is there as a result of performing some side effects.)
Let's say we now have g1 :: Int -> Maybe String and f1 :: String -> Maybe Bool. g1 and f1 are very similar to g and f respectively.
We can't do (f1 . g1) x or f1 (g1 x), where x is an Int value. The type of the result returned by g1 is not what f1 expects.
We could compose f and g with the . operator, but now we can't compose f1 and g1 with .. The problem is that we can't straightforwardly pass a value in a context to a function that expects a value that is not in a context.
Wouldn't it be nice if we introduce an operator to compose g1 and f1, such that we can write (f1 OPERATOR g1) x? g1 returns a value in a context. The value will be taken out of context and applied to f1. And yes, we have such an operator. It's <=<.
We also have the >>= operator that does for us the exact same thing, though in a slightly different syntax.
We write: g1 x >>= f1. g1 x is a Maybe Int value. The >>= operator helps take that Int value out of the "perhaps-not-there" context, and apply it to f1. The result of f1, which is a Maybe Bool, will be the result of the entire >>= operation.
And finally, why is Monad useful? Because Monad is the type class that defines the >>= operator, very much the same as the Eq type class that defines the == and /= operators.
To conclude, the Monad type class defines the >>= operator that allows us to pass values in a context (we call these monadic values) to functions that don't expect values in a context. The context will be taken care of.
If there is one thing to remember here, it is that Monads allow function composition that involves values in contexts.

A Monad is an Applicative (i.e. something that you can lift binary -- hence, "n-ary" -- functions to,(1) and inject pure values into(2)) Functor (i.e. something that you can map over,(3) i.e. lift unary functions to(3)) with the added ability to flatten the nested datatype (with each of the three notions following its corresponding set of laws). In Haskell, this flattening operation is called join.
The general (generic, parametric) type of this "join" operation is:
join :: Monad m => m (m a) -> m a
for any monad m (NB all ms in the type are the same!).
A specific m monad defines its specific version of join working for any value type a "carried" by the monadic values of type m a. Some specific types are:
join :: [[a]] -> [a] -- for lists, or nondeterministic values
join :: Maybe (Maybe a) -> Maybe a -- for Maybe, or optional values
join :: IO (IO a) -> IO a -- for I/O-produced values
The join operation converts an m-computation producing an m-computation of a-type values into one combined m-computation of a-type values. This allows for combination of computation steps into one larger computation.
This computation steps-combining "bind" (>>=) operator simply uses fmap and join together, i.e.
(ma >>= k) == join (fmap k ma)
{-
ma :: m a -- `m`-computation which produces `a`-type values
k :: a -> m b -- create new `m`-computation from an `a`-type value
fmap k ma :: m ( m b ) -- `m`-computation of `m`-computation of `b`-type values
(m >>= k) :: m b -- `m`-computation which produces `b`-type values
-}
Conversely, join can be defined via bind, join mma == join (fmap id mma) == mma >>= id where id ma = ma -- whichever is more convenient for a given type m.
For monads, both the do-notation and its equivalent bind-using code,
do { x <- mx ; y <- my ; return (f x y) } -- x :: a , mx :: m a
-- y :: b , my :: m b
mx >>= (\x -> -- nested
my >>= (\y -> -- lambda
return (f x y) )) -- functions
can be read as
first "do" mx, and when it's done, get its "result" as x and let me use it to "do" something else.
In a given do block, each of the values to the right of the binding arrow <- is of type m a for some type a and the same monad m throughout the do block.
return x is a neutral m-computation which just produces the pure value x it is given, such that binding any m-computation with return does not change that computation at all.
(1) with liftA2 :: Applicative m => (a -> b -> c) -> m a -> m b -> m c
(2) with pure :: Applicative m => a -> m a
(3) with fmap :: Functor m => (a -> b) -> m a -> m b
There's also the equivalent Monad methods,
liftM2 :: Monad m => (a -> b -> c) -> m a -> m b -> m c
return :: Monad m => a -> m a
liftM :: Monad m => (a -> b) -> m a -> m b
Given a monad, the other definitions could be made as
pure a = return a
fmap f ma = do { a <- ma ; return (f a) }
liftA2 f ma mb = do { a <- ma ; b <- mb ; return (f a b) }
(ma >>= k) = do { a <- ma ; b <- k a ; return b }

If I've understood correctly, IEnumerable is derived from monads. I wonder if that might be an interesting angle of approach for those of us from the C# world?
For what it's worth, here are some links to tutorials that helped me (and no, I still haven't understood what monads are).
http://osteele.com/archives/2007/12/overloading-semicolon
http://spbhug.folding-maps.org/wiki/MonadsEn
http://www.loria.fr/~kow/monads/

What the world needs is another monad blog post, but I think this is useful in identifying existing monads in the wild.
monads are fractals
The above is a fractal called Sierpinski triangle, the only fractal I can remember to draw. Fractals are self-similar structure like the above triangle, in which the parts are similar to the whole (in this case exactly half the scale as parent triangle).
Monads are fractals. Given a monadic data structure, its values can be composed to form another value of the data structure. This is why it's useful to programming, and this is why it occurrs in many situations.

http://code.google.com/p/monad-tutorial/ is a work in progress to address exactly this question.

A monad is a container, but for data. A special container.
All containers can have openings and handles and spouts, but these containers are all guaranteed to have certain openings and handles and spouts.
Why? Because these guaranteed openings and handles and spouts are useful for picking up and linking together the containers in specific, common ways.
This allows you to pick up different containers without having to know much about them. It also allows different kinds of containers to link together easily.

Can anyone explain Monads? [duplicate]

Having briefly looked at Haskell recently, what would be a brief, succinct, practical explanation as to what a monad essentially is?
I have found most explanations I've come across to be fairly inaccessible and lacking in practical detail.

First: The term monad is a bit vacuous if you are not a mathematician. An alternative term is computation builder which is a bit more descriptive of what they are actually useful for.
They are a pattern for chaining operations. It looks a bit like method chaining in object-oriented languages, but the mechanism is slightly different.
The pattern is mostly used in functional languages (especially Haskell which uses monads pervasively) but can be used in any language which support higher-order functions (that is, functions which can take other functions as arguments).
Arrays in JavaScript support the pattern, so let’s use that as the first example.
The gist of the pattern is we have a type (Array in this case) which has a method which takes a function as argument. The operation supplied must return an instance of the same type (i.e. return an Array).
First an example of method chaining which does not use the monad pattern:
[1,2,3].map(x => x + 1)
The result is [2,3,4]. The code does not conform to the monad pattern, since the function we are supplying as an argument returns a number, not an Array. The same logic in monad form would be:
[1,2,3].flatMap(x => [x + 1])
Here we supply an operation which returns an Array, so now it conforms to the pattern. The flatMap method executes the provided function for every element in the array. It expects an array as result for each invocation (rather than single values), but merges the resulting set of arrays into a single array. So the end result is the same, the array [2,3,4].
(The function argument provided to a method like map or flatMap is often called a "callback" in JavaScript. I will call it the "operation" since it is more general.)
If we chain multiple operations (in the traditional way):
[1,2,3].map(a => a + 1).filter(b => b != 3)
Results in the array [2,4]
The same chaining in monad form:
[1,2,3].flatMap(a => [a + 1]).flatMap(b => b != 3 ? [b] : [])
Yields the same result, the array [2,4].
You will immediately notice that the monad form is quite a bit uglier than the non-monad! This just goes to show that monads are not necessarily “good”. They are a pattern which is sometimes beneficial and sometimes not.
Do note that the monad pattern can be combined in a different way:
[1,2,3].flatMap(a => [a + 1].flatMap(b => b != 3 ? [b] : []))
Here the binding is nested rather than chained, but the result is the same. This is an important property of monads as we will see later. It means two operations combined can be treated the same as a single operation.
The operation is allowed to return an array with different element types, for example transforming an array of numbers into an array of strings or something else; as long as it still an Array.
This can be described a bit more formally using Typescript notation. An array has the type Array<T>, where T is the type of the elements in the array. The method flatMap() takes a function argument of the type T => Array<U> and returns an Array<U>.
Generalized, a monad is any type Foo<Bar> which has a "bind" method which takes a function argument of type Bar => Foo<Baz> and returns a Foo<Baz>.
This answers what monads are. The rest of this answer will try to explain through examples why monads can be a useful pattern in a language like Haskell which has good support for them.
Haskell and Do-notation
To translate the map/filter example directly to Haskell, we replace flatMap with the >>= operator:
[1,2,3] >>= \a -> [a+1] >>= \b -> if b == 3 then [] else [b]
The >>= operator is the bind function in Haskell. It does the same as flatMap in JavaScript when the operand is a list, but it is overloaded with different meaning for other types.
But Haskell also has a dedicated syntax for monad expressions, the do-block, which hides the bind operator altogether:
do
a <- [1,2,3]
b <- [a+1]
if b == 3 then [] else [b]
This hides the "plumbing" and lets you focus on the actual operations applied at each step.
In a do-block, each line is an operation. The constraint still holds that all operations in the block must return the same type. Since the first expression is a list, the other operations must also return a list.
The back-arrow <- looks deceptively like an assignment, but note that this is the parameter passed in the bind. So, when the expression on the right side is a List of Integers, the variable on the left side will be a single Integer – but will be executed for each integer in the list.
Example: Safe navigation (the Maybe type)
Enough about lists, lets see how the monad pattern can be useful for other types.
Some functions may not always return a valid value. In Haskell this is represented by the Maybe-type, which is an option that is either Just value or Nothing.
Chaining operations which always return a valid value is of course straightforward:
streetName = getStreetName (getAddress (getUser 17))
But what if any of the functions could return Nothing? We need to check each result individually and only pass the value to the next function if it is not Nothing:
case getUser 17 of
Nothing -> Nothing
Just user ->
case getAddress user of
Nothing -> Nothing
Just address ->
getStreetName address
Quite a lot of repetitive checks! Imagine if the chain was longer. Haskell solves this with the monad pattern for Maybe:
do
user <- getUser 17
addr <- getAddress user
getStreetName addr
This do-block invokes the bind-function for the Maybe type (since the result of the first expression is a Maybe). The bind-function only executes the following operation if the value is Just value, otherwise it just passes the Nothing along.
Here the monad-pattern is used to avoid repetitive code. This is similar to how some other languages use macros to simplify syntax, although macros achieve the same goal in a very different way.
Note that it is the combination of the monad pattern and the monad-friendly syntax in Haskell which result in the cleaner code. In a language like JavaScript without any special syntax support for monads, I doubt the monad pattern would be able to simplify the code in this case.
Mutable state
Haskell does not support mutable state. All variables are constants and all values immutable. But the State type can be used to emulate programming with mutable state:
add2 :: State Integer Integer
add2 = do
-- add 1 to state
x <- get
put (x + 1)
-- increment in another way
modify (+1)
-- return state
get
evalState add2 7
=> 9
The add2 function builds a monad chain which is then evaluated with 7 as the initial state.
Obviously this is something which only makes sense in Haskell. Other languages support mutable state out of the box. Haskell is generally "opt-in" on language features - you enable mutable state when you need it, and the type system ensures the effect is explicit. IO is another example of this.
IO
The IO type is used for chaining and executing “impure” functions.
Like any other practical language, Haskell has a bunch of built-in functions which interface with the outside world: putStrLine, readLine and so on. These functions are called “impure” because they either cause side effects or have non-deterministic results. Even something simple like getting the time is considered impure because the result is non-deterministic – calling it twice with the same arguments may return different values.
A pure function is deterministic – its result depends purely on the arguments passed and it has no side effects on the environment beside returning a value.
Haskell heavily encourages the use of pure functions – this is a major selling point of the language. Unfortunately for purists, you need some impure functions to do anything useful. The Haskell compromise is to cleanly separate pure and impure, and guarantee that there is no way that pure functions can execute impure functions, directly or indirect.
This is guaranteed by giving all impure functions the IO type. The entry point in Haskell program is the main function which have the IO type, so we can execute impure functions at the top level.
But how does the language prevent pure functions from executing impure functions? This is due to the lazy nature of Haskell. A function is only executed if its output is consumed by some other function. But there is no way to consume an IO value except to assign it to main. So if a function wants to execute an impure function, it has to be connected to main and have the IO type.
Using monad chaining for IO operations also ensures that they are executed in a linear and predictable order, just like statements in an imperative language.
This brings us to the first program most people will write in Haskell:
main :: IO ()
main = do
putStrLn ”Hello World”
The do keyword is superfluous when there is only a single operation and therefore nothing to bind, but I keep it anyway for consistency.
The () type means “void”. This special return type is only useful for IO functions called for their side effect.
A longer example:
main = do
putStrLn "What is your name?"
name <- getLine
putStrLn ("hello" ++ name)
This builds a chain of IO operations, and since they are assigned to the main function, they get executed.
Comparing IO with Maybe shows the versatility of the monad pattern. For Maybe, the pattern is used to avoid repetitive code by moving conditional logic to the binding function. For IO, the pattern is used to ensure that all operations of the IO type are sequenced and that IO operations cannot "leak" to pure functions.
Summing up
In my subjective opinion, the monad pattern is only really worthwhile in a language which has some built-in support for the pattern. Otherwise it just leads to overly convoluted code. But Haskell (and some other languages) have some built-in support which hides the tedious parts, and then the pattern can be used for a variety of useful things. Like:
Avoiding repetitive code (Maybe)
Adding language features like mutable state or exceptions for delimited areas of the program.
Isolating icky stuff from nice stuff (IO)
Embedded domain-specific languages (Parser)
Adding GOTO to the language.

Explaining "what is a monad" is a bit like saying "what is a number?" We use numbers all the time. But imagine you met someone who didn't know anything about numbers. How the heck would you explain what numbers are? And how would you even begin to describe why that might be useful?
What is a monad? The short answer: It's a specific way of chaining operations together.
In essence, you're writing execution steps and linking them together with the "bind function". (In Haskell, it's named >>=.) You can write the calls to the bind operator yourself, or you can use syntax sugar which makes the compiler insert those function calls for you. But either way, each step is separated by a call to this bind function.
So the bind function is like a semicolon; it separates the steps in a process. The bind function's job is to take the output from the previous step, and feed it into the next step.
That doesn't sound too hard, right? But there is more than one kind of monad. Why? How?
Well, the bind function can just take the result from one step, and feed it to the next step. But if that's "all" the monad does... that actually isn't very useful. And that's important to understand: Every useful monad does something else in addition to just being a monad. Every useful monad has a "special power", which makes it unique.
(A monad that does nothing special is called the "identity monad". Rather like the identity function, this sounds like an utterly pointless thing, yet turns out not to be... But that's another story™.)
Basically, each monad has its own implementation of the bind function. And you can write a bind function such that it does hoopy things between execution steps. For example:
If each step returns a success/failure indicator, you can have bind execute the next step only if the previous one succeeded. In this way, a failing step aborts the whole sequence "automatically", without any conditional testing from you. (The Failure Monad.)
Extending this idea, you can implement "exceptions". (The Error Monad or Exception Monad.) Because you're defining them yourself rather than it being a language feature, you can define how they work. (E.g., maybe you want to ignore the first two exceptions and only abort when a third exception is thrown.)
You can make each step return multiple results, and have the bind function loop over them, feeding each one into the next step for you. In this way, you don't have to keep writing loops all over the place when dealing with multiple results. The bind function "automatically" does all that for you. (The List Monad.)
As well as passing a "result" from one step to another, you can have the bind function pass extra data around as well. This data now doesn't show up in your source code, but you can still access it from anywhere, without having to manually pass it to every function. (The Reader Monad.)
You can make it so that the "extra data" can be replaced. This allows you to simulate destructive updates, without actually doing destructive updates. (The State Monad and its cousin the Writer Monad.)
Because you're only simulating destructive updates, you can trivially do things that would be impossible with real destructive updates. For example, you can undo the last update, or revert to an older version.
You can make a monad where calculations can be paused, so you can pause your program, go in and tinker with internal state data, and then resume it.
You can implement "continuations" as a monad. This allows you to break people's minds!
All of this and more is possible with monads. Of course, all of this is also perfectly possible without monads too. It's just drastically easier using monads.

Actually, contrary to common understanding of Monads, they have nothing to do with state. Monads are simply a way to wrapping things and provide methods to do operations on the wrapped stuff without unwrapping it.
For example, you can create a type to wrap another one, in Haskell:
data Wrapped a = Wrap a
To wrap stuff we define
return :: a -> Wrapped a
return x = Wrap x
To perform operations without unwrapping, say you have a function f :: a -> b, then you can do this to lift that function to act on wrapped values:
fmap :: (a -> b) -> (Wrapped a -> Wrapped b)
fmap f (Wrap x) = Wrap (f x)
That's about all there is to understand. However, it turns out that there is a more general function to do this lifting, which is bind:
bind :: (a -> Wrapped b) -> (Wrapped a -> Wrapped b)
bind f (Wrap x) = f x
bind can do a bit more than fmap, but not vice versa. Actually, fmap can be defined only in terms of bind and return. So, when defining a monad.. you give its type (here it was Wrapped a) and then say how its return and bind operations work.
The cool thing is that this turns out to be such a general pattern that it pops up all over the place, encapsulating state in a pure way is only one of them.
For a good article on how monads can be used to introduce functional dependencies and thus control order of evaluation, like it is used in Haskell's IO monad, check out IO Inside.
As for understanding monads, don't worry too much about it. Read about them what you find interesting and don't worry if you don't understand right away. Then just diving in a language like Haskell is the way to go. Monads are one of these things where understanding trickles into your brain by practice, one day you just suddenly realize you understand them.

But, You could have invented Monads!
sigfpe says:
But all of these introduce monads as something esoteric in need of explanation. But what I want to argue is that they aren't esoteric at all. In fact, faced with various problems in functional programming you would have been led, inexorably, to certain solutions, all of which are examples of monads. In fact, I hope to get you to invent them now if you haven't already. It's then a small step to notice that all of these solutions are in fact the same solution in disguise. And after reading this, you might be in a better position to understand other documents on monads because you'll recognise everything you see as something you've already invented.
Many of the problems that monads try to solve are related to the issue of side effects. So we'll start with them. (Note that monads let you do more than handle side-effects, in particular many types of container object can be viewed as monads. Some of the introductions to monads find it hard to reconcile these two different uses of monads and concentrate on just one or the other.)
In an imperative programming language such as C++, functions behave nothing like the functions of mathematics. For example, suppose we have a C++ function that takes a single floating point argument and returns a floating point result. Superficially it might seem a little like a mathematical function mapping reals to reals, but a C++ function can do more than just return a number that depends on its arguments. It can read and write the values of global variables as well as writing output to the screen and receiving input from the user. In a pure functional language, however, a function can only read what is supplied to it in its arguments and the only way it can have an effect on the world is through the values it returns.

A monad is a datatype that has two operations: >>= (aka bind) and return (aka unit). return takes an arbitrary value and creates an instance of the monad with it. >>= takes an instance of the monad and maps a function over it. (You can see already that a monad is a strange kind of datatype, since in most programming languages you couldn't write a function that takes an arbitrary value and creates a type from it. Monads use a kind of parametric polymorphism.)
In Haskell notation, the monad interface is written
class Monad m where
return :: a -> m a
(>>=) :: forall a b . m a -> (a -> m b) -> m b
These operations are supposed to obey certain "laws", but that's not terrifically important: the "laws" just codify the way sensible implementations of the operations ought to behave (basically, that >>= and return ought to agree about how values get transformed into monad instances and that >>= is associative).
Monads are not just about state and I/O: they abstract a common pattern of computation that includes working with state, I/O, exceptions, and non-determinism. Probably the simplest monads to understand are lists and option types:
instance Monad [ ] where
[] >>= k = []
(x:xs) >>= k = k x ++ (xs >>= k)
return x = [x]
instance Monad Maybe where
Just x >>= k = k x
Nothing >>= k = Nothing
return x = Just x
where [] and : are the list constructors, ++ is the concatenation operator, and Just and Nothing are the Maybe constructors. Both of these monads encapsulate common and useful patterns of computation on their respective data types (note that neither has anything to do with side effects or I/O).
You really have to play around writing some non-trivial Haskell code to appreciate what monads are about and why they are useful.

You should first understand what a functor is. Before that, understand higher-order functions.
A higher-order function is simply a function that takes a function as an argument.
A functor is any type construction T for which there exists a higher-order function, call it map, that transforms a function of type a -> b (given any two types a and b) into a function T a -> T b. This map function must also obey the laws of identity and composition such that the following expressions return true for all p and q (Haskell notation):
map id = id
map (p . q) = map p . map q
For example, a type constructor called List is a functor if it comes equipped with a function of type (a -> b) -> List a -> List b which obeys the laws above. The only practical implementation is obvious. The resulting List a -> List b function iterates over the given list, calling the (a -> b) function for each element, and returns the list of the results.
A monad is essentially just a functor T with two extra methods, join, of type T (T a) -> T a, and unit (sometimes called return, fork, or pure) of type a -> T a. For lists in Haskell:
join :: [[a]] -> [a]
pure :: a -> [a]
Why is that useful? Because you could, for example, map over a list with a function that returns a list. Join takes the resulting list of lists and concatenates them. List is a monad because this is possible.
You can write a function that does map, then join. This function is called bind, or flatMap, or (>>=), or (=<<). This is normally how a monad instance is given in Haskell.
A monad has to satisfy certain laws, namely that join must be associative. This means that if you have a value x of type [[[a]]] then join (join x) should equal join (map join x). And pure must be an identity for join such that join (pure x) == x.

After much striving, I think I finally understand the monad. After rereading my own lengthy critique of the overwhelmingly top voted answer, I will offer this explanation.
There are three questions that need to be answered to understand monads:
Why do you need a monad?
What is a monad?
How is a monad implemented?
As I noted in my original comments, too many monad explanations get caught up in question number 3, without, and before really adequately covering question 2, or question 1.
Why do you need a monad?
Pure functional languages like Haskell are different from imperative languages like C, or Java in that, a pure functional program is not necessarily executed in a specific order, one step at a time. A Haskell program is more akin to a mathematical function, in which you may solve the "equation" in any number of potential orders. This confers a number of benefits, among which is that it eliminates the possibility of certain kinds of bugs, particularly those relating to things like "state".
However, there are certain problems that are not so straightforward to solve with this style of programming. Some things, like console programming, and file i/o, need things to happen in a particular order, or need to maintain state. One way to deal with this problem is to create a kind of object that represents the state of a computation, and a series of functions that take a state object as input, and return a new modified state object.
So let's create a hypothetical "state" value, that represents the state of a console screen. exactly how this value is constructed is not important, but let's say it's an array of byte length ascii characters that represents what is currently visible on the screen, and an array that represents the last line of input entered by the user, in pseudocode. We've defined some functions that take console state, modify it, and return a new console state.
consolestate MyConsole = new consolestate;
So to do console programming, but in a pure functional manner, you would need to nest a lot of function calls inside eachother.
consolestate FinalConsole = print(input(print(myconsole, "Hello, what's your name?")),"hello, %inputbuffer%!");
Programming in this way keeps the "pure" functional style, while forcing changes to the console to happen in a particular order. But, we'll probably want to do more than just a few operations at a time like in the above example. Nesting functions in that way will start to become ungainly. What we want, is code that does essentially the same thing as above, but is written a bit more like this:
consolestate FinalConsole = myconsole:
print("Hello, what's your name?"):
input():
print("hello, %inputbuffer%!");
This would indeed be a more convenient way to write it. How do we do that though?
What is a monad?
Once you have a type (such as consolestate) that you define along with a bunch of functions designed specifically to operate on that type, you can turn the whole package of these things into a "monad" by defining an operator like : (bind) that automatically feeds return values on its left, into function parameters on its right, and a lift operator that turns normal functions, into functions that work with that specific kind of bind operator.
How is a monad implemented?
See other answers, that seem quite free to jump into the details of that.

After giving an answer to this question a few years ago, I believe I can improve and simplify that response with...
A monad is a function composition technique that externalizes treatment for some input scenarios using a composing function, bind, to pre-process input during composition.
In normal composition, the function, compose (>>), is use to apply the composed function to the result of its predecessor in sequence. Importantly, the function being composed is required to handle all scenarios of its input.
(x -> y) >> (y -> z)
This design can be improved by restructuring the input so that relevant states are more easily interrogated. So, instead of simply y the value can become Mb such as, for instance, (is_OK, b) if y included a notion of validity.
For example, when the input is only possibly a number, instead of returning a string which can dutifully contain a number or not, you could restructure the type into a bool indicating the presence of a valid number and a number in tuple such as, bool * float. The composed functions would now no longer need to parse an input string to determine whether a number exists but could merely inspect the bool portion of a tuple.
(Ma -> Mb) >> (Mb -> Mc)
Here, again, composition occurs naturally with compose and so each function must handle all scenarios of its input individually, though doing so is now much easier.
However, what if we could externalize the effort of interrogation for those times where handling a scenario is routine. For example, what if our program does nothing when the input is not OK as in when is_OK is false. If that were done then composed functions would not need to handle that scenario themselves, dramatically simplifying their code and effecting another level of reuse.
To achieve this externalization we could use a function, bind (>>=), to perform the composition instead of compose. As such, instead of simply transferring values from the output of one function to the input of another Bind would inspect the M portion of Ma and decide whether and how to apply the composed function to the a. Of course, the function bind would be defined specifically for our particular M so as to be able to inspect its structure and perform whatever type of application we want. Nonetheless, the a can be anything since bind merely passes the a uninspected to the the composed function when it determines application necessary. Additionally, the composed functions themselves no longer need to deal with the M portion of the input structure either, simplifying them. Hence...
(a -> Mb) >>= (b -> Mc) or more succinctly Mb >>= (b -> Mc)
In short, a monad externalizes and thereby provides standard behaviour around the treatment of certain input scenarios once the input becomes designed to sufficiently expose them. This design is a shell and content model where the shell contains data relevant to the application of the composed function and is interrogated by and remains only available to the bind function.
Therefore, a monad is three things:
an M shell for holding monad relevant information,
a bind function implemented to make use of this shell information in its application of the composed functions to the content value(s) it finds within the shell, and
composable functions of the form, a -> Mb, producing results that include monadic management data.
Generally speaking, the input to a function is far more restrictive than its output which may include such things as error conditions; hence, the Mb result structure is generally very useful. For instance, the division operator does not return a number when the divisor is 0.
Additionally, monads may include wrap functions that wrap values, a, into the monadic type, Ma, and general functions, a -> b, into monadic functions, a -> Mb, by wrapping their results after application. Of course, like bind, such wrap functions are specific to M. An example:
let return a = [a]
let lift f a = return (f a)
The design of the bind function presumes immutable data structures and pure functions others things get complex and guarantees cannot be made. As such, there are monadic laws:
Given...
M_
return = (a -> Ma)
f = (a -> Mb)
g = (b -> Mc)
Then...
Left Identity : (return a) >>= f === f a
Right Identity : Ma >>= return === Ma
Associative : Ma >>= (f >>= g) === Ma >>= ((fun x -> f x) >>= g)
Associativity means that bind preserves the order of evaluation regardless of when bind is applied. That is, in the definition of Associativity above, the force early evaluation of the parenthesized binding of f and g will only result in a function that expects Ma in order to complete the bind. Hence the evaluation of Ma must be determined before its value can become applied to f and that result in turn applied to g.

My favorite Monad tutorial:
http://www.haskell.org/haskellwiki/All_About_Monads
(out of 170,000 hits on a Google search for "monad tutorial"!)
#Stu: The point of monads is to allow you to add (usually) sequential semantics to otherwise pure code; you can even compose monads (using Monad Transformers) and get more interesting and complicated combined semantics, like parsing with error handling, shared state, and logging, for example. All of this is possible in pure code, monads just allow you to abstract it away and reuse it in modular libraries (always good in programming), as well as providing convenient syntax to make it look imperative.
Haskell already has operator overloading[1]: it uses type classes much the way one might use interfaces in Java or C# but Haskell just happens to also allow non-alphanumeric tokens like + && and > as infix identifiers. It's only operator overloading in your way of looking at it if you mean "overloading the semicolon" [2]. It sounds like black magic and asking for trouble to "overload the semicolon" (picture enterprising Perl hackers getting wind of this idea) but the point is that without monads there is no semicolon, since purely functional code does not require or allow explicit sequencing.
This all sounds much more complicated than it needs to. sigfpe's article is pretty cool but uses Haskell to explain it, which sort of fails to break the chicken and egg problem of understanding Haskell to grok Monads and understanding Monads to grok Haskell.
[1] This is a separate issue from monads but monads use Haskell's operator overloading feature.
[2] This is also an oversimplification since the operator for chaining monadic actions is >>= (pronounced "bind") but there is syntactic sugar ("do") that lets you use braces and semicolons and/or indentation and newlines.

I am still new to monads, but I thought I would share a link I found that felt really good to read (WITH PICTURES!!):
http://www.matusiak.eu/numerodix/blog/2012/3/11/monads-for-the-layman/
(no affiliation)
Basically, the warm and fuzzy concept that I got from the article was the concept that monads are basically adapters that allow disparate functions to work in a composable fashion, i.e. be able to string up multiple functions and mix and match them without worrying about inconsistent return types and such. So the BIND function is in charge of keeping apples with apples and oranges with oranges when we're trying to make these adapters. And the LIFT function is in charge of taking "lower level" functions and "upgrading" them to work with BIND functions and be composable as well.
I hope I got it right, and more importantly, hope that the article has a valid view on monads. If nothing else, this article helped whet my appetite for learning more about monads.

I've been thinking of Monads in a different way, lately. I've been thinking of them as abstracting out execution order in a mathematical way, which makes new kinds of polymorphism possible.
If you're using an imperative language, and you write some expressions in order, the code ALWAYS runs exactly in that order.
And in the simple case, when you use a monad, it feels the same -- you define a list of expressions that happen in order. Except that, depending on which monad you use, your code might run in order (like in IO monad), in parallel over several items at once (like in the List monad), it might halt partway through (like in the Maybe monad), it might pause partway through to be resumed later (like in a Resumption monad), it might rewind and start from the beginning (like in a Transaction monad), or it might rewind partway to try other options (like in a Logic monad).
And because monads are polymorphic, it's possible to run the same code in different monads, depending on your needs.
Plus, in some cases, it's possible to combine monads together (with monad transformers) to get multiple features at the same time.

tl;dr
{-# LANGUAGE InstanceSigs #-}
newtype Id t = Id t
instance Monad Id where
return :: t -> Id t
return = Id
(=<<) :: (a -> Id b) -> Id a -> Id b
f =<< (Id x) = f x
Prologue
The application operator $ of functions
forall a b. a -> b
is canonically defined
($) :: (a -> b) -> a -> b
f $ x = f x
infixr 0 $
in terms of Haskell-primitive function application f x (infixl 10).
Composition . is defined in terms of $ as
(.) :: (b -> c) -> (a -> b) -> (a -> c)
f . g = \ x -> f $ g x
infixr 9 .
and satisfies the equivalences forall f g h.
f . id = f :: c -> d Right identity
id . g = g :: b -> c Left identity
(f . g) . h = f . (g . h) :: a -> d Associativity
. is associative, and id is its right and left identity.
The Kleisli triple
In programming, a monad is a functor type constructor with an instance of the monad type class. There are several equivalent variants of definition and implementation, each carrying slightly different intuitions about the monad abstraction.
A functor is a type constructor f of kind * -> * with an instance of the functor type class.
{-# LANGUAGE KindSignatures #-}
class Functor (f :: * -> *) where
map :: (a -> b) -> (f a -> f b)
In addition to following statically enforced type protocol, instances of the functor type class must obey the algebraic functor laws forall f g.
map id = id :: f t -> f t Identity
map f . map g = map (f . g) :: f a -> f c Composition / short cut fusion
Functor computations have the type
forall f t. Functor f => f t
A computation c r consists in results r within context c.
Unary monadic functions or Kleisli arrows have the type
forall m a b. Functor m => a -> m b
Kleisi arrows are functions that take one argument a and return a monadic computation m b.
Monads are canonically defined in terms of the Kleisli triple forall m. Functor m =>
(m, return, (=<<))
implemented as the type class
class Functor m => Monad m where
return :: t -> m t
(=<<) :: (a -> m b) -> m a -> m b
infixr 1 =<<
The Kleisli identity return is a Kleisli arrow that promotes a value t into monadic context m. Extension or Kleisli application =<< applies a Kleisli arrow a -> m b to results of a computation m a.
Kleisli composition <=< is defined in terms of extension as
(<=<) :: Monad m => (b -> m c) -> (a -> m b) -> (a -> m c)
f <=< g = \ x -> f =<< g x
infixr 1 <=<
<=< composes two Kleisli arrows, applying the left arrow to results of the right arrow’s application.
Instances of the monad type class must obey the monad laws, most elegantly stated in terms of Kleisli composition: forall f g h.
f <=< return = f :: c -> m d Right identity
return <=< g = g :: b -> m c Left identity
(f <=< g) <=< h = f <=< (g <=< h) :: a -> m d Associativity
<=< is associative, and return is its right and left identity.
Identity
The identity type
type Id t = t
is the identity function on types
Id :: * -> *
Interpreted as a functor,
return :: t -> Id t
= id :: t -> t
(=<<) :: (a -> Id b) -> Id a -> Id b
= ($) :: (a -> b) -> a -> b
(<=<) :: (b -> Id c) -> (a -> Id b) -> (a -> Id c)
= (.) :: (b -> c) -> (a -> b) -> (a -> c)
In canonical Haskell, the identity monad is defined
newtype Id t = Id t
instance Functor Id where
map :: (a -> b) -> Id a -> Id b
map f (Id x) = Id (f x)
instance Monad Id where
return :: t -> Id t
return = Id
(=<<) :: (a -> Id b) -> Id a -> Id b
f =<< (Id x) = f x
Option
An option type
data Maybe t = Nothing | Just t
encodes computation Maybe t that not necessarily yields a result t, computation that may “fail”. The option monad is defined
instance Functor Maybe where
map :: (a -> b) -> (Maybe a -> Maybe b)
map f (Just x) = Just (f x)
map _ Nothing = Nothing
instance Monad Maybe where
return :: t -> Maybe t
return = Just
(=<<) :: (a -> Maybe b) -> Maybe a -> Maybe b
f =<< (Just x) = f x
_ =<< Nothing = Nothing
a -> Maybe b is applied to a result only if Maybe a yields a result.
newtype Nat = Nat Int
The natural numbers can be encoded as those integers greater than or equal to zero.
toNat :: Int -> Maybe Nat
toNat i | i >= 0 = Just (Nat i)
| otherwise = Nothing
The natural numbers are not closed under subtraction.
(-?) :: Nat -> Nat -> Maybe Nat
(Nat n) -? (Nat m) = toNat (n - m)
infixl 6 -?
The option monad covers a basic form of exception handling.
(-? 20) <=< toNat :: Int -> Maybe Nat
List
The list monad, over the list type
data [] t = [] | t : [t]
infixr 5 :
and its additive monoid operation “append”
(++) :: [t] -> [t] -> [t]
(x : xs) ++ ys = x : xs ++ ys
[] ++ ys = ys
infixr 5 ++
encodes nonlinear computation [t] yielding a natural amount 0, 1, ... of results t.
instance Functor [] where
map :: (a -> b) -> ([a] -> [b])
map f (x : xs) = f x : map f xs
map _ [] = []
instance Monad [] where
return :: t -> [t]
return = (: [])
(=<<) :: (a -> [b]) -> [a] -> [b]
f =<< (x : xs) = f x ++ (f =<< xs)
_ =<< [] = []
Extension =<< concatenates ++ all lists [b] resulting from applications f x of a Kleisli arrow a -> [b] to elements of [a] into a single result list [b].
Let the proper divisors of a positive integer n be
divisors :: Integral t => t -> [t]
divisors n = filter (`divides` n) [2 .. n - 1]
divides :: Integral t => t -> t -> Bool
(`divides` n) = (== 0) . (n `rem`)
then
forall n. let { f = f <=< divisors } in f n = []
In defining the monad type class, instead of extension =<<, the Haskell standard uses its flip, the bind operator >>=.
class Applicative m => Monad m where
(>>=) :: forall a b. m a -> (a -> m b) -> m b
(>>) :: forall a b. m a -> m b -> m b
m >> k = m >>= \ _ -> k
{-# INLINE (>>) #-}
return :: a -> m a
return = pure
For simplicity's sake, this explanation uses the type class hierarchy
class Functor f
class Functor m => Monad m
In Haskell, the current standard hierarchy is
class Functor f
class Functor p => Applicative p
class Applicative m => Monad m
because not only is every monad a functor, but every applicative is a functor and every monad is an applicative, too.
Using the list monad, the imperative pseudocode
for a in (1, ..., 10)
for b in (1, ..., 10)
p <- a * b
if even(p)
yield p
roughly translates to the do block,
do a <- [1 .. 10]
b <- [1 .. 10]
let p = a * b
guard (even p)
return p
the equivalent monad comprehension,
[ p | a <- [1 .. 10], b <- [1 .. 10], let p = a * b, even p ]
and the expression
[1 .. 10] >>= (\ a ->
[1 .. 10] >>= (\ b ->
let p = a * b in
guard (even p) >> -- [ () | even p ] >>
return p
)
)
Do notation and monad comprehensions are syntactic sugar for nested bind expressions. The bind operator is used for local name binding of monadic results.
let x = v in e = (\ x -> e) $ v = v & (\ x -> e)
do { r <- m; c } = (\ r -> c) =<< m = m >>= (\ r -> c)
where
(&) :: a -> (a -> b) -> b
(&) = flip ($)
infixl 0 &
The guard function is defined
guard :: Additive m => Bool -> m ()
guard True = return ()
guard False = fail
where the unit type or “empty tuple”
data () = ()
Additive monads that support choice and failure can be abstracted over using a type class
class Monad m => Additive m where
fail :: m t
(<|>) :: m t -> m t -> m t
infixl 3 <|>
instance Additive Maybe where
fail = Nothing
Nothing <|> m = m
m <|> _ = m
instance Additive [] where
fail = []
(<|>) = (++)
where fail and <|> form a monoid forall k l m.
k <|> fail = k
fail <|> l = l
(k <|> l) <|> m = k <|> (l <|> m)
and fail is the absorbing/annihilating zero element of additive monads
_ =<< fail = fail
If in
guard (even p) >> return p
even p is true, then the guard produces [()], and, by the definition of >>, the local constant function
\ _ -> return p
is applied to the result (). If false, then the guard produces the list monad’s fail ( [] ), which yields no result for a Kleisli arrow to be applied >> to, so this p is skipped over.
State
Infamously, monads are used to encode stateful computation.
A state processor is a function
forall st t. st -> (t, st)
that transitions a state st and yields a result t. The state st can be anything. Nothing, flag, count, array, handle, machine, world.
The type of state processors is usually called
type State st t = st -> (t, st)
The state processor monad is the kinded * -> * functor State st. Kleisli arrows of the state processor monad are functions
forall st a b. a -> (State st) b
In canonical Haskell, the lazy version of the state processor monad is defined
newtype State st t = State { stateProc :: st -> (t, st) }
instance Functor (State st) where
map :: (a -> b) -> ((State st) a -> (State st) b)
map f (State p) = State $ \ s0 -> let (x, s1) = p s0
in (f x, s1)
instance Monad (State st) where
return :: t -> (State st) t
return x = State $ \ s -> (x, s)
(=<<) :: (a -> (State st) b) -> (State st) a -> (State st) b
f =<< (State p) = State $ \ s0 -> let (x, s1) = p s0
in stateProc (f x) s1
A state processor is run by supplying an initial state:
run :: State st t -> st -> (t, st)
run = stateProc
eval :: State st t -> st -> t
eval = fst . run
exec :: State st t -> st -> st
exec = snd . run
State access is provided by primitives get and put, methods of abstraction over stateful monads:
{-# LANGUAGE MultiParamTypeClasses, FunctionalDependencies #-}
class Monad m => Stateful m st | m -> st where
get :: m st
put :: st -> m ()
m -> st declares a functional dependency of the state type st on the monad m; that a State t, for example, will determine the state type to be t uniquely.
instance Stateful (State st) st where
get :: State st st
get = State $ \ s -> (s, s)
put :: st -> State st ()
put s = State $ \ _ -> ((), s)
with the unit type used analogously to void in C.
modify :: Stateful m st => (st -> st) -> m ()
modify f = do
s <- get
put (f s)
gets :: Stateful m st => (st -> t) -> m t
gets f = do
s <- get
return (f s)
gets is often used with record field accessors.
The state monad equivalent of the variable threading
let s0 = 34
s1 = (+ 1) s0
n = (* 12) s1
s2 = (+ 7) s1
in (show n, s2)
where s0 :: Int, is the equally referentially transparent, but infinitely more elegant and practical
(flip run) 34
(do
modify (+ 1)
n <- gets (* 12)
modify (+ 7)
return (show n)
)
modify (+ 1) is a computation of type State Int (), except for its effect equivalent to return ().
(flip run) 34
(modify (+ 1) >>
gets (* 12) >>= (\ n ->
modify (+ 7) >>
return (show n)
)
)
The monad law of associativity can be written in terms of >>= forall m f g.
(m >>= f) >>= g = m >>= (\ x -> f x >>= g)
or
do { do { do {
r1 <- do { x <- m; r0 <- m;
r0 <- m; = do { = r1 <- f r0;
f r0 r1 <- f x; g r1
}; g r1 }
g r1 }
} }
Like in expression-oriented programming (e.g. Rust), the last statement of a block represents its yield. The bind operator is sometimes called a “programmable semicolon”.
Iteration control structure primitives from structured imperative programming are emulated monadically
for :: Monad m => (a -> m b) -> [a] -> m ()
for f = foldr ((>>) . f) (return ())
while :: Monad m => m Bool -> m t -> m ()
while c m = do
b <- c
if b then m >> while c m
else return ()
forever :: Monad m => m t
forever m = m >> forever m
Input/Output
data World
The I/O world state processor monad is a reconciliation of pure Haskell and the real world, of functional denotative and imperative operational semantics. A close analogue of the actual strict implementation:
type IO t = World -> (t, World)
Interaction is facilitated by impure primitives
getChar :: IO Char
putChar :: Char -> IO ()
readFile :: FilePath -> IO String
writeFile :: FilePath -> String -> IO ()
hSetBuffering :: Handle -> BufferMode -> IO ()
hTell :: Handle -> IO Integer
. . . . . .
The impurity of code that uses IO primitives is permanently protocolized by the type system. Because purity is awesome, what happens in IO, stays in IO.
unsafePerformIO :: IO t -> t
Or, at least, should.
The type signature of a Haskell program
main :: IO ()
main = putStrLn "Hello, World!"
expands to
World -> ((), World)
A function that transforms a world.
Epilogue
The category whiches objects are Haskell types and whiches morphisms are functions between Haskell types is, “fast and loose”, the category Hask.
A functor T is a mapping from a category C to a category D; for each object in C an object in D
Tobj : Obj(C) -> Obj(D)
f :: * -> *
and for each morphism in C a morphism in D
Tmor : HomC(X, Y) -> HomD(Tobj(X), Tobj(Y))
map :: (a -> b) -> (f a -> f b)
where X, Y are objects in C. HomC(X, Y) is the homomorphism class of all morphisms X -> Y in C. The functor must preserve morphism identity and composition, the “structure” of C, in D.
Tmor Tobj
T(id) = id : T(X) -> T(X) Identity
T(f) . T(g) = T(f . g) : T(X) -> T(Z) Composition
The Kleisli category of a category C is given by a Kleisli triple
<T, eta, _*>
of an endofunctor
T : C -> C
(f), an identity morphism eta (return), and an extension operator * (=<<).
Each Kleisli morphism in Hask
f : X -> T(Y)
f :: a -> m b
by the extension operator
(_)* : Hom(X, T(Y)) -> Hom(T(X), T(Y))
(=<<) :: (a -> m b) -> (m a -> m b)
is given a morphism in Hask’s Kleisli category
f* : T(X) -> T(Y)
(f =<<) :: m a -> m b
Composition in the Kleisli category .T is given in terms of extension
f .T g = f* . g : X -> T(Z)
f <=< g = (f =<<) . g :: a -> m c
and satisfies the category axioms
eta .T g = g : Y -> T(Z) Left identity
return <=< g = g :: b -> m c
f .T eta = f : Z -> T(U) Right identity
f <=< return = f :: c -> m d
(f .T g) .T h = f .T (g .T h) : X -> T(U) Associativity
(f <=< g) <=< h = f <=< (g <=< h) :: a -> m d
which, applying the equivalence transformations
eta .T g = g
eta* . g = g By definition of .T
eta* . g = id . g forall f. id . f = f
eta* = id forall f g h. f . h = g . h ==> f = g
(f .T g) .T h = f .T (g .T h)
(f* . g)* . h = f* . (g* . h) By definition of .T
(f* . g)* . h = f* . g* . h . is associative
(f* . g)* = f* . g* forall f g h. f . h = g . h ==> f = g
in terms of extension are canonically given
eta* = id : T(X) -> T(X) Left identity
(return =<<) = id :: m t -> m t
f* . eta = f : Z -> T(U) Right identity
(f =<<) . return = f :: c -> m d
(f* . g)* = f* . g* : T(X) -> T(Z) Associativity
(((f =<<) . g) =<<) = (f =<<) . (g =<<) :: m a -> m c
Monads can also be defined in terms not of Kleislian extension, but a natural transformation mu, in programming called join. A monad is defined in terms of mu as a triple over a category C, of an endofunctor
T : C -> C
f :: * -> *
and two natural tranformations
eta : Id -> T
return :: t -> f t
mu : T . T -> T
join :: f (f t) -> f t
satisfying the equivalences
mu . T(mu) = mu . mu : T . T . T -> T . T Associativity
join . map join = join . join :: f (f (f t)) -> f t
mu . T(eta) = mu . eta = id : T -> T Identity
join . map return = join . return = id :: f t -> f t
The monad type class is then defined
class Functor m => Monad m where
return :: t -> m t
join :: m (m t) -> m t
The canonical mu implementation of the option monad:
instance Monad Maybe where
return = Just
join (Just m) = m
join Nothing = Nothing
The concat function
concat :: [[a]] -> [a]
concat (x : xs) = x ++ concat xs
concat [] = []
is the join of the list monad.
instance Monad [] where
return :: t -> [t]
return = (: [])
(=<<) :: (a -> [b]) -> ([a] -> [b])
(f =<<) = concat . map f
Implementations of join can be translated from extension form using the equivalence
mu = id* : T . T -> T
join = (id =<<) :: m (m t) -> m t
The reverse translation from mu to extension form is given by
f* = mu . T(f) : T(X) -> T(Y)
(f =<<) = join . map f :: m a -> m b
Philip Wadler: Monads for functional programming
Simon L Peyton Jones, Philip Wadler: Imperative functional programming
Jonathan M. D. Hill, Keith Clarke: An introduction to category theory, category theory monads, and their relationship to functional programming
´
Kleisli category
Eugenio Moggi: Notions of computation and monads
What a monad is not
But why should a theory so abstract be of any use for programming?
The answer is simple: as computer scientists, we value abstraction! When we design the interface to a software component, we want it to reveal as little as possible about the implementation. We want to be able to replace the implementation with many alternatives, many other ‘instances’ of the same ‘concept’. When we design a generic interface to many program libraries, it is even more important that the interface we choose have a variety of implementations. It is the generality of the monad concept which we value so highly, it is because category theory is so abstract that its concepts are so useful for programming.
It is hardly suprising, then, that the generalisation of monads that we present below also has a close connection to category theory. But we stress that our purpose is very practical: it is not to ‘implement category theory’, it is to find a more general way to structure combinator libraries. It is simply our good fortune that mathematicians have already done much of the work for us!
from Generalising Monads to Arrows by John Hughes

Monads Are Not Metaphors, but a practically useful abstraction emerging from a common pattern, as Daniel Spiewak explains.

In addition to the excellent answers above, let me offer you a link to the following article (by Patrick Thomson) which explains monads by relating the concept to the JavaScript library jQuery (and its way of using "method chaining" to manipulate the DOM):
jQuery is a Monad
The jQuery documentation itself doesn't refer to the term "monad" but talks about the "builder pattern" which is probably more familiar. This doesn't change the fact that you have a proper monad there maybe without even realizing it.

A monad is a way of combining computations together that share a common context. It is like building a network of pipes. When constructing the network, there is no data flowing through it. But when I have finished piecing all the bits together with 'bind' and 'return' then I invoke something like runMyMonad monad data and the data flows through the pipes.

In practice, monad is a custom implementation of function composition operator that takes care of side effects and incompatible input and return values (for chaining).

The two things that helped me best when learning about there were:
Chapter 8, "Functional Parsers," from Graham Hutton's book Programming in Haskell. This doesn't mention monads at all, actually, but if you can work through chapter and really understand everything in it, particularly how a sequence of bind operations is evaluated, you'll understand the internals of monads. Expect this to take several tries.
The tutorial All About Monads. This gives several good examples of their use, and I have to say that the analogy in Appendex I worked for me.

Monoid appears to be something that ensures that all operations defined on a Monoid and a supported type will always return a supported type inside the Monoid. Eg, Any number + Any number = A number, no errors.
Whereas division accepts two fractionals, and returns a fractional, which defined division by zero as Infinity in haskell somewhy(which happens to be a fractional somewhy)...
In any case, it appears Monads are just a way to ensure that your chain of operations behaves in a predictable way, and a function that claims to be Num -> Num, composed with another function of Num->Num called with x does not say, fire the missiles.
On the other hand, if we have a function which does fire the missiles, we can compose it with other functions which also fire the missiles, because our intent is clear -- we want to fire the missiles -- but it won't try printing "Hello World" for some odd reason.
In Haskell, main is of type IO (), or IO [()], the distiction is strange and I will not discuss it but here's what I think happens:
If I have main, I want it to do a chain of actions, the reason I run the program is to produce an effect -- usually though IO. Thus I can chain IO operations together in main in order to -- do IO, nothing else.
If I try to do something which does not "return IO", the program will complain that the chain does not flow, or basically "How does this relate to what we are trying to do -- an IO action", it appears to force the programmer to keep their train of thought, without straying off and thinking about firing the missiles, while creating algorithms for sorting -- which does not flow.
Basically, Monads appear to be a tip to the compiler that "hey, you know this function that returns a number here, it doesn't actually always work, it can sometimes produce a Number, and sometimes Nothing at all, just keep this in mind". Knowing this, if you try to assert a monadic action, the monadic action may act as a compile time exception saying "hey, this isn't actually a number, this CAN be a number, but you can't assume this, do something to ensure that the flow is acceptable." which prevents unpredictable program behavior -- to a fair extent.
It appears monads are not about purity, nor control, but about maintaining an identity of a category on which all behavior is predictable and defined, or does not compile. You cannot do nothing when you are expected to do something, and you cannot do something if you are expected to do nothing (visible).
The biggest reason I could think of for Monads is -- go look at Procedural/OOP code, and you will notice that you do not know where the program starts, nor ends, all you see is a lot of jumping and a lot of math,magic,and missiles. You will not be able to maintain it, and if you can, you will spend quite a lot of time wrapping your mind around the whole program before you can understand any part of it, because modularity in this context is based on interdependant "sections" of code, where code is optimized to be as related as possible for promise of efficiency/inter-relation. Monads are very concrete, and well defined by definition, and ensure that the flow of program is possible to analyze, and isolate parts which are hard to analyze -- as they themselves are monads. A monad appears to be a "comprehensible unit which is predictable upon its full understanding" -- If you understand "Maybe" monad, there's no possible way it will do anything except be "Maybe", which appears trivial, but in most non monadic code, a simple function "helloworld" can fire the missiles, do nothing, or destroy the universe or even distort time -- we have no idea nor have any guarantees that IT IS WHAT IT IS. A monad GUARANTEES that IT IS WHAT IT IS. which is very powerful.
All things in "real world" appear to be monads, in the sense that it is bound by definite observable laws preventing confusion. This does not mean we have to mimic all the operations of this object to create classes, instead we can simply say "a square is a square", nothing but a square, not even a rectangle nor a circle, and "a square has area of the length of one of it's existing dimensions multiplied by itself. No matter what square you have, if it's a square in 2D space, it's area absolutely cannot be anything but its length squared, it's almost trivial to prove. This is very powerful because we do not need to make assertions to make sure that our world is the way it is, we just use implications of reality to prevent our programs from falling off track.
Im pretty much guaranteed to be wrong but I think this could help somebody out there, so hopefully it helps somebody.

In the context of Scala you will find the following to be the simplest definition. Basically flatMap (or bind) is 'associative' and there exists an identity.
trait M[+A] {
def flatMap[B](f: A => M[B]): M[B] // AKA bind
// Pseudo Meta Code
def isValidMonad: Boolean = {
// for every parameter the following holds
def isAssociativeOn[X, Y, Z](x: M[X], f: X => M[Y], g: Y => M[Z]): Boolean =
x.flatMap(f).flatMap(g) == x.flatMap(f(_).flatMap(g))
// for every parameter X and x, there exists an id
// such that the following holds
def isAnIdentity[X](x: M[X], id: X => M[X]): Boolean =
x.flatMap(id) == x
}
}
E.g.
// These could be any functions
val f: Int => Option[String] = number => if (number == 7) Some("hello") else None
val g: String => Option[Double] = string => Some(3.14)
// Observe these are identical. Since Option is a Monad
// they will always be identical no matter what the functions are
scala> Some(7).flatMap(f).flatMap(g)
res211: Option[Double] = Some(3.14)
scala> Some(7).flatMap(f(_).flatMap(g))
res212: Option[Double] = Some(3.14)
// As Option is a Monad, there exists an identity:
val id: Int => Option[Int] = x => Some(x)
// Observe these are identical
scala> Some(7).flatMap(id)
res213: Option[Int] = Some(7)
scala> Some(7)
res214: Some[Int] = Some(7)
NOTE Strictly speaking the definition of a Monad in functional programming is not the same as the definition of a Monad in Category Theory, which is defined in turns of map and flatten. Though they are kind of equivalent under certain mappings. This presentations is very good: http://www.slideshare.net/samthemonad/monad-presentation-scala-as-a-category

This answer begins with a motivating example, works through the example, derives an example of a monad, and formally defines "monad".
Consider these three functions in pseudocode:
f(<x, messages>) := <x, messages "called f. ">
g(<x, messages>) := <x, messages "called g. ">
wrap(x) := <x, "">
f takes an ordered pair of the form <x, messages> and returns an ordered pair. It leaves the first item untouched and appends "called f. " to the second item. Same with g.
You can compose these functions and get your original value, along with a string that shows which order the functions were called in:
f(g(wrap(x)))
= f(g(<x, "">))
= f(<x, "called g. ">)
= <x, "called g. called f. ">
You dislike the fact that f and g are responsible for appending their own log messages to the previous logging information. (Just imagine for the sake of argument that instead of appending strings, f and g must perform complicated logic on the second item of the pair. It would be a pain to repeat that complicated logic in two -- or more -- different functions.)
You prefer to write simpler functions:
f(x) := <x, "called f. ">
g(x) := <x, "called g. ">
wrap(x) := <x, "">
But look at what happens when you compose them:
f(g(wrap(x)))
= f(g(<x, "">))
= f(<<x, "">, "called g. ">)
= <<<x, "">, "called g. ">, "called f. ">
The problem is that passing a pair into a function does not give you what you want. But what if you could feed a pair into a function:
feed(f, feed(g, wrap(x)))
= feed(f, feed(g, <x, "">))
= feed(f, <x, "called g. ">)
= <x, "called g. called f. ">
Read feed(f, m) as "feed m into f". To feed a pair <x, messages> into a function f is to pass x into f, get <y, message> out of f, and return <y, messages message>.
feed(f, <x, messages>) := let <y, message> = f(x)
in <y, messages message>
Notice what happens when you do three things with your functions:
First: if you wrap a value and then feed the resulting pair into a function:
feed(f, wrap(x))
= feed(f, <x, "">)
= let <y, message> = f(x)
in <y, "" message>
= let <y, message> = <x, "called f. ">
in <y, "" message>
= <x, "" "called f. ">
= <x, "called f. ">
= f(x)
That is the same as passing the value into the function.
Second: if you feed a pair into wrap:
feed(wrap, <x, messages>)
= let <y, message> = wrap(x)
in <y, messages message>
= let <y, message> = <x, "">
in <y, messages message>
= <x, messages "">
= <x, messages>
That does not change the pair.
Third: if you define a function that takes x and feeds g(x) into f:
h(x) := feed(f, g(x))
and feed a pair into it:
feed(h, <x, messages>)
= let <y, message> = h(x)
in <y, messages message>
= let <y, message> = feed(f, g(x))
in <y, messages message>
= let <y, message> = feed(f, <x, "called g. ">)
in <y, messages message>
= let <y, message> = let <z, msg> = f(x)
in <z, "called g. " msg>
in <y, messages message>
= let <y, message> = let <z, msg> = <x, "called f. ">
in <z, "called g. " msg>
in <y, messages message>
= let <y, message> = <x, "called g. " "called f. ">
in <y, messages message>
= <x, messages "called g. " "called f. ">
= feed(f, <x, messages "called g. ">)
= feed(f, feed(g, <x, messages>))
That is the same as feeding the pair into g and feeding the resulting pair into f.
You have most of a monad. Now you just need to know about the data types in your program.
What type of value is <x, "called f. ">? Well, that depends on what type of value x is. If x is of type t, then your pair is a value of type "pair of t and string". Call that type M t.
M is a type constructor: M alone does not refer to a type, but M _ refers to a type once you fill in the blank with a type. An M int is a pair of an int and a string. An M string is a pair of a string and a string. Etc.
Congratulations, you have created a monad!
Formally, your monad is the tuple <M, feed, wrap>.
A monad is a tuple <M, feed, wrap> where:
M is a type constructor.
feed takes a (function that takes a t and returns an M u) and an M t and returns an M u.
wrap takes a v and returns an M v.
t, u, and v are any three types that may or may not be the same. A monad satisfies the three properties you proved for your specific monad:
Feeding a wrapped t into a function is the same as passing the unwrapped t into the function.
Formally: feed(f, wrap(x)) = f(x)
Feeding an M t into wrap does nothing to the M t.
Formally: feed(wrap, m) = m
Feeding an M t (call it m) into a function that
passes the t into g
gets an M u (call it n) from g
feeds n into f
is the same as
feeding m into g
getting n from g
feeding n into f
Formally: feed(h, m) = feed(f, feed(g, m)) where h(x) := feed(f, g(x))
Typically, feed is called bind (AKA >>= in Haskell) and wrap is called return.

I will try to explain Monad in the context of Haskell.
In functional programming, function composition is important. It allows our program to consist of small, easy-to-read functions.
Let's say we have two functions: g :: Int -> String and f :: String -> Bool.
We can do (f . g) x, which is just the same as f (g x), where x is an Int value.
When doing composition/applying the result of one function to another, having the types match up is important. In the above case, the type of the result returned by g must be the same as the type accepted by f.
But sometimes values are in contexts, and this makes it a bit less easy to line up types. (Having values in contexts is very useful. For example, the Maybe Int type represents an Int value that may not be there, the IO String type represents a String value that is there as a result of performing some side effects.)
Let's say we now have g1 :: Int -> Maybe String and f1 :: String -> Maybe Bool. g1 and f1 are very similar to g and f respectively.
We can't do (f1 . g1) x or f1 (g1 x), where x is an Int value. The type of the result returned by g1 is not what f1 expects.
We could compose f and g with the . operator, but now we can't compose f1 and g1 with .. The problem is that we can't straightforwardly pass a value in a context to a function that expects a value that is not in a context.
Wouldn't it be nice if we introduce an operator to compose g1 and f1, such that we can write (f1 OPERATOR g1) x? g1 returns a value in a context. The value will be taken out of context and applied to f1. And yes, we have such an operator. It's <=<.
We also have the >>= operator that does for us the exact same thing, though in a slightly different syntax.
We write: g1 x >>= f1. g1 x is a Maybe Int value. The >>= operator helps take that Int value out of the "perhaps-not-there" context, and apply it to f1. The result of f1, which is a Maybe Bool, will be the result of the entire >>= operation.
And finally, why is Monad useful? Because Monad is the type class that defines the >>= operator, very much the same as the Eq type class that defines the == and /= operators.
To conclude, the Monad type class defines the >>= operator that allows us to pass values in a context (we call these monadic values) to functions that don't expect values in a context. The context will be taken care of.
If there is one thing to remember here, it is that Monads allow function composition that involves values in contexts.

A Monad is an Applicative (i.e. something that you can lift binary -- hence, "n-ary" -- functions to,(1) and inject pure values into(2)) Functor (i.e. something that you can map over,(3) i.e. lift unary functions to(3)) with the added ability to flatten the nested datatype (with each of the three notions following its corresponding set of laws). In Haskell, this flattening operation is called join.
The general (generic, parametric) type of this "join" operation is:
join :: Monad m => m (m a) -> m a
for any monad m (NB all ms in the type are the same!).
A specific m monad defines its specific version of join working for any value type a "carried" by the monadic values of type m a. Some specific types are:
join :: [[a]] -> [a] -- for lists, or nondeterministic values
join :: Maybe (Maybe a) -> Maybe a -- for Maybe, or optional values
join :: IO (IO a) -> IO a -- for I/O-produced values
The join operation converts an m-computation producing an m-computation of a-type values into one combined m-computation of a-type values. This allows for combination of computation steps into one larger computation.
This computation steps-combining "bind" (>>=) operator simply uses fmap and join together, i.e.
(ma >>= k) == join (fmap k ma)
{-
ma :: m a -- `m`-computation which produces `a`-type values
k :: a -> m b -- create new `m`-computation from an `a`-type value
fmap k ma :: m ( m b ) -- `m`-computation of `m`-computation of `b`-type values
(m >>= k) :: m b -- `m`-computation which produces `b`-type values
-}
Conversely, join can be defined via bind, join mma == join (fmap id mma) == mma >>= id where id ma = ma -- whichever is more convenient for a given type m.
For monads, both the do-notation and its equivalent bind-using code,
do { x <- mx ; y <- my ; return (f x y) } -- x :: a , mx :: m a
-- y :: b , my :: m b
mx >>= (\x -> -- nested
my >>= (\y -> -- lambda
return (f x y) )) -- functions
can be read as
first "do" mx, and when it's done, get its "result" as x and let me use it to "do" something else.
In a given do block, each of the values to the right of the binding arrow <- is of type m a for some type a and the same monad m throughout the do block.
return x is a neutral m-computation which just produces the pure value x it is given, such that binding any m-computation with return does not change that computation at all.
(1) with liftA2 :: Applicative m => (a -> b -> c) -> m a -> m b -> m c
(2) with pure :: Applicative m => a -> m a
(3) with fmap :: Functor m => (a -> b) -> m a -> m b
There's also the equivalent Monad methods,
liftM2 :: Monad m => (a -> b -> c) -> m a -> m b -> m c
return :: Monad m => a -> m a
liftM :: Monad m => (a -> b) -> m a -> m b
Given a monad, the other definitions could be made as
pure a = return a
fmap f ma = do { a <- ma ; return (f a) }
liftA2 f ma mb = do { a <- ma ; b <- mb ; return (f a b) }
(ma >>= k) = do { a <- ma ; b <- k a ; return b }

If I've understood correctly, IEnumerable is derived from monads. I wonder if that might be an interesting angle of approach for those of us from the C# world?
For what it's worth, here are some links to tutorials that helped me (and no, I still haven't understood what monads are).
http://osteele.com/archives/2007/12/overloading-semicolon
http://spbhug.folding-maps.org/wiki/MonadsEn
http://www.loria.fr/~kow/monads/

What the world needs is another monad blog post, but I think this is useful in identifying existing monads in the wild.
monads are fractals
The above is a fractal called Sierpinski triangle, the only fractal I can remember to draw. Fractals are self-similar structure like the above triangle, in which the parts are similar to the whole (in this case exactly half the scale as parent triangle).
Monads are fractals. Given a monadic data structure, its values can be composed to form another value of the data structure. This is why it's useful to programming, and this is why it occurrs in many situations.

http://code.google.com/p/monad-tutorial/ is a work in progress to address exactly this question.

A monad is a container, but for data. A special container.
All containers can have openings and handles and spouts, but these containers are all guaranteed to have certain openings and handles and spouts.
Why? Because these guaranteed openings and handles and spouts are useful for picking up and linking together the containers in specific, common ways.
This allows you to pick up different containers without having to know much about them. It also allows different kinds of containers to link together easily.

Develop Reference

node.js excel linux python-3.x azure haskell apache-spark rust .htaccess string