Haskell: Computation "in a monad" -- meaning?

In reading about monads, I keep seeing phrases like "computations in the Xyz monad". What does it mean for a computation to be "in" a certain monad?
I think I have a fair grasp on what monads are about: allowing computations to produce outputs that are usually of some expected type, but can alternatively or additionally convey some other information, such as error status, logging info, state and so on, and allow such computations to be chained.
But I don't get how a computation would be said to be "in" a monad. Does this just refer to a function that produces a monadic result?
Examples: (search "computation in")
http://www.haskell.org/tutorial/monads.html
http://www.haskell.org/haskellwiki/All_About_Monads
http://ertes.de/articles/monads.html

Generally, a "computation in a monad" means not just a function returning a monadic result, but such a function used inside a do block, or as part of the second argument to (>>=), or anything else equivalent to those. The distinction is relevant to something you said in a comment:
"Computation" occurs in func f, after val extracted from input monad, and before result is wrapped as monad. I don't see how the computation per se is "in" the monad; it seems conspicuously "out" of the monad.
This isn't a bad way to think about it--in fact, do notation encourages it because it's a convenient way to look at things--but it does result in a slightly misleading intuition. Nowhere is anything being "extracted" from a monad. To see why, forget about (>>=)--it's a composite operation that exists to support do notation. The more fundamental definition of a monad is in terms of three orthogonal functions:
fmap :: (a -> b) -> (m a -> m b)
return :: a -> m a
join :: m (m a) -> m a
...where m is a monad.
Now think about how to implement (>>=) with these: starting with arguments of type m a and a -> m b, your only option is using fmap to get something of type m (m b), after which you can use join to flatten the nested "layers" to get just m b.
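Written out, that recipe looks something like the following (a sketch; join lives in Control.Monad, and the name bind is mine to avoid clashing with the real (>>=)):
import Control.Monad (join)
bind :: Monad m => m a -> (a -> m b) -> m b
bind ma f = join (fmap f ma)  -- fmap f ma :: m (m b); join flattens it to m b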
In other words, nothing is being taken "out" of the monad--instead, think of the computation as going deeper into the monad, with successive steps being collapsed into a single layer of the monad.
Note that the monad laws are also much simpler from this perspective--essentially, they say that when join is applied doesn't matter as long as the nesting order is preserved (a form of associativity) and that the monadic layer introduced by return does nothing (an identity value for join).
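Spelled out, the laws in this form read as equations between functions (just the statements, not runnable code):
join . fmap join   =  join . join   -- associativity: when join is applied doesn't matter
join . return      =  id            -- return's layer is an identity for join...
join . fmap return =  id            -- ...whether added outside or inside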

Does this just refer to a function that produces a monadic result?
Yes, in short.
In long, it's because Monad allows you to inject values into it (via return), but once inside the Monad they're stuck. You have to use some function like runWriter or runCont, which is strictly more specific than Monad, to get values back "out".
More than that, Monad (really, its partner, Applicative) is the essence of having a "container" and allowing computations to happen inside of it. That's what (>>=) gives you, the ability to do interesting computations "inside" the Monad.
So functions like Monad m => m a -> (a -> m b) -> m b let you compute with and around and inside a Monad. Functions like Monad m => a -> m a let you inject into the Monad. Functions like m a -> a would let you "escape" the Monad except they don't exist in general (only in specific). So, for conversation's sake we like to talk about functions that have result types like Monad m => m a as being "inside the monad".
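For instance (a hedged illustration; each escape function below belongs to one specific monad, never to the Monad class itself):
import Data.Functor.Identity (Identity (..))
-- runIdentity :: Identity a -> a      -- escape hatch for Identity
-- runWriter   :: Writer w a -> (a, w) -- escape hatch for Writer
-- but there is no general Monad m => m a -> a
five :: Int
five = runIdentity (return 5)  -- 5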

Usually monad stuff is easier to grasp when starting with "collection-like" monads as example. Imagine you calculate the distance of two points:
data Point = Point Double Double
distance :: Point -> Point -> Double
distance p1 p2 = undefined
Now you may have a certain context. E.g. one of the points may be "illegal" because it is out of some bounds (e.g. on the screen). So you wrap your existing computation in the Maybe monad:
distance :: Maybe Point -> Maybe Point -> Maybe Double
distance p1 p2 = undefined
You have exactly the same computation, but with the additional feature that there may be "no result" (encoded as Nothing).
Or you have two groups of "possible" points, and need their mutual distances (e.g. to later use the shortest connection). Then the list monad is your "context":
distance :: [Point] -> [Point] -> [Double]
distance p1 p2 = undefined
Or the points are entered by a user, which makes the calculation "nondeterministic" (in the sense that you depend on things in the outside world, which may change), then the IO monad is your friend:
distance :: IO Point -> IO Point -> IO Double
distance p1 p2 = undefined
The computation remains always the same, but happens to take place in a certain "context", which adds some useful aspects (failure, multi-value, nondeterminism). You can even combine these contexts (monad transformers).
You may write a definition that unifies the definitions above, and works for any monad:
distance :: Monad m => m Point -> m Point -> m Double
distance p1 p2 = do
    Point x1 y1 <- p1
    Point x2 y2 <- p2
    return $ sqrt ((x1-x2)^2 + (y1-y2)^2)
That proves that our computation is really independent of the actual monad, which leads to formulations such as "x is computed in(-side) the y monad".
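To see it at work, here are a few hedged example calls of the generic definition above, with the expected results worked out by hand:
-- distance (Just (Point 0 0)) (Just (Point 3 4))  ==  Just 5.0
-- distance Nothing            (Just (Point 3 4))  ==  Nothing
-- distance [Point 0 0] [Point 3 4, Point 0 1]     ==  [5.0, 1.0]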

Looking at the links you provided, it seems that a common usage of "computation in" is with regards to a single monadic value. Excerpts:
Gentle introduction - here we run a computation in the SM monad, but the computation is the monadic value:
-- run a computation in the SM monad
runSM :: S -> SM a -> (a,S)
All about monads - previous computation refers to a monadic value in the sequence:
The >> function is a convenience operator that is used to bind a monadic computation that does not require input from the previous computation in the sequence
Understanding monads - here the first computation could refer to e.g. getLine, a monadic value :
(binding) gives an intrinsic idea of using the result of a computation in another computation, without requiring a notion of running computations.
So as an analogy, if I say i = 4 + 2, then i is the value 6, but it is equally a computation, namely the computation 4 + 2. It seems the linked pages use "computation" in this sense - computation as a monadic value - at least some of the time, in which case it makes sense to use the expression "a computation in" the given monad.

Consider the IO monad. A value of type IO a is a description of a large (often infinite) number of behaviours where a behaviour is a sequence of IO events (reads, writes, etc). Such a value is called a "computation"; in this case it is a computation in the IO monad.

Related

How to understand "m ()" is a monadic computation

From the document:
when :: (Monad m) => Bool -> m () -> m ()
when p s = if p then s else return ()
The when function takes a boolean argument and a monadic computation with unit () type and performs the computation only when the boolean argument is True.
===
As a Haskell newbie, my problem is that for me m () is some "void" data, but here the document mentions it as computation. Is it because of the laziness of Haskell?
Laziness has no part in it.
The m/Monad part is often called computation.
The best example might be m = IO:
Look at putStrLn "Hello" :: IO () - This is a computation that, when run, will print "Hello" to your screen.
This computation has no result - so the return type is ()
Now when you write
hello :: Bool -> IO ()   -- when comes from Control.Monad
hello sayIt =
    when sayIt (putStrLn "Hello")
then hello True is a computation that, when run, will print "Hello"; while hello False is a computation that when run will do nothing at all.
Now compare it to getLine :: IO String - this is a computation that, when run, will prompt you for an input and will return the input as a String - that's why the return type is String.
Does this help?
for me "m ()" is some "void" data
And that kinda makes sense, in that a computation is a special kind of data. It has nothing to do with laziness - it's associated with context.
Let's take State as an example. A function of type, say, s -> () in Haskell can only produce one value. However, a function of type s -> ((), s) is a regular function doing some transformation on s. The problem you're having is that you're only looking at the () part, while the s -> s part stays hidden. That's the point of State - to hide the state passing.
Hence State s () is trivially convertible to s -> ((), s) and back, and it still is a Monad (a computation) that produces a value of... ().
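For concreteness, here is a hedged sketch of that back-and-forth conversion using mtl's state and runState:
import Control.Monad.State (State, state, runState)
toFunction :: State s () -> (s -> ((), s))
toFunction = runState
fromFunction :: (s -> ((), s)) -> State s ()
fromFunction = state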
If we look at practical use, now:
(flip runState 10) $ do
    modify (+1)
This expression produces a tuple of type ((), Int); inside the do block, the Int (state) part stays hidden
It will modify the state, adding 1 to it. It produces the intermediate value of (), though, which fits your when:
when (5 > 3) $ modify (+1)
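Putting the pieces together (a hedged sketch; assumes Control.Monad's when and Control.Monad.State are in scope):
-- runState (when (5 > 3) (modify (+1))) 10  ==  ((), 11)
-- runState (when (1 > 3) (modify (+1))) 10  ==  ((), 10)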
Monads are notably abstract and mathematical, so intuitive statements about them are often made in language that is rather vague. So values of a monadic type are often informally labeled as "computations," "actions" or (less often) "commands" because it's an analogy that sometimes help us reason about them. But when you dig deeper, these turn out to be empty words when used this way; ultimately what they mean is "some value of a type that provides the Monad interface."
I like the word "action" better for this, so let me go with that. The intuition for the use of that word in Haskell is this: the language makes a distinction between functions and actions:
Functions can't have any side effects, and their types look like a -> b.
Actions may have side effects, and their types look like IO a.
A consequence of this: an action of type IO () produces an uninteresting result value, and therefore it's either a no-op (return ()) or an action that is only interesting because of its side effects.
Monad then is the interface that allows you to glue actions together into complex actions.
This is all very intuitive, but it's an analogy that becomes rather stretched when you try to apply it to many monads other than the IO type. For example, lists are a monad:
instance Monad [] where
    return a = [a]
    as >>= f = concatMap f as
The "actions" or "computations" of the list monad are... lists. How is a list an "action" or a "computation"? The analogy is rather weak in this case, isn't it?
So I'd say that this is the best advice:
Understand that "action" and "computation" are analogies. There's no strict definition.
Understand that these analogies are stronger for some monad instances, and weak for others.
The ultimate barometer of how things work are the Monad laws and the definitions of the various functions that work with Monad.

Why can I call a monadic function without supplying a monad?

I thought I had a good handle on Haskell Monads until I realized this very simple piece of code made no sense to me (this is from the haskell wiki about the State monad):
playGame :: String -> State GameState GameValue
playGame [] = do
    (_, score) <- get
    return score
What confuses me is, why is the code allowed to call "get", when the only argument supplied is a string? It seems almost like it is pulling the value out of thin air.
A better way for me to ask the question may be, how would one rewrite this function using >>= and lambda's instead of do notation? I'm unable to figure it out myself.
Desugaring the do notation would look like
playGame [] =
    get >>= \ (_, score) ->
    return score
We could also just write this with fmap
playGame [] = fmap (\(_, score) -> score) get
playGame [] = fmap snd get
Now the trick is to realize that get is a value like any other with the type
State s s
What get will return won't be determined until we feed our computation to runState or similar where we provide an explicit starting value for our state.
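For example (with entirely hypothetical types, just to make this concrete: suppose type GameValue = Int and type GameState = (Int, Int)):
-- runState (playGame "") (0, 42)  ==  (42, (0, 42))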
If we simplify this further and get rid of the state monad we'd have
playGame :: String -> (GameState -> (GameValue, GameState))
playGame [] = \gamestate -> (snd gamestate, gamestate)
The state monad is just wrapping around all of this manual passing of GameState but you can think of get as accessing the value that our "function" was passed.
A monad is a "thing" which takes a context (we call it m) and which "produces" a value, while still respecting the monad laws. We can think of this in terms of being "inside" and "outside" of the monad. The monad laws tell us how to deal with a "round trip" -- going outside and then coming back inside. In particular, the laws tell us that m (m a) can always be collapsed to (m a) in a consistent way.
The point is that a monad is a generalization of this round-tripping thing. join squashes (m (m a))'s into (m a)'s, and (>>=) pulls a value out of the monad and applies a function into the monad. Put another way, it applies a function (f :: a -> m b) to the a in (m a) -- which yields an (m (m b)), and then squashes that via join to get our (m b).
So what does this have to do with 'get' and objects?
Well, do notation sets us up so that the result of a computation is in our monad. And (<-) lets us pull a value out of a monad, so that we can bind it to a function, while still nominally being inside of the monad. So, for example:
doStuff = do
    a <- get
    b <- get
    return (a + b)
Notice that a and b are pure. They are "outside" of the get, because we actually peeked inside it. But now that we have a value outside of the monad, we need to do something with it (+) and then stick it back in the monad.
This is just a little bit of suggestive notation, but it might be nice if we could do:
doStuff = do
    a <- get
    b <- get
    (a + b) -> (\x -> return x)
to really emphasize the back and forth of it. When you finish a monad action, you must end up back inside the monad, because when the action is done, 'join' will get called to flatten the layers. (At least, conceptually.)
Oh, right, objects. Well, obviously, an OO language basically lives and breathes in an IO monad of some kind. But we can actually break it down some more. When you run something along the lines of:
x = foo.bar.baz.bin()
you are basically running a monad transformer stack, which takes an IO context, which produces a foo context, which produces a bar context, which produces a baz context, which produces a bin context. And then the runtime system "calls" join on this thing as many times as needed. Notice how well this idea meshes with "call stacks". And indeed, this is why it is called a "monad transformer stack" on the haskell side. It is a stack of monadic contexts.

Are monads just ways of composing functions which would otherwise not compose?

The bind function seems remarkably similar like a composition function. And it helps in composing functions which return monads.
Is there anything more enlightening about monads than this idea?
Is there anything more enlightening about monads than this idea?
Yes, very much so!
Monadic binding is a way of composing functions where something else is happening over and above the application of a function to an input. What the something else is depends on the monad under consideration.
The Maybe monad is function composition with the possibility that one of the functions in the chain might fail, in which case the failure is automatically propagated to the end of the chain. The expression return x >>= f >>= g applies f to the value x. If the result is Nothing (i.e. failure) then the entire expression returns Nothing, with no other work taking place. Otherwise, g is applied to f x and its result is returned.
The Either e monad, where e is some type, is function composition with the possibility of failure with an error of type e. This is conceptually similar to the Maybe monad, but we get some more information about how and where the failure occurred.
The List monad is function composition with the possibility of returning multiple values. If f and g are functions that return a list of outputs, then return x >>= f >>= g applies f to x, and then applies g to every output of f, collecting all of the outputs of these applications together into one big list.
Other monads represent function composition in various other contexts. Very briefly:
The Writer w monad is function composition with a value of type w being accumulated on the side. For example, often w = [String] (a list of strings) which is useful for logging; a sketch follows this list.
The Reader r monad is function composition where each of the functions is also allowed to depend on a value of type r. This is useful when building evaluators for domain-specific languages, when r might be a map from variable names to values in the language - this allows simple implementation of lexical closures, for example.
The State s monad is a bit like a combination of reader and writer. It is function composition where each function is allowed to depend on, and modify, a value of type s.
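As a concrete illustration of the Writer case, here is a hedged sketch using mtl (the function names are mine):
import Control.Monad.Writer (Writer, runWriter, tell)
double :: Int -> Writer [String] Int
double x = tell ["doubled"] >> return (x * 2)
increment :: Int -> Writer [String] Int
increment x = tell ["incremented"] >> return (x + 1)
-- runWriter (return 5 >>= double >>= increment)
--   ==  (11, ["doubled", "incremented"])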
The composition point of view is in fact quite enlightening in itself.
Monads can be seen as a sort of "funky composition" between functions of the form a -> M b. You can compose f :: a -> M b and g :: b -> M c into something of type a -> M c, via the monad operations (just bind the return value of f into g).
This turns arrows of the form a -> M b into the arrows of a category, termed the Kleisli category of M.
If M were not a monad but just a functor, you would only be able to compose fmap g and f into something (fmap g) . f :: a -> M (M c). Monads have join :: M (M a) -> M a, which I leave for you to define as an (easy and useful) exercise using only monad operations (for mathematicians, join is usually part of the definition of a monad). Then join . (fmap g) . f provides the composition for the Kleisli category.
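Spelled out as code (a sketch; Control.Monad already provides this operator under the name >=>):
import Control.Monad (join)
-- Kleisli composition built from fmap and join, as described above
-- (the name composeK is mine):
composeK :: Monad m => (a -> m b) -> (b -> m c) -> (a -> m c)
composeK f g = join . fmap g . f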
All the funk of monadic composition can thus be seen to happen inside join, join represents the composition of side effects: for IO it sequences the effects, for List it concatenates lists, for Maybe it "stops a computation" when a result is Nothing, for Writer it sequences the writes, for State it sequences operations on the state, etc. It can be seen as an "overloadable semicolon" if you know C-like languages. It is very instructive to think about monads this way.
Of course, Dan Piponi explains this much better than I do, and here is some post of his that you may find enlightening: http://blog.sigfpe.com/2006/06/monads-kleisli-arrows-comonads-and.html

Unlike a Functor, a Monad can change shape?

I've always enjoyed the following intuitive explanation of a monad's power relative to a functor: a monad can change shape; a functor cannot.
For example: length $ fmap f [1,2,3] always equals 3.
With a monad, however, length $ [1,2,3] >>= g will often not equal 3. For example, if g is defined as:
g :: (Num a, Eq a) => a -> [a]  -- Eq is needed for the == test
g x = if x == 2 then [] else [x]
then [1,2,3] >>= g is equal to [1,3].
The thing that troubles me slightly, is the type signature of g. It seems impossible to define a function which changes the shape of the input, with a generic monadic type such as:
h :: (Monad m, Num a) => a -> m a
The MonadPlus or MonadZero type classes have relevant zero elements, to use instead of [], but now we have something more than a monad.
Am I correct? If so, is there a way to express this subtlety to a newcomer to Haskell. I'd like to make my beloved "monads can change shape" phrase, just a touch more honest; if need be.
I've always enjoyed the following intuitive explanation of a monad's power relative to a functor: a monad can change shape; a functor cannot.
You're missing a bit of subtlety here, by the way. For the sake of terminology, I'll divide a Functor in the Haskell sense into three parts: The parametric component determined by the type parameter and operated on by fmap, the unchanging parts such as the tuple constructor in State, and the "shape" as anything else, such as choices between constructors (e.g., Nothing vs. Just) or parts involving other type parameters (e.g., the environment in Reader).
A Functor alone is limited to mapping functions over the parametric portion, of course.
A Monad can create new "shapes" based on the values of the parametric portion, which allows much more than just changing shapes. Duplicating every element in a list or dropping the first five elements would change the shape, but filtering a list requires inspecting the elements.
This is essentially how Applicative fits between them--it allows you to combine the shapes and parametric values of two Functors independently, without letting the latter influence the former.
Am I correct? If so, is there a way to express this subtlety to a newcomer to Haskell. I'd like to make my beloved "monads can change shape" phrase, just a touch more honest; if need be.
Perhaps the subtlety you're looking for here is that you're not really "changing" anything. Nothing in a Monad lets you explicitly mess with the shape. What it lets you do is create new shapes based on each parametric value, and have those new shapes recombined into a new composite shape.
Thus, you'll always be limited by the available ways to create shapes. With a completely generic Monad all you have is return, which by definition creates whatever shape is necessary such that (>>= return) is the identity function. The definition of a Monad tells you what you can do, given certain kinds of functions; it doesn't provide those functions for you.
Monad's operations can "change the shape" of values to the extent that the >>= function replaces leaf nodes in the "tree" that is the original value with a new substructure derived from the node's value (for a suitably general notion of "tree" - in the list case, the "tree" is associative).
In your list example what is happening is that each number (leaf) is being replaced by the new list that results when g is applied to that number. The overall structure of the original list still can be seen if you know what you're looking for; the results of g are still there in order, they've just been smashed together so you can't tell where one ends and the next begins unless you already know.
A more enlightening point of view may be to consider fmap and join instead of >>=. Together with return, either way gives an equivalent definition of a monad. In the fmap/join view, though, what is happening here is more clear. Continuing with your list example, first g is fmapped over the list yielding [[1],[],[3]]. Then that list is joined, which for list is just concat.
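Tracing that with the g from earlier (results worked out by hand; for lists, join is concat):
-- fmap g [1,2,3]     ==  [[1], [], [3]]
-- join [[1],[],[3]]  ==  [1,3]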
Just because the monad pattern includes some particular instances that allow shape changes doesn't mean every instance can have shape changes. For example, there is only one "shape" available in the Identity monad:
newtype Identity a = Identity a
-- (Modern GHC also requires Functor and Applicative instances,
-- omitted here for brevity.)
instance Monad Identity where
    return = Identity
    Identity a >>= f = f a
In fact, it's not clear to me that very many monads have meaningful "shape"s: for example, what does shape mean in the State, Reader, Writer, ST, STM, or IO monads?
The key combinator for monads is (>>=). Knowing that it composes two monadic values and reading its type signature, the power of monads becomes more apparent:
(>>=) :: Monad m => m a -> (a -> m b) -> m b
The future action can depend entirely on the outcome of the first action, because it is a function of its result. This power comes at a price though: Functions in Haskell are entirely opaque, so there is no way for you to get any information about a composed action without actually running it. As a side note, this is where arrows come in.
A function with a signature like h indeed cannot do many interesting things beyond performing some arithmetic on its argument. So, you have the correct intuition there.
However, it might help to look at commonly used libraries for functions with similar signatures. You'll find that the most generic ones, as you'd expect, perform generic monad operations like return, liftM, or join. Also, when you use liftM or fmap to lift an ordinary function into a monadic function, you typically wind up with a similarly generic signature, and this is quite convenient for integrating pure functions with monadic code.
In order to use the structure that a particular monad offers, you inevitably need to use some knowledge about the specific monad you're in to build new and interesting computations in that monad. Consider the state monad, (s -> (a, s)). Without knowing that type, we can't write get = \s -> (s, s), but without being able to access the state, there's not much point to being in the monad.
The simplest type of a function satisfying the requirement I can imagine is this:
enigma :: Monad m => m () -> m ()
One can implement it in one of the following ways:
enigma1 m = m -- not changing the shape
enigma2 _ = return () -- changing the shape
This was a very simple change -- enigma2 just discards the shape and replaces it with the trivial one. Another kind of generic change is combining two shapes together:
foo :: Monad m => m () -> m () -> m ()
foo a b = a >> b
The result of foo can have shape different from both a and b.
A third obvious change of shape, requiring the full power of the monad, is join:
join :: Monad m => m (m a) -> m a
join x = x >>= id
The shape of join x is usually not the same as of x itself.
Combining those primitive changes of shape, one can derive non-trivial things like sequence, foldM and the like.
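For instance, sequence can be written with nothing but return and >>= (essentially the classic Prelude definition; the prime is mine to avoid a name clash):
sequence' :: Monad m => [m a] -> m [a]
sequence' []       = return []
sequence' (x : xs) = x >>= \v -> sequence' xs >>= \vs -> return (v : vs)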
Does
h :: (MonadFail m, Num a, Eq a) => a -> m a  -- modern GHC: fail lives in MonadFail, and matching on 0 needs Eq
h 0 = fail "Failed."
h a = return a
suit your needs? For example,
> [0,1,2,3] >>= h
[1,2,3]
This isn't a full answer, but I have a few things to say about your question that don't really fit into a comment.
Firstly, Monad and Functor are typeclasses; they classify types. So it is odd to say that "a monad can change shape; a functor cannot." I believe what you are trying to talk about is a "monadic value" or perhaps a "monadic action": a value whose type is m a for some Monad m of kind * -> * and some other type of kind *. I'm not entirely sure what to call a value of type f a where f is a Functor; I suppose I'd call it a "value in a functor", though that's not the best description of, say, IO String (IO is a functor).
Secondly, note that all Monads are necessarily Functors (fmap = liftM), so I'd say the difference you observe is between fmap and >>=, or even between f and g, rather than between Monad and Functor.

Why monads? How does it resolve side-effects?

I am learning Haskell and trying to understand Monads. I have two questions:
From what I understand, Monad is just another typeclass that declares ways to interact with data inside "containers", including Maybe, List, and IO. It seems clever and clean to implement these 3 things with one concept, but really, the point is so there can be clean error handling in a chain of functions, containers, and side effects. Is this a correct interpretation?
How exactly is the problem of side-effects solved? With this concept of containers, the language essentially says anything inside the containers is non-deterministic (such as i/o). Because lists and IOs are both containers, lists are equivalence-classed with IO, even though values inside lists seem pretty deterministic to me. So what is deterministic and what has side-effects? I can't wrap my head around the idea that a basic value is deterministic, until you stick it in a container (which is no special than the same value with some other values next to it, e.g. Nothing) and it can now be random.
Can someone explain how, intuitively, Haskell gets away with changing state with inputs and output? I'm not seeing the magic here.
The point is so there can be clean error handling in a chain of functions, containers, and side effects. Is this a correct interpretation?
Not really. You've mentioned a lot of concepts that people cite when trying to explain monads, including side effects, error handling and non-determinism, but it sounds like you've gotten the incorrect sense that all of these concepts apply to all monads. But there's one concept you mentioned that does: chaining.
There are two different flavors of this, so I'll explain it two different ways: one without side effects, and one with side effects.
No Side Effects:
Take the following example:
addM :: (Monad m, Num a) => m a -> m a -> m a
addM ma mb = do
    a <- ma
    b <- mb
    return (a + b)
This function adds two numbers, with the twist that they are wrapped in some monad. Which monad? Doesn't matter! In all cases, that special do syntax de-sugars to the following:
addM ma mb =
    ma >>= \a ->
    mb >>= \b ->
    return (a + b)
... or, with operator precedence made explicit:
ma >>= (\a -> mb >>= (\b -> return (a + b)))
Now you can really see that this is a chain of little functions, all composed together, and its behavior will depend on how >>= and return are defined for each monad. If you're familiar with polymorphism in object-oriented languages, this is essentially the same thing: one common interface with multiple implementations. It's slightly more mind-bending than your average OOP interface, since the interface represents a computation policy rather than, say, an animal or a shape or something.
Okay, let's see some examples of how addM behaves across different monads. The Identity monad is a decent place to start, since its definition is trivial:
instance Monad Identity where
    return a = Identity a    -- create an Identity value
    (Identity a) >>= f = f a -- apply f to a
So what happens when we say:
addM (Identity 1) (Identity 2)
Expanding this, step by step:
(Identity 1) >>= (\a -> (Identity 2) >>= (\b -> return (a + b)))
(\a -> (Identity 2) >>= (\b -> return (a + b))) 1
(Identity 2) >>= (\b -> return (1 + b))
(\b -> return (1 + b)) 2
return (1 + 2)
Identity 3
Great. Now, since you mentioned clean error handling, let's look at the Maybe monad. Its definition is only slightly trickier than Identity:
instance Monad Maybe where
    return a = Just a       -- same as Identity monad!
    (Just a) >>= f = f a    -- same as Identity monad again!
    Nothing >>= _ = Nothing -- the only real difference from Identity
So you can imagine that if we say addM (Just 1) (Just 2) we'll get Just 3. But for grins, let's expand addM Nothing (Just 1) instead:
Nothing >>= (\a -> (Just 1) >>= (\b -> return (a + b)))
Nothing
Or the other way around, addM (Just 1) Nothing:
(Just 1) >>= (\a -> Nothing >>= (\b -> return (a + b)))
(\a -> Nothing >>= (\b -> return (a + b))) 1
Nothing >>= (\b -> return (1 + b))
Nothing
So the Maybe monad's definition of >>= was tweaked to account for failure. When a function is applied to a Maybe value using >>=, you get what you'd expect.
Okay, so you mentioned non-determinism. Yes, the list monad can be thought of as modeling non-determinism in a sense... It's a little weird, but think of the list as representing alternative possible values: [1, 2, 3] is not a collection, it's a single non-deterministic number that could be either one, two or three. That sounds dumb, but it starts to make some sense when you think about how >>= is defined for lists: it applies the given function to each possible value. So addM [1, 2] [3, 4] is actually going to compute all possible sums of those two non-deterministic values: [4, 5, 5, 6].
Okay, now to address your second question...
Side Effects:
Let's say you apply addM to two values in the IO monad, like:
addM (return 1 :: IO Int) (return 2 :: IO Int)
You don't get anything special, just 3 in the IO monad. addM does not read or write any mutable state, so it's kind of no fun. Same goes for the State or ST monads. No fun. So let's use a different function:
fireTheMissiles :: IO Int -- returns the number of casualties
Clearly the world will be different each time missiles are fired. Clearly. Now let's say you're trying to write some totally innocuous, side effect free, non-missile-firing code. Perhaps you're trying once again to add two numbers, but this time without any monads flying around:
add :: Num a => a -> a -> a
add a b = a + b
and all of a sudden your hand slips, and you accidentally typo:
add a b = a + b + fireTheMissiles
An honest mistake, really. The keys were so close together. Fortunately, because fireTheMissiles was of type IO Int rather than simply Int, the compiler is able to avert disaster.
Okay, totally contrived example, but the point is that in the case of IO, ST and friends, the type system keeps effects isolated to some specific context. It doesn't magically eliminate side effects, making code referentially transparent that shouldn't be, but it does make it clear at compile time what scope the effects are limited to.
So getting back to the original point: what does this have to do with chaining or composition of functions? Well, in this case, it's just a handy way of expressing a sequence of effects:
fireTheMissilesTwice :: IO ()
fireTheMissilesTwice = do
    a <- fireTheMissiles
    print a
    b <- fireTheMissiles
    print b
Summary:
A monad represents some policy for chaining computations. Identity's policy is pure function composition, Maybe's policy is function composition with failure propagation, IO's policy is impure function composition and so on.
Let me start by pointing at the excellent "You could have invented monads" article. It illustrates how the Monad structure can naturally manifest while you are writing programs. But the tutorial doesn't mention IO, so I will have a stab here at extending the approach.
Let us start with what you probably have already seen - the container monad. Let's say we have:
f, g :: Int -> [Int]
One way of looking at this is that it gives us a number of possible outputs for every possible input. What if we want all possible outputs for the composition of both functions? Giving all possibilities we could get by applying the functions one after the other?
Well, there's a function for that:
fg x = concatMap g $ f x
Putting this more generally, we get
fg x = f x >>= g
xs >>= f = concatMap f xs
return x = [x]
Why would we want to wrap it like this? Well, writing our programs primarily using >>= and return gives us some nice properties - for example, we can be sure that it's relatively hard to "forget" solutions; to drop one, we'd have to do so explicitly, say by adding another function skip. And also we now have a monad and can use all combinators from the monad library!
Now, let us jump to your trickier example. Let's say the two functions are "side-effecting". That's not non-deterministic, it just means that in theory the whole world is both their input (as it can influence them) as well as their output (as the function can influence it). So we get something like:
f, g :: Int -> RealWorld# -> (Int, RealWorld#)
If we now want g to see the world that f left behind, we'd write:
fg x rw = let (y, rw')  = f x rw
              (r, rw'') = g y rw'
          in (r, rw'')
Or generalized:
fg x = f x >>= g
x >>= f = \rw -> let (y, rw')  = x rw
                     (r, rw'') = f y rw'
                 in (r, rw'')
return x = \rw -> (x, rw)
Now if the user can only use >>=, return and a few pre-defined IO values we get a nice property again: The user will never actually see the RealWorld# getting passed around! And that is a very good thing, as you aren't really interested in the details of where getLine gets its data from. And again we get all the nice high-level functions from the monad libraries.
So the important things to take away:
The monad captures common patterns in your code, like "always pass all elements of container A to container B" or "pass this real-world-tag through". Often, once you realize that there is a monad in your program, complicated things become simply applications of the right monad combinator.
The monad allows you to completely hide the implementation from the user. It is an excellent encapsulation mechanism, be it for your own internal state or for how IO manages to squeeze non-purity into a pure program in a relatively safe way.
Appendix
In case someone is still scratching his head over RealWorld# as much as I did when I started: There's obviously more magic going on after all the monad abstraction has been removed. Then the compiler will make use of the fact that there can only ever be one "real world". That's good news and bad news:
It follows that the compiler must guarantee execution ordering between functions (which is what we were after!)
But it also means that actually passing the real world isn't necessary as there is only one we could possibly mean: The one that is current when the function gets executed!
Bottom line is that once execution order is fixed, RealWorld# simply gets optimized out. Therefore programs using the IO monad actually have zero runtime overhead. Also note that using RealWorld# is obviously only one possible way to implement IO - but it happens to be the one GHC uses internally. The good thing about monads is that, again, the user really doesn't need to know.
You could see a given monad m as a set/family (or realm, domain, etc.) of actions (think of a C statement). The monad m defines the kind of (side-)effects that its actions may have:
with [] you can define actions which can fork their executions in different "independent parallel worlds";
with Either Foo you can define actions which can fail with errors of type Foo;
with IO you can define actions which can have side-effects on the "outside world" (access files, network, launch processes, do a HTTP GET ...);
you can have a monad whose effect is "randomness" (see package MonadRandom);
you can define a monad whose actions can make a move in a game (say chess, Go…) and receive moves from an opponent, but are not able to write to your filesystem or anything else.
Summary
If m is a monad, m a is an action which produces a result/output of type a.
The >> and >>= operators are used to create more complex actions out of simpler ones:
a >> b is a macro-action which does action a and then action b;
a >> a does action a and then action a again;
with >>= the second action can depend on the output of the first one.
The exact meaning of what an action is and what doing an action and then another one is depends on the monad: each monad defines an imperative sublanguage with some features/effects.
Simple sequencing (>>)
Let's say we have a given monad M and some actions incrementCounter, decrementCounter, readCounter:
instance Monad M where ...
-- Modify the counter and do not produce any result:
incrementCounter :: M ()
decrementCounter :: M ()
-- Get the current value of the counter
readCounter :: M Integer
Now we would like to do something interesting with those actions. The first thing we would like to do with those actions is to sequence them. As in, say, C, we would like to be able to do:
// This is C:
counter++;
counter++;
We define a "sequencing operator" >>. Using this operator we can write:
incrementCounter >> incrementCounter
What is the type of "incrementCounter >> incrementCounter"?
It is an action made of two smaller actions, just as in C you can write compound statements from atomic statements:
// This is a macro statement made of several statements
{
    counter++;
    counter++;
}
// and we can use it anywhere we may use a statement:
if (condition) {
    counter++;
    counter++;
}
it can have the same kind of effects as its subactions;
it does not produce any output/result.
So we would like incrementCounter >> incrementCounter to be of type M (): a (macro-)action with the same kind of possible effects but without any output.
More generally, given two actions:
action1 :: M a
action2 :: M b
we define a >> b as the macro-action which is obtained by doing (whatever that means in our domain of action) a then b, and which produces as output the result of the execution of the second action. The type of >> is:
(>>) :: M a -> M b -> M b
or more generally:
(>>) :: (Monad m) => m a -> m b -> m b
We can define bigger sequences of actions from simpler ones:
action1 >> action2 >> action3 >> action4
Input and outputs (>>=)
We would like to be able to increment by something other than 1 at a time:
incrementBy 5
We want to provide some input to our actions; in order to do this, we define a function incrementBy taking an Int and producing an action:
incrementBy :: Int -> M ()
Now we can write things like:
incrementCounter >> readCounter >> incrementBy 5
But we have no way to feed the output of readCounter into incrementBy. In order to do this, a slightly more powerful version of our sequencing operator is needed. The >>= operator can feed the output of a given action as input to the next action. We can write:
readCounter >>= incrementBy
It is an action which executes the readCounter action, feeds its output into the incrementBy function and then executes the resulting action.
The type of >>= is:
(>>=) :: Monad m => m a -> (a -> m b) -> m b
A (partial) example
Let's say I have a Prompt monad which can only display information (text) to the user and ask the user for information:
-- We don't have access to the internal structure of the Prompt monad
module Prompt (Prompt(), echo, prompt) where
-- Opaque
data Prompt a = ...
instance Monad Prompt where ...
-- Display a line to the CLI:
echo :: String -> Prompt ()
-- Ask a question to the user:
prompt :: String -> Prompt String
Let's try to define a promptBoolean message action which asks a question and produces a boolean value.
We use the prompt (message ++ "[y/n]") action and feed its output to a function f:
f "y" should be an action which does nothing but produce True as output;
f "n" should be an action which does nothing but produce False as output;
anything else should restart the action (do the action again);
promptBoolean would look like this:
-- Incomplete version, some bits are missing:
promptBoolean :: String -> Prompt Bool
promptBoolean message = prompt (message ++ "[y/n]") >>= f
  where f result = if result == "y"
                   then ???? -- We need here an action which does nothing but produce `True` as output
                   else if result == "n"
                        then ???? -- We need here an action which does nothing but produce `False` as output
                        else echo "Input not recognised, try again." >> promptBoolean message
Producing a value without effect (return)
In order to fill in the missing bits of our promptBoolean function, we need a way to represent dummy actions without any side effect which only output a given value:
-- "return 5" is an action which does nothing but outputs 5
return :: (Monad m) => a -> m a
and we can now write our promptBoolean function:
promptBoolean :: String -> Prompt Bool
promptBoolean message = prompt (message ++ "[y/n]") >>= f
  where f result = if result == "y"
                   then return True
                   else if result == "n"
                        then return False
                        else echo "Input not recognised, try again." >> promptBoolean message
By composing those two simple actions (promptBoolean, echo) we can define any kind of dialogue between the user and your program (the actions of the program are deterministic as our monad does not have a "randomness effect").
promptInt :: String -> Prompt Int
promptInt = ... -- similar
-- Classic "guess a number" game/dialogue
guess :: Int -> Prompt ()
guess n = promptInt "Guess:" >>= f
  where f m = if m == n
              then echo "Found"
              else (if m > n
                    then echo "Too big"
                    else echo "Too small") >> guess n
The operations of a monad
A Monad is a set of actions which can be composed with the return and >>= operators:
>>= for action composition;
return for producing a value without any (side-)effect.
These two operators are the minimal operators needed to define a Monad.
In Haskell, the >> operator is needed as well but it can in fact be derived from >>=:
(>>) :: Monad m => m a -> m b -> m b
a >> b = a >>= f
  where f x = b
In Haskell, an extra fail operator is needed as well, but this is really a hack (it has since been moved out of Monad into the separate MonadFail class).
This is the Haskell definition of a Monad:
class Monad m where
    return :: a -> m a
    (>>=) :: m a -> (a -> m b) -> m b
    (>>) :: m a -> m b -> m b -- can be derived from (>>=)
    fail :: String -> m a     -- mostly a hack
Actions are first-class
One great thing about monads is that actions are first-class. You can store them in variables, and you can define functions which take actions as input and produce other actions as output. For example, we can define a while operator:
-- while x y : does action y while action x outputs True
while :: (Monad m) => m Bool -> m a -> m ()
while x y = x >>= f
  where f True  = y >> while x y
        f False = return ()
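A hedged usage sketch of while in the IO monad (the countdown example is mine, not from the original answer; it uses an IORef as the counter):
import Data.IORef (newIORef, readIORef, modifyIORef)
countdown :: IO ()
countdown = do
    ref <- newIORef (3 :: Int)
    while (fmap (> 0) (readIORef ref))                            -- condition action
          (readIORef ref >>= print >> modifyIORef ref (subtract 1)) -- body action
-- prints 3, 2, 1 and then stops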
Summary
A Monad is a set of actions in some domain. The monad/domain defines the kind of "effects" which are possible. The >> and >>= operators represent sequencing of actions, and monadic expressions may be used to represent any kind of "imperative (sub)program" in your (functional) Haskell program.
The great things are that:
you can design your own Monad which supports the features and effects that you want
see Prompt for an example of a "dialogue only subprogram",
see Rand for an example of "sampling only subprogram";
you can write your own control structures (while, throw, catch or more exotic ones) as functions taking actions and composing them in some way to produce a bigger macro-actions.
MonadRandom
A good way of understanding monads is the MonadRandom package. The Rand monad is made of actions whose output can be random (the effect is randomness). An action in this monad is some kind of random variable (or more exactly a sampling process):
-- Sample an Int from some distribution
action :: Rand Int
Using Rand to do some sampling/random algorithms is quite interesting because you have random variables as first class values:
-- Estimate the mean by sampling the random variable x, n times:
sampleMean :: Real a => Int -> Rand a -> Rand a
sampleMean n x = ...
In this setting, the sequence function from Prelude,
sequence :: Monad m => [m a] -> m [a]
becomes
sequence :: [Rand a] -> Rand [a]
It creates a random variable obtained by sampling independently from a list of random variables.
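A hedged sketch against the actual MonadRandom API (in the real package, Rand carries a generator type parameter, e.g. Rand StdGen a, which the discussion above elides):
import Control.Monad.Random (Rand, StdGen, getRandomR, evalRand, mkStdGen)
die :: Rand StdGen Int
die = getRandomR (1, 6)
threeRolls :: Rand StdGen [Int]
threeRolls = sequence [die, die, die]
-- evalRand threeRolls (mkStdGen 42) samples all three rolls
-- deterministically from the seeded generator.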
There are three main observations concerning the IO monad:
1) You can't get values out of it. Other types like Maybe might allow you to extract values, but neither the monad class interface itself nor the IO data type allows it.
2) "Inside" IO is not only the real value but also that "RealWorld" thing. This dummy value is used to enforce the chaining of actions by the type system: If you have two independent calculations, the use of >>= makes the second calculation dependent on the first.
3) Assume a non-deterministic thing like random :: () -> Int, which isn't allowed in Haskell. If you change the signature to random :: Blubb -> (Blubb, Int), it is allowed, if you make sure that nobody ever can use a Blubb twice: Because in that case all inputs are "different", it is no problem that the outputs are different as well.
Now we can use fact 1): Nobody can get anything out of IO, so we can use the RealWorld dummy hidden in IO to serve as a Blubb. There is only one IO in the whole application (the one we get from main), and it takes care of proper sequencing, as we have seen in 2). Problem solved.
One thing that often helps me to understand the nature of something is to examine it in the most trivial way possible. That way, I'm not getting distracted by potentially unrelated concepts. With that in mind, I think it may be helpful to understand the nature of the Identity Monad, as it's the most trivial implementation of a Monad possible (I think).
What is interesting about the Identity Monad? I think it is that it allows me to express the idea of evaluating expressions in a context defined by other expressions. And to me, that is the essence of every Monad I've encountered (so far).
If you already had a lot of exposure to 'mainstream' programming languages before learning Haskell (like I did), then this doesn't seem very interesting at all. After all, in a mainstream programming language, statements are executed in sequence, one after the other (excepting control-flow constructs, of course). And naturally, we can assume that every statement is evaluated in the context of all previously executed statements and that those previously executed statements may alter the environment and the behavior of the currently executing statement.
All of that is pretty much a foreign concept in a functional, lazy language like Haskell. The order in which computations are evaluated in Haskell is well-defined, but sometimes hard to predict, and even harder to control. And for many kinds of problems, that's just fine. But other sorts of problems (e.g. IO) are hard to solve without some convenient way to establish an implicit order and context between the computations in your program.
As far as side-effects go, specifically, often they can be transformed (via a Monad) into simple state-passing, which is perfectly legal in a pure functional language. Some Monads don't seem to be of that nature, however. Monads such as the IO Monad or the ST monad literally perform side-effecting actions. There are many ways to think about this, but one way that I think about it is that just because my computations must exist in a world without side-effects, that doesn't mean the Monad must. As such, the Monad is free to establish a context for my computation to execute in that is based on side-effects defined by other computations.
Finally, I must disclaim that I am definitely not a Haskell expert. As such, please understand that everything I've said is pretty much my own thoughts on this subject and I may very well disown them later when I understand Monads more fully.
the point is so there can be clean error handling in a chain of functions, containers, and side effects
More or less.
how exactly is the problem of side-effects solved?
A value in the I/O monad, i.e. one of type IO a, should be interpreted as a program. p >> q on IO values can then be interpreted as the operator that combines two programs into one that first executes p, then q. The other monad operators have similar interpretations. By assigning a program to the name main, you declare to the compiler that that is the program that has to be executed by its output object code.
As for the list monad, it's not really related to the I/O monad except in a very abstract mathematical sense. The IO monad gives deterministic computation with side effects, while the list monad gives non-deterministic (but not random!) backtracking search, somewhat similar to Prolog's modus operandi.
With this concept of containers, the language essentially says anything inside the containers is non-deterministic
No. Haskell is deterministic. If you ask for integer addition 2+2 you will always get 4.
"Nondeterministic" is only a metaphor, a way of thinking. Everything is deterministic under the hood. If you have this code:
do x <- [4,5]
   y <- [0,1]
   return (x+y)
it is roughly equivalent to Python code
l = []
for x in [4,5]:
    for y in [0,1]:
        l.append(x+y)
You see nondeterminism here? No, it's deterministic construction of a list. Run it twice, you'll get the same numbers in the same order.
You can describe it this way: Choose arbitrary x from [4,5]. Choose arbitrary y from [0,1]. Return x+y. Collect all possible results.
That way seems to involve nondeterminism, but it's only a nested loop (list comprehension). There is no "real" nondeterminism here, it's simulated by checking all possibilities. Nondeterminism is an illusion. The code only appears to be nondeterministic.
This code using State monad:
do put 0
   x <- get
   put (x+2)
   y <- get
   return (y+3)
gives 5 and seems to involve changing state. As with lists it's an illusion. There are no "variables" that change (as in imperative languages). Everything is immutable under the hood.
You can describe the code this way: put 0 to a variable. Read the value of a variable to x. Put (x+2) to the variable. Read the variable to y, and return y+3.
That way seems to involve state, but it's only composing functions passing additional parameter. There is no "real" mutability here, it's simulated by composition. Mutability is an illusion. The code only appears to be using it.
Haskell does it this way: you've got functions
a -> s -> (b,s)
This function takes an old value of the state and returns a new one. It does not involve mutability or changing variables. It's a function in the mathematical sense.
For example the function put takes a new value of state, ignores the current state and returns the new state:
put x _ = ((), x)
Just like you can compose two normal functions
a -> b
b -> c
into
a -> c
using the (.) operator, you can compose "state" transformers
a -> s -> (b,s)
b -> s -> (c,s)
into a single function
a -> s -> (c,s)
Try writing the composition operator yourself. This is what really happens: there are no "side effects", only passing of arguments to functions.
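Here is one possible answer to that exercise (a sketch; the operator name composeState is mine):
-- Compose two state transformers, threading the state through:
composeState :: (a -> s -> (b, s)) -> (b -> s -> (c, s)) -> (a -> s -> (c, s))
composeState f g = \a s -> let (b, s') = f a s
                           in g b s'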
From what I understand, Monad is just another typeclass that declares ways to interact with data [...]
...providing an interface common to all those types which have an instance. This can then be used to provide generic definitions which work across all monadic types.
It seems clever and clean to implement these 3 things with one concept [...]
...the only three things that are implemented are the instances for those three types (list, Maybe and IO) - the types themselves are defined independently elsewhere.
[...] but really, the point is so there can be clean error handling in a chain of functions, containers, and side effects.
Not just error handling e.g. consider ST - without the monadic interface, you would have to pass the encapsulated-state directly and correctly...a tiresome task.
How exactly is the problem of side-effects solved?
Short answer: Haskell manages them by using types to indicate their presence.
Can someone explain how, intuitively, Haskell gets away with changing state with inputs and output?
"Intuitively"...like what's available over here? Let's try a simple direct comparison instead:
From How to Declare an Imperative by Philip Wadler:
(* page 26 *)
type 'a io = unit -> 'a
infix >>=
val >>= : 'a io * ('a -> 'b io) -> 'b io
fun m >>= k = fn () => let
                         val x = m ()
                         val y = k x ()
                       in
                         y
                       end
val return : 'a -> 'a io
fun return x = fn () => x
val putc : char -> unit io
fun putc c = fn () => putcML c
val getc : char io
val getc = fn () => getcML ()
fun getcML () =
    valOf(TextIO.input1(TextIO.stdIn))
(* page 25 *)
fun putcML c =
    TextIO.output1(TextIO.stdOut,c)
Based on these two answers of mine, this is my Haskell translation:
type IO a = OI -> a
(>>=) :: IO a -> (a -> IO b) -> IO b
m >>= k = \ u -> let !(u1, u2) = partOI u in
                 let !x = m u1 in
                 let !y = k x u2 in
                 y
return :: a -> IO a
return x = \ u -> let !_ = partOI u in x
putc :: Char -> IO ()
putc c = \ u -> putcOI c u
getc :: IO Char
getc = \ u -> getcOI u
-- primitives
data OI
partOI :: OI -> (OI, OI)
putcOI :: Char -> OI -> ()
getcOI :: OI -> Char
Now remember that short answer about side-effects?
Haskell manages them by using types to indicate their presence.
Data.Char.chr :: Int -> Char -- no side effects
getChar :: IO Char -- side effects at
{- :: OI -> Char -} -- work: beware!
