Correct terminology for continuations - haskell

I've been poking around continuations recently, and I got confused about the correct terminology. Here Gabriel Gonzalez says:
A Haskell continuation has the following type:
newtype Cont r a = Cont { runCont :: (a -> r) -> r }
i.e. the whole (a -> r) -> r thing is the continuation (sans the wrapping)
The Wikipedia article seems to support this idea by saying
a continuation is an abstract representation of the control state
of a computer program.
However, here the authors say that
Continuations are functions that represent "the remaining computation to do."
but that would only be the (a -> r) part of the Cont type. And this is in line with what Eugene Ching says here:
a computation (a function) that requires a continuation function in order
to fully evaluate.
We’re going to be seeing this kind of function a lot, hence, we’ll give it
a more intuitive name. Let’s call them waiting functions.
I've seen another tutorial (Brian Beckman and Erik Meijer) where they call the whole thing (the waiting function) the observable and the function which is required for it to complete the observer.
What is the continuation: the (a -> r) -> r thingy, or just the (a -> r) thing (sans the wrapping)?
Is the wording observable/observer about correct?
Are the citations above really contradictory, is there a common truth?

What is the continuation: the (a -> r) -> r thingy, or just the (a -> r) thing (sans the wrapping)?
I would say that the a -> r bit is the continuation, and that (a -> r) -> r is "in continuation-passing style" or "the type of the continuation monad".
I am going to go off on a long digression on the history of continuations which is not really relevant to the question...so be warned.
It is my belief that the first published paper on continuations was "Continuations: A Mathematical Semantics for Handling Full Jumps" by Strachey and Wadsworth (although the concept was already folklore). The idea of that paper is I think a pretty important one. Early semantics for imperative programs attempted to model commands as state transformer functions. For example, consider the simple imperative language given by the following BNF
Command := set <Expression> to <Expression>
| skip
| <Command> ; <Command>
Expression := !<Expression>
| <Number>
| <Expression> + <Expression>
Here we use expressions as pointers. The simplest denotational function interprets the state as functions from natural numbers to natural numbers:
S = N -> N
We can interpret expressions as functions from state to the natural numbers
E[[e : Expression]] : S -> N
and commands as state transducers.
C[[c : Command]] : S -> S
This denotational semantics can be spelled out rather simply:
E[[n : Number]](s) = n
E[[a + b]](s) = E[[a]](s) + E[[b]](s)
E[[!e]](s) = s(E[[e]](s))
C[[skip]](s) = s
C[[set a to b]](s) = \n -> if n = E[[a]](s) then E[[b]](s) else s(n)
C[[c_1 ; c_2]](s) = (C[[c_2]] . C[[c_1]])(s)
A simple program in this language might look like
set 0 to 1;
set 1 to (!0) + 1
which would be interpreted as a function that turns a state function s into a new function that is just like s, except that it maps 0 to 1 and 1 to 2.
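For concreteness, here is a minimal Haskell sketch of these semantic functions; the syntax-tree types (Expression, Command) and constructor names are my own invention for illustration:

type S = Integer -> Integer            -- S = N -> N

data Expression = Lit Integer          -- <Number>
                | Deref Expression     -- !<Expression>
                | Add Expression Expression

data Command = Set Expression Expression
             | Skip
             | Seq Command Command

evalE :: Expression -> S -> Integer    -- E[[e]] : S -> N
evalE (Lit n)   _ = n
evalE (Add a b) s = evalE a s + evalE b s
evalE (Deref e) s = s (evalE e s)

evalC :: Command -> S -> S             -- C[[c]] : S -> S
evalC Skip        s = s
evalC (Set a b)   s = \n -> if n == evalE a s then evalE b s else s n
evalC (Seq c1 c2) s = (evalC c2 . evalC c1) s

Running evalC (Seq (Set (Lit 0) (Lit 1)) (Set (Lit 1) (Add (Deref (Lit 0)) (Lit 1)))) (const 0) produces a state that maps 0 to 1 and 1 to 2, as described.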
This was all well and good, but how do you handle branching? Well, if you think about it a lot you can probably come up with a way to handle if and loops that go an exact number of times...but what about general while loops?
Strachey and Wadsworth showed us how to do it. First of all, they pointed out that these "state transducer functions" were pretty important and so decided to call them "command continuations" or just "continuations."
C = S -> S
From this they defined a new semantics, which we will provisionally define this way
C'[[c : Command]] : C -> C
C'[[c]](cont) = cont . C[[c]]
What is going on here? Well, observe that
C'[[c_1]](C[[c_2]]) = C[[c_1 ; c_2]]
and further
C'[[c_1]](C'[[c_2]](cont)) = C'[[c_1 ; c_2]](cont)
Instead of doing it this way, we can inline the definition
C'[[skip]](cont) = cont
C'[[set a to b]](cont) = cont . \s -> \n -> if n = E[[a]](s) then E[[b]](s) else s(n)
C'[[c_1 ; c_2]](cont) = C'[[c_1]](C'[[c_2]](cont))
What has this bought us? Well, a way to interpret while, that's what!
Command := ... | while <Expression> do <Command> end
C'[[while e do c end]](cont) =
    let loop = \s -> if E[[e]](s) = 0 then C'[[c]](loop)(s) else cont(s)
    in loop
or, using a fixpoint combinator
C'[[while e do c end]](cont)
= Y (\f -> \s -> if E[[e]](s) = 0 then C'[[c]](f)(s) else cont(s))
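Extending the sketch above (with a constructor While Expression Command added to Command), a hedged Haskell rendering of this continuation semantics, including while, might look like this; the looping condition mirrors the = 0 test used in the equations:

type C = S -> S                        -- command continuations

evalC' :: Command -> C -> C
evalC' Skip        cont = cont
evalC' (Set a b)   cont = cont . (\s n -> if n == evalE a s then evalE b s else s n)
evalC' (Seq c1 c2) cont = evalC' c1 (evalC' c2 cont)
evalC' (While e c) cont = loop
  where loop s = if evalE e s == 0
                   then evalC' c loop s
                   else cont s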
Anyways...that is history and not particularly important...except in so far as it showed how to interpret programs mathematically, and set the language of "continuation."
Also, the approach to denotational semantics of "1. define a new semantic function in terms of the old 2. inline 3. profit" works surprisingly often. For example, it is often useful to have your semantic domain form a lattice (think, abstract interpretation). How do you get that? Well, one option is to take the powerset of the domain, and inject into this by interpreting your functions as singletons. If you inline this powerset construction you get something that can either model non-determinism or, in the case of abstract interpretation, various amounts of information about a program other than exact certainty as to what it does.
Various other work followed. Here I skip over many greats such as the lambda papers... But, perhaps the most notable was Griffin's landmark paper "A Formulae-as-Types Notion of Control" which showed a connection between continuation passing style and classical logic. Here the connection between "continuation" and "Evaluation context" is emphasized
That is, E represents the rest of the computation that remains to be done after N is evaluated. The context E is called the continuation (or control context) of N at this point in the evaluation sequence. The notation of evaluation contexts allows, as we shall see below, a concise specification of the operational semantics of operators that manipulate continuations (indeed, this was its intended use [3, 2, 4, 1]).
making clear that the "continuation" is "just the a -> r bit"
This all looks at things from the point of view of semantics and sees continuations as functions. The thing is, continuations as functions give you more power than you get with something like Scheme's callCC. So, another perspective on continuations is that they are variables in the program which internalize the call stack. Parigot had the idea to make continuation variables a separate syntactic category, leading to the elegant lambda-mu calculus in "λμ-Calculus: An algorithmic interpretation of classical natural deduction."
Is the wording observable/observer about correct?
I think it is, in so far as it is what Erik Meijer uses. It is non-standard terminology in academic PLs.
Are the citations above really contradictory, is there a common truth?
Let us look at the citations again
a continuation is an abstract representation of the control state of a computer program.
In my interpretation (which I think is pretty standard) a continuation models what a program should do next. I think wikipedia is consistent with this.
A Haskell continuation has the following type:
This is a bit odd. But, note that later in the post Gabriel uses language which is more standard and supports my use of language.
That means that if we have a function with two continuations:
(a1 -> r) -> ((a2 -> r) -> r)

Fueled by reading about continuations via Andrzej Filinski's Declarative Continuations and Categorical Duality I adopt the following terminology and understanding.
A continuation on values of a is a "black hole which accepts values of a". You can see it as a black box with one operation—you feed it a value a and then the world ends. Locally at least.
Now let's assume we're in Haskell and I demand that you construct for me a function forall r . (a -> r) -> r. Let's say, for now, that a ~ Int and it'll look like
f :: forall r . (Int -> r) -> r
f cont = _
where the type hole has a context like
r :: Type
cont :: Int -> r
-----------------
_ :: r
Clearly, the only way we can comply with these demands is to pass an Int into the cont function and return it after which no further computation can happen. This models the idea of "feed an Int to the continuation and then the world ends".
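For example, one arbitrary way to fill the hole (assuming the ExplicitForAll extension, for the explicit forall) is simply:

{-# LANGUAGE ExplicitForAll #-}

f :: forall r . (Int -> r) -> r
f cont = cont 42   -- feed some Int to the continuation; with r unknown, nothing else is possible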
So, I would call the function (a -> r) the continuation so long as it's in a context with a fixed-but-unknown r and a demand to return that r. For instance, the following is not so much of a continuation
forall r . (a -> r) -> (r, a)
as we're clearly allowed to pass back out more information from our failing universe than the continuation alone allows.
On "Observable"
I'm personally not a fan of the "observer"/"observable" terminology. In that terminology we might write
newtype Observable a = O { observe :: forall r . (a -> r) -> r }
so that we have observe :: Observable a -> (a -> r) -> r which ensures that exactly one a will be passed to an "observer" a -> r "observing" it. This gives a very operational view to the type above while Cont or even the scarily named Yoneda Identity explains much more declaratively what the type actually is.
I think the point is to somehow hide the complexity of Cont behind metaphor to make it less scary for "the average programmer", but that just adds an extra layer of metaphor for behavior to leak out of. Cont and Yoneda Identity explain exactly what the type is without dressing it up.

I suggest recalling the calling convention for C on x86 platforms, because of its use of the stack and registers to pass arguments around. This will turn out to be very useful for understanding the abstraction.
Suppose function f calls function g and passes 0 to it. This will look like so:
    mov eax, 0
    call g       -- now eax is the first argument,
                 -- and the stack has the address of the return point, f'

g:               -- here goes g, which uses eax to compute the return value
    mov eax, 1   -- which by calling convention is placed in eax
    ret          -- get the return point, f', off the stack, and jump there

f': ...
You see, placing the return point f' on the stack is the same as passing a function pointer as one of the arguments, and then returning is the same as calling that given function and passing it a value. So from g's point of view the return point into f looks like a function of one argument, f' :: a -> r. Note that the state of the stack completely captures the state of the computation f was performing, which needs an a from g in order to proceed.
At the same time, at the point g is called it looks like a function that accepts a function of one argument (we place the pointer of that function on the stack), which will eventually compute the value of type r that the code from f': onwards was meant to compute, so the type becomes g :: (a->r)->r.
Since f' is given a value of type a from "somewhere", f' can be seen as the observer of g - which is, conversely, the observable.
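A rough Haskell transliteration of that picture, with the concrete values made up for illustration (g hands its result to its return point, exactly as ret hands control back to f'):

g :: (Int -> r) -> r
g ret = ret 1                      -- "mov eax, 1; ret": pass the result to the return point

f' :: Int -> String
f' x = "g returned " ++ show x     -- the code at label f', waiting for g's result

main :: IO ()
main = putStrLn (g f')             -- "call g" with f' as its return point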
This is only intended to give a basic idea and tie it to a world you probably already know. The magic of continuations permits many more tricks than just converting a "plain" computation into a computation with continuations.

When we refer to a continuation, we mean the part that lets us continue calculating a result.
An operation in the Continuation monad is analogous to a function that is incomplete and is waiting on another function to complete it. At the same time, a Continuation monad value is itself a valid construct that can be used to complete another Continuation monad value; that is what the binding operator (>>=) for the Cont monad does.
When writing code that involves callCC or Call with Current Continuation, you are passing the current Cont Monad into another Cont Monad so that the second one can make use of it. For example, it might prematurely end execution by calling the first Cont Monad, and from there the cycle can either repeat or diverge into a different Continuation Monad.
The part that is the continuation is different from which perspective you use. In my personal opinion, the best way to describe a continuation is in relation to another construct.
So if we return to our example of two Cont Monads interacting, from the perspective of the first Monad the continuation is the (a -> r) -> r (because that is the unwrapped type of the first Monad) and from the perspective of the second Monad the continuation is the (a -> r) (because that is the unwrapped type of the first monad when a is substituted for (a -> r)).
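As a small, hedged illustration of the "premature exit" described above, here is callCC from Control.Monad.Cont used to bail out of a computation early (the function name safeDiv and the values are made up):

import Control.Monad (when)
import Control.Monad.Cont

safeDiv :: Int -> Int -> Cont r (Maybe Int)
safeDiv x y = callCC $ \exit -> do
  when (y == 0) $ exit Nothing       -- jump straight out with Nothing
  return (Just (x `div` y))          -- otherwise continue normally

main :: IO ()
main = do
  print (runCont (safeDiv 10 2) id)  -- Just 5
  print (runCont (safeDiv 10 0) id)  -- Nothing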

Related

What does the do notation in Monad mean in Haskell?

Consider the following code segment
import Control.Monad.State
type Stack = [Int]
pop :: State Stack Int
pop = state $ \(x:xs) -> (x, xs)
push :: Int -> State Stack ()
push a = state $ \xs -> ((), a:xs)
stackManip :: State Stack Int
stackManip = do
  push 3
  a <- pop
  pop
As we know, the do notation in Monad is the same as the >>= operator. We can rewrite this segment to:
push 3 >>= (\_ ->
  pop >>= (\a ->
    pop))
The final value of this expression seems unrelated to push 3 and the first pop: no matter what the input is, it appears to just return the last pop, so it seemingly never pushes the value 3 onto the stack and pops it. Why does this happen?
Thanks for your reply. I have added the missing code (the implementation of Stack, push and pop) above, and I think I have figured out how it works. The key to understanding this code is understanding the implementation of the State s monad:
instance Monad (State s) where
  return x = State $ \s -> (x, s)
  (State h) >>= f = State $ \s -> let (a, newState) = h s
                                      (State g)     = f a
                                  in g newState
The whole job of >>= is to pass a state to the function h, compute the new state, and pass that new state on to the function g hidden inside f, producing a newer state still. So the state actually changes, but implicitly.
Because of the Monad Laws, your code is equivalent to
stackManip :: State Stack Int
stackManip = do
  push 3
  a <- pop   -- a == 3
  r <- pop
  return r
So you push 3, pop it, ignore the popped 3, pop another value and return it.
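Concretely, assuming the Stack, push and pop definitions from the question are in scope, a small usage sketch:

main :: IO ()
main = print (runState stackManip [5,8,2,1])
-- prints (5,[8,2,1]):
--   push 3        turns [5,8,2,1] into [3,5,8,2,1]
--   the first pop yields 3 (bound to a, then ignored), leaving [5,8,2,1]
--   the second pop yields 5, leaving [8,2,1]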
Haskell is just another programming language. You the programmer are in control. Whether the compiler skips the inconsequential instructions is up to it, and is unobservable anyway (except by examining the compiler-produced code, or measuring the heat of the CPU as it performs your code, but it might be a bit hard to do in the server farm beyond the Polar Circle).
Monads in Haskell are sometimes referred to as "programmable semicolons". It's not a phrase I find particularly helpful in general, but it does capture the way that expressions written with Haskell's do notation have something of the flavour of imperative programs. And in particular that the way the "statements" in a do block get combined is dependent on the particular monad being used. Hence "programmable semicolons" - the way successive "statements" (which in many imperative languages are separated by semicolons) combine together can be changed ("programmed") by using a different monad.
And since do notation is really just syntactic sugar for building up an expression from others using the >>= operator, it's the implementation of >>= for each monad that determines what its "special behaviour" is.
For example, the Monad instance for Maybe allows one, as a rough description, to work with Maybe values as if they are actually values of the underlying type, while ensuring that if a non-value (that is, Nothing) occurs at any point, the computation short-circuits and Nothing will be the overall result.
For the list monad, every line actually gets "executed" multiple times (or none) - once for each element in the list.
And for values of the State s monad, these are essentially "state manipulation functions" of type s -> (a, s) - they take an initial state, and from that compute a new state as well as an output value of some type a. What the >>= implementation - the "semicolon" - does here* is simply ensure that, when one function f :: s -> (a, s) is followed by another g :: s -> (b, s), that the resulting function applies f to the initial state and then applies g to the state computed from f. It's basically just function composition, slightly modified so as to also allow us to access an "output value" whose type is not necessarily related to that of the state. And this allows one to list various state manipulation functions one after another in a do block and know that the state at each stage is exactly that computed by the previous lines put together. This in turn allows a very natural programming style where you give successive "commands" for manipulating the state, yet without actually doing destructive updates, or otherwise departing from the world of pure functions and immutable data.
*strictly speaking, this isn't >>= but >>, an operation which is derived from >>= but ignores the output value. You may have noticed that in the example I gave the a value output by f is totally ignored - but >>= allows that value to be inspected and to determine which computation to do next. In do notation, this means writing a <- f and then using a later. This is actually the key thing which distinguishes Monads from their less powerful, but still vital, cousins (notably Applicative functors).

Why do we need monads?

In my humble opinion the answers to the famous question "What is a monad?", especially the most voted ones, try to explain what a monad is without clearly explaining why monads are really necessary. Can they be explained as the solution to a problem?
Why do we need monads?
We want to program only using functions. ("functional programming (FP)" after all).
Then, we have a first big problem. This is a program:
f(x) = 2 * x
g(x,y) = x / y
How can we say what is to be executed first? How can we form an ordered sequence of functions (i.e. a program) using no more than functions?
Solution: compose functions. If you want first g and then f, just write f(g(x,y)). This way, "the program" is a function as well: main = f(g(x,y)). OK, but ...
More problems: some functions might fail (e.g. g(2,0), divide by 0). We have no "exceptions" in FP (an exception is not a function). How do we solve it?
Solution: Let's allow functions to return two kinds of things: instead of having g : Real,Real -> Real (function from two reals into a real), let's allow g : Real,Real -> Real | Nothing (function from two reals into (real or nothing)).
But functions should (to be simpler) return only one thing.
Solution: let's create a new type of data to be returned, a "boxing type" that encloses either a real or simply nothing. Hence, we can have g : Real,Real -> Maybe Real. OK, but ...
What happens now to f(g(x,y))? f is not ready to consume a Maybe Real. And, we don't want to change every function we could connect with g to consume a Maybe Real.
Solution: let's have a special function to "connect"/"compose"/"link" functions. That way, we can, behind the scenes, adapt the output of one function to feed the following one.
In our case: g >>= f (connect/compose g to f). We want >>= to get g's output, inspect it and, in case it is Nothing just don't call f and return Nothing; or on the contrary, extract the boxed Real and feed f with it. (This algorithm is just the implementation of >>= for the Maybe type). Also note that >>= must be written only once per "boxing type" (different box, different adapting algorithm).
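A sketch of that adapting algorithm, written as a standalone function so it doesn't clash with the Prelude (the real Monad instance for Maybe does essentially this):

bindMaybe :: Maybe a -> (a -> Maybe b) -> Maybe b
bindMaybe Nothing  _ = Nothing   -- g returned Nothing: don't call f, just propagate Nothing
bindMaybe (Just x) f = f x       -- g returned a boxed value: unbox it and feed it to f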
Many other problems arise which can be solved using this same pattern: 1. Use a "box" to codify/store different meanings/values, and have functions like g that return those "boxed values". 2. Have a composer/linker g >>= f to help connecting g's output to f's input, so we don't have to change any f at all.
Remarkable problems that can be solved using this technique are:
having a global state that every function in the sequence of functions ("the program") can share: solution StateMonad.
We don't like "impure functions": functions that yield different output for the same input. Therefore, let's mark those functions, making them return a tagged/boxed value: the IO monad.
Total happiness!
The answer is, of course, "We don't". As with all abstractions, it isn't necessary.
Haskell does not need a monad abstraction. It isn't necessary for performing IO in a pure language. The IO type takes care of that just fine by itself. The existing monadic desugaring of do blocks could be replaced with desugaring to bindIO, returnIO, and failIO as defined in the GHC.Base module. (It's not a documented module on hackage, so I'll have to point at its source for documentation.) So no, there's no need for the monad abstraction.
So if it's not needed, why does it exist? Because it was found that many patterns of computation form monadic structures. Abstraction of a structure allows for writing code that works across all instances of that structure. To put it more concisely - code reuse.
In functional languages, the most powerful tool found for code reuse has been composition of functions. The good old (.) :: (b -> c) -> (a -> b) -> (a -> c) operator is exceedingly powerful. It makes it easy to write tiny functions and glue them together with minimal syntactic or semantic overhead.
But there are cases when the types don't work out quite right. What do you do when you have foo :: (b -> Maybe c) and bar :: (a -> Maybe b)? foo . bar doesn't typecheck, because b and Maybe b aren't the same type.
But... it's almost right. You just want a bit of leeway. You want to be able to treat Maybe b as if it were basically b. It's a poor idea to just flat-out treat them as the same type, though. That's more or less the same thing as null pointers, which Tony Hoare famously called the billion-dollar mistake. So if you can't treat them as the same type, maybe you can find a way to extend the composition mechanism (.) provides.
In that case, it's important to really examine the theory underlying (.). Fortunately, someone has already done this for us. It turns out that the combination of (.) and id forms a mathematical construct known as a category. But there are other ways to form categories. A Kleisli category, for instance, allows the arrows being composed to be augmented a bit. A Kleisli category for Maybe would consist of (.) :: (b -> Maybe c) -> (a -> Maybe b) -> (a -> Maybe c) and id :: a -> Maybe a. That is, the arrows in the category augment the (->) with a Maybe, so (a -> b) becomes (a -> Maybe b).
And suddenly, we've extended the power of composition to things that the traditional (.) operation doesn't work on. This is a source of new abstraction power. Kleisli categories work with more types than just Maybe. They work with every type that can assemble a proper category, obeying the category laws.
Left identity: id . f = f
Right identity: f . id = f
Associativity: f . (g . h) = (f . g) . h
As long as you can prove that your type obeys those three laws, you can turn it into a Kleisli category. And what's the big deal about that? Well, it turns out that monads are exactly the same thing as Kleisli categories. Monad's return is the same as Kleisli id. Monad's (>>=) isn't identical to Kleisli (.), but it turns out to be very easy to write each in terms of the other. And the category laws are the same as the monad laws, when you translate them across the difference between (>>=) and (.).
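As a small sketch of that correspondence for Maybe (foo and bar are hypothetical functions of the shapes mentioned above; the standard library already provides this composition as (<=<) in Control.Monad):

composeK :: (b -> Maybe c) -> (a -> Maybe b) -> (a -> Maybe c)
composeK f g = \a -> g a >>= f      -- Kleisli (.), written with Monad's (>>=)

idK :: a -> Maybe a
idK = return                        -- Kleisli id is Monad's return

foo :: Int -> Maybe String          -- hypothetical
foo n = if n > 0 then Just (show n) else Nothing

bar :: String -> Maybe Int          -- hypothetical
bar s = if null s then Nothing else Just (length s)

fooAfterBar :: String -> Maybe String
fooAfterBar = foo `composeK` bar    -- the "foo . bar" we wanted, now well-typed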
So why go through all this bother? Why have a Monad abstraction in the language? As I alluded to above, it enables code reuse. It even enables code reuse along two different dimensions.
The first dimension of code reuse comes directly from the presence of the abstraction. You can write code that works across all instances of the abstraction. There's the entire monad-loops package consisting of loops that work with any instance of Monad.
The second dimension is indirect, but it follows from the existence of composition. When composition is easy, it's natural to write code in small, reusable chunks. This is the same way having the (.) operator for functions encourages writing small, reusable functions.
So why does the abstraction exist? Because it's proven to be a tool that enables more composition in code, resulting in creating reusable code and encouraging the creation of more reusable code. Code reuse is one of the holy grails of programming. The monad abstraction exists because it moves us a little bit towards that holy grail.
Benjamin Pierce said in TAPL
A type system can be regarded as calculating a kind of static
approximation to the run-time behaviours of the terms in a program.
That's why a language equipped with a powerful type system is strictly more expressive than a poorly typed language. You can think about monads in the same way.
As #Carl and sigfpe point out, you can equip a datatype with all the operations you want without resorting to monads, typeclasses or whatever other abstract stuff. However, monads allow you not only to write reusable code, but also to abstract away all redundant details.
As an example, let's say we want to filter a list. The simplest way is to use the filter function: filter (> 3) [1..10], which equals [4,5,6,7,8,9,10].
A slightly more complicated version of filter, that also passes an accumulator from left to right, is
import Data.List (mapAccumL)

swap (x, y) = (y, x)
(.*) = (.) . (.)

filterAccum :: (a -> b -> (Bool, a)) -> a -> [b] -> [b]
filterAccum f a xs = [x | (x, True) <- zip xs $ snd $ mapAccumL (swap .* f) a xs]
To get all i, such that i <= 10, sum [1..i] > 4, sum [1..i] < 25, we can write
filterAccum (\a x -> let a' = a + x in (a' > 4 && a' < 25, a')) 0 [1..10]
which equals [3,4,5,6].
Or we can redefine the nub function, that removes duplicate elements from a list, in terms of filterAccum:
nub' = filterAccum (\a x -> (x `notElem` a, x:a)) []
nub' [1,2,4,5,4,3,1,8,9,4] equals [1,2,4,5,3,8,9]. A list is passed as an accumulator here. The code works because it's possible to leave the list monad, so the whole computation stays pure (notElem doesn't use >>= actually, but it could). However, it's not possible to safely leave the IO monad (i.e. you cannot execute an IO action and return a pure value; the value will always be wrapped in the IO monad). Another example is mutable arrays: after you have left the ST monad, where a mutable array lives, you cannot update the array in constant time anymore. So we need the monadic filtering from the Control.Monad module:
filterM :: (Monad m) => (a -> m Bool) -> [a] -> m [a]
filterM _ [] = return []
filterM p (x:xs) = do
  flg <- p x
  ys <- filterM p xs
  return (if flg then x:ys else ys)
filterM executes a monadic action for all elements from a list, yielding elements, for which the monadic action returns True.
A filtering example with an array:
import Control.Monad.ST
import Data.Array.ST

nub' xs = runST $ do
  arr <- newArray (1, 9) True :: ST s (STUArray s Int Bool)
  let p i = readArray arr i <* writeArray arr i False
  filterM p xs

main = print $ nub' [1,2,4,5,4,3,1,8,9,4]
prints [1,2,4,5,3,8,9] as expected.
And a version with the IO monad, which asks what elements to return:
main = filterM p [1,2,4,5] >>= print where
  p i = putStrLn ("return " ++ show i ++ "?") *> readLn
E.g.
return 1? -- output
True -- input
return 2?
False
return 4?
False
return 5?
True
[1,5] -- output
And as a final illustration, filterAccum can be defined in terms of filterM:
filterAccum f a xs = evalState (filterM (state . flip f) xs) a
with the StateT monad, which is used under the hood, being just an ordinary datatype.
This example illustrates that monads not only allow you to abstract away the computational context and write clean reusable code (due to the composability of monads, as #Carl explains), but also let you treat user-defined datatypes and built-in primitives uniformly.
I don't think IO should be seen as a particularly outstanding monad, but it's certainly one of the more astounding ones for beginners, so I'll use it for my explanation.
Naïvely building an IO system for Haskell
The simplest conceivable IO system for a purely-functional language (and in fact the one Haskell started out with) is this:
main₀ :: String -> String
main₀ _ = "Hello World"
With laziness, that simple signature is enough to actually build interactive terminal programs – very limited, though. Most frustrating is that we can only output text. What if we added some more exciting output possibilities?
data Output = TxtOutput String
            | Beep Frequency

main₁ :: String -> [Output]
main₁ _ = [ TxtOutput "Hello World"
          -- , Beep 440 -- for debugging
          ]
Cute, but of course a much more realistic “alternative output” would be writing to a file. But then you'd also want some way to read from files. Any chance?
Well, when we take our main₁ program and simply pipe a file to the process (using operating system facilities), we have essentially implemented file-reading. If we could trigger that file-reading from within the Haskell language...
readFile :: FilePath -> (String -> [Output]) -> [Output]
This would use an “interactive program” String->[Output], feed it a string obtained from a file, and yield a non-interactive program that simply executes the given one.
There's one problem here: we don't really have a notion of when the file is read. The [Output] list sure gives a nice order to the outputs, but we don't get an order for when the inputs will be done.
Solution: make input-events also items in the list of things to do.
data IO₀ = TxtOut String
         | TxtIn (String -> [Output])
         | FileWrite FilePath String
         | FileRead FilePath (String -> [Output])
         | Beep Double

main₂ :: String -> [IO₀]
main₂ _ = [ FileRead "/dev/null" $ \_ ->
              [TxtOutput "Hello World"]
          ]
Ok, now you may spot an imbalance: you can read a file and make output dependent on it, but you can't use the file contents to decide to e.g. also read another file. Obvious solution: make the result of the input-events also something of type IO, not just Output. That sure includes simple text output, but also allows reading additional files etc..
data IO₁ = TxtOut String
         | TxtIn (String -> [IO₁])
         | FileWrite FilePath String
         | FileRead FilePath (String -> [IO₁])
         | Beep Double

main₃ :: String -> [IO₁]
main₃ _ = [ TxtIn $ \_ ->
              [TxtOut "Hello World"]
          ]
That would now actually allow you to express any file operation you might want in a program (though perhaps not with good performance), but it's somewhat overcomplicated:
main₃ yields a whole list of actions. Why don't we simply use the signature :: IO₁, which has this as a special case?
The lists don't really give a reliable overview of program flow anymore: most subsequent computations will only be “announced” as the result of some input operation. So we might as well ditch the list structure, and simply cons a “and then do” to each output operation.
data IO₂ = TxtOut String IO₂
         | TxtIn (String -> IO₂)
         | Terminate

main₄ :: IO₂
main₄ = TxtIn $ \_ ->
          TxtOut "Hello World"
            Terminate
Not too bad!
So what has all of this to do with monads?
In practice, you wouldn't want to use plain constructors to define all your programs. There would need to be a good couple of such fundamental constructors, yet for most higher-level stuff we would like to write a function with some nice high-level signature. It turns out most of these would look quite similar: accept some kind of meaningfully-typed value, and yield an IO action as the result.
getTime :: (UTCTime -> IO₂) -> IO₂
randomRIO :: Random r => (r,r) -> (r -> IO₂) -> IO₂
findFile :: RegEx -> (Maybe FilePath -> IO₂) -> IO₂
There's evidently a pattern here, and we'd better write it as
type IO₃ a = (a -> IO₂) -> IO₂  -- If this reminds you of continuation-passing
                                -- style, you're right.
getTime :: IO₃ UTCTime
randomRIO :: Random r => (r,r) -> IO₃ r
findFile :: RegEx -> IO₃ (Maybe FilePath)
Now that starts to look familiar, but we're still only dealing with thinly-disguised plain functions under the hood, and that's risky: each “value-action” has the responsibility of actually passing on the resulting action of any contained function (else the control flow of the entire program is easily disrupted by one ill-behaved action in the middle). We'd better make that requirement explicit. Well, it turns out those are the monad laws, though I'm not sure we can really formulate them without the standard bind/join operators.
At any rate, we've now reached a formulation of IO that has a proper monad instance:
import Control.Monad (ap)

data IO₄ a = TxtOut String (IO₄ a)
           | TxtIn (String -> IO₄ a)
           | TerminateWith a

txtOut :: String -> IO₄ ()
txtOut s = TxtOut s $ TerminateWith ()

txtIn :: IO₄ String
txtIn = TxtIn TerminateWith

instance Functor IO₄ where
  fmap f (TerminateWith a) = TerminateWith $ f a
  fmap f (TxtIn g)         = TxtIn $ fmap f . g
  fmap f (TxtOut s c)      = TxtOut s $ fmap f c

instance Applicative IO₄ where
  pure  = TerminateWith
  (<*>) = ap

instance Monad IO₄ where
  TerminateWith x >>= f = f x
  TxtOut s c      >>= f = TxtOut s $ c >>= f
  TxtIn g         >>= f = TxtIn $ (>>= f) . g
Obviously this is not an efficient implementation of IO, but it's in principle usable.
Monads serve basically to compose functions together in a chain. Period.
Now the way they compose differs across the existing monads, thus resulting in different behaviors (e.g., to simulate mutable state in the state monad).
The confusion about monads is that being so general, i.e., a mechanism to compose functions, they can be used for many things, thus leading people to believe that monads are about state, about IO, etc, when they are only about "composing functions".
Now, one interesting thing about monads is that the result of the composition is always of type "M a", that is, a value inside an envelope tagged with "M". This feature happens to be really nice for implementing, for example, a clear separation between pure and impure code: declare all impure actions as functions of type "IO a" and provide no function, when defining the IO monad, to take the "a" value out of the "IO a". The result is that no function can be pure and at the same time take a value out of an "IO a", because there is no way to take out such a value while staying pure (the function must be inside the "IO" monad to use such a value). (NOTE: well, nothing is perfect, so the "IO straitjacket" can be broken using "unsafePerformIO :: IO a -> a", thus polluting what was supposed to be a pure function, but this should be used very sparingly, and only when you really know you are not introducing any impure code with side effects.)
Monads are just a convenient framework for solving a class of recurring problems. First, monads must be functors (i.e. must support mapping without looking at the elements (or their type)); they must also bring a binding (or chaining) operation and a way to create a monadic value from an element type (return). Finally, bind and return must satisfy the monad laws: left and right identity, plus associativity of binding. (Alternatively one could define monads to have a flattening operation instead of binding.)
The list monad is commonly used to deal with non-determinism. The bind operation selects one element of the list (intuitively all of them in parallel worlds), lets the programmer do some computation with them, and then combines the results from all worlds into a single list (by concatenating, or flattening, a nested list). Here is how one would define a permutation function in the monadic framework of Haskell:
perm [e] = [[e]]
perm l = do (leader, index) <- zip l [0 :: Int ..]
            let shortened = take index l ++ drop (index + 1) l
            trailer <- perm shortened
            return (leader : trailer)
Here is an example repl session:
*Main> perm "a"
["a"]
*Main> perm "ab"
["ab","ba"]
*Main> perm ""
[]
*Main> perm "abc"
["abc","acb","bac","bca","cab","cba"]
It should be noted that the list monad is in no way a side effecting computation. A mathematical structure being a monad (i.e. conforming to the above mentioned interfaces and laws) does not imply side effects, though side-effecting phenomena often nicely fit into the monadic framework.
You need monads if you have a type constructor and functions that return values of that type family. Eventually, you would like to combine these kinds of functions together. These are the three key elements to answer why.
Let me elaborate. You have Int, String and Real and functions of type Int -> String, String -> Real and so on. You can combine these functions easily, ending with Int -> Real. Life is good.
Then, one day, you need to create a new family of types. It could be because you need to consider the possibility of returning no value (Maybe), returning an error (Either), multiple results (List) and so on.
Notice that Maybe is a type constructor. It takes a type, like Int and returns a new type Maybe Int. First thing to remember, no type constructor, no monad.
Of course, you want to use your type constructor in your code, and soon you end with functions like Int -> Maybe String and String -> Maybe Float. Now, you can't easily combine your functions. Life is not good anymore.
And here's when monads come to the rescue. They allow you to combine that kind of function again. You just need to swap ordinary composition (.) for its monadic counterpart, Kleisli composition (>=>).
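A minimal sketch of that swap; the functions parseAge and halfOf are hypothetical, invented to match the Int -> Maybe ... shapes described above, and (>=>) comes from Control.Monad:

import Control.Monad ((>=>))

parseAge :: String -> Maybe Int           -- hypothetical: may fail to parse
parseAge s = if not (null s) && all (`elem` ['0'..'9']) s
               then Just (read s)
               else Nothing

halfOf :: Int -> Maybe Int                -- hypothetical: fails on odd numbers
halfOf n = if even n then Just (n `div` 2) else Nothing

parseThenHalve :: String -> Maybe Int
parseThenHalve = parseAge >=> halfOf      -- (.) would not typecheck here; (>=>) does

main :: IO ()
main = mapM_ (print . parseThenHalve) ["42", "7", "x7"]
-- Just 21, Nothing, Nothing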
Why do we need monadic types?
Since it was the quandary of I/O and its observable effects in nonstrict languages like Haskell that brought the monadic interface to such prominence:
[...] monads are used to address the more general problem of computations (involving state, input/output, backtracking, ...) returning values: they do not solve any input/output-problems directly but rather provide an elegant and flexible abstraction of many solutions to related problems. [...] For instance, no less than three different input/output-schemes are used to solve these basic problems in Imperative functional programming, the paper which originally proposed `a new model, based on monads, for performing input/output in a non-strict, purely functional language'. [...]
[Such] input/output-schemes merely provide frameworks in which side-effecting operations can safely be used with a guaranteed order of execution and without affecting the properties of the purely functional parts of the language.
Claus Reinke (pages 96-97 of 210).
(emphasis by me.)
[...] When we write effectful code – monads or no monads – we have to constantly keep in mind the context of expressions we pass around.
The fact that monadic code ‘desugars’ (is implementable in terms of) side-effect-free code is irrelevant. When we use monadic notation, we program within that notation – without considering what this notation desugars into. Thinking of the desugared code breaks the monadic abstraction. A side-effect-free, applicative code is normally compiled to (that is, desugars into) C or machine code. If the desugaring argument has any force, it may be applied just as well to the applicative code, leading to the conclusion that it all boils down to the machine code and hence all programming is imperative.
[...] From the personal experience, I have noticed that the mistakes I make when writing monadic code are exactly the mistakes I made when programming in C. Actually, monadic mistakes tend to be worse, because monadic notation (compared to that of a typical imperative language) is ungainly and obscuring.
Oleg Kiselyov (page 21 of 26).
The most difficult construct for students to understand is the monad. I introduce IO without mentioning monads.
Olaf Chitil.
More generally:
Still, today, over 25 years after the introduction of the concept of monads to the world of functional programming, beginning functional programmers struggle to grasp the concept of monads. This struggle is exemplified by the numerous blog posts about the effort of trying to learn about monads. From our own experience we notice that even at university level, bachelor level students often struggle to comprehend monads and consistently score poorly on monad-related exam questions.
Considering that the concept of monads is not likely to disappear from the functional programming landscape any time soon, it is vital that we, as the functional programming community, somehow overcome the problems novices encounter when first studying monads.
Tim Steenvoorden, Jurriën Stutterheim, Erik Barendsen and Rinus Plasmeijer.
If only there were another way to specify "a guaranteed order of execution" in Haskell, while keeping the ability to separate regular Haskell definitions from those involved in I/O (and its observable effects) - translating this variation of Philip Wadler's echo:
val echoML : unit -> unit
fun echoML () = let val c = getcML () in
                  if c = #"\n" then
                    ()
                  else
                    let val _ = putcML c in
                      echoML ()
                    end
                end

fun putcML c = TextIO.output1(TextIO.stdOut,c);
fun getcML () = valOf(TextIO.input1(TextIO.stdIn));
...could then be as simple as:
echo :: OI -> ()
echo u = let !(u1:u2:u3:_) = partsOI u in
         let !c = getChar u1 in
         if c == '\n' then
           ()
         else
           let !_ = putChar c u2 in
           echo u3
where:
data OI -- abstract
foreign import ccall "primPartOI" partOI :: OI -> (OI, OI)
⋮
foreign import ccall "primGetCharOI" getChar :: OI -> Char
foreign import ccall "primPutCharOI" putChar :: Char -> OI -> ()
⋮
and:
partsOI :: OI -> [OI]
partsOI u = let !(u1, u2) = partOI u in u1 : partsOI u2
How would this work? At run-time, Main.main receives an initial OI pseudo-data value as an argument:
module Main(main) where
main :: OI -> ()
⋮
...from which other OI values are produced, using partOI or partsOI. All you have to do is ensure each new OI value is used at most once, in each call to an OI-based definition, foreign or otherwise. In return, you get back a plain ordinary result - it isn't e.g. paired with some odd abstract state, nor does it require the use of a callback continuation, etc.
Using OI, instead of the unit type () like Standard ML does, means we can avoid always having to use the monadic interface:
Once you're in the IO monad, you're stuck there forever, and are reduced to Algol-style imperative programming.
Robert Harper.
But if you really do need it:
type IO a = OI -> a

unitIO :: a -> IO a
unitIO x = \ u -> let !_ = partOI u in x

bindIO :: IO a -> (a -> IO b) -> IO b
bindIO m k = \ u -> let !(u1, u2) = partOI u in
                    let !x = m u1 in
                    let !y = k x u2 in
                    y
⋮
So, monadic types aren't always needed - there are other interfaces out there:
LML had a fully fledged implementation of oracles running of a multi-processor (a Sequent Symmetry) back in ca 1989. The description in the Fudgets thesis refers to this implementation. It was fairly pleasant to work with and quite practical.
[...]
These days everything is done with monads so other solutions are sometimes forgotten.
Lennart Augustsson (2006).
Wait a moment: since it so closely resembles Standard ML's direct use of effects, is this approach and its use of pseudo-data referentially transparent?
Absolutely - just find a suitable definition of "referential transparency"; there's plenty to choose from...

Haskell: Computation "in a monad" -- meaning?

In reading about monads, I keep seeing phrases like "computations in the Xyz monad". What does it mean for a computation to be "in" a certain monad?
I think I have a fair grasp on what monads are about: allowing computations to produce outputs that are usually of some expected type, but can alternatively or additionally convey some other information, such as error status, logging info, state and so on, and allow such computations to be chained.
But I don't get how a computation would be said to be "in" a monad. Does this just refer to a function that produces a monadic result?
Examples: (search "computation in")
http://www.haskell.org/tutorial/monads.html
http://www.haskell.org/haskellwiki/All_About_Monads
http://ertes.de/articles/monads.html
Generally, a "computation in a monad" means not just a function returning a monadic result, but such a function used inside a do block, or as part of the second argument to (>>=), or anything else equivalent to those. The distinction is relevant to something you said in a comment:
"Computation" occurs in func f, after val extracted from input monad, and before result is wrapped as monad. I don't see how the computation per se is "in" the monad; it seems conspicuously "out" of the monad.
This isn't a bad way to think about it--in fact, do notation encourages it because it's a convenient way to look at things--but it does result in a slightly misleading intuition. Nowhere is anything being "extracted" from a monad. To see why, forget about (>>=)--it's a composite operation that exists to support do notation. The more fundamental definition of a monad is three orthogonal functions:
fmap :: (a -> b) -> (m a -> m b)
return :: a -> m a
join :: m (m a) -> m a
...where m is a monad.
Now think about how to implement (>>=) with these: starting with arguments of type m a and a -> m b, your only option is using fmap to get something of type m (m b), after which you can use join to flatten the nested "layers" to get just m b.
In other words, nothing is being taken "out" of the monad--instead, think of the computation as going deeper into the monad, with successive steps being collapsed into a single layer of the monad.
Note that the monad laws are also much simpler from this perspective--essentially, they say that when join is applied doesn't matter as long as the nesting order is preserved (a form of associativity) and that the monadic layer introduced by return does nothing (an identity value for join).
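For concreteness, a one-line sketch of the derivation described above (this equivalence is standard; join lives in Control.Monad):

import Control.Monad (join)

bind :: Monad m => m a -> (a -> m b) -> m b
bind ma f = join (fmap f ma)   -- fmap gives m (m b); join collapses it to m b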
Does this just refer to a function that produces a monadic result?
Yes, in short.
In long, it's because Monad allows you to inject values into it (via return) but once inside the Monad they're stuck. You have to use some function like runWriter or runCont, which is strictly more specific than Monad, to get values back "out".
More than that, Monad (really, its partner, Applicative) is the essence of having a "container" and allowing computations to happen inside of it. That's what (>>=) gives you, the ability to do interesting computations "inside" the Monad.
So functions like Monad m => m a -> (a -> m b) -> m b let you compute with and around and inside a Monad. Functions like Monad m => a -> m a let you inject into the Monad. Functions like m a -> a would let you "escape" the Monad except they don't exist in general (only in specific). So, for conversation's sake we like to talk about functions that have result types like Monad m => m a as being "inside the monad".
Usually monad stuff is easier to grasp when starting with "collection-like" monads as examples. Imagine you calculate the distance between two points:
data Point = Point Double Double
distance :: Point -> Point -> Double
distance p1 p2 = undefined
Now you may have a certain context. E.g. one of the points may be "illegal" because it is out of some bounds (e.g. on the screen). So you wrap your existing computation in the Maybe monad:
distance :: Maybe Point -> Maybe Point -> Maybe Double
distance p1 p2 = undefined
You have exactly the same computation, but with the additional feature that there may be "no result" (encoded as Nothing).
Or you have two groups of "possible" points, and need their mutual distances (e.g. to later use the shortest connection). Then the list monad is your "context":
distance :: [Point] -> [Point] -> [Double]
distance p1 p2 = undefined
Or the points are entered by a user, which makes the calculation "nondeterministic" (in the sense that you depend on things in the outside world, which may change), then the IO monad is your friend:
distance :: IO Point -> IO Point -> IO Double
distance p1 p2 = undefined
The computation remains always the same, but happens to take place in a certain "context", which adds some useful aspects (failure, multi-value, nondeterminism). You can even combine these contexts (monad transformers).
You may write a definition that unifies the definitions above, and works for any monad:
distance :: Monad m => m Point -> m Point -> m Double
distance p1 p2 = do
  Point x1 y1 <- p1
  Point x2 y2 <- p2
  return $ sqrt ((x1-x2)^2 + (y1-y2)^2)
That proves that our computation is really independent of the actual monad, which leads to formulations such as "x is computed in(-side) the y monad".
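For instance, a hedged usage sketch reusing the generic definition above (in GHCi, with those definitions in scope):

-- Maybe context: fails if either point is missing
--   distance (Just (Point 0 0)) (Just (Point 3 4))   ==>  Just 5.0
--   distance Nothing (Just (Point 3 4))              ==>  Nothing
-- List context: all mutual distances
--   distance [Point 0 0, Point 1 1] [Point 3 4]      ==>  [5.0, sqrt 13]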
Looking at the links you provided, it seems that a common usage of "computation in" is with regards to a single monadic value. Excerpts:
Gentle introduction - here we run a computation in the SM monad, but the computation is the monadic value:
-- run a computation in the SM monad
runSM :: S -> SM a -> (a,S)
All about monads - previous computation refers to a monadic value in the sequence:
The >> function is a convenience operator that is used to bind a monadic computation that does not require input from the previous computation in the sequence
Understanding monads - here the first computation could refer to e.g. getLine, a monadic value :
(binding) gives an intrinsic idea of using the result of a computation in another computation, without requiring a notion of running computations.
So as an analogy, if I say i = 4 + 2, then i is the value 6, but it is equally a computation, namely the computation 4 + 2. It seems the linked pages uses computation in this sense - computation as a monadic value - at least some of the time, in which case it makes sense to use the expression "a computation in" the given monad.
Consider the IO monad. A value of type IO a is a description of a large (often infinite) number of behaviours where a behaviour is a sequence of IO events (reads, writes, etc). Such a value is called a "computation"; in this case it is a computation in the IO monad.

Why monads? How does it resolve side-effects?

I am learning Haskell and trying to understand Monads. I have two questions:
From what I understand, Monad is just another typeclass that declares ways to interact with data inside "containers", including Maybe, List, and IO. It seems clever and clean to implement these 3 things with one concept, but really, the point is so there can be clean error handling in a chain of functions, containers, and side effects. Is this a correct interpretation?
How exactly is the problem of side effects solved? With this concept of containers, the language essentially says anything inside the containers is non-deterministic (such as I/O). Because lists and IO are both containers, lists are equivalence-classed with IO, even though values inside lists seem pretty deterministic to me. So what is deterministic and what has side effects? I can't wrap my head around the idea that a basic value is deterministic, until you stick it in a container (which is no more special than the same value with some other values next to it, e.g. Nothing) and it can now be random.
Can someone explain how, intuitively, Haskell gets away with changing state with inputs and output? I'm not seeing the magic here.
The point is so there can be clean error handling in a chain of functions, containers, and side effects. Is this a correct interpretation?
Not really. You've mentioned a lot of concepts that people cite when trying to explain monads, including side effects, error handling and non-determinism, but it sounds like you've gotten the incorrect sense that all of these concepts apply to all monads. But there's one concept you mentioned that does: chaining.
There are two different flavors of this, so I'll explain it two different ways: one without side effects, and one with side effects.
No Side Effects:
Take the following example:
addM :: (Monad m, Num a) => m a -> m a -> m a
addM ma mb = do
  a <- ma
  b <- mb
  return (a + b)
This function adds two numbers, with the twist that they are wrapped in some monad. Which monad? Doesn't matter! In all cases, that special do syntax de-sugars to the following:
addM ma mb =
  ma >>= \a ->
  mb >>= \b ->
  return (a + b)
... or, with operator precedence made explicit:
ma >>= (\a -> mb >>= (\b -> return (a + b)))
Now you can really see that this is a chain of little functions, all composed together, and its behavior will depend on how >>= and return are defined for each monad. If you're familiar with polymorphism in object-oriented languages, this is essentially the same thing: one common interface with multiple implementations. It's slightly more mind-bending than your average OOP interface, since the interface represents a computation policy rather than, say, an animal or a shape or something.
Okay, let's see some examples of how addM behaves across different monads. The Identity monad is a decent place to start, since its definition is trivial:
instance Monad Identity where
  return a = Identity a        -- create an Identity value
  (Identity a) >>= f = f a     -- apply f to a
So what happens when we say:
addM (Identity 1) (Identity 2)
Expanding this, step by step:
(Identity 1) >>= (\a -> (Identity 2) >>= (\b -> return (a + b)))
(\a -> (Identity 2) >>= (\b -> return (a + b))) 1
(Identity 2) >>= (\b -> return (1 + b))
(\b -> return (1 + b)) 2
return (1 + 2)
Identity 3
Great. Now, since you mentioned clean error handling, let's look at the Maybe monad. Its definition is only slightly trickier than Identity:
instance Monad Maybe where
  return a = Just a        -- same as Identity monad!
  (Just a) >>= f = f a     -- same as Identity monad again!
  Nothing >>= _ = Nothing  -- the only real difference from Identity
So you can imagine that if we say addM (Just 1) (Just 2) we'll get Just 3. But for grins, let's expand addM Nothing (Just 1) instead:
Nothing >>= (\a -> (Just 1) >>= (\b -> return (a + b)))
Nothing
Or the other way around, addM (Just 1) Nothing:
(Just 1) >>= (\a -> Nothing >>= (\b -> return (a + b)))
(\a -> Nothing >>= (\b -> return (a + b))) 1
Nothing >>= (\b -> return (1 + b))
Nothing
So the Maybe monad's definition of >>= was tweaked to account for failure. When a function is applied to a Maybe value using >>=, you get what you'd expect.
Okay, so you mentioned non-determinism. Yes, the list monad can be thought of as modeling non-determinism in a sense... It's a little weird, but think of the list as representing alternative possible values: [1, 2, 3] is not a collection, it's a single non-deterministic number that could be either one, two or three. That sounds dumb, but it starts to make some sense when you think about how >>= is defined for lists: it applies the given function to each possible value. So addM [1, 2] [3, 4] is actually going to compute all possible sums of those two non-deterministic values: [4, 5, 5, 6].
Okay, now to address your second question...
Side Effects:
Let's say you apply addM to two values in the IO monad, like:
addM (return 1 :: IO Int) (return 2 :: IO Int)
You don't get anything special, just 3 in the IO monad. addM does not read or write any mutable state, so it's kind of no fun. Same goes for the State or ST monads. No fun. So let's use a different function:
fireTheMissiles :: IO Int -- returns the number of casualties
Clearly the world will be different each time missiles are fired. Clearly. Now let's say you're trying to write some totally innocuous, side effect free, non-missile-firing code. Perhaps you're trying once again to add two numbers, but this time without any monads flying around:
add :: Num a => a -> a -> a
add a b = a + b
and all of a sudden your hand slips, and you accidentally typo:
add a b = a + b + fireTheMissiles
An honest mistake, really. The keys were so close together. Fortunately, because fireTheMissiles was of type IO Int rather than simply Int, the compiler is able to avert disaster.
Okay, totally contrived example, but the point is that in the case of IO, ST and friends, the type system keeps effects isolated to some specific context. It doesn't magically eliminate side effects, making code referentially transparent that shouldn't be, but it does make it clear at compile time what scope the effects are limited to.
So getting back to the original point: what does this have to do with chaining or composition of functions? Well, in this case, it's just a handy way of expressing a sequence of effects:
fireTheMissilesTwice :: IO ()
fireTheMissilesTwice = do
  a <- fireTheMissiles
  print a
  b <- fireTheMissiles
  print b
Summary:
A monad represents some policy for chaining computations. Identity's policy is pure function composition, Maybe's policy is function composition with failure propagation, IO's policy is impure function composition, and so on.
Let me start by pointing at the excellent "You could have invented monads" article. It illustrates how the Monad structure can naturally manifest while you are writing programs. But the tutorial doesn't mention IO, so I will have a stab here at extending the approach.
Let us start with what you probably have already seen - the container monad. Let's say we have:
f, g :: Int -> [Int]
One way of looking at this is that it gives us a number of possible outputs for every possible input. What if we want all possible outputs for the composition of both functions? Giving all possibilities we could get by applying the functions one after the other?
Well, there's a function for that:
fg x = concatMap g $ f x
If we put this more generally, we get
fg x = f x >>= g
xs >>= f = concatMap f xs
return x = [x]
Why would we want to wrap it like this? Well, writing our programs primarily using >>= and return gives us some nice properties - for example, we can be sure that it's relatively hard to "forget" solutions; dropping one would have to be done explicitly, say by adding another function skip. And we now have a monad and can use all the combinators from the monad library!
Now, let us jump to your trickier example. Let's say the two functions are "side-effecting". That's not non-deterministic, it just means that in theory the whole world is both their input (as it can influence them) as well as their output (as the function can influence it). So we get something like:
f, g :: Int -> RealWorld# -> (Int, RealWorld#)
If we now want g to get the world that f left behind, we'd write:
fg x rw = let (y, rw')  = f x rw
              (r, rw'') = g y rw'
          in (r, rw'')
Or generalized:
fg x = f x >>= g
x >>= f = \rw -> let (y, rw')  = x rw
                     (r, rw'') = f y rw'
                 in (r, rw'')
return x = \rw -> (x, rw)
Now if the user can only use >>=, return and a few pre-defined IO values we get a nice property again: The user will never actually see the RealWorld# getting passed around! And that is a very good thing, as you aren't really interested in the details of where getLine gets its data from. And again we get all the nice high-level functions from the monad libraries.
So the important things to take away:
The monad captures common patterns in your code, like "always pass all elements of container A to container B" or "pass this real-world-tag through". Often, once you realize that there is a monad in your program, complicated things become simply applications of the right monad combinator.
The monad allows you to completely hide the implementation from the user. It is an excellent encapsulation mechanism, be it for your own internal state or for how IO manages to squeeze non-purity into a pure program in a relatively safe way.
Appendix
In case someone is still scratching his head over RealWorld# as much as I did when I started: There's obviously more magic going on after all the monad abstraction has been removed. Then the compiler will make use of the fact that there can only ever be one "real world". That's good news and bad news:
It follows that the compiler must guarantee execution ordering between functions (which is what we were after!)
But it also means that actually passing the real world isn't necessary as there is only one we could possibly mean: The one that is current when the function gets executed!
Bottom line is that once execution order is fixed, RealWorld# simply gets optimized out. Therefore programs using the IO monad actually have zero runtime overhead. Also note that using RealWorld# is obviously only one possible way to put IO - but it happens to be the one GHC uses internally. The good thing about monads is that, again, the user really doesn't need to know.
You could see a given monad m as a set/family (or realm, domain, etc.) of actions (think of a C statement). The monad m defines the kind of (side-)effects that its actions may have:
with [] you can define actions which can fork their executions in different "independent parallel worlds";
with Either Foo you can define actions which can fail with errors of type Foo;
with IO you can define actions which can have side-effects on the "outside world" (access files, network, launch processes, do a HTTP GET ...);
you can have a monad whose effect is "randomness" (see package MonadRandom);
you can define a monad whose actions can make a move in a game (say chess, Go…) and receive moves from an opponent, but are not able to write to your filesystem or do anything else.
Summary
If m is a monad, m a is an action which produces a result/output of type a.
The >> and >>= operators are used to create more complex actions out of simpler ones:
a >> b is a macro-action which does action a and then action b;
a >> a does action a and then action a again;
with >>= the second action can depend on the output of the first one.
The exact meaning of what an action is and what doing an action and then another one is depends on the monad: each monad defines an imperative sublanguage with some features/effects.
Simple sequencing (>>)
Let's say with have a given monad M and some actions incrementCounter, decrementCounter, readCounter:
instance Monad M where ...
-- Modify the counter and do not produce any result:
incrementCounter :: M ()
decrementCounter :: M ()
-- Get the current value of the counter
readCounter :: M Integer
Now we would like to do something interesting with those actions. The first thing we would like to do with those actions is to sequence them. As in say C, we would like to be able to do:
// This is C:
counter++;
counter++;
We define an "sequencing operator" >>. Using this operator we can write:
incrementCounter >> incrementCounter
What is the type of "incrementCounter >> incrementCounter"?
It is an action made of two smaller actions, just as in C you can build compound statements from atomic statements:
// This is a macro statement made of several statements
{
    counter++;
    counter++;
}
// and we can use it anywhere we may use a statement:
if (condition) {
    counter++;
    counter++;
}
it can have the same kind of effects as its subactions;
it does not produce any output/result.
So we would like incrementCounter >> incrementCounter to be of type M (): a (macro-)action with the same kind of possible effects but without any output.
More generally, given two actions:
action1 :: M a
action2 :: M b
we define a >> b as the macro-action obtained by doing (whatever that means in our domain of actions) a then b, producing as output the result of the execution of the second action. The type of >> is:
(>>) :: M a -> M b -> M b
or more generally:
(>>) :: (Monad m) => m a -> m b -> m b
We can define bigger sequences of actions from simpler ones:
action1 >> action2 >> action3 >> action4
Input and outputs (>>=)
We would like to be able to increment by something other than 1 at a time:
incrementBy 5
We want to provide some input to our actions; in order to do this we define a function incrementBy taking an Int and producing an action:
incrementBy :: Int -> M ()
Now we can write things like:
incrementCounter >> readCounter >> incrementBy 5
But we have no way to feed the output of readCounter into incrementBy. In order to do this, a slightly more powerful version of our sequencing operator is needed. The >>= operator can feed the output of a given action as input to the next action. We can write:
readCounter >>= incrementBy
It is an action which executes the readCounter action, feeds its output into the incrementBy function and then executes the resulting action.
The type of >>= is:
(>>=) :: Monad m => m a -> (a -> m b) -> m b
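For intuition, here is one hedged sketch of how the counter monad M could be implemented, using State from Control.Monad.State (the text deliberately leaves M abstract, so this is only one possibility):
-- Hedged sketch: a State-based implementation of the abstract counter monad M.
import Control.Monad.State (State, get, modify, execState)

type M = State Integer

incrementCounter, decrementCounter :: M ()
incrementCounter = modify (+ 1)
decrementCounter = modify (subtract 1)

readCounter :: M Integer
readCounter = get

incrementBy :: Int -> M ()
incrementBy n = modify (+ fromIntegral n)

-- The chain from the text: increment twice, then read the counter and
-- increment by that amount.
demo :: M ()
demo = incrementCounter >> incrementCounter >> readCounter >>= incrementBy . fromIntegral

-- execState demo 0  ==  4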
A (partial) example
Let's say I have a Prompt monad which can only display information (text) to the user and ask the user for information:
-- We don't have access to the internal structure of the Prompt monad
module Prompt (Prompt(), echo, prompt) where
-- Opaque
data Prompt a = ...
instance Monad Prompt where ...
-- Display a line to the CLI:
echo :: String -> Prompt ()
-- Ask a question to the user:
prompt :: String -> Prompt String
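The module keeps Prompt opaque, but for intuition, here is one hedged guess at how it could be implemented internally - a thin wrapper over IO exposing only echo and prompt (this is an assumption, not the actual module):
-- Hedged sketch: one possible internal implementation of Prompt.
newtype Prompt a = Prompt { runPrompt :: IO a }

instance Functor Prompt where
  fmap f (Prompt m) = Prompt (fmap f m)

instance Applicative Prompt where
  pure = Prompt . pure
  Prompt f <*> Prompt x = Prompt (f <*> x)

instance Monad Prompt where
  Prompt m >>= k = Prompt (m >>= runPrompt . k)

echo :: String -> Prompt ()
echo = Prompt . putStrLn

prompt :: String -> Prompt String
prompt question = Prompt (putStrLn question >> getLine)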
Let's try to define a promptBoolean message action which asks a question and produces a boolean value.
We use the prompt (message ++ "[y/n]") action and feed its output to a function f:
f "y" should be an action which does nothing but produce True as output;
f "n" should be an action which does nothing but produce False as output;
anything else should restart the action (do the action again);
promptBoolean would look like this:
-- Incomplete version, some bits are missing:
promptBoolean :: String -> Prompt Bool
promptBoolean message = prompt (message ++ "[y/n]") >>= f
  where f result = if result == "y"
                   then ???? -- We need here an action which does nothing but produce `True` as output
                   else if result == "n"
                        then ???? -- We need here an action which does nothing but produce `False` as output
                        else echo "Input not recognised, try again." >> promptBoolean message
Producing a value without effect (return)
In order to fill the missing bits in our promptBoolean function, we need a way to represent dummy actions without any side effect but which only outputs a given value:
-- "return 5" is an action which does nothing but outputs 5
return :: (Monad m) => a -> m a
and we can now write our promptBoolean function:
promptBoolean :: String -> Prompt Bool
promptBoolean message = prompt (message ++ "[y/n]") >>= f
  where f result = if result == "y"
                   then return True
                   else if result == "n"
                        then return False
                        else echo "Input not recognised, try again." >> promptBoolean message
By composing those two simple actions (promptBoolean, echo) we can define any kind of dialogue between the user and your program (the actions of the program are deterministic as our monad does not have a "randomness effect").
promptInt :: String -> Prompt Int
promptInt = ... -- similar
-- Classic "guess a number" game/dialogue
guess :: Int -> Prompt ()
guess n = promptInt "Guess:" >>= f
  where f m = if m == n
              then echo "Found"
              else (if m > n
                    then echo "Too big"
                    else echo "Too small") >> guess n
The operations of a monad
A Monad is a set of actions which can be composed with the return and >>= operators:
>>= for action composition;
return for producing a value without any (side-)effect.
These two operators are the minimal operators needed to define a Monad.
In Haskell, the >> operator is needed as well but it can in fact be derived from >>=:
(>>) :: Monad m => m a -> m b -> m b
a >> b = a >>= f
  where f x = b
In Haskell, an extra fail operator is needed as well, but this is really a hack (and it might be removed from Monad in the future).
This is the Haskell definition of a Monad:
class Monad m where
    return :: a -> m a
    (>>=) :: m a -> (a -> m b) -> m b
    (>>) :: m a -> m b -> m b -- can be derived from (>>=)
    fail :: String -> m a -- mostly a hack
Actions are first-class
One great thing about monads is that actions are first-class. You can store them in a variable, and you can define functions which take actions as input and produce other actions as output. For example, we can define a while operator:
-- while x y : does action y while action x outputs True
while :: (Monad m) => m Bool -> m a -> m ()
while x y = x >>= f
  where f True  = y >> while x y
        f False = return ()
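A hedged usage sketch (not from the text), with m = IO: keep reading lines and nagging until the user types "quit".
-- Hedged usage example of the while combinator above.
untilQuit :: IO ()
untilQuit = while (fmap (/= "quit") getLine) (putStrLn "type quit to stop")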
Summary
A Monad is a set of actions in some domain. The monad/domain defines the kind of "effects" which are possible. The >> and >>= operators represent sequencing of actions, and monadic expressions may be used to represent any kind of "imperative (sub)program" in your (functional) Haskell program.
The great things are that:
you can design your own Monad which supports the features and effects that you want
see Prompt for an example of a "dialogue only subprogram",
see Rand for an example of "sampling only subprogram";
you can write your own control structures (while, throw, catch or more exotic ones) as functions taking actions and composing them in some way to produce a bigger macro-actions.
MonadRandom
A good way of understanding monads, is the MonadRandom package. The Rand monad is made of actions whose output can be random (the effect is randomness). An action in this monad is some kind of random variable (or more exactly a sampling process):
-- Sample an Int from some distribution
action :: Rand Int
Using Rand to do some sampling/random algorithms is quite interesting because you have random variables as first class values:
-- Estimate mean by sampling nsamples times the random variable x
sampleMean :: Real a => Int -> Rand a -> Rand a
sampleMean n x = ...
In this setting, the sequence function from Prelude,
sequence :: Monad m => [m a] -> m [a]
becomes
sequence :: [Rand a] -> Rand [a]
It creates a random variable obtained by sampling independently from a list of random variables.
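A small hedged example of what this looks like with the actual MonadRandom package (the names die and rolls are my own, not from the package):
import Control.Monad.Random (Rand, getRandomR, evalRandIO)
import System.Random (StdGen)
import Control.Monad (replicateM)

-- Hedged sketch: a single sample from a fair six-sided die, as a
-- first-class random variable.
die :: Rand StdGen Int
die = getRandomR (1, 6)

-- sequence (here via replicateM) turns many random variables into one
-- random variable producing a list, sampled independently.
rolls :: Int -> Rand StdGen [Int]
rolls n = replicateM n die

main :: IO ()
main = evalRandIO (rolls 10) >>= print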
There are three main observations concerning the IO monad:
1) You can't get values out of it. Other types like Maybe might allow you to extract values, but neither the monad class interface itself nor the IO data type allows it.
2) "Inside" IO is not only the real value but also that "RealWorld" thing. This dummy value is used to enforce the chaining of actions by the type system: If you have two independent calculations, the use of >>= makes the second calculation dependent on the first.
3) Assume a non-deterministic thing like random :: () -> Int, which isn't allowed in Haskell. If you change the signature to random :: Blubb -> (Blubb, Int), it is allowed, if you make sure that nobody ever can use a Blubb twice: Because in that case all inputs are "different", it is no problem that the outputs are different as well.
Now we can use fact 1): Nobody can get anything out of IO, so we can use the RealWorld dummy hidden in IO to serve as a Blubb. There is only one IO in the whole application (the one we get from main), and it takes care of proper sequencing, as we have seen in 2). Problem solved.
One thing that often helps me to understand the nature of something is to examine it in the most trivial way possible. That way, I'm not getting distracted by potentially unrelated concepts. With that in mind, I think it may be helpful to understand the nature of the Identity Monad, as it's the most trivial implementation of a Monad possible (I think).
What is interesting about the Identity Monad? I think it is that it allows me to express the idea of evaluating expressions in a context defined by other expressions. And to me, that is the essence of every Monad I've encountered (so far).
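For instance, here is a hedged, hand-rolled version of the Identity monad (the real one lives in Data.Functor.Identity) - chaining does nothing beyond plain evaluation:
-- Hedged sketch: a minimal hand-rolled Identity monad.
newtype Identity a = Identity { runIdentity :: a }

instance Functor Identity where
  fmap f (Identity x) = Identity (f x)

instance Applicative Identity where
  pure = Identity
  Identity f <*> Identity x = Identity (f x)

instance Monad Identity where
  Identity x >>= f = f x

-- "Evaluating expressions in a context defined by other expressions",
-- where the context adds nothing at all:
three :: Int
three = runIdentity $ do
  x <- Identity 1
  y <- Identity 2
  return (x + y)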
If you already had a lot of exposure to 'mainstream' programming languages before learning Haskell (like I did), then this doesn't seem very interesting at all. After all, in a mainstream programming language, statements are executed in sequence, one after the other (excepting control-flow constructs, of course). And naturally, we can assume that every statement is evaluated in the context of all previously executed statements and that those previously executed statements may alter the environment and the behavior of the currently executing statement.
All of that is pretty much a foreign concept in a functional, lazy language like Haskell. The order in which computations are evaluated in Haskell is well-defined, but sometimes hard to predict, and even harder to control. And for many kinds of problems, that's just fine. But other sorts of problems (e.g. IO) are hard to solve without some convenient way to establish an implicit order and context between the computations in your program.
As far as side-effects go, specifically, often they can be transformed (via a Monad) into simple state-passing, which is perfectly legal in a pure functional language. Some Monads don't seem to be of that nature, however. Monads such as the IO Monad or the ST monad literally perform side-effecting actions. There are many ways to think about this, but one way that I think about it is that just because my computations must exist in a world without side-effects, that doesn't mean the Monad must as well. As such, the Monad is free to establish a context for my computation to execute in that is based on side-effects defined by other computations.
Finally, I must disclaim that I am definitely not a Haskell expert. As such, please understand that everything I've said is pretty much my own thoughts on this subject and I may very well disown them later when I understand Monads more fully.
the point is so there can be clean error handling in a chain of functions, containers, and side effects
More or less.
how exactly is the problem of side-effects solved?
A value in the I/O monad, i.e. one of type IO a, should be interpreted as a program. p >> q on IO values can then be interpreted as the operator that combines two programs into one that first executes p, then q. The other monad operators have similar interpretations. By assigning a program to the name main, you declare to the compiler that that is the program that has to be executed by its output object code.
As for the list monad, it's not really related to the I/O monad except in a very abstract mathematical sense. The IO monad gives deterministic computation with side effects, while the list monad gives non-deterministic (but not random!) backtracking search, somewhat similar to Prolog's modus operandi.
With this concept of containers, the language essentially says anything inside the containers is non-deterministic
No. Haskell is deterministic. If you ask for integer addition 2+2 you will always get 4.
"Nondeterministic" is only a metaphor, a way of thinking. Everything is deterministic under the hood. If you have this code:
do x <- [4,5]
   y <- [0,1]
   return (x+y)
it is roughly equivalent to Python code
l = []
for x in [4,5]:
    for y in [0,1]:
        l.append(x+y)
You see nondeterminism here? No, it's deterministic construction of a list. Run it twice, you'll get the same numbers in the same order.
You can describe it this way: Choose arbitrary x from [4,5]. Choose arbitrary y from [0,1]. Return x+y. Collect all possible results.
That way seems to involve nondeterminism, but it's only a nested loop (list comprehension). There is no "real" nondeterminism here, it's simulated by checking all possibilities. Nondeterminism is an illusion. The code only appears to be nondeterministic.
This code using State monad:
do put 0
   x <- get
   put (x+2)
   y <- get
   return (y+3)
gives 5 and seems to involve changing state. As with lists, it's an illusion. There are no "variables" that change (as in imperative languages). Everything is immutable under the hood.
You can describe the code this way: put 0 to a variable. Read the value of a variable to x. Put (x+2) to the variable. Read the variable to y, and return y+3.
That way seems to involve state, but it's only composing functions passing additional parameter. There is no "real" mutability here, it's simulated by composition. Mutability is an illusion. The code only appears to be using it.
Haskell does it this way: you've got functions
a -> s -> (b,s)
This function takes an old value of the state and returns a new one. It does not involve mutability or changing variables. It's a function in the mathematical sense.
For example the function "put" takes new value of state, ignores current state and returns new state:
put x _ = ((), x)
Just like you can compose two normal functions
a -> b
b -> c
into
a -> c
using (.) operator you can compose "state" transformers
a -> s -> (b,s)
b -> s -> (c,s)
into a single function
a -> s -> (c,s)
Try writing the composition operator yourself. This is what really happens: there are no "side effects", only passing arguments to functions.
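Here is a hedged sketch of that composition operator (composeState, put' and get' are my own names; this is essentially Kleisli composition for the state-passing functions above):
-- Hedged sketch: compose two state transformers, threading the state through.
composeState :: (a -> s -> (b, s))
             -> (b -> s -> (c, s))
             -> (a -> s -> (c, s))
composeState f g = \a s -> let (b, s')  = f a s
                               (c, s'') = g b s'
                           in (c, s'')

put' :: s -> s -> ((), s)
put' x _ = ((), x)       -- ignore the old state, install the new one

get' :: () -> s -> (s, s)
get' () s = (s, s)       -- report the current state, leave it unchanged

-- composeState put' get' 7 0  ==  (7, 7)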
From what I understand, Monad is just another typeclass that declares ways to interact with data [...]
...providing an interface common to all those types which have an instance. This can then be used to provide generic definitions which work across all monadic types.
It seems clever and clean to implement these 3 things with one concept [...]
...the only three things that are implemented are the instances for those three types (list, Maybe and IO) - the types themselves are defined independently elsewhere.
[...] but really, the point is so there can be clean error handling in a chain of functions, containers, and side effects.
Not just error handling e.g. consider ST - without the monadic interface, you would have to pass the encapsulated-state directly and correctly...a tiresome task.
How exactly is the problem of side-effects solved?
Short answer: Haskell manages them by using types to indicate their presence.
Can someone explain how, intuitively, Haskell gets away with changing state with inputs and output?
"Intuitively"...like what's available over here? Let's try a simple direct comparison instead:
From How to Declare an Imperative by Philip Wadler:
(* page 26 *)
type 'a io = unit -> 'a
infix >>=
val >>= : 'a io * ('a -> 'b io) -> 'b io
fun m >>= k = fn () => let
                         val x = m ()
                         val y = k x ()
                       in
                         y
                       end
val return : 'a -> 'a io
fun return x = fn () => x
val putc : char -> unit io
fun putc c = fn () => putcML c
val getc : char io
val getc = fn () => getcML ()
fun getcML () =
    valOf(TextIO.input1(TextIO.stdIn))
(* page 25 *)
fun putcML c =
    TextIO.output1(TextIO.stdOut,c)
Based on these two answers of mine, this is my Haskell translation:
type IO a = OI -> a
(>>=) :: IO a -> (a -> IO b) -> IO b
m >>= k = \ u -> let !(u1, u2) = partOI u in
                 let !x = m u1 in
                 let !y = k x u2 in
                 y
return :: a -> IO a
return x = \ u -> let !_ = partOI u in x
putc :: Char -> IO ()
putc c = \ u -> putcOI c u
getc :: IO Char
getc = \ u -> getcOI u
-- primitives
data OI
partOI :: OI -> (OI, OI)
putcOI :: Char -> OI -> ()
getcOI :: OI -> Char
Now remember that short answer about side-effects?
Haskell manages them by using types to indicate their presence.
Data.Char.chr :: Int -> Char -- no side effects
getChar :: IO Char -- side effects at
{- :: OI -> Char -} -- work: beware!

Is Haskell truly pure (is any language that deals with input and output outside the system)?

After touching on Monads in respect to functional programming, does the feature actually make a language pure, or is it just another "get out of jail free card" for reasoning of computer systems in the real world, outside of blackboard maths?
EDIT:
This is not flame bait, as someone has suggested in this post, but a genuine question; I am hoping that someone can shoot me down and say: here is proof, it is pure.
Also, I am looking at the question with respect to other not-so-pure functional languages and some OO languages that use good design, and comparing the purity. So far in my very limited world of FP, I have still not grokked the purity of Monads; you will be pleased to know, however, that I do like the idea of immutability, which is far more important in the purity stakes.
Take the following mini-language:
data Action = Get (Char -> Action) | Put Char Action | End
Get f means: read a character c, and perform action f c.
Put c a means: write character c, and perform action a.
Here's a program that prints "xy", then asks for two letters and prints them in reverse order:
Put 'x' (Put 'y' (Get (\a -> Get (\b -> Put b (Put a End)))))
You can manipulate such programs. For example:
conditionally p = Get (\a -> if a == 'Y' then p else End)
This has type Action -> Action - it takes a program and gives another program that asks for confirmation first. Here's another:
printString = foldr Put End
This has type String -> Action - it takes a string and returns a program that writes the string, like
Put 'h' (Put 'e' (Put 'l' (Put 'l' (Put 'o' End)))).
IO in Haskell works similarly. Although executing it requires performing side effects, you can build complex programs without executing them, in a pure way. You're computing on descriptions of programs (IO actions), not actually performing them.
In a language like C you could write a function void execute(Action a) that actually executes the program. In Haskell you specify that action by writing main = a. The compiler creates a program that executes the action, but you have no other way to execute an action (aside from dirty tricks).
Obviously Get and Put are not the only options; you can add many other API calls to the IO data type, like operating on files or concurrency.
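As a hedged illustration of such an execute, here expressed as a translation from Action into Haskell's real IO (the name run is mine, not from the answer):
-- Hedged sketch: an interpreter that actually performs an Action.
run :: Action -> IO ()
run (Get f)   = getChar >>= \c -> run (f c)
run (Put c a) = putChar c >> run a
run End       = return ()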
Adding a result value
Now consider the following data type.
data IO a = Get (Char -> IO a) | Put Char (IO a) | End a
The previous Action type is equivalent to IO (), i.e. an IO value which always returns "unit", comparable to "void".
This type is very similar to Haskell IO, only in Haskell IO is an abstract data type (you don't have access to the definition, only to some methods).
These are IO actions which can end with some result. A value like this:
Get (\x -> if x == 'A' then Put 'B' (End 3) else End 4)
has type IO Int and corresponds to this C program:
int f() {
    char x;
    scanf("%c", &x);
    if (x == 'A') {
        printf("B");
        return 3;
    } else return 4;
}
Evaluation and execution
There's a difference between evaluating and executing. You can evaluate any Haskell expression and get a value; for example, evaluate 2+2 :: Int into 4 :: Int. You can execute only those Haskell expressions which have type IO a. This might have side-effects; executing Put 'a' (End 3) puts the letter a on the screen. If you evaluate an IO value, like this:
if 2+2 == 4 then Put 'A' (End 0) else Put 'B' (End 2)
you get:
Put 'A' (End 0)
But there are no side-effects - you only performed an evaluation, which is harmless.
How would you translate
bool comp(char x) {
    char y;
    scanf("%c", &y);
    if (x > y) { // Character comparison
        printf(">");
        return true;
    } else {
        printf("<");
        return false;
    }
}
into an IO value?
Fix some character, say 'v'. Now comp('v') is an IO action, which compares given character to 'v'. Similarly, comp('b') is an IO action, which compares given character to 'b'. In general, comp is a function which takes a character and returns an IO action.
As a programmer in C, you might argue that comp('b') is a boolean. In C, evaluation and execution are identical (i.e. they mean the same thing, or happen simultaneously). Not in Haskell. comp('b') evaluates into some IO action, which after being executed gives a boolean. (Precisely, it evaluates into a code block as above, only with 'b' substituted for x.)
comp :: Char -> IO Bool
comp x = Get (\y -> if x > y then Put '>' (End True) else Put '<' (End False))
Now, comp 'b' evaluates into Get (\y -> if 'b' > y then Put '>' (End True) else Put '<' (End False)).
It also makes sense mathematically. In C, int f() is a function. For a mathematician, this doesn't make sense - a function with no arguments? The point of functions is to take arguments. A function int f() should be equivalent to int f. It isn't, because functions in C blend mathematical functions and IO actions.
First class
These IO values are first-class. Just like you can have a list of lists of tuples of integers [[(0,2),(8,3)],[(2,8)]] you can build complex values with IO.
(Get (\x -> Put (toUpper x) (End 0)), Get (\x -> Put (toLower x) (End 0)))
:: (IO Int, IO Int)
A tuple of IO actions: the first reads a character and prints it uppercase, the second reads a character and prints it lowercase.
Get (\x -> End (Put x (End 0))) :: IO (IO Int)
An IO value, which reads a character x and ends, returning an IO value which writes x to screen.
Haskell has special functions which allow easy manipulation of IO values. For example:
sequence :: [IO a] -> IO [a]
which takes a list of IO actions, and returns an IO action which executes them in sequence.
Monads
Monads are some combinators (like conditionally above) which allow you to write programs more structurally. There's a composition function of type
IO a -> (a -> IO b) -> IO b
which, given an IO a and a function a -> IO b, returns a value of type IO b. If you write the first argument as a C function a f() and the second argument as b g(a x), it returns a program for g(f(x)). Given the above definition of Action / IO, you can write that function yourself.
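For instance, a hedged sketch of that composition function for the IO datatype defined above (bind is my name for it here):
-- Hedged sketch: (>>=) for the custom 'data IO a' above
-- (in a real file this IO shadows Prelude's IO).
bind :: IO a -> (a -> IO b) -> IO b
bind (End x)   k = k x
bind (Get f)   k = Get (\c -> bind (f c) k)
bind (Put c a) k = Put c (bind a k)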
Notice monads are not essential to purity - you can always write programs as I did above.
Purity
The essential thing about purity is referential transparency, and distinguishing between evaluation and execution.
In Haskell, if you have f x+f x you can replace that with 2*f x. In C, f(x)+f(x) in general is not the same as 2*f(x), since f could print something on the screen, or modify x.
Thanks to purity, a compiler has much more freedom and can optimize better. It can rearrange computations, while in C it has to think if that changes meaning of the program.
It is important to understand that there is nothing inherently special about monads – so they definitely don't represent a "get out of jail" card in this regard. There is no compiler (or other) magic necessary to implement or use monads; they are defined in the purely functional environment of Haskell. In particular, sdcvvc has shown how to define monads in a purely functional manner, without any recourse to implementation backdoors.
What does it mean to reason about computer systems "outside of blackboard maths"? What kind of reasoning would that be? Dead reckoning?
Side-effects and pure functions are a matter of point of view. If we view a nominally side-effecting function as a function taking us from one state of the world to another, it's pure again.
We can make every side-effecting function pure by giving it a second argument, a world, and requiring that it pass us a new world when it is done. I don't know C++ at all anymore but say read has a signature like this:
vector<char> read(filepath_t)
In our new "pure style", we handle it like this:
pair<vector<char>, world_t> read(world_t, filepath_t)
This is in fact how every Haskell IO action works.
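Here is a hedged toy version of that world-passing style in Haskell (World and putLine are stand-ins of my own, not GHC's real RealWorld machinery):
-- Hedged sketch: a toy "world" that is just the log of everything printed.
type World = [String]

putLine :: String -> World -> ((), World)
putLine s w = ((), w ++ [s])

-- Chaining two actions means threading the world from one to the next:
twoLines :: World -> ((), World)
twoLines w0 = let (_, w1) = putLine "hello" w0
                  (_, w2) = putLine "world" w1
              in ((), w2)

-- twoLines []  ==  ((), ["hello", "world"])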
So now we've got a pure model of IO. Thank goodness. If we couldn't do that then maybe Lambda Calculus and Turing Machines are not equivalent formalisms and then we'd have some explaining to do. We're not quite done but the two problems left to us are easy:
What goes in the world_t structure? A description of every grain of sand, blade of grass, broken heart and golden sunset?
We have an informal rule that we use a world only once -- after every IO operation, we throw away the world we used with it. With all these worlds floating around, though, we are bound to get them mixed up.
The first problem is easy enough. As long as we do not allow inspection of the world, it turns out we needn't trouble ourselves about storing anything in it. We just need to ensure that a new world is not equal to any previous world (lest the compiler deviously optimize some world-producing operations away, like it sometimes does in C++). There are many ways to handle this.
As for the worlds getting mixed up, we'd like to hide the world passing inside a library so that there's no way to get at the worlds and thus no way to mix them up. Turns out, monads are a great way to hide a "side-channel" in a computation. Enter the IO monad.
Some time ago, a question like yours was asked on the Haskell mailing list and there I went in to the "side-channel" in more detail. Here's the Reddit thread (which links to my original email):
http://www.reddit.com/r/haskell/comments/8bhir/why_the_io_monad_isnt_a_dirty_hack/
I'm very new to functional programming, but here's how I understand it:
In haskell, you define a bunch of functions. These functions don't get executed. They might get evaluated.
There's one function in particular that gets evaluated. This is a constant function that produces a set of "actions." The actions include the evaluation of functions and performing of IO and other "real-world" stuff. You can have functions that create and pass around these actions and they will never be executed unless a function is evaluated with unsafePerformIO or they are returned by the main function.
So in short, a Haskell program is a function, composed of other functions, that returns an imperative program. The Haskell program itself is pure. Obviously, that imperative program itself can't be. Real-world computers are by definition impure.
There's a lot more to this question and a lot of it is a question of semantics (human, not programming language). Monads are also a bit more abstract than what I've described here. But I think this is a useful way of thinking about it in general.
I think of it like this: Programs have to do something with the outside world to be useful. What's happening (or should be happening) when you write code (in any language) is that you strive to write as much pure, side-effect-free code as possible and corral the IO into specific places.
What we have in Haskell is that you're pushed further in this direction of writing code that tightly controls effects. In the core and in many libraries there is an enormous amount of pure code as well. Haskell is really all about this. Monads in Haskell are useful for a lot of things, and one thing they've been used for is containment around code that deals with impurity.
This way of designing together with a language that greatly facilitates it has an overall effect of helping us to produce more reliable work, requiring less unit testing to be clear on how it behaves, and allowing more re-use through composition.
If I understand what you're saying correctly, I don't see this as something fake or only in our minds, like a "get out of jail free card." The benefits here are very real.
For an expanded version of sdcvvc's sort of construction of IO, one can look at the IOSpec package on Hackage: http://hackage.haskell.org/package/IOSpec
Is Haskell truly pure?
In the absolute sense of the term: no.
That solid-state Turing machine on which you run your programs - Haskell or otherwise - is a state-and-effect device. For any program to use all of its "features", the program will have to resort to using state and effects.
As for all the other "meanings" ascribed to that pejorative term:
To postulate a state-less model of computation on top of a machinery whose most eminent characteristic is state, seems to be an odd idea, to say the least. The gap between model and machinery is wide, and therefore costly to bridge. No hardware support feature can wash this fact aside: It remains a bad idea for practice.
This has in due time also been recognized by the protagonists of functional languages. They have introduced state (and variables) in various tricky ways. The purely functional character has thereby been compromised and sacrificed. The old terminology has become deceiving.
Niklaus Wirth
Does using monadic types actually make a language pure?
No. It's just one way of using types to demarcate:
definitions that have no visible side-effects at all - values;
definitions that potentially have visible side-effects - actions.
You could instead use uniqueness types, just like Clean does...
Is the use of monadic types just another "get out of jail free card" for reasoning of computer systems in the real world, outside of blackboard maths?
This question is ironic, considering the description of the IO type given in the Haskell 2010 report:
The IO type serves as a tag for operations (actions) that interact with the outside world. The IO type is abstract: no constructors are visible to the user. IO is an instance of the Monad and Functor classes.
...to borrow the parlance of another answer:
[…] IO is magical (having an implementation but no denotation) […]
Being abstract, the IO type is anything but a "get out of jail free card" - intricate models involving multiple semantics are required to account for the workings of I/O in Haskell. For more details, see:
Tackling the Awkward Squad: … by Simon Peyton Jones;
The semantics of fixIO by Levent Erkök, John Launchbury and Andrew Moran.
It wasn't always like this - Haskell originally had an I/O mechanism which was at least partially-visible; the last language version to have it was Haskell 1.2. Back then, the type of main was:
main :: [Response] -> [Request]
which was usually abbreviated to:
main :: Dialogue
where:
type Dialogue = [Response] -> [Request]
and Response along with Request were humble, albeit large datatypes:
The advent of I/O using the monadic interface in Haskell changed all that - no more visible datatypes, just an abstract description. As a result, how IO, return, (>>=) etc are really defined is now specific to each implementation of Haskell.
(Why was the old I/O mechanism abandoned? "Tackling the Awkward Squad" gives an overview of its problems.)
These days, the more pertinent question should be:
Is I/O in your implementation of Haskell referentially transparent?
As Owen Stephens notes in Approaches to Functional I/O:
I/O is not a particularly active area of research, but new approaches are still being discovered […]
The Haskell language may yet have a referentially-transparent model for I/O which doesn't attract so much controversy...
No, it isn't. The IO monad is impure because it has side effects and mutable state: race conditions are possible in Haskell programs, and a pure FP language shouldn't even have a notion of a "race condition". Really pure FP is Clean with uniqueness typing, or Elm with FRP (functional reactive programming), not Haskell. Haskell is one big lie.
Proof:
import Control.Concurrent
import System.IO as IO
import Data.IORef as IOR
import Control.Monad.STM
import Control.Concurrent.STM.TVar
limit = 150000
threadsCount = 50
-- Don't talk about purity in Haskell when we have race conditions
-- in unlocked memory ... PURE language don't need LOCKING because
-- there isn't any mutable state or another side effects !!
main = do
  hSetBuffering stdout NoBuffering
  putStr "Lock counter? : "
  a <- getLine
  if a == "y" || a == "yes" || a == "Yes" || a == "Y"
    then withLocking
    else noLocking

noLocking = do
  counter <- newIORef 0
  let doWork =
        mapM_ (\_ -> IOR.modifyIORef counter (\x -> x + 1)) [1..limit]
  threads <- mapM (\_ -> forkIO doWork) [1..threadsCount]
  -- Sorry, it's dirty but time is expensive ...
  threadDelay (15 * 1000 * 1000)
  val <- IOR.readIORef counter
  IO.putStrLn ("It may be " ++ show (threadsCount * limit) ++
               " but it is " ++ show val)

withLocking = do
  counter <- atomically (newTVar 0)
  let doWork =
        mapM_ (\_ -> atomically $ modifyTVar counter (\x -> x + 1)) [1..limit]
  threads <- mapM (\_ -> forkIO doWork) [1..threadsCount]
  threadDelay (15 * 1000 * 1000)
  val <- atomically $ readTVar counter
  IO.putStrLn ("It may be " ++ show (threadsCount * limit) ++
               " but it is " ++ show val)

Resources