How to get the “inflexible semantics of monad transformers” using extensible effects?

How to get the “inflexible semantics of monad transformers” using extensible effects? - haskell

Consider the following example.
newtype TooBig = TooBig Int deriving Show
choose :: MonadPlus m => [a] -> m a
choose = msum . map return
ex1 :: (MonadPlus m, MonadError TooBig m) => m Int
ex1 = do
x <- choose [5,7,1]
if x > 5
then throwError (TooBig x)
else return x
ex2 :: (MonadPlus m, MonadError TooBig m) => m Int
ex2 = ex1 `catchError` handler
where
handler (TooBig x) = if x > 7
then throwError (TooBig x)
else return x
ex3 :: Either TooBig [Int]
ex3 = runIdentity . runExceptT . runListT $ ex2
What should the value of ex3 be? If we use MTL then the answer is Right [7] which makes sense because ex1 is terminated since it throws an error, and the handler simply returns the pure value return 7 which is Right [7].
However, in the paper “Extensible Effects: An Alternative to Monad Transformers” by Oleg Kiselyov, et al. the authors say that this is “a surprising and undesirable result.” They expected the result to be Right [5,7,1] because the handler recovers from the exception by not re-throwing it. Essentially, they expected the catchError to be moved into ex1 as follows.
newtype TooBig = TooBig Int deriving Show
choose :: MonadPlus m => [a] -> m a
choose = msum . map return
ex1 :: (MonadPlus m, MonadError TooBig m) => m Int
ex1 = do
x <- choose [5,7,1]
if x > 5
then throwError (TooBig x) `catchError` handler
else return x
where
handler (TooBig x) = if x > 7
then throwError (TooBig x)
else return x
ex3 :: Either TooBig [Int]
ex3 = runIdentity . runExceptT . runListT $ ex1
Indeed, this is what extensible effects do. They change the semantics of the program by moving the effect handlers closer to the effect source. For example, local is moved closer to ask and catchError is moved closer to throwError. The authors of the paper tout this as one of the advantages of extensible effects over monad transformers, claiming that monad transformers have “inflexible semantics”.
But, what if I want the result to be Right [7] instead of Right [5,7,1] for whatever reason? As shown in the examples above, monad transformers can be used to get both results. However, because extensible effects always seem to move effect handlers closer to the effect source, it seems impossible to get the result Right [7].
So, the question is how to get the “inflexible semantics of monad transformers” using extensible effects? Is it possible to prevent individual effect handlers from moving closer to the effect source when using extensible effects? If not, then is this a limitation of extensible effects that needs to be addressed?

I'm also a little confused about the nuance in those excerpts from that particular paper. I think it's more useful to take a few steps back and to explain the motivations behind the enterprise of algebraic effects, to which that paper belongs.
The MTL approach is in some sense the most obvious and general: you have an interface (or "effect"), put it in a type class and call it a day. The cost of that generality is that it is unprincipled: you don't know what happens when you combine interfaces together. This issue appears most concretely when you implement an interface: you must implement all of them simultaneously. We like to think that each interface can be implemented in isolation in a dedicated transformer, but if you have two interfaces, say MonadPlus and MonadError, implemented by transformers ListT and ExceptT, in order to compose them, you will also have to either implement MonadError for ListT or MonadPlus for ExceptT. This O(n^2) instance problem is popularly understood as "just boilerplate", but the deeper issue is that if we allow interfaces to be of any shape, there is no telling what danger could hide in that "boilerplate", if it can even be implemented at all.
We must put more structure on those interfaces. For some definition of "lift" (lift from MonadTrans), the effects we can lift uniformly through transformers are exactly the algebraic effects. (See also, Monad Transformers and Modular Algebraic Effects, What Binds Them Together.)
This is not truly a restriction. While some interfaces are not algebraic in a technical sense, such as MonadError (because of catch), they can usually still be expressed within the framework of algebraic effects, just less literally. While restricting the definition of an "interface", we also gain richer ways of using them.
So I think algebraic effects are a different, more precise way of thinking about interfaces before all. As a way of thinking, it can thus be adopted without changing anything about your code, which is why comparisons tend to look at the same code twice and it is difficult to see the point without having a grasp on the surrounding context and motivations. If you think the O(n^2) instances problem is a trivial "boilerplate" problem, you already believe in the principle that interfaces ought to be composable; algebraic effects are one way of explicitly designing libraries and languages around that principle.
"Algebraic effects" are a fuzzy notion without a fixed definition. Nowadays they are most recognizable by syntax featuring a call and a handle construct (or op/perform/throw/raise and catch/match). call is the one construct to use interfaces and handle is how we implement them. The idea common to such languages is that there are equations (hence "algebraic") that provide a basic intuition of how call and handle behave in a way that's independent of the interface, notably via the interaction of handle with sequential composition (>>=).
Semantically, the meaning of a program can be denoted by a tree of calls, and a handle is a transformation of such trees. That's why many incarnations of "algebraic effects" in Haskell start with free monads, types of trees parameterized by the type of nodes f:
data Free f a
= Pure a
| Free (f (Free f a))
From that point of view, the program ex2 is a tree with three branches, with the branch labeled 7 ending in an exception:
ex2 :: Free ([] :+: Const Int) Int -- The functor "Const e" models exceptions (the equivalent of "MonadError e")
ex2 = Free [Pure 5, Free (Const 7), Pure 1]
-- You can write this with do notation to look like the original ex2, I'd say "it's just notation".
-- NB: constructors for (:+:) omitted
And each of the effects [] and Const Int corresponds to some way of transforming the tree, eliminating that effect from the tree (possibly introducing others, including itself).
"Catching" an exception corresponds to handling the Const effect by converting Free (Const x) nodes into some new tree h x.
To handle the [] effect, one way is to compose all children of a Free [...] node using (>>=), collecting their results in a final list. This can be seen as a generalization of depth-first search.
You get the result [7] or [5,7,1] depending on how those transformations are ordered.
Of course, there is a correspondence to the two orders of monad transformers in a MTL approach, but that intuition of programs as trees, which is generally applicable to all algebraic effects, is not as obvious when you're in the middle of implementing an instance such as MonadError e for ListT. That intuition might make sense a posteriori, but it is a priori obfuscated because type class instances are not first-class values like handlers, and monad transformers are typically expressed in terms of the final interpretation (hidden in the monad m they transform) instead of the initial syntax.

Related

Why do we need monads?

In my humble opinion the answers to the famous question "What is a monad?", especially the most voted ones, try to explain what is a monad without clearly explaining why monads are really necessary. Can they be explained as the solution to a problem?

Why do we need monads?
We want to program only using functions. ("functional programming (FP)" after all).
Then, we have a first big problem. This is a program:
f(x) = 2 * x
g(x,y) = x / y
How can we say what is to be executed first? How can we form an ordered sequence of functions (i.e. a program) using no more than functions?
Solution: compose functions. If you want first g and then f, just write f(g(x,y)). This way, "the program" is a function as well: main = f(g(x,y)). OK, but ...
More problems: some functions might fail (i.e. g(2,0), divide by 0). We have no "exceptions" in FP (an exception is not a function). How do we solve it?
Solution: Let's allow functions to return two kind of things: instead of having g : Real,Real -> Real (function from two reals into a real), let's allow g : Real,Real -> Real | Nothing (function from two reals into (real or nothing)).
But functions should (to be simpler) return only one thing.
Solution: let's create a new type of data to be returned, a "boxing type" that encloses maybe a real or be simply nothing. Hence, we can have g : Real,Real -> Maybe Real. OK, but ...
What happens now to f(g(x,y))? f is not ready to consume a Maybe Real. And, we don't want to change every function we could connect with g to consume a Maybe Real.
Solution: let's have a special function to "connect"/"compose"/"link" functions. That way, we can, behind the scenes, adapt the output of one function to feed the following one.
In our case: g >>= f (connect/compose g to f). We want >>= to get g's output, inspect it and, in case it is Nothing just don't call f and return Nothing; or on the contrary, extract the boxed Real and feed f with it. (This algorithm is just the implementation of >>= for the Maybe type). Also note that >>= must be written only once per "boxing type" (different box, different adapting algorithm).
Many other problems arise which can be solved using this same pattern: 1. Use a "box" to codify/store different meanings/values, and have functions like g that return those "boxed values". 2. Have a composer/linker g >>= f to help connecting g's output to f's input, so we don't have to change any f at all.
Remarkable problems that can be solved using this technique are:
having a global state that every function in the sequence of functions ("the program") can share: solution StateMonad.
We don't like "impure functions": functions that yield different output for same input. Therefore, let's mark those functions, making them to return a tagged/boxed value: IO monad.
Total happiness!

The answer is, of course, "We don't". As with all abstractions, it isn't necessary.
Haskell does not need a monad abstraction. It isn't necessary for performing IO in a pure language. The IO type takes care of that just fine by itself. The existing monadic desugaring of do blocks could be replaced with desugaring to bindIO, returnIO, and failIO as defined in the GHC.Base module. (It's not a documented module on hackage, so I'll have to point at its source for documentation.) So no, there's no need for the monad abstraction.
So if it's not needed, why does it exist? Because it was found that many patterns of computation form monadic structures. Abstraction of a structure allows for writing code that works across all instances of that structure. To put it more concisely - code reuse.
In functional languages, the most powerful tool found for code reuse has been composition of functions. The good old (.) :: (b -> c) -> (a -> b) -> (a -> c) operator is exceedingly powerful. It makes it easy to write tiny functions and glue them together with minimal syntactic or semantic overhead.
But there are cases when the types don't work out quite right. What do you do when you have foo :: (b -> Maybe c) and bar :: (a -> Maybe b)? foo . bar doesn't typecheck, because b and Maybe b aren't the same type.
But... it's almost right. You just want a bit of leeway. You want to be able to treat Maybe b as if it were basically b. It's a poor idea to just flat-out treat them as the same type, though. That's more or less the same thing as null pointers, which Tony Hoare famously called the billion-dollar mistake. So if you can't treat them as the same type, maybe you can find a way to extend the composition mechanism (.) provides.
In that case, it's important to really examine the theory underlying (.). Fortunately, someone has already done this for us. It turns out that the combination of (.) and id form a mathematical construct known as a category. But there are other ways to form categories. A Kleisli category, for instance, allows the objects being composed to be augmented a bit. A Kleisli category for Maybe would consist of (.) :: (b -> Maybe c) -> (a -> Maybe b) -> (a -> Maybe c) and id :: a -> Maybe a. That is, the objects in the category augment the (->) with a Maybe, so (a -> b) becomes (a -> Maybe b).
And suddenly, we've extended the power of composition to things that the traditional (.) operation doesn't work on. This is a source of new abstraction power. Kleisli categories work with more types than just Maybe. They work with every type that can assemble a proper category, obeying the category laws.
Left identity: id . f = f
Right identity: f . id = f
Associativity: f . (g . h) = (f . g) . h
As long as you can prove that your type obeys those three laws, you can turn it into a Kleisli category. And what's the big deal about that? Well, it turns out that monads are exactly the same thing as Kleisli categories. Monad's return is the same as Kleisli id. Monad's (>>=) isn't identical to Kleisli (.), but it turns out to be very easy to write each in terms of the other. And the category laws are the same as the monad laws, when you translate them across the difference between (>>=) and (.).
So why go through all this bother? Why have a Monad abstraction in the language? As I alluded to above, it enables code reuse. It even enables code reuse along two different dimensions.
The first dimension of code reuse comes directly from the presence of the abstraction. You can write code that works across all instances of the abstraction. There's the entire monad-loops package consisting of loops that work with any instance of Monad.
The second dimension is indirect, but it follows from the existence of composition. When composition is easy, it's natural to write code in small, reusable chunks. This is the same way having the (.) operator for functions encourages writing small, reusable functions.
So why does the abstraction exist? Because it's proven to be a tool that enables more composition in code, resulting in creating reusable code and encouraging the creation of more reusable code. Code reuse is one of the holy grails of programming. The monad abstraction exists because it moves us a little bit towards that holy grail.

Benjamin Pierce said in TAPL
A type system can be regarded as calculating a kind of static
approximation to the run-time behaviours of the terms in a program.
That's why a language equipped with a powerful type system is strictly more expressive, than a poorly typed language. You can think about monads in the same way.
As #Carl and sigfpe point, you can equip a datatype with all operations you want without resorting to monads, typeclasses or whatever other abstract stuff. However monads allow you not only to write reusable code, but also to abstract away all redundant detailes.
As an example, let's say we want to filter a list. The simplest way is to use the filter function: filter (> 3) [1..10], which equals [4,5,6,7,8,9,10].
A slightly more complicated version of filter, that also passes an accumulator from left to right, is
swap (x, y) = (y, x)
(.*) = (.) . (.)
filterAccum :: (a -> b -> (Bool, a)) -> a -> [b] -> [b]
filterAccum f a xs = [x | (x, True) <- zip xs $ snd $ mapAccumL (swap .* f) a xs]
To get all i, such that i <= 10, sum [1..i] > 4, sum [1..i] < 25, we can write
filterAccum (\a x -> let a' = a + x in (a' > 4 && a' < 25, a')) 0 [1..10]
which equals [3,4,5,6].
Or we can redefine the nub function, that removes duplicate elements from a list, in terms of filterAccum:
nub' = filterAccum (\a x -> (x `notElem` a, x:a)) []
nub' [1,2,4,5,4,3,1,8,9,4] equals [1,2,4,5,3,8,9]. A list is passed as an accumulator here. The code works, because it's possible to leave the list monad, so the whole computation stays pure (notElem doesn't use >>= actually, but it could). However it's not possible to safely leave the IO monad (i.e. you cannot execute an IO action and return a pure value — the value always will be wrapped in the IO monad). Another example is mutable arrays: after you have leaved the ST monad, where a mutable array live, you cannot update the array in constant time anymore. So we need a monadic filtering from the Control.Monad module:
filterM :: (Monad m) => (a -> m Bool) -> [a] -> m [a]
filterM _ [] = return []
filterM p (x:xs) = do
flg <- p x
ys <- filterM p xs
return (if flg then x:ys else ys)
filterM executes a monadic action for all elements from a list, yielding elements, for which the monadic action returns True.
A filtering example with an array:
nub' xs = runST $ do
arr <- newArray (1, 9) True :: ST s (STUArray s Int Bool)
let p i = readArray arr i <* writeArray arr i False
filterM p xs
main = print $ nub' [1,2,4,5,4,3,1,8,9,4]
prints [1,2,4,5,3,8,9] as expected.
And a version with the IO monad, which asks what elements to return:
main = filterM p [1,2,4,5] >>= print where
p i = putStrLn ("return " ++ show i ++ "?") *> readLn
E.g.
return 1? -- output
True -- input
return 2?
False
return 4?
False
return 5?
True
[1,5] -- output
And as a final illustration, filterAccum can be defined in terms of filterM:
filterAccum f a xs = evalState (filterM (state . flip f) xs) a
with the StateT monad, that is used under the hood, being just an ordinary datatype.
This example illustrates, that monads not only allow you to abstract computational context and write clean reusable code (due to the composability of monads, as #Carl explains), but also to treat user-defined datatypes and built-in primitives uniformly.

I don't think IO should be seen as a particularly outstanding monad, but it's certainly one of the more astounding ones for beginners, so I'll use it for my explanation.
Naïvely building an IO system for Haskell
The simplest conceivable IO system for a purely-functional language (and in fact the one Haskell started out with) is this:
main₀ :: String -> String
main₀ _ = "Hello World"
With lazyness, that simple signature is enough to actually build interactive terminal programs – very limited, though. Most frustrating is that we can only output text. What if we added some more exciting output possibilities?
data Output = TxtOutput String
| Beep Frequency
main₁ :: String -> [Output]
main₁ _ = [ TxtOutput "Hello World"
-- , Beep 440 -- for debugging
]
cute, but of course a much more realistic “alterative output” would be writing to a file. But then you'd also want some way to read from files. Any chance?
Well, when we take our main₁ program and simply pipe a file to the process (using operating system facilities), we have essentially implemented file-reading. If we could trigger that file-reading from within the Haskell language...
readFile :: Filepath -> (String -> [Output]) -> [Output]
This would use an “interactive program” String->[Output], feed it a string obtained from a file, and yield a non-interactive program that simply executes the given one.
There's one problem here: we don't really have a notion of when the file is read. The [Output] list sure gives a nice order to the outputs, but we don't get an order for when the inputs will be done.
Solution: make input-events also items in the list of things to do.
data IO₀ = TxtOut String
| TxtIn (String -> [Output])
| FileWrite FilePath String
| FileRead FilePath (String -> [Output])
| Beep Double
main₂ :: String -> [IO₀]
main₂ _ = [ FileRead "/dev/null" $ \_ ->
[TxtOutput "Hello World"]
]
Ok, now you may spot an imbalance: you can read a file and make output dependent on it, but you can't use the file contents to decide to e.g. also read another file. Obvious solution: make the result of the input-events also something of type IO, not just Output. That sure includes simple text output, but also allows reading additional files etc..
data IO₁ = TxtOut String
| TxtIn (String -> [IO₁])
| FileWrite FilePath String
| FileRead FilePath (String -> [IO₁])
| Beep Double
main₃ :: String -> [IO₁]
main₃ _ = [ TxtIn $ \_ ->
[TxtOut "Hello World"]
]
That would now actually allow you to express any file operation you might want in a program (though perhaps not with good performance), but it's somewhat overcomplicated:
main₃ yields a whole list of actions. Why don't we simply use the signature :: IO₁, which has this as a special case?
The lists don't really give a reliable overview of program flow anymore: most subsequent computations will only be “announced” as the result of some input operation. So we might as well ditch the list structure, and simply cons a “and then do” to each output operation.
data IO₂ = TxtOut String IO₂
| TxtIn (String -> IO₂)
| Terminate
main₄ :: IO₂
main₄ = TxtIn $ \_ ->
TxtOut "Hello World"
Terminate
Not too bad!
So what has all of this to do with monads?
In practice, you wouldn't want to use plain constructors to define all your programs. There would need to be a good couple of such fundamental constructors, yet for most higher-level stuff we would like to write a function with some nice high-level signature. It turns out most of these would look quite similar: accept some kind of meaningfully-typed value, and yield an IO action as the result.
getTime :: (UTCTime -> IO₂) -> IO₂
randomRIO :: Random r => (r,r) -> (r -> IO₂) -> IO₂
findFile :: RegEx -> (Maybe FilePath -> IO₂) -> IO₂
There's evidently a pattern here, and we'd better write it as
type IO₃ a = (a -> IO₂) -> IO₂ -- If this reminds you of continuation-passing
-- style, you're right.
getTime :: IO₃ UTCTime
randomRIO :: Random r => (r,r) -> IO₃ r
findFile :: RegEx -> IO₃ (Maybe FilePath)
Now that starts to look familiar, but we're still only dealing with thinly-disguised plain functions under the hood, and that's risky: each “value-action” has the responsibility of actually passing on the resulting action of any contained function (else the control flow of the entire program is easily disrupted by one ill-behaved action in the middle). We'd better make that requirement explicit. Well, it turns out those are the monad laws, though I'm not sure we can really formulate them without the standard bind/join operators.
At any rate, we've now reached a formulation of IO that has a proper monad instance:
data IO₄ a = TxtOut String (IO₄ a)
| TxtIn (String -> IO₄ a)
| TerminateWith a
txtOut :: String -> IO₄ ()
txtOut s = TxtOut s $ TerminateWith ()
txtIn :: IO₄ String
txtIn = TxtIn $ TerminateWith
instance Functor IO₄ where
fmap f (TerminateWith a) = TerminateWith $ f a
fmap f (TxtIn g) = TxtIn $ fmap f . g
fmap f (TxtOut s c) = TxtOut s $ fmap f c
instance Applicative IO₄ where
pure = TerminateWith
(<*>) = ap
instance Monad IO₄ where
TerminateWith x >>= f = f x
TxtOut s c >>= f = TxtOut s $ c >>= f
TxtIn g >>= f = TxtIn $ (>>=f) . g
Obviously this is not an efficient implementation of IO, but it's in principle usable.

Monads serve basically to compose functions together in a chain. Period.
Now the way they compose differs across the existing monads, thus resulting in different behaviors (e.g., to simulate mutable state in the state monad).
The confusion about monads is that being so general, i.e., a mechanism to compose functions, they can be used for many things, thus leading people to believe that monads are about state, about IO, etc, when they are only about "composing functions".
Now, one interesting thing about monads, is that the result of the composition is always of type "M a", that is, a value inside an envelope tagged with "M". This feature happens to be really nice to implement, for example, a clear separation between pure from impure code: declare all impure actions as functions of type "IO a" and provide no function, when defining the IO monad, to take out the "a" value from inside the "IO a". The result is that no function can be pure and at the same time take out a value from an "IO a", because there is no way to take such value while staying pure (the function must be inside the "IO" monad to use such value). (NOTE: well, nothing is perfect, so the "IO straitjacket" can be broken using "unsafePerformIO : IO a -> a" thus polluting what was supposed to be a pure function, but this should be used very sparingly and when you really know to be not introducing any impure code with side-effects.

Monads are just a convenient framework for solving a class of recurring problems. First, monads must be functors (i.e. must support mapping without looking at the elements (or their type)), they must also bring a binding (or chaining) operation and a way to create a monadic value from an element type (return). Finally, bind and return must satisfy two equations (left and right identities), also called the monad laws. (Alternatively one could define monads to have a flattening operation instead of binding.)
The list monad is commonly used to deal with non-determinism. The bind operation selects one element of the list (intuitively all of them in parallel worlds), lets the programmer to do some computation with them, and then combines the results in all worlds to single list (by concatenating, or flattening, a nested list). Here is how one would define a permutation function in the monadic framework of Haskell:
perm [e] = [[e]]
perm l = do (leader, index) <- zip l [0 :: Int ..]
let shortened = take index l ++ drop (index + 1) l
trailer <- perm shortened
return (leader : trailer)
Here is an example repl session:
*Main> perm "a"
["a"]
*Main> perm "ab"
["ab","ba"]
*Main> perm ""
[]
*Main> perm "abc"
["abc","acb","bac","bca","cab","cba"]
It should be noted that the list monad is in no way a side effecting computation. A mathematical structure being a monad (i.e. conforming to the above mentioned interfaces and laws) does not imply side effects, though side-effecting phenomena often nicely fit into the monadic framework.

You need monads if you have a type constructor and functions that returns values of that type family. Eventually, you would like to combine these kind of functions together. These are the three key elements to answer why.
Let me elaborate. You have Int, String and Real and functions of type Int -> String, String -> Real and so on. You can combine these functions easily, ending with Int -> Real. Life is good.
Then, one day, you need to create a new family of types. It could be because you need to consider the possibility of returning no value (Maybe), returning an error (Either), multiple results (List) and so on.
Notice that Maybe is a type constructor. It takes a type, like Int and returns a new type Maybe Int. First thing to remember, no type constructor, no monad.
Of course, you want to use your type constructor in your code, and soon you end with functions like Int -> Maybe String and String -> Maybe Float. Now, you can't easily combine your functions. Life is not good anymore.
And here's when monads come to the rescue. They allow you to combine that kind of functions again. You just need to change the composition . for >==.

Why do we need monadic types?
Since it was the quandary of I/O and its observable effects in nonstrict languages like Haskell that brought the monadic interface to such prominence:
[...] monads are used to address the more general problem of computations (involving state, input/output, backtracking, ...) returning values: they do not solve any input/output-problems directly but rather provide an elegant and flexible abstraction of many solutions to related problems. [...] For instance, no less than three different input/output-schemes are used to solve these basic problems in Imperative functional programming, the paper which originally proposed `a new model, based on monads, for performing input/output in a non-strict, purely functional language'. [...]
[Such] input/output-schemes merely provide frameworks in which side-effecting operations can safely be used with a guaranteed order of execution and without affecting the properties of the purely functional parts of the language.
Claus Reinke (pages 96-97 of 210).
(emphasis by me.)
[...] When we write effectful code – monads or no monads – we have to constantly keep in mind the context of expressions we pass around.
The fact that monadic code ‘desugars’ (is implementable in terms of) side-effect-free code is irrelevant. When we use monadic notation, we program within that notation – without considering what this notation desugars into. Thinking of the desugared code breaks the monadic abstraction. A side-effect-free, applicative code is normally compiled to (that is, desugars into) C or machine code. If the desugaring argument has any force, it may be applied just as well to the applicative code, leading to the conclusion that it all boils down to the machine code and hence all programming is imperative.
[...] From the personal experience, I have noticed that the mistakes I make when writing monadic code are exactly the mistakes I made when programming in C. Actually, monadic mistakes tend to be worse, because monadic notation (compared to that of a typical imperative language) is ungainly and obscuring.
Oleg Kiselyov (page 21 of 26).
The most difficult construct for students to understand is the monad. I introduce IO without mentioning monads.
Olaf Chitil.
More generally:
Still, today, over 25 years after the introduction of the concept of monads to the world of functional programming, beginning functional programmers struggle to grasp the concept of monads. This struggle is exemplified by the numerous blog posts about the effort of trying to learn about monads. From our own experience we notice that even at university level, bachelor level students often struggle to comprehend monads and consistently score poorly on monad-related exam questions.
Considering that the concept of monads is not likely to disappear from the functional programming landscape any time soon, it is vital that we, as the functional programming community, somehow overcome the problems novices encounter when first studying monads.
Tim Steenvoorden, Jurriën Stutterheim, Erik Barendsen and Rinus Plasmeijer.
If only there was another way to specify "a guaranteed order of execution" in Haskell, while keeping the ability to separate regular Haskell definitions from those involved in I/O (and its observable effects) - translating this variation of Philip Wadler's echo:
val echoML : unit -> unit
fun echoML () = let val c = getcML () in
if c = #"\n" then
()
else
let val _ = putcML c in
echoML ()
end
fun putcML c = TextIO.output1(TextIO.stdOut,c);
fun getcML () = valOf(TextIO.input1(TextIO.stdIn));
...could then be as simple as:
echo :: OI -> ()
echo u = let !(u1:u2:u3:_) = partsOI u in
let !c = getChar u1 in
if c == '\n' then
()
else
let !_ = putChar c u2 in
echo u3
where:
data OI -- abstract
foreign import ccall "primPartOI" partOI :: OI -> (OI, OI)
⋮
foreign import ccall "primGetCharOI" getChar :: OI -> Char
foreign import ccall "primPutCharOI" putChar :: Char -> OI -> ()
⋮
and:
partsOI :: OI -> [OI]
partsOI u = let !(u1, u2) = partOI u in u1 : partsOI u2
How would this work? At run-time, Main.main receives an initial OI pseudo-data value as an argument:
module Main(main) where
main :: OI -> ()
⋮
...from which other OI values are produced, using partOI or partsOI. All you have to do is ensure each new OI value is used at most once, in each call to an OI-based definition, foreign or otherwise. In return, you get back a plain ordinary result - it isn't e.g. paired with some odd abstract state, or requires the use of a callback continuation, etc.
Using OI, instead of the unit type () like Standard ML does, means we can avoid always having to use the monadic interface:
Once you're in the IO monad, you're stuck there forever, and are reduced to Algol-style imperative programming.
Robert Harper.
But if you really do need it:
type IO a = OI -> a
unitIO :: a -> IO a
unitIO x = \ u -> let !_ = partOI u in x
bindIO :: IO a -> (a -> IO b) -> IO b
bindIO m k = \ u -> let !(u1, u2) = partOI u in
let !x = m u1 in
let !y = k x u2 in
y
⋮
So, monadic types aren't always needed - there are other interfaces out there:
LML had a fully fledged implementation of oracles running of a multi-processor (a Sequent Symmetry) back in ca 1989. The description in the Fudgets thesis refers to this implementation. It was fairly pleasant to work with and quite practical.
[...]
These days everything is done with monads so other solutions are sometimes forgotten.
Lennart Augustsson (2006).
Wait a moment: since it so closely resembles Standard ML's direct use of effects, is this approach and its use of pseudo-data referentially transparent?
Absolutely - just find a suitable definition of "referential transparency"; there's plenty to choose from...

Is it better to define Functor in terms of Applicative in terms of Monad, or vice versa?

This is a general question, not tied to any one piece of code.
Say you have a type T a that can be given an instance of Monad. Since every monad is an Applicative by assigning pure = return and (<*>) = ap, and then every applicative is a Functor via fmap f x = pure f <*> x, is it better to define your instance of Monad first, and then trivially give T instances of Applicative and Functor?
It feels a bit backward to me. If I were doing math instead of programming, I would think that I would first show that my object is a functor, and then continue adding restrictions until I have also shown it to be a monad. I know Haskell is merely inspired by Category Theory and obviously the techniques one would use when constructing a proof aren't the techniques one would use when writing a useful program, but I'd like to get an opinion from the Haskell community. Is it better to go from Monad down to Functor? or from Functor up to Monad?

I tend to write and see written the Functor instance first. Doubly so because if you use the LANGUAGE DeriveFunctor pragma then data Foo a = Foo a deriving ( Functor ) works most of the time.
The tricky bits are around agreement of instances when your Applicative can be more general than your Monad. For instance, here's an Err data type
data Err e a = Err [e] | Ok a deriving ( Functor )
instance Applicative (Err e) where
pure = Ok
Err es <*> Err es' = Err (es ++ es')
Err es <*> _ = Err es
_ <*> Err es = Err es
Ok f <*> Ok x = Ok (f x)
instance Monad (Err e) where
return = pure
Err es >>= _ = Err es
Ok a >>= f = f a
Above I defined the instances in Functor-to-Monad order and, taken in isolation, each instance is correct. Unfortunately, the Applicative and Monad instances do not align: ap and (<*>) are observably different as are (>>) and (*>).
Err "hi" <*> Err "bye" == Err "hibye"
Err "hi" `ap` Err "bye" == Err "hi"
For sensibility purposes, especially once the Applicative/Monad Proposal is in everyone's hands, these should align. If you defined instance Applicative (Err e) where { pure = return; (<*>) = ap } then they will align.
But then, finally, you may be capable of carefully teasing apart the differences in Applicative and Monad so that they behave differently in benign ways---such as having a lazier or more efficient Applicative instance. This actually occurs fairly frequently and I feel the jury is still a little bit out on what "benign" means and under what kinds of "observation" should your instances align. Perhaps some of the most gregarious use of this is in the Haxl project at Facebook where the Applicative instance is more parallelized than the Monad instance, and thus is far more efficient at the cost of some fairly severe "unobserved" side effects.
In any case, if they differ, document it.

I often choose a reverse approach as compared to the one in Abrahamson's answer. I manually define only the Monad instance and define the Applicative and Functor in terms of it with the help of already defined functions in the Control.Monad, which renders those instances the same for absolutely any monad, i.e.:
instance Applicative SomeMonad where
pure = return
(<*>) = ap
instance Functore SomeMonad where
fmap = liftM
While this way the definition of Functor and Applicative is always "brain-free" and very easy to reason about, I must note that this is not the ultimate solution, since there are cases, when the instances can be implemented more efficiently or even provide new features. E.g., the Applicative instance of Concurrently executes things ... concurrently, while the Monad instance can only execute them sequentially due to the nature of monads.

Functor instances are typically very simple to define, I'd normally do those by hand.
For Applicative and Monad, it depends. pure and return are usually similarly easy, and it really doesn't matter in which class you put the expanded definition. For bind, it is sometimes benfitial to go the "category way", i.e. define a specialised join' :: (M (M x)) -> M x first and then a>>=b = join' $ fmap b a (which of course wouldn't work if you had defined fmap in terms of >>=). Then it's probably useful to just re-use (>>=) for the Applicative instance.
Other times, the Applicative instance can be written quite easily or is more efficient than the generic Monad-derived implementation. In that case, you should definitely define <*> separately.

The magic here, that the Haskell uses the Kleisli-tiplet notation of a monad, that
is more convenient way, if somebody wants to use monads in imperative programming like tools.
I asked the same question, and the answer come after a while, if you see the definitions
of the Functor, Applicative, Monad in haskell you miss one link, which is the original definition of the monad, which contains only the join operation, that can be found on the HaskellWiki.
With this point of view you will see how haskell monads are built up functor, applicative functors, monads and Kliesli triplet.
A rough explanation can be found here: https://github.com/andorp/pearls/blob/master/Monad.hs
And other with the same ideas here: http://people.inf.elte.hu/pgj/haskell2/jegyzet/08/Monad.hs

I think you mis-understand how sub-classes work in Haskell. They aren't like OO sub-classes! Instead, a sub-class constraint, like
class Applicative m => Monad m
says "any type with a canonical Monad structure must also have a canonical Applicative structure". There are two basic reasons why you would place a constraint like that:
The sub-class structure induces a super-class structure.
The super-class structure is a natural subset of the sub-class structure.
For example, consider:
class Vector v where
(.^) :: Double -> v -> v
(+^) :: v -> v -> v
negateV :: v -> v
class Metric a where
distance :: a -> a -> Double
class (Vector v, Metric v) => Norm v where
norm :: v -> Double
The first super-class constraint on Norm arises because the concept of a normed space is really weak unless you also assume a vector space structure; the second arises because (given a vector space) a Norm induces a Metric, which you can prove by observing that
instance Metric V where
distance v0 v1 = norm (v0 .^ negateV v1)
is a valid Metric instance for any V with a valid Vector instance and a valid norm function. We say that the norm induces a metric. See http://en.wikipedia.org/wiki/Normed_vector_space#Topological_structure .
The Functor and Applicative super-classes on Monad are like Metric, not like Vector: the return and >>= functions from Monad induce Functor and Applicative structures:
fmap: can be defined as fmap f a = a >>= return . f, which was liftM in the Haskell 98 standard library.
pure: is the same operation as return; the two names is a legacy from when Applicative wasn't a super-class of Monad.
<*>: can be defined as af <*> ax = af >>= \ f -> ax >>= \ x -> return (f x), which was liftM2 ($) in the Haskell 98 standard library.
join: can be defined as join aa = aa >>= id.
So it's perfectly sensible, mathematically, to define the Functor and Applicative operations in terms of Monad.

What can Arrows do that Monads can't?

Arrows seem to be gaining popularity in the Haskell community, but it seems to me like Monads are more powerful. What is gained by using Arrows? Why can't Monads be used instead?

Every monad gives rise to an arrow
newtype Kleisli m a b = Kleisli (a -> m b)
instance Monad m => Category (Kleisli m) where
id = Kleisli return
(Kleisli f) . (Kleisli g) = Kleisli (\x -> (g x) >>= f)
instance Monad m => Arrow (Kleisli m) where
arr f = Kleisli (return . f)
first (Kleisli f) = Kleisli (\(a,b) -> (f a) >>= \fa -> return (fa,b))
But, there are arrows which are not monads. Thus, there are arrows which do things that you can't do with monads. A good example is the arrow transformer to add some static information
data StaticT m c a b = StaticT m (c a b)
instance (Category c, Monoid m) => Category (StaticT m c) where
id = StaticT mempty id
(StaticT m1 f) . (StaticT m2 g) = StaticT (m1 <> m2) (f . g)
instance (Arrow c, Monoid m) => Arrow (StaticT m c) where
arr f = StaticT mempty (arr f)
first (StaticT m f) = StaticT m (first f)
this arrow tranformer is usefull because it can be used to keep track of static properties of a program. For example, you can use this to instrument your API to statically measure how many calls you are making.

I've always found it difficult to think of the issue in these terms: what is gained by using arrows. As other commenters have mentioned, every monad can trivially be turned into an arrow. So a monad can do all the arrow-y things. However, we can make Arrows that are not monads. That is to say, we can make types that can do these arrow-y things without making them support monadic binding. It might not seem like the case, but the monadic bind function is actually a pretty restrictive (hence powerful) operation that disqualifies many types.
See, to support bind, you have to be able to assert that that regardless of the input type, what's going to come out is going to be wrapped in the monad.
(>>=) :: forall a b. m a -> (a -> m b) -> m b
But, how would we define bind for a type like data Foo a = F Bool a Surely, we could combine one Foo's a with another's but how would we combine the Bools. Imagine that the Bool marked, say, whether or not the value of the other parameter had changed. If I have a = Foo False whatever and I bind it into a function, I have no idea whether or not that function is going to change whatever. I can't write a bind that correctly sets the Bool. This is often called the problem of static meta-information. I cannot inspect the function being bound into to determine whether or not it will alter whatever.
There are several other cases like this: types that represent mutating functions, parsers that can exit early, etc. But the basic idea is this: monads set a high bar that not all types can clear. Arrows allow you to compose types (that may or may not be able to support this high, binding standard) in powerful ways without having to satisfy bind. Of course, you do lose some of the power of monads.
Moral of the story: there's nothing an arrow can do that monad cannot, because a monad can always be made into an arrow. However, sometimes you can't make your types into monads but you still want to allow them to have most of the compositional flexibility and power of monads.
Many of these ideas were inspired by the superb Understanding Haskell Arrows (backup)

Well, I'm going to cheat slightly here by changing the question from Arrow to Applicative. A lot of the same motives apply, and I know applicatives better than arrows. (And in fact, every Arrow is also an Applicative but not vice-versa, so I'm just taking it down a bit further down the slope to Functor.)
Just like every Monad is an Arrow, every Monad is also an Applicative. There are Applicatives that are not Monads (e.g., ZipList), so that's one possible answer.
But assume we're dealing with a type that admits of a Monad instance as well as an Applicative. Why might we sometime use the Applicative instance instead of Monad? Because Applicative is less powerful, and that comes with benefits:
There are things that we know that the Monad can do which the Applicative cannot. For example, if we use the Applicative instance of IO to assemble a compound action from simpler ones, none of the actions we compose may use the results of any of the others. All that applicative IO can do is execute the component actions and combine their results with pure functions.
Applicative types can be written so that we can do powerful static analysis of the actions before executing them. So you can write a program that inspects an Applicative action before executing it, figures out what it's going to do, and uses that to improve performance, tell the user what's going to be done, etc.
As an example of the first, I've been working on designing a kind of OLAP calculation language using Applicatives. The type admits of a Monad instance, but I've deliberately avoided having that, because I want the queries to be less powerful than what Monad would allow. Applicative means that each calculation will bottom out to a predictable number of queries.
As an example of the latter, I'll use a toy example from my still-under-development operational Applicative library. If you write the Reader monad as an operational Applicative program instead, you can examine the resulting Readers to count how many times they use the ask operation:
{-# LANGUAGE GADTs, RankNTypes, ScopedTypeVariables #-}
import Control.Applicative.Operational
-- | A 'Reader' is an 'Applicative' program that uses the 'ReaderI'
-- instruction set.
type Reader r a = ProgramAp (ReaderI r) a
-- | The only 'Reader' instruction is 'Ask', which requires both the
-- environment and result type to be #r#.
data ReaderI r a where
Ask :: ReaderI r r
ask :: Reader r r
ask = singleton Ask
-- | We run a 'Reader' by translating each instruction in the instruction set
-- into an #r -> a# function. In the case of 'Ask' the translation is 'id'.
runReader :: forall r a. Reader r a -> r -> a
runReader = interpretAp evalI
where evalI :: forall x. ReaderI r x -> r -> x
evalI Ask = id
-- | Count how many times a 'Reader' uses the 'Ask' instruction. The 'viewAp'
-- function translates a 'ProgramAp' into a syntax tree that we can inspect.
countAsk :: forall r a. Reader r a -> Int
countAsk = count . viewAp
where count :: forall x. ProgramViewAp (ReaderI r) x -> Int
-- Pure :: a -> ProgamViewAp instruction a
count (Pure _) = 0
-- (:<**>) :: instruction a
-- -> ProgramViewAp instruction (a -> b)
-- -> ProgramViewAp instruction b
count (Ask :<**> k) = succ (count k)
As best as I understand, you can't write countAsk if you implement Reader as a monad. (My understanding comes from asking right here in Stack Overflow, I'll add.)
This same motive is actually one of the ideas behind Arrows. One of the big motivating examples for Arrow was a parser combinator design that uses "static information" to get better performance than monadic parsers. What they mean by "static information" is more or less the same as in my Reader example: it's possible to write an Arrow instance where the parsers can be inspected very much like my Readers can. Then the parsing library can, before executing a parser, inspect it to see if it can predict ahead of time that it will fail, and skip it in that case.
In one of the direct comments to your question, jberryman mentions that arrows may in fact be losing popularity. I'd add that as I see it, Applicative is what arrows are losing popularity to.
References:
Paolo Capriotti & Ambrus Kaposi, "Free Applicative Functors". Very highly recommended.
Gergo Erdi, "Static analysis with Applicatives". Inspirational, but I it hard to follow...

The question isn't quite right. It's like asking why would you eat oranges instead of apples, since apples seem more nutritious all around.
Arrows, like monads, are a way of expressing computations, but they have to obey a different set of laws. In particular, the laws tend to make arrows nicer to use when you have function-like things.
The Haskell Wiki lists a few introductions to arrows. In particular, the Wikibook is a nice high level introduction, and the tutorial by John Hughes is a good overview of the various kinds of arrows.
For a real world example, compare this tutorial which uses Hakyll 3's arrow-based interface, with roughly the same thing in Hakyll 4's monad-based interface.

I always found one of the really practical use cases of arrows to be stream programming.
Look at this:
data Stream a = Stream a (Stream a)
data SF a b = SF (a -> (b, SF a b))
SF a b is a synchronous stream function.
You can define a function from it that transforms Stream a into Stream b that never hangs and always outputs one b for one a:
(<<$>>) :: SF a b -> Stream a -> Stream b
SF f <<$>> Stream a as = let (b, sf') = f a
in Stream b $ sf' <<$>> as
There is an Arrow instance for SF. In particular, you can compose SFs:
(>>>) :: SF a b -> SF b c -> SF a c
Now try to do this in monads. It doesn't work well. You might say that Stream a == Reader Nat a and thus it's a monad, but the monad instance is very inefficient. Imagine the type of join:
join :: Stream (Stream a) -> Stream a
You have to extract the diagonal from a stream of streams. This means O(n) complexity for the nth element, but using the Arrow instance for SFs gives you O(1) in principle! (And also deals with time and space leaks.)

Unlike a Functor, a Monad can change shape?

I've always enjoyed the following intuitive explanation of a monad's power relative to a functor: a monad can change shape; a functor cannot.
For example: length $ fmap f [1,2,3] always equals 3.
With a monad, however, length $ [1,2,3] >>= g will often not equal 3. For example, if g is defined as:
g :: (Num a) => a -> [a]
g x = if x==2 then [] else [x]
then [1,2,3] >>= g is equal to [1,3].
The thing that troubles me slightly, is the type signature of g. It seems impossible to define a function which changes the shape of the input, with a generic monadic type such as:
h :: (Monad m, Num a) => a -> m a
The MonadPlus or MonadZero type classes have relevant zero elements, to use instead of [], but now we have something more than a monad.
Am I correct? If so, is there a way to express this subtlety to a newcomer to Haskell. I'd like to make my beloved "monads can change shape" phrase, just a touch more honest; if need be.

I've always enjoyed the following intuitive explanation of a monad's power relative to a functor: a monad can change shape; a functor cannot.
You're missing a bit of subtlety here, by the way. For the sake of terminology, I'll divide a Functor in the Haskell sense into three parts: The parametric component determined by the type parameter and operated on by fmap, the unchanging parts such as the tuple constructor in State, and the "shape" as anything else, such as choices between constructors (e.g., Nothing vs. Just) or parts involving other type parameters (e.g., the environment in Reader).
A Functor alone is limited to mapping functions over the parametric portion, of course.
A Monad can create new "shapes" based on the values of the parametric portion, which allows much more than just changing shapes. Duplicating every element in a list or dropping the first five elements would change the shape, but filtering a list requires inspecting the elements.
This is essentially how Applicative fits between them--it allows you to combine the shapes and parametric values of two Functors independently, without letting the latter influence the former.
Am I correct? If so, is there a way to express this subtlety to a newcomer to Haskell. I'd like to make my beloved "monads can change shape" phrase, just a touch more honest; if need be.
Perhaps the subtlety you're looking for here is that you're not really "changing" anything. Nothing in a Monad lets you explicitly mess with the shape. What it lets you do is create new shapes based on each parametric value, and have those new shapes recombined into a new composite shape.
Thus, you'll always be limited by the available ways to create shapes. With a completely generic Monad all you have is return, which by definition creates whatever shape is necessary such that (>>= return) is the identity function. The definition of a Monad tells you what you can do, given certain kinds of functions; it doesn't provide those functions for you.

Monad's operations can "change the shape" of values to the extent that the >>= function replaces leaf nodes in the "tree" that is the original value with a new substructure derived from the node's value (for a suitably general notion of "tree" - in the list case, the "tree" is associative).
In your list example what is happening is that each number (leaf) is being replaced by the new list that results when g is applied to that number. The overall structure of the original list still can be seen if you know what you're looking for; the results of g are still there in order, they've just been smashed together so you can't tell where one ends and the next begins unless you already know.
A more enlightening point of view may be to consider fmap and join instead of >>=. Together with return, either way gives an equivalent definition of a monad. In the fmap/join view, though, what is happening here is more clear. Continuing with your list example, first g is fmapped over the list yielding [[1],[],[3]]. Then that list is joined, which for list is just concat.

Just because the monad pattern includes some particular instances that allow shape changes doesn't mean every instance can have shape changes. For example, there is only one "shape" available in the Identity monad:
newtype Identity a = Identity a
instance Monad Identity where
return = Identity
Identity a >>= f = f a
In fact, it's not clear to me that very many monads have meaningful "shape"s: for example, what does shape mean in the State, Reader, Writer, ST, STM, or IO monads?

The key combinator for monads is (>>=). Knowing that it composes two monadic values and reading its type signature, the power of monads becomes more apparent:
(>>=) :: Monad m => m a -> (a -> m b) -> m b
The future action can depend entirely on the outcome of the first action, because it is a function of its result. This power comes at a price though: Functions in Haskell are entirely opaque, so there is no way for you to get any information about a composed action without actually running it. As a side note, this is where arrows come in.

A function with a signature like h indeed cannot do many interesting things beyond performing some arithmetic on its argument. So, you have the correct intuition there.
However, it might help to look at commonly used libraries for functions with similar signatures. You'll find that the most generic ones, as you'd expect, perform generic monad operations like return, liftM, or join. Also, when you use liftM or fmap to lift an ordinary function into a monadic function, you typically wind up with a similarly generic signature, and this is quite convenient for integrating pure functions with monadic code.
In order to use the structure that a particular monad offers, you inevitably need to use some knowledge about the specific monad you're in to build new and interesting computations in that monad. Consider the state monad, (s -> (a, s)). Without knowing that type, we can't write get = \s -> (s, s), but without being able to access the state, there's not much point to being in the monad.

The simplest type of a function satisfying the requirement I can imagine is this:
enigma :: Monad m => m () -> m ()
One can implement it in one of the following ways:
enigma1 m = m -- not changing the shape
enigma2 _ = return () -- changing the shape
This was a very simple change -- enigma2 just discards the shape and replaces it with the trivial one. Another kind of generic change is combining two shapes together:
foo :: Monad m => m () -> m () -> m ()
foo a b = a >> b
The result of foo can have shape different from both a and b.
A third obvious change of shape, requiring the full power of the monad, is a
join :: Monad m => m (m a) -> m a
join x = x >>= id
The shape of join x is usually not the same as of x itself.
Combining those primitive changes of shape, one can derive non-trivial things like sequence, foldM and alike.

Does
h :: (Monad m, Num a) => a -> m a
h 0 = fail "Failed."
h a = return a
suit your needs? For example,
> [0,1,2,3] >>= h
[1,2,3]

This isn't a full answer, but I have a few things to say about your question that don't really fit into a comment.
Firstly, Monad and Functor are typeclasses; they classify types. So it is odd to say that "a monad can change shape; a functor cannot." I believe what you are trying to talk about is a "Monadic value" or perhaps a "monadic action": a value whose type is m a for some Monad m of kind * -> * and some other type of kind *. I'm not entirely sure what to call Functor f :: f a, I suppose I'd call it a "value in a functor", though that's not the best description of, say, IO String (IO is a functor).
Secondly, note that all Monads are necessarily Functors (fmap = liftM), so I'd say the difference you observe is between fmap and >>=, or even between f and g, rather than between Monad and Functor.

Why are side-effects modeled as monads in Haskell?

Could anyone give some pointers on why the impure computations in Haskell are modelled as monads?
I mean monad is just an interface with 4 operations, so what was the reasoning to modelling side-effects in it?

Suppose a function has side effects. If we take all the effects it produces as the input and output parameters, then the function is pure to the outside world.
So, for an impure function
f' :: Int -> Int
we add the RealWorld to the consideration
f :: Int -> RealWorld -> (Int, RealWorld)
-- input some states of the whole world,
-- modify the whole world because of the side effects,
-- then return the new world.
then f is pure again. We define a parametrized data type type IO a = RealWorld -> (a, RealWorld), so we don't need to type RealWorld so many times, and can just write
f :: Int -> IO Int
To the programmer, handling a RealWorld directly is too dangerous—in particular, if a programmer gets their hands on a value of type RealWorld, they might try to copy it, which is basically impossible. (Think of trying to copy the entire filesystem, for example. Where would you put it?) Therefore, our definition of IO encapsulates the states of the whole world as well.
Composition of "impure" functions
These impure functions are useless if we can't chain them together. Consider
getLine :: IO String ~ RealWorld -> (String, RealWorld)
getContents :: String -> IO String ~ String -> RealWorld -> (String, RealWorld)
putStrLn :: String -> IO () ~ String -> RealWorld -> ((), RealWorld)
We want to
get a filename from the console,
read that file, and
print that file's contents to the console.
How would we do it if we could access the real world states?
printFile :: RealWorld -> ((), RealWorld)
printFile world0 = let (filename, world1) = getLine world0
(contents, world2) = (getContents filename) world1
in (putStrLn contents) world2 -- results in ((), world3)
We see a pattern here. The functions are called like this:
...
(<result-of-f>, worldY) = f worldX
(<result-of-g>, worldZ) = g <result-of-f> worldY
...
So we could define an operator ~~~ to bind them:
(~~~) :: (IO b) -> (b -> IO c) -> IO c
(~~~) :: (RealWorld -> (b, RealWorld))
-> (b -> RealWorld -> (c, RealWorld))
-> (RealWorld -> (c, RealWorld))
(f ~~~ g) worldX = let (resF, worldY) = f worldX
in g resF worldY
then we could simply write
printFile = getLine ~~~ getContents ~~~ putStrLn
without touching the real world.
"Impurification"
Now suppose we want to make the file content uppercase as well. Uppercasing is a pure function
upperCase :: String -> String
But to make it into the real world, it has to return an IO String. It is easy to lift such a function:
impureUpperCase :: String -> RealWorld -> (String, RealWorld)
impureUpperCase str world = (upperCase str, world)
This can be generalized:
impurify :: a -> IO a
impurify :: a -> RealWorld -> (a, RealWorld)
impurify a world = (a, world)
so that impureUpperCase = impurify . upperCase, and we can write
printUpperCaseFile =
getLine ~~~ getContents ~~~ (impurify . upperCase) ~~~ putStrLn
(Note: Normally we write getLine ~~~ getContents ~~~ (putStrLn . upperCase))
We were working with monads all along
Now let's see what we've done:
We defined an operator (~~~) :: IO b -> (b -> IO c) -> IO c which chains two impure functions together
We defined a function impurify :: a -> IO a which converts a pure value to impure.
Now we make the identification (>>=) = (~~~) and return = impurify, and see? We've got a monad.
Technical note
To ensure it's really a monad, there's still a few axioms which need to be checked too:
return a >>= f = f a
impurify a = (\world -> (a, world))
(impurify a ~~~ f) worldX = let (resF, worldY) = (\world -> (a, world )) worldX
in f resF worldY
= let (resF, worldY) = (a, worldX)
in f resF worldY
= f a worldX
f >>= return = f
(f ~~~ impurify) worldX = let (resF, worldY) = f worldX
in impurify resF worldY
= let (resF, worldY) = f worldX
in (resF, worldY)
= f worldX
f >>= (\x -> g x >>= h) = (f >>= g) >>= h
Left as exercise.

Could anyone give some pointers on why the unpure computations in Haskell are modeled as monads?
This question contains a widespread misunderstanding.
Impurity and Monad are independent notions.
Impurity is not modeled by Monad.
Rather, there are a few data types, such as IO, that represent imperative computation.
And for some of those types, a tiny fraction of their interface corresponds to the interface pattern called "Monad".
Moreover, there is no known pure/functional/denotative explanation of IO (and there is unlikely to be one, considering the "sin bin" purpose of IO), though there is the commonly told story about World -> (a, World) being the meaning of IO a.
That story cannot truthfully describe IO, because IO supports concurrency and nondeterminism.
The story doesn't even work when for deterministic computations that allow mid-computation interaction with the world.
For more explanation, see this answer.
Edit: On re-reading the question, I don't think my answer is quite on track.
Models of imperative computation do often turn out to be monads, just as the question said.
The asker might not really assume that monadness in any way enables the modeling of imperative computation.

As I understand it, someone called Eugenio Moggi first noticed that a previously obscure mathematical construct called a "monad" could be used to model side effects in computer languages, and hence specify their semantics using Lambda calculus. When Haskell was being developed there were various ways in which impure computations were modelled (see Simon Peyton Jones' "hair shirt" paper for more details), but when Phil Wadler introduced monads it rapidly became obvious that this was The Answer. And the rest is history.

Could anyone give some pointers on why the unpure computations in Haskell are modeled as monads?
Well, because Haskell is pure. You need a mathematical concept to distinguish between unpure computations and pure ones on type-level and to model programm flows in respectively.
This means you'll have to end up with some type IO a that models an unpure computation. Then you need to know ways of combining these computations of which apply in sequence (>>=) and lift a value (return) are the most obvious and basic ones.
With these two, you've already defined a monad (without even thinking of it);)
In addition, monads provide very general and powerful abstractions, so many kinds of control flow can be conveniently generalized in monadic functions like sequence, liftM or special syntax, making unpureness not such a special case.
See monads in functional programming and uniqueness typing (the only alternative I know) for more information.

As you say, Monad is a very simple structure. One half of the answer is: Monad is the simplest structure that we could possibly give to side-effecting functions and be able to use them. With Monad we can do two things: we can treat a pure value as a side-effecting value (return), and we can apply a side-effecting function to a side-effecting value to get a new side-effecting value (>>=). Losing the ability to do either of these things would be crippling, so our side-effecting type needs to be "at least" Monad, and it turns out Monad is enough to implement everything we've needed to so far.
The other half is: what's the most detailed structure we could give to "possible side effects"? We can certainly think about the space of all possible side effects as a set (the only operation that requires is membership). We can combine two side effects by doing them one after another, and this will give rise to a different side effect (or possibly the same one - if the first was "shutdown computer" and the second was "write file", then the result of composing these is just "shutdown computer").
Ok, so what can we say about this operation? It's associative; that is, if we combine three side effects, it doesn't matter which order we do the combining in. If we do (write file then read socket) then shutdown computer, it's the same as doing write file then (read socket then shutdown computer). But it's not commutative: ("write file" then "delete file") is a different side effect from ("delete file" then "write file"). And we have an identity: the special side effect "no side effects" works ("no side effects" then "delete file" is the same side effect as just "delete file") At this point any mathematician is thinking "Group!" But groups have inverses, and there's no way to invert a side effect in general; "delete file" is irreversible. So the structure we have left is that of a monoid, which means our side-effecting functions should be monads.
Is there a more complex structure? Sure! We could divide possible side effects into filesystem-based effects, network-based effects and more, and we could come up with more elaborate rules of composition that preserved these details. But again it comes down to: Monad is very simple, and yet powerful enough to express most of the properties we care about. (In particular, associativity and the other axioms let us test our application in small pieces, with confidence that the side effects of the combined application will be the same as the combination of the side effects of the pieces).

It's actually quite a clean way to think of I/O in a functional way.
In most programming languages, you do input/output operations. In Haskell, imagine writing code not to do the operations, but to generate a list of the operations that you would like to do.
Monads are just pretty syntax for exactly that.
If you want to know why monads as opposed to something else, I guess the answer is that they're the best functional way to represent I/O that people could think of when they were making Haskell.

AFAIK, the reason is to be able to include side effects checks in the type system. If you want to know more, listen to those SE-Radio episodes:
Episode 108: Simon Peyton Jones on Functional Programming and Haskell
Episode 72: Erik Meijer on LINQ

Above there are very good detailed answers with theoretical background. But I want to give my view on IO monad. I am not experienced haskell programmer, so May be it is quite naive or even wrong. But i helped me to deal with IO monad to some extent (note, that it do not relates to other monads).
First I want to say, that example with "real world" is not too clear for me as we cannot access its (real world) previous states. May be it do not relates to monad computations at all but it is desired in the sense of referential transparency, which is generally presents in haskell code.
So we want our language (haskell) to be pure. But we need input/output operations as without them our program cannot be useful. And those operations cannot be pure by their nature. So the only way to deal with this we have to separate impure operations from the rest of code.
Here monad comes. Actually, I am not sure, that there cannot exist other construct with similar needed properties, but the point is that monad have these properties, so it can be used (and it is used successfully). The main property is that we cannot escape from it. Monad interface do not have operations to get rid of the monad around our value. Other (not IO) monads provide such operations and allow pattern matching (e.g. Maybe), but those operations are not in monad interface. Another required property is ability to chain operations.
If we think about what we need in terms of type system, we come to the fact that we need type with constructor, which can be wrapped around any vale. Constructor must be private, as we prohibit escaping from it(i.e. pattern matching). But we need function to put value into this constructor (here return comes to mind). And we need the way to chain operations. If we think about it for some time, we will come to the fact, that chaining operation must have type as >>= has. So, we come to something very similar to monad. I think, if we now analyze possible contradictory situations with this construct, we will come to monad axioms.
Note, that developed construct do not have anything in common with impurity. It only have properties, which we wished to have to be able to deal with impure operations, namely, no-escaping, chaining, and a way to get in.
Now some set of impure operations is predefined by the language within this selected monad IO. We can combine those operations to create new unpure operations. And all those operations will have to have IO in their type. Note however, that presence of IO in type of some function do not make this function impure. But as I understand, it is bad idea to write pure functions with IO in their type, as it was initially our idea to separate pure and impure functions.
Finally, I want to say, that monad do not turn impure operations into pure ones. It only allows to separate them effectively. (I repeat, that it is only my understanding)

Develop Reference

node.js excel linux python-3.x azure haskell apache-spark rust .htaccess string