Monads, composition and the order of computation - haskell

All the monad articles often state that monads allow you to sequence effects in order.
But what about simple composition? Doesn't
f x = x + 1
g x = x * 2
result = f (g x)
require g x to be computed before f ...?
Do monads do the same, but with handling of effects?

Disclaimer: Monads are a lot of things. They are notoriously difficult to explain, so I will not attempt to explain what monads are in general here, since the question does not ask for that. I will assume you have a basic grasp of what the Monad interface is, as well as how it works for some useful datatypes, like Maybe, Either, and IO.
What is an effect?
Your question begins with a note:
All the monad articles often state that monads allow you to sequence effects in order.
Hmm. This is interesting. In fact, it is interesting for a couple reasons, one of which you have identified: it implies that monads let you create some sort of sequencing. That’s true, but it’s only part of the picture: it also states that sequencing happens on effects.
Here’s the thing, though… what is an “effect”? Is adding two numbers together an effect? Under most definitions, the answer would be no. What about printing something to stdout, is that an effect? In that case, I think most people would agree that the answer is yes. However, consider something more subtle: is short-circuiting a computation by producing Nothing an effect?
Error effects
Let’s take a look at an example. Consider the following code:
> do x <- Just 1
     y <- Nothing
     return (x + y)
Nothing
The second line of that example “short-circuits” due to the Monad instance for Maybe. Could that be considered an effect? In some sense, I think so, since it’s non-local, but in another sense, probably not. After all, if the x <- Just 1 or y <- Nothing lines are swapped, the result is still the same, so the ordering doesn’t matter.
However, consider a slightly more complex example that uses Either instead of Maybe:
> do x <- Left "x failed"
     y <- Left "y failed"
     return (x + y)
Left "x failed"
Now this is more interesting. If you swap the first two lines now, you get a different result! Still, is this a representation of an “effect” like the ones you allude to in your question? After all, it’s just a bunch of function calls. As you know, do notation is just an alternative syntax for a bunch of uses of the >>= operator, so we can expand it out:
> Left "x failed" >>= \x ->
  Left "y failed" >>= \y ->
  return (x + y)
Left "x failed"
We can even replace the >>= operator with the Either-specific definition to get rid of monads entirely:
> case Left "x failed" of
    Right x -> case Left "y failed" of
      Right y -> Right (x + y)
      Left e -> Left e
    Left e -> Left e
Left "x failed"
Therefore, it’s clear that monads do impose some sort of sequencing, but this is not because they are monads and monads are magic, it’s just because they happen to enable a style of programming that looks more impure than Haskell typically allows.
Monads and state
But perhaps that is unsatisfying to you. Error handling is not compelling because it is simply short-circuiting; it doesn’t actually introduce any sequencing in the result! Well, if we reach for some slightly more complex types, we can do that. For example, consider the Writer type, which allows a sort of “logging” using the monadic interface:
> execWriter $ do
    tell "hello"
    tell " "
    tell "world"
"hello world"
This is even more interesting than before, since now the result of each computation in the do block is unused, but it still affects the output! This is clearly side-effectful, and order is clearly very important! If we reorder the tell expressions, we would get a very different result:
> execWriter $ do
    tell " "
    tell "world"
    tell "hello"
" worldhello"
But how is this possible? Well, again, we can rewrite it to avoid do notation:
execWriter (
  tell "hello" >>= \_ ->
  tell " " >>= \_ ->
  tell "world")
We could inline the definition of >>= again for Writer, but it’s too long to be terribly illustrative here. The point is, though, that Writer is just a completely ordinary Haskell datatype that doesn’t do any I/O or anything like that, and yet we have used the monadic interface to create something that looks like ordered effects.
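A stripped-down Writer-like type gives the flavor, though. The following is an illustration only, not the real definition from the transformers/mtl packages (the primed names are ours):
newtype Writer' a = Writer' (a, String)

tell' :: String -> Writer' ()
tell' s = Writer' ((), s)

bindW :: Writer' a -> (a -> Writer' b) -> Writer' b
bindW (Writer' (a, log1)) f =
  let Writer' (b, log2) = f a
  in  Writer' (b, log1 ++ log2)  -- logs are appended in order, left to right

execWriter' :: Writer' a -> String
execWriter' (Writer' (_, out)) = out
The ordering comes entirely from log1 ++ log2 inside bindW; nothing impure is happening.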
We can go even further by creating an interface that looks like mutable state using the State type:
> flip execState 0 $ do
    modify (+ 3)
    modify (* 2)
6
Once again, if we reorder the expressions, we get a different result:
> flip execState 0 $ do
    modify (* 2)
    modify (+ 3)
3
Clearly, monads are a useful tool for creating interfaces that look stateful and have a well-defined ordering despite actually just being ordinary function calls.
Why can monads do this?
What gives monads this power? Well, they’re not magic—they’re just ordinary pure Haskell code. But consider the type signature for >>=:
(>>=) :: Monad m => m a -> (a -> m b) -> m b
Notice how the second argument depends on a, and the only way to get an a is from the first argument? This means that >>= needs to “run” the first argument to produce a value before it can apply the second argument. This doesn’t have to do with evaluation order so much as it has to do with actually writing code that will typecheck.
Now, it’s true that Haskell is a lazy language. But Haskell’s laziness doesn’t really matter for any of this because all of this code is actually pure, even the example using State! It’s simply a pattern that encodes computations that look sort of stateful in a pure way, but if you actually implemented State yourself, you’d find that it just passes around the “current state” in the definition of the >>= function. There’s not any actual mutation.
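For instance, a minimal State-like type might look like this (an illustrative sketch with names of our own choosing, not the real definition from the transformers package):
newtype State' s a = State' (s -> (a, s))

bindS :: State' s a -> (a -> State' s b) -> State' s b
bindS (State' m) f = State' (\s0 ->
  let (a, s1)   = m s0   -- run the first computation against the incoming state
      State' m' = f a
  in  m' s1)             -- thread the updated state into the second

modifyS :: (s -> s) -> State' s ()
modifyS f = State' (\s -> ((), f s))

execState' :: State' s a -> s -> s
execState' (State' m) s0 = snd (m s0)
Here execState' (modifyS (+ 3) `bindS` \_ -> modifyS (* 2)) 0 evaluates to 6, just like the real State example above, with nothing but ordinary function calls.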
And that’s it. Monads, by virtue of their interface, impose an ordering on how their arguments may be evaluated, and instances of Monad exploit that to make stateful-looking interfaces. You don’t need Monad to have evaluation ordering, though, as you found; obviously in (1 + 2) * 3 the addition will be evaluated before the multiplication.
But what about IO??
Okay, you got me. Here’s the problem: IO is magic.
Monads are not magic, but IO is. All of the above examples are purely functional, but obviously reading a file or writing to stdout is not pure. So how the heck does IO work?
Well, IO is implemented by the GHC runtime, and you could not write it yourself. However, in order to make it work nicely with the rest of Haskell, there needs to be a well-defined evaluation order! Otherwise things would get printed out in the wrong order and all sorts of other hell would break loose.
Well, it turns out the Monad’s interface is a great way to ensure that evaluation order is predictable, since it works for pure code already. So IO leverages the same interface to guarantee the evaluation order is the same, and the runtime actually defines what that evaluation means.
However, don’t be misled! You don’t need monads to do I/O in a pure language, and you don’t need IO to have monadic effects. Early versions of Haskell experimented with a non-monadic way to do I/O, and the other parts of this answer explain how you can have pure monadic effects. Remember that monads are not special or holy, they’re just a pattern that Haskell programmers have found useful because of its various properties.

Yes, the functions you proposed are strict for the standard numerical types. But not all functions are! In
f _ = 3
g x = x * 2
result = f (g x)
it is not the case that g x must be computed before f (g x).
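You can see this directly in GHCi by handing f an argument that would blow up if it were ever forced:
f :: Int -> Int
f _ = 3

g :: Int -> Int
g x = x * 2

-- ghci> f (g undefined)
-- 3
-- g undefined is never forced, so the undefined is never evaluated.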

Yes, monads use function composition to sequence effects, but they are not the only way to achieve sequenced effects.
Strict semantics and side effects
In most languages, there is sequencing by strict semantics applied first to the function side of an expression, then to each argument in turn, and finally the function is applied to the arguments. So in JS, the function application form,
<Code 1>(<Code 2>, <Code 3>)
runs its pieces of code in a specified order: code 1, then code 2, then code 3; then it checks that the output of code 1 was a function, and finally it calls that function with the two computed arguments. And it does this because any of those steps can have side-effects. You would write,
const logVal = (log, val) => {
  console.log(log);
  return val;
};
logVal(1, (a, b) => logVal(4, a + b))(
  logVal(2, 2),
  logVal(3, 3));
And that works for those languages. These are side-effects, which we can say in this context means that JS's type system doesn't give you any way to know that they are there.
Haskell does have a strict application primitive, but it wanted to be pure, which roughly means that it wanted the type system to track effects. So they introduced a form of metaprogramming where one of their types is a type-level adjective, “programs which compute a _____”. A program interacts with the real world; Haskell code in theory doesn't. You have to define “main is a program which computes a unit type” and then the compiler actually just builds that program for you as an executable binary file. By the time that file is run Haskell is not really in the picture any more!
This is therefore more specific than normal function application, because the abstract problem I wrote in JavaScript is,
I have a program which computes {a function from (X, Y) pairs to programs which compute Zs}.
I also have a program which computes an X, and a program which computes a Y.
I want to put these all together into a program which computes a Z.
That's not function composition by itself, but a function can do it.
Peeking inside monads
A monad is a pattern. The pattern is, sometimes you have an adjective which does not add much when you repeat it. For example there is not much added when you say "a delayed delayed x" or "zero or more (zero or more xs)" or "either a null or else either a null or else an x." Similarly for the IO monad, not much is added by "a program to compute a program to compute an x" that is not available in "a program to compute an x."
The pattern is that there is some canonical merging algorithm which merges:
join: given an <adjective> <adjective> x, I will make you an <adjective> x.
We also add two other properties, the adjective should be outputtish,
map: given an x -> y and an <adjective> x, I will make you an <adjective> y
and universally embeddable,
pure: given an x, I will make you an <adjective> x.
Given these three things and a couple of axioms, you happen to have a common "monad" idea which you can develop One True Syntax for.
Now this metaprogramming idea obviously contains a monad. In JS we would write,
interface IO<x> {
  run: () => Promise<x>
}
function join<x>(pprog: IO<IO<x>>): IO<x> {
  return { run: () => pprog.run().then(prog => prog.run()) };
}
function map<x, y>(prog: IO<x>, fn: (input: x) => y): IO<y> {
  return { run: () => prog.run().then(x => fn(x)) };
}
function pure<x>(input: x): IO<x> {
  return { run: () => Promise.resolve(input) };
}
// with those you can also define,
function bind<x, y>(prog: IO<x>, fn: (input: x) => IO<y>): IO<y> {
  return join(map(prog, fn));
}
But the fact that a pattern exists does not mean it is useful! I am claiming that these functions turn out to be all you need to resolve the problem above. And it is not hard to see why: you can use bind to create a function scope inside of which the adjective doesn't exist, and manipulate your values there:
function ourGoal<x, y, z>(
    fnProg: IO<(inX: x, inY: y) => IO<z>>,
    xProg: IO<x>,
    yProg: IO<y>): IO<z> {
  return bind(fnProg, fn =>
    bind(xProg, x =>
      bind(yProg, y => fn(x, y))));
}
How this answers your question
Notice that in the above we choose an order of operations by how we write the three binds. We could have written those in some other order. But we needed all the parameters to run the final program.
This choice of how we sequence our operations is indeed realized in the function calls: you are 100% right. But the way that you are doing it, with only function composition, is flawed because it demotes the effects down to side-effects in order to get the types through.
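For comparison, the same ourGoal written directly in Haskell is just the chain of binds, with >>= playing the role of bind (a direct translation of the sketch above):
ourGoal :: IO (x -> y -> IO z) -> IO x -> IO y -> IO z
ourGoal fnProg xProg yProg =
  fnProg >>= \fn ->
  xProg  >>= \x  ->
  yProg  >>= \y  ->
  fn x y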

How does return statement work in Haskell? [duplicate]

Consider these functions
f1 :: Maybe Int
f1 = return 1
f2 :: [Int]
f2 = return 1
Both have the same statement return 1, but the results are different: f1 gives the value Just 1 and f2 gives the value [1].
Looks like Haskell invokes two different versions of return based on the return type. I'd like to know more about this kind of function invocation. Is there a name for this feature in programming languages?
This is a long meandering answer!
As you've probably seen from the comments and Thomas's excellent (but very technical) answer, you've asked a very hard question. Well done!
Rather than try to explain the technical answer I've tried to give you a broad overview of what Haskell does behind the scenes without diving into technical detail. Hopefully it will help you to get a big picture view of what's going on.
return is an example of type inference.
Most modern languages have some notion of polymorphism. For example var x = 1 + 1 will set x equal to 2. In a statically typed language 2 will usually be an int. If you say var y = 1.0 + 1.0 then y will be a float. The operator + (which is just a function with special syntax) is polymorphic: the same name works at several types.
Most imperative languages, especially object oriented languages, can only do type inference one way. Every variable has a fixed type. When you call a function it looks at the types of the argument and chooses a version of that function that fits the types (or complains if it can't find one).
When you assign the result of a function to a variable the variable already has a type and if it doesn't agree with the type of the return value you get an error.
So in an imperative language the "flow" of type deduction follows time in your program: deduce the type of a variable, do something with it and deduce the type of the result. In a dynamically typed language (such as Python or JavaScript) the type of a variable is not assigned until the value of the variable is computed (which is why there don't seem to be types). In a statically typed language the types are worked out ahead of time (by the compiler) but the logic is the same. The compiler works out what the types of variables are going to be, but it does so by following the logic of the program in the same way as the program runs.
In Haskell the type inference also follows the logic of the program. Being Haskell it does so in a very mathematically pure way (called System F). The language of types (that is, the rules by which types are deduced) is similar to Haskell itself.
Now remember Haskell is a lazy language. It doesn't work out the value of anything until it needs it. That's why it makes sense in Haskell to have infinite data structures. It never occurs to Haskell that a data structure is infinite because it doesn't bother to work it out until it needs to.
Now all that lazy magic happens at the type level too. In the same way that Haskell doesn't work out what the value of an expression is until it really needs to, Haskell doesn't work out what the type of an expression is until it really needs to.
Consider this function
func (x : y : rest) = (x,y) : func rest
func _ = []
If you ask Haskell for the type of this function it has a look at the definition, sees [] and : and deduces that it's working with lists. But it never needs to look at the types of x and y, it just knows that they have to be the same because they end up in the same list. So it deduces the type of the function as [a] -> [a] where a is a type that it hasn't bothered to work out yet.
So far no magic. But it's useful to notice the difference between this idea and how it would be done in an OO language. Haskell doesn't convert the arguments to Object, do its thing and then convert back. Haskell just hasn't been asked explicitly what the type of the list is. So it doesn't care.
Now try typing the following into ghci
minBound - length ""
minBound : "Hello"
Now what just happened!? minBound must be a Char because I put it on the front of a string, and it must be an integer because I subtracted length "" (which is 0) from it and got a number. Plus the two values are clearly very different.
So what is the type of minBound? Let's ask ghci!
:type minBound
minBound :: Bounded a => a
AAargh! What does that mean? Basically it means that it hasn't bothered to work out exactly what a is, but it has to be Bounded. If you type :info Bounded you get three useful lines
class Bounded a where
  minBound :: a
  maxBound :: a
and a lot of less useful lines
So if a is Bounded there are values minBound and maxBound of type a.
In fact, under the hood, Bounded is just a value: its "type" is a record with fields minBound and maxBound. Because it's a value, Haskell doesn't look at it until it really needs to.
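If it helps, you can picture that record as an ordinary Haskell value. This is only an illustration of the idea (the names are made up; it is not GHC's actual internal representation):
-- an explicit stand-in for the Bounded "dictionary"
data BoundedDict a = BoundedDict { minBoundD :: a, maxBoundD :: a }

intBoundedDict :: BoundedDict Int
intBoundedDict = BoundedDict minBound maxBound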
So I appear to have meandered somewhere in the region of the answer to your question. Before we move onto return (which you may have noticed from the comments is a wonderfully complex beast.) let's look at read.
ghci again
read "42" + 7
read "'H'" : "ello"
length (read "[1,2,3]")
and hopefully you won't be too surprised to find that there are definitions
read :: Read a => String -> a
class Read a where
  read :: String -> a
so Read a is just a record containing a single value, which is a function String -> a. It's very tempting to assume that there is one read function which looks at a string, works out what type is contained in the string and returns that type. But it does the opposite. It completely ignores the string until it's needed. When the value is needed, Haskell first works out what type it's expecting; once it's done that it goes and gets the appropriate version of the read function and combines it with the string.
Now consider something slightly more complex:
readList :: Read a => [String] -> [a]
readList strs = map read strs
under the hood readList actually takes two arguments
readList' :: Read a -> [String] -> [a]  -- pseudo-Haskell: the dictionary made explicit
readList' {read = f} strs = map f strs
Again, as Haskell is lazy, it only bothers looking at the arguments when it needs to find out the return value; at that point it knows what a is, so the compiler can go and find the right version of Read. Until then it doesn't care.
Hopefully that's given you a bit of an idea of what's happening and why Haskell can "overload" on the return type. But it's important to remember it's not overloading in the conventional sense. Every function has only one definition. It's just that one of the arguments is a bag of functions. readList' doesn't ever know what types it's dealing with. It just knows it gets a function String -> a and some Strings; to do the application it just passes the arguments to map. map in turn doesn't even know it gets strings. When you get deeper into Haskell it becomes very important that functions don't know very much about the types they're dealing with.
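Here is a runnable version of that explicit-dictionary idea (illustrative only; the names are ours and this is not GHC's actual desugaring, but it is the same shape):
newtype ReadDict a = ReadDict { readD :: String -> a }

readListWith :: ReadDict a -> [String] -> [a]
readListWith dict strs = map (readD dict) strs

-- ghci> readListWith (ReadDict read) ["1","2","3"] :: [Int]
-- [1,2,3]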
Now let's look at return.
Remember how I said that the type system in Haskell was very similar to Haskell itself. Remember that in Haskell functions are just ordinary values.
Does this mean I can have a type that takes a type as an argument and returns another type? Of course it does!
You've seen some type functions: Maybe takes a type a and returns another type which can either be Just a or Nothing. [] takes a type a and returns a list of as. Type functions in Haskell are usually containers. For example I could define a type function BinaryTree which stores a load of as in a tree-like structure. There are of course lots of much stranger ones.
So, if these type functions are similar to ordinary types I can have a typeclass that contains type functions. One such typeclass is Monad
class Monad m where
  return :: a -> m a
  (>>=) :: m a -> (a -> m b) -> m b
so here m is some type function. If I want to define Monad for m I need to define return and the scary looking operator below it (which is called bind)
As others have pointed out the return is a really misleading name for a fairly boring function. The team that designed Haskell have since realised their mistake and they're genuinely sorry about it. return is just an ordinary function that takes an argument and returns a Monad with that type in it. (You never asked what a Monad actually is so I'm not going to tell you)
Let's define Monad for m = Maybe!
First I need to define return. What should return x be? Remember I'm only allowed to define the function once, so I can't look at x because I don't know what type it is. I could always return Nothing, but that seems a waste of a perfectly good function. Let's define return x = Just x because that's literally the only other thing I can do.
What about the scary bind thing? What can we say about x >>= f? Well, x is a Maybe a of some unknown type a and f is a function that takes an a and returns a Maybe b. Somehow I need to combine these to get a Maybe b.
So I need to define Nothing >>= f. I can't call f because it needs an argument of type a and I don't have a value of type a; I don't even know what a is. I've only got one choice, which is to define
Nothing >>= f = Nothing
What about Just x >>= f? Well, I know x is of type a and f takes an a as an argument, so f x is of type Maybe b, which is exactly the type I need to produce, so ...
Just x >>= f = f x
So I've got a Monad! What if m is List? Well, I can follow a similar sort of logic and define
return x = [x]
[] >>= f = []
(x : xs) >>= f = f x ++ (xs >>= f)
Hooray another Monad! It's a nice exercise to go through the steps and convince yourself that there's no other sensible way of defining this.
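With those definitions in hand, do notation works for lists just as it did for Maybe. For example,
do x <- [1,2]
   y <- [10,20]
   return (x * y)
evaluates to [10,20,20,40]: each x from the first list is paired with each y from the second, in order, exactly as the (x : xs) >>= f equation dictates.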
So what happens when I call return 1?
Nothing!
Haskell's lazy, remember. The thunk return 1 (technical term) just sits there until someone needs the value. As soon as Haskell needs the value, it knows what type the value should be. In particular it can deduce that m is List. Now that it knows that, Haskell can find the instance of Monad for List. As soon as it does that it has access to the correct version of return.
So finally Haskell is ready to call return, which in this case returns [1]!
The return function is from the Monad class:
class Applicative m => Monad (m :: * -> *) where
  ...
  return :: a -> m a
So return takes any value of type a and results in a value of type m a. The monad, m, as you've observed is polymorphic using the Haskell type class Monad for ad hoc polymorphism.
At this point you probably realize return is not a good, intuitive name. It's not even a built-in function or a statement like in many other languages. In fact a better-named and identically-operating function exists - pure. In almost all cases return = pure.
That is, the function return is the same as the function pure (from the Applicative class) - I often think to myself "this monadic value is purely the underlying a" and I try to use pure instead of return if there isn't already a convention in the codebase.
You can use return (or pure) for any type that is an instance of Monad. This includes the Maybe monad to get a value of type Maybe a:
instance Monad Maybe where
  ...
  return = pure -- which is from Applicative
  ...

instance Applicative Maybe where
  pure = Just
Or for the list monad to get a value of [a]:
instance Applicative [] where
  {-# INLINE pure #-}
  pure x = [x]
Or, as a more complex example, Aeson's parse monad to get a value of type Parser a:
instance Applicative Parser where
  pure a = Parser $ \_path _kf ks -> ks a

Can any recursive definition be rewritten using foldr?

Say I have a general recursive definition in haskell like this:
foo a0 a1 ... = base_case
foo b0 b1 ...
  | cond1 = recursive_case_1
  | cond2 = recursive_case_2
  ...
Can it always be rewritten using foldr? Can it be proved?
If we interpret your question literally, we can write const value foldr to achieve any value, as #DanielWagner pointed out in a comment.
A more interesting question is whether we can instead forbid general recursion from Haskell, and "recurse" only through the eliminators/catamorphisms associated to each user-defined data type, which are the natural generalization of foldr to inductively defined data types. This is, essentially, (higher-order) primitive recursion.
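To make the restriction concrete, here is an explicitly recursive function rephrased so that foldr does all the recursion (a standard example, not taken from the question):
-- map with explicit recursion
mapRec :: (a -> b) -> [a] -> [b]
mapRec _ []       = []
mapRec f (x : xs) = f x : mapRec f xs

-- the same function with all recursion delegated to foldr
mapFold :: (a -> b) -> [a] -> [b]
mapFold f = foldr (\x acc -> f x : acc) []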
When this restriction is performed, we can only compose terminating functions (the eliminators) together. This means that we can no longer define non terminating functions.
As a first example, we lose the trivial recursion
f x = f x
-- or even
a = a
since, as said, the language becomes total.
More interestingly, the general fixed point operator is lost.
fix :: (a -> a) -> a
fix f = f (fix f)
A more intriguing question is: what about the total functions we can express in Haskell? We do lose all the non-total functions, but do we lose any of the total ones?
Computability theory states that, since the language becomes total (no more non termination), we lose expressiveness even on the total fragment.
The proof is a standard diagonalization argument. Fix any enumeration of programs in the total fragment so that we can speak of "the i-th program".
Then, let eval i x be the result of running the i-th program on the natural number x as input (for simplicity, assume this is well typed, and that the result is a natural number). Note that, since the language is total, a result must exist. Moreover, eval can be implemented in the unrestricted Haskell language, since we can write an interpreter of Haskell in Haskell (left as an exercise :-P), and that would work just as well for the fragment. Then, we simply take
f n = succ $ eval n n
The above is a total function (a composition of total functions) which can be expressed in Haskell, but not in the fragment. Indeed, otherwise there would be a program to compute it, say the i-th program. In such case we would have
eval i x = f x
for all x. But then,
eval i i = f i = succ $ eval i i
which is impossible -- contradiction. QED.
In type theory, it is indeed the case that you can elaborate all definitions by dependent pattern-matching into ones only using eliminators (a more strongly-typed version of folds, the generalisation of lists' foldr).
See e.g. Eliminating Dependent Pattern Matching (pdf)

what is the meaning of "let x = x in x" and "data Float#" in GHC.Prim in Haskell

I looked at the module GHC.Prim and found that it seems that all the data types in GHC.Prim are defined as data Float# without constructors (nothing like = A | B), and all the functions in GHC.Prim are defined as gtFloat# = let x = x in x.
My question is whether these definitions make sense and what they mean.
I checked the header of GHC.Prim, shown below
{-
This is a generated file (generated by genprimopcode).
It is not code to actually be used. Its only purpose is to be
consumed by haddock.
-}
I guess it may have some relation to the question, and I would appreciate it if someone could explain that to me.
It's magic :)
These are the "primitive operators and operations". They are hardwired into the compiler, hence there are no data constructors for primitives and all functions are bottom, since they are necessarily not expressible in pure Haskell.
(Bottom represents a "hole" in a haskell program, an infinite loop or undefined are examples of bottom)
To put it another way
These data declarations/functions are to provide access to the raw compiler internals. GHC.Prim exists to export these primitives, it doesn't actually implement them or anything (eg its code isn't actually useful). All of that is done in the compiler.
It's meant for code that needs to be extremely optimized. If you think you might need it, there is some useful reading about the primitives in GHC.
A brief expansion of jozefg's answer ...
Primops are precisely those operations that are supplied by the runtime because they can't be defined within the language (or shouldn't be, for reasons of efficiency, say). The true purpose of GHC.Prim is not to define anything, but merely to export some operations so that Haddock can document their existence.
The construction let x = x in x is used at this point in GHC's codebase because the value undefined has not yet been, um, "defined". (That waits until the Prelude.) But notice that the circular let construction, just like undefined, is both syntactically correct and can have any type. That is, it's an infinite loop with the semantics of ⊥, just as undefined is.
... and an aside
Also note that in general the Haskell expression let x = z in y means "change the variable x to the expression z wherever x occurs in the expression y". If you're familiar with the lambda calculus, you should recognize this as the reduction rule for the application of the lambda abstraction \x -> y to the term z. So is the Haskell expression let x = x in x nothing more than some syntax on top of the pure lambda calculus? Let's take a look.
First, we need to account for the recursiveness of Haskell's let expressions. The lambda calculus does not admit recursive definitions, but given a primitive fixed-point operator fix, we can encode recursiveness explicitly. For example, the Haskell expression let x = x in x has the same meaning as (fix \r x -> r x) z. (I've renamed the x on the right side of the application to z to emphasize that it has no implicit relation to the x inside the lambda).
Applying the usual definition of a fixed-point operator, fix f = f (fix f), our translation of let x = x in x reduces (or rather doesn't) like this:
(fix \r x -> r x) z ==>
(\s y -> s y) (fix \r x -> r x) z ==>
(\y -> (fix \r x -> r x) y) z ==>
(fix \r x -> r x) z ==> ...
So at this point in the development of the language, we've introduced the semantics of ⊥ from the foundation of the (typed) lambda calculus with a built-in fixed-point operator. Lovely!
We need a primitive fixed-point operation (that is, one that is built into the language) because it's impossible to define a fixed-point combinator in the simply typed lambda calculus and its close cousins. (The definition of fix in Haskell's Prelude doesn't contradict this—it's defined recursively, but we need a fixed-point operator to implement recursion.)
If you haven't seen this before, you should read up on fixed-point recursion in the lambda calculus. A text on the lambda calculus is best (there are some free ones online), but some Googling should get you going. The basic idea is that we can convert a recursive definition into a non-recursive one by abstracting over the recursive call, then use a fixed-point combinator to pass our function (lambda abstraction) to itself. The base-case of a well-defined recursive definition corresponds to a fixed point of our function, so the function executes, calling itself over and over again until it hits a fixed point, at which point the function returns its result. Pretty damn neat, huh?

Is Haskell truly pure (is any language that deals with input and output outside the system)?

After touching on Monads in respect to functional programming, does the feature actually make a language pure, or is it just another "get out of jail free card" for reasoning of computer systems in the real world, outside of blackboard maths?
EDIT:
This is not flame bait as someone has said in this post, but a genuine question that I am hoping that someone can shoot me down with and say, proof, it is pure.
Also I am looking at the question with respect to other not-so-pure functional languages and some OO languages that use good design, and comparing the purity. So far in my very limited world of FP, I have still not grokked the Monads' purity; you will be pleased to know, however, that I do like the idea of immutability, which is far more important in the purity stakes.
Take the following mini-language:
data Action = Get (Char -> Action) | Put Char Action | End
Get f means: read a character c, and perform action f c.
Put c a means: write character c, and perform action a.
Here's a program that prints "xy", then asks for two letters and prints them in reverse order:
Put 'x' (Put 'y' (Get (\a -> Get (\b -> Put b (Put a End)))))
You can manipulate such programs. For example:
conditionally p = Get (\a -> if a == 'Y' then p else End)
This has type Action -> Action - it takes a program and gives another program that asks for confirmation first. Here's another:
printString = foldr Put End
This has type String -> Action - it takes a string and returns a program that writes the string, like
Put 'h' (Put 'e' (Put 'l' (Put 'l' (Put 'o' End)))).
IO in Haskell works similarly. Although executing it requires performing side effects, you can build complex programs without executing them, in a pure way. You're computing on descriptions of programs (IO actions), and not actually performing them.
In a language like C you can write a function void execute(Action a) that actually executes the program. In Haskell you specify that action by writing main = a. The compiler creates a program that executes the action, but you have no other way to execute an action (aside from dirty tricks).
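For the toy Action type above, though, nothing stops us from writing such an interpreter ourselves, targeting Haskell's real IO; a minimal sketch:
run :: Action -> IO ()
run (Get f)   = getChar >>= \c -> run (f c)  -- read a character, continue with f c
run (Put c a) = putChar c >> run a           -- write a character, continue with a
run End       = return ()                    -- nothing left to do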
Obviously Get and Put are not the only options; you can add many other API calls to the IO data type, like operating on files or concurrency (one hypothetical extension is sketched below).
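For instance, a file-reading primitive could be added as one more constructor (the constructor name here is made up for illustration):
data Action
  = Get (Char -> Action)
  | Put Char Action
  | ReadFileA FilePath (String -> Action)  -- hypothetical: read a file, continue with its contents
  | End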
Adding a result value
Now consider the following data type.
data IO a = Get (Char -> IO a) | Put Char (IO a) | End a
The previous Action type is equivalent to IO (), i.e. an IO value which always returns "unit", comparable to "void".
This type is very similar to Haskell IO, only in Haskell IO is an abstract data type (you don't have access to the definition, only to some methods).
These are IO actions which can end with some result. A value like this:
Get (\x -> if x == 'A' then Put 'B' (End 3) else End 4)
has type IO Int and corresponds to this C program:
int f() {
    char x;
    scanf("%c", &x);
    if (x == 'A') {
        printf("B");
        return 3;
    } else return 4;
}
Evaluation and execution
There's a difference between evaluating and executing. You can evaluate any Haskell expression, and get a value; for example, evaluate 2+2 :: Int into 4 :: Int. You can execute Haskell expressions only which have type IO a. This might have side-effects; executing Put 'a' (End 3) puts the letter a to screen. If you evaluate an IO value, like this:
if 2+2 == 4 then Put 'A' (End 0) else Put 'B' (End 2)
you get:
Put 'A' (End 0)
But there are no side-effects - you only performed an evaluation, which is harmless.
How would you translate
bool comp(char x) {
    char y;
    scanf("%c", &y);
    if (x > y) { // character comparison
        printf(">");
        return true;
    } else {
        printf("<");
        return false;
    }
}
into an IO value?
Fix some character, say 'v'. Now comp('v') is an IO action, which compares given character to 'v'. Similarly, comp('b') is an IO action, which compares given character to 'b'. In general, comp is a function which takes a character and returns an IO action.
As a programmer in C, you might argue that comp('b') is a boolean. In C, evaluation and execution are identical (i.e. they mean the same thing, or happen simultaneously). Not in Haskell. comp('b') evaluates into some IO action, which after being executed gives a boolean. (Precisely, it evaluates into the code block as above, only with 'b' substituted for x.)
comp :: Char -> IO Bool
comp x = Get (\y -> if x > y then Put '>' (End True) else Put '<' (End False))
Now, comp 'b' evaluates into Get (\y -> if 'b' > y then Put '>' (End True) else Put '<' (End False)).
It also makes sense mathematically. In C, int f() is a function. For a mathematician, this doesn't make sense - a function with no arguments? The point of functions is to take arguments. A function int f() should be equivalent to int f. It isn't, because functions in C blend mathematical functions and IO actions.
First class
These IO values are first-class. Just like you can have a list of lists of tuples of integers [[(0,2),(8,3)],[(2,8)]] you can build complex values with IO.
(Get (\x -> Put (toUpper x) (End 0)), Get (\x -> Put (toLower x) (End 0)))
  :: (IO Int, IO Int)
A tuple of IO actions: the first reads a character and prints it uppercase; the second reads a character and prints it lowercase.
Get (\x -> End (Put x (End 0))) :: IO (IO Int)
An IO value, which reads a character x and ends, returning an IO value which writes x to screen.
Haskell has special functions which allow easy manipulation of IO values. For example:
sequence :: [IO a] -> IO [a]
which takes a list of IO actions, and returns an IO action which executes them in sequence.
Monads
Monads are some combinators (like conditionally above) which allow you to write programs more structurally. There's a composing function of type
IO a -> (a -> IO b) -> IO b
which, given an IO a and a function a -> IO b, returns a value of type IO b. If you write the first argument as a C function a f() and the second argument as b g(a x), it returns a program for g(f()). Given the above definition of Action / IO, you can write that function yourself, as sketched below.
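Here is one way to write it for the toy IO type (a sketch; the names bindIO and sequenceIO are ours, and in a real module you would need to hide the Prelude's own IO):
bindIO :: IO a -> (a -> IO b) -> IO b
bindIO (Get f)   k = Get (\c -> bindIO (f c) k)  -- keep reading, then continue with k
bindIO (Put c a) k = Put c (bindIO a k)          -- keep writing, then continue with k
bindIO (End x)   k = k x                         -- hand the result to the next program

-- the sequence function from earlier, written with bindIO
sequenceIO :: [IO a] -> IO [a]
sequenceIO []       = End []
sequenceIO (p : ps) = p `bindIO` \x ->
                      sequenceIO ps `bindIO` \xs ->
                      End (x : xs)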
Notice monads are not essential to purity - you can always write programs as I did above.
Purity
The essential thing about purity is referential transparency, and distinguishing between evaluation and execution.
In Haskell, if you have f x + f x you can replace that with 2 * f x. In C, f(x) + f(x) in general is not the same as 2 * f(x), since f could print something on the screen, or modify x.
Thanks to purity, a compiler has much more freedom and can optimize better. It can rearrange computations, while in C it has to think if that changes meaning of the program.
It is important to understand that there is nothing inherently special about monads – so they definitely don’t represent a “get out of jail” card in this regard. There is no compiler (or other) magic necessary to implement or use monads; they are defined in the purely functional environment of Haskell. In particular, sdcvvc has shown how to define monads in a purely functional manner, without any recourse to implementation backdoors.
What does it mean to reason about computer systems "outside of blackboard maths"? What kind of reasoning would that be? Dead reckoning?
Side-effects and pure functions are a matter of point of view. If we view a nominally side-effecting function as a function taking us from one state of the world to another, it's pure again.
We can make every side-effecting function pure by giving it a second argument, a world, and requiring that it pass us a new world when it is done. I don't know C++ at all anymore but say read has a signature like this:
vector<char> read(filepath_t)
In our new "pure style", we handle it like this:
pair<vector<char>, world_t> read(world_t, filepath_t)
This is in fact how every Haskell IO action works.
So now we've got a pure model of IO. Thank goodness. If we couldn't do that then maybe Lambda Calculus and Turing Machines are not equivalent formalisms and then we'd have some explaining to do. We're not quite done but the two problems left to us are easy:
1. What goes in the world_t structure? A description of every grain of sand, blade of grass, broken heart and golden sunset?
2. We have an informal rule that we use a world only once -- after every IO operation, we throw away the world we used with it. With all these worlds floating around, though, we are bound to get them mixed up.
The first problem is easy enough. As long as we do not allow inspection of the world, it turns out we needn't trouble ourselves about storing anything in it. We just need to ensure that a new world is not equal to any previous world (lest the compiler deviously optimize some world-producing operations away, like it sometimes does in C++). There are many ways to handle this.
As for the worlds getting mixed up, we'd like to hide the world passing inside a library so that there's no way to get at the worlds and thus no way to mix them up. Turns out, monads are a great way to hide a "side-channel" in a computation. Enter the IO monad.
Some time ago, a question like yours was asked on the Haskell mailing list and there I went in to the "side-channel" in more detail. Here's the Reddit thread (which links to my original email):
http://www.reddit.com/r/haskell/comments/8bhir/why_the_io_monad_isnt_a_dirty_hack/
I'm very new to functional programming, but here's how I understand it:
In haskell, you define a bunch of functions. These functions don't get executed. They might get evaluated.
There's one function in particular that gets evaluated. This is a constant function that produces a set of "actions." The actions include the evaluation of functions and performing of IO and other "real-world" stuff. You can have functions that create and pass around these actions and they will never be executed unless a function is evaluated with unsafePerformIO or they are returned by the main function.
So in short, a Haskell program is a function, composed of other functions, that returns an imperative program. The Haskell program itself is pure. Obviously, that imperative program itself can't be. Real-world computers are by definition impure.
There's a lot more to this question and a lot of it is a question of semantics (human, not programming language). Monads are also a bit more abstract than what I've described here. But I think this is a useful way of thinking about it in general.
I think of it like this: Programs have to do something with the outside world to be useful. What's happening (or should be happening) when you write code (in any language) is that you strive to write as much pure, side-effect-free code as possible and corral the IO into specific places.
With Haskell, you're pushed further in this direction of writing code that tightly controls effects. In the core and in many libraries there is an enormous amount of pure code as well. Haskell is really all about this. Monads in Haskell are useful for a lot of things. And one thing they've been used for is containment around code that deals with impurity.
This way of designing together with a language that greatly facilitates it has an overall effect of helping us to produce more reliable work, requiring less unit testing to be clear on how it behaves, and allowing more re-use through composition.
If I understand what you're saying correctly, I don't see this as a something fake or only in our minds, like a "get out of jail free card." The benefits here are very real.
For an expanded version of sdcvvc's sort of construction of IO, one can look at the IOSpec package on Hackage: http://hackage.haskell.org/package/IOSpec
Is Haskell truly pure?
In the absolute sense of the term: no.
That solid-state Turing machine on which you run your programs - Haskell or otherwise - is a state-and-effect device. For any program to use all of its "features", the program will have to resort to using state and effects.
As for all the other "meanings" ascribed to that pejorative term:
To postulate a state-less model of computation on top of a machinery whose most eminent characteristic is state, seems to be an odd idea, to say the least. The gap between model and machinery is wide, and therefore costly to bridge. No hardware support feature can wash this fact aside: It remains a bad idea for practice.
This has in due time also been recognized by the protagonists of functional languages. They have introduced state (and variables) in various tricky ways. The purely functional character has thereby been compromised and sacrificed. The old terminology has become deceiving.
Niklaus Wirth
Does using monadic types actually make a language pure?
No. It's just one way of using types to demarcate:
definitions that have no visible side-effects at all - values;
definitions that potentially have visible side-effects - actions.
You could instead use uniqueness types, just like Clean does...
Is the use of monadic types just another "get out of jail free card" for reasoning of computer systems in the real world, outside of blackboard maths?
This question is ironic, considering the description of the IO type given in the Haskell 2010 report:
The IO type serves as a tag for operations (actions) that interact with the outside world. The IO type is abstract: no constructors are visible to the user. IO is an instance of the Monad and Functor classes.
...to borrow the parlance of another answer:
[…] IO is magical (having an implementation but no denotation) […]
Being abstract, the IO type is anything but a "get out of jail free card" - intricate models involving multiple semantics are required to account for the workings of I/O in Haskell. For more details, see:
Tackling the Awkward Squad: … by Simon Peyton Jones;
The semantics of fixIO by Levent Erkök, John Launchbury and Andrew Moran.
It wasn't always like this - Haskell originally had an I/O mechanism which was at least partially-visible; the last language version to have it was Haskell 1.2. Back then, the type of main was:
main :: [Response] -> [Request]
which was usually abbreviated to:
main :: Dialogue
where:
type Dialogue = [Response] -> [Request]
and Response along with Request were humble, albeit large, datatypes; abridged, they looked something like this (the full datatypes in the report have many more constructors):
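-- Abridged for illustration; the real Haskell 1.2 datatypes
-- have many more constructors than shown here:
data Request  = ReadFile String
              | WriteFile String String
              | AppendChan String String

data Response = Success
              | Str String
              | Failure IOError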
The advent of I/O using the monadic interface in Haskell changed all that - no more visible datatypes, just an abstract description. As a result, how IO, return, (>>=) etc are really defined is now specific to each implementation of Haskell.
(Why was the old I/O mechanism abandoned? "Tackling the Awkward Squad" gives an overview of its problems.)
These days, the more pertinent question should be:
Is I/O in your implementation of Haskell referentially transparent?
As Owen Stephens notes in Approaches to Functional I/O:
I/O is not a particularly active area of research, but new approaches are still being discovered […]
The Haskell language may yet have a referentially-transparent model for I/O which doesn't attract so much controversy...
No, it isn't. The IO monad is impure because it has side effects and mutable state (race conditions are possible in Haskell programs; a truly pure FP language shouldn't even know what a "race condition" is). Really pure FP is Clean with uniqueness typing, or Elm with FRP (functional reactive programming), not Haskell. Haskell is one big lie.
Proof:
import Control.Concurrent
import System.IO as IO
import Data.IORef as IOR
import Control.Monad.STM
import Control.Concurrent.STM.TVar
limit = 150000
threadsCount = 50
-- Don't talk about purity in Haskell when we have race conditions
-- in unlocked memory ... PURE language don't need LOCKING because
-- there isn't any mutable state or another side effects !!
main = do
  hSetBuffering stdout NoBuffering
  putStr "Lock counter? : "
  a <- getLine
  if a == "y" || a == "yes" || a == "Yes" || a == "Y"
    then withLocking
    else noLocking

noLocking = do
  counter <- newIORef 0
  let doWork =
        mapM_ (\_ -> IOR.modifyIORef counter (\x -> x + 1)) [1..limit]
  threads <- mapM (\_ -> forkIO doWork) [1..threadsCount]
  -- Sorry, it's dirty but time is expensive ...
  threadDelay (15 * 1000 * 1000)
  val <- IOR.readIORef counter
  IO.putStrLn ("It may be " ++ show (threadsCount * limit) ++
               " but it is " ++ show val)

withLocking = do
  counter <- atomically (newTVar 0)
  let doWork =
        mapM_ (\_ -> atomically $ modifyTVar counter (\x -> x + 1)) [1..limit]
  threads <- mapM (\_ -> forkIO doWork) [1..threadsCount]
  threadDelay (15 * 1000 * 1000)
  val <- atomically $ readTVar counter
  IO.putStrLn ("It may be " ++ show (threadsCount * limit) ++
               " but it is " ++ show val)

Why are side-effects modeled as monads in Haskell?

Could anyone give some pointers on why the impure computations in Haskell are modelled as monads?
I mean, a monad is just an interface with 4 operations, so what was the reasoning behind modelling side-effects with it?
Suppose a function has side effects. If we take all the effects it produces as the input and output parameters, then the function is pure to the outside world.
So, for an impure function
f' :: Int -> Int
we add the RealWorld to the consideration
f :: Int -> RealWorld -> (Int, RealWorld)
-- input some states of the whole world,
-- modify the whole world because of the side effects,
-- then return the new world.
then f is pure again. We define a parametrized data type type IO a = RealWorld -> (a, RealWorld), so we don't need to type RealWorld so many times, and can just write
f :: Int -> IO Int
To the programmer, handling a RealWorld directly is too dangerous—in particular, if a programmer gets their hands on a value of type RealWorld, they might try to copy it, which is basically impossible. (Think of trying to copy the entire filesystem, for example. Where would you put it?) Therefore, our definition of IO encapsulates the states of the whole world as well.
Composition of "impure" functions
These impure functions are useless if we can't chain them together. Consider
getLine :: IO String ~ RealWorld -> (String, RealWorld)
getContents :: String -> IO String ~ String -> RealWorld -> (String, RealWorld)
putStrLn :: String -> IO () ~ String -> RealWorld -> ((), RealWorld)
We want to
get a filename from the console,
read that file, and
print that file's contents to the console.
How would we do it if we could access the real world states?
printFile :: RealWorld -> ((), RealWorld)
printFile world0 =
  let (filename, world1) = getLine world0
      (contents, world2) = (getContents filename) world1
  in (putStrLn contents) world2 -- results in ((), world3)
We see a pattern here. The functions are called like this:
...
(<result-of-f>, worldY) = f worldX
(<result-of-g>, worldZ) = g <result-of-f> worldY
...
So we could define an operator ~~~ to bind them:
(~~~) :: (IO b) -> (b -> IO c) -> IO c

(~~~) :: (RealWorld -> (b, RealWorld))
      -> (b -> RealWorld -> (c, RealWorld))
      -> (RealWorld -> (c, RealWorld))

(f ~~~ g) worldX = let (resF, worldY) = f worldX
                   in g resF worldY
then we could simply write
printFile = getLine ~~~ getContents ~~~ putStrLn
without touching the real world.
"Impurification"
Now suppose we want to make the file content uppercase as well. Uppercasing is a pure function
upperCase :: String -> String
But to make it into the real world, it has to return an IO String. It is easy to lift such a function:
impureUpperCase :: String -> RealWorld -> (String, RealWorld)
impureUpperCase str world = (upperCase str, world)
This can be generalized:
impurify :: a -> IO a
impurify :: a -> RealWorld -> (a, RealWorld)
impurify a world = (a, world)
so that impureUpperCase = impurify . upperCase, and we can write
printUpperCaseFile =
  getLine ~~~ getContents ~~~ (impurify . upperCase) ~~~ putStrLn
(Note: Normally we write getLine ~~~ getContents ~~~ (putStrLn . upperCase))
We were working with monads all along
Now let's see what we've done:
We defined an operator (~~~) :: IO b -> (b -> IO c) -> IO c which chains two impure functions together
We defined a function impurify :: a -> IO a which converts a pure value to impure.
Now we make the identification (>>=) = (~~~) and return = impurify, and see? We've got a monad.
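To make the identification concrete, here is the whole construction packaged as an actual Monad instance. This is a sketch of the folk model only (GHC's real IO is abstract and implemented by the runtime); the primed names avoid clashing with the real IO:
import Control.Monad (ap, liftM)

data RealWorld = RealWorld  -- an opaque token standing in for "the world"

newtype IO' a = IO' { runIO' :: RealWorld -> (a, RealWorld) }

instance Functor IO' where
  fmap = liftM

instance Applicative IO' where
  pure a = IO' (\world -> (a, world))      -- impurify
  (<*>)  = ap

instance Monad IO' where
  IO' f >>= g = IO' (\world ->             -- (~~~)
    let (a, world') = f world
    in  runIO' (g a) world')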
Technical note
To ensure it's really a monad, there are still a few axioms which need to be checked too:
return a >>= f = f a

impurify a = (\world -> (a, world))

(impurify a ~~~ f) worldX = let (resF, worldY) = (\world -> (a, world)) worldX
                            in f resF worldY
                          = let (resF, worldY) = (a, worldX)
                            in f resF worldY
                          = f a worldX

f >>= return = f

(f ~~~ impurify) worldX = let (resF, worldY) = f worldX
                          in impurify resF worldY
                        = let (resF, worldY) = f worldX
                          in (resF, worldY)
                        = f worldX

f >>= (\x -> g x >>= h) = (f >>= g) >>= h

Left as exercise.
Could anyone give some pointers on why the impure computations in Haskell are modelled as monads?
This question contains a widespread misunderstanding.
Impurity and Monad are independent notions.
Impurity is not modeled by Monad.
Rather, there are a few data types, such as IO, that represent imperative computation.
And for some of those types, a tiny fraction of their interface corresponds to the interface pattern called "Monad".
Moreover, there is no known pure/functional/denotative explanation of IO (and there is unlikely to be one, considering the "sin bin" purpose of IO), though there is the commonly told story about World -> (a, World) being the meaning of IO a.
That story cannot truthfully describe IO, because IO supports concurrency and nondeterminism.
The story doesn't even work for deterministic computations that allow mid-computation interaction with the world.
For more explanation, see this answer.
Edit: On re-reading the question, I don't think my answer is quite on track.
Models of imperative computation do often turn out to be monads, just as the question said.
The asker might not really assume that monadness in any way enables the modeling of imperative computation.
As I understand it, someone called Eugenio Moggi first noticed that a previously obscure mathematical construct called a "monad" could be used to model side effects in computer languages, and hence specify their semantics using Lambda calculus. When Haskell was being developed there were various ways in which impure computations were modelled (see Simon Peyton Jones' "hair shirt" paper for more details), but when Phil Wadler introduced monads it rapidly became obvious that this was The Answer. And the rest is history.
Could anyone give some pointers on why the impure computations in Haskell are modelled as monads?
Well, because Haskell is pure. You need a mathematical concept to distinguish between impure computations and pure ones at the type level, and to model program flows accordingly.
This means you'll have to end up with some type IO a that models an impure computation. Then you need to know ways of combining these computations, of which applying in sequence (>>=) and lifting a value (return) are the most obvious and basic ones.
With these two, you've already defined a monad (without even thinking of it) ;)
In addition, monads provide very general and powerful abstractions, so many kinds of control flow can be conveniently generalized in monadic functions like sequence, liftM or special syntax, making impurity not such a special case.
See monads in functional programming and uniqueness typing (the only alternative I know) for more information.
As you say, Monad is a very simple structure. One half of the answer is: Monad is the simplest structure that we could possibly give to side-effecting functions and be able to use them. With Monad we can do two things: we can treat a pure value as a side-effecting value (return), and we can apply a side-effecting function to a side-effecting value to get a new side-effecting value (>>=). Losing the ability to do either of these things would be crippling, so our side-effecting type needs to be "at least" Monad, and it turns out Monad is enough to implement everything we've needed to so far.
The other half is: what's the most detailed structure we could give to "possible side effects"? We can certainly think about the space of all possible side effects as a set (the only operation that this requires is membership). We can combine two side effects by doing them one after another, and this will give rise to a different side effect (or possibly the same one - if the first was "shutdown computer" and the second was "write file", then the result of composing these is just "shutdown computer").
Ok, so what can we say about this operation? It's associative; that is, if we combine three side effects, it doesn't matter which order we do the combining in. If we do (write file then read socket) then shutdown computer, it's the same as doing write file then (read socket then shutdown computer). But it's not commutative: ("write file" then "delete file") is a different side effect from ("delete file" then "write file"). And we have an identity: the special side effect "no side effects" works ("no side effects" then "delete file" is the same side effect as just "delete file"). At this point any mathematician is thinking "Group!" But groups have inverses, and there's no way to invert a side effect in general; "delete file" is irreversible. So the structure we have left is that of a monoid, which means our side-effecting functions should be monads.
Is there a more complex structure? Sure! We could divide possible side effects into filesystem-based effects, network-based effects and more, and we could come up with more elaborate rules of composition that preserved these details. But again it comes down to: Monad is very simple, and yet powerful enough to express most of the properties we care about. (In particular, associativity and the other axioms let us test our application in small pieces, with confidence that the side effects of the combined application will be the same as the combination of the side effects of the pieces).
It's actually quite a clean way to think of I/O in a functional way.
In most programming languages, you do input/output operations. In Haskell, imagine writing code not to do the operations, but to generate a list of the operations that you would like to do.
Monads are just pretty syntax for exactly that.
If you want to know why monads as opposed to something else, I guess the answer is that they're the best functional way to represent I/O that people could think of when they were making Haskell.
AFAIK, the reason is to be able to include side-effect checks in the type system. If you want to know more, listen to these SE-Radio episodes:
Episode 108: Simon Peyton Jones on Functional Programming and Haskell
Episode 72: Erik Meijer on LINQ
Above there are very good detailed answers with theoretical background. But I want to give my view on the IO monad. I am not an experienced Haskell programmer, so it may be quite naive or even wrong. But it helped me to deal with the IO monad to some extent (note that it does not relate to other monads).
First I want to say that the example with the "real world" is not too clear to me, as we cannot access its (the real world's) previous states. Maybe it does not relate to monad computations at all, but it is desired in the sense of referential transparency, which is generally present in Haskell code.
So we want our language (Haskell) to be pure. But we need input/output operations, as without them our program cannot be useful. And those operations cannot be pure by their nature. So the only way to deal with this is to separate the impure operations from the rest of the code.
Here the monad comes in. Actually, I am not sure that there cannot exist another construct with the similar needed properties, but the point is that the monad has these properties, so it can be used (and it is used successfully). The main property is that we cannot escape from it. The Monad interface does not have operations to get rid of the monad around our value. Other (non-IO) monads provide such operations and allow pattern matching (e.g. Maybe), but those operations are not in the monad interface. Another required property is the ability to chain operations.
If we think about what we need in terms of the type system, we come to the fact that we need a type with a constructor which can be wrapped around any value. The constructor must be private, as we prohibit escaping from it (i.e. pattern matching). But we need a function to put a value into this constructor (here return comes to mind). And we need a way to chain operations. If we think about it for some time, we will come to the fact that the chaining operation must have a type like the one >>= has. So we come to something very similar to a monad. I think, if we now analyze possible contradictory situations with this construct, we will come to the monad axioms.
Note that the developed construct does not have anything in common with impurity. It only has the properties which we wished for in order to be able to deal with impure operations, namely: no escaping, chaining, and a way to get in.
Now some set of impure operations is predefined by the language within this selected monad IO. We can combine those operations to create new impure operations. And all those operations will have to have IO in their type. Note, however, that the presence of IO in the type of some function does not make this function impure. But as I understand it, it is a bad idea to write pure functions with IO in their type, as it was initially our idea to separate pure and impure functions.
Finally, I want to say that the monad does not turn impure operations into pure ones. It only allows separating them effectively. (I repeat that this is only my understanding.)
