IO Monads: GHC Implementation Meaning vs. Mathematical Meaning - haskell

Consider
ioFunction :: String -> IO ()
ioFunction str = do putStrLn str
                    putStrLn "2"
Here, ioFunction, from a mathematical point of view, is a function that takes one input and returns a value of type IO (). I presume it means nothing else. That is, from a mathematical point of view, this function returns a value and nothing else; in particular, it doesn't print anything.
So does this mean that the way Haskell uses the IO monad for imperative side-effects (in this case: that running this function will first print str and then print "2", in that order) is purely a GHC implementation detail that has nothing to do with the mathematical (and in some sense even the Haskell) meaning of the term?
EDIT: To make this question clearer, what I mean to ask is, for example, is there any difference from the mathematical point of view between the following two functions:
ioFunction1 :: String -> IO ()
ioFunction1 str = do putStrLn str
                     putStrLn "2"

ioFunction2 :: String -> IO ()
ioFunction2 str = do return ()
It seems the answer is no, for -- from the mathematical point of view -- they both take as input a String and return a value (presumably, the same value) of type IO (). Is this not the case?

Here, ioFunction, from a mathematical point of view, is a function that takes one input and returns a value of type IO (). I presume it means nothing else.
Yes. Exactly.
So does this mean that the way Haskell uses the IO monad for imperative side-effects (in this case: that running this function will first print str and then print "2", in that order) is purely a GHC implementation detail that has nothing to do with the mathematical (and in some sense even the Haskell) meaning of the term?
Not quite. From a mathematical (set-theory) standpoint, I would assume "same" means structurally identical. Since I don't know what makes up a value of IO (), I can't say anything about whether two values of that type are the same or not.
In fact, this is by design: by making IO opaque (in the sense that I don't know what constitutes an IO), GHC prevents me from ever being able to say that one value of type IO () is equal to another. The only things I can do with IO are exposed through functions like fmap, (<*>), (>>=), mplus, etc.
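For illustration, here is roughly what that interface-only access looks like in practice (shout is just a made-up name): we can build new IO values out of old ones with fmap and (>>=), but we can never pattern-match on them or compare them for equality.
import Data.Char (toUpper)

shout :: IO ()
shout = fmap (map toUpper) getLine >>= putStrLn
-- composing IO values is allowed; inspecting or comparing them is not:
-- there is no (==) for IO (), and no constructor to match on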

I always find it helpful to consider a simplified “imperative-language source code” implementation of the IO monad:
data IO :: * -> * where
  PutStr :: String -> IO ()
  GetLine :: IO String
  IOPure :: a -> IO a
  IOSeq :: IO a -> IO b -> IO b
  ...
  LaunchMissiles :: IO ()
Then ioFunction is quite clearly a proper, sensible function in the mathematical sense:
ioFunction str = do putStrLn str
                    putStrLn "2"
              = putStr (str++"\n") >> putStr ("2"++"\n")
              = IOSeq (PutStr (str++"\n")) (PutStr "2\n")
This is simply a data structure representing effectively some source code of an imperative language. ioFunction places the given argument in a specific spot in the result structure, so it's mathematically very much not just a trivial “return () and nothing else (something may happen but it's a GHC implementation detail)”.
The value is indeed completely different for ioFunction2:
ioFunction2 str = do return ()
               = return ()
               = IOPure ()
how do we know they are different in this sense?
That's a good question. Practically speaking, you know it by executing both programs and observing that they cause different effects, hence they must have been different. This is more than a little awkward of course – "observing what happens" isn't maths, it's physics, and the observation would scientifically require that you execute both programs under exactly the same environment conditions.
Worse, even with pure values that are generally taken to be mathematically the same you can observe different behaviour: print 100000000 will immediately cause the side-effect, whereas print $ last [0..100000000] takes a significant pause (which, if you follow it with a time-printing command, can actually yield different text output).
But these issues are unavoidable in a real-world context. It hence only makes sense that Haskell does not specify any structure on IO that could, with mathematical rigour, be checked for equality from within the language. So, in quite some sense you really can't know that putStrLn str >> putStrLn "2" is not the same as return ().
And indeed they might be the same. It is conceivable to make an even simpler toy implementation of IO than the above, thus:
data IO :: * -> * where
  IONop :: IO ()
  IOHang :: IO a -> IO a
and simply map any pure output operation to the no-op (and any input to an infinite loop). Then you would have
ioFunction str = do putStrLn str
                    putStrLn "2"
              = IONop >> IONop
              = IONop
and
ioFunction2 str = return ()
               = IONop
This is a bit as if we've imposed an additional axiom on the natural numbers, namely 1 = 0. From that you can then conclude that every number is zero.
Clearly though, that would be a totally useless implementation. But something similar could actually make sense if you're running Haskell code in a sandbox environment and want to be utterly sure nothing bad happens. http://tryhaskell.org/ does something like this.

For simplicity, let us concentrate on the output aspect of the I/O monad. Mathematically speaking, the output (writer) monad is given by an endofunctor T with T(A) = U* × A, where U is a fixed set of characters, and U* is the set of strings over U. Then ioFunction : U* → T (), i.e. ioFunction : U* → U* × (), and ioFunction(s) = (s++"\n"++"2\n", ()). By contrast, ioFunction2(s) = ("", ()). Any implementation has to respect this difference. The difference is primarily a mathematical one, not an implementation detail.
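As a rough sketch of this in Haskell (a hand-rolled output monad; the names Out, outStrLn, ioFunction1' and ioFunction2' are made up, and the real IO is of course not defined this way):
newtype Out a = Out (String, a) deriving Show   -- T(A) = U* × A

instance Functor Out where
  fmap f (Out (w, a)) = Out (w, f a)
instance Applicative Out where
  pure a = Out ("", a)
  Out (w1, f) <*> Out (w2, a) = Out (w1 ++ w2, f a)
instance Monad Out where
  Out (w1, a) >>= k = let Out (w2, b) = k a in Out (w1 ++ w2, b)

outStrLn :: String -> Out ()
outStrLn s = Out (s ++ "\n", ())

ioFunction1' :: String -> Out ()
ioFunction1' str = do outStrLn str
                      outStrLn "2"

ioFunction2' :: String -> Out ()
ioFunction2' _ = return ()

-- ghci> ioFunction1' "hi"  ==>  Out ("hi\n2\n",())
-- ghci> ioFunction2' "hi"  ==>  Out ("",())
The two results are plainly different values, matching the mathematical argument above.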

Related

What is the IO type in Haskell

I am new to the Haskell programming language, I keep on stumbling on the IO type either as a function parameter or a return type.
playGame :: Screen -> IO ()
OR
gameRunner :: IO String -> (String -> IO ()) -> Screen -> IO ()
How does this work? I am a bit confused because I know a String expects words and an Int expects numbers. What does the IO used in functions expect or return?
IO is the way how Haskell differentiates between code that is referentially transparent and code that is not. IO a is the type of an IO action that returns an a.
You can think of an IO action as a piece of code with some effect on the real world that waits to get executed. Because of this side effect, an IO action is not referentially transparent; therefore, execution order matters. It is the task of the main function of a Haskell program to properly sequence and execute all IO actions. Thus, when you write a function that returns IO a, what you are actually doing is writing a function that returns an action that eventually - when executed by main - performs the action and returns an a.
Some more explanation:
Referential transparency means that you can replace a function by its value. A referentially transparent function cannot have any side effects; in particular, a referentially transparent function cannot access any hardware resources like files, network, or keyboard, because the function value would depend on something else than its parameters.
Referentially transparent functions in a functional language like Haskell are like math functions (mappings between domain and codomain), much more than a sequence of imperative instructions on how to compute the function's value. Therefore, Haskell code tells the compiler that a function is applied to its arguments, but it does not say that a function is called and thus actually computed.
Therefore, referentially transparent functions do not imply the order of execution. The Haskell compiler is free to evaluate functions in any way it sees fit - or not evaluate them at all if it is not necessary (called lazy evaluation). The only ordering arises from data dependencies, when one function requires the output of another function as input.
Real-world side effects are not referentially transparent. You can think of the real world as some sort of implicit global state that effectual functions mutate. Because of this state, the order of execution matters: It makes a difference if you first read from a database and then update it, or vice versa.
Haskell is a pure functional language, all its functions are referentially transparent and compilation rests on this guarantee. How, then, can we deal with effectful functions that manipulate some global real-world state and that need to be executed in a certain order? By introducing data dependency between those functions.
This is exactly what IO does: Under the hood, the IO type wraps an effectful function together with a dummy state parameter. Each IO action takes this dummy state as input and provides it as output. Passing this dummy state parameter from one IO action to the next creates a data dependency and thus tells the Haskell compiler how to properly sequence all the IO actions.
You don't see the dummy state parameter because it is hidden behind some syntactic sugar: the do notation in main and other IO actions, and inside the IO type.
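As a toy sketch of that idea (World, Action and andThen are made-up stand-ins, not GHC's actual representation of IO), threading the token is what creates the data dependency:
data World = World                          -- pretend token for "the outside world"
newtype Action a = Action (World -> (a, World))

andThen :: Action a -> (a -> Action b) -> Action b
Action f `andThen` k = Action $ \w0 ->
  let (x, w1) = f w0        -- the second action needs w1 as its input,
      Action g = k x        -- so it cannot run before the first one
  in g w1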
Briefly put:
f1 :: A -> B -> C
is a function which takes two arguments of type A and B and returns a C. It does not perform any IO.
f2 :: A -> B -> IO C
is similar to f1, but can also perform IO.
f3 :: (A -> B) -> IO C
takes as an argument a function A -> B (which does not perform IO) and produces a C, possibly performing IO.
f4 :: (A -> IO B) -> IO C
takes as an argument a function A -> IO B (which can perform IO) and produces a C, possibly performing IO.
f5 :: A -> IO B -> IO C
takes as an argument a value of type A, an IO action of type IO B, and returns a value of type C, possibly performing IO (e.g. by running the IO action argument one or more times).
Example:
f6 :: IO Int -> IO Int
f6 action = do
  x1 <- action
  x2 <- action
  putStrLn "hello!"
  x3 <- action
  return (x1+x2+x3)
When a function returns IO (), it returns no useful value, but can perform IO. Similar to, say, returning void in C or Java. Your
gameRunner :: IO String -> (String -> IO ()) -> Screen -> IO ()
function can be called with the following arguments:
arg1 :: IO String
arg1 = do
  putStrLn "hello"
  s <- getLine
  return ("here: " ++ s)
arg2 :: String -> IO ()
arg2 str = do
  putStrLn "hello"
  putStrLn str
  putStrLn "hello again"
arg3 :: Screen
arg3 = ... -- I don't know what's a Screen in your context
Let's try answering some simpler questions first:
What is the Maybe type in Haskell?
From chapter 21 (page 205) of the Haskell 2010 Report:
data Maybe a = Nothing | Just a
it's a simple partial type - you have a value (conveyed via Just) or you don't (Nothing).
How does this work?
Let's look at one possible Monad instance for Maybe:
instance Monad Maybe where
  return = Just
  Just x >>= k = k x
  Nothing >>= _ = Nothing
This monadic interface simplifies the use of values based on Maybe constructors e.g.
instead of:
\f ox oy -> case ox of
  Nothing -> Nothing
  Just x  -> case oy of
    Nothing -> Nothing
    Just y  -> Just (f x y)
you can simply write this:
\f ox oy -> ox >>= \x -> oy >>= \y -> return (f x y)
The monadic interface is widely applicable: from parsing to encapsulated state, and so much more.
What does the Maybe type used in functions expect or return?
For a function expecting a Maybe-based value e.g:
maybe :: b -> (a -> b) -> Maybe a -> b
maybe _ f (Just x) = f x
maybe d _ Nothing = d
if its contents are being used in the function, then the function may have to deal with not receiving a value it can use i.e. Nothing.
For a function returning a Maybe-based value e.g:
invert :: Double -> Maybe Double
invert 0.0 = Nothing
invert d = Just (1/d)
it just needs to use the appropriate constructors.
One last point: observe how Maybe-based values are used - starting simply (e.g. invert 0.5 or Just "here"), then defining other, possibly more elaborate Maybe-based values (with (>>=), (>>), etc.), to ultimately be examined directly by pattern matching, or abstractly by a suitable definition (maybe, fromJust et al).
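For instance, a small sketch of that progression, reusing invert from above (safeDiv and describe are made-up names):
safeDiv :: Double -> Double -> Maybe Double
safeDiv _ 0 = Nothing
safeDiv x y = Just (x / y)

describe :: Double -> Double -> String
describe x y = maybe "undefined" show (safeDiv x y >>= invert)
-- build simply (safeDiv), combine with (>>=), examine with maybe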
Time for the original questions:
What is the IO type in Haskell?
From section 6.1.7 (page 75) of the Report:
The IO type serves as a tag for operations (actions) that interact with the outside world. The IO type is abstract: no constructors are visible to the user. IO is an instance of the Monad and Functor classes.
the crucial point being:
The IO type is abstract: no constructors are visible to the user.
No constructors? That begs the next question:
How does this work?
This is where the versatility of the monadic interface steps in: the flexibility of its two key operatives - return and (>>=) in Haskell - substantially makes up for IO-based values being abstract.
Remember that observation about how Maybe-based values are used? Well, IO-based values are used in similar fashion - starting simply (e.g. return 1, getChar or putStrLn "Hello, there!") to defining other IO-based values (with (>>=), (>>), catch, etc) to ultimately form Main.main.
But instead of pattern-matching or calling another function to extract its contents, Main.main is processed directly by the Haskell implementation.
What does the IO used in functions expect or return?
For a function expecting an IO-based value e.g:
echo :: IO ()
echo = getChar >>= \c -> if c == '\n'
                         then return ()
                         else putChar c >> echo
if its contents are being used in the function, then the function usually returns an IO-based value.
For a function returning an IO-based value e.g:
newLine :: IO ()
newLine = putChar '\n'
it just needs to use the appropriate definitions.
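And, as a sketch of the "ultimately form Main.main" step, the two definitions above can be combined into a complete program:
main :: IO ()
main = putStrLn "Type a line (Enter to stop):" >> echo >> newLine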

Is print in Haskell a pure function?

Is print in Haskell a pure function; why or why not? I'm thinking it's not, because it does not always return the same value as pure functions should.
A value of type IO Int is not really an Int. It's more like a piece of paper which reads "hey Haskell runtime, please produce an Int value in such and such way". The piece of paper is inert and remains the same, even if the Ints eventually produced by the runtime are different.
You send the piece of paper to the runtime by assigning it to main. If the IO action never comes in the way of main and instead languishes inside some container, it will never get executed.
Functions that return IO actions are pure like the others. They always return the same piece of paper. What the runtime does with those instructions is another matter.
If they weren't pure, we would have to think twice before changing
foo :: (Int -> IO Int) -> IO Int
foo f = liftA2 (+) (f 0) (f 0)
to:
foo :: (Int -> IO Int) -> IO Int
foo f = let x = f 0 in liftA2 (+) x x
Yes, print is a pure function. The value it returns has type IO (), which you can think of as a bunch of code that outputs the string you passed in. For each string you pass in, it always returns the same code.
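A quick sketch of that reuse (greeting is a made-up name): the IO value is built once, and the runtime executes the same "code" three times.
import Control.Monad (replicateM_)

greeting :: IO ()
greeting = print "hello"          -- just a value; nothing is printed here

main :: IO ()
main = replicateM_ 3 greeting     -- the one value, executed three times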
If you just read the description of the pure-function tag (a function that always evaluates to the same result value given the same argument value(s) and that does not cause any semantically observable side effect or output, such as mutation of mutable objects or output to I/O devices) and then think about the type of print (which is essentially putStrLn . show):
putStrLn :: String -> IO ()
you will find a trick there: it always returns IO (), so... no, it produces effects, so in terms of referential transparency it is not pure.
For example, getLine returns IO String but it is also a pure function (#interjay's contribution). What I'm trying to say is that the answer depends very closely on how you read the question:
In terms of value, IO () will always be the same IO () value for the same input.
And
In terms of execution, it is not pure, because executing that IO () could have side effects (put a string on the screen; in this case it looks innocent, but some IO could launch nuclear bombs and then return the Int 42).
You could understand better with the nice approach of #Ben here:
"There are several ways to explain how you're "purely" manipulating
the real world. One is to say that IO is just like a state monad, only
the state being threaded through is the entire world outside your
program;= (so your Stuff -> IO DBThing function really has an extra
hidden argument that receives the world, and actually returns a
DBThing along with another world; it's always called with different
worlds, and that's why it can return different DBThing values even
when called with the same Stuff). Another explanation is that an IO
DBThing value is itself an imperative program; your Haskell program is
a totally pure function doing no IO, which returns an impure program
that does IO, and the Haskell runtime system (impurely) executes the
program it returns."
And #Erik Allik:
So Haskell functions that return values of type IO a are actually not the functions that are being executed at runtime — what gets executed is the IO a value itself. So these functions actually are pure but their return values represent non-pure computations.
You can find them here: Understanding pure functions in Haskell with IO

How to understand "m ()" is a monadic computation

From the document:
when :: (Monad m) => Bool -> m () -> m ()
when p s = if p then s else return ()
The when function takes a boolean argument and a monadic computation with unit () type and performs the computation only when the boolean argument is True.
===
As a Haskell newbie, my problem is that for me m () is some "void" data, but here the document mentions it as computation. Is it because of the laziness of Haskell?
Laziness has no part in it.
The m/Monad part is often called computation.
The best example might be m = IO:
Look at putStrLn "Hello" :: IO () - This is a computation that, when run, will print "Hello" to your screen.
This computation has no result - so the return type is ()
Now when you write
hello :: Bool -> IO ()
hello sayIt =
  when sayIt (putStrLn "Hello")
then hello True is a computation that, when run, will print "Hello"; while hello False is a computation that when run will do nothing at all.
Now compare it to getLine :: IO String - this is a computation that, when run, will prompt you for an input and will return the input as a String - that's why the return type is String.
Does this help?
for me "m ()" is some "void" data
And that kinda makes sense, in that a computation is a special kind of data. It has nothing to do with laziness - it's associated with context.
Let's take State as an example. A function of type, say, s -> () in Haskell can only produce one value. However, a function of type s -> ((), s) is a regular function doing some transformation on s. The problem you're having is that you're only looking at the () part, while the s -> s part stays hidden. That's the point of State - to hide the state passing.
Hence State s () is trivially convertible to s -> ((), s) and back, and it still is a Monad (a computation) that produces a value of... ().
If we look at practical use, now:
(flip runState 10) $ do
  modify (+1)
This expression produces a tuple of type ((), Int); the Int part is the final state, which was hidden while the computation ran.
It will modify the state, adding 1 to it. It produces the intermediate value of (), though, which fits your when:
when (5 > 3) $ modify (+1)
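Putting the pieces above together as a runnable sketch (assuming Control.Monad.State from the mtl/transformers package; bump is a made-up name):
import Control.Monad (when)
import Control.Monad.State (State, modify, runState)

bump :: State Int ()
bump = when (5 > 3) (modify (+1))

-- ghci> runState bump 10  ==>  ((),11)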
Monads are notably abstract and mathematical, so intuitive statements about them are often made in language that is rather vague. So values of a monadic type are often informally labeled as "computations," "actions" or (less often) "commands" because it's an analogy that sometimes help us reason about them. But when you dig deeper, these turn out to be empty words when used this way; ultimately what they mean is "some value of a type that provides the Monad interface."
I like the word "action" better for this, so let me go with that. The intuition for the use for that word in Haskell is this: the language makes a distinction between functions and actions:
Functions can't have any side effects, and their types look like a -> b.
Actions may have side effects, and their types look like IO a.
A consequence of this: an action of type IO () produces an uninteresting result value, and therefore it's either a no-op (return ()) or an action that is only interesting because of its side effects.
Monad then is the interface that allows you to glue actions together into complex actions.
This is all very intuitive, but it's an analogy that becomes rather stretched when you try to apply it to many monads other than the IO type. For example, lists are a monad:
instance Monad [] where
  return a = [a]
  as >>= f = concatMap f as
The "actions" or "computations" of the list monad are... lists. How is a list an "action" or a "computation"? The analogy is rather weak in this case, isn't it?
So I'd say that this is the best advice:
Understand that "action" and "computation" are analogies. There's no strict definition.
Understand that these analogies are stronger for some monad instances, and weak for others.
The ultimate barometer of how things work are the Monad laws and the definitions of the various functions that work with Monad.

What do parentheses () used on their own mean?

I read in learn you haskell that
Enum members are sequentially ordered types ... Types in this class:
(), Bool, Char ...
Also it appears in some signatures:
putChar :: Char -> IO ()
It is very difficult to find info about it in Google as the answers refer to problems of the "common parentheses" (use in function calls, precedence problems and the like).
Therefore, what does the expression () mean? Is it a type? What are values of type ()? What is it used for, and when is it needed?
It is both a type and a value. () is a special type "pronounced" unit, and it has one value: (), also pronounced unit. It is essentially the same as the type void in Java or C/C++. If you're familiar with Python, think of it as the NoneType which has the singleton None.
It is useful when you want to denote an action that doesn't return anything. It is most commonly used in the context of Monads, such as the IO monad. For example, if you had the following function:
getVal :: IO Int
getVal = do
  putStrLn "Enter an integer value:"
  n <- getLine
  return $ read n
And for some reason you decided that you just wanted to annoy the user and throw away the number they just passed in:
getValAnnoy :: IO ()
getValAnnoy = do
  _ <- getVal
  return () -- Returns nothing
However, return is just a Monad function, so we could abstract this a bit further
throwAwayResult :: Monad m => m a -> m ()
throwAwayResult action = do
  _ <- action
  return ()
Then
getValAnnoy = throwAwayResult getVal
However, you don't have to write this function yourself, it already exists in Control.Monad as the function void that is even less constraining and works on Functors:
void :: Functor f => f a -> f ()
void fa = fmap (const ()) fa
Why does it work on Functor instead of Monad? Well, for each Monad instance, you can write the Functor instance as
instance Monad m => Functor m where
  fmap f m = m >>= return . f
But you can't make a Monad out of every Functor. It's like how a square is a rectangle but a rectangle isn't always a square, Monads form a subset of Functors.
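A quick usage sketch of the library void (skipLine is a made-up name):
import Control.Monad (void)

skipLine :: IO ()        -- read a line, but throw away what it was
skipLine = void getLine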
As others have said, it's the unit type which has one value called unit. In Haskell syntax this is easily, if confusingly, expressed as () :: (). We can make our own quite easily as well.
data Unit = Unit
>>> :t Unit :: Unit
Unit :: Unit
>>> :t () :: ()
() :: ()
It's written as () because it behaves very much like an "empty tuple" would. There are theoretical reasons why this holds, but honestly it makes a lot of simple intuitive sense too.
It's often used as the argument to a type constructor like IO or ST when it's the context of the value that's interesting, not the value itself. This is intuitively true because if I tell you that I have a value of type (), then you don't need to know anything more---there's only one of them!
putStrLn :: String -> IO ()    -- the return type is unimportant,
                               -- we just want the *effect*

map (const ()) :: [a] -> [()]  -- this destroys all information about a list,
                               -- keeping only the *length*

>>> [ (), (), () ] :: [()]     -- this might as well just be the number `3`;
                               -- there is nothing else interesting about it
forward :: [()] -> Int -- we have a pair of isomorphisms even
forward = length
backward :: Int -> [()]
backward n = replicate n ()
It is both a type and a value.
It is the unit type, the type that has only one value. In Haskell its name and its only value look like an empty tuple: ().
As others have said, () in Haskell is both the name of the "unit" type, and the only value of said type.
One of the confusing things in moving from imperative programming to Haskell is that the way the languages deal with the concept of "nothing" is different. What's more confusing is the vocabulary, because imperative languages and Haskell use the term "void" to mean diametrically different things.
In an imperative language, a "function" (which may not be a true mathematical function) may have "void" as its return type, as in this pseudocode example:
void sayHello() {
    printLn("Hello!");
}
In this case, void means that the "function," if it returns, will not produce a result value. (The other possibility is that the function may not return—it may loop forever, or fail with an error or exception.)
In Haskell, however, all functions (and IO actions) must produce a result. So when we write an IO action that doesn't produce any interesting return value, we make it return ():
sayHello :: IO ()
sayHello = putStrLn "Hello!"
Later actions will just ignore the () result value.
Now, you probably don't need to worry too much about this, but there is one place where this gets confusing, which is that in Haskell there is a type called Void, but it means something completely different from the imperative programming void. Because of this, the word "void" becomes a minefield when comparing Haskell and imperative languages, because the meaning is completely different when you switch paradigms.
In Haskell, Void is a type that doesn't have any values. The largest consequence of this is that in Haskell a function with return type Void can never return; it can only fail with an error or loop forever. Why? Because the function would have to produce a value of type Void in order to return, but there is no such value.
This is however not relevant until you're working with some more advanced techniques, so you don't need to worry about it other than to beware of the word "void."
But the bigger lesson is that the imperative and the Haskell concepts of "no return value" are different. Haskell distinguishes between:
Things that may return but whose result won't have any information (the () type);
Things that cannot return, no matter what (the Void type).
The imperative void corresponds to the former, and not the latter.
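A small sketch contrasting the two (assuming Data.Void from base; the names ping, spin and impossible are made up):
import Data.Void (Void, absurd)

ping :: IO ()            -- can return, but its result carries no information
ping = putStrLn "pong"

spin :: IO Void          -- can never return a value; it can only loop or fail
spin = spin

impossible :: Void -> Int
impossible = absurd      -- if we ever had a Void, we could conjure any type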

Why are side-effects modeled as monads in Haskell?

Could anyone give some pointers on why the impure computations in Haskell are modelled as monads?
I mean monad is just an interface with 4 operations, so what was the reasoning to modelling side-effects in it?
Suppose a function has side effects. If we take all the effects it produces as the input and output parameters, then the function is pure to the outside world.
So, for an impure function
f' :: Int -> Int
we add the RealWorld to the consideration
f :: Int -> RealWorld -> (Int, RealWorld)
-- input some states of the whole world,
-- modify the whole world because of the side effects,
-- then return the new world.
then f is pure again. We define a parametrized type synonym, type IO a = RealWorld -> (a, RealWorld), so we don't need to type RealWorld so many times, and can just write
f :: Int -> IO Int
To the programmer, handling a RealWorld directly is too dangerous—in particular, if a programmer gets their hands on a value of type RealWorld, they might try to copy it, which is basically impossible. (Think of trying to copy the entire filesystem, for example. Where would you put it?) Therefore, our definition of IO encapsulates the states of the whole world as well.
Composition of "impure" functions
These impure functions are useless if we can't chain them together. Consider
getLine :: IO String ~ RealWorld -> (String, RealWorld)
readFile :: String -> IO String ~ String -> RealWorld -> (String, RealWorld)
putStrLn :: String -> IO () ~ String -> RealWorld -> ((), RealWorld)
We want to
get a filename from the console,
read that file, and
print that file's contents to the console.
How would we do it if we could access the real world states?
printFile :: RealWorld -> ((), RealWorld)
printFile world0 = let (filename, world1) = getLine world0
                       (contents, world2) = (readFile filename) world1
                   in (putStrLn contents) world2 -- results in ((), world3)
We see a pattern here. The functions are called like this:
...
(<result-of-f>, worldY) = f worldX
(<result-of-g>, worldZ) = g <result-of-f> worldY
...
So we could define an operator ~~~ to bind them:
(~~~) :: (IO b) -> (b -> IO c) -> IO c
(~~~) :: (RealWorld -> (b, RealWorld))
      -> (b -> RealWorld -> (c, RealWorld))
      -> (RealWorld -> (c, RealWorld))
(f ~~~ g) worldX = let (resF, worldY) = f worldX
                   in g resF worldY
then we could simply write
printFile = getLine ~~~ readFile ~~~ putStrLn
without touching the real world.
"Impurification"
Now suppose we want to make the file content uppercase as well. Uppercasing is a pure function
upperCase :: String -> String
But to make it into the real world, it has to return an IO String. It is easy to lift such a function:
impureUpperCase :: String -> RealWorld -> (String, RealWorld)
impureUpperCase str world = (upperCase str, world)
This can be generalized:
impurify :: a -> IO a
impurify :: a -> RealWorld -> (a, RealWorld)
impurify a world = (a, world)
so that impureUpperCase = impurify . upperCase, and we can write
printUpperCaseFile =
  getLine ~~~ readFile ~~~ (impurify . upperCase) ~~~ putStrLn
(Note: Normally we would write getLine ~~~ readFile ~~~ (putStrLn . upperCase))
We were working with monads all along
Now let's see what we've done:
We defined an operator (~~~) :: IO b -> (b -> IO c) -> IO c which chains two impure functions together
We defined a function impurify :: a -> IO a which converts a pure value to impure.
Now we make the identification (>>=) = (~~~) and return = impurify, and see? We've got a monad.
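As a sketch, that identification can be written out as actual instances, with the toy type renamed (MyIO, plus a stand-in RealWorld) so it does not clash with the real, primitive IO:
data RealWorld = RealWorld

newtype MyIO a = MyIO { runMyIO :: RealWorld -> (a, RealWorld) }

instance Functor MyIO where
  fmap f m = m >>= (return . f)
instance Applicative MyIO where
  pure a = MyIO (\world -> (a, world))     -- this is impurify
  mf <*> ma = mf >>= \f -> fmap f ma
instance Monad MyIO where
  m >>= g = MyIO $ \worldX ->              -- this is (~~~)
    let (res, worldY) = runMyIO m worldX
    in runMyIO (g res) worldY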
Technical note
To ensure it's really a monad, there are still a few axioms that need to be checked too:
return a >>= f = f a

impurify a = (\world -> (a, world))
(impurify a ~~~ f) worldX = let (resF, worldY) = (\world -> (a, world)) worldX
                            in f resF worldY
                          = let (resF, worldY) = (a, worldX)
                            in f resF worldY
                          = f a worldX
f >>= return = f

(f ~~~ impurify) worldX = let (resF, worldY) = f worldX
                          in impurify resF worldY
                        = let (resF, worldY) = f worldX
                          in (resF, worldY)
                        = f worldX
f >>= (\x -> g x >>= h) = (f >>= g) >>= h
Left as exercise.
Could anyone give some pointers on why the impure computations in Haskell are modeled as monads?
This question contains a widespread misunderstanding.
Impurity and Monad are independent notions.
Impurity is not modeled by Monad.
Rather, there are a few data types, such as IO, that represent imperative computation.
And for some of those types, a tiny fraction of their interface corresponds to the interface pattern called "Monad".
Moreover, there is no known pure/functional/denotative explanation of IO (and there is unlikely to be one, considering the "sin bin" purpose of IO), though there is the commonly told story about World -> (a, World) being the meaning of IO a.
That story cannot truthfully describe IO, because IO supports concurrency and nondeterminism.
The story doesn't even work for deterministic computations that allow mid-computation interaction with the world.
For more explanation, see this answer.
Edit: On re-reading the question, I don't think my answer is quite on track.
Models of imperative computation do often turn out to be monads, just as the question said.
The asker might not really assume that monadness in any way enables the modeling of imperative computation.
As I understand it, someone called Eugenio Moggi first noticed that a previously obscure mathematical construct called a "monad" could be used to model side effects in computer languages, and hence specify their semantics using Lambda calculus. When Haskell was being developed there were various ways in which impure computations were modelled (see Simon Peyton Jones' "hair shirt" paper for more details), but when Phil Wadler introduced monads it rapidly became obvious that this was The Answer. And the rest is history.
Could anyone give some pointers on why the impure computations in Haskell are modeled as monads?
Well, because Haskell is pure. You need a mathematical concept to distinguish between impure computations and pure ones at the type level, and to model program flow accordingly.
This means you'll have to end up with some type IO a that models an impure computation. Then you need ways of combining these computations, of which applying in sequence (>>=) and lifting a value (return) are the most obvious and basic ones.
With these two, you've already defined a monad (without even thinking of it) ;)
In addition, monads provide very general and powerful abstractions, so many kinds of control flow can be conveniently generalized in monadic functions like sequence, liftM or special syntax, making impurity not such a special case.
See monads in functional programming and uniqueness typing (the only alternative I know) for more information.
As you say, Monad is a very simple structure. One half of the answer is: Monad is the simplest structure that we could possibly give to side-effecting functions and be able to use them. With Monad we can do two things: we can treat a pure value as a side-effecting value (return), and we can apply a side-effecting function to a side-effecting value to get a new side-effecting value (>>=). Losing the ability to do either of these things would be crippling, so our side-effecting type needs to be "at least" Monad, and it turns out Monad is enough to implement everything we've needed to so far.
The other half is: what's the most detailed structure we could give to "possible side effects"? We can certainly think about the space of all possible side effects as a set (the only operation this requires is membership). We can combine two side effects by doing them one after another, and this will give rise to a different side effect (or possibly the same one - if the first was "shutdown computer" and the second was "write file", then the result of composing these is just "shutdown computer").
Ok, so what can we say about this operation? It's associative; that is, if we combine three side effects, it doesn't matter which order we do the combining in. If we do (write file then read socket) then shutdown computer, it's the same as doing write file then (read socket then shutdown computer). But it's not commutative: ("write file" then "delete file") is a different side effect from ("delete file" then "write file"). And we have an identity: the special side effect "no side effects" works ("no side effects" then "delete file" is the same side effect as just "delete file"). At this point any mathematician is thinking "Group!" But groups have inverses, and there's no way to invert a side effect in general; "delete file" is irreversible. So the structure we have left is that of a monoid, which means our side-effecting functions should be monads.
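That monoid can be written down directly (a sketch; base in fact already provides Semigroup/Monoid instances for IO a whenever a is a Monoid):
newtype Effect = Effect { runEffect :: IO () }

instance Semigroup Effect where
  Effect a <> Effect b = Effect (a >> b)   -- "do a, then b"
instance Monoid Effect where
  mempty = Effect (return ())              -- "no side effects"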
Is there a more complex structure? Sure! We could divide possible side effects into filesystem-based effects, network-based effects and more, and we could come up with more elaborate rules of composition that preserved these details. But again it comes down to: Monad is very simple, and yet powerful enough to express most of the properties we care about. (In particular, associativity and the other axioms let us test our application in small pieces, with confidence that the side effects of the combined application will be the same as the combination of the side effects of the pieces).
It's actually quite a clean way to think of I/O in a functional way.
In most programming languages, you do input/output operations. In Haskell, imagine writing code not to do the operations, but to generate a list of the operations that you would like to do.
Monads are just pretty syntax for exactly that.
If you want to know why monads as opposed to something else, I guess the answer is that they're the best functional way to represent I/O that people could think of when they were making Haskell.
AFAIK, the reason is to be able to include side effects checks in the type system. If you want to know more, listen to those SE-Radio episodes:
Episode 108: Simon Peyton Jones on Functional Programming and Haskell
Episode 72: Erik Meijer on LINQ
The answers above give very good, detailed explanations with theoretical background. But I want to give my view on the IO monad. I am not an experienced Haskell programmer, so it may be quite naive or even wrong, but it helped me deal with the IO monad to some extent (note that it does not relate to other monads).
First I want to say that the example with the "real world" is not too clear to me, as we cannot access its (the real world's) previous states. Maybe that does not relate to monadic computations at all, but it is desirable in the sense of referential transparency, which is generally present in Haskell code.
So we want our language (Haskell) to be pure. But we need input/output operations, as without them our program cannot be useful. And those operations cannot be pure by their nature. So the only way to deal with this is to separate impure operations from the rest of the code.
Here the monad comes in. Actually, I am not sure that there cannot exist another construct with the needed properties, but the point is that monads have these properties, so they can be used (and they are used successfully). The main property is that we cannot escape from it: the Monad interface does not have operations to get rid of the monad around our value. Other (non-IO) monads provide such operations and allow pattern matching (e.g. Maybe), but those operations are not in the Monad interface. Another required property is the ability to chain operations.
If we think about what we need in terms of the type system, we come to the fact that we need a type with a constructor which can be wrapped around any value. The constructor must be private, as we prohibit escaping from it (i.e. pattern matching). But we need a function to put a value into this constructor (here return comes to mind). And we need a way to chain operations. If we think about it for some time, we will come to the fact that the chaining operation must have a type like that of >>=. So we come to something very similar to a monad. I think that if we now analyze possible contradictory situations with this construct, we will come to the monad axioms.
Note that the construct we developed does not have anything in common with impurity. It only has the properties we wished for in order to be able to deal with impure operations, namely: no escaping, chaining, and a way to get in.
Now some set of impure operations is predefined by the language within this selected monad, IO. We can combine those operations to create new impure operations. And all those operations will have to have IO in their type. Note, however, that the presence of IO in the type of some function does not make this function impure. But, as I understand it, it is a bad idea to write pure functions with IO in their type, as our idea was initially to separate pure and impure functions.
Finally, I want to say that a monad does not turn impure operations into pure ones. It only allows us to separate them effectively. (I repeat that this is only my understanding.)
