When is unsafeInterleaveIO unsafe?

Unlike other unsafe* operations, the documentation for unsafeInterleaveIO is not very clear about its possible pitfalls. So exactly when is it unsafe? I would like to know the condition for both parallel/concurrent and the single threaded usage.
More specifically, are the two functions in the following code semantically equivalent? If not, when and how?
joinIO :: IO a -> (a -> IO b) -> IO b
joinIO a f = do !x  <- a
                !x' <- f x
                return x'

joinIO' :: IO a -> (a -> IO b) -> IO b
joinIO' a f = do !x  <- unsafeInterleaveIO a
                 !x' <- unsafeInterleaveIO $ f x
                 return x'
Here's how I would use this in practice:
data LIO a = LIO {runLIO :: IO a}

instance Functor LIO where
  fmap f (LIO a) = LIO (fmap f a)

instance Monad LIO where
  return x = LIO $ return x
  a >>= f  = LIO $ lazily a >>= lazily . f
    where
      lazily = unsafeInterleaveIO . runLIO
iterateLIO :: (a -> LIO a) -> a -> LIO [a]
iterateLIO f x = do
  x' <- f x
  xs <- iterateLIO f x' -- IO monad would diverge here
  return $ x:xs
limitLIO :: (a -> LIO a) -> a -> (a -> a -> Bool) -> LIO a
limitLIO f a converged = do
  xs <- iterateLIO f a
  return . snd . head . filter (uncurry converged) $ zip xs (tail xs)
root2 = runLIO $ limitLIO newtonLIO 1 converged
  where
    newtonLIO x = do () <- LIO $ print x
                     LIO $ print "lazy io"
                     return $ x - f x / f' x
    f x  = x^2 - 2
    f' x = 2 * x
    converged x x' = abs (x - x') < 1E-15
Although I would rather avoid using this code in serious applications because of the terrifying unsafe* stuff, I could at least be lazier than is possible with the stricter IO monad in deciding what 'convergence' means, leading to (what I think is) more idiomatic Haskell. And this brings up another question: why is it not the default semantics for Haskell's (or GHC's?) IO monad? I've heard of resource management issues with lazy IO (which GHC provides only through a small fixed set of commands), but the examples typically given somewhat resemble a broken makefile: a resource X depends on a resource Y, but if you fail to specify the dependency, you get an undefined status for X. Is lazy IO really the culprit for this problem? (On the other hand, if there is a subtle concurrency bug in the above code, such as a deadlock, I would take that as a more fundamental problem.)
Update
Reading Ben's and Dietrich's answers and the comments below them, I briefly browsed the GHC source code to see how the IO monad is implemented in GHC. Here I summarize my few findings.
GHC implements Haskell as an impure, non-referentially-transparent language. GHC's runtime operates by successively evaluating impure functions with side effects, just like any other impure functional language. This is why the evaluation order matters.
unsafeInterleaveIO is unsafe because it can introduce any kind of concurrency bug, even in a single-threaded program, by exposing the (usually) hidden impurity of GHC's Haskell. (iteratee seems to be a nice and elegant solution for this, and I will certainly learn how to use it.)
The IO monad must be strict, because a safe, lazy IO monad would require a precise (lifted) representation of the RealWorld, which seems impossible.
It's not just the IO monad and the unsafe functions that are unsafe. The whole of Haskell (as implemented by GHC) is potentially unsafe, and 'pure' functions in (GHC's) Haskell are only pure by convention and people's goodwill. Types can never be a proof of purity.
To see this, I demonstrate how GHC's Haskell is not referentially transparent regardless of the IO monad, regardless of the unsafe* functions, etc.
-- An evil example of a function whose result depends on a particular
-- evaluation order, without reference to unsafe* functions or even
-- the IO monad.
{-# LANGUAGE MagicHash #-}
{-# LANGUAGE UnboxedTuples #-}
{-# LANGUAGE BangPatterns #-}
import GHC.Prim

f :: Int -> Int
f x = let v = myVar 1
          -- removing the strictness in the following changes the result
          !x' = h v x
      in g v x'

g :: MutVar# RealWorld Int -> Int -> Int
g v x = let !y = addMyVar v 1
        in x * y

h :: MutVar# RealWorld Int -> Int -> Int
h v x = let !y = readMyVar v
        in x + y

myVar :: Int -> MutVar# RealWorld Int
myVar x =
  case newMutVar# x realWorld# of
    (# _, v #) -> v

readMyVar :: MutVar# RealWorld Int -> Int
readMyVar v =
  case readMutVar# v realWorld# of
    (# _, x #) -> x

addMyVar :: MutVar# RealWorld Int -> Int -> Int
addMyVar v x =
  case readMutVar# v realWorld# of
    (# s, y #) ->
      case writeMutVar# v (x+y) s of
        s' -> x + y

main = print $ f 1
Just for easy reference, I collected some of the relevant definitions
for the IO monad as implemented by GHC.
(All the paths below are relative to the top directory of the GHC source repository.)
-- Firstly, according to "libraries/base/GHC/IO.hs",
{-
The IO Monad is just an instance of the ST monad, where the state is
the real world. We use the exception mechanism (in GHC.Exception) to
implement IO exceptions.
...
-}

-- And indeed in "libraries/ghc-prim/GHC/Types.hs", we have
newtype IO a = IO (State# RealWorld -> (# State# RealWorld, a #))

-- And in "libraries/base/GHC/Base.lhs", we have the Monad instance for IO:
data RealWorld

instance Functor IO where
  fmap f x = x >>= (return . f)

instance Monad IO where
  m >> k  = m >>= \ _ -> k
  return  = returnIO
  (>>=)   = bindIO
  fail s  = failIO s

returnIO :: a -> IO a
returnIO x = IO $ \ s -> (# s, x #)

bindIO :: IO a -> (a -> IO b) -> IO b
bindIO (IO m) k = IO $ \ s -> case m s of (# new_s, a #) -> unIO (k a) new_s

unIO :: IO a -> (State# RealWorld -> (# State# RealWorld, a #))
unIO (IO a) = a

-- Many of the unsafe* functions are defined in "libraries/base/GHC/IO.hs":
unsafePerformIO :: IO a -> a
unsafePerformIO m = unsafeDupablePerformIO (noDuplicate >> m)

unsafeDupablePerformIO :: IO a -> a
unsafeDupablePerformIO (IO m) = lazy (case m realWorld# of (# _, r #) -> r)

unsafeInterleaveIO :: IO a -> IO a
unsafeInterleaveIO m = unsafeDupableInterleaveIO (noDuplicate >> m)

unsafeDupableInterleaveIO :: IO a -> IO a
unsafeDupableInterleaveIO (IO m)
  = IO ( \ s -> let
                  r = case m s of (# _, res #) -> res
                in
                  (# s, r #))

noDuplicate :: IO ()
noDuplicate = IO $ \s -> case noDuplicate# s of s' -> (# s', () #)

-- The auto-generated file "libraries/ghc-prim/dist-install/build/autogen/GHC/Prim.hs"
-- lists the types of all the primitive impure functions. For example,
data MutVar# s a
data State# s
newMutVar# :: a -> State# s -> (# State# s, MutVar# s a #)
-- The actual implementations are found in "rts/PrimOps.cmm".
So, for example, ignoring the constructor and assuming referential transparency,
we have

unsafeDupableInterleaveIO m >>= f

==> (let u = unsafeDupableInterleaveIO)

u m >>= f

==> (definition of (>>=), ignoring the constructor)

\s -> case u m s of
        (# s', a' #) -> f a' s'

==> (definition of u, and let snd# x = case x of (# _, r #) -> r)

\s -> case (let r = snd# (m s)
            in (# s, r #)
           ) of
        (# s', a' #) -> f a' s'

==>

\s -> let r = snd# (m s)
      in case (# s, r #) of
           (# s', a' #) -> f a' s'

==>

\s -> f (snd# (m s)) s
This is not what we would normally get from binding usual lazy state monads.
Assuming the state variable s carried some real meaning (which it does not), this looks more like concurrent IO (or interleaved IO, as the function's name rightly says) than like the lazy IO we would normally mean by 'lazy state monad', wherein despite the laziness the states are properly threaded by an associative operation.
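For contrast, expanding ordinary bindIO by the same steps (my own derivation from the definitions quoted above) gives

\s -> case m s of
        (# s', a' #) -> f a' s'

where the state produced by m is threaded on to f; the interleaved version instead passes the original s to f and discards the state produced by m.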
I tried to implement a truly lazy IO monad, but soon realized that in order to define a lazy monadic composition for the IO datatype, we would need to be able to lift/unlift the RealWorld. However, this seems impossible, because there is no constructor for either State# s or RealWorld. And even if that were possible, I would then have to give a precise, functional representation of our RealWorld, which is impossible, too.
But I'm still not sure whether standard Haskell 2010 breaks referential transparency or whether lazy IO is bad in itself. At least it seems entirely possible to build a small model of the RealWorld on which lazy IO is perfectly safe and predictable. And there might be a good enough approximation that serves many practical purposes without breaking referential transparency.

The two functions you have at the top are always identical.
v1 = do !a <- x
        y

v2 = do !a <- unsafeInterleaveIO x
        y
Remember that unsafeInterleaveIO defers the IO operation until its result is forced -- yet you are forcing it immediately by using a strict pattern match !a, so the operation is not deferred at all. So v1 and v2 are exactly the same.
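To see the difference the bang pattern makes, here is a minimal sketch (my own example) where nothing forces the result at the bind, so the action really is deferred:

import System.IO.Unsafe (unsafeInterleaveIO)

main :: IO ()
main = do
  -- no bang pattern: the action is deferred until x is demanded
  x <- unsafeInterleaveIO (putStrLn "side effect!" >> return (42 :: Int))
  putStrLn "before forcing x"
  print x -- "side effect!" is printed here, not at the bind above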
In general
In general, it is up to you to prove that your use of unsafeInterleaveIO is safe. If you call unsafeInterleaveIO x, then you have to prove that x can be called at any time and still produce the same output.
Modern sentiment about Lazy IO
...is that Lazy IO is dangerous and a bad idea 99% of the time.
The chief problem that it is trying to solve is that IO has to be done in the IO monad, but you want to be able to do incremental IO and you don't want to rewrite all of your pure functions to call IO callbacks to get more data. Incremental IO is important because it uses less memory, allowing you to operate on data sets that don't fit in memory without changing your algorithms too much.
Lazy IO's solution is to do IO outside of the IO monad. This is not generally safe.
Today, people are solving the problem of incremental IO in different ways by using libraries like Conduit or Pipes. Conduit and Pipes are much more deterministic and well-behaved than Lazy IO, solve the same problems, and do not require unsafe constructs.
Remember that unsafeInterleaveIO is really just unsafePerformIO with a different type.
Example
Here is an example of a program that is broken due to lazy IO:
rot13 :: Char -> Char
rot13 x
  | (x >= 'a' && x <= 'm') || (x >= 'A' && x <= 'M') = toEnum (fromEnum x + 13)
  | (x >= 'n' && x <= 'z') || (x >= 'N' && x <= 'Z') = toEnum (fromEnum x - 13)
  | otherwise = x

rot13file :: FilePath -> IO ()
rot13file path = do
  x <- readFile path
  let y = map rot13 x
  writeFile path y

main = rot13file "test.txt"
This program will not work. Replacing the lazy IO with strict IO will make it work.
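For instance, one possible strict fix, as a sketch: force the whole file into memory before writing. Forcing length x reads the file to the end, so readFile's handle is closed before writeFile reopens the same path.

rot13file :: FilePath -> IO ()
rot13file path = do
  x <- readFile path
  -- forcing the length reads the entire file before we write it back
  length x `seq` writeFile path (map rot13 x)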
Links
From Lazy IO breaks purity by Oleg Kiselyov on the Haskell mailing list:
We demonstrate how lazy IO breaks referential transparency. A pure
function of the type Int->Int->Int gives different integers depending
on the order of evaluation of its arguments. Our Haskell98 code uses
nothing but the standard input. We conclude that extolling the purity
of Haskell and advertising lazy IO is inconsistent.
...
Lazy IO should not be considered good style. One of the common
definitions of purity is that pure expressions should evaluate to the
same results regardless of evaluation order, or that equals can be
substituted for equals. If an expression of the type Int evaluates to
1, we should be able to replace every occurrence of the expression with
1 without changing the results and other observables.
From Lazy vs correct IO by Oleg Kiselyov on the Haskell mailing list:
After all, what could be more against
the spirit of Haskell than a `pure' function with observable side
effects. With Lazy IO, one indeed has to choose between correctness
and performance. The appearance of such code is especially strange
after the evidence of deadlocks with Lazy IO, presented on this list
less than a month ago. Let alone unpredictable resource usage and
reliance on finalizers to close files (forgetting that GHC does not
guarantee that finalizers will be run at all).
Kiselyov wrote the Iteratee library, which was the first real alternative to lazy IO.

Laziness means that when (and whether) exactly a computation is actually carried out depends on when (and whether) the runtime implementation decides it needs the value. As a Haskell programmer you completely relinquish control over the evaluation order (except by the data dependencies inherent in your code, and when you start playing with strictness to force the runtime to make certain choices).
That's great for pure computations, because the result of a pure computation will be exactly the same whenever you do it (except that if you carry out computations that you don't actually need, you might encounter errors or fail to terminate, when another evaluation order might allow the program to terminate successfully; but all non-bottom values computed by any evaluation order will be the same).
But when you're writing IO-dependent code, evaluation order matters. The whole point of IO is to provide a mechanism for building computations whose steps depend on and affect the world outside the program, and an important part of doing that is that those steps are explicitly sequenced. Using unsafeInterleaveIO throws away that explicit sequencing, and relinquishes control of when (and whether) the IO operation is actually carried out to the runtime system.
This is unsafe in general for IO operations, because there may be dependencies between their side-effects which cannot be inferred from the data dependencies inside the program. For example, one IO action might create a file with some data in it, and another IO action might read the same file. If they're both executed "lazily", then they'll only get run when the resulting Haskell value is needed. Creating the file is probably IO () though, and it's quite possible that the () is never needed. That could mean that the read operation is carried out first, either failing or reading data that was already in the file, but not the data that should have been put there by the other operation. There's no guarantee that the runtime system will execute them in the right order. To program correctly with a system that always did this for IO you'd have to be able to accurately predict the order in which the Haskell runtime will choose to perform the various IO actions.
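Here is a hedged sketch of that hazard (the file name and contents are invented for the example):

import System.IO.Unsafe (unsafeInterleaveIO)

main :: IO ()
main = do
  -- The () result below is never demanded, so the write may never run at all.
  _written <- unsafeInterleaveIO (writeFile "out.txt" "new data")
  contents <- unsafeInterleaveIO (readFile "out.txt")
  -- Forcing contents triggers the read, but nothing forces the write first:
  -- this either fails or prints whatever was already in the file.
  putStrLn contents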
Treat unsafeInterleaveIO as a promise to the compiler (which it cannot verify; it's just going to trust you) that it doesn't matter when the IO action is carried out, or whether it's elided entirely. This is really what all the unsafe* functions are; they provide facilities that are not safe in general, and for which safety cannot be automatically checked, but which can be safe in particular instances. The onus is on you to ensure that your use of them is in fact safe. But if you make a promise to the compiler, and your promise is false, then unpleasant bugs can be the result. The "unsafe" in the name is to scare you into thinking about your particular case and deciding whether you really can make the promise to the compiler.

Basically everything under "Update" in the question is so confused it's not even wrong, so please try to forget it when you're trying to understand my answer.
Look at this function:
badLazyReadlines :: Handle -> IO [String]
badLazyReadlines h = do
  l <- unsafeInterleaveIO $ hGetLine h
  r <- unsafeInterleaveIO $ badLazyReadlines h
  return (l:r)
In addition to what I'm trying to illustrate: the above function also doesn't handle reaching the end of the file. But ignore that for now.
main = do
  h <- openFile "example.txt" ReadMode
  lns <- badLazyReadlines h
  putStrLn $ lns !! 4
This will print the first line of "example.txt", because the 5th element in the list is actually the first line that's read from the file.

Your joinIO and joinIO' are not semantically equivalent. They will usually be the same, but there's a subtlety involved: a bang pattern makes a value strict, but that's all it does. Bang patterns are implemented using seq, and seq does not enforce a particular evaluation order; in particular, the following two are semantically equivalent:
a `seq` b `seq` c
b `seq` a `seq` c
GHC can evaluate either b or a first before returning c. Indeed, it can evaluate c first, then a and b, then return c. Or, if it can statically prove a or b are non-bottom, or that c is bottom, it doesn't have to evaluate a or b at all. Some optimisations do genuinely make use of this fact, but it doesn't come up very often in practice.
unsafeInterleaveIO, by contrast, is sensitive to all or any of those changes – it does not depend on the semantic property of how strict some function is, but the operational property of when something is evaluated. So all of the above transformations are visible to it, which is why it's only reasonable to view unsafeInterleaveIO as performing its IO non-deterministically, more or less whenever it feels appropriate.
This is, in essence, why unsafeInterleaveIO is unsafe - it is the only mechanism in normal use that can detect transformations that ought to be meaning-preserving. It's the only way you can detect evaluation, which by rights ought to be impossible.
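As a sketch of what "detecting evaluation" means here (my own example), the following program's output interleaving depends on which operand of (+) gets evaluated first, an operational detail that no pure code could observe:

import System.IO.Unsafe (unsafeInterleaveIO)

main :: IO ()
main = do
  a <- unsafeInterleaveIO (putStrLn "a forced" >> return (1 :: Int))
  b <- unsafeInterleaveIO (putStrLn "b forced" >> return (2 :: Int))
  print (a + b) -- which message prints first is not fixed by the semantics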
As an aside, it's probably fair to mentally prepend unsafe to every function from GHC.Prim, and probably several other GHC.* modules as well. They're certainly not ordinary Haskell.

Related

How to interact with pure algorithm in IO code

To illustrate the point with a trivial example, say I have implemented filter:
filter :: (a -> Bool) -> [a] -> [a]
And I have a predicate p that interacts with the real world:
p :: a -> IO Bool
How do I make it work with filter without writing a separate implementation:
filterIO :: (a -> IO Bool) -> [a] -> IO [a]
Presumably if I can turn p into p':
p' :: IO (a -> Bool)
Then I can do
main :: IO ()
main = do
  p'' <- p'
  print $ filter p'' [1..100]
But I haven't been able to find the conversion.
Edited:
As people have pointed out in the comment, such a conversion doesn't make sense as it would break the encapsulation of the IO Monad.
Now the question is, can I structure my code so that the pure and IO versions don't completely duplicate the core logic?
How do I make it work with filter without writing a separate implementation
That isn't possible and the fact this sort of thing isn't possible is by design - Haskell places firm limits on its types and you have to obey them. You cannot sprinkle IO all over the place willy-nilly.
Now the question is, can I structure my code so that the pure and IO versions don't completely duplicate the core logic?
You'll be interested in filterM. Then, you can get both the functionality of filterIO by using the IO monad and the pure functionality using the Identity monad. Of course, for the pure case, you now have to pay the extra price of wrapping/unwrapping (or coerceing) the Identity wrapper. (Side remark: since Identity is a newtype this is only a code readability cost, not a runtime one.)
ghci> data Color = Red | Green | Blue deriving (Read, Show, Eq)
Here is a monadic example (note that the lines containing only Red, Blue, and Blue are user-entered at the prompt):
ghci> filterM (\x -> do y<-readLn; pure (x==y)) [Red,Green,Blue]
Red
Blue
Blue
[Red,Blue] :: IO [Color]
Here is a pure example:
ghci> filterM (\x -> Identity (x /= Green)) [Red,Green,Blue]
Identity [Red,Blue] :: Identity [Color]
As already said, you can use filterM for this specific task. However, it is usually better to keep to Haskell's characteristic strict separation of IO and calculation. In your case, you can just tick off all the necessary IO in one go and then do the interesting filtering in nice, reliable, easily testable pure code (i.e. here, simply with the normal filter):
type A = Int
type Annotated = (A, Bool)

p' :: Annotated -> Bool
p' = snd

main :: IO ()
main = do
  candidates <- forM [1..100] $ \n -> do
    permitted <- p n
    return (n, permitted)
  print $ fst <$> filter p' candidates
Here, we first annotate each number with a flag indicating what the environment says. This flag can then simply be read out in the actual filtering step, without requiring any further IO.
In short, this would be written:
main :: IO ()
main = do
  candidates <- forM [1..100] $ \n -> (n,) <$> p n
  print $ fst <$> filter snd candidates
While it is not feasible for this specific task, I'd also add that you can in principle achieve the IO separation with something like your p'. This requires that the type A is "small enough" that you can evaluate the predicate with all values that are possible at all. For instance,
import qualified Data.Map as Map

type A = Char

p' :: IO (A -> Bool)
p' = (Map.!) . Map.fromList <$> mapM (\c -> (c,) <$> p c) ['\0'..]
This evaluates the predicate once for all of the 1114112 chars there are and stores the results in a lookup table.

What is the intuitive meaning of "join"?

What is the intuitive meaning of join for a Monad?
The monads-as-containers analogies make sense to me, and inside these analogies join makes sense. A value is double-wrapped and we unwrap one layer. But as we all know, a monad is not a container.
How might one write sensible, understandable code using join in normal circumstances, say when in IO?
An action :: IO (IO a) is a way of producing a way of producing an a. join action, then, is a way of producing an a by running the outermost producer of action, taking the producer it produced and then running that as well, to finally get to that juicy a.
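A small sketch of that reading (the names here are invented for the example):

import Control.Monad (join)

action :: IO (IO String)
action = do
  putStrLn "outer: producing the inner action"
  return (putStrLn "inner: running it" >> return "done")

main :: IO ()
main = join action >>= putStrLn
-- prints the outer message, then the inner one, then "done"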
join collapses consecutive layers of the type constructor.
A valid join must satisfy the property that, for any number of consecutive applications of the type constructor, the order in which we collapse the layers shouldn't matter.
For example
ghci> let lolol = [[['a'],['b','c']],[['d'],['e']]]
ghci> lolol :: [[[Char]]]
ghci> lolol :: [] ([] ([] Char)) -- the type can also be expressed like this
ghci> join (fmap join lolol) -- collapse inner layers first
"abcde"
ghci> join (join lolol) -- collapse outer layers first
"abcde"
(We used fmap to "get inside" the outer monadic layer so that we could collapse the inner layers first.)
A small non-container example where join is useful: for the function monad (->) a, join is equivalent to \f x -> f x x, a function of type (a -> a -> b) -> a -> b that applies the same argument twice to a binary function.
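For instance:

ghci> import Control.Monad (join)
ghci> join (+) 5   -- (\f x -> f x x) (+) 5
10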
For the List monad, join is simply concat, and concatMap is join . fmap.
So join implicitly appears in any list expression which uses concat
or concatMap.
Suppose you were asked to find all of the numbers which are divisors of any
number in an input list. If you have a divisors function:
divisors :: Int -> [Int]
divisors n = [ d | d <- [1..n], mod n d == 0 ]
you might solve the problem like this:
foo xs = concat $ (map divisors xs)
Here we are thinking of solving the problem by first mapping the
divisors function over all of the input elements and then concatenating
all of the resulting lists. You might even think that this is a very
"functional" way of solving the problem.
Another approach would be to write a list comprehension:
bar xs = [ d | x <- xs, d <- divisors x ]
or using do-notation:
bar xs = do x <- xs
            d <- divisors x
            return d
Here it might be said we're thinking a little more imperatively - first draw a number from the list xs; then draw a divisor from the divisors of that number and yield it.
It turns out, though, that foo and bar are exactly the same function.
Moreover, these two approaches are exactly the same in any monad.
That is, for any monad, and appropriate monadic functions f and g:

do x <- f
   y <- g x      -- is the same as:  (join . fmap g) f
   return y
For instance, in the IO monad if we set f = getLine and g = readFile,
we have:
do x <- getLine
   y <- readFile x   -- is the same as:  (join . fmap readFile) getLine
   return y
The do-block is a more imperative way of expressing the action: first read a line of input; then treat the returned string as a file name, read the contents of the file, and finally return the result.
The equivalent join expression seems a little unnatural in the IO-monad.
However it shouldn't be as we are using it in exactly the same way as we
used concatMap in the first example.
Given an action that produces another action, run the action and then run the action that it produces.
If you imagine some kind of Parser x monad that parses an x, then Parser (Parser x) is a parser that does some parsing, and then returns another parser. So join would flatten this into a Parser x that just runs both actions and returns the final x.
Why would you even have a Parser (Parser x) in the first place? Basically, because fmap. If you have a parser, you can fmap a function that changes the result over it. But if you fmap a function that itself returns a parser, you end up with a Parser (Parser x), where you probably want to just run both actions. join implements "just run both actions".
I like the parsing example because a parser typically has a runParser function. And it's clear that a Parser Int is not an integer. It's something that can parse an integer, after you give it some input to parse from. I think a lot of people end up thinking of an IO Int as being just a normal integer but with this annoying IO bit that you can't get rid of. It isn't. It's an unexecuted I/O operation. There's no integer "inside" it; the integer doesn't exist until you actually perform the I/O.
I find these things easier to interpret by writing out the types and refactoring them a bit to reveal what the functions do.
Reader monad
The Reader type is defined thus, and its join function has the type shown:
newtype Reader r a = Reader { runReader :: r -> a }
join :: Reader r (Reader r a) -> Reader r a
Since this is a newtype, the type Reader r a is isomorphic to r -> a. So we can refactor join to work on that type instead, which, albeit not literally the same, is "the same" with scare quotes. In the (->) r monad, which is isomorphic to Reader r, join is the function:
join :: (r -> r -> a) -> r -> a
So the Reader join is the function that takes a two-place function (r -> r -> a) and applies it to the same value at both of its argument positions.
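Written out directly on the newtype, that is (a sketch; named joinReader here only to avoid clashing with Control.Monad.join):

joinReader :: Reader r (Reader r a) -> Reader r a
joinReader m = Reader (\r -> runReader (runReader m r) r)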
Writer monad
Since the Writer type has this definition:
newtype Writer w a = Writer { runWriter :: (a, w) }
...then when we remove the newtype, its join function has a type isomorphic to:
join :: Monoid w => ((a, w), w) -> (a, w)
The Monoid constraint needs to be there because the Monad instance for Writer requires it, and it lets us guess right away what the function does:
join ((a, w0), w1) = (a, w0 <> w1)
State monad
Similarly, since State has this definition:
newtype State s a = State { runState :: s -> (a, s) }
...then its join is like this:
join :: (s -> (s -> (a, s), s)) -> s -> (a, s)
...and you can also venture just writing it directly:
join f s0 = (a, s2)
  where
    (g, s1) = f s0
    (a, s2) = g s1

{- Here's the "map" to the variable names in the function:

            f     g           s2  s1     s0       s2
    join :: (s -> (s  ->  (a, s), s)) -> s ->    (a, s)
-}
If you stare at this type a bit, you might think that it bears some resemblance to both the Reader and Writer's types for their join operations. And you'd be right! The Reader, Writer and State monads are all instances of a more general pattern called update monads.
List monad
join :: [[a]] -> [a]
As other people have pointed out, this is the type of the concat function.
Parsing monads
Here comes a really neat thing to realize. Very often, "fancy" monads turn out to be combinations or variants of "basic" ones like Reader, Writer, State or lists. So often what I do when confronted with a novel monad is ask: which of the basic monads does it resemble, and how?
Take for example parsing monads, which have been brought up in other answers here. A simplistic parser monad (with no support for important things like error reporting) looks like this:
newtype Parser a = Parser { runParser :: String -> [(a, String)] }
A Parser is a function that takes a string as input, and returns a list of candidate parses, where each candidate parse is a pair of:
A parse result of type a;
The leftovers (the suffix of the input string that was not consumed in that parse).
But notice that this type looks very much like the state monad:
newtype Parser a = Parser { runParser :: String -> [(a, String)] }
newtype State s a = State { runState :: s -> (a, s) }
And this is no accident! Parser monads are nondeterministic state monads, where the state is the unconsumed portion of the input string, and parse steps generate alternatives that may be later rejected in light of further input. List monads are often called "nondeterminism" monads, so it's no surprise that a parser resembles a mix of the state and list monads.
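Following that intuition, join for this Parser can be sketched directly: run the outer parser, then run each parser it produces on the corresponding leftovers, collecting all the alternatives (joinParser is a hypothetical name for illustration):

joinParser :: Parser (Parser a) -> Parser a
joinParser (Parser pp) = Parser $ \s ->
  [ (a, s'') | (Parser p, s') <- pp s, (a, s'') <- p s' ]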
And this intuition can be systematized by using monad transformers. The state monad transformer is defined like this:
newtype StateT s m a = StateT { runStateT :: s -> m (a, s) }
Which means that the Parser type from above can be written like this as well:
type Parser a = StateT String [] a
...and its Monad instance follows mechanically from those of StateT and [].
The IO monad
Imagine we could enumerate all of the possible primitive IO actions, somewhat like this:
{-# LANGUAGE GADTs #-}

data Command a where
  -- An action that writes a char to stdout
  PutChar :: Char -> Command ()
  -- An action that reads a char from stdin
  GetChar :: Command Char
  -- ...
Then we could think of the IO type as this (which I've adapted from the highly-recommended Operational monad tutorial):
data IO a where
  -- An `IO` action that just returns a constant value.
  Return :: a -> IO a
  -- An action that binds the result of a `Command` to
  -- a function that computes the next step after it.
  Bind :: Command x -> (x -> IO a) -> IO a

instance Monad IO where ...
Then join would look like this:
join :: IO (IO a) -> IO a
-- If the action is just `Return`, then its payload already
-- is what we need to return.
join (Return ioa) = ioa
-- If the action is a `Bind`, then its "next step" function
-- `f` produces `IO (IO a)`, so we can just recursively stick
-- a `join` to its result end.
join (Bind cmd f) = Bind cmd (join . f)
So all that join does here is "chase down" the IO action until it sees a result that fits the pattern Return (ma :: IO a), and then strip out the outer Return.
So what did I do here? Just like for parser monads, I just defined (or rather copied) a toy model of the IO type that has the virtue of being transparent. Then I work out the behavior of join from the toy model.

Practical Implications of runST vs unsafePerformIO

I want something like
f :: [forall m. (Mutable v) (PrimState m) r -> m ()] -> v r -> v r -- illegal signature
f gs x = runST $ do
  y <- thaw x
  foldM_ (\_ g -> g y) undefined gs -- you get the idea
  unsafeFreeze y
I'm essentially in the same position I was in this question where Vitus commented:
[I]f you want keep polymorphic functions inside some structure, you need either specialized data type (e.g. newtype I = I (forall a. a -> a)) or ImpredicativeTypes.
Also, see this question. The problem is, these are both really ugly solutions. So I've come up with a third alternative, which is to avoid the polymorphism altogether by running what "should" be a ST computation in IO instead. Thus f becomes:
f :: [(Mutable v) RealWorld r -> IO ()] -> v r -> v r
f gs x = unsafePerformIO $ do
  y <- thaw x
  foldM_ (\_ g -> g y) undefined gs -- you get the idea
  unsafeFreeze y
I feel slightly dirty for going the unsafe IO route compared to the "safe" ST route, but if my alternative is a wrapper or impredicative types... Apparently, I'm not alone.
Is there any reason I shouldn't use unsafePerformIO here? In this case, is it really unsafe at all? Are there performance considerations or anything else I should be aware of?
--------------EDIT----------------
An answer below shows me how to get around this problem altogether, which is great. But I'm still interested in the original question (the implications of runST vs unsafePerformIO when using mutable vectors) for educational purposes.
I can't say I understand the problem statement completely yet, but the following file compiles without error under GHC 7.6.2. It has the same body as your first example (and in particular doesn't call unsafePerformIO at all); the primary difference is that the forall is moved outside of all type constructors.
{-# LANGUAGE RankNTypes #-}
import Control.Monad
import Control.Monad.Primitive (PrimState)
import Control.Monad.ST
import Data.Vector.Generic hiding (foldM_)

f :: Vector v r => (forall m. [Mutable v (PrimState m) r -> m ()]) -> v r -> v r
f gs x = runST $ do
  y <- thaw x
  foldM_ (\_ g -> g y) undefined gs
  unsafeFreeze y
Now let's tackle the ST vs IO question. The reason it's called unsafePerformIO and not unusablePerformIO is because it comes with a proof burden that can't be checked by the compiler: the thing you are running unsafePerformIO on must behave as if it is referentially transparent. Since ST actions come with a (compiler-checked) proof that they behave transparently when executed with runST, this means there is no more danger in using unsafePerformIO on code that would typecheck in ST than there is in using runST.
BUT: there is danger from a software engineering standpoint. Since the proof is no longer compiler-checked, it's much easier for future refactoring to violate the conditions under which it's safe to use unsafePerformIO. So if it is possible to avoid it (as it seems to be here), you should take efforts to do so. (Additionally, "there is no more danger" doesn't mean "there is no danger": the unsafeFreeze call you are making has its own proof burden that you must satisfy; but then you already had to satisfy that proof burden for the ST code to be correct.)

What are the definitions for >>= and return for the IO monad?

After seeing how the List and Maybe monads are defined, I naturally became curious about how
the operations >>= and return are defined for the IO monad.
There is no specific implementation for IO; it's an abstract type, with the exact implementation left undefined by the Haskell Report. Indeed, there's nothing stopping an implementation implementing IO and its Monad instance as compiler primitives, with no Haskell implementation at all.
Basically, Monad is used as an interface to IO, which cannot itself be implemented in pure Haskell. That's probably all you need to know at this stage, and diving into implementation details is likely to just confuse, rather than give insight.
That said, if you look at GHC's source code, you'll find that it represents IO a as a function looking like State# RealWorld -> (# State# RealWorld, a #) (using an unboxed tuple as the return type), but this is misleading; it's an implementation detail, and these State# RealWorld values do not actually exist at runtime. IO is not a state monad,1 in theory or in practice.
Instead, GHC uses impure primitives to implement these IO operations; the State# RealWorld "values" are only to stop the compiler reordering statements by introducing data dependencies from one statement to the next.
But if you really want to see GHC's implementation of return and (>>=), here they are:
returnIO :: a -> IO a
returnIO x = IO $ \ s -> (# s, x #)
bindIO :: IO a -> (a -> IO b) -> IO b
bindIO (IO m) k = IO $ \ s -> case m s of (# new_s, a #) -> unIO (k a) new_s
where unIO simply unwraps the function from inside the IO constructor.
It's important to note that IO a represents a description of an impure computation that could be run to produce a value of type a. The fact that there's a way to get values out of GHC's internal representation of IO doesn't mean that this holds in general, or that you can do such a thing for all monads. It's purely an implementation detail on the part of GHC.
1 The state monad is a monad used for accessing and mutating a state across a series of computations; it's represented as s -> (a, s) (where s is the type of state), which looks very similar to the type GHC uses for IO, thus the confusion.
You will be disappointed, but the >>= in IO monad isn't that interesting. To quote the GHC source:
{- |
A value of type #'IO' a# is a computation which, when performed,
does some I\/O before returning a value of type #a#.
There is really only one way to \"perform\" an I\/O action: bind it to
#Main.main# in your program. When your program is run, the I\/O will
be performed. It isn't possible to perform I\/O from an arbitrary
function, unless that function is itself in the 'IO' monad and called
at some point, directly or indirectly, from #Main.main#.
'IO' is a monad, so 'IO' actions can be combined using either the do-notation
or the '>>' and '>>=' operations from the 'Monad' class.
-}
newtype IO a = IO (State# RealWorld -> (# State# RealWorld, a #))
That means the IO monad is declared as an instance of a State monad over the State# state, and its >>= is defined there (and its implementation is fairly easy to guess).
See IO inside article on Haskell wiki for more details about the IO monad.
It is also helpful to look at the Haskell docs, where every position has small "Source" link on the right.
Update: And there goes another disappointment, which is my answer, because I didn't notice the '#' in State#.
However, IO behaves like a State monad carrying an abstract RealWorld state.
As @ehird wrote, State# is a compiler internal, and >>= for the IO monad is defined in the GHC.Base module:
instance Monad IO where
  {-# INLINE return #-}
  {-# INLINE (>>) #-}
  {-# INLINE (>>=) #-}
  m >> k  = m >>= \ _ -> k
  return  = returnIO
  (>>=)   = bindIO
  fail s  = failIO s

returnIO :: a -> IO a
returnIO x = IO $ \ s -> (# s, x #)

bindIO :: IO a -> (a -> IO b) -> IO b
bindIO (IO m) k = IO $ \ s -> case m s of (# new_s, a #) -> unIO (k a) new_s
They don't do anything special, and are just there for sequencing actions. It would help if you think of them with different names:
>>= becomes "and then, using the result of the previous action,"
>> becomes "and then,"
return becomes "do nothing, but the result of doing nothing is"
This turns this function:
main :: IO ()
main = putStr "hello"
   >>  return " world"
   >>= putStrLn
becomes:
main :: IO ()
main = putStr "hello" and then,
       do nothing, but the result of doing nothing is " world"
       and then, using the result of the previous action, putStrLn
In the end, there's nothing magical about IO. It's exactly as magical as a semicolon is in C.

Why monads? How does it resolve side-effects?

I am learning Haskell and trying to understand Monads. I have two questions:
From what I understand, Monad is just another typeclass that declares ways to interact with data inside "containers", including Maybe, List, and IO. It seems clever and clean to implement these 3 things with one concept, but really, the point is so there can be clean error handling in a chain of functions, containers, and side effects. Is this a correct interpretation?
How exactly is the problem of side-effects solved? With this concept of containers, the language essentially says anything inside the containers is non-deterministic (such as I/O). Because lists and IOs are both containers, lists are equivalence-classed with IO, even though values inside lists seem pretty deterministic to me. So what is deterministic and what has side-effects? I can't wrap my head around the idea that a basic value is deterministic, until you stick it in a container (which is no more special than the same value with some other values next to it, e.g. Nothing) and it can now be random.
Can someone explain how, intuitively, Haskell gets away with changing state with inputs and output? I'm not seeing the magic here.
The point is so there can be clean error handling in a chain of functions, containers, and side effects. Is this a correct interpretation?
Not really. You've mentioned a lot of concepts that people cite when trying to explain monads, including side effects, error handling and non-determinism, but it sounds like you've gotten the incorrect sense that all of these concepts apply to all monads. But there's one concept you mentioned that does: chaining.
There are two different flavors of this, so I'll explain it two different ways: one without side effects, and one with side effects.
No Side Effects:
Take the following example:
addM :: (Monad m, Num a) => m a -> m a -> m a
addM ma mb = do
  a <- ma
  b <- mb
  return (a + b)
This function adds two numbers, with the twist that they are wrapped in some monad. Which monad? Doesn't matter! In all cases, that special do syntax de-sugars to the following:
addM ma mb =
  ma >>= \a ->
  mb >>= \b ->
  return (a + b)
... or, with operator precedence made explicit:
ma >>= (\a -> mb >>= (\b -> return (a + b)))
Now you can really see that this is a chain of little functions, all composed together, and its behavior will depend on how >>= and return are defined for each monad. If you're familiar with polymorphism in object-oriented languages, this is essentially the same thing: one common interface with multiple implementations. It's slightly more mind-bending than your average OOP interface, since the interface represents a computation policy rather than, say, an animal or a shape or something.
Okay, let's see some examples of how addM behaves across different monads. The Identity monad is a decent place to start, since its definition is trivial:
instance Monad Identity where
  return a = Identity a    -- create an Identity value
  (Identity a) >>= f = f a -- apply f to a
So what happens when we say:
addM (Identity 1) (Identity 2)
Expanding this, step by step:
(Identity 1) >>= (\a -> (Identity 2) >>= (\b -> return (a + b)))
(\a -> (Identity 2) >>= (\b -> return (a + b))) 1
(Identity 2) >>= (\b -> return (1 + b))
(\b -> return (1 + b)) 2
return (1 + 2)
Identity 3
Great. Now, since you mentioned clean error handling, let's look at the Maybe monad. Its definition is only slightly trickier than Identity:
instance Monad Maybe where
  return a = Just a       -- same as Identity monad!
  (Just a) >>= f = f a    -- same as Identity monad again!
  Nothing >>= _ = Nothing -- the only real difference from Identity
So you can imagine that if we say addM (Just 1) (Just 2) we'll get Just 3. But for grins, let's expand addM Nothing (Just 1) instead:
Nothing >>= (\a -> (Just 1) >>= (\b -> return (a + b)))
Nothing
Or the other way around, addM (Just 1) Nothing:
(Just 1) >>= (\a -> Nothing >>= (\b -> return (a + b)))
(\a -> Nothing >>= (\b -> return (a + b))) 1
Nothing >>= (\b -> return (1 + b))
Nothing
So the Maybe monad's definition of >>= was tweaked to account for failure. When a function is applied to a Maybe value using >>=, you get what you'd expect.
Okay, so you mentioned non-determinism. Yes, the list monad can be thought of as modeling non-determinism in a sense... It's a little weird, but think of the list as representing alternative possible values: [1, 2, 3] is not a collection, it's a single non-deterministic number that could be either one, two or three. That sounds dumb, but it starts to make some sense when you think about how >>= is defined for lists: it applies the given function to each possible value. So addM [1, 2] [3, 4] is actually going to compute all possible sums of those two non-deterministic values: [4, 5, 5, 6].
Okay, now to address your second question...
Side Effects:
Let's say you apply addM to two values in the IO monad, like:
addM (return 1 :: IO Int) (return 2 :: IO Int)
You don't get anything special, just 3 in the IO monad. addM does not read or write any mutable state, so it's kind of no fun. Same goes for the State or ST monads. No fun. So let's use a different function:
fireTheMissiles :: IO Int -- returns the number of casualties
Clearly the world will be different each time missiles are fired. Clearly. Now let's say you're trying to write some totally innocuous, side effect free, non-missile-firing code. Perhaps you're trying once again to add two numbers, but this time without any monads flying around:
add :: Num a => a -> a -> a
add a b = a + b
and all of a sudden your hand slips, and you accidentally typo:
add a b = a + b + fireTheMissiles
An honest mistake, really. The keys were so close together. Fortunately, because fireTheMissiles was of type IO Int rather than simply Int, the compiler is able to avert disaster.
Okay, totally contrived example, but the point is that in the case of IO, ST and friends, the type system keeps effects isolated to some specific context. It doesn't magically eliminate side effects, making code referentially transparent that shouldn't be, but it does make it clear at compile time what scope the effects are limited to.
So getting back to the original point: what does this have to do with chaining or composition of functions? Well, in this case, it's just a handy way of expressing a sequence of effects:
fireTheMissilesTwice :: IO ()
fireTheMissilesTwice = do
  a <- fireTheMissiles
  print a
  b <- fireTheMissiles
  print b
Summary:
A monad represents some policy for chaining computations. Identity's policy is pure function composition, Maybe's policy is function composition with failure propagation, IO's policy is impure function composition, and so on.
Let me start by pointing at the excellent "You could have invented monads" article. It illustrates how the Monad structure can naturally manifest while you are writing programs. But the tutorial doesn't mention IO, so I will have a stab here at extending the approach.
Let us start with what you probably have already seen - the container monad. Let's say we have:
f, g :: Int -> [Int]
One way of looking at this is that it gives us a number of possible outputs for every possible input. What if we want all possible outputs for the composition of both functions, that is, all the possibilities we could get by applying the functions one after the other?
Well, there's a function for that:
fg x = concatMap g $ f x
Putting this more generally, we get

fg x = f x >>= g

xs >>= f = concatMap f xs
return x = [x]
Why would we want to wrap it like this? Well, writing our programs primarily using >>= and return gives us some nice properties - for example, we can be sure that it's relatively hard to "forget" solutions; we would have to introduce that behaviour explicitly, say by adding another function skip. And we now have a monad, so we can use all the combinators from the monad library!
Now, let us jump to your trickier example. Let's say the two functions are "side-effecting". That's not non-deterministic; it just means that in theory the whole world is both their input (as it can influence them) and their output (as the functions can influence it). So we get something like:
f, g :: Int -> RealWorld# -> (Int, RealWorld#)
If we now want f to get the world that g left behind, we'd write:
fg x rw = let (y, rw')  = f x rw
              (r, rw'') = g y rw'
          in (r, rw'')
Or generalized:
fg x = f x >>= g

x >>= f = \rw -> let (y, rw')  = x rw
                     (r, rw'') = f y rw'
                 in (r, rw'')
return x = \rw -> (x, rw)
Now if the user can only use >>=, return and a few pre-defined IO values we get a nice property again: The user will never actually see the RealWorld# getting passed around! And that is a very good thing, as you aren't really interested in the details of where getLine gets its data from. And again we get all the nice high-level functions from the monad libraries.
So the important things to take away:
The monad captures common patterns in your code, like "always pass all elements of container A to container B" or "pass this real-world-tag through". Often, once you realize that there is a monad in your program, complicated things become simply applications of the right monad combinator.
The monad allows you to completely hide the implementation from the user. It is an excellent encapsulation mechanism, be it for your own internal state or for how IO manages to squeeze non-purity into a pure program in a relatively safe way.
Appendix
In case someone is still scratching his head over RealWorld# as much as I did when I started: There's obviously more magic going on after all the monad abstraction has been removed. Then the compiler will make use of the fact that there can only ever be one "real world". That's good news and bad news:
It follows that the compiler must guarantee execution ordering between functions (which is what we were after!)
But it also means that actually passing the real world isn't necessary as there is only one we could possibly mean: The one that is current when the function gets executed!
Bottom line is that once execution order is fixed, RealWorld# simply gets optimized out. Therefore programs using the IO monad actually have zero runtime overhead. Also note that using RealWorld# is obviously only one possible way to put IO - but it happens to be the one GHC uses internally. The good thing about monads is that, again, the user really doesn't need to know.
You could see a given monad m as a set/family (or realm, domain, etc.) of actions (think of a C statement). The monad m defines the kind of (side-)effects that its actions may have:
with [] you can define actions which can fork their executions in different "independent parallel worlds";
with Either Foo you can define actions which can fail with errors of type Foo;
with IO you can define actions which can have side-effects on the "outside world" (access files, network, launch processes, do a HTTP GET ...);
you can have a monad whose effect is "randomness" (see package MonadRandom);
you can define a monad whose actions can make a move in a game (say chess, Go…) and receive moves from an opponent, but are not able to write to your filesystem or anything else.
Summary
If m is a monad, m a is an action which produces a result/output of type a.
The >> and >>= operators are used to create more complex actions out of simpler ones:
a >> b is a macro-action which does action a and then action b;
a >> a does action a and then action a again;
with >>= the second action can depend on the output of the first one.
The exact meaning of what an action is and what doing an action and then another one is depends on the monad: each monad defines an imperative sublanguage with some features/effects.
Simple sequencing (>>)
Let's say we have a given monad M and some actions incrementCounter, decrementCounter, readCounter:
instance Monad M where ...

-- Modify the counter and do not produce any result:
incrementCounter :: M ()
decrementCounter :: M ()

-- Get the current value of the counter
readCounter :: M Integer
Now we would like to do something interesting with those actions. The first thing we would like to do with them is to sequence them. As in, say, C, we would like to be able to do:
// This is C:
counter++;
counter++;
We define a "sequencing operator" >>. Using this operator we can write:
incrementCounter >> incrementCounter
What is the type of "incrementCounter >> incrementCounter"?
It is an action made of two smaller actions, just as in C you can build compound statements from atomic statements:
// This is a macro statement made of several statements
{
    counter++;
    counter++;
}
// and we can use it anywhere we may use a statement:
if (condition) {
    counter++;
    counter++;
}
it can have the same kind of effects as its subactions;
it does not produce any output/result.
So we would like incrementCounter >> incrementCounter to be of type M (): an (macro-)action with the same kind of possible effects but without any output.
More generally, given two actions:
action1 :: M a
action2 :: M b
we define a >> b as the macro-action obtained by doing (whatever that means in our domain of actions) a then b, which produces as its output the result of the execution of the second action. The type of >> is:
(>>) :: M a -> M b -> M b
or more generally:
(>>) :: (Monad m) => m a -> m b -> m b
We can define bigger sequence of actions from simpler ones:
action1 >> action2 >> action3 >> action4
Input and outputs (>>=)
We would like to be able to increment by something other than 1 at a time:
incrementBy 5
We want to provide some input in our actions, in order to do this we define a function incrementBy taking an Int and producing an action:
incrementBy :: Int -> M ()
Now we can write things like:
incrementCounter >> readCounter >> incrementBy 5
But we have no way to feed the output of readCounter into incrementBy. In order to do this, a slightly more powerful version of our sequencing operator is needed. The >>= operator can feed the output of a given action as input to the next action. We can write:
readCounter >>= incrementBy
It is an action which executes the readCounter action, feeds its output into the incrementBy function, and then executes the resulting action.
The type of >>= is:
(>>=) :: Monad m => m a -> (a -> m b) -> m b
A (partial) example
Let's say I have a Prompt monad which can only display information (text) to the user and ask the user questions:
-- We don't have access to the internal structure of the Prompt monad
module Prompt (Prompt(), echo, prompt) where

-- Opaque
data Prompt a = ...

instance Monad Prompt where ...

-- Display a line to the CLI:
echo :: String -> Prompt ()

-- Ask a question to the user:
prompt :: String -> Prompt String
Let's try to define a promptBoolean message action which asks a question and produces a boolean value.
We use the prompt (message ++ "[y/n]") action and feed its output to a function f:
f "y" should be an action which does nothing but produce True as output;
f "n" should be an action which does nothing but produce False as output;
anything else should restart the action (do the action again);
promptBoolean would look like this:
-- Incomplete version, some bits are missing:
promptBoolean :: String -> Prompt Bool
promptBoolean message = prompt (message ++ "[y/n]") >>= f
  where f result = if result == "y"
                   then ???? -- We need an action which does nothing but produce `True` as output
                   else if result == "n"
                        then ???? -- We need an action which does nothing but produce `False` as output
                        else echo "Input not recognised, try again." >> promptBoolean message
Producing a value without effect (return)
In order to fill in the missing bits in our promptBoolean function, we need a way to represent a dummy action which has no side effect but only outputs a given value:
-- "return 5" is an action which does nothing but outputs 5
return :: (Monad m) => a -> m a
and we can now write out promptBoolean function:
promptBoolean :: String -> Prompt Bool
promptBoolean message = prompt (message ++ "[y/n]") >>= f
  where f result = if result == "y"
                   then return True
                   else if result == "n"
                        then return False
                        else echo "Input not recognised, try again." >> promptBoolean message
By composing those two simple actions (promptBoolean, echo) we can define any kind of dialogue between the user and your program (the actions of the program are deterministic as our monad does not have a "randomness effect").
promptInt :: String -> Prompt Int
promptInt = ... -- similar

-- Classic "guess a number" game/dialogue
guess :: Int -> Prompt ()
guess n = promptInt "Guess:" >>= f
  where f m = if m == n
              then echo "Found"
              else (if m > n
                    then echo "Too big"
                    else echo "Too small") >> guess n
The operations of a monad
A Monad is a set of actions which can be composed with the return and >>= operators:
>>= for action composition;
return for producing a value without any (side-)effect.
These two operators are the minimal operators needed to define a Monad.
In Haskell, the >> operator is needed as well but it can in fact be derived from >>=:
(>>) :: Monad m => m a -> m b -> m b
a >> b = a >>= f
  where f x = b
In Haskell, an extra fail operator is needed as well, but this is really a hack (and it might be removed from Monad in the future).
This is the Haskell definition of a Monad:
class Monad m where
  return :: a -> m a
  (>>=)  :: m a -> (a -> m b) -> m b
  (>>)   :: m a -> m b -> m b -- can be derived from (>>=)
  fail   :: String -> m a    -- mostly a hack
Actions are first-class
One great thing about monads is that actions are first-class. You can put them in variables, and you can define functions which take actions as input and produce other actions as output. For example, we can define a while operator:
-- while x y : does action y while action x outputs True
while :: (Monad m) => m Bool -> m a -> m ()
while x y = x >>= f
  where f True  = y >> while x y
        f False = return ()
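For example, reusing the hypothetical counter actions from earlier, a sketch:

-- Decrement the counter until it reaches zero.
countdown :: M ()
countdown = while (readCounter >>= \n -> return (n > 0)) decrementCounter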
Summary
A Monad is a set of actions in some domain. The monad/domain defines the kind of "effects" which are possible. The >> and >>= operators represent sequencing of actions, and monadic expressions may be used to represent any kind of "imperative (sub)program" in your (functional) Haskell program.
The great things are that:
you can design your own Monad which supports the features and effects that you want
see Prompt for an example of a "dialogue only subprogram",
see Rand for an example of "sampling only subprogram";
you can write your own control structures (while, throw, catch or more exotic ones) as functions taking actions and composing them in some way to produce a bigger macro-actions.
MonadRandom
A good way of understanding monads, is the MonadRandom package. The Rand monad is made of actions whose output can be random (the effect is randomness). An action in this monad is some kind of random variable (or more exactly a sampling process):
-- Sample an Int from some distribution
action :: Rand Int
Using Rand to do some sampling/random algorithms is quite interesting because you have random variables as first class values:
-- Estimate mean by sampling nsamples times the random variable x
sampleMean :: Real a => Int -> m a -> m a
sampleMean n x = ...
In this setting, the sequence function from Prelude,
sequence :: Monad m => [m a] -> m [a]
becomes
sequence :: [Rand a] -> Rand [a]
It creates a random variable obtained by sampling independently from a list of random variables.
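In the actual MonadRandom package, Rand carries an extra generator type g, but the idea is the same. A small sketch against its real API:

import Control.Monad.Random  -- from the MonadRandom package

-- One die roll: a random variable over 1..6
die :: RandomGen g => Rand g Int
die = getRandomR (1, 6)

-- n independent rolls, combined with sequence
dice :: RandomGen g => Int -> Rand g [Int]
dice n = sequence (replicate n die)

main :: IO ()
main = evalRandIO (dice 3) >>= print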
There are three main observations concerning the IO monad:
1) You can't get values out of it. Other types like Maybe might allow you to extract values, but neither the monad class interface itself nor the IO data type allows it.
2) "Inside" IO is not only the real value but also that "RealWorld" thing. This dummy value is used to enforce the chaining of actions by the type system: If you have two independent calculations, the use of >>= makes the second calculation dependent on the first.
3) Assume a non-deterministic thing like random :: () -> Int, which isn't allowed in Haskell. If you change the signature to random :: Blubb -> (Blubb, Int), it is allowed, provided you make sure that nobody can ever use a Blubb twice: because then all inputs are "different", it is no problem that the outputs are different as well.
Now we can use fact 1): nobody can get anything out of IO, so we can use the RealWorld dummy hidden in IO to serve as a Blubb. There is only one IO in the whole application (the one we get from main), and it takes care of proper sequencing, as we have seen in 2). Problem solved.
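To make this concrete, here is a simplified world-passing model of IO (a sketch only; GHC's real definition uses an unboxed State# RealWorld token, and the names IO' and bindIO are mine):

-- Simplified world-passing model of IO (not GHC's actual definition).
newtype IO' a = IO' (RealWorld -> (RealWorld, a))
data RealWorld = RealWorld   -- the opaque "Blubb": never inspected, never reused

bindIO :: IO' a -> (a -> IO' b) -> IO' b
bindIO (IO' m) k = IO' $ \w ->
  let (w', x) = m w       -- the first action consumes the world token...
      IO' m'  = k x
  in  m' w'               -- ...and hands the new token to the second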
One thing that often helps me to understand the nature of something is to examine it in the most trivial way possible. That way, I'm not getting distracted by potentially unrelated concepts. With that in mind, I think it may be helpful to understand the nature of the Identity Monad, as it's the most trivial implementation of a Monad possible (I think).
What is interesting about the Identity Monad? I think it is that it allows me to express the idea of evaluating expressions in a context defined by other expressions. And to me, that is the essence of every Monad I've encountered (so far).
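For reference, the trivial implementation really is this small (it mirrors Data.Functor.Identity from base; redefining it is purely for illustration):

-- The Identity monad: a context that adds nothing at all.
newtype Identity a = Identity { runIdentity :: a }

instance Functor Identity where
  fmap f (Identity x) = Identity (f x)

instance Applicative Identity where
  pure = Identity
  Identity f <*> Identity x = Identity (f x)

instance Monad Identity where
  return = pure
  Identity x >>= f = f x   -- just apply f; no extra effect or context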
If you already had a lot of exposure to 'mainstream' programming languages before learning Haskell (like I did), then this doesn't seem very interesting at all. After all, in a mainstream programming language, statements are executed in sequence, one after the other (excepting control-flow constructs, of course). And naturally, we can assume that every statement is evaluated in the context of all previously executed statements and that those previously executed statements may alter the environment and the behavior of the currently executing statement.
All of that is pretty much a foreign concept in a functional, lazy language like Haskell. The order in which computations are evaluated in Haskell is well-defined, but sometimes hard to predict, and even harder to control. And for many kinds of problems, that's just fine. But other sorts of problems (e.g. IO) are hard to solve without some convenient way to establish an implicit order and context between the computations in your program.
As far as side-effects go, specifically, often they can be transformed (via a Monad) into simple state-passing, which is perfectly legal in a pure functional language. Some Monads don't seem to be of that nature, however. Monads such as the IO Monad or the ST monad literally perform side-effecting actions. There are many ways to think about this, but one way that I think about it is this: even though my computations must exist in a world without side-effects, the Monad need not. As such, the Monad is free to establish a context for my computation to execute in that is based on side-effects defined by other computations.
Finally, I must disclaim that I am definitely not a Haskell expert. As such, please understand that everything I've said is pretty much my own thoughts on this subject and I may very well disown them later when I understand Monads more fully.
the point is so there can be clean error handling in a chain of functions, containers, and side effects
More or less.
how exactly is the problem of side-effects solved?
A value in the I/O monad, i.e. one of type IO a, should be interpreted as a program. p >> q on IO values can then be interpreted as the operator that combines two programs into one that first executes p, then q. The other monad operators have similar interpretations. By assigning a program to the name main, you declare to the compiler that that is the program that has to be executed by its output object code.
As for the list monad, it's not really related to the I/O monad except in a very abstract mathematical sense. The IO monad gives deterministic computation with side effects, while the list monad gives non-deterministic (but not random!) backtracking search, somewhat similar to Prolog's modus operandi.
With this concept of containers, the language essentially says anything inside the containers is non-deterministic
No. Haskell is deterministic. If you ask for integer addition 2+2 you will always get 4.
"Nondeterministic" is only a metaphor, a way of thinking. Everything is deterministic under the hood. If you have this code:
do x <- [4,5]
   y <- [0,1]
   return (x+y)
it is roughly equivalent to Python code
l = []
for x in [4,5]:
    for y in [0,1]:
        l.append(x+y)
You see nondeterminism here? No, it's deterministic construction of a list. Run it twice, you'll get the same numbers in the same order.
You can describe it this way: Choose arbitrary x from [4,5]. Choose arbitrary y from [0,1]. Return x+y. Collect all possible results.
That way seems to involve nondeterminism, but it's only a nested loop (list comprehension). There is no "real" nondeterminism here, it's simulated by checking all possibilities. Nondeterminism is an illusion. The code only appears to be nondeterministic.
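For comparison, the do-block above desugars into plain >>= applications, which makes the "nested loop" reading explicit:

-- Same computation as the do-block, desugared:
pairs :: [Int]
pairs = [4,5] >>= \x ->
        [0,1] >>= \y ->
        return (x + y)
-- pairs == [4,5,5,6], in the same order on every run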
This code using State monad:
do put 0
   x <- get
   put (x+2)
   y <- get
   return (y+3)
gives 5 and seems to involve changing state. As with lists, it's an illusion. There are no "variables" that change (as in imperative languages). Everything is immutable under the hood.
You can describe the code this way: put 0 into the variable. Read the value of the variable into x. Put (x+2) into the variable. Read the variable into y, and return y+3.
That way seems to involve state, but it's only composing functions passing additional parameter. There is no "real" mutability here, it's simulated by composition. Mutability is an illusion. The code only appears to be using it.
Haskell does it this way: you've got functions
a -> s -> (b,s)
This function takes an old value of the state and returns a new one. It does not involve mutability or changing variables. It's a function in the mathematical sense.
For example, the function "put" takes the new value of the state, ignores the current state and returns the new state:
put x _ = ((), x)
Just like you can compose two normal functions
a -> b
b -> c
into
a -> c
using the (.) operator, you can compose "state" transformers
a -> s -> (b,s)
b -> s -> (c,s)
into a single function
a -> s -> (c,s)
Try writing the composition operator yourself. This is what really happens: there are no "side effects", only arguments being passed to functions.
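One possible answer to that exercise (a sketch; the names StateFn and composeState are mine):

-- A state transformer: takes an input and an old state,
-- returns a result and a new state.
type StateFn s a b = a -> s -> (b, s)

-- Compose two transformers by threading the state from
-- the first into the second.
composeState :: StateFn s a b -> StateFn s b c -> StateFn s a c
composeState f g = \a s -> let (b, s') = f a s
                           in  g b s'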
From what I understand, Monad is just another typeclass that declares ways to interact with data [...]
...providing an interface common to all those types which have an instance. This can then be used to provide generic definitions which work across all monadic types.
It seems clever and clean to implement these 3 things with one concept [...]
...the only three things that are implemented are the instances for those three types (list, Maybe and IO) - the types themselves are defined independently elsewhere.
[...] but really, the point is so there can be clean error handling in a chain of functions, containers, and side effects.
Not just error handling; consider ST, for example: without the monadic interface, you would have to pass the encapsulated state around directly and correctly... a tiresome task.
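For example, with ST the monadic interface threads the encapsulated state for you, and runST keeps the mutation internal (a standard sketch using Control.Monad.ST and Data.STRef):

import Control.Monad.ST
import Data.STRef

-- A pure function that uses real mutation internally;
-- runST guarantees the mutable state cannot leak out.
sumST :: [Int] -> Int
sumST xs = runST $ do
  ref <- newSTRef 0
  mapM_ (\x -> modifySTRef' ref (+ x)) xs
  readSTRef ref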
How exactly is the problem of side-effects solved?
Short answer: Haskell manages them by using types to indicate their presence.
Can someone explain how, intuitively, Haskell gets away with changing state with inputs and output?
"Intuitively"...like what's available over here? Let's try a simple direct comparison instead:
From How to Declare an Imperative by Philip Wadler:
(* page 26 *)
type 'a io = unit -> 'a

infix >>=
val >>= : 'a io * ('a -> 'b io) -> 'b io
fun m >>= k = fn () => let
                           val x = m ()
                           val y = k x ()
                       in
                           y
                       end

val return : 'a -> 'a io
fun return x = fn () => x

val putc : char -> unit io
fun putc c = fn () => putcML c

val getc : char io
val getc = fn () => getcML ()

fun getcML () =
    valOf(TextIO.input1(TextIO.stdIn))

(* page 25 *)
fun putcML c =
    TextIO.output1(TextIO.stdOut,c)
Based on these two answers of mine, this is my Haskell translation:
type IO a = OI -> a

(>>=)    :: IO a -> (a -> IO b) -> IO b
m >>= k  =  \ u -> let !(u1, u2) = partOI u in
                   let !x = m u1 in
                   let !y = k x u2 in
                   y

return   :: a -> IO a
return x =  \ u -> let !_ = partOI u in x

putc     :: Char -> IO ()
putc c   =  \ u -> putcOI c u

getc     :: IO Char
getc     =  \ u -> getcOI u

-- primitives
data OI
partOI   :: OI -> (OI, OI)
putcOI   :: Char -> OI -> ()
getcOI   :: OI -> Char
Now remember that short answer about side-effects?
Haskell manages them by using types to indicate their presence.
Data.Char.chr :: Int -> Char -- no side effects
getChar :: IO Char -- side effects at
{- :: OI -> Char -} -- work: beware!
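To see those types at work, here is a hypothetical snippet in the same OI style (echoTwice is my name, not part of the translation above):

-- Hypothetical: read one character and write it twice, in the OI model.
-- Each primitive consumes its own OI fragment, so the ordering comes
-- from data dependencies, with no hidden world-state to mismanage.
echoTwice :: OI -> ()
echoTwice u = let !(u1, u')  = partOI u
                  !(u2, u3)  = partOI u'
                  !c         = getcOI u1
                  !_         = putcOI c u2
              in putcOI c u3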

Resources