Haskell: Why do the Maybe and Either types behave differently when used as Monads?

Haskell: Why do the Maybe and Either types behave differently when used as Monads? - haskell

I'm trying to get my head around error handling in Haskell. I've found the article "8 ways to report errors in Haskell" but I'm confused as to why Maybe and Either behave differently.
For example:
import Control.Monad.Error
myDiv :: (Monad m) => Float -> Float -> m Float
myDiv x 0 = fail "My divison by zero"
myDiv x y = return (x / y)
testMyDiv1 :: Float -> Float -> String
testMyDiv1 x y =
case myDiv x y of
Left e -> e
Right r -> show r
testMyDiv2 :: Float -> Float -> String
testMyDiv2 x y =
case myDiv x y of
Nothing -> "An error"
Just r -> show r
Calling testMyDiv2 1 0 gives a result of "An error", but calling testMyDiv1 1 0 gives:
"*** Exception: My divison by zero
(Note the lack of closing quote, indicating this isn't a string but an exception).
What gives?

The short answer is that the Monad class in Haskell adds the fail operation to the original mathematical idea of monads, which makes it somewhat controversial how to make the Either type into a (Haskell) Monad, because there are many ways to do it.
There are several implementations floating around that do different things. The 3 basic approaches that I'm aware of are:
fail = Left. This seems to be what most people expect, but it actually can't be done in strict Haskell 98. The instance would have to be declared as instance Monad (Either String), which is not legal under H98 because it mentions a particular type for one of Eithers parameters (in GHC, the FlexibleInstances extension would cause the compiler to accept it).
Ignore fail, using the default implementation which just calls error. This is what's happening in your example. This version has the advantage of being H98 compliant, but the disadvantage of being rather surprising to the user (with the surprise coming at runtime).
The fail implementation calls some other class to convert a String into whatever type. This is done in MTL's Control.Monad.Error module, which declares instance Error e => Monad (Either e). In this implementation, fail msg = Left (strMsg msg). This one is again legal H98, and again occasionally surprising to users because it introduces another type class. In contrast to the last example though, the surprise comes at compile time.

I'm guessing you're using monads-fd.
$ ghci t.hs -hide-package mtl
*Main Data.List> testMyDiv1 1 0
"*** Exception: My divison by zero
*Main Data.List> :i Either
...
instance Monad (Either e) -- Defined in Control.Monad.Trans.Error
...
Looking in the transformers package, which is where monads-fd gets the instance, we see:
instance Monad (Either e) where
return = Right
Left l >>= _ = Left l
Right r >>= k = k r
So, no definition for Fail what-so-ever. In general, fail is discouraged as it isn't always guaranteed to fail cleanly in a monad (many people would like to see fail removed from the Monad class).
EDIT: I should add that it certainly isn't clear fail was intentioned to be left as the default error call. A ping to haskell-cafe or the maintainer might be worth while.
EDIT2: The mtl instance has been moved to base, this move includes removing the definition of fail = Left and discussion as to why that decision was made. Presumably, they want people to use ErrorT more when monads fail, thus reserving fail for something more catastrophic situations like bad pattern matches (ex: Just x <- e where e -->* m Nothing).

Related

Pattern match over non-constructor functions

One of the most powerful ways pattern matching and lazy evaluation can come together is to bypass expensive computation. However I am still shocked that Haskell only permits the pattern matching of constructors, which is barely pattern matching at all!
Is there some way to impliment the following functionality in Haskell:
exp :: Double -> Double
exp 0 = 1
exp (log a) = a
--...
log :: Double -> Double
log 1 = 0
log (exp a) = a
--...
The original problem I found this useful in was writing an associativity preference / rule in a Monoid class:
class Monoid m where
iden :: m
(+) m -> m -> m
(+) iden a = a
(+) a iden = a
--Line with issue
(+) ((+) a b) c = (+) a ((+) b c)

There's no reason to be shocked about this. How would it be even remotely feasible to pattern match on arbitrary functions? Most functions aren't invertible, and even for those that are it is typically nontrivial to actually compute the inverses.
Of course the compiler could in principle handle trivial examples like replacing literal exp (log x) with x, but that would be almost completely useless in practice (in the unlikely event somebody were to literally write that, they could as well reduce it right there in the source), and would generally lead to very weird unpredictable behaviour if inlining order changes whether or not the compiler can see that a match applies.
(There is however a thing called rewrite rules, which is similar to what you proposed but is seen as only an optimisation tool.)
Even the two lines from the Monoid class that don't error don't make sense, but for different reasons. First, when you write
(+) iden a = a
(+) a iden = a
this doesn't do what you seem to think. These are actually two redundant catch-call clauses, equivalent to
(+) x y = y
(+) x y = x
...which is an utterly nonsensical thing to write. What you want to state could in fact be written as
default (+) :: Eq a => a -> a -> a
x+y
| x==iden = y
| y==iden = x
| otherwise = ...
...but this still doesn't accomplish anything useful, because this is never going to be a full definition of +. And as soon as a concrete instance even begins to define its own + operator, the complete default one is going to be ignored.
Moreover, if you were to have these kind of clauses all over your Haskell project it would in practice just mean your performing a lot of unnecessary, redundant extra checks. A law-abiding Monoid instance needs to fulfill mempty <> a ≡ a anyway, no point explicitly special-casing it.
I think what you really want is tests. It would make sense to specify laws right in a class declaration in a way that they could automatically be checked, but standard Haskell has no syntax for this. Most projects just do it in a separate test suite, using QuickCheck to generate example inputs. I think there's also a tool that allow you to put the test cases right in your source file, but I forgot what it's called.

How to preserve information when failing?

I'm writing some code that uses the StateT monad transformer to keep track of some stateful information (logging and more).
The monad I'm passing to StateT is very simple:
data CheckerError a = Bad {errorMessage :: Log} | Good a
deriving (Eq, Show)
instance Monad CheckerError where
return x = Good x
fail msg = Bad msg
(Bad msg) >>= f = Bad msg
(Good x) >>= f = f x
type CheckerMonad a = StateT CheckerState CheckerError a
It's just a Left and Right variant.
What troubles me is the definition of fail. In my computation I produce a lot of information inside this monad and I'd like to keep this information even when failing.
Currently the only thing I can do is to convert everything to a String and create a Bad instance with the String passed as argument to fail.
What I'd like to do is something like:
fail msg = do
info <- getInfoOutOfTheComputation
return $ Bad info
However everything I tried until now gives type errors, probably because this would mix different monads.
Is there anyway in which I can implement fail in order to preserve the information I need without having to convert all of it into a String?
I cannot believe that the best Haskell can achieve is using show+read to pass all the information as the string to fail.

Your CheckerError monad is very similar to the Either monad. I will use the Either monad (and its monad transformer counterpart ErrorT) in my answer.
There is a subtlety with monad trasformers: order matters. Effects in the "inner" monad have primacy over effects caused by the "outer" layers. Consider these two alternative definitions of CheckerMonad:
import Control.Monad.State
import Control.Monad.Error
type CheckerState = Int -- dummy definitions for convenience
type CheckerError = String
type CheckerMonad a = StateT CheckerState (Either String) a
type CheckerMonad' a = ErrorT String (State CheckerState) a
In CheckerMonad, Either is the inner monad, and this means a failure will wipe the whole state. Notice the type of this run function:
runCM :: CheckerMonad a -> CheckerState -> Either CheckerError (a,CheckerState)
runCM m s = runStateT m s
You either fail, or return a result along with the state up to that point.
In CheckerMonad', State is the inner monad. This means the state will be preserved even in case of failures:
runCM' :: CheckerMonad' a -> CheckerState -> (Either CheckerError a,CheckerState)
runCM' m s = runState (runErrorT m) s
A pair is returned, which contains the state up to that point, and either a failure or a result.
It takes a bit of practice to develop an intuition of how to properly order monad transformers. The chart in the Type juggling section of this Wikibook page is a good starting point.
Also, it is better to avoid using fail directly, because it is considered a bit of a wart in the language. Instead, use the specialized functions for throwing errors provided by the error transformer. When working with ErrorT or some other instance of MonadError, use throwError.
sillycomp :: CheckerMonad' Bool
sillycomp = do
modify (+1)
s <- get
if s == 3
then throwError "boo"
else return True
*Main> runCM' sillycomp 2
Loading package transformers-0.3.0.0 ... linking ... done.
Loading package mtl-2.1.2 ... linking ... done.
(Left "boo",3)
*Main> runCM' sillycomp 3
(Right True,4)
ErrorT is sometimes annoying to use because, unlike Either, it requires an Error constraint on the error type. The Error typeclass forces you to define two error constructors noMsg and strMsg, which may or may not make sense for your type.
You can use EitherT from the either package instead, which lets you use any type whatsoever as the error. When working with EitherT, use the left function to throw errors.

Can GADTs be used to prove type inequalities in GHC?

So, in my ongoing attempts to half-understand Curry-Howard through small Haskell exercises, I've gotten stuck at this point:
{-# LANGUAGE GADTs #-}
import Data.Void
type Not a = a -> Void
-- | The type of type equality proofs, which can only be instantiated if a = b.
data Equal a b where
Refl :: Equal a a
-- | Derive a contradiction from a putative proof of #Equal Int Char#.
intIsNotChar :: Not (Equal Int Char)
intIsNotChar intIsChar = ???
Clearly the type Equal Int Char has no (non-bottom) inhabitants, and thus semantically there ought to be an absurdEquality :: Equal Int Char -> a function... but for the life of me I can't figure out any way to write one other than using undefined.
So either:
I'm missing something, or
There is some limitation of the language that makes this an impossible task, and I haven't managed to understand what it is.
I suspect the answer is something like this: the compiler is unable to exploit the fact that there are no Equal constructors that don't have a = b. But if that is so, what makes it true?

Here's a shorter version of Philip JF's solution, which is the way dependent type theorists have been refuting equations for years.
type family Discriminate x
type instance Discriminate Int = ()
type instance Discriminate Char = Void
transport :: Equal a b -> Discriminate a -> Discriminate b
transport Refl d = d
refute :: Equal Int Char -> Void
refute q = transport q ()
In order to show that things are different, you have to catch them behaving differently by providing a computational context which results in distinct observations. Discriminate provides exactly such a context: a type-level program which treats the two types differently.
It is not necessary to resort to undefined to solve this problem. Total programming sometimes involves rejecting impossible inputs. Even where undefined is available, I would recommend not using it where a total method suffices: the total method explains why something is impossible and the typechecker confirms; undefined merely documents your promise. Indeed, this method of refutation is how Epigram dispenses with "impossible cases" whilst ensuring that a case analysis covers its domain.
As for computational behaviour, note that refute, via transport is necessarily strict in q and that q cannot compute to head normal form in the empty context, simply because no such head normal form exists (and because computation preserves type, of course). In a total setting, we'd be sure that refute would never be invoked at run time. In Haskell, we're at least certain that its argument will diverge or throw an exception before we're obliged to respond to it. A lazy version, such as
absurdEquality e = error "you have a type error likely to cause big problems"
will ignore the toxicity of e and tell you that you have a type error when you don't. I prefer
absurdEquality e = e `seq` error "sue me if this happens"
if the honest refutation is too much like hard work.

I don't understand the problem with using undefined every type is inhabited by bottom in Haskell. Our language is not strongly normalizing... You are looking for the wrong thing. Equal Int Char leads to type errors not nice well kept exceptions. See
{-# LANGUAGE GADTs, TypeFamilies #-}
data Equal a b where
Refl :: Equal a a
type family Pick cond a b
type instance Pick Char a b = a
type instance Pick Int a b = b
newtype Picker cond a b = Picker (Pick cond a b)
pick :: b -> Picker Int a b
pick = Picker
unpick :: Picker Char a b -> a
unpick (Picker x) = x
samePicker :: Equal t1 t2 -> Picker t1 a b -> Picker t2 a b
samePicker Refl x = x
absurdCoerce :: Equal Int Char -> a -> b
absurdCoerce e x = unpick (samePicker e (pick x))
you could use this to create the function you want
absurdEquality e = absurdCoerce e ()
but that will produce undefined behavior as its computation rule. false should cause programs to abort, or at the very least run for ever. Aborting is the computation rule that is akin to turning minimal logic into intiutionistic logic by adding not. The correct definition is
absurdEquality e = error "you have a type error likely to cause big problems"
as to the question in the title: essentially no. To the best of my knowledge, type inequality is not representable in a practical way in current Haskell. Coming changes to the type system may lead to this getting nicer, but as of right now, we have equalities but not inequalites.

Are newtypes faster than enumerations?

According to this article,
Enumerations don't count as single-constructor types as far as GHC is concerned, so they don't benefit from unpacking when used as strict constructor fields, or strict function arguments. This is a deficiency in GHC, but it can be worked around.
And instead the use of newtypes is recommended. However, I cannot verify this with the following code:
{-# LANGUAGE MagicHash,BangPatterns #-}
{-# OPTIONS_GHC -O2 -funbox-strict-fields -rtsopts -fllvm -optlc --x86-asm-syntax=intel #-}
module Main(main,f,g)
where
import GHC.Base
import Criterion.Main
data D = A | B | C
newtype E = E Int deriving(Eq)
f :: D -> Int#
f z | z `seq` False = 3422#
f z = case z of
A -> 1234#
B -> 5678#
C -> 9012#
g :: E -> Int#
g z | z `seq` False = 7432#
g z = case z of
(E 0) -> 2345#
(E 1) -> 6789#
(E 2) -> 3535#
f' x = I# (f x)
g' x = I# (g x)
main :: IO ()
main = defaultMain [ bench "f" (whnf f' A)
, bench "g" (whnf g' (E 0))
]
Looking at the assembly, the tags for each constructor of the enumeration D is actually unpacked and directly hard-coded in the instruction. Furthermore, the function f lacks error-handling code, and more than 10% faster than g. In a more realistic case I have also experienced a slowdown after converting a enumeration to a newtype. Can anyone give me some insight about this? Thanks.

It depends on the use case. For the functions you have, it's expected that the enumeration performs better. Basically, the three constructors of D become Ints resp. Int#s when the strictness analysis allows that, and GHC knows it's statically checked that the argument can only have one of the three values 0#, 1#, 2#, so it needs not insert error handling code for f. For E, the static guarantee of only one of three values being possible isn't given, so it needs to add error handling code for g, that slows things down significantly. If you change the definition of g so that the last case becomes
E _ -> 3535#
the difference vanishes completely or almost completely (I get a 1% - 2% better benchmark for f still, but I haven't done enough testing to be sure whether that's a real difference or an artifact of benchmarking).
But this is not the use case the wiki page is talking about. What it's talking about is unpacking the constructors into other constructors when the type is a component of other data, e.g.
data FooD = FD !D !D !D
data FooE = FE !E !E !E
Then, if compiled with -funbox-strict-fields, the three Int#s can be unpacked into the constructor of FooE, so you'd basically get the equivalent of
struct FooE {
long x, y, z;
};
while the fields of FooD have the multi-constructor type D and cannot be unpacked into the constructor FD(1), so that would basically give you
struct FooD {
long *px, *py, *pz;
}
That can obviously have significant impact.
I'm not sure about the case of single-constructor function arguments. That has obvious advantages for types with contained data, like tuples, but I don't see how that would apply to plain enumerations, where you just have a case and splitting off a worker and a wrapper makes no sense (to me).
Anyway, the worker/wrapper transformation isn't so much a single-constructor thing, constructor specialisation can give the same benefit to types with few constructors. (For how many constructors specialisations would be created depends on the value of -fspec-constr-count.)
(1) That might have changed, but I doubt it. I haven't checked it though, so it's possible the page is out of date.

I would guess that GHC has changed quite a bit since that page was last updated in 2008. Also, you're using the LLVM backend, so that's likely to have some effect on performance as well. GHC can (and will, since you've used -O2) strip any error handling code from f, because it knows statically that f is total. The same cannot be said for g. I would guess that it's the LLVM backend that then unpacks the constructor tags in f, because it can easily see that there is nothing else used by the branching condition. I'm not sure of that, though.

Why monads? How does it resolve side-effects?

I am learning Haskell and trying to understand Monads. I have two questions:
From what I understand, Monad is just another typeclass that declares ways to interact with data inside "containers", including Maybe, List, and IO. It seems clever and clean to implement these 3 things with one concept, but really, the point is so there can be clean error handling in a chain of functions, containers, and side effects. Is this a correct interpretation?
How exactly is the problem of side-effects solved? With this concept of containers, the language essentially says anything inside the containers is non-deterministic (such as i/o). Because lists and IOs are both containers, lists are equivalence-classed with IO, even though values inside lists seem pretty deterministic to me. So what is deterministic and what has side-effects? I can't wrap my head around the idea that a basic value is deterministic, until you stick it in a container (which is no special than the same value with some other values next to it, e.g. Nothing) and it can now be random.
Can someone explain how, intuitively, Haskell gets away with changing state with inputs and output? I'm not seeing the magic here.

The point is so there can be clean error handling in a chain of functions, containers, and side effects. Is this a correct interpretation?
Not really. You've mentioned a lot of concepts that people cite when trying to explain monads, including side effects, error handling and non-determinism, but it sounds like you've gotten the incorrect sense that all of these concepts apply to all monads. But there's one concept you mentioned that does: chaining.
There are two different flavors of this, so I'll explain it two different ways: one without side effects, and one with side effects.
No Side Effects:
Take the following example:
addM :: (Monad m, Num a) => m a -> m a -> m a
addM ma mb = do
a <- ma
b <- mb
return (a + b)
This function adds two numbers, with the twist that they are wrapped in some monad. Which monad? Doesn't matter! In all cases, that special do syntax de-sugars to the following:
addM ma mb =
ma >>= \a ->
mb >>= \b ->
return (a + b)
... or, with operator precedence made explicit:
ma >>= (\a -> mb >>= (\b -> return (a + b)))
Now you can really see that this is a chain of little functions, all composed together, and its behavior will depend on how >>= and return are defined for each monad. If you're familiar with polymorphism in object-oriented languages, this is essentially the same thing: one common interface with multiple implementations. It's slightly more mind-bending than your average OOP interface, since the interface represents a computation policy rather than, say, an animal or a shape or something.
Okay, let's see some examples of how addM behaves across different monads. The Identity monad is a decent place to start, since its definition is trivial:
instance Monad Identity where
return a = Identity a -- create an Identity value
(Identity a) >>= f = f a -- apply f to a
So what happens when we say:
addM (Identity 1) (Identity 2)
Expanding this, step by step:
(Identity 1) >>= (\a -> (Identity 2) >>= (\b -> return (a + b)))
(\a -> (Identity 2) >>= (\b -> return (a + b)) 1
(Identity 2) >>= (\b -> return (1 + b))
(\b -> return (1 + b)) 2
return (1 + 2)
Identity 3
Great. Now, since you mentioned clean error handling, let's look at the Maybe monad. Its definition is only slightly trickier than Identity:
instance Monad Maybe where
return a = Just a -- same as Identity monad!
(Just a) >>= f = f a -- same as Identity monad again!
Nothing >>= _ = Nothing -- the only real difference from Identity
So you can imagine that if we say addM (Just 1) (Just 2) we'll get Just 3. But for grins, let's expand addM Nothing (Just 1) instead:
Nothing >>= (\a -> (Just 1) >>= (\b -> return (a + b)))
Nothing
Or the other way around, addM (Just 1) Nothing:
(Just 1) >>= (\a -> Nothing >>= (\b -> return (a + b)))
(\a -> Nothing >>= (\b -> return (a + b)) 1
Nothing >>= (\b -> return (1 + b))
Nothing
So the Maybe monad's definition of >>= was tweaked to account for failure. When a function is applied to a Maybe value using >>=, you get what you'd expect.
Okay, so you mentioned non-determinism. Yes, the list monad can be thought of as modeling non-determinism in a sense... It's a little weird, but think of the list as representing alternative possible values: [1, 2, 3] is not a collection, it's a single non-deterministic number that could be either one, two or three. That sounds dumb, but it starts to make some sense when you think about how >>= is defined for lists: it applies the given function to each possible value. So addM [1, 2] [3, 4] is actually going to compute all possible sums of those two non-deterministic values: [4, 5, 5, 6].
Okay, now to address your second question...
Side Effects:
Let's say you apply addM to two values in the IO monad, like:
addM (return 1 :: IO Int) (return 2 :: IO Int)
You don't get anything special, just 3 in the IO monad. addM does not read or write any mutable state, so it's kind of no fun. Same goes for the State or ST monads. No fun. So let's use a different function:
fireTheMissiles :: IO Int -- returns the number of casualties
Clearly the world will be different each time missiles are fired. Clearly. Now let's say you're trying to write some totally innocuous, side effect free, non-missile-firing code. Perhaps you're trying once again to add two numbers, but this time without any monads flying around:
add :: Num a => a -> a -> a
add a b = a + b
and all of a sudden your hand slips, and you accidentally typo:
add a b = a + b + fireTheMissiles
An honest mistake, really. The keys were so close together. Fortunately, because fireTheMissiles was of type IO Int rather than simply Int, the compiler is able to avert disaster.
Okay, totally contrived example, but the point is that in the case of IO, ST and friends, the type system keeps effects isolated to some specific context. It doesn't magically eliminate side effects, making code referentially transparent that shouldn't be, but it does make it clear at compile time what scope the effects are limited to.
So getting back to the original point: what does this have to do with chaining or composition of functions? Well, in this case, it's just a handy way of expressing a sequence of effects:
fireTheMissilesTwice :: IO ()
fireTheMissilesTwice = do
a <- fireTheMissiles
print a
b <- fireTheMissiles
print b
Summary:
A monad represents some policy for chaining computations. Identity's policy is pure function composition, Maybe's policy is function composition with failure propogation, IO's policy is impure function composition and so on.

Let me start by pointing at the excellent "You could have invented monads" article. It illustrates how the Monad structure can naturally manifest while you are writing programs. But the tutorial doesn't mention IO, so I will have a stab here at extending the approach.
Let us start with what you probably have already seen - the container monad. Let's say we have:
f, g :: Int -> [Int]
One way of looking at this is that it gives us a number of possible outputs for every possible input. What if we want all possible outputs for the composition of both functions? Giving all possibilities we could get by applying the functions one after the other?
Well, there's a function for that:
fg x = concatMap g $ f x
If we put this more general, we get
fg x = f x >>= g
xs >>= f = concatMap f xs
return x = [x]
Why would we want to wrap it like this? Well, writing our programs primarily using >>= and return gives us some nice properties - for example, we can be sure that it's relatively hard to "forget" solutions. We'd explicitly have to reintroduce it, say by adding another function skip. And also we now have a monad and can use all combinators from the monad library!
Now, let us jump to your trickier example. Let's say the two functions are "side-effecting". That's not non-deterministic, it just means that in theory the whole world is both their input (as it can influence them) as well as their output (as the function can influence it). So we get something like:
f, g :: Int -> RealWorld# -> (Int, RealWorld#)
If we now want f to get the world that g left behind, we'd write:
fg x rw = let (y, rw') = f x rw
(r, rw'') = g y rw'
in (r, rw'')
Or generalized:
fg x = f x >>= g
x >>= f = \rw -> let (y, rw') = x rw
(r, rw'') = f y rw'
in (r, rw'')
return x = \rw -> (x, rw)
Now if the user can only use >>=, return and a few pre-defined IO values we get a nice property again: The user will never actually see the RealWorld# getting passed around! And that is a very good thing, as you aren't really interested in the details of where getLine gets its data from. And again we get all the nice high-level functions from the monad libraries.
So the important things to take away:
The monad captures common patterns in your code, like "always pass all elements of container A to container B" or "pass this real-world-tag through". Often, once you realize that there is a monad in your program, complicated things become simply applications of the right monad combinator.
The monad allows you to completely hide the implementation from the user. It is an excellent encapsulation mechanism, be it for your own internal state or for how IO manages to squeeze non-purity into a pure program in a relatively safe way.
Appendix
In case someone is still scratching his head over RealWorld# as much as I did when I started: There's obviously more magic going on after all the monad abstraction has been removed. Then the compiler will make use of the fact that there can only ever be one "real world". That's good news and bad news:
It follows that the compiler must guarantuee execution ordering between functions (which is what we were after!)
But it also means that actually passing the real world isn't necessary as there is only one we could possibly mean: The one that is current when the function gets executed!
Bottom line is that once execution order is fixed, RealWorld# simply gets optimized out. Therefore programs using the IO monad actually have zero runtime overhead. Also note that using RealWorld# is obviously only one possible way to put IO - but it happens to be the one GHC uses internally. The good thing about monads is that, again, the user really doesn't need to know.

You could see a given monad m as a set/family (or realm, domain, etc.) of actions (think of a C statement). The monad m defines the kind of (side-)effects that its actions may have:
with [] you can define actions which can fork their executions in different "independent parallel worlds";
with Either Foo you can define actions which can fail with errors of type Foo;
with IO you can define actions which can have side-effects on the "outside world" (access files, network, launch processes, do a HTTP GET ...);
you can have a monad whose effect is "randomness" (see package MonadRandom);
you can define a monad whose actions can make a move in a game (say chess, Go…) and receive move from an opponent but are not able to write to your filesystem or anything else.
Summary
If m is a monad, m a is an action which produces a result/output of type a.
The >> and >>= operators are used to create more complex actions out of simpler ones:
a >> b is a macro-action which does action a and then action b;
a >> a does action a and then action a again;
with >>= the second action can depend on the output of the first one.
The exact meaning of what an action is and what doing an action and then another one is depends on the monad: each monad defines an imperative sublanguage with some features/effects.
Simple sequencing (>>)
Let's say with have a given monad M and some actions incrementCounter, decrementCounter, readCounter:
instance M Monad where ...
-- Modify the counter and do not produce any result:
incrementCounter :: M ()
decrementCounter :: M ()
-- Get the current value of the counter
readCounter :: M Integer
Now we would like to do something interesting with those actions. The first thing we would like to do with those actions is to sequence them. As in say C, we would like to be able to do:
// This is C:
counter++;
counter++;
We define an "sequencing operator" >>. Using this operator we can write:
incrementCounter >> incrementCounter
What is the type of "incrementCounter >> incrementCounter"?
It is an action made of two smaller actions like in C you can write composed-statements from atomic statements :
// This is a macro statement made of several statements
{
counter++;
counter++;
}
// and we can use it anywhere we may use a statement:
if (condition) {
counter++;
counter++;
}
it can have the same kind of effects as its subactions;
it does not produce any output/result.
So we would like incrementCounter >> incrementCounter to be of type M (): an (macro-)action with the same kind of possible effects but without any output.
More generally, given two actions:
action1 :: M a
action2 :: M b
we define a a >> b as the macro-action which is obtained by doing (whatever that means in our domain of action) a then b and produces as output the result of the execution of the second action. The type of >> is:
(>>) :: M a -> M b -> M b
or more generally:
(>>) :: (Monad m) => m a -> m b -> m b
We can define bigger sequence of actions from simpler ones:
action1 >> action2 >> action3 >> action4
Input and outputs (>>=)
We would like to be able to increment by something else that 1 at a time:
incrementBy 5
We want to provide some input in our actions, in order to do this we define a function incrementBy taking an Int and producing an action:
incrementBy :: Int -> M ()
Now we can write things like:
incrementCounter >> readCounter >> incrementBy 5
But we have no way to feed the output of readCounter into incrementBy. In order to do this, a slightly more powerful version of our sequencing operator is needed. The >>= operator can feed the output of a given action as input to the next action. We can write:
readCounter >>= incrementBy
It is an action which executes the readCounter action, feeds its output in the incrementBy function and then execute the resulting action.
The type of >>= is:
(>>=) :: Monad m => m a -> (a -> m b) -> m b
A (partial) example
Let's say I have a Prompt monad which can only display informations (text) to the user and ask informations to the user:
-- We don't have access to the internal structure of the Prompt monad
module Prompt (Prompt(), echo, prompt) where
-- Opaque
data Prompt a = ...
instance Monad Prompt where ...
-- Display a line to the CLI:
echo :: String -> Prompt ()
-- Ask a question to the user:
prompt :: String -> Prompt String
Let's try to define a promptBoolean message actions which asks for a question and produces a boolean value.
We use the prompt (message ++ "[y/n]") action and feed its output to a function f:
f "y" should be an action which does nothing but produce True as output;
f "n" should be an action which does nothing but produce False as output;
anything else should restart the action (do the action again);
promptBoolean would look like this:
-- Incomplete version, some bits are missing:
promptBoolean :: String -> M Boolean
promptBoolean message = prompt (message ++ "[y/n]") >>= f
where f result = if result == "y"
then ???? -- We need here an action which does nothing but produce `True` as output
else if result=="n"
then ???? -- We need here an action which does nothing but produce `False` as output
else echo "Input not recognised, try again." >> promptBoolean
Producing a value without effect (return)
In order to fill the missing bits in our promptBoolean function, we need a way to represent dummy actions without any side effect but which only outputs a given value:
-- "return 5" is an action which does nothing but outputs 5
return :: (Monad m) => a -> m a
and we can now write out promptBoolean function:
promptBoolean :: String -> Prompt Boolean
promptBoolean message :: prompt (message ++ "[y/n]") >>= f
where f result = if result=="y"
then return True
else if result=="n"
then return False
else echo "Input not recognised, try again." >> promptBoolean message
By composing those two simple actions (promptBoolean, echo) we can define any kind of dialogue between the user and your program (the actions of the program are deterministic as our monad does not have a "randomness effect").
promptInt :: String -> M Int
promptInt = ... -- similar
-- Classic "guess a number game/dialogue"
guess :: Int -> m()
guess n = promptInt "Guess:" m -> f
where f m = if m == n
then echo "Found"
else (if m > n
then echo "Too big"
then echo "Too small") >> guess n
The operations of a monad
A Monad is a set of actions which can be composed with the return and >>= operators:
>>= for action composition;
return for producing a value without any (side-)effect.
These two operators are the minimal operators needed to define a Monad.
In Haskell, the >> operator is needed as well but it can in fact be derived from >>=:
(>>): Monad m => m a -> m b -> m b
a >> b = a >>= f
where f x = b
In Haskell, an extra fail operator is need as well but this is really a hack (and it might be removed from Monad in the future).
This is the Haskell definition of a Monad:
class Monad m where
return :: m a
(>>=) :: m a -> (a -> m b) -> m b
(>>) :: m a -> m b -> m b -- can be derived from (>>=)
fail :: String -> m a -- mostly a hack
Actions are first-class
One great thing about monads is that actions are first-class. You can take them in a variable, you can define function which take actions as input and produce some other actions as output. For example, we can define a while operator:
-- while x y : does action y while action x output True
while :: (Monad m) => m Boolean -> m a -> m ()
while x y = x >>= f
where f True = y >> while x y
f False = return ()
Summary
A Monad is a set of actions in some domain. The monad/domain define the kind of "effects" which are possible. The >> and >>= operators represent sequencing of actions and monadic expression may be used to represent any kind of "imperative (sub)program" in your (functional) Haskell program.
The great things are that:
you can design your own Monad which supports the features and effects that you want
see Prompt for an example of a "dialogue only subprogram",
see Rand for an example of "sampling only subprogram";
you can write your own control structures (while, throw, catch or more exotic ones) as functions taking actions and composing them in some way to produce a bigger macro-actions.
MonadRandom
A good way of understanding monads, is the MonadRandom package. The Rand monad is made of actions whose output can be random (the effect is randomness). An action in this monad is some kind of random variable (or more exactly a sampling process):
-- Sample an Int from some distribution
action :: Rand Int
Using Rand to do some sampling/random algorithms is quite interesting because you have random variables as first class values:
-- Estimate mean by sampling nsamples times the random variable x
sampleMean :: Real a => Int -> m a -> m a
sampleMean n x = ...
In this setting, the sequence function from Prelude,
sequence :: Monad m => [m a] -> m [a]
becomes
sequence :: [Rand a] -> Rand [a]
It creates a random variable obtained by sampling independently from a list of random variables.

There are three main observations concerning the IO monad:
1) You can't get values out of it. Other types like Maybe might allow to extract values, but neither the monad class interface itself nor the IO data type allow it.
2) "Inside" IO is not only the real value but also that "RealWorld" thing. This dummy value is used to enforce the chaining of actions by the type system: If you have two independent calculations, the use of >>= makes the second calculation dependent on the first.
3) Assume a non-deterministic thing like random :: () -> Int, which isn't allowed in Haskell. If you change the signature to random :: Blubb -> (Blubb, Int), it is allowed, if you make sure that nobody ever can use a Blubb twice: Because in that case all inputs are "different", it is no problem that the outputs are different as well.
Now we can use the fact 1): Nobody can get something out of IO, so we can use the RealWord dummy hidden in IO to serve as a Blubb. There is only one IOin the whole application (the one we get from main), and it takes care of proper sequentiation, as we have seen in 2). Problem solved.

One thing that often helps me to understand the nature of something is to examine it in the most trivial way possible. That way, I'm not getting distracted by potentially unrelated concepts. With that in mind, I think it may be helpful to understand the nature of the Identity Monad, as it's the most trivial implementation of a Monad possible (I think).
What is interesting about the Identity Monad? I think it is that it allows me to express the idea of evaluating expressions in a context defined by other expressions. And to me, that is the essence of every Monad I've encountered (so far).
If you already had a lot of exposure to 'mainstream' programming languages before learning Haskell (like I did), then this doesn't seem very interesting at all. After all, in a mainstream programming language, statements are executed in sequence, one after the other (excepting control-flow constructs, of course). And naturally, we can assume that every statement is evaluated in the context of all previously executed statements and that those previously executed statements may alter the environment and the behavior of the currently executing statement.
All of that is pretty much a foreign concept in a functional, lazy language like Haskell. The order in which computations are evaluated in Haskell is well-defined, but sometimes hard to predict, and even harder to control. And for many kinds of problems, that's just fine. But other sorts of problems (e.g. IO) are hard to solve without some convenient way to establish an implicit order and context between the computations in your program.
As far as side-effects go, specifically, often they can be transformed (via a Monad) in to simple state-passing, which is perfectly legal in a pure functional language. Some Monads don't seem to be of that nature, however. Monads such as the IO Monad or the ST monad literally perform side-effecting actions. There are many ways to think about this, but one way that I think about it is that just because my computations must exist in a world without side-effects, the Monad may not. As such, the Monad is free to establish a context for my computation to execute that is based on side-effects defined by other computations.
Finally, I must disclaim that I am definitely not a Haskell expert. As such, please understand that everything I've said is pretty much my own thoughts on this subject and I may very well disown them later when I understand Monads more fully.

the point is so there can be clean error handling in a chain of functions, containers, and side effects
More or less.
how exactly is the problem of side-effects solved?
A value in the I/O monad, i.e. one of type IO a, should be interpreted as a program. p >> q on IO values can then be interpreted as the operator that combines two programs into one that first executes p, then q. The other monad operators have similar interpretations. By assigning a program to the name main, you declare to the compiler that that is the program that has to be executed by its output object code.
As for the list monad, it's not really related to the I/O monad except in a very abstract mathematical sense. The IO monad gives deterministic computation with side effects, while the list monad gives non-deterministic (but not random!) backtracking search, somewhat similar to Prolog's modus operandi.

With this concept of containers, the language essentially says anything inside the containers is non-deterministic
No. Haskell is deterministic. If you ask for integer addition 2+2 you will always get 4.
"Nondeterministic" is only a metaphor, a way of thinking. Everything is deterministic under the hood. If you have this code:
do x <- [4,5]
y <- [0,1]
return (x+y)
it is roughly equivalent to Python code
l = []
for x in [4,5]:
for y in [0,1]:
l.append(x+y)
You see nondeterminism here? No, it's deterministic construction of a list. Run it twice, you'll get the same numbers in the same order.
You can describe it this way: Choose arbitrary x from [4,5]. Choose arbitrary y from [0,1]. Return x+y. Collect all possible results.
That way seems to involve nondeterminism, but it's only a nested loop (list comprehension). There is no "real" nondeterminism here, it's simulated by checking all possibilities. Nondeterminism is an illusion. The code only appears to be nondeterministic.
This code using State monad:
do put 0
x <- get
put (x+2)
y <- get
return (y+3)
gives 5 and seems to involve changing state. As with lists it's an illusion. There are no "variables" that change (as in imperative languages). Everything is nonmutable under the hood.
You can describe the code this way: put 0 to a variable. Read the value of a variable to x. Put (x+2) to the variable. Read the variable to y, and return y+3.
That way seems to involve state, but it's only composing functions passing additional parameter. There is no "real" mutability here, it's simulated by composition. Mutability is an illusion. The code only appears to be using it.
Haskell does it this way: you've got functions
a -> s -> (b,s)
This function takes and old value of state and returns new value. It does not involve mutability or change variables. It's a function in mathematical sense.
For example the function "put" takes new value of state, ignores current state and returns new state:
put x _ = ((), x)
Just like you can compose two normal functions
a -> b
b -> c
into
a -> c
using (.) operator you can compose "state" transformers
a -> s -> (b,s)
b -> s -> (c,s)
into a single function
a -> s -> (c,s)
Try writing the composition operator yourself. This is what really happens, there are no "side effects" only passing arguments to functions.

From what I understand, Monad is just another typeclass that declares ways to interact with data [...]
...providing an interface common to all those types which have an instance. This can then be used to provide generic definitions which work across all monadic types.
It seems clever and clean to implement these 3 things with one concept [...]
...the only three things that are implemented are the instances for those three types (list, Maybe and IO) - the types themselves are defined independently elsewhere.
[...] but really, the point is so there can be clean error handling in a chain of functions, containers, and side effects.
Not just error handling e.g. consider ST - without the monadic interface, you would have to pass the encapsulated-state directly and correctly...a tiresome task.
How exactly is the problem of side-effects solved?
Short answer: Haskell solves manages them by using types to indicate their presence.
Can someone explain how, intuitively, Haskell gets away with changing state with inputs and output?
"Intuitively"...like what's available over here? Let's try a simple direct comparison instead:
From How to Declare an Imperative by Philip Wadler:
(* page 26 *)
type 'a io = unit -> 'a
infix >>=
val >>= : 'a io * ('a -> 'b io) -> 'b io
fun m >>= k = fn () => let
val x = m ()
val y = k x ()
in
y
end
val return : 'a -> 'a io
fun return x = fn () => x
val putc : char -> unit io
fun putc c = fn () => putcML c
val getc : char io
val getc = fn () => getcML ()
fun getcML () =
valOf(TextIO.input1(TextIO.stdIn))
(* page 25 *)
fun putcML c =
TextIO.output1(TextIO.stdOut,c)
Based on these two answers of mine, this is my Haskell translation:
type IO a = OI -> a
(>>=) :: IO a -> (a -> IO b) -> IO b
m >>= k = \ u -> let !(u1, u2) = part u in
let !x = m u1 in
let !y = k x u2 in
y
return :: a -> IO a
return x = \ u -> let !_ = part u in x
putc :: Char -> IO ()
putc c = \ u -> putcOI c u
getc :: IO Char
getc = \ u -> getcOI u
-- primitives
data OI
partOI :: OI -> (OI, OI)
putcOI :: Char -> OI -> ()
getcOI :: OI -> Char
Now remember that short answer about side-effects?
Haskell manages them by using types to indicate their presence.
Data.Char.chr :: Int -> Char -- no side effects
getChar :: IO Char -- side effects at
{- :: OI -> Char -} -- work: beware!

Develop Reference

node.js excel linux python-3.x azure haskell apache-spark rust .htaccess string