Does the existence rseq/seq break referential transparency? Are there some alternative approaches that don't? - haskell

I always thought that replacing an expression x :: () with () :: () would be one of the most basic optimizations during compiling Haskell programs. Since () has a single inhabitant, no matter what x is, it's result is (). This optimization seemed to me as a significant consequence of referential transparency. And we could do such an optimization for any type with just a single inhabitant.
(Update: My reasoning in this matter comes from natural deduction rules. There the unit type corresponds to truth (⊤) and we have an expansion rule "if x : ⊤ then () : ⊤". For example see this text p. 20. I assumed that it's safe to replace an expression with its expansion or contractum.)
One consequence of this optimization would be that undefined :: () would be replaced by () :: (), but I don't see this as a problem - it would simply make programs a bit lazier (and relying on undefined :: () is certainly a bad programming practice).
However today I realized that such optimization would break Control.Seq completely. Strategy is defined as
type Strategy a = a -> ()
and we have
-- | 'rseq' evaluates its argument to weak head normal form.
rseq :: Strategy a
rseq x = x `seq` ()
But rseq x :: () so the optimization would simply discard the required evaluation of x to WHNF.
So where is the problem?
Does the existence of rseq and seq break referential transparency (even if we consider only terminating expressions)?
Or is this a flaw of the design of Strategy and we could devise a better way how to force expressions to WHNF compatible with such optimizations?

Referential transparency is about equality statements and variable references. When you say x = y and your language is referentially transparent, then you can replace every occurence of x by y (modulo scope).
If you haven't specified x = (), then you can't replace x by () safely, such as in your case. This is because you are wrong about the inhabitants of (), because in Haskell there are two of them: One is the only constructor of (), namely (). The other is the value that is never calculated. You may call it bottom or undefined:
x :: ()
x = x
You certainly can't replace any occurrence of x by () here, because that would chance semantics. The existence of bottom in the language semantics allows some awkward edge cases, especially when you have a seq combinator, where you can even prove every monad wrong. That's why for many formal discussions we disregard the existence of bottom.
However, referential transparency does not suffer from that. Haskell is still referentially transparent and as such a purely functional language.

Your definition of referential transparency is incorrect. Referential transparency does not mean you can replace x :: () with () :: () and everything stays the same; it means you can replace all occurrences of a variable with its definition and everything stays the same. seq and rseq are not in conflict with referential transparency if you use this definition.

seq is a red herring here.
unitseq :: () -> a -> a
unitseq x y = case x of () -> y
This has the same semantics as seq on unit, and requires no magic. In fact, seq only has something "magic" when working on things you can't pattern match on -- i.e., functions.
Replacing undefined :: () with () has the same ill effects with unitseq as with the more magical seq.
In general, we think of values not only as "what" they are, but how defined they are. From this perspective, it should be clear why we can't go implementing definedness-changing transformations willy-nilly.

Thanks all for the inspiring answers. After thinking about it more, I come to the following views:
View 1: Considering terminating expressions only
If we restrict ourselves to a normalizing system, then seq (or rseq etc.) has no influence on the result of a program. It can change the amount of memory the program uses and increase the CPU time (by evaluating expressions we won't actually need), but the result is the same. So in this case, the rule x :: () --> () :: () is admissible - nothing is lost, CPU time and memory can be saved.
View 2: Curry-Howard isomorphism
Since Haskell is Turing-complete and so any type is inhabited by an infinite loop, the C-H corresponding logic system is inconsistent (as pointed out by Vitus) - anything can be proved. So actually any rule is admissible in such a system. This is probably the biggest problem in my original thinking.
View 3: Expansion rules
Natural deduction rules give us reductions and expansions for different type operators. For example for -> it's β-reduction and η-expansion, for (,) it is
fst (x, y) --reduce--> x (and similarly for snd)
x : (a,b) --expand--> (fst x, snd x)
etc. With the presence of bottom, the reducing rules are still admissible (I believe), but the expansion rules not! In particular, for ->:
rseq (undefined :: Int -> Int) = undefined
but it's η-expansion
rseq (\x -> (undefined :: Int -> Int) x) = ()
or for (,):
case undefined of (a,b) -> ()
but
case (fst undefined, snd undefined) of (a,b) -> ()
etc. So in the same way, the expansion rule for () is not admissible.
View 4: Lazy pattern matching
Allowing x :: () to expand to () :: () could be viewed as a special case of forcing all pattern matching on single-constructor data types to be lazy. I've never seen lazy pattern matching to be used on a multi-constructor data type, so I believe this change would allow us to get rid of the ~ lazy pattern matching symbol completely. However, it would change the semantic of the language to be more lazy (see also Daniel Fischer's comment to the question). And moreover (as described in the question) break Strategy.

Related

Why are values of type () inspected?

In Haskell, the () type has two values, namely, () and bottom. If you have an expression e :: (), there's no point in actually inspecting it, since either it's e = () or by inspecting it you're crashing a program which could otherwise have not crashed.
Hence, I figured that perhaps operations on values of type () would not inspect the value and would not distinguish between () and bottom.
However, this is wildly untrue:
▎λ ghci
GHCi, version 9.0.2: https://www.haskell.org/ghc/ :? for help
ghci> u = (undefined :: ())
ghci> show u
"*** Exception: Prelude.undefined
CallStack (from HasCallStack):
error, called at libraries/base/GHC/Err.hs:75:14 in base:GHC.Err
undefined, called at <interactive>:1:6 in interactive:Ghci1
ghci> () == u
*** Exception: Prelude.undefined
CallStack (from HasCallStack):
error, called at libraries/base/GHC/Err.hs:75:14 in base:GHC.Err
undefined, called at <interactive>:1:6 in interactive:Ghci1
ghci> f () = "ok"
ghci> f u
"*** Exception: Prelude.undefined
CallStack (from HasCallStack):
error, called at libraries/base/GHC/Err.hs:75:14 in base:GHC.Err
undefined, called at <interactive>:1:6 in interactive:Ghci1
What is the reason for this? Here are some conjectures:
For some reason that I can't think of, it's useful to be non-lazy on (). Sometimes we want that bottom to propagate.
Haskell semantics are written in such a way that destructuring any ADTs, even trivial ones, inspects them. This means that having case (undefined :: ()) of { () -> ... } not throw would be a violation of language semantics
() is an extremely special case and simply isn't worth the attention to eke out this tiny extra bit of safety in a massive language like Haskell
There's also the possible combination explanation of 2+3, that Haskell could have had semantics dictating that an expression case e of inspects e unless it is of type (), but that would pollute the language spec for relatively low benefit
I will address this part:
For some reason that I can't think of, it's useful to be non-lazy on (). Sometimes we want that bottom to propagate.
Let's have a look at Control.Parallel.Strategies (version 1, an older version). This is a module for parallel evaluation. Let's focus on one of its functions for the sake of simplicity:
parMap :: Strategy b -> (a -> b) -> [a] -> [b]
The result of parMap strat f xs is the same as map f xs, except that the list is computed in parallel. What is the strat argument? Well,
strat :: Strategy b
means
strat :: b -> ()
There are only two things you can do with strat:
call it and ignore the result, which by laziness amounts to not calling it at all;
call it and force the result, even if you know it's () or a bottom.
parMap does the latter, in parallel. This allows the caller to specify a strat argument that evaluates the list values of type b as needed. For example
parMap (\(x,y) -> ()) f xs
parMap (\(x,y) -> x `seq` ()) f xs
parMap (\(x,y) -> x `seq` y `seq` ()) f xs
are valid calls, and will cause parMap to evaluate the new list-of-pairs only to expose the pair constructor, also the first component, also the second component, respectively.
Hence, forcing the () result of strat in this case allows the user to control how much evaluation to perform during parMap, i.e. how much to force the result (in parallel), and consequently which parts of the result should be left unevaluated. (By comparison map f xs would leave the result fully unevaluated -- it is completely lazy. parMap can not do that otherwise it is not longer parallel.)
Minor digression: note that the GADT
data a :~: b where
Refl :: t :~: t
has one constructor like (). Here, it is mandatory that such values are forced as in:
foo :: Int :~: String -> Int -> String
foo Refl x = x ++ " hello"
Here the first argument must be a bottom. By forcing that, we make the function error out with an exception. If we did not force that, we would get a very nasty undefined behavior like those in C and C++, completely breaking type safety. Haskell will correctly reject any attempt to circumvent that:
foo :: Int :~: String -> Int -> String
foo _ x = x ++ " hello"
triggers a type error at compile time.
I don't know for sure, but I suspect it's none of the things you said. Instead, this is so that the language is predictable and consistent.
There are, essentially, two things you observed, and I consider them to be separate things. The first is that checking whether a x is indeed () with a case statement forces evaluation of x; the second is that the instances (of Show and Eq) are written to use a case statement.
Pattern matching: the predictable, consistent rule here is that if you write case <e0> of <pat> -> <e1>, then e0 is evaluated far enough to check whether the constructors in pat are in fact in the given places. Well, okay, there's some wrinkles here to do with irrefutable patterns; let's say instead that e0 is evaluated far enough to check whether pat actually does match! For the () type, that means that the pattern () causes full evaluation -- because you've specified the full value that you expect it to be -- while the pattern x or _ can match without further evaluation.
Class instances: the natural inductive way to specify what the various class instances do is to always have an outermost case that matches against each available constructor with simple variable patterns for the fields, then does something (presumably recursive calls) on each of the fields in turn. That is, simplifying a bit, the show implementation goes like:
show x = case x of
<Con0> field00 field01 field02 <...> -> "<Con0>"
++ " " ++ show field00
++ " " ++ show field01
++ " " ++ show field02
++ <...>
<Con1> field10 field11 field12 <...> -> "<Con1>"
++ " " ++ show field10
++ " " ++ show field11
++ " " ++ show field12
++ <...>
<...>
It is very natural for the specialization of this scheme to the single-constructor, zero-field type () to go:
show x = case x of
() -> "()"
(Additionally, the Report specifies that (==) is is always strict in both arguments; but that property would also arise naturally from the obvious way of writing a generic Eq instance derivation algorithm.) Therefore the path of least surprise is for class instances to pattern match on their argument(s).
#2 is definitely true.
The () type is just a nullary data type with special type/data constructor syntax:
data () = ()
As a result, the Haskell 2010 report, while only providing an informal semantics, makes it pretty clear in section 3.17.2 Informal Semantics of Pattern Matching that the expression:
case undefined of () -> "ack!"
will be evaluated as per rule #5:
Matching the pattern con pat1 … patn against a value, where con is a constructor defined by data, depends on the value:
If the value is of the form con v1 … vn, sub-patterns are matched left-to-right against the components of the data value; if all matches succeed, the overall match succeeds; the first to fail or diverge causes the overall match to fail or diverge, respectively.
If the value is of the form con′ v1 … vm, where con is a different constructor to con′, the match fails.
If the value is ⊥, the match diverges.
Here, the value of undefined is ⊥, so the third bullet point applies, and the match diverges. And if the match diverges, the program diverges, and if the program diverges it must terminate with an error or -- at worst -- loop forever. It cannot continue as if nothing has happened. Admittedly, this last part is not explicitly stated, but it is the only reasonable interpretation for the semantics of a divergent evaluation of an expression.

Strictness of dataToTag argument

In GHC.Prim, we find a magical function named dataToTag#:
dataToTag# :: a -> Int#
It turns a value of any type into an integer based on the data constructor it uses. This is used to speed up derived implementations of Eq, Ord, and Enum. In the GHC source, the docs for dataToTag# explain that the argument should already by evaluated:
The dataToTag# primop should always be applied to an evaluated argument.
The way to ensure this is to invoke it via the 'getTag' wrapper in GHC.Base:
getTag :: a -> Int#
getTag !x = dataToTag# x
It makes total sense to me that we need to force x's evaluation before dataToTag# is called. What I do not get is why the bang pattern is sufficient. The definition of getTag is just syntactic sugar for:
getTag :: a -> Int#
getTag x = x `seq` dataToTag# x
But let's turn to the docs for seq:
A note on evaluation order: the expression seq a b does not guarantee that a will be evaluated before b. The only guarantee given by seq is that the both a and b will be evaluated before seq returns a value. In particular, this means that b may be evaluated before a. If you need to guarantee a specific order of evaluation, you must use the function pseq from the "parallel" package.
In the Control.Parallel module from the parallel package, the docs elaborate further:
... seq is strict in both its arguments, so the compiler may, for example, rearrange a `seq` b into b `seq` a `seq` b ...
How is it that getTag is guaranteed to behave work, given that seq is insufficient for controlling evaluation order?
GHC tracks certain information about each primop. One key datum is whether the primop "can_fail". The original meaning of this flag is that a primop can fail if it can cause a hard fault. For example, array indexing can cause a segmentation fault if the index is out of range, so indexing operations can fail.
If a primop can fail, GHC will restrict certain transformations around it, and in particular won't float it out of any case expressions. It would be rather bad, for example, if
if n < bound
then unsafeIndex c n
else error "out of range"
were compiled to
case unsafeIndex v n of
!x -> if n < bound
then x
else error "out of range"
One of these bottoms is an exception; the other is a segfault.
dataToTag# is marked can_fail. So GHC sees (in Core) something like
getTag = \x -> case x of
y -> dataToTag# y
(Note that case is strict in Core.) Because dataToTag# is marked can_fail, it won't be floated out of any case expressions.

Monads, composition and the order of computation

All the monad articles often state, that monads allow you to sequence effects in order.
But what about simple composition? Ain't
f x = x + 1
g x = x * 2
result = f g x
requires g x to be computed before f ...?
Do monads do the same but with handling of effects?
Disclaimer: Monads are a lot of things. They are notoriously difficult to explain, so I will not attempt to explain what monads are in general here, since the question does not ask for that. I will assume you have a basic grasp on what the Monad interface is as well as how it works for some useful datatypes, like Maybe, Either, and IO.
What is an effect?
Your question begins with a note:
All the monad articles often state, that monads allow you to sequence effects in order.
Hmm. This is interesting. In fact, it is interesting for a couple reasons, one of which you have identified: it implies that monads let you create some sort of sequencing. That’s true, but it’s only part of the picture: it also states that sequencing happens on effects.
Here’s the thing, though… what is an “effect”? Is adding two numbers together an effect? Under most definitions, the answer would be no. What about printing something to stdout, is that an effect? In that case, I think most people would agree that the answer is yes. However, consider something more subtle: is short-circuiting a computation by producing Nothing an effect?
Error effects
Let’s take a look at an example. Consider the following code:
> do x <- Just 1
y <- Nothing
return (x + y)
Nothing
The second line of that example “short-circuits” due to the Monad instance for Maybe. Could that be considered an effect? In some sense, I think so, since it’s non-local, but in another sense, probably not. After all, if the x <- Just 1 or y <- Nothing lines are swapped, the result is still the same, so the ordering doesn’t matter.
However, consider a slightly more complex example that uses Either instead of Maybe:
> do x <- Left "x failed"
y <- Left "y failed"
return (x + y)
Left "x failed"
Now this is more interesting. If you swap the first two lines now, you get a different result! Still, is this a representation of an “effect” like the ones you allude to in your question? After all, it’s just a bunch of function calls. As you know, do notation is just an alternative syntax for a bunch of uses of the >>= operator, so we can expand it out:
> Left "x failed" >>= \x ->
Left "y failed" >>= \y ->
return (x + y)
Left "x failed"
We can even replace the >>= operator with the Either-specific definition to get rid of monads entirely:
> case Left "x failed" of
Right x -> case Left "y failed" of
Right y -> Right (x + y)
Left e -> Left e
Left e -> Left e
Left "x failed"
Therefore, it’s clear that monads do impose some sort of sequencing, but this is not because they are monads and monads are magic, it’s just because they happen to enable a style of programming that looks more impure than Haskell typically allows.
Monads and state
But perhaps that is unsatisfying to you. Error handling is not compelling because it is simply short-circuiting, it doesn’t actually have any sequencing in the result! Well, if we reach for some slightly more complex types, we can do that. For example, consider the Writer type, which allows a sort of “logging” using the monadic interface:
> execWriter $ do
tell "hello"
tell " "
tell "world"
"hello world"
This is even more interesting than before, since now the result of each computation in the do block is unused, but it still affects the output! This is clearly side-effectful, and order is clearly very important! If we reorder the tell expressions, we would get a very different result:
> execWriter $ do
tell " "
tell "world"
tell "hello"
" worldhello"
But how is this possible? Well, again, we can rewrite it to avoid do notation:
execWriter (
tell "hello" >>= \_ ->
tell " " >>= \_ ->
tell "world")
We could inline the definition of >>= again for Writer, but it’s too long to be terribly illustrative here. The point is, though, that Writer is just a completely ordinary Haskell datatype that doesn’t do any I/O or anything like that, and yet we have used the monadic interface to create something that looks like ordered effects.
We can go even further by creating an interface that looks like mutable state using the State type:
> flip execState 0 $ do
modify (+ 3)
modify (* 2)
6
Once again, if we reorder the expressions, we get a different result:
> flip execState 0 $ do
modify (* 2)
modify (+ 3)
3
Clearly, monads are a useful tool for creating interfaces that look stateful and have a well-defined ordering despite actually just being ordinary function calls.
Why can monads do this?
What gives monads this power? Well, they’re not magic—they’re just ordinary pure Haskell code. But consider the type signature for >>=:
(>>=) :: Monad m => m a -> (a -> m b) -> m b
Notice how the second argument depends on a, and the only way to get an a is from the first argument? This means that >>= needs to “run” the first argument to produce a value before it can apply the second argument. This doesn’t have to do with evaluation order so much as it has to do with actually writing code that will typecheck.
Now, it’s true that Haskell is a lazy language. But Haskell’s laziness doesn’t really matter for any of this because all of this code is actually pure, even the example using State! It’s simply a pattern that encodes computations that look sort of stateful in a pure way, but if you actually implemented State yourself, you’d find that it just passes around the “current state” in the definition of the >>= function. There’s not any actual mutation.
And that’s it. Monads, by virtue of their interface, impose an ordering on how their arguments may be evaluated, and instances of Monad exploit that to make stateful-looking interfaces. You don’t need Monad to have evaluation ordering, though, as you found; obviously in (1 + 2) * 3 the addition will be evaluated before the multiplication.
But what about IO??
Okay, you got me. Here’s the problem: IO is magic.
Monads are not magic, but IO is. All of the above examples are purely functional, but obviously reading a file or writing to stdout is not pure. So how the heck does IO work?
Well, IO is implemented by the GHC runtime, and you could not write it yourself. However, in order to make it work nicely with the rest of Haskell, there needs to be a well-defined evaluation order! Otherwise things would get printed out in the wrong order and all sorts of other hell would break loose.
Well, it turns out the Monad’s interface is a great way to ensure that evaluation order is predictable, since it works for pure code already. So IO leverages the same interface to guarantee the evaluation order is the same, and the runtime actually defines what that evaluation means.
However, don’t be misled! You don’t need monads to do I/O in a pure language, and you don’t need IO to have monadic effects. Early versions of Haskell experimented with a non-monadic way to do I/O, and the other parts of this answer explain how you can have pure monadic effects. Remember that monads are not special or holy, they’re just a pattern that Haskell programmers have found useful because of its various properties.
Yes, the functions you proposed are strict for the standard numerical types. But not all functions are! In
f _ = 3
g x = x * 2
result = f (g x)
it is not the case that g x must be computed before f (g x).
Yes, monads use function composition to sequence effects, and are not the only way to achieve sequenced effects.
Strict semantics and side effects
In most languages, there is sequencing by strict semantics applied first to the function side of an expression, then to each argument in turn, and finally the function is applied to the arguments. So in JS, the function application form,
<Code 1>(<Code 2>, <Code 3>)
runs four pieces of code in a specified order: 1, 2, 3, then it checks that the output of 1 was a function, then it calls the function with those two computed arguments. And it does this because any of those steps can have side-effects. You would write,
const logVal = (log, val) => {
console.log(log);
return val;
};
logVal(1, (a, b) => logVal(4, a+b))(
logVal(2, 2),
logVal(3, 3));
And that works for those languages. These are side-effects, which we can say in this context means that JS's type system doesn't give you any way to know that they are there.
Haskell does have a strict application primitive, but it wanted to be pure, which roughly means that it wanted the type system to track effects. So they introduced a form of metaprogramming where one of their types is a type-level adjective, “programs which compute a _____”. A program interacts with the real world; Haskell code in theory doesn't. You have to define “main is a program which computes a unit type” and then the compiler actually just builds that program for you as an executable binary file. By the time that file is run Haskell is not really in the picture any more!
This is therefore more specific than normal function application, because the abstract problem I wrote in JavaScript is,
I have a program which computes {a function from (X, Y) pairs to programs which compute Zs}.
I also have a program which computes an X, and a program which computes a Y.
I want to put these all together into a program which computes a Z.
That's not just function composition itself. But a function can do that.
Peeking inside monads
A monad is a pattern. The pattern is, sometimes you have an adjective which does not add much when you repeat it. For example there is not much added when you say "a delayed delayed x" or "zero or more (zero or more xs)" or "either a null or else either a null or else an x." Similarly for the IO monad, not much is added by "a program to compute a program to compute an x" that is not available in "a program to compute an x."
The pattern is that there is some canonical merging algorithm which merges:
join: given an <adjective> <adjective> x, I will make you an <adjective> x.
We also add two other properties, the adjective should be outputtish,
map: given an x -> y and an <adjective> x, I will make you an <adjective> y
and universally embeddable,
pure: given an x, I will make you an <adjective> x.
Given these three things and a couple axioms you happen to have a common "monad" idea which you can develop One True Syntax for.
Now this metaprogramming idea obviously contains a monad. In JS we would write,
interface IO<x> {
run: () => Promise<x>
}
function join<x>(pprog: IO<IO<x>>): IO<x> {
return { run: () => pprog.run().then(prog => prog.run()) };
}
function map<x, y>(prog: IO<x>, fn: (in: x) => y): IO<y> {
return { run: () => prog.run().then(x => fn(x)) }
}
function pure<x>(input: x): IO<x> {
return { run: () => Promise.resolve(input) }
}
// with those you can also define,
function bind<x, y>(prog: IO<x>, fn: (in: x) => IO<y>): IO<y> {
return join(map(prog, fn));
}
But the fact that a pattern exists does not mean it is useful! I am claiming that these functions turn out to be all you need to resolve the problem above. And it is not hard to see why: you can use bind to create a function scope inside of which the adjective doesn't exist, and manipulate your values there:
function ourGoal<x, y, z>(
fnProg: IO<(inX: x, inY: y) => IO<z>>,
xProg: IO<x>,
yProg: IO<y>): IO<z> {
return bind(fnProg, fn =>
bind(xProg, x =>
bind(yProg, y => fn(x, y))));
}
How this answers your question
Notice that in the above we choose an order of operations by how we write the three binds. We could have written those in some other order. But we needed all the parameters to run the final program.
This choice of how we sequence our operations is indeed realized in the function calls: you are 100% right. But the way that you are doing it, with only function composition, is flawed because it demotes the effects down to side-effects in order to get the types through.

Can any recursive definition be rewritten using foldr?

Say I have a general recursive definition in haskell like this:
foo a0 a1 ... = base_case
foo b0 b1 ...
| cond1 = recursive_case_1
| cond2 = recursive_case_2
...
Can it always rewritten using foldr? Can it be proved?
If we interpret your question literally, we can write const value foldr to achieve any value, as #DanielWagner pointed out in a comment.
A more interesting question is whether we can instead forbid general recursion from Haskell, and "recurse" only through the eliminators/catamorphisms associated to each user-defined data type, which are the natural generalization of foldr to inductively defined data types. This is, essentially, (higher-order) primitive recursion.
When this restriction is performed, we can only compose terminating functions (the eliminators) together. This means that we can no longer define non terminating functions.
As a first example, we lose the trivial recursion
f x = f x
-- or even
a = a
since, as said, the language becomes total.
More interestingly, the general fixed point operator is lost.
fix :: (a -> a) -> a
fix f = f (fix f)
A more intriguing question is: what about the total functions we can express in Haskell? We do lose all the non-total functions, but do we lose any of the total ones?
Computability theory states that, since the language becomes total (no more non termination), we lose expressiveness even on the total fragment.
The proof is a standard diagonalization argument. Fix any enumeration of programs in the total fragment so that we can speak of "the i-th program".
Then, let eval i x be the result of running the i-th program on the natural x as input (for simplicity, assume this is well typed, and that the result is a natural). Note that, since the language is total, then a result must exist. Moreover, eval can be implemented in the unrestricted Haskell language, since we can write an interpreter of Haskell in Haskell (left as an exercise :-P), and that would work as fine for the fragment. Then, we simply take
f n = succ $ eval n n
The above is a total function (a composition of total functions) which can be expressed in Haskell, but not in the fragment. Indeed, otherwise there would be a program to compute it, say the i-th program. In such case we would have
eval i x = f x
for all x. But then,
eval i i = f i = succ $ eval i i
which is impossible -- contradiction. QED.
In type theory, it is indeed the case that you can elaborate all definitions by dependent pattern-matching into ones only using eliminators (a more strongly-typed version of folds, the generalisation of lists' foldr).
See e.g. Eliminating Dependent Pattern Matching (pdf)

Why do we need monads?

In my humble opinion the answers to the famous question "What is a monad?", especially the most voted ones, try to explain what is a monad without clearly explaining why monads are really necessary. Can they be explained as the solution to a problem?
Why do we need monads?
We want to program only using functions. ("functional programming (FP)" after all).
Then, we have a first big problem. This is a program:
f(x) = 2 * x
g(x,y) = x / y
How can we say what is to be executed first? How can we form an ordered sequence of functions (i.e. a program) using no more than functions?
Solution: compose functions. If you want first g and then f, just write f(g(x,y)). This way, "the program" is a function as well: main = f(g(x,y)). OK, but ...
More problems: some functions might fail (i.e. g(2,0), divide by 0). We have no "exceptions" in FP (an exception is not a function). How do we solve it?
Solution: Let's allow functions to return two kind of things: instead of having g : Real,Real -> Real (function from two reals into a real), let's allow g : Real,Real -> Real | Nothing (function from two reals into (real or nothing)).
But functions should (to be simpler) return only one thing.
Solution: let's create a new type of data to be returned, a "boxing type" that encloses maybe a real or be simply nothing. Hence, we can have g : Real,Real -> Maybe Real. OK, but ...
What happens now to f(g(x,y))? f is not ready to consume a Maybe Real. And, we don't want to change every function we could connect with g to consume a Maybe Real.
Solution: let's have a special function to "connect"/"compose"/"link" functions. That way, we can, behind the scenes, adapt the output of one function to feed the following one.
In our case: g >>= f (connect/compose g to f). We want >>= to get g's output, inspect it and, in case it is Nothing just don't call f and return Nothing; or on the contrary, extract the boxed Real and feed f with it. (This algorithm is just the implementation of >>= for the Maybe type). Also note that >>= must be written only once per "boxing type" (different box, different adapting algorithm).
Many other problems arise which can be solved using this same pattern: 1. Use a "box" to codify/store different meanings/values, and have functions like g that return those "boxed values". 2. Have a composer/linker g >>= f to help connecting g's output to f's input, so we don't have to change any f at all.
Remarkable problems that can be solved using this technique are:
having a global state that every function in the sequence of functions ("the program") can share: solution StateMonad.
We don't like "impure functions": functions that yield different output for same input. Therefore, let's mark those functions, making them to return a tagged/boxed value: IO monad.
Total happiness!
The answer is, of course, "We don't". As with all abstractions, it isn't necessary.
Haskell does not need a monad abstraction. It isn't necessary for performing IO in a pure language. The IO type takes care of that just fine by itself. The existing monadic desugaring of do blocks could be replaced with desugaring to bindIO, returnIO, and failIO as defined in the GHC.Base module. (It's not a documented module on hackage, so I'll have to point at its source for documentation.) So no, there's no need for the monad abstraction.
So if it's not needed, why does it exist? Because it was found that many patterns of computation form monadic structures. Abstraction of a structure allows for writing code that works across all instances of that structure. To put it more concisely - code reuse.
In functional languages, the most powerful tool found for code reuse has been composition of functions. The good old (.) :: (b -> c) -> (a -> b) -> (a -> c) operator is exceedingly powerful. It makes it easy to write tiny functions and glue them together with minimal syntactic or semantic overhead.
But there are cases when the types don't work out quite right. What do you do when you have foo :: (b -> Maybe c) and bar :: (a -> Maybe b)? foo . bar doesn't typecheck, because b and Maybe b aren't the same type.
But... it's almost right. You just want a bit of leeway. You want to be able to treat Maybe b as if it were basically b. It's a poor idea to just flat-out treat them as the same type, though. That's more or less the same thing as null pointers, which Tony Hoare famously called the billion-dollar mistake. So if you can't treat them as the same type, maybe you can find a way to extend the composition mechanism (.) provides.
In that case, it's important to really examine the theory underlying (.). Fortunately, someone has already done this for us. It turns out that the combination of (.) and id form a mathematical construct known as a category. But there are other ways to form categories. A Kleisli category, for instance, allows the objects being composed to be augmented a bit. A Kleisli category for Maybe would consist of (.) :: (b -> Maybe c) -> (a -> Maybe b) -> (a -> Maybe c) and id :: a -> Maybe a. That is, the objects in the category augment the (->) with a Maybe, so (a -> b) becomes (a -> Maybe b).
And suddenly, we've extended the power of composition to things that the traditional (.) operation doesn't work on. This is a source of new abstraction power. Kleisli categories work with more types than just Maybe. They work with every type that can assemble a proper category, obeying the category laws.
Left identity: id . f = f
Right identity: f . id = f
Associativity: f . (g . h) = (f . g) . h
As long as you can prove that your type obeys those three laws, you can turn it into a Kleisli category. And what's the big deal about that? Well, it turns out that monads are exactly the same thing as Kleisli categories. Monad's return is the same as Kleisli id. Monad's (>>=) isn't identical to Kleisli (.), but it turns out to be very easy to write each in terms of the other. And the category laws are the same as the monad laws, when you translate them across the difference between (>>=) and (.).
So why go through all this bother? Why have a Monad abstraction in the language? As I alluded to above, it enables code reuse. It even enables code reuse along two different dimensions.
The first dimension of code reuse comes directly from the presence of the abstraction. You can write code that works across all instances of the abstraction. There's the entire monad-loops package consisting of loops that work with any instance of Monad.
The second dimension is indirect, but it follows from the existence of composition. When composition is easy, it's natural to write code in small, reusable chunks. This is the same way having the (.) operator for functions encourages writing small, reusable functions.
So why does the abstraction exist? Because it's proven to be a tool that enables more composition in code, resulting in creating reusable code and encouraging the creation of more reusable code. Code reuse is one of the holy grails of programming. The monad abstraction exists because it moves us a little bit towards that holy grail.
Benjamin Pierce said in TAPL
A type system can be regarded as calculating a kind of static
approximation to the run-time behaviours of the terms in a program.
That's why a language equipped with a powerful type system is strictly more expressive, than a poorly typed language. You can think about monads in the same way.
As #Carl and sigfpe point, you can equip a datatype with all operations you want without resorting to monads, typeclasses or whatever other abstract stuff. However monads allow you not only to write reusable code, but also to abstract away all redundant detailes.
As an example, let's say we want to filter a list. The simplest way is to use the filter function: filter (> 3) [1..10], which equals [4,5,6,7,8,9,10].
A slightly more complicated version of filter, that also passes an accumulator from left to right, is
swap (x, y) = (y, x)
(.*) = (.) . (.)
filterAccum :: (a -> b -> (Bool, a)) -> a -> [b] -> [b]
filterAccum f a xs = [x | (x, True) <- zip xs $ snd $ mapAccumL (swap .* f) a xs]
To get all i, such that i <= 10, sum [1..i] > 4, sum [1..i] < 25, we can write
filterAccum (\a x -> let a' = a + x in (a' > 4 && a' < 25, a')) 0 [1..10]
which equals [3,4,5,6].
Or we can redefine the nub function, that removes duplicate elements from a list, in terms of filterAccum:
nub' = filterAccum (\a x -> (x `notElem` a, x:a)) []
nub' [1,2,4,5,4,3,1,8,9,4] equals [1,2,4,5,3,8,9]. A list is passed as an accumulator here. The code works, because it's possible to leave the list monad, so the whole computation stays pure (notElem doesn't use >>= actually, but it could). However it's not possible to safely leave the IO monad (i.e. you cannot execute an IO action and return a pure value — the value always will be wrapped in the IO monad). Another example is mutable arrays: after you have leaved the ST monad, where a mutable array live, you cannot update the array in constant time anymore. So we need a monadic filtering from the Control.Monad module:
filterM :: (Monad m) => (a -> m Bool) -> [a] -> m [a]
filterM _ [] = return []
filterM p (x:xs) = do
flg <- p x
ys <- filterM p xs
return (if flg then x:ys else ys)
filterM executes a monadic action for all elements from a list, yielding elements, for which the monadic action returns True.
A filtering example with an array:
nub' xs = runST $ do
arr <- newArray (1, 9) True :: ST s (STUArray s Int Bool)
let p i = readArray arr i <* writeArray arr i False
filterM p xs
main = print $ nub' [1,2,4,5,4,3,1,8,9,4]
prints [1,2,4,5,3,8,9] as expected.
And a version with the IO monad, which asks what elements to return:
main = filterM p [1,2,4,5] >>= print where
p i = putStrLn ("return " ++ show i ++ "?") *> readLn
E.g.
return 1? -- output
True -- input
return 2?
False
return 4?
False
return 5?
True
[1,5] -- output
And as a final illustration, filterAccum can be defined in terms of filterM:
filterAccum f a xs = evalState (filterM (state . flip f) xs) a
with the StateT monad, that is used under the hood, being just an ordinary datatype.
This example illustrates, that monads not only allow you to abstract computational context and write clean reusable code (due to the composability of monads, as #Carl explains), but also to treat user-defined datatypes and built-in primitives uniformly.
I don't think IO should be seen as a particularly outstanding monad, but it's certainly one of the more astounding ones for beginners, so I'll use it for my explanation.
Naïvely building an IO system for Haskell
The simplest conceivable IO system for a purely-functional language (and in fact the one Haskell started out with) is this:
main₀ :: String -> String
main₀ _ = "Hello World"
With lazyness, that simple signature is enough to actually build interactive terminal programs – very limited, though. Most frustrating is that we can only output text. What if we added some more exciting output possibilities?
data Output = TxtOutput String
| Beep Frequency
main₁ :: String -> [Output]
main₁ _ = [ TxtOutput "Hello World"
-- , Beep 440 -- for debugging
]
cute, but of course a much more realistic “alterative output” would be writing to a file. But then you'd also want some way to read from files. Any chance?
Well, when we take our main₁ program and simply pipe a file to the process (using operating system facilities), we have essentially implemented file-reading. If we could trigger that file-reading from within the Haskell language...
readFile :: Filepath -> (String -> [Output]) -> [Output]
This would use an “interactive program” String->[Output], feed it a string obtained from a file, and yield a non-interactive program that simply executes the given one.
There's one problem here: we don't really have a notion of when the file is read. The [Output] list sure gives a nice order to the outputs, but we don't get an order for when the inputs will be done.
Solution: make input-events also items in the list of things to do.
data IO₀ = TxtOut String
| TxtIn (String -> [Output])
| FileWrite FilePath String
| FileRead FilePath (String -> [Output])
| Beep Double
main₂ :: String -> [IO₀]
main₂ _ = [ FileRead "/dev/null" $ \_ ->
[TxtOutput "Hello World"]
]
Ok, now you may spot an imbalance: you can read a file and make output dependent on it, but you can't use the file contents to decide to e.g. also read another file. Obvious solution: make the result of the input-events also something of type IO, not just Output. That sure includes simple text output, but also allows reading additional files etc..
data IO₁ = TxtOut String
| TxtIn (String -> [IO₁])
| FileWrite FilePath String
| FileRead FilePath (String -> [IO₁])
| Beep Double
main₃ :: String -> [IO₁]
main₃ _ = [ TxtIn $ \_ ->
[TxtOut "Hello World"]
]
That would now actually allow you to express any file operation you might want in a program (though perhaps not with good performance), but it's somewhat overcomplicated:
main₃ yields a whole list of actions. Why don't we simply use the signature :: IO₁, which has this as a special case?
The lists don't really give a reliable overview of program flow anymore: most subsequent computations will only be “announced” as the result of some input operation. So we might as well ditch the list structure, and simply cons a “and then do” to each output operation.
data IO₂ = TxtOut String IO₂
| TxtIn (String -> IO₂)
| Terminate
main₄ :: IO₂
main₄ = TxtIn $ \_ ->
TxtOut "Hello World"
Terminate
Not too bad!
So what has all of this to do with monads?
In practice, you wouldn't want to use plain constructors to define all your programs. There would need to be a good couple of such fundamental constructors, yet for most higher-level stuff we would like to write a function with some nice high-level signature. It turns out most of these would look quite similar: accept some kind of meaningfully-typed value, and yield an IO action as the result.
getTime :: (UTCTime -> IO₂) -> IO₂
randomRIO :: Random r => (r,r) -> (r -> IO₂) -> IO₂
findFile :: RegEx -> (Maybe FilePath -> IO₂) -> IO₂
There's evidently a pattern here, and we'd better write it as
type IO₃ a = (a -> IO₂) -> IO₂ -- If this reminds you of continuation-passing
-- style, you're right.
getTime :: IO₃ UTCTime
randomRIO :: Random r => (r,r) -> IO₃ r
findFile :: RegEx -> IO₃ (Maybe FilePath)
Now that starts to look familiar, but we're still only dealing with thinly-disguised plain functions under the hood, and that's risky: each “value-action” has the responsibility of actually passing on the resulting action of any contained function (else the control flow of the entire program is easily disrupted by one ill-behaved action in the middle). We'd better make that requirement explicit. Well, it turns out those are the monad laws, though I'm not sure we can really formulate them without the standard bind/join operators.
At any rate, we've now reached a formulation of IO that has a proper monad instance:
data IO₄ a = TxtOut String (IO₄ a)
| TxtIn (String -> IO₄ a)
| TerminateWith a
txtOut :: String -> IO₄ ()
txtOut s = TxtOut s $ TerminateWith ()
txtIn :: IO₄ String
txtIn = TxtIn $ TerminateWith
instance Functor IO₄ where
fmap f (TerminateWith a) = TerminateWith $ f a
fmap f (TxtIn g) = TxtIn $ fmap f . g
fmap f (TxtOut s c) = TxtOut s $ fmap f c
instance Applicative IO₄ where
pure = TerminateWith
(<*>) = ap
instance Monad IO₄ where
TerminateWith x >>= f = f x
TxtOut s c >>= f = TxtOut s $ c >>= f
TxtIn g >>= f = TxtIn $ (>>=f) . g
Obviously this is not an efficient implementation of IO, but it's in principle usable.
Monads serve basically to compose functions together in a chain. Period.
Now the way they compose differs across the existing monads, thus resulting in different behaviors (e.g., to simulate mutable state in the state monad).
The confusion about monads is that being so general, i.e., a mechanism to compose functions, they can be used for many things, thus leading people to believe that monads are about state, about IO, etc, when they are only about "composing functions".
Now, one interesting thing about monads, is that the result of the composition is always of type "M a", that is, a value inside an envelope tagged with "M". This feature happens to be really nice to implement, for example, a clear separation between pure from impure code: declare all impure actions as functions of type "IO a" and provide no function, when defining the IO monad, to take out the "a" value from inside the "IO a". The result is that no function can be pure and at the same time take out a value from an "IO a", because there is no way to take such value while staying pure (the function must be inside the "IO" monad to use such value). (NOTE: well, nothing is perfect, so the "IO straitjacket" can be broken using "unsafePerformIO : IO a -> a" thus polluting what was supposed to be a pure function, but this should be used very sparingly and when you really know to be not introducing any impure code with side-effects.
Monads are just a convenient framework for solving a class of recurring problems. First, monads must be functors (i.e. must support mapping without looking at the elements (or their type)), they must also bring a binding (or chaining) operation and a way to create a monadic value from an element type (return). Finally, bind and return must satisfy two equations (left and right identities), also called the monad laws. (Alternatively one could define monads to have a flattening operation instead of binding.)
The list monad is commonly used to deal with non-determinism. The bind operation selects one element of the list (intuitively all of them in parallel worlds), lets the programmer to do some computation with them, and then combines the results in all worlds to single list (by concatenating, or flattening, a nested list). Here is how one would define a permutation function in the monadic framework of Haskell:
perm [e] = [[e]]
perm l = do (leader, index) <- zip l [0 :: Int ..]
let shortened = take index l ++ drop (index + 1) l
trailer <- perm shortened
return (leader : trailer)
Here is an example repl session:
*Main> perm "a"
["a"]
*Main> perm "ab"
["ab","ba"]
*Main> perm ""
[]
*Main> perm "abc"
["abc","acb","bac","bca","cab","cba"]
It should be noted that the list monad is in no way a side effecting computation. A mathematical structure being a monad (i.e. conforming to the above mentioned interfaces and laws) does not imply side effects, though side-effecting phenomena often nicely fit into the monadic framework.
You need monads if you have a type constructor and functions that returns values of that type family. Eventually, you would like to combine these kind of functions together. These are the three key elements to answer why.
Let me elaborate. You have Int, String and Real and functions of type Int -> String, String -> Real and so on. You can combine these functions easily, ending with Int -> Real. Life is good.
Then, one day, you need to create a new family of types. It could be because you need to consider the possibility of returning no value (Maybe), returning an error (Either), multiple results (List) and so on.
Notice that Maybe is a type constructor. It takes a type, like Int and returns a new type Maybe Int. First thing to remember, no type constructor, no monad.
Of course, you want to use your type constructor in your code, and soon you end with functions like Int -> Maybe String and String -> Maybe Float. Now, you can't easily combine your functions. Life is not good anymore.
And here's when monads come to the rescue. They allow you to combine that kind of functions again. You just need to change the composition . for >==.
Why do we need monadic types?
Since it was the quandary of I/O and its observable effects in nonstrict languages like Haskell that brought the monadic interface to such prominence:
[...] monads are used to address the more general problem of computations (involving state, input/output, backtracking, ...) returning values: they do not solve any input/output-problems directly but rather provide an elegant and flexible abstraction of many solutions to related problems. [...] For instance, no less than three different input/output-schemes are used to solve these basic problems in Imperative functional programming, the paper which originally proposed `a new model, based on monads, for performing input/output in a non-strict, purely functional language'. [...]
[Such] input/output-schemes merely provide frameworks in which side-effecting operations can safely be used with a guaranteed order of execution and without affecting the properties of the purely functional parts of the language.
Claus Reinke (pages 96-97 of 210).
(emphasis by me.)
[...] When we write effectful code – monads or no monads – we have to constantly keep in mind the context of expressions we pass around.
The fact that monadic code ‘desugars’ (is implementable in terms of) side-effect-free code is irrelevant. When we use monadic notation, we program within that notation – without considering what this notation desugars into. Thinking of the desugared code breaks the monadic abstraction. A side-effect-free, applicative code is normally compiled to (that is, desugars into) C or machine code. If the desugaring argument has any force, it may be applied just as well to the applicative code, leading to the conclusion that it all boils down to the machine code and hence all programming is imperative.
[...] From the personal experience, I have noticed that the mistakes I make when writing monadic code are exactly the mistakes I made when programming in C. Actually, monadic mistakes tend to be worse, because monadic notation (compared to that of a typical imperative language) is ungainly and obscuring.
Oleg Kiselyov (page 21 of 26).
The most difficult construct for students to understand is the monad. I introduce IO without mentioning monads.
Olaf Chitil.
More generally:
Still, today, over 25 years after the introduction of the concept of monads to the world of functional programming, beginning functional programmers struggle to grasp the concept of monads. This struggle is exemplified by the numerous blog posts about the effort of trying to learn about monads. From our own experience we notice that even at university level, bachelor level students often struggle to comprehend monads and consistently score poorly on monad-related exam questions.
Considering that the concept of monads is not likely to disappear from the functional programming landscape any time soon, it is vital that we, as the functional programming community, somehow overcome the problems novices encounter when first studying monads.
Tim Steenvoorden, Jurriën Stutterheim, Erik Barendsen and Rinus Plasmeijer.
If only there was another way to specify "a guaranteed order of execution" in Haskell, while keeping the ability to separate regular Haskell definitions from those involved in I/O (and its observable effects) - translating this variation of Philip Wadler's echo:
val echoML : unit -> unit
fun echoML () = let val c = getcML () in
if c = #"\n" then
()
else
let val _ = putcML c in
echoML ()
end
fun putcML c = TextIO.output1(TextIO.stdOut,c);
fun getcML () = valOf(TextIO.input1(TextIO.stdIn));
...could then be as simple as:
echo :: OI -> ()
echo u = let !(u1:u2:u3:_) = partsOI u in
let !c = getChar u1 in
if c == '\n' then
()
else
let !_ = putChar c u2 in
echo u3
where:
data OI -- abstract
foreign import ccall "primPartOI" partOI :: OI -> (OI, OI)
⋮
foreign import ccall "primGetCharOI" getChar :: OI -> Char
foreign import ccall "primPutCharOI" putChar :: Char -> OI -> ()
⋮
and:
partsOI :: OI -> [OI]
partsOI u = let !(u1, u2) = partOI u in u1 : partsOI u2
How would this work? At run-time, Main.main receives an initial OI pseudo-data value as an argument:
module Main(main) where
main :: OI -> ()
⋮
...from which other OI values are produced, using partOI or partsOI. All you have to do is ensure each new OI value is used at most once, in each call to an OI-based definition, foreign or otherwise. In return, you get back a plain ordinary result - it isn't e.g. paired with some odd abstract state, or requires the use of a callback continuation, etc.
Using OI, instead of the unit type () like Standard ML does, means we can avoid always having to use the monadic interface:
Once you're in the IO monad, you're stuck there forever, and are reduced to Algol-style imperative programming.
Robert Harper.
But if you really do need it:
type IO a = OI -> a
unitIO :: a -> IO a
unitIO x = \ u -> let !_ = partOI u in x
bindIO :: IO a -> (a -> IO b) -> IO b
bindIO m k = \ u -> let !(u1, u2) = partOI u in
let !x = m u1 in
let !y = k x u2 in
y
⋮
So, monadic types aren't always needed - there are other interfaces out there:
LML had a fully fledged implementation of oracles running of a multi-processor (a Sequent Symmetry) back in ca 1989. The description in the Fudgets thesis refers to this implementation. It was fairly pleasant to work with and quite practical.
[...]
These days everything is done with monads so other solutions are sometimes forgotten.
Lennart Augustsson (2006).
Wait a moment: since it so closely resembles Standard ML's direct use of effects, is this approach and its use of pseudo-data referentially transparent?
Absolutely - just find a suitable definition of "referential transparency"; there's plenty to choose from...

Resources