Monads in Haskell and Purity - haskell

My question is whether monads in Haskell actually maintain Haskell's purity, and if so how. Frequently I have read about how side effects are impure but that side effects are needed for useful programs (e.g. I/O). In the next sentence it is stated that Haskell's solution to this is monads. Then monads are explained to some degree or another, but not really how they solve the side-effect problem.
I have seen this and this, and my interpretation of the answers is actually one that came to me in my own readings -- the "actions" of the IO monad are not the I/O themselves but objects that, when executed, perform I/O. But it occurs to me that one could make the same argument for any code or perhaps any compiled executable. Couldn't you say that a C++ program only produces side effects when the compiled code is executed? That all of C++ is inside the IO monad and so C++ is pure? I doubt this is true, but I honestly don't know in what way it is not. In fact, didn't Moggi (sp?) initially use monads to model the denotational semantics of imperative programs?
Some background: I am a fan of Haskell and functional programming and I hope to learn more about both as my studies continue. I understand the benefits of referential transparency, for example. The motivation for this question is that I am a grad student and I will be giving 2 1-hour presentations to a programming languages class, one covering Haskell in particular and the other covering functional programming in general. I suspect that the majority of the class is not familiar with functional programming, maybe having seen a bit of scheme. I hope to be able to (reasonably) clearly explain how monads solve the purity problem without going into category theory and the theoretical underpinnings of monads, which I wouldn't have time to cover and anyway I don't fully understand myself -- certainly not well enough to present.
I wonder if "purity" in this context is not really well-defined?

It's hard to argue conclusively in either direction because "pure" is not particularly well-defined. Certainly, something makes Haskell fundamentally different from other languages, and it's deeply related to managing side-effects and the IO type¹, but it's not clear exactly what that something is. Given a concrete definition to refer to we could just check if it applies, but this isn't easy: such definitions will tend to either not match everyone's expectations or be too broad to be useful.
So what makes Haskell special, then? In my view, it's the separation between evaluation and execution.
The base language—closely related to the λ-caluclus—is all about the former. You work with expressions that evaluate to other expressions, 1 + 1 to 2. No side-effects here, not because they were suppressed or removed but simply because they don't make sense in the first place. They're not part of the model² any more than, say, backtracking search is part of the model of Java (as opposed to Prolog).
If we just stuck to this base language with no added facilities for IO, I think it would be fairly uncontroversial to call it "pure". It would still be useful as, perhaps, a replacement for Mathematica. You would write your program as an expression and then get the result of evaluating the expression at the REPL. Nothing more than a fancy calculator, and nobody accuses the expression language you use in a calculator of being impure³!
But, of course, this is too limiting. We want to use our language to read files and serve web pages and draw pictures and control robots and interact with the user. So the question, then, is how to preserve everything we like about evaluating expressions while extending our language to do everything we want.
The answer we've come up with? IO. A special type of expression that our calculator-like language can evaluate which corresponds to doing some effectful actions. Crucially, evaluation still works just as before, even for things in IO. The effects get executed in the order specified by the resulting IO value, not based on how it was evaluated. IO is what we use to introduce and manage effects into our otherwise-pure expression language.
I think that's enough to make describing Haskell as "pure" meaningful.
footnotes
¹ Note how I said IO and not monads in general: the concept of a monad is immensely useful for dozens of things unrelated to input and output, and the IO types has to be more than just a monad to be useful. I feel the two are linked too closely in common discourse.
² This is why unsafePerformIO is so, well, unsafe: it breaks the core abstraction of the language. This is the same as, say, putzing with specific registers in C: it can both cause weird behavior and stop your code from being portable because it goes below C's level of abstraction.
³ Well, mostly, as long as we ignore things like generating random numbers.

A function with type, for example, a -> IO b always returns an identical IO action when given the same input; it is pure in that it cannot possibly inspect the environment, and obeys all the usual rules for pure functions. This means that, among other things, the compiler can apply all of its usual optimization rules to functions with an IO in their type, because it knows they are still pure functions.
Now, the IO action returned may, when run, look at the environment, read files, modify global state, whatever, all bets are off once you run an action. But you don't necessarily have to run an action; you can put five of them into a list and then run them in reverse of the order in which you created them, or never run some of them at all, if you want; you couldn't do this if IO actions implicitly ran themselves when you created them.
Consider this silly program:
main :: IO ()
main = do
inputs <- take 5 . lines <$> getContents
let [line1,line2,line3,line4,line5] = map print inputs
line3
line1
line2
line5
If you run this, and then enter 5 lines, you will see them printed back to you but in a different order, and with one omitted, even though our haskell program runs map print over them in the order they were received. You couldn't do this with C's printf, because it immediately performs its IO when called; haskell's version just returns an IO action, which you can still manipulate as a first-class value and do whatever you want with.

I see two main differences here:
1) In haskell, you can do things that are not in the IO monad. Why is this good? Because if you have a function definitelyDoesntLaunchNukes :: Int -> IO Int you don't know that the resulting IO action doesn't launch nukes, it might for all you know. cantLaunchNukes :: Int -> Int will definitely not launch any nukes (barring any ugly hacks that you should avoid in nearly all circumstances).
2) In haskell, it's not just a cute analogy: IO actions are first class values. You can put them in lists, and leave them there for as long as you want, they won't do anything unless they somehow become part of the main action. The closest that C has to that are function pointers, which are quite a bit more cumbersome to use. In C++ (and most modern imperative languages really) you have closures which technically could be used for this purpose, but rarely are - mainly because Haskell is pure and they aren't.
Why does that distinction matter here? Well, where are you going to get your other IO actions/closures from? Probably, functions/methods of some description. Which, in an impure language, can themselves have side effects, rendering the attempt of isolating them in these languages pointless.

fiction-mode: Active
It was quite a challenge, and I think a wormhole could be forming in the neighbour's backyard, but I managed to grab part of a Haskell I/O implementation from an alternate reality:
class Kleisli k where
infixr 1 >=>
simple :: (a -> b) -> (a -> k b)
(>=>) :: (a -> k b) -> (b -> k c) -> a -> k c
instance Kleisli IO where
simple = primSimpleIO
(>=>) = primPipeIO
primitive primSimpleIO :: (a -> b) -> (a -> IO b)
primitive primPipeIO :: (a -> IO b) -> (b -> IO c) -> a -> IO c
Back in our slightly-mutilated reality (sorry!), I have used this other form of Haskell I/O to define our form of Haskell I/O:
instance Monad IO where
return x = simple (const x) ()
m >>= k = (const m >=> k) ()
and it works!
fiction-mode: Offline
My question is whether monads in Haskell actually maintain Haskell's purity, and if so how.
The monadic interface, by itself, doesn't maintain restrain the effects - it is only an interface, albeit a jolly-versatile one. As my little work of fiction shows, there are other possible interfaces for the job - it's just a matter of how convenient they are to use in practice.
For an implementation of Haskell I/O, what keeps the effects under control is that all the pertinent entities, be they:
IO, simple, (>=>) etc
or:
IO, return, (>>=) etc
are abstract - how the implementation defines those is kept private.
Otherwise, you would be able to devise "novelties" like this:
what_the_heck =
do spare_world <- getWorld -- how easy was that?
launchMissiles -- let's mess everything up,
putWorld spare_world -- and bring it all back :-D
what_the_heck -- that was fun; let's do it again!
(Aren't you glad our reality isn't quite so pliable? ;-)
This observation extends to types like ST (encapsulated state) and STM (concurrency) and their stewards (runST, atomically etc). For types like lists, Maybe and Either, their orthodox definitions in Haskell means no visible effects.
So when you see an interface - monadic, applicative, etc - for certain abstract types, any effects (if they exist) are contained by keeping its implementation private; safe from being used in aberrant ways.

Related

Is the monadic IO construct in Haskell just a convention?

Regarding Haskell's monadic IO construct:
Is it just a convention or is there is a implementation reason for it?
Could you not just FFI into libc.so instead to do your I/O, and skip the whole IO-monad thing?
Would it work anyway, or is the outcome nondeterministic because of:
(a) Haskell's lazy evaluation?
(b) another reason, like that the GHC is pattern-matching for the IO monad and then handling it in a special way (or something else)?
What is the real reason - in the end you end up using a side effect, so why not do it the simple way?
Yes, monadic I/O is a consequence of Haskell being lazy. Specifically, though, monadic I/O is a consequence of Haskell being pure, which is effectively necessary for a lazy language to be predictable.†
This is easy to illustrate with an example. Imagine for a moment that Haskell were not pure, but it was still lazy. Instead of putStrLn having the type String -> IO (), it would simply have the type String -> (), and it would print a string to stdout as a side-effect. The trouble with this is that this would only happen when putStrLn is actually called, and in a lazy language, functions are only called when their results are needed.
Here’s the trouble: putStrLn produces (). Looking at a value of type () is useless, because () means “boring”. That means that this program would do what you expect:
main :: ()
main =
case putStr "Hello, " of
() -> putStrLn " world!"
-- prints “Hello, world!\n”
But I think you can agree that programming style is pretty odd. The case ... of is necessary, however, because it forces the evaluation of the call to putStr by matching against (). If you tweak the program slightly:
main :: ()
main =
case putStr "Hello, " of
_ -> putStrLn " world!"
…now it only prints world!\n, and the first call isn’t evaluated at all.
This actually gets even worse, though, because it becomes even harder to predict as soon as you start trying to do any actual programming. Consider this program:
printAndAdd :: String -> Integer -> Integer -> Integer
printAndAdd msg x y = putStrLn msg `seq` (x + y)
main :: ()
main =
let x = printAndAdd "first" 1 2
y = printAndAdd "second" 3 4
in (y + x) `seq` ()
Does this program print out first\nsecond\n or second\nfirst\n? Without knowing the order in which (+) evaluates its arguments, we don’t know. And in Haskell, evaluation order isn’t even always well-defined, so it’s entirely possible that the order in which the two effects are executed is actually completely impossible to determine!
This problem doesn’t arise in strict languages with a well-defined evaluation order, but in a lazy language like Haskell, we need some additional structure to ensure side-effects are (a) actually evaluated and (b) executed in the correct order. Monads happen to be an interface that elegantly provide the necessary structure to enforce that order.
Why is that? And how is that even possible? Well, the monadic interface provides a notion of data dependency in the signature for >>=, which enforces a well-defined evaluation order. Haskell’s implementation of IO is “magic”, in the sense that it’s implemented in the runtime, but the choice of the monadic interface is far from arbitrary. It seems to be a fairly good way to encode the notion of sequential actions in a pure language, and it makes it possible for Haskell to be lazy and referentially transparent without sacrificing predictable sequencing of effects.
It’s worth noting that monads are not the only way to encode side-effects in a pure way—in fact, historically, they’re not even the only way Haskell handled side-effects. Don’t be misled into thinking that monads are only for I/O (they’re not), only useful in a lazy language (they’re plenty useful to maintain purity even in a strict language), only useful in a pure language (many things are useful monads that aren’t just for enforcing purity), or that you needs monads to do I/O (you don’t). They do seem to have worked out pretty well in Haskell for those purposes, though.
† Regarding this, Simon Peyton Jones once noted that “Laziness keeps you honest” with respect to purity.
Could you just FFI into libc.so instead to do IO and skip the IO Monad thing?
Taking from https://en.wikibooks.org/wiki/Haskell/FFI#Impure_C_Functions, if you declare an FFI function as pure (so, with no reference to IO), then
GHC sees no point in calculating twice the result of a pure function
which means the the result of the function call is effectively cached. For example, a program where a foreign impure pseudo-random number generator is declared to return a CUInt
{-# LANGUAGE ForeignFunctionInterface #-}
import Foreign
import Foreign.C.Types
foreign import ccall unsafe "stdlib.h rand"
c_rand :: CUInt
main = putStrLn (show c_rand) >> putStrLn (show c_rand)
returns the same thing every call, at least on my compiler/system:
16807
16807
If we change the declaration to return a IO CUInt
{-# LANGUAGE ForeignFunctionInterface #-}
import Foreign
import Foreign.C.Types
foreign import ccall unsafe "stdlib.h rand"
c_rand :: IO CUInt
main = c_rand >>= putStrLn . show >> c_rand >>= putStrLn . show
then this results in (probably) a different number returned each call, since the compiler knows it's impure:
16807
282475249
So you're back to having to use IO for the calls to the standard libraries anyway.
Let's say using FFI we defined a function
c_write :: String -> ()
which lies about its purity, in that whenever its result is forced it prints the string. So that we don't run into the caching problems in Michal's answer, we can define these functions to take an extra () argument.
c_write :: String -> () -> ()
c_rand :: () -> CUInt
On an implementation level this will work as long as CSE is not too aggressive (which it is not in GHC because that can lead to unexpected memory leaks, it turns out). Now that we have things defined this way, there are many awkward usage questions that Alexis points out—but we can solve them using a monad:
newtype IO a = IO { runIO :: () -> a }
instance Monad IO where
return = IO . const
m >>= f = IO $ \() -> let x = runIO m () in x `seq` f x
rand :: IO CUInt
rand = IO c_rand
Basically, we just stuff all of Alexis's awkward usage questions into a monad, and as long as we use the monadic interface, everything stays predictable. In this sense IO is just a convention—because we can implement it in Haskell there is nothing fundamental about it.
That's from the operational vantage point.
On the other hand, Haskell's semantics in the report are specified using denotational semantics alone. And, in my opinion, the fact that Haskell has a precise denotational semantics is one of the most beautiful and useful qualities of the language, allowing me a precise framework to think about abstractions and thus manage complexity with precision. And while the usual abstract IO monad has no accepted denotational semantics (to the lament of some of us), it is at least conceivable that we could create a denotational model for it, thus preserving some of the benefits of Haskell's denotational model. However, the form of I/O we have just given is completely incompatible with Haskell's denotational semantics.
Simply put, there are only supposed to be two distinguishable values (modulo fatal error messages) of type (): () and ⊥. If we treat FFI as the fundamentals of I/O and use the IO monad only "as a convention", then we effectively add a jillion values to every type—to continue having a denotational semantics, every value must be adjoined with the possibility of performing I/O prior to its evaluation, and with the extra complexity this introduces, we essentially lose all our ability to consider any two distinct programs equivalent except in the most trivial cases—that is, we lose our ability to refactor.
Of course, because of unsafePerformIO this is already technically the case, and advanced Haskell programmers do need to think about the operational semantics as well. But most of the time, including when working with I/O, we can forget about all that and refactor with confidence, precisely because we have learned that when we use unsafePerformIO, we must be very careful to ensure it plays nicely, that it still affords us as much denotational reasoning as possible. If a function has unsafePerformIO, I automatically give it 5 or 10 times more attention than regular functions, because I need to understand the valid patterns of use (usually the type signature tells me everything I need to know), I need to think about caching and race conditions, I need to think about how deep I need to force its results, etc. It's awful[1]. The same care would be necessary of FFI I/O.
In conclusion: yes it's a convention, but if you don't follow it then we can't have nice things.
[1] Well actually I think it's pretty fun, but it's surely not practical to think about all those complexities all the time.
That depends on what the meaning of "is" is—or at least what the meaning of "convention" is.
If a "convention" means "the way things are usually done" or "an agreement among parties covering a particular matter" then it is easy to give a boring answer: yes, the IO monad is a convention. It is the way the designers of the language agreed to handle IO operations and the way that users of the language usually perform IO operations.
If we are allowed to choose a more interesting definition of "convention" then we can get a more interesting answer. If a "convention" is a discipline imposed on a language by its users in order to achieve a particular goal without assistance from the language itself, then the answer is no: the IO monad is the opposite of a convention. It is a discipline enforced by the language that assists its users in constructing and reasoning about programs.
The purpose of the IO type is to create a clear distinction between the types of "pure" values and the types of values which require execution by the runtime system to generate a meaningful result. The Haskell type system enforces this strict separation, preventing a user from (say) creating a value of type Int which launches the proverbial missiles. This is not a convention in the second sense: its entire goal is to move the discipline required to perform side effects in a safe and consistent way from the user and onto the language and its compiler.
Could you just FFI into libc.so instead to do IO and skip the IO Monad thing?
It is, of course, possible to do IO without an IO monad: see almost every other extant programming language.
Would it work anyway or is the outcome undeterministic because of Haskell evaluating lazy or something else, like that the GHC is pattern matching for IO Monad and then handling it in a special way or something else.
There is no such thing as a free lunch. If Haskell allowed any value to require execution involving IO then it would have to lose other things that we value. The most important of these is probably referential transparency: if myInt could sometimes be 1 and sometimes be 5 depending on external factors then we would lose most of our ability to reason about our programs in a rigorous way (known as equational reasoning).
Laziness was mentioned in other answers, but the issue with laziness would specifically be that sharing would no longer be safe. If x in let x = someExpensiveComputationOf y in x * x was not referentially transparent, GHC would not be able to share the work and would have to compute it twice.
What is the real reason?
Without the strict separation of effectful values from non-effectful values provided by IO and enforced by the compiler, Haskell would effectively cease to be Haskell. There are plenty of languages that don't enforce this discipline. It would be nice to have at least one around that does.
In the end you end you endup in a sideeffect. So why not do it the simple way?
Yes, in the end your program is represented by a value called main with an IO type. But the question isn't where you end up, it's where you start: If you start by being able to differentiate between effectful and non-effectful values in a rigorous way then you gain a lot of advantages when constructing that program.
What is the real reason - in the end you end up using a side effect, so why not do it the simple way?
...you mean like Standard ML? Well, there's a price to pay - instead of being able to write:
any :: (a -> Bool) -> [a] -> Bool
any p = or . map p
you would have to type out this:
any :: (a -> Bool) -> [a] -> Bool
any p [] = False
any p (y:ys) = y || any p ys
Could you not just FFI into libc.so instead to do your I/O, and skip the whole IO-monad thing?
Let's rephrase the question:
Could you not just do I/O like Standard ML, and skip the whole IO-monad thing?
...because that's effectively what you would be trying to do. Why "trying"?
SML is strict, and relies on sytactic ordering to specify the order of evaluation everywhere;
Haskell is non-strict, and relies on data dependencies to specify the order of evaluation for certain expressions e.g. I/O actions.
So:
Would it work anyway, or is the outcome nondeterministic because of:
(a) Haskell's lazy evaluation?
(a) - the combination of non-strict semantics and visible effects is generally useless. For an amusing exhibition of just how useless this combination can be, watch this presentation by Erik Meiyer (the slides can be found here).

Functional programming : Where does the side effect actually happen?

After beginning to learn Haskell, there something I don't understand in Haskell even after reading a lot of documentation.
I understand that to perform IO operation you have to use a "IO monad" that wrapped a value in a kind of "black box" so the any function that uses the IO monad is still purely functional. OK, fine, but then where does the IO operation actually happen?
Does it mean that a Monad itself is not purely functional? Or is the IO operation implemented, let's say in C, and "embedded" inside the Haskell compiler?
Can I write, in pure Haskell, with or without a Monad, something that will do an IO operation? And if not, where does this ability come from if it's not possible inside the language itself? If it's embedded/linked inside the Haskell compiler to chunks of C code, will that eventually be called by the IO Monad to do the "dirty job"?
As a preface, it's not "the IO Monad", despite what many poorly-written introductions say. It's just "the IO type". There's nothing magical about monads. Haskell's Monad class is a pretty boring thing - it's just unfamiliar and more abstract than what most languages can support. You don't ever see anyone call IO "the IO Alternative," even though IO implements Alternative. Focusing too much on Monad only gets in the way of learning.
The conceptual magic for principled handling of effects (not side-effects!) in pure languages is the existence of an IO type at all. It's a real type. It's not some flag saying "This is impure!". It's a full Haskell type of kind * -> *, just like Maybe or []. Type signatures accepting IO values as arguments, like IO a -> IO (Maybe a), make sense. Type signatures with nested IO make sense, like IO (IO a).
So if it's a real type, it must have a concrete meaning. Maybe a as a type represents a possibly-missing value of type a. [a] means 0 or more values of type a. IO a means a sequence of effects that produce a value of type a.
Note that the entire purpose of IO is to represent a sequence of effects. As I said above, they're not side effects. They can't be hidden away in an innocuous-looking leaf of the program and mysteriously change things behind the back of other code. Instead, effects are rather explicitly called out by the fact that they're an IO value. This is why people attempt to minimize the part of their program using IO types. The less that you do there, the fewer ways there are for spooky action at a distance to interfere with your program.
As to the main thrust of your question, then - a complete Haskell program is an IO value called main, and the collection of definitions it uses. The compiler, when it generates code, inserts a decidedly non-Haskell block of code that actually runs the sequence of effects in an IO value. In some sense, this is what Simon Peyton Jones (one of the long-time authors of GHC) was getting at in his talk Haskell is useless.
It's true that whatever actually executes the IO action cannot remain conceptually pure. (And there is that very impure function that runs IO actions exposed within the Haskell language. I won't say more about it than it was added to support the foreign function interface, and using it improperly will break your program very badly.) But the point of Haskell is to provide a principled interface to the effect system, and hide away the unprincipled bits. And it does that in a way that's actually quite useful in practice.

Understanding pure functions in Haskell with IO

Given a Haskell value (edit per Rein Heinrich's comment) f:
f :: IO Int
f = ... -- ignoring its implementation
Quoting "Type-Driven Development with Idris,"
The key property of a pure function is that the same inputs always produce the same result. This property is known as referential transparency
Is f, and, namely all IO ... functions in Haskell, pure? It seems to me that they are not since, lookInDatabase :: IO DBThing, won't always return the same value since:
at t=0, the DB might be down
at t=1, the DB might be up and return MyDbThing would result
In short, is f (and IO ... functions in general) pure? If yes, then please correct my incorrect understanding given my attempt to disprove the functional purity of f with my t=... examples.
IO is really a separate language, conceptually. It's the language of the Haskell RTS (runtime system). It's implemented in Haskell as a (relatively simple) embedded DSL whose "scripts" have the type IO a.
So Haskell functions that return values of type IO a, are actually not the functions that are being executed at runtime — what gets executed is the IO a value itself. So these functions actually are pure but their return values represent non-pure computations.
From a language design point of view, IO is a really elegant hack to keep the non-pure ugliness completely isolated away while at the same integrating it tightly into its pure surroundings, without resorting to special casing. In other words, the design does not solve the problems caused by impure IO but it does a great job of at least not affecting the pure parts of your code.
The next step would be to look into FRP — with FRP you can make the layer that contains IO even thinner and move even more of non-pure logic into pure logic.
You might also want to read John Backus' writings on the topic of Function Programming, the limitations of the Von Neumann architecture etc. Conal Elliott is also a name to google if you're interested in the relationship between purity and IO.
P.S. also worth noting is that while IO is heavily reliant on monads to work around an aspect of lazy evaluation, and because monads are a very nice way of structuring embedded DSLs (of which IO is just a single example), monads are much more general than IO, so try not to think about IO and monads in the same context too much — they are two separate things and both could exist without the other.
First of all, you're right in noticing that I/O actions are not pure. That's impossible. But, purity in all functions is one of Haskell's promising points, so what's happening?
Whether you like it or not, a function that applies into a (may also be incorrectly said "returns a") IO Something with some arguments will always return the same IO Something with the same arguments. The IO monad allows you to "hide" actions inside of the container the monad acts like. When you have a IO String, that function/object does not contain a String/[Char], but rather sort of a promise that you'll get that String somehow in the future. Thus, IO contains information of what to do when the impure I/O action needs to be performed.
After all, the only way for an IO action to be performed is by it having the name main, or be a dependency of main thereof. Because of the flexibility of monads, you can "concatenate" IO actions. A program like this... (note: this code is not a good idea)
main = do
input <- getLine
putStrLn input
Is syntatic sugar for...
main =
getLine >>= (\input -> putStrLn input)
That would state that main is the I/O action resulting from printing to standard output a string read from standard input, followed by a newline character. Did you saw the magic? IO is just a wrapper representing what to do, in an impure context, to produce some given output, but not the result of that operation, because that would need the Haskell language to admit impure code.
Think of it as sort of a receipe. If you have a receipe (read: IO monad) for a cake (read: Something in IO Something), you know how to make the cake, but you can't make the cake (because you could screw that masterpiece). Instead, the master chief (read: the most basic parts of the Haskell runtime system, responsible for applying main) does the dirty work for you (read: doing impure/illegal stuff), and, the best of all, he won't commit any mistakes (read: breaking code purity)... unless the oven breaks of course (read: System.IO.Error), but he knows how to clean that up (read: code will always remain pure).
This is one of the reasons that IO is an opaque type. It's implementation is somewhat controversial (until you read GHC's source code), and is better of to be left as implementation-defined.
Just be happy, because you've been illuminated by purity. A lot of programmers don't even know of Haskell's existence!
I hope this has led some light on you!
Haskell is pulling a trick here. IO both is and isn't pure, depending on how you look at it.
On the "IO is pure" side, you're fallen into the very common error of thinking of a function returning an IO DBThing as of it were returning a DBThing. When someone claims that a function with type Stuff -> IO DBThing is pure they are not saying that you can feed it the same Stuff and always get the same DBThing; as you correctly note that is impossible, and also not very useful! What they're saving is that given particular Stuff you'll always get back the same IO DBThing.
You actually can't get a DBThing out of an IO DBThing at all, so Haskell don't ever have to worry about the database containing different values (or being unavailable) at different times. All you can do with an IO DBThing is combine it with something else that needs a DBThing and produces some other kind of IO thing; the result of such a combination is an IO thing.
What Haskell is doing here is building up a correspondence between manipulation of pure Haskell values and changes that would happen out in the world outside the program. There are things you can do with some ordinary pure values that don't make any sense with impure operations like altering the state of a database. So using the correspondence between IO values and the outside world, Haskell simply doesn't provide you with any operations on IO values that would correspond to things that don't make sense in the real world.
There are several ways to explain how you're "purely" manipulating the real world. One is to say that IO is just like a state monad, only the state being threaded through is the entire world outside your program;= (so your Stuff -> IO DBThing function really has an extra hidden argument that receives the world, and actually returns a DBThing along with another world; it's always called with different worlds, and that's why it can return different DBThing values even when called with the same Stuff). Another explanation is that an IO DBThing value is itself an imperative program; your Haskell program is a totally pure function doing no IO, which returns an impure program that does IO, and the Haskell runtime system (impurely) executes the program it returns.
But really these are both simply metaphors. The point is that the IO value simply has a very limited interface which doesn't allow you to do anything that doesn't make sense as a real world action.
Note that the concept of monad hasn't actually come into this. Haskell's IO system really doesn't depend on monads; Monad is just a convenient interface which is sufficiently limited that if you're only using the generic monad interface you also can't break the IO limitations (even if you don't know your monad is actually IO). Since the Monad interface is also interesting enough to write a lot of useful programs, the fact that IO forms a monad allows a lot of code that's useful on other types to be generically reused on IO.
Does this mean you actually get to write pure IO code? Not really. This is the "of course IO isn't pure" side of the coin. When you're using the fancy "combining IO functions together" you still have to think about your program executing steps one after the other (or in parallel), affecting and being affected by outside conditions and systems; in short exactly the same kind of reasoning you have to use to write IO code in an imperative language (only with a nicer type system than most of them). Making IO pure doesn't really help you banish impurity from the way you have to think about your code.
So what's the point? Well for one, it gets us a compiler-enforced demarcation of code that can do IO and code that can't. If there's no IO tag on the type then impure IO isn't involved. That would be useful in any language just on its own. And the compiler knows this too; Haskell compilers can apply optimizations to non-IO code that would be invalid in most other languages because it's often impossible to know that a given section of code doesn't have side effects (unless you can see the full implementation of everything the code calls, transitively).
Also, because IO is pure, code analysis tools (including your brain) don't have to treat IO-code specially. If you can pick out a code transformation that would be valid on pure code with the same structure as the IO code, you can do it on the IO code. Compilers make use of this. Many transformations are ruled out by the structure that IO code must use (in order to stay within the bounds of things that have a sensible correspondence to things in the outside world) but they would also be ruled out by any pure code that used the same structure; the careful construction of the IO interface makes "execution order dependency" look like ordinary "data dependency", so you can just use the rules of data dependency to determine the rules of using IO.
Short answer: Yes, that f is referential transparent.
Whenever you look at it, it equals the same value.
But that doesn't mean it will always bind the same value.
In short, is f (and IO ... functions in general) pure?
So what you're really asking is:
Are IO definitions in Haskell pure?
You're really not going to like it.
Deep Thought.
It depends on what you mean by "pure".
From section 6.1.7 (page 75) of the Haskell 2010 report:
The IO type serves as a tag for operations (actions) that interact with the outside world. The IO type is abstract: no constructors are visible to the user. IO is an instance of the Monad and Functor classes.
the crucial point being:
The IO type is abstract
If Haskell's FFI was sufficiently-enhanced, IO could be as simple as:
data IO a -- a tag type: no visible constructors
instance Monad IO where
return = unitIO
(>>=) = bindIO
foreign import ccall "primUnitIO" unitIO :: a -> IO a
foreign import ccall "primBindIO" bindIO :: IO a -> (a -> IO b) -> IO b
⋮
No Haskell definitions whatsoever: all I/O-related activity is performed by calls to foreign code, usually written in the same language as the Haskell implementation. A variation of this approach
is used in Agda:
4 Compiling Agda programs
This section deals with the topic of getting Agda programs to interact
with the real world. Type checking Agda programs requires evaluating
arbitrary terms, ans as long as all terms are pure and normalizing this is
not a problem, but what happens when we introduce side effects? Clearly,
we don't want side effects to happen at compile time. Another question is
what primitives the language should provide for constructing side effecting
programs. In Agda, these problems are solved by allowing arbitrary
Haskell functions to be imported as axioms. At compile time, these imported
functions have no reduction behaviour, only at run time is the
Haskell function executed.
(emphasis by me.)
By moving the problem of I/O outside of Haskell or Agda, questions of "purity" are now a matter for that other language (or languages!).
Given these circumstances, there can be no "standard definition" for IO, so there's no common way to determine such a property for that type, let alone any of its expressions. We can't even provide a simple proof that IO is monadic (i.e. it satisfies the monad laws) as return and (>>=) simply cannot be defined in standard Haskell 2010.
To get some idea on how this affects the determining of various IO-related properties, see:
Semantics of fixIO by Levent Erkok, John Launchbury and Andrew Moran.
Tackling the Awkward Squad: … by Simon Peyton Jones (starting from section 3.2 on page 20).
So when you next hear or read about Haskell being "referentially transparent" or "purely functional", you now know that (at least for I/O) they're just conjectures - no actual standard definition means there's no way to prove or disprove them.
(If you're now wondering how Haskell got into this state, I provide some more details here.)

Aren't Monads essentially just "conceptual" sugar?

Let's assume we allow two types of functions in Haskell:
strictly pure (as usual)
potentially non-pure (procedures)
The distinction would be made f.x. by declaring that a dot (".") as first letter of a function name declares it as a non-pure procedure.
And further we would set the rules:
pure functions may be called by pure and non-pure functions
non-pure functions may only be called by non-pure functions
and: non-pure functions may be programmed imperatively
With this syntactical sugar and specifications at hand - would there still be a need for Monads? Is there anything Monads could do which above rule set would not allow?
B/c as I came to understand Monads - this is exactly their purpose. Just that very smart people managed to achieve this purely by means of functional methods with a cathegory theoretical tool set at hand.
No.
Monads have nothing to do with purity or impurity in principle. It just so happens that IO models impure code nicely, but Monad class can be used perfectly right for instances like State or Maybe, which are absolutely pure.
Monads also allow expressing complex context hierarchies (as I choose to call them) in a very explicit way. "pure/impure" isn't the only division you might want to make. Consider authorized/unauthorized, for example. The list of possible uses goes on and on... I'd encourage you to look at other commonly used instances, like ST, STM, RWS, "restricted IO" and friends to get a better understanding of the possibilities.
Soon enough you'll start making your own monads, tailored to the problem at hand.
B/c as I came to understand Monads - this is exactly their purpose.
Monads in their full generality have nothing to do with purity/impurity or imperative sequencing. So no, monads are most certainly not conceptual sugar for effect encapsulation if I understood your question.
Consider that overwhelmingly most of the monads in the Prelude: List, Reader, State, Cont, Either, (->) have nothing to do with effects or IO. It's a very common misconception to assume that IO is the "canonical" monad, while in fact it's really a degenerate case.
B/c as I came to understand Monads - this is exactly their purpose.
This: http://homepages.inf.ed.ac.uk/wadler/topics/monads.html#monads was the first paper on monads in Haskell:
Category theorists invented monads in the 1960's to concisely express certain aspects of universal algebra.
So right away you can see that monads have nothing to do with "pure" / "impure" computations. The most common monad (in the world!) is Maybe:
data Maybe a
= Nothing
| Just a
instance Monad Maybe where
return = Just
Nothing >>= f = Nothing
Just x >>= f = f x
The monad is the four-tuple (Maybe, liftM, return, join), where:
liftM :: (a -> b) -> Maybe a -> Maybe b
liftM f mb = mb >>= return . f
join :: Maybe (Maybe a) -> Maybe a
join Nothing = Nothing
join (Just mb) = mb
Note that liftM takes a non-Maybe function (not pure!) and applies it to a Maybe, while join takes a two-level Maybe and simplifies it to a single layer (so Just in the result comes from having two layers of Just:
join (Just (Just x)) = Just x
while Nothing in the result can come from a Nothing at either layer:
join Nothing = Nothing
join (Just Nothing) = Nothing
). We can translate these terms as follows:
Maybe: a value that may or may not be present.
liftM: apply this function to the value if present, otherwise do nothing.
return: take this value that is present and inject it into the extra structure of a Maybe.
join: take a (value that may or may not be present) that may or may not be present and erase the distinction between the two layers of 'may or may not be present'.
Now, Maybe is a perfectly suitable data type. In scripting languages, it's expressed just by using undef or the equivalent to express Nothing, and representing Just x the same way as x. In C/C++, it's expressed by using a pointer type t*, and allowing the pointer to be NULL. In Scala, there's an explicit container type: http://www.scala-lang.org/api/current/index.html#scala.Option . So you can't say "oh, that's just exceptions" because languages with exceptions still want to be able to express 'no value here' without throwing an exception, and then apply functions if the value is there (which is why Scala's Option type has a foreach method). But 'apply this function if the value is there' is precisely what Maybe's >>= does! So it's a very useful operation.
You'll notice that C and the scripting languages don't generally allow the distinction between Nothing and Just Nothing to be made --- a value is either present or not. In a functional language --- like Haskell --- it's interesting to allow both versions, which is why we need join to erase that distinction when we're done with it. (And, mathematically, it's nicer to define >>= in terms of liftM and join than the other way around).
Incidentally, to clear up the common mis-conception about Haskell and IO: GHC's implementation of IO wrappers up the side-effectfulness of GHC's implementation of I/O. Even that is a terrible design decision of GHC --- imperative (which is different than impure!) I/O can be modeled monadically without impurity at any level of the system. You don't need side effects (at any layer of the system) to do I/O!

What other ways can state be handled in a pure functional language besides with Monads?

So I started to wrap my head around Monads (used in Haskell). I'm curious what other ways IO or state can be handled in a pure functional language (both in theory or reality). For example, there is a logical language called "mercury" that uses "effect-typing". In a program such as haskell, how would effect-typing work? How does other systems work?
There are several different questions involved here.
First, IO and State are very different things. State is easy to do
yourself: Just pass an extra argument to every function, and return an extra
result, and you have a "stateful function"; for example, turn a -> b into
a -> s -> (b,s).
There's no magic involved here: Control.Monad.State provides a wrapper that
makes working with "state actions" of the form s -> (a,s) convenient, as well
as a bunch of helper functions, but that's it.
I/O, by its nature, has to have some magic in its implementation. But there are
a lot of ways of expressing I/O in Haskell that don't involve the word "monad".
If we had an IO-free subset of Haskell as-is, and we wanted to invent IO from
scratch, without knowing anything about monads, there are many things we might
do.
For example, if all we want to do is print to stdout, we might say:
type PrintOnlyIO = String
main :: PrintOnlyIO
main = "Hello world!"
And then have an RTS (runtime system) which evaluates the string and prints it.
This lets us write any Haskell program whose I/O consists entirely of printing
to stdout.
This isn't very useful, however, because we want interactivity! So let's invent
a new type of IO which allows for it. The simplest thing that comes to mind is
type InteractIO = String -> String
main :: InteractIO
main = map toUpper
This approach to IO lets us write any code which reads from stdin and writes to
stdout (the Prelude comes with a function interact :: InteractIO -> IO ()
which does this, by the way).
This is much better, since it lets us write interactive programs. But it's
still very limited compared to all the IO we want to do, and also quite
error-prone (if we accidentally try to read too far into stdin, the program
will just block until the user types more in).
We want to be able to do more than read stdin and write stdout. Here's how
early versions of Haskell did I/O, approximately:
data Request = PutStrLn String | GetLine | Exit | ...
data Response = Success | Str String | ...
type DialogueIO = [Response] -> [Request]
main :: DialogueIO
main resps1 =
PutStrLn "what's your name?"
: GetLine
: case resps1 of
Success : Str name : resps2 ->
PutStrLn ("hi " ++ name ++ "!")
: Exit
When we write main, we get a lazy list argument and return a lazy list as a
result. The lazy list we return has values like PutStrLn s and GetLine;
after we yield a (request) value, we can examine the next element of the
(response) list, and the RTS will arrange for it to be the response to our
request.
There are ways to make working with this mechanism nicer, but as you can
imagine, the approach gets pretty awkward pretty quickly. Also, it's
error-prone in the same way as the previous one.
Here's another approach which is much less error-prone, and conceptually very
close to how Haskell IO actually behaves:
data ContIO = Exit | PutStrLn String ContIO | GetLine (String -> ContIO) | ...
main :: ContIO
main =
PutStrLn "what's your name?" $
GetLine $ \name ->
PutStrLn ("hi " ++ name ++ "!") $
Exit
The key is that instead of taking a "lazy list" of responses as one big
argument at he beginning of main, we make individual requests that accept one
argument at a time.
Our program is now just a regular data type -- a lot like a linked list, except
you can't just traverse it normally: When the RTS interprets main, sometimes
it encounters a value like GetLine which holds a function; then it has to get
a string from stdin using RTS magic, and pass that string to the function,
before it can continue. Exercise: Write interpret :: ContIO -> IO ().
Note that none of these implementations involve "world-passing".
"world-passing" isn't really how I/O works in Haskell. The actual
implementation of the IO type in GHC involves an internal type called
RealWorld, but that's only an implementation detail.
Actual Haskell IO adds a type parameter so we can write actions that
"produce" arbitrary values -- so it looks more like data IO a = Done a |
PutStr String (IO a) | GetLine (String -> IO a) | .... That gives us more
flexibility, because we can create "IO actions" that produce arbitrary
values.
(As Russell O'Connor points out,
this type is just a free monad. We can write a Monad instance for it easily.)
Where do monads come into it, then? It turns out that we don't need Monad for
I/O, and we don't need Monad for state, so why do we need it at all? The
answer is that we don't. There's nothing magical about the type class Monad.
However, when we work with IO and State (and lists and functions and
Maybe and parsers and continuation-passing style and ...) for long enough, we
eventually figure out that they behave pretty similarly in some ways. We might
write a function that prints every string in a list, and a function that runs
every stateful computation in a list and threads the state through, and they'll
look very similar to each other.
Since we don't like writing a lot of similar-looking code, we want a way to
abstract it; Monad turns out to be a great abstraction, because it lets us
abstract many types that seem very different, but still provide a lot of useful
functionality (including everything in Control.Monad).
Given bindIO :: IO a -> (a -> IO b) -> IO b and returnIO :: a -> IO a, we
could write any IO program in Haskell without ever thinking about monads. But
we'd probably end up replicating a lot of the functions in Control.Monad,
like mapM and forever and when and (>=>).
By implementing the common Monad API, we get to use the exact same code for
working with IO actions as we do with parsers and lists. That's really the only
reason we have the Monad class -- to capture the similarities between
different types.
Another major approach is uniqueness typing, as in Clean. The short story is that handles to state (including the real world) can only be used once, and functions that access mutable state return a new handle. This means that an output of the first call is an input of a second, forcing the sequential evaluation.
Effect typing is used in the Disciple Compiler for Haskell, but to the best of my knowledge it would take considerable compiler work to enable it in, say, GHC. I shall leave discussion of the details to those better-informed than myself.
Well, first what is state? It can manifest as a mutable variable, which you don't have in Haskell. You only have memory references (IORef, MVar, Ptr, etc.) and IO/ST actions to act on them.
However, state itself can be pure as well. To acknowledge that review the 'Stream' type:
data Stream a = Stream a (Stream a)
This is a stream of values. However an alternative way to interpret this type is a changing value:
stepStream :: Stream a -> (a, Stream a)
stepStream (Stream x xs) = (x, xs)
This gets interesting when you allow two streams to communicate. You then get the automaton category Auto:
newtype Auto a b = Auto (a -> (b, Auto a b))
This is really like Stream, except that now at every instant the stream gets some input value of type a. This forms a category, so one instant of a stream can get its value from the same instant of another stream.
Again a different interpretation of this: You have two computations that change over time and you allow them to communicate. So every computation has local state. Here is a type that is isomorphic to Auto:
data LS a b =
forall s.
LS s ((a, s) -> (b, s))
Take a look at A History of Haskell: Being Lazy With Class. It describes two different approaches to doing I/O in Haskell, before monads were invented: continuations and streams.
There is an approach called Functional Reactive Programming that represents time-varying values and/or event streams as a first-class abstraction. A recent example that comes to my mind is Elm (it is written in Haskell and has a syntax similar to Haskell).
I'm curious - what other ways I/O or state can be handled in a pure functional language (both in theory or reality)?
I'll just add to what's already been mentioned here (note: some of these approaches don't seem to have one, so there are a few "improvised names").
Approaches with freely-available descriptions or implementations:
"Orthogonal directives" - see An alternative approach to I/O by Maarten Fokkinga and Jan Kuper.
Pseudodata - see Nondeterminism with Referential Transparency in Functional Programming Languages by F. Warren Burton. The approach is used by Dave Harrison to implement clocks in his thesis
Functional Real-Time Programming: The Language Ruth And Its Semantics, and name supplies in the functional pearl On generating unique names by Lennart Augustsson, Mikael Rittri and Dan Synek; there are also a few library implementations in Hackage.
Witnesses - see Witnessing Side Effects by Tachio Terauchi and Alex Aiken.
Observers - see Assignments for Applicative Languages by Vipin Swarup, Uday S. Reddy and Evan Ireland.
Other approaches - references only:
System tokens:
L. Augustsson. Functional I/O Using System Tokens. PMG Memo 72, Dept Computer Science, Chalmers University of Technology, S-412 96 Göteborg, 1989.
"Effect trees":
Rebelsky S.A. (1992) I/O trees and interactive lazy functional programming. In: Bruynooghe M., Wirsing M. (eds) Programming Language Implementation and Logic Programming. PLILP 1992. Lecture Notes in Computer Science, vol 631. Springer, Berlin, Heidelberg.

Resources