Aren't Monads essentially just "conceptual" sugar? - haskell

Let's assume we allow two types of functions in Haskell:
strictly pure (as usual)
potentially non-pure (procedures)
The distinction could be made, for example, by declaring that a dot (".") as the first letter of a function name marks it as a non-pure procedure.
And further we would set the rules:
pure functions may be called by pure and non-pure functions
non-pure functions may only be called by non-pure functions
and: non-pure functions may be programmed imperatively
With this syntactic sugar and these rules at hand - would there still be a need for monads? Is there anything monads could do which the above rule set would not allow?
Because as I came to understand monads - this is exactly their purpose. It's just that very smart people managed to achieve this purely by functional means, with a category-theoretical toolset at hand.

No.
Monads have nothing to do with purity or impurity in principle. It just so happens that IO models impure code nicely, but the Monad class works perfectly well for instances like State or Maybe, which are absolutely pure.
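To make that point concrete, here is a small sketch of my own (not part of the original answer): sequencing Maybe computations through the Monad interface, with nothing impure in sight.
safeDiv :: Int -> Int -> Maybe Int
safeDiv _ 0 = Nothing
safeDiv x y = Just (x `div` y)

calc :: Int -> Int -> Int -> Maybe Int
calc a b c = do
  r1 <- safeDiv a b   -- a Nothing here aborts the whole block
  r2 <- safeDiv r1 c
  return (r2 + 1)
Everything here is a total, pure function; the Monad instance only supplies the plumbing for "stop at the first failure".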
Monads also allow expressing complex context hierarchies (as I choose to call them) in a very explicit way. "pure/impure" isn't the only division you might want to make. Consider authorized/unauthorized, for example. The list of possible uses goes on and on... I'd encourage you to look at other commonly used instances, like ST, STM, RWS, "restricted IO" and friends to get a better understanding of the possibilities.
Soon enough you'll start making your own monads, tailored to the problem at hand.

Because as I came to understand monads - this is exactly their purpose.
Monads in their full generality have nothing to do with purity/impurity or imperative sequencing. So no, monads are most certainly not conceptual sugar for effect encapsulation if I understood your question.
Consider that the overwhelming majority of the standard monads: List, Reader, State, Cont, Either, (->), have nothing to do with effects or IO. It's a very common misconception to assume that IO is the "canonical" monad, while in fact it's really a degenerate case.

Because as I came to understand monads - this is exactly their purpose.
This: http://homepages.inf.ed.ac.uk/wadler/topics/monads.html#monads was the first paper on monads in Haskell:
Category theorists invented monads in the 1960's to concisely express certain aspects of universal algebra.
So right away you can see that monads have nothing to do with "pure" / "impure" computations. The most common monad (in the world!) is Maybe:
data Maybe a
  = Nothing
  | Just a

instance Monad Maybe where
  return = Just
  Nothing >>= f = Nothing
  Just x  >>= f = f x
The monad is the four-tuple (Maybe, liftM, return, join), where:
liftM :: (a -> b) -> Maybe a -> Maybe b
liftM f mb = mb >>= return . f

join :: Maybe (Maybe a) -> Maybe a
join Nothing   = Nothing
join (Just mb) = mb
Note that liftM takes an ordinary, non-Maybe function (purity doesn't come into it!) and applies it to a Maybe, while join takes a two-level Maybe and flattens it to a single layer. A Just in the result comes from having two layers of Just:
join (Just (Just x)) = Just x
while a Nothing in the result can come from a Nothing at either layer:
join Nothing        = Nothing
join (Just Nothing) = Nothing
We can translate these terms as follows:
Maybe: a value that may or may not be present.
liftM: apply this function to the value if present, otherwise do nothing.
return: take this value that is present and inject it into the extra structure of a Maybe.
join: take a (value that may or may not be present) that may or may not be present and erase the distinction between the two layers of 'may or may not be present'.
Now, Maybe is a perfectly suitable data type. In scripting languages, it's expressed just by using undef or the equivalent to express Nothing, and representing Just x the same way as x. In C/C++, it's expressed by using a pointer type t*, and allowing the pointer to be NULL. In Scala, there's an explicit container type: http://www.scala-lang.org/api/current/index.html#scala.Option . So you can't say "oh, that's just exceptions" because languages with exceptions still want to be able to express 'no value here' without throwing an exception, and then apply functions if the value is there (which is why Scala's Option type has a foreach method). But 'apply this function if the value is there' is precisely what Maybe's >>= does! So it's a very useful operation.
You'll notice that C and the scripting languages don't generally allow the distinction between Nothing and Just Nothing to be made --- a value is either present or not. In a functional language --- like Haskell --- it's interesting to allow both versions, which is why we need join to erase that distinction when we're done with it. (And, mathematically, it's nicer to define >>= in terms of liftM and join than the other way around).
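To see the two-layer distinction in action, here is a small example of my own: a lookup table whose values are themselves optional, so a nested Maybe arises naturally and join collapses it.
import Control.Monad (join)

phoneBook :: [(String, Maybe String)]
phoneBook = [("alice", Just "555-1234"), ("bob", Nothing)]

nested :: Maybe (Maybe String)
nested = lookup "alice" phoneBook   -- Just (Just "555-1234")

flat :: Maybe String
flat = join nested                  -- Just "555-1234"

-- join (lookup "bob"   phoneBook) == Nothing  (inner Nothing: no number)
-- join (lookup "carol" phoneBook) == Nothing  (outer Nothing: no entry)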
Incidentally, to clear up the common misconception about Haskell and IO: GHC's implementation of IO wraps up the side-effectfulness of GHC's implementation of I/O. Even that is a terrible design decision of GHC --- imperative (which is different from impure!) I/O can be modeled monadically without impurity at any level of the system. You don't need side effects (at any layer of the system) to do I/O!

Related

Haskell functor implementation dealing with "BOX"?

In category theory, the functor concept is as below:
https://ncatlab.org/nlab/show/functor
In Haskell, the Functor type class can be expressed as:
fmap :: (a -> b) -> f a -> f b
https://hackage.haskell.org/package/base-4.14.0.0/docs/Data-Functor.html
and I can see that the two correspond well.
However, once we actually try to implement this functor concept in code, it seems impossible to define F or fmap as simply as in the diagram above.
In fact, there is a famous article about Functor/Monad.
Functors, Applicatives, And Monads In Pictures
Here,
"Simple enough. Lets extend this by saying that any value can be in a context. For now you can think of a context as a box that you can put a value in:"
or
"Here's what is happening behind the scenes when we write fmap (+3) (Just 2):"
(each quote is followed in the article by an illustration)
What I always feel is that the concept of Functor in category theory and the concept of wrap & unwrap to/from a "BOX" do not match well.
Question Point 1.
fmap :: (a -> b) -> f a -> f b
https://hackage.haskell.org/package/base-4.14.0.0/docs/Data-Functor.html
Where is the actual implementation of wrap&unwrap to/from "BOX" in Haskell?
Question Point 2.
Why does the concept of Functor in category theory not match well with the concept of wrap & unwrap to/from a "BOX"?
EDIT:
Even for the IO functor, during the composition process, f is unwrapped:
// f is unwrapped in composition process
const compose = g => f => x => g(f(x));
const fmap = compose;
const print = a => () => console.log(a);
// safely no side-effect
const todo = fmap(print("bar"))(print("foo"));
//side effect
todo(undefined); // foo bar
// with pipeline operator (ES.next)
//
// const todo = print("foo")
// |> fmap(print("bar"))
// |> fmap(print("baz"));
// todo(undefined); // foo bar baz
The ideas from category theory are so abstract that anyone who tries to provide an intuitive introduction runs the risk of simplifying concepts to the point where they may confuse people. As the author of an article series in this space, I can testify that it doesn't take much imprecise language before someone misunderstands the text.
I don't know the particular article, but I believe that it may exhibit the same trait. The wrap/unwrap metaphor fits a substantial subset of functors (e.g. Maybe, [], Either l, etc.), but not all.
Famously, you're not supposed to unwrap IO; that is by design. At that point, the wrap/unwrap metaphor falls apart. It's no longer valid in the face of IO.
Indeed, the concepts don't match. I'd say that the wrap/unwrap metaphor may be useful as an introduction, but as always, there's a limit to how much you can stretch a metaphor.
How are the Functor instances implemented? Most introductions to Haskell will show you how to write fmap for Maybe, [], and a few other types. It can also be a good exercise to implement them yourself, if you get the chance.
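For reference, here is roughly what such instances look like when written out by hand (primed names of my own, to avoid clashing with the Prelude; the real instances live in GHC's base library):
data Maybe' a = Nothing' | Just' a

instance Functor Maybe' where
  fmap _ Nothing'  = Nothing'
  fmap f (Just' x) = Just' (f x)

data List a = Nil | Cons a (List a)

instance Functor List where
  fmap _ Nil         = Nil
  fmap f (Cons x xs) = Cons (f x) (fmap f xs)
Note that neither instance "unwraps" anything for the caller: the pattern matching happens privately, inside fmap, and the result is handed back still inside the functor.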
GHC and its ecosystem are open source, so you can always look at the source code if you wonder how a particular instance is implemented.
Again, IO is a big exception to the rule. As far as I understand it, its Functor, Applicative, Monad, etc. instances aren't implemented in (Safe) Haskell, but rather in a small core of unsafe code (C or C++, I believe) that constitutes the core of the compiler and/or runtime environment. There's no (explicit, visible, safe) unwrapping going on with IO. I think it's more helpful to think of IO's Functor instance as the structure-preserving map that it is.
For more details about the correspondence between category theory and Haskell, I recommend Bartosz Milewski's article series on the topic.
Look at the arrows in the picture. There is no way to go from the functor level back to the non-functor level. You would need a function that goes from F(x) to x, but - as you can see - none is defined.
There are specific functors (like Maybe) that offer the "unwrapping" functionality, but such feature is always an add-on, it's delivered on top of something being a functor. For example you might say: Maybe is a functor, and also it has an interesting property: there's a partial function that maps Maybe X to X, and which reverses pure.
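A sketch of what such an add-on looks like for Maybe (essentially fromJust from Data.Maybe; the name unwrap is mine):
-- Nothing in the Functor interface provides this; it is extra structure:
unwrap :: Maybe a -> a
unwrap (Just x) = x
unwrap Nothing  = error "nothing to unwrap"  -- partial!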
UPDATE (after the additional question appeared):
The concepts of a box and a functor simply don't match. Also, as far as I know, no good metaphor for a functor (or a monad, or an applicative) has been found - and not for lack of trying. It's not even surprising: most abstractions lack good metaphors, precisely because abstractions and metaphors are polar opposites (in some way).
Abstraction strips a concept to its core, leaving only the barest essentials. Metaphor, on the other hand, extends a concept, widens the semantic space, suggests more meaning. When I say: "your eyes have the color of chocolate", I am abstracting a notion of a "color". But I also metaphorically associate the eyes and chocolate: I suggest that they have more in common than just the color: silky texture, sweetness, pleasure - all those concepts are present, although none were named. If I said "your eyes have the color of excrement" the abstraction used would be exactly the same - but the metaphorical meaning would be very different. I would not say it even to a logician, even though a logician would technically understand that the sentence is not offensive.
When on auto-pilot, most humans think in metaphors, not abstractions. Care must be taken when explaining the latter in terms of the former, because the meanings will spill over. When you hear "a box", the autopilot in your head tells you that you can put stuff in and get stuff out. Functor is not like that. So the metaphor is misleading.
Functor embodies an abstraction of a... box or a wrapper, but one that allows us to work on the contents WITHOUT ever unwrapping it. This lack of unwrapping is exactly the thing that makes functors interesting: otherwise fmap would just be syntactic sugar for unwrapping, applying a function and wrapping the result up. Studying functors lets us understand how much is possible without unwrapping values - and, what's even better and more enlightening, it lets us understand what is impossible without unwrapping. The steps which lead up to applicatives, arrows and monads show us how to overcome some of the limitations by allowing additional operations, but still without ever allowing unwrapping, because if we allowed unwrapping, the steps would make no sense (i.e. become trivial).

Haskell: parallel computation and the 'sequential property' of monads

I am confused about why the REPA function computeP packs its result in a monad. It has the following type signature.
computeP :: (Load r1 sh e, Target r2 e, Source r2 e, Monad m)
         => Array r1 sh e -> m (Array r2 sh e)
In this tutorial it says
The reason for this is that monads give a well defined notion of sequence and thus computeP enforces completion of parallel evaluation in a particular point of monadic computations.
Likewise, this answer on Stack Overflow states that
The reason why parallel computation in Repa must be monadic has to do partially with laziness, but mostly with Repa's inability to deal with nested parallelism. Sequential property of a Monad solves it for the most part[.]
Questions
What does having this 'sequential property' mean exactly?
How does a monad enforce this?
For the example of computeP: there is no constraint on which monad is used, so I could use the identity monad. Is it okay then to use the following function instead that just unpacks the monad, or will this give unexpected results because it lacks this sequential property? If it is okay, is there even a need to use a monad at all?
import Data.Functor.Identity
import Data.Array.Repa.Eval
import Data.Array.Repa
myComputeP :: (Load r1 sh e, Target r2 e, Source r2 e) => Array r1 sh e -> Array r2 sh e
myComputeP = runIdentity . computeP
Any help would be great.
This monad constraint is a heuristic trick. It helps disciplined users to avoid nested parallelism, but does nothing against malicious or clueless users.
Nested parallelism is the situation where, while computing some array in parallel, you end up having to compute another array in parallel. Repa does not support it (the reason is not important), so it tries to avoid it.
The type of computeP helps ensure that parallel computations are done in sequence with each other, but it is far from airtight; it is merely a "best effort" abstraction.
How does a monad enforce this?
Actually, computeP is only expected to work with monads whose bind (>>=) is strict in its first argument, so that in u >>= k, the function k will be applied only after u has been evaluated. Then if you use computeP with such a monad,
do w <- computeP v
   k w
it is guaranteed that the vector w is evaluated before it gets passed to k, which may safely perform other computeP operations.
Examples of strict monads: IO, strict State, Maybe, [].
Examples of lazy monads: Identity, lazy State, Reader. (Lazy monads can be made strict, but not the other way around. In particular, you can define a strict identity monad if you want to do only Repa computations.)
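A minimal sketch of such a strict identity monad (names are mine; if I recall correctly, a polished version is published on Hackage as strict-identity):
newtype StrictIdentity a = StrictIdentity { runStrictIdentity :: a }

instance Functor StrictIdentity where
  fmap f m = StrictIdentity (f (runStrictIdentity m))

instance Applicative StrictIdentity where
  pure = StrictIdentity
  mf <*> mx = StrictIdentity (runStrictIdentity mf (runStrictIdentity mx))

instance Monad StrictIdentity where
  m >>= k = k $! runStrictIdentity m  -- force the result before continuing
With this, runStrictIdentity . computeP recovers a pure-looking interface while keeping the "evaluate before continuing" guarantee that the lazy Identity monad loses.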
To prevent nested parallelism, the type of computeP intentionally makes it cumbersome to use within operations that are likely to be done in parallel, such as map :: (a -> b) -> Array _ _ a -> Array _ _ b and fromFunction :: sh -> (sh -> a) -> Array _ _ a, which take non-monadic functions. One can still explicitly unwrap computeP, for example with runIdentity as you noticed: you can shoot yourself in the foot if you want, but it's on you to load the gun, point it down and pull the trigger.
Hopefully, that answers what is going on in Repa. What follows is a theoretical digression to answer this other question:
What does having this 'sequential property' mean exactly?
Those quotations are quite handwavy. As I see it, the relation between "sequentiality" and "monads" is two-fold.
First, for many monads in Haskell, the definition of (>>=) naturally dictates an order of evaluation, typically because it immediately pattern-matches on the first argument. As explained earlier, that's what Repa relies on to enforce that computeP computations happen in sequence (and that's why it would break if you specialize it to Identity; it is not a strict monad). In the grand scheme of things, this is a rather minor detail of lazy evaluation, rather than anything proper to monads in general.
Second, one core idea of pure functional programming is first-class computations, with explicit definitions of effects and composition. In this case, the effect is the parallel evaluation of vectors, and the composition we care about is sequential composition. Monads provide a general model, or interface, for sequential composition. That's another part of the reason why monads help solve the problem of avoiding nested parallelism in Repa.
The point is not that monads have an inherently sequential aspect, but it is that sequential composition is inherently monadic: if you try to list general properties that you would expect from anything worthy of the name "sequential composition", you're likely to end up with properties that together are called "monad"; that's one of the points of Moggi's seminal paper "Notions of computation and monads".
"Monads" is not a magical concept, rather it's enormously general so that lots of things happen to be monads. After all, the main requirement is that there's an associative operation; that's a pretty reasonable assumption to make about sequential composition. (If, when you hear "associativity", you think "monoid" or "category", note that all of these concepts are unified under the umbrella of "monoid objects", so as far as "associativity" is concerned, they're all the same idea, just in different categories.)

What is a (composite) effect (possibly represented as a Monad + Monad Transformers)? A precise, clear, short answer/definition? [duplicate]

Time and again I read the term effectful, but I am still unable to give a clear definition of what it means. (I assume the correct context is effectful computations, but I've also seen the term effectful values.)
I used to think that effectful means having side effects. But in Haskell there are no side-effects (except to some extent IO). Still there are effectful computations all over the place.
Then I read that monads are used to create effectful computations. I can somewhat understand this in the context of the State Monad. But I fail to see any side-effect in the Maybe monad. In general it seems to me, that Monads which wrap a function-like thing are easier to see as producing side-effects than Monads which just wrap a value.
When it comes to Applicative functors I am even more lost. I always saw applicative functors as a way to map a function with more than one argument. I cannot see any side-effect here. Or is there a difference between effectful and with effects?
A side effect of a computation is an observable interaction with its environment (apart from computing its result value). In Haskell, we try hard to avoid functions with such side effects. This even applies to IO actions: when an IO action is evaluated, no side effects are performed; they are executed only when the actions prescribed in the IO value are executed within main.
However, when working with abstractions that are related to composing computations, such as applicative functors and monads, it's convenient to somewhat distinguish between the actual value and the "rest", which we often call an "effect". In particular, if we have a type f of kind * -> *, then in f a the a part is "the value" and whatever "remains" is "the effect".
I intentionally quoted the terms, as there is no precise definition (as far as I know), it's merely a colloquial definition. In some cases there are no values at all, or multiple values. For example for Maybe the "effect" is that there might be no value (and the computation is aborted), for [] the "effect" is that there are multiple (or zero) values. For more complex types this distinction can be even more difficult.
The distinction between "effects" and "values" doesn't really depend on the abstraction. Functor, Applicative and Monad just give us tools for what we can do with them (Functors allow us to modify the values inside, Applicatives allow us to combine effects, and Monads allow effects to depend on previous values). But in the context of Monads, it's somewhat easier to create a mental picture of what is going on, because a monadic action can "see" the result value of the previous computation, as witnessed by the
(>>=) :: m a -> (a -> m b) -> m b
operator: The second function receives a value of type a, so we can imagine "the previous computation had some effect and now there is its result value with which we can do something".
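A tiny example of my own to illustrate: in the Maybe monad, the "effect" (absence of a value) produced by each step can depend on the value delivered by the previous one.
halve :: Int -> Maybe Int
halve n
  | even n    = Just (n `div` 2)
  | otherwise = Nothing          -- the "effect": no value, the chain stops

quarter :: Int -> Maybe Int
quarter n = halve n >>= halve
-- quarter 12 == Just 3
-- quarter 6  == Nothing   (the second halve sees 3 and fails)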
In support of Petr Pudlák's answer, here is an argument concerning the origin of the broader notion of "effect" espoused there.
The phrase "effectful programming" shows up in the abstract of McBride and Patterson's Applicative Programming with Effects, the paper which introduced applicative functors:
In this paper, we introduce Applicative functors — an abstract characterisation of an applicative style of effectful programming, weaker than Monads and hence more widespread.
"Effect" and "effectful" appear in a handful of other passages of the paper; these ocurrences are deemed unremarkable enough not to require an explicit clarification. For instance, this remark is made just after the definition of Applicative is presented (p. 3):
In each example, there is a type constructor f that embeds the usual notion of value, but supports its own peculiar way of giving meaning to the usual applicative language [...] We correspondingly introduce the Applicative class:
[A Haskell definition of Applicative]
This class generalises S and K [i.e. the S and K combinators, which show up in the Reader/function Applicative instance] from threading an environment to threading an effect in general.
From these quotes, we can infer that, in this context:
Effects are the things that Applicative threads "in general".
Effects are associated with the type constructors that are given Applicative instances.
Monad also deals with effects.
Following these leads, we can trace back this usage of "effect" back to at least Wadler's papers on monads. For instance, here is a quote from page 6 of Monads for functional programming:
In general, a function of type a → b is replaced by a function of type a → M b. This can be read as a function that accepts an argument of type a and returns a result of type b, with a possible additional effect captured by M. This effect may be to act on state, generate output, raise an exception, or what have you.
And from the same paper, page 21:
If monads encapsulate effects and lists form a monad, do lists correspond to some effect? Indeed they do, and the effect they correspond to is choice. One can think of a computation of type [a] as offering a choice of values, one for each element of the list. The monadic equivalent of a function of type a → b is a function of type a → [b].
The "correspond to some effect" turn of phrase here is key. It ties back to the more straightforward claim in the abstract:
Monads provide a convenient framework for simulating effects found in other languages, such as global state, exception handling, output, or non-determinism.
The pitch is that monads can be used to express things that, in "other languages", are typically encoded as side-effects -- that is, as Petr Pudlák puts it in his answer here, "an observable interaction with [a function's] environment (apart from computing its result value)". Through metonymy, that has readily led to "effect" acquiring a second meaning, broader than that of "side-effect" -- namely, whatever is introduced through a type constructor which is a Monad instance. Over time, this meaning was further generalised to cover other functor classes such as Applicative, as seen in McBride and Patterson's work.
In summary, I consider "effect" to have two reasonable meanings in Haskell parlance:
A "literal" or "absolute" one: an effect is a side-effect; and
A "generalised" or "relative" one: an effect is a functorial context.
On occasion, avoidable disagreements over terminology happen when each of the involved parties implicitly assumes a different meaning of "effect". Another possible point of contention involves whether it is legitimate to speak of effects when dealing with Functor alone, as opposed to subclasses such as Applicative or Monad (I believe it is okay to do so, in agreement with Petr Pudlák's answer to Why can applicative functors have side effects, but functors can't?).
To my mind, a "side effect" is anything that a normal function couldn't do. In other words, anything in addition to just returning a value.
Consider the following code block:
let
  y = foo x
  z = bar y
in foobar z
This calls foo, and then calls bar, and then calls foobar, three ordinary functions. Simple enough, right? Now consider this:
do
  y <- foo x
  z <- bar y
  foobar z
This also calls three functions, but it invisibly calls (>>=) between each pair of lines as well. And that means that some strange things happen, depending on which monad the functions are running in.
If this is the identity monad, nothing special happens. The monadic version does exactly the same thing as the pure version. There are no side-effects.
If each function returns a Maybe-something, then if (say) bar returns Nothing, the entire code block aborts. A normal function can't do that. (I.e., in the pure version, there is no way to prevent foobar being called.) So this version does something that the pure version cannot. Each function can return a value or abort the block. That's a side-effect.
If each function returns a list-of-something, then the code executes for all possible combinations of results. Again, in the pure version, there is no way to make any of the functions execute multiple times with different arguments. So that's a side-effect.
If each function runs in a state monad, then (for example) foo can send some data directly to foobar, in addition to the value you can see being passed through bar. Again, you can't do that with pure functions, so that's a side-effect.
In the IO monad, you have all sorts of interesting effects. You can save files to disk (a file is basically a giant global variable), and you can even affect code running on other computers (we call this network I/O).
The ST monad is a cut-down version of the IO monad. It allows mutable state, but self-contained computations cannot influence each other.
The STM monad lets multiple threads talk to each other, and may cause the code to execute multiple times, and... well, you can't do any of this with normal functions.
The continuation monad allows you to break people's minds! Arguably that is possible with pure functions...
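To make a couple of the cases above concrete, here is the same do-notation shape run in two different monads (a toy example of mine, not from the answer):
pipeline :: Maybe Int
pipeline = do
  y <- Just 1
  z <- Nothing       -- aborts the whole block: pipeline == Nothing
  return (y + z)

pairs :: [(Int, Char)]
pairs = do
  y <- [1, 2]
  z <- "ab"          -- the block runs for every combination of y and z
  return (y, z)      -- [(1,'a'),(1,'b'),(2,'a'),(2,'b')]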
"Effect is a very vague term and that is ok because we are trying to talk about something that is outside the language. Effect and side effect are not the same thing. Effects are good. Side effects are bugs.
Their lexical similarity is really unfortunate because it leads to a lot of people conflating these ideas when they read about them and people using one instead of the other so it leads to a lot of confusion."
see here for more: https://www.slideshare.net/pjschwarz/rob-norrisfunctionalprogrammingwitheffects
Functional programming context
Effect generally means the stuff (behaviour, additional logic) that is implemented in Applicative/Monad instances.
Also, it can be said that a simple value is extended with additional behaviour.
For example,
Option models the effects of optionality
or
Option is a monad that models the effect of optionality (of being something optional)

Is there a real-world applicability for the continuation monad outside of academic use?

(Later visitors: two answers to this question both give excellent insight; if you are interested you should probably read them both. I could only accept one, as a limitation of SO.)
All the discussions I find online on the continuation monad either mention how it could be used with some trivial examples, or explain that it is a fundamental building block, as in this article on the Mother of all monads is the continuation monad.
I wonder if there is applicability outside of this range. I mean, does it make sense to wrap a recursive function, or mutual recursion in a continuation monad? Does it aid in readability?
Here's an F# version of the continuation monad taken from this SO post:
type ContinuationMonad() =
    member this.Bind (m, f) = fun c -> m (fun a -> f a c)
    member this.Return x = fun k -> k x

let cont = ContinuationMonad()
Is it merely of academic interest, for instance to help understand monads, or computation builders? Or is there some real-world applicability, added type-safety, or does it circumvent typical programming problems that are hard to solve otherwise?
I.e., the continuation monad with call/cc from Ryan Riley shows that it is complex to handle exceptions, but it doesn't explain what problem it is trying to solve and the examples don't show why it needs specifically this monad. Admittedly, I just don't understand what it does, but it may be a treasure trove!
(Note: I am not interested in understanding how the continuation monad works, I think I have a fair grasp of it, I just don't see what programming problem it solves.)
The "mother of all monads" stuff is not purely academic. Dan Piponi references Andrzej Filinski's Representing Monads, a rather good paper. The upshot of it is if your language has delimited continuations (or can mimic them with call/cc and a single piece of mutable state) then you can transparently add any monadic effect to any code. In other words, if you have delimited continuations and no other side effects, you can implement (global) mutable state or exceptions or backtracking non-determinism or cooperative concurrency. You can do each of these just by defining a few simply functions. No global transformation or anything needed. Also, you only pay for the side-effects when you use them. It turns out the Schemers were completely right about call/cc being highly expressive.
If your language doesn't have delimited continuations, you can get them via the continuation monad (or better, the double-barrelled continuation monad). Of course, if you're going to write in monadic style anyway – which is a global transformation – why not just use the desired monad from the get-go? For Haskellers, this is typically what we do; however, there are still benefits from using the continuation monad in many cases (albeit hidden away). A good example is the Maybe/Option monad, which is like having exceptions except there's only one type of exception. Basically, this monad captures the pattern of returning an "error code" and checking it after each function call. And that's exactly what the typical definition does, except by "function call" I meant every (monadic) step of the computation. Suffice to say, this is pretty inefficient, especially when the vast majority of the time there is no error. If you reflect Maybe into the continuation monad though, while you have to pay the cost of the CPSed code (which GHC Haskell handles surprisingly well), you only pay to check the "error code" in places where it matters, i.e. catch statements. In Haskell, the Codensity monad that danidiaz mentioned is a better choice because the last thing Haskellers want is to make it so that arbitrary effects can be transparently interleaved in their code.
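Here is a small Haskell sketch (mine, not from the answer) of that "reflect Maybe into continuations" idea: failure performs a single jump to the exit continuation instead of an error-code check after every step.
import Control.Monad (foldM)
import Control.Monad.Cont (Cont, runCont, callCC)

-- Divide x by a list of divisors, bailing out on the first zero.
safeDivs :: Int -> [Int] -> Maybe Int
safeDivs x ds = runCont (callCC go) id
  where
    go abort = do
        r <- foldM step x ds
        return (Just r)
      where
        step _   0 = abort Nothing        -- one jump, no per-step checks
        step acc d = return (acc `div` d)

-- safeDivs 100 [2, 5]    == Just 10
-- safeDivs 100 [2, 0, 5] == Nothing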
As danidiaz also mentioned, many monads are more easily or more efficiently implemented using essentially a continuation monad or some variant. Backtracking search is one example. While not the newest thing on backtracking, one of my favorite papers that used it was Typed Logical Variables in Haskell. The techniques used in it were also used in the Wired Hardware Description Language. Also from Koen Claessen is A Poor Man's Concurrency Monad. More modern uses of the ideas in this example include: the monad for deterministic parallelism in Haskell, A Monad for Deterministic Parallelism, and scalable I/O managers, Combining Events And Threads For Scalable Network Services. I'm sure I can find similar techniques used in Scala. If it wasn't provided, you could use a continuation monad to implement asynchronous workflows in F#. In fact, Don Syme references exactly the same papers I just referenced. If you can serialize functions but don't have continuations, you can use a continuation monad to get them and do the serialized-continuation type of web programming made popular by systems like Seaside. Even without serializable continuations, you can use the pattern (essentially the same as async) to at least avoid callbacks, storing the continuations locally and sending only a key.
Ultimately, relatively few people outside of Haskellers are using monads in any capacity, and as I alluded to earlier, Haskellers tend to want to use more controllable monads than the continuation monad, though they do use them internally quite a bit. Nevertheless, continuation monads or continuation-monad-like things, particularly for asynchronous programming, are becoming less uncommon. As C#, F#, Scala, Swift, and even Java start incorporating support for monadic or at least monadic-style programming, these ideas will become more broadly used. If the Node developers were more conversant with this, maybe they would have realized you could have your cake and eat it too with regards to event-driven programming.
To provide a more direct F#-specific answer (even though Derek already covered that too), the continuation monad pretty much captures the core of how asynchronous workflows work.
A continuation monad is a function that, when given a continuation, eventually calls the continuation with the result (it may also never call it, or call it repeatedly):
type Cont<'T> = ('T -> unit) -> unit
F# asynchronous computations are a bit more complex - they take a success continuation, together with exception and cancellation continuations, and also include the cancellation token. Using a slightly simplified definition, the F# core library uses (see the full definition here):
type AsyncParams =
    { token : CancellationToken
      econt : exn -> unit
      ccont : exn -> unit }

type Async<'T> = ('T -> unit) * AsyncParams -> unit
As you can see, if you ignore AsyncParams, it is pretty much the continuation monad. In F#, I think the "classical" monads are more useful as an inspiration than as a direct implementation mechanism. Here, the continuation monad provides a useful model of how to handle certain kinds of computations - and with many additional async-specific aspects, the core idea can be used to implement asynchronous computations.
I think this is quite different to how monads are used in classic academic works or in Haskell, where they tend to be used "as is" and perhaps composed in various ways to construct more complex monads that capture more complex behaviour.
This may be just my personal opinion, but I'd say that the continuation monad is not practically useful in itself, but it is a basis for some very practical ideas. (Just like lambda calculus is not really practically useful in itself, but it can be seen as an inspiration for nice practical languages!)
I certainly find it easier to read a recursive function implemented using the continuation monad compared to one implemented using explicit recursion. For example, given this tree type:
type 'a Tree =
    | Node of 'a * 'a Tree * 'a Tree
    | Empty
here's one way to write a bottom-up fold over a tree:
let rec fold e f t = cont {
    match t with
    | Node(a,t1,t2) ->
        let! r1 = fold e f t1
        let! r2 = fold e f t2
        return f a r1 r2
    | Empty -> return e
}
This is clearly analogous to a naïve fold:
let rec fold e f t =
    match t with
    | Node(a,t1,t2) ->
        let r1 = fold e f t1
        let r2 = fold e f t2
        f a r1 r2
    | Empty -> e
except that the naïve fold will blow the stack when called on a deep tree because it's not tail recursive, while the fold written using the continuation monad won't. You can of course write the same thing using explicit continuations, but to my eye the amount of clutter that they add distracts from the structure of the algorithm (and putting them in place is not completely fool-proof):
let rec fold e f t k =
    match t with
    | Node(a,t1,t2) ->
        fold e f t1 (fun r1 ->
        fold e f t2 (fun r2 ->
        k (f a r1 r2)))
    | Empty -> k e
Note that in order for this to work, you'll need to modify your definition of ContinuationMonad to include
member this.Delay f v = f () v

Monads in Haskell and Purity

My question is whether monads in Haskell actually maintain Haskell's purity, and if so how. Frequently I have read about how side effects are impure but that side effects are needed for useful programs (e.g. I/O). In the next sentence it is stated that Haskell's solution to this is monads. Then monads are explained to some degree or another, but not really how they solve the side-effect problem.
I have seen this and this, and my interpretation of the answers is actually one that came to me in my own readings -- the "actions" of the IO monad are not the I/O themselves but objects that, when executed, perform I/O. But it occurs to me that one could make the same argument for any code or perhaps any compiled executable. Couldn't you say that a C++ program only produces side effects when the compiled code is executed? That all of C++ is inside the IO monad and so C++ is pure? I doubt this is true, but I honestly don't know in what way it is not. In fact, didn't Moggi (sp?) initially use monads to model the denotational semantics of imperative programs?
Some background: I am a fan of Haskell and functional programming and I hope to learn more about both as my studies continue. I understand the benefits of referential transparency, for example. The motivation for this question is that I am a grad student and I will be giving 2 1-hour presentations to a programming languages class, one covering Haskell in particular and the other covering functional programming in general. I suspect that the majority of the class is not familiar with functional programming, maybe having seen a bit of scheme. I hope to be able to (reasonably) clearly explain how monads solve the purity problem without going into category theory and the theoretical underpinnings of monads, which I wouldn't have time to cover and anyway I don't fully understand myself -- certainly not well enough to present.
I wonder if "purity" in this context is not really well-defined?
It's hard to argue conclusively in either direction because "pure" is not particularly well-defined. Certainly, something makes Haskell fundamentally different from other languages, and it's deeply related to managing side-effects and the IO type¹, but it's not clear exactly what that something is. Given a concrete definition to refer to we could just check if it applies, but this isn't easy: such definitions will tend to either not match everyone's expectations or be too broad to be useful.
So what makes Haskell special, then? In my view, it's the separation between evaluation and execution.
The base language—closely related to the λ-calculus—is all about the former. You work with expressions that evaluate to other expressions, 1 + 1 to 2. No side-effects here, not because they were suppressed or removed but simply because they don't make sense in the first place. They're not part of the model² any more than, say, backtracking search is part of the model of Java (as opposed to Prolog).
If we just stuck to this base language with no added facilities for IO, I think it would be fairly uncontroversial to call it "pure". It would still be useful as, perhaps, a replacement for Mathematica. You would write your program as an expression and then get the result of evaluating the expression at the REPL. Nothing more than a fancy calculator, and nobody accuses the expression language you use in a calculator of being impure³!
But, of course, this is too limiting. We want to use our language to read files and serve web pages and draw pictures and control robots and interact with the user. So the question, then, is how to preserve everything we like about evaluating expressions while extending our language to do everything we want.
The answer we've come up with? IO. A special type of expression that our calculator-like language can evaluate which corresponds to doing some effectful actions. Crucially, evaluation still works just as before, even for things in IO. The effects get executed in the order specified by the resulting IO value, not based on how it was evaluated. IO is what we use to introduce and manage effects into our otherwise-pure expression language.
I think that's enough to make describing Haskell as "pure" meaningful.
footnotes
¹ Note how I said IO and not monads in general: the concept of a monad is immensely useful for dozens of things unrelated to input and output, and the IO type has to be more than just a monad to be useful. I feel the two are linked too closely in common discourse.
² This is why unsafePerformIO is so, well, unsafe: it breaks the core abstraction of the language. This is the same as, say, putzing with specific registers in C: it can both cause weird behavior and stop your code from being portable because it goes below C's level of abstraction.
³ Well, mostly, as long as we ignore things like generating random numbers.
A function with type, for example, a -> IO b always returns an identical IO action when given the same input; it is pure in that it cannot possibly inspect the environment, and obeys all the usual rules for pure functions. This means that, among other things, the compiler can apply all of its usual optimization rules to functions with an IO in their type, because it knows they are still pure functions.
Now, the IO action returned may, when run, look at the environment, read files, modify global state, whatever, all bets are off once you run an action. But you don't necessarily have to run an action; you can put five of them into a list and then run them in reverse of the order in which you created them, or never run some of them at all, if you want; you couldn't do this if IO actions implicitly ran themselves when you created them.
Consider this silly program:
main :: IO ()
main = do
  inputs <- take 5 . lines <$> getContents
  let [line1,line2,line3,line4,line5] = map print inputs
  line3
  line1
  line2
  line5
If you run this, and then enter 5 lines, you will see them printed back to you but in a different order, and with one omitted, even though our Haskell program runs map print over them in the order they were received. You couldn't do this with C's printf, because it immediately performs its I/O when called; Haskell's version just returns an IO action, which you can still manipulate as a first-class value and do whatever you want with.
I see two main differences here:
1) In Haskell, you can do things that are not in the IO monad. Why is this good? Because if you have a function definitelyDoesntLaunchNukes :: Int -> IO Int you don't know that the resulting IO action doesn't launch nukes; it might, for all you know. cantLaunchNukes :: Int -> Int will definitely not launch any nukes (barring any ugly hacks that you should avoid in nearly all circumstances).
2) In Haskell, it's not just a cute analogy: IO actions are first-class values. You can put them in lists, and leave them there for as long as you want; they won't do anything unless they somehow become part of the main action. The closest that C has to that are function pointers, which are quite a bit more cumbersome to use. In C++ (and most modern imperative languages, really) you have closures, which technically could be used for this purpose, but rarely are - mainly because Haskell is pure and they aren't.
Why does that distinction matter here? Well, where are you going to get your other IO actions/closures from? Probably, functions/methods of some description. Which, in an impure language, can themselves have side effects, rendering the attempt to isolate them in these languages pointless.
fiction-mode: Active
It was quite a challenge, and I think a wormhole could be forming in the neighbour's backyard, but I managed to grab part of a Haskell I/O implementation from an alternate reality:
class Kleisli k where
  infixr 1 >=>
  simple :: (a -> b) -> (a -> k b)
  (>=>)  :: (a -> k b) -> (b -> k c) -> a -> k c

instance Kleisli IO where
  simple = primSimpleIO
  (>=>)  = primPipeIO

primitive primSimpleIO :: (a -> b) -> (a -> IO b)
primitive primPipeIO   :: (a -> IO b) -> (b -> IO c) -> a -> IO c
Back in our slightly-mutilated reality (sorry!), I have used this other form of Haskell I/O to define our form of Haskell I/O:
instance Monad IO where
  return x = simple (const x) ()
  m >>= k  = (const m >=> k) ()
and it works!
fiction-mode: Offline
My question is whether monads in Haskell actually maintain Haskell's purity, and if so how.
The monadic interface, by itself, doesn't restrain the effects - it is only an interface, albeit a jolly-versatile one. As my little work of fiction shows, there are other possible interfaces for the job - it's just a matter of how convenient they are to use in practice.
For an implementation of Haskell I/O, what keeps the effects under control is that all the pertinent entities, be they:
IO, simple, (>=>) etc
or:
IO, return, (>>=) etc
are abstract - how the implementation defines those is kept private.
Otherwise, you would be able to devise "novelties" like this:
what_the_heck =
  do spare_world <- getWorld  -- how easy was that?
     launchMissiles           -- let's mess everything up,
     putWorld spare_world     -- and bring it all back :-D
     what_the_heck            -- that was fun; let's do it again!
(Aren't you glad our reality isn't quite so pliable? ;-)
This observation extends to types like ST (encapsulated state) and STM (concurrency) and their stewards (runST, atomically etc). For types like lists, Maybe and Either, their orthodox definitions in Haskell mean no visible effects.
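For instance, runST is exactly such a steward: its type ensures the mutable state cannot leak. A standard example (not from this answer) of a pure function with a mutable implementation:
import Control.Monad.ST
import Data.STRef

sumST :: [Int] -> Int
sumST xs = runST $ do
  ref <- newSTRef 0                         -- genuinely mutable cell
  mapM_ (\x -> modifySTRef' ref (+ x)) xs   -- imperative loop
  readSTRef ref                             -- the only thing that escapes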
So when you see an interface - monadic, applicative, etc - for certain abstract types, any effects (if they exist) are contained by keeping its implementation private; safe from being used in aberrant ways.
