Haskell: parallel computation and the 'sequential property' of monads

Haskell: parallel computation and the 'sequential property' of monads - haskell

I am confused about why the REPA function computeP packs its result in a monad. It has the following type signature.
computeP :: (Load r1 sh e, Target r2 e, Source r2 e, Monad m) =>
Array r1 sh e -> m (Array r2 sh e)
In this tutorial it says
The reason for this is that monads give a well defined notion of sequence and thus computeP enforces completion of parallel evaluation in a particular point of monadic computations.
Likewise, this answer on Stack Overflow states that
The reason why parallel computation in Repa must be monadic has to do partially with lazyness, but mostly with Repa's inability to deal with nested parallelism. Sequential property of a Monad solves it for the most part[.]
Questions
What does having this 'sequential property' mean exactly?
How does a monad enforce this?
For the example of computeP: there is no constraint on which monad is used, so I could use the identity monad. Is it okay then to use the following function instead that just unpacks the monad, or will this give unexpected results because it lacks this sequential property? If it is okay, is there even a need to use a monad at all?
import Data.Functor.Identity
import Data.Array.Repa.Eval
import Data.Array.Repa
myComputeP :: (Load r1 sh e, Target r2 e, Source r2 e) => Array r1 sh e -> Array r2 sh e
myComputeP = runIdentity . computeP
Any help would be great.

This monad constraint is a heuristic trick. It helps disciplined users to avoid nested parallelism, but does nothing against malicious or clueless users.
Nested parallelism is the situation where, while computing some array in parallel, you end up having to compute another array in parallel. Repa does not support it (the reason is not important), so it tries to avoid it.
The type of computeP helps ensure that parallel computations are done in sequence with each other, but it is far from airtight; it is merely a "best effort" abstraction.
How does a monad enforce this?
Actually, computeP is only expected to work with monads whose bind (>>=) is strict in its first argument, so that in u >>= k, the function k will be applied only after u has been evaluated. Then if you use computeP with such a monad,
do w <- computeP v
k w
it is guaranteed that the vector w is evaluated before it gets passed to k, which may safely perform other computeP operations.
Examples of strict monads: IO, strict State, Maybe, [].
Examples of lazy monads: Identity, lazy State, Reader. (Lazy monads can be made strict, but not the other way around. In particular, you can define a strict identity monad if you want to do only Repa computations.)
To prevent nested parallelism, the type of computeP intentionally makes it cumbersome to use within operations that are likely to be done in parallel, such as map :: (a -> b) -> Array _ _ a -> Array _ _ b and fromFunction :: sh -> (sh -> a) -> Array _ _ a, which take non-monadic functions. One can still explicitly unwrap computeP, for example with runIdentity as you noticed: you can shoot yourself in the foot if you want, but it's on you to load the gun, point it down and pull the trigger.
Hopefully, that answers what is going on in Repa. What follows is a theoretical digression to answer this other question:
What does having this 'sequential property' mean exactly?
Those quotations are quite handwavy. As I see it, the relation between "sequentiality" and "monads" is two-fold.
First, for many monads in Haskell, the definition of (>>=) naturally dictates an order of evaluation, typically because it immediately pattern-matches on the first argument. As explained earlier, that's what Repa relies on to enforce that computeP computations happen in sequence (and that's why it would break if you specialize it to Identity; it is not a strict monad). In the grand scheme of things, this is a rather minor detail of lazy evaluation, rather than anything proper to monads in general.
Second, one core idea of pure functional programming is first-class computations, with explicit definitions of effects and composition. In this case, the effect is the parallel evaluation of vectors, and the composition we care about is sequential composition. Monads provide a general model, or interface, for sequential composition. That's another part of the reason why monads help solve the problem of avoiding nested parallelism in Repa.
The point is not that monads have an inherently sequential aspect, but it is that sequential composition is inherently monadic: if you try to list general properties that you would expect from anything worthy of the name "sequential composition", you're likely to end up with properties that together are called "monad"; that's one of the points of Moggi's seminal paper "Notions of computation and monads".
"Monads" is not a magical concept, rather it's enormously general so that lots of things happen to be monads. After all, the main requirement is that there's an associative operation; that's a pretty reasonable assumption to make about sequential composition. (If, when you hear "associativity", you think "monoid" or "category", note that all of these concepts are unified under the umbrella of "monoid objects", so as far as "associativity" is concerned, they're all the same idea, just in different categories.)

Related

What is an (composite) effect (possibly represented as a Monad+Monad Transformers) ? Precise, clear, short answer/definition? [duplicate]

Time and again I read the term effectful, but I am still unable to give a clear definition of what it means. I assume the correct context is effectful computations, but I've also seen the term effectful values)
I used to think that effectful means having side effects. But in Haskell there are no side-effects (except to some extent IO). Still there are effectful computations all over the place.
Then I read that monads are used to create effectful computations. I can somewhat understand this in the context of the State Monad. But I fail to see any side-effect in the Maybe monad. In general it seems to me, that Monads which wrap a function-like thing are easier to see as producing side-effects than Monads which just wrap a value.
When it comes to Applicative functors I am even more lost. I always saw applicative functors as a way to map a function with more than one argument. I cannot see any side-effect here. Or is there a difference between effectful and with effects?

A side effect is an observable interaction with its environment (apart from computing its result value). In Haskell, we try hard to avoid functions with such side effects. This even applies to IO actions: when an IO action is evaluated, no side effects are performed, they are executed only when the actions prescribed in the IO value are executed within main.
However, when working with abstractions that are related to composing computations, such as applicative functors and monads, it's convenient to somewhat distinguish between the actual value and the "rest", which we often call an "effect". In particular, if we have a type f of kind * -> *, then in f a the a part is "the value" and whatever "remains" is "the effect".
I intentionally quoted the terms, as there is no precise definition (as far as I know), it's merely a colloquial definition. In some cases there are no values at all, or multiple values. For example for Maybe the "effect" is that there might be no value (and the computation is aborted), for [] the "effect" is that there are multiple (or zero) values. For more complex types this distinction can be even more difficult.
The distinction between "effects" and "values" doesn't really depend on the abstraction. Functor, Applicative and Monad just give us tools what we can do with them (Functors allow to modify values inside, Applicatives allow to combine effects and Monads allow effects to depend on the previous values). But in the context of Monads, it's somewhat easier to create a mental picture of what is going on, because a monadic action can "see" the result value of the previous computation, as witnessed by the
(>>=) :: m a -> (a -> m b) -> m b
operator: The second function receives a value of type a, so we can imagine "the previous computation had some effect and now there is its result value with which we can do something".

In support of Petr Pudlák's answer, here is an argument concerning the origin of the broader notion of "effect" espoused there.
The phrase "effectful programming" shows up in the abstract of McBride and Patterson's Applicative Programming with Effects, the paper which introduced applicative functors:
In this paper, we introduce Applicative functors — an abstract characterisation of an applicative style of effectful programming, weaker than Monads and hence more widespread.
"Effect" and "effectful" appear in a handful of other passages of the paper; these ocurrences are deemed unremarkable enough not to require an explicit clarification. For instance, this remark is made just after the definition of Applicative is presented (p. 3):
In each example, there is a type constructor f that embeds the usual
notion of value, but supports its own peculiar way of giving meaning to the usual applicative language [...] We correspondingly introduce the Applicative class:
[A Haskell definition of Applicative]
This class generalises S and K [i.e. the S and K combinators, which show up in the Reader/function Applicative instance] from threading an environment to threading an effect in general.
From these quotes, we can infer that, in this context:
Effects are the things that Applicative threads "in general".
Effects are associated with the type constructors that are given Applicative instances.
Monad also deals with effects.
Following these leads, we can trace back this usage of "effect" back to at least Wadler's papers on monads. For instance, here is a quote from page 6 of Monads for functional programming:
In general, a function of type a → b is replaced by a function of type a
→ M b. This can be read as a function that accepts an argument of type a
and returns a result of type b, with a possible additional effect captured by
M. This effect may be to act on state, generate output, raise an exception, or what have you.
And from the same paper, page 21:
If monads encapsulate effects and lists form a monad, do lists correspond to some effect? Indeed they do, and the effect they correspond to is choice. One can think of a computation of type [a] as offering a choice of values, one for each element of the list. The monadic equivalent of a function of type a → b is a function of type a → [b].
The "correspond to some effect" turn of phrase here is key. It ties back to the more straightforward claim in the abstract:
Monads provide a convenient framework for simulating effects found in other languages, such as global state, exception handling, output, or non-determinism.
The pitch is that monads can be used to express things that, in "other languages", are typically encoded as side-effects -- that is, as Petr Pudlák puts it in his answer here, "an observable interaction with [a function's] environment (apart from computing its result value)". Through metonymy, that has readily led to "effect" acquiring a second meaning, broader than that of "side-effect" -- namely, whatever is introduced through a type constructor which is a Monad instance. Over time, this meaning was further generalised to cover other functor classes such as Applicative, as seen in McBride and Patterson's work.
In summary, I consider "effect" to have two reasonable meanings in Haskell parlance:
A "literal" or "absolute" one: an effect is a side-effect; and
A "generalised" or "relative" one: an effect is a functorial context.
On occasion, avoidable disagreements over terminology happen when each of the involved parties implicitly assumes a different meaning of "effect". Another possible point of contention involves whether it is legitimate to speak of effects when dealing with Functor alone, as opposed to subclasses such as Applicative or Monad (I believe it is okay to do so, in agreement with Petr Pudlák's answer to Why can applicative functors have side effects, but functors can't?).

To my mind, a "side effect" is anything that a normal function couldn't do. In other words, anything in addition to just returning a value.
Consider the following code block:
let
y = foo x
z = bar y
in foobar z
This calls foo, and then calls bar, and then calls foobar, three ordinary functions. Simple enough, right? Now consider this:
do
y <- foo x
z <- bar y
foobar z
This also calls three functions, but it also invisibly calls (>>=) between each pair of lines as well. And that means that some strange things happen, depending on what type of monad the functions are running in.
If this is the identity monad, nothing special happens. The monadic version does exactly the same thing as the pure version. There are no side-effects.
If each function returns a Maybe-something, then if (say) bar returns Nothing, the entire code block aborts. A normal function can't do that. (I.e., in the pure version, there is no way to prevent foobar being called.) So this version does something that the pure version cannot. Each function can return a value or abort the block. That's a side-effect.
If each function returns a list-of-something, then the code executes for all possible combinations of results. Again, in the pure version, there is no way to make any of the functions execute multiple times with different arguments. So that's a side-effect.
If each function runs in a state monad, then (for example) foo can send some data directly to foobar, in addition to the value you can see being passed through bar. Again, you can't do that with pure functions, so that's a side-effect.
In IO monad, you have all sorts of interesting effects. You can save files to disk (a file is basically a giant global variable), you can even affect code running on other computers (we call this network I/O).
The ST monad is a cut-down version of the IO monad. It allows mutable state, but self-contained computations cannot influence each other.
The STM monad lets multiple threads talk to each other, and may cause the code to execute multiple times, and... well, you can't do any of this with normal functions.
The continuation monad allows you to break people's minds! Arguably that is possible with pure functions...

"Effect is a very vague term and that is ok because we are trying to talk about something that is outside the language. Effect and side effect are not the same thing. Effects are good. Side effects are bugs.
Their lexical similarity is really unfortunate because it leads to a lot of people conflating these ideas when they read about them and people using one instead of the other so it leads to a lot of confusion."
see here for more: https://www.slideshare.net/pjschwarz/rob-norrisfunctionalprogrammingwitheffects

Functional programming context
Effect generally means the stuff (behaviour, additional logic) that is implemented in Applicative/Monad instances.
Also, it can be said that a simple value is extended with additional behaviour.
For example,
Option models the effects of optionality
or
Option is a monad that models the effect of optionality (of being something optional)
Resources: Resource 1,
Resource 2

Is there a real-world applicability for the continuation monad outside of academic use?

(later visitors: two answers to this question both give excellent insight, if you are interested you should probably read them both, I could only except one as a limitation of SO)
From all discussions I find online on the continuation monad they either mention how it could be used with some trivial examples, or they explain that it is a fundamental building block, as in this article on the Mother of all monads is the continuation monad.
I wonder if there is applicability outside of this range. I mean, does it make sense to wrap a recursive function, or mutual recursion in a continuation monad? Does it aid in readability?
Here's an F# version of the continuation mode taken from this SO post:
type ContinuationMonad() =
member this.Bind (m, f) = fun c -> m (fun a -> f a c)
member this.Return x = fun k -> k x
let cont = ContinuationMonad()
Is it merely of academic interest, for instance to help understand monads, or computation builders? Or is there some real-world applicability, added type-safety, or does it circumvent typical programming problems that are hard to solve otherwise?
I.e., the continuation monad with call/cc from Ryan Riley shows that it is complex to handle exceptions, but it doesn't explain what problem it is trying to solve and the examples don't show why it needs specifically this monad. Admittedly, I just don't understand what it does, but it may be a treasure trove!
(Note: I am not interested in understanding how the continuation monad works, I think I have a fair grasp of it, I just don't see what programming problem it solves.)

The "mother of all monads" stuff is not purely academic. Dan Piponi references Andrzej Filinski's Representing Monads, a rather good paper. The upshot of it is if your language has delimited continuations (or can mimic them with call/cc and a single piece of mutable state) then you can transparently add any monadic effect to any code. In other words, if you have delimited continuations and no other side effects, you can implement (global) mutable state or exceptions or backtracking non-determinism or cooperative concurrency. You can do each of these just by defining a few simply functions. No global transformation or anything needed. Also, you only pay for the side-effects when you use them. It turns out the Schemers were completely right about call/cc being highly expressive.
If your language doesn't have delimited continuations, you can get them via the continuation monad (or better the double-barrelled continuation monad). Of course, if you're going to write in monadic-style anyway – which is a global transformation – why not just use the desired monad from the get-go? For Haskellers, this is typically what we do, however, there are still benefits from using the continuation monad in many cases (albeit hidden away). A good example is the Maybe/Option monad which is like having exceptions except there's only one type of exception. Basically, this monad captures the pattern of returning an "error code" and checking it after each function call. And that's exactly what the typical definition does, except by "function call" I meant every (monadic) step of the computation. Suffice to say, this is pretty inefficient, especially when the vast majority of the time there is no error. If you reflect Maybe into the continuation monad though, while you have to pay the cost of the CPSed code (which GHC Haskell handles suprisingly well), you only pay to check the "error code" in places where it matters, i.e. catch statements. In Haskell, the Codensity monad than danidiaz mentioned is a better choice because the last thing Haskellers want is to make it so that arbitrary effects can be transparently interleaved in their code.
As danidiaz also mentioned, many monads are more easily or more efficiently implemented using essentially a continuation monad or some variant. Backtracking search is one example. While not the newest thing on the backtracking, one of my favorite papers that used it was Typed Logical Variables in Haskell. The techniques used in it was also used in the Wired Hardware Description Language. Also from Koen Claesson is A Poor Man's Concurrency Monad. More modern uses of the ideas in this example include: the monad for deterministic parallelism in Haskell A Monad for Deterministic Parallelism and scalable I/O managers Combining Events And Threads For Scalable Network Services. I'm sure I can find similar techniques used in Scala. If it wasn't provided, you could use a continuation monad to implement asynchronous workflows in F#. In fact, Don Syme references exactly the same papers I just referenced. If you can serialize functions but don't have continuations, you can use a continuation monad to get them and do the serialized continuation type of web programming made popular by systems like Seaside. Even without serializable continuations, you can use the pattern (essentially the same as async) to at least avoid callbacks while storing the continuations locally and only sending a key.
Ultimately, relatively few people outside of Haskellers are using monads in any capacity, and as I alluded to earlier, Haskellers tend to want to use more contcrollable monads than the continuation monad, though they do use them internally quite a bit. Nevertheless, continuation monads or continuation monad like things, particularly for asynchronous programming, are becoming less uncommon. As C#, F#, Scala, Swift, and even Java start incorporating support monadic or at least monadic-style programming, these ideas will become more broadly used. If the Node developers were more conversant with this, maybe they would have realized you could have your cake and eat it too with regards to event-driven programming.

To provide a more direct F#-specific answer (even though Derek already covered that too), the continuation monad pretty much captures the core of how asynchronous workflows work.
A continuation monad is a function that, when given a continuation, eventually calls the continuation with the result (it may never call it or it may call it repeatedly too):
type Cont<'T> = ('T -> unit) -> unit
F# asynchronous computations are a bit more complex - they take continuation (in case of success), exception and cancellation continuations and also include the cancellation token. Using a slightly simplified definition, F# core library uses (see the full definition here):
type AsyncParams =
{ token : CancellationToken
econt : exn -> unit
ccont : exn -> unit }
type Async<'T> = ('T -> unit) * AsyncParams -> unit
As you can see, if you ignore AsyncParams, it is pretty much the continuation monad. In F#, I think the "classical" monads are more useful as an inspiration than as a direct implementation mechanism. Here, the continuation monad provides a useful model of how to handle certain kinds of computations - and with many additional async-specific aspects, the core idea can be used to implement asynchronous computations.
I think this is quite different to how monads are used in classic academic works or in Haskell, where they tend to be used "as is" and perhaps composed in various ways to construct more complex monads that capture more complex behaviour.
This may be just my personal opinion, but I'd say that the continuation monad is not practically useful in itself, but it is a basis for some very practical ideas. (Just like lambda calculus is not really practically useful in itself, but it can be seen as an inspiration for nice practical languages!)

I certainly find it easier to read a recursive function implemented using the continuation monad compared to one implemented using explicit recursion. For example, given this tree type:
type 'a Tree =
| Node of 'a * 'a Tree * 'a Tree
| Empty
here's one way to write a bottom-up fold over a tree:
let rec fold e f t = cont {
match t with
| Node(a,t1,t2) ->
let! r1 = fold e f t1
let! r2 = fold e f t2
return f a r1 r2
| Empty -> return e
}
This is clearly analogous to a naïve fold:
let rec fold e f t =
match t with
| Node(a,t1,t2) ->
let r1 = fold e f t1
let r2 = fold e f t2
f a r1 r2
| Empty -> return e
except that the naïve fold will blow the stack when called on a deep tree because it's not tail recursive, while the fold written using the continuation monad won't. You can of course write the same thing using explicit continuations, but to my eye the amount of clutter that they add distracts from the structure of the algorithm (and putting them in place is not completely fool-proof):
let rec fold e f t k =
match t with
| Node(a,t1,t2) ->
fold e f t1 (fun r1 ->
fold e f t2 (fun r2 ->
k (f r1 r2)))
| Empty -> k e
Note that in order for this to work, you'll need to modify your definition of ContinuationMonad to include
member this.Delay f v = f () v

Aren't Monads essentially just "conceptual" sugar?

Let's assume we allow two types of functions in Haskell:
strictly pure (as usual)
potentially non-pure (procedures)
The distinction would be made f.x. by declaring that a dot (".") as first letter of a function name declares it as a non-pure procedure.
And further we would set the rules:
pure functions may be called by pure and non-pure functions
non-pure functions may only be called by non-pure functions
and: non-pure functions may be programmed imperatively
With this syntactical sugar and specifications at hand - would there still be a need for Monads? Is there anything Monads could do which above rule set would not allow?
B/c as I came to understand Monads - this is exactly their purpose. Just that very smart people managed to achieve this purely by means of functional methods with a cathegory theoretical tool set at hand.

No.
Monads have nothing to do with purity or impurity in principle. It just so happens that IO models impure code nicely, but Monad class can be used perfectly right for instances like State or Maybe, which are absolutely pure.
Monads also allow expressing complex context hierarchies (as I choose to call them) in a very explicit way. "pure/impure" isn't the only division you might want to make. Consider authorized/unauthorized, for example. The list of possible uses goes on and on... I'd encourage you to look at other commonly used instances, like ST, STM, RWS, "restricted IO" and friends to get a better understanding of the possibilities.
Soon enough you'll start making your own monads, tailored to the problem at hand.

B/c as I came to understand Monads - this is exactly their purpose.
Monads in their full generality have nothing to do with purity/impurity or imperative sequencing. So no, monads are most certainly not conceptual sugar for effect encapsulation if I understood your question.
Consider that overwhelmingly most of the monads in the Prelude: List, Reader, State, Cont, Either, (->) have nothing to do with effects or IO. It's a very common misconception to assume that IO is the "canonical" monad, while in fact it's really a degenerate case.

B/c as I came to understand Monads - this is exactly their purpose.
This: http://homepages.inf.ed.ac.uk/wadler/topics/monads.html#monads was the first paper on monads in Haskell:
Category theorists invented monads in the 1960's to concisely express certain aspects of universal algebra.
So right away you can see that monads have nothing to do with "pure" / "impure" computations. The most common monad (in the world!) is Maybe:
data Maybe a
= Nothing
| Just a
instance Monad Maybe where
return = Just
Nothing >>= f = Nothing
Just x >>= f = f x
The monad is the four-tuple (Maybe, liftM, return, join), where:
liftM :: (a -> b) -> Maybe a -> Maybe b
liftM f mb = mb >>= return . f
join :: Maybe (Maybe a) -> Maybe a
join Nothing = Nothing
join (Just mb) = mb
Note that liftM takes a non-Maybe function (not pure!) and applies it to a Maybe, while join takes a two-level Maybe and simplifies it to a single layer (so Just in the result comes from having two layers of Just:
join (Just (Just x)) = Just x
while Nothing in the result can come from a Nothing at either layer:
join Nothing = Nothing
join (Just Nothing) = Nothing
). We can translate these terms as follows:
Maybe: a value that may or may not be present.
liftM: apply this function to the value if present, otherwise do nothing.
return: take this value that is present and inject it into the extra structure of a Maybe.
join: take a (value that may or may not be present) that may or may not be present and erase the distinction between the two layers of 'may or may not be present'.
Now, Maybe is a perfectly suitable data type. In scripting languages, it's expressed just by using undef or the equivalent to express Nothing, and representing Just x the same way as x. In C/C++, it's expressed by using a pointer type t*, and allowing the pointer to be NULL. In Scala, there's an explicit container type: http://www.scala-lang.org/api/current/index.html#scala.Option . So you can't say "oh, that's just exceptions" because languages with exceptions still want to be able to express 'no value here' without throwing an exception, and then apply functions if the value is there (which is why Scala's Option type has a foreach method). But 'apply this function if the value is there' is precisely what Maybe's >>= does! So it's a very useful operation.
You'll notice that C and the scripting languages don't generally allow the distinction between Nothing and Just Nothing to be made --- a value is either present or not. In a functional language --- like Haskell --- it's interesting to allow both versions, which is why we need join to erase that distinction when we're done with it. (And, mathematically, it's nicer to define >>= in terms of liftM and join than the other way around).
Incidentally, to clear up the common mis-conception about Haskell and IO: GHC's implementation of IO wrappers up the side-effectfulness of GHC's implementation of I/O. Even that is a terrible design decision of GHC --- imperative (which is different than impure!) I/O can be modeled monadically without impurity at any level of the system. You don't need side effects (at any layer of the system) to do I/O!

How do you identify monadic design patterns?

I my way to learn Haskell I'm starting to grasp the monad concept and starting to use the known monads in my code but I'm still having difficulties approaching monads from a designer point of view. In OO there are several rules like, "identify nouns" for objects, watch for some kind of state and interface... but I'm not able to find equivalent resources for monads.
So how do you identify a problem as monadic in nature? What are good design patterns for monadic design? What's your approach when you realize that some code would be better refactored into a monad?

A helpful rule of thumb is when you see values in a context; monads can be seen as layering "effects" on:
Maybe: partiality (uses: computations that can fail)
Either: short-circuiting errors (uses: error/exception handling)
[] (the list monad): nondeterminism (uses: list generation, filtering, ...)
State: a single mutable reference (uses: state)
Reader: a shared environment (uses: variable bindings, common information, ...)
Writer: a "side-channel" output or accumulation (uses: logging, maintaining a write-only counter, ...)
Cont: non-local control-flow (uses: too numerous to list)
Usually, you should generally design your monad by layering on the monad transformers from the standard Monad Transformer Library, which let you combine the above effects into a single monad. Together, these handle the majority of monads you might want to use. There are some additional monads not included in the MTL, such as the probability and supply monads.
As far as developing an intuition for whether a newly-defined type is a monad, and how it behaves as one, you can think of it by going up from Functor to Monad:
Functor lets you transform values with pure functions.
Applicative lets you embed pure values and express application — (<*>) lets you go from an embedded function and its embedded argument to an embedded result.
Monad lets the structure of embedded computations depend on the values of previous computations.
The easiest way to understand this is to look at the type of join:
join :: (Monad m) => m (m a) -> m a
This means that if you have an embedded computation whose result is a new embedded computation, you can create a computation that executes the result of that computation. So you can use monadic effects to create a new computation based on values of previous computations, and transfer control flow to that computation.
Interestingly, this can be a weakness of structuring things monadically: with Applicative, the structure of the computation is static (i.e. a given Applicative computation has a certain structure of effects that cannot change based on intermediate values), whereas with Monad it is dynamic. This can restrict the optimisation you can do; for instance, applicative parsers are less powerful than monadic ones (well, this isn't strictly true, but it effectively is), but they can be optimised better.
Note that (>>=) can be defined as
m >>= f = join (fmap f m)
and so a monad can be defined simply with return and join (assuming it's a Functor; all monads are applicative functors, but Haskell's typeclass hierarchy unfortunately doesn't require this for historical reasons).
As an additional note, you probably shouldn't focus too heavily on monads, no matter what kind of buzz they get from misguided non-Haskellers. There are many typeclasses that represent meaningful and powerful patterns, and not everything is best expressed as a monad. Applicative, Monoid, Foldable... which abstraction to use depends entirely on your situation. And, of course, just because something is a monad doesn't mean it can't be other things too; being a monad is just another property of a type.
So, you shouldn't think too much about "identifying monads"; the questions are more like:
Can this code be expressed in a simpler monadic form? With which monad?
Is this type I've just defined a monad? What generic patterns encoded by the standard functions on monads can I take advantage of?

Follow the types.
If you find you have written functions with all of these types
(a -> b) -> YourType a -> YourType b
a -> YourType a
YourType (YourType a) -> YourType a
or all of these types
a -> YourType a
YourType a -> (a -> YourType b) -> YourType b
then YourType may be a monad. (I say “may” because the functions must obey the monad laws as well.)
(Remember you can reorder arguments, so e.g. YourType a -> (a -> b) -> YourType b is just (a -> b) -> YourType a -> YourType b in disguise.)
Don't look out only for monads! If you have functions of all of these types
YourType
YourType -> YourType -> YourType
and they obey the monoid laws, you have a monoid! That can be valuable too. Similarly for other typeclasses, most importantly Functor.

There's the effect view of monads:
Maybe - partiality / failure short-circuiting
Either - error reporting / short-circuiting (like Maybe with more information)
Writer - write only "state", commonly logging
Reader - read-only state, commonly environment passing
State - read / write state
Resumption - pausable computation
List - multiple successes
Once you are familiar with these effects its easy to build monads combining them with monad transformers. Note that combining some monads needs special care (particularly Cont and any monads with backtracking).
One thing important to note is there aren't many monads. There are some exotic ones that aren't in the standard libraries e.g the probability monad and variations of the Cont monad like Codensity. But unless you are doing something mathematical its unlikely you will invent (or discover) a new monad, however if you use Haskell long enough you'll build many monads that are different combinations of the standard ones.
Edit - Also note that the order you stack monad transformers results in different monads:
If you add ErrorT (transformer) to a Writer monad, you get this monad Either err (log,a) - you can only access the log if you have no error.
If you add WriterT (transfomer) to an Error monad, you get this monad (log, Either err a) which always gives access to the log.

This is sort of a non-answer, but I feel it is important to say anyways. Just ask! StackOverflow, /r/haskell, and the #haskell irc channel are all great places to get quick feedback from smart people. If you are working on a problem, and you suspect that there's some monadic magic that could make it easier, just ask! The Haskell community loves to solve problems, and is ridiculously friendly.
Don't misunderstand, I'm not encouraging you to never learn for yourself. Quite the contrary, interacting with the Haskell community is one of the best ways to learn. LYAH and RWH, 2 Haskell books that are freely available online, come highly recommended as well.
Oh, and don't forget to play, play, play! As you play around with monadic code, you'll start to get the feel of what "shape" monads have, and when monadic combinators can be useful. If you're rolling your own monad, then usually the type system will guide you to an obvious, simple solution. But to be honest, you should rarely need to roll your own instance of Monad, since Haskell libraries provide tons of useful things as mentioned by other answerers.

There's a common notion that one sees in many programming languages of an "infectious function tag" -- some special behavior for a function that must extend to its callers as well.
Rust functions can be unsafe, meaning they perform operations that can potentially violate memory unsafety. unsafe functions can call normal functions, but any function that calls an unsafe function must be unsafe as well.
Python functions can be async, meaning they return a promise rather than an actual value. async functions can call normal functions, but invocation of an async function (via await) can only be done by another async function.
Haskell functions can be impure, meaning they return an IO a rather than an a. Impure functions can call pure functions, but impure functions can only be called by other impure functions.
Mathematical functions can be partial, meaning they don't map every value in their domain to an output. The definitions of partial functions can reference total functions, but if a total function maps some of its domain to a partial function, it becomes partial as well.
While there may be ways to invoke a tagged function from an untagged function, there is no general way, and doing so can often be dangerous and threatens to break the abstraction the language tries to provide.
The benefit, then, of having tags is that you can expose a set of special primitives that are given this tag and have any function that uses these primitives make that clear in its signature.
Say you're a language designer and you recognize this pattern, and you decide that you want to allow user-defined tags. Let's say the user defined a tag Err, representing computations that may throw an error. A function using Err might look like this:
function div <Err> (n: Int, d: Int): Int
if d == 0
throwError("division by 0")
else
return (n / d)
If we wanted to simplify things, we might observe that there's nothing erroneous about taking arguments - it's computing the return value where problems might arise. So we can restrict tags to functions that take no arguments, and have div return a closure rather than the actual value:
function div(n: Int, d: Int): <Err> () -> Int
() =>
if d == 0
throwError("division by 0")
else
return (n / d)
In a lazy language such as Haskell, we don't need the closure, and can just return a lazy value directly:
div :: Int -> Int -> Err Int
div _ 0 = throwError "division by 0"
div n d = return $ n / d
It is now apparent that, in Haskell, tags need no special language support - they are ordinary type constructors. Let's make a typeclass for them!
class Tag m where
We want to be able to call an untagged function from a tagged function, which is equivalent to turning an untagged value (a) into a tagged value (m a).
addTag :: a -> m a
We also want to be able to take a tagged value (m a) and apply a tagged function (a -> m b) to get a tagged result (m b):
embed :: m a -> (a -> m b) -> m b
This, of course, is precisely the definition of a monad! addTag corresponds to return, and embed corresponds to (>>=).
It is now clear that "tagged functions" are merely a type of monad. As such, whenever you spot a place where a "function tag" could apply, chances are you've got a place suitable for a monad.
P.S. Regarding the tags I've mentioned in this answer: Haskell models impurity with the IO monad and partiality with the Maybe monad. Most languages implement async/promises fairly transparently, and there seems to be a Haskell package called promise that mimics this functionality. The Err monad is equivalent to the Either String monad. I'm not aware of any language that models memory unsafety monadically, it could be done.

Relax ordering constraints in monadic computation

here is some food for thought.
When I write monadic code, the monad imposes ordering on the operations done. For example, If I write in the IO monad:
do a <- doSomething
b <- doSomethingElse
return (a + b)
I know doSomething will be executed before doSomethingElse.
Now, consider the equivalent code in a language like C:
return (doSomething() + doSomethingElse());
The semantics of C don't actually specify what order these two function calls will be evaluated, so the compiler is free to move things around as it pleases.
My question is this: How would I write monadic code in Haskell that also leaves this evaluation order undefined? Ideally, I would reap some benefits when my compiler's optimizer looks at the code and starts moving things around.
Here are some possible techniques that don't get the job done, but are in the right "spirit":
Write the code in functorial style, that is, write plus doSomething doSomethingElse and let plus schedule the monadic calls. Drawbacks: You lose sharing on the results of the monadic actions, and plus still makes a decision about when things end up being evaluated.
Use lazy IO, that is, unsafeInterleaveIO, which defers the scheduling to the demands lazy of evaluation. But lazy is different from strict with undefined order: in particular I do want all of my monadic actions to get executed.
Lazy IO, combined with immediately seq'ing all of the arguments. In particular, seq does not impose ordering constraints.
In this sense, I want something more flexible than monadic ordering but less flexible than full-out laziness.

This problem of over-sequentializing monad code is known as the "commutative monads problem".
Commutative monads are monads for which the order of actions makes no difference (they commute), that is when following code:
do a <- f x
b <- g y
m a b
is the same as:
do b <- g y
a <- f x
m a b
there are many monads that commute (e.g. Maybe, Random). If the monad is commutative, then the operations captured within it can be computed in parallel, for example. They are very useful!
However, we don't have a good syntax for monads that commute, though a lot of people have asked for such a thing -- it is still an open research problem.
As an aside, applicative functors do give us such freedom to reorder computations, however, you have to give up on the notion of bind (as suggestions for e.g. liftM2 show).

This is a deep dirty hack, but it seems like it should do the trick to me.
{-# OPTIONS_GHC -fglasgow-exts #-}
{-# LANGUAGE MagicHash #-}
module Unorder where
import GHC.Types
unorder :: IO a -> IO b -> IO (a, b)
unorder (IO f) (IO g) = IO $ \rw# ->
let (# _, x #) = f rw#
(# _, y #) = g rw#
in (# rw# , (x,y) #)
Since this puts non-determinism in the hands of the compiler, it should behave "correctly" (i.e. nondeterministically) with regards to control flow issues (i.e. exceptions) as well.
On the other hand, we can't pull the same trick most standard monads such as State and Either a since we're really relying on spooky action at a distance available via messing with the RealWorld token. To get the right behavior, we'd need some annotation available to the optimizer that indicated we were fine with nondeterministic choice between two non-equivalent alternatives.

The semantics of C don't actually specify what order these two function calls will be evaluated, so the compiler is free to move things around as it pleases.
But what if doSomething() causes a side-effect that will change the behavior of doSomethingElse()? Do you really want the compiler to mess with the order? (Hint: no) The fact that you are in a monad at all suggests that such may be the case. Your note that "you lose sharing on the results" also suggests this.
However, note that monadic does not always mean sequenced. It's not exactly what you describe, but you may be interested in the Par monad, which allows you to run your actions in parallel.
You are interested in leaving the order undefined so that the compiler can magically optimize it for you. Perhaps instead you should use something like the Par monad to indicate the dependencies (some things inevitably have to happen before others) and then let the rest run in parallel.
Side note: don't confuse Haskell's return to be anything like C's return

Develop Reference

node.js excel linux python-3.x azure haskell apache-spark rust .htaccess string