Relax ordering constraints in monadic computation - haskell

Here is some food for thought.
When I write monadic code, the monad imposes an ordering on the operations performed. For example, if I write in the IO monad:
do a <- doSomething
   b <- doSomethingElse
   return (a + b)
I know doSomething will be executed before doSomethingElse.
Now, consider the equivalent code in a language like C:
return (doSomething() + doSomethingElse());
The semantics of C don't actually specify what order these two function calls will be evaluated, so the compiler is free to move things around as it pleases.
My question is this: How would I write monadic code in Haskell that also leaves this evaluation order undefined? Ideally, I would reap some benefits when my compiler's optimizer looks at the code and starts moving things around.
Here are some possible techniques that don't get the job done, but are in the right "spirit":
Write the code in functorial style, that is, write plus doSomething doSomethingElse and let plus schedule the monadic calls. Drawbacks: You lose sharing on the results of the monadic actions, and plus still makes a decision about when things end up being evaluated.
Use lazy IO, that is, unsafeInterleaveIO, which defers the scheduling to the demands of lazy evaluation. But lazy evaluation is different from strict evaluation with undefined order: in particular, I do want all of my monadic actions to be executed.
Lazy IO, combined with immediately seq'ing all of the arguments. In particular, seq does not impose ordering constraints.
In this sense, I want something more flexible than monadic ordering but less flexible than full-out laziness.

This problem of over-sequentializing monad code is known as the "commutative monads problem".
Commutative monads are monads for which the order of actions makes no difference (they commute), that is, when the following code:
do a <- f x
   b <- g y
   m a b
is the same as:
do b <- g y
   a <- f x
   m a b
There are many monads that commute (e.g. Maybe, Random). If the monad is commutative, then the operations captured within it can be computed in parallel, for example. They are very useful!
However, we don't have a good syntax for monads that commute, though a lot of people have asked for such a thing -- it is still an open research problem.
As an aside, applicative functors do give us such freedom to reorder computations; however, you have to give up on the notion of bind (as suggestions involving e.g. liftM2 show).
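For illustration, here is a minimal sketch of that applicative style (the action names are hypothetical): liftA2 combines the two results without bind, so neither action can depend on the other's result. Note that GHC's actual Applicative instance for IO still runs effects left to right; the point is that the interface no longer expresses any dependency between the two actions.
import Control.Applicative (liftA2)

-- Combine two independent actions' results without bind; the
-- interface expresses no data dependency between the two actions.
plus :: IO Int -> IO Int -> IO Int
plus = liftA2 (+)

-- usage: plus doSomething doSomethingElse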

This is a deep dirty hack, but it seems like it should do the trick to me.
{-# LANGUAGE MagicHash, UnboxedTuples #-}
module Unorder where

import GHC.Types

unorder :: IO a -> IO b -> IO (a, b)
unorder (IO f) (IO g) = IO $ \rw# ->
  let (# _, x #) = f rw#
      (# _, y #) = g rw#
  in  (# rw#, (x, y) #)
Since this puts non-determinism in the hands of the compiler, it should behave "correctly" (i.e. nondeterministically) with regards to control flow issues (i.e. exceptions) as well.
On the other hand, we can't pull the same trick with most standard monads, such as State and Either a, since we're really relying on the spooky action at a distance made available by messing with the RealWorld token. To get the right behavior, we'd need some annotation available to the optimizer that indicated we were fine with nondeterministic choice between two non-equivalent alternatives.

The semantics of C don't actually specify what order these two function calls will be evaluated, so the compiler is free to move things around as it pleases.
But what if doSomething() causes a side-effect that will change the behavior of doSomethingElse()? Do you really want the compiler to mess with the order? (Hint: no) The fact that you are in a monad at all suggests that such may be the case. Your note that "you lose sharing on the results" also suggests this.
However, note that monadic does not always mean sequenced. It's not exactly what you describe, but you may be interested in the Par monad, which allows you to run your actions in parallel.
You are interested in leaving the order undefined so that the compiler can magically optimize it for you. Perhaps instead you should use something like the Par monad to indicate the dependencies (some things inevitably have to happen before others) and then let the rest run in parallel.
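A minimal sketch of that idea, assuming the monad-par package (fib here is just a stand-in for expensive work):
import Control.Monad.Par -- from the monad-par package

fib :: Int -> Int
fib n = if n < 2 then n else fib (n - 1) + fib (n - 2)

-- The two spawned computations have no dependency on each other,
-- so the Par scheduler is free to evaluate them in parallel.
parSum :: Int -> Int -> Int
parSum x y = runPar $ do
  a <- spawnP (fib x) -- fork both computations
  b <- spawnP (fib y)
  ra <- get a         -- block until each result is ready
  rb <- get b
  return (ra + rb)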
Side note: don't confuse Haskell's return with C's return

Related

Is the monadic IO construct in Haskell just a convention?

Regarding Haskell's monadic IO construct:
Is it just a convention, or is there an implementation reason for it?
Could you not just FFI into libc.so instead to do your I/O, and skip the whole IO-monad thing?
Would it work anyway, or is the outcome nondeterministic because of:
(a) Haskell's lazy evaluation?
(b) another reason, such as GHC pattern-matching for the IO monad and then handling it in a special way (or something else)?
What is the real reason - in the end you end up using a side effect, so why not do it the simple way?
Yes, monadic I/O is a consequence of Haskell being lazy. Specifically, though, monadic I/O is a consequence of Haskell being pure, which is effectively necessary for a lazy language to be predictable.†
This is easy to illustrate with an example. Imagine for a moment that Haskell were not pure, but it was still lazy. Instead of putStrLn having the type String -> IO (), it would simply have the type String -> (), and it would print a string to stdout as a side-effect. The trouble with this is that this would only happen when putStrLn is actually called, and in a lazy language, functions are only called when their results are needed.
Here’s the trouble: putStrLn produces (). Looking at a value of type () is useless, because () means “boring”. That means that this program would do what you expect:
main :: ()
main =
  case putStr "Hello, " of
    () -> putStrLn "world!"
-- prints “Hello, world!\n”
But I think you can agree that that programming style is pretty odd. The case ... of is necessary, however, because it forces the evaluation of the call to putStr by matching against (). If you tweak the program slightly:
main :: ()
main =
  case putStr "Hello, " of
    _ -> putStrLn "world!"
…now it only prints world!\n, and the first call isn’t evaluated at all.
This actually gets even worse, though, because it becomes even harder to predict as soon as you start trying to do any actual programming. Consider this program:
printAndAdd :: String -> Integer -> Integer -> Integer
printAndAdd msg x y = putStrLn msg `seq` (x + y)

main :: ()
main =
  let x = printAndAdd "first" 1 2
      y = printAndAdd "second" 3 4
  in  (y + x) `seq` ()
Does this program print out first\nsecond\n or second\nfirst\n? Without knowing the order in which (+) evaluates its arguments, we don’t know. And in Haskell, evaluation order isn’t even always well-defined, so it’s entirely possible that the order in which the two effects are executed is actually completely impossible to determine!
This problem doesn’t arise in strict languages with a well-defined evaluation order, but in a lazy language like Haskell, we need some additional structure to ensure side-effects are (a) actually evaluated and (b) executed in the correct order. Monads happen to be an interface that elegantly provide the necessary structure to enforce that order.
Why is that? And how is that even possible? Well, the monadic interface provides a notion of data dependency in the signature for >>=, which enforces a well-defined evaluation order. Haskell’s implementation of IO is “magic”, in the sense that it’s implemented in the runtime, but the choice of the monadic interface is far from arbitrary. It seems to be a fairly good way to encode the notion of sequential actions in a pure language, and it makes it possible for Haskell to be lazy and referentially transparent without sacrificing predictable sequencing of effects.
It’s worth noting that monads are not the only way to encode side-effects in a pure way—in fact, historically, they’re not even the only way Haskell handled side-effects. Don’t be misled into thinking that monads are only for I/O (they’re not), only useful in a lazy language (they’re plenty useful for maintaining purity even in a strict language), only useful in a pure language (many things are useful monads that aren’t just for enforcing purity), or that you need monads to do I/O (you don’t). They do seem to have worked out pretty well in Haskell for those purposes, though.
† Regarding this, Simon Peyton Jones once noted that “Laziness keeps you honest” with respect to purity.
Could you just FFI into libc.so instead to do IO and skip the IO Monad thing?
Taking from https://en.wikibooks.org/wiki/Haskell/FFI#Impure_C_Functions, if you declare an FFI function as pure (so, with no reference to IO), then
GHC sees no point in calculating twice the result of a pure function
which means that the result of the function call is effectively cached. For example, a program where a foreign impure pseudo-random number generator is declared to return a CUInt
{-# LANGUAGE ForeignFunctionInterface #-}

import Foreign
import Foreign.C.Types

foreign import ccall unsafe "stdlib.h rand"
  c_rand :: CUInt

main = putStrLn (show c_rand) >> putStrLn (show c_rand)
returns the same thing every call, at least on my compiler/system:
16807
16807
If we change the declaration to return an IO CUInt
{-# LANGUAGE ForeignFunctionInterface #-}

import Foreign
import Foreign.C.Types

foreign import ccall unsafe "stdlib.h rand"
  c_rand :: IO CUInt

main = c_rand >>= putStrLn . show >> c_rand >>= putStrLn . show
then this results in (probably) a different number returned each call, since the compiler knows it's impure:
16807
282475249
So you're back to having to use IO for the calls to the standard libraries anyway.
Let's say using FFI we defined a function
c_write :: String -> ()
which lies about its purity, in that whenever its result is forced it prints the string. So that we don't run into the caching problems in Michal's answer, we can define these functions to take an extra () argument.
c_write :: String -> () -> ()
c_rand :: () -> CUInt
On an implementation level this will work as long as CSE is not too aggressive (which it is not in GHC because that can lead to unexpected memory leaks, it turns out). Now that we have things defined this way, there are many awkward usage questions that Alexis points out—but we can solve them using a monad:
newtype IO a = IO { runIO :: () -> a }

instance Monad IO where
  return = IO . const
  m >>= f = IO $ \() -> let x = runIO m () in x `seq` runIO (f x) ()

rand :: IO CUInt
rand = IO c_rand
Basically, we just stuff all of Alexis's awkward usage questions into a monad, and as long as we use the monadic interface, everything stays predictable. In this sense IO is just a convention—because we can implement it in Haskell there is nothing fundamental about it.
That's from the operational vantage point.
On the other hand, Haskell's semantics in the report are specified using denotational semantics alone. And, in my opinion, the fact that Haskell has a precise denotational semantics is one of the most beautiful and useful qualities of the language, allowing me a precise framework to think about abstractions and thus manage complexity with precision. And while the usual abstract IO monad has no accepted denotational semantics (to the lament of some of us), it is at least conceivable that we could create a denotational model for it, thus preserving some of the benefits of Haskell's denotational model. However, the form of I/O we have just given is completely incompatible with Haskell's denotational semantics.
Simply put, there are only supposed to be two distinguishable values (modulo fatal error messages) of type (): () and ⊥. If we treat FFI as the fundamentals of I/O and use the IO monad only "as a convention", then we effectively add a jillion values to every type—to continue having a denotational semantics, every value must be adjoined with the possibility of performing I/O prior to its evaluation, and with the extra complexity this introduces, we essentially lose all our ability to consider any two distinct programs equivalent except in the most trivial cases—that is, we lose our ability to refactor.
Of course, because of unsafePerformIO this is already technically the case, and advanced Haskell programmers do need to think about the operational semantics as well. But most of the time, including when working with I/O, we can forget about all that and refactor with confidence, precisely because we have learned that when we use unsafePerformIO, we must be very careful to ensure it plays nicely, that it still affords us as much denotational reasoning as possible. If a function has unsafePerformIO, I automatically give it 5 or 10 times more attention than regular functions, because I need to understand the valid patterns of use (usually the type signature tells me everything I need to know), I need to think about caching and race conditions, I need to think about how deep I need to force its results, etc. It's awful[1]. The same care would be necessary of FFI I/O.
In conclusion: yes it's a convention, but if you don't follow it then we can't have nice things.
[1] Well actually I think it's pretty fun, but it's surely not practical to think about all those complexities all the time.
That depends on what the meaning of "is" is—or at least what the meaning of "convention" is.
If a "convention" means "the way things are usually done" or "an agreement among parties covering a particular matter" then it is easy to give a boring answer: yes, the IO monad is a convention. It is the way the designers of the language agreed to handle IO operations and the way that users of the language usually perform IO operations.
If we are allowed to choose a more interesting definition of "convention" then we can get a more interesting answer. If a "convention" is a discipline imposed on a language by its users in order to achieve a particular goal without assistance from the language itself, then the answer is no: the IO monad is the opposite of a convention. It is a discipline enforced by the language that assists its users in constructing and reasoning about programs.
The purpose of the IO type is to create a clear distinction between the types of "pure" values and the types of values which require execution by the runtime system to generate a meaningful result. The Haskell type system enforces this strict separation, preventing a user from (say) creating a value of type Int which launches the proverbial missiles. This is not a convention in the second sense: its entire goal is to move the discipline required to perform side effects in a safe and consistent way from the user and onto the language and its compiler.
Could you just FFI into libc.so instead to do IO and skip the IO Monad thing?
It is, of course, possible to do IO without an IO monad: see almost every other extant programming language.
Would it work anyway, or is the outcome nondeterministic because Haskell evaluates lazily, or because of something else, like GHC pattern-matching for the IO monad and then handling it in a special way?
There is no such thing as a free lunch. If Haskell allowed any value to require execution involving IO then it would have to lose other things that we value. The most important of these is probably referential transparency: if myInt could sometimes be 1 and sometimes be 5 depending on external factors then we would lose most of our ability to reason about our programs in a rigorous way (known as equational reasoning).
Laziness was mentioned in other answers, but the issue with laziness would specifically be that sharing would no longer be safe. If x in let x = someExpensiveComputationOf y in x * x was not referentially transparent, GHC would not be able to share the work and would have to compute it twice.
What is the real reason?
Without the strict separation of effectful values from non-effectful values provided by IO and enforced by the compiler, Haskell would effectively cease to be Haskell. There are plenty of languages that don't enforce this discipline. It would be nice to have at least one around that does.
In the end you end up using a side effect, so why not do it the simple way?
Yes, in the end your program is represented by a value called main with an IO type. But the question isn't where you end up, it's where you start: If you start by being able to differentiate between effectful and non-effectful values in a rigorous way then you gain a lot of advantages when constructing that program.
What is the real reason - in the end you end up using a side effect, so why not do it the simple way?
...you mean like Standard ML? Well, there's a price to pay - instead of being able to write:
any :: (a -> Bool) -> [a] -> Bool
any p = or . map p
you would have to type out this:
any :: (a -> Bool) -> [a] -> Bool
any p [] = False
any p (y:ys) = p y || any p ys
Could you not just FFI into libc.so instead to do your I/O, and skip the whole IO-monad thing?
Let's rephrase the question:
Could you not just do I/O like Standard ML, and skip the whole IO-monad thing?
...because that's effectively what you would be trying to do. Why "trying"?
SML is strict, and relies on syntactic ordering to specify the order of evaluation everywhere;
Haskell is non-strict, and relies on data dependencies to specify the order of evaluation for certain expressions e.g. I/O actions.
So:
Would it work anyway, or is the outcome nondeterministic because of:
(a) Haskell's lazy evaluation?
(a) - the combination of non-strict semantics and visible effects is generally useless. For an amusing exhibition of just how useless this combination can be, watch this presentation by Erik Meijer (the slides can be found here).

In what sense is IO monad special (if at all)?

After diving into monads I understand that they are a general concept to allow chaining computations inside some context (failing, non-determinism, state, etc) and there is no magic behind them.
Still, the IO monad feels, if not magic, then at least special.
you cannot escape IO monad like you can with other monads
IO action can only be run by the main function
IO is always at the bottom of a monad transformers chain
implementation of IO monad is unclear and source code shows some Haskell internals
What are the reasons for the points above? What makes IO so special?
Update: in pure code evaluation order doesn't matter. But it does matter when doing IO (we want to save the customer before we read it). From what I understand, the IO monad gives us such ordering guarantees. Is that a property of monads in general, or is it something specific to the IO monad?
you cannot escape IO monad like you can with other monads
I'm not sure what you mean by “escape”. If you mean, somehow unwrap the internal representation of the monadic values (e.g. list -> sequence of cons-cells) – that is an implementation detail. In fact, you can define an IO emulation in pure Haskell – basically just a State over a lot of globally-available data. That would have all the semantics of IO, but without actually interacting with the real world, only a simulation thereof.
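To make that concrete, here is a minimal sketch of such an emulation (the World type and its fields are invented for illustration): IO simulated as a pure state transformer over a value standing for the outside world.
-- A pure simulation of IO: a state transformer over a made-up World.
data World = World { stdinW :: [String], stdoutW :: [String] }

newtype SimIO a = SimIO { runSimIO :: World -> (a, World) }

instance Functor SimIO where
  fmap f (SimIO g) = SimIO $ \w -> let (a, w') = g w in (f a, w')

instance Applicative SimIO where
  pure a = SimIO $ \w -> (a, w)
  SimIO f <*> SimIO g = SimIO $ \w ->
    let (h, w')  = f w
        (a, w'') = g w'
    in  (h a, w'')

instance Monad SimIO where
  SimIO g >>= f = SimIO $ \w -> let (a, w') = g w in runSimIO (f a) w'

-- "Printing" just appends to the simulated stdout.
simPutStrLn :: String -> SimIO ()
simPutStrLn s = SimIO $ \w -> ((), w { stdoutW = stdoutW w ++ [s] })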
If you mean, you can “extract values” from within the monad – nope, that's not in general possible, even for most pure-haskell monads. For instance, you can't extract a value from Maybe a (could be Nothing) or Reader b a (what if b is uninhabited?)
IO action can only be run by the main function
Well, in a sense, everything can only be run by the main function. Code that's not in some way invoked from main will only sit there, you could replace it with undefined without changing anything.
IO is always at the bottom of a monad transformers chain
True, but that's also the case for e.g. ST.
Implementation of IO monad is unclear and source code shows some Haskell internals
Again: implementation is just, well, an implementation detail. The complexity of IO implementations actually has a lot to do with it being highly optimised; the same is also true for specialised pure monads (e.g. attoparsec).
As I already said, much simpler implementations are possible; they just wouldn't be as useful as the full-fledged optimised real-world IO type.
Fortunately, implementation needn't really bother you; the inside of IO may be unclear but the outside, the actual monadic interface, is pretty simple.
in pure code evaluation order doesn't matter
First of all – evaluation order can matter in pure code!
Prelude> take 10 $ foldr (\h t -> h `seq` (h:t)) [] [0..]
[0,1,2,3,4,5,6,7,8,9]
Prelude> take 10 $ foldr (\h t -> t `seq` (h:t)) [] [0..]
^CInterrupted.
But indeed you can never get a wrong, non-⊥ result due to misordered pure-code evaluation. That actually doesn't apply to reordering monadic actions (IO or otherwise) though, because changing the sequence order changes the actual structure of the resultant action, not just the evaluation order that the runtime will use to construct this structure.
For example (list monad):
Prelude> [1,2,3] >>= \e -> [10,20,30] >>= \z -> [e+z]
[11,21,31,12,22,32,13,23,33]
Prelude> [10,20,30] >>= \z -> [1,2,3] >>= \e -> [e+z]
[11,12,13,21,22,23,31,32,33]
All that said, certainly IO is quite special, indeed I think some people hesitate to call it a monad (it's a bit unclear what it's actually supposed to mean for IO to fulfill the monad laws). In particular, lazy IO is one massive troublemaker (and best just avoided, at all times).
Similar statements could be made about the ST monad, or arguably the STM monad (although you can actually implement that one on top of IO).
Basically things like the Reader monad, Error monad, Writer monad, etc., are all just pure code. The ST and IO monads are the only ones that really do impure things (state mutation, etc), so they aren't definable in pure Haskell. They have to be "hard-wired" into the compiler somewhere.

Monads in Haskell and Purity

My question is whether monads in Haskell actually maintain Haskell's purity, and if so how. Frequently I have read about how side effects are impure but that side effects are needed for useful programs (e.g. I/O). In the next sentence it is stated that Haskell's solution to this is monads. Then monads are explained to some degree or another, but not really how they solve the side-effect problem.
I have seen this and this, and my interpretation of the answers is actually one that came to me in my own readings -- the "actions" of the IO monad are not the I/O themselves but objects that, when executed, perform I/O. But it occurs to me that one could make the same argument for any code or perhaps any compiled executable. Couldn't you say that a C++ program only produces side effects when the compiled code is executed? That all of C++ is inside the IO monad and so C++ is pure? I doubt this is true, but I honestly don't know in what way it is not. In fact, didn't Moggi initially use monads to model the denotational semantics of imperative programs?
Some background: I am a fan of Haskell and functional programming and I hope to learn more about both as my studies continue. I understand the benefits of referential transparency, for example. The motivation for this question is that I am a grad student and I will be giving 2 1-hour presentations to a programming languages class, one covering Haskell in particular and the other covering functional programming in general. I suspect that the majority of the class is not familiar with functional programming, maybe having seen a bit of scheme. I hope to be able to (reasonably) clearly explain how monads solve the purity problem without going into category theory and the theoretical underpinnings of monads, which I wouldn't have time to cover and anyway I don't fully understand myself -- certainly not well enough to present.
I wonder if "purity" in this context is not really well-defined?
It's hard to argue conclusively in either direction because "pure" is not particularly well-defined. Certainly, something makes Haskell fundamentally different from other languages, and it's deeply related to managing side-effects and the IO type¹, but it's not clear exactly what that something is. Given a concrete definition to refer to we could just check if it applies, but this isn't easy: such definitions will tend to either not match everyone's expectations or be too broad to be useful.
So what makes Haskell special, then? In my view, it's the separation between evaluation and execution.
The base language—closely related to the λ-calculus—is all about the former. You work with expressions that evaluate to other expressions, 1 + 1 to 2. No side-effects here, not because they were suppressed or removed but simply because they don't make sense in the first place. They're not part of the model² any more than, say, backtracking search is part of the model of Java (as opposed to Prolog).
If we just stuck to this base language with no added facilities for IO, I think it would be fairly uncontroversial to call it "pure". It would still be useful as, perhaps, a replacement for Mathematica. You would write your program as an expression and then get the result of evaluating the expression at the REPL. Nothing more than a fancy calculator, and nobody accuses the expression language you use in a calculator of being impure³!
But, of course, this is too limiting. We want to use our language to read files and serve web pages and draw pictures and control robots and interact with the user. So the question, then, is how to preserve everything we like about evaluating expressions while extending our language to do everything we want.
The answer we've come up with? IO. A special type of expression that our calculator-like language can evaluate which corresponds to doing some effectful actions. Crucially, evaluation still works just as before, even for things in IO. The effects get executed in the order specified by the resulting IO value, not based on how it was evaluated. IO is what we use to introduce and manage effects into our otherwise-pure expression language.
I think that's enough to make describing Haskell as "pure" meaningful.
footnotes
¹ Note how I said IO and not monads in general: the concept of a monad is immensely useful for dozens of things unrelated to input and output, and the IO type has to be more than just a monad to be useful. I feel the two are linked too closely in common discourse.
² This is why unsafePerformIO is so, well, unsafe: it breaks the core abstraction of the language. This is the same as, say, putzing with specific registers in C: it can both cause weird behavior and stop your code from being portable because it goes below C's level of abstraction.
³ Well, mostly, as long as we ignore things like generating random numbers.
A function with type, for example, a -> IO b always returns an identical IO action when given the same input; it is pure in that it cannot possibly inspect the environment, and obeys all the usual rules for pure functions. This means that, among other things, the compiler can apply all of its usual optimization rules to functions with an IO in their type, because it knows they are still pure functions.
Now, the IO action returned may, when run, look at the environment, read files, modify global state, whatever, all bets are off once you run an action. But you don't necessarily have to run an action; you can put five of them into a list and then run them in reverse of the order in which you created them, or never run some of them at all, if you want; you couldn't do this if IO actions implicitly ran themselves when you created them.
Consider this silly program:
main :: IO ()
main = do
  inputs <- take 5 . lines <$> getContents
  let [line1,line2,line3,line4,line5] = map print inputs
  line3
  line1
  line2
  line5
If you run this, and then enter 5 lines, you will see them printed back to you but in a different order, and with one omitted, even though our Haskell program runs map print over them in the order they were received. You couldn't do this with C's printf, because it immediately performs its IO when called; Haskell's version just returns an IO action, which you can still manipulate as a first-class value and do whatever you want with.
I see two main differences here:
1) In Haskell, you can do things that are not in the IO monad. Why is this good? Because if you have a function definitelyDoesntLaunchNukes :: Int -> IO Int you don't know that the resulting IO action doesn't launch nukes; it might, for all you know. cantLaunchNukes :: Int -> Int will definitely not launch any nukes (barring any ugly hacks that you should avoid in nearly all circumstances).
2) In Haskell, it's not just a cute analogy: IO actions are first-class values. You can put them in lists, and leave them there for as long as you want; they won't do anything unless they somehow become part of the main action. The closest that C has to that is function pointers, which are quite a bit more cumbersome to use. In C++ (and most modern imperative languages, really) you have closures, which technically could be used for this purpose, but rarely are - mainly because Haskell is pure and they aren't.
Why does that distinction matter here? Well, where are you going to get your other IO actions/closures from? Probably, functions/methods of some description. Which, in an impure language, can themselves have side effects, rendering the attempt of isolating them in these languages pointless.
fiction-mode: Active
It was quite a challenge, and I think a wormhole could be forming in the neighbour's backyard, but I managed to grab part of a Haskell I/O implementation from an alternate reality:
class Kleisli k where
  infixr 1 >=>
  simple :: (a -> b) -> (a -> k b)
  (>=>)  :: (a -> k b) -> (b -> k c) -> a -> k c

instance Kleisli IO where
  simple = primSimpleIO
  (>=>)  = primPipeIO

primitive primSimpleIO :: (a -> b) -> (a -> IO b)
primitive primPipeIO   :: (a -> IO b) -> (b -> IO c) -> a -> IO c
Back in our slightly-mutilated reality (sorry!), I have used this other form of Haskell I/O to define our form of Haskell I/O:
instance Monad IO where
  return x = simple (const x) ()
  m >>= k  = (const m >=> k) ()
and it works!
fiction-mode: Offline
My question is whether monads in Haskell actually maintain Haskell's purity, and if so how.
The monadic interface, by itself, doesn't restrain the effects - it is only an interface, albeit a jolly versatile one. As my little work of fiction shows, there are other possible interfaces for the job - it's just a matter of how convenient they are to use in practice.
For an implementation of Haskell I/O, what keeps the effects under control is that all the pertinent entities, be they:
IO, simple, (>=>) etc
or:
IO, return, (>>=) etc
are abstract - how the implementation defines those is kept private.
Otherwise, you would be able to devise "novelties" like this:
what_the_heck =
  do spare_world <- getWorld -- how easy was that?
     launchMissiles          -- let's mess everything up,
     putWorld spare_world    -- and bring it all back :-D
     what_the_heck           -- that was fun; let's do it again!
(Aren't you glad our reality isn't quite so pliable? ;-)
This observation extends to types like ST (encapsulated state) and STM (concurrency) and their stewards (runST, atomically etc). For types like lists, Maybe and Either, their orthodox definitions in Haskell means no visible effects.
So when you see an interface - monadic, applicative, etc - for certain abstract types, any effects (if they exist) are contained by keeping its implementation private; safe from being used in aberrant ways.
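As an illustration of effects being contained by a steward, here is a standard runST sketch: the mutation happens inside, but runST's type guarantees none of it leaks out, so sumST is an ordinary pure function.
import Control.Monad.ST (runST)
import Data.STRef (newSTRef, modifySTRef', readSTRef)

-- Local mutable state, fully encapsulated: the rank-2 type of runST
-- prevents the STRef from escaping, so the result is pure.
sumST :: [Int] -> Int
sumST xs = runST $ do
  ref <- newSTRef 0
  mapM_ (\x -> modifySTRef' ref (+ x)) xs
  readSTRef ref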

Aren't Monads essentially just "conceptual" sugar?

Let's assume we allow two types of functions in Haskell:
strictly pure (as usual)
potentially non-pure (procedures)
The distinction would be made, for example, by declaring that a dot (".") as the first letter of a function name marks it as a non-pure procedure.
And further we would set the rules:
pure functions may be called by pure and non-pure functions
non-pure functions may only be called by non-pure functions
and: non-pure functions may be programmed imperatively
With this syntactical sugar and specifications at hand - would there still be a need for Monads? Is there anything Monads could do which above rule set would not allow?
Because, as I came to understand monads, this is exactly their purpose - just that very smart people managed to achieve it purely by means of functional methods, with a category-theoretical toolset at hand.
No.
Monads have nothing to do with purity or impurity in principle. It just so happens that IO models impure code nicely, but the Monad class works perfectly well for instances like State or Maybe, which are absolutely pure.
Monads also allow expressing complex context hierarchies (as I choose to call them) in a very explicit way. "pure/impure" isn't the only division you might want to make. Consider authorized/unauthorized, for example. The list of possible uses goes on and on... I'd encourage you to look at other commonly used instances, like ST, STM, RWS, "restricted IO" and friends to get a better understanding of the possibilities.
Soon enough you'll start making your own monads, tailored to the problem at hand.
Because, as I came to understand monads, this is exactly their purpose.
Monads in their full generality have nothing to do with purity/impurity or imperative sequencing. So no, monads are most certainly not conceptual sugar for effect encapsulation if I understood your question.
Consider that the overwhelming majority of standard monads: List, Reader, State, Cont, Either, (->), have nothing to do with effects or IO. It's a very common misconception to assume that IO is the "canonical" monad, while in fact it's really a degenerate case.
Because, as I came to understand monads, this is exactly their purpose.
This: http://homepages.inf.ed.ac.uk/wadler/topics/monads.html#monads was the first paper on monads in Haskell:
Category theorists invented monads in the 1960's to concisely express certain aspects of universal algebra.
So right away you can see that monads have nothing to do with "pure" / "impure" computations. The most common monad (in the world!) is Maybe:
data Maybe a
  = Nothing
  | Just a

instance Monad Maybe where
  return = Just
  Nothing >>= f = Nothing
  Just x  >>= f = f x
The monad is the four-tuple (Maybe, liftM, return, join), where:
liftM :: (a -> b) -> Maybe a -> Maybe b
liftM f mb = mb >>= return . f

join :: Maybe (Maybe a) -> Maybe a
join Nothing   = Nothing
join (Just mb) = mb
Note that liftM takes an ordinary, non-Maybe function and applies it to a Maybe, while join takes a two-level Maybe and simplifies it to a single layer (so a Just in the result comes from having two layers of Just:
join (Just (Just x)) = Just x
while Nothing in the result can come from a Nothing at either layer:
join Nothing = Nothing
join (Just Nothing) = Nothing
). We can translate these terms as follows:
Maybe: a value that may or may not be present.
liftM: apply this function to the value if present, otherwise do nothing.
return: take this value that is present and inject it into the extra structure of a Maybe.
join: take a (value that may or may not be present) that may or may not be present and erase the distinction between the two layers of 'may or may not be present'.
Now, Maybe is a perfectly suitable data type. In scripting languages, it's expressed just by using undef or the equivalent to express Nothing, and representing Just x the same way as x. In C/C++, it's expressed by using a pointer type t*, and allowing the pointer to be NULL. In Scala, there's an explicit container type: http://www.scala-lang.org/api/current/index.html#scala.Option . So you can't say "oh, that's just exceptions" because languages with exceptions still want to be able to express 'no value here' without throwing an exception, and then apply functions if the value is there (which is why Scala's Option type has a foreach method). But 'apply this function if the value is there' is precisely what Maybe's >>= does! So it's a very useful operation.
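As a small illustration (the lookup tables are made up), (>>=) threads the "if the value is there" logic so that the first Nothing short-circuits the rest:
-- Chain two lookups that can each fail; any Nothing short-circuits.
lookupAge :: String -> Maybe Int
lookupAge name =
  lookup name userIds >>= \uid ->
  lookup uid ages
  where
    userIds = [("alice", 1), ("bob", 2)]
    ages    = [(1, 30)]

-- lookupAge "alice" == Just 30
-- lookupAge "bob"   == Nothing (no age recorded)
-- lookupAge "carol" == Nothing (no such user)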
You'll notice that C and the scripting languages don't generally allow the distinction between Nothing and Just Nothing to be made --- a value is either present or not. In a functional language --- like Haskell --- it's interesting to allow both versions, which is why we need join to erase that distinction when we're done with it. (And, mathematically, it's nicer to define >>= in terms of liftM and join than the other way around).
Incidentally, to clear up a common misconception about Haskell and IO: GHC's implementation of IO wraps up the side-effectfulness of GHC's implementation of I/O. Even that is a terrible design decision of GHC --- imperative (which is different from impure!) I/O can be modeled monadically without impurity at any level of the system. You don't need side effects (at any layer of the system) to do I/O!

How do you identify monadic design patterns?

On my way to learning Haskell I'm starting to grasp the monad concept and starting to use the well-known monads in my code, but I'm still having difficulties approaching monads from a designer's point of view. In OO there are several rules, like "identify nouns" for objects, or watch for some kind of state and interface... but I'm not able to find equivalent resources for monads.
So how do you identify a problem as monadic in nature? What are good design patterns for monadic design? What's your approach when you realize that some code would be better refactored into a monad?
A helpful rule of thumb is when you see values in a context; monads can be seen as layering "effects" on:
Maybe: partiality (uses: computations that can fail)
Either: short-circuiting errors (uses: error/exception handling)
[] (the list monad): nondeterminism (uses: list generation, filtering, ...)
State: a single mutable reference (uses: state)
Reader: a shared environment (uses: variable bindings, common information, ...)
Writer: a "side-channel" output or accumulation (uses: logging, maintaining a write-only counter, ...)
Cont: non-local control-flow (uses: too numerous to list)
Usually, you should design your monad by layering on the monad transformers from the standard Monad Transformer Library (MTL), which let you combine the above effects into a single monad. Together, these handle the majority of monads you might want to use. There are some additional monads not included in the MTL, such as the probability and supply monads.
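As a minimal sketch (the Config type and App stack are invented for illustration), a monad combining Reader, State, Writer and IO via MTL transformers might look like this:
import Control.Monad.IO.Class (liftIO)
import Control.Monad.Reader (ReaderT, runReaderT, asks)
import Control.Monad.State (StateT, runStateT, modify)
import Control.Monad.Writer (WriterT, runWriterT, tell)

data Config = Config { verbose :: Bool }

-- Reader for configuration, State for a counter, Writer for a log.
type App = ReaderT Config (StateT Int (WriterT [String] IO))

step :: App ()
step = do
  v <- asks verbose -- Reader: shared environment
  modify (+ 1)      -- State: mutable counter
  tell ["stepped"]  -- Writer: log side-channel
  if v then liftIO (putStrLn "step") else return ()

runApp :: Config -> App a -> IO ((a, Int), [String])
runApp cfg app = runWriterT (runStateT (runReaderT app cfg) 0)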
As far as developing an intuition for whether a newly-defined type is a monad, and how it behaves as one, you can think of it by going up from Functor to Monad:
Functor lets you transform values with pure functions.
Applicative lets you embed pure values and express application — (<*>) lets you go from an embedded function and its embedded argument to an embedded result.
Monad lets the structure of embedded computations depend on the values of previous computations.
The easiest way to understand this is to look at the type of join:
join :: (Monad m) => m (m a) -> m a
This means that if you have an embedded computation whose result is a new embedded computation, you can create a computation that executes the result of that computation. So you can use monadic effects to create a new computation based on values of previous computations, and transfer control flow to that computation.
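As a small sketch, here join picks which action to run based on a value produced by a previous computation:
import Control.Monad (join)

-- An action whose result is itself an action, chosen from user input.
chooseAction :: IO (IO ())
chooseAction = do
  line <- getLine
  return (if line == "hi" then putStrLn "hello!" else putStrLn "bye")

-- join runs the outer action, then runs the action it produced.
main :: IO ()
main = join chooseAction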
Interestingly, this can be a weakness of structuring things monadically: with Applicative, the structure of the computation is static (i.e. a given Applicative computation has a certain structure of effects that cannot change based on intermediate values), whereas with Monad it is dynamic. This can restrict the optimisation you can do; for instance, applicative parsers are less powerful than monadic ones (well, this isn't strictly true, but it effectively is), but they can be optimised better.
Note that (>>=) can be defined as
m >>= f = join (fmap f m)
and so a monad can be defined simply with return and join (assuming it's a Functor; all monads are applicative functors, but Haskell's typeclass hierarchy unfortunately doesn't require this for historical reasons).
As an additional note, you probably shouldn't focus too heavily on monads, no matter what kind of buzz they get from misguided non-Haskellers. There are many typeclasses that represent meaningful and powerful patterns, and not everything is best expressed as a monad. Applicative, Monoid, Foldable... which abstraction to use depends entirely on your situation. And, of course, just because something is a monad doesn't mean it can't be other things too; being a monad is just another property of a type.
So, you shouldn't think too much about "identifying monads"; the questions are more like:
Can this code be expressed in a simpler monadic form? With which monad?
Is this type I've just defined a monad? What generic patterns encoded by the standard functions on monads can I take advantage of?
Follow the types.
If you find you have written functions with all of these types
(a -> b) -> YourType a -> YourType b
a -> YourType a
YourType (YourType a) -> YourType a
or all of these types
a -> YourType a
YourType a -> (a -> YourType b) -> YourType b
then YourType may be a monad. (I say “may” because the functions must obey the monad laws as well.)
(Remember you can reorder arguments, so e.g. YourType a -> (a -> b) -> YourType b is just (a -> b) -> YourType a -> YourType b in disguise.)
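As a minimal sketch, here is a made-up type with functions of exactly those two shapes (it is essentially the Writer monad in disguise; you would still have to check the monad laws):
-- A value paired with an accumulated log.
newtype Logged a = Logged (a, [String])

-- a -> YourType a
pureL :: a -> Logged a
pureL x = Logged (x, [])

-- YourType a -> (a -> YourType b) -> YourType b
bindL :: Logged a -> (a -> Logged b) -> Logged b
bindL (Logged (x, log1)) f =
  let Logged (y, log2) = f x
  in  Logged (y, log1 ++ log2)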
Don't look out only for monads! If you have functions of all of these types
YourType
YourType -> YourType -> YourType
and they obey the monoid laws, you have a monoid! That can be valuable too. Similarly for other typeclasses, most importantly Functor.
There's the effect view of monads:
Maybe - partiality / failure short-circuiting
Either - error reporting / short-circuiting (like Maybe with more information)
Writer - write only "state", commonly logging
Reader - read-only state, commonly environment passing
State - read / write state
Resumption - pausable computation
List - multiple successes
Once you are familiar with these effects its easy to build monads combining them with monad transformers. Note that combining some monads needs special care (particularly Cont and any monads with backtracking).
One important thing to note is that there aren't many monads. There are some exotic ones that aren't in the standard libraries, e.g. the probability monad and variations of the Cont monad like Codensity. But unless you are doing something mathematical it's unlikely you will invent (or discover) a new monad; however, if you use Haskell long enough you'll build many monads that are different combinations of the standard ones.
Edit - also note that the order in which you stack monad transformers results in different monads:
If you add ErrorT (a transformer) to a Writer monad, you get this monad: (Either err a, log) - the log is accessible even when there is an error.
If you add WriterT (a transformer) to an Error monad, you get this monad: Either err (a, log) - you can only access the log if there is no error.
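A sketch of the two stackings, using the modern ExceptT in place of the older ErrorT:
import Control.Monad.Trans.Class (lift)
import Control.Monad.Trans.Except (ExceptT, runExceptT, throwE)
import Control.Monad.Trans.Writer (Writer, runWriter, WriterT, runWriterT, tell)

-- ErrorT-over-Writer: the log survives an error.
logThenFail1 :: ExceptT String (Writer [String]) ()
logThenFail1 = lift (tell ["about to fail"]) >> throwE "boom"
-- runWriter (runExceptT logThenFail1) == (Left "boom", ["about to fail"])

-- WriterT-over-Error: the log is lost when an error occurs.
logThenFail2 :: WriterT [String] (Either String) ()
logThenFail2 = tell ["about to fail"] >> lift (Left "boom")
-- runWriterT logThenFail2 == Left "boom"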
This is sort of a non-answer, but I feel it is important to say anyways. Just ask! StackOverflow, /r/haskell, and the #haskell irc channel are all great places to get quick feedback from smart people. If you are working on a problem, and you suspect that there's some monadic magic that could make it easier, just ask! The Haskell community loves to solve problems, and is ridiculously friendly.
Don't misunderstand, I'm not encouraging you to never learn for yourself. Quite the contrary, interacting with the Haskell community is one of the best ways to learn. LYAH and RWH, 2 Haskell books that are freely available online, come highly recommended as well.
Oh, and don't forget to play, play, play! As you play around with monadic code, you'll start to get the feel of what "shape" monads have, and when monadic combinators can be useful. If you're rolling your own monad, then usually the type system will guide you to an obvious, simple solution. But to be honest, you should rarely need to roll your own instance of Monad, since Haskell libraries provide tons of useful things as mentioned by other answerers.
There's a common notion that one sees in many programming languages of an "infectious function tag" -- some special behavior for a function that must extend to its callers as well.
Rust functions can be unsafe, meaning they perform operations that can potentially violate memory unsafety. unsafe functions can call normal functions, but any function that calls an unsafe function must be unsafe as well.
Python functions can be async, meaning they return a promise rather than an actual value. async functions can call normal functions, but invocation of an async function (via await) can only be done by another async function.
Haskell functions can be impure, meaning they return an IO a rather than an a. Impure functions can call pure functions, but impure functions can only be called by other impure functions.
Mathematical functions can be partial, meaning they don't map every value in their domain to an output. The definitions of partial functions can reference total functions, but if a total function maps some of its domain to a partial function, it becomes partial as well.
While there may be ways to invoke a tagged function from an untagged function, there is no general way, and doing so can often be dangerous and threatens to break the abstraction the language tries to provide.
The benefit, then, of having tags is that you can expose a set of special primitives that are given this tag and have any function that uses these primitives make that clear in its signature.
Say you're a language designer and you recognize this pattern, and you decide that you want to allow user-defined tags. Let's say the user defined a tag Err, representing computations that may throw an error. A function using Err might look like this:
function div <Err> (n: Int, d: Int): Int
  if d == 0
    throwError("division by 0")
  else
    return (n / d)
If we wanted to simplify things, we might observe that there's nothing erroneous about taking arguments - it's computing the return value where problems might arise. So we can restrict tags to functions that take no arguments, and have div return a closure rather than the actual value:
function div(n: Int, d: Int): <Err> () -> Int
  () =>
    if d == 0
      throwError("division by 0")
    else
      return (n / d)
In a lazy language such as Haskell, we don't need the closure, and can just return a lazy value directly:
div :: Int -> Int -> Err Int
div _ 0 = throwError "division by 0"
div n d = return (n `quot` d)
It is now apparent that, in Haskell, tags need no special language support - they are ordinary type constructors. Let's make a typeclass for them!
class Tag m where
We want to be able to call an untagged function from a tagged function, which is equivalent to turning an untagged value (a) into a tagged value (m a).
addTag :: a -> m a
We also want to be able to take a tagged value (m a) and apply a tagged function (a -> m b) to get a tagged result (m b):
embed :: m a -> (a -> m b) -> m b
This, of course, is precisely the definition of a monad! addTag corresponds to return, and embed corresponds to (>>=).
It is now clear that "tagged functions" are merely a type of monad. As such, whenever you spot a place where a "function tag" could apply, chances are you've got a place suitable for a monad.
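To make that concrete, here is a minimal sketch of the hypothetical Err tag as a genuine Haskell monad (as the P.S. below notes, it is essentially Either String; div' is renamed to avoid clashing with Prelude.div):
newtype Err a = Err { runErr :: Either String a }

instance Functor Err where
  fmap f (Err e) = Err (fmap f e)

instance Applicative Err where
  pure = Err . Right
  Err f <*> Err x = Err (f <*> x)

instance Monad Err where
  -- embed: pass the value onward unless an error already occurred
  Err (Left e)  >>= _ = Err (Left e)
  Err (Right x) >>= f = f x

-- the tagged primitive
throwError :: String -> Err a
throwError = Err . Left

div' :: Int -> Int -> Err Int
div' _ 0 = throwError "division by 0"
div' n d = pure (n `quot` d)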
P.S. Regarding the tags I've mentioned in this answer: Haskell models impurity with the IO monad and partiality with the Maybe monad. Most languages implement async/promises fairly transparently, and there seems to be a Haskell package called promise that mimics this functionality. The Err monad is equivalent to the Either String monad. I'm not aware of any language that models memory unsafety monadically, but it could be done.
