How to return a pure value from a impure method

How to return a pure value from a impure method - haskell

I know it must sound trivial but I was wondering how you can unwrap a value from a functor and return it as pure value?
I have tried:
f::IO a->a
f x=(x>>=)
f= >>=
What should I place in the right side? I can't use return since it will wrap it back again.

It's a frequently asked question: How do I extract 'the' value from my monad, not only in Haskell, but in other languages as well. I have a theory about why this question keeps popping up, so I'll try to answer according to that; I hope it helps.
Containers of single values
You can think of a functor (and therefore also a monad) as a container of values. This is most palpable with the (redundant) Identity functor:
Prelude Control.Monad.Identity> Identity 42
Identity 42
This is nothing but a wrapper around a value, in this case 42. For this particular container, you can extract the value, because it's guaranteed to be there:
Prelude Control.Monad.Identity> runIdentity $ Identity 42
42
While Identity seems fairly useless, you can find other functors that seem to wrap a single value. In F#, for example, you'll often encounter containers like Async<'a> or Lazy<'a>, which are used to represent asynchronous or lazy computations (Haskell doesn't need the latter, because it's lazy by default).
You'll find lots of other single-value containers in Haskell, such as Sum, Product, Last, First, Max, Min, etc. Common to all of those is that they wrap a single value, which means that you can extract the value.
I think that when people first encounter functors and monads, they tend to think of the concept of a data container in this way: as a container of a single value.
Containers of optional values
Unfortunately, some common monads in Haskell seem to support that idea. For example, Maybe is a data container as well, but one that can contain zero or one value. You can, unfortunately, still extract the value if it's there:
Prelude Data.Maybe> fromJust $ Just 42
42
The problem with this is that fromJust isn't total, so it'll crash if you call it with a Nothing value:
Prelude Data.Maybe> fromJust Nothing
*** Exception: Maybe.fromJust: Nothing
You can see the same sort of problem with Either. Although I'm not aware of a built-in partial function to extract a Right value, you can easily write one with pattern matching (if you ignore the compiler warning):
extractRight :: Either l r -> r
extractRight (Right x) = x
Again, it works in the 'happy path' scenario, but can just as easily crash:
Prelude> extractRight $ Right 42
42
Prelude> extractRight $ Left "foo"
*** Exception: <interactive>:12:1-26: Non-exhaustive patterns in function extractRight
Still, since functions like fromJust exists, I suppose it tricks people new to the concept of functors and monads into thinking about them as data containers from which you can extract a value.
When you encounter something like IO Int for the first time, then, I can understand why you'd be tempted to think of it as a container of a single value. In a sense, it is, but in another sense, it isn't.
Containers of multiple values
Even with lists, you can (attempt to) extract 'the' value from a list:
Prelude> head [42..1337]
42
Still, it could fail:
Prelude> head []
*** Exception: Prelude.head: empty list
At this point, however, it should be clear that attempting to extract 'the' value from any arbitrary functor is nonsense. A list is a functor, but it contains an arbitrary number of values, including zero and infinitely many.
What you can always do, though, is to write functions that take a 'contained' value as input and returns another value as output. Here's an arbitrary example of such a function:
countAndMultiply :: Foldable t => (t a, Int) -> Int
countAndMultiply (xs, factor) = length xs * factor
While you can't 'extract the value' out of a list, you can apply your function to each of the values in a list:
Prelude> fmap countAndMultiply [("foo", 2), ("bar", 3), ("corge", 2)]
[6,9,10]
Since IO is a functor, you can do the same with it as well:
Prelude> foo = return ("foo", 2) :: IO (String, Int)
Prelude> :t foo
foo :: IO (String, Int)
Prelude> fmap countAndMultiply foo
6
The point is that you don't extract a value from a functor, you step into the functor.
Monad
Sometimes, the function you apply to a functor returns a value that's already wrapped in the same data container. As an example, you may have a function that splits a string over a particular character. To keep things simple, let's just look at the built-in function words that splits a string into words:
Prelude> words "foo bar"
["foo","bar"]
If you have a list of strings, and apply words to each, you'll get a nested list:
Prelude> fmap words ["foo bar", "baz qux"]
[["foo","bar"],["baz","qux"]]
The result is a nested data container, in this case a list of lists. You can flatten it with join:
Prelude Control.Monad> join $ fmap words ["foo bar", "baz qux"]
["foo","bar","baz","qux"]
This is the original definition of a monad: it's a functor that you can flatten. In modern Haskell, Monad is defined by bind (>>=), from which one can derive join, but it's also possible to derive >>= from join.
IO as all values
At this point, you may be wondering: what does that have to do with IO? Isn't IO a a container of a single value of the type a?
Not really. One interpretation of IO is that it's a container that holds an arbitrary value of the type a. According to that interpretation, it's analogous to the many-worlds interpretation of quantum mechanics. IO a is the superposition of all possible values of the type a.
In Schrödinger's original thought experiment, the cat in the box is both alive and dead until observed. That's two possible states superimposed. If we think about a variable called catIsAlive, it would be equivalent to the superposition of True and False. So, you can think of IO Bool as a set of possible values {True, False} that will only collapse into a single value when observed.
Likewise, IO Word8 can be interpreted as a superposition of the set of all possible Word8 values, i.e. {0, 1, 2,.. 255}, IO Int as the superposition of all possible Int values, IO String as all possible String values (i.e. an infinite set), and so on.
So how do you observe the value, then?
You don't extract it, you work within the data container. You can, as shown above, fmap and join over it. So, you can write your application as pure functions that you then compose with impure values with fmap, >>=, join, and so on.

It is trivial, so this will be a long answer. In short, the problem lies in the signature, IO a -> a, is not a type properly allowed in Haskell. This really has less to do with IO being a functor than the fact that IO is special.
For some functors you can recover the pure value. For instance a partially applied pair, (,) a, is a functor. We unwrap the value via snd.
snd :: (a,b) -> b
snd (_,b) = b
So this is a functor that we can unwrap to a pure value, but this really has nothing to do with being a functor. It has more to do with pairs belonging to a different Category Theoretic concept, Comonad, with:
extract :: Comonad w => w a -> a
Any Comonad will be a functor for which you can recover the pure value.
Many (non-comonadic) functors have--lets say "evaluators"--which allow something like what is being asked. For instance, we can evaluate a Maybe with maybe :: a -> Maybe a -> a. By providing a default, maybe a has the desired type, Maybe a -> a. Another useful example from State, evalState :: State s a -> s -> a, has its arguments reversed but the concept is the same; given the monad, State s a, and initial state, s, we unwrap the pure value, a.
Finally to the specifics of IO. No "evaluator" for IO is provided in the Haskell language or libraries. We might consider running the program itself an evaluator--much in the same vein of evalState. But if that's a valid conceptual move, then it should only help to convince you that there is no sane way to unwrap from IO--any program written is just the IO a input to its evaluator function.
Instead, what you are forced to do--by design--is to work within the IO monad. For instance, if you have a pure function, f :: a -> b, you apply it within the IO context via, fmap f :: IO a -> IO b
TL;DR You can't get a pure value out of the IO monad. Apply pure functions within the IO context, for instance by fmap

Related

What is the logic behind allowing only same Monad types to be concatenated with `>>` operator?

Though it is okay to bind IO [[Char]] and IO () but its not allowed to bind Maybe with IO. Can someone give an example how this relaxation would lead to a bad design? Why freedom in the polymorphic type of Monad is allowed though not the Monad itself?

There are a lot of good theoretical reasons, including "that's not what Monad is." But let's step away from that for a moment and just look at the implementation details.
First off - Monad isn't magic. It's just a standard type class. Instances of Monad only get created when someone writes one.
Writing that instance is what defines how (>>) works. Usually it's done implicitly through the default definition in terms of (>>=), but that just is evidence that (>>=) is the more general operator, and writing it requires making all the same decisions that writing (>>) would take.
If you had a different operator that worked on more general types, you have to answer two questions. First, what would the types be? Second, how would you go about providing implementations? It's really not clear what the desired types would be, from your question. One of the following, I guess:
class Poly1 m n where
(>>) :: m a -> n b -> m b
class Poly2 m n where
(>>) :: m a -> n b -> n b
class Poly3 m n o | m n -> o where
(>>) :: m a -> n b -> o b
All of them could be implemented. But you lose two really important factors for using them practically.
You need to write an instance for every pair of types you plan to use together. This is a massively more complex undertaking than just an instance for each type. Something about n vs n^2.
You lose predictability. What does the operation even do? Here's where theory and practice intersect. The theory behind Monad places a lot of restrictions on the operations. Those restrictions are referred to as the "monad laws". They are beyond the ability to verify in Haskell, but any Monad instance that doesn't obey them is considered to be buggy. The end result is that you quickly can build an intuition for what the Monad operations do and don't do. You can use them without looking up the details of every type involved, because you know properties that they obey. None of those possible classes I suggested give you any kind of assurances like that. You just have no idea what they do.

I’m not sure that I understand your question correctly, but it’s definitely possible to compose Maybe with IO or [] in the same sense that you can compose IO with [].
For example, if you check the types in GHCI using :t,
getContents >>= return . lines
gives you an IO [String]. If you add
>>= return . map Text.Read.readMaybe
you get a type of IO [Maybe a], which is a composition of IO, [] and Maybe. You could then pass it to
>>= return . Data.Maybe.catMaybes
to flatten it to an IO [a]. Then you might pass the list of parsed valid input lines to a function that flattens it again and computes an output.
Putting this together, the program
import Text.Read (readMaybe)
import Data.Maybe (catMaybes)
main :: IO ()
main = getContents >>= -- IO String
return . lines >>= -- IO [String]
return . map readMaybe >>= -- IO [Maybe Int]
return . catMaybes >>= -- IO [Int]
return . (sum :: [Int] -> Int) >>= -- IO Int
print -- IO ()
with the input:
1
2
Ignore this!
3
prints 6.
It would also be possible to work with an IO (Maybe [String]), a Maybe [IO String], etc.
You can do this with >> as well. Contrived example: getContents >> (return . Just) False reads the input, ignores it, and gives you back an IO (Maybe Bool).

Why do we need monads?

In my humble opinion the answers to the famous question "What is a monad?", especially the most voted ones, try to explain what is a monad without clearly explaining why monads are really necessary. Can they be explained as the solution to a problem?

Why do we need monads?
We want to program only using functions. ("functional programming (FP)" after all).
Then, we have a first big problem. This is a program:
f(x) = 2 * x
g(x,y) = x / y
How can we say what is to be executed first? How can we form an ordered sequence of functions (i.e. a program) using no more than functions?
Solution: compose functions. If you want first g and then f, just write f(g(x,y)). This way, "the program" is a function as well: main = f(g(x,y)). OK, but ...
More problems: some functions might fail (i.e. g(2,0), divide by 0). We have no "exceptions" in FP (an exception is not a function). How do we solve it?
Solution: Let's allow functions to return two kind of things: instead of having g : Real,Real -> Real (function from two reals into a real), let's allow g : Real,Real -> Real | Nothing (function from two reals into (real or nothing)).
But functions should (to be simpler) return only one thing.
Solution: let's create a new type of data to be returned, a "boxing type" that encloses maybe a real or be simply nothing. Hence, we can have g : Real,Real -> Maybe Real. OK, but ...
What happens now to f(g(x,y))? f is not ready to consume a Maybe Real. And, we don't want to change every function we could connect with g to consume a Maybe Real.
Solution: let's have a special function to "connect"/"compose"/"link" functions. That way, we can, behind the scenes, adapt the output of one function to feed the following one.
In our case: g >>= f (connect/compose g to f). We want >>= to get g's output, inspect it and, in case it is Nothing just don't call f and return Nothing; or on the contrary, extract the boxed Real and feed f with it. (This algorithm is just the implementation of >>= for the Maybe type). Also note that >>= must be written only once per "boxing type" (different box, different adapting algorithm).
Many other problems arise which can be solved using this same pattern: 1. Use a "box" to codify/store different meanings/values, and have functions like g that return those "boxed values". 2. Have a composer/linker g >>= f to help connecting g's output to f's input, so we don't have to change any f at all.
Remarkable problems that can be solved using this technique are:
having a global state that every function in the sequence of functions ("the program") can share: solution StateMonad.
We don't like "impure functions": functions that yield different output for same input. Therefore, let's mark those functions, making them to return a tagged/boxed value: IO monad.
Total happiness!

The answer is, of course, "We don't". As with all abstractions, it isn't necessary.
Haskell does not need a monad abstraction. It isn't necessary for performing IO in a pure language. The IO type takes care of that just fine by itself. The existing monadic desugaring of do blocks could be replaced with desugaring to bindIO, returnIO, and failIO as defined in the GHC.Base module. (It's not a documented module on hackage, so I'll have to point at its source for documentation.) So no, there's no need for the monad abstraction.
So if it's not needed, why does it exist? Because it was found that many patterns of computation form monadic structures. Abstraction of a structure allows for writing code that works across all instances of that structure. To put it more concisely - code reuse.
In functional languages, the most powerful tool found for code reuse has been composition of functions. The good old (.) :: (b -> c) -> (a -> b) -> (a -> c) operator is exceedingly powerful. It makes it easy to write tiny functions and glue them together with minimal syntactic or semantic overhead.
But there are cases when the types don't work out quite right. What do you do when you have foo :: (b -> Maybe c) and bar :: (a -> Maybe b)? foo . bar doesn't typecheck, because b and Maybe b aren't the same type.
But... it's almost right. You just want a bit of leeway. You want to be able to treat Maybe b as if it were basically b. It's a poor idea to just flat-out treat them as the same type, though. That's more or less the same thing as null pointers, which Tony Hoare famously called the billion-dollar mistake. So if you can't treat them as the same type, maybe you can find a way to extend the composition mechanism (.) provides.
In that case, it's important to really examine the theory underlying (.). Fortunately, someone has already done this for us. It turns out that the combination of (.) and id form a mathematical construct known as a category. But there are other ways to form categories. A Kleisli category, for instance, allows the objects being composed to be augmented a bit. A Kleisli category for Maybe would consist of (.) :: (b -> Maybe c) -> (a -> Maybe b) -> (a -> Maybe c) and id :: a -> Maybe a. That is, the objects in the category augment the (->) with a Maybe, so (a -> b) becomes (a -> Maybe b).
And suddenly, we've extended the power of composition to things that the traditional (.) operation doesn't work on. This is a source of new abstraction power. Kleisli categories work with more types than just Maybe. They work with every type that can assemble a proper category, obeying the category laws.
Left identity: id . f = f
Right identity: f . id = f
Associativity: f . (g . h) = (f . g) . h
As long as you can prove that your type obeys those three laws, you can turn it into a Kleisli category. And what's the big deal about that? Well, it turns out that monads are exactly the same thing as Kleisli categories. Monad's return is the same as Kleisli id. Monad's (>>=) isn't identical to Kleisli (.), but it turns out to be very easy to write each in terms of the other. And the category laws are the same as the monad laws, when you translate them across the difference between (>>=) and (.).
So why go through all this bother? Why have a Monad abstraction in the language? As I alluded to above, it enables code reuse. It even enables code reuse along two different dimensions.
The first dimension of code reuse comes directly from the presence of the abstraction. You can write code that works across all instances of the abstraction. There's the entire monad-loops package consisting of loops that work with any instance of Monad.
The second dimension is indirect, but it follows from the existence of composition. When composition is easy, it's natural to write code in small, reusable chunks. This is the same way having the (.) operator for functions encourages writing small, reusable functions.
So why does the abstraction exist? Because it's proven to be a tool that enables more composition in code, resulting in creating reusable code and encouraging the creation of more reusable code. Code reuse is one of the holy grails of programming. The monad abstraction exists because it moves us a little bit towards that holy grail.

Benjamin Pierce said in TAPL
A type system can be regarded as calculating a kind of static
approximation to the run-time behaviours of the terms in a program.
That's why a language equipped with a powerful type system is strictly more expressive, than a poorly typed language. You can think about monads in the same way.
As #Carl and sigfpe point, you can equip a datatype with all operations you want without resorting to monads, typeclasses or whatever other abstract stuff. However monads allow you not only to write reusable code, but also to abstract away all redundant detailes.
As an example, let's say we want to filter a list. The simplest way is to use the filter function: filter (> 3) [1..10], which equals [4,5,6,7,8,9,10].
A slightly more complicated version of filter, that also passes an accumulator from left to right, is
swap (x, y) = (y, x)
(.*) = (.) . (.)
filterAccum :: (a -> b -> (Bool, a)) -> a -> [b] -> [b]
filterAccum f a xs = [x | (x, True) <- zip xs $ snd $ mapAccumL (swap .* f) a xs]
To get all i, such that i <= 10, sum [1..i] > 4, sum [1..i] < 25, we can write
filterAccum (\a x -> let a' = a + x in (a' > 4 && a' < 25, a')) 0 [1..10]
which equals [3,4,5,6].
Or we can redefine the nub function, that removes duplicate elements from a list, in terms of filterAccum:
nub' = filterAccum (\a x -> (x `notElem` a, x:a)) []
nub' [1,2,4,5,4,3,1,8,9,4] equals [1,2,4,5,3,8,9]. A list is passed as an accumulator here. The code works, because it's possible to leave the list monad, so the whole computation stays pure (notElem doesn't use >>= actually, but it could). However it's not possible to safely leave the IO monad (i.e. you cannot execute an IO action and return a pure value — the value always will be wrapped in the IO monad). Another example is mutable arrays: after you have leaved the ST monad, where a mutable array live, you cannot update the array in constant time anymore. So we need a monadic filtering from the Control.Monad module:
filterM :: (Monad m) => (a -> m Bool) -> [a] -> m [a]
filterM _ [] = return []
filterM p (x:xs) = do
flg <- p x
ys <- filterM p xs
return (if flg then x:ys else ys)
filterM executes a monadic action for all elements from a list, yielding elements, for which the monadic action returns True.
A filtering example with an array:
nub' xs = runST $ do
arr <- newArray (1, 9) True :: ST s (STUArray s Int Bool)
let p i = readArray arr i <* writeArray arr i False
filterM p xs
main = print $ nub' [1,2,4,5,4,3,1,8,9,4]
prints [1,2,4,5,3,8,9] as expected.
And a version with the IO monad, which asks what elements to return:
main = filterM p [1,2,4,5] >>= print where
p i = putStrLn ("return " ++ show i ++ "?") *> readLn
E.g.
return 1? -- output
True -- input
return 2?
False
return 4?
False
return 5?
True
[1,5] -- output
And as a final illustration, filterAccum can be defined in terms of filterM:
filterAccum f a xs = evalState (filterM (state . flip f) xs) a
with the StateT monad, that is used under the hood, being just an ordinary datatype.
This example illustrates, that monads not only allow you to abstract computational context and write clean reusable code (due to the composability of monads, as #Carl explains), but also to treat user-defined datatypes and built-in primitives uniformly.

I don't think IO should be seen as a particularly outstanding monad, but it's certainly one of the more astounding ones for beginners, so I'll use it for my explanation.
Naïvely building an IO system for Haskell
The simplest conceivable IO system for a purely-functional language (and in fact the one Haskell started out with) is this:
main₀ :: String -> String
main₀ _ = "Hello World"
With lazyness, that simple signature is enough to actually build interactive terminal programs – very limited, though. Most frustrating is that we can only output text. What if we added some more exciting output possibilities?
data Output = TxtOutput String
| Beep Frequency
main₁ :: String -> [Output]
main₁ _ = [ TxtOutput "Hello World"
-- , Beep 440 -- for debugging
]
cute, but of course a much more realistic “alterative output” would be writing to a file. But then you'd also want some way to read from files. Any chance?
Well, when we take our main₁ program and simply pipe a file to the process (using operating system facilities), we have essentially implemented file-reading. If we could trigger that file-reading from within the Haskell language...
readFile :: Filepath -> (String -> [Output]) -> [Output]
This would use an “interactive program” String->[Output], feed it a string obtained from a file, and yield a non-interactive program that simply executes the given one.
There's one problem here: we don't really have a notion of when the file is read. The [Output] list sure gives a nice order to the outputs, but we don't get an order for when the inputs will be done.
Solution: make input-events also items in the list of things to do.
data IO₀ = TxtOut String
| TxtIn (String -> [Output])
| FileWrite FilePath String
| FileRead FilePath (String -> [Output])
| Beep Double
main₂ :: String -> [IO₀]
main₂ _ = [ FileRead "/dev/null" $ \_ ->
[TxtOutput "Hello World"]
]
Ok, now you may spot an imbalance: you can read a file and make output dependent on it, but you can't use the file contents to decide to e.g. also read another file. Obvious solution: make the result of the input-events also something of type IO, not just Output. That sure includes simple text output, but also allows reading additional files etc..
data IO₁ = TxtOut String
| TxtIn (String -> [IO₁])
| FileWrite FilePath String
| FileRead FilePath (String -> [IO₁])
| Beep Double
main₃ :: String -> [IO₁]
main₃ _ = [ TxtIn $ \_ ->
[TxtOut "Hello World"]
]
That would now actually allow you to express any file operation you might want in a program (though perhaps not with good performance), but it's somewhat overcomplicated:
main₃ yields a whole list of actions. Why don't we simply use the signature :: IO₁, which has this as a special case?
The lists don't really give a reliable overview of program flow anymore: most subsequent computations will only be “announced” as the result of some input operation. So we might as well ditch the list structure, and simply cons a “and then do” to each output operation.
data IO₂ = TxtOut String IO₂
| TxtIn (String -> IO₂)
| Terminate
main₄ :: IO₂
main₄ = TxtIn $ \_ ->
TxtOut "Hello World"
Terminate
Not too bad!
So what has all of this to do with monads?
In practice, you wouldn't want to use plain constructors to define all your programs. There would need to be a good couple of such fundamental constructors, yet for most higher-level stuff we would like to write a function with some nice high-level signature. It turns out most of these would look quite similar: accept some kind of meaningfully-typed value, and yield an IO action as the result.
getTime :: (UTCTime -> IO₂) -> IO₂
randomRIO :: Random r => (r,r) -> (r -> IO₂) -> IO₂
findFile :: RegEx -> (Maybe FilePath -> IO₂) -> IO₂
There's evidently a pattern here, and we'd better write it as
type IO₃ a = (a -> IO₂) -> IO₂ -- If this reminds you of continuation-passing
-- style, you're right.
getTime :: IO₃ UTCTime
randomRIO :: Random r => (r,r) -> IO₃ r
findFile :: RegEx -> IO₃ (Maybe FilePath)
Now that starts to look familiar, but we're still only dealing with thinly-disguised plain functions under the hood, and that's risky: each “value-action” has the responsibility of actually passing on the resulting action of any contained function (else the control flow of the entire program is easily disrupted by one ill-behaved action in the middle). We'd better make that requirement explicit. Well, it turns out those are the monad laws, though I'm not sure we can really formulate them without the standard bind/join operators.
At any rate, we've now reached a formulation of IO that has a proper monad instance:
data IO₄ a = TxtOut String (IO₄ a)
| TxtIn (String -> IO₄ a)
| TerminateWith a
txtOut :: String -> IO₄ ()
txtOut s = TxtOut s $ TerminateWith ()
txtIn :: IO₄ String
txtIn = TxtIn $ TerminateWith
instance Functor IO₄ where
fmap f (TerminateWith a) = TerminateWith $ f a
fmap f (TxtIn g) = TxtIn $ fmap f . g
fmap f (TxtOut s c) = TxtOut s $ fmap f c
instance Applicative IO₄ where
pure = TerminateWith
(<*>) = ap
instance Monad IO₄ where
TerminateWith x >>= f = f x
TxtOut s c >>= f = TxtOut s $ c >>= f
TxtIn g >>= f = TxtIn $ (>>=f) . g
Obviously this is not an efficient implementation of IO, but it's in principle usable.

Monads serve basically to compose functions together in a chain. Period.
Now the way they compose differs across the existing monads, thus resulting in different behaviors (e.g., to simulate mutable state in the state monad).
The confusion about monads is that being so general, i.e., a mechanism to compose functions, they can be used for many things, thus leading people to believe that monads are about state, about IO, etc, when they are only about "composing functions".
Now, one interesting thing about monads, is that the result of the composition is always of type "M a", that is, a value inside an envelope tagged with "M". This feature happens to be really nice to implement, for example, a clear separation between pure from impure code: declare all impure actions as functions of type "IO a" and provide no function, when defining the IO monad, to take out the "a" value from inside the "IO a". The result is that no function can be pure and at the same time take out a value from an "IO a", because there is no way to take such value while staying pure (the function must be inside the "IO" monad to use such value). (NOTE: well, nothing is perfect, so the "IO straitjacket" can be broken using "unsafePerformIO : IO a -> a" thus polluting what was supposed to be a pure function, but this should be used very sparingly and when you really know to be not introducing any impure code with side-effects.

Monads are just a convenient framework for solving a class of recurring problems. First, monads must be functors (i.e. must support mapping without looking at the elements (or their type)), they must also bring a binding (or chaining) operation and a way to create a monadic value from an element type (return). Finally, bind and return must satisfy two equations (left and right identities), also called the monad laws. (Alternatively one could define monads to have a flattening operation instead of binding.)
The list monad is commonly used to deal with non-determinism. The bind operation selects one element of the list (intuitively all of them in parallel worlds), lets the programmer to do some computation with them, and then combines the results in all worlds to single list (by concatenating, or flattening, a nested list). Here is how one would define a permutation function in the monadic framework of Haskell:
perm [e] = [[e]]
perm l = do (leader, index) <- zip l [0 :: Int ..]
let shortened = take index l ++ drop (index + 1) l
trailer <- perm shortened
return (leader : trailer)
Here is an example repl session:
*Main> perm "a"
["a"]
*Main> perm "ab"
["ab","ba"]
*Main> perm ""
[]
*Main> perm "abc"
["abc","acb","bac","bca","cab","cba"]
It should be noted that the list monad is in no way a side effecting computation. A mathematical structure being a monad (i.e. conforming to the above mentioned interfaces and laws) does not imply side effects, though side-effecting phenomena often nicely fit into the monadic framework.

You need monads if you have a type constructor and functions that returns values of that type family. Eventually, you would like to combine these kind of functions together. These are the three key elements to answer why.
Let me elaborate. You have Int, String and Real and functions of type Int -> String, String -> Real and so on. You can combine these functions easily, ending with Int -> Real. Life is good.
Then, one day, you need to create a new family of types. It could be because you need to consider the possibility of returning no value (Maybe), returning an error (Either), multiple results (List) and so on.
Notice that Maybe is a type constructor. It takes a type, like Int and returns a new type Maybe Int. First thing to remember, no type constructor, no monad.
Of course, you want to use your type constructor in your code, and soon you end with functions like Int -> Maybe String and String -> Maybe Float. Now, you can't easily combine your functions. Life is not good anymore.
And here's when monads come to the rescue. They allow you to combine that kind of functions again. You just need to change the composition . for >==.

Why do we need monadic types?
Since it was the quandary of I/O and its observable effects in nonstrict languages like Haskell that brought the monadic interface to such prominence:
[...] monads are used to address the more general problem of computations (involving state, input/output, backtracking, ...) returning values: they do not solve any input/output-problems directly but rather provide an elegant and flexible abstraction of many solutions to related problems. [...] For instance, no less than three different input/output-schemes are used to solve these basic problems in Imperative functional programming, the paper which originally proposed `a new model, based on monads, for performing input/output in a non-strict, purely functional language'. [...]
[Such] input/output-schemes merely provide frameworks in which side-effecting operations can safely be used with a guaranteed order of execution and without affecting the properties of the purely functional parts of the language.
Claus Reinke (pages 96-97 of 210).
(emphasis by me.)
[...] When we write effectful code – monads or no monads – we have to constantly keep in mind the context of expressions we pass around.
The fact that monadic code ‘desugars’ (is implementable in terms of) side-effect-free code is irrelevant. When we use monadic notation, we program within that notation – without considering what this notation desugars into. Thinking of the desugared code breaks the monadic abstraction. A side-effect-free, applicative code is normally compiled to (that is, desugars into) C or machine code. If the desugaring argument has any force, it may be applied just as well to the applicative code, leading to the conclusion that it all boils down to the machine code and hence all programming is imperative.
[...] From the personal experience, I have noticed that the mistakes I make when writing monadic code are exactly the mistakes I made when programming in C. Actually, monadic mistakes tend to be worse, because monadic notation (compared to that of a typical imperative language) is ungainly and obscuring.
Oleg Kiselyov (page 21 of 26).
The most difficult construct for students to understand is the monad. I introduce IO without mentioning monads.
Olaf Chitil.
More generally:
Still, today, over 25 years after the introduction of the concept of monads to the world of functional programming, beginning functional programmers struggle to grasp the concept of monads. This struggle is exemplified by the numerous blog posts about the effort of trying to learn about monads. From our own experience we notice that even at university level, bachelor level students often struggle to comprehend monads and consistently score poorly on monad-related exam questions.
Considering that the concept of monads is not likely to disappear from the functional programming landscape any time soon, it is vital that we, as the functional programming community, somehow overcome the problems novices encounter when first studying monads.
Tim Steenvoorden, Jurriën Stutterheim, Erik Barendsen and Rinus Plasmeijer.
If only there was another way to specify "a guaranteed order of execution" in Haskell, while keeping the ability to separate regular Haskell definitions from those involved in I/O (and its observable effects) - translating this variation of Philip Wadler's echo:
val echoML : unit -> unit
fun echoML () = let val c = getcML () in
if c = #"\n" then
()
else
let val _ = putcML c in
echoML ()
end
fun putcML c = TextIO.output1(TextIO.stdOut,c);
fun getcML () = valOf(TextIO.input1(TextIO.stdIn));
...could then be as simple as:
echo :: OI -> ()
echo u = let !(u1:u2:u3:_) = partsOI u in
let !c = getChar u1 in
if c == '\n' then
()
else
let !_ = putChar c u2 in
echo u3
where:
data OI -- abstract
foreign import ccall "primPartOI" partOI :: OI -> (OI, OI)
⋮
foreign import ccall "primGetCharOI" getChar :: OI -> Char
foreign import ccall "primPutCharOI" putChar :: Char -> OI -> ()
⋮
and:
partsOI :: OI -> [OI]
partsOI u = let !(u1, u2) = partOI u in u1 : partsOI u2
How would this work? At run-time, Main.main receives an initial OI pseudo-data value as an argument:
module Main(main) where
main :: OI -> ()
⋮
...from which other OI values are produced, using partOI or partsOI. All you have to do is ensure each new OI value is used at most once, in each call to an OI-based definition, foreign or otherwise. In return, you get back a plain ordinary result - it isn't e.g. paired with some odd abstract state, or requires the use of a callback continuation, etc.
Using OI, instead of the unit type () like Standard ML does, means we can avoid always having to use the monadic interface:
Once you're in the IO monad, you're stuck there forever, and are reduced to Algol-style imperative programming.
Robert Harper.
But if you really do need it:
type IO a = OI -> a
unitIO :: a -> IO a
unitIO x = \ u -> let !_ = partOI u in x
bindIO :: IO a -> (a -> IO b) -> IO b
bindIO m k = \ u -> let !(u1, u2) = partOI u in
let !x = m u1 in
let !y = k x u2 in
y
⋮
So, monadic types aren't always needed - there are other interfaces out there:
LML had a fully fledged implementation of oracles running of a multi-processor (a Sequent Symmetry) back in ca 1989. The description in the Fudgets thesis refers to this implementation. It was fairly pleasant to work with and quite practical.
[...]
These days everything is done with monads so other solutions are sometimes forgotten.
Lennart Augustsson (2006).
Wait a moment: since it so closely resembles Standard ML's direct use of effects, is this approach and its use of pseudo-data referentially transparent?
Absolutely - just find a suitable definition of "referential transparency"; there's plenty to choose from...

How to handle side effect with Applicative?

I see everywhere that Applicative can handle side effects, but all the simple examples I've seen are just combining stuff together like:
> (,,) <$> [1,2] <*> ["a", "b", "c"] <*> ["foo", "bar"]
[(1,"a","foo"),(1,"a","bar"),(1,"b","foo"),(1,"b","bar"),
(1,"c","foo"),(1,"c","bar"),(2,"a","foo"),(2,"a","bar"),
(2,"b","foo"),(2,"b","bar"),(2,"c","foo"),(2,"c","bar")]
Which is cool but I can't see how that links to side effects. My understanding is that Applicative is a weak monad and so you can handle side effects (as you would do with a State monad) but you can't reuse the result of the previous side effect.
Does that mean that >> could be written for an Applicative and things like
do
print' "hello"
print' "world"
would make sense (with print' :: a -> Applicative something) (with the appropriate do-applicative extension).
In other world, is the difference between Monad and Applicative is that Monad allows x <- ... but Applicative doesn't.
Then, is the Writer monad, just an applicative?

Output
The applicative equivalent for >> is *>, so you can do
ghci> :m Control.Applicative
ghci> print 5 *> print 7
5
7
Input - a better case for Applicative
import Control.Applicative
data Company = Company {name :: String, size :: Int}
deriving Show
getCompany :: IO Company
getCompany = Company <$> getLine <*> readLn
Which works nicely for input:
ghci> getCompany >>= print
BigginsLtd
3
Company {name = "BigginsLtd", size = 3}
Notice that since we're using Applicative for IO, we're in the IO monad anyway, so can use >>= if we like. The benefit Applicative gives you is the nice syntax.
My favourite is with parsing, so I can do
data Statement = Expr Expression | If Condition Statement Statement
parseStatement = Expr <$> parseExpression <|>
If <$> (string "if" *> parseCondition)
<*> (string "then" *> parseStatement)
<*> (string "else" *> parseStatement)
The difference between Applicative and Monad
The difference between Applicative and Monad is that Monad has >>=, which lets you choose what side effect to use based on the value you have.
Using Monad:
don't_reformat_hard_drive :: Bool -> IO ()
don't_reformat_hard_drive yes = if yes then putStr "OK I didn't"
else putStr "oops!" >> System.IO.reformat "C:/"
maybeReformat :: IO ()
maybeReformat = WinXP.Dialogs.ask "Don't reformat hard drive?"
>>= don't_reformat_hard_drive
(There's no System.IO.reformat or WinXP.Dialogs.ask. This is just an example I found funny.)
Using Applicative:
response :: Bool -> () -> String
response yes () = if yes then "OK I didn't" else "oops!"
probablyReformat = response <$> WinXP.Dialogs.ask "Don't reformat hard drive?"
<*> System.IO.reformat "C:\"
Sadly, using Applicative I can't inspect the Boolean value to determine whether to reformat or not – the side effect order is determined at compile time, in an Applicative, and the hard drive will always be reformatted with this piece of code. I need the Monad's bind (>>=) to be able to stop the reformat.
.........your hard drive C: has been successfully reformatted.
"OK I didn't"

Applicative and Monad both provide ways of "combining" multiple side-effectful1 values into a single side-effectful value.
The Applicative interface for combining just lets you combine effectful values such that the resulting effectful value combines all their effects according to some "fixed" recipe.
The Monad interface for combining lets you combine effectful values in such a way that the effects of the combined value depends on what the original effectful values do when they're actually resolved.
For example, the State Integer monad/applicative is of values that depend upon (and affect) some Integer state. State Integer t values only have a concrete value in the presence of that state.
A function that takes two State Integer Char values (call them a and b) and gives us back a State Integer Char value and only uses the Applicative interface of State Integer must produce a value whose "statefulness" is always the same, regardless of what the Integer state value is and regardless of what Char values the inputs yield. For example, it could thread the state through a and then b, combining their Char values somehow. Or it could threat the state through b and then a. Or it could pick only a or only b. Or it could ignore both entirely, not taking either of their effects on the current Integer state, and just pure some char value. Or it could run either or both of them any fixed number of times in any fixed order, and it could incorporate any other State Integer t values it knows about. But whatever it does, it always does that, regardless of the current Integer state, or any values produced by any of the State Integer t values it manages to get its hands on.
A function that took the same inputs but was able to use the monad interface for State Integer can do much more than that. It can run a or b depending on whether the current Integer state is positive or negative. It can run a, then if the resulting Char is an ascii digit character it can turn the digit into a number and run b that many times. And so on.
So yes, a computation like:
do
print' "hello"
print' "world"
Is one that could be implemented using only the Applicative interface to whatever print' returns. You are close to correct that the difference between Monad and Applicative if both had a do-notation would be that monadic do would allow x <- ..., while applicative do would not. It's a bit more subtle than that though; this would work with Applicative too:
do x <- ...
y <- ...
pure $ f x y
What Applicative can't do is inspect x and y to decide what f to call on them (or do anything with the result of f x y other than just pure it.
You are not quite correct that there's no difference between Writer w as a monad and as an applicative, however. It's true that the monadic interface of Writer w doesn't allow the value yielded to depend on the effects (the "log"), so it must always be possible to rewrite any Writer w defined using monadic features to one that only uses applicative features and always yields the same value2. But the monadic interface allows the effects to depend on the values, which the applicative interface doesn't, so you can't always faithfully reproduce the effects of a Writer w using only the applicative interface.
See this (somewhat silly) example program:
import Control.Applicative
import Control.Monad.Writer
divM :: Writer [String] Int -> Writer [String] Int -> Writer [String] Int
divM numer denom
= do d <- denom
if d == 0
then do tell ["divide by zero"]
return 0
else do n <- numer
return $ n `div` d
divA :: Writer [String] Int -> Writer [String] Int -> Writer [String] Int
divA numer denom = divIfNotZero <$> numer <*> denom
where
divIfNotZero n d = if d == 0 then 0 else n `div` d
noisy :: Show a => a -> Writer [String] a
noisy x = tell [(show x)] >> return x
Then with that loaded in GHCi:
*Main> runWriter $ noisy 6 `divM` noisy 3
(2,["3","6"])
*Main> runWriter $ noisy 6 `divM` noisy 0
(0,["0","divide by zero"])
*Main> runWriter $ undefined `divM` noisy 0
(0,["0","divide by zero"])
*Main> runWriter $ noisy 6 `divA` noisy 3
(2,["6","3"])
*Main> runWriter $ noisy 6 `divA` noisy 0
(0,["6","0"])
*Main> runWriter $ undefined `divA` noisy 0
(0,*** Exception: Prelude.undefined
*Main> runWriter $ (tell ["undefined"] *> pure undefined) `divA` noisy 0
(0,["undefined","0"])
Note how with divM, whether numer's effects are included in numer `divM` denom depends on the value of denom (as does whether the effect of tell ["divide by zero"]). With the best the applicative interface can do, the effects of numer are always included in numerdivAdenom, even when lazy evaluation should mean that the value yielded by numer is never inspected. And it's not possible to helpfully add "divide by 0" to the log when the denominator is zero.
1 I don't like to think of "combining effectful values" as the definition of that monads and applicatives do, but it's an example of what you can do with them.
2 When bottoms aren't involved, anyway; you should be able to see from my example why bottom can mess up the equivalence.

In your example, the Applicative instance for lists is being used. The "effect" here is nondeterminism: returning more than one possible value.
The Applicative instance for lists calculates the possible combinations of elements from the individual list arguments.
However, what it can't do is make one of the lists depend on values contained in a previous list. You need the Monad instance for that.
For example, consider the code:
foo :: [Int]
foo = do
r <- [2,7]
if (even r)
then [2,5,6]
else [9,234,343]
Here we are generating lists whose values depend on a list that appeared earlier in the computation (the [2,7] one). You can't do that with Applicative.
An analogy
I hope the following analogy is not too obtuse, but Applicatives are like railroads.
In an Applicative, effects build the railroad upon which the locomotive of function application will travel. All the effects take place while building the railroad, not when the locomotive moves. The locomotive cannot change its course in any way.
A Monad is like having a locomotive in a railroad which is still under construction. Passengers can actually yell to the construction crew a few meters ahead to tell them things like "I like the scenery on this side, please lay the tracks a bit more towards the right" or "you can stop laying tracks, I get out right here". Of course, for this strange railway, the company can't provide a timetable or a fixed list of stops that can be checked before embarking on the train.
That's why Applicatives can be regarded as a form of "generalized function application", but Monads can't.

About value in context (applied in Monad)

I have a small question about value in context.
Take Just 'a', so the value in context of type Maybe in this case is 'a'
Take [3], so value in context of type [a] in this case is 3
And if you apply the monad for [3] like this: [3] >>= \x -> [x+3], it means you assign x with value 3. It's ok.
But now, take [3,2], so what is the value in the context of type [a]?. And it's so strange that if you apply monad for it like this:
[3,4] >>= \x -> x+3
It got the correct answer [6,7], but actually we don't understand what is x in this case. You can answer, ah x is 3 and then 4, and x feeds the function 2 times and concat as Monad does: concat (map f xs) like this:
[3,4] >>= concat (map f x)
So in this case, [3,4] will be assigned to the x. It means wrong, because [3,4] is not a value. Monad is wrong.

I think your problem is focusing too much on the values. A monad is a type constructor, and as such not concerned with how many and what kinds of values there are, but only the context.
A Maybe a can be an a, or nothing. Easy, and you correctly observed that.
An Either String a is either some a, or alternatively some information in form of a String (e.g. why the calculation of a failed).
Finally, [a] is an unknown number of as (or none at all), that may have resulted from an ambiguous computation, or one giving multiple results (like a quadratic equation).
Now, for the interpretation of (>>=), it is helpful to know that the essential property of a monad (how it is defined by category theorists) is
join :: m (m a) -> m a.
Together with fmap, (>>=) can be written in terms of join.
What join means is the following: A context, put in the same context again, still has the same resulting behavior (for this monad).
This is quite obvious for Maybe (Maybe a): Something can essentially be Just (Just x), or Nothing, or Just Nothing, which provides the same information as Nothing. So, instead of using Maybe (Maybe a), you could just have Maybe a and you wouldn't lose any information. That's what join does: it converts to the "easier" context.
[[a]] is somehow more difficult, but not much. You essentially have multiple/ambiguous results out of multiple/ambiguous results. A good example are the roots of a fourth-degree polynomial, found by solving a quadratic equation. You first get two solutions, and out of each you can find two others, resulting in four roots.
But the point is, it doesn't matter if you speak of an ambiguous ambiguous result, or just an ambiguous result. You could just always use the context "ambiguous", and transform multiple levels with join.
And here comes what (>>=) does for lists: it applies ambiguous functions to ambiguous values:
squareRoots :: Complex -> [Complex]
fourthRoots num = squareRoots num >>= squareRoots
can be rewritten as
fourthRoots num = join $ squareRoots `fmap` (squareRoots num)
-- [1,-1,i,-i] <- [[1,-1],[i,-i]] <- [1,-1] <- 1
since all you have to do is to find all possible results for each possible value.
This is why join is concat for lists, and in fact
m >>= f == join (fmap f) m
must hold in any monad.
A similar interpretation can be given to IO. A computation with side-effects, which can also have side-effects (IO (IO a)), is in essence just something with side-effects.

You have to take the word "context" quite broadly.
A common way of interpreting a list of values is that it represents an indeterminate value, so [3,4] represents a value which is three or four, but we don't know which (perhaps we just know it's a solution of x^2 - 7x + 12 = 0).
If we then apply f to that, we know it's 6 or 7 but we still don't know which.
Another example of an indeterminate value that you're more used to is 3. It could mean 3::Int or 3::Integer or even sometimes 3.0::Double. It feels easier because there's only one symbol representing the indeterminate value, whereas in a list, all the possibilities are listed (!).
If you write
asum = do
x <- [10,20]
y <- [1,2]
return (x+y)
You'll get a list with four possible answers: [11,12,21,22]
That's one for each of the possible ways you could add x and y.

It is not the values that are in the context, it's the types.
Just 'a' :: Maybe Char --- Char is in a Maybe context.
[3, 2] :: [Int] --- Int is in a [] context.
Whether there is one, none or many of the a in the m a is beside the point.
Edit: Consider the type of (>>=) :: Monad m => m a -> (a -> m b) -> m b.
You give the example Just 3 >>= (\x->Just(4+x)). But consider Nothing >>= (\x->Just(4+x)). There is no value in the context. But the type is in the context all the same.
It doesn't make sense to think of x as necessarily being a single value. x has a single type. If we are dealing with the Identity monad, then x will be a single value, yes. If we are in the Maybe monad, x may be a single value, or it may never be a value at all. If we are in the list monad, x may be a single value, or not be a value at all, or be various different values... but what it is not is the list of all those different values.
Your other example --- [2, 3] >>= (\x -> x + 3) --- [2, 3] is not passed to the function. [2, 3] + 3 would have a type error. 2 is passed to the function. And so is 3. The function is invoked twice, gives results for both those inputs, and the results are combined by the >>= operator. [2, 3] is not passed to the function.

"context" is one of my favorite ways to think about monads. But you've got a slight misconception.
Take Just 'a', so the value in context of type Maybe in this case is 'a'
Not quite. You keep saying the value in context, but there is not always a value "inside" a context, or if there is, then it is not necessarily the only value. It all depends on which context we are talking about.
The Maybe context is the context of "nullability", or potential absence. There might be something there, or there might be Nothing. There is no value "inside" of Nothing. So the maybe context might have a value inside, or it might not. If I give you a Maybe Foo, then you cannot assume that there is a Foo. Rather, you must assume that it is a Foo inside the context where there might actually be Nothing instead. You might say that something of type Maybe Foo is a nullable Foo.
Take [3], so value in context of type [a] in this case is 3
Again, not quite right. A list represents a nondeterministic context. We're not quite sure what "the value" is supposed to be, or if there is one at all. In the case of a singleton list, such as [3], then yes, there is just one. But one way to think about the list [3,4] is as some unobservable value which we are not quite sure what it is, but we are certain that it 3 or that it is 4. You might say that something of type [Foo] is a nondeterministic Foo.
[3,4] >>= \x -> x+3
This is a type error; not quite sure what you meant by this.
So in this case, [3,4] will be assigned to the x. It means wrong, because [3,4] is not a value. Monad is wrong.
You totally lost me here. Each instance of Monad has its own implementation of >>= which defines the context that it represents. For lists, the definition is
(xs >>= f) = (concat (map f xs))
You may want to learn about Functor and Applicative operations, which are related to the idea of Monad, and might help clear some confusion.

Unlike a Functor, a Monad can change shape?

I've always enjoyed the following intuitive explanation of a monad's power relative to a functor: a monad can change shape; a functor cannot.
For example: length $ fmap f [1,2,3] always equals 3.
With a monad, however, length $ [1,2,3] >>= g will often not equal 3. For example, if g is defined as:
g :: (Num a) => a -> [a]
g x = if x==2 then [] else [x]
then [1,2,3] >>= g is equal to [1,3].
The thing that troubles me slightly, is the type signature of g. It seems impossible to define a function which changes the shape of the input, with a generic monadic type such as:
h :: (Monad m, Num a) => a -> m a
The MonadPlus or MonadZero type classes have relevant zero elements, to use instead of [], but now we have something more than a monad.
Am I correct? If so, is there a way to express this subtlety to a newcomer to Haskell. I'd like to make my beloved "monads can change shape" phrase, just a touch more honest; if need be.

I've always enjoyed the following intuitive explanation of a monad's power relative to a functor: a monad can change shape; a functor cannot.
You're missing a bit of subtlety here, by the way. For the sake of terminology, I'll divide a Functor in the Haskell sense into three parts: The parametric component determined by the type parameter and operated on by fmap, the unchanging parts such as the tuple constructor in State, and the "shape" as anything else, such as choices between constructors (e.g., Nothing vs. Just) or parts involving other type parameters (e.g., the environment in Reader).
A Functor alone is limited to mapping functions over the parametric portion, of course.
A Monad can create new "shapes" based on the values of the parametric portion, which allows much more than just changing shapes. Duplicating every element in a list or dropping the first five elements would change the shape, but filtering a list requires inspecting the elements.
This is essentially how Applicative fits between them--it allows you to combine the shapes and parametric values of two Functors independently, without letting the latter influence the former.
Am I correct? If so, is there a way to express this subtlety to a newcomer to Haskell. I'd like to make my beloved "monads can change shape" phrase, just a touch more honest; if need be.
Perhaps the subtlety you're looking for here is that you're not really "changing" anything. Nothing in a Monad lets you explicitly mess with the shape. What it lets you do is create new shapes based on each parametric value, and have those new shapes recombined into a new composite shape.
Thus, you'll always be limited by the available ways to create shapes. With a completely generic Monad all you have is return, which by definition creates whatever shape is necessary such that (>>= return) is the identity function. The definition of a Monad tells you what you can do, given certain kinds of functions; it doesn't provide those functions for you.

Monad's operations can "change the shape" of values to the extent that the >>= function replaces leaf nodes in the "tree" that is the original value with a new substructure derived from the node's value (for a suitably general notion of "tree" - in the list case, the "tree" is associative).
In your list example what is happening is that each number (leaf) is being replaced by the new list that results when g is applied to that number. The overall structure of the original list still can be seen if you know what you're looking for; the results of g are still there in order, they've just been smashed together so you can't tell where one ends and the next begins unless you already know.
A more enlightening point of view may be to consider fmap and join instead of >>=. Together with return, either way gives an equivalent definition of a monad. In the fmap/join view, though, what is happening here is more clear. Continuing with your list example, first g is fmapped over the list yielding [[1],[],[3]]. Then that list is joined, which for list is just concat.

Just because the monad pattern includes some particular instances that allow shape changes doesn't mean every instance can have shape changes. For example, there is only one "shape" available in the Identity monad:
newtype Identity a = Identity a
instance Monad Identity where
return = Identity
Identity a >>= f = f a
In fact, it's not clear to me that very many monads have meaningful "shape"s: for example, what does shape mean in the State, Reader, Writer, ST, STM, or IO monads?

The key combinator for monads is (>>=). Knowing that it composes two monadic values and reading its type signature, the power of monads becomes more apparent:
(>>=) :: Monad m => m a -> (a -> m b) -> m b
The future action can depend entirely on the outcome of the first action, because it is a function of its result. This power comes at a price though: Functions in Haskell are entirely opaque, so there is no way for you to get any information about a composed action without actually running it. As a side note, this is where arrows come in.

A function with a signature like h indeed cannot do many interesting things beyond performing some arithmetic on its argument. So, you have the correct intuition there.
However, it might help to look at commonly used libraries for functions with similar signatures. You'll find that the most generic ones, as you'd expect, perform generic monad operations like return, liftM, or join. Also, when you use liftM or fmap to lift an ordinary function into a monadic function, you typically wind up with a similarly generic signature, and this is quite convenient for integrating pure functions with monadic code.
In order to use the structure that a particular monad offers, you inevitably need to use some knowledge about the specific monad you're in to build new and interesting computations in that monad. Consider the state monad, (s -> (a, s)). Without knowing that type, we can't write get = \s -> (s, s), but without being able to access the state, there's not much point to being in the monad.

The simplest type of a function satisfying the requirement I can imagine is this:
enigma :: Monad m => m () -> m ()
One can implement it in one of the following ways:
enigma1 m = m -- not changing the shape
enigma2 _ = return () -- changing the shape
This was a very simple change -- enigma2 just discards the shape and replaces it with the trivial one. Another kind of generic change is combining two shapes together:
foo :: Monad m => m () -> m () -> m ()
foo a b = a >> b
The result of foo can have shape different from both a and b.
A third obvious change of shape, requiring the full power of the monad, is a
join :: Monad m => m (m a) -> m a
join x = x >>= id
The shape of join x is usually not the same as of x itself.
Combining those primitive changes of shape, one can derive non-trivial things like sequence, foldM and alike.

Does
h :: (Monad m, Num a) => a -> m a
h 0 = fail "Failed."
h a = return a
suit your needs? For example,
> [0,1,2,3] >>= h
[1,2,3]

This isn't a full answer, but I have a few things to say about your question that don't really fit into a comment.
Firstly, Monad and Functor are typeclasses; they classify types. So it is odd to say that "a monad can change shape; a functor cannot." I believe what you are trying to talk about is a "Monadic value" or perhaps a "monadic action": a value whose type is m a for some Monad m of kind * -> * and some other type of kind *. I'm not entirely sure what to call Functor f :: f a, I suppose I'd call it a "value in a functor", though that's not the best description of, say, IO String (IO is a functor).
Secondly, note that all Monads are necessarily Functors (fmap = liftM), so I'd say the difference you observe is between fmap and >>=, or even between f and g, rather than between Monad and Functor.

Develop Reference

node.js excel linux python-3.x azure haskell apache-spark rust .htaccess string