Lifted „if“-function behaves unexpectedly

Lifted „if“-function behaves unexpectedly - haskell

in my program I use a function if' defined in one of the modules instead of the built-in if-then-else construct. It is defined trivially and works just fine.
However, there's one place in code where I need to apply it to monad values (IO in my case), i.e. the type signature should look somewhat like IO Bool -> IO a -> IO a -> IO a.
Naturally, I tried to lift it.
if' <$> mb <*> action1 <*> action2
But when I try to evaluate the expression, I don't get what I expect.
*Main> if' <$> return True <*> putStrLn "yes" <*> putStrLn "no"
yes
no
I know that <*> description reads „sequential application“, so maybe it's that. But what is happening here? Can I fix it without writing a whole new dedicated function?

(<*>) evaluates both of its arguments, so it effectively a pipeline of applications lifted over some applicative. The ability to inspect values and change the future of the computation is the extra power that the Monad class has over Applicative so you need to use that instead e.g.
mif' :: Monad m => m Bool -> m a -> m a -> m a
mif' bm xm ym = bm >>= (\b -> if b then xm else ym)

Related

What is the intuitive meaning of "join"?

What is the intuitive meaning of join for a Monad?
The monads-as-containers analogies make sense to me, and inside these analogies join makes sense. A value is double-wrapped and we unwrap one layer. But as we all know, a monad is not a container.
How might one write sensible, understandable code using join in normal circumstances, say when in IO?

An action :: IO (IO a) is a way of producing a way of producing an a. join action, then, is a way of producing an a by running the outermost producer of action, taking the producer it produced and then running that as well, to finally get to that juicy a.

join collapses consecutive layers of the type constructor.
A valid join must satisfy the property that, for any number of consecutive applications of the type constructor, it shouldn't matter the order in which we collapse the layers.
For example
ghci> let lolol = [[['a'],['b','c']],[['d'],['e']]]
ghci> lolol :: [[[Char]]]
ghci> lolol :: [] ([] ([] Char)) -- the type can also be expressed like this
ghci> join (fmap join lolol) -- collapse inner layers first
"abcde"
ghci> join (join lolol) -- collapse outer layers first
"abcde"
(We used fmap to "get inside" the outer monadic layer so that we could collapse the inner layers first.)
A small non container example where join is useful: for the function monad (->) a, join is equivalent to \f x -> f x x, a function of type (a -> a -> b) -> a -> b that applies two times the same argument to another function.

For the List monad, join is simply concat, and concatMap is join . fmap.
So join implicitly appears in any list expression which uses concat
or concatMap.
Suppose you were asked to find all of the numbers which are divisors of any
number in an input list. If you have a divisors function:
divisors :: Int -> [Int]
divisors n = [ d | d <- [1..n], mod n d == 0 ]
you might solve the problem like this:
foo xs = concat $ (map divisors xs)
Here we are thinking of solving the problem by first mapping the
divisors function over all of the input elements and then concatenating
all of the resulting lists. You might even think that this is a very
"functional" way of solving the problem.
Another approch would be to write a list comprehension:
bar xs = [ d | x <- xs, d <- divisors x ]
or using do-notation:
bar xs = do x <- xs
d <- divisors
return d
Here it might be said we're thinking a little more
imperatively - first draw a number from the list xs; then draw
a divisors from the divisors of the number and yield it.
It turns out, though, that foo and bar are exactly the same function.
Morever, these two approaches are exactly the same in any monad.
That is, for any monad, and appropriate monadic functions f and g:
do x <- f
y <- g x is the same as: (join . fmap g) f
return y
For instance, in the IO monad if we set f = getLine and g = readFile,
we have:
do x <- getLine
y <- readFile x is the same as: (join . fmap readFile) getLine
return y
The do-block is a more imperative way of expressing the action: first read a
line of input; then treat returned string as a file name, read the contents
of the file and finally return the result.
The equivalent join expression seems a little unnatural in the IO-monad.
However it shouldn't be as we are using it in exactly the same way as we
used concatMap in the first example.

Given an action that produces another action, run the action and then run the action that it produces.
If you imagine some kind of Parser x monad that parses an x, then Parser (Parser x) is a parser that does some parsing, and then returns another parser. So join would flatten this into a Parser x that just runs both actions and returns the final x.
Why would you even have a Parser (Parser x) in the first place? Basically, because fmap. If you have a parser, you can fmap a function that changes the result over it. But if you fmap a function that itself returns a parser, you end up with a Parser (Parser x), where you probably want to just run both actions. join implements "just run both actions".
I like the parsing example because a parser typically has a runParser function. And it's clear that a Parser Int is not an integer. It's something that can parse an integer, after you give it some input to parse from. I think a lot of people end up thinking of an IO Int as being just a normal integer but with this annoying IO bit that you can't get rid of. It isn't. It's an unexecuted I/O operation. There's no integer "inside" it; the integer doesn't exist until you actually perform the I/O.

I find these things easier to interpret by writing out the types and refactoring them a bit to reveal what the functions do.
Reader monad
The Reader type is defined thus, and its join function has the type shown:
newtype Reader r a = Reader { runReader :: r -> a }
join :: Reader r (Reader r a) -> Reader r a
Since this is a newtype, this means that the type Reader r a is isomorphic to r -> a. So we can refactor the type definition to give us this type that, albeit it's not the same, it's really "the same" with scare quotes:
In the (->) r monad, which is isomorphic to Reader r, join is the function:
join :: (r -> r -> a) -> r -> a
So the Reader join is the function that takes a two-place function (r -> r -> a) and applies to the same value at both its argument positions.
Writer monad
Since the Writer type has this definition:
newtype Writer w a = Writer { runWriter :: (a, w) }
...then when we remove the newtype, its join function has a type isomorphic to:
join :: Monoid w => ((a, w), w) -> (a, w)
The Monoid constraint needs to be there because the Monad instance for Writer requires it, and it lets us guess right away what the function does:
join ((a, w0), w1) = (a, w0 <> w1)
State monad
Similarly, since State has this definition:
newtype State s a = State { runState :: s -> (a, s) }
...then its join is like this:
join :: (s -> (s -> (a, s), s)) -> s -> (a, s)
...and you can also venture just writing it directly:
join f s0 = (a, s2)
where
(g, s1) = f s0
(a, s2) = g s1
{- Here's the "map" to the variable names in the function:
f g s2 s1 s0 s2
join :: (s -> (s -> (a, s ), s )) -> s -> (a, s )
-}
If you stare at this type a bit, you might think that it bears some resemblance to both the Reader and Writer's types for their join operations. And you'd be right! The Reader, Writer and State monads are all instances of a more general pattern called update monads.
List monad
join :: [[a]] -> [a]
As other people have pointed out, this is the type of the concat function.
Parsing monads
Here comes a really neat thing to realize. Very often, "fancy" monads turn out to be combinations or variants of "basic" ones like Reader, Writer, State or lists. So often what I do when confronted with a novel monad is ask: which of the basic monads does it resemble, and how?
Take for example parsing monads, which have been brought up in other answers here. A simplistic parser monad (with no support for important things like error reporting) looks like this:
newtype Parser a = Parser { runParser :: String -> [(a, String)] }
A Parser is a function that takes a string as input, and returns a list of candidate parses, where each candidate parse is a pair of:
A parse result of type a;
The leftovers (the suffix of the input string that was not consumed in that parse).
But notice that this type looks very much like the state monad:
newtype Parser a = Parser { runParser :: String -> [(a, String)] }
newtype State s a = State { runState :: s -> (a, s) }
And this is no accident! Parser monads are nondeterministic state monads, where the state is the unconsumed portion of the input string, and parse steps generate alternatives that may be later rejected in light of further input. List monads are often called "nondeterminism" monads, so it's no surprise that a parser resembles a mix of the state and list monads.
And this intuition can be systematized by using monad transfomers. The state monad transformer is defined like this:
newtype StateT s m a = StateT { runStateT :: s -> m (a, s) }
Which means that the Parser type from above can be written like this as well:
type Parser a = StateT String [] a
...and its Monad instance follows mechanically from those of StateT and [].
The IO monad
Imagine we could enumerate all of the possible primitive IO actions, somewhat like this:
{-# LANGUAGE GADTs #-}
data Command a where
-- An action that writes a char to stdout
putChar :: Char -> Command ()
-- An action that reads a char from stdin
getChar :: Command Char
-- ...
Then we could think of the IO type as this (which I've adapted from the highly-recommended Operational monad tutorial):
data IO a where
-- An `IO` action that just returns a constant value.
Return :: a -> IO a
-- An action that binds the result of a `Command` to
-- a function that computes the next step after it.
Bind :: Command x -> (x -> IO a) -> IO a
instance Monad IO where ...
Then join action would then look like this:
join :: IO (IO a) -> IO a
-- If the action is just `Return`, then its payload already
-- is what we need to return.
join (Return ioa) = ioa
-- If the action is a `Bind`, then its "next step" function
-- `f` produces `IO (IO a)`, so we can just recursively stick
-- a `join` to its result end.
join (Bind cmd f) = Bind cmd (join . f)
So all that the join does here is "chase down" the IO action until it sees a result that fits the pattern Return (ma :: IO a), and strip out the outer Return.
So what did I do here? Just like for parser monads, I just defined (or rather copied) a toy model of the IO type that has the virtue of being transparent. Then I work out the behavior of join from the toy model.

Why does bind (>>=) exist? What are typical cases where a solution without bind is ugly?

This is a type declaration of a bind method:
(>>=) :: (Monad m) => m a -> (a -> m b) -> m b
I read this as follows: apply a function that returns a wrapped value, to a wrapped value.
This method was included to Prelude as part of Monad typeclass. That means there are a lot of cases where it's needed.
OK, but I don't understand why it's a typical solution of a typical case at all.
If you already created a function which returns a wrapped value, why that function doesn't already take a wrapped value?
In other words, what are typical cases where there are many functions which take a normal value, but return a wrapped value? (instead of taking a wrapped value and return a wrapped value)

The 'unwrapping' of values is exactly what you want to keep hidden when dealing with monads, since it is this that causes a lot of boilerplate.
For example, if you have a sequence of operations which return Maybe values that you want to combine, you have to manually propagate Nothing if you receive one:
nested :: a -> Maybe b
nested x = case f x of
Nothing -> Nothing
Just r ->
case g r of
Nothing -> Nothing
Just r' ->
case h r' of
Nothing -> Nothing
r'' -> i r''
This is what bind does for you:
Nothing >>= _ = Nothing
Just a >>= f = f a
so you can just write:
nested x = f x >>= g >>= h >>= i
Some monads don't allow you to manually unpack the values at all - the most common example is IO. The only way to get the value from an IO is to map or >>= and both of these require you to propagate IO in the output.

Everyone focuses on IO monad and inability to "unwrap".
But a Monad is not always a container, so you can't unwrap.
Reader r a == r->a such that (Reader r) is a Monad
to my mind is the simplest best example of a Monad that is not a container.
You can easily write a function that can produce m b given a: a->(r->b). But you can't easily "unwrap" the value from m a, because a is not wrapped in it. Monad is a type-level concept.
Also, notice that if you have m a->m b, you don't have a Monad. What Monad gives you, is a way to build a function m a->m b from a->m b (compare: Functor gives you a way to build a function m a->m b from a->b; ApplicativeFunctor gives you a way to build a function m a->m b from m (a->b))

If you already created a function which returns a wrapped value, why that function doesn't already take a wrapped value?
Because that function would have to unwrap its argument in order to do something with it.
But for many choices of m, you can only unwrap a value if you will eventually rewrap your own result. This idea of "unwrap, do something, then rewrap" is embodied in the (>>=) function which unwraps for you, let's you do something, and forces you to rewrap by the type a -> m b.
To understand why you cannot unwrap without eventually rewrapping, we can look at some examples:
If m a = Maybe a, unwrapping for Just x would be easy: just return x. But how can we unwrap Nothing? We cannot. But if we know that we will eventually rewrap, we can skip the "do something" step and return Nothing for the overall operation.
If m a = [a], unwrapping for [x] would be easy: just return x. But for unwrapping [], we need the same trick as for Maybe a. And what about unwrapping [x, y, z]? If we know that we will eventually rewrap, we can execute the "do something" three times, for x, y and z and concat the results into a single list.
If m a = IO a, no unwrapping is easy because we only know the result sometimes in the future, when we actually run the IO action. But if we know that we will eventually rewrap, we can store the "do something" inside the IO action and perform it later, when we execute the IO action.
I hope these examples make it clear that for many interesting choices of m, we can only implement unwrapping if we know that we are going to rewrap. The type of (>>=) allows precisely this assumption, so it is cleverly chosen to make things work.

While (>>=) can sometimes be useful when used directly, its main purpose is to implement the <- bind syntax in do notation. It has the type m a -> (a -> m b) -> m b mainly because, when used in a do notation block, the right hand side of the <- is of type m a, the left hand side "binds" an a to the given identifier and, when combined with remainder of the do block, is of type a -> m b, the resulting monadic action is of type m b, and this is the only type it possibly could have to make this work.
For example:
echo = do
input <- getLine
putStrLn input
The right hand side of the <- is of type IO String
The left hands side of the <- with the remainder of the do block are of type String -> IO (). Compare with the desugared version using >>=:
echo = getLine >>= (\input -> putStrLn input)
The left hand side of the >>= is of type IO String. The right hand side is of type String -> IO (). Now, by applying an eta reduction to the lambda we can instead get:
echo = getLine >>= putStrLn
which shows why >>= is sometimes used directly rather than as the "engine" that powers do notation along with >>.
I'd also like to provide what I think is an important correction to the concept of "unwrapping" a monadic value, which is that it doesn't happen. The Monad class does not provide a generic function of type Monad m => m a -> a. Some particular instances do but this is not a feature of monads in general. Monads, generally speaking, cannot be "unwrapped".
Remember that m >>= k = join (fmap k m) is a law that must be true for any monad. Any particular implementation of >>= must satisfy this law and so must be equivalent to this general implementation.
What this means is that what really happens is that the monadic "computation" a -> m b is "lifted" to become an m a -> m (m b) using fmap and then applied the m a, giving an m (m b); and then join :: m (m a) -> m a is used to squish the two ms together to yield a m b. So the a never gets "out" of the monad. The monad is never "unwrapped". This is an incorrect way to think about monads and I would strongly recommend that you not get in the habit.

I will focus on your point
If you already created a function which returns a wrapped value, why
that function doesn't already take a wrapped value?
and the IO monad. Suppose you had
getLine :: IO String
putStrLn :: IO String -> IO () -- "already takes a wrapped value"
how one could write a program which reads a line and print it twice? An attempt would be
let line = getLine
in putStrLn line >> putStrLn line
but equational reasoning dictates that this is equivalent to
putStrLn getLine >> putStrLn getLine
which reads two lines instead.
What we lack is a way to "unwrap" the getLine once, and use it twice. The same issue would apply to reading a line, printing "hello", and then printing a line:
let line = getLine in putStrLn "hello" >> putStrLn line
-- equivalent to
putStrLn "hello" >> putStrLn getLine
So, we also lack a way to specify "when to unwrap" the getLine. The bind >>= operator provides a way to do this.
A more advanced theoretical note
If you swap the arguments around the (>>=) bind operator becomes (=<<)
(=<<) :: (a -> m b) -> (m a -> m b)
which turns any function f taking an unwrapped value into a function g taking a wrapped
value. Such g is known as the Kleisli extension of f. The bind operator guarantees
such an extension always exists, and provides a convenient way to use it.

Because we like to be able to apply functions like a -> b to our m as. Lifting such a function to m a -> m b is trivial (liftM, liftA, >>= return ., fmap) but the opposite is not necessarily possible.

You want some typical examples? How about putStrLn :: String -> IO ()? It would make no sense for this function to have the type IO String -> IO () because the origin of the string doesn't matter.
Anyway: You might have the wrong idea because of your "wrapped value" metaphor; I use it myself quite often, but it has its limitations. There isn't necessarily a pure way to get an a out of an m a - for example, if you have a getLine :: IO String, there's not a great deal of interesting things you can do with it - you can put it in a list, chain it in a row and other neat things, but you can't get any useful information out of it because you can't look inside an IO action. What you can do is use >>= which gives you a way to use the result of the action.
Similar things apply to monads where the "wrapping" metaphor applies too; For example the point Maybe monad is to avoid manually wrapping and unwrapping values with and from Just all the time.

My two most common examples:
1) I have a series of functions that generate a list of lists, but I finally need a flat list:
f :: a -> [a]
fAppliedThrice :: [a] -> [a]
fAppliedThrice aList = concat (map f (concat (map f (concat (map f a)))))
fAppliedThrice' :: [a] -> [a]
fAppliedThrice' aList = aList >>= f >>= f >>= f
A practical example of using this was when my functions fetched attributes of a foreign key relationship. I could just chain them together to finally obtain a flat list of attributes. Eg: Product hasMany Review hasMany Tag type relationship, and I finally want a list of all the tag names for a product. (I added some template-haskell and got a very good generic attribute fetcher for my purposes).
2) Say you have a series of filter-like functions to apply to some data. And they return Maybe values.
case (val >>= filter >>= filter2 >>= filter3) of
Nothing -> putStrLn "Bad data"
Just x -> putStrLn "Good data"

Monad with no wrapped value?

Most of the monad explanations use examples where the monad wraps a value. E.g. Maybe a, where the a type variable is what's wrapped. But I'm wondering about monads that never wrap anything.
For a contrived example, suppose I have a real-world robot that can be controlled, but has no sensors. Maybe I'd like to control it like this:
robotMovementScript :: RobotMonad ()
robotMovementScript = do
moveLeft 10
moveForward 25
rotate 180
main :: IO ()
main =
liftIO $ runRobot robotMovementScript connectToRobot
In our imaginary API, connectToRobot returns some kind of handle to the physical device. This connection becomes the "context" of the RobotMonad. Because our connection to the robot can never send a value back to us, the monad's concrete type is always RobotMonad ().
Some questions:
Does my contrived example seem right?
Am I understanding the idea of a monad's "context" correctly? Am I correct to describe the robot's connection as the context?
Does it make sense to have a monad--such as RobotMonad--that never wraps a value? Or is this contrary to the basic concept of monads?
Are monoids a better fit for this kind of application? I can imagine concatenating robot control actions with <>. Though do notation seems more readable.
In the monad's definition, would/could there be something that ensures the type is always RobotMonad ()?
I've looked at Data.Binary.Put as an example. It appears to be similar (or maybe identical?) to what I'm thinking of. But it also involves the Writer monad and the Builder monoid. Considering those added wrinkles and my current skill level, I think the Put monad might not be the most instructive example.
Edit
I don't actually need to build a robot or an API like this. The example is completely contrived. I just needed an example where there would never be a reason to pull a value out of the monad. So I'm not asking for the easiest way to solve the robot problem. Rather, this thought experiment about monads without inner values is an attempt to better understand monads generally.

TL;DR Monad without its wrapped value isn't very special and you get all the same power modeling it as a list.
There's a thing known as the Free monad. It's useful because it in some sense is a good representer for all other monads---if you can understand the behavior of the Free monad in some circumstance you have a good insight into how Monads generally will behave there.
It looks like this
data Free f a = Pure a
| Free (f (Free f a))
and whenever f is a Functor, Free f is a Monad
instance Functor f => Monad (Free f) where
return = Pure
Pure a >>= f = f a
Free w >>= f = Free (fmap (>>= f) w)
So what happens when a is always ()? We don't need the a parameter anymore
data Freed f = Stop
| Freed (f (Freed f))
Clearly this cannot be a Monad anymore as it has the wrong kind (type of types).
Monad f ===> f :: * -> *
Freed f :: *
But we can still define something like Monadic functionality onto it by getting rid of the a parts
returned :: Freed f
returned = Stop
bound :: Functor f -- compare with the Monad definition
=> Freed f -> Freed f -- with all `a`s replaced by ()
-> Freed f
bound Stop k = k Pure () >>= f = f ()
bound (Freed w) k = Free w >>= f =
Freed (fmap (`bound` k) w) Free (fmap (>>= f) w)
-- Also compare with (++)
(++) [] ys = ys
(++) (x:xs) ys = x : ((++) xs ys)
Which looks to be (and is!) a Monoid.
instance Functor f => Monoid (Freed f) where
mempty = returned
mappend = bound
And Monoids can be initially modeled by lists. We use the universal property of the list Monoid where if we have a function Monoid m => (a -> m) then we can turn a list [a] into an m.
convert :: Monoid m => (a -> m) -> [a] -> m
convert f = foldr mappend mempty . map f
convertFreed :: Functor f => [f ()] -> Freed f
convertFreed = convert go where
go :: Functor f => f () -> Freed f
go w = Freed (const Stop <$> w)
So in the case of your robot, we can get away with just using a list of actions
data Direction = Left | Right | Forward | Back
data ActionF a = Move Direction Double a
| Rotate Double a
deriving ( Functor )
-- and if we're using `ActionF ()` then we might as well do
data Action = Move Direction Double
| Rotate Double
robotMovementScript = [ Move Left 10
, Move Forward 25
, Rotate 180
]
Now when we cast it to IO we're clearly converting this list of directions into a Monad and we can see that as taking our initial Monoid and sending it to Freed and then treating Freed f as Free f () and interpreting that as an initial Monad over the IO actions we want.
But it's clear that if you're not making use of the "wrapped" values then you're not really making use of Monad structure. You might as well just have a list.

I'll try to give a partial answer for these parts:
Does it make sense to have a monad--such as RobotMonad--that never wraps a value? Or is this contrary to the basic concept of monads?
Are monoids a better fit for this kind of application? I can imagine concatenating robot control actions with <>. Though do notation seems more readable.
In the monad's definition, would/could there be something that ensures the type is always RobotMonad ()?
The core operation for monads is the monadic bind operation
(>>=) :: (Monad m) => m a -> (a -> m b) -> m b
This means that an action depends (or can depend) on the value of a previous action. So if you have a concept that inherently doesn't sometimes carry something that could be considered as a value (even in a complex form such as the continuation monad), monad isn't a good abstraction.
If we abandon >>= we're basically left with Applicative. It also allows us to compose actions, but their combinations can't depend on the values of preceding ones.
There is also an Applicative instance that carries no values, as you suggested: Data.Functor.Constant. Its actions of type a are required to be a monoid so that they can be composed together. This seems like the closest concept to your idea. And of course instead of Constant we could use a Monoid directly.
That said, perhaps simpler solution is to have a monad RobotMonad a that does carry a value (which would be essentially isomorphic to the Writer monad, as already mentioned). And declare runRobot to require RobotMonad (), so it'd be possible to execute only scripts with no value:
runRobot :: RobotMonad () -> RobotHandle -> IO ()
This would allow you to use the do notation and work with values inside the robot script. Even if the robot has no sensors, being able to pass values around can be often useful. And extending the concept would allow you to create a monad transformer such as RobotMonadT m a (resembling WriterT) with something like
runRobotT :: (Monad m) => RobotMonadT m () -> RobotHandle -> IO (m ())
or perhaps
runRobotT :: (MonadIO m) => RobotMonadT m () -> RobotHandle -> m ()
which would be a powerful abstraction that'd allow you to combine robotic actions with an arbitrary monad.

Well there is
data Useless a = Useless
instance Monad Useless where
return = const Useless
Useless >>= f = Useless
but as I indicated, that isn't usefull.
What you want is the Writer monad, which wraps up a monoid as a monad so you can use do notation.

Well it seems like you have a type that supports just
(>>) :: m a -> m b -> m b
But you further specify that you only want to be able to use m ()s. In this case I'd vote for
foo = mconcat
[ moveLeft 10
, moveForward 25
, rotate 180]
As the simple solution. The alternative is to do something like
type Robot = Writer [RobotAction]
inj :: RobotAction -> Robot ()
inj = tell . (:[])
runRobot :: Robot a -> [RobotAction]
runRobot = snd . runWriter
foo = runRobot $ do
inj $ moveLeft 10
inj $ moveForward 25
inj $ rotate 180
Using the Writer monad.
The problem with not wrapping the value is that
return a >>= f === f a
So suppose we had some monad that ignored the value, but contained other interesting information,
newtype Robot a = Robot {unRobot :: [RobotAction]}
addAction :: RobotAction -> Robot a -> Robot b
f a = Robot [a]
Now if we ignore the value,
instance Monad Robot where
return = const (Robot [])
a >>= f = a -- never run the function
Then
return a >>= f /= f a
so we don't have a monad. So if you want to the monad to have any interesting states, have == return false, then you need to store that value.

What advantage does Monad give us over an Applicative?

I've read this article, but didn't understand last section.
The author says that Monad gives us context sensitivity, but it's possible to achieve the same result using only an Applicative instance:
let maybeAge = (\futureYear birthYear -> if futureYear < birthYear
then yearDiff birthYear futureYear
else yearDiff futureYear birthYear) <$> (readMay futureYearString) <*> (readMay birthYearString)
It's uglier for sure without do-syntax, but beside that I don't see why we need Monad. Can anyone clear this up for me?

Here's a couple of functions that use the Monad interface.
ifM :: Monad m => m Bool -> m a -> m a -> m a
ifM c x y = c >>= \z -> if z then x else y
whileM :: Monad m => (a -> m Bool) -> (a -> m a) -> a -> m a
whileM p step x = ifM (p x) (step x >>= whileM p step) (return x)
You can't implement them with the Applicative interface. But for the sake of enlightenment, let's try and see where things go wrong. How about..
import Control.Applicative
ifA :: Applicative f => f Bool -> f a -> f a -> f a
ifA c x y = (\c' x' y' -> if c' then x' else y') <$> c <*> x <*> y
Looks good! It has the right type, it must be the same thing! Let's just check to make sure..
*Main> ifM (Just True) (Just 1) (Just 2)
Just 1
*Main> ifM (Just True) (Just 1) (Nothing)
Just 1
*Main> ifA (Just True) (Just 1) (Just 2)
Just 1
*Main> ifA (Just True) (Just 1) (Nothing)
Nothing
And there's your first hint at the difference. You can't write a function using just the Applicative interface that replicates ifM.
If you divide this up into thinking about values of the form f a as being about "effects" and "results" (both of which are very fuzzy approximate terms that are the best terms available, but not very good), you can improve your understanding here. In the case of values of type Maybe a, the "effect" is success or failure, as a computation. The "result" is a value of type a that might be present when the computation completes. (The meanings of these terms depends heavily on the concrete type, so don't think this is a valid description of anything other than Maybe as a type.)
Given that setting, we can look at the difference in a bit more depth. The Applicative interface allows the "result" control flow to be dynamic, but it requires the "effect" control flow to be static. If your expression involves 3 computations that can fail, the failure of any one of them causes the failure of the whole computation. The Monad interface is more flexible. It allows the "effect" control flow to depend on the "result" values. ifM chooses which argument's "effects" to include in its own "effects" based on its first argument. This is the huge fundamental difference between ifA and ifM.
There's something even more serious going on with whileM. Let's try to make whileA and see what happens.
whileA :: Applicative f => (a -> f Bool) -> (a -> f a) -> a -> f a
whileA p step x = ifA (p x) (whileA p step <*> step x) (pure x)
Well.. What happens is a compile error. (<*>) doesn't have the right type there. whileA p step has the type a -> f a and step x has the type f a. (<*>) isn't the right shape to fit them together. For it to work, the function type would need to be f (a -> a).
You can try lots more things - but you'll eventually find that whileA has no implementation that works anything even close to the way whileM does. I mean, you can implement the type, but there's just no way to make it both loop and terminate.
Making it work requires either join or (>>=). (Well, or one of the many equivalents of one of those) And those the extra things you get out of the Monad interface.

With monads, subsequent effects can depend on previous values. For example, you can have:
main = do
b <- readLn :: IO Bool
if b
then fireMissiles
else return ()
You can't do that with Applicatives - the result value of one effectfull computation can't determine what effect will follow.
Somewhat related:
Why can applicative functors have side effects, but functors can't?
Good examples of Not a Functor/Functor/Applicative/Monad?

As Stephen Tetley said in a comment, that example doesn't actually use context-sensitivity. One way to think about context-sensitivity is that it lets use choose which actions to take depending on monadic values. Applicative computations must always have the same "shape", in a certain sense, regardless of the values involved; monadic computations need not. I personally think this is easier to understand with a concrete example, so let's look at one. Here's two versions of a simple program which ask you to enter a password, check that you entered the right one, and print out a response depending on whether or not you did.
import Control.Applicative
checkPasswordM :: IO ()
checkPasswordM = do putStrLn "What's the password?"
pass <- getLine
if pass == "swordfish"
then putStrLn "Correct. The secret answer is 42."
else putStrLn "INTRUDER ALERT! INTRUDER ALERT!"
checkPasswordA :: IO ()
checkPasswordA = if' . (== "swordfish")
<$> (putStrLn "What's the password?" *> getLine)
<*> putStrLn "Correct. The secret answer is 42."
<*> putStrLn "INTRUDER ALERT! INTRUDER ALERT!"
if' :: Bool -> a -> a -> a
if' True t _ = t
if' False _ f = f
Let's load this into GHCi and check what happens with the monadic version:
*Main> checkPasswordM
What's the password?
swordfish
Correct. The secret answer is 42.
*Main> checkPasswordM
What's the password?
zvbxrpl
INTRUDER ALERT! INTRUDER ALERT!
So far, so good. But if we use the applicative version:
*Main> checkPasswordA
What's the password?
hunter2
Correct. The secret answer is 42.
INTRUDER ALERT! INTRUDER ALERT!
We entered the wrong password, but we still got the secret! And an intruder alert! This is because <$> and <*>, or equivalently liftAn/liftMn, always execute the effects of all their arguments. The applicative version translates, in do notation, to
do pass <- putStrLn "What's the password?" *> getLine)
unit1 <- putStrLn "Correct. The secret answer is 42."
unit2 <- putStrLn "INTRUDER ALERT! INTRUDER ALERT!"
pure $ if' (pass == "swordfish") unit1 unit2
And it should be clear why this has the wrong behavior. In fact, every use of applicative functors is equivalent to monadic code of the form
do val1 <- app1
val2 <- app2
...
valN <- appN
pure $ f val1 val2 ... valN
(where some of the appI are allowed to be of the form pure xI). And equivalently, any monadic code in that form can be rewritten as
f <$> app1 <*> app2 <*> ... <*> appN
or equivalently as
liftAN f app1 app2 ... appN
To think about this, consider Applicative's methods:
pure :: a -> f a
(<$>) :: (a -> b) -> f a -> f b
(<*>) :: f (a -> b) -> f a -> f b
And then consider what Monad adds:
(=<<) :: (a -> m b) -> m a -> m b
join :: m (m a) -> m a
(Remember that you only need one of those.)
Handwaving a lot, if you think about it, the only way we can put together the applicative functions is to construct chains of the form f <$> app1 <*> ... <*> appN, and possibly nest those chains (e.g., f <$> (g <$> x <*> y) <*> z). However, (=<<) (or (>>=)) allows us to take a value and produce different monadic computations depending on that value, that could be constructed on the fly. This is what we use to decide whether to compute "print out the secret", or compute "print out an intruder alert", and why we can't make that decision with applicative functors alone; none of the types for applicative functions allow you to consume a plain value.
You can think about join in concert with fmap in a similar way: as I mentioned in a comment, you can do something like
checkPasswordFn :: String -> IO ()
checkPasswordFn pass = if pass == "swordfish"
then putStrLn "Correct. The secret answer is 42."
else putStrLn "INTRUDER ALERT! INTRUDER ALERT!"
checkPasswordA' :: IO (IO ())
checkPasswordA' = checkPasswordFn <$> (putStrLn "What's the password?" *> getLine)
This is what happens when we want to pick a different computation depending on the value, but only have applicative functionality available us. We can pick two different computations to return, but they're wrapped inside the outer layer of the applicative functor. To actually use the computation we've picked, we need join:
checkPasswordM' :: IO ()
checkPasswordM' = join checkPasswordA'
And this does the same thing as the previous monadic version (as long as we import Control.Monad first, to get join):
*Main> checkPasswordM'
What's the password?
12345
INTRUDER ALERT! INTRUDER ALERT!

On the other hand, here's a a practical example of the Applicative/Monad divide where Applicatives have an advantage: error handling! We clearly have a Monad implementation of Either that carries along errors, but it always terminates early.
Left e1 >> Left e2 === Left e1
You can think of this as an effect of intermingling values and contexts. Since (>>=) will try to pass the result of the Either e a value to a function like a -> Either e b, it must fail immediately if the input Either is Left.
Applicatives only pass their values to the final pure computation after running all of the effects. This means they can delay accessing the values for longer and we can write this.
data AllErrors e a = Error e | Pure a deriving (Functor)
instance Monoid e => Applicative (AllErrors e) where
pure = Pure
(Pure f) <*> (Pure x) = Pure (f x)
(Error e) <*> (Pure _) = Error e
(Pure _) <*> (Error e) = Error e
-- This is the non-Monadic case
(Error e1) <*> (Error e2) = Error (e1 <> e2)
It's impossible to write a Monad instance for AllErrors such that ap matches (<*>) because (<*>) takes advantage of running both the first and second contexts before using any values in order to get both errors and (<>) them together. Monadic (>>=) and (join) can only access contexts interwoven with their values. That's why Either's Applicative instance is left-biased, so that it can also have a harmonious Monad instance.
> Left "a" <*> Left "b"
Left 'a'
> Error "a" <*> Error "b"
Error "ab"

With Applicative, the sequence of effectful actions to be performed is fixed at compile-time. With Monad, it can be varied at run-time based on the results of effects.
For example, with an Applicative parser, the sequence of parsing actions is fixed for all time. That means that you can potentially perform "optimisations" on it. On the other hand, I can write a Monadic parser which parses some a BNF grammar description, dynamically constructs a parser for that grammar, and then runs that parser over the rest of the input. Every time you run this parser, it potentially constructs a brand new parser to parse the second portion of the input. Applicative has no hope of doing such a thing - and there is no chance of performing compile-time optimisations on a parser that doesn't exist yet...
As you can see, sometimes the "limitation" of Applicative is actually beneficial - and sometimes the extra power offered by Monad is required to get the job done. This is why we have both.

If you try to convert the type signature of Monad's bind and Applicative <*> to natural language, you will find that:
bind : I will give you the contained value and you will return me a new packaged value
<*>: You give me a packaged function that accepts a contained value and return a value and I will use it to create new packaged value based on my rules.
Now as you can see from the above description, bind gives you more control as compared to <*>

If you work with Applicatives, the "shape" of the result is already determined by the "shape" of the input, e.g. if you call [f,g,h] <*> [a,b,c,d,e], your result will be a list of 15 elements, regardless which values the variables have. You don't have this guarantee/limitation with monads. Consider [x,y,z] >>= join replicate: For [0,0,0] you'll get the result [], for [1,2,3] the result [1,2,2,3,3,3].

Now that ApplicativeDo extension become pretty common thing, the difference between Monad and Applicative can be illustrated using simple code snippet.
With Monad you can do
do
r1 <- act1
if r1
then act2
else act3
but having only Applicative do-block, you can't use if on things you've pulled out with <-.

Why discarded values are () instead of ⊥ in Haskell?

Howcome in Haskell, when there is a value that would be discarded, () is used instead of ⊥?
Examples (can't really think of anything other than IO actions at the moment):
mapM_ :: (Monad m) => (a -> m b) -> [a] -> m ()
foldM_ :: (Monad m) => (a -> b -> m a) -> a -> [b] -> m ()
writeFile :: FilePath -> String -> IO ()
Under strict evaluation, this makes perfect sense, but in Haskell, it only makes the domain bigger.
Perhaps there are "unused parameter" functions d -> a which are strict on d (where d is an unconstrained type parameter and does not appear free in a)? Ex: seq, const' x y = yseqx.

I think this is because you need to specify the type of the value to be discarded. In Haskell-98, () is the obvious choice. And as long as you know the type is (), you may as well make the value () as well (presuming evaluation proceeds that far), just in case somebody tries to pattern-match on it or something. I think most programmers don't like introducing extra ⊥'s into code because it's just an extra trap to fall into. I certainly avoid it.
Instead of (), it is possible to create an uninhabited type (except by ⊥ of course).
{-# LANGUAGE EmptyDataDecls #-}
data Void
mapM_ :: (Monad m) => (a -> m b) -> [a] -> m Void
Now it's not even possible to pattern-match, because there's no Void constructor. I suspect the reason this isn't done more often is because it's not Haskell-98 compatible, as it requires the EmptyDataDecls extension.
Edit: you can't pattern-match on Void, but seq will ruin your day. Thanks to #sacundim for pointing this out.

Well, bottom type literally means an unterminating computation, and unit type is just what it is - a type inhabited with single value. Clearly, monadic computations usually meant to be finished, so it simply doesn't make sense to make them return undefined. And, of course, it is simply a safety measure - just like John L said, what if someone pattern matches on monadic result? So monadic computations return the 'lowest' possible (in Haskell 98) type - unit.

So, maybe we could have the following signatures:
mapM_ :: (Monad m) => (a -> m b) -> [a] -> m z
foldM_ :: (Monad m) => (a -> b -> m a) -> a -> [b] -> m z
writeFile :: FilePath -> String -> IO z
We'd reimplement the functions in question so that any attempt to bind the z in m z or IO z would bind the variable to undefined or any other bottom.
What do we gain? Now people can write programs that force the undefined result of these computations. How is that a good thing? All it means is that people can now write programs that fail to terminate for no good reason, that were impossible to write before.

You're getting confused between types and values.
In writeFile :: FilePath -> String -> IO (), the () is the unit type. The value you get for x by doing x <- writeFile foo bar in a do block is (normally) the value (), which is the sole non-bottom inhabitant of the type ().
⊥ OTOH is a value. Since ⊥ is a member of every type, it's also usable as a value for the type (). If you're discarding that x above without using it (we normally don't even extract it into a variable), it may very well be ⊥ and you'd never know. In that sense you already have what you want; if you're ever writing a function whose result you expect to be always ignored, you could use ⊥. But since ⊥ is a value of every type, there is no type ⊥, and so there is no type IO ⊥.
But really, they represent different conceptual things. The type () is the type of values that contain zero information (which is why there is only one value; if there were two or more values then () values would contain at least as much information as values of Bool). IO () is the type of IO actions that generate a value with no information, but may effects that will happen as a result of generating that non-informative value.
⊥ is in some sense a non-value. 1 `div` 0 gives ⊥ because there is no value that could be used as the result of that expression which satisfies the laws of integer division. Throwing an exception gives ⊥ because functions that contain exception throws do not give you a value of their type. Non-termination gives ⊥ because the expression never terminates with a value. ⊥ is a way of treating all of these non-values as if they were a value for some purposes. As far as I can tell it's mainly useful because Haskell's laziness means that ⊥ and a data structure containing ⊥ (i.e. [⊥]) are distinguishable.
The value () is not like the cases where we use ⊥. writeFile foo bar doesn't have an "impossible value" like return $ 1 `div` 0, it just has no information in its value (other than that contained in the monadic structure). There are perfectly sensible things I could do with the () I get from doing x <- writeFile foo bar; they're just not very interesting and so nobody ever does them. This is distinctly different from x <- return $ 1 `div` 0, where doing anything with that value has to give me another ill-defined value.

I would like to point out one severe downside to writing one particular form of returning ⊥: if you write types like this, you get bad programs:
mapM_ :: (Monad m) => (a -> m b) -> [a] -> m z
This is way too polymorphic. As an example, consider forever :: Monad m => m a -> m b. I encountered this gotcha a long time ago and I'm still bitter:
main :: IO ()
main = forever putStrLn "This is going to get printed a lot!"
The error is obvious and simple: missing parentheses.
It typechecks. This is exactly the sort of error that the type system is supposed to catch easily.
It silently infinite loops at runtime (without printing anything). It is a pain to debug.
Why? Well, because r -> is a monad. So m b matches virtually anything. For example:
forever :: m a -> m b
forever putStrLn :: String -> b
forever putStrLn "hello!" :: b -- eep!
forever putStrLn "hello" readFile id flip (Nothing,[17,0]) :: t -- no type error.
This sort of thing inclines me to the view that forever should be typed m a -> m Void.

() is ⊤, i.e. the unit type, not the ⊥ (the bottom type). The big difference is that the unit type is inhabited, so that it has a value (() in Haskell), on the other hand, the bottom type is uninhabited, so that you can't write functions like that:
absurd : ⊥
absurd = -- no way
Of course you can do this in Haskell since the "bottom type" (there is no such thing, of course) is inhabited here with undefined. This makes Haskell inconsistent.
Functions like this:
disprove : a → ⊥
disprove x = -- ...
can be written, it is the same as
disprove : ¬ a
disprove x = -- ...
i.e. it disproving the type a, so that a is an absurd.
In any case, you can see how the unit type is used in different languages, as () :: () in Haskell, () : unit in ML, () : Unit in Scala and tt : ⊤ in Agda. In languages like Haskell and Agda (with the IO monad) functions like putStrLn should have a type String → IO ⊤, not the String → IO ⊥ since this is an absurd (logically it states that there is no strings that can be printed, this is just not right).
DISCLAIMER: previous text use Agda notation and it is more about Agda than Haskell.
In Haskell if we have
data Void
It doesn't mean that Void is uninhabited. It is inhabited with undefined, non-terminating programs, errors and exceptions. For example:
data Void
instance Show Void where
show _ = "Void"
data Identity a = Identity { runIdentity :: a }
mapM__ :: (a -> Identity b) -> [a] -> Identity Void
mapM__ _ _ = Identity undefined
then
print $ runIdentity $ mapM__ (const $ Identity 0) [1, 2, 3]
-- ^ will print "Void".
case runIdentity $ mapM__ (const $ Identity 0) [1, 2, 3] of _ -> print "1"
-- ^ will print "1".
let x = runIdentity $ mapM__ (const $ Identity 0) [1, 2, 3]
x `seq` print x
-- ^ will thrown an exception.
But it also doesn't mean that Void is ⊥. So
mapM_ :: Monad m => (a -> m b) -> [a] -> m Void
where Void is decalred as empty data type, is ok. But
mapM_ :: Monad m => (a -> m b) -> [a] -> m ⊥
is nonsence, but there is no such type as ⊥ in Haskell.

Develop Reference

node.js excel linux python-3.x azure haskell apache-spark rust .htaccess string