Is `Monad` constraint necessary in `<$!>` - haskell

As claimed in the documentation, <$!> is the strict version of <$>, but surprisingly it is
(<$!>) :: Monad m => (a -> b) -> m a -> m b
f <$!> m = do
  x <- m
  let z = f x
  z `seq` return z
instead of the more natural (in my opinion, because it keeps the weaker constraint and mimics $!)
(<$!>) :: Functor f => (a -> b) -> f a -> f b
f <$!> x = x `seq` (f <$> x)
I guess that applying seq after the binding is different from the "natural" approach, but I don't know how different it is. My question is: is there any reason that makes the "natural" approach useless, and is that why the implementation is constrained to Monad?

GHC's commit message includes the following two links, which shed more light on this function:
https://mail.haskell.org/pipermail/libraries/2013-November/021728.html
https://mail.haskell.org/pipermail/libraries/2014-April/022864.html
This is the reason Johan Tibell mentioned for it (quoting from the linked mailing list):
It works on Monads instead of Functors as required by us inspecting
the argument.
This version is highly convenient if you want to work with
functors/applicatives in e.g. parser and avoid spurious thunks at the
same time. I realized that it was needed while fixing large space
usage (but not space-leak) issues in cassava.

I guess that applying seq after the binding is different from the "natural" approach, but I don't know how different it is
Since Haskell is functional, seq must work through data dependencies; it sets up a relationship: "when seq x y is evaluated to WHNF, x will have been evaluated to WHNF as well".
The idea here is to pin the evaluation of the result f x to the outer structure, which we know must be evaluated for each >>= or <*> to proceed.
In your version:
Prelude> f <$!> x = x `seq` (f <$> x)
Prelude> let thunk = error "explode"
Prelude> case (+) <$!> Just thunk <*> Just thunk of Just _ -> "we can easily build up thunks"
"we can easily build up thunks"
I do wonder if there's a better solution possible though
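For contrast, here is a small self-contained sketch of my own (the names strainedFmap, r1 and r2 are mine, not from the thread) showing that base's Monad-based <$!> does tie the evaluation of f x to the outer value, while the Functor-only version merely forces the Just constructor, which is already in WHNF:
import Control.Monad ((<$!>))
import Control.Exception (SomeException, evaluate, try)

-- the "natural" Functor-only candidate from the question
strainedFmap :: Functor f => (a -> b) -> f a -> f b
strainedFmap f x = x `seq` (f <$> x)

main :: IO ()
main = do
  -- base's (<$!>): demanding the outer Just to WHNF also demands f x
  r1 <- try (evaluate (error "forced" <$!> Just ())) :: IO (Either SomeException (Maybe ()))
  putStrLn (either (const "(<$!>): result was forced") (const "(<$!>): result left as a thunk") r1)
  -- the Functor version: only the Just constructor is forced, the thunk survives inside it
  r2 <- try (evaluate (strainedFmap (error "forced") (Just ()))) :: IO (Either SomeException (Maybe ()))
  putStrLn (either (const "strainedFmap: result was forced") (const "strainedFmap: result left as a thunk") r2)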

Style vs Performance Using Vectors

Here's the code:
{-# LANGUAGE FlexibleContexts #-}
import Data.Int
import qualified Data.Vector.Unboxed as U
import qualified Data.Vector.Generic as V
{-# NOINLINE f #-} -- Note the 'NO'
--f :: (Num r, V.Vector v r) => v r -> v r -> v r
--f :: (V.Vector v Int64) => v Int64 -> v Int64 -> v Int64
--f :: (U.Unbox r, Num r) => U.Vector r -> U.Vector r -> U.Vector r
f :: U.Vector Int64 -> U.Vector Int64 -> U.Vector Int64
f = V.zipWith (+) -- or U.zipWith, it doesn't make a difference
main = do
  let iters = 100
      dim = 221184
      y = U.replicate dim 0 :: U.Vector Int64
  let ans = iterate ((f y)) y !! iters
  putStr $ (show $ U.sum ans)
I compiled with ghc 7.6.2 and -O2, and it took 1.7 seconds to run.
I tried several different versions of f:
f x = U.zipWith (+) x
f x = (U.zipWith (+) x) . id
f x y = U.zipWith (+) x y
Version 1 is the same as the original, while versions 2 and 3 run in under 0.09 seconds (and INLINING f doesn't change anything).
I also noticed that if I make f polymorphic (with any of the three signatures above), even with a "fast" definition (i.e. 2 or 3), it slows back down...to exactly 1.7 seconds. This makes me wonder if the original problem is perhaps due to (lack of) type inference, even though I'm explicitly giving the types for the Vector type and element type.
I'm also interested in adding integers modulo q:
newtype Zq q i = Zq {unZq :: i}
As when adding Int64s, if I write a function with every type specified,
h :: U.Vector (Zq Q17 Int64) -> U.Vector (Zq Q17 Int64) -> U.Vector (Zq Q17 Int64)
I get an order of magnitude better performance than if I leave any polymorphism in:
h :: (Modulus q) => U.Vector (Zq q Int64) -> U.Vector (Zq q Int64) -> U.Vector (Zq q Int64)
But I should at least be able to remove the specific phantom type! It should be compiled out, since I'm dealing with a newtype.
Here are my questions:
Where is the slowdown coming from?
What is going on in versions 2 and 3 of f that affect performance in any way? It seems like a bug to me that (what amounts to) coding style can affect performance like this. Are there other examples outside of Vector where partially applying a function or other stylistic choices affect performance?
Why does polymorphism slow me down an order of magnitude independent of where the polymorphism is (i.e. in the vector type, in the Num type, both, or phantom type)? I know polymorphism makes code slower, but this is ridiculous. Is there a hack around it?
EDIT 1
I filed an issue with the Vector library page. I found a GHC issue relating to this problem.
EDIT 2
I rewrote the question after gaining some insight from #kqr's answer.
Below is the original for reference.
--------------ORIGINAL QUESTION--------------------
Here's the code:
{-# LANGUAGE FlexibleContexts #-}
import Control.DeepSeq
import Data.Int
import qualified Data.Vector.Unboxed as U
import qualified Data.Vector.Generic as V
{-# NOINLINE f #-} -- Note the 'NO'
--f :: (Num r, V.Vector v r) => v r -> v r -> v r
--f :: (V.Vector v Int64) => v Int64 -> v Int64 -> v Int64
--f :: (U.Unbox r, Num r) => U.Vector r -> U.Vector r -> U.Vector r
f :: U.Vector Int64 -> U.Vector Int64 -> U.Vector Int64
f = V.zipWith (+)
main = do
  let iters = 100
      dim = 221184
      y = U.replicate dim 0 :: U.Vector Int64
  let ans = iterate ((f y)) y !! iters
  putStr $ (show $ U.sum ans)
I compiled with ghc 7.6.2 and -O2, and it took 1.7 seconds to run.
I tried several different versions of f:
f x = U.zipWith (+) x
f x = (U.zipWith (+) x) . U.force
f x = (U.zipWith (+) x) . Control.DeepSeq.force
f x = (U.zipWith (+) x) . (\z -> z `seq` z)
f x = (U.zipWith (+) x) . id
f x y = U.zipWith (+) x y
Version 1 is the same as the original, version 2 runs in 0.111 seconds, and versions 3-6 run in under 0.09 seconds (and INLINING f doesn't change anything).
So the order-of-magnitude slowdown appears to be due to laziness since force helped, but I'm not sure where the laziness is coming from. Unboxed types aren't allowed to be lazy, right?
I tried writing a strict version of iterate, thinking the vector itself must be lazy:
{-# INLINE iterate' #-}
iterate' :: (NFData a) => (a -> a) -> a -> [a]
iterate' f x = x `seq` x : iterate' f (f x)
but with the point-free version of f, this didn't help at all.
I also noticed something else, which could be just a coincidence and red herring:
If I make f polymorphic (with any of the three signatures above), even with a "fast" definition, it slows back down...to exactly 1.7 seconds. This makes me wonder if the original problem is perhaps due to (lack of) type inference, even though everything should be inferred nicely.
Here are my questions:
Where is the slowdown coming from?
Why does composing with force help, but using a strict iterate doesn't?
Why is U.force worse than DeepSeq.force? I have no idea what U.force is supposed to do, but it sounds a lot like DeepSeq.force, and seems to have a similar effect.
Why does polymorphism slow me down an order of magnitude independent of where the polymorphism is (i.e. in the vector type, in the Num type, or both)?
Why are versions 5 and 6, neither of which should have any strictness implications at all, just as fast as a strict function?
As #kqr pointed out, the problem doesn't seem to be strictness. So something about the way I write the function is causing the generic zipWith to be used rather than the Unboxed-specific version. Is this just a fluke between GHC and the Vector library, or is there something more general that can be said here?
While I don't have the definitive answer you want, there are two things that might help you along.
The first thing is that x `seq` x is, both semantically and computationally, the same thing as just x. The wiki says about seq:
A common misconception regarding seq is that seq x "evaluates" x. Well, sort of. seq doesn't evaluate anything just by virtue of existing in the source file, all it does is introduce an artificial data dependency of one value on another: when the result of seq is evaluated, the first argument must also (sort of; see below) be evaluated.
As an example, suppose x :: Integer, then seq x b behaves essentially like if x == 0 then b else b – unconditionally equal to b, but forcing x along the way. In particular, the expression x `seq` x is completely redundant, and always has exactly the same effect as just writing x.
What the first paragraph says is that writing seq a b doesn't mean that a will magically get evaluated this instant, it means that a will get evaluated as soon as b needs to be evaluated. This might occur later in the program, or maybe never at all. When you view it in that light, it is obvious that seq x x is a redundancy, because all it says is, "evaluate x as soon as x needs to be evaluated." Which of course is what would happen anyway if you had just written x.
This has two implications for you:
Your "strict" iterate' function isn't really any stricter than it would be without the seq. In fact, I have a hard time imagining how the iterate function could become any stricter than it already is. You can't make the tail of the list strict, because it is infinite. The main thing you can do is force the "accumulator", f x, but doing so doesn't give any significant performance increase on my system.[1]
Scratch that. Your strict iterate' does exactly the same thing as my bang pattern version. See the comments.
Writing (\z -> z `seq` z) does not give you a strict identity function, which I assume is what you were going for. In fact, the common identity function is as strict as you'll get – it will evaluate its result as soon as it is needed.
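A small self-contained check of that last point (my own sketch; notReallyStrict is a made-up name): wrapping a value in \z -> z `seq` z does not force it any earlier than id would.
import Control.Exception (SomeException, evaluate, try)

notReallyStrict :: a -> a
notReallyStrict z = z `seq` z   -- exactly the same thing as id

main :: IO ()
main = do
  -- Neither element is forced: length only demands the spine of the list.
  r <- try (evaluate (length [notReallyStrict undefined, id undefined]))
         :: IO (Either SomeException Int)
  print r   -- Right 2: the "strict" identity did not blow up on undefined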
However, I peeked at the core GHC generates for
U.zipWith (+) y
and
U.zipWith (+) y . id
and there is only one big difference that my untrained eye can spot. The first one uses just a plain Data.Vector.Generic.zipWith (here's where your polymorphism coincidence might come into play – if GHC chooses a generic zipWith it will of course perform as if the code was polymorphic!) while the latter has exploded this single function call into almost 90 lines of state monad code and unpacked machine types.
The state monad code looks almost like the loops and destructive updates you would write in an imperative language, so I assume it's tailored pretty well to the machine it's running on. If I wasn't in such a hurry, I would take a longer look to see more exactly how it works and why GHC suddenly decided it needed a tight loop. I have attached the generated core as much for myself as anyone else who wants to take a look.[2]
[1]: Forcing the accumulator along the way: (This is what you already do, I misunderstood the code!)
{-# LANGUAGE BangPatterns #-}
iterate' f !x = x : iterate' f (f x)
[2]: What core U.zipWith (+) y . id gets translated into.
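Not part of the original answers, but one standard way to attack the "polymorphism costs an order of magnitude" part of the question is to keep the polymorphic definition and ask GHC (when compiling with -O) for a monomorphic copy via a SPECIALIZE pragma. A sketch, assuming the Int64 instantiation is the one that matters:
import Data.Int (Int64)
import qualified Data.Vector.Unboxed as U

fPoly :: (U.Unbox r, Num r) => U.Vector r -> U.Vector r -> U.Vector r
fPoly = U.zipWith (+)
{-# SPECIALIZE fPoly :: U.Vector Int64 -> U.Vector Int64 -> U.Vector Int64 #-}
-- With optimisation, calls at Int64 can then use the specialised copy instead of
-- going through the Unbox/Num dictionaries.

main :: IO ()
main = print (U.sum (fPoly (U.replicate 5 1) (U.replicate 5 2) :: U.Vector Int64))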

Haskell - Lambda calculus equivalent syntax?

While writing some lambda functions in Haskell, I was originally writing the functions like:
tru = \t f -> t
fls = \t f -> f
However, I soon noticed from the examples online that such functions are frequently written like:
tru = \t -> \f -> t
fls = \t -> \f -> f
Specifically, each of the items passed to the function have their own \ and -> as opposed to above. When checking the types of these they appear to be the same. My question is, are they equivalent or do they actually differ in some way? And not only for these two functions, but does it make a difference for functions in general? Thank you much!
They're the same; Haskell automatically curries things to keep the syntax nice. The following are all equivalent**
foo a b = (a, b)
foo a = \b -> (a, b)
foo = \a b -> (a, b)
foo = \a -> \b -> (a, b)
-- Or we can simply eta convert leaving
foo = (,)
If you want to be idiomatic, prefer either the first or the last. Introducing unnecessary lambdas is good for teaching currying, but in real code just adds syntactic clutter.
However, in raw lambda calculus (not Haskell) most people manually curry, writing
\a -> \b -> a b
because people don't write a lot of lambda calculus by hand, and when they do they tend to stick to unsugared lambda calculus to keep things simple.
** modulo the monomorphism restriction, which won't impact you anyways with a type signature.
Though, as jozefg said, they are themselves equivalent, they may lead to different execution behaviour when combined with local variable bindings. Consider
f, f' :: Int -> Int -> Int
with the two definitions
f a x = μ*x
  where μ = sum [1..a]
and
f' a = \x -> μ*x
  where μ = sum [1..a]
These sure look equivalent, and certainly will always yield the same results.
GHCi, version 7.6.2: http://www.haskell.org/ghc/ :? for help
...
[1 of 1] Compiling Main            ( def0.hs, interpreted )
Ok, modules loaded: Main.
*Main> sum $ map (f 10000) [1..10000]
2500500025000000
*Main> sum $ map (f' 10000) [1..10000]
2500500025000000
However, if you try this yourself, you'll notice that f takes quite a lot of time, whereas with f' you get the result immediately. The reason is that f' is written in a form that prompts GHC to compile it so that f' 10000 is actually evaluated before starting to map it over the list. In that step, the value μ is calculated and stored in the closure of (f' 10000). On the other hand, f is treated simply as "one function of two variables"; (f 10000) is merely stored as a closure containing the parameter 10000, and μ is not calculated at all at first. Only when map applies (f 10000) to each element in the list is the whole sum [1..a] calculated, which takes some time for each element of [1..10000]. With f', this was not necessary because μ was pre-calculated.
In principle, common-subexpression elimination is an optimisation that GHC is able to do itself, so you might at times get good performance even with a definition like f. But you can't really count on it.
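To make the sharing difference observable without timing anything, here is a self-contained variant of the two definitions with a trace on μ (my own sketch; the counts in the comments are what you typically see without optimisation, e.g. in GHCi, since with -O GHC may float μ out of f as well):
import Debug.Trace (trace)

f, f' :: Int -> Int -> Int
f a x = mu * x
  where mu = trace "computing mu" (sum [1..a])

f' a = \x -> mu * x
  where mu = trace "computing mu" (sum [1..a])

main :: IO ()
main = do
  print (sum (map (f  100) [1..3]))   -- typically traces "computing mu" once per element
  print (sum (map (f' 100) [1..3]))   -- traces it once: mu lives in the closure of f' 100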

What advantage does Monad give us over an Applicative?

I've read this article, but didn't understand the last section.
The author says that Monad gives us context sensitivity, but it's possible to achieve the same result using only an Applicative instance:
let maybeAge = (\futureYear birthYear -> if futureYear < birthYear
                                           then yearDiff birthYear futureYear
                                           else yearDiff futureYear birthYear)
               <$> (readMay futureYearString) <*> (readMay birthYearString)
It's uglier for sure without do-syntax, but besides that I don't see why we need Monad. Can anyone clear this up for me?
Here's a couple of functions that use the Monad interface.
ifM :: Monad m => m Bool -> m a -> m a -> m a
ifM c x y = c >>= \z -> if z then x else y
whileM :: Monad m => (a -> m Bool) -> (a -> m a) -> a -> m a
whileM p step x = ifM (p x) (step x >>= whileM p step) (return x)
You can't implement them with the Applicative interface. But for the sake of enlightenment, let's try and see where things go wrong. How about..
import Control.Applicative
ifA :: Applicative f => f Bool -> f a -> f a -> f a
ifA c x y = (\c' x' y' -> if c' then x' else y') <$> c <*> x <*> y
Looks good! It has the right type, it must be the same thing! Let's just check to make sure..
*Main> ifM (Just True) (Just 1) (Just 2)
Just 1
*Main> ifM (Just True) (Just 1) (Nothing)
Just 1
*Main> ifA (Just True) (Just 1) (Just 2)
Just 1
*Main> ifA (Just True) (Just 1) (Nothing)
Nothing
And there's your first hint at the difference. You can't write a function using just the Applicative interface that replicates ifM.
If you divide this up into thinking about values of the form f a as being about "effects" and "results" (both of which are very fuzzy approximate terms that are the best terms available, but not very good), you can improve your understanding here. In the case of values of type Maybe a, the "effect" is success or failure, as a computation. The "result" is a value of type a that might be present when the computation completes. (The meanings of these terms depends heavily on the concrete type, so don't think this is a valid description of anything other than Maybe as a type.)
Given that setting, we can look at the difference in a bit more depth. The Applicative interface allows the "result" control flow to be dynamic, but it requires the "effect" control flow to be static. If your expression involves 3 computations that can fail, the failure of any one of them causes the failure of the whole computation. The Monad interface is more flexible. It allows the "effect" control flow to depend on the "result" values. ifM chooses which argument's "effects" to include in its own "effects" based on its first argument. This is the huge fundamental difference between ifA and ifM.
There's something even more serious going on with whileM. Let's try to make whileA and see what happens.
whileA :: Applicative f => (a -> f Bool) -> (a -> f a) -> a -> f a
whileA p step x = ifA (p x) (whileA p step <*> step x) (pure x)
Well.. What happens is a compile error. (<*>) doesn't have the right type there. whileA p step has the type a -> f a and step x has the type f a. (<*>) isn't the right shape to fit them together. For it to work, the function type would need to be f (a -> a).
You can try lots more things - but you'll eventually find that whileA has no implementation that works anything even close to the way whileM does. I mean, you can implement the type, but there's just no way to make it both loop and terminate.
Making it work requires either join or (>>=). (Well, or one of the many equivalents of one of those.) And those are the extra things you get out of the Monad interface.
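To make that last sentence concrete, here is one way (a sketch of my own; ifM' is just a made-up name) to recover the behaviour of ifM with join: fmap chooses between the two actions based on the Bool inside c, giving an m (m a), and join collapses it so that only the chosen branch's effects run.
import Control.Monad (join)

ifM' :: Monad m => m Bool -> m a -> m a -> m a
ifM' c x y = join ((\c' -> if c' then x else y) <$> c)

-- ifM' (Just True) (Just 1) Nothing  ==  Just 1   (unlike ifA, which gives Nothing)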
With monads, subsequent effects can depend on previous values. For example, you can have:
main = do
  b <- readLn :: IO Bool
  if b
    then fireMissiles
    else return ()
You can't do that with Applicatives - the result value of one effectful computation can't determine what effect will follow.
Somewhat related:
Why can applicative functors have side effects, but functors can't?
Good examples of Not a Functor/Functor/Applicative/Monad?
As Stephen Tetley said in a comment, that example doesn't actually use context-sensitivity. One way to think about context-sensitivity is that it lets us choose which actions to take depending on monadic values. Applicative computations must always have the same "shape", in a certain sense, regardless of the values involved; monadic computations need not. I personally think this is easier to understand with a concrete example, so let's look at one. Here are two versions of a simple program which asks you to enter a password, checks that you entered the right one, and prints out a response depending on whether or not you did.
import Control.Applicative

checkPasswordM :: IO ()
checkPasswordM = do
  putStrLn "What's the password?"
  pass <- getLine
  if pass == "swordfish"
    then putStrLn "Correct. The secret answer is 42."
    else putStrLn "INTRUDER ALERT! INTRUDER ALERT!"

checkPasswordA :: IO ()
checkPasswordA = if' . (== "swordfish")
                   <$> (putStrLn "What's the password?" *> getLine)
                   <*> putStrLn "Correct. The secret answer is 42."
                   <*> putStrLn "INTRUDER ALERT! INTRUDER ALERT!"

if' :: Bool -> a -> a -> a
if' True  t _ = t
if' False _ f = f
Let's load this into GHCi and check what happens with the monadic version:
*Main> checkPasswordM
What's the password?
swordfish
Correct. The secret answer is 42.
*Main> checkPasswordM
What's the password?
zvbxrpl
INTRUDER ALERT! INTRUDER ALERT!
So far, so good. But if we use the applicative version:
*Main> checkPasswordA
What's the password?
hunter2
Correct. The secret answer is 42.
INTRUDER ALERT! INTRUDER ALERT!
We entered the wrong password, but we still got the secret! And an intruder alert! This is because <$> and <*>, or equivalently liftAn/liftMn, always execute the effects of all their arguments. The applicative version translates, in do notation, to
do pass  <- putStrLn "What's the password?" *> getLine
   unit1 <- putStrLn "Correct. The secret answer is 42."
   unit2 <- putStrLn "INTRUDER ALERT! INTRUDER ALERT!"
   pure $ if' (pass == "swordfish") unit1 unit2
And it should be clear why this has the wrong behavior. In fact, every use of applicative functors is equivalent to monadic code of the form
do val1 <- app1
   val2 <- app2
   ...
   valN <- appN
   pure $ f val1 val2 ... valN
(where some of the appI are allowed to be of the form pure xI). And equivalently, any monadic code in that form can be rewritten as
f <$> app1 <*> app2 <*> ... <*> appN
or equivalently as
liftAN f app1 app2 ... appN
To think about this, consider Applicative's methods:
pure :: a -> f a
(<$>) :: (a -> b) -> f a -> f b
(<*>) :: f (a -> b) -> f a -> f b
And then consider what Monad adds:
(=<<) :: (a -> m b) -> m a -> m b
join :: m (m a) -> m a
(Remember that you only need one of those.)
Handwaving a lot, if you think about it, the only way we can put together the applicative functions is to construct chains of the form f <$> app1 <*> ... <*> appN, and possibly nest those chains (e.g., f <$> (g <$> x <*> y) <*> z). However, (=<<) (or (>>=)) allows us to take a value and produce different monadic computations depending on that value, that could be constructed on the fly. This is what we use to decide whether to compute "print out the secret", or compute "print out an intruder alert", and why we can't make that decision with applicative functors alone; none of the types for applicative functions allow you to consume a plain value.
You can think about join in concert with fmap in a similar way: as I mentioned in a comment, you can do something like
checkPasswordFn :: String -> IO ()
checkPasswordFn pass = if pass == "swordfish"
                         then putStrLn "Correct. The secret answer is 42."
                         else putStrLn "INTRUDER ALERT! INTRUDER ALERT!"
checkPasswordA' :: IO (IO ())
checkPasswordA' = checkPasswordFn <$> (putStrLn "What's the password?" *> getLine)
This is what happens when we want to pick a different computation depending on the value, but only have applicative functionality available to us. We can pick two different computations to return, but they're wrapped inside the outer layer of the applicative functor. To actually use the computation we've picked, we need join:
checkPasswordM' :: IO ()
checkPasswordM' = join checkPasswordA'
And this does the same thing as the previous monadic version (as long as we import Control.Monad first, to get join):
*Main> checkPasswordM'
What's the password?
12345
INTRUDER ALERT! INTRUDER ALERT!
On the other hand, here's a practical example of the Applicative/Monad divide where Applicatives have an advantage: error handling! We clearly have a Monad implementation of Either that carries along errors, but it always terminates early.
Left e1 >> Left e2 === Left e1
You can think of this as an effect of intermingling values and contexts. Since (>>=) will try to pass the result of the Either e a value to a function like a -> Either e b, it must fail immediately if the input Either is Left.
Applicatives only pass their values to the final pure computation after running all of the effects. This means they can delay accessing the values for longer and we can write this.
{-# LANGUAGE DeriveFunctor #-}

data AllErrors e a = Error e | Pure a deriving (Functor)

instance Monoid e => Applicative (AllErrors e) where
  pure = Pure
  (Pure f)   <*> (Pure x)   = Pure (f x)
  (Error e)  <*> (Pure _)   = Error e
  (Pure _)   <*> (Error e)  = Error e
  -- This is the non-Monadic case
  (Error e1) <*> (Error e2) = Error (e1 <> e2)
It's impossible to write a Monad instance for AllErrors such that ap matches (<*>) because (<*>) takes advantage of running both the first and second contexts before using any values in order to get both errors and (<>) them together. Monadic (>>=) and (join) can only access contexts interwoven with their values. That's why Either's Applicative instance is left-biased, so that it can also have a harmonious Monad instance.
> Left "a" <*> Left "b"
Left "a"
> Error "a" <*> Error "b"
Error "ab"
With Applicative, the sequence of effectful actions to be performed is fixed at compile-time. With Monad, it can be varied at run-time based on the results of effects.
For example, with an Applicative parser, the sequence of parsing actions is fixed for all time. That means that you can potentially perform "optimisations" on it. On the other hand, I can write a Monadic parser which parses a BNF grammar description, dynamically constructs a parser for that grammar, and then runs that parser over the rest of the input. Every time you run this parser, it potentially constructs a brand new parser to parse the second portion of the input. Applicative has no hope of doing such a thing - and there is no chance of performing compile-time optimisations on a parser that doesn't exist yet...
As you can see, sometimes the "limitation" of Applicative is actually beneficial - and sometimes the extra power offered by Monad is required to get the job done. This is why we have both.
If you try to convert the type signature of Monad's bind and Applicative <*> to natural language, you will find that:
bind: I will give you the contained value and you will return me a new packaged value.
<*>: You give me a packaged function that accepts a contained value and returns a value, and I will use it to create a new packaged value based on my rules.
Now, as you can see from the above description, bind gives you more control compared to <*>.
If you work with Applicatives, the "shape" of the result is already determined by the "shape" of the input, e.g. if you call [f,g,h] <*> [a,b,c,d,e], your result will be a list of 15 elements, regardless which values the variables have. You don't have this guarantee/limitation with monads. Consider [x,y,z] >>= join replicate: For [0,0,0] you'll get the result [], for [1,2,3] the result [1,2,2,3,3,3].
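A quick GHCi check of that claim (my own, not from the original answer):
> length ((,) <$> [1,2,3] <*> "abcde")
15
> [0,0,0] >>= \x -> replicate x x      -- the lambda is just join replicate spelled out
[]
> [1,2,3] >>= \x -> replicate x x
[1,2,2,3,3,3]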
Now that the ApplicativeDo extension has become a pretty common thing, the difference between Monad and Applicative can be illustrated with a simple code snippet.
With Monad you can do
do
  r1 <- act1
  if r1
    then act2
    else act3
but with only an Applicative do-block, you can't use if on things you've pulled out with <-.
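For a concrete illustration, here is a sketch of my own (assuming a GHC with the ApplicativeDo extension): a do-block whose bound results only feed a final pure needs nothing beyond Applicative, whereas branching on r1 as above pushes the constraint up to Monad.
{-# LANGUAGE ApplicativeDo #-}

-- Desugars to (+) <$> x <*> y, so Applicative f is enough.
addA :: Applicative f => f Int -> f Int -> f Int
addA x y = do
  a <- x
  b <- y
  pure (a + b)

main :: IO ()
main = print (addA (Just 1) (Just 2))   -- Just 3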

Monads with Join() instead of Bind()

Monads are usually explained in terms of return and bind. However, I gather you can also implement bind in terms of join (and fmap?)
In programming languages lacking first-class functions, bind is excruciatingly awkward to use. join, on the other hand, looks quite easy.
I'm not completely sure I understand how join works, however. Obviously, it has the [Haskell] type
join :: Monad m => m (m x) -> m x
For the list monad, this is trivially and obviously concat. But for a general monad, what, operationally, does this method actually do? I see what it does to the type signatures, but I'm trying to figure out how I'd write something like this in, say, Java or similar.
(Actually, that's easy: I wouldn't. Because generics is broken. ;-) But in principle the question still stands...)
Oops. It looks like this has been asked before:
Monad join function
Could somebody sketch out some implementations of common monads using return, fmap and join? (I.e., not mentioning >>= at all.) I think perhaps that might help it to sink in to my dumb brain...
Without plumbing the depths of metaphor, might I suggest reading a typical monadic type m value as a first-class "strategy to produce a value". Different notions of computation or external interaction require different types of strategy, but the general notion requires some regular structure to make sense:
if you already have a value, then you have a strategy to produce a value (return :: v -> m v) consisting of nothing other than producing the value that you have;
if you have a function which transforms one sort of value into another, you can lift it to strategies (fmap :: (v -> u) -> m v -> m u) just by waiting for the strategy to deliver its value, then transforming it;
if you have a strategy to produce a strategy to produce a value, then you can construct a strategy to produce a value (join :: m (m v) -> m v) which follows the outer strategy until it produces the inner strategy, then follows that inner strategy all the way to a value.
Let's have an example: leaf-labelled binary trees...
data Tree v = Leaf v | Node (Tree v) (Tree v)
...represent strategies to produce stuff by tossing a coin. If the strategy is Leaf v, there's your v; if the strategy is Node h t, you toss a coin and continue by strategy h if the coin shows "heads", t if it's "tails".
instance Monad Tree where
  return = Leaf
A strategy-producing strategy is a tree with tree-labelled leaves: in place of each such leaf, we can just graft in the tree which labels it...
join (Leaf tree) = tree
join (Node h t) = Node (join h) (join t)
...and of course we have fmap which just relabels leaves.
instance Functor Tree where
  fmap f (Leaf x)   = Leaf (f x)
  fmap f (Node h t) = Node (fmap f h) (fmap f t)
Here's a strategy to produce a strategy to produce an Int.
Toss a coin: if it's "heads", toss another coin to decide between two strategies (producing, respectively, "toss a coin for producing 0 or producing 1" or "produce 2"); if it's "tails" produce a third ("toss a coin for producing 3 or tossing a coin for 4 or 5").
That clearly joins up to make a strategy producing an Int.
What we're making use of is the fact that a "strategy to produce a value" can itself be seen as a value. In Haskell, the embedding of strategies as values is silent, but in English, I use quotation marks to distinguish using a strategy from just talking about it. The join operator expresses the strategy "somehow produce then follow a strategy", or "if you are told a strategy, you may then use it".
(Meta. I'm not sure whether this "strategy" approach is a suitably generic way to think about monads and the value/computation distinction, or whether it's just another crummy metaphor. I do find leaf-labelled tree-like types a useful source of intuition, which is perhaps not a surprise as they're the free monads, with just enough structure to be monads at all, but no more.)
PS The type of "bind"
(>>=) :: m v -> (v -> m w) -> m w
says "if you have a strategy to produce a v, and for each v a follow-on strategy to produce a w, then you have a strategy to produce a w". How can we capture that in terms of join?
mv >>= v2mw = join (fmap v2mw mv)
We can relabel our v-producing strategy by v2mw, producing instead of each v value the w-producing strategy which follows on from it — ready to join!
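As a tiny sanity check (my own, reusing the Tree type and its fmap/join from above):
coin :: Tree Int
coin = Node (Leaf 0) (Leaf 1)

withTens :: Int -> Tree Int
withTens n = Node (Leaf n) (Leaf (n + 10))

-- join (fmap withTens coin)
--   == Node (Node (Leaf 0) (Leaf 10)) (Node (Leaf 1) (Leaf 11))
-- i.e. follow the outer strategy (one coin toss), then the inner strategy it selects.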
join = concat -- []
join f = \x -> f x x -- (e ->)
join f = \s -> let (f', s') = f s in f' s' -- State
join (Just (Just a)) = Just a; join _ = Nothing -- Maybe
join (Identity (Identity a)) = Identity a -- Identity
join (Right (Right a)) = Right a; join (Right (Left e)) = Left e;
join (Left e) = Left e -- Either
join ((a, m), m') = (a, m' `mappend` m) -- Writer
-- N.B. there is a non-newtype-wrapped Monad instance for tuples that
-- behaves like the Writer instance, but with the tuple order swapped
join f = \k -> f (\f' -> f' k) -- Cont
Calling fmap (f :: a -> m b) (x :: m a) produces values (y :: m (m b)), so it is a very natural thing to use join to get back values (z :: m b).
Then bind is defined simply as bind ma f = join (fmap f ma), thus achieving the Kleisli compositionality of functions of the (:: a -> m b) variety, which is what it is really all about:
ma `bind` (f >=> g)  =  (ma `bind` f) `bind` g              -- bind = (>>=)
                     =  (`bind` g) . (`bind` f) $ ma
                     =  join . fmap g . join . fmap f $ ma
And so, with flip bind = (=<<), we have
((g <=< f) =<<) = (g =<<) . (f =<<) = join . (g <$>) . join . (f <$>)
OK, so it's not really good form to answer your own question, but I'm going to note down my thinking in case it enlightens anybody else. (I doubt it...)
If a monad can be thought of as a "container", then both return and join have pretty obvious semantics. return generates a 1-element container, and join turns a container of containers into a single container. Nothing hard about that.
So let us focus on monads which are more naturally thought of as "actions". In that case, m x is some sort of action which yields a value of type x when you "execute" it. return x does nothing special, and then yields x. fmap f takes an action that yields an x, and constructs an action that computes x and then applies f to it, and returns the result. So far, so good.
It's fairly obvious that if f itself generates an action, then what you end up with is m (m x). That is, an action that computes another action. In a way, that's maybe even simpler to wrap your mind around than the >>= function which takes an action and a "function that produces an action" and so on.
So, logically speaking, it seems join would run the first action, take the action it produces, and then run that. (Or rather, join would return an action that does what I just described, if you want to split hairs.)
That seems to be the central idea. To implement join, you want to run an action, which then gives you another action, and then you run that. (Whatever "run" happens to mean for this particular monad.)
Given this insight, I can take a stab at writing some join implementations:
join Nothing = Nothing
join (Just mx) = mx
If the outer action is Nothing, return Nothing, else return the inner action. Then again, Maybe is more of a container than an action, so let's try something else...
newtype Reader s x = Reader (s -> x)
join (Reader f) = Reader (\ s -> let Reader g = f s in g s)
That was... painless. A Reader is really just a function that takes a global state and only then returns its result. So to unstack, you apply the global state to the outer action, which returns a new Reader. You then apply the state to this inner function as well.
In a way, it's perhaps easier than the usual way:
Reader f >>= g = Reader (\ s -> let Reader h = g (f s) in h s)
Now, which one is the reader function, and which one is the function that computes the next reader...?
Now let's try the good old State monad. Here every function takes an initial state as input but also returns a new state along with its output.
data State s x = State (s -> (s, x))
join (State f) = State (\ s0 -> let (s1, State g) = f s0 in g s1)
That wasn't too hard. It's basically run followed by run.
I'm going to stop typing now. Feel free to point out all the glitches and typos in my examples... :-/
I've found many explanations of monads that say "you don't have to know anything about category theory, really, just think of monads as burritos / space suits / whatever".
Really, the article that demystified monads for me just said what categories were, described monads (including join and bind) in terms of categories, and didn't bother with any bogus metaphors:
http://en.wikibooks.org/wiki/Haskell/Category_theory
I think the article is very readable without much math knowledge required.
Asking what a type signature in Haskell does is rather like asking what an interface in Java does.
It, in some literal sense, "doesn't". (Though, of course, you will typically have some sort of purpose associated with it, that's mostly in your mind, and mostly not in the implementation.)
In both cases you are declaring legal sequences of symbols in the language which will be used in later definitions.
Of course, in Java, I suppose you could say that an interface corresponds to a type signature which is going to be implemented literally in the VM. You can get some polymorphism this way -- you can define a name that accepts an interface, and you can provide a different definition for the name which accepts a different interface. Something similar happens in Haskell, where you can provide a declaration for a name which accepts one type and then another declaration for that name which treats a different type.
This is Monad explained in one picture. The two functions in the green category are not composable; when they are mapped to the blue category with join . fmap (strictly speaking, they are one category), they become composable. Monad is about turning a function of type T -> Monad<U> into a function of type Monad<T> -> Monad<U>.
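That last sentence can be written directly in code (a sketch of my own; promote is a made-up name for what is really just flip (>>=)):
import Control.Monad (join)

-- join . fmap turns a Kleisli arrow a -> m b into a function m a -> m b.
promote :: Monad m => (a -> m b) -> (m a -> m b)
promote f = join . fmap f

-- e.g. promote (\x -> Just (x + 1)) (Just 41)  ==  Just 42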

Evaluation strategy

How should one reason about function evaluation in examples like the following in Haskell:
let f x = ...
    x = ...
in map (g (f x)) xs
In GHC, sometimes (f x) is evaluated only once, and sometimes once for each element in xs, depending on what exactly f and g are. This can be important when f x is an expensive computation. It has just tripped up a Haskell beginner I was helping, and I didn't know what to tell him other than that it is up to the compiler. Is there a better story?
Update
In the following example (f x) will be evaluated 4 times:
let f x = trace "!" $ zip x x
    x = "abc"
in map (\i -> lookup i (f x)) "abcd"
With language extensions, we can create situations where f x must be evaluated repeatedly:
{-# LANGUAGE GADTs, Rank2Types #-}
module MultiEvG where

data BI where
  B :: (Bounded b, Integral b) => b -> BI

foo :: [BI] -> [Integer]
foo xs = let f :: (Integral c, Bounded c) => c -> c
             f x = maxBound - x
             g :: (forall a. (Integral a, Bounded a) => a) -> BI -> Integer
             g m (B y) = toInteger (m + y)
             x :: (Integral i) => i
             x = 3
         in map (g (f x)) xs
The crux is to have f x polymorphic even as the argument of g, and we must create a situation where the type(s) at which it is needed can't be predicted (my first stab used an Either a b instead of BI, but when optimising, that of course led to only two evaluations of f x at most).
A polymorphic expression must be evaluated at least once for each type it is used at. That's one reason for the monomorphism restriction. However, when the range of types it can be needed at is restricted, it is possible to memoise the values at each type, and in some circumstances GHC does that (needs optimising, and I expect the number of types involved mustn't be too large). Here we confront it with what is basically an inhomogeneous list, so in each invocation of g (f x), it can be needed at an arbitrary type satisfying the constraints, so the computation cannot be lifted outside the map (technically, the compiler could still build a cache of the values at each used type, so it would be evaluated only once per type, but GHC doesn't, in all likelihood it wouldn't be worth the trouble).
Monomorphic expressions need only be evaluated once, they can be shared. Whether they are is up to the implementation; by purity, it doesn't change the semantics of the programme. If the expression is bound to a name, in practice you can rely on it being shared, since it's easy and obviously what the programmer wants. If it isn't bound to a name, it's a question of optimisation. With the bytecode generator or without optimisations, the expression will often be evaluated repeatedly, but with optimisations repeated evaluation would indicate a compiler bug.
Polymorphic expressions must be evaluated at least once for every type they're used at, but with optimisations, when GHC can see that it may be used multiple times at the same type, it will (usually) still be shared for that type during a larger computation.
Bottom line: Always compile with optimisations, help the compiler by binding expressions you want shared to a name, and give monomorphic type signatures where possible.
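Applying the "bind it to a name" advice to the update example from the question, for instance (my own sketch), makes the trace fire only once:
import Debug.Trace (trace)

main :: IO ()
main =
  let f x = trace "!" $ zip x x
      x   = "abc"
      fx  = f x                              -- bound to a name, so the thunk is shared
  in print (map (\i -> lookup i fx) "abcd")  -- "!" is traced only once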
Your examples are indeed quite different.
In the first example, the argument to map is g (f x), and it is passed once to map, most likely as a partially applied function.
Should g (f x), when applied to an argument within map, evaluate its first argument, then this will be done only once, and the thunk (f x) will be updated with the result.
Hence, in your first example, f x will be evaluated at most 1 time.
Your second example requires a deeper analysis before the compiler can arrive at the conclusion that (f x) is always constant in the lambda expression. Perhaps it will never optimize it at all, because it may have knowledge that trace is not quite kosher. So, this may evaluate 4 times when tracing, and 4 times or 1 time when not tracing.
This is really dependent on GHC's optimizations, as you've been able to tell.
The best thing to do is to study the GHC core that you get after optimizing the program. I would look at the generated Core and examine whether f x had its own let statement outside the map or not.
If you want to be sure, then you should factor f x out into its own variable assigned in a let, but there's not really a guaranteed way to figure it out other than reading through Core.
All that said, with the exception of things like trace that use unsafePerformIO, this will never change the semantics of your program: how it actually behaves.
In GHC without optimizations, the body of a function is evaluated every time the function is called. (A "call" means the function is applied to arguments and the result is evaluated.) In the following example, f x is inside a function, so it will execute each time the function is called.
(GHC may optimize this expression as discussed in the FAQ [1].)
let f x = trace "!" $ zip x x
    x = "abc"
in map (\i -> lookup i (f x)) "abcd"
However, if we move f x out of the function, it will execute only once.
let f x = trace "!" $ zip x x
    x = "abc"
in map ((\f_x i -> lookup i f_x) (f x)) "abcd"
This can be rewritten more readably as
let f x = trace "!" $ zip x x
    x = "abc"
    g f_x i = lookup i f_x
in map (g (f x)) "abcd"
The general rule is that, each time a function is applied to an argument, a new "copy" of the function body is created. Function application is the only thing that may cause an expression to re-execute. However, be warned that some functions and function calls do not look like functions syntactically.
[1] http://www.haskell.org/haskellwiki/GHC/FAQ#Subexpression_Elimination
