What does adjoining mean in practical uses? - haskell

I often run into term "adjoin" when trying to understand some concepts. Those things are too abstract for me to understand as I'm neither expert in field nor category theory.
The simplest case I found is a Monoid Maybe a instance which often behaves not as I would sometimes expect with regard to Nothing.
From Wikipedia we can learn that by "adjoining" an element to a semigroup we can get a different Monoid instance. I don't understand the sentence but the equations given suggest it's exactly what I need (and is not default for some reason):
Any semigroup S may be turned into a monoid simply by adjoining an element e not in S and defining e • s = s = s • e for all s ∈ S.
Doe "adjoining" mean the same as "adding" at least in this case?
Are there other simple examples of this concept?
What would be simplest possible instance of something that is "left-adjoint"?

Sometimes "adjoining" means "adding something new", as in the semigroup-related sentence you quote. E.g. someone might say that using Maybe a means adding/adjoining a new element Nothing to the type a. Personally, I would only use "adding" for this, though.
This has nothing to do with adjoints in the categorical sense, which are a tricky concept.
Roughly put, assume you have a functional type of the form
F a -> b
where F is some mapping from types to types (more precisely, a functor). Sometimes, you can express an isomorphic type to the above one having the form
a -> G b
where "magically" the function F on the left side moved to the right side, changing into G.
The canonical example is currying: Let e.g.
F T = (T, Int)
G T = Int -> T
Then we have
(F a) -> b
-- definition of F
= (a, Int) -> b
-- currying
=~ a -> (Int -> b)
-- definition of G
= a -> G b
In such case, we write F -| G which is read as "F is left adjoint to G".
Every time you can "nicely move" an operation on the input type on the other side of the arrow, changing it into another operation on the output type, you have an adjoint. (Technically, "nicely" means we have a natural isomorphism)

Related

What is the difference between functor and monad Intuitively

I have learned the definitions of functor and monad, But I am still unable to figure out the difference between them except for the definitions. I have read some answers to this problem, In What is the difference between a Functor and a Monad? one comment say
A functor takes a pure function (and a functorial value) whereas a monad takes a Kleisli arrow, i.e. a function that returns a monad (and a monadic value). Hence you can chain two monads and the second monad can depend on the result of the previous one. You cannot do this with functors.
This comment is interesting and gives me a read of their difference. But I still have some questions.
why functor cannot use the result of previous one? since fmap :: (a -> b) -> f a -> f b, when I currying fmap with a pure function, I can get a f a -> f b function,f b depends on f a, Does the result mean data inside the functor?
In category theory I can understand the comment since I cannot get the element in category theory, but In Haskell I find out that I can use the result of functor since Haskell can remember the data constructor, Does Haskell prevent me from understanding this comment? Should I understand this in pure category theory?
When using fmap :: Functor f => (a -> b) -> f a -> f b, the argument function can never depend on any of the structure from the functor f itself; how can it, when all it ever gets passed are a values, nothing to do with f? When you use fmap to get an f a -> f b function, the code that is responsible for building the final f b value is the code of fmap itself, not the code of the a -> b function being mapped. The function being mapped might not even be called at all (e.g. if you're using the Maybe functor and the f a value is Nothing, there are no a values inside that for fmap to pass to the a -> b function, and it will never be called).
But fmap is fully polymorphic in a and b, meaning the code of fmap doesn't know what those types are and cannot assume anything about them. So fmap has to build the final f b value in a way that only depends on the functor structure added by f; the code of fmap cannot look at any a values and make decisions about what to do. In fact really the only thing a lawful fmap can do is return a value with exactly the same structure as the input f a value, only with any as that were inside replaced by the b value returned by the a -> b function. That's why you can just deriving Functor any data type; there's at most one way to make any given data type a Functor, and the compiler can just do it for you.
But the monad =<< function1 has this type: Monad m => (a -> m b) -> m a -> m b. Here the mapped function has type a -> m b. That means the final m b structure returned at the end isn't purely determined by the code of =<< (at least not necessarily). The a -> m b function knows what monad is being used and returns some monadic structure in its m b value, not just a raw b. The mapped function still doesn't receive any monadic structure, so its intermediate m b result can't depend on that; any dependency the final m b value has on the monadic structure in the original m a has to be determined by the code for =<< (which again, cannot make any decisions based on a values itself). But the mapped function (if it is called) does get to contribute some monadic structure, and that can depend on the a values (because the mapped function doesn't have to be completely polymorphic in a; it can inspect a values and make decisions based on them).
This means that ultimately any part of the final output m b might depend on any part of the input (if we don't know what function was mapped, or how the =<< function works internally). This is quite different from the functor case, where even without knowing anything about the specific functor or the mapped function we know that the final output will always look like "a copy of the input with as replaced by bs" (and specifically which b replaces each a is determined by the mapped function).
It's possibly worth noting: almost everything I've said here is a direct consequence of what these types mean:
Functor f => (a -> b) -> f a -> f b
Monad m => (a -> m a) -> m a -> m b
These aren't special properties of functors and monads you have to remember, it's just a consequence of those operations having those types. (The functor and monad laws express things that the types don't, but I've not needed to invoke those to explain this difference)
Once you are really deeply used to thinking about types the Haskell way, you can figure this out for yourself just by looking at the types. I didn't of course; I had it explained to me (a dozen times) when I was learning, and I've remembered it since then. But now I can figure out similar things about new APIs I've never seen before, without any tutorials or explanations.
1 I'm using the reversed =<< operator rather than the traditional >>= operator merely because it lines up better with the fmap / <$> operator.
A monad is a functor, or rather a functor along with two additional operations. (The original(?) name for a monad was a triple.)
It helps to look at the original mathematical definition, which provided an operation unit :: Monad m => a -> m a, but instead of (>>=) :: Monad m => m a -> (a -> m b) -> m b, defined an operation called join :: Monad m => m (m a) -> m a, which lets you "remove" a layer of lifting. The two formulations are equivalent, as join can be defined in terms of (>>=)
join ma = ma >>= id
and vice versa:
ma >>= f = join $ fmap f ma
The last definition gives a different interpretation of a monad: it's a functor that you can "compress"; chaining comes for free by mapping a function over an already lifted value, then combining the two layers of lifting back to a single layer.
unit is required in either case because there's no way to get a lifted value in the first place with just fmap.

Partially applying a type in a type constructor

In the book "Learn you a Haskell", chapter 11 we have the introduction of the newtype keyword and it all makes sense to me up to the part where we see the Pair type being made an instance of a Functor.
This discussion begins with a problem statement that "only type constructors that take exactly one parameter can be made an instance of Functor." I don't understand this statement. Where is this restriction ever described?
In the context of the discussion we have a type Pair defined as:
newtype Pair b a = Pair { getPair :: (a,b) }
Given this, I would have thought that we could have used something like:
instance Functor (Pair a b) where ...
I would have thought that (Pair a b) would have substituted for 'f' in the definition of Functor without any problem. What keeps this from working?
Next, the author goes on to make Pair an instance of a Functor using the following:
instance Functor (Pair c) where ...
Here is where I get lost. I don't recall ever seeing 'partially applying' types to a type constructor before. I am not really sure what this even means in this context, let alone how it solves the problem identified above it.
I think I have a misconception somewhere but I don't know where to go looking for it. I did find this answer but it only seems to go part of the way in answering the question.
"only type constructors that take exactly one parameter can be made an instance of Functor." I don't understand this statement. Where is this restriction ever described?
The restriction comes from the type of fmap in the definition of Functor:
class Functor f where
fmap :: (a -> b) -> f a -> f b
Because the class variable f is applied to one argument in f a, the type used to instantiate Functor must be able to take one parameter. Because the application f a is part of a function type, f must not take more than one parameter. (There's one more subtle piece about what kinds of arguments f can take that's specified here, but I think it's worth skipping that for now; hopefully your tutorial will touch on this subtlety more later.)
Given this, I would have thought that we could have used something like:
instance Functor (Pair a b) where ...
I would have thought that (Pair a b) would have substituted for 'f' in the definition of Functor without any problem. What keeps this from working?
It is once again the type of fmap that keeps this from working. Filling in Pair c d for f (modifying a to c and b to d to avoid name clashes), we would get
fmap :: (a -> b) -> Pair c d a -> Pair c d b
which doesn't make a lot of sense.
Here is where I get lost. I don't recall ever seeing 'partially applying' types to a type constructor before.
Well... now you have. =)
I am not really sure what this even means in this context, let alone how it solves the problem identified above it.
It means basically the same thing partial application at the term level means: Pair a is akin to a type function which takes another type as an argument. So, e.g., Pair a applied to Int would give the Pair a Int type. It solves the problem because now we are supplying a "type-function-like" thing rather than a "type-like" thing, so applying it to another type makes sense; that is, the substituted type of fmap for a Pair c instance would be
fmap :: (a -> b) -> Pair c a -> Pair c b
which now makes sense because Pair c a and Pair c b are sensible types, whereas before Pair c d a and Pair c d b were over-applied and therefore not sensible types.

The fixed point functors of Free and Cofree

To make that clear, I'm not talking about how the free monad looks a lot like a fixpoint combinator applied to a functor, i.e. how Free f is basically a fixed point of f. (Not that this isn't interesting!)
What I'm talking about are fixpoints of Free, Cofree :: (*->*) -> (*->*), i.e. functors f such that Free f is isomorphic to f itself.
Background: today, to firm up my rather lacking grasp on free monads, I decided to just write a few of them out for different simple functors, both for Free and for Cofree and see what better-known [co]monads they'd be isomorphic to. What intrigued me particularly was the discovery that Cofree Empty is isomorphic to Empty (meaning, Const Void, the functor that maps any type to the uninhabited). Ok, perhaps this is just stupid – I've discovered that if you put empty garbage in you get empty garbage out, yeah! – but hey, this is category theory, where whole universes rise up from seeming trivialities... right?
The immediate question is, if Cofree has such a fixed point, what about Free? Well, it certainly can't be Empty as that's not a monad. The quick suspect would be something nearby like Const () or Identity, but no:
Free (Const ()) ~~ Either () ~~ Maybe
Free Identity ~~ (Nat,) ~~ Writer Nat
Indeed, the fact that Free always adds an extra constructor suggests that the structure of any functor that's a fixed point would have to be already infinite. But it seems odd that, if Cofree has such a simple fixed point, Free should only have a much more complex one (like the fix-by-construction FixFree a = C (Free FixFree a) that Reid Barton brings up in the comments).
Is the boring truth just that Free has no “accidental fixed point” and it's a mere coincidence that Cofree has one, or am I missing something?
Your observation that Empty is a fixed point of Cofree (which is not really true in Haskell, but I guess you want to work in some model that ignores ⊥, like Set) boils down to the fact that
there is a set E (the empty set) such that for every set X, the projection p₂ : X × E -> E is an isomorphism.
We could say in this situation that E is an absorbing object for the product. We can replace the word “set” by “object of C” for any category C with products, and we get a statement about C that may or may not be true. For Set, it happens to be true.
If we pick C = Setop, which also has products (because Set has coproducts), and then dualize the language to talk about sets again, we get the statement
there is a set F such that for every set Y, the inclusion i₂ : F -> Y + F is an isomorphism.
Obviously, this statement is not true for any set F (we can pick any non-empty set Y as a counterexample for any F). No surprise there, after all Setop is a different category from Set.
So, we won't get a “trivial fixed point” of Free in the same way we got one for Cofree, because Setop is qualitatively different from Set. The initial object of Set is an absorbing element for the product, but the terminal object of Set is not an absorbing object for the coproduct.
If I may get on my soapbox for a moment:
There is much discussion among Haskell programmers about which constructions are the “duals” of which other constructions. Most of this is in a formal sense meaningless, because in category theory dualizing a construction works like this:
Suppose I have a construction which I can perform on any category C (or any category with certain extra structure and/or properties). Then the dual construction on a category C is the original construction on the opposite category Cop (which had better have the extra structure and properties we needed, if any).
For example: The notion of products makes sense in any category C (though products might not always exist), via the universal property defining products. To get a dual notion of coproducts in C we should ask what are the products in Cop, and we have just defined what products are in any category, so this notion makes sense.
The trouble with applying duality to the setting of Haskell is that the Haskell language prefers overwhelmingly to talk about just one category, Hask, in which we do our constructions. This causes two problems for talking about duality:
To obtain the dual of a construction as described above, I am supposed to be able to be able to do the construction in any category, or at least any category of a particular form. So we must first generalize the construction that, typically, we have only done in the category Hask to a larger class of categories. (And having done so, there are plenty of other interesting categories we could potentially interpret the resulting notion in besides Haskop, such as Kleisli categories of monads.)
The category Hask enjoys many special properties which can be summarized by saying that (ignoring ⊥) Hask is a cartesian closed category. For example, this implies that the initial object is an absorbing object for the product. Haskop does not have these properties, which means that the generalized notion may not make sense in Haskop; and it can also mean that two notions which happened to be equivalent in Hask are distinct in general, and have different duals.
For an example of the latter, take lenses. In Hask they can be constructed in a number of ways; two ways are in terms of getter/setter pairs and as coalgebras for the costate comonad. The former generalizes to categories with products and the second to categories enriched in a particular way over Hask. If we apply the former construction to Haskop then we get out prisms, but if we apply the latter construction to Haskop then we get algebras for the state monad and these are not the same thing.
A more familiar example might be comonads: starting from the Haskell-centric presentation
return :: a -> m a
(>>=) :: m a -> (a -> m b) -> m b
some insight seems to be needed to determine which arrows to reverse to obtain
extract :: w a -> a
extend :: w a -> (w b -> a) -> w b
The point is that it would have been much easier to start from join :: m (m a) -> m a instead of (>>=); but finding this alternative presentation (equivalent due to special features of Hask) is a creative process, not a mechanical one.
In a question like yours, and many others like it, where it is pretty clear what sense of dual is intended, there's still absolutely no reason to expect a priori that the dual construction will actually exist or have the same properties as the original, because Haskop qualitatively behaves quite differently from Hask. A slogan might be
the theory of categories is self-dual, but the theory of any particular category is not!
Since you asked about the structure of the fixed points of Free, I'm going to sketch an informal argument that Free only has one fixed point which is a Functor, namely the type
newtype FixFree a = C (Free FixFree a)
that Reid Barton described. Indeed, I make a somewhat stronger claim. Let's start with a few pieces:
newtype Fix f a = Fix (f (Fix f) a)
instance Functor (f (Fix f)) => Functor (Fix f) where
fmap f (Fix x) = Fix (fmap f x)
-- This is basically `MFunctor` from `Control.Monad.Morph`
class FFunctor (g :: (* -> *) -> * -> *) where
hoistF :: Functor f => (forall a . f a -> f' a) -> g f b -> g f' b
Notably,
instance FFunctor Free where
hoistF _f (Pure a) = Pure a
hoistF f (Free fffa) = Free . f . fmap (hoistF f) $ fffa
Then
fToFixG :: (Functor f, FFunctor g) => (forall a . f a -> g f a) -> f a -> Fix g a
fToFixG fToG fa = Fix $ hoistF (fToFixG fToG) $ fToG fa
fixGToF :: forall f b (g :: (* -> *) -> * -> *) .
(FFunctor g, Functor (g (Fix g)))
=> (forall a . g f a -> f a) -> Fix g b -> f b
fixGToF gToF (Fix ga) = gToF $ hoistF (fixGToF gToF) ga
If I'm not mistaken (which I could be), passing each side of an isomorphism between f and g f to each of these functions will yield each side of an isomorphism between f and Fix g. Substituting Free for g will demonstrate the claim. This argument is very hand-wavey, of course, because Haskell is inconsistent.

Why doesn't Haskell have a stronger alternative to Eq?

The reason why Set is not a functor is given here. It seems to boil down to the fact that a == b && f a /= f b is possible. So, why doesn't Haskell have as standard an alternative to Eq, something like
class Eq a => StrongEq a where
(===) :: a -> a -> Bool
(/==) :: a -> a -> Bool
x /== y = not (x === y)
x === y = not (x /== y)
for which instances are supposed to obey the laws
∀a,b,f. not (a === b) || (f a === f b)
∀a. a === a
∀a,b. (a === b) == (b === a)
and maybe some others? Then we could have:
instance StrongEq a => Functor (Set a) where
-- ...
Or am I missing something?
Edit: my problem is not “Why are there types without an Eq instance?”, like some of you seem to have answered. It's the opposite: “Why are there instances of Eq that aren't extensionally equal? Why are there too many Eq instances?”, combined with “If a == b does imply extensional equality, why is Set not an instance of Functor?”.
Also, my instance declaration is rubbish (thanks #n.m.). I should have said:
newtype StrongSet a = StrongSet (Set a)
instance Functor StrongSet where
fmap :: (StrongEq a, StrongEq b) => (a -> b) -> StrongSet a -> StrongSet b
fmap (StrongSet s) = StrongSet (map s)
instance StrongEq a => Functor (Set a) where
This makes sense neither in Haskell nor in the grand mathematical/categorical scheme of things, regardless of what StrongEq means.
In Haskell, Functor requires a type constructor of kind * -> *. The arrow reflects the fact that in category theory, a functor is a kind of mapping. [] and (the hypothetical) Set are such type constructors. [a] and Set a have kind * and cannot be functors.
In Haskell, it is hard to define Set such that it can be made into a Functor because equality cannot be sensibly defined for some types no matter what. You cannot compare two things of type Integer->Integer, for example.
Let's suppose there is a function
goedel :: Integer -> Integer -> Integer
goedel x y = -- compute the result of a function with
-- Goedel number x, applied to y
Suppose you have a value s :: Set Integer. What fmap goedel s should look like? How do you eliminate duplicates?
In your typical set theory equality is magically defined for everything, including functions, so Set (or Powerset to be precise) is a functor, no problem with that.
Since I'm not a category theorist, I'll try to write a more concrete/practical explanation (i.e., one I can understand):
The key point is the one that #leftaroundabout made in a comment:
== is supposed to
witness "equivalent by all observable means" (that doesn't necessarily
require a == b must hold only for identical implementations; but
anything you can "officially" do with a and b should again yield
equivalent results. So unAlwaysEq should never be exposed in the first
place). If you can't ensure this for some type, you shouldn't give it
an Eq instance.
That is, there should be no need for your StrongEq because that's what Eq is supposed to be already.
Haskell values are often intended to represent some sort of mathematical or "real-life" value. Many times, this representation is one-to-one. For example, consider the type
data PlatonicSolid = Tetrahedron | Cube |
Octahedron | Dodecahedron | Icosahedron
This type contains exactly one representation of each Platonic solid. We can take advantage of this by adding deriving Eq to the declaration, and it will produce the correct instance.
In many cases, however, the same abstract value may be represented by more than one Haskell value. For example, the red-black trees Node B (Node R Leaf 1 Leaf) 2 Leaf and Node B Leaf 1 (Node R Leaf 2 Leaf) can both represent the set {1,2}. If we added deriving Eq to our declaration, we would get an instance of Eq that distinguishes things we want to be considered the same (outside of the implementation of the set operations).
It's important to make sure that types are only made instances of Eq (and Ord) when appropriate! It's very tempting to make something an instance of Ord just so you can stick it in a data structure that requires ordering, but if the ordering is not truly a total ordering of the abstract values, all manner of breakage may ensue. Unless the documentation absolutely guarantees it, for example, a function called sort :: Ord a => [a] -> [a] may not only be an unstable sort, but may not even produce a list containing all the Haskell values that go into it. sort [Bad 1 "Bob", Bad 1 "James"] can reasonably produce [Bad 1 "Bob", Bad 1 "James"], [Bad 1 "James", Bad 1 "Bob"], [Bad 1 "James", Bad 1 "James"], or [Bad 1 "Bob", Bad 1 "Bob"]. All of these are perfectly legitimate. A function that uses unsafePerformIO in the back room to implement a Las Vegas-style randomized algorithm or to race threads against each other to get an answer from the fastest may even give different results different times, as long as they're == to each other.
tl;dr: Making something an instance of Eq is a way of making a very strong statement to the world; don't make that statement if you don't mean it.
Your second Functor instance also doesn't make any sense. The biggest reason why Set can't be a Functor in Haskell is fmap can't have constraints. Inventing different notions of equality as StrongEq doesn't change the fact that you can't write those constraints on fmap in your Set instance.
fmap in general shouldn't have the constraints you need. It makes perfect sense to have functors of functions, for example (without it the whole notion of using Applicative to apply functions inside a functor breaks down), and functions can't be members of Eq or your StrongEq in general.
fmap can't have extra constraints on only some instances, because of code like this:
fmapBoth :: (Functor f, Functor g) => (a -> b, c -> d) -> (f a, g c) -> (f b, g d)
fmapBoth (h, j) (x, y) = (fmap h x, fmap j y)
This code claims to work regardless of the functors f and g, and regardless of the functions h and j. It has no way of checking whether one of the functors is a special one that has extra constraints on fmap, nor any way of checking whether one of the functions it's applying would violate those constraints.
Saying that Set is a Functor in Haskell, is saying that there is a (lawful) operation fmap :: (a -> b) -> Set a -> Set b, with that exact type. That is precisely what Functor means. fmap :: (Eq a -> Eq b) => (a -> b) -> Set a -> Set b is not an example of such an operation.
It is possible, I understand, to use the ConstraintKinds GHC extendsion to write a different Functor class that permits constraints on the values which vary by Functor (and what you actually need is an Ord constraint, not just Eq). This blog post talks about doing so to make a new Monad class which can have an instance for Set. I've never played around with code like this, so I don't know much more than that the technique exists. It wouldn't help you hand off Sets to existing code that needs Functors, but you should be able to use it instead of Functor in your own code if you wish.
This notion of StrongEq is tough. In general, equality is a place where computer science becomes significantly more rigorous than typical mathematics in the kind of way which makes things challenging.
In particular, typical mathematics likes to talk about objects as though they exist in a set and can be uniquely identified. Computer programs usually deal with types which are not always computable (as a simple counterexample, tell me what the set corresponding to the type data U = U (U -> U) is). This means that it may be undecidable as to whether two values are identifiable.
This becomes an enormous topic in dependently typed languages since typechecking requires identifying like types and dependently typed languages may have arbitrary values in their types and thus need a way to project equality.
So, StrongEq could be defined over a restricted part of Haskell containing only the types which can be decidably compared for equality. We can consider this a category with the arrows as computable functions and then see Set as an endofunctor from types to the type of sets of values of that type. Unfortunately, these restrictions have taken us far from standard Haskell and make defining StrongEq or Functor (Set a) a little less than practical.

Monad theory and Haskell

Most tutorials seem to give a lot of examples of monads (IO, state, list and so on) and then expect the reader to be able to abstract the overall principle and then they mention category theory. I don't tend to learn very well by trying generalise from examples and I would like to understand from a theoretical point of view why this pattern is so important.
Judging from this thread:
Can anyone explain Monads?
this is a common problem, and I've tried looking at most of the tutorials suggested (except the Brian Beck videos which won't play on my linux machine):
Does anyone know of a tutorial that starts from category theory and explains IO, state, list monads in those terms? the following is my unsuccessful attempt to do so:
As I understand it a monad consists of a triple: an endo-functor and two natural transformations.
The functor is usually shown with the type:
(a -> b) -> (m a -> m b)
I included the second bracket just to emphasise the symmetry.
But, this is an endofunctor, so shouldn't the domain and codomain be the same like this?:
(a -> b) -> (a -> b)
I think the answer is that the domain and codomain both have a type of:
(a -> b) | (m a -> m b) | (m m a -> m m b) and so on ...
But I'm not really sure if that works or fits in with the definition of the functor given?
When we move on to the natural transformation it gets even worse. If I understand correctly a natural transformation is a second order functor (with certain rules) that is a functor from one functor to another one. So since we have defined the functor above the general type of the natural transformations would be:
((a -> b) -> (m a -> m b)) -> ((a -> b) -> (m a -> m b))
But the actual natural transformations we are using have type:
a -> m a
m a -> (a ->m b) -> m b
Are these subsets of the general form above? and why are they natural transformations?
Martin
A quick disclaimer: I'm a little shaky on category theory in general, while I get the impression you have at least some familiarity with it. Hopefully I won't make too much of a hash of this...
Does anyone know of a tutorial that starts from category theory and explains IO, state, list monads in those terms?
First of all, ignore IO for now, it's full of dark magic. It works as a model of imperative computations for the same reasons that State works for modelling stateful computations, but unlike the latter IO is a black box with no way to deduce the monadic structure from the outside.
The functor is usually shown with the type: (a -> b) -> (m a -> m b) I included the second bracket just to emphasise the symmetry.
But, this is an endofunctor, so shouldn't the domain and codomain be the same like this?:
I suspect you are misinterpreting how type variables in Haskell relate to the category theory concepts.
First of all, yes, that specifies an endofunctor, on the category of Haskell types. A type variable such as a is not anything in this category, however; it's a variable that is (implicitly) universally quantified over all objects in the category. Thus, the type (a -> b) -> (a -> b) describes only endofunctors that map every object to itself.
Type constructors describe an injective function on objects, where the elements of the constructor's codomain cannot be described by any means except as an application of the type constructor. Even if two type constructors produce isomorphic results, the resulting types remain distinct. Note that type constructors are not, in the general case, functors.
The type variable m in the functor signature, then, represents a one-argument type constructor. Out of context this would normally be read as universal quantification, but that's incorrect in this case since no such function can exist. Rather, the type class definition binds m and allows the definition of such functions for specific type constructors.
The resulting function, then, says that for any type constructor m which has fmap defined, for any two objects a and b and a morphism between them, we can find a morphism between the types given by applying m to a and b.
Note that while the above does, of course, define an endofunctor on Hask, it is not even remotely general enough to describe all such endofunctors.
But the actual natural transformations we are using have type:
a -> m a
m a -> (a ->m b) -> m b
Are these subsets of the general form above? and why are they natural transformations?
Well, no, they aren't. A natural transformation is roughly a function (not a functor) between functors. The two natural transformations that specify a monad M look like I -> M where I is the identity functor, and M ∘ M -> M where ∘ is functor composition. In Haskell, we have no good way of working directly with either a true identity functor or with functor composition. Instead, we discard the identity functor to get just (Functor m) => a -> m a for the first, and write out nested type constructor application as (Functor m) => m (m a) -> m a for the second.
The first of these is obviously return; the second is a function called join, which is not part of the type class. However, join can be written in terms of (>>=), and the latter is more often useful in day-to-day programming.
As far as specific monads go, if you want a more mathematical description, here's a quick sketch of an example:
For some fixed type S, consider two functors F and G where F(x) = (S, x) and G(x) = S -> x (It should hopefully be obvious that these are indeed valid functors).
These functors are also adjoints; consider natural transformations unit :: x -> G (F x) and counit :: F (G x) -> x. Expanding the definitions gives us unit :: x -> (S -> (S, x)) and counit :: (S, S -> x) -> x. The types suggest uncurried function application and tuple construction; feel free to verify that those work as expected.
An adjunction gives rise to a monad by composition of the functors, so taking G ∘ F and expanding the definition, we get G (F x) = S -> (S, x), which is the definition of the State monad. The unit for the adjunction is of course return; and you should be able to use counit to define join.
This page does exactly that. I think your main confusion is that the class doesn't really make the Type a functor, but it defines a functor from the category of Haskell types into the category of that type.
Following the notation of the link, assuming F is a Haskell Functor, it means that there is a functor from the category of Hask to the category of F.
Roughly speaking, Haskell does its category theory all in just one category, whose objects are Haskell types and whose arrows are functions between these types. It's definitely not a general-purpose language for modelling category theory.
A (mathematical) functor is an operation turning things in one category into things in another, possibly entirely different, category. An endofunctor is then a functor which happens to have the same source and target categories. In Haskell, a functor is an operation turning things in the category of Haskell types into other things also in the category of Haskell types, so it is always an endofunctor.
[If you're following the mathematical literature, technically, the operation '(a->b)->(m a -> m b)' is just the arrow part of the endofunctor m, and 'm' is the object part]
When Haskellers talk about working 'in a monad' they really mean working in the Kleisli category of the monad. The Kleisli category of a monad is a thoroughly confusing beast at first, and normally needs at least two colours of ink to give a good explanation, so take the following attempt for what it is and check out some references (unfortunately Wikipedia is useless here for all but the straight definitions).
Suppose you have a monad 'm' on the category C of Haskell types. Its Kleisli category Kl(m) has the same objects as C, namely Haskell types, but an arrow a ~(f)~> b in Kl(m) is an arrow a -(f)-> mb in C. (I've used a squiggly line in my Kleisli arrow to distinguish the two). To reiterate: the objects and arrows of the Kl(C) are also objects and arrows of C but the arrows point to different objects in Kl(C) than in C. If this doesn't strike you as odd, read it again more carefully!
Concretely, consider the Maybe monad. Its Kleisli category is just the collection of Haskell types, and its arrows a ~(f)~> b are functions a -(f)-> Maybe b. Or consider the (State s) monad whose arrows a ~(f)~> b are functions a -(f)-> (State s b) == a -(f)-> (s->(s,b)). In any case, you're always writing a squiggly arrow as a shorthand for doing something to the type of the codomain of your functions.
[Note that State is not a monad, because the kind of State is * -> * -> *, so you need to supply one of the type parameters to turn it into a mathematical monad.]
So far so good, hopefully, but suppose you want to compose arrows a ~(f)~> b and b ~(g)~> c. These are really Haskell functions a -(f)-> mb and b -(g)-> mc which you cannot compose because the types don't match. The mathematical solution is to use the 'multiplication' natural transformation u:mm->m of the monad as follows: a ~(f)~> b ~(g)~> c == a -(f)-> mb -(mg)-> mmc -(u_c)-> mc to get an arrow a->mc which is a Kleisli arrow a ~(f;g)~> c as required.
Perhaps a concrete example helps here. In the Maybe monad, you cannot compose functions f : a -> Maybe b and g : b -> Maybe c directly, but by lifting g to
Maybe_g :: Maybe b -> Maybe (Maybe c)
Maybe_g Nothing = Nothing
Maybe_g (Just a) = Just (g a)
and using the 'obvious'
u :: Maybe (Maybe c) -> Maybe c
u Nothing = Nothing
u (Just Nothing) = Nothing
u (Just (Just c)) = Just c
you can form the composition u . Maybe_g . f which is the function a -> Maybe c that you wanted.
In the (State s) monad, it's similar but messier: Given two monadic functions a ~(f)~> b and b ~(g)~> c which are really a -(f)-> (s->(s,b)) and b -(g)-> (s->(s,c)) under the hood, you compose them by lifting g into
State_s_g :: (s->(s,b)) -> (s->(s,(s->(s,c))))
State_s_g p s1 = let (s2, b) = p s1 in (s2, g b)
then you apply the 'multiplication' natural transformation u, which is
u :: (s->(s,(s->(s,c)))) -> (s->(s,c))
u p1 s1 = let (s2, p2) = p1 s1 in p2 s2
which (sort of) plugs the final state of f into the initial state of g.
In Haskell, this turns out to be a bit of an unnatural way to work so instead there's the (>>=) function which basically does the same thing as u but in a way that makes it easier to implement and use. This is important: (>>=) is not the natural transformation 'u'. You can define each in terms of the other, so they're equivalent, but they're not the same thing. The Haskell version of 'u' is written join.
The other thing missing from this definition of Kleisli categories is the identity on each object: a ~(1_a)~> a which is really a -(n_a)-> ma where n is the 'unit' natural transformation. This is written return in Haskell, and doesn't seem to cause as much confusion.
I learned category theory before I came to Haskell, and I too have had difficulty with the mismatch between what mathematicians call a monad and what they look like in Haskell. It's no easier from the other direction!
Not sure I understand what was the question but yes, you are right, monad in Haskell is defined as a triple:
m :: * -> * -- this is endofunctor from haskell types to haskell types!
return :: a -> m a
(>>=) :: m a -> (a -> m b) -> m b
but common definition from category theory is another triple:
m :: * -> *
return :: a -> m a
join :: m (m a) -> m a
It is slightly confusing but it's not so hard to show that these two definitions are equal.
To do that we need to define join in terms of (>>=) (and vice versa).
First step:
join :: m (m a) -> m a
join x = ?
This gives us x :: m (m a).
All we can do with something that have type m _ is to aply (>>=) to it:
(x >>=) :: (m a -> m b) -> m b
Now we need something as a second argument for (>>=), and also,
from the type of join we have constraint (x >>= y) :: ma.
So y here will have type y :: ma -> ma and id :: a -> a fits it very well:
join x = x >>= id
The other way
(>>=) :: ma -> (a -> mb) -> m b
(>>=) x y = ?
Where x :: m a and y :: a -> m b.
To get m b from x and y we need something of type a.
Unfortunately, we can't extract a from m a. But we can substitute it for something else (remember, monad is a functor also):
fmap :: (a -> b) -> m a -> m b
fmap y x :: m (m b)
And it's perfectly fits as argument for join: (>>=) x y = join (fmap y x).
The best way to look at monads and computational effects is to start with where Haskell got the notion of monads for computational effects from, and then look at Haskell after you understand that. See this paper in particular: Notions of Computation and Monads, by E. Moggi.
See also Moggi's earlier paper which shows how monads work for the lambda calculus alone: http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.26.2787
The fact that monads capture substitution, among other things (http://blog.sigfpe.com/2009/12/where-do-monads-come-from.html), and substitution is key to the lambda calculus, should give a good clue as to why they have so much expressive power.
While monads originally came from category theory, this doesn't mean that category theory is the only abstract context in which you can view them. A different viewpoint is given by operational semantics. For an introduction, have a look at my Operational Monad Tutorial.
One way to look at IO is to consider it as a strange kind of state monad. Remember that the state monad looks like:
data State s a = State (s -> (s, a))
where the "s" argument is the data type you want to thread through the computation. Also, this version of "State" doesn't have "get" and "put" actions and we don't export the constructor.
Now imagine a type
data RealWorld = RealWorld ......
This has no real definition, but notionally a value of type RealWorld holds the state of the entire universe outside the computer. Of course we can never have a value of type RealWorld, but you can imagine something like:
getChar :: RealWorld -> (RealWorld, Char)
In other words the "getChar" function takes a state of the universe before the keyboard button has been pressed, and returns the key pressed plus the state of the universe after the key has been pressed. Of course the problem is that the previous state of the world is still available to be referenced, which can't happen in reality.
But now we write this:
type IO = State RealWorld
getChar :: IO Char
Notionally, all we have done is wrap the previous version of "getChar" as a state action. But by doing this we can no longer access the "RealWorld" values because they are wrapped up inside the State monad.
So when a Haskell program wants to change a lightbulb it takes hold of the bulb and applies a "rotate" function to the RealWorld value inside IO.
For me, so far, the explanation that comes closest to tie together monads in category theory and monads in Haskell is that monads are a monid whose objects have the type a->m b. I can see that these objects are very close to an endofunctor and so the composition of such functions are related to an imperative sequence of program statements. Also functions which return IO functions are valid in pure functional code until the inner function is called from outside.
This id element is 'a -> m a' which fits in very well but the multiplication element is function composition which should be:
(>=>) :: Monad m => (a -> m b) -> (b -> m c) -> (a -> m c)
This is not quite function composition, but close enough (I think to get true function composition we need a complementary function which turns m b back into a, then we get function composition if we apply these in pairs?), I'm not quite sure how to get from that to this:
(>>=) :: Monad m => m a -> (a -> m b) -> m b
I've got a feeling I may have seen an explanation of this in all the stuff that I read, without understanding its significance the first time through, so I will do some re-reading to try to (re)find an explanation of this.
The other thing I would like to do is link together all the different category theory explanations: endofunctor+2 natural transformations, Kleisli category, a monoid whose objects are monoids and so on. For me the thing that seems to link all these explanations is that they are two level. That is, normally we treat category objects as black-boxes where we imply their properties from their outside interactions, but here there seems to be a need to go one level inside the objects to see what’s going on? We can explain monads without this but only if we accept apparently arbitrary constructions.
Martin
See this question: is chaining operations the only thing that the monad class solves?
In it, I explain my vision that we must differentiate between the Monad class and individual types that solve individual problems. The Monad class, by itself, only solve the important problem of "chaining operations with choice" and mades this solution available to types being instance of it (by means of "inheritance").
On the other hand, if a given type that solves a given problem faces the problem of "chaining operations with choice" then, it should be made an instance (inherit) of the Monad class.
The fact is that problems not get solved merely by being a Monad. It would be like saying that "wheels" solve many problems, but actually "wheels" only solve a problem, and things with wheels solve many different problems.

Resources