Algebraically interpreting polymorphism - haskell

So I understand the basic algebraic interpretation of types:
Either a b ~ a + b
(a, b) ~ a * b
a -> b ~ b^a
() ~ 1
Void ~ 0 -- from Data.Void
... and that these relations are true for concrete types, like Bool, as opposed to polymorphic types like a. I also know how to translate type signatures with polymorphic types into their concrete type representations by just translating the Church encoding according to the following isomorphism:
(forall r . (a -> r) -> r) ~ a
So if I have:
id :: forall a . a -> a
I know that it does not mean id ~ a^a, but it actually means:
id :: forall a . (() -> a) -> a
id ~ ()
~ 1
Similarly:
pair :: forall r . (a -> b -> r) -> r
pair ~ ((a, b) -> r) - > r
~ (a, b)
~ a * b
Which brings me to my question. What is the "algebraic" interpretation of this rule:
(forall r . (a -> r) -> r) ~ a
For every concrete type isomorphism I can point to an equivalent algebraic rule, such as:
(a, (b, c)) ~ ((a, b), c)
a * (b * c) = (a * b) * c
a -> (b -> c) ~ (a, b) -> c
(c^b)^a = c^(b * a)
But I don't understand the algebraic equality that is analogous to:
(forall r . (a -> r) -> r) ~ a

This is the famous Yoneda lemma for the identity functor.
Check this post for a readable introduction, and any category theory textbook for more.
Briefly, given f :: forall r. (a -> r) -> r you can apply f id to get an a, and conversely, given x :: a you can take ($x) to get forall r. (a -> r) -> r.
These operations are mutually inverse. Proof:
Obviously ($x) id == x. I will show that
($(f id)) == f,
since functions are equal when they are equal on all arguments, let's take x :: a -> r and show that
($(f id)) x == f x i.e.
x (f id) == f x.
Since f is polymorphic, it works as a natural transformation; this is the naturality diagram for f:
f_A
Hom(A, A) → A
(x.) ↓ ↓ x
Hom(A, R) → R
f_R
So x . f == f . (x.).
Plugging identity, (x . f) id == f x. QED

(Rewritten for clarity)
There seem to be two parts to your question. One is implied and is asking what the algebraic interpretation of forall is, and the other is asking about the cont/Yoneda transformation, which sdcvvc's answer already covered pretty well.
I'll try to address the algebraic interpretation of forall for you. You mention that A -> B is B^A but I'd like to take that a step further and expand it out to B * B * B * ... * B (|A| times). Although we do have exponentiation as a notation for repeated multiplication like that, there's a more flexible notation, ∏ (uppercase Pi) representing arbitrary indexed products. There are two components to a Pi: the range of values we want to multiply over, and the expression that we're multiplying out. For example, at the value level, you might express the factorial function as fact i = ∏ [1..i] (λx -> x).
Going back to the world of types, we can view the exponentiation operator in the A -> B ~ B^A correspondence as a Pi: B^A ~ ∏ A (λ_ -> B). This says that we're defining an A-ary product of Bs, such that the Bs cannot depend on the particular A we've chosen. Sure, it's equivalent to plain exponentiation, but it lets us move up to cases in which there is a dependence.
In the most general case, we get dependent types, like what you see in Agda or Coq: in Agda syntax, replicate : Bool -> ((n : Nat) -> Vec Bool n) is one possible application of a Pi type, which could be expressed more explicitly as replicate : Bool -> ∏ Nat (Vec Bool), or further as replicate : ∏ Bool (λ_ -> ∏ Nat (Vec Bool)).
Note that as you might expect from the underlying algebra, you can fuse both of the ∏s in the definition of replicate above into a single ∏ ranging over the cartesian product of the domains: ∏ Bool (\_ -> ∏ Nat (Vec Bool)) is equivalent to ∏ (Bool, Nat) (λ(_, n) -> Vec Bool n) just like it would be at the "value level". This is simply uncurrying from the perspective of type theory.
I do realize your question was about polymorphism, so I'll stop going on about dependent types, but they are relevant: forall in Haskell is roughly equivalent to a ∏ with a domain over the type (kind) of types, *. Indeed, the function-like behavior of polymorphism can be observed directly in GHC core, which types them as capital lambdas (Λ). As such, a polymorphic type like forall a. a -> a is actually just ∏ * (Λ a -> (a -> a)) (using the Λ notation now that we distinguish between types and values), which can be expanded out to the infinite product (Bool -> Bool, Int -> Int, () -> (), (Int -> Bool) -> (Int -> Bool), ...) for every possible type. Instantiation of the type variable is simply projecting out the suitable element from the *-ary product (or applying the type function).
Now, for the big piece I missed in my original version of this answer: parametricity. Parametricity can be described in several different ways, but none of the ones I know of (viewing types as relations, or (di)naturality in category theory) really has a very algebraic interpretation. For our purposes, though, it boils down to something fairly simple: you can't pattern-match on *. I know that GHC lets you do that at the type level with type families, but you can only cover a finite chunk of * when doing that, so there are necessarily always points at which your type family is undefined.
What this means, from the point of view of polymorphism, is that any type function F we write in ∏ * F must either be constant (i.e., completely ignore the type it was polymorphic over) or pass the type through unchanged. Thus, ∏ * (Λ _ -> B) is valid because it ignores its argument, and corresponds to forall a. B. The other case is something like ∏ * (Λ x -> Maybe x), which corresponds to forall a. Maybe a, which doesn't ignore the type argument, but only "passes it through". As such, a ∏ A that has an irrelevant domain A (such as when A = *) can be seen as more of an A-ary indexed intersection (picking the common elements across all instantiations of the index), rather than a product.
Crucially, at the value level, the rules of parametricity prevent any funny behavior that might suggest the types are larger than they really are. Because we don't have typecase, we can't construct a value of type forall a. B that does something different based on what a was instantiated to. Thus, although the type is technically a function * -> B, it is always a constant function, and is thus equivalent to a single value of B. Using the ∏ interpretation, it is indeed equivalent to an infinite *-ary product of Bs, but those B values must always be identical, so the infinite product is effectively as big as a single B.
Similarly, although ∏ * (Λ x -> (x -> x)) (a.k.a., forall a. a -> a) is technically equivalent to an infinite product of functions, none of those functions can inspect the type, so all are constrained to only return their input value and not do any funny business like (+1) : Int -> Int when instantiated to Int. Because there is only one (assuming a total language) function that can't inspect the type of its argument but must return a value of that same type, the infinite product is thus just as large as a single value.
Now, about your direct question on (forall r . (a -> r) -> r) ~ a. First, let's express your ~ operator more formally. It's really isomorphism, so we need two functions going back and forth, and an argument that they're inverses.
data Iso a b = Iso
{ to :: a -> b
, from :: b -> a
-- proof1 :: forall x. to (from x) == x
-- proof2 :: forall x. from (to x) == x
}
and now we express your original question in more formal terms. Your question amounts to constructing a term of the following (impredicative, so GHC has trouble with it, but we'll survive) type:
forall a. Iso (forall r. (a -> r) -> r) a
Which, using my earlier terminology, amounts to ∏ * (Λ a -> Iso (∏ * (Λ r -> ((a -> r) -> r))) a). Once again we have an infinite product that can't inspect its type argument. By handwaving, we can argue that the only possible values considering the parametricity rules (the other two proofs are respected automatically) for to and from are ($ id) and flip id.
If this feels unsatisfying, it's probably because the algebraic interpretation of forall didn't really add anything to the proof. It's really just plain old type theory, but I hope I was able to provide something that feels a little less categorical than the Yoneda form of it. It's worth noting that we don't actually need to use parametricity to write proof1 and proof2 above, though. Parametricity only enters the picture when we want to state that ($ id) and flip id are our only options for to and from (which we can't prove in Agda or Coq, for that reason).

To (attempt to) answer the actual question (which is less interesting than the answers to the broader issues raised), the question is ill formed because of a "type error"
Either ~ (+)
(,) ~ (*)
(->) b ~ flip (^)
() ~ 1
Void ~ 0
These all map types to integers, and type constructors to functions on naturals. In a sense, you have a functor from the category of types to the category of naturals. In the other direction, you "forget" stuff, since the types preserve algebraic structure while the naturals throw it away. I.e. given Either () () you can get a unique natural, but given that natural, you can get many types.
But this is different:
(forall r . (a -> r) -> r) ~ a
It maps a type to another type! It is not part of the above functor. It's just an isomorphism within the category of types. So let's give that a different symbol, <=>
Now we have
(forall r . (a -> r) -> r) <=> a
Now you note that we can not only send types to nats and arrows to arrows, but also some isomorphisms to other isomorphisms:
(a, (b, c)) <=> ((a, b), c) ~ a * (b * c) = (a * b) * c
But something subtle is going on here. In a sense, the latter isomorphism on pairs is true because the algebraic identity is true. This is to say that the "isomorphism" in the latter simply means that the two types are equivalent under the image of our functor to the nats.
The former isomorphism we need to prove directly, which is where we start to get to the underlying question -- is given our functor to the nats, what does forall r. map to? But the answer is that forall r. is neither a type, nor a meaningful arrow between types.
By introducing forall, we have moved away from first order types. There's no reason to expect that forall should fit in our above Functor, and indeed, it doesn't.
So we can explore, as others have above, why the isomorphism holds (which is itself very interesting) -- but in doing so we've abandoned the algebraic core of the question. A question which can be answered, I think, is, given the category of higher-order types and constructors as arrows between them, what is there meaningful Functor to?
Edit:
So now I have another approach which shows why adding polymorphism makes things go nuts. We start by asking a simpler question -- does a given polymorphic type have zero or more than zero inhabitants? This is the type inhabitation problem, and winds up being, via Curry-Howard, a problem in modified realizability, since it's the same thing as asking if a formula in some logic is realizable in an appropriate computational model. Now as that page explains, this is decidable in the simply typed lambda calculus but is PSPACE-complete. But once we move to anything more complicated, by adding polymorphism for example and going to System F, then it goes to undecidable!
So, if we can't decide if an arbitrary type is inhabited at all, then we clearly can't decide how many inhabitants it has!

It's an interesting question. I don't have a full answer, but this was too long for a comment.
The type signature (forall r. (a -> r) -> r) can be expressed as me saying
For any type r that you care to name, if you give me a function that takes a and produces an r, then I will give you back an r.
Now, this has to work for any type r, but it can be a specific type a. So the way for me to pull of this neat trick is to have an a sitting around somewhere, that I feed to the function (which produces an r for me) and then I hand that r back to you.
But if I have an a sitting around, I could give it to you:
If you give me a 1, I'll give you an a.
which corresponds to the type signature 1 -> a or simply a. By this informal argument we have
(forall r. (a -> r) -> r) ~ a
The next step would be to generate the corresponding algebraic expression, but I'm not clear on how the algebraic quantities interact with the universal quantification. We may need to wait for an expert!

A few links to the nLab:
Universal quantifier, corresponds to dependent product.
Existential quantifier, corresponds to dependent sum (dependent coproduct).
Thus, in settings of category theory:
Type | Modeled¹ as | In category
-------------------+---------------------------+-------------
Unit | Terminal object | CCC
Bottom | Initial object |
Record | Product |
Union | Sum (coproduct) |
Function | Exponential |
-------------------+---------------------------+-------------
Dependent product² | Right adjoint to pullback | LCCC
Dependent sum | Left adjoint to pullback |
¹) in appropriate category ─ CCC for total and non-polymorphic subset of Haskell (link), CPO for non-total traits of Haskell (link), LCCC for dependently typed languages.
²) forall quantification is a special case of dependent product:
∀(x :: *). y[x] ~ ∏(x : Set)y[x]
where Set is the universe of all small types.

Related

Is anything generative?

In the paper "Higher-order Type-level Programming in Haskell", an f :: Type -> Type is defined to be "generative" in the following way:
Definition (Generativity). f is generative ⇔ f a ~ g b ⇒ f ~ g
I'm going to explicitly write out the intended quantification as I understand it:
type IsGenerative :: (Type -> Type) -> Constraint
class (forall g a b. f a ~ g b => f ~ g) => IsGenerative f
Conversely, in words:
F :: Type -> Type is generative if there is no G :: Type -> Type besides F such that there exist A, B :: Type for which F A ~ G B
The paper goes on to make a statement about the generativity of unsaturated type-families (they're not generative). To my understanding, in order to be able to form the proposition of whether or not unsaturated type-families are generative, the variables f, g :: Type -> Type should range over type-families as well as type constructors. Note that this means the ~ in f ~ g must represent some more abstract sense of definitional equality than GHC's (~) :: (Type -> Type) -> (Type -> Type) -> Constraint, which cannot be applied to unsaturated type families.
Now here's the problem: it doesn't seem like anything is generative. You'd expect that a datatype constructor like Maybe :: Type -> Type would be generative, but I can easily construct a distinct type family G :: Type -> Type and A, B :: Type for which F A ~ G B (despite F /~ G).
type G :: Type -> Type
type family G a
where
G _ = Maybe Int
data Dict c
where
Dict :: c => Dict c
lhs :: Dict (Maybe Int ~ G String)
lhs = Dict
As I said before, we can't actually form the proposition Maybe ~ G within GHC (because G is not saturated), but if F ~ G is taken to mean "F is definitionally equal to G", it's pretty obvious that Maybe /~ G. So it seems like Maybe is not actually generative in the sense defined in the paper. And it seems to me that any data/newtype is susceptible to a similar sequence of reasoning.
So where am I going wrong?
Is my assumption that F, G are allowed to range over type-families as well as type constructors justified? If not, generativity seems like a rather trivial property: "we cannot form the proposition of whether type families are generative, so type families are not generative".
Am I misunderstanding how the variables are quantified in the statement of generativity?
Are there actually any type-level expressions f :: Type -> Type that satisfy the formal property of being generative?
Eh, you're overthinking it. The ~ really is the one from GHC. If you prefer, replace the claim "unsaturated type families are not generative" with "if we expanded ~ to allow unsaturated type families1, then they would not be guaranteed generative2". This latter fact is (part of) the reason we don't bother expanding ~ to allow unsaturated type families -- it would be much less useful for them than it is for other type expressions.
If they were not precise about this divide in the paper, it's just a bit of slightly sloppy writing, such as we've all done at one point or another.
1 You can probably deal with the G/Maybe situation by simply allowing type families on one side of ~ but not the other.
2 In fact, I believe it's even stronger: they would be guaranteed not to be generative.

SystemT Compiler and dealing with Infinite Types in Haskell

I'm following this blog post: http://semantic-domain.blogspot.com/2012/12/total-functional-programming-in-partial.html
It shows a small OCaml compiler program for System T (a simple total functional language).
The entire pipeline takes OCaml syntax (via Camlp4 metaprogramming) transforms them to OCaml AST that is translated to SystemT Lambda Calculus (see: module Term) and then finally SystemT Combinator Calculus (see:
module Goedel). The final step is also wrapped with OCaml metaprogramming Ast.expr type.
I'm attempting to translate it to Haskell without the use of Template Haskell.
For the SystemT Combinator form, I've written it as
{-# LANGUAGE GADTs #-}
data TNat = Zero | Succ TNat
data THom a b where
Id :: THom a a
Unit :: THom a ()
ZeroH :: THom () TNat
SuccH :: THom TNat TNat
Compose :: THom a b -> THom b c -> THom a c
Pair :: THom a b -> THom a c -> THom a (b, c)
Fst :: THom (a, b) a
Snd :: THom (a, b) b
Curry :: THom (a, b) c -> THom a (b -> c)
Eval :: THom ((a -> b), a) b -- (A = B) * A -> B
Iter :: THom a b -> THom (a, b) b -> THom (a, TNat) b
Note that Compose is forward composition, which differs from (.).
During the translation of SystemT Lambda Calculus to SystemT Combinator Calculus, the Elaborate.synth function tries to convert SystemT Lambda calculus variables into series of composed projection expressions (related to De Brujin Indices). This is because combinator calculus doesn't have variables/variable names. This is done with the Elaborate.lookup which uses the Quote.find function.
The problem is that with my encoding of the combinator calculus as the GADT THom a b. Translating the Quote.find function:
let rec find x = function
| [] -> raise Not_found
| (x', t) :: ctx' when x = x' -> <:expr< Goedel.snd >>
| (x', t) :: ctx' -> <:expr< Goedel.compose Goedel.fst $find x ctx'$ >>
Into Haskell:
find :: TVar -> Context -> _
find tvar [] = error "Not Found"
find tvar ((tvar', ttype):ctx)
| tvar == tvar' = Snd
| otherwise = Compose Fst (find tvar ctx)
Results in an infinite type error.
• Occurs check: cannot construct the infinite type: a ~ (a, c)
Expected type: THom (a, c) c
Actual type: THom ((a, c), c) c
The problem stems from the fact that using Compose and Fst and Snd from the THom a b GADT can result in infinite variations of the type signature.
The article doesn't have this problem because it appears to use the Ast.expr OCaml thing to wrap the underlying expressions.
I'm not sure how to resolve in Haskell. Should I be using an existentially quantified type like
data TExpr = forall a b. TExpr (THom a b)
Or some sort of type-level Fix to adapt the infinite type problem. However I'm unsure how this changes the find or lookup functions.
This answer will have to be a bit high-level, because there are three entirely different families of possible designs to fix that problem. What you’re doing seems closer to approach three, but the approaches are ordered by increasing complexity.
The approach in the original post
Translating the original post requires Template Haskell and partiality; find would return a Q.Exp representing some Hom a b, avoiding this problem just like the original post. Just like in the original post, a type error in the original code would be caught when typechecking the output of all the Template Haskell functions. So, type errors are still caught before runtime, but you will still need to write tests to find cases where your macros output ill-typed expressions. One can give stronger guarantees, at the cost of a significant increase in complexity.
Dependent typing/GADTs in input and output
If you want to diverge from the post, one alternative is to use “dependent typing” throughout and make the input dependently-typed. I use the term loosely to include both actually dependently-typed languages, actual Dependent Haskell (when it lands), and ways to fake dependent typing in Haskell today (via GADTs, singletons, and what not).
However, you lose the ability to write your own typechecker and choose which type system to use; typically, you end up embedding a simply-typed lambda calculus, and can simulate polymorphism via polymorphic Haskell functions that can generate terms at a given type. This is easier in dependently-typed languages, but possible at all in Haskell.
But honestly, in this road it’s easier to use higher-order abstract syntax and Haskell functions, with something like:
data Exp a where
Abs :: (Exp a -> Exp b) -> Exp (a -> b)
App :: Exp (a -> b) -> Exp a -> Exp b
Var :: String -> Exp a —- only use for free variables
exampleId :: Exp (a -> a)
exampleId = Abs (\x -> x)
If you can use this approach (details omitted here), you get high assurance from GADTs with limited complexity. However, this approach is too inflexible for many scenarios, for instance because the typing contexts are only visible to the Haskell compiler and not in your types or terms.
From untyped to typed terms
A third option is go dependently-typed and to still make your program turn weakly-typed input to strongly typed output. In this case, your typechecker overall transforms terms of some type Expr into terms of a GADT TExp gamma t, Hom a b, or such. Since you don’t know statically what gamma and t (or a and b) are, you’ll indeed need some sort of existential.
But a plain existential is too weak: to build bigger well-typed expression, you’ll need to check that the produced types allow it. For instance, to build a TExpr containing a Compose expression out of two smaller TExpr, you'll need to check (at runtime) that their types match. And with a plain existential, you can't. So you’ll need to have a representation of types also at the value level.
What's more existentials are annoying to deal with (as usual), since you can’t ever expose the hidden type variables in your return type, or project those out (unlike dependent records/sigma-types).
I am honestly not entirely sure this could eventually be made to work. Here is a possible partial sketch in Haskell, up to one case of find.
data Type t where
VNat :: Type Nat
VString :: Type String
VArrow :: Type a -> Type b -> Type (a -> b)
VPair :: Type a -> Type b -> Type (a, b)
VUnit :: Type ()
data SomeType = forall t. SomeType (Type t)
data SomeHom = forall a b. SomeHom (Type a) (Type b) (THom a b)
type Context = [(TVar, SomeType)]
getType :: Context -> SomeType
getType [] = SomeType VUnit
getType ((_, SomeType ttyp) :: gamma) =
case getType gamma of
SomeType ctxT -> SomeType (VPair ttyp
find :: TVar -> Context -> SomeHom
find tvar ((tvar’, ttyp) :: gamma)
| tvar == tvar’ =
case (ttyp, getType gamma) of
(SomeType t, SomeType ctxT) ->
SomeHom (VPair t ctxT) t Fst

What's the difference between parametric polymorphism and higher-kinded types?

I am pretty sure they are not the same. However, I am bogged down by the
common notion that "Rust does not support" higher-kinded types (HKT), but
instead offers parametric polymorphism. I tried to get my head around that and understand the difference between these, but got just more and more entangled.
To my understanding, there are higher-kinded types in Rust, at least the basics. Using the "*"-notation, a HKT does have a kind of e.g. * -> *.
For example, Maybe is of kind * -> * and could be implemented like this in Haskell.
data Maybe a = Just a | Nothing
Here,
Maybe is a type constructor and needs to be applied to a concrete type
to become a concrete type of kind "*".
Just a and Nothing are data constructors.
In textbooks about Haskell, this is often used as an example for a higher-kinded type. However, in Rust it can simply be implemented as an enum, which after all is a sum type:
enum Maybe<T> {
Just(T),
Nothing,
}
Where is the difference? To my understanding this is a
perfectly fine example of a higher-kinded type.
If in Haskell this is used as a textbook example of HKTs, why is it
said that Rust doesn't have HKT? Doesn't the Maybe enum qualify as a
HKT?
Should it rather be said that Rust doesn't fully support HKT?
What's the fundamental difference between HKT and parametric polymorphism?
This confusion continues when looking at functions, I can write a parametric
function that takes a Maybe, and to my understanding a HKT as a function
argument.
fn do_something<T>(input: Maybe<T>) {
// implementation
}
again, in Haskell that would be something like
do_something :: Maybe a -> ()
do_something :: Maybe a -> ()
do_something _ = ()
which leads to the fourth question.
Where exactly does the support for higher-kinded types end? Whats the
minimal example to make Rust's type system fail to express HKT?
Related Questions:
I went through a lot of questions related to the topic (including links they have to blogposts, etc.) but I could not find an answer to my main questions (1 and 2).
In Haskell, are "higher-kinded types" *really* types? Or do they merely denote collections of *concrete* types and nothing more?
Generic struct over a generic type without type parameter
Higher Kinded Types in Scala
What types of problems helps "higher-kinded polymorphism" solve better?
Abstract Data Types vs. Parametric Polymorphism in Haskell
Update
Thank you for the many good answers which are all very detailed and helped a lot. I decided to accept Andreas Rossberg's answer since his explanation helped me the most to get on the right track. Especially the part about terminology.
I was really locked in the cycle of thinking that everything of kind * -> * ... -> * is higher-kinded. The explanation that stressed the difference between * -> * -> * and (* -> *) -> * was crucial for me.
Some terminology:
The kind * is sometimes called ground. You can think of it as 0th order.
Any kind of the form * -> * -> ... -> * with at least one arrow is first-order.
A higher-order kind is one that has a "nested arrow on the left", e.g., (* -> *) -> *.
The order essentially is the depth of left-side nesting of arrows, e.g., (* -> *) -> * is second-order, ((* -> *) -> *) -> * is third-order, etc. (FWIW, the same notion applies to types themselves: a second-order function is one whose type has e.g. the form (A -> B) -> C.)
Types of non-ground kind (order > 0) are also called type constructors (and some literature only refers to types of ground kind as "types"). A higher-kinded type (constructor) is one whose kind is higher-order (order > 1).
Consequently, a higher-kinded type is one that takes an argument of non-ground kind. That would require type variables of non-ground kind, which are not supported in many languages. Examples in Haskell:
type Ground = Int
type FirstOrder a = Maybe a -- a is ground
type SecondOrder c = c Int -- c is a first-order constructor
type ThirdOrder c = c Maybe -- c is second-order
The latter two are higher-kinded.
Likewise, higher-kinded polymorphism describes the presence of (parametrically) polymorphic values that abstract over types that are not ground. Again, few languages support that. Example:
f : forall c. c Int -> c Int -- c is a constructor
The statement that Rust supports parametric polymorphism "instead" of higher-kinded types does not make sense. Both are different dimensions of parameterisation that complement each other. And when you combine both you have higher-kinded polymorphism.
A simple example of what Rust can't do is something like Haskell's Functor class.
class Functor f where
fmap :: (a -> b) -> f a -> f b
-- a couple examples:
instance Functor Maybe where
-- fmap :: (a -> b) -> Maybe a -> Maybe b
fmap _ Nothing = Nothing
fmap f (Just x) = Just (f x)
instance Functor [] where
-- fmap :: (a -> b) -> [a] -> [b]
fmap _ [] = []
fmap f (x:xs) = f x : fmap f xs
Note that the instances are defined on the type constructor, Maybe or [], instead of the fully-applied type Maybe a or [a].
This isn't just a parlor trick. It has a strong interaction with parametric polymorphism. Since the type variables a and b in the type fmap are not constrained by the class definition, instances of Functor cannot change their behavior based on them. This is an incredibly strong property in reasoning about code from types, and where a lot of where the strength of Haskell's type system comes from.
It has one other property - you can write code that's abstract in higher-kinded type variables. Here's a couple examples:
focusFirst :: Functor f => (a -> f b) -> (a, c) -> f (b, c)
focusFirst f (a, c) = fmap (\x -> (x, c)) (f a)
focusSecond :: Functor f => (a -> f b) -> (c, a) -> f (c, b)
focusSecond f (c, a) = fmap (\x -> (c, x)) (f a)
I admit, those types are beginning to look like abstract nonsense. But they turn out to be really practical when you have a couple helpers that take advantage of the higher-kinded abstraction.
newtype Identity a = Identity { runIdentity :: a }
instance Functor Identity where
-- fmap :: (a -> b) -> Identity a -> Identity b
fmap f (Identity x) = Identity (f x)
newtype Const c b = Const { getConst :: c }
instance Functor (Const c) where
-- fmap :: (a -> b) -> Const c a -> Const c b
fmap _ (Const c) = Const c
set :: ((a -> Identity b) -> s -> Identity t) -> b -> s -> t
set f b s = runIdentity (f (\_ -> Identity b) s)
get :: ((a -> Const a b) -> s -> Const a t) -> s -> a
get f s = getConst (f (\x -> Const x) s)
(If I made any mistakes in there, can someone just fix them? I'm reimplementing the most basic starting point of lens from memory without a compiler.)
The functions focusFirst and focusSecond can be passed as the first argument to either get or set, because the type variable f in their types can be unified with the more concrete types in get and set. Being able to abstract over the higher-kinded type variable f allows functions of a particular shape can be used both as setters and getters in arbitrary data types. This is one of the two core insights that led to the lens library. It couldn't exist without this kind of abstraction.
(For what it's worth, the other key insight is that defining lenses as a function like that allows composition of lenses to be simple function composition.)
So no, there's more to it than just being able to accept a type variable. The important part is being able to use type variables that correspond to type constructors, rather than some concrete (if unknown) type.
I'm going to resume it: a higher-kinded type is just a type-level higher-order function.
But take a minute:
Consider monad transformers:
newtype StateT s m a :: * -> (* -> *) -> * -> *
Here,
- s is the desired type of the state
- m is a functor, another monad that StateT will wrap
- a is the return type of an expression of type StateT s m
What is the higher-kinded type?
m :: (* -> *)
Because takes a type of kind * and returns a kind of type *.
It's like a function on types, that is, a type constructor of kind
* -> *
In languages like Java, you can't do
class ClassExample<T, a> {
T<a> function()
}
In Haskell T would have kind *->*, but a Java type (i.e. class) cannot have a type parameter of that kind, a higher-kinded type.
Also, if you don't know, in basic Haskell an expression must have a type that has kind *, that is, a "concrete type". Any other type like * -> *.
For instance, you can't create an expression of type Maybe. It has to be types applied to an argument like Maybe Int, Maybe String, etc. In other words, fully applied type constructors.
Parametric polymorphism just refers to the property that the function cannot make use of any particular feature of a type (or kind) in its definition; it is a complete blackbox. The standard example is length :: [a] -> Int, which only works with the structure of the list, not the particular values stored in the list.
The standard example of HKT is the Functor class, where fmap :: (a -> b) -> f a -> f b. Unlike length, where a has kind *, f has kind * -> *. fmap also exhibits parametric polymorphism, because fmap cannot make use of any property of either a or b in its definition.
fmap exhibits ad hoc polymorphism as well, because the definition can be tailored to the specific type constructor f for which it is defined. That is, there are separate definitions of fmap for f ~ [], f ~ Maybe, etc. The difference is that f is "declared" as part of the typeclass definition, rather than just being part of the definition of fmap. (Indeed, typeclasses were added to support some degree of ad hoc polymorphism. Without type classes, only parametric polymorphism exists. You can write a function that supports one concrete type or any concrete type, but not some smaller collection in between.)

Confusing about Haskell type inference

I have just started learning Haskell. As Haskell is static typed and has polymorphic type inference, the type of the identity function is
id :: a -> a
suggesting id can take any type as its parameter and return itself. It works fine when I try:
a = (id 1, id True)
I just suppose that at compile time, the first id is Num a :: a -> a, and the second id is Bool -> Bool. When I try the following code, it gives an error:
foo f a b = (f a, f b)
result = foo id 1 True
It shows the type of a must be the same type of b, since it works fine with
result = foo id 1 2
But is that true that the type of id's parameter can be polymorphic, so that a and b can be different type?
All right, this is a weird spooky corner of Haskell's type system. The problem here is that there are two ways to type inference your function foo.
-- rank 1
foo :: forall a b. (a -> b) -> a -> a -> (b, b)
foo f a b = (f a, f b)
-- rank 2
foo' :: (forall a. a -> a) -> a -> b -> (a, b)
foo' f a b = (f a, f b)
The second type is the one you want, but the first type is the one you're getting. The second type, as amalloy pointed out, is a rank-2 type (we're going to ignore what the two means but read the introduction in "Practical type inference for arbitrary-rank types" if you want a good explanation of ranks – don't be put off by the academic nature of the PDF file as the beginning is accessibly and clearly written).
We'll defer the definition of higher-ranked types for now and just say that the problem is that GHC is unable to infer the rank-2 type. Quote the paper:
Complete type inference is known to be undecidable for higher-rank (impredicative) type systems, but in practice programmers are more than willing to add type annotations to guide the type inference engine, and to document their code....
Kfoury and Wells show that typeability is decidable for rank ≤ 2, and undecidable for all ranks ≥ 3 (Kfoury & Wells, 1994). For the rank-2 fragment, the same paper gives a type inference algorithm. This inference algorithm is somewhat subtle, does not interact well with user-supplied type annotations, and has not, to our knowledge, been implemented in a production compiler.
Undecidable means there can be no algorithm that always leads to a correct yes-or-no decision. So there you have it: impossible to infer a rank-3-or-higher type, and it's too gosh-darn-hard to infer the rank-2 type.
Now, back to rank 2. The (forall a. a -> a) is what makes it rank-2. There's already an excellent Stack Overflow question about what the forall keyword means so I'll refer you to that, but basically it means you're able to call f a and f b in the expression (f a, f b) while having a and b be different types, which is what you wanted in the first place, before all this hot mess.
One last thing: The reason you don't normally see foralls in GHCi is that any foralls on the very outer scope are left off. So forall a b. (a -> b) -> a -> a -> (b, b) is equivalent to (a -> b) -> a -> a -> (b, b).
Overall this is a pain point of the language that's poorly explained.
(Hat tip to #amalloy in the comments.)

Monad theory and Haskell

Most tutorials seem to give a lot of examples of monads (IO, state, list and so on) and then expect the reader to be able to abstract the overall principle and then they mention category theory. I don't tend to learn very well by trying generalise from examples and I would like to understand from a theoretical point of view why this pattern is so important.
Judging from this thread:
Can anyone explain Monads?
this is a common problem, and I've tried looking at most of the tutorials suggested (except the Brian Beck videos which won't play on my linux machine):
Does anyone know of a tutorial that starts from category theory and explains IO, state, list monads in those terms? the following is my unsuccessful attempt to do so:
As I understand it a monad consists of a triple: an endo-functor and two natural transformations.
The functor is usually shown with the type:
(a -> b) -> (m a -> m b)
I included the second bracket just to emphasise the symmetry.
But, this is an endofunctor, so shouldn't the domain and codomain be the same like this?:
(a -> b) -> (a -> b)
I think the answer is that the domain and codomain both have a type of:
(a -> b) | (m a -> m b) | (m m a -> m m b) and so on ...
But I'm not really sure if that works or fits in with the definition of the functor given?
When we move on to the natural transformation it gets even worse. If I understand correctly a natural transformation is a second order functor (with certain rules) that is a functor from one functor to another one. So since we have defined the functor above the general type of the natural transformations would be:
((a -> b) -> (m a -> m b)) -> ((a -> b) -> (m a -> m b))
But the actual natural transformations we are using have type:
a -> m a
m a -> (a ->m b) -> m b
Are these subsets of the general form above? and why are they natural transformations?
Martin
A quick disclaimer: I'm a little shaky on category theory in general, while I get the impression you have at least some familiarity with it. Hopefully I won't make too much of a hash of this...
Does anyone know of a tutorial that starts from category theory and explains IO, state, list monads in those terms?
First of all, ignore IO for now, it's full of dark magic. It works as a model of imperative computations for the same reasons that State works for modelling stateful computations, but unlike the latter IO is a black box with no way to deduce the monadic structure from the outside.
The functor is usually shown with the type: (a -> b) -> (m a -> m b) I included the second bracket just to emphasise the symmetry.
But, this is an endofunctor, so shouldn't the domain and codomain be the same like this?:
I suspect you are misinterpreting how type variables in Haskell relate to the category theory concepts.
First of all, yes, that specifies an endofunctor, on the category of Haskell types. A type variable such as a is not anything in this category, however; it's a variable that is (implicitly) universally quantified over all objects in the category. Thus, the type (a -> b) -> (a -> b) describes only endofunctors that map every object to itself.
Type constructors describe an injective function on objects, where the elements of the constructor's codomain cannot be described by any means except as an application of the type constructor. Even if two type constructors produce isomorphic results, the resulting types remain distinct. Note that type constructors are not, in the general case, functors.
The type variable m in the functor signature, then, represents a one-argument type constructor. Out of context this would normally be read as universal quantification, but that's incorrect in this case since no such function can exist. Rather, the type class definition binds m and allows the definition of such functions for specific type constructors.
The resulting function, then, says that for any type constructor m which has fmap defined, for any two objects a and b and a morphism between them, we can find a morphism between the types given by applying m to a and b.
Note that while the above does, of course, define an endofunctor on Hask, it is not even remotely general enough to describe all such endofunctors.
But the actual natural transformations we are using have type:
a -> m a
m a -> (a ->m b) -> m b
Are these subsets of the general form above? and why are they natural transformations?
Well, no, they aren't. A natural transformation is roughly a function (not a functor) between functors. The two natural transformations that specify a monad M look like I -> M where I is the identity functor, and M ∘ M -> M where ∘ is functor composition. In Haskell, we have no good way of working directly with either a true identity functor or with functor composition. Instead, we discard the identity functor to get just (Functor m) => a -> m a for the first, and write out nested type constructor application as (Functor m) => m (m a) -> m a for the second.
The first of these is obviously return; the second is a function called join, which is not part of the type class. However, join can be written in terms of (>>=), and the latter is more often useful in day-to-day programming.
As far as specific monads go, if you want a more mathematical description, here's a quick sketch of an example:
For some fixed type S, consider two functors F and G where F(x) = (S, x) and G(x) = S -> x (It should hopefully be obvious that these are indeed valid functors).
These functors are also adjoints; consider natural transformations unit :: x -> G (F x) and counit :: F (G x) -> x. Expanding the definitions gives us unit :: x -> (S -> (S, x)) and counit :: (S, S -> x) -> x. The types suggest uncurried function application and tuple construction; feel free to verify that those work as expected.
An adjunction gives rise to a monad by composition of the functors, so taking G ∘ F and expanding the definition, we get G (F x) = S -> (S, x), which is the definition of the State monad. The unit for the adjunction is of course return; and you should be able to use counit to define join.
This page does exactly that. I think your main confusion is that the class doesn't really make the Type a functor, but it defines a functor from the category of Haskell types into the category of that type.
Following the notation of the link, assuming F is a Haskell Functor, it means that there is a functor from the category of Hask to the category of F.
Roughly speaking, Haskell does its category theory all in just one category, whose objects are Haskell types and whose arrows are functions between these types. It's definitely not a general-purpose language for modelling category theory.
A (mathematical) functor is an operation turning things in one category into things in another, possibly entirely different, category. An endofunctor is then a functor which happens to have the same source and target categories. In Haskell, a functor is an operation turning things in the category of Haskell types into other things also in the category of Haskell types, so it is always an endofunctor.
[If you're following the mathematical literature, technically, the operation '(a->b)->(m a -> m b)' is just the arrow part of the endofunctor m, and 'm' is the object part]
When Haskellers talk about working 'in a monad' they really mean working in the Kleisli category of the monad. The Kleisli category of a monad is a thoroughly confusing beast at first, and normally needs at least two colours of ink to give a good explanation, so take the following attempt for what it is and check out some references (unfortunately Wikipedia is useless here for all but the straight definitions).
Suppose you have a monad 'm' on the category C of Haskell types. Its Kleisli category Kl(m) has the same objects as C, namely Haskell types, but an arrow a ~(f)~> b in Kl(m) is an arrow a -(f)-> mb in C. (I've used a squiggly line in my Kleisli arrow to distinguish the two). To reiterate: the objects and arrows of the Kl(C) are also objects and arrows of C but the arrows point to different objects in Kl(C) than in C. If this doesn't strike you as odd, read it again more carefully!
Concretely, consider the Maybe monad. Its Kleisli category is just the collection of Haskell types, and its arrows a ~(f)~> b are functions a -(f)-> Maybe b. Or consider the (State s) monad whose arrows a ~(f)~> b are functions a -(f)-> (State s b) == a -(f)-> (s->(s,b)). In any case, you're always writing a squiggly arrow as a shorthand for doing something to the type of the codomain of your functions.
[Note that State is not a monad, because the kind of State is * -> * -> *, so you need to supply one of the type parameters to turn it into a mathematical monad.]
So far so good, hopefully, but suppose you want to compose arrows a ~(f)~> b and b ~(g)~> c. These are really Haskell functions a -(f)-> mb and b -(g)-> mc which you cannot compose because the types don't match. The mathematical solution is to use the 'multiplication' natural transformation u:mm->m of the monad as follows: a ~(f)~> b ~(g)~> c == a -(f)-> mb -(mg)-> mmc -(u_c)-> mc to get an arrow a->mc which is a Kleisli arrow a ~(f;g)~> c as required.
Perhaps a concrete example helps here. In the Maybe monad, you cannot compose functions f : a -> Maybe b and g : b -> Maybe c directly, but by lifting g to
Maybe_g :: Maybe b -> Maybe (Maybe c)
Maybe_g Nothing = Nothing
Maybe_g (Just a) = Just (g a)
and using the 'obvious'
u :: Maybe (Maybe c) -> Maybe c
u Nothing = Nothing
u (Just Nothing) = Nothing
u (Just (Just c)) = Just c
you can form the composition u . Maybe_g . f which is the function a -> Maybe c that you wanted.
In the (State s) monad, it's similar but messier: Given two monadic functions a ~(f)~> b and b ~(g)~> c which are really a -(f)-> (s->(s,b)) and b -(g)-> (s->(s,c)) under the hood, you compose them by lifting g into
State_s_g :: (s->(s,b)) -> (s->(s,(s->(s,c))))
State_s_g p s1 = let (s2, b) = p s1 in (s2, g b)
then you apply the 'multiplication' natural transformation u, which is
u :: (s->(s,(s->(s,c)))) -> (s->(s,c))
u p1 s1 = let (s2, p2) = p1 s1 in p2 s2
which (sort of) plugs the final state of f into the initial state of g.
In Haskell, this turns out to be a bit of an unnatural way to work so instead there's the (>>=) function which basically does the same thing as u but in a way that makes it easier to implement and use. This is important: (>>=) is not the natural transformation 'u'. You can define each in terms of the other, so they're equivalent, but they're not the same thing. The Haskell version of 'u' is written join.
The other thing missing from this definition of Kleisli categories is the identity on each object: a ~(1_a)~> a which is really a -(n_a)-> ma where n is the 'unit' natural transformation. This is written return in Haskell, and doesn't seem to cause as much confusion.
I learned category theory before I came to Haskell, and I too have had difficulty with the mismatch between what mathematicians call a monad and what they look like in Haskell. It's no easier from the other direction!
Not sure I understand what was the question but yes, you are right, monad in Haskell is defined as a triple:
m :: * -> * -- this is endofunctor from haskell types to haskell types!
return :: a -> m a
(>>=) :: m a -> (a -> m b) -> m b
but common definition from category theory is another triple:
m :: * -> *
return :: a -> m a
join :: m (m a) -> m a
It is slightly confusing but it's not so hard to show that these two definitions are equal.
To do that we need to define join in terms of (>>=) (and vice versa).
First step:
join :: m (m a) -> m a
join x = ?
This gives us x :: m (m a).
All we can do with something that have type m _ is to aply (>>=) to it:
(x >>=) :: (m a -> m b) -> m b
Now we need something as a second argument for (>>=), and also,
from the type of join we have constraint (x >>= y) :: ma.
So y here will have type y :: ma -> ma and id :: a -> a fits it very well:
join x = x >>= id
The other way
(>>=) :: ma -> (a -> mb) -> m b
(>>=) x y = ?
Where x :: m a and y :: a -> m b.
To get m b from x and y we need something of type a.
Unfortunately, we can't extract a from m a. But we can substitute it for something else (remember, monad is a functor also):
fmap :: (a -> b) -> m a -> m b
fmap y x :: m (m b)
And it's perfectly fits as argument for join: (>>=) x y = join (fmap y x).
The best way to look at monads and computational effects is to start with where Haskell got the notion of monads for computational effects from, and then look at Haskell after you understand that. See this paper in particular: Notions of Computation and Monads, by E. Moggi.
See also Moggi's earlier paper which shows how monads work for the lambda calculus alone: http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.26.2787
The fact that monads capture substitution, among other things (http://blog.sigfpe.com/2009/12/where-do-monads-come-from.html), and substitution is key to the lambda calculus, should give a good clue as to why they have so much expressive power.
While monads originally came from category theory, this doesn't mean that category theory is the only abstract context in which you can view them. A different viewpoint is given by operational semantics. For an introduction, have a look at my Operational Monad Tutorial.
One way to look at IO is to consider it as a strange kind of state monad. Remember that the state monad looks like:
data State s a = State (s -> (s, a))
where the "s" argument is the data type you want to thread through the computation. Also, this version of "State" doesn't have "get" and "put" actions and we don't export the constructor.
Now imagine a type
data RealWorld = RealWorld ......
This has no real definition, but notionally a value of type RealWorld holds the state of the entire universe outside the computer. Of course we can never have a value of type RealWorld, but you can imagine something like:
getChar :: RealWorld -> (RealWorld, Char)
In other words the "getChar" function takes a state of the universe before the keyboard button has been pressed, and returns the key pressed plus the state of the universe after the key has been pressed. Of course the problem is that the previous state of the world is still available to be referenced, which can't happen in reality.
But now we write this:
type IO = State RealWorld
getChar :: IO Char
Notionally, all we have done is wrap the previous version of "getChar" as a state action. But by doing this we can no longer access the "RealWorld" values because they are wrapped up inside the State monad.
So when a Haskell program wants to change a lightbulb it takes hold of the bulb and applies a "rotate" function to the RealWorld value inside IO.
For me, so far, the explanation that comes closest to tie together monads in category theory and monads in Haskell is that monads are a monid whose objects have the type a->m b. I can see that these objects are very close to an endofunctor and so the composition of such functions are related to an imperative sequence of program statements. Also functions which return IO functions are valid in pure functional code until the inner function is called from outside.
This id element is 'a -> m a' which fits in very well but the multiplication element is function composition which should be:
(>=>) :: Monad m => (a -> m b) -> (b -> m c) -> (a -> m c)
This is not quite function composition, but close enough (I think to get true function composition we need a complementary function which turns m b back into a, then we get function composition if we apply these in pairs?), I'm not quite sure how to get from that to this:
(>>=) :: Monad m => m a -> (a -> m b) -> m b
I've got a feeling I may have seen an explanation of this in all the stuff that I read, without understanding its significance the first time through, so I will do some re-reading to try to (re)find an explanation of this.
The other thing I would like to do is link together all the different category theory explanations: endofunctor+2 natural transformations, Kleisli category, a monoid whose objects are monoids and so on. For me the thing that seems to link all these explanations is that they are two level. That is, normally we treat category objects as black-boxes where we imply their properties from their outside interactions, but here there seems to be a need to go one level inside the objects to see what’s going on? We can explain monads without this but only if we accept apparently arbitrary constructions.
Martin
See this question: is chaining operations the only thing that the monad class solves?
In it, I explain my vision that we must differentiate between the Monad class and individual types that solve individual problems. The Monad class, by itself, only solve the important problem of "chaining operations with choice" and mades this solution available to types being instance of it (by means of "inheritance").
On the other hand, if a given type that solves a given problem faces the problem of "chaining operations with choice" then, it should be made an instance (inherit) of the Monad class.
The fact is that problems not get solved merely by being a Monad. It would be like saying that "wheels" solve many problems, but actually "wheels" only solve a problem, and things with wheels solve many different problems.

Resources