SystemT Compiler and dealing with Infinite Types in Haskell - haskell

I'm following this blog post: http://semantic-domain.blogspot.com/2012/12/total-functional-programming-in-partial.html
It shows a small OCaml compiler program for System T (a simple total functional language).
The entire pipeline takes OCaml syntax (via Camlp4 metaprogramming) transforms them to OCaml AST that is translated to SystemT Lambda Calculus (see: module Term) and then finally SystemT Combinator Calculus (see:
module Goedel). The final step is also wrapped with OCaml metaprogramming Ast.expr type.
I'm attempting to translate it to Haskell without the use of Template Haskell.
For the SystemT Combinator form, I've written it as
{-# LANGUAGE GADTs #-}
data TNat = Zero | Succ TNat
data THom a b where
Id :: THom a a
Unit :: THom a ()
ZeroH :: THom () TNat
SuccH :: THom TNat TNat
Compose :: THom a b -> THom b c -> THom a c
Pair :: THom a b -> THom a c -> THom a (b, c)
Fst :: THom (a, b) a
Snd :: THom (a, b) b
Curry :: THom (a, b) c -> THom a (b -> c)
Eval :: THom ((a -> b), a) b -- (A = B) * A -> B
Iter :: THom a b -> THom (a, b) b -> THom (a, TNat) b
Note that Compose is forward composition, which differs from (.).
During the translation of SystemT Lambda Calculus to SystemT Combinator Calculus, the Elaborate.synth function tries to convert SystemT Lambda calculus variables into series of composed projection expressions (related to De Brujin Indices). This is because combinator calculus doesn't have variables/variable names. This is done with the Elaborate.lookup which uses the Quote.find function.
The problem is that with my encoding of the combinator calculus as the GADT THom a b. Translating the Quote.find function:
let rec find x = function
| [] -> raise Not_found
| (x', t) :: ctx' when x = x' -> <:expr< Goedel.snd >>
| (x', t) :: ctx' -> <:expr< Goedel.compose Goedel.fst $find x ctx'$ >>
Into Haskell:
find :: TVar -> Context -> _
find tvar [] = error "Not Found"
find tvar ((tvar', ttype):ctx)
| tvar == tvar' = Snd
| otherwise = Compose Fst (find tvar ctx)
Results in an infinite type error.
• Occurs check: cannot construct the infinite type: a ~ (a, c)
Expected type: THom (a, c) c
Actual type: THom ((a, c), c) c
The problem stems from the fact that using Compose and Fst and Snd from the THom a b GADT can result in infinite variations of the type signature.
The article doesn't have this problem because it appears to use the Ast.expr OCaml thing to wrap the underlying expressions.
I'm not sure how to resolve in Haskell. Should I be using an existentially quantified type like
data TExpr = forall a b. TExpr (THom a b)
Or some sort of type-level Fix to adapt the infinite type problem. However I'm unsure how this changes the find or lookup functions.

This answer will have to be a bit high-level, because there are three entirely different families of possible designs to fix that problem. What you’re doing seems closer to approach three, but the approaches are ordered by increasing complexity.
The approach in the original post
Translating the original post requires Template Haskell and partiality; find would return a Q.Exp representing some Hom a b, avoiding this problem just like the original post. Just like in the original post, a type error in the original code would be caught when typechecking the output of all the Template Haskell functions. So, type errors are still caught before runtime, but you will still need to write tests to find cases where your macros output ill-typed expressions. One can give stronger guarantees, at the cost of a significant increase in complexity.
Dependent typing/GADTs in input and output
If you want to diverge from the post, one alternative is to use “dependent typing” throughout and make the input dependently-typed. I use the term loosely to include both actually dependently-typed languages, actual Dependent Haskell (when it lands), and ways to fake dependent typing in Haskell today (via GADTs, singletons, and what not).
However, you lose the ability to write your own typechecker and choose which type system to use; typically, you end up embedding a simply-typed lambda calculus, and can simulate polymorphism via polymorphic Haskell functions that can generate terms at a given type. This is easier in dependently-typed languages, but possible at all in Haskell.
But honestly, in this road it’s easier to use higher-order abstract syntax and Haskell functions, with something like:
data Exp a where
Abs :: (Exp a -> Exp b) -> Exp (a -> b)
App :: Exp (a -> b) -> Exp a -> Exp b
Var :: String -> Exp a —- only use for free variables
exampleId :: Exp (a -> a)
exampleId = Abs (\x -> x)
If you can use this approach (details omitted here), you get high assurance from GADTs with limited complexity. However, this approach is too inflexible for many scenarios, for instance because the typing contexts are only visible to the Haskell compiler and not in your types or terms.
From untyped to typed terms
A third option is go dependently-typed and to still make your program turn weakly-typed input to strongly typed output. In this case, your typechecker overall transforms terms of some type Expr into terms of a GADT TExp gamma t, Hom a b, or such. Since you don’t know statically what gamma and t (or a and b) are, you’ll indeed need some sort of existential.
But a plain existential is too weak: to build bigger well-typed expression, you’ll need to check that the produced types allow it. For instance, to build a TExpr containing a Compose expression out of two smaller TExpr, you'll need to check (at runtime) that their types match. And with a plain existential, you can't. So you’ll need to have a representation of types also at the value level.
What's more existentials are annoying to deal with (as usual), since you can’t ever expose the hidden type variables in your return type, or project those out (unlike dependent records/sigma-types).
I am honestly not entirely sure this could eventually be made to work. Here is a possible partial sketch in Haskell, up to one case of find.
data Type t where
VNat :: Type Nat
VString :: Type String
VArrow :: Type a -> Type b -> Type (a -> b)
VPair :: Type a -> Type b -> Type (a, b)
VUnit :: Type ()
data SomeType = forall t. SomeType (Type t)
data SomeHom = forall a b. SomeHom (Type a) (Type b) (THom a b)
type Context = [(TVar, SomeType)]
getType :: Context -> SomeType
getType [] = SomeType VUnit
getType ((_, SomeType ttyp) :: gamma) =
case getType gamma of
SomeType ctxT -> SomeType (VPair ttyp
find :: TVar -> Context -> SomeHom
find tvar ((tvar’, ttyp) :: gamma)
| tvar == tvar’ =
case (ttyp, getType gamma) of
(SomeType t, SomeType ctxT) ->
SomeHom (VPair t ctxT) t Fst

Related

What's the difference between parametric polymorphism and higher-kinded types?

I am pretty sure they are not the same. However, I am bogged down by the
common notion that "Rust does not support" higher-kinded types (HKT), but
instead offers parametric polymorphism. I tried to get my head around that and understand the difference between these, but got just more and more entangled.
To my understanding, there are higher-kinded types in Rust, at least the basics. Using the "*"-notation, a HKT does have a kind of e.g. * -> *.
For example, Maybe is of kind * -> * and could be implemented like this in Haskell.
data Maybe a = Just a | Nothing
Here,
Maybe is a type constructor and needs to be applied to a concrete type
to become a concrete type of kind "*".
Just a and Nothing are data constructors.
In textbooks about Haskell, this is often used as an example for a higher-kinded type. However, in Rust it can simply be implemented as an enum, which after all is a sum type:
enum Maybe<T> {
Just(T),
Nothing,
}
Where is the difference? To my understanding this is a
perfectly fine example of a higher-kinded type.
If in Haskell this is used as a textbook example of HKTs, why is it
said that Rust doesn't have HKT? Doesn't the Maybe enum qualify as a
HKT?
Should it rather be said that Rust doesn't fully support HKT?
What's the fundamental difference between HKT and parametric polymorphism?
This confusion continues when looking at functions, I can write a parametric
function that takes a Maybe, and to my understanding a HKT as a function
argument.
fn do_something<T>(input: Maybe<T>) {
// implementation
}
again, in Haskell that would be something like
do_something :: Maybe a -> ()
do_something :: Maybe a -> ()
do_something _ = ()
which leads to the fourth question.
Where exactly does the support for higher-kinded types end? Whats the
minimal example to make Rust's type system fail to express HKT?
Related Questions:
I went through a lot of questions related to the topic (including links they have to blogposts, etc.) but I could not find an answer to my main questions (1 and 2).
In Haskell, are "higher-kinded types" *really* types? Or do they merely denote collections of *concrete* types and nothing more?
Generic struct over a generic type without type parameter
Higher Kinded Types in Scala
What types of problems helps "higher-kinded polymorphism" solve better?
Abstract Data Types vs. Parametric Polymorphism in Haskell
Update
Thank you for the many good answers which are all very detailed and helped a lot. I decided to accept Andreas Rossberg's answer since his explanation helped me the most to get on the right track. Especially the part about terminology.
I was really locked in the cycle of thinking that everything of kind * -> * ... -> * is higher-kinded. The explanation that stressed the difference between * -> * -> * and (* -> *) -> * was crucial for me.
Some terminology:
The kind * is sometimes called ground. You can think of it as 0th order.
Any kind of the form * -> * -> ... -> * with at least one arrow is first-order.
A higher-order kind is one that has a "nested arrow on the left", e.g., (* -> *) -> *.
The order essentially is the depth of left-side nesting of arrows, e.g., (* -> *) -> * is second-order, ((* -> *) -> *) -> * is third-order, etc. (FWIW, the same notion applies to types themselves: a second-order function is one whose type has e.g. the form (A -> B) -> C.)
Types of non-ground kind (order > 0) are also called type constructors (and some literature only refers to types of ground kind as "types"). A higher-kinded type (constructor) is one whose kind is higher-order (order > 1).
Consequently, a higher-kinded type is one that takes an argument of non-ground kind. That would require type variables of non-ground kind, which are not supported in many languages. Examples in Haskell:
type Ground = Int
type FirstOrder a = Maybe a -- a is ground
type SecondOrder c = c Int -- c is a first-order constructor
type ThirdOrder c = c Maybe -- c is second-order
The latter two are higher-kinded.
Likewise, higher-kinded polymorphism describes the presence of (parametrically) polymorphic values that abstract over types that are not ground. Again, few languages support that. Example:
f : forall c. c Int -> c Int -- c is a constructor
The statement that Rust supports parametric polymorphism "instead" of higher-kinded types does not make sense. Both are different dimensions of parameterisation that complement each other. And when you combine both you have higher-kinded polymorphism.
A simple example of what Rust can't do is something like Haskell's Functor class.
class Functor f where
fmap :: (a -> b) -> f a -> f b
-- a couple examples:
instance Functor Maybe where
-- fmap :: (a -> b) -> Maybe a -> Maybe b
fmap _ Nothing = Nothing
fmap f (Just x) = Just (f x)
instance Functor [] where
-- fmap :: (a -> b) -> [a] -> [b]
fmap _ [] = []
fmap f (x:xs) = f x : fmap f xs
Note that the instances are defined on the type constructor, Maybe or [], instead of the fully-applied type Maybe a or [a].
This isn't just a parlor trick. It has a strong interaction with parametric polymorphism. Since the type variables a and b in the type fmap are not constrained by the class definition, instances of Functor cannot change their behavior based on them. This is an incredibly strong property in reasoning about code from types, and where a lot of where the strength of Haskell's type system comes from.
It has one other property - you can write code that's abstract in higher-kinded type variables. Here's a couple examples:
focusFirst :: Functor f => (a -> f b) -> (a, c) -> f (b, c)
focusFirst f (a, c) = fmap (\x -> (x, c)) (f a)
focusSecond :: Functor f => (a -> f b) -> (c, a) -> f (c, b)
focusSecond f (c, a) = fmap (\x -> (c, x)) (f a)
I admit, those types are beginning to look like abstract nonsense. But they turn out to be really practical when you have a couple helpers that take advantage of the higher-kinded abstraction.
newtype Identity a = Identity { runIdentity :: a }
instance Functor Identity where
-- fmap :: (a -> b) -> Identity a -> Identity b
fmap f (Identity x) = Identity (f x)
newtype Const c b = Const { getConst :: c }
instance Functor (Const c) where
-- fmap :: (a -> b) -> Const c a -> Const c b
fmap _ (Const c) = Const c
set :: ((a -> Identity b) -> s -> Identity t) -> b -> s -> t
set f b s = runIdentity (f (\_ -> Identity b) s)
get :: ((a -> Const a b) -> s -> Const a t) -> s -> a
get f s = getConst (f (\x -> Const x) s)
(If I made any mistakes in there, can someone just fix them? I'm reimplementing the most basic starting point of lens from memory without a compiler.)
The functions focusFirst and focusSecond can be passed as the first argument to either get or set, because the type variable f in their types can be unified with the more concrete types in get and set. Being able to abstract over the higher-kinded type variable f allows functions of a particular shape can be used both as setters and getters in arbitrary data types. This is one of the two core insights that led to the lens library. It couldn't exist without this kind of abstraction.
(For what it's worth, the other key insight is that defining lenses as a function like that allows composition of lenses to be simple function composition.)
So no, there's more to it than just being able to accept a type variable. The important part is being able to use type variables that correspond to type constructors, rather than some concrete (if unknown) type.
I'm going to resume it: a higher-kinded type is just a type-level higher-order function.
But take a minute:
Consider monad transformers:
newtype StateT s m a :: * -> (* -> *) -> * -> *
Here,
- s is the desired type of the state
- m is a functor, another monad that StateT will wrap
- a is the return type of an expression of type StateT s m
What is the higher-kinded type?
m :: (* -> *)
Because takes a type of kind * and returns a kind of type *.
It's like a function on types, that is, a type constructor of kind
* -> *
In languages like Java, you can't do
class ClassExample<T, a> {
T<a> function()
}
In Haskell T would have kind *->*, but a Java type (i.e. class) cannot have a type parameter of that kind, a higher-kinded type.
Also, if you don't know, in basic Haskell an expression must have a type that has kind *, that is, a "concrete type". Any other type like * -> *.
For instance, you can't create an expression of type Maybe. It has to be types applied to an argument like Maybe Int, Maybe String, etc. In other words, fully applied type constructors.
Parametric polymorphism just refers to the property that the function cannot make use of any particular feature of a type (or kind) in its definition; it is a complete blackbox. The standard example is length :: [a] -> Int, which only works with the structure of the list, not the particular values stored in the list.
The standard example of HKT is the Functor class, where fmap :: (a -> b) -> f a -> f b. Unlike length, where a has kind *, f has kind * -> *. fmap also exhibits parametric polymorphism, because fmap cannot make use of any property of either a or b in its definition.
fmap exhibits ad hoc polymorphism as well, because the definition can be tailored to the specific type constructor f for which it is defined. That is, there are separate definitions of fmap for f ~ [], f ~ Maybe, etc. The difference is that f is "declared" as part of the typeclass definition, rather than just being part of the definition of fmap. (Indeed, typeclasses were added to support some degree of ad hoc polymorphism. Without type classes, only parametric polymorphism exists. You can write a function that supports one concrete type or any concrete type, but not some smaller collection in between.)

How can I write self-application function in Haskell?

I tried following code, but it generates type errors.
sa f = f f
• Occurs check: cannot construct the infinite type: t ~ t -> t1
• In the first argument of ‘f’, namely ‘f’
In the expression: f f
In an equation for ‘sa’: sa f = f f
• Relevant bindings include
f :: t -> t1
(bound at fp-through-lambda-calculus-michaelson.hs:9:4)
sa :: (t -> t1) -> t1
(bound at fp-through-lambda-calculus-michaelson.hs:9:1)
Use a newtype to construct the infinite type.
newtype Eventually a = NotYet (Eventually a -> a)
sa :: Eventually a -> a
sa eventually#(NotYet f) = f eventually
In GHC, eventually and f will be the same object in memory.
I don't think there is a single self-application function that will work for all terms in Haskell. Self-application is a peculiar thing in typed lambda calculus, which will often evade typing. This is related to the fact that with self-application we can express the fixed-point combinator, which introduces inconsistencies into the type system when viewed as a logical system (see Curry-Howard correspondence).
You asked about applying it to the id function. In the self application id id, the two ids have different types. More explicitly it's (id :: (A -> A) -> (A -> A)) (id :: A -> A) (for any type A). We could make a self-application specifically designed for the id function:
sa :: (forall a. a -> a) -> b -> b
sa f = f f
ghci> :t sa id
sa id :: b -> b
which works just fine, but is rather limited by its type.
Using RankNTypes you can make families of self-application functions like this, but you're not going to be able to make a general self-application function such that sa t will be well-typed iff t t is well-typed (at least not in System Fω ("F-omega"), which GHC's core calculus is based on).
The reason, if you work it out formally (probably), is that then we could get sa sa, which has no normal form, and Fω is known to be normalizing (until we add fix of course).
This is because the untyped lambda calculus is in some way more powerful than Haskell. Or, to put it differently, the untyped lambda calculus has no type system. Thus, it has no sound type system. Whereas Haskell does have one.
This shows up not only with self application, but in any cases where infinite types are involved. Try this, for example:
i x = x
s f g x = f x (g x)
s i i
It is astonishing how the type system finds out that the seemingly harmless expresion s i i should not be allowed with a sound type system. Because, if it were allowed, self application would be possible.

How to define a function inside haskell newtype?

I am trying to decipher the record syntax in haskell for newtype and my understanding breaks when there is a function inside newtype. Consider this simple example
newtype C a b = C { getC :: (a -> b) -> a }
As per my reasoning C is a type which accepts a function and a parameter in it's constructor.
so,
let d1 = C $ (2 *) 3
:t d1 also gives
d1 :: Num ((a -> b) -> a) => C a b
Again to check this I do :t getC d1, which shows this
getC d1 :: Num ((a -> b) -> a) => (a -> b) -> a
Why the error if I try getC d1? getC should return the function and it's parameter or at least apply the parameter.
I can't have newtype C a b = C { getC :: (a->b)->b } deriving (Show), because this won't make sense!
It's always good to emphasise that Haskell has two completely separate namespaces, the type language and the value language. In your case, there's
A type constructor C :: Type -> Type -> Type, which lives in the type language. It takes two types a, b (of kind Type) and maps them to a type C a b (also of kind Type)†.
A value constructor C :: ((a->b) -> a) -> C a b, which lives in the value language. It takes a function f (of type (a->b) -> a) and maps it to a value C f (of type C a b).
Perhaps it would be less confusing if you had
newtype CT a b = CV ((a->b) -> a)
but because for a newtype there is always exactly one value constructor (and exactly one type constructor) it makes sense to name them the same.
CV is a value constructor that accepts one function, full stop. That function will have signature (a->b) -> a, i.e. its argument is also a function, but as far as CT is concerned this doesn't really matter.
Really, it's kind of wrong that data and newtype declarations use a = symbol, because it doesn't mean the things on the left and right are “the same” – can't, because they don't even belong to the same language. There's an alternative syntax which expresses the relation better:
{-# LANGUAGE GADTs #-}
import Data.Kind
data CT :: Type -> Type -> Type where
CV :: ((a->b) -> a) -> CT a b
As for that value you tried to construct
let d1 = CV $ (\x->(2*x)) 3
here you did not pass “a function and a parameter” to CV. What you actually did‡ was, you applied the function \x->2*x to the value 3 (might as well have written 6) and passed that number to CV. But as I said, CV expects a function. What then happens is, GHC tries to interpret 6 as a function, which gives the bogus constraint Num ((a->b) -> a). What that means is: “if (a->b)->a is a number type, then...”. Of course it isn't a number type, so the rest doesn't make sense either.
†It may seem redundant to talk of “types of kind Type”. Actually, when talking about “types” we often mean “entities in the type-level language”. These have kinds (“type-level types”) of which Type (the kind of (lifted) value-level values) is the most prominent, but not the only one – you can also have type-level numbers and type-level functions – C is indeed one.Note that Type was historically written *, but this notation is deprecated because it's inconsistent (confusion with multiplication operator).
‡This is because $ has the lowest precedence, i.e. the expression CV $ (\x->(2*x)) 3 is actually parsed as CV ((\x->(2*x)) 3), or equivalently let y = 2*3 in CV y.
As per my reasoning C is a type which accepts a function and a parameter
How so? The constructor has only one argument.
Newtypes always have a single constructor with exactly one argument.
The type C, otoh, has two type parameters. But that has nothing to do with the number of arguments you can apply to the constructor.

When are type signatures necessary in Haskell?

Many introductory texts will tell you that in Haskell type signatures are "almost always" optional. Can anybody quantify the "almost" part?
As far as I can tell, the only time you need an explicit signature is to disambiguate type classes. (The canonical example being read . show.) Are there other cases I haven't thought of, or is this it?
(I'm aware that if you go beyond Haskell 2010 there are plenty for exceptions. For example, GHC will never infer rank-N types. But rank-N types are a language extension, not part of the official standard [yet].)
Polymorphic recursion needs type annotations, in general.
f :: (a -> a) -> (a -> b) -> Int -> a -> b
f f1 g n x =
if n == (0 :: Int)
then g x
else f f1 (\z h -> g (h z)) (n-1) x f1
(Credit: Patrick Cousot)
Note how the recursive call looks badly typed (!): it calls itself with five arguments, despite f having only four! Then remember that b can be instantiated with c -> d, which causes an extra argument to appear.
The above contrived example computes
f f1 g n x = g (f1 (f1 (f1 ... (f1 x))))
where f1 is applied n times. Of course, there is a much simpler way to write an equivalent program.
Monomorphism restriction
If you have MonomorphismRestriction enabled, then sometimes you will need to add a type signature to get the most general type:
{-# LANGUAGE MonomorphismRestriction #-}
-- myPrint :: Show a => a -> IO ()
myPrint = print
main = do
myPrint ()
myPrint "hello"
This will fail because myPrint is monomorphic. You would need to uncomment the type signature to make it work, or disable MonomorphismRestriction.
Phantom constraints
When you put a polymorphic value with a constraint into a tuple, the tuple itself becomes polymorphic and has the same constraint:
myValue :: Read a => a
myValue = read "0"
myTuple :: Read a => (a, String)
myTuple = (myValue, "hello")
We know that the constraint affects the first part of the tuple but does not affect the second part. The type system doesn't know that, unfortunately, and will complain if you try to do this:
myString = snd myTuple
Even though intuitively one would expect myString to be just a String, the type checker needs to specialize the type variable a and figure out whether the constraint is actually satisfied. In order to make this expression work, one would need to annotate the type of either snd or myTuple:
myString = snd (myTuple :: ((), String))
In Haskell, as I'm sure you know, types are inferred. In other words, the compiler works out what type you want.
However, in Haskell, there are also polymorphic typeclasses, with functions that act in different ways depending on the return type. Here's an example of the Monad class, though I haven't defined everything:
class Monad m where
return :: a -> m a
(>>=) :: m a -> (a -> m b) -> m b
fail :: String -> m a
We're given a lot of functions with just type signatures. Our job is to make instance declarations for different types that can be treated as Monads, like Maybe t or [t].
Have a look at this code - it won't work in the way we might expect:
return 7
That's a function from the Monad class, but because there's more than one Monad, we have to specify what return value/type we want, or it automatically becomes an IO Monad. So:
return 7 :: Maybe Int
-- Will return...
Just 7
return 6 :: [Int]
-- Will return...
[6]
This is because [t] and Maybe have both been defined in the Monad type class.
Here's another example, this time from the random typeclass. This code throws an error:
random (mkStdGen 100)
Because random returns something in the Random class, we'll have to define what type we want to return, with a StdGen object tupelo with whatever value we want:
random (mkStdGen 100) :: (Int, StdGen)
-- Returns...
(-3650871090684229393,693699796 2103410263)
random (mkStdGen 100) :: (Bool, StdGen)
-- Returns...
(True,4041414 40692)
This can all be found at learn you a Haskell online, though you'll have to do some long reading. This, I'm pretty much 100% certain, it the only time when types are necessary.

Algebraically interpreting polymorphism

So I understand the basic algebraic interpretation of types:
Either a b ~ a + b
(a, b) ~ a * b
a -> b ~ b^a
() ~ 1
Void ~ 0 -- from Data.Void
... and that these relations are true for concrete types, like Bool, as opposed to polymorphic types like a. I also know how to translate type signatures with polymorphic types into their concrete type representations by just translating the Church encoding according to the following isomorphism:
(forall r . (a -> r) -> r) ~ a
So if I have:
id :: forall a . a -> a
I know that it does not mean id ~ a^a, but it actually means:
id :: forall a . (() -> a) -> a
id ~ ()
~ 1
Similarly:
pair :: forall r . (a -> b -> r) -> r
pair ~ ((a, b) -> r) - > r
~ (a, b)
~ a * b
Which brings me to my question. What is the "algebraic" interpretation of this rule:
(forall r . (a -> r) -> r) ~ a
For every concrete type isomorphism I can point to an equivalent algebraic rule, such as:
(a, (b, c)) ~ ((a, b), c)
a * (b * c) = (a * b) * c
a -> (b -> c) ~ (a, b) -> c
(c^b)^a = c^(b * a)
But I don't understand the algebraic equality that is analogous to:
(forall r . (a -> r) -> r) ~ a
This is the famous Yoneda lemma for the identity functor.
Check this post for a readable introduction, and any category theory textbook for more.
Briefly, given f :: forall r. (a -> r) -> r you can apply f id to get an a, and conversely, given x :: a you can take ($x) to get forall r. (a -> r) -> r.
These operations are mutually inverse. Proof:
Obviously ($x) id == x. I will show that
($(f id)) == f,
since functions are equal when they are equal on all arguments, let's take x :: a -> r and show that
($(f id)) x == f x i.e.
x (f id) == f x.
Since f is polymorphic, it works as a natural transformation; this is the naturality diagram for f:
f_A
Hom(A, A) → A
(x.) ↓ ↓ x
Hom(A, R) → R
f_R
So x . f == f . (x.).
Plugging identity, (x . f) id == f x. QED
(Rewritten for clarity)
There seem to be two parts to your question. One is implied and is asking what the algebraic interpretation of forall is, and the other is asking about the cont/Yoneda transformation, which sdcvvc's answer already covered pretty well.
I'll try to address the algebraic interpretation of forall for you. You mention that A -> B is B^A but I'd like to take that a step further and expand it out to B * B * B * ... * B (|A| times). Although we do have exponentiation as a notation for repeated multiplication like that, there's a more flexible notation, ∏ (uppercase Pi) representing arbitrary indexed products. There are two components to a Pi: the range of values we want to multiply over, and the expression that we're multiplying out. For example, at the value level, you might express the factorial function as fact i = ∏ [1..i] (λx -> x).
Going back to the world of types, we can view the exponentiation operator in the A -> B ~ B^A correspondence as a Pi: B^A ~ ∏ A (λ_ -> B). This says that we're defining an A-ary product of Bs, such that the Bs cannot depend on the particular A we've chosen. Sure, it's equivalent to plain exponentiation, but it lets us move up to cases in which there is a dependence.
In the most general case, we get dependent types, like what you see in Agda or Coq: in Agda syntax, replicate : Bool -> ((n : Nat) -> Vec Bool n) is one possible application of a Pi type, which could be expressed more explicitly as replicate : Bool -> ∏ Nat (Vec Bool), or further as replicate : ∏ Bool (λ_ -> ∏ Nat (Vec Bool)).
Note that as you might expect from the underlying algebra, you can fuse both of the ∏s in the definition of replicate above into a single ∏ ranging over the cartesian product of the domains: ∏ Bool (\_ -> ∏ Nat (Vec Bool)) is equivalent to ∏ (Bool, Nat) (λ(_, n) -> Vec Bool n) just like it would be at the "value level". This is simply uncurrying from the perspective of type theory.
I do realize your question was about polymorphism, so I'll stop going on about dependent types, but they are relevant: forall in Haskell is roughly equivalent to a ∏ with a domain over the type (kind) of types, *. Indeed, the function-like behavior of polymorphism can be observed directly in GHC core, which types them as capital lambdas (Λ). As such, a polymorphic type like forall a. a -> a is actually just ∏ * (Λ a -> (a -> a)) (using the Λ notation now that we distinguish between types and values), which can be expanded out to the infinite product (Bool -> Bool, Int -> Int, () -> (), (Int -> Bool) -> (Int -> Bool), ...) for every possible type. Instantiation of the type variable is simply projecting out the suitable element from the *-ary product (or applying the type function).
Now, for the big piece I missed in my original version of this answer: parametricity. Parametricity can be described in several different ways, but none of the ones I know of (viewing types as relations, or (di)naturality in category theory) really has a very algebraic interpretation. For our purposes, though, it boils down to something fairly simple: you can't pattern-match on *. I know that GHC lets you do that at the type level with type families, but you can only cover a finite chunk of * when doing that, so there are necessarily always points at which your type family is undefined.
What this means, from the point of view of polymorphism, is that any type function F we write in ∏ * F must either be constant (i.e., completely ignore the type it was polymorphic over) or pass the type through unchanged. Thus, ∏ * (Λ _ -> B) is valid because it ignores its argument, and corresponds to forall a. B. The other case is something like ∏ * (Λ x -> Maybe x), which corresponds to forall a. Maybe a, which doesn't ignore the type argument, but only "passes it through". As such, a ∏ A that has an irrelevant domain A (such as when A = *) can be seen as more of an A-ary indexed intersection (picking the common elements across all instantiations of the index), rather than a product.
Crucially, at the value level, the rules of parametricity prevent any funny behavior that might suggest the types are larger than they really are. Because we don't have typecase, we can't construct a value of type forall a. B that does something different based on what a was instantiated to. Thus, although the type is technically a function * -> B, it is always a constant function, and is thus equivalent to a single value of B. Using the ∏ interpretation, it is indeed equivalent to an infinite *-ary product of Bs, but those B values must always be identical, so the infinite product is effectively as big as a single B.
Similarly, although ∏ * (Λ x -> (x -> x)) (a.k.a., forall a. a -> a) is technically equivalent to an infinite product of functions, none of those functions can inspect the type, so all are constrained to only return their input value and not do any funny business like (+1) : Int -> Int when instantiated to Int. Because there is only one (assuming a total language) function that can't inspect the type of its argument but must return a value of that same type, the infinite product is thus just as large as a single value.
Now, about your direct question on (forall r . (a -> r) -> r) ~ a. First, let's express your ~ operator more formally. It's really isomorphism, so we need two functions going back and forth, and an argument that they're inverses.
data Iso a b = Iso
{ to :: a -> b
, from :: b -> a
-- proof1 :: forall x. to (from x) == x
-- proof2 :: forall x. from (to x) == x
}
and now we express your original question in more formal terms. Your question amounts to constructing a term of the following (impredicative, so GHC has trouble with it, but we'll survive) type:
forall a. Iso (forall r. (a -> r) -> r) a
Which, using my earlier terminology, amounts to ∏ * (Λ a -> Iso (∏ * (Λ r -> ((a -> r) -> r))) a). Once again we have an infinite product that can't inspect its type argument. By handwaving, we can argue that the only possible values considering the parametricity rules (the other two proofs are respected automatically) for to and from are ($ id) and flip id.
If this feels unsatisfying, it's probably because the algebraic interpretation of forall didn't really add anything to the proof. It's really just plain old type theory, but I hope I was able to provide something that feels a little less categorical than the Yoneda form of it. It's worth noting that we don't actually need to use parametricity to write proof1 and proof2 above, though. Parametricity only enters the picture when we want to state that ($ id) and flip id are our only options for to and from (which we can't prove in Agda or Coq, for that reason).
To (attempt to) answer the actual question (which is less interesting than the answers to the broader issues raised), the question is ill formed because of a "type error"
Either ~ (+)
(,) ~ (*)
(->) b ~ flip (^)
() ~ 1
Void ~ 0
These all map types to integers, and type constructors to functions on naturals. In a sense, you have a functor from the category of types to the category of naturals. In the other direction, you "forget" stuff, since the types preserve algebraic structure while the naturals throw it away. I.e. given Either () () you can get a unique natural, but given that natural, you can get many types.
But this is different:
(forall r . (a -> r) -> r) ~ a
It maps a type to another type! It is not part of the above functor. It's just an isomorphism within the category of types. So let's give that a different symbol, <=>
Now we have
(forall r . (a -> r) -> r) <=> a
Now you note that we can not only send types to nats and arrows to arrows, but also some isomorphisms to other isomorphisms:
(a, (b, c)) <=> ((a, b), c) ~ a * (b * c) = (a * b) * c
But something subtle is going on here. In a sense, the latter isomorphism on pairs is true because the algebraic identity is true. This is to say that the "isomorphism" in the latter simply means that the two types are equivalent under the image of our functor to the nats.
The former isomorphism we need to prove directly, which is where we start to get to the underlying question -- is given our functor to the nats, what does forall r. map to? But the answer is that forall r. is neither a type, nor a meaningful arrow between types.
By introducing forall, we have moved away from first order types. There's no reason to expect that forall should fit in our above Functor, and indeed, it doesn't.
So we can explore, as others have above, why the isomorphism holds (which is itself very interesting) -- but in doing so we've abandoned the algebraic core of the question. A question which can be answered, I think, is, given the category of higher-order types and constructors as arrows between them, what is there meaningful Functor to?
Edit:
So now I have another approach which shows why adding polymorphism makes things go nuts. We start by asking a simpler question -- does a given polymorphic type have zero or more than zero inhabitants? This is the type inhabitation problem, and winds up being, via Curry-Howard, a problem in modified realizability, since it's the same thing as asking if a formula in some logic is realizable in an appropriate computational model. Now as that page explains, this is decidable in the simply typed lambda calculus but is PSPACE-complete. But once we move to anything more complicated, by adding polymorphism for example and going to System F, then it goes to undecidable!
So, if we can't decide if an arbitrary type is inhabited at all, then we clearly can't decide how many inhabitants it has!
It's an interesting question. I don't have a full answer, but this was too long for a comment.
The type signature (forall r. (a -> r) -> r) can be expressed as me saying
For any type r that you care to name, if you give me a function that takes a and produces an r, then I will give you back an r.
Now, this has to work for any type r, but it can be a specific type a. So the way for me to pull of this neat trick is to have an a sitting around somewhere, that I feed to the function (which produces an r for me) and then I hand that r back to you.
But if I have an a sitting around, I could give it to you:
If you give me a 1, I'll give you an a.
which corresponds to the type signature 1 -> a or simply a. By this informal argument we have
(forall r. (a -> r) -> r) ~ a
The next step would be to generate the corresponding algebraic expression, but I'm not clear on how the algebraic quantities interact with the universal quantification. We may need to wait for an expert!
A few links to the nLab:
Universal quantifier, corresponds to dependent product.
Existential quantifier, corresponds to dependent sum (dependent coproduct).
Thus, in settings of category theory:
Type | Modeled¹ as | In category
-------------------+---------------------------+-------------
Unit | Terminal object | CCC
Bottom | Initial object |
Record | Product |
Union | Sum (coproduct) |
Function | Exponential |
-------------------+---------------------------+-------------
Dependent product² | Right adjoint to pullback | LCCC
Dependent sum | Left adjoint to pullback |
¹) in appropriate category ─ CCC for total and non-polymorphic subset of Haskell (link), CPO for non-total traits of Haskell (link), LCCC for dependently typed languages.
²) forall quantification is a special case of dependent product:
∀(x :: *). y[x] ~ ∏(x : Set)y[x]
where Set is the universe of all small types.

Resources