How can I write self-application function in Haskell? - haskell

I tried following code, but it generates type errors.
sa f = f f
• Occurs check: cannot construct the infinite type: t ~ t -> t1
• In the first argument of ‘f’, namely ‘f’
In the expression: f f
In an equation for ‘sa’: sa f = f f
• Relevant bindings include
f :: t -> t1
(bound at fp-through-lambda-calculus-michaelson.hs:9:4)
sa :: (t -> t1) -> t1
(bound at fp-through-lambda-calculus-michaelson.hs:9:1)

Use a newtype to construct the infinite type.
newtype Eventually a = NotYet (Eventually a -> a)
sa :: Eventually a -> a
sa eventually#(NotYet f) = f eventually
In GHC, eventually and f will be the same object in memory.

I don't think there is a single self-application function that will work for all terms in Haskell. Self-application is a peculiar thing in typed lambda calculus, which will often evade typing. This is related to the fact that with self-application we can express the fixed-point combinator, which introduces inconsistencies into the type system when viewed as a logical system (see Curry-Howard correspondence).
You asked about applying it to the id function. In the self application id id, the two ids have different types. More explicitly it's (id :: (A -> A) -> (A -> A)) (id :: A -> A) (for any type A). We could make a self-application specifically designed for the id function:
sa :: (forall a. a -> a) -> b -> b
sa f = f f
ghci> :t sa id
sa id :: b -> b
which works just fine, but is rather limited by its type.
Using RankNTypes you can make families of self-application functions like this, but you're not going to be able to make a general self-application function such that sa t will be well-typed iff t t is well-typed (at least not in System Fω ("F-omega"), which GHC's core calculus is based on).
The reason, if you work it out formally (probably), is that then we could get sa sa, which has no normal form, and Fω is known to be normalizing (until we add fix of course).

This is because the untyped lambda calculus is in some way more powerful than Haskell. Or, to put it differently, the untyped lambda calculus has no type system. Thus, it has no sound type system. Whereas Haskell does have one.
This shows up not only with self application, but in any cases where infinite types are involved. Try this, for example:
i x = x
s f g x = f x (g x)
s i i
It is astonishing how the type system finds out that the seemingly harmless expresion s i i should not be allowed with a sound type system. Because, if it were allowed, self application would be possible.

Related

SystemT Compiler and dealing with Infinite Types in Haskell

I'm following this blog post: http://semantic-domain.blogspot.com/2012/12/total-functional-programming-in-partial.html
It shows a small OCaml compiler program for System T (a simple total functional language).
The entire pipeline takes OCaml syntax (via Camlp4 metaprogramming) transforms them to OCaml AST that is translated to SystemT Lambda Calculus (see: module Term) and then finally SystemT Combinator Calculus (see:
module Goedel). The final step is also wrapped with OCaml metaprogramming Ast.expr type.
I'm attempting to translate it to Haskell without the use of Template Haskell.
For the SystemT Combinator form, I've written it as
{-# LANGUAGE GADTs #-}
data TNat = Zero | Succ TNat
data THom a b where
Id :: THom a a
Unit :: THom a ()
ZeroH :: THom () TNat
SuccH :: THom TNat TNat
Compose :: THom a b -> THom b c -> THom a c
Pair :: THom a b -> THom a c -> THom a (b, c)
Fst :: THom (a, b) a
Snd :: THom (a, b) b
Curry :: THom (a, b) c -> THom a (b -> c)
Eval :: THom ((a -> b), a) b -- (A = B) * A -> B
Iter :: THom a b -> THom (a, b) b -> THom (a, TNat) b
Note that Compose is forward composition, which differs from (.).
During the translation of SystemT Lambda Calculus to SystemT Combinator Calculus, the Elaborate.synth function tries to convert SystemT Lambda calculus variables into series of composed projection expressions (related to De Brujin Indices). This is because combinator calculus doesn't have variables/variable names. This is done with the Elaborate.lookup which uses the Quote.find function.
The problem is that with my encoding of the combinator calculus as the GADT THom a b. Translating the Quote.find function:
let rec find x = function
| [] -> raise Not_found
| (x', t) :: ctx' when x = x' -> <:expr< Goedel.snd >>
| (x', t) :: ctx' -> <:expr< Goedel.compose Goedel.fst $find x ctx'$ >>
Into Haskell:
find :: TVar -> Context -> _
find tvar [] = error "Not Found"
find tvar ((tvar', ttype):ctx)
| tvar == tvar' = Snd
| otherwise = Compose Fst (find tvar ctx)
Results in an infinite type error.
• Occurs check: cannot construct the infinite type: a ~ (a, c)
Expected type: THom (a, c) c
Actual type: THom ((a, c), c) c
The problem stems from the fact that using Compose and Fst and Snd from the THom a b GADT can result in infinite variations of the type signature.
The article doesn't have this problem because it appears to use the Ast.expr OCaml thing to wrap the underlying expressions.
I'm not sure how to resolve in Haskell. Should I be using an existentially quantified type like
data TExpr = forall a b. TExpr (THom a b)
Or some sort of type-level Fix to adapt the infinite type problem. However I'm unsure how this changes the find or lookup functions.
This answer will have to be a bit high-level, because there are three entirely different families of possible designs to fix that problem. What you’re doing seems closer to approach three, but the approaches are ordered by increasing complexity.
The approach in the original post
Translating the original post requires Template Haskell and partiality; find would return a Q.Exp representing some Hom a b, avoiding this problem just like the original post. Just like in the original post, a type error in the original code would be caught when typechecking the output of all the Template Haskell functions. So, type errors are still caught before runtime, but you will still need to write tests to find cases where your macros output ill-typed expressions. One can give stronger guarantees, at the cost of a significant increase in complexity.
Dependent typing/GADTs in input and output
If you want to diverge from the post, one alternative is to use “dependent typing” throughout and make the input dependently-typed. I use the term loosely to include both actually dependently-typed languages, actual Dependent Haskell (when it lands), and ways to fake dependent typing in Haskell today (via GADTs, singletons, and what not).
However, you lose the ability to write your own typechecker and choose which type system to use; typically, you end up embedding a simply-typed lambda calculus, and can simulate polymorphism via polymorphic Haskell functions that can generate terms at a given type. This is easier in dependently-typed languages, but possible at all in Haskell.
But honestly, in this road it’s easier to use higher-order abstract syntax and Haskell functions, with something like:
data Exp a where
Abs :: (Exp a -> Exp b) -> Exp (a -> b)
App :: Exp (a -> b) -> Exp a -> Exp b
Var :: String -> Exp a —- only use for free variables
exampleId :: Exp (a -> a)
exampleId = Abs (\x -> x)
If you can use this approach (details omitted here), you get high assurance from GADTs with limited complexity. However, this approach is too inflexible for many scenarios, for instance because the typing contexts are only visible to the Haskell compiler and not in your types or terms.
From untyped to typed terms
A third option is go dependently-typed and to still make your program turn weakly-typed input to strongly typed output. In this case, your typechecker overall transforms terms of some type Expr into terms of a GADT TExp gamma t, Hom a b, or such. Since you don’t know statically what gamma and t (or a and b) are, you’ll indeed need some sort of existential.
But a plain existential is too weak: to build bigger well-typed expression, you’ll need to check that the produced types allow it. For instance, to build a TExpr containing a Compose expression out of two smaller TExpr, you'll need to check (at runtime) that their types match. And with a plain existential, you can't. So you’ll need to have a representation of types also at the value level.
What's more existentials are annoying to deal with (as usual), since you can’t ever expose the hidden type variables in your return type, or project those out (unlike dependent records/sigma-types).
I am honestly not entirely sure this could eventually be made to work. Here is a possible partial sketch in Haskell, up to one case of find.
data Type t where
VNat :: Type Nat
VString :: Type String
VArrow :: Type a -> Type b -> Type (a -> b)
VPair :: Type a -> Type b -> Type (a, b)
VUnit :: Type ()
data SomeType = forall t. SomeType (Type t)
data SomeHom = forall a b. SomeHom (Type a) (Type b) (THom a b)
type Context = [(TVar, SomeType)]
getType :: Context -> SomeType
getType [] = SomeType VUnit
getType ((_, SomeType ttyp) :: gamma) =
case getType gamma of
SomeType ctxT -> SomeType (VPair ttyp
find :: TVar -> Context -> SomeHom
find tvar ((tvar’, ttyp) :: gamma)
| tvar == tvar’ =
case (ttyp, getType gamma) of
(SomeType t, SomeType ctxT) ->
SomeHom (VPair t ctxT) t Fst

What's the difference between parametric polymorphism and higher-kinded types?

I am pretty sure they are not the same. However, I am bogged down by the
common notion that "Rust does not support" higher-kinded types (HKT), but
instead offers parametric polymorphism. I tried to get my head around that and understand the difference between these, but got just more and more entangled.
To my understanding, there are higher-kinded types in Rust, at least the basics. Using the "*"-notation, a HKT does have a kind of e.g. * -> *.
For example, Maybe is of kind * -> * and could be implemented like this in Haskell.
data Maybe a = Just a | Nothing
Here,
Maybe is a type constructor and needs to be applied to a concrete type
to become a concrete type of kind "*".
Just a and Nothing are data constructors.
In textbooks about Haskell, this is often used as an example for a higher-kinded type. However, in Rust it can simply be implemented as an enum, which after all is a sum type:
enum Maybe<T> {
Just(T),
Nothing,
}
Where is the difference? To my understanding this is a
perfectly fine example of a higher-kinded type.
If in Haskell this is used as a textbook example of HKTs, why is it
said that Rust doesn't have HKT? Doesn't the Maybe enum qualify as a
HKT?
Should it rather be said that Rust doesn't fully support HKT?
What's the fundamental difference between HKT and parametric polymorphism?
This confusion continues when looking at functions, I can write a parametric
function that takes a Maybe, and to my understanding a HKT as a function
argument.
fn do_something<T>(input: Maybe<T>) {
// implementation
}
again, in Haskell that would be something like
do_something :: Maybe a -> ()
do_something :: Maybe a -> ()
do_something _ = ()
which leads to the fourth question.
Where exactly does the support for higher-kinded types end? Whats the
minimal example to make Rust's type system fail to express HKT?
Related Questions:
I went through a lot of questions related to the topic (including links they have to blogposts, etc.) but I could not find an answer to my main questions (1 and 2).
In Haskell, are "higher-kinded types" *really* types? Or do they merely denote collections of *concrete* types and nothing more?
Generic struct over a generic type without type parameter
Higher Kinded Types in Scala
What types of problems helps "higher-kinded polymorphism" solve better?
Abstract Data Types vs. Parametric Polymorphism in Haskell
Update
Thank you for the many good answers which are all very detailed and helped a lot. I decided to accept Andreas Rossberg's answer since his explanation helped me the most to get on the right track. Especially the part about terminology.
I was really locked in the cycle of thinking that everything of kind * -> * ... -> * is higher-kinded. The explanation that stressed the difference between * -> * -> * and (* -> *) -> * was crucial for me.
Some terminology:
The kind * is sometimes called ground. You can think of it as 0th order.
Any kind of the form * -> * -> ... -> * with at least one arrow is first-order.
A higher-order kind is one that has a "nested arrow on the left", e.g., (* -> *) -> *.
The order essentially is the depth of left-side nesting of arrows, e.g., (* -> *) -> * is second-order, ((* -> *) -> *) -> * is third-order, etc. (FWIW, the same notion applies to types themselves: a second-order function is one whose type has e.g. the form (A -> B) -> C.)
Types of non-ground kind (order > 0) are also called type constructors (and some literature only refers to types of ground kind as "types"). A higher-kinded type (constructor) is one whose kind is higher-order (order > 1).
Consequently, a higher-kinded type is one that takes an argument of non-ground kind. That would require type variables of non-ground kind, which are not supported in many languages. Examples in Haskell:
type Ground = Int
type FirstOrder a = Maybe a -- a is ground
type SecondOrder c = c Int -- c is a first-order constructor
type ThirdOrder c = c Maybe -- c is second-order
The latter two are higher-kinded.
Likewise, higher-kinded polymorphism describes the presence of (parametrically) polymorphic values that abstract over types that are not ground. Again, few languages support that. Example:
f : forall c. c Int -> c Int -- c is a constructor
The statement that Rust supports parametric polymorphism "instead" of higher-kinded types does not make sense. Both are different dimensions of parameterisation that complement each other. And when you combine both you have higher-kinded polymorphism.
A simple example of what Rust can't do is something like Haskell's Functor class.
class Functor f where
fmap :: (a -> b) -> f a -> f b
-- a couple examples:
instance Functor Maybe where
-- fmap :: (a -> b) -> Maybe a -> Maybe b
fmap _ Nothing = Nothing
fmap f (Just x) = Just (f x)
instance Functor [] where
-- fmap :: (a -> b) -> [a] -> [b]
fmap _ [] = []
fmap f (x:xs) = f x : fmap f xs
Note that the instances are defined on the type constructor, Maybe or [], instead of the fully-applied type Maybe a or [a].
This isn't just a parlor trick. It has a strong interaction with parametric polymorphism. Since the type variables a and b in the type fmap are not constrained by the class definition, instances of Functor cannot change their behavior based on them. This is an incredibly strong property in reasoning about code from types, and where a lot of where the strength of Haskell's type system comes from.
It has one other property - you can write code that's abstract in higher-kinded type variables. Here's a couple examples:
focusFirst :: Functor f => (a -> f b) -> (a, c) -> f (b, c)
focusFirst f (a, c) = fmap (\x -> (x, c)) (f a)
focusSecond :: Functor f => (a -> f b) -> (c, a) -> f (c, b)
focusSecond f (c, a) = fmap (\x -> (c, x)) (f a)
I admit, those types are beginning to look like abstract nonsense. But they turn out to be really practical when you have a couple helpers that take advantage of the higher-kinded abstraction.
newtype Identity a = Identity { runIdentity :: a }
instance Functor Identity where
-- fmap :: (a -> b) -> Identity a -> Identity b
fmap f (Identity x) = Identity (f x)
newtype Const c b = Const { getConst :: c }
instance Functor (Const c) where
-- fmap :: (a -> b) -> Const c a -> Const c b
fmap _ (Const c) = Const c
set :: ((a -> Identity b) -> s -> Identity t) -> b -> s -> t
set f b s = runIdentity (f (\_ -> Identity b) s)
get :: ((a -> Const a b) -> s -> Const a t) -> s -> a
get f s = getConst (f (\x -> Const x) s)
(If I made any mistakes in there, can someone just fix them? I'm reimplementing the most basic starting point of lens from memory without a compiler.)
The functions focusFirst and focusSecond can be passed as the first argument to either get or set, because the type variable f in their types can be unified with the more concrete types in get and set. Being able to abstract over the higher-kinded type variable f allows functions of a particular shape can be used both as setters and getters in arbitrary data types. This is one of the two core insights that led to the lens library. It couldn't exist without this kind of abstraction.
(For what it's worth, the other key insight is that defining lenses as a function like that allows composition of lenses to be simple function composition.)
So no, there's more to it than just being able to accept a type variable. The important part is being able to use type variables that correspond to type constructors, rather than some concrete (if unknown) type.
I'm going to resume it: a higher-kinded type is just a type-level higher-order function.
But take a minute:
Consider monad transformers:
newtype StateT s m a :: * -> (* -> *) -> * -> *
Here,
- s is the desired type of the state
- m is a functor, another monad that StateT will wrap
- a is the return type of an expression of type StateT s m
What is the higher-kinded type?
m :: (* -> *)
Because takes a type of kind * and returns a kind of type *.
It's like a function on types, that is, a type constructor of kind
* -> *
In languages like Java, you can't do
class ClassExample<T, a> {
T<a> function()
}
In Haskell T would have kind *->*, but a Java type (i.e. class) cannot have a type parameter of that kind, a higher-kinded type.
Also, if you don't know, in basic Haskell an expression must have a type that has kind *, that is, a "concrete type". Any other type like * -> *.
For instance, you can't create an expression of type Maybe. It has to be types applied to an argument like Maybe Int, Maybe String, etc. In other words, fully applied type constructors.
Parametric polymorphism just refers to the property that the function cannot make use of any particular feature of a type (or kind) in its definition; it is a complete blackbox. The standard example is length :: [a] -> Int, which only works with the structure of the list, not the particular values stored in the list.
The standard example of HKT is the Functor class, where fmap :: (a -> b) -> f a -> f b. Unlike length, where a has kind *, f has kind * -> *. fmap also exhibits parametric polymorphism, because fmap cannot make use of any property of either a or b in its definition.
fmap exhibits ad hoc polymorphism as well, because the definition can be tailored to the specific type constructor f for which it is defined. That is, there are separate definitions of fmap for f ~ [], f ~ Maybe, etc. The difference is that f is "declared" as part of the typeclass definition, rather than just being part of the definition of fmap. (Indeed, typeclasses were added to support some degree of ad hoc polymorphism. Without type classes, only parametric polymorphism exists. You can write a function that supports one concrete type or any concrete type, but not some smaller collection in between.)

Confusing about Haskell type inference

I have just started learning Haskell. As Haskell is static typed and has polymorphic type inference, the type of the identity function is
id :: a -> a
suggesting id can take any type as its parameter and return itself. It works fine when I try:
a = (id 1, id True)
I just suppose that at compile time, the first id is Num a :: a -> a, and the second id is Bool -> Bool. When I try the following code, it gives an error:
foo f a b = (f a, f b)
result = foo id 1 True
It shows the type of a must be the same type of b, since it works fine with
result = foo id 1 2
But is that true that the type of id's parameter can be polymorphic, so that a and b can be different type?
All right, this is a weird spooky corner of Haskell's type system. The problem here is that there are two ways to type inference your function foo.
-- rank 1
foo :: forall a b. (a -> b) -> a -> a -> (b, b)
foo f a b = (f a, f b)
-- rank 2
foo' :: (forall a. a -> a) -> a -> b -> (a, b)
foo' f a b = (f a, f b)
The second type is the one you want, but the first type is the one you're getting. The second type, as amalloy pointed out, is a rank-2 type (we're going to ignore what the two means but read the introduction in "Practical type inference for arbitrary-rank types" if you want a good explanation of ranks – don't be put off by the academic nature of the PDF file as the beginning is accessibly and clearly written).
We'll defer the definition of higher-ranked types for now and just say that the problem is that GHC is unable to infer the rank-2 type. Quote the paper:
Complete type inference is known to be undecidable for higher-rank (impredicative) type systems, but in practice programmers are more than willing to add type annotations to guide the type inference engine, and to document their code....
Kfoury and Wells show that typeability is decidable for rank ≤ 2, and undecidable for all ranks ≥ 3 (Kfoury & Wells, 1994). For the rank-2 fragment, the same paper gives a type inference algorithm. This inference algorithm is somewhat subtle, does not interact well with user-supplied type annotations, and has not, to our knowledge, been implemented in a production compiler.
Undecidable means there can be no algorithm that always leads to a correct yes-or-no decision. So there you have it: impossible to infer a rank-3-or-higher type, and it's too gosh-darn-hard to infer the rank-2 type.
Now, back to rank 2. The (forall a. a -> a) is what makes it rank-2. There's already an excellent Stack Overflow question about what the forall keyword means so I'll refer you to that, but basically it means you're able to call f a and f b in the expression (f a, f b) while having a and b be different types, which is what you wanted in the first place, before all this hot mess.
One last thing: The reason you don't normally see foralls in GHCi is that any foralls on the very outer scope are left off. So forall a b. (a -> b) -> a -> a -> (b, b) is equivalent to (a -> b) -> a -> a -> (b, b).
Overall this is a pain point of the language that's poorly explained.
(Hat tip to #amalloy in the comments.)

Simple example for ImpredicativeTypes

The GHC user's guide describes the impredicative polymorphism extension with reference to the following example:
f :: Maybe (forall a. [a] -> [a]) -> Maybe ([Int], [Char])
f (Just g) = Just (g [3], g "hello")
f Nothing = Nothing
However, when I define this example in a file and try to call it, I get a type error:
ghci> f (Just reverse)
<interactive>:8:9:
Couldn't match expected type `forall a. [a] -> [a]'
with actual type `[a0] -> [a0]'
In the first argument of `Just', namely `reverse'
In the first argument of `f', namely `(Just reverse)'
In the expression: f (Just reverse)
ghci> f (Just id)
<interactive>:9:9:
Couldn't match expected type `forall a. [a] -> [a]'
with actual type `a0 -> a0'
In the first argument of `Just', namely `id'
In the first argument of `f', namely `(Just id)'
In the expression: f (Just id)
Seemingly only undefined, Nothing, or Just undefined satisfies the type-checker.
I have two questions, therefore:
Can the above function be called with Just f for any non-bottom f?
Can someone provide an example of a value only definable with impredicative polymorphism, and usable in a non-trivial way?
The latter is particularly with the HaskellWiki page on Impredicative Polymorphism in mind, which currently makes a decidedly unconvincing case for the existence of the extension.
Here's an example of how one project, const-math-ghc-plugin, uses ImpredicativeTypes to specify a list of matching rules.
The idea is that when we have an expression of the form App (PrimOp nameStr) (Lit litVal), we want to look up the appropriate rule based upon the primop name. A litVal will be either a MachFloat d or MachDouble d (d is a Rational). If we find a rule, we want to apply the function for that rule to d converted to the correct type.
The function mkUnaryCollapseIEEE does this for unary functions.
mkUnaryCollapseIEEE :: (forall a. RealFloat a => (a -> a))
-> Opts
-> CoreExpr
-> CoreM CoreExpr
mkUnaryCollapseIEEE fnE opts expr#(App f1 (App f2 (Lit lit)))
| isDHash f2, MachDouble d <- lit = e d mkDoubleLitDouble
| isFHash f2, MachFloat d <- lit = e d mkFloatLitFloat
where
e d = evalUnaryIEEE opts fnE f1 f2 d expr
The first argument needs to have a Rank-2 type, because it will be instantiated at either Float or Double depending on the literal constructor. The list of rules looks like this:
unarySubIEEE :: String -> (forall a. RealFloat a => a -> a) -> CMSub
unarySubIEEE nm fn = CMSub nm (mkUnaryCollapseIEEE fn)
subs =
[ unarySubIEEE "GHC.Float.exp" exp
, unarySubIEEE "GHC.Float.log" log
, unarySubIEEE "GHC.Float.sqrt" sqrt
-- lines omitted
, unarySubIEEE "GHC.Float.atanh" atanh
]
This is ok, if a bit too much boilerplate for my taste.
However, there's a similar function mkUnaryCollapsePrimIEEE. In this case, the rules are different for different GHC versions. If we want to support multiple GHCs, it gets a bit tricky. If we took the same approach, the subs definition would require a lot of CPP, which can be unmaintainable. Instead, we defined the rules in a separate file for each GHC version. However, mkUnaryCollapsePrimIEEE isn't available in those modules due to circular import issues. We could probably re-structure the modules to make it work, but instead we defined the rulesets as:
unaryPrimRules :: [(String, (forall a. RealFloat a => a -> a))]
unaryPrimRules =
[ ("GHC.Prim.expDouble#" , exp)
, ("GHC.Prim.logDouble#" , log)
-- lines omitted
, ("GHC.Prim.expFloat#" , exp)
, ("GHC.Prim.logFloat#" , log)
]
By using ImpredicativeTypes, we can keep a list of Rank-2 functions, ready to use for the first argument to mkUnaryCollapsePrimIEEE. The alternatives would be much more CPP/boilerplate, changing the module structure (or circular imports), or a lot of code duplication. None of which I would like.
I do seem to recall GHC HQ indicating that they would like to drop support for the extension, but perhaps they've reconsidered. It is quite useful at times.
Isn't it just that ImpredicativeTypes has been quietly dropped with the new typechecker in ghc-7+ ? Note that ideone.com still uses ghc-6.8 and indeed your program use to run fine :
{-# OPTIONS -fglasgow-exts #-}
f :: Maybe (forall a. [a] -> [a]) -> Maybe ([Int], [Char])
f (Just g) = Just (g [3], g "hello")
f Nothing = Nothing
main = print $ f (Just reverse)
prints Just ([3],"olleh") as expected; see http://ideone.com/KMASZy
augustss gives a handsome use case -- some sort of imitation Python dsl -- and a defense of the extension here: http://augustss.blogspot.com/2011/07/impredicative-polymorphism-use-case-in.html referred to in the ticket here http://hackage.haskell.org/trac/ghc/ticket/4295
Note this workaround:
justForF :: (forall a. [a] -> [a]) -> Maybe (forall a. [a] -> [a])
justForF = Just
ghci> f (justForF reverse)
Just ([3],"olleh")
Or this one (which is basically the same thing inlined):
ghci> f $ (Just :: (forall a. [a] -> [a]) -> Maybe (forall a. [a] -> [a])) reverse
Just ([3],"olleh")
Seems like the type inference has problems infering the type of the Just in your case and we have to tell it the type.
I have no clue if it's a bug or if there is a good reason for it.. :)

Algebraically interpreting polymorphism

So I understand the basic algebraic interpretation of types:
Either a b ~ a + b
(a, b) ~ a * b
a -> b ~ b^a
() ~ 1
Void ~ 0 -- from Data.Void
... and that these relations are true for concrete types, like Bool, as opposed to polymorphic types like a. I also know how to translate type signatures with polymorphic types into their concrete type representations by just translating the Church encoding according to the following isomorphism:
(forall r . (a -> r) -> r) ~ a
So if I have:
id :: forall a . a -> a
I know that it does not mean id ~ a^a, but it actually means:
id :: forall a . (() -> a) -> a
id ~ ()
~ 1
Similarly:
pair :: forall r . (a -> b -> r) -> r
pair ~ ((a, b) -> r) - > r
~ (a, b)
~ a * b
Which brings me to my question. What is the "algebraic" interpretation of this rule:
(forall r . (a -> r) -> r) ~ a
For every concrete type isomorphism I can point to an equivalent algebraic rule, such as:
(a, (b, c)) ~ ((a, b), c)
a * (b * c) = (a * b) * c
a -> (b -> c) ~ (a, b) -> c
(c^b)^a = c^(b * a)
But I don't understand the algebraic equality that is analogous to:
(forall r . (a -> r) -> r) ~ a
This is the famous Yoneda lemma for the identity functor.
Check this post for a readable introduction, and any category theory textbook for more.
Briefly, given f :: forall r. (a -> r) -> r you can apply f id to get an a, and conversely, given x :: a you can take ($x) to get forall r. (a -> r) -> r.
These operations are mutually inverse. Proof:
Obviously ($x) id == x. I will show that
($(f id)) == f,
since functions are equal when they are equal on all arguments, let's take x :: a -> r and show that
($(f id)) x == f x i.e.
x (f id) == f x.
Since f is polymorphic, it works as a natural transformation; this is the naturality diagram for f:
f_A
Hom(A, A) → A
(x.) ↓ ↓ x
Hom(A, R) → R
f_R
So x . f == f . (x.).
Plugging identity, (x . f) id == f x. QED
(Rewritten for clarity)
There seem to be two parts to your question. One is implied and is asking what the algebraic interpretation of forall is, and the other is asking about the cont/Yoneda transformation, which sdcvvc's answer already covered pretty well.
I'll try to address the algebraic interpretation of forall for you. You mention that A -> B is B^A but I'd like to take that a step further and expand it out to B * B * B * ... * B (|A| times). Although we do have exponentiation as a notation for repeated multiplication like that, there's a more flexible notation, ∏ (uppercase Pi) representing arbitrary indexed products. There are two components to a Pi: the range of values we want to multiply over, and the expression that we're multiplying out. For example, at the value level, you might express the factorial function as fact i = ∏ [1..i] (λx -> x).
Going back to the world of types, we can view the exponentiation operator in the A -> B ~ B^A correspondence as a Pi: B^A ~ ∏ A (λ_ -> B). This says that we're defining an A-ary product of Bs, such that the Bs cannot depend on the particular A we've chosen. Sure, it's equivalent to plain exponentiation, but it lets us move up to cases in which there is a dependence.
In the most general case, we get dependent types, like what you see in Agda or Coq: in Agda syntax, replicate : Bool -> ((n : Nat) -> Vec Bool n) is one possible application of a Pi type, which could be expressed more explicitly as replicate : Bool -> ∏ Nat (Vec Bool), or further as replicate : ∏ Bool (λ_ -> ∏ Nat (Vec Bool)).
Note that as you might expect from the underlying algebra, you can fuse both of the ∏s in the definition of replicate above into a single ∏ ranging over the cartesian product of the domains: ∏ Bool (\_ -> ∏ Nat (Vec Bool)) is equivalent to ∏ (Bool, Nat) (λ(_, n) -> Vec Bool n) just like it would be at the "value level". This is simply uncurrying from the perspective of type theory.
I do realize your question was about polymorphism, so I'll stop going on about dependent types, but they are relevant: forall in Haskell is roughly equivalent to a ∏ with a domain over the type (kind) of types, *. Indeed, the function-like behavior of polymorphism can be observed directly in GHC core, which types them as capital lambdas (Λ). As such, a polymorphic type like forall a. a -> a is actually just ∏ * (Λ a -> (a -> a)) (using the Λ notation now that we distinguish between types and values), which can be expanded out to the infinite product (Bool -> Bool, Int -> Int, () -> (), (Int -> Bool) -> (Int -> Bool), ...) for every possible type. Instantiation of the type variable is simply projecting out the suitable element from the *-ary product (or applying the type function).
Now, for the big piece I missed in my original version of this answer: parametricity. Parametricity can be described in several different ways, but none of the ones I know of (viewing types as relations, or (di)naturality in category theory) really has a very algebraic interpretation. For our purposes, though, it boils down to something fairly simple: you can't pattern-match on *. I know that GHC lets you do that at the type level with type families, but you can only cover a finite chunk of * when doing that, so there are necessarily always points at which your type family is undefined.
What this means, from the point of view of polymorphism, is that any type function F we write in ∏ * F must either be constant (i.e., completely ignore the type it was polymorphic over) or pass the type through unchanged. Thus, ∏ * (Λ _ -> B) is valid because it ignores its argument, and corresponds to forall a. B. The other case is something like ∏ * (Λ x -> Maybe x), which corresponds to forall a. Maybe a, which doesn't ignore the type argument, but only "passes it through". As such, a ∏ A that has an irrelevant domain A (such as when A = *) can be seen as more of an A-ary indexed intersection (picking the common elements across all instantiations of the index), rather than a product.
Crucially, at the value level, the rules of parametricity prevent any funny behavior that might suggest the types are larger than they really are. Because we don't have typecase, we can't construct a value of type forall a. B that does something different based on what a was instantiated to. Thus, although the type is technically a function * -> B, it is always a constant function, and is thus equivalent to a single value of B. Using the ∏ interpretation, it is indeed equivalent to an infinite *-ary product of Bs, but those B values must always be identical, so the infinite product is effectively as big as a single B.
Similarly, although ∏ * (Λ x -> (x -> x)) (a.k.a., forall a. a -> a) is technically equivalent to an infinite product of functions, none of those functions can inspect the type, so all are constrained to only return their input value and not do any funny business like (+1) : Int -> Int when instantiated to Int. Because there is only one (assuming a total language) function that can't inspect the type of its argument but must return a value of that same type, the infinite product is thus just as large as a single value.
Now, about your direct question on (forall r . (a -> r) -> r) ~ a. First, let's express your ~ operator more formally. It's really isomorphism, so we need two functions going back and forth, and an argument that they're inverses.
data Iso a b = Iso
{ to :: a -> b
, from :: b -> a
-- proof1 :: forall x. to (from x) == x
-- proof2 :: forall x. from (to x) == x
}
and now we express your original question in more formal terms. Your question amounts to constructing a term of the following (impredicative, so GHC has trouble with it, but we'll survive) type:
forall a. Iso (forall r. (a -> r) -> r) a
Which, using my earlier terminology, amounts to ∏ * (Λ a -> Iso (∏ * (Λ r -> ((a -> r) -> r))) a). Once again we have an infinite product that can't inspect its type argument. By handwaving, we can argue that the only possible values considering the parametricity rules (the other two proofs are respected automatically) for to and from are ($ id) and flip id.
If this feels unsatisfying, it's probably because the algebraic interpretation of forall didn't really add anything to the proof. It's really just plain old type theory, but I hope I was able to provide something that feels a little less categorical than the Yoneda form of it. It's worth noting that we don't actually need to use parametricity to write proof1 and proof2 above, though. Parametricity only enters the picture when we want to state that ($ id) and flip id are our only options for to and from (which we can't prove in Agda or Coq, for that reason).
To (attempt to) answer the actual question (which is less interesting than the answers to the broader issues raised), the question is ill formed because of a "type error"
Either ~ (+)
(,) ~ (*)
(->) b ~ flip (^)
() ~ 1
Void ~ 0
These all map types to integers, and type constructors to functions on naturals. In a sense, you have a functor from the category of types to the category of naturals. In the other direction, you "forget" stuff, since the types preserve algebraic structure while the naturals throw it away. I.e. given Either () () you can get a unique natural, but given that natural, you can get many types.
But this is different:
(forall r . (a -> r) -> r) ~ a
It maps a type to another type! It is not part of the above functor. It's just an isomorphism within the category of types. So let's give that a different symbol, <=>
Now we have
(forall r . (a -> r) -> r) <=> a
Now you note that we can not only send types to nats and arrows to arrows, but also some isomorphisms to other isomorphisms:
(a, (b, c)) <=> ((a, b), c) ~ a * (b * c) = (a * b) * c
But something subtle is going on here. In a sense, the latter isomorphism on pairs is true because the algebraic identity is true. This is to say that the "isomorphism" in the latter simply means that the two types are equivalent under the image of our functor to the nats.
The former isomorphism we need to prove directly, which is where we start to get to the underlying question -- is given our functor to the nats, what does forall r. map to? But the answer is that forall r. is neither a type, nor a meaningful arrow between types.
By introducing forall, we have moved away from first order types. There's no reason to expect that forall should fit in our above Functor, and indeed, it doesn't.
So we can explore, as others have above, why the isomorphism holds (which is itself very interesting) -- but in doing so we've abandoned the algebraic core of the question. A question which can be answered, I think, is, given the category of higher-order types and constructors as arrows between them, what is there meaningful Functor to?
Edit:
So now I have another approach which shows why adding polymorphism makes things go nuts. We start by asking a simpler question -- does a given polymorphic type have zero or more than zero inhabitants? This is the type inhabitation problem, and winds up being, via Curry-Howard, a problem in modified realizability, since it's the same thing as asking if a formula in some logic is realizable in an appropriate computational model. Now as that page explains, this is decidable in the simply typed lambda calculus but is PSPACE-complete. But once we move to anything more complicated, by adding polymorphism for example and going to System F, then it goes to undecidable!
So, if we can't decide if an arbitrary type is inhabited at all, then we clearly can't decide how many inhabitants it has!
It's an interesting question. I don't have a full answer, but this was too long for a comment.
The type signature (forall r. (a -> r) -> r) can be expressed as me saying
For any type r that you care to name, if you give me a function that takes a and produces an r, then I will give you back an r.
Now, this has to work for any type r, but it can be a specific type a. So the way for me to pull of this neat trick is to have an a sitting around somewhere, that I feed to the function (which produces an r for me) and then I hand that r back to you.
But if I have an a sitting around, I could give it to you:
If you give me a 1, I'll give you an a.
which corresponds to the type signature 1 -> a or simply a. By this informal argument we have
(forall r. (a -> r) -> r) ~ a
The next step would be to generate the corresponding algebraic expression, but I'm not clear on how the algebraic quantities interact with the universal quantification. We may need to wait for an expert!
A few links to the nLab:
Universal quantifier, corresponds to dependent product.
Existential quantifier, corresponds to dependent sum (dependent coproduct).
Thus, in settings of category theory:
Type | Modeled¹ as | In category
-------------------+---------------------------+-------------
Unit | Terminal object | CCC
Bottom | Initial object |
Record | Product |
Union | Sum (coproduct) |
Function | Exponential |
-------------------+---------------------------+-------------
Dependent product² | Right adjoint to pullback | LCCC
Dependent sum | Left adjoint to pullback |
¹) in appropriate category ─ CCC for total and non-polymorphic subset of Haskell (link), CPO for non-total traits of Haskell (link), LCCC for dependently typed languages.
²) forall quantification is a special case of dependent product:
∀(x :: *). y[x] ~ ∏(x : Set)y[x]
where Set is the universe of all small types.

Resources