(Disclaimer: I'm not 100% sure how codatatype works, especially when not referring to terminal algebras).
Consider the "category of types", something like Hask but with whatever adjustment that fits the discussion. Within such a category, it is said that (1) the initial algebras define datatypes, and (2) terminal algebras define codatatypes.
I'm struggling to convince myself of (2).
Consider the functor T(t) = 1 + a * t. I agree that the initial T-algebra is well-defined and indeed defines [a], the list of a. By definition, the initial T-algebra is a type X together with a function f :: 1+a*X -> X, such that for any other type Y and function g :: 1+a*Y -> Y, there is exactly one function m :: X -> Y such that m . f = g . T(m) (where . denotes the function combination operator as in Haskell). With f interpreted as the list constructor(s), g the initial value and the step function, and T(m) the recursion operation, the equation essentially asserts the unique existance of the function m given any initial value and any step function defined in g, which necessitates an underlying well-behaved fold together with the underlying type, the list of a.
For example, g :: Unit + (a, Nat) -> Nat could be () -> 0 | (_,n) -> n+1, in which case m defines the length function, or g could be () -> 0 | (_,n) -> 0, then m defines a constant zero function. An important fact here is that, for whatever g, m can always be uniquely defined, just as fold does not impose any contraint on its arguments and always produce a unique well-defined result.
This does not seem to hold for terminal algebras.
Consider the same functor T defined above. The definition of the terminal T-algebra is the same as the initial one, except that m is now of type X -> Y and the equation now becomes m . g = f . T(m). It is said that this should define a potentially infinite list.
I agree that this is sometimes true. For example, when g :: Unit + (Unit, Int) -> Int is defined as () -> 0 | (_,n) -> n+1 like before, m then behaves such that m(0) = () and m(n+1) = Cons () m(n). For non-negative n, m(n) should be a finite list of units. For any negative n, m(n) should be of infinite length. It can be verified that the equation above holds for such g and m.
With any of the two following modified definition of g, however, I don't see any well-defined m anymore.
First, when g is again () -> 0 | (_,n) -> n+1 but is of type g :: Unit + (Bool, Int) -> Int, m must satisfy that m(g((b,i))) = Cons b m(g(i)), which means that the result depends on b. But this is impossible, because m(g((b,i))) is really just m(i+1) which has no mentioning of b whatsoever, so the equation is not well-defined.
Second, when g is again of type g :: Unit + (Unit, Int) -> Int but is defined as the constant zero function g _ = 0, m must satisfy that m(g(())) = Nil and m(g(((),i))) = Cons () m(g(i)), which are contradictory because their left hand sides are the same, both being m(0), while the right hand sides are never the same.
In summary, there are T-algebras that have no morphism into the supposed terminal T-algebra, which implies that the terminal T-algebra does not exist. The theoretical modeling of the codatatype Stream (or infinite list), if any, cannot be based on the nonexistant terminal algebra of the functor T(t) = 1 + a * t.
Many thanks to any hint of any flaw in the story above.
(2) terminal algebras define codatatypes.
This is not right, codatatypes are terminal coalgebras. For your T functor, a coalgebra is a type x together with f :: x -> T x. A T-coalgebra morphism between (x1, f1) and (x2, f2) is a g :: x1 -> x2 such that fmap g . f1 = f2 . g. Using this definition, the terminal T-algebra defines the possibly infinite lists (so-called "colists"), and the terminality is witnessed by the unfold function:
unfold :: (x -> Unit + (a, x)) -> x -> Colist a
Note though that a terminal T-algebra does exist: it is simply the Unit type together with the constant function T Unit -> Unit (and this works as a terminal algebra for any T). But this is not very interesting for writing programs.
it is said that (1) the initial algebras define datatypes, and (2) terminal algebras define codatatypes.
On the second point, it is actually said that terminal coalgebras define codatatypes.
A datatype t is defined by its constructors and a fold.
Constructors can be modelled by an algebra F t -> t (for example, the Peano constructors O : nat S : Nat -> Nat are collected as a single function in : Unit + Nat -> Nat).
The fold then gives the catamorphism fold f : t -> x for any algebra f : F x -> x (for nats, fold : ((Unit + x) -> x) -> Nat -> x).
A codatatype t is defined by its destructors and an unfold.
Destructors can be modelled by a coalgebra t -> F t (for example, streams have two destructors head : Stream a -> a and tail : Stream a -> Stream a, and they are collected as a single function out : Stream a -> a * Stream a).
The unfold then gives the anamorphism unfold f : x -> t for any coalgebra f : x -> F x (for streams, unfold : (x -> a * x) -> x -> Stream a).
(Disclaimer: I'm not 100% sure how codatatype works, especially when not referring to terminal algebras).
A codata type, or coinductive data type, is just one defined by its eliminations rather than its introductions.
It seems that sometimes terminal algebra is used (very confusingly) to refer to a final coalgebra, which is what actually defines a codata type.
Consider the same functor T defined above. The definition of the terminal T-algebra is the same as the initial one, except that m is now of type X -> Y and the equation now becomes m . g = f . T(m). It is said that this should define a potentially infinite list.
So I think this is where you’ve gone wrong: “m ∘ g = f ∘ T(m)” should be reversed, and read “T(m) ∘ f = g ∘ m”. That is, the final coalgebra is defined by a carrier set S and a map g : S → T(S) such that for any other coalgebra (R, f : R → T(R)) there is a unique map m : R → S such that T(m) ∘ f = g ∘ m.
m is uniquely defined recursively by the map that returns Left () whenever f maps to Left (), and Right (x, m xs) whenever f maps to Right (x, xs), i.e. it’s the assignment of the coalgebra to its unique morphism to the final coalgebra, and denotes the unique anamorphism/unfold of this type, which should be easy to convince yourself is in fact a possibly-empty & possibly-infinite stream.
Related
Consider the following wrapper:
newtype F a = Wrap { unwrap :: Int }
I want to disprove (as an exercise to wrap my head around this interesting post) that there’s a legitimate Functor F instance which allows us to apply functions of Int -> Int type to the actual contents and to ~ignore~ all other functions (i. e. fmap nonIntInt = id).
I believe this should be done with a free theorem for fmap (which I read here):
for given f, g, h and k, such that g . f = k . h: $map g . fmap f = fmap k . $map h, where $map is the natural map for the given constructor.
What defines a natural map? Am I right to assume that it is a simple flip const for F?
As far as I get it: $map f is what we denote as Ff in category theory. Thus, in a categorical sense, we simply want something among the lines of the following diagram to commute:
Yet, I do not know what to put instead of ???s (that is, what functor do we apply to get such a diagram and how do we denote this almost-fmap?).
So, what is a natural map in general, and for F? What is the proper diagram for fmap's free theorem?
Where am I going with this?
Consider:
f = const 42
g = id
h = const ()
k () = 42
It is easy to see that f . g is h . k. And yet, the non-existant fmap will execute only f, not k, giving different results. If my intuition about the naturality is correct, such a proof would work. That's what I am trying to figure out.
#leftaroundabout proposed a simpler piece of proof: fmap show . fmap (+1) alters the contents, unlike fmap $ show . (+1). It is a nice piece of proof, and yet I would still like to work with free theorems as an exercise.
So we are entertaining a function m :: forall a b . (a->b) -> F a -> F b such that (among other things)
m (1 +) (Wrap x) = (Wrap (1+x))
m (show) (Wrap x) = (Wrap x)
There are two somewhat related questions here.
Can a well-behaved fmap do this?
Can a parametric function do this?
The answer to both questions is "no".
A well-behaved fmap can't do this because fmap has to obey the axioms of Functor. Whether our environment is parametric or not is irrelevant. The axiom of Functor says that for all functions a and b, fmap (a . b) = fmap a . fmap b must hold, and this fails for a = show and b = (1 +). So m cannot be a well-behaved fmap.
A parametric function can't do this because that is what the parametricity theorem says. When viewing types as relations between terms, related functions take related arguments to related results. It is easy to see that m fails parametricity, but it is slightly easier to look at m': forall a b. (a -> b) -> (Int -> Int) (the two can be trivially converted to each other). (1 +) is related to show because m' is polymorphic in its argument, so different values of the argument can be related by any relation. Functions are relations, and there exists a function that sends (1 +) to show. However, the result type of m' has no type variables, so it corresponds to the constant relation (its values are only related to themselves). Since every value including m' is related to itself, it follows that all parametric functions m :: forall a b. (a -> b) -> (Int -> Int) must obey m f = m g, i.e. they must ignore their first argument. Which is intuitively obvious since there is nothing to apply it to.
One can in fact deduce the first statement from the second by observing that a well-behaved fmap must be parametric. So even if the language allows non-parametricity, fmap cannot make any non-trivial use of it.
A while ago I read that the function type a -> b corresponds to the relation a ≤ b, or is it a ≥ b? This makes sense to me because two types are isomorphic if we have a bijection between them (i.e. (a ≈ b) ≡ (a -> b, b -> a)). Similarly, (a = b) ≡ (a ≤ b) ∧ (a ≥ b).
I know that this is not the Curry-Howard-Lambek correspondence (i.e. the correspondence between type theory, logic and category theory). It's the correspondence between type theory and something else. I want to know learn more about this correspondence. Could somebody point me in the right direction?
I know that this doesn't seem like a programming question but it is related to programming and I'm hoping that some functional programmer knows more about it and can point me in the right direction.
Every pre-ordered set forms a category. Let (S, «) be a pre-ordered set. Define a category C whose objects are the elements of S and with Hom(a, b) inhabited by (a, b) if a « b and uninhabited otherwise. Define composition the only way you possibly can. The category laws follow immediately from the transitivity and reflexivity of the pre-order.
A lattice, in particular, will form a category admitting finite products and coproducts. A bounded lattice will form one with initial and final objects.
Types and functions in a sufficiently well-behaved functional language also form a category with finite products and coproducts, and initial and final objects. So if you squint out to a categorical blur, these things will start to look vaguely similar.
(This is more a comment than an answer, but I need more space.)
The type a -> b corresponds to a <= b. This is useful, e.g., to speak about fixed points at the type level, which are needed to properly define recursive types (lists, trees, ...).
Recall how recursion is solved, without categories. In domain theory, given a function f :: a -> a we look for a least x satisfying f x = x (least fixed point). This turns out to also be the least x satisfying f x <= x (least prefixed point). We then get the induction principle
f y <= y ==> fix f <= y
which basically states that, if we have any prefixed point y, then the least (pre)fixed point fix f must be less than y -- indeed, it is the least!
Now, let's sprinkle some category powder over that. Implication becomes a -> arrow, and <= also becomes ->. We get
(f y -> y) -> fix f -> y
Looks familiar, where did I see that...? Ah!
newtype Fix f = Fix { unFix :: f (Fix f) }
cata :: Functor f => (f y -> y) -> Fix f -> y
cata g = g . fmap (cata g) . unFix
Hence, the cata general eliminator/catamorphism is just a category-empowered version of the good old induction principle.
Note how domain points y are now object in our category (i.e. types). Also, functions f must be applicable to y, so these are not morphisms in our category (which would be function values :: A -> B, from some type to some type), but correspond to functors int the category of types (mapping types to types :: * -> *).
Situation
I have function f, which I want to augment with function g, resulting in function named h.
Definitions
By "augment", in the general case, I mean: transform either input (one or more arguments) or output (return value) of function f.
By "augment", in the specific case, (specific to my current situation) I mean: transform only the output (return value) of function f while leaving all the arguments intact.
By "transparent", in the context of "augmentation", (both the general case and the specific case) I mean: To couple g's implementation as loosely to f's implementation as possible.
Specific case
In my current situation, this is what I need to do:
h a b c = g $ f a b c
I am interested in rewriting it to something like this:
h = g . f -- Doesn't type-check.
Because from the perspective of h and g, it doesn't matter what arguments f take, they only care about the return value, hence it would be tight coupling to mention the arguments in any way. For instance, if f's argument count changes in the future, h will also need to be changed.
So far
I asked lambdabot on the #haskell IRC channel: #pl h a b c = g $ f a b c to which I got the response:
h = ((g .) .) . f
Which is still not good enough since the number of (.)'s is dependent on the number of f's arguments.
General case
I haven't done much research in this direction, but erisco on #haskell pointed me towards http://matt.immute.net/content/pointless-fun which hints to me that a solution for the general case could be possible.
So far
Using the functions defined by Luke Palmer in the above article this seems to be an equivalent of what we have discussed so far:
h = f $. id ~> id ~> id ~> g
However, it seems that this method sadly also suffers from being dependent on the number of arguments of f if we want to transform the return value of f -- just as the previous methods.
Working example
In JavaScript, for instance, it is possible to achieve transparent augmentation like this:
function h () { return g(f.apply(this, arguments)) }
Question
How can a function be "transparently augmented" in Haskell?
I am mainly interested in the specific case, but it would be also nice to know how to handle the general case.
You can sort-of do it, but since there is no way to specify a behavior for everything that isn't a function, you'll need a lot of trivial instances for all the other types you care about.
{-# LANGUAGE TypeFamilies, DefaultSignatures #-}
class Augment a where
type Result a
type Result a = a
type Augmented a r
type Augmented a r = r
augment :: (Result a -> r) -> a -> Augmented a r
default augment :: (a -> r) -> a -> r
augment g x = g x
instance Augment b => Augment (a -> b) where
type Result (a -> b) = Result b
type Augmented (a -> b) r = a -> Augmented b r
augment g f x = augment g (f x)
instance Augment Bool
instance Augment Char
instance Augment Integer
instance Augment [a]
-- and so on for every result type of every function you want to augment...
Example:
> let g n x ys = replicate n x ++ ys
> g 2 'a' "bc"
"aabc"
> let g' = augment length g
> g' 2 'a' "bc"
4
> :t g
g :: Int -> a -> [a] -> [a]
> :t g'
g' :: Int -> a -> [a] -> Int
Well, technically, with just enough IncoherentInstances you can do pretty much anything:
{-# LANGUAGE MultiParamTypeClasses, TypeFamilies,
FlexibleInstances, UndecidableInstances, IncoherentInstances #-}
class Augment a b f h where
augment :: (a -> b) -> f -> h
instance (a ~ c, h ~ b) => Augment a b c h where
augment = ($)
instance (Augment a b d h', h ~ (c -> h')) => Augment a b (c -> d) h where
augment g f = augment g . f
-- Usage
t1 = augment not not
r1 = t1 True
t2 = augment (+1) (+)
r2 = t2 2 3
t3 = augment (+1) foldr
r3 = t3 (+) 0 [2,3]
The problem is that the real return value of something like a -> b -> c isn't
c, but b -> c. What you want require some kind of test that tells you if a type isn't
a function type. You could enumerate the types you are interested in, but that's not so
nice. I think HList solve this problem somehow, look at the paper. I managed to understand a bit of the solution with overlapping instances, but the rest goes a bit over my head I'm afraid.
JavaScript works, because its arguments are a sequence, or a list, so there is just one argument, really. In that sense it is the same as a curried version of the functions with a tuple representing the collection of arguments.
In a strongly typed language you need a lot more information to do that "transparently" for a function type - for example, dependent types can express this idea, but require the functions to be of specific types, not a arbitrary function type.
I think I saw a workaround in Haskell that can do this, too, but, again, that works only for specific types, which capture the arity of the function, not any function.
I was somewhat undecided as to whether this was a math.SE question or an SO one, but I suspect that mathematicians in general are fairly unlikely to know or care much about this category in particular, whereas Haskell programmers might well do.
So, we know that Hask has products, more or less (I'm working with idealised-Hask, here, of course). I'm interested in whether or not it has equalisers (in which case it would have all finite limits).
Intuitively it seems not, since you can't do separation like you can on sets, and so subobjects seem hard to construct in general. But for any specific case you'd like to come up with, it seems like you'd be able to hack it by working out the equaliser in Set and counting it (since after all, every Haskell type is countable and every countable set is isomorphic either to a finite type or the naturals, both of which Haskell has). So I can't see how I'd go about finding a counterexample.
Now, Agda seems a bit more promising: there it is relatively easy to form subobjects. Is the obvious sigma type Σ A (λ x → f x == g x) an equaliser? If the details don't work, is it morally an equaliser?
tl;dr the proposed candidate is not quite an equaliser, but its irrelevant counterpart is
The candidate for an equaliser in Agda looks good. So let's just try it. We'll need some basic kit. Here are my refusenik ASCII dependent pair type and homogeneous intensional equality.
record Sg (S : Set)(T : S -> Set) : Set where
constructor _,_
field
fst : S
snd : T fst
open Sg
data _==_ {X : Set}(x : X) : X -> Set where
refl : x == x
Here's your candidate for an equaliser for two functions
Q : {S T : Set}(f g : S -> T) -> Set
Q {S}{T} f g = Sg S \ s -> f s == g s
with the fst projection sending Q f g into S.
What it says: an element of Q f g is an element s of the source type, together with a proof that f s == g s. But is this an equaliser? Let's try to make it so.
To say what an equaliser is, I should define function composition.
_o_ : {R S T : Set} -> (S -> T) -> (R -> S) -> R -> T
(f o g) x = f (g x)
So now I need to show that any h : R -> S which identifies f o h and g o h must factor through the candidate fst : Q f g -> S. I need to deliver both the other component, u : R -> Q f g and the proof that indeed h factors as fst o u. Here's the picture: (Q f g , fst) is an equalizer if whenever the diagram commutes without u, there is a unique way to add u with the diagram still commuting.
Here goes existence of the mediating u.
mediator : {R S T : Set}(f g : S -> T)(h : R -> S) ->
(q : (f o h) == (g o h)) ->
Sg (R -> Q f g) \ u -> h == (fst o u)
Clearly, I should pick the same element of S that h picks.
mediator f g h q = (\ r -> (h r , ?0)) , ?1
leaving me with two proof obligations
?0 : f (h r) == g (h r)
?1 : h == (\ r -> h r)
Now, ?1 can just be refl as Agda's definitional equality has the eta-law for functions. For ?0, we are blessed by q. Equal functions respect application
funq : {S T : Set}{f g : S -> T} -> f == g -> (s : S) -> f s == g s
funq refl s = refl
so we may take ?0 = funq q r.
But let us not celebrate prematurely, for the existence of a mediating morphism is not sufficient. We require also its uniqueness. And here the wheel is likely to go wonky, because == is intensional, so uniqueness means there's only ever one way to implement the mediating map. But then, our assumptions are also intensional...
Here's our proof obligation. We must show that any other mediating morphism is equal to the one chosen by mediator.
mediatorUnique :
{R S T : Set}(f g : S -> T)(h : R -> S) ->
(qh : (f o h) == (g o h)) ->
(m : R -> Q f g) ->
(qm : h == (fst o m)) ->
m == fst (mediator f g h qh)
We can immediately substitute via qm and get
mediatorUnique f g .(fst o m) qh m refl = ?
? : m == (\ r -> (fst (m r) , funq qh r))
which looks good, because Agda has eta laws for records, so we know that
m == (\ r -> (fst (m r) , snd (m r)))
but when we try to make ? = refl, we get the complaint
snd (m _) != funq qh _ of type f (fst (m _)) == g (fst (m _))
which is annoying, because identity proofs are unique (in the standard configuration). Now, you can get out of this by postulating extensionality and using a few other facts about equality
postulate ext : {S T : Set}{f g : S -> T} -> ((s : S) -> f s == g s) -> f == g
sndq : {S : Set}{T : S -> Set}{s : S}{t t' : T s} ->
t == t' -> _==_ {Sg S T} (s , t) (s , t')
sndq refl = refl
uip : {X : Set}{x y : X}{q q' : x == y} -> q == q'
uip {q = refl}{q' = refl} = refl
? = ext (\ s -> sndq uip)
but that's overkill, because the only problem is the annoying equality proof mismatch: the computable parts of the implementations match on the nose. So the fix is to work with irrelevance. I replace Sg by the Existential quantifier, whose second component is marked as irrelevant with a dot. Now it matters not which proof we use that the witness is good.
record Ex (S : Set)(T : S -> Set) : Set where
constructor _,_
field
fst : S
.snd : T fst
open Ex
and the new candidate equaliser is
Q : {S T : Set}(f g : S -> T) -> Set
Q {S}{T} f g = Ex S \ s -> f s == g s
The entire construction goes through as before, except that in the last obligation
? = refl
is accepted!
So yes, even in the intensional setting, eta laws and the ability to mark fields as irrelevant give us equalisers.
No undecidable typechecking was involved in this construction.
Hask
Hask doesn't have equalizers. An important thing to remember is that thinking about a type (or the objects in any category) and their isomorphism classes really requires thinking about the arrows. What you say about the underlying sets is true, but types with isomorphic underlying sets certainly aren't necessarily isomorphic. One difference between Hask and Set as that Hask's arrows must be computable, and in fact for idealized Hask, they must be total.
I spent a while trying to come up with a real defensible counterexample, and found some references suggesting it cannot be done, but without proofs. However, I do have some "moral" counterexamples if you will; I cannot prove that no equalizer exists in Haskell, but it certainly seems impossible!
Example 1
f, g: ([Int], Int) -> Int
f (p,v) = treat p as a polynomial with given coefficients, and evaluate p(v).
g _ = 0
The equalizer "should" be the type of all pairs (p,n) where p(n) = 0, along with a function injecting these pairs into ([Int], Int). By Hilbert's 10th problem, this set is undecidable. It seems to me that this should exclude the possibility of it being a Haskell type, but I can't prove that (is it possible that there's some bizarre way to construct this type that nobody has discovered?). It maybe I haven't connected a dot or two -- perhaps proving this is impossible isn't hard?
Example 2
Say you have a programing language. You have a compiler that takes the source code and an input and produces a function, for which the fixed point of the function is the output. (While we don't have compilers like this, specifying semantics sort of like this isn't unheard of). So, you have
compiler : String -> Int -> (Int -> Int)
(Un)curry that into a function
compiler' : (String, Int, Int) -> Int
and add a function
id' : (String, Int, Int) -> Int
id' (_,_,x) = x
Then the equalizer of compiler', id' would be the collection of triplets of source program, input, output -- and this is uncomputable because the programing language is fully general.
More Examples
Pick your favorite undecidable problem: it generally involves deciding if an object is the member of some set. You often have a total function that can be used to check this property for a particular object. You can use this function to create an equalizer where the type should be all the items in your undecidable set. That's where the first two examples came from, and there are tons more.
Agda
I'm not as familiar with Agda. My intuition is that your sigma-type should be an equalizer: you can write the type down, along with the necessary injection function, and it looks like it satisfies the definition entirely. However, as someone who doesn't use Agda, I don't think I'm really qualified to check the details.
The real practical issue though is that typechecking that sigma type won't always be computable, so it's not always useful to do this. In all the examples above, you can write down the Sigma type you provided, but you won't be able to readily check if something is a member of that type without a proof.
Incidentally, this is why Haskell shouldn't be able to have equalizers: if it did, the typechecking would be undecidable! Dependent types is what makes everything tick. They should be able to express interesting mathematical structures in its types, while Haskell can't since its typesystem is decideable. So, I would naturally expect idealized Agda to have all finite limits (I would be disappointed otherwise). The same goes for other dependently typed languages; Coq, for example, should definitely have all limits.
So I understand the basic algebraic interpretation of types:
Either a b ~ a + b
(a, b) ~ a * b
a -> b ~ b^a
() ~ 1
Void ~ 0 -- from Data.Void
... and that these relations are true for concrete types, like Bool, as opposed to polymorphic types like a. I also know how to translate type signatures with polymorphic types into their concrete type representations by just translating the Church encoding according to the following isomorphism:
(forall r . (a -> r) -> r) ~ a
So if I have:
id :: forall a . a -> a
I know that it does not mean id ~ a^a, but it actually means:
id :: forall a . (() -> a) -> a
id ~ ()
~ 1
Similarly:
pair :: forall r . (a -> b -> r) -> r
pair ~ ((a, b) -> r) - > r
~ (a, b)
~ a * b
Which brings me to my question. What is the "algebraic" interpretation of this rule:
(forall r . (a -> r) -> r) ~ a
For every concrete type isomorphism I can point to an equivalent algebraic rule, such as:
(a, (b, c)) ~ ((a, b), c)
a * (b * c) = (a * b) * c
a -> (b -> c) ~ (a, b) -> c
(c^b)^a = c^(b * a)
But I don't understand the algebraic equality that is analogous to:
(forall r . (a -> r) -> r) ~ a
This is the famous Yoneda lemma for the identity functor.
Check this post for a readable introduction, and any category theory textbook for more.
Briefly, given f :: forall r. (a -> r) -> r you can apply f id to get an a, and conversely, given x :: a you can take ($x) to get forall r. (a -> r) -> r.
These operations are mutually inverse. Proof:
Obviously ($x) id == x. I will show that
($(f id)) == f,
since functions are equal when they are equal on all arguments, let's take x :: a -> r and show that
($(f id)) x == f x i.e.
x (f id) == f x.
Since f is polymorphic, it works as a natural transformation; this is the naturality diagram for f:
f_A
Hom(A, A) → A
(x.) ↓ ↓ x
Hom(A, R) → R
f_R
So x . f == f . (x.).
Plugging identity, (x . f) id == f x. QED
(Rewritten for clarity)
There seem to be two parts to your question. One is implied and is asking what the algebraic interpretation of forall is, and the other is asking about the cont/Yoneda transformation, which sdcvvc's answer already covered pretty well.
I'll try to address the algebraic interpretation of forall for you. You mention that A -> B is B^A but I'd like to take that a step further and expand it out to B * B * B * ... * B (|A| times). Although we do have exponentiation as a notation for repeated multiplication like that, there's a more flexible notation, ∏ (uppercase Pi) representing arbitrary indexed products. There are two components to a Pi: the range of values we want to multiply over, and the expression that we're multiplying out. For example, at the value level, you might express the factorial function as fact i = ∏ [1..i] (λx -> x).
Going back to the world of types, we can view the exponentiation operator in the A -> B ~ B^A correspondence as a Pi: B^A ~ ∏ A (λ_ -> B). This says that we're defining an A-ary product of Bs, such that the Bs cannot depend on the particular A we've chosen. Sure, it's equivalent to plain exponentiation, but it lets us move up to cases in which there is a dependence.
In the most general case, we get dependent types, like what you see in Agda or Coq: in Agda syntax, replicate : Bool -> ((n : Nat) -> Vec Bool n) is one possible application of a Pi type, which could be expressed more explicitly as replicate : Bool -> ∏ Nat (Vec Bool), or further as replicate : ∏ Bool (λ_ -> ∏ Nat (Vec Bool)).
Note that as you might expect from the underlying algebra, you can fuse both of the ∏s in the definition of replicate above into a single ∏ ranging over the cartesian product of the domains: ∏ Bool (\_ -> ∏ Nat (Vec Bool)) is equivalent to ∏ (Bool, Nat) (λ(_, n) -> Vec Bool n) just like it would be at the "value level". This is simply uncurrying from the perspective of type theory.
I do realize your question was about polymorphism, so I'll stop going on about dependent types, but they are relevant: forall in Haskell is roughly equivalent to a ∏ with a domain over the type (kind) of types, *. Indeed, the function-like behavior of polymorphism can be observed directly in GHC core, which types them as capital lambdas (Λ). As such, a polymorphic type like forall a. a -> a is actually just ∏ * (Λ a -> (a -> a)) (using the Λ notation now that we distinguish between types and values), which can be expanded out to the infinite product (Bool -> Bool, Int -> Int, () -> (), (Int -> Bool) -> (Int -> Bool), ...) for every possible type. Instantiation of the type variable is simply projecting out the suitable element from the *-ary product (or applying the type function).
Now, for the big piece I missed in my original version of this answer: parametricity. Parametricity can be described in several different ways, but none of the ones I know of (viewing types as relations, or (di)naturality in category theory) really has a very algebraic interpretation. For our purposes, though, it boils down to something fairly simple: you can't pattern-match on *. I know that GHC lets you do that at the type level with type families, but you can only cover a finite chunk of * when doing that, so there are necessarily always points at which your type family is undefined.
What this means, from the point of view of polymorphism, is that any type function F we write in ∏ * F must either be constant (i.e., completely ignore the type it was polymorphic over) or pass the type through unchanged. Thus, ∏ * (Λ _ -> B) is valid because it ignores its argument, and corresponds to forall a. B. The other case is something like ∏ * (Λ x -> Maybe x), which corresponds to forall a. Maybe a, which doesn't ignore the type argument, but only "passes it through". As such, a ∏ A that has an irrelevant domain A (such as when A = *) can be seen as more of an A-ary indexed intersection (picking the common elements across all instantiations of the index), rather than a product.
Crucially, at the value level, the rules of parametricity prevent any funny behavior that might suggest the types are larger than they really are. Because we don't have typecase, we can't construct a value of type forall a. B that does something different based on what a was instantiated to. Thus, although the type is technically a function * -> B, it is always a constant function, and is thus equivalent to a single value of B. Using the ∏ interpretation, it is indeed equivalent to an infinite *-ary product of Bs, but those B values must always be identical, so the infinite product is effectively as big as a single B.
Similarly, although ∏ * (Λ x -> (x -> x)) (a.k.a., forall a. a -> a) is technically equivalent to an infinite product of functions, none of those functions can inspect the type, so all are constrained to only return their input value and not do any funny business like (+1) : Int -> Int when instantiated to Int. Because there is only one (assuming a total language) function that can't inspect the type of its argument but must return a value of that same type, the infinite product is thus just as large as a single value.
Now, about your direct question on (forall r . (a -> r) -> r) ~ a. First, let's express your ~ operator more formally. It's really isomorphism, so we need two functions going back and forth, and an argument that they're inverses.
data Iso a b = Iso
{ to :: a -> b
, from :: b -> a
-- proof1 :: forall x. to (from x) == x
-- proof2 :: forall x. from (to x) == x
}
and now we express your original question in more formal terms. Your question amounts to constructing a term of the following (impredicative, so GHC has trouble with it, but we'll survive) type:
forall a. Iso (forall r. (a -> r) -> r) a
Which, using my earlier terminology, amounts to ∏ * (Λ a -> Iso (∏ * (Λ r -> ((a -> r) -> r))) a). Once again we have an infinite product that can't inspect its type argument. By handwaving, we can argue that the only possible values considering the parametricity rules (the other two proofs are respected automatically) for to and from are ($ id) and flip id.
If this feels unsatisfying, it's probably because the algebraic interpretation of forall didn't really add anything to the proof. It's really just plain old type theory, but I hope I was able to provide something that feels a little less categorical than the Yoneda form of it. It's worth noting that we don't actually need to use parametricity to write proof1 and proof2 above, though. Parametricity only enters the picture when we want to state that ($ id) and flip id are our only options for to and from (which we can't prove in Agda or Coq, for that reason).
To (attempt to) answer the actual question (which is less interesting than the answers to the broader issues raised), the question is ill formed because of a "type error"
Either ~ (+)
(,) ~ (*)
(->) b ~ flip (^)
() ~ 1
Void ~ 0
These all map types to integers, and type constructors to functions on naturals. In a sense, you have a functor from the category of types to the category of naturals. In the other direction, you "forget" stuff, since the types preserve algebraic structure while the naturals throw it away. I.e. given Either () () you can get a unique natural, but given that natural, you can get many types.
But this is different:
(forall r . (a -> r) -> r) ~ a
It maps a type to another type! It is not part of the above functor. It's just an isomorphism within the category of types. So let's give that a different symbol, <=>
Now we have
(forall r . (a -> r) -> r) <=> a
Now you note that we can not only send types to nats and arrows to arrows, but also some isomorphisms to other isomorphisms:
(a, (b, c)) <=> ((a, b), c) ~ a * (b * c) = (a * b) * c
But something subtle is going on here. In a sense, the latter isomorphism on pairs is true because the algebraic identity is true. This is to say that the "isomorphism" in the latter simply means that the two types are equivalent under the image of our functor to the nats.
The former isomorphism we need to prove directly, which is where we start to get to the underlying question -- is given our functor to the nats, what does forall r. map to? But the answer is that forall r. is neither a type, nor a meaningful arrow between types.
By introducing forall, we have moved away from first order types. There's no reason to expect that forall should fit in our above Functor, and indeed, it doesn't.
So we can explore, as others have above, why the isomorphism holds (which is itself very interesting) -- but in doing so we've abandoned the algebraic core of the question. A question which can be answered, I think, is, given the category of higher-order types and constructors as arrows between them, what is there meaningful Functor to?
Edit:
So now I have another approach which shows why adding polymorphism makes things go nuts. We start by asking a simpler question -- does a given polymorphic type have zero or more than zero inhabitants? This is the type inhabitation problem, and winds up being, via Curry-Howard, a problem in modified realizability, since it's the same thing as asking if a formula in some logic is realizable in an appropriate computational model. Now as that page explains, this is decidable in the simply typed lambda calculus but is PSPACE-complete. But once we move to anything more complicated, by adding polymorphism for example and going to System F, then it goes to undecidable!
So, if we can't decide if an arbitrary type is inhabited at all, then we clearly can't decide how many inhabitants it has!
It's an interesting question. I don't have a full answer, but this was too long for a comment.
The type signature (forall r. (a -> r) -> r) can be expressed as me saying
For any type r that you care to name, if you give me a function that takes a and produces an r, then I will give you back an r.
Now, this has to work for any type r, but it can be a specific type a. So the way for me to pull of this neat trick is to have an a sitting around somewhere, that I feed to the function (which produces an r for me) and then I hand that r back to you.
But if I have an a sitting around, I could give it to you:
If you give me a 1, I'll give you an a.
which corresponds to the type signature 1 -> a or simply a. By this informal argument we have
(forall r. (a -> r) -> r) ~ a
The next step would be to generate the corresponding algebraic expression, but I'm not clear on how the algebraic quantities interact with the universal quantification. We may need to wait for an expert!
A few links to the nLab:
Universal quantifier, corresponds to dependent product.
Existential quantifier, corresponds to dependent sum (dependent coproduct).
Thus, in settings of category theory:
Type | Modeled¹ as | In category
-------------------+---------------------------+-------------
Unit | Terminal object | CCC
Bottom | Initial object |
Record | Product |
Union | Sum (coproduct) |
Function | Exponential |
-------------------+---------------------------+-------------
Dependent product² | Right adjoint to pullback | LCCC
Dependent sum | Left adjoint to pullback |
¹) in appropriate category ─ CCC for total and non-polymorphic subset of Haskell (link), CPO for non-total traits of Haskell (link), LCCC for dependently typed languages.
²) forall quantification is a special case of dependent product:
∀(x :: *). y[x] ~ ∏(x : Set)y[x]
where Set is the universe of all small types.