Consider the following wrapper:
newtype F a = Wrap { unwrap :: Int }
I want to disprove (as an exercise to wrap my head around this interesting post) that there’s a legitimate Functor F instance which allows us to apply functions of type Int -> Int to the actual contents and to ~ignore~ all other functions (i.e. fmap nonIntInt = id).
I believe this should be done with a free theorem for fmap (which I read here):
for given f, g, h and k, such that g . f = k . h: $map g . fmap f = fmap k . $map h, where $map is the natural map for the given constructor.
What defines a natural map? Am I right to assume that it is a simple flip const for F?
As far as I get it: $map f is what we denote as Ff in category theory. Thus, in a categorical sense, we simply want something along the lines of the following diagram to commute:
Yet, I do not know what to put instead of ???s (that is, what functor do we apply to get such a diagram and how do we denote this almost-fmap?).
So, what is a natural map in general, and for F? What is the proper diagram for fmap's free theorem?
Where am I going with this?
Consider:
f = const 42
g = id
h = const ()
k () = 42
It is easy to see that g . f is k . h (both are const 42). And yet, the nonexistent fmap will execute only f, not k, giving different results. If my intuition about the naturality is correct, such a proof would work. That's what I am trying to figure out.
@leftaroundabout proposed a simpler piece of proof: fmap show . fmap (+1) alters the contents, unlike fmap $ show . (+1). It is a nice piece of proof, and yet I would still like to work with free theorems as an exercise.
So we are entertaining a function m :: forall a b . (a->b) -> F a -> F b such that (among other things)
m (1 +) (Wrap x) = (Wrap (1+x))
m (show) (Wrap x) = (Wrap x)
There are two somewhat related questions here.
Can a well-behaved fmap do this?
Can a parametric function do this?
The answer to both questions is "no".
A well-behaved fmap can't do this because fmap has to obey the axioms of Functor. Whether our environment is parametric or not is irrelevant. The axiom of Functor says that for all functions a and b, fmap (a . b) = fmap a . fmap b must hold, and this fails for a = show and b = (1 +). So m cannot be a well-behaved fmap.
A parametric function can't do this because that is what the parametricity theorem says. When viewing types as relations between terms, related functions take related arguments to related results. It is easy to see that m fails parametricity, but it is slightly easier to look at m' :: forall a b. (a -> b) -> (Int -> Int) (the two can be trivially converted to each other). (1 +) is related to show because m' is polymorphic in its argument, so different values of the argument can be related by any relation. Functions are relations, and there exists a function that sends (1 +) to show. However, the result type of m' has no type variables, so it corresponds to the constant relation (its values are only related to themselves). Since every value including m' is related to itself, it follows that all parametric functions m' :: forall a b. (a -> b) -> (Int -> Int) must obey m' f = m' g, i.e. they must ignore their first argument. This is intuitively obvious, since there is nothing to apply it to.
One can in fact deduce the first statement from the second by observing that a well-behaved fmap must be parametric. So even if the language allows non-parametricity, fmap cannot make any non-trivial use of it.
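To make the failure of the composition law concrete, here is a minimal sketch of a deliberately non-parametric m. It assumes we are allowed to add Typeable constraints (which a real fmap cannot have) so that the function's type can be inspected at run time; the names m, composedFirst and mappedSeparately are only illustrative.

```haskell
import Data.Typeable (Typeable, cast)

newtype F a = Wrap { unwrap :: Int }

-- Hypothetical, non-parametric "fmap": the Typeable constraints let it
-- test at run time whether the function has type Int -> Int.
m :: (Typeable a, Typeable b) => (a -> b) -> F a -> F b
m f (Wrap x) =
  case (cast f :: Maybe (Int -> Int)) of
    Just g  -> Wrap (g x)   -- an Int -> Int function is applied
    Nothing -> Wrap x       -- every other function is ignored

-- Mapping the composite: show . (+ 1) is not Int -> Int, so nothing happens.
composedFirst :: Int
composedFirst = unwrap (m (show . (+ 1) :: Int -> String) (Wrap 0))

-- Mapping in two steps: (+ 1) is applied, show is ignored.
mappedSeparately :: Int
mappedSeparately =
  unwrap ((m (show :: Int -> String) . m ((+ 1) :: Int -> Int)) (Wrap 0))
```

Here composedFirst evaluates to 0 while mappedSeparately evaluates to 1, so m (a . b) and m a . m b disagree, which is exactly the violation described above.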
I have code (in C# actually, but this question has nothing to do with C# specifically, so I will speak of all my types in Haskell-speak) where I am working inside of an Either a b. I then bind a function with a signature that in Haskell-speak is b -> (c, d), after which I want to pull c to the outside and default it in the left case, i.e. I want (c, Either a d). Now this pattern occurred many times in one particular service I was writing, so I pulled out a method to do it. However, it bothers me whenever I just "make up" a method like this without understanding the correct theoretical underpinnings. In other words, what abstraction are we dealing with here?
I had a similar situation in some F# code where my pair and my either were reversed: (a, b) -> (b -> Either c d) -> Either c (a, d). I asked a friend what this was and he turned me on to traverse which made me very happy even though I have to make horrifically monomorphic implementations in F# due to the lack of typeclasses. (I wish I could remap my F1 in Visual Studio to Hackage; it is one of my primary resources for writing .NET code). The problem though is that traverse is:
class (Functor t, Foldable t) => Traversable t where
  traverse :: Applicative f => (a -> f b) -> t a -> f (t b)
Which means it works great when you start with a pair and want to "bind" an either to it, but does not work when you start with an either and want to end up with a pair, because pair is not an Applicative.
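For reference, the pair-then-either case can be sketched with traverse from base, specialised to t = (,) a and f = Either c. The names bindInPair and halve are made up for this illustration.

```haskell
-- traverse specialised to pairs on the outside and Either as the effect.
bindInPair :: (b -> Either c d) -> (a, b) -> Either c (a, d)
bindInPair = traverse

-- Hypothetical example function that can fail.
halve :: Int -> Either String Int
halve n
  | even n    = Right (n `div` 2)
  | otherwise = Left ("odd input: " ++ show n)
```

With these definitions, bindInPair halve ("tag", 10) is Right ("tag", 5), and bindInPair halve ("tag", 7) is Left "odd input: 7".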
However, I thought about my first case more, the one that is not traverse, and realized that "defaulting c in the left case" can just be done with mapping over the left case, which changes the problem to having this shape: Either (c, a) (c, d) -> (c, Either a d) which I recognize as the pattern that we see in arithmetic with multiplication and addition: a(b + c) = ab + ac. I also remembered that the same pattern exists in Boolean algebra and in set theory (if memory serves, A intersect (B union C) = (A intersect B) union (A intersect C)). Clearly there is some abstract algebraic structure here. However, memory does not serve, and I could not remember what it was called. A little poking around on Wikipedia quickly solved this: these are the distributive laws. And joy, oh joy, Kmett has given us distribute:
class Functor g => Distributive g where
  distribute :: Functor f => f (g a) -> g (f a)
It even has a cotraverse because it is dual to Traversable! Lovely!! However, I noticed that there is no (,) instance. Uh oh. Because, yeah, where does the "default c value" come into all this? Then I realized, uh oh, perhaps I need something like a bidistributive based on a bifunctor? Perhaps dual to bitraversable? Conceptually:
class Bifunctor g => Bidistributive g where
  bidistribute :: Bifunctor f => f (g a b) (g a c) -> g a (f b c)
This seems to be the structure of the distributive law I am talking about. I can't find such a thing in Haskell which doesn't matter to me in and of itself since I am actually writing C#. However, the thing that is important to me is to not be coming up with bogus abstractions, and yet to recognize as many lawful abstractions in my code as possible, whether they are expressed as such or not, for my own understanding.
I currently have a .InsideOut(<default>) function (extension method) in my C# code (what a hack, right!). Would I be totally off-base to create a (yes, sadly monomorphic) .Bidistribute(...) function (extension method) to replace it and map the "default" for the left case into the left case before invoking it (or just recognize the "bidistributive" character of "inside out")?
bidistribute can't be implemented as such. Consider the trivial example
data Biconst c a b = Biconst c
instance Bifunctor (Biconst c) where
  bimap _ _ (Biconst c) = Biconst c
Then we'd have the specialisation
bidistribute :: Biconst () (Void, ()) (Void, ()) -> (Void, Biconst () () ())
bidistribute (Biconst ()) = ( ????, Biconst () )
There's clearly no way to fill in the gap, which would need to have type Void.
Actually, I think you really need Either there (or something isomorphic to it) rather than an arbitrary bifunctor. Then your function is just
uncozipL :: Functor f => Either (f a) (f b) -> f (Either a b)
uncozipL (Left l) = Left <$> l
uncozipL (Right r) = Right <$> r
It's defined in adjunctions (found using Hoogle).
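As a rough sketch of how this fits the original problem, the "inside out" operation can be written by first mapping the default into the left case and then applying uncozipL with f = (,) c. To keep the example self-contained, uncozipL is redefined locally rather than imported from adjunctions, and insideOut is a hypothetical name borrowed from the question.

```haskell
-- Local copy of uncozipL, so that no extra package is needed.
uncozipL :: Functor f => Either (f a) (f b) -> f (Either a b)
uncozipL (Left l)  = Left <$> l
uncozipL (Right r) = Right <$> r

-- Hypothetical helper: pair the Left case with a default c,
-- then distribute the pair over the Either.
insideOut :: c -> Either a (c, d) -> (c, Either a d)
insideOut def = uncozipL . either (\a -> Left (def, a)) Right
```

For example, insideOut "none" (Left 'x') is ("none", Left 'x'), and insideOut "none" (Right ("log", 3)) is ("log", Right 3).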
Based on @leftaroundabout's tip-off to look at adjunctions, in addition to uncozipL that he mentions in his answer, if we defer the "default the first value of the pair in the left case of either", we can also solve this with unzipR:
unzipR :: Functor u => u (a, b) -> (u a, u b)
Then it would still be necessary to map over the first element in the pair and pull out the value with something like either (const "default") id. The interesting thing about this is that if you use uncozipL, you need to know that one of the things is a pair. If you use unzipR, you need to know that one is an either. In neither case do you use an abstract bifunctor.
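A minimal sketch of the unzipR route, again with a local definition instead of the one from adjunctions; insideOutViaUnzip is a hypothetical name.

```haskell
-- Local copy of unzipR, so that no extra package is needed.
unzipR :: Functor u => u (a, b) -> (u a, u b)
unzipR u = (fmap fst u, fmap snd u)

-- Split the Either first, then collapse the c side with a default.
insideOutViaUnzip :: c -> Either a (c, d) -> (c, Either a d)
insideOutViaUnzip def e =
  let (cs, ds) = unzipR e            -- cs :: Either a c, ds :: Either a d
  in  (either (const def) id cs, ds)
```

This gives the same results as the uncozipL version: the default is only used when the input is a Left.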
Further, it seems that the pattern or abstraction that I'm looking for is a distributive lattice. Wikipedia says:
A lattice (L,∨,∧) is distributive if the following additional identity holds for all x, y, and z in L:
x ∧ (y ∨ z) = (x ∧ y) ∨ (x ∧ z).
which is exactly the property I have observed occurring in many different places.
I am learning Haskell, and on the internet I found this paper by Philip Wadler.
I read it and did not understand it at all, but it is somehow connected to polymorphic functions.
For example:
polyfunc :: a -> a -> a
It is a polymorphic function that works at any type.
What is the free theorem for the example polyfunc?
I feel like if I actually understood that paper then any code I wrote would be coauthored by God.
My best guess for this problem though is that all polyfunc can do is either always return the first argument or always return the second argument. So there are actually only two implementations of polyfunc,
polyfuncA a _ = a
polyfuncB _ b = b
The paper gives you a way to prove that claim.
This is a very important concept. For example, I've been involved in data quality research previously. This free theorem says that there is no function which can select the best data from two arbitrary pieces of data. We have to know something more. It's actually a no-brainer, yet I was surprised to find some people willing to overlook it.
I've never really understood the algorithm laid out in that paper either, so I thought I would try to figure it out.
(1) Type of function in question
f :: a -> a -> a
(2) Rephrasing as a relation
f : ∀X. X -> X -> X
(3) By parametricity
(f, f) ∈ ∀X. X -> X -> X
(4) By definition of ∀ on relations
for all Q : A <=> A',
(fA, fA') ∈ Q -> Q -> Q
(5) Applying definition of -> on relations to the first -> in (4)
for all Q : A <=> A',
for all (x, x') ∈ Q,
(fA x, fA' x') ∈ Q -> Q
(6) Applying definition of -> on relations to (5)
for all Q : A <=> A',
for all (x, x') ∈ Q,
for all (y, y') ∈ Q,
(fA x y, fA' x' y') ∈ Q
At this point I was done expanding the relational definition, but wasn't sure how to get this back from relations into terms of functions and types, so I went and found a webapp that will automatically derive the free theorem for a type. I won't spoil (yet) what result it gives, but looking at it did help me figure out the next step in my proof.
The next step is to get back into function-land from relation-land, by noting that Q can be taken to be the graph of any function at all, and the statement will still hold.
(7) Specializing Q to a function g :: p -> q
for all p, q
for all g :: p -> q
where g x = x'
and g y = y'
g (f x y) = f x' y'
(8) By definitions of x' and y'
for all p, q
for all g :: p -> q
g (f x y) = f (g x) (g y)
That looks true, right? It is equivalent to use g to transform both elements and then let f choose between them, or to let f choose an element and then transform it with g. By parametricity, f can't change whether it chooses the left or right element based on anything that g does.
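As a sanity check (not a proof), the derived law g (f x y) = f (g x) (g y) can be tested for the two candidate implementations. This sketch assumes the RankNTypes extension so that a polymorphic f can be passed as an argument; satisfiesLaw is a made-up name.

```haskell
{-# LANGUAGE RankNTypes #-}

polyfuncA :: a -> a -> a
polyfuncA a _ = a

polyfuncB :: a -> a -> a
polyfuncB _ b = b

-- Checks g (f x y) == f (g x) (g y) for one particular g, x and y.
satisfiesLaw :: Eq q => (forall a. a -> a -> a) -> (p -> q) -> p -> p -> Bool
satisfiesLaw f g x y = g (f x y) == f (g x) (g y)
```

For instance, satisfiesLaw polyfuncA show (1 :: Int) 2 and satisfiesLaw polyfuncB length "ab" "cde" both evaluate to True.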
Of course the claim given in trevor cook's answer is true: f must either always choose its first argument, or always choose its second. I'm not sure whether the free theorem I derived is equivalent to that, or is a weaker version of it.
And incidentally, this is a special case of something that is already covered explicitly in the paper. It gives the theorem for
k :: x -> y -> x
which of course is the same as your function f, where x ~ a and y ~ a. The result that it gives is the same as the one I described:
for all a, b, x, y
a (k x y) = k (a x) (b y)
if we choose b=a to make the two results equivalent.
Wadler's "Theorems for free" (TFF) paper is not a good reference for learning about relational parametricity. The TFF paper is focused on abstract theory but, if you need a practical algorithm, the paper does not actually give it to you. The explanations in TFF have a number of important omissions that will confuse anyone who does not already know a great deal about relational parametricity.
The first thing TFF does not explain is that a function will satisfy a "free theorem" only if the code of the function is fully parametric (restricted in a number of ways).
"Fully parametric" code is a function with type parameters, whose arguments are typed using only those type parameters, and whose code is purely functional and does not try to examine at run time what types are assigned to the type parameters. The code must treat all types as totally unknown, arbitrary type parameters. The code must work with all types in the same way.
With these restrictions, the code will satisfy a certain law, in most cases this will be the "naturality" law, but in some cases the law will be more complicated. The paper "Theorems for free" shows many examples of such laws but does not explain a general algorithm for deriving those theorems.
To prove that those laws always hold, one uses the technique of relational parametricity. This is a complicated and powerful technique where one replaces functions (viewed as many-to-one binary relations) by arbitrary (many-to-many) binary relations and then reformulates the naturality law in terms of relations. The result is a "relational naturality law". At the end, one replaces relations again by functions and tries to derive an equation.
I recently recorded a tutorial about relational parametricity, with code examples in Scala. https://www.youtube.com/watch?v=Jf2VFB90Q0s&list=PLcoadSpY7rHUO0I1zdcbu9EeByYbwPSQ6
My tutorial does not follow Wadler's TFF paper but instead explains a simple and straightforward approach focused on practical results: how to derive the free theorem and reason about relations effectively. In this approach, it becomes easier to derive the "free theorem" for a given type and also to prove the "parametricity theorem": fully parametric functions will always satisfy one "free theorem" per type parameter.
Of course, for practical usage you don't necessarily need to go through the proof of the parametricity theorem, but you do need to be able to write the free theorem itself.
The key element of my tutorial is the idea of "lifting a relation to a type constructor". If you have a relation r: A <=> B and a type constructor F A then you can lift r to a relation of type F A <=> F B.
This lifting operation is denoted by rmap_F. The relation rmap_F r has type F A <=> F B.
The lifting operation rmap_F is defined by induction on the type structure of F. The details of that definition are somewhat technical (and are not adequately explained in the TFF paper!). The most important step in learning about relational parametricity is to understand the practical technique for lifting relations to type constructors. This is explained in my tutorial, and it's too long to write it here. The definition explains how to lift r to a trivial type constructor F A = C where C is a fixed type, to F A = A, to F A = Either (G A) (H A), to F A = (G A, H A), to F A = G A -> H A, etc.
The definition of rmap is analogous to the functor lifting fmap that lifts a function of type A -> B to a function of type F A -> F B. However, the functor lifting works only for covariant F, while the relational lifting works for any F, even if it is neither covariant nor contravariant, such as F A = A -> A -> A. This is the crucial feature that shows why relational technique is useful at all.
Let us apply the relational technique to the type forall A. A -> A -> A.
We define F A = A -> A -> A. We take an arbitrary fully parametric function t of type forall A. F A. The relational naturality law says: for any relation r: A <=> B between any types A and B the function t must be in the relation (t, t) ∈ rmap_F r.
Now we need to do two things: 1) select r to be the graph relation of some function f: A -> B, denoted r = graph f, and 2) use the definition of rmap_F to compute rmap_F (graph f) explicitly.
The definition of rmap_F gives:
(t1, t2) ∈ rmap_F r ===
for all a1: A, a2: A, b1: B, b2: B,
if (a1, b1) ∈ r and (a2, b2) ∈ r
then (t1 a1 a2, t2 b1 b2) ∈ r
Translating this with r = graph f and t1 = t2 = t, we get:
(a1, b1) ∈ r === b1 = f a1
(a2, b2) ∈ r === b2 = f a2
(t a1 a2, t b1 b2) ∈ r === t b1 b2 = f (t a1 a2)
So, we obtain the following law:
for all a1: A, a2: A,
t (f a1) (f a2) = f (t a1 a2)
This is actually a naturality law. This is the "free theorem" satisfied by t.
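The calculation above can be imitated in Haskell as a rough sketch, under the assumption that relations are modelled as Boolean predicates and that the "for all" quantifiers are restricted to finite sample lists (so this is a spot check, not a proof). The names Rel, graph, liftFun and rmapF are invented here; only the idea of rmap_F comes from the text.

```haskell
-- A relation between a and b, modelled as a predicate.
type Rel a b = a -> b -> Bool

-- The graph relation of a function: a is related to b when f a == b.
graph :: Eq b => (a -> b) -> Rel a b
graph f a b = f a == b

-- Lifting to function types: related inputs must give related outputs.
-- Quantification is restricted to the given sample lists.
liftFun :: [a] -> [b] -> Rel a b -> Rel c d -> Rel (a -> c) (b -> d)
liftFun as bs r s f g = and [ s (f a) (g b) | a <- as, b <- bs, r a b ]

-- rmap_F for F A = A -> A -> A, built from two uses of liftFun.
rmapF :: [a] -> [b] -> Rel a b -> Rel (a -> a -> a) (b -> b -> b)
rmapF as bs r = liftFun as bs r (liftFun as bs r r)

-- A parametric t (const) is related to itself ...
constIsRelated :: Bool
constIsRelated = rmapF [0 .. 5 :: Int] [False, True] (graph even) const const

-- ... while a pair of unrelated monomorphic functions is not.
maxIsRelated :: Bool
maxIsRelated = rmapF [0 .. 5 :: Int] [False, True] (graph even) max (||)
```

Here constIsRelated is True, matching the law t (f a1) (f a2) = f (t a1 a2) with f = even, whereas maxIsRelated is False (take a1 = 2, a2 = 3).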
To make that clear, I'm not talking about how the free monad looks a lot like a fixpoint combinator applied to a functor, i.e. how Free f is basically a fixed point of f. (Not that this isn't interesting!)
What I'm talking about are fixpoints of Free, Cofree :: (*->*) -> (*->*), i.e. functors f such that Free f is isomorphic to f itself.
Background: today, to firm up my rather lacking grasp on free monads, I decided to just write a few of them out for different simple functors, both for Free and for Cofree and see what better-known [co]monads they'd be isomorphic to. What intrigued me particularly was the discovery that Cofree Empty is isomorphic to Empty (meaning, Const Void, the functor that maps any type to the uninhabited). Ok, perhaps this is just stupid – I've discovered that if you put empty garbage in you get empty garbage out, yeah! – but hey, this is category theory, where whole universes rise up from seeming trivialities... right?
The immediate question is, if Cofree has such a fixed point, what about Free? Well, it certainly can't be Empty as that's not a monad. The quick suspect would be something nearby like Const () or Identity, but no:
Free (Const ()) ~~ Either () ~~ Maybe
Free Identity ~~ (Nat,) ~~ Writer Nat
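For concreteness, the two isomorphisms can be spelled out as follows. This is only a sketch: Free is defined locally instead of being imported from the free package, ⊥ is ignored, and the conversion function names are made up.

```haskell
import Data.Functor.Const (Const (..))
import Data.Functor.Identity (Identity (..))
import Numeric.Natural (Natural)

data Free f a = Pure a | Free (f (Free f a))

-- Free (Const ()) ~ Maybe: the Free layer carries no further structure.
toMaybe :: Free (Const ()) a -> Maybe a
toMaybe (Pure a)          = Just a
toMaybe (Free (Const ())) = Nothing

fromMaybe' :: Maybe a -> Free (Const ()) a
fromMaybe' (Just a) = Pure a
fromMaybe' Nothing  = Free (Const ())

-- Free Identity ~ (Nat,): count the Free layers wrapped around the value.
toCounted :: Free Identity a -> (Natural, a)
toCounted (Pure a)            = (0, a)
toCounted (Free (Identity r)) = let (n, a) = toCounted r in (n + 1, a)

fromCounted :: (Natural, a) -> Free Identity a
fromCounted (0, a) = Pure a
fromCounted (n, a) = Free (Identity (fromCounted (n - 1, a)))
```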
Indeed, the fact that Free always adds an extra constructor suggests that the structure of any functor that's a fixed point would have to be already infinite. But it seems odd that, if Cofree has such a simple fixed point, Free should only have a much more complex one (like the fix-by-construction FixFree a = C (Free FixFree a) that Reid Barton brings up in the comments).
Is the boring truth just that Free has no “accidental fixed point” and it's a mere coincidence that Cofree has one, or am I missing something?
Your observation that Empty is a fixed point of Cofree (which is not really true in Haskell, but I guess you want to work in some model that ignores ⊥, like Set) boils down to the fact that
there is a set E (the empty set) such that for every set X, the projection p₂ : X × E -> E is an isomorphism.
We could say in this situation that E is an absorbing object for the product. We can replace the word “set” by “object of C” for any category C with products, and we get a statement about C that may or may not be true. For Set, it happens to be true.
If we pick C = Set^op, which also has products (because Set has coproducts), and then dualize the language to talk about sets again, we get the statement
there is a set F such that for every set Y, the inclusion i₂ : F -> Y + F is an isomorphism.
Obviously, this statement is not true for any set F (we can pick any non-empty set Y as a counterexample for any F). No surprise there, after all Set^op is a different category from Set.
So, we won't get a “trivial fixed point” of Free in the same way we got one for Cofree, because Set^op is qualitatively different from Set. The initial object of Set is an absorbing object for the product, but the terminal object of Set is not an absorbing object for the coproduct.
If I may get on my soapbox for a moment:
There is much discussion among Haskell programmers about which constructions are the “duals” of which other constructions. Most of this is in a formal sense meaningless, because in category theory dualizing a construction works like this:
Suppose I have a construction which I can perform on any category C (or any category with certain extra structure and/or properties). Then the dual construction on a category C is the original construction on the opposite category C^op (which had better have the extra structure and properties we needed, if any).
For example: The notion of products makes sense in any category C (though products might not always exist), via the universal property defining products. To get a dual notion of coproducts in C we should ask what are the products in C^op, and we have just defined what products are in any category, so this notion makes sense.
The trouble with applying duality to the setting of Haskell is that the Haskell language prefers overwhelmingly to talk about just one category, Hask, in which we do our constructions. This causes two problems for talking about duality:
To obtain the dual of a construction as described above, I am supposed to be able to do the construction in any category, or at least any category of a particular form. So we must first generalize the construction that, typically, we have only done in the category Hask to a larger class of categories. (And having done so, there are plenty of other interesting categories we could potentially interpret the resulting notion in besides Hask^op, such as Kleisli categories of monads.)
The category Hask enjoys many special properties which can be summarized by saying that (ignoring ⊥) Hask is a cartesian closed category. For example, this implies that the initial object is an absorbing object for the product. Hask^op does not have these properties, which means that the generalized notion may not make sense in Hask^op; and it can also mean that two notions which happened to be equivalent in Hask are distinct in general, and have different duals.
For an example of the latter, take lenses. In Hask they can be constructed in a number of ways; two ways are in terms of getter/setter pairs and as coalgebras for the costate comonad. The former generalizes to categories with products and the second to categories enriched in a particular way over Hask. If we apply the former construction to Hask^op then we get out prisms, but if we apply the latter construction to Hask^op then we get algebras for the state monad and these are not the same thing.
A more familiar example might be comonads: starting from the Haskell-centric presentation
return :: a -> m a
(>>=) :: m a -> (a -> m b) -> m b
some insight seems to be needed to determine which arrows to reverse to obtain
extract :: w a -> a
extend :: w a -> (w a -> b) -> w b
The point is that it would have been much easier to start from join :: m (m a) -> m a instead of (>>=); but finding this alternative presentation (equivalent due to special features of Hask) is a creative process, not a mechanical one.
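To illustrate the point about starting from join, here is a minimal sketch of a comonad class presented via duplicate (the arrow-reversed join), with extend derived from it. The class is defined locally rather than taken from the comonad package, and the instance for pairs is just one convenient example.

```haskell
-- Comonad presented via duplicate, the mechanical dual of join.
class Functor w => Comonad w where
  extract   :: w a -> a
  duplicate :: w a -> w (w a)

-- extend is recovered from duplicate, just as (>>=) is recovered from join.
extend :: Comonad w => (w a -> b) -> w a -> w b
extend f = fmap f . duplicate

-- The "environment" comonad: a value paired with some context e.
instance Comonad ((,) e) where
  extract           = snd
  duplicate (e, a)  = (e, (e, a))
```

For example, extend (\(e, a) -> e + a) (10, 1) is (10, 11).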
In a question like yours, and many others like it, where it is pretty clear what sense of dual is intended, there's still absolutely no reason to expect a priori that the dual construction will actually exist or have the same properties as the original, because Hask^op qualitatively behaves quite differently from Hask. A slogan might be
the theory of categories is self-dual, but the theory of any particular category is not!
Since you asked about the structure of the fixed points of Free, I'm going to sketch an informal argument that Free only has one fixed point which is a Functor, namely the type
newtype FixFree a = C (Free FixFree a)
that Reid Barton described. Indeed, I make a somewhat stronger claim. Let's start with a few pieces:
newtype Fix f a = Fix (f (Fix f) a)
instance Functor (f (Fix f)) => Functor (Fix f) where
  fmap f (Fix x) = Fix (fmap f x)
-- This is basically `MFunctor` from `Control.Monad.Morph`
class FFunctor (g :: (* -> *) -> * -> *) where
  hoistF :: Functor f => (forall a . f a -> f' a) -> g f b -> g f' b
Notably,
instance FFunctor Free where
  hoistF _f (Pure a) = Pure a
  hoistF f (Free fffa) = Free . f . fmap (hoistF f) $ fffa
Then
fToFixG :: (Functor f, FFunctor g) => (forall a . f a -> g f a) -> f a -> Fix g a
fToFixG fToG fa = Fix $ hoistF (fToFixG fToG) $ fToG fa
fixGToF :: forall f b (g :: (* -> *) -> * -> *) .
           (FFunctor g, Functor (g (Fix g)))
        => (forall a . g f a -> f a) -> Fix g b -> f b
fixGToF gToF (Fix ga) = gToF $ hoistF (fixGToF gToF) ga
If I'm not mistaken (which I could be), passing each side of an isomorphism between f and g f to each of these functions will yield each side of an isomorphism between f and Fix g. Substituting Free for g will demonstrate the claim. This argument is very hand-wavey, of course, because Haskell is inconsistent.
So I understand the basic algebraic interpretation of types:
Either a b ~ a + b
(a, b) ~ a * b
a -> b ~ b^a
() ~ 1
Void ~ 0 -- from Data.Void
... and that these relations are true for concrete types, like Bool, as opposed to polymorphic types like a. I also know how to translate type signatures with polymorphic types into their concrete type representations by just translating the Church encoding according to the following isomorphism:
(forall r . (a -> r) -> r) ~ a
So if I have:
id :: forall a . a -> a
I know that it does not mean id ~ a^a, but it actually means:
id :: forall a . (() -> a) -> a
id ~ ()
~ 1
Similarly:
pair :: forall r . (a -> b -> r) -> r
pair ~ forall r . ((a, b) -> r) -> r
~ (a, b)
~ a * b
Which brings me to my question. What is the "algebraic" interpretation of this rule:
(forall r . (a -> r) -> r) ~ a
For every concrete type isomorphism I can point to an equivalent algebraic rule, such as:
(a, (b, c)) ~ ((a, b), c)
a * (b * c) = (a * b) * c
a -> (b -> c) ~ (a, b) -> c
(c^b)^a = c^(b * a)
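These correspondences can be checked on small concrete types by counting inhabitants. The sketch below assumes a hand-rolled Finite class (not a standard one) and counts functions as lookup tables, one table per way of choosing an output for every input.

```haskell
import Data.Void (Void)

-- Hypothetical class listing every inhabitant of a small type.
class Finite a where
  inhabitants :: [a]

instance Finite Void where inhabitants = []
instance Finite ()   where inhabitants = [()]
instance Finite Bool where inhabitants = [False, True]

instance (Finite a, Finite b) => Finite (Either a b) where
  inhabitants = map Left inhabitants ++ map Right inhabitants

instance (Finite a, Finite b) => Finite (a, b) where
  inhabitants = [(a, b) | a <- inhabitants, b <- inhabitants]

-- Every function a -> b as a table of (input, output) pairs.
-- The list monad picks one output for each input.
tables :: (Finite a, Finite b) => [[(a, b)]]
tables = mapM (\a -> [(a, b) | b <- inhabitants]) inhabitants
```

For example, length (inhabitants :: [Either Bool ()]) is 3 = 2 + 1, length (inhabitants :: [(Bool, Bool)]) is 4 = 2 * 2, and length (tables :: [[(Either Bool (), Bool)]]) is 8 = 2^3.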
But I don't understand the algebraic equality that is analogous to:
(forall r . (a -> r) -> r) ~ a
This is the famous Yoneda lemma for the identity functor.
Check this post for a readable introduction, and any category theory textbook for more.
Briefly, given f :: forall r. (a -> r) -> r you can apply f id to get an a, and conversely, given x :: a you can take ($x) to get forall r. (a -> r) -> r.
These operations are mutually inverse. Proof:
Obviously ($x) id == x. I will show that
($(f id)) == f,
since functions are equal when they are equal on all arguments, let's take x :: a -> r and show that
($(f id)) x == f x i.e.
x (f id) == f x.
Since f is polymorphic, it works as a natural transformation; this is the naturality diagram for f:
                f_A
   Hom(A, A)     →     A
  (x.) ↓               ↓ x
   Hom(A, R)     →     R
                f_R
So x . f == f . (x.).
Plugging identity, (x . f) id == f x. QED
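The two directions of the isomorphism can be written down directly. This sketch assumes the RankNTypes extension and wraps the polymorphic function in a newtype, here called Church (a made-up name), so that it can be passed around as an ordinary value.

```haskell
{-# LANGUAGE RankNTypes #-}

-- Wrapper for values of type forall r. (a -> r) -> r.
newtype Church a = Church { runChurch :: forall r. (a -> r) -> r }

-- Apply the polymorphic function to id to recover the a.
toA :: Church a -> a
toA (Church f) = f id

-- Package x as "apply the continuation to x", i.e. ($ x).
fromA :: a -> Church a
fromA x = Church (\k -> k x)
```

For example, toA (fromA 42) gives back 42, and runChurch (fromA 3) show gives "3".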
(Rewritten for clarity)
There seem to be two parts to your question. One is implied and is asking what the algebraic interpretation of forall is, and the other is asking about the cont/Yoneda transformation, which sdcvvc's answer already covered pretty well.
I'll try to address the algebraic interpretation of forall for you. You mention that A -> B is B^A but I'd like to take that a step further and expand it out to B * B * B * ... * B (|A| times). Although we do have exponentiation as a notation for repeated multiplication like that, there's a more flexible notation, ∏ (uppercase Pi) representing arbitrary indexed products. There are two components to a Pi: the range of values we want to multiply over, and the expression that we're multiplying out. For example, at the value level, you might express the factorial function as fact i = ∏ [1..i] (λx -> x).
Going back to the world of types, we can view the exponentiation operator in the A -> B ~ B^A correspondence as a Pi: B^A ~ ∏ A (λ_ -> B). This says that we're defining an A-ary product of Bs, such that the Bs cannot depend on the particular A we've chosen. Sure, it's equivalent to plain exponentiation, but it lets us move up to cases in which there is a dependence.
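At the value level, the ∏ notation can be sketched as an ordinary function taking a range and a body. The name bigPi is made up for this illustration.

```haskell
-- Indexed product: multiply body i over every index i in the range.
bigPi :: Num n => [i] -> (i -> n) -> n
bigPi range body = product (map body range)

-- The factorial example from above: ∏ [1..i] (λx -> x).
fact :: Integer -> Integer
fact i = bigPi [1 .. i] id

-- A constant body gives plain exponentiation: |B|^|A| with |A| = 2, |B| = 3.
threeSquared :: Integer
threeSquared = bigPi [False, True] (const 3)
```

Here fact 5 is 120 and threeSquared is 9, which mirrors B^A ~ ∏ A (λ_ -> B) when the body ignores its index.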
In the most general case, we get dependent types, like what you see in Agda or Coq: in Agda syntax, replicate : Bool -> ((n : Nat) -> Vec Bool n) is one possible application of a Pi type, which could be expressed more explicitly as replicate : Bool -> ∏ Nat (Vec Bool), or further as replicate : ∏ Bool (λ_ -> ∏ Nat (Vec Bool)).
Note that as you might expect from the underlying algebra, you can fuse both of the ∏s in the definition of replicate above into a single ∏ ranging over the cartesian product of the domains: ∏ Bool (λ_ -> ∏ Nat (Vec Bool)) is equivalent to ∏ (Bool, Nat) (λ(_, n) -> Vec Bool n) just like it would be at the "value level". This is simply uncurrying from the perspective of type theory.
I do realize your question was about polymorphism, so I'll stop going on about dependent types, but they are relevant: forall in Haskell is roughly equivalent to a ∏ with a domain over the type (kind) of types, *. Indeed, the function-like behavior of polymorphism can be observed directly in GHC core, which types them as capital lambdas (Λ). As such, a polymorphic type like forall a. a -> a is actually just ∏ * (Λ a -> (a -> a)) (using the Λ notation now that we distinguish between types and values), which can be expanded out to the infinite product (Bool -> Bool, Int -> Int, () -> (), (Int -> Bool) -> (Int -> Bool), ...) for every possible type. Instantiation of the type variable is simply projecting out the suitable element from the *-ary product (or applying the type function).
Now, for the big piece I missed in my original version of this answer: parametricity. Parametricity can be described in several different ways, but none of the ones I know of (viewing types as relations, or (di)naturality in category theory) really has a very algebraic interpretation. For our purposes, though, it boils down to something fairly simple: you can't pattern-match on *. I know that GHC lets you do that at the type level with type families, but you can only cover a finite chunk of * when doing that, so there are necessarily always points at which your type family is undefined.
What this means, from the point of view of polymorphism, is that any type function F we write in ∏ * F must either be constant (i.e., completely ignore the type it was polymorphic over) or pass the type through unchanged. Thus, ∏ * (Λ _ -> B) is valid because it ignores its argument, and corresponds to forall a. B. The other case is something like ∏ * (Λ x -> Maybe x), which corresponds to forall a. Maybe a, which doesn't ignore the type argument, but only "passes it through". As such, a ∏ A that has an irrelevant domain A (such as when A = *) can be seen as more of an A-ary indexed intersection (picking the common elements across all instantiations of the index), rather than a product.
Crucially, at the value level, the rules of parametricity prevent any funny behavior that might suggest the types are larger than they really are. Because we don't have typecase, we can't construct a value of type forall a. B that does something different based on what a was instantiated to. Thus, although the type is technically a function * -> B, it is always a constant function, and is thus equivalent to a single value of B. Using the ∏ interpretation, it is indeed equivalent to an infinite *-ary product of Bs, but those B values must always be identical, so the infinite product is effectively as big as a single B.
Similarly, although ∏ * (Λ x -> (x -> x)) (a.k.a., forall a. a -> a) is technically equivalent to an infinite product of functions, none of those functions can inspect the type, so all are constrained to only return their input value and not do any funny business like (+1) : Int -> Int when instantiated to Int. Because there is only one (assuming a total language) function that can't inspect the type of its argument but must return a value of that same type, the infinite product is thus just as large as a single value.
Now, about your direct question on (forall r . (a -> r) -> r) ~ a. First, let's express your ~ operator more formally. It's really isomorphism, so we need two functions going back and forth, and an argument that they're inverses.
data Iso a b = Iso
  { to   :: a -> b
  , from :: b -> a
  -- proof1 :: forall x. to (from x) == x
  -- proof2 :: forall x. from (to x) == x
  }
Now we can express your original question in more formal terms. It amounts to constructing a term of the following type (which is impredicative, so GHC has trouble with it, but we'll survive):
forall a. Iso (forall r. (a -> r) -> r) a
Using my earlier terminology, this amounts to ∏ * (Λ a -> Iso (∏ * (Λ r -> ((a -> r) -> r))) a). Once again we have an infinite product that can't inspect its type argument. By handwaving, we can argue that, given the parametricity rules, the only possible values for to and from are ($ id) and flip id (the two proofs are then respected automatically).
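Here is a minimal sketch of that isomorphism in Haskell, assuming GHC with RankNTypes. To sidestep the impredicativity, the polymorphic type is wrapped in a newtype; the names Cont, runCont and contIso are my own choices for illustration, and the two proofs are only spot-checked on sample values, not proven.

```haskell
{-# LANGUAGE RankNTypes #-}

-- Wrapper so the polymorphic type can be used as an argument to Iso
-- without needing impredicative types.
newtype Cont a = Cont { runCont :: forall r. (a -> r) -> r }

data Iso a b = Iso
  { to   :: a -> b
  , from :: b -> a
  }

contIso :: Iso (Cont a) a
contIso = Iso
  { to   = \(Cont k) -> k id       -- ($ id), modulo the newtype
  , from = \x -> Cont (\f -> f x)  -- flip id, modulo the newtype
  }

main :: IO ()
main = do
  -- to . from on a sample value of type Int
  print (to contIso (from contIso (42 :: Int)))
  -- from . to, observed by running the result with a continuation
  let c = Cont (\f -> f "abc")
  putStrLn (runCont (from contIso (to contIso c)) reverse)
```

Note that running the code only tests the round trips at particular values; the claim that these are the only possible definitions still rests on the parametricity argument.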
If this feels unsatisfying, it's probably because the algebraic interpretation of forall didn't really add anything to the proof. It's really just plain old type theory, but I hope I was able to provide something that feels a little less categorical than the Yoneda form of it. It's worth noting that we don't actually need to use parametricity to write proof1 and proof2 above, though. Parametricity only enters the picture when we want to state that ($ id) and flip id are our only options for to and from (which we can't prove in Agda or Coq, for that reason).
To (attempt to) answer the actual question (which is less interesting than the answers to the broader issues raised), the question is ill-formed because of a "type error". Consider the usual correspondence:
Either ~ (+)
(,) ~ (*)
(->) b ~ flip (^)
() ~ 1
Void ~ 0
These all map types to naturals, and type constructors to functions on naturals. In a sense, you have a functor from the category of types to the category of naturals. In the other direction, you "forget" stuff, since the types preserve algebraic structure while the naturals throw it away. I.e., given Either () () you get a unique natural, but given that natural, you can get many types.
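To make the mapping concrete, here is a rough sketch that counts inhabitants of finite types by enumerating them. The Finite class and the card function are hypothetical names introduced only for this illustration; nothing like them is implied by the question.

```haskell
import Data.Maybe (fromJust)
import Data.Void (Void)

-- Hypothetical class for illustration: list every value of a finite type.
class Finite a where
  inhabitants :: [a]

instance Finite Void where inhabitants = []
instance Finite ()   where inhabitants = [()]
instance Finite Bool where inhabitants = [False, True]

instance (Finite a, Finite b) => Finite (Either a b) where
  inhabitants = map Left inhabitants ++ map Right inhabitants

instance (Finite a, Finite b) => Finite (a, b) where
  inhabitants = [(x, y) | x <- inhabitants, y <- inhabitants]

-- One function per way of choosing an output for every input.
instance (Finite a, Eq a, Finite b) => Finite (a -> b) where
  inhabitants =
    [ \x -> fromJust (lookup x table)
    | table <- mapM (\x -> [(x, y) | y <- inhabitants]) inhabitants
    ]

-- The map to the naturals: forget the values, keep only their number.
card :: [a] -> Int
card = length

main :: IO ()
main = do
  print (card (inhabitants :: [Either () Bool]))  -- 1 + 2 = 3
  print (card (inhabitants :: [(Bool, Bool)]))    -- 2 * 2 = 4
  print (card (inhabitants :: [Either Void ()]))  -- 0 + 1 = 1
  print (card (inhabitants :: [Bool -> Bool]))    -- 2 ^ 2 = 4
```

The "forgetting" is visible here: (Bool, Bool) and Bool -> Bool are different types, yet both are sent to 4.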
But this is different:
(forall r . (a -> r) -> r) ~ a
It maps a type to another type! It is not part of the above functor. It's just an isomorphism within the category of types. So let's give that a different symbol, <=>
Now we have
(forall r . (a -> r) -> r) <=> a
Now you note that we can not only send types to nats and arrows to arrows, but also some isomorphisms to other isomorphisms:
(a, (b, c)) <=> ((a, b), c) ~ a * (b * c) = (a * b) * c
But something subtle is going on here. In a sense, the latter isomorphism on pairs is true because the algebraic identity is true. This is to say that the "isomorphism" in the latter simply means that the two types are equivalent under the image of our functor to the nats.
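For comparison, the isomorphism on pairs can also be written out directly as a pair of functions. This is only a sketch, with assocL and assocR as names I picked for the two directions:

```haskell
-- Reassociate nested pairs to the left.
assocL :: (a, (b, c)) -> ((a, b), c)
assocL (x, (y, z)) = ((x, y), z)

-- Reassociate nested pairs to the right.
assocR :: ((a, b), c) -> (a, (b, c))
assocR ((x, y), z) = (x, (y, z))

main :: IO ()
main = do
  let v = (1 :: Int, ('x', True))
  -- Round trip on a sample value; a spot check, not a proof.
  print (assocR (assocL v) == v)
```

These two functions witness <=> inside the category of types, whereas a * (b * c) = (a * b) * c is the corresponding statement after counting.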
The former isomorphism we need to prove directly, which is where we start to get to the underlying question: given our functor to the nats, what does forall r. map to? The answer is that forall r. is neither a type nor a meaningful arrow between types.
By introducing forall, we have moved away from first order types. There's no reason to expect that forall should fit in our above Functor, and indeed, it doesn't.
So we can explore, as others have above, why the isomorphism holds (which is itself very interesting), but in doing so we've abandoned the algebraic core of the question. A question which can be answered, I think, is this: given the category of higher-order types, with constructors as arrows between them, what is there a meaningful functor to?
Edit:
So now I have another approach which shows why adding polymorphism makes things go nuts. We start by asking a simpler question: does a given polymorphic type have zero or more than zero inhabitants? This is the type inhabitation problem, and winds up being, via Curry-Howard, a problem in modified realizability, since it's the same thing as asking if a formula in some logic is realizable in an appropriate computational model. Now as that page explains, this is decidable in the simply typed lambda calculus, though it is PSPACE-complete. But once we move to anything more complicated, for example by adding polymorphism and going to System F, it becomes undecidable!
So, if we can't decide if an arbitrary type is inhabited at all, then we clearly can't decide how many inhabitants it has!
It's an interesting question. I don't have a full answer, but this was too long for a comment.
The type signature (forall r. (a -> r) -> r) can be expressed as me saying
For any type r that you care to name, if you give me a function that takes a and produces an r, then I will give you back an r.
Now, this has to work for any type r, but it can be for a specific type a. So the way for me to pull off this neat trick is to have an a sitting around somewhere, which I feed to the function (which produces an r for me), and then I hand that r back to you.
But if I have an a sitting around, I could give it to you:
If you give me a 1, I'll give you an a.
which corresponds to the type signature 1 -> a or simply a. By this informal argument we have
(forall r. (a -> r) -> r) ~ a
The next step would be to generate the corresponding algebraic expression, but I'm not clear on how the algebraic quantities interact with the universal quantification. We may need to wait for an expert!
A few links to the nLab:
Universal quantifier, corresponds to dependent product.
Existential quantifier, corresponds to dependent sum (dependent coproduct).
Thus, in settings of category theory:
Type | Modeled¹ as | In category
-------------------+---------------------------+-------------
Unit | Terminal object | CCC
Bottom | Initial object |
Record | Product |
Union | Sum (coproduct) |
Function | Exponential |
-------------------+---------------------------+-------------
Dependent product² | Right adjoint to pullback | LCCC
Dependent sum | Left adjoint to pullback |
¹) in appropriate category ─ CCC for total and non-polymorphic subset of Haskell (link), CPO for non-total traits of Haskell (link), LCCC for dependently typed languages.
²) forall quantification is a special case of dependent product:
∀(x :: *). y[x] ~ ∏(x : Set)y[x]
where Set is the universe of all small types.