Relationship between Haskell's 'forall' and '=>'

Relationship between Haskell's 'forall' and '=>' - haskell

I'm having trouble wrapping my mind around the relationship (and interactions) between Haskell's forall and => (and for that matter the . that often connects them).
For example
λ> :t (+)
λ> :t id
give
(+) :: forall a. Num a => a -> a -> a
id :: forall a. a -> a
and while I understand how these work in these specific cases, I'm not comfortable parsing the expressions (signatures?) forall a. Num a => or forall a. themselves into something meaningful, or that I can generally understand in more complex contexts.
What do forall a. Num a => and forall a. mean? Specifically, what is the roles played in each by forall, => and a?

(As another perspective, without invoking the "implicit dictionary passing" implementation of type classes):
forall a. in Haskell means "for every type a".1 It's introducing a type variable, and declaring that the rest of the type expression has to be valid whatever choice is made for a.
You usually don't see it in basic Haskell (without turning on any extensions in GHC), because it's not necessary; you just use type variables in your type signature, and GHC automatically assumes there are foralls introducing those variables at the start of the expression.
For example:
zip :: forall a. ( forall b. ( [a] -> [b] -> [(a, b)] ))
zip :: forall a. forall b. [a] -> [b] -> [(a, b)]
zip :: forall a b. [a] -> [b] -> [(a, b)]
zip :: [a] -> [b] -> [(a, b)]
The above are all the same; they just tell us that zip can be a way of zipping a list of a together with a list of b to make a list of (a, b) pairs, whatever choice we feel like making for a and b.
forall mainly comes into play with extensions, because then you can introduce type variables with scopes other than the default ones assumed by GHC if you don't explicitly write them.
Now, the constraints => type syntax can be read roughly as "these constraints imply this type", or "provided these constraints hold, you can use this type". It's used all the time, even in vanilla Haskell with no extensions, so it's important to understand what it means and how it works and not just copy and paste and hope.
The => arrow allows us to state a set of constraints on the variables in the rest of the type expression; it lets us put limitations on what choices can be made to introduce the type variable. You should read it first by ignoring everything left of the => arrow, and reading the the right part on its own. This gives you the "shape" of the type. The stuff to the left of the => arrow tells you what kind of types you can use the rest of the type with.
An example:
(+) :: Num a => a -> a -> a
This means that (+) is exactly the same kind of thing as anything with a simpler type like a -> a -> a, except the Num a => is telling us that we're not free to choose any type a. We can only choose a type for a when we know that it is a member of the Num type class (another slightly more precise way of saying "a is a member of Num is "the constraint Num a holds").
Note that GHC is still assuming that there's an implicit forall a to introduce the type variable a here, so it really looks like:
(+) :: forall a. Num a => a -> a -> a
In which case you can read this off moderately easily as an English sentence once you know what forall a. and Num a => means: "For every type a, provided Num a holds, plus has the type a -> a -> a".
1 If you're familiar with formal logic at all, it's just an ASCII-friendly way of writing ∀a, a "universally quantified variable".

As the forall matter appears to be settled, I'll attempt to explain the => a bit. The things to the left of the => are arguments, much like ones to the left of a ->. But you don't apply these arguments manually, and they can only have specific types.
f :: Num a => a -> a
is a function that takes two arguments:
A Num a dictionary.
An a.
When you apply f, you just provide the a. GHC has to provide the Num a. If it's applied to a specific concrete type like Int, GHC knows Num Int and can supply it at the call site. Otherwise, it checks that Num a is provided by some outer context and uses that one. The great thing about Haskell's typeclass system is that it ensures that any two Num a dictionaries, however they are found, will be identical. So it doesn't matter where the dictionary comes from—it is sure to be the right one.
Further discussion
A lot of these things we're talking about aren't exactly part of Haskell so much as they're part of the way GHC interprets Haskell by translation to GHC core, AKA System FC, an extension of the very well-studied System F, AKA the Girard-Reynolds calculus. System FC is an explicitly typed polymorphic lambda calculus with algebraic datatypes, etc., but no type inference, no instance resolution, etc. After GHC checks the types in your Haskell code, it translates that code to System FC by a thoroughly mechanical process. It can do this confidently because the type checker "decorates" the code with all the information the desugarer needs to plumb all the dictionaries around. If you have a Haskell function that looks like
foo :: forall a . Num a => a -> a -> a
foo x y= x + y
then that will translate to something that looks like
foo :: forall a . Num a -> a -> a -> a
foo = /\ (a :: *) -> \ (d :: Num a) -> \ (x :: a) -> \ (y :: a) -> (+) #a d x y
The /\ is a type lambda—it's just line a normal lambda except it takes a type variable. The # represents application of a type to a function that takes one. The + is really just a record selector. It chooses the right field from the dictionary it's passed.

I suppose it helps if we add the implied parentheses:
(+) :: ∀ a . ( Num a => (a -> (a -> a)) )
id :: ∀ a . ( a -> a )
The ∀ always goes together with a .. It's basically special syntax meaning “anything between ∀ and . are type variables that I want to introduce into the following scope”†
=> denotes what Idris calls an implicit function: Num a is a dictionary for the instance Num a, and such a dictionary is implicitly needed whenever you're adding numbers. But whether a is a type variable here that was previously introduced by some ∀, or a fixed type, doesn't really matter. You could also have
(+) :: Num Int => Int -> Int -> Int
That's just superfluous, because the compiler knows that Int is a Num instance and hence automatically (implicitly!) chooses the right dictionary.
Really, there's no particular relationship between ∀ and =>, they just happen to be used often together.
†Actually this is a type-level lambda. The type expression ∀ a . b behaves analogously to the value level expression \a -> b.

Related

Confusing about Haskell type inference

I have just started learning Haskell. As Haskell is static typed and has polymorphic type inference, the type of the identity function is
id :: a -> a
suggesting id can take any type as its parameter and return itself. It works fine when I try:
a = (id 1, id True)
I just suppose that at compile time, the first id is Num a :: a -> a, and the second id is Bool -> Bool. When I try the following code, it gives an error:
foo f a b = (f a, f b)
result = foo id 1 True
It shows the type of a must be the same type of b, since it works fine with
result = foo id 1 2
But is that true that the type of id's parameter can be polymorphic, so that a and b can be different type?

All right, this is a weird spooky corner of Haskell's type system. The problem here is that there are two ways to type inference your function foo.
-- rank 1
foo :: forall a b. (a -> b) -> a -> a -> (b, b)
foo f a b = (f a, f b)
-- rank 2
foo' :: (forall a. a -> a) -> a -> b -> (a, b)
foo' f a b = (f a, f b)
The second type is the one you want, but the first type is the one you're getting. The second type, as amalloy pointed out, is a rank-2 type (we're going to ignore what the two means but read the introduction in "Practical type inference for arbitrary-rank types" if you want a good explanation of ranks – don't be put off by the academic nature of the PDF file as the beginning is accessibly and clearly written).
We'll defer the definition of higher-ranked types for now and just say that the problem is that GHC is unable to infer the rank-2 type. Quote the paper:
Complete type inference is known to be undecidable for higher-rank (impredicative) type systems, but in practice programmers are more than willing to add type annotations to guide the type inference engine, and to document their code....
Kfoury and Wells show that typeability is decidable for rank ≤ 2, and undecidable for all ranks ≥ 3 (Kfoury & Wells, 1994). For the rank-2 fragment, the same paper gives a type inference algorithm. This inference algorithm is somewhat subtle, does not interact well with user-supplied type annotations, and has not, to our knowledge, been implemented in a production compiler.
Undecidable means there can be no algorithm that always leads to a correct yes-or-no decision. So there you have it: impossible to infer a rank-3-or-higher type, and it's too gosh-darn-hard to infer the rank-2 type.
Now, back to rank 2. The (forall a. a -> a) is what makes it rank-2. There's already an excellent Stack Overflow question about what the forall keyword means so I'll refer you to that, but basically it means you're able to call f a and f b in the expression (f a, f b) while having a and b be different types, which is what you wanted in the first place, before all this hot mess.
One last thing: The reason you don't normally see foralls in GHCi is that any foralls on the very outer scope are left off. So forall a b. (a -> b) -> a -> a -> (b, b) is equivalent to (a -> b) -> a -> a -> (b, b).
Overall this is a pain point of the language that's poorly explained.
(Hat tip to #amalloy in the comments.)

Why is context reduction necessary?

I've just read this paper ("Type classes: an exploration of the design space" by Peyton Jones & Jones), which explains some challenges with the early typeclass system of Haskell, and how to improve it.
Many of the issues that they raise are related to context reduction which is a way to reduce the set of constraints over instance and function declarations by following the "reverse entailment" relationship.
e.g. if you have somewhere instance (Ord a, Ord b) => Ord (a, b) ... then within contexts, Ord (a, b) gets reduced to {Ord a, Ord b} (reduction does not always shrink the number of constrains).
I did not understand from the paper why this reduction was necessary.
Well, I gathered it was used to perform some form of type checking. When you have your reduced set of constraint, you can check that there exist some instance that can satisfy them, otherwise it's an error. I'm not too sure what the added value of that is, since you would notice the problem at the use site, but okay.
But even if you have to do that check, why use the result of reduction inside inferred types? The paper points out it leads to unintuitive inferred types.
The paper is quite ancient (1997) but as far as I can tell, context reduction is still an ongoing concern. The Haskell 2010 spec does mention the inference behaviour I explain above (link).
So, why do it this way?

I don't know if this is The Reason, necessarily, but it might be considered A Reason: in early Haskell, type signatures were only permitted to have "simple" constraints, namely, a type class name applied to a type variable. Thus, for example, all of these were okay:
Ord a => a -> a -> Bool
Eq a => a -> a -> Bool
Graph gr => gr n e -> [n]
But none of these:
Ord (Tree a) => Tree a -> Tree a -> Bool
Eq (a -> b) => (a -> b) -> (a -> b) -> Bool
Graph Gr => Gr n e -> [n]
I think there was a feeling then -- and still today, as well -- that allowing the compiler to infer a type which one couldn't write manually would be a bit unfortunate. Context reduction was a way of turning the above signatures either into ones that could be written by hand as well or an informative error. For example, since one might reasonably have
instance Ord a => Ord (Tree a)
in scope, we could turn the illegal signature Ord (Tree a) => ... into the legal signature Ord a => .... On the other hand, if we don't have any instance of Eq for functions in scope, one would report an error about the type which was inferred to require Eq (a -> b) in its context.
This has a couple of other benefits:
Intuitively pleasing. Many of the context reduction rules do not change whether the type is legal, but do reflect things humans would do when writing the type. I'm thinking here of the de-duplication and subsumption rules that let you turn, e.g. (Eq a, Eq a, Ord a) into just Ord a -- a transformation one definitely would want to do for readability.
This can frequently catch stupid errors; rather than inferring a type like Eq (Integer -> Integer) => Bool which can't be satisfied in a law-abiding way, one can report an error like Perhaps you did not apply a function to enough arguments?. Much friendlier!
It becomes the compiler's job to pinpoint what went wrong. Instead of inferring a complicated context like Eq (Tree (Grizwump a, [Flagle (Gr n e) (Gr n' e') c])) and complaining that the context is not satisfiable, it instead is forced to reduce this to the constituent constraints; it will instead complain that we couldn't determine Eq (Grizwump a) from the existing context -- a much more precise and actionable error.

I think this is indeed desirable in a dictionary passing implementation. In such an implementation, a "dictionary", that is, a tuple or record of functions is passed as implicit argument for every type class constraint in the type of the applied function.
Now, the question is simply when and how those dictionaries are created. Observe that for simple types like Int by necessity all dictionaries for whatever type class Int is an instance of will be a constant.
Not so in the case of parameterized types like lists, Maybe or tuples. It is clear that to show a tuple, for instance, the Show instances of the actual tuple elements need to be known. Hence such a polymorphic dictionary cannot be a constant.
It appears that the principle guiding the dictionary passing is such that only dictionaries for types that appear as type variables in the type of the applied function are passed. Or, to put it differently: no redundant information is replicated.
Consider this function:
f :: (Show a, Show b) => (a,b) -> Int
f ab = length (show ab)
The information that a tuple of show-able components is also showable, thus a constraint like Show (a,b) needs not to appear when we already know (Show a, Show b).
An alternative implementation would be possible, though, where the caller .would be responsible to create and pass dictionaries. This could work without context reduction, such that the type of f would look like:
f :: Show (a,b) => (a,b) -> Int
But this would mean that the code to create the tuple dictionary would have to be repeated on every call site. And it is easy to come up with examples where the number of necessary constraints actually increases, like in:
g :: (Show (a,a), Show(b,b), Show (a,b), Show (b, a)) => a -> b -> Int
g a b = maximum (map length [show (a,a), show (a,b), show (b,a), show(b,b)])
It is instructive to implement a type class/instance system with actual records that are explicitly passed. For example:
data Show' a = Show' { show' :: a -> String }
showInt :: Show' Int
showInt = Show' { show' = intshow } where
intshow :: Int -> String
intshow = show
Once you do this you will probably easily recognize the need for "context reduction".

How are variable names chosen in type signatures inferred by GHC?

When I play with checking types of functions in Haskell with :t, for example like those in my previous question, I tend to get results such as:
Eq a => a -> [a] -> Bool
(Ord a, Num a, Ord a1, Num a1) => a -> a1 -> a
(Num t2, Num t1, Num t, Enum t2, Enum t1, Enum t) => [(t, t1, t2)]
It seems that this is not such a trivial question - how does the Haskell interpreter pick literals to symbolize typeclasses? When would it choose a rather than t? When would it choose a1 rather than b? Is it important from the programmer's point of view?

The names of the type variables aren't significant. The type:
Eq element => element -> [element] -> Bool
Is exactly the same as:
Eq a => a -> [a] -> Bool
Some names are simply easier to read/remember.
Now, how can an inferencer choose the best names for types?
Disclaimer: I'm absolutely not a GHC developer. However I'm working on a type-inferencer for Haskell in my bachelor thesis.
During inferencing the names chosen for the variables aren't probably that readable. In fact they are almost surely something along the lines of _N with N a number or aN with N a number.
This is due to the fact that you often have to "refresh" type variables in order to complete inferencing, so you need a fast way to create new names. And using numbered variables is pretty straightforward for this purpose.
The names displayed when inference is completed can be "pretty printed". The inferencer can rename the variables to use a, b, c and so on instead of _1, _2 etc.
The trick is that most operations have explicit type signatures. Some definitions require to quantify some type variables (class, data and instance for example).
All these names that the user explicitly provides can be used to display the type in a better way.
When inferencing you can somehow keep track of where the fresh type variables came from, in order to be able to rename them with something more sensible when displaying them to the user.
An other option is to refresh variables by adding a number to them. For example a fresh type of return could be Monad m0 => a0 -> m0 a0 (Here we know to use m and a simply because the class definition for Monad uses those names). When inferencing is finished you can get rid of the numbers and obtain the pretty names.
In general the inferencer will try to use names that were explicitly provided through signatures. If such a name was already used it might decide to add a number instead of using a different name (e.g. use b1 instead of c if b was already bound).
There are probably some other ad hoc rules. For example the fact that tuple elements have like t, t1, t2, t3 etc. is probably something done with a custom rule. In fact t doesn't appear in the signature for (,,) for example.

How does GHCi pick names for type variables? explains how many of these variable names come about. As Ganesh Sittampalam pointed out in a comment, something strange seems to be happening with arithmetic sequences. Both the Haskell 98 report and the Haskell 2010 report indicate that
[e1..] = enumFrom e1
GHCi, however, gives the following:
Prelude> :t [undefined..]
[undefined..] :: Enum t => [t]
Prelude> :t enumFrom undefined
enumFrom undefined :: Enum a => [a]
This makes it clear that the weird behavior has nothing to do with the Enum class itself, but rather comes in from some stage in translating the syntactic sequence to the enumFrom form. I wondered if maybe GHC wasn't really using that translation, but it really is:
{-# LANGUAGE NoMonomorphismRestriction #-}
module X (aoeu,htns) where
aoeu = [undefined..]
htns = enumFrom undefined
compiled using ghc -ddump-simpl enumlit.hs gives
X.htns :: forall a_aiD. GHC.Enum.Enum a_aiD => [a_aiD]
[GblId, Arity=1]
X.htns =
\ (# a_aiG) ($dEnum_aiH :: GHC.Enum.Enum a_aiG) ->
GHC.Enum.enumFrom # a_aiG $dEnum_aiH (GHC.Err.undefined # a_aiG)
X.aoeu :: forall t_aiS. GHC.Enum.Enum t_aiS => [t_aiS]
[GblId, Arity=1]
X.aoeu =
\ (# t_aiV) ($dEnum_aiW :: GHC.Enum.Enum t_aiV) ->
GHC.Enum.enumFrom # t_aiV $dEnum_aiW (GHC.Err.undefined # t_aiV)
so the only difference between these two representations is the assigned type variable name. I don't know enough about how GHC works to know where that t comes from, but at least I've narrowed it down!
Ørjan Johansen has noted in a comment that something similar seems to happen with function definitions and lambda abstractions.
Prelude> :t \x -> x
\x -> x :: t -> t
but
Prelude> :t map (\x->x) $ undefined
map (\x->x) $ undefined :: [b]
In the latter case, the type b comes from an explicit type signature given to map.

Are you familiar with the concepts of alpha equivalence and alpha substitution? This captures the notion that, for example, both of the following are completely equivalent and interconvertible (in certain circumstances) even though they differ:
\x -> (x, x)
\y -> (y, y)
The same concept can be extended to the level of types and type variables (see "System F" for further reading). Haskell in fact has a notion of "lambdas at the type level" for binding type variables, but it's hard to see because they're implicit by default. However, you can make them explicit by using the ExplicitForAll extension, and play around with explicitly binding your type variables:
ghci> :set -XExplicitForAll
ghci> let f x = x; f :: forall a. a -> a
In the second line, I use the forall keyword to introduce a new type variable, which is then used in a type.
In other words, it doesn't matter whether you choose a or t in your example, as long as the type expressions satisfy alpha-equivalence. Choosing type variable names so as to maximize human convenience is an entirely different topic, and probably far more complicated!

What are these explicit "forall"s doing?

What is the purpose of the foralls in this code?
class Monad m where
(>>=) :: forall a b. m a -> (a -> m b) -> m b
(>>) :: forall a b. m a -> m b -> m b
-- Explicit for-alls so that we know what order to
-- give type arguments when desugaring
(some code omitted). This is from the code for Monads.
My background: I don't really understand forall or when Haskell has them implicitly.
Also, and it may not be significant, but GHCi allows me to omit the forall when giving >> a type:
Prelude> :t (>>) :: Monad m => m a -> m b -> m b
(>>) :: Monad m => m a -> m b -> m b
:: (Monad m) => m a -> m b -> m b
(no error).

My background: I don't really understand forall or when Haskell has them implicitly.
Okay, consider the type of id, a -> a. What does a mean, and where does it come from? When you define a value, you can't just use arbitrary variables that aren't defined anywhere. You need a top-level definition, or a function argument, or a where clause, &c. In general, if you use a variable, it must be bound somewhere.
The same is true of type variables, and forall is one such way to bind a type variable. Anywhere you see a type variable that isn't explicitly bound (for example, class Foo a where ... binds a inside the class definition), it's implicitly bound by a forall.
So, the type of id is implicitly forall a. a -> a. What does this mean? Pretty much what it says. We can get a type a -> a for all possible types a, or from another perspective, if you pick any specific type you can get a type representing "functions from your chosen type to itself". The latter phrasing should sound a bit like defining a function, and as such you can think of forall as being similar to a lambda abstraction for types.
GHC uses various intermediate representations during compilation, and one of the transformations it applies is making the similarity to functions more direct: implicit foralls are made explicit, and anywhere a polymorphic value is used for a specific type, it is first applied to a type argument.
We can even write both foralls and lambdas as one expression. I'll abuse notation for a moment and replace forall a. with /\a => for visual consistency. In this style, we can define id = /\a => \(x::a) -> (x::a) or something similar. So, an expression like id True in your code would end up translated to something like id Bool True instead; just id True would no longer even make sense.
Just as you can reorder function arguments, you can likewise reorder the type arguments, subject only to the (rather obvious) restriction that type arguments must come before any value arguments of that type. Since implicit foralls are always the outermost layer, GHC could potentially choose any order it wanted when making them explicit. Under normal circumstances, this obviously doesn't matter.
I'm not sure exactly what's going on in this case, but based on the comments I would guess that the conversion to using explicit type arguments and the desugaring of do notation are, in some sense, not aware of each other, and therefore the order of type arguments is specified explicitly to ensure consistency. After all, if something is blindly applying two type arguments to an expression, it matters a great deal whether that expression's type is forall a b. m a -> m b -> m b or forall b a. m a -> m b -> m b!

Algebraically interpreting polymorphism

So I understand the basic algebraic interpretation of types:
Either a b ~ a + b
(a, b) ~ a * b
a -> b ~ b^a
() ~ 1
Void ~ 0 -- from Data.Void
... and that these relations are true for concrete types, like Bool, as opposed to polymorphic types like a. I also know how to translate type signatures with polymorphic types into their concrete type representations by just translating the Church encoding according to the following isomorphism:
(forall r . (a -> r) -> r) ~ a
So if I have:
id :: forall a . a -> a
I know that it does not mean id ~ a^a, but it actually means:
id :: forall a . (() -> a) -> a
id ~ ()
~ 1
Similarly:
pair :: forall r . (a -> b -> r) -> r
pair ~ ((a, b) -> r) - > r
~ (a, b)
~ a * b
Which brings me to my question. What is the "algebraic" interpretation of this rule:
(forall r . (a -> r) -> r) ~ a
For every concrete type isomorphism I can point to an equivalent algebraic rule, such as:
(a, (b, c)) ~ ((a, b), c)
a * (b * c) = (a * b) * c
a -> (b -> c) ~ (a, b) -> c
(c^b)^a = c^(b * a)
But I don't understand the algebraic equality that is analogous to:
(forall r . (a -> r) -> r) ~ a

This is the famous Yoneda lemma for the identity functor.
Check this post for a readable introduction, and any category theory textbook for more.
Briefly, given f :: forall r. (a -> r) -> r you can apply f id to get an a, and conversely, given x :: a you can take ($x) to get forall r. (a -> r) -> r.
These operations are mutually inverse. Proof:
Obviously ($x) id == x. I will show that
($(f id)) == f,
since functions are equal when they are equal on all arguments, let's take x :: a -> r and show that
($(f id)) x == f x i.e.
x (f id) == f x.
Since f is polymorphic, it works as a natural transformation; this is the naturality diagram for f:
f_A
Hom(A, A) → A
(x.) ↓ ↓ x
Hom(A, R) → R
f_R
So x . f == f . (x.).
Plugging identity, (x . f) id == f x. QED

(Rewritten for clarity)
There seem to be two parts to your question. One is implied and is asking what the algebraic interpretation of forall is, and the other is asking about the cont/Yoneda transformation, which sdcvvc's answer already covered pretty well.
I'll try to address the algebraic interpretation of forall for you. You mention that A -> B is B^A but I'd like to take that a step further and expand it out to B * B * B * ... * B (|A| times). Although we do have exponentiation as a notation for repeated multiplication like that, there's a more flexible notation, ∏ (uppercase Pi) representing arbitrary indexed products. There are two components to a Pi: the range of values we want to multiply over, and the expression that we're multiplying out. For example, at the value level, you might express the factorial function as fact i = ∏ [1..i] (λx -> x).
Going back to the world of types, we can view the exponentiation operator in the A -> B ~ B^A correspondence as a Pi: B^A ~ ∏ A (λ_ -> B). This says that we're defining an A-ary product of Bs, such that the Bs cannot depend on the particular A we've chosen. Sure, it's equivalent to plain exponentiation, but it lets us move up to cases in which there is a dependence.
In the most general case, we get dependent types, like what you see in Agda or Coq: in Agda syntax, replicate : Bool -> ((n : Nat) -> Vec Bool n) is one possible application of a Pi type, which could be expressed more explicitly as replicate : Bool -> ∏ Nat (Vec Bool), or further as replicate : ∏ Bool (λ_ -> ∏ Nat (Vec Bool)).
Note that as you might expect from the underlying algebra, you can fuse both of the ∏s in the definition of replicate above into a single ∏ ranging over the cartesian product of the domains: ∏ Bool (\_ -> ∏ Nat (Vec Bool)) is equivalent to ∏ (Bool, Nat) (λ(_, n) -> Vec Bool n) just like it would be at the "value level". This is simply uncurrying from the perspective of type theory.
I do realize your question was about polymorphism, so I'll stop going on about dependent types, but they are relevant: forall in Haskell is roughly equivalent to a ∏ with a domain over the type (kind) of types, *. Indeed, the function-like behavior of polymorphism can be observed directly in GHC core, which types them as capital lambdas (Λ). As such, a polymorphic type like forall a. a -> a is actually just ∏ * (Λ a -> (a -> a)) (using the Λ notation now that we distinguish between types and values), which can be expanded out to the infinite product (Bool -> Bool, Int -> Int, () -> (), (Int -> Bool) -> (Int -> Bool), ...) for every possible type. Instantiation of the type variable is simply projecting out the suitable element from the *-ary product (or applying the type function).
Now, for the big piece I missed in my original version of this answer: parametricity. Parametricity can be described in several different ways, but none of the ones I know of (viewing types as relations, or (di)naturality in category theory) really has a very algebraic interpretation. For our purposes, though, it boils down to something fairly simple: you can't pattern-match on *. I know that GHC lets you do that at the type level with type families, but you can only cover a finite chunk of * when doing that, so there are necessarily always points at which your type family is undefined.
What this means, from the point of view of polymorphism, is that any type function F we write in ∏ * F must either be constant (i.e., completely ignore the type it was polymorphic over) or pass the type through unchanged. Thus, ∏ * (Λ _ -> B) is valid because it ignores its argument, and corresponds to forall a. B. The other case is something like ∏ * (Λ x -> Maybe x), which corresponds to forall a. Maybe a, which doesn't ignore the type argument, but only "passes it through". As such, a ∏ A that has an irrelevant domain A (such as when A = *) can be seen as more of an A-ary indexed intersection (picking the common elements across all instantiations of the index), rather than a product.
Crucially, at the value level, the rules of parametricity prevent any funny behavior that might suggest the types are larger than they really are. Because we don't have typecase, we can't construct a value of type forall a. B that does something different based on what a was instantiated to. Thus, although the type is technically a function * -> B, it is always a constant function, and is thus equivalent to a single value of B. Using the ∏ interpretation, it is indeed equivalent to an infinite *-ary product of Bs, but those B values must always be identical, so the infinite product is effectively as big as a single B.
Similarly, although ∏ * (Λ x -> (x -> x)) (a.k.a., forall a. a -> a) is technically equivalent to an infinite product of functions, none of those functions can inspect the type, so all are constrained to only return their input value and not do any funny business like (+1) : Int -> Int when instantiated to Int. Because there is only one (assuming a total language) function that can't inspect the type of its argument but must return a value of that same type, the infinite product is thus just as large as a single value.
Now, about your direct question on (forall r . (a -> r) -> r) ~ a. First, let's express your ~ operator more formally. It's really isomorphism, so we need two functions going back and forth, and an argument that they're inverses.
data Iso a b = Iso
{ to :: a -> b
, from :: b -> a
-- proof1 :: forall x. to (from x) == x
-- proof2 :: forall x. from (to x) == x
}
and now we express your original question in more formal terms. Your question amounts to constructing a term of the following (impredicative, so GHC has trouble with it, but we'll survive) type:
forall a. Iso (forall r. (a -> r) -> r) a
Which, using my earlier terminology, amounts to ∏ * (Λ a -> Iso (∏ * (Λ r -> ((a -> r) -> r))) a). Once again we have an infinite product that can't inspect its type argument. By handwaving, we can argue that the only possible values considering the parametricity rules (the other two proofs are respected automatically) for to and from are ($ id) and flip id.
If this feels unsatisfying, it's probably because the algebraic interpretation of forall didn't really add anything to the proof. It's really just plain old type theory, but I hope I was able to provide something that feels a little less categorical than the Yoneda form of it. It's worth noting that we don't actually need to use parametricity to write proof1 and proof2 above, though. Parametricity only enters the picture when we want to state that ($ id) and flip id are our only options for to and from (which we can't prove in Agda or Coq, for that reason).

To (attempt to) answer the actual question (which is less interesting than the answers to the broader issues raised), the question is ill formed because of a "type error"
Either ~ (+)
(,) ~ (*)
(->) b ~ flip (^)
() ~ 1
Void ~ 0
These all map types to integers, and type constructors to functions on naturals. In a sense, you have a functor from the category of types to the category of naturals. In the other direction, you "forget" stuff, since the types preserve algebraic structure while the naturals throw it away. I.e. given Either () () you can get a unique natural, but given that natural, you can get many types.
But this is different:
(forall r . (a -> r) -> r) ~ a
It maps a type to another type! It is not part of the above functor. It's just an isomorphism within the category of types. So let's give that a different symbol, <=>
Now we have
(forall r . (a -> r) -> r) <=> a
Now you note that we can not only send types to nats and arrows to arrows, but also some isomorphisms to other isomorphisms:
(a, (b, c)) <=> ((a, b), c) ~ a * (b * c) = (a * b) * c
But something subtle is going on here. In a sense, the latter isomorphism on pairs is true because the algebraic identity is true. This is to say that the "isomorphism" in the latter simply means that the two types are equivalent under the image of our functor to the nats.
The former isomorphism we need to prove directly, which is where we start to get to the underlying question -- is given our functor to the nats, what does forall r. map to? But the answer is that forall r. is neither a type, nor a meaningful arrow between types.
By introducing forall, we have moved away from first order types. There's no reason to expect that forall should fit in our above Functor, and indeed, it doesn't.
So we can explore, as others have above, why the isomorphism holds (which is itself very interesting) -- but in doing so we've abandoned the algebraic core of the question. A question which can be answered, I think, is, given the category of higher-order types and constructors as arrows between them, what is there meaningful Functor to?
Edit:
So now I have another approach which shows why adding polymorphism makes things go nuts. We start by asking a simpler question -- does a given polymorphic type have zero or more than zero inhabitants? This is the type inhabitation problem, and winds up being, via Curry-Howard, a problem in modified realizability, since it's the same thing as asking if a formula in some logic is realizable in an appropriate computational model. Now as that page explains, this is decidable in the simply typed lambda calculus but is PSPACE-complete. But once we move to anything more complicated, by adding polymorphism for example and going to System F, then it goes to undecidable!
So, if we can't decide if an arbitrary type is inhabited at all, then we clearly can't decide how many inhabitants it has!

It's an interesting question. I don't have a full answer, but this was too long for a comment.
The type signature (forall r. (a -> r) -> r) can be expressed as me saying
For any type r that you care to name, if you give me a function that takes a and produces an r, then I will give you back an r.
Now, this has to work for any type r, but it can be a specific type a. So the way for me to pull of this neat trick is to have an a sitting around somewhere, that I feed to the function (which produces an r for me) and then I hand that r back to you.
But if I have an a sitting around, I could give it to you:
If you give me a 1, I'll give you an a.
which corresponds to the type signature 1 -> a or simply a. By this informal argument we have
(forall r. (a -> r) -> r) ~ a
The next step would be to generate the corresponding algebraic expression, but I'm not clear on how the algebraic quantities interact with the universal quantification. We may need to wait for an expert!

A few links to the nLab:
Universal quantifier, corresponds to dependent product.
Existential quantifier, corresponds to dependent sum (dependent coproduct).
Thus, in settings of category theory:
Type | Modeled¹ as | In category
-------------------+---------------------------+-------------
Unit | Terminal object | CCC
Bottom | Initial object |
Record | Product |
Union | Sum (coproduct) |
Function | Exponential |
-------------------+---------------------------+-------------
Dependent product² | Right adjoint to pullback | LCCC
Dependent sum | Left adjoint to pullback |
¹) in appropriate category ─ CCC for total and non-polymorphic subset of Haskell (link), CPO for non-total traits of Haskell (link), LCCC for dependently typed languages.
²) forall quantification is a special case of dependent product:
∀(x :: *). y[x] ~ ∏(x : Set)y[x]
where Set is the universe of all small types.

Develop Reference

node.js excel linux python-3.x azure haskell apache-spark rust .htaccess string