Generic Pattern in Haskell

I was hoping that Haskell's compiler would understand that f v is type-safe given UnFold v f (although that is a tall order).
data Sequence a = FirstThen a (Sequence a) | Repeating a | UnFold b (b -> b) (b -> a)
Is there some way that I can encapsulate a generic pattern for a datatype without adding extra type parameters?
(I am aware of a solution for this specific case using lazy maps but I am after a more general solution)

You can use existential quantification to get there:
data Sequence a = ... | forall b. UnFold b (b -> b) (b -> a)
I'm not sure this buys you much over the simpler solution of storing the result of unfolding directly, though:
data Stream a = Cons a (Stream a)
data Sequence' a = ... | Explicit (Stream a)
In particular, if somebody hands you a Sequence a, you can't pattern match on the b's contained within, even if you think you know what type they are.
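A minimal sketch of the existential version, assuming the ExistentialQuantification extension. The helper seqToList and the example value are hypothetical names added for illustration; they show that a consumer can still run an UnFold without knowing b, because the seed and both functions travel together:

```haskell
{-# LANGUAGE ExistentialQuantification #-}

-- The seed type b is hidden inside UnFold, so Sequence needs no extra parameter.
data Sequence a
  = FirstThen a (Sequence a)
  | Repeating a
  | forall b. UnFold b (b -> b) (b -> a)

-- Hypothetical helper: flatten a Sequence into an infinite lazy list.
seqToList :: Sequence a -> [a]
seqToList (FirstThen x rest) = x : seqToList rest
seqToList (Repeating x) = repeat x
seqToList (UnFold seed next view) = map view (iterate next seed)

-- Hypothetical example: "0", then the powers of two unfolded from an Int seed.
example :: Sequence String
example = FirstThen "0" (UnFold (1 :: Int) (* 2) show)
```

The only thing a caller can do with the hidden seed is feed it to the two functions stored next to it, which is why this is often no more expressive than storing the unfolded stream directly.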


Hidden forall quantified types in ReifiedTraversal

This question really is more generic, since while I was asking it I found out how to fix it in this particular case (even though I don't like it) but I'll phrase it in my particular context.
Context:
I'm using the lens library and I found it particularly useful to provide functionality for "adding" traversals (conceptually, a traversal that traverses all the elements in both original traversals). I did not find a default implementation so I did it using Monoid. In order to be able to implement an instance, I had to use the ReifiedTraversal wrapper, which I assume is in the library precisely for this purpose:
-- Adding traversals
add_traversals :: Semigroup t => Traversal s t a b -> Traversal s t a b -> Traversal s t a b
add_traversals t1 t2 f s = liftA2 (<>) (t1 f s) (t2 f s)
instance Semigroup t => Semigroup (ReifiedTraversal s t a b) where
  a1 <> a2 = Traversal (add_traversals (runTraversal a1) (runTraversal a2))
instance Semigroup s => Monoid (ReifiedTraversal' s a) where
  mempty = Traversal (\_ -> pure)
The immediate application I want to extract from this is being able to provide a traversal for a specified set of indices in a list. Therefore, the underlying semigroup is [] and so is the underlying Traversable. First, I implemented a lens for an individual index in a list:
lens_idx :: Int -> Lens' [a] a
lens_idx _ f [] = error "No such index in the list"
lens_idx 0 f (x:xs) = fmap (\rx -> rx:xs) (f x)
lens_idx n f (x:xs) = fmap (\rxs -> x:rxs) (lens_idx (n-1) f xs)
All that remains to be done is to combine these two things, ideally to implement a function traversal_idxs :: [Int] -> Traversal' [a] a
Problem:
I get type checking errors when I try to use this. I know it has to do with the fact that Traversal is a type that includes a constrained forall quantifier in its definition. In order to be able to use the Monoid instance, I need to first reify the lenses provided by lens_idx (which are, of course, also traversals). I try to do this by doing:
r_lens_idx :: Int -> ReifiedTraversal' [a] a
r_lens_idx = Traversal . lens_idx
But this fails with two errors (two versions of the same error really):
Couldn't match type ‘f’ with ‘f0’...
Ambiguous type variable ‘f0’ arising from a use of ‘lens_idx’
prevents the constraint ‘(Functor f0)’ from being solved...
I understand this has to do with the hidden forall f. Functor f => in the Traversal definition. While writing this, I realized that the following does work:
r_lens_idx :: Int -> ReifiedTraversal' [a] a
r_lens_idx idx = Traversal (lens_idx idx)
So, by giving it the parameter it can make the f explicit to itself and then it can work with it. However, this feels extremely ad hoc, especially because originally I was trying to build this r_lens_idx inline in a where clause in the definition of the traversal_idxs function (in fact, in a function defining this function inline, because I'm not really going to use it that often).
So, sure, I guess I can always use lambda abstraction, but... is this really the right way to deal with this? It feels like a hack, or rather, that the original error is an oversight by the type-checker.
The "adding" of traversals that you want was added in the most recent lens release; you can find it under the name adjoin. Note that it is unsound to use if your traversals overlap at all.
I am replying to my own question, although it is only pointing out that what I was trying to do with traversals was not actually possible in that shape and how I overcame it. There is still the underlying problem of the hidden forall quantified variables and how is it possible that lambda abstraction can make code that does not type check suddenly type check (or rather, why it did not type check to start with).
It turns out my implementation of Monoid for Traversal was deeply flawed. I realized this when I started debugging it. For instance, I was trying to combine a list of indices, and a function that would return a lens for each index, mapping to that index in a list, into a traversal that would map to exactly those indices. That is possible, but it relies on the fact that List is a Monad, instead of just using the Applicative structure.
The function that I had written originally for add_traversals used only the Applicative structure, but instead of mapping to those indices in the list, it would duplicate the list for each index and concatenate the copies, each copy having had its own lens applied.
When trying to fix it, I realized I needed to use bind to implement what I really wanted, and then I stumbled upon this: https://www.reddit.com/r/haskell/comments/4tfao3/monadic_traversals/
So the answer was clear: I can do what I want, but it's not a Monoid over Traversal, but instead a Monoid over MTraversal. It still serves my purposes perfectly.
This is the resulting code for that:
-- Monadic traversals: Traversals that only work with monads, but they allow other things that rely on the fact they only need to work with monads, like sum.
type MTraversal s t a b = forall m. Monad m => (a -> m b) -> s -> m t
type MTraversal' s a = MTraversal s s a a
newtype ReifiedMTraversal s t a b = MTraversal {runMTraversal :: MTraversal s t a b}
type ReifiedMTraversal' s a = ReifiedMTraversal s s a a
-- Adding mtraversals
add_mtraversals :: Semigroup t => MTraversal r t a b -> MTraversal s r a b -> MTraversal s t a b
add_mtraversals t1 t2 f s = (t2 f s) >>= (t1 f)
instance Semigroup s => Semigroup (ReifiedMTraversal' s a) where
  a1 <> a2 = MTraversal (add_mtraversals (runMTraversal a1) (runMTraversal a2))
instance Semigroup s => Monoid (ReifiedMTraversal' s a) where
  mempty = MTraversal (\_ -> return)
Note that MTraversal is still a LensLike and an ASetter, so you can use many operators from the lens package, like .~.
As I mentioned, though, I still have to use lambda abstraction when using this for my purposes due to the forall quantifier being in an uncomfortable place, and I'd love if someone could clarify what the heck is up with the type checker in that regard.
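On that remaining question: as far as I understand it, this is not a type-checker oversight but a consequence of GHC not instantiating type variables with polymorphic types by default. In a composition such as Traversal . lens_idx, the middle type variable of (.) would have to become the whole forall f. ... type, so GHC instead picks one fixed f0 and then cannot generalize it again. When the constructor is applied directly to its argument, the argument is checked against the constructor's polymorphic field type and no such instantiation is needed. A minimal sketch of the same situation without the lens dependency; Trav, ReifiedTrav, ixTrav, reifyIx and overTrav are local stand-ins invented for illustration, and unlike lens_idx the index traversal here simply focuses on nothing when the index is out of range:

```haskell
{-# LANGUAGE RankNTypes #-}

import Data.Functor.Identity (Identity (..))

-- Local stand-ins for lens's Traversal' and ReifiedTraversal' (assumed shapes).
type Trav s a = forall f. Applicative f => (a -> f a) -> s -> f s

newtype ReifiedTrav s a = ReifiedTrav { runTrav :: Trav s a }

-- Focus on the element at the given index; out-of-range indices focus on nothing.
ixTrav :: Int -> Trav [a] a
ixTrav _ _ [] = pure []
ixTrav 0 f (x : xs) = fmap (: xs) (f x)
ixTrav n f (x : xs) = fmap (x :) (ixTrav (n - 1) f xs)

-- Accepted: the constructor is applied directly, so its argument is checked
-- against the polymorphic field type.
reifyIx :: Int -> ReifiedTrav [a] a
reifyIx i = ReifiedTrav (ixTrav i)

-- Rejected by default: (.) would have to instantiate its middle type
-- variable to the polymorphic type Trav [a] a.
-- reifyIx' = ReifiedTrav . ixTrav

-- Modify through a reified traversal by choosing f = Identity.
overTrav :: ReifiedTrav s a -> (a -> a) -> s -> s
overTrav t g = runIdentity . runTrav t (Identity . g)
```

So the lambda (or the explicit parameter) is the ordinary way to cross a rank-2 boundary rather than a hack: it gives the compiler a place to generalize over f.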

SystemT Compiler and dealing with Infinite Types in Haskell

I'm following this blog post: http://semantic-domain.blogspot.com/2012/12/total-functional-programming-in-partial.html
It shows a small OCaml compiler program for System T (a simple total functional language).
The entire pipeline takes OCaml syntax (via Camlp4 metaprogramming) and transforms it into an OCaml AST, which is translated to the SystemT lambda calculus (see: module Term) and then finally to the SystemT combinator calculus (see: module Goedel). The final step is also wrapped in the OCaml metaprogramming Ast.expr type.
I'm attempting to translate it to Haskell without the use of Template Haskell.
For the SystemT Combinator form, I've written it as
{-# LANGUAGE GADTs #-}
data TNat = Zero | Succ TNat
data THom a b where
  Id :: THom a a
  Unit :: THom a ()
  ZeroH :: THom () TNat
  SuccH :: THom TNat TNat
  Compose :: THom a b -> THom b c -> THom a c
  Pair :: THom a b -> THom a c -> THom a (b, c)
  Fst :: THom (a, b) a
  Snd :: THom (a, b) b
  Curry :: THom (a, b) c -> THom a (b -> c)
  Eval :: THom ((a -> b), a) b -- (A => B) * A -> B
  Iter :: THom a b -> THom (a, b) b -> THom (a, TNat) b
Note that Compose is forward composition, which differs from (.).
During the translation of SystemT Lambda Calculus to SystemT Combinator Calculus, the Elaborate.synth function tries to convert SystemT Lambda Calculus variables into a series of composed projection expressions (related to De Bruijn indices). This is because the combinator calculus doesn't have variables or variable names. This is done with Elaborate.lookup, which uses the Quote.find function.
The problem lies in my encoding of the combinator calculus as the GADT THom a b. Translating the Quote.find function:
let rec find x = function
| [] -> raise Not_found
| (x', t) :: ctx' when x = x' -> <:expr< Goedel.snd >>
| (x', t) :: ctx' -> <:expr< Goedel.compose Goedel.fst $find x ctx'$ >>
Into Haskell:
find :: TVar -> Context -> _
find tvar [] = error "Not Found"
find tvar ((tvar', ttype):ctx)
  | tvar == tvar' = Snd
  | otherwise = Compose Fst (find tvar ctx)
Results in an infinite type error.
• Occurs check: cannot construct the infinite type: a ~ (a, c)
Expected type: THom (a, c) c
Actual type: THom ((a, c), c) c
The problem stems from the fact that using Compose and Fst and Snd from the THom a b GADT can result in infinite variations of the type signature.
The article doesn't have this problem because it appears to use the Ast.expr OCaml thing to wrap the underlying expressions.
I'm not sure how to resolve in Haskell. Should I be using an existentially quantified type like
data TExpr = forall a b. TExpr (THom a b)
Or should I use some sort of type-level Fix to work around the infinite type problem? Either way, I'm unsure how this changes the find or lookup functions.
This answer will have to be a bit high-level, because there are three entirely different families of possible designs to fix that problem. What you’re doing seems closer to approach three, but the approaches are ordered by increasing complexity.
The approach in the original post
Translating the original post requires Template Haskell and partiality; find would return a Q.Exp representing some Hom a b, avoiding this problem just like the original post. Just like in the original post, a type error in the original code would be caught when typechecking the output of all the Template Haskell functions. So, type errors are still caught before runtime, but you will still need to write tests to find cases where your macros output ill-typed expressions. One can give stronger guarantees, at the cost of a significant increase in complexity.
Dependent typing/GADTs in input and output
If you want to diverge from the post, one alternative is to use “dependent typing” throughout and make the input dependently-typed. I use the term loosely to include both actually dependently-typed languages, actual Dependent Haskell (when it lands), and ways to fake dependent typing in Haskell today (via GADTs, singletons, and what not).
However, you lose the ability to write your own typechecker and choose which type system to use; typically, you end up embedding a simply-typed lambda calculus, and can simulate polymorphism via polymorphic Haskell functions that can generate terms at a given type. This is easier in dependently-typed languages, but it is possible in Haskell too.
But honestly, in this road it’s easier to use higher-order abstract syntax and Haskell functions, with something like:
data Exp a where
  Abs :: (Exp a -> Exp b) -> Exp (a -> b)
  App :: Exp (a -> b) -> Exp a -> Exp b
  Var :: String -> Exp a -- only use for free variables
exampleId :: Exp (a -> a)
exampleId = Abs (\x -> x)
If you can use this approach (details omitted here), you get high assurance from GADTs with limited complexity. However, this approach is too inflexible for many scenarios, for instance because the typing contexts are only visible to the Haskell compiler and not in your types or terms.
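To give an idea of what this buys you, here is a small sketch of an evaluator for closed terms in this style. It assumes one extra constructor, Lift, which is not part of the definition above and is only there to embed Haskell values back into Exp; the Var constructor is left out because free variables have no meaning to evaluate. The names eval and constE are also made up for the example:

```haskell
{-# LANGUAGE GADTs #-}

data Exp a where
  Abs  :: (Exp a -> Exp b) -> Exp (a -> b)
  App  :: Exp (a -> b) -> Exp a -> Exp b
  Lift :: a -> Exp a -- hypothetical: embeds an already evaluated value

-- The GADT index guarantees that evaluation cannot go wrong on types.
eval :: Exp a -> a
eval (Abs f)   = \x -> eval (f (Lift x))
eval (App f x) = eval f (eval x)
eval (Lift v)  = v

exampleId :: Exp (a -> a)
exampleId = Abs (\x -> x)

-- The K combinator: binding is handled entirely by Haskell lambdas.
constE :: Exp (a -> b -> a)
constE = Abs (\x -> Abs (\_ -> x))
```

Note that there is no environment and no variable lookup anywhere, which is exactly why the typing context never shows up in the types.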
From untyped to typed terms
A third option is to go dependently-typed and still make your program turn weakly-typed input into strongly-typed output. In this case, your typechecker overall transforms terms of some type Expr into terms of a GADT TExp gamma t, Hom a b, or such. Since you don’t know statically what gamma and t (or a and b) are, you’ll indeed need some sort of existential.
But a plain existential is too weak: to build bigger well-typed expressions, you’ll need to check that the produced types allow it. For instance, to build a TExpr containing a Compose expression out of two smaller TExprs, you'll need to check (at runtime) that their types match. And with a plain existential, you can't. So you’ll need to have a representation of types also at the value level.
What's more, existentials are annoying to deal with (as usual), since you can’t ever expose the hidden type variables in your return type, or project those out (unlike dependent records/sigma-types).
I am honestly not entirely sure this could eventually be made to work. Here is a possible partial sketch in Haskell, up to one case of find.
data Type t where
  VNat :: Type TNat
  VString :: Type String
  VArrow :: Type a -> Type b -> Type (a -> b)
  VPair :: Type a -> Type b -> Type (a, b)
  VUnit :: Type ()
data SomeType = forall t. SomeType (Type t)
data SomeHom = forall a b. SomeHom (Type a) (Type b) (THom a b)
type Context = [(TVar, SomeType)]
getType :: Context -> SomeType
getType [] = SomeType VUnit
getType ((_, SomeType ttyp) : gamma) =
  case getType gamma of
    SomeType ctxT -> SomeType (VPair ttyp ctxT)
find :: TVar -> Context -> SomeHom
find tvar ((tvar', ttyp) : gamma)
  | tvar == tvar' =
      case (ttyp, getType gamma) of
        (SomeType t, SomeType ctxT) ->
          SomeHom (VPair t ctxT) t Fst
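For what it's worth, the sketch does seem to extend to the recursive case. The following self-contained version is only an assumption about how the rest could look: it trims THom and Type down to the constructors that find needs, encodes the context as nested pairs (head, rest) as above (so the head variable is Fst and everything else goes through Snd), returns Maybe instead of raising, and adds a hypothetical describe function for inspecting the result:

```haskell
{-# LANGUAGE GADTs #-}
{-# LANGUAGE ExistentialQuantification #-}

data TNat = Zero | Succ TNat

-- Trimmed-down combinator calculus: only what variable lookup needs.
data THom a b where
  Fst     :: THom (a, b) a
  Snd     :: THom (a, b) b
  Compose :: THom a b -> THom b c -> THom a c -- forward composition

-- Value-level representation of object types.
data Type t where
  VNat  :: Type TNat
  VUnit :: Type ()
  VPair :: Type a -> Type b -> Type (a, b)

data SomeType = forall t. SomeType (Type t)
data SomeHom = forall a b. SomeHom (Type a) (Type b) (THom a b)

type TVar = String
type Context = [(TVar, SomeType)]

-- The type of the whole context, as nested pairs (head, rest).
getType :: Context -> SomeType
getType [] = SomeType VUnit
getType ((_, SomeType t) : gamma) =
  case getType gamma of
    SomeType ctxT -> SomeType (VPair t ctxT)

-- Build the projection for a variable, or Nothing if it is unbound.
find :: TVar -> Context -> Maybe SomeHom
find _ [] = Nothing
find x ((y, SomeType t) : gamma)
  | x == y =
      case getType gamma of
        SomeType ctxT -> Just (SomeHom (VPair t ctxT) t Fst)
  | otherwise =
      case find x gamma of
        Nothing -> Nothing
        -- Skip the head with Snd, then continue with the projection for the tail.
        Just (SomeHom ctxT r h) ->
          Just (SomeHom (VPair t ctxT) r (Compose Snd h))

-- Hypothetical pretty-printer, used only to look at the result.
describe :: THom a b -> String
describe Fst = "fst"
describe Snd = "snd"
describe (Compose f g) = describe f ++ " ; " ++ describe g
```

The infinite type never arises because each recursive call returns its own existential package, so the source type is free to grow with the depth of the context. To combine such results with Compose elsewhere you would still need a runtime equality test on Type values, which is not shown here.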

Are there useful applications for the Divisible Type Class?

I've lately been working on an API in Elm where one of the main types is contravariant. So, I've googled around to see what one can do with contravariant types and found that the contravariant package in Haskell defines the Divisible type class.
It is defined as follows:
class Contravariant f => Divisible f where
  divide :: (a -> (b, c)) -> f b -> f c -> f a
  conquer :: f a
It turns out that my particular type does suit the definition of the Divisible type class. While Elm does not support type classes, I do look at Haskell from time to time for some inspiration.
My question: Are there any practical uses for this type class? Are there known APIs out there in Haskell (or other languages) that benefit from this divide-conquer pattern? Are there any gotchas I should be aware of?
Thank you very much for your help.
One example:
Applicative is useful for parsing, because you can turn Applicative parsers of parts into a parser of wholes, needing only a pure function for combining the parts into a whole.
Divisible is useful for serializing (should we call this coparsing now?), because you can turn Divisible serializers of parts into a serializer of wholes, needing only a pure function for splitting the whole into parts.
I haven't actually seen a project that worked this way, but I'm (slowly) working on an Avro implementation for Haskell that does.
When I first came across Divisible I wanted it for divide, and had no idea what possible use conquer could be other than cheating (an f a out of nowhere, for any a?). But to make the Divisible laws check out for my serializers conquer became a "serializer" that encodes anything to zero bytes, which makes a lot of sense.
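A toy version of that idea, as a sketch only: the Divisible class is redeclared locally so the example needs nothing beyond base (the real class lives in the contravariant package), and Serializer, Person, string, int and person are all hypothetical names with a made-up wire format:

```haskell
import Data.Functor.Contravariant (Contravariant (..))

-- Local copy of the class from the question.
class Contravariant f => Divisible f where
  divide :: (a -> (b, c)) -> f b -> f c -> f a
  conquer :: f a

-- Toy serializer: turns a value into a String.
newtype Serializer a = Serializer { runSerializer :: a -> String }

instance Contravariant Serializer where
  contramap f (Serializer g) = Serializer (g . f)

instance Divisible Serializer where
  -- Split the whole into parts, serialize each part, concatenate.
  divide split (Serializer sb) (Serializer sc) =
    Serializer (\a -> let (b, c) = split a in sb b ++ sc c)
  -- Serializes anything to zero characters.
  conquer = Serializer (const "")

data Person = Person { name :: String, age :: Int }

int :: Serializer Int
int = Serializer (\n -> show n ++ ";")

-- Length-prefixed string.
string :: Serializer String
string = Serializer (\s -> show (length s) ++ ":" ++ s)

-- A serializer of wholes built from serializers of parts.
person :: Serializer Person
person = divide (\p -> (name p, age p)) string int
```

With these definitions, runSerializer person (Person "Ann" 30) yields "3:Ann30;", and dividing against conquer leaves the output of the other serializer unchanged, which is the identity law that makes the zero-byte serializer the natural choice.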
Here's a possible use case.
In streaming libraries, one can have fold-like constructs like the ones from the foldl package, that are fed a sequence of inputs and return a summary value when the sequence is exhausted.
These folds are contravariant on their inputs, and can be made Divisible. This means that if you have a stream of elements where each element can be somehow decomposed into b and c parts, and you also happen to have a fold that consumes bs and another fold that consumes cs, then you can build a fold that consumes the original stream.
The actual folds from foldl don't implement Divisible, but they could, using a newtype wrapper. In my process-streaming package I have a fold-like type that does implement Divisible.
divide requires the return values of the constituent folds to be of the same type, and that type must be an instance of Monoid. If the folds return different, unrelated monoids, a workaround is to put each return value in a separate field of a tuple, leaving the other field as mempty. This works because a tuple of monoids is itself a Monoid.
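Here is a rough sketch of how such an instance can look. Consumer is a hypothetical fold type written for this example (it is not the actual type from foldl or process-streaming), with the result type first so that the input type is the last parameter, and the Divisible class is redeclared locally to keep the example within base:

```haskell
{-# LANGUAGE ExistentialQuantification #-}

import Data.Functor.Contravariant (Contravariant (..))

-- Local copy of the class, so the sketch only depends on base.
class Contravariant f => Divisible f where
  divide :: (a -> (b, c)) -> f b -> f c -> f a
  conquer :: f a

-- A left fold with hidden state x: step function, initial state, extraction.
data Consumer r a = forall x. Consumer (x -> a -> x) x (x -> r)

runConsumer :: Consumer r a -> [a] -> r
runConsumer (Consumer step begin done) xs = done (foldl step begin xs)

instance Contravariant (Consumer r) where
  contramap f (Consumer step begin done) =
    Consumer (\x a -> step x (f a)) begin done

instance Monoid r => Divisible (Consumer r) where
  -- Run both folds side by side on the two halves of each element,
  -- then combine their summaries with the Monoid.
  divide split (Consumer stepB beginB doneB) (Consumer stepC beginC doneC) =
    Consumer
      (\(xb, xc) a -> let (b, c) = split a in (stepB xb b, stepC xc c))
      (beginB, beginC)
      (\(xb, xc) -> doneB xb <> doneC xc)
  -- Ignores every input and returns the empty summary.
  conquer = Consumer (\_ _ -> ()) () (const mempty)

-- Example fold: collect all inputs in order.
collect :: Consumer [a] a
collect = Consumer (\acc a -> a : acc) [] reverse
```

The stream is traversed once and both constituent folds see their part of every element, which is the main practical benefit over running two separate passes.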
I'll examine the example of the core data types in Fritz Henglein's generalized radix sort techniques as implemented by Edward Kmett in the discrimination package.
While there's a great deal going on there, it largely focuses around a type like this
data Group a = Group (forall b . [(a, b)] -> [[b]])
If you have a value of type Group a you essentially must have an equivalence relation on a, because if I give you an association between as and some type b completely unknown to you then you can give me "groupings" of b.
groupId :: Group a -> [a] -> [[a]]
groupId (Group grouper) = grouper . map (\a -> (a, a))
You can see this as a core type for writing a utility library of groupings. For instance, we might want to know that if we can Group a and Group b then we can Group (a, b) (more on this in a second). Henglein's core idea is that if you can start with some basic Groups on integers—we can write very fast Group Int32 implementations via radix sort—and then use combinators to extend them over all types then you will have generalized radix sort to algebraic data types.
So how might we build our combinator library?
Well, f :: Group a -> Group b -> Group (a, b) is pretty important in that it lets us make groups of product-like types. Normally, we'd get this from Applicative and liftA2, but Group, you'll notice, is Contravariant, not a Functor.
So instead we use Divisible
divided :: Group a -> Group b -> Group (a, b)
Notice that this arises in a strange way from
divide :: (a -> (b, c)) -> Group b -> Group c -> Group a
as it has the typical "reversed arrow" character of contravariant things. We can now understand things like divide and conquer in terms of their interpretation on Group.
Divide says that if I want to build a strategy for equating as using strategies for equating bs and cs, I can do the following for any type x
Take your partial relation [(a, x)] and map over it with a function f :: a -> (b, c), and a little tuple manipulation, to get a new relation [(b, (c, x))].
Use my Group b to discriminate [(b, (c, x))] into [[(c, x)]]
Use my Group c to discriminate each [(c, x)] into [[x]] giving me [[[x]]]
Flatten the inner layers to get [[x]] like we need
instance Divisible Group where
  conquer = Group $ return . fmap snd
  divide k (Group l) (Group r) = Group $ \xs ->
    -- a bit more cleverly done here...
    l [ (b, (c, d)) | (a,d) <- xs, let (b, c) = k a] >>= r
We also get interpretations of the more tricky Decidable refinement of Divisible
class Divisible f => Decidable f where
  lose :: (a -> Void) -> f a
  choose :: (a -> Either b c) -> f b -> f c -> f a
instance Decidable Group where
  lose :: (a -> Void) -> Group a
  choose :: (a -> Either b c) -> Group b -> Group c -> Group a
These read as saying that for any type a of which we can guarantee there are no values (we cannot produce values of Void by any means, a function a -> Void is a means of producing Void given a, thus we must not be able to produce values of a by any means either!) then we immediately get a grouping of zero values
lose _ = Group (\_ -> [])
We can also play a similar game to divide above, except that instead of sequencing our use of the input discriminators, we alternate.
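To make the divide and choose readings concrete, here is a small self-contained sketch. It is not the discrimination package's code: groupByEq and eqGroup are a naive quadratic stand-in for the fast radix-sort primitives, the combinators are written as plain functions (divideGroup, loseGroup, chooseGroup) instead of class instances, and chooseGroup is just one plausible reading of "alternate" in which all Left values are grouped by the first discriminator and all Right values by the second:

```haskell
{-# LANGUAGE RankNTypes #-}

import Data.Void (Void)

data Group a = Group (forall b. [(a, b)] -> [[b]])

groupId :: Group a -> [a] -> [[a]]
groupId (Group grouper) = grouper . map (\a -> (a, a))

-- Naive grouping by Eq, keeping first-occurrence order (assumed stand-in).
groupByEq :: Eq a => [(a, b)] -> [[b]]
groupByEq [] = []
groupByEq ((a, b) : rest) =
  (b : [b' | (a', b') <- rest, a' == a])
    : groupByEq [p | p@(a', _) <- rest, a' /= a]

eqGroup :: Eq a => Group a
eqGroup = Group groupByEq

-- Discriminate on the b part first, then refine each class by the c part.
divideGroup :: (a -> (b, c)) -> Group b -> Group c -> Group a
divideGroup k (Group l) (Group r) =
  Group (\xs -> l [(b, (c, d)) | (a, d) <- xs, let (b, c) = k a] >>= r)

-- No values of a can exist, so there is nothing to group.
loseGroup :: (a -> Void) -> Group a
loseGroup _ = Group (\_ -> [])

-- Lefts and Rights are never equivalent, so group each side separately.
chooseGroup :: (a -> Either b c) -> Group b -> Group c -> Group a
chooseGroup k (Group l) (Group r) =
  Group (\xs ->
    l [(b, d) | (a, d) <- xs, Left b <- [k a]]
      ++ r [(c, d) | (a, d) <- xs, Right c <- [k a]])
```

For instance, groupId (divideGroup id eqGroup eqGroup) applied to [(1,'a'),(2,'b'),(1,'a'),(1,'c')] first splits on the number and then on the character within each class.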
Using these techniques we build up a library of "Groupable" things, namely Grouping
class Grouping a where
  grouping :: Group a
and note that nearly all the definitions arise from the basic definition atop groupingNat, which uses fast monadic vector manipulations to achieve an efficient radix sort.

What is the difference between value constructors and tuples?

It's written that Haskell tuples are simply a different syntax for algebraic data types. Similarly, there are examples of how to redefine value constructors with tuples.
For example, a Tree data type in Haskell might be written as
data Tree a = EmptyTree | Node a (Tree a) (Tree a)
which could be converted to "tuple form" like this:
data Tree a = EmptyTree | Node (a, Tree a, Tree a)
What is the difference between the Node value constructor in the first example, and the actual tuple in the second example? i.e. Node a (Tree a) (Tree a) vs. (a, Tree a, Tree a) (aside from just the syntax)?
Under the hood, is Node a (Tree a) (Tree a) just a different syntax for a 3-tuple of the appropriate types at each position?
I know that you can partially apply a value constructor, such as Node 5 which will have type: (Node 5) :: Num a => Tree a -> Tree a -> Tree a
You sort of can partially apply a tuple too, using (,,) as a function ... but this doesn't know about the potential types for the un-bound entries, such as:
Prelude> :t (,,) 5
(,,) 5 :: Num a => b -> c -> (a, b, c)
unless, I guess, you explicitly declare a type with ::.
Aside from syntactical specialties like this, plus this last example of the type scoping, is there a material difference between whatever a "value constructor" thing actually is in Haskell, versus a tuple used to store positional values of the same types as the value constructor's arguments?
Well, conceptually there indeed is no difference and in fact other languages (OCaml, Elm) present tagged unions exactly that way, i.e., tags over tuples or first class records (which Haskell lacks). I personally consider this to be a design flaw in Haskell.
There are some practical differences though:
Laziness. Haskell's tuples are lazy and you can't change that. You can however mark constructor fields as strict:
data Tree a = EmptyTree | Node !a !(Tree a) !(Tree a)
Memory footprint and performance. Circumventing intermediate types reduces the footprint and raises the performance. You can read more about it in this fine answer.
You can also mark the strict fields with the UNPACK pragma to reduce the footprint even further. Alternatively you can use the -funbox-strict-fields compiler option. Concerning the last one, I simply prefer to have it on by default in all my projects. See Hasql's Cabal file for example.
Given the above, if it's a lazy type that you're looking for, then the following snippets should compile to the same thing:
data Tree a = EmptyTree | Node a (Tree a) (Tree a)
data Tree a = EmptyTree | Node {-# UNPACK #-} !(a, Tree a, Tree a)
So I guess you can say that it's possible to use tuples to store lazy fields of a constructor without a penalty. Though it should be mentioned that this pattern is kind of unconventional in the Haskell community.
If it's the strict type and footprint reduction that you're after, then there's no other way than to denormalize your tuples directly into constructor fields.
They're what's called isomorphic, meaning "to have the same shape". You can write something like
data Option a = None | Some a
And this is isomorphic to
data Maybe a = Nothing | Just a
meaning that you can write two functions
f :: Maybe a -> Option a
g :: Option a -> Maybe a
Such that f . g == id == g . f for all possible inputs. We can then say that (,,) is a data constructor isomorphic to the constructor
data Triple a b c = Triple a b c
Because you can write
f :: (a, b, c) -> Triple a b c
f (a, b, c) = Triple a b c
g :: Triple a b c -> (a, b, c)
g (Triple a b c) = (a, b, c)
And Node as a constructor is a special case of Triple, namely Triple a (Tree a) (Tree a). In fact, you could even go so far as to say that your definition of Tree could be written as
newtype Tree' a = Tree' (Maybe (a, Tree' a, Tree' a))
The newtype is required since you can't have a type alias be recursive. All you have to do is say that EmptyTree == Tree' Nothing and Node a l r == Tree' (Just (a, l, r)). You could pretty simply write functions that convert between the two.
Note that this is all from a mathematical point of view. The compiler can add extra metadata and other information to be able to identify a particular constructor making them behave slightly differently at runtime.
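As a sketch, the two conversion functions could look like this (toTree', fromTree' and sample are names made up for the example):

```haskell
data Tree a = EmptyTree | Node a (Tree a) (Tree a)
  deriving (Eq, Show)

newtype Tree' a = Tree' (Maybe (a, Tree' a, Tree' a))
  deriving (Eq, Show)

-- EmptyTree corresponds to Nothing, Node to Just of a 3-tuple.
toTree' :: Tree a -> Tree' a
toTree' EmptyTree = Tree' Nothing
toTree' (Node x l r) = Tree' (Just (x, toTree' l, toTree' r))

fromTree' :: Tree' a -> Tree a
fromTree' (Tree' Nothing) = EmptyTree
fromTree' (Tree' (Just (x, l, r))) = Node x (fromTree' l) (fromTree' r)

sample :: Tree Int
sample = Node 1 (Node 2 EmptyTree EmptyTree) EmptyTree
```

For fully defined trees, fromTree' . toTree' and toTree' . fromTree' are both the identity, which is what the isomorphism claim amounts to. Laziness makes the picture slightly less clean once undefined values are involved, since the tuple adds an extra place where one can hide.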

Algebraically interpreting polymorphism

So I understand the basic algebraic interpretation of types:
Either a b ~ a + b
(a, b) ~ a * b
a -> b ~ b^a
() ~ 1
Void ~ 0 -- from Data.Void
... and that these relations are true for concrete types, like Bool, as opposed to polymorphic types like a. I also know how to translate type signatures with polymorphic types into their concrete type representations by just translating the Church encoding according to the following isomorphism:
(forall r . (a -> r) -> r) ~ a
So if I have:
id :: forall a . a -> a
I know that it does not mean id ~ a^a, but it actually means:
id :: forall a . (() -> a) -> a
id ~ ()
~ 1
Similarly:
pair :: forall r . (a -> b -> r) -> r
pair ~ ((a, b) -> r) -> r
~ (a, b)
~ a * b
Which brings me to my question. What is the "algebraic" interpretation of this rule:
(forall r . (a -> r) -> r) ~ a
For every concrete type isomorphism I can point to an equivalent algebraic rule, such as:
(a, (b, c)) ~ ((a, b), c)
a * (b * c) = (a * b) * c
a -> (b -> c) ~ (a, b) -> c
(c^b)^a = c^(b * a)
But I don't understand the algebraic equality that is analogous to:
(forall r . (a -> r) -> r) ~ a
This is the famous Yoneda lemma for the identity functor.
Check this post for a readable introduction, and any category theory textbook for more.
Briefly, given f :: forall r. (a -> r) -> r you can apply f id to get an a, and conversely, given x :: a you can take ($x) to get forall r. (a -> r) -> r.
These operations are mutually inverse. Proof:
Obviously ($x) id == x. I will show that
($(f id)) == f,
since functions are equal when they are equal on all arguments, let's take x :: a -> r and show that
($(f id)) x == f x i.e.
x (f id) == f x.
Since f is polymorphic, it works as a natural transformation; this is the naturality diagram for f:
              f_A
 Hom(A, A)    →    A
 (x.) ↓            ↓ x
 Hom(A, R)    →    R
              f_R
So x . f == f . (x.).
Plugging in the identity on both sides gives (x . f) id == (f . (x.)) id, i.e. x (f id) == f x. QED
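The two directions can be written down directly in Haskell. This is a minimal sketch assuming the RankNTypes extension; the newtype Church and the names toA and fromA are made up here, and the wrapper is only used to keep the polymorphic type in one place:

```haskell
{-# LANGUAGE RankNTypes #-}

-- Wrapper for forall r. (a -> r) -> r.
newtype Church a = Church { runChurch :: forall r. (a -> r) -> r }

-- Apply the polymorphic function at r = a, to id.
toA :: Church a -> a
toA c = runChurch c id

-- The section ($ x): feed x to whatever continuation arrives.
fromA :: a -> Church a
fromA x = Church (\k -> k x)
```

toA (fromA x) reduces to x by plain computation; the other direction, fromA (toA c) being equal to c, is the part that needs the naturality argument above.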
(Rewritten for clarity)
There seem to be two parts to your question. One is implied and is asking what the algebraic interpretation of forall is, and the other is asking about the cont/Yoneda transformation, which sdcvvc's answer already covered pretty well.
I'll try to address the algebraic interpretation of forall for you. You mention that A -> B is B^A but I'd like to take that a step further and expand it out to B * B * B * ... * B (|A| times). Although we do have exponentiation as a notation for repeated multiplication like that, there's a more flexible notation, ∏ (uppercase Pi) representing arbitrary indexed products. There are two components to a Pi: the range of values we want to multiply over, and the expression that we're multiplying out. For example, at the value level, you might express the factorial function as fact i = ∏ [1..i] (λx -> x).
Going back to the world of types, we can view the exponentiation operator in the A -> B ~ B^A correspondence as a Pi: B^A ~ ∏ A (λ_ -> B). This says that we're defining an A-ary product of Bs, such that the Bs cannot depend on the particular A we've chosen. Sure, it's equivalent to plain exponentiation, but it lets us move up to cases in which there is a dependence.
In the most general case, we get dependent types, like what you see in Agda or Coq: in Agda syntax, replicate : Bool -> ((n : Nat) -> Vec Bool n) is one possible application of a Pi type, which could be expressed more explicitly as replicate : Bool -> ∏ Nat (Vec Bool), or further as replicate : ∏ Bool (λ_ -> ∏ Nat (Vec Bool)).
Note that as you might expect from the underlying algebra, you can fuse both of the ∏s in the definition of replicate above into a single ∏ ranging over the cartesian product of the domains: ∏ Bool (\_ -> ∏ Nat (Vec Bool)) is equivalent to ∏ (Bool, Nat) (λ(_, n) -> Vec Bool n) just like it would be at the "value level". This is simply uncurrying from the perspective of type theory.
I do realize your question was about polymorphism, so I'll stop going on about dependent types, but they are relevant: forall in Haskell is roughly equivalent to a ∏ with a domain over the type (kind) of types, *. Indeed, the function-like behavior of polymorphism can be observed directly in GHC core, which types them as capital lambdas (Λ). As such, a polymorphic type like forall a. a -> a is actually just ∏ * (Λ a -> (a -> a)) (using the Λ notation now that we distinguish between types and values), which can be expanded out to the infinite product (Bool -> Bool, Int -> Int, () -> (), (Int -> Bool) -> (Int -> Bool), ...) for every possible type. Instantiation of the type variable is simply projecting out the suitable element from the *-ary product (or applying the type function).
Now, for the big piece I missed in my original version of this answer: parametricity. Parametricity can be described in several different ways, but none of the ones I know of (viewing types as relations, or (di)naturality in category theory) really has a very algebraic interpretation. For our purposes, though, it boils down to something fairly simple: you can't pattern-match on *. I know that GHC lets you do that at the type level with type families, but you can only cover a finite chunk of * when doing that, so there are necessarily always points at which your type family is undefined.
What this means, from the point of view of polymorphism, is that any type function F we write in ∏ * F must either be constant (i.e., completely ignore the type it was polymorphic over) or pass the type through unchanged. Thus, ∏ * (Λ _ -> B) is valid because it ignores its argument, and corresponds to forall a. B. The other case is something like ∏ * (Λ x -> Maybe x), which corresponds to forall a. Maybe a, which doesn't ignore the type argument, but only "passes it through". As such, a ∏ A that has an irrelevant domain A (such as when A = *) can be seen as more of an A-ary indexed intersection (picking the common elements across all instantiations of the index), rather than a product.
Crucially, at the value level, the rules of parametricity prevent any funny behavior that might suggest the types are larger than they really are. Because we don't have typecase, we can't construct a value of type forall a. B that does something different based on what a was instantiated to. Thus, although the type is technically a function * -> B, it is always a constant function, and is thus equivalent to a single value of B. Using the ∏ interpretation, it is indeed equivalent to an infinite *-ary product of Bs, but those B values must always be identical, so the infinite product is effectively as big as a single B.
Similarly, although ∏ * (Λ x -> (x -> x)) (a.k.a., forall a. a -> a) is technically equivalent to an infinite product of functions, none of those functions can inspect the type, so all are constrained to only return their input value and not do any funny business like (+1) : Int -> Int when instantiated to Int. Because there is only one (assuming a total language) function that can't inspect the type of its argument but must return a value of that same type, the infinite product is thus just as large as a single value.
Now, about your direct question on (forall r . (a -> r) -> r) ~ a. First, let's express your ~ operator more formally. It's really isomorphism, so we need two functions going back and forth, and an argument that they're inverses.
data Iso a b = Iso
  { to :: a -> b
  , from :: b -> a
  -- proof1 :: forall x. to (from x) == x
  -- proof2 :: forall x. from (to x) == x
  }
and now we express your original question in more formal terms. Your question amounts to constructing a term of the following (impredicative, so GHC has trouble with it, but we'll survive) type:
forall a. Iso (forall r. (a -> r) -> r) a
Which, using my earlier terminology, amounts to ∏ * (Λ a -> Iso (∏ * (Λ r -> ((a -> r) -> r))) a). Once again we have an infinite product that can't inspect its type argument. By handwaving, we can argue that, given the parametricity rules, the only possible values for to and from are ($ id) and flip id, respectively (the two proofs are then respected automatically).
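A minimal sketch of this isomorphism in GHC, assuming the RankNTypes extension. The newtype CPS and the name cpsIso are hypothetical, introduced here only to hide the forall behind a constructor so that no impredicative instantiation is needed; the proof fields stay informal, as in the record above.

```haskell
{-# LANGUAGE RankNTypes #-}

-- Hypothetical wrapper: putting the forall inside a newtype avoids
-- having to instantiate Iso at a polymorphic type.
newtype CPS a = CPS { runCPS :: forall r. (a -> r) -> r }

-- The Iso record from above, without the (uncheckable) proof fields.
data Iso a b = Iso
  { to   :: a -> b
  , from :: b -> a
  }

-- to is ($ id) and from is flip id, modulo the newtype wrapper.
cpsIso :: Iso (CPS a) a
cpsIso = Iso
  { to   = \k -> runCPS k id      -- instantiate r to a and pass id
  , from = \x -> CPS (\f -> f x)  -- hold on to x and feed it to any f
  }
```

For example, to cpsIso (from cpsIso x) evaluates to x for any x, and runCPS (from cpsIso x) f evaluates to f x.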
If this feels unsatisfying, it's probably because the algebraic interpretation of forall didn't really add anything to the proof. It's really just plain old type theory, but I hope I was able to provide something that feels a little less categorical than the Yoneda form of it. It's worth noting that we don't actually need to use parametricity to write proof1 and proof2 above, though. Parametricity only enters the picture when we want to state that ($ id) and flip id are our only options for to and from (which we can't prove in Agda or Coq, for that reason).
To (attempt to) answer the actual question (which is less interesting than the answers to the broader issues raised), the question is ill-formed because of a "type error":
Either ~ (+)
(,) ~ (*)
(->) b ~ flip (^)
() ~ 1
Void ~ 0
These all map types to natural numbers, and type constructors to functions on naturals. In a sense, you have a functor from the category of types to the category of naturals. In the other direction, you "forget" stuff, since the types preserve algebraic structure while the naturals throw it away. I.e. given Either () () you can get a unique natural, but given that natural, you can get many types.
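One way to make this mapping concrete is to count inhabitants of finite types. The following is a sketch only: the Finite class and the names of the counts are hypothetical helpers, not part of any standard library, and it covers just the constructors listed above.

```haskell
import Data.Maybe (fromJust)
import Data.Void (Void)

-- Hypothetical helper class: list every inhabitant of a finite type.
class Finite a where
  inhabitants :: [a]

instance Finite Void where inhabitants = []            -- Void ~ 0
instance Finite ()   where inhabitants = [()]          -- ()   ~ 1
instance Finite Bool where inhabitants = [False, True] -- Bool ~ 2

-- Either ~ (+)
instance (Finite a, Finite b) => Finite (Either a b) where
  inhabitants = map Left inhabitants ++ map Right inhabitants

-- (,) ~ (*)
instance (Finite a, Finite b) => Finite (a, b) where
  inhabitants = [(x, y) | x <- inhabitants, y <- inhabitants]

-- (->) b ~ flip (^): one function per way of picking an output for each input.
instance (Finite a, Eq a, Finite b) => Finite (a -> b) where
  inhabitants = [\x -> fromJust (lookup x table) | table <- tables]
    where
      tables = mapM (\x -> [(x, y) | y <- inhabitants]) inhabitants

sumCount, productCount, functionCount, unitSumCount :: Int
sumCount      = length (inhabitants :: [Either Bool ()])          -- 2 + 1 = 3
productCount  = length (inhabitants :: [(Bool, Bool)])            -- 2 * 2 = 4
functionCount = length (inhabitants :: [Either Bool () -> Bool])  -- 2 ^ 3 = 8
unitSumCount  = length (inhabitants :: [Either () ()])            -- 1 + 1 = 2
```

Note that unitSumCount is 2, the same natural as for Bool: going from types to naturals is unique, but the natural 2 does not tell you which type it came from.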
But this is different:
(forall r . (a -> r) -> r) ~ a
It maps a type to another type! It is not part of the above functor. It's just an isomorphism within the category of types. So let's give that a different symbol, <=>.
Now we have
(forall r . (a -> r) -> r) <=> a
Now you note that we can not only send types to nats and arrows to arrows, but also some isomorphisms to other isomorphisms:
(a, (b, c)) <=> ((a, b), c) ~ a * (b * c) = (a * b) * c
But something subtle is going on here. In a sense, the latter isomorphism on pairs is true because the algebraic identity is true. This is to say that the "isomorphism" in the latter simply means that the two types are equivalent under the image of our functor to the nats.
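The pair isomorphism can also be witnessed directly at the type level by two mutually inverse functions. A small sketch, with assocL and assocR as hypothetical names:

```haskell
-- Left-to-right direction of (a, (b, c)) <=> ((a, b), c).
assocL :: (a, (b, c)) -> ((a, b), c)
assocL (x, (y, z)) = ((x, y), z)

-- Right-to-left direction; composing the two in either order is the identity.
assocR :: ((a, b), c) -> (a, (b, c))
assocR ((x, y), z) = (x, (y, z))
```

Both functions only rearrange the nesting, which is the value-level counterpart of a * (b * c) = (a * b) * c.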
The former isomorphism we need to prove directly, which is where we start to get to the underlying question: given our functor to the nats, what does forall r. map to? The answer is that forall r. is neither a type nor a meaningful arrow between types.
By introducing forall, we have moved away from first-order types. There's no reason to expect that forall should fit in our above functor, and indeed, it doesn't.
So we can explore, as others have above, why the isomorphism holds (which is itself very interesting) -- but in doing so we've abandoned the algebraic core of the question. A question that can be answered, I think, is this: given the category of higher-order types, with constructors as arrows between them, what is there a meaningful functor to?
Edit:
So now I have another approach which shows why adding polymorphism makes things go nuts. We start by asking a simpler question -- does a given polymorphic type have zero or more than zero inhabitants? This is the type inhabitation problem, and winds up being, via Curry-Howard, a problem in modified realizability, since it's the same thing as asking if a formula in some logic is realizable in an appropriate computational model. Now as that page explains, this is decidable in the simply typed lambda calculus but is PSPACE-complete. But once we move to anything more complicated, by adding polymorphism for example and going to System F, then it goes to undecidable!
So, if we can't decide if an arbitrary type is inhabited at all, then we clearly can't decide how many inhabitants it has!
It's an interesting question. I don't have a full answer, but this was too long for a comment.
The type signature (forall r. (a -> r) -> r) can be expressed as me saying
For any type r that you care to name, if you give me a function that takes a and produces an r, then I will give you back an r.
Now, this has to work for any type r, but it can be a specific type a. So the way for me to pull off this neat trick is to have an a sitting around somewhere, which I feed to the function (which produces an r for me), and then I hand that r back to you.
But if I have an a sitting around, I could give it to you:
If you give me a 1, I'll give you an a.
which corresponds to the type signature 1 -> a or simply a. By this informal argument we have
(forall r. (a -> r) -> r) ~ a
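The two informal promises above can be written down as Haskell functions. This is only a sketch of the argument, assuming the RankNTypes extension; the names holding, fromUnitFn and toUnitFn are made up for illustration.

```haskell
{-# LANGUAGE RankNTypes #-}

-- "If you give me a function a -> r, I will give you back an r":
-- I keep an a around and feed it to whatever function you hand me.
holding :: a -> (forall r. (a -> r) -> r)
holding x = \f -> f x

-- "If you give me a 1, I'll give you an a": 1 -> a is as good as an a.
fromUnitFn :: (() -> a) -> a
fromUnitFn g = g ()

toUnitFn :: a -> (() -> a)
toUnitFn x = \() -> x
```

For instance, holding 3 show evaluates to "3", and fromUnitFn (toUnitFn x) gives back x.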
The next step would be to generate the corresponding algebraic expression, but I'm not clear on how the algebraic quantities interact with the universal quantification. We may need to wait for an expert!
A few links to the nLab:
Universal quantifier, corresponds to dependent product.
Existential quantifier, corresponds to dependent sum (dependent coproduct).
Thus, in settings of category theory:
Type | Modeled¹ as | In category
-------------------+---------------------------+-------------
Unit | Terminal object | CCC
Bottom | Initial object |
Record | Product |
Union | Sum (coproduct) |
Function | Exponential |
-------------------+---------------------------+-------------
Dependent product² | Right adjoint to pullback | LCCC
Dependent sum | Left adjoint to pullback |
¹) in the appropriate category: CCC for the total and non-polymorphic subset of Haskell, CPO for the non-total traits of Haskell, LCCC for dependently typed languages.
²) forall quantification is a special case of dependent product:
∀(x :: *). y[x] ~ ∏(x : Set)y[x]
where Set is the universe of all small types.

Resources