Encoding ExistentialQuantification with RankNTypes

I've read in a few places claims that equivalent functionality to ExistentialQuantification can be had using RankNTypes. Could someone provide an example of why this is or is not possible?

Normally, all type variables in Haskell are implicitly universally quantified at the outermost scope of the type. RankNTypes allows a universal quantifier forall to appear nested, e.g. the type forall a b. (a -> a) -> b -> b is very different from forall b. (forall a. a -> a) -> b -> b.
There is a sense in which types on the left side of a function arrow are logically "negated", in roughly the same sense that (->) is logical implication. Logically, the universal and existential quantifiers are related by a De Morgan duality: (∃x. P(x)) is equivalent to ¬(∀x. ¬P(x)), or in other words "there exists an x such that P(x)" corresponds to "it is not the case that, for all x, P(x) is false".
So a forall on the left of a function arrow is "negated" and behaves like an existential. If you put the whole thing to the left of another function arrow it's double-negated and behaves as a universal quantifier again, modulo some fiddly details.
The same idea of negation applies to values as well, so to encode the type exists x. x we want:
forall x. in contravariant (negated) position
a value of that type x in covariant (positive) position.
Since the value must be inside the scope of the quantifier, our only choice is double-negation--a CPS transform, basically. To avoid restricting things otherwise, we'll then universally quantify over the type on the right of the function arrows. So exists x. x is translated to forall r. (forall x. x -> r) -> r. Compare the placement of the types and quantifiers here with the requirements above to verify that the encoding meets them.
In more operational terms this just means that given a function with the above type, because we give it a function with a universally quantified argument type, it can apply that function to any type x it likes, and since it has no other way of getting a value of type r we know it will apply that function to something. So x will refer to some type, but we don't know what--which is basically the essence of existential quantification.
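As a concrete sketch of that translation (only RankNTypes is assumed; the names ExistsCPS, pack and use are illustrative, not standard functions):
{-# LANGUAGE RankNTypes #-}
-- The encoding of "exists x. x" described above.
type ExistsCPS = forall r. (forall x. x -> r) -> r
-- The producer chooses the hidden type; here it happens to be String.
pack :: ExistsCPS
pack k = k "some hidden value"
-- A consumer must handle every possible x uniformly.
use :: ExistsCPS -> ()
use e = e (\_ -> ())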
In more practical, day to day terms, any universally quantified type variable can be regarded as existential if you're looking at it from the "other side" of a function type. Because the unification performed as part of type inference transcends quantifier scope, you can sometimes end up in a situation where GHC would have to unify a type variable in an outer scope with a quantified type from a nested scope, which is how you get compiler errors about escaping types and skolems and whatnot, the latter (I assume) being related to Skolem normal form.
The way this relates to data types using existentials is that while you can declare a type like this:
data Exists = forall a. Exists a
which represents an existential type, to actually get at the existentially quantified contents you need to unwrap it by pattern matching:
unexist :: Exists -> r
unexist (Exists x) = foo x
But if you consider what the type of foo would have to be in this definition, you end up with something like forall a. a -> r, which is equivalent to the CPS-style encoding above. There's a close relationship between CPS transforms and Church encoding of data types, so the CPS form can also be seen as a reified version of the pattern match.
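To make the equivalence concrete, conversions in both directions are straightforward (a sketch assuming the Exists type above together with the ExistentialQuantification and RankNTypes extensions):
toCPS :: Exists -> (forall r. (forall a. a -> r) -> r)
toCPS (Exists x) = \k -> k x
fromCPS :: (forall r. (forall a. a -> r) -> r) -> Exists
fromCPS e = e Exists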
Finally, relating things back to logic--since that's where the term "existential quantifier" comes from--note that, if you think of being left of an arrow as negation and kinda squint to ignore the forall r. ... CPS cruft, these encodings of existential types are exactly the same thing as the De Morgan dualized form ¬(∀x. ¬P(x)) that was the starting point. So these really all are just different ways of looking at the same concept.

Something I've found useful to understand C. A. McCann's point that the left hand side of the function arrow is "negated" is to look at this from the point of view of game semantics. For example, here is a very simple game semantics for first-order logic (stated very informally):
There are two players, #1 and #2
There are two roles: Proponent and Opponent.
Player #1 is the initial Proponent, Player #2 the initial Opponent.
For a proposition ∀x.A, Opponent chooses a t and we play A[x := t] (A with all free instances of x replaced with t).
For a proposition A && B, Opponent chooses to oppose one of the conjuncts.
For a proposition not A, the players switch roles and play A.
For an atomic proposition, we consult its truth value; Proponent wins iff it's true, Opponent wins iff it's false.
The point of games like these is that a proposition is logically valid iff Player #1 (the initial Proponent) has a winning strategy for the corresponding game, i.e., if Player #1 has a winning move no matter what choices Player #2 makes.
Note the role-switch in the negation rule. The effect of that rule should be intuitive.
Because of the propositions-as-types correspondence this can be translated to type theory. Let's reformulate the game in terms of types formed in terms of some set of atoms, ∀, type variables, →, ⊤ and ⊥:
Two players; but we'll call them Runner and Crasher
Runner is initial Proponent, Crasher is initial Opponent
∀x.a: Opponent picks an atom P, and we play a[x := P] (a with all free instances of x replaced by P)
a → b: Proponent chooses whether to oppose a (role switch) or propose b.
Atomic type: Proponent wins iff type is inhabited, Opponent wins iff it's uninhabited. (I assume we've determined that beforehand for all atoms; also that ⊤ is inhabited and ⊥ is not.)
Note that in the rule for a → b, if Proponent picks a then the roles switch: the initial Proponent must become the Opponent for a. This corresponds to the fact that the left hand side of the arrow is "negated". (The idea is that ⊥ → b is inhabited no matter what b may be, so if the Proponent of a → b has a winning strategy to oppose a, then a → b must be inhabited, so its Proponent wins.)
Again, the idea is that a type is inhabited iff Runner has a winning strategy, and uninhabited iff Crasher has a winning strategy.
Example game: ∀r. (∀a. (a → Int) → a → r) → r
Opponent (Crasher) picks r := ⊥: (∀a. (a → Int) → a → ⊥) → ⊥
Proponent (Runner) chooses to oppose the antecedent: ∀a. (a → Int) → a → ⊥
Opponent (Runner) picks a := String: (String → Int) → String → ⊥
Proponent (Crasher) chooses to oppose the antecedent: String → Int
Proponent (Runner) chooses to propose the consequent: Int
Int is inhabited; Proponent (Runner) wins.
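For what it's worth, here is one Haskell term with exactly that type (written with RankNTypes; the name and the concrete values are mine) whose definition mirrors the winning play above: it picks a := String, supplies length as the String → Int, and "hello" as the String:
{-# LANGUAGE RankNTypes #-}
runner :: forall r. (forall a. (a -> Int) -> a -> r) -> r
runner k = k length "hello"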
Suppose we added a rule for ∃. The rule would have to be like this:
∃x.a: Proponent picks x := P, and we play a[x := P]
Now, we can use this to analyze McCann's assertion that ∀r. (∀a. a → r) → r is equivalent to ∃a.a by analyzing corresponding game trees. I'm not going to show the whole game tree (or carefully prove the equivalence, really), but I'll show two illustrative games:
First game: ∃a.a
Proponent (Runner) picks a := ⊤.
⊤ is inhabited; Proponent (Runner) wins.
Second game: ∀r. (∀a. a → r) → r
Opponent (Crasher) picks r := ⊥: (∀a. a → ⊥) → ⊥
Proponent (Runner) chooses to oppose the antecedent: ∀a. a → ⊥
Opponent (Runner) chooses a := ⊤: ⊤ → ⊥
Proponent (Crasher) loses because they must choose between opposing ⊤ or proposing ⊥.
Now, the informal observation I'm going to make here is that in both games Runner is the one who gets to pick the type to use for a. The equivalence between ∃a.a and ∀r. (∀a. a → r) → r really comes down to this: there is no strategy that allows Crasher to be the one who picks for a.

Related

If Either can be either Left or Right but not both, then why does it correspond to OR instead of XOR in Curry-Howard correspondence?

When I asked this question, one of the answers, now deleted, was suggesting that the type Either corresponds to XOR, rather than OR, in the Curry-Howard correspondence, because it cannot be Left and Right at the same time.
Where is the truth?
If you have a value of type P and a value of type Q (that is, you have both a proof of P and a proof of Q), then you are still able to provide a value of type Either P Q.
Consider
x :: P
y :: Q
...
z :: Either P Q
z = Left x -- Another possible proof would be `Right y`
While Either does not have a specific case that explicitly represents this situation (unlike These), it does not do anything to exclude it (as in exclusive OR).
This third case where both have proofs is a bit different than the other two cases where only one has a proof, which reflects the fact that "not excluding" something is a bit different than "including" something in intuitionistic logic, since Either does not provide a particular witness for this fact. However Either is not an XOR in the way that XOR would typically work since, as I said, it does not exclude the case where both parts have proofs. What Daniel Wagner proposes in this answer, on the other hand, is much closer to an XOR.
Either is kind of like an exclusive OR in terms of what its possible witnesses are. On the other hand, it is like an inclusive OR when you consider whether or not you can actually create a witness in four possible scenarios: having a proof of P and a refutation of Q, having a proof of Q and a refutation of P, having a proof of both or having a refutation of both.[1] While you can construct a value of type Either P Q when you have a proof of both P and Q (similar to an inclusive OR), you cannot distinguish this situation from the situation where only P has a proof or only Q has a proof using only a value of type Either P Q (kind of similar to an exclusive OR). Daniel Wagner's solution, on the other hand, is similar to exclusive OR on both construction and deconstruction.
It is also worth mentioning that These more explicitly represents the possibility of both having proofs. These is similar to inclusive OR on both construction and deconstruction. However, it's also worth noting that there is nothing preventing you from using an "incorrect" constructor when you have a proof of both P and Q. You could extend These to be even more representative of an inclusive OR in this regard with a bit of additional complexity:
import Data.Void (Void)
data IOR a b
  = OnlyFirst a (Not b)
  | OnlySecond (Not a) b
  | Both a b
type Not a = a -> Void
The potential "wrong constructor" issue of These (and the lack of a "both" witness in Either) doesn't really matter if you are only interested in a proof irrelevant logical system (meaning that there is no way to distinguish between any two proofs of the same proposition), but it might matter in cases where you want more computational relevance in the logic.[2]
In the practical situation of writing computer programs that are actually meant to be executed, computational relevance is often extremely important. Even though 0 and 23 are both proofs that the Int type is inhabited, we certainly like to distinguish between the two values in programs, in general!
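For reference, the These type mentioned above (as defined in the these package, as far as I recall) has one constructor per situation:
data These a b
  = This a       -- a proof of only the first proposition
  | That b       -- a proof of only the second
  | These a b    -- proofs of both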
Regarding "construction" and "destruction"
Essentially, I just mean "creating values of a type" by construction and "pattern matching" by destruction (sometimes people use the words "introduction" and "elimination" here, particularly in the context of logic).
In the case of Daniel Wagner's solutions:
Construction: When you construct a value of type Xor A B, you must provide a proof of exactly one of A or B and a refutation of the other one. This is similar to exclusive or. It is not possible to construct a value of this type unless you have a refutation of one of A or B and a proof of the other one. A particularly significant fact is that you cannot construct a value of this type if you have a proof of both A and B and you don't have a refutation of either of them (unlike inclusive OR).
Destruction: When you pattern match on a value of type Xor A B, you always have a proof of one of the types and a refutation of the other. It will never give you a proof of both of them. This follows from its definition.
In the case of IOR:
Construction: When you create a value of type IOR A B, you must do exactly one of the following: (1) provide only a proof of A and a refutation of B, (2) provide only a proof of B and a refutation of A, (3) provide a proof of both A and B. This is like inclusive OR. These three possibilities correspond exactly to each of the three constructors of IOR, with no overlap. Note that, unlike the situation with These, you cannot use the "incorrect constructor" in the case where you have a proof of both A and B: the only way to make a value of type IOR A B in this case is to use Both (since you would otherwise need to provide a refutation of either A or B).
Destruction: Since the three possible situations where you have a proof of at least one of A and B are exactly represented by IOR, with a separate constructor for each (and no overlap between the constructors), you will always know exactly which of A and B are true and which is false (if applicable) by pattern matching on it.
Pattern matching on IOR
Pattern matching on IOR works exactly like pattern matching on any other algebraic datatype. Here is an example:
x :: IOR Char Int
x = Both 'c' 3
y :: IOR Char Void
y = OnlyFirst 'a' (\v -> v)
f :: Not p -> IOR p Int
f np = OnlySecond np 7
z :: IOR Void Int
z = f notVoid
g :: IOR p Int -> Int
g w =
  case w of
    OnlyFirst p q -> -1
    OnlySecond p q -> q
    Both p q -> q
-- We can show that the proposition represented by "Void" is indeed false:
notVoid :: Not Void
notVoid = \v -> v
Then a sample GHCi session, with the above code loaded:
ghci> g x
3
ghci> g z
7
[1]This gets a bit more complex when you consider that some statements are undecidable and therefore you cannot construct a proof or a refutation for them.
[2]Homotopy type theory would be one example of a proof relevant system, but this is reaching the limit of my knowledge as of now.
The confusion stems from the Boolean truth-table exposition of logic. In particular, when both arguments are True, OR is True, whereas XOR is False. Logically it means that to prove OR it's enough to provide the proof of one of the arguments; but it's okay if the other one is True as well--we just don't care.
In Curry-Howard interpretation, if somebody gives you an element of Either a b, and you were able to extract the value of a from it, you still know nothing about b. It could be inhabited or not.
On the other hand, to prove XOR, you not only need the proof of one argument, you must also provide the proof of the falsehood of the other argument.
So, with Curry-Howard interpretation, if somebody gives you an element of Xor a b and you were able to extract the value of a from it, you would conclude that b is uninhabited (that is, isomorphic to Void). Conversely, if you were able to extract the value of b, then you'd know that a was uninhabited.
The proof of falsehood of a is a function a->Void. Such a function would be able to produce a value of Void, given a value of a, which is clearly impossible. So there can be no values of a. (There is only one function that returns Void, and that's the identity on Void.)
Perhaps try replacing “proof” in the Curry-Howard isomorphism with “evidence”.
Below I will use italics for propositions and proofs (which I will also call evidence), the mathematical side of the isomorphism, and I will use code for types and values.
The question is: suppose I know the type for [values corresponding to] evidence that P is true (I will call this type P), and I know the type for evidence that Q is true (I call this type Q), then what is the type for evidence of the proposition R = P OR Q?
Well there are two ways to prove R: we can prove P, or we can prove Q. We could prove both but that would be more work than necessary.
Now ask what the type should be? It is the type for things which are either evidence of P or evidence of Q. I.e. values which are either things of type P or things of type Q. The type Either P Q contains precisely those values.
What if you have evidence of P AND Q? Well this is just a value of type (P, Q), and we can write a simple function:
f :: (p,q) -> Either p q
f (a,b) = Left a
And this gives us a way to prove P OR Q if we can prove P AND Q. Therefore Either cannot correspond to xor.
What is the type for P XOR Q?
At this point I will say that negations are a bit annoying in this sort of constructive logic.
Let’s convert the question to things we understand, and a simpler thing we don’t:
P XOR Q = (P AND (NOT Q)) OR (Q AND (NOT P))
Ask now: what is the type for evidence of NOT P?
I don’t have an intuitive explanation for why this is the simplest type but if NOT P were true then evidence of P being true would be a contradiction, which we say as proving FALSE, the unprovable thing (aka BOTTOM or BOT). That is, NOT P may be written in simpler terms as: P IMPLIES FALSE. The type for FALSE is called Void (in haskell). It is a type which no values inhabit because there are no proofs of it. Therefore if you could construct a value of that type you would have problems. IMPLIES corresponds to functions and so the type corresponding to NOT P is P -> Void.
We put this with what we know and get the following equivalence in the language of propositions:
P XOR Q = (P AND (NOT Q)) OR (Q AND (NOT P)) = (P AND (Q IMPLIES FALSE)) OR ((P IMPLIES FALSE) AND Q)
The type is then:
type Xor p q = Either (p, q -> Void) (p -> Void, q)
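As a small sketch of how this type behaves (assuming the Xor definition above, with Void and absurd from Data.Void; the helper names are mine):
import Data.Void (Void, absurd)
-- Constructing an Xor requires a proof of one side and a refutation of the other:
proveLeft :: p -> (q -> Void) -> Xor p q
proveLeft p notQ = Left (p, notQ)
-- Deconstructing: from Xor p q and a proof of q, we can refute p.
refuteP :: Xor p q -> q -> (p -> Void)
refuteP (Left (_, notQ)) q = absurd (notQ q)
refuteP (Right (notP, _)) _ = notP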

How are tail-position contexts GHC join points paper formed?

Compiling without Continuations describes a way to extend ANF System F with join points. GHC itself has join points in Core (an intermediate representation) rather than exposing join points directly in the surface language (Haskell). Out of curiosity, I started trying to write a language that simply extends System F with join points. That is, the join points are user facing. However, there's something about the typing rules in the paper that I don't understand. Here are the parts that I do understand:
There are two environments, one for ordinary values/functions and one that only has join points.
The rationale for ∆ being ε in several of the rules. In the expression let x:σ = u in ..., u cannot reference any join points (VBIND) because join points cannot return to arbitrary locations.
The strange typing rule for JBIND. The paper does a good job explaining this.
Here's what I don't get. The paper introduces a notation that I will call the "overhead arrow", but the paper itself does not explicitly give it a name or mention it. Visually, it looks like an arrow pointing to the right, and it goes above an expression. Roughly, this seems to indicate a "tail context" (the paper does use this term). In the paper, these overhead arrows can be applied to terms, types, data constructors, and even environments. They can be nested as well. Here's the main difficulty I'm having. There are several rules with premises that include type environments under an overhead arrow. JUMP, CASE, RVBIND, and RJBIND all include premises with such a type environment (Figure 2 in the paper). However, none of the typing rules have a conclusion where the type environment is under an overhead arrow. So, I cannot see how JUMP, CASE, etc. can ever be used since the premises cannot be derived by any of the other rules.
That's the question, but if anyone has any supplementary material that provides more context on the overhead-arrow convention, or if anyone is aware of an implementation of the System-F-with-join-points type system (other than in GHC's IR), that would be helpful too.
In this paper, x⃗ means “A sequence of x, separated by appropriate delimiters”.
A few examples:
If x is a variable, λx⃗. e is an abbreviation for λx1. λx2. … λxn. e. In other words, many nested 1-argument lambdas, or a many-argument lambda.
If σ and τ are types, σ⃗ → τ is an abbreviation for σ1 → σ2 → … → σn → τ. In other words, a function type with many parameter types.
If a is a type variable and σ is a type, ∀a⃗. σ is an abbreviation for ∀a1. ∀a2. … ∀an. σ. In other words, many nested polymorphic functions, or a polymorphic function with many type parameters.
In Figure 1 of the paper, the syntax of a jump expression is defined as:
e, u, v ⩴ … | jump j ϕ⃗ e⃗ τ
If this declaration were translated into a Haskell data type, it might look like this:
data Term
  -- | A jump expression has a label that it jumps to, a list of type argument
  -- applications, a list of term argument applications, and the return type
  -- of the overall `jump`-expression.
  = Jump LabelVar [Type] [Term] Type
  | ... -- Other syntactic forms.
That is, a data constructor that takes a label variable j, a sequence of type arguments ϕ⃗, a sequence of term arguments e⃗, and a return type τ.
“Zipping” things together:
Sometimes, multiple uses of the overhead arrow place an implicit constraint that their sequences have the same length. One place that this occurs is with substitutions.
{ϕ/⃗a} means “replace a1 with ϕ1, replace a2 with ϕ2, …, replace an with ϕn”, implicitly asserting that both a⃗ and ϕ⃗ have the same length, n.
Worked example: the JUMP rule:
The JUMP rule is interesting because it provides several uses of sequencing, and even a sequence of premises. Here’s the rule again:
(j : ∀a⃗. σ⃗ → ∀r. r) ∈ Δ
(Γ; ε ⊢⃗ u : σ {ϕ/⃗a})
Γ; Δ ⊢ jump j ϕ⃗ u⃗ τ : τ
The first premise should be fairly straightforward, now: lookup j in the label context Δ, and check that the type of j starts with a bunch of ∀s, followed by a bunch of function types, ending with a ∀r. r.
The second “premise” is actually a sequence of premises. What is it looping over? So far, the sequences we have in scope are ϕ⃗, σ⃗, a⃗, and u⃗.
ϕ⃗ and a⃗ are used in a nested sequence, so probably not those two.
On the other hand, u⃗ and σ⃗ seem quite plausible if you consider what they mean.
σ⃗ is the list of argument types expected by the label j, and u⃗ is the list of argument terms provided to the label j, and it makes sense that you might want to iterate over argument types and argument terms together.
So this “premise” actually means something like this:
for each pair of σ and u:
Γ; ε ⊢ u : σ {ϕ/⃗a}
Pseudo-Haskell implementation
Finally, here’s a somewhat-complete code sample illustrating what this typing rule might look like in an actual implementation. x⃗ is implemented as a list of x values, and some monad M is used to signal failure when a premise is not satisfied.
import Control.Monad (forM_)
data LabelVar
data Type
  = ...
data Term
  = Jump LabelVar [Type] [Term] Type
  | ...
typecheck :: TermContext -> LabelContext -> Term -> M Type
typecheck gamma delta (Jump j phis us tau) = do
  -- Look up `j` in the label context. If it's not there, throw an error.
  typeOfJ <- lookupLabel j delta
  -- Check that the type of `j` has the right shape: a bunch of `forall`s,
  -- followed by a bunch of function types, ending with `forall r.r`. If it
  -- has the correct shape, split it into a list of `a`s, a list of `\sigma`s
  -- and the return type, `forall r.r`.
  (as, sigmas, ret) <- splitLabelType typeOfJ
  -- exactZip is a helper function that "zips" two sequences together.
  -- If the sequences have the same length, it produces a list of pairs of
  -- corresponding elements. If not, it raises an error.
  subst <- exactZip as phis          -- the substitution { \sequence{\phi / a} }
  argPairs <- exactZip us sigmas
  forM_ argPairs $ \(u, sigma) -> do
    -- Type-check the argument `u` in a context without any tail calls,
    -- and assert that its type has the correct form.
    sigma' <- typecheck gamma emptyLabelContext u
    assert (applySubst subst sigma == sigma')
  -- After all the premises have been satisfied, the type of the `jump`
  -- expression is just its return type.
  return tau
-- Other syntactic forms
typecheck gamma delta u = ...
-- Auxiliary definitions
type M = ...
instance Monad M
lookupLabel :: LabelVar -> LabelContext -> M Type
splitLabelType :: Type -> M ([TypeVar], [Type], Type)
exactZip :: [a] -> [b] -> M [(a, b)]
applySubst :: [(TypeVar, Type)] -> Type -> Type
assert :: Bool -> M ()
As far as I know SPJ's style of notation, and this does align with what I see in the paper, it simply means "0 or more": e.g. you can replace a⃗ with a_1, …, a_n, where n >= 0.
It may be "1 or more" in some cases, but it shouldn't be hard to figure out which of the two is meant.

How to think about the lack of laws

I am a mathematician who works a lot with category theory, and I've been using Haskell for a while to perform certain computations etc., but I am definitely not a programmer. I really love Haskell and want to become much more fluent in it, and the type system is something that I find especially great to have in place when writing programs.
However, I've recently been trying to implement category theoretic things, and am running into problems concerning the fact that you seemingly can't have class method laws in Haskell. In case my terminology here is wrong, what I mean is that I can write
class Monoid c where
  id :: c -> c
  m :: c -> c -> c
but I can't write some law along the lines of
m (m x y) z == m x $ m y z
From what I gather, this is due to the lack of dependent types in Haskell, but I'm not sure how exactly this is the case (having now read a bit about dependent types). It also seems that the convention is just to include laws like this in comments and hope that you don't accidentally cook up some instance that doesn't satisfy them.
How should I change my approach to Haskell to deal with this problem? Is there a nice mathematical/type-theoretic solution (for example, require the existence of an associator that is an isomorphism (though then the question is, how do we encode isomorphisms without a law?)); is there some 'hack' (using extensions such as DataKinds); should I be drastic and switch to using something like Idris instead; or is the best response to just change the way I think about using Haskell (i.e. accept that these laws can't be implemented in a Haskelly way)?
(bonus) How exactly does the lack of laws come from not supporting dependent types?
You want to require that:
m (m x y) z = m x (m y z) -- (1)
But to require this you need a way to check it. So you, or your compiler (or proof assistant), need to construct a proof of this. And the question is, what type is a proof of (1)?
One could imagine some Proof type but then maybe you could just construct a proof that 0 = 0 instead of a proof of (1) and both would have type Proof. So you’d need a more general type. I can’t decide how to break up the rest of the question so I’ll go for a super brief explanation of the Curry-Howard isomorphism followed by an explanation of how to prove two things are equal and then how dependent types are relevant.
The Curry-Howard isomorphism says that propositions are isomorphic to types and proofs are isomorphic to programs: a type corresponds to a proposition and a proof of that proposition corresponds to a program constructing a value inhabiting that type. Ignoring how many propositions might be expressed as types, an example would be that the type A * B (written (A, B) in Haskell) corresponds to the proposition “A and B,” while the type A + B (written Either A B in Haskell) corresponds to the proposition “A or B.” Finally the type A -> B corresponds to “A implies B,” as a proof of this is a program which takes evidence of A and gives you evidence of B. One should note that there isn’t a way to express not A but one could imagine adding a type Not A with builtins of type Either a (Not a) for the law of the excluded middle as well as Not (Not a) -> a, and a * Not a -> Void (where Void is a type which cannot be inhabited and therefore corresponds to false), but then one can’t really run these programs to get constructivist proofs.
Now we will ignore some realities of Haskell and imagine that there aren’t ways round these rules (in particular undefined :: a says everything is true, and unsafeCoerce :: a -> b says that anything implies anything else, or just other functions that don’t return where their existence does not imply the corresponding proof).
So we know how to combine propositions but what might a proposition be? Well one could be to say that two types are equal. In Haskell this corresponds to the GADT
data Eq a b where Refl :: Eq c c
Where this constructor corresponds to the reflexive property of equality.
[side note: if you’re still interested so far, you may be interested to look up Voevodsky’s univalent foundations, depending on how much the idea of “Homotopy type theory” interests you]
So can we prove something now? How about the transitive property of equality:
trans :: Eq a b -> Eq b c -> Eq a c
trans x y =
  case x of
    Refl ->
      -- by this match being successful, the compiler now knows that a = b
      case y of
        Refl ->
          -- and now b = c and so the compiler knows a = c
          Refl -- the compiler knows that this is of type Eq d d, and as it knows a = c, this typechecks as Eq a c
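For comparison, symmetry falls out the same way with the same GADT (a one-line sketch):
sym :: Eq a b -> Eq b a
sym Refl = Refl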
This feels like one hasn’t really proven anything (especially as this mainly relies on the compiler knowing the transitive and symmetric properties), but one gets a similar feeling when proving simple things in logic as well.
So now how might you prove the original proposition (1)? Well, let's imagine we want a type c to be a monoid; then we should also prove that $\forall x,y,z:c,\ m (m x y) z = m x (m y z).$ So we need a way to express m (m x y) z as a type. Strictly speaking this isn't dependent types (this can be done with DataKinds to promote values and type families instead of functions). But you do need dependent types to have types depend on values. Specifically, if you have a type Nat of natural numbers and a type family Vec :: Nat -> * (* is the kind, read: type, of all types) of fixed-length vectors, you could define a dependently typed function mkVec :: (n::Nat) -> Vec n. Observe how the type of the output depends on the value of the input.
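Here is a rough GHC rendering of that Nat/Vec example using DataKinds and GADTs; since Haskell lacks true dependent types, the length is passed as a singleton value (all names are illustrative):
{-# LANGUAGE DataKinds, GADTs, KindSignatures #-}
data Nat = Z | S Nat
-- Vectors indexed by their length at the type level.
data Vec (n :: Nat) a where
  VNil  :: Vec 'Z a
  VCons :: a -> Vec n a -> Vec ('S n) a
-- A value-level stand-in for the type-level length.
data SNat (n :: Nat) where
  SZ :: SNat 'Z
  SS :: SNat n -> SNat ('S n)
-- The closest we get to "mkVec :: (n::Nat) -> Vec n": the output type
-- is determined by the (singleton) input value.
mkVec :: SNat n -> a -> Vec n a
mkVec SZ     _ = VNil
mkVec (SS n) x = VCons x (mkVec n x)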
So your law needs to have functions promoted to type level (skipping the questions about how one defines type equality and value equality), as well as dependent types (made up syntax):
class Monoid c where
  e :: c
  (*) :: c -> c -> c
  idl :: (x::c) -> Eq x (e * x)
  idr :: (x::c) -> Eq x (x * e)
  assoc :: (x::c) -> (y::c) -> (z::c) -> Eq ((x * y) * z) (x * (y * z))
Observe how types tend to become large with dependent types and proofs. In a language missing typeclasses one could put such values into a record.
Final note on the theory of dependent types and how these correspond to the Curry-Howard isomorphism.
Dependent types can be considered an answer to the question: what types correspond to the propositions $\forall x\in S\quad P(x)$ and $\exists y\in T\quad Q(y)?$
The answer is that you create new ways to make types: the dependent product and the dependent sum (coproduct). The dependent product expresses “for all values $x$ of type $S,$ there is a value of type $P(x).$” A normal product would be a dependent product with $S=2,$ a type inhabited by two values. A dependent product might be written (x:T) -> P x. A dependent sum says “some value $y$ of type $T$, paired with a value of type $Q(y).$” this might be written (y:T) * Q y.
One can think of these as a generalisation of arbitrarily indexed (co)products from Set to general categories, where one might sensibly write e.g. $\prod_\Lambda X(\lambda),$ and sometimes such notation is used in type theory.

forall as an intersection over those sets

I have been reading the existential section on Wikibooks and this is what is stated there:
Firstly, forall really does mean 'for all'. One way of thinking about
types is as sets of values with that type, for example, Bool is the
set {True, False, ⊥} (remember that bottom, ⊥, is a member of every
type!), Integer is the set of integers (and bottom), String is the set
of all possible strings (and bottom), and so on. forall serves as an
intersection over those sets. For example, forall a. a is the intersection over all types, which must be {⊥}, that is, the type (i.e. set) whose only value (i.e. element) is bottom.
How does forall serve as an intersection over those sets ?
forall in formal logic means that it can be any value from the universe of discourse. How does that get translated to an intersection in Haskell?
Haskell's forall-s can be viewed as restricted dependent function types, which I think is the conceptually most enlightening approach and also most amenable to set-theoretic or logical interpretations.
In a dependent language one can bind the values of arguments in function types, and mention those values in the return types.
-- Idris
id : (a : Type) -> (a -> a)
id _ x = x
-- Can also leave arguments implicit (to be inferred)
id : a -> a
id x = x
-- Generally, an Idris function type has the form "(x : A) -> F x"
-- where A is a type (or kind/sort, or any level really) and F is
-- a function of type "A -> Type"
-- Haskell
id :: forall (a :: *). (a -> a)
id x = x
The crucial difference is that Haskell can only bind types, lifted kinds, and type constructors, using forall, while dependent languages can bind anything.
In the literature dependent functions are called dependent products. Why call them that, when they are, well, functions? It turns out that we can implement Haskell's algebraic product types using only dependent functions.
Generally, any function a -> b can be viewed as a lookup function for some product, where the keys have type a and the elements have type b. Bool -> Int can be interpreted as a pair of Int-s. This interpretation is not very interesting for non-dependent functions, since all the product fields must be of the same type. With dependent functions, our pair can be properly polymorphic:
Pair : Type -> Type -> Type
Pair a b = (index : Bool) -> (if index then a else b)
fst : Pair a b -> a
fst pair = pair True
snd : Pair a b -> b
snd pair = pair False
setFst : c -> Pair a b -> Pair c b
setFst c pair = \index -> if index then c else pair False
setSnd : c -> Pair a b -> Pair a c
setSnd c pair = \index -> if index then pair True else c
We have recovered all the essential functionality of pairs here. Also, using Pair we can build up products of arbitrary arity.
So, how does this tie in to the interpretation of forall-s? Well, we can interpret ordinary products and build up some intuition for them, and then try to transfer that to forall-s.
So, let's look a bit first at the algebra of ordinary products. Algebraic types are called algebraic because we can determine the number of their values by algebra. Link to detailed explanation. If A has |A| number of values and B has |B| number of values, then Pair A B has |A| * |B| number of possible values. With sum types we sum the number of inhabitants. Since A -> B can be viewed as a product with |A| fields, all having type B, the number of inhabitants of A -> B is |B| multiplied by itself |A| times, which equals |B|^|A|. Hence the name "exponential type" that is sometimes used to denote functions. When the function is dependent, we fall back to the "product over some number of different types" interpretation, since the exponential formula no longer fits.
Armed with this understanding, we can interpret forall (a :: *). t as a product type with indices of type * and fields having type t, where a might be mentioned inside t, and thus the field types may vary depending on the choice of a. We can look up the fields by making Haskell infer some particular type for the forall, effectively applying the function to the type argument.
Note that this product has as many fields as many values of indices there are, which is pretty much infinite here, considering the potential number of Haskell types.
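In GHC terms, "looking up a field" of this infinite product is just instantiating the forall at a particular index type, for example:
-- id :: forall a. a -> a
idInt :: Int -> Int
idInt = id        -- the field at index Int
idBool :: Bool -> Bool
idBool = id       -- the field at index Bool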
You have to view types in either negative or positive context, i.e. either in the process of construction or the process of use (have/receive; this is all probably best understood in game semantics, but I am not familiar with them in detail).
If I "give you" a type forall a . a then you know I must have constructed it somehow. The only way for a particular constructed value to have the type forall a . a is that it could be a stand-in "for all" types in the universe of discourse—which is, of course, the intersection of their functionality. In sane languages no such value exists (Void), but in Haskell we have bottom.
bottom :: forall a . a
bottom = let a = a in a
On the other hand, if I somehow magically have a value of forall a . a and I attempt to use it then we get the opposite effect—I can treat it as anything in the union of all types in the universe of discourse (which is what you were looking for) and thus I have
absurd :: (forall a . a) -> b
absurd a = a
How does forall serve as an intersection over those sets ?
Here you may benefit from starting to read a bit about the Curry-Howard correspondence. To make a long story short, you can think of a type as a logical proposition, language expressions as proofs of their types, and values as normal form proofs (proofs that cannot be simplified any further). So for example, "Hello world!" :: String would be read as ""Hello world!" is a proof of the proposition String."
So now think of forall a. a as a proposition. Intuitively, think of this as a second-order quantified statement over a propositional variable: "For all statements a, a." It's basically asserting all propositions. This means that if x is a proof of forall a. a, then for any proposition P, x is also a proof of P. So, since the proofs of forall a. a are the proofs that prove any propositions, then it must follow that the proofs of forall a. a must be the same as what you'd get if you mapped each proposition to the set of its proofs and took their intersection. And the only normal-form proof (i.e. "value") that is common to all those sets is bottom.
Another informal way to look at it is that universal quantification is like an infinite conjunction (∀x.P(x) is like P(c0) ∧ P(c1) ∧ ...). Conjunction, seen from a model-theoretical view, is set intersection; the set of evaluation environments where A ∧ B is true is the intersection of the environments where A is true and the ones where B is true.

What are type quantifiers?

Many statically typed languages have parametric polymorphism. For example in C# one can define:
T Foo<T>(T x){ return x; }
In a call site you can do:
int y = Foo<int>(3);
These types are also sometimes written like this:
Foo :: forall T. T -> T
I have heard people say "forall is like lambda-abstraction at the type level". So Foo is a function that takes a type (for example int), and produces a value (for example a function of type int -> int). Many languages infer the type parameter, so that you can write Foo(3) instead of Foo<int>(3).
Suppose we have an object f of type forall T. T -> T. What we can do with this object is first pass it a type Q by writing f<Q>. Then we get back a value with type Q -> Q. However, certain f's are invalid. For example this f:
f<int> = (x => x+1)
f<T> = (x => x)
So if we "call" f<int> then we get back a value with type int -> int, and in general if we "call" f<Q> then we get back a value with type Q -> Q, so that's good. However, it is generally understood that this f is not a valid thing of type forall T. T -> T, because it does something different depending on which type you pass it. The idea of forall is that this is explicitly not allowed. Also, if forall is lambda for the type level, then what is exists? (i.e. existential quantification). For these reasons it seems that forall and exists are not really "lambda at the type level". But then what are they? I realize this question is rather vague, but can somebody clear this up for me?
A possible explanation is the following:
If we look at logic, quantifiers and lambda are two different things. An example of a quantified expression is:
forall n in Integers: P(n)
So there are two parts to forall: a set to quantify over (e.g. Integers), and a predicate (e.g. P). Forall can be viewed as a higher order function:
forall n in Integers: P(n) == forall(Integers,P)
With type:
forall :: Set<T> -> (T -> bool) -> bool
Exists has the same type. Forall is like an infinite conjunction, where S[n] is the n-th element of the set S:
forall(S,P) = P(S[0]) ∧ P(S[1]) ∧ P(S[2]) ...
Exists is like an infinite disjunction:
exists(S,P) = P(S[0]) ∨ P(S[1]) ∨ P(S[2]) ...
If we do an analogy with types, we could say that the type analogue of ∧ is computing the intersection type ∩, and the type analogue of ∨ is computing the union type ∪. We could then define forall and exists on types as follows:
forall(S,P) = P(S[0]) ∩ P(S[1]) ∩ P(S[2]) ...
exists(S,P) = P(S[0]) ∪ P(S[1]) ∪ P(S[2]) ...
So forall is an infinite intersection, and exists is an infinite union. Their types would be:
forall, exists :: Set<T> -> (T -> Type) -> Type
For example, here is the type of the polymorphic identity function. Here Types is the set of all types, -> is the type constructor for functions, and => is lambda abstraction:
forall(Types, t => (t -> t))
Now a thing of type forall T:Type. T -> T is a value, not a function from types to values. It is a value whose type is the intersection of all types T -> T where T ranges over all types. When we use such a value, we do not have to apply it to a type. Instead, we use a subtype judgement:
id :: forall T:Type. T -> T
id = (x => x)
id2 = id :: int -> int
This downcasts id to have type int -> int. This is valid because int -> int also appears in the infinite intersection.
This works out nicely I think, and it clearly explains what forall is and how it is different from lambda, but this model is incompatible with what I have seen in languages like ML, F#, C#, etc. For example in F# you do id<int> to get the identity function on ints, which does not make sense in this model: id is a function on values, not a function on types that returns a function on values.
Can somebody with knowledge of type theory explain what exactly are forall and exists? And to what extent is it true that "forall is lambda at the type level"?
Let me address your questions separately.
Calling forall "a lambda at the type level" is inaccurate for two reasons. First, it is the type of a lambda, not the lambda itself. Second, that lambda lives on the term level, even though it abstracts over types (lambdas on the type level exist as well, they provide what is often called generic types).
Universal quantification does not necessarily imply "same behaviour" for all instantiations. That is a particular property called "parametricity" that may or may not be present. The plain polymorphic lambda calculus is parametric, because you simply cannot express any non-parametric behaviour. But if you add constructs like typecase (a.k.a. intensional type analysis) or checked casts as a weaker form of that, then you lose parametricity. Parametricity implies nice properties, e.g. it allows a language to be implemented without any runtime representation of types. And it induces very strong reasoning principles, see e.g. Wadler's paper "Theorems for free!". But it's a trade-off; sometimes you want dispatch on types.
Existential types essentially denote pairs of a type (the so-called witness) and a term, sometimes called packages. One common way to view these is as implementation of abstract data types. Here is a simple example:
pack (Int, (λx. x, λx. x)) : ∃ T. (Int → T) × (T → Int)
This is a simple ADT whose representation is Int and that only provides two operations (as a nested tuple), for converting ints in and out of the abstract type T. This is the basis of type theories for modules, for example.
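In Haskell the same package can be rendered with ExistentialQuantification (a sketch; the concrete names are mine):
{-# LANGUAGE ExistentialQuantification #-}
-- ∃T. (Int → T) × (T → Int), with the representation type hidden.
data IntAbs = forall t. IntAbs (Int -> t) (t -> Int)
-- The implementor picks t = Int, but clients cannot observe that choice.
package :: IntAbs
package = IntAbs id id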
In summary, universal quantification provides client-side data abstraction, while existential types dually provide implementor-side data abstraction.
As an additional remark, in the so-called lambda cube, forall and arrow are generalised to the unified notion of Π-type (where T1→T2 = Π(x:T1).T2 and ∀A.T = Π(A:*).T) and likewise exists and tupling can be generalised to Σ-types (where T1×T2 = Σ(x:T1).T2 and ∃A.T = Σ(A:*).T). Here, the type * is the "type of types".
A few remarks to complement the two already-excellent answers.
First, one cannot say that forall is lambda at the type level, because there already is a notion of lambda at the type level, and it is different from forall. It appears in System F_omega, an extension of System F with type-level computation, which is useful to explain ML module systems for example (F-ing modules, by Andreas Rossberg, Claudio Russo and Derek Dreyer, 2010).
In (a syntax for) System F_omega you can write for example:
type prod =
  lambda (a : *). lambda (b : *).
    forall (c : *). (a -> b -> c) -> c
This is a definition of the "type constructor" prod, such that prod a b is the type of the Church encoding of the product type (a, b). If there is computation at the type level, then you need to control it if you want to ensure termination of type-checking (otherwise you could define the type (lambda t. t t) (lambda t. t t)). This is done by using a "type system at the type level", or a kind system. prod would be of kind * -> * -> *. Only the types at kind * can be inhabited by values; types at higher kind can only be applied at the type level. lambda (c : k) . ... is a type-level abstraction that cannot be the type of a value, and may live at any kind of the form k -> ..., while forall (c : k) . ... classifies values that are polymorphic in some type c : k and is necessarily of ground kind *.
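For intuition, here is a rough GHC analogue (only a sketch: a Haskell type synonym is not an F_omega type-level lambda, and RankNTypes is required):
{-# LANGUAGE RankNTypes #-}
-- The Church encoding of the product type, parameterised at the type level.
type Prod a b = forall c. (a -> b -> c) -> c
pair :: a -> b -> Prod a b
pair x y k = k x y
fstP :: Prod a b -> a
fstP p = p (\x _ -> x)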
Second, there is an important difference between the forall of System F and the Pi-types of Martin-Löf type theory. In System F, polymorphic values do the same thing on all types. As a first approximation, you could say that a value of type forall a . a -> a will (implicitly) take a type t as input and return a value of type t -> t. But that suggests that there may be some computation happening in the process, which is not the case. Morally, when you instantiate a value of type forall a. a -> a into a value of type t -> t, the value does not change. There are three (related) ways to think about it:
System F quantification has type erasure, you can forget about the types and you will still know what the dynamic semantic of the program is. When we use ML type inference to leave the polymorphism abstraction and instantiation implicit in our programs, we don't really let the inference engine "fill holes in our program", if you think of "program" as the dynamic object that will be run and compute.
A forall a . foo is not something that "produces an instance of foo for each type a", but a single type foo that is "generic in an unknown type a".
You can explain universal quantification as an infinite conjunction, but there is a uniformity condition that all conjuncts have the same structure, and in particular that their proofs are all alike.
By contrast, Pi-types in Martin-Löf type theory are really more like function types that take something and return something. That's one of the reasons why they can easily be used not only to depend on types, but also to depend on terms (dependent types).
This has very important implications once you're concerned about the soundness of those formal theories. System F is impredicative (a forall-quantified type quantifies on all types, itself included), and the reason why it's still sound is this uniformity of universal quantification. While introducing non-parametric constructs is reasonable from a programmer's point of view (and we can still reason about parametricity in a generally non-parametric language), it very quickly destroys the logical consistency of the underlying static reasoning system. Martin-Löf's predicative theory is much simpler to prove correct and to extend in a correct way.
For a high-level description of this uniformity/genericity aspect of System F, see Fruchart and Longo's 97 article Carnap's remarks on Impredicative Definitions and the Genericity Theorem. For a more technical study of System F's failure in the presence of non-parametric constructs, see Parametricity and variants of Girard's J operator by Robert Harper and John Mitchell (1999). Finally, for a description, from a language design point of view, of how to abandon global parametricity to introduce non-parametric constructs but still be able to locally discuss parametricity, see Non-Parametric Parametricity by George Neis, Derek Dreyer and Andreas Rossberg, 2011.
This discussion of the difference between "computational abstraction" and "uniform abstraction" has been revived by the large amount of work on representing variable binders. A binding construction feels like an abstraction (and can be modeled by a lambda-abstraction in HOAS style) but has a uniform structure that makes it more like a data skeleton than a family of results. This has been much discussed, for example in the LF community, "representational arrows" in Twelf, "positive arrows" in Licata & Harper's work, etc.
Recently there have been several people working on the related notion of "irrelevance" (lambda-abstractions where the result "does not depend" on the argument), but it's still not totally clear how closely this is related to parametric polymorphism. One example is the work of Nathan Mishra-Linger with Tim Sheard (eg. Erasure and Polymorphism in Pure Type Systems).
if forall is lambda ..., then what is exists
Why, tuple of course!
In Martin-Löf type theory you have Π types, corresponding to functions/universal quantification and Σ-types, corresponding to tuples/existential quantification.
Their types are very similar to what you have proposed (I am using Agda notation here):
Π : (A : Set) -> (A -> Set) -> Set
Σ : (A : Set) -> (A -> Set) -> Set
Indeed, Π is an infinite product and Σ is an infinite sum. Note that they are not "intersection" and "union", though, as you proposed, because you can't do that without additionally defining where the types intersect (i.e. which values of one type correspond to which values of the other type).
From these two type constructors you can have all of normal, polymorphic and dependent functions, normal and dependent tuples, as well as existentially and universally-quantified statements:
-- Normal function, corresponding to "Integer -> Integer" in Haskell
factorial : Π ℕ (λ _ → ℕ)
-- Polymorphic function corresponding to "forall a . a -> a"
id : Π Set (λ A -> Π A (λ _ → A))
-- A universally-quantified logical statement: all natural numbers n are equal to themselves
refl : Π ℕ (λ n → n ≡ n)
-- (Integer, Integer)
twoNats : Σ ℕ (λ _ → ℕ)
-- exists a. Show a => a
someShowable : Σ Set (λ A → Σ A (λ _ → Showable A))
-- There are prime numbers
aPrime : Σ ℕ IsPrime
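The someShowable line corresponds to the usual Haskell encoding with ExistentialQuantification (a sketch):
{-# LANGUAGE ExistentialQuantification #-}
data SomeShowable = forall a. Show a => SomeShowable a
showIt :: SomeShowable -> String
showIt (SomeShowable x) = show x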
However, this does not address parametricity at all and AFAIK parametricity and Martin-Löf type theory are independent.
For parametricity, people usually refer to Philip Wadler's work.

Resources