Drinker's Principle - predicate

I came across a proof in Isabelle/HOL that shows the Drinker's Principle. I understand the proof overall; however, I do not fully understand the following proof case:
assume "∀x. drunk x"
then have "drunk a ⟶ (∀x. drunk x)" for a ..
then show ?thesis ..
Why does the proof also prove "drunk a ⟶ (∀x. drunk x)"? I think ∀x. drunk x is enough to show ∃x. drunk x.
The entire proof is as follows:
theorem Drinker's_Principle: "∃x. drunk x ⟶ (∀x. drunk x)"
proof cases
  assume "∀x. drunk x"
  then have "drunk a ⟶ (∀x. drunk x)" for a ..
  then show ?thesis ..
next
  assume "¬ (∀x. drunk x)"
  then have "∃x. ¬ drunk x" by (rule de_Morgan)
  then obtain a where "¬ drunk a" ..
  have "drunk a ⟶ (∀x. drunk x)"
  proof
    assume "drunk a"
    with ‹¬ drunk a› show "∀x. drunk x" by contradiction
  qed
  then show ?thesis ..
qed

I think ∀x. drunk x is enough to show ∃x. drunk x.
That's what we are actually doing: the for a at the end of the statement
then have "drunk a ⟶ (∀x. drunk x)" for a ..
is Isar's way of saying that "drunk a ⟶ (∀x. drunk x)" holds for an arbitrary a, so any such a can serve as a witness. The next line of the proof then uses it (by existential introduction) to prove the existentially quantified thesis.

What does `~` (tilde) mean in an instance context, and why is it necessary to resolve overlap in some cases?

A complication.
Consider the following snippet:
class D u a where printD :: u -> a -> String
instance D a a where printD _ _ = "Same type instance."
instance {-# overlapping #-} D u (f x) where printD _ _ = "Instance with a type constructor."
And this is how it works:
λ printD 1 'a'
...
...No instance for (D Integer Char)...
...
λ printD 1 1
"Same type instance."
λ printD [1] [1]
...
...Overlapping instances for D [Integer] [Integer]
...
λ printD [1] ['a']
"Instance with a type constructor."
Note that the overlapping instances are not resolved, despite the pragma being supplied to this end.
A solution.
It took some guesswork to arrive at the following adjusted definition:
class D' u a where printD' :: u -> a -> String
instance (u ~ a) => D' u a where printD' _ _ = "Same type instance."
instance {-# overlapping #-} D' u (f x) where printD' _ _ = "Instance with a type constructor."
It works as I expected the previous one to:
λ printD' 1 'a'
...
...No instance for (Num Char)...
...
λ printD' 1 1
"Same type instance."
λ printD' [1] [1]
"Instance with a type constructor."
λ printD' [1] ['a']
"Instance with a type constructor."
My questions.
I am having a hard time understanding what is happening here. Is there an explanation?
Particularly, I can put forward two separate questions:
Why is the overlap not resolved in the first snippet?
Why is the overlap resolved in the second snippet?
But, if the issues are connected, perhaps a single, unified explanation would serve this case better.
P.S. (concerning a close/duplicate vote): I am aware that ~ signifies type equality, and I am consciously using it to obtain the behaviour I need (in particular, printD' 1 'a' not matching). That hardly explains anything about the specific case I presented, where the two ways of stating type equality (the ~ constraint and the instance D a a) lead to two subtly distinct behaviours.
Note: I tested the snippets above with GHC 8.4.3 and 8.6.0.20180810.
First: only the instance head matters during instance selection; what is on the left of => does not matter. So instance D a a can only be selected when the two arguments are equal, while instance ... => D u a can always be selected.
Now, the overlap pragmas only come into play if one instance is already more "specific" than the other. "Specific", in this case, means "if there exists a substitution of type variables that can instantiate an instance head A to instance head B, then B is more specific than A". In
instance D a a
instance {-# OVERLAPPING #-} D u (f x)
neither is more specific than the other, as there is no substitution a := ? that makes D a a into D u (f x), nor is there any substitution u := ?; f := ?; x := ? that makes D u (f x) into D a a. The {-# OVERLAPPING #-} pragma does nothing (at least, pertaining to the problem). So, when resolving the constraint D [Integer] [Integer], the compiler finds both instances to be candidates, neither more specific than the other, and gives an error.
In
instance (u ~ a) => D u a
instance {-# OVERLAPPING #-} D u (f x)
the second instance is more specific than the first one, because the first one can be instantiated with u := u; a := f x to get to the second one. The pragma now pulls its weight. When resolving D [Integer] [Integer], both instances match, the first one with u := [Integer]; a := [Integer], and the second with u := [Integer]; f := []; x := Integer. However, the second is both more specific and OVERLAPPING, so the first one is discarded as a candidate and the second instance is used. (Side note: I think the first instance should be OVERLAPPABLE, and the second instance should have no pragma. This way, all future instances implicitly overlap the catch-all instance, instead of having to annotate each one.)
With that trick, selection is done with the right priority, and then equality between the two arguments is forced anyway. This combination achieves what you want, apparently.
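To make that variant easy to try out, here is a self-contained sketch (my own consolidation, not part of the original answer): it uses the pragma placement from the side note above, and the LANGUAGE extension list is my best guess (some entries may be redundant for your GHC version).

{-# LANGUAGE MultiParamTypeClasses, FlexibleInstances, FlexibleContexts,
             TypeFamilies, UndecidableInstances #-}

class D' u a where
  printD' :: u -> a -> String

-- Catch-all instance: its head matches any pair of types, and only after it
-- has been selected does the context force the two types to be equal.
instance {-# OVERLAPPABLE #-} (u ~ a) => D' u a where
  printD' _ _ = "Same type instance."

-- Strictly more specific instance: the second type is a constructor application.
instance D' u (f x) where
  printD' _ _ = "Instance with a type constructor."

main :: IO ()
main = do
  putStrLn (printD' (1 :: Int) (1 :: Int))  -- "Same type instance."
  putStrLn (printD' [1 :: Int] "a")         -- "Instance with a type constructor."
  -- printD' (1 :: Int) 'a' is rejected: the catch-all is selected,
  -- and only then does the Int ~ Char context fail.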
One way to visualize what is happening is a Venn diagram. From the first attempt, instance D a a and instance D u (f x) form two sets, the sets of the pairs of types that each one can match. These sets do overlap, but there are many pairs of types only D a a matches, and many pairs only D u (f x) matches. Neither can be said to be more specific, so the OVERLAPPING pragma fails. In the second attempt, D u a actually covers the entire universe of pairs of types, and D u (f x) is a subset (read: inside) of it. Now, the OVERLAPPING pragma works. Thinking in this way also shows us another way to make this work, by creating a new set that covers exactly the intersection of the first try.
instance D a a
instance D u (f x)
instance {-# OVERLAPPING #-} D (f x) (f x)
But I'd go with the one with two instances unless you really need to use this one for some reason.
Note, however, that overlapping instances are considered a bit fragile. As you noticed, it is often tricky to understand which instance is picked and why. One needs to consider all the instances in scope, their priorities, and essentially run a nontrivial selection algorithm in one's mind to understand what's going on. When the instances are defined across multiple modules (including orphans) things become even more complex, because selection rules might differ according to the local imports. This can even lead to incoherence. It is best to avoid them when possible.
See also the GHC manual.

Polymorphic reasoning

I am learning Haskell, and on the internet I found this paper by Philip Wadler ("Theorems for Free!").
I read it but did not understand it at all; it somehow connects to polymorphic functions.
For example:
polyfunc :: a -> a -> a
It is a function that is polymorphic in the type a.
What is the free theorem connected with the example polyfunc?
I feel like if I actually understood that paper then any code I wrote would be coauthored by God.
My best guess, though, is that all polyfunc can do is either always return the first argument or always return the second argument. So there are actually only two implementations of polyfunc:
polyfuncA a _ = a
polyfuncB _ b = b
The paper gives you a way to prove that claim.
This is a very important concept. For example, I've been involved in data quality research previously. This free theorem says that there is no function which can select the best data from two arbitrary pieces of data. We have to know something more. It's actually a no-brainer that I was surprised to find some people willing to overlook.
I've never really understood the algorithm laid out in that paper either, so I thought I would try to figure it out.
(1) Type of function in question
f :: a -> a -> a
(2) Rephrasing as a relation
f : ∀X. X -> X -> X
(3) By parametricity
(f, f) ∈ ∀X. X -> X -> X
(4) By definition of ∀ on relations
for all Q : A <=> A',
(fA, fA') ∈ Q -> Q -> Q
(5) Applying definition of -> on relations to the first -> in (4)
for all Q : A <=> A',
for all (x, x') ∈ Q,
(fA x, fA' x') ∈ Q -> Q
(6) Applying definition of -> on relations to (5)
for all Q : A <=> A',
for all (x, x') ∈ Q,
for all (y, y') ∈ Q,
(fA x y, fA' x' y') ∈ Q
At this point I was done expanding the relational definition, but wasn't sure how to get this back from relations into terms of functions and types, so I went and found a webapp that will automatically derive the free theorem for a type. I won't spoil (yet) what result it gives, but looking at it did help me figure out the next step in my proof.
The next step is to get back into function-land from relation-land, by noting that Q can be (the graph of) any function at all and this will still hold.
(7) Specializing Q to a function g :: p -> q
for all p, q
for all g :: p -> q
where g x = x'
and g y = y'
g (f x y) = f x' y'
(8) By definitions of x' and y'
for all p, q
for all g :: p -> q
g (f x y) = f (g x) (g y)
That looks true, right? It is equivalent to use g to transform both elements and then let f choose between them, or to let f choose an element and then transform it with g. By parametricity, f can't change whether it chooses the left or right element based on anything that g does.
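To make the derived law concrete, here is a quick property-based check (my own sketch, not part of the original answer; it assumes the QuickCheck library is available, specialises the theorem to p = q = Int, and picks g = (* 2) arbitrarily):

import Test.QuickCheck

-- The only two total, fully parametric implementations, as claimed above.
polyfuncA, polyfuncB :: a -> a -> a
polyfuncA a _ = a
polyfuncB _ b = b

-- The free theorem from step (8): g (f x y) == f (g x) (g y).
prop_free :: (Int -> Int -> Int) -> Int -> Int -> Bool
prop_free f x y = g (f x y) == f (g x) (g y)
  where g = (* 2)  -- any g :: Int -> Int works; (* 2) is just an arbitrary choice

main :: IO ()
main = do
  quickCheck (prop_free polyfuncA)
  quickCheck (prop_free polyfuncB)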
Of course the claim given in trevor cook's answer is true: f must either always choose its first argument, or always choose its second. I'm not sure whether the free theorem I derived is equivalent to that, or is a weaker version of it.
And incidentally, this is a special case of something that is already covered explicitly in the paper. It gives the theorem for
k :: x -> y -> x
which of course is the same as your function f, where x ~ a and y ~ a. The result that it gives is the same as the one I described:
for all a, b, x, y
a (k x y) = k (a x) (b y)
if we choose b=a to make the two results equivalent.
Wadler's "Theorems for free" (TFF) paper is not a good reference for learning about relational parametricity. The TFF paper is focused on abstract theory but, if you need a practical algorithm, the paper does not actually give it to you. The explanations in TFF have a number of important omissions that will confuse anyone who does not already know a great deal about relational parametricity.
The first thing TFF does not explain is that a function will satisfy a "free theorem" only if the code of the function is fully parametric (restricted in a number of ways).
"Fully parametric" code is a function with type parameters, whose arguments are typed using only those type parameters, and whose code is purely functional and does not try to examine at run time what types are assigned to the type parameters. The code must treat all types as totally unknown, arbitrary type parameters. The code must work with all types in the same way.
With these restrictions, the code will satisfy a certain law, in most cases this will be the "naturality" law, but in some cases the law will be more complicated. The paper "Theorems for free" shows many examples of such laws but does not explain a general algorithm for deriving those theorems.
To prove that those laws always hold, one uses the technique of relational parametricity. This is a complicated and powerful technique where one replaces functions (viewed as many-to-one binary relations) by arbitrary (many-to-many) binary relations and then reformulates the naturality law in terms of relations. The result is a "relational naturality law". At the end, one replaces relations again by functions and tries to derive an equation.
I recently recorded a tutorial about relational parametricity, with code examples in Scala. https://www.youtube.com/watch?v=Jf2VFB90Q0s&list=PLcoadSpY7rHUO0I1zdcbu9EeByYbwPSQ6
My tutorial does not follow Wadler's TFF paper but instead explains a simple and straightforward approach focused on practical results: how to derive the free theorem and reason about relations effectively. In this approach, it becomes easier to derive the "free theorem" for a given type and also to prove the "parametricity theorem": fully parametric functions will always satisfy one "free theorem" per type parameter.
Of course, for practical usage you don't necessarily need to go through the proof of the parametricity theorem, but you do need to be able to write the free theorem itself.
The key element of my tutorial is the idea of "lifting a relation to a type constructor". If you have a relation r: A <=> B and a type constructor F A then you can lift r to a relation of type F A <=> F B.
This lifting operation is denoted by rmap_F. The relation rmap_F r has type F A <=> F B.
The lifting operation rmap_F is defined by induction on the type structure of F. The details of that definition are somewhat technical (and are not adequately explained in the TFF paper!). The most important step in learning about relational parametricity is to understand the practical technique for lifting relations to type constructors. This is explained in my tutorial, and it's too long to write it here. The definition explains how to lift r to a trivial type constructor F A = C where C is a fixed type, to F A = A, to F A = Either (G A) (H A), to F A = (G A, H A), to F A = G A -> H A, etc.
The definition of rmap is analogous to the functor lifting fmap that lifts a function of type A -> B to a function of type F A -> F B. However, the functor lifting works only for covariant F, while the relational lifting works for any F, even if it is neither covariant nor contravariant, such as F A = A -> A -> A. This is the crucial feature that shows why the relational technique is useful at all.
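For reference, the standard clauses look roughly like this (my own summary, not part of the original answer); here r : A <=> B, and G, H are the smaller type constructors involved:

F A = C (a constant type):  (c1, c2) ∈ rmap_F r  iff  c1 = c2
F A = A:                    rmap_F r = r
F A = (G A, H A):           ((g1, h1), (g2, h2)) ∈ rmap_F r  iff  (g1, g2) ∈ rmap_G r and (h1, h2) ∈ rmap_H r
F A = G A -> H A:           (k1, k2) ∈ rmap_F r  iff  for all (x1, x2) ∈ rmap_G r, (k1 x1, k2 x2) ∈ rmap_H r
F A = Either (G A) (H A):   (Left g1, Left g2) ∈ rmap_F r iff (g1, g2) ∈ rmap_G r; (Right h1, Right h2) ∈ rmap_F r iff (h1, h2) ∈ rmap_H r; values built with different constructors are never related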
Let us apply the relational technique to the type forall A. A -> A -> A.
We define F A = A -> A -> A. We take an arbitrary fully parametric function t of type forall A. F A. The relational naturality law says: for any relation r: A <=> B between any types A and B, the function t must satisfy (t, t) ∈ rmap_F r.
Now we need to do two things: 1) select r to be the graph relation of some function f: A -> B, denoted r = graph f, and 2) use the definition of rmap_F to compute rmap_F (graph f) explicitly.
The definition of rmap_F gives:
(t1, t2) ∈ rmap_F r ===
for all a1: A, a2: A, b1: B, b2: B,
if (a1, b1) ∈ r and (a2, b2) ∈ r
then (t1 a1 a2, t2 b1 b2) ∈ r
Translating this with t1 = t2 = t and r = graph f, we get:
(a1, b1) ∈ r === b1 = f a1
(a2, b2) ∈ r === b2 = f a2
(t a1 a2, t b1 b2) ∈ r === t b1 b2 = f (t a1 a2)
So, we obtain the following law:
for all a1: A, a2: A,
t (f a1) (f a2) = f (t a1 a2)
This is actually a naturality law. This is the "free theorem" satisfied by t.

Pattern matching in Observational Type Theory

At the end of the "5. Full OTT" section of Towards Observational Type Theory the authors show how to define coercible-under-constructors indexed data types in OTT. The idea is basically to turn indexed data types into parameterized ones, like this:
data IFin : ℕ -> Set where
  zero : ∀ {n} -> IFin (suc n)
  suc  : ∀ {n} -> IFin n -> IFin (suc n)

data PFin (m : ℕ) : Set where
  zero : ∀ {n} -> suc n ≡ m -> PFin m
  suc  : ∀ {n} -> suc n ≡ m -> PFin n -> PFin m
Conor also mentions this technique at the bottom of observational type theory (delivery):
The fix, of course, is to do what the GADT people did, and define inductive families explicitly up to propositional equality. And then of course you can transport them, by transitivity.
However, the type checker in Haskell is aware of equality constraints in scope and actually uses them during type checking. E.g. we can write
f :: a ~ b => a -> b
f x = x
It doesn't work like that in type theory, since it's not enough to have a proof of a ~ b in scope to be able to rewrite by this equation: that proof must also be refl, because in the presence of a false hypothesis type checking becomes undecidable due to termination issues (something like this). So when you pattern match on Fin m in Haskell, m gets rewritten to suc n in each branch, but that can't happen in type theory; instead you're left with an explicit proof of suc n ~ m. In OTT it's not possible to pattern match on proofs at all, hence you can neither pretend the proof is refl nor actually require that. It's only possible to supply the proof to coerce or just ignore it.
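For concreteness, here is the Haskell side of that contrast as a small GADT sketch (my own example, not from the question): matching on the constructors rewrites the index in each branch, so no explicit proof object ever appears.

{-# LANGUAGE GADTs, DataKinds, KindSignatures #-}

data Nat = Z | S Nat

data Fin (n :: Nat) where
  FZ :: Fin ('S n)
  FS :: Fin n -> Fin ('S n)

data Vec a (n :: Nat) where
  VNil  :: Vec a 'Z
  VCons :: a -> Vec a n -> Vec a ('S n)

-- In each branch GHC rewrites n to 'S m, so the FZ/VNil and FS/VNil cases
-- are statically impossible and need not be written.
vlookup :: Fin n -> Vec a n -> a
vlookup FZ     (VCons x _)  = x
vlookup (FS i) (VCons _ xs) = vlookup i xs

In OTT, by contrast, each of those branches would instead carry an explicit proof of suc n ~ m.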
This makes it very hard to write anything that involves indexed data types. E.g. the usual three-lines (including the type signature) lookup for vectors becomes this beast:
vlookupₑ : ∀ {n m a} {α : Level a} {A : Univ α} -> ⟦ n ≅ m ⇒ fin n ⇒ vec A m ⇒ A ⟧
vlookupₑ p (fzeroₑ q) (vconsₑ r x xs) = x
vlookupₑ {n} {m} p (fsucₑ {n′} q i) (vconsₑ {m′} r x xs) =
  vlookupₑ (left (suc n′) {m} {suc m′} (trans (suc n′) {n} {m} q p) r) i xs
vlookupₑ {n} {m} p (fzeroₑ {n′} q) (vnilₑ r) =
  ⊥-elim $ left (suc n′) {m} {0} (trans (suc n′) {n} {m} q p) r
vlookupₑ {n} {m} p (fsucₑ {n′} q i) (vnilₑ r) =
  ⊥-elim $ left (suc n′) {m} {0} (trans (suc n′) {n} {m} q p) r

vlookup : ∀ {n a} {α : Level a} {A : Univ α} -> Fin n -> Vec A n -> ⟦ A ⟧
vlookup {n} = vlookupₑ (refl n)
It could be simplified a bit, since if two elements of a data type that has decidable equality are observably equal, then they are also equal in the usual intensional sense, and natural numbers do have decidable equality, so we can coerce all the equations to their intensional counterparts and pattern match on them; but that would break some computational properties of vlookup and is verbose anyway. It's nearly impossible to deal with more complicated cases, with indices whose equality cannot be decided.
Is my reasoning correct? How is pattern matching in OTT meant to work? If this is a problem indeed, are there any ways to mitigate it?
I guess I'll field this one. I find it a strange question, but that's because of my own particular journey. The short answer is: don't do pattern matching in OTT, or in any kernel type theory. Which is not the same thing as to not do pattern matching ever.
The long answer is basically my PhD thesis.
In my PhD thesis, I show how to elaborate high-level programs written in a pattern matching style into a kernel type theory which has only the induction principles for inductive datatypes and a suitable treatment of propositional equality. The elaboration of pattern matching introduces propositional equations on datatype indices, then solves them by unification. Back then, I was using an intensional equality, but observational equality gives you at least the same power. That is: my technology for elaborating pattern matching (and thus keeping it out of the kernel theory), hiding all the equational piggery-jokery, predates the upgrade to observational equality. The ghastly vlookup you've used to illustrate your point might correspond to the output of the elaboration process, but the input need not be that bad. The nice definition
vlookup : Fin n -> Vec X n -> X
vlookup fz (vcons x xs) = x
vlookup (fs i) (vcons x xs) = vlookup i xs
elaborates just fine. The equation-solving that happens along the way is just the same equation-solving that Agda does at the meta-level when checking a definition by pattern matching, or that Haskell does. Don't be fooled by programs like
f :: a ~ b => a -> b
f x = x
In kernel Haskell, that elaborates to some sort of
f {q} x = coerce q x
but it's not in your face. And it's not in compiled code, either. OTT equality proofs, like Haskell equality proofs, can be erased before computing with closed terms.
Digression. To be clear about the status of equality data in Haskell, the GADT
data Eq :: k -> k -> * where
  Refl :: Eq x x
really gives you
Refl :: x ~ y -> Eq x y
but because the type system is not logically sound, type safety relies on strict pattern matching on that type: you can't erase Refl and you really must compute it and match it at run time, but you can erase the data corresponding to the proof of x~y. In OTT, the entire propositional fragment is proof-irrelevant for open terms and erasable for closed computation. End of digression.
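A tiny standalone illustration of that point (my own code and names, not from the answer): the only way to get x ~ y into scope is to match on Refl, and that match really does happen at run time.

{-# LANGUAGE GADTs #-}

data Eq' x y where
  Refl :: Eq' x x

-- Matching on Refl brings a ~ b into scope; a bogus proof (e.g. undefined)
-- makes this loop or crash rather than performing an unsound cast.
castVia :: Eq' a b -> a -> b
castVia Refl x = x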
The decidability of equality on this or that datatype is not especially relevant (at least, not if you have uniqueness of identity proofs; if you don't always have UIP, decidability is one way to get it sometimes). The equational problems which show up in pattern matching are on arbitrary open expressions. That's a lot of rope. But a machine can certainly decide the fragment which consists of first-order expressions built from variables and fully applied constructors (and that's what Agda does when you split cases: if the constraints are too weird, the thing just barfs). OTT should allow us to push a bit further into the decidable fragments of higher-order unification. If you know (forall x. f x = t[x]) for unknown f, that's equivalent to f = \ x -> t[x].
So, "no pattern matching in OTT" has always been a deliberate design choice, as we always intended it to be an elaboration target for a translation we already knew how to do. Rather, it's a strict upgrade in kernel theory power.

Types à la Curry in Simply Typed Lambda Calculus

I'm writing a toy theorem prover in Haskell, following the model of L. Paulson, one of the creators of Isabelle.
According to one of his articles, a theorem prover may be built on the Simply Typed Lambda Calculus. I created the data type
data TLTerms
  = Var Id
  | Const Id Types
  | Lambda Id Types TLTerms
  | Comp TLTerms TLTerms
  deriving (Eq)
i.e. following types à la Church. However, I would like to implement the type system à la Curry. Maybe it is not the best system for this purpose, but I want to try the Curry style in order to judge which style is better. The first problem is type inference.
checkType :: TypeContext -> TLTerms -> Maybe Types
-- Given a context, if i is in the context,
-- i has its type assigned in the context
checkType ls (Var i) = case lookup i ls of
  Just t  -> Just t
  Nothing -> Nothing
-- the constants come predefined by the alphabet
checkType _ (Const _ typ) = Just typ
-- the abstractions have type sigma -> tau
checkType ls (Lambda x typ terms) = do
  t <- checkType ((x, typ) : ls) terms
  return $ Arrow typ t
checkType ls (Comp term1 term2) = do
  x <- checkType ls term1
  y <- checkType ls term2
  case (x, y) of
    (Arrow a b, c) -> if a == c then Just b else Nothing
    _              -> Nothing
I took the code from another related question. I am sure there is a more elegant algorithm to verify whether a lambda term is well-typed. My questions are:
which parts should be changed/added/removed (apart from deleting the type argument in abstractions);
which error monad would be better: Maybe, Either, or another (I want to return a value which signals failure);
or whether I should create another function from scratch.
P.S.: I'm new to Haskell, although I've written example programs like dual numbers or power series; just to explain my level of knowledge in this subject.
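For the error-monad part of the question, an Either-based variant of the same checker could look like the following sketch (my own code; the question does not show Id, Types or TypeContext, so the definitions below are guesses made just to keep the example self-contained, and TLTerms mirrors the question's type). Curry-style inference proper would additionally need type metavariables and unification, which this sketch does not attempt.

type Id = String
data Types = Base Id | Arrow Types Types deriving (Eq, Show)
type TypeContext = [(Id, Types)]

data TLTerms
  = Var Id
  | Const Id Types
  | Lambda Id Types TLTerms
  | Comp TLTerms TLTerms
  deriving (Eq, Show)

checkTypeE :: TypeContext -> TLTerms -> Either String Types
checkTypeE ls (Var i) =
  case lookup i ls of
    Just t  -> Right t
    Nothing -> Left ("unbound variable: " ++ i)
checkTypeE _ (Const _ typ) = Right typ
checkTypeE ls (Lambda x typ body) = do
  t <- checkTypeE ((x, typ) : ls) body
  return (Arrow typ t)
checkTypeE ls (Comp term1 term2) = do
  x <- checkTypeE ls term1
  y <- checkTypeE ls term2
  case x of
    Arrow a b
      | a == y    -> Right b
      | otherwise -> Left ("argument mismatch: expected " ++ show a ++ ", got " ++ show y)
    _ -> Left "application of a non-function"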

What is the combinatory logic equivalent of intuitionistic type theory?

I recently completed a university course which featured Haskell and Agda (a dependently typed functional programming language), and was wondering if it was possible to replace lambda calculus in these with combinatory logic. With Haskell this seems possible using the S and K combinators, thus making it point-free. I was wondering what the equivalent was for Agda. I.e., can one make a dependently typed functional programming language equivalent to Agda without using any variables?
Also, is it possible to somehow replace quantification with combinators? I don't know if this is a coincidence but universal quantification for example makes a type signature look like a lambda expression. Is there a way to remove universal quantification from a type signature without changing its meaning? E.g. in:
forall a : Int -> a < 0 -> a + a < a
Can the same thing be expressed without using a forall?
So I thought about it a bit more and made some progress. Here's a first stab at encoding Martin-Löf's delightfully simple (but inconsistent) Set : Set system in a combinatory style. It's not a good way to finish, but it's the easiest place to get started. The syntax of this type theory is just lambda-calculus with type annotations, Pi-types, and a universe Set.
The Target Type Theory
For completeness' sake, I'll present the rules. Context validity just says you can build contexts from empty by adjoining fresh variables inhabiting Sets.
  --------------
    . |- valid

    G |- valid    G |- S : Set
  -----------------------------  x fresh for G
    G, x:S |- valid
And now we can say how to synthesize types for terms in any given context, and how to change the type of something up to the computational behaviour of the terms it contains.
    G |- valid
  ----------------
    G |- Set : Set

    G |- S : Set    G |- T : Pi S \ x:S -> Set
  ---------------------------------------------
    G |- Pi S T : Set

    G |- S : Set    G, x:S |- t : T x
  ------------------------------------
    G |- \ x:S -> t : Pi S T

    G |- f : Pi S T    G |- s : S
  --------------------------------
    G |- f s : T s

    G |- valid
  --------------  x:S in G
    G |- x : S

    G |- s : S    G |- T : Set
  -----------------------------  S ={beta} T
    G |- s : T
In a small variation from the original, I've made lambda the only binding operator, so the second argument of Pi should be a function computing the way the return type depends on the input. By convention (e.g. in Agda, but sadly not in Haskell), scope of lambda extends rightwards as far as possible, so you can often leave abstractions unbracketed when they're the last argument of a higher-order operator: you can see I did that with Pi. Your Agda type (x : S) -> T becomes Pi S \ x:S -> T.
(Digression. Type annotations on lambda are necessary if you want to be able to synthesize the type of abstractions. If you switch to type checking as your modus operandi, you still need annotations to check a beta-redex like (\ x -> t) s, as you have no way to guess the types of the parts from that of the whole. I advise modern designers to check types and exclude beta-redexes from the very syntax.)
(Digression. This system is inconsistent as Set:Set allows the encoding of a variety of "liar paradoxes". When Martin-Löf proposed this theory, Girard sent him an encoding of it in his own inconsistent System U. The subsequent paradox due to Hurkens is the neatest toxic construction we know.)
Combinator Syntax and Normalization
Anyhow, we have two extra symbols, Pi and Set, so we might perhaps manage a combinatory translation with S, K and two extra symbols: I chose U for the universe and P for the product.
Now we can define the untyped combinatory syntax (with free variables):
data SKUP = S | K | U | P deriving (Show, Eq)

data Unty a
  = C SKUP
  | Unty a :. Unty a
  | V a
  deriving (Functor, Eq)
infixl 4 :.
Note that I've included the means to include free variables represented by type a in this syntax. Apart from being a reflex on my part (every syntax worthy of the name is a free monad with return embedding variables and >>= performing substitution), it'll be handy to represent intermediate stages in the process of converting terms with binding to their combinatory form.
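To make that parenthetical concrete, here is the free-monad structure spelled out (my addition, not in the original answer): pure embeds variables and (>>=) performs substitution.

import Control.Monad (ap)

instance Applicative Unty where
  pure  = V
  (<*>) = ap

instance Monad Unty where
  C c      >>= _ = C c                     -- combinators mention no variables
  (f :. a) >>= s = (f >>= s) :. (a >>= s)  -- substitute in both halves
  V x      >>= s = s x                     -- replace the variable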
Here's normalization:
norm :: Unty a -> Unty a
norm (f :. a) = norm f $. a
norm c = c
($.) :: Unty a -> Unty a -> Unty a -- requires first arg in normal form
C S :. f :. a $. g = f $. g $. (a :. g) -- S f a g = f g (a g) share environment
C K :. a $. g = a -- K a g = a drop environment
n $. g = n :. norm g -- guarantees output in normal form
infixl 4 $.
(An exercise for the reader is to define a type for exactly the normal forms and sharpen the types of these operations.)
Representing Type Theory
We can now define a syntax for our type theory.
data Tm a
  = Var a
  | Lam (Tm a) (Tm (Su a))  -- Lam is the only place where binding happens
  | Tm a :$ Tm a
  | Pi (Tm a) (Tm a)        -- the second arg of Pi is a function computing a Set
  | Set
  deriving (Show, Functor)
infixl 4 :$

data Ze
magic :: Ze -> a
magic x = x `seq` error "Tragic!"

data Su a = Ze | Su a deriving (Show, Functor, Eq)
I use a de Bruijn index representation in the Bellegarde and Hook manner (as popularised by Bird and Paterson). The type Su a has one more element than a, and we use it as the type of free variables under a binder, with Ze as the newly bound variable and Su x being the shifted representation of the old free variable x.
Translating Terms to Combinators
And with that done, we acquire the usual translation, based on bracket abstraction.
tm :: Tm a -> Unty a
tm (Var a) = V a
tm (Lam _ b) = bra (tm b)
tm (f :$ a) = tm f :. tm a
tm (Pi a b) = C P :. tm a :. tm b
tm Set = C U
bra :: Unty (Su a) -> Unty a -- binds a variable, building a function
bra (V Ze) = C S :. C K :. C K -- the variable itself yields the identity
bra (V (Su x)) = C K :. V x -- free variables become constants
bra (C c) = C K :. C c -- combinators become constant
bra (f :. a) = C S :. bra f :. bra a -- S is exactly lifted application
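As a quick sanity check (my own example, not from the post), the identity function \ x:Set -> x translates to SKK, i.e. the I combinator, and normalization behaves accordingly:

idSet :: Tm Ze
idSet = Lam Set (Var Ze)

-- tm idSet               ==  C S :. C K :. C K   (bracket abstraction yields SKK)
-- norm (tm idSet :. C U) ==  C U                 (the identity applied to U is U)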
Typing the Combinators
The translation shows the way we use the combinators, which gives us quite a clue about what their types should be. U and P are just set constructors, so, writing untranslated types and allowing "Agda notation" for Pi, we should have
U : Set
P : (A : Set) -> (B : (a : A) -> Set) -> Set
The K combinator is used to lift a value of some type A to a constant function over some other type G.
G : Set    A : Set
-------------------------------
K : (a : A) -> (g : G) -> A
The S combinator is used to lift applications over a type, upon which all of the parts may depend.
G : Set
A : (g : G) -> Set
B : (g : G) -> (a : A g) -> Set
----------------------------------------------------
S : (f : (g : G) -> (a : A g) -> B g a) ->
    (a : (g : G) -> A g) ->
    (g : G) -> B g (a g)
If you look at the type of S, you'll see that it exactly states the contextualised application rule of the type theory, so that's what makes it suitable to reflect the application construct. That's its job!
We then have application only for closed things
f : Pi A B
a : A
--------------
f a : B a
But there's a snag. I've written the types of the combinators in ordinary type theory, not combinatory type theory. Fortunately, I have a machine that will make the translation.
A Combinatory Type System
---------
U : U
---------------------------------------------------------
P : PU(S(S(KP)(S(S(KP)(SKK))(S(KK)(KU))))(S(KK)(KU)))
G : U
A : U
-----------------------------------------
K : P[A](S(S(KP)(K[G]))(S(KK)(K[A])))
G : U
A : P[G](KU)
B : P[G](S(S(KP)(S(K[A])(SKK)))(S(KK)(KU)))
--------------------------------------------------------------------------------------
S : P(P[G](S(S(KP)(S(K[A])(SKK)))(S(S(KS)(S(S(KS)(S(KK)(K[B])))(S(KK)(SKK))))
(S(S(KS)(KK))(KK)))))(S(S(KP)(S(S(KP)(K[G]))(S(S(KS)(S(KK)(K[A])))
(S(S(KS)(KK))(KK)))))(S(S(KS)(S(S(KS)(S(KK)(KP)))(S(KK)(K[G]))))
(S(S(KS)(S(S(KS)(S(KK)(KS)))(S(S(KS)(S(S(KS)(S(KK)(KS)))
(S(S(KS)(S(KK)(KK)))(S(KK)(K[B])))))(S(S(KS)(S(S(KS)(S(KK)(KS)))(S(KK)(KK))))
(S(KK)(KK))))))(S(S(KS)(S(S(KS)(S(KK)(KS)))(S(S(KS)(S(KK)(KK)))
(S(S(KS)(KK))(KK)))))(S(S(KS)(S(S(KS)(S(KK)(KS)))(S(KK)(KK))))(S(KK)(KK)))))))
M : A    B : U
-----------------  A ={norm} B
M : B
So there you have it, in all its unreadable glory: a combinatory presentation of Set:Set!
There's still a bit of a problem. The syntax of the system gives you no way to guess the G, A and B parameters for S and similarly for K, just from the terms. Correspondingly, we can verify typing derivations algorithmically, but we can't just typecheck combinator terms as we could with the original system. What might work is to require the input to the typechecker to bear type annotations on uses of S and K, effectively recording the derivation. But that's another can of worms...
This is a good place to stop, if you've been keen enough to start. The rest is "behind the scenes" stuff.
Generating the Types of the Combinators
I generated those combinatory types using the bracket abstraction translation from the relevant type theory terms. To show how I did it, and make this post not entirely pointless, let me offer my equipment.
I can write the types of the combinators, fully abstracted over their parameters, as follows. I make use of my handy pil function, which combines Pi and lambda to avoid repeating the domain type, and rather helpfully allows me to use Haskell's function space to bind variables. Perhaps you can almost read the following!
pTy :: Tm a
pTy = fmap magic $
  pil Set $ \ _A -> pil (pil _A $ \ _ -> Set) $ \ _B -> Set

kTy :: Tm a
kTy = fmap magic $
  pil Set $ \ _G -> pil Set $ \ _A -> pil _A $ \ a -> pil _G $ \ g -> _A

sTy :: Tm a
sTy = fmap magic $
  pil Set $ \ _G ->
  pil (pil _G $ \ g -> Set) $ \ _A ->
  pil (pil _G $ \ g -> pil (_A :$ g) $ \ _ -> Set) $ \ _B ->
  pil (pil _G $ \ g -> pil (_A :$ g) $ \ a -> _B :$ g :$ a) $ \ f ->
  pil (pil _G $ \ g -> _A :$ g) $ \ a ->
  pil _G $ \ g -> _B :$ g :$ (a :$ g)
With these defined, I extracted the relevant open subterms and ran them through the translation.
A de Bruijn Encoding Toolkit
Here's how to build pil. Firstly, I define a class of Finite sets, used for variables. Every such set has a constructor-preserving embedding into the set above, plus a new top element, and you can tell them apart: the embd function tells you if a value is in the image of emb.
class Fin x where
  top  :: Su x
  emb  :: x -> Su x
  embd :: Su x -> Maybe x
We can, of course, instantiate Fin for Ze and Su.
instance Fin Ze where
  top    = Ze               -- Ze is the only, so the highest
  emb    = magic
  embd _ = Nothing          -- there was nothing to embed

instance Fin x => Fin (Su x) where
  top         = Su top              -- the highest is one higher
  emb Ze      = Ze                  -- emb preserves Ze
  emb (Su x)  = Su (emb x)          -- and Su
  embd Ze     = Just Ze             -- Ze is definitely embedded
  embd (Su x) = fmap Su (embd x)    -- otherwise, wait and see
Now I can define less-or-equals, with a weakening operation.
class (Fin x, Fin y) => Le x y where
  wk :: x -> y
The wk function should embed the elements of x as the largest elements of y, so that the extra things in y are smaller, and thus in de Bruijn index terms, bound more locally.
instance Fin y => Le Ze y where
  wk = magic                       -- nothing to embed

instance Le x y => Le (Su x) (Su y) where
  wk x = case embd x of
    Nothing -> top                 -- top maps to top
    Just y  -> emb (wk y)          -- embedded gets weakened and embedded
And once you've got that sorted out, a bit of rank-n skullduggery does the rest.
lam :: forall x. Tm x -> ((forall y. Le (Su x) y => Tm y) -> Tm (Su x)) -> Tm x
lam s f = Lam s (f (Var (wk (Ze :: Su x))))
pil :: forall x. Tm x -> ((forall y . Le (Su x) y => Tm y) -> Tm (Su x)) -> Tm x
pil s f = Pi s (lam s f)
The higher-order function doesn't just give you a term representing the variable, it gives you an overloaded thing which becomes the correct representation of the variable in any scope where the variable is visible. That is, the fact that I go to the trouble of distinguishing the different scopes by type gives the Haskell typechecker enough information to compute the shifting required for the translation to de Bruijn representation. Why keep a dog and bark yourself?
I guess the "Bracket Abstraction" also works for dependent types under some circumstances. In section 5 of the following paper you find some K and S types:
Outrageous but Meaningful Coincidences: Dependent type-safe syntax and evaluation. Conor McBride, University of Strathclyde, 2010.
Converting a lambda expression into a combinatory expression roughly corresponds to converting a natural deduction proof into a Hilbert-style proof.

Resources