These typing rules are from the paper "On the expressivity of elementary linear logic: characterizing Ptime and an exponential time hierarchy":
Thanks to the “What part of Milner-Hindley do you not understand?” Stack Overflow question, I can read some of them in English, but it is still difficult to figure out how to turn them into a type checker. This is my attempt at reading the first 4 rules:
Ax: as an axiom, if x has type A, then x has type A. (Isn't that obvious?)
Cut: If a context Γ proves t has type A, and another context ∆, extended with the assertion x has type A, proves u has type B, then those two contexts together prove that substituting t for all occurrences of x in u yields a term of type B. (What does that mean, though? Why are there two contexts, and where does the extra one come from? Also, that seems like a rule for substitution, but how, given that substitution isn't a term but an operation? The classical Milner-Hindley has nothing like that; it has just a very simple rule for App.)
Weak: If a context proves t has type A, then that context extended with the statement x has type B still proves t has type A. (Again, isn't that obvious?)
Contr: if a context extended with x1 of type !A and x2 of type !A proves t has type B, then that context, extended with x of type !A, proves that substituting x for all occurrences of x1 and x2 in t yields a term of type B. (Another rule for substitution, it seems? But why are there two terms above and one term below? Also, why those !s? Where would all of that show up in the type checker?)
I roughly get what those rules are trying to say, but I am missing something before it truly clicks and I am able to implement the corresponding type checker. How can I approach understanding those rules?
This is a bit too broad, but from your comments I guess that you lack some basics of linear type systems. This system has weakening (not usually allowed in linear logic), so it actually corresponds to affine intuitionistic logic.
The key idea is: you can use every value you have (e.g. variables) at most once.
The type A (x) B (tensor product) roughly stands for the type of pair values, from which you can project out both an A value and a B value.
The type A -o B stands for a linear function which consumes a value A (remember: at most one use!) and produces a single B.
You can have e.g. \x.x : A -o A, but you cannot have any term of type A -o (A (x) A), since that would require you to use the argument twice.
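If it helps to experiment, GHC's LinearTypes extension (GHC 9.0 or later; linear rather than merely affine, but the restriction on duplicating an argument is the same) lets you see this rejection concretely. This is just an illustrative sketch of mine, not the system from the paper:

{-# LANGUAGE LinearTypes #-}
-- a %1 -> b plays the role of A -o B here

ok :: a %1 -> a            -- corresponds to \x.x : A -o A; accepted
ok x = x

-- bad :: a %1 -> (a, a)   -- would correspond to A -o (A (x) A); GHC
-- bad x = (x, x)          -- rejects it, because x would be used twice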
The type !A ("of course A!") stands for values of type A which can be duplicated at will -- as you can do normally in non-linear lambda calculi. This duplication is done by the Contraction rule.
For instance, !A -o !B represents a plain function: it requires a value (in an unbounded number of copies) and produces a value (in an unbounded number of copies). You can write a function !A -o (!A (x) !A) as follows:
\a. (a (x) a)
Note that every linear typing rule with multiple premises has to split the context variables between the premises (e.g. one premise gets Gamma, the other Delta), without overlap. Otherwise, you could duplicate linear variables. This is why Cut has two contexts. The non-linear cut would be:
G |- t: A        G, x:A |- u: B
--------------------------------
G |- u[t/x]: B
but here both terms t and u can use the variables in G, hence u[t/x] can use variables twice -- not good. Instead, the linear cut
G1 |- t: A        G2, x:A |- u: B
--------------------------------
G1,G2 |- u[t/x]: B
forces you to split variables between the two premises: what you use in t is unavailable for u.
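As for turning this into a type checker: a common implementation trick is to avoid guessing the G1/G2 split up front and instead thread the context through, consuming each variable as it is used. Below is a minimal sketch of that idea for a tiny affine lambda calculus; the names and the treatment of shadowing are simplifications of mine, not taken from the paper:

import Data.Map (Map)
import qualified Data.Map as Map

data Ty = TVar String | Lolli Ty Ty      -- A, A -o B
  deriving (Eq, Show)

data Tm = Var String | Lam String Ty Tm | App Tm Tm
  deriving Show

type Ctx = Map String Ty

-- check returns the inferred type together with the *unused* part of the
-- context. A variable is deleted as soon as it is looked up, so a second
-- use of the same variable fails -- that is the "at most once" discipline.
check :: Ctx -> Tm -> Either String (Ty, Ctx)
check ctx (Var x) =
  case Map.lookup x ctx of
    Just a  -> Right (a, Map.delete x ctx)        -- consume x
    Nothing -> Left ("unbound or already used variable: " ++ x)
check ctx (Lam x a body) = do
  (b, ctx') <- check (Map.insert x a ctx) body    -- (shadowing ignored)
  Right (Lolli a b, Map.delete x ctx')            -- weakening: x may be unused
check ctx (App f arg) = do
  (tf, ctx1) <- check ctx f                       -- whatever f used is gone
  case tf of
    Lolli a b -> do
      (ta, ctx2) <- check ctx1 arg                -- arg only sees the leftovers
      if ta == a then Right (b, ctx2)
                 else Left "argument type mismatch"
    _ -> Left "applying a non-function"

Threading the context left to right like this makes Cut (and App) deterministic: whatever the first premise consumes is simply no longer available to the second, which is exactly what the G1, G2 split enforces.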
Related
The Self Types for Dependently Typed Lambda Encodings paper (by Peng Fu and Aaron Stump) proposes self types, which are, supposedly, sufficient to encode the induction principle and Scott-encoded datatypes in the Calculus of Constructions, without making the system inconsistent or introducing paradoxes.
The notation of that paper is too heavy for me to fully understand how to implement it.
What, precisely, is the main difference between Fix and Self? In other words: in what ways must a naively implemented Fix be restricted so that it introduces no inconsistencies into the core calculus?
This is what I have understood after browsing the paper.
A Fix type satisfies the typing equivalence (assuming equirecursive types)
G |- M : Fix x. t <=> G |- M : t{Fix x. t / x}
i.e. you can unfold the type over itself. Note how the term M does not play any role here. With isorecursive types, M would get some isomorphism applied (e.g. a Haskell newtype constructor), but that is not important here.
Instead, Self types satisfy the following
G |- M : Self x. t <=> G |- M : t{M / x}
Now, x is not a type variable, but a term variable. The term gets "moved" inside the type. This is not a recursive type at all.
A list in Haskell might look like this:
data List a = Nil | Cons a (List a)
A type theoretic interpretation is:
λα.μβ.1+αβ
which encodes the list type as the fixed point of a functor. In Haskell this could be represented as:
data Fix f = In (f (Fix f))
data ListF a b = Nil | Cons a b
type List a = Fix (ListF a)
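(Just to make that encoding concrete -- my own example -- the list [1, 2] is built by wrapping each ListF layer in In:)

exampleList :: List Int
exampleList = In (Cons 1 (In (Cons 2 (In Nil))))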
I'm curious about the scope of the earlier μ binder. Can a name bound in an outer scope remain available in an inner scope? Is, say, the following a valid expression:
μγ.1+(μβ.1+γβ)γ
...perhaps it's the same as:
μβ.μγ.1+(1+γβ)γ
...but then how would things change when the name is reused:
μβ.μγ.1+(μβ.1+γβ)γ
Are the above all regular types?
Scoping of μ works no different from other binders, so yes, all your examples are valid. They are also regular, because they do not even contain a λ. (*)
As for equality, that depends on what sort of μ-types you have. There are basically two different notions:
equi-recursive: in that case, the typing rules assume an equivalence
μα.T = T[μα.T / α]
i.e., a recursive type is considered equal to its one-level 'unrolling', where the μ is removed and the μ-bound variable is replaced by the type itself (and because this rule can be applied repeatedly, one can unroll arbitrarily many times).
iso-recursive: here, no such equivalence exists. Instead, a μ-type is a separate form of type with its own expression forms to introduce and eliminate it -- they are usually called roll and unroll (or fold and unfold), and are typed as follows:
roll : T[μα.T / α] → μα.T
unroll : μα.T → T[μα.T / α]
These must be applied explicitly on the term level to mirror the equation above (once for each level of unrolling).
Functional languages like ML or Haskell usually use the latter for their interpretation of datatypes. However, the roll/unroll is built into the use of the data constructors. So each constructor is an injection into an iso-recursive type composed with an injection into a sum type (and inversely when matched).
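To make that connection explicit (a sketch of mine, reusing the Fix declaration from the question): In is roll, and pattern matching on In is unroll:

data Fix f = In (f (Fix f))

roll :: f (Fix f) -> Fix f
roll = In

unroll :: Fix f -> f (Fix f)
unroll (In x) = x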
Your examples are all different under the iso-recursive interpretation. The first and the third are the same under an equi-recursive interpretation, because the outer μ just disappears when you apply the above equivalence.
(*) Edit: An irregular recursive type is one whose infinite expansion does not correspond to a regular tree (or equivalently, can not be represented by a finite cyclic graph). Such a case can only be expressed with recursive type constructors, i.e., a λ that occurs under a μ. For example, μα.λβ.1+α(β×β) -- corresponding to the recursive equation t(β) = 1+t(β×β) -- would be irregular, because the recursive type constructor α is recursively applied to a type "larger" than its argument, and hence every application is a recursive type that "grows" indefinitely (and consequently, you cannot draw it as a graph).
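As an aside (my own illustration): that irregular example corresponds to what Haskell calls a nested or non-regular datatype, which GHC accepts because data declarations are iso-recursive in the sense described above:

-- t(β) = 1 + t(β×β) from the footnote, as a nested datatype:
data T b = Leaf | Node (T (b, b))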
It's worth noting, however, that in most type theories with μ, its bound variable is restricted to ground kind, so cannot express irregular types at all. In particular, a theory with unrestricted equi-recursive type constructors would have non-terminating type normalisation, so type equivalence (and thus type checking) would be undecidable. For iso-recursive types you'd need higher-order roll/unroll, which is possible, but I'm not aware of much literature investigating it.
Your μ type expressions are valid. I believe your types are regular as well since you only use recursion, sum, and products.
The type
T1 = μγ.1+(μβ.1+γβ)γ
does not look equal to
T2 = μβ.μγ.1+(1+γβ)γ
since inr (inr (inl *, inr (inl *)), inl *) has the second type but not the first.
The last type
T3 = μβ.μγ.1+(μβ.1+γβ)γ
is equal to (α-converting the first β)
μ_.μγ.1+(μβ.1+γβ)γ
which is, unfolding the top-level μ,
μγ.1+(μβ.1+γβ)γ
which is T1.
Basically, the scope of μ-bound variables follows the same rules of λ-bound variables. That is, the value of each occurrence of a variable β is provided by the closest μβ on top of it.
Reading through this question and this blog post got me thinking more about type algebra and specifically how to abuse it.
Basically,
1) We can think of the Either A B type as addition: A+B
2) We can think of the ordered pair (A,B) as multiplication: A*B
3) We can think of the function A -> B as exponentiation: B^A
There's an obvious pattern going on here: Multiplication is repeated addition, and exponentiation is repeated multiplication. This led Knuth to define the up arrow ↑ as exponentiation, ↑↑ as repeated exponentiation, ↑↑↑ as repeated ↑↑, and so on. Thus, 10↑↑↑↑10 is a HUGE number.
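For concreteness, here is how that recursion might be written out in Haskell (just the numeric definition, which of course never finishes for anything like 10↑↑↑↑10; the type-level question is below):

upArrow :: Integer -> Integer -> Integer -> Integer   -- upArrow n a b = a ↑^n b
upArrow 1 a b = a ^ b
upArrow _ _ 0 = 1
upArrow _ a 1 = a
upArrow n a b = upArrow (n - 1) a (upArrow n a (b - 1))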
My question is: how can the function ↑↑↑↑↑↑↑↑↑↑↑↑↑↑↑↑↑↑ be represented in algebraic data types? It seems like ↑ should be a function with an infinite number of arguments, but that doesn't make much sense. Would A↑B simply be [A] -> B, and thus A↑↑↑↑B be [[[[A]]]] -> B?
Bonus points if you can explain what the Ackermann function would look like, or any of the other hyper-growth functions.
At the most obvious level, you could identify a↑↑b with
((...(a -> a) -> ...) -> a) -- iterated b times
and a↑↑↑b is just
(a↑↑(a↑↑(...(a↑↑(a↑↑a))...))) -- iterated b times
so everything can be expressed in terms of some long function type (hence as some immensely long tuple type ...). But I don't think there's a convenient expression for an arbitrary up-arrow symbol in terms of (the cardinality of) familiar Haskell types (beyond the ones written above with ... or ↑), since I can't think of any common mathematical objects that have larger-than-exponential combinatorial dependencies on the size of the underlying sets (without going to recursive datatypes, which are too big) ... maybe there are some such objects in combinatorial set theory? (Your question seems [to me] more about the sizes of sets than anything specific to types.)
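If you wanted to write the a↑↑b reading above down as an actual Haskell type, one sketch (my own; the Nat and TowerOf names are invented here) is a closed type family indexed by a type-level iteration count:

{-# LANGUAGE DataKinds, TypeFamilies #-}
import Data.Kind (Type)

data Nat = Z | S Nat

-- TowerOf a b  ~  ((...(a -> a) -> ...) -> a), iterated b times
type family TowerOf (a :: Type) (b :: Nat) :: Type where
  TowerOf a 'Z     = a
  TowerOf a ('S n) = TowerOf a n -> a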
(The Wikipedia page you linked already connects these objects to the Ackermann function.)
I am working on a compiler/proof checker, and I was wondering: given a syntax tree such as this, for example:
data Expr
  = Lambdas (Set String) Expr
  | Var String
  | ...
is there a way to check the alpha-equivalence (equivalence modulo renaming) of Exprs? This Expr, however, differs from the lambda calculus in that the set of variables in a lambda is commutative -- i.e. the order of parameters does not factor into the checking.
(For simplicity, however, Lambda ["x","y"] ... is distinct from Lambda ["x"] (Lambda ["y"] ...), and in that case the order does matter).
In other words, given two Exprs, how can one efficiently find a renaming from one to the other? This kind of combinatorial problem smells of NP-completeness.
The commutativity of the parameters does hint at an exponential comparison, true.
But I suspect you can normalize the parameter lists so you only have to compare them in single order. Then a tree compare with renaming would be essentially linear in the size of the trees.
What I suggest is this: for each parameter list, visit the subtree (in-order, postorder, it doesn't matter as long as you are consistent) and sort the parameters by the order in which the visit first encounters each parameter's use. So if you have
lambda(a,b): .... b ..... a ... b ....
you'd sort the parameter list as:
lambda(b,a)
because you encounter b first and a second, and the additional encounter of b doesn't matter. Then compare the trees with the normalized parameter lists.
Life gets messier if you insist that the operators in a lambda clause can be commutative. My guess is that you can still normalize it.
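Here is a rough sketch of that normalization in Haskell (assuming the Expr type from the question; shadowing and the elided constructors are ignored for brevity):

import Data.List (elemIndex, sortOn)
import Data.Maybe (fromMaybe)
import Data.Set (Set)
import qualified Data.Set as Set

data Expr
  = Lambdas (Set String) Expr
  | Var String
  deriving Show

-- every variable occurrence in body order (a simple preorder walk)
occurrences :: Expr -> [String]
occurrences (Var x)       = [x]
occurrences (Lambdas _ e) = occurrences e

-- parameters ordered by the position of their first use; unused ones last
normalizeParams :: Set String -> Expr -> [String]
normalizeParams ps body =
  sortOn (\p -> fromMaybe maxBound (elemIndex p (occurrences body)))
         (Set.toList ps)

-- alpha-compare two expressions, threading a left-to-right renaming
alphaEq :: [(String, String)] -> Expr -> Expr -> Bool
alphaEq ren (Var x) (Var y) =
  case lookup x ren of
    Just x' -> x' == y        -- bound variable: compare via the renaming
    Nothing -> x == y         -- free variable: compare by name
alphaEq ren (Lambdas ps1 b1) (Lambdas ps2 b2) =
  let xs = normalizeParams ps1 b1
      ys = normalizeParams ps2 b2
  in  length xs == length ys && alphaEq (zip xs ys ++ ren) b1 b2
alphaEq _ _ _ = False

With the parameter lists put into first-use order, the renaming between two alpha-equivalent terms is forced, so the comparison is a single tree walk rather than a search over all pairings.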
We can appeal to Daan Leijen's HMF for a few ideas. (He is dealing with binders for 'foralls', which also behave commutatively.)
In particular, he rebinds the variables in the occurrence order in the body.
Then comparison of terms involves skolemizing both the same way and comparing the results.
We can do better than that by replacing that skolemization pass with a locally nameless representation.
data Bound t a = Bound {-# UNPACK #-} !Int t | Unbound a
instance Functor (Bound t) where ...
instance Bifunctor Bound where ...
data Expr a
  = Lambdas {-# UNPACK #-} !Int (Expr (Bound () a))
  | Var a
So now occurrences of Bound under a lambda are the variables bound directly by that lambda, along with any type information you want to put in the occurrence; here I just used ().
Now closed terms are polymorphic in 'a', and if you sort the elements of the lambda by their use site (and make sure you always canonicalize the lambda by removing unused variables), alpha-equivalent terms compare simply with (==). If you need open terms you can work with Expr String or some other representation.
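A tiny sanity check of that claim, assuming derived Eq instances (UNPACK pragmas omitted): \{x,y}. x and \{a,b}. a both canonicalize to the same nameless term, so (==) is all you need.

data Bound t a = Bound !Int t | Unbound a
  deriving (Eq, Show)

data Expr a
  = Lambdas !Int (Expr (Bound () a))
  | Var a
  deriving (Eq, Show)

ex1, ex2 :: Expr String
ex1 = Lambdas 2 (Var (Bound 0 ()))   -- \{x,y}. x, parameters numbered by first use
ex2 = Lambdas 2 (Var (Bound 0 ()))   -- \{a,b}. a, identical after canonicalization

sameTerm :: Bool
sameTerm = ex1 == ex2                -- True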
A more anal-retentive version of the signatures for Expr and Bound would use an existential type and a type-level natural to identify the number of variables being bound, and use 'Fin' in the Bound constructor; but since you already have to maintain the invariant that you bind no more variables than the number occurring in the lambda, and that the type information agrees across all occurrences of Var (Bound n _) with the same n, it's not too much of a burden to maintain another.
Update: You can use my bound package to do an improved version of this in a fully self-contained way!
I am trying to work out the expression for a probability distribution (related to bioinformatics), and am having trouble combining the information about a random variable from two different sources. Essentially, here is the scenario:
There are 3 discrete random variables X, A & B. X depends on A and B. A and B are related only through X, i.e. A and B are independent given X. Now, I have derived the expressions for:
P(X, A) and P(X, B). I need to calculate P(X, A, B) - this is not a straightforward application of the chain rule.
I can derive P(X | A) from the first expression since P(A) is available. Since B is never observed independently of A, P(B) is not readily available -- at best I can approximate it by marginalizing over A, but the expression P(A, B) does not have a closed form, so the integration is tricky.
Any thoughts on how P(X, A, B) can be derived, without discarding information? Many thanks in advance.
Amit
What you're dealing with here is an undirected acyclic graph. A is conditionally independent of B given X, but X depends (I assume directly) on A and B. I'm a little confused about the nature of your problem, i.e. what form your probability distributions are specified in, but you could look at belief propagation.
Ok, it has been a long time since I've done joint probabilities, so take this with a big grain of salt, but the first place I would start looking, given that A and B are orthogonal, is for an expression something like:
P(X, A, B) = P(X,A) + (P(X,B) * (1-P(X,A)));
Again, this is just to give you an idea to explore as it has been a very long time since I did this type of work!
Your question is very unclear in terms of what you observe and what the unknowns are. It seems like the only fact you state clearly is that A and B are independent given X. That is,
Assumption: P(A,B|X)=P(A|X)P(B|X)
Hence: P(A,B,X) = P(A,B|X)P(X) = P(A|X)P(B|X)P(X) = P(A,X)P(B|X) = P(B,X)P(A|X)
Take your pick of factorizations.
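In terms of the quantities you say you actually have: since P(B|X) = P(X,B)/P(X) and P(A|X)P(X) = P(X,A), with P(X) obtainable by marginalizing either joint, the factorization above can be written as

P(X,A,B) = P(X,A) * P(X,B) / P(X)

so neither P(B) nor P(A,B) is needed.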