Combining joint probabilities - statistics

I am trying to work out the expression for a probability distribution (related to bioinformatics), and am having trouble combining the information about a random variable from two different sources. Essentially, here is the scenario:
There are 3 discrete random variables X, A & B. X depends on A and B. A and B are related only through X, i.e. A and B are independent given X. Now, I have derived the expressions for:
P(X, A) and P(X, B). I need to calculate P(X, A, B) - this is not a straightforward application of the chain rule.
I can derive P(X | A) from the first expression since P(A) is available. Since B is never observed independently of A, P(B) is not readily available; at best I can approximate it by marginalizing over A, but P(A, B) has no closed form, so the integration is tricky.
Any thoughts on how P(X, A, B) can be derived, without discarding information? Many thanks in advance.
Amit

What you're dealing with here is an undirected graphical model: a chain A - X - B, in which A is conditionally independent of B given X but X depends (I assume directly) on both A and B. I'm a little confused about the nature of your problem, i.e. what form your probability distributions are specified in, but you could look at belief propagation.

Ok, it has been a long time since I've done joint probabilities, so take this with a big grain of salt, but given that A and B are orthogonal, the first place I would start looking is for an expression like:
P(X, A, B) = P(X, A) + P(X, B) * (1 - P(X, A))
Again, this is just to give you an idea to explore as it has been a very long time since I did this type of work!

Your question is very unclear in terms of what you observe and what the unknowns are. The only fact you state clearly is that A and B are independent given X. That is,
Assumption: P(A,B|X)=P(A|X)P(B|X)
Hence: P(A,B,X) = P(A,B|X) P(X) = P(A|X) P(B|X) P(X) = P(A,X) P(B,X) / P(X)
Take your pick of factorizations; note that the last form combines exactly the two expressions you already have, P(X,A) and P(X,B), together with the marginal P(X).
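In code, with hypothetical finite Map-based tables for the two given joints (a numeric sketch, not a library API; all names are invented):

import qualified Data.Map as M

type Prob = Double

-- P(X,A,B) = P(X,A) * P(X,B) / P(X), with P(X) recovered by
-- marginalizing the first table over A.
pXAB :: M.Map (Int, Int) Prob   -- table for P(X, A)
     -> M.Map (Int, Int) Prob   -- table for P(X, B)
     -> (Int, Int, Int)         -- the point (x, a, b)
     -> Prob
pXAB pXA pXB (x, a, b) =
  let pX = sum [ p | ((x', _), p) <- M.toList pXA, x' == x ]
  in if pX == 0
       then 0
       else M.findWithDefault 0 (x, a) pXA
          * M.findWithDefault 0 (x, b) pXB
          / pX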

Related

How to use category theory diagrams with polyary functions?

So, there's a lot of buzz about categories all around the Haskell ecosystem. But I feel one piece is missing from the common sense I have so far absorbed by osmosis. (I did read the first few pages of Mac Lane's famous introduction as well, but I don't believe I have enough mathematical maturity to carry the wisdom from that text over to the actual programming I have at hand.) I will now give a real-world example involving a binary function that I have trouble depicting in categorical terms.
So, I have this function chain that gets me S -> A, where A is a type synonym for a function, akin to a -> b. Now I want to depict a process that does S -> a -> b, but I end up with an arrow pointing to another arrow rather than to an object. How do I deal with such a predicament?
I did overhear someone talking about a thing called an n-category, but I don't know whether I should even try to understand what it is and how it's useful.
Though I believe my abstraction is accurate, the actual functions are parsePath >>> either error id >>> toAxis :: String -> Text.XML.Cursor.Axis from selectors and Axis = Text.XML.Cursor.Cursor -> [Text.XML.Cursor.Cursor] from xml-conduit.
There are two approaches to modeling binary functions as morphisms in category theory (n-ary functions are dealt with similarly; no new machinery is needed). One is to consider the uncurried version:
(A * B) -> C
where we take the product of the types A and B as the source object. For that we need the category to contain such products. (In Haskell, products are written (A, B). Technically, in Haskell this is not exactly the product as in categories, but let's ignore that.)
Another is to consider the result type (B -> C) as an object in the category. Usually, this is called an exponential object, written as C^B. Assuming our category has such objects, we can write
A -> C^B
These two representations of binary functions are isomorphic: using curry and uncurry we can transform each one into the other.
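For the poster's concrete case this reads as follows (a hedged sketch: Axis and Cursor come from xml-conduit, while the body of curried is the question's own chain from the selectors package, left undefined here):

import Text.XML.Cursor (Axis, Cursor)

curried :: String -> Axis               -- the A -> C^B view
curried = undefined                     -- parsePath >>> either error id >>> toAxis

uncurried :: (String, Cursor) -> [Cursor]   -- the (A * B) -> C view
uncurried = uncurry curried

curriedAgain :: String -> Axis          -- and back, witnessing the isomorphism
curriedAgain = curry uncurried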
Indeed, when there is such a (natural) isomorphism, we get a so-called cartesian closed category, the simplest kind of category that can describe a simply typed lambda calculus -- the core of every typed functional language.
This isomorphism is often expressed as an adjunction between two functors:
(- * B) -| (- ^ B)
I can use tuple projections to depict this situation, as follows (diagram omitted; it drew fst and snd as arrows from the pair (Axis, Cursor) to its components):
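Or, in actual Haskell terms (a hedged reconstruction of the lost snippet; only Axis and Cursor are from the question, the function names are invented):

import Text.XML.Cursor (Axis, Cursor)

projAxis :: (Axis, Cursor) -> Axis      -- the backwards fst arrow
projAxis = fst

projCursor :: (Axis, Cursor) -> Cursor  -- the backwards snd arrow
projCursor = snd

run :: (Axis, Cursor) -> [Cursor]       -- evaluation consumes the pair
run (axis, cursor) = axis cursor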
This diagram features backwards fst & snd arrows in place of a binary function that constructs the tuple from its constituents, which I can in no way depict directly. The caveat is that, while in this diagram Cursor has only one incoming arrow, in actual code some real arrows X -> Axis and Y -> Cursor would feed both components of the tuple, not just the symbolic projecting functions. The flow will then be uniformly left to right.
Pragmatically speaking, I traded an arrow with two sources (one that constructs a tuple and isn't a morphism) for two reversed arrows (the tuple's projections, which are legal morphisms in all regards).

How can I interpret the typing rules on this paper?

Those typing rules come from the paper "On the expressivity of elementary linear logic: characterizing Ptime and an exponential time hierarchy".
From the "What part of Hindley-Milner do you not understand?" Stack Overflow question, I can read some of them in English, but it is still difficult to figure out how to build a type checker out of them. This is my attempt at reading the first four rules:
Ax: as an axiom, if x has type A, then x has type A. (Isn't that obvious?)
Cut: if a context Γ proves t has type A, and another context Δ, extended with the assertion that x has type A, proves u has type B, then the two contexts together prove that u with every occurrence of x substituted by t has type B. (What does that mean, though? Why are there two contexts, and where does the extra one come from? Also, it seems like a rule for substitution, but how, if substitution isn't a term but an operation? Classical Hindley-Milner has nothing like that; it just has a very simple rule for App.)
Weak: if a context proves t has type A, then that context extended with the statement that x has type B still proves t has type A. (Again, isn't that obvious?)
Contr: if a context extended with x1 of type !A and x2 of type !A proves t has type B, then that context extended with x of type !A proves that t with all occurrences of x1 and x2 substituted by x has type B. (Another rule for substitution, it seems? But why are there two terms above and one term below? Also, why the !s? Where does all of that show up in the type checker?)
I roughly get what those rules are trying to say, but something is missing before it truly clicks and I can implement the corresponding type checker. How should I approach understanding these rules?
This is a bit too broad, but from your comments I guess that you lack some basics of linear type systems. This system has weakening (not usually allowed in linear logic), so it actually corresponds to affine intuitionistic logic.
The key idea is: you can use every value you have (e.g. variables) at most once.
The type A (x) B (tensor product) roughly stands for the type of pairs, from which you can project out both an A value and a B value.
The type A -o B stands for a linear function which consumes a value A (remember: at most one use!) and produces a single B.
You can have e.g. \x.x : A -o A, but you cannot have any term of type A -o (A (x) A), since that would require you to use the argument twice.
The type !A ("of course A!") stands for values of type A which can be duplicated at will -- as you can normally do in non-linear lambda calculi. This duplication is done by the Contr (contraction) rule.
For instance, !A -o !B represents a plain function: it takes a value (in an unbounded number of copies) and produces a value (in an unbounded number of copies). You can write a function !A -o (!A (x) !A) as follows:
\a. (a (x) a)
Note that every linear typing rule with multiple premises has to split the environment variables between the premises (e.g. one premise gets Gamma, the other Delta), without overlap. Otherwise you could duplicate linear variables. That is why Cut has two contexts. The non-linear cut would be:
G |- t : A        G, x:A |- u : B
---------------------------------
G |- u[t/x] : B
but here both terms t and u can use the variables in G, hence u[t/x] can use variables twice -- not good. Instead, the linear cut
G1 |- t : A        G2, x:A |- u : B
-----------------------------------
G1, G2 |- u[t/x] : B
forces you to split variables between the two premises: what you use in t is unavailable for u.
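In an implementation, one standard trick avoids guessing the split up front: thread the whole context through the first premise and hand the unused remainder to the second. A minimal sketch (all names invented; only variables and application are handled):

import qualified Data.Map as M

data Ty = TVar String | Lolli Ty Ty deriving (Eq, Show)
data Term = Var String | App Term Term

type Ctx = M.Map String Ty

-- check returns the residual context: the bindings not yet consumed.
check :: Ctx -> Term -> Maybe (Ty, Ctx)
check ctx (Var x) = do
  ty <- M.lookup x ctx
  pure (ty, M.delete x ctx)        -- a variable consumes its binding
check ctx (App f a) = do
  (fty, ctx1) <- check ctx f       -- the first premise takes what it needs
  (aty, ctx2) <- check ctx1 a      -- the second premise sees only the rest
  case fty of
    Lolli dom cod | dom == aty -> Just (cod, ctx2)
    _                          -> Nothing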

Is it possible to randomly generate theorems that are arbitrarily difficult to prove?

If I understand the Curry-Howard isomorphism correctly, every dependent type corresponds to a theorem, for which a program implementing it is a proof. That means any mathematical problem, such as a^n + b^n = c^n, can somehow be expressed as a type.
Now, suppose I want to design a game which generates random types (theorems), and in which players must try to implement programs (proofs) of those types (theorems). Is it possible to control the difficulty of those theorems? I.e., an easy mode would generate trivial theorems while a hard mode would generate much harder ones.
A one-way function is a function that can be calculated in polynomial time, but that does not have a right inverse that can be calculated in polynomial time. If f is a one-way function, then you can choose an argument x whose size is determined by the difficulty setting, calculate y = f x, and ask the user to prove, constructively, that y is in the image of f.
This is not terribly simple. No one knows whether there are any one-way functions. Most people believe there are, but proving that, if true, is known to be at least as hard as proving P /= NP. However, there is a ray of light! People have managed to construct functions with the strange property that if any functions are one-way, then these must be. So you could choose such a function and be pretty confident you'll be offering sufficiently hard problems. Unfortunately, I believe all known universal one-way functions are pretty nasty. So you will likely find it hard to code them, and your users will likely find even the easiest proofs too difficult. So from a practical standpoint, you might be better off choosing something like a cryptographic hash function that's not as thoroughly likely to be truly one-way but that's sure to be hard for a human to crack.
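As a sketch of what the challenge generator might look like (assuming the cryptonite and random packages; SHA-256 merely stands in for a hopefully one-way function, and the names are mine):

import Control.Monad (replicateM)
import Crypto.Hash (Digest, SHA256 (..), hashWith)
import qualified Data.ByteString as BS
import System.Random (randomIO)

-- Difficulty = number of random preimage bytes the player must recover.
makeChallenge :: Int -> IO (Digest SHA256)
makeChallenge difficulty = do
  preimage <- BS.pack <$> replicateM difficulty randomIO
  pure (hashWith SHA256 preimage)

The player's "proof" is then any bytestring whose hash equals the challenge digest.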
If you generate just types, most of them will be isomorphic to ⊥. ∀ n m -> n + m ≡ m + n is meaningful, but ∀ n m -> n + m ≡ n, ∀ n m -> n + m ≡ suc m, ∀ n m -> n + m ≡ 0, ∀ n m xs -> n + m ≡ length xs and zillions of others are not. You can try to generate well-typed terms and then check, using something like Djinn, that the type of a generated term is not inhabited by a much simpler term. But many generated terms will be either too simple or just senseless garbage, even with a clever strategy. A typed setting contains fewer terms than an untyped one, but even the type of a single variable can be A, A -> A, A -> B, B -> A, A -> ... -> E, and each of these type variables can be free or universally quantified. Besides, ∀ A B -> A -> B -> B and ∀ B A -> A -> B -> B are essentially the same type, so your notion of equality is not just αη, but something more complex. The search space is just too big, and I doubt a random generator can produce anything really non-trivial.
But maybe terms of some specific form can be interesting. Bakuriu in the comments suggested theorems provided by parametricity: you can simply take Data.List.Base or Function or any other basic module from the Agda standard library and generate many theorems out of thin air. Check also the paper "A computational interpretation of parametricity", which gives an algorithm for deriving theorems from types in a dependently typed setting (though I don't know how it relates to "Theorems for free", and they don't give the rules for data types). But I'm not sure that most produced theorems won't be provable by straightforward induction. Theorems about functions that are instances of left folds are usually harder than those about instances of right folds, though; that can be one criterion.
This falls into an interesting and difficult field: proving lower bounds in proof complexity. First, it very much depends on the strength of the logical system you're using and what proofs it allows. A proposition can be hard to prove in one system and easy to prove in another.
The next problem is that for a random proposition (in a reasonably strong logical system) it is even impossible to decide whether it's provable at all (for example, the set of provable propositions in first-order logic is only recursively enumerable). And even if we know it's provable, determining its proof complexity can be extremely hard or undecidable (finding a proof doesn't mean you've found the shortest one).
Intuitively this seems similar to Kolmogorov complexity: for a general string, we can't tell what the shortest program producing it is.
For some proof systems and specific families of formulas, lower bounds are known. Haken proved in 1985:
For sufficiently large n, any resolution proof of PHP^n_{n-1} (the pigeonhole principle) requires length 2^{Ω(n)}.
These slides give an overview of the theorem. So you could generate propositions using such a schema, but that probably won't be very interesting for a game.
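Generating instances of such a schema is at least mechanical. A hedged sketch that emits the clauses of PHP^n_{n-1} as DIMACS-style integer literals (the encoding choices are mine): variable (i, j) asserts "pigeon i sits in hole j"; every pigeon must sit somewhere, and no hole hosts two pigeons.

php :: Int -> [[Int]]
php n =
  let holes  = n - 1
      v i j  = (i - 1) * holes + j   -- 1-based variable index
      placed = [ [ v i j | j <- [1 .. holes] ] | i <- [1 .. n] ]
      nocoll = [ [ negate (v i j), negate (v k j) ]
               | j <- [1 .. holes], i <- [1 .. n], k <- [i + 1 .. n] ]
  in placed ++ nocoll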

Type algebra and Knuth's up arrow notation

Reading through this question and this blog post got me thinking more about type algebra and specifically how to abuse it.
Basically,
1) We can think of the Either A B type as addition: A+B
2) We can think of the ordered pair (A,B) as multiplication: A*B
3) We can think of the function A -> B as exponentiation: B^A
There's an obvious pattern going on here: Multiplication is repeated addition, and exponentiation is repeated multiplication. This led Knuth to define the up arrow ↑ as exponentiation, ↑↑ as repeated exponentiation, ↑↑↑ as repeated ↑↑, and so on. Thus, 10↑↑↑↑10 is a HUGE number.
My question is: how can the function ↑↑↑↑↑↑↑↑↑↑↑↑↑↑↑↑↑↑ be represented in algebraic data types? It seems like ↑ should be a function with an infinite number of arguments, but that doesn't make much sense. Would A↑B simply be [A] -> B, and thus A↑↑↑↑B be [[[[A]]]] -> B?
Bonus points if you can explain what the Ackermann function would look like, or any of the other hyper-growth functions.
At the most obvious level, you could identify a↑↑b with
((...(a -> a) -> ...) -> a) -- iterated b times
and a↑↑↑b is just
(a↑↑(a↑↑(...(a↑↑(a↑↑a))...))) -- iterated b times
so everything can be expressed in terms of some long function type (hence as some immensely long tuple type...). But I don't think there's a convenient expression for an arbitrary up-arrow symbol in terms of (the cardinality of) familiar Haskell types, beyond the ones written above with ... or ↑. I can't think of any common mathematical objects that have larger-than-exponential combinatorial dependence on the size of the underlying sets (without going to recursive datatypes, which are too big). Maybe there are such objects in combinatorial set theory? (Your question seems, to me, more about the sizes of sets than about anything specific to types.)
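That said, the finite iterated types above can at least be computed mechanically at the type level. A hedged sketch with a closed type family (names invented; note that a↑↑0 = 1 is the unit type, and () -> x is only isomorphic to x, not equal to it):

{-# LANGUAGE DataKinds, TypeFamilies, UndecidableInstances #-}
import Data.Kind (Type)

data Nat = Z | S Nat

-- Tetrate a b is a type whose cardinality is |a| ↑↑ b.
type family Tetrate (a :: Type) (b :: Nat) :: Type where
  Tetrate a 'Z     = ()                 -- a↑↑0 = 1
  Tetrate a ('S n) = Tetrate a n -> a   -- a↑↑(n+1) = a ^ (a↑↑n)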
(The Wikipedia page you linked already connects these objects to the Ackermann function.)

How is (==) defined in Haskell?

I'm writing a small functional programming language in Haskell, but I can't find a definition of how (==) is implemented, and this seems to be quite tricky.
Haskell uses the concept of a "typeclass". The actual definition is something like this:
class Eq a where
  (==) :: a -> a -> Bool
  (/=) :: a -> a -> Bool
  -- Each method has a default definition in terms of the other,
  -- so an instance only needs to supply one of them.
Then you can define it for your own types. For example:
-- Eq can't be automatically derived, because of the function
data Foo = Foo Int (Char -> Bool)
-- So define it here
instance Eq Foo where
  (Foo x _) == (Foo y _) = x == y
Your question is very interesting. If you also want to know the theoretical roots behind it, then I think we can abstract away from Haskell and investigate the question in more general algorithmic terms. As for Haskell, I think the two following facts matter:
Functions are first-class citizens in Haskell
Haskell is Turing-complete
though I have not yet worked out exactly how the strength of the language matters here.
Possibility for specific cases, but a no-go theorem for a comprehensive solution
I think that, at root, two theorems of computer science provide the answer. If we want to abstract away from technical details, we can investigate your question in the lambda calculus (or in combinatory logic): can equality be defined in them? Let us therefore restrict ourselves first to the lambda calculus and combinatory logic.
It must be noted that both of these approaches to computation are very minimalistic: there are no "predefined" datatypes in them, not even numbers, booleans, or lists. But you can mimic all of them in clever ways.
Instead of booleans, you can use projection (selector) functions (Church booleans).
Instead of C-style unions (or C++ class inheritance), you can use continuations. More precisely, it is case analysis that you can implement in a concise and straightforward way.
You can mimic natural numbers with functions that iterate function composition (Church numerals).
You can implement lists and trees with sophisticated algebraic methods (catamorphisms).
Thus, you can mimic all meaningful datatypes even in such minimalistic "functional languages" as the lambda calculus and combinatory logic, by using lambda functions (or combinators) in a clever scheme that mimics the datatype you want.
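To make the encodings above concrete, here is a hedged Haskell transcription of Church booleans and numerals (my own function names). For simplicity the numeral equality test converts to Int; a pure-lambda version exists, as Böhm's theorem below guarantees, but needs Church pairs and the predecessor, which we omit.

-- Church booleans: a boolean is a two-way selector.
ctrue, cfalse :: a -> a -> a
ctrue  t _ = t
cfalse _ f = f

-- Church numerals: n s z applies s to z exactly n times.
czero :: (a -> a) -> a -> a
czero _ z = z

csucc :: ((a -> a) -> a -> a) -> (a -> a) -> a -> a
csucc n s z = s (n s z)

-- Decode a numeral by instantiating it at Int.
toInt :: ((Int -> Int) -> Int -> Int) -> Int
toInt n = n (+ 1) 0

-- Equality of two Church numerals, via decoding.
eqCNat :: ((Int -> Int) -> Int -> Int) -> ((Int -> Int) -> Int -> Int) -> Bool
eqCNat m n = toInt m == toInt n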
Now let us try to answer your question in these minimalistic functional languages first, to see whether the answer is Haskell-specific or rather the consequence of more general theorems.
Böhm's theorem provides the following: for any two given distinct expressions (that halt and don't freeze the computer), a suitable testing function can always be written that decides correctly whether the two given expressions are semantically the same (Csörnyei 2007: 132, = Th 7.2.2). In most practical cases (lists, trees, booleans, numbers), Böhm's theorem guarantees that a suitable specific equality function can always be written. See an example for lists in Tromp 1999: Sec 2.
The Scott-Curry undecidability theorem rules out any fully general equality function, meaningful for every possible pair of terms (Csörnyei 2007: 140, = Th 7.4.1).
A go-theorem
Once you have "implemented" a datatype, you can write a corresponding equality function for it. For most practical cases (lists, numbers, case-analysis selections), there is no mysterious "datatype" for which a corresponding equality function must be lacking. This positive answer is provided by Böhm's theorem.
You can write a Church-numeral equality function that takes two Church numerals and answers whether they are equal. You can write another lambda function/combinator that takes two Church booleans and answers whether they are equal. Moreover, you can implement lists in pure lambda calculus/CL (one proposed way uses the notion of catamorphisms), and then define a function that decides equality for lists of booleans, and another for lists of Church numerals. You can implement trees too, and thereafter write a function that decides equality for trees (over booleans, and another over Church numerals).
You can automate some of this job, but not all: some (though not all) equality functions can be derived automatically. If you already have the specific map functions for trees and lists, and equality functions for booleans and numbers, then you can automatically derive equality functions for boolean trees, boolean lists, number lists, and number trees as well.
A no-go theorem
But there is no way to define a single, fully automatic equality function working for all possible "datatypes". If you "implement" a concrete, given datatype in lambda calculus, you usually have to design its specific equality function for that encoding.
Moreover, there is no way to define a lambda function that takes two lambda terms and answers whether they would behave the same way when reduced. Even more, there is no way to define a lambda function that takes the representations (quotations) of two lambda terms and answers whether the original terms would behave the same way when reduced (Csörnyei 2007: 141, Conseq 7.4.3). This no-go answer is provided by the Scott-Curry undecidability theorem (Csörnyei 2007: 140, Th 7.4.1).
In other algorithm approaches
I think the two answers above are not restricted to the lambda calculus and combinatory logic; similar possibilities and restrictions apply to some other models of computation. For example, there is no recursive function that takes the Gödel numbers of two unary functions and decides whether the encoded functions behave the same extensionally (Monk 1976: 84, = Cor 5.18). This is a consequence of Rice's theorem (Monk 1976: 84, = Th 5.17). Rice's theorem sounds formally very similar to the Scott-Curry undecidability theorem, but I have not yet worked out the connection.
Comprehensive equality in a very restricted sense
If I wanted to write a combinatory logic interpreter that provides comprehensive equality testing (restricted to halting, normal-form-having terms), then I would implement it like this (see the sketch after this list):
I'd reduce both combinatory-logic terms under consideration to their normal forms,
and see whether they are identical as terms.
If so, then their unreduced original forms must have been equivalent semantically too.
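A minimal sketch of such an interpreter's equality test, for SKI terms (the datatype and reduction order are my own choices; as discussed below, normalize loops forever on terms that have no normal form, such as Ω):

data CL = S | K | I | App CL CL deriving (Eq, Show)

-- One leftmost-outermost reduction step, if a redex exists.
step :: CL -> Maybe CL
step (App I x)                 = Just x
step (App (App K x) _)         = Just x
step (App (App (App S f) g) x) = Just (App (App f x) (App g x))
step (App f x) = case step f of
  Just f' -> Just (App f' x)
  Nothing -> App f <$> step x
step _ = Nothing

normalize :: CL -> CL
normalize t = maybe t normalize (step t)

-- Restricted equality: compare normal forms syntactically.
eqCL :: CL -> CL -> Bool
eqCL a b = normalize a == normalize b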
But this works only with serious restrictions, although the method serves several practical goals well. We can perform operations on numbers, lists, trees etc., and check whether we get the expected result. My quine (written in pure combinatory logic) uses this restricted concept of equality, and it suffices, despite the fact that the quine requires very sophisticated constructs (term trees implemented in combinatory logic itself).
I do not yet know what the limits of this restricted equality concept are, but I suspect it is very restricted compared to the correct definition of equality. The motivation behind its use is that it is computable at all, unlike the unrestricted concept of equality.
The restrictions can also be seen from the fact that this restricted equality concept works only for combinators that have normal forms. As a counterexample: it cannot check that I Ω = Ω, although we know well that the two terms can be converted into each other.
I have yet to consider how the existence of this restricted concept of equality relates to the negative results claimed by the Scott-Curry undecidability theorem and Rice's theorem. Both theorems deal with partial functions, but I do not yet know exactly how this matters.
Extensionality
But the restricted equality concept has further limitations: it cannot deal with extensionality. For example, it does not notice that S K is related to K I in any way, despite the fact that S K behaves the same as K I when applied to at least two arguments:
The latter example deserves more detail. We know that S K and K I are not identical as terms: S K ≢ K I. But if we apply both, respectively, to any two arguments X and Y, we see a relatedness:
S K X Y ⊳ K Y (X Y) ⊳ Y
K I X Y ⊳ I Y ⊳ Y
and of course Y ≡ Y, for any Y.
Of course, we cannot "try out" such relatedness for every possible pair of arguments X and Y, because there can be infinitely many CL-term instances to substitute for these metavariables. But we do not have to get stuck on this problem of infinity. If we augment our object language (combinatory logic) with (free) variables:
K is a term
S is a term
Any (free) variable is a term (this new clause is the modification!)
If both X and Y are terms, then also (X Y) is a term
Terms cannot be obtained in any other way
and we define the reduction rules accordingly, then we can state an extensional definition of equality in a "finite" way, without relying on metavariables with infinitely many possible instances.
Thus, if free variables are allowed in combinatory logic terms (the object language is augmented with its own object variables), then extensionality can be implemented to some degree. I have not yet worked this out fully. As for the example above, we can use the notation
S K =2 K I
(Curry & Feys & Craig 1958: 162, = 5C 5), based on the fact that S K x y and K I x y can be proven equal (without resorting to extensionality). Here, x and y are not metavariables standing for infinitely many possible CL-term instances in equation schemes, but first-class citizens of the object language itself. Thus this equation is no longer an equation scheme, but a single equation.
For theoretical study, we can take = to be the "union" of the =n relations for all n.
Alternatively, equality can be defined so that its inductive definition also takes extensionality into consideration. We add one further rule of inference dealing with extensionality (Csörnyei 2007: 158):
E x = F x
---------
E = F

That is: if E and F are combinators, and x is an (object) variable contained in neither E nor F, then from E x = F x we can infer E = F.
The constraint about non-containment is important, as the following counterexample shows: K x ≠ I, despite K x x = I x holding. The "roles" of the two (incidentally identical) variable occurrences differ entirely; excluding such coincidences is the motivation for the constraint.
The use of this new rule of inference can be exemplified by showing how the theorem S K = K I can be proven:
S K = K I is regarded to hold because S K x = K I x has already been proven to hold; see the proof below:
S K x = K I x is regarded to hold because S K x y = K I x y has already been proven to hold; see below:
S K x y = K I x y can be proven without resorting to extensionality; we need only the familiar conversion rules.
What are these remaining rules of inference? Here they are (Csörnyei 2007: 157):
Conversion axiom schemes:
"K E F = E" is deducible (K-axiom scheme)
"S F G H = F H (G H)" is deducible (S-axiom scheme)
Equality axiom schemes and rules of inference:
"E = E" is deducible (reflexivity axiom scheme)
If "E = F" is deducible, then "F = E" is also deducible (symmetry rule of inference)
If "E = F" is deducible, and "F = G" is deducible too, then "E = G" is also deducible (transitivity rule)
If "E = F" is deducible, then "E G = F G" is also deducible (Leibniz rule I)
If "E = F" is deducible, then "G E = G F" is also deducible (Leibniz rule II)
References
Csörnyei, Zoltán (2007). Lambda-kalkulus: A funkcionális programozás alapjai [The lambda calculus: the foundations of functional programming]. Budapest: Typotex. ISBN 978-963-9664-46-3.
Curry, Haskell B. & Feys, Robert & Craig, William (1958). Combinatory Logic. Vol. I. Amsterdam: North-Holland Publishing Company.
Madore, David (2003). The Unlambda Programming Language. Unlambda: Your Functional Programming Language Nightmares Come True.
Monk, J. Donald (1976). Mathematical Logic. Graduate Texts in Mathematics. New York, Heidelberg, Berlin: Springer-Verlag.
Tromp, John (1999). Binary Lambda Calculus and Combinatory Logic. Downloadable in PDF and PostScript from the author's Lambda Calculus and Combinatory Logic Playground.
Appendix
Böhm's theorem
I have not yet explained clearly how Böhm's theorem relates to the fact that, in most practical cases, a suitable equality-testing function can surely be written for a meaningful datatype (even in such minimalistic functional languages as pure lambda calculus or combinatory logic).
Statement
Let E and F be two different, closed terms of lambda calculus,
and let both of them have normal forms.
Then, the theorem claims, there is a suitable way to test their equality by applying them to a suitable series of arguments. In other words: there exists a natural number n and a series of closed lambda terms G1, G2, G3, ..., Gn such that applying E and F to this series of arguments reduces to false and true, respectively:
E G1 G2 G3... Gn ⊳ false
F G1 G2 G3... Gn ⊳ true
where true and false are the two well-known, tame, easily manageable and distinguishable lambda terms:
true ≡ λ x y . x
false ≡ λ x y . y
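For a tiny worked example, take E ≡ true and F ≡ false themselves. Then n = 2 with G1 ≡ false and G2 ≡ true suffices: E G1 G2 ≡ true false true ⊳ false, while F G1 G2 ≡ false false true ⊳ true.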
Application
How can this theorem be exploited for implementing practical datatypes in pure lambda calculus? An implicit application of this theorem is exemplified by the way linked lists can be defined in combinatory logic (Tromp 1999: Sec 2).
(==) is part of the type class Eq. A separate implementation is provided by each type that is an instance of Eq. So to find the implementation, you should usually look at where your type is defined.
Smells like homework to me. Elaborate on why you find it tricky.
You might look at how ML and various Lisps attempt to solve the problem.
You might also look in the source code of other languages' interpreters/compilers; some are written with study in mind.
