Correlation vs causation (A corr B => ∃ D such that D causes A and D causes B)?

As the title says, I wonder how true or false this statement is:
if A and B are correlated, there must be some variable D that has a causal relationship with both A and B.
Example: A = ice cream sales, B = sun-screen sales, D = the sun.
Intuitively it seems to me that it is true: if we suppose the statement is false, I feel the reasoning leads to a contradiction (A and B would have to be correlated by pure chance, without any latent variable that explains both).
Let's work through the reasoning.
Suppose A and B are correlated. There must be some variable Z that causes A (under the assumption that we live in a world of cause and effect; let's set aside the strange observations in quantum mechanics that go against causality).
In the same way, B has some set of variables Wi that cause it.
Saying that there is no hidden common cause implies that the intersection of set 1 (the variables that cause A) and set 2 (the variables that cause B) is empty.
To me this sounds very unlikely, since it implies a world where many entities evolve without ever interacting; that view and the butterfly effect cannot both hold in the same world.
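A quick simulation also supports the intuition (a minimal sketch; the variable names and coefficients are made up): D drives both A and B, neither causes the other, and they still come out correlated.

import numpy as np

rng = np.random.default_rng(0)
n = 10_000
# D: the common cause (e.g. daily sun intensity)
D = rng.normal(size=n)
# A and B are each driven by D plus independent noise; neither causes the other
A = 2.0 * D + rng.normal(size=n)
B = 1.5 * D + rng.normal(size=n)
# expected correlation: cov 3 over sqrt(5 * 3.25), about 0.74
print(np.corrcoef(A, B)[0, 1])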
Thank you for taking the time to read this and for helping me push this thought further.

Related

Proof that there is no instance for a given UML diagram

Given the diagram in the top-right corner, I'm supposed to decide whether there is any valid instance of it. The given image is a disproof by example ('wegen' means 'because of'). The disproof uses the cardinalities ('Mächtigkeit') of the objects.
I don't understand why, for example, 2*|A| equals |C|: in UML, each A would be in relation with 2 objects of C (rel1). So for every A there have to be 2 Cs to make a valid instance. Instead of 2*|A| = |C| it should therefore be |A| = 2*|C|.
Why is it the other way around?
2*|A| = |C|, since there are twice as many C objects as A objects: each A has two Cs associated with it.
|A| = |B|, because they have a 1-1 relation.
3*|C| = 2*|B|, because each C has 3 Bs and each B has 2 Cs.
(4) and (5) are just substitutions, where the last one gives a contradiction.
q.e.d.
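Spelled out (the (4)/(5) numbering follows the diagram, so the exact labels are my guess):
|C| = 2*|A| and |A| = |B|  =>  |C| = 2*|B|                     (4)
3*|C| = 2*|B| and |C| = 2*|B|  =>  6*|B| = 2*|B|  =>  |B| = 0  (5)
So |B| (and with it |A| and |C|) must be 0: no non-empty instance can satisfy all three multiplicities at once.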
P.S. As @ShiDoiSi pointed out, there is no {unique} constraint on the multiplicities. That makes it possible to have multiple associations to the same instance, i.e. effectively 1-1 relations. With that being the case, you actually CAN have a valid instantiation of the model.
Now go and tell that to your teacher xD

Higher-order quantification that cannot be skolemized

I'm learning Alloy and experimenting with creating predicates for relations being injective and surjective. I tried this in Alloy using the following model:
sig A {}
sig B {}

pred injective(r: A -> B) {
  all disj a, a': r.B | no (a.r & a'.r)
}

pred inj {
  no r: A -> B | injective[r]
}

run inj for 8
However, I get this error on no r:
Analysis cannot be performed since it requires higher-order
quantification that could not be skolemized.
I've read the portions of Software Abstractions about skolemization and some other SO questions, but it's not clear to me what the issue is here. Can I fix this by rephrasing, or have I hit a fundamental limitation?
EDIT:
After some experimenting, the issue seems to be associated with the negation. Asking for some r: A -> B | injective[r] immediately produces an injective example. It makes conceptual sense that the negated version is a generally harder problem, but in a small scope the two seem like more or less isomorphic questions.
EDIT2:
Using the following model, I've found that Alloy gives me examples that also satisfy the commented-out predicate, the one that gives me the error.
sig A {}
sig B {}

pred injective(r: A -> B) {
  all disj a, a': r.B | no (a.r & a'.r)
}

pred surjective(r: A -> B) {
  B = A.r
}

pred function(f: A -> B) {
  all a: f.B | one a.f
}

pred inj {
  some s: A -> B | function[s] && surjective[s]
  // no r: A -> B | function[r] && injective[r]
}

run inj for 8
Think of it like this: every Alloy analysis involves model finding, in which the solution is a model (called an instance in Alloy) that maps names to relations. To show that no model exists at all for a formula, you can run it, and if Alloy finds no solution, then there is no model (in that scope). So if you run
some s: A -> B | function[s] && surjective[s]
and you get no solution, you know there is no surjective function in that scope.
But if you ask Alloy to find a model such that no relation exists with respect to that model, that's a very different question and requires a higher-order analysis, because the solver can't just find a solution: it needs to show that there is no extension of that solution satisfying the negated constraint. Your initial example is like that.
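To make the asymmetry concrete, here is a brute-force analogue in Python (purely illustrative; Alloy uses a SAT solver, not enumeration). Confirming "some r" can stop at the first witness, while confirming "no r" has to rule out every one of the 2^|A x B| candidate relations:

from itertools import chain, combinations

A = ['a0', 'a1']
B = ['b0', 'b1']
pairs = [(a, b) for a in A for b in B]

def relations():
    # every subset of A x B is a candidate relation
    return chain.from_iterable(combinations(pairs, n) for n in range(len(pairs) + 1))

def injective(r):
    # no two distinct domain atoms map to the same range atom
    images = [b for _, b in r]
    return len(images) == len(set(images))

# "some r | injective[r]": any() can stop at the first witness
# (here the empty relation, which is vacuously injective)
print(any(injective(r) for r in relations()))      # True
# "no r | injective[r]": to confirm this when it holds, you must
# exhaust all 2**len(pairs) candidates
print(not any(injective(r) for r in relations()))  # False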
So, yes, it's a fundamental limitation in the sense that higher-order logic is fundamentally less tractable. But whether that's a limitation for you in practice is another matter. What analysis do you actually need?
One compelling use of higher-order formulas is synthesis. A typical synthesis problem takes the form "find me a program structure P such that there is no counterexample to the specification S". This requires higher-order analysis, and it is the kind of problem that Alloy* was designed to solve.
But there's generally no reason to use Alloy* for standard verification/simulation problems.

Coefficients of Variation?

I have a list of values that increase exponentially. I was asked to compute multiple coefficients of variation (CVs) from them. You might agree with me that the CV is meant for the whole set of numbers, and that dividing the set into subgroups and calculating a CV for each subgroup seems unreasonable. Is there any statistical idea behind multiple CVs, and if there is, how can a histogram be made from the CVs? I mean, what would the bins of the histogram be? I appreciate the answers in advance.
I agree with you: it does not make sense to me to calculate multiple CVs for one dataset unless there's some inferential reason for doing so.
That being said, there might actually be a reason for considering sub-groups of a dataset. In the field of statistics, context is everything. My first thought is to ask your colleague why they want you to proceed that way. Maybe there's a good reason, maybe they don't have as full a grasp of stats as you do; regardless, it should be an enlightening conversation to have.
If you do decide to go this route, here's some R code that might help (R is great: flexible, powerful, and free).
# first, simulating some fake data (100 values of measurement & group for 10 groups)
x <- rnorm(100, mean=10, sd=1)
group <- sample(LETTERS[1:10], 100, replace=T)
# first few values of each
head(data.frame(x, group))
          x group
1 10.778480     F
2  9.274193     B
3  9.639143     G
4  9.080369     I
5 10.727895     D
6 10.850306     G
# this is the part you'd actually need...
# calculating the sd & avgs for each group
sds <- tapply(x, group, sd)
avgs <- tapply(x, group, mean)
# then the cv
cvs <- sds/avgs
cvs
         A          B          C          D          E          F          G          H          I          J
0.07859528 0.07570556 0.09370247 0.12552468 0.08897856 0.11044543 0.10947615 0.10323379 0.08908262 0.09729945
# and if you want a histogram, R makes it pretty easy
hist(cvs)

Nested if in GNU MathProg for an energy model

I have some code in GNU MathProg for an energy model:
s.t. EBa1_RateOfFuelProduction1{r in REGION, l in TIMESLICE, f in FUEL, t in TECHNOLOGY, m in MODE_OF_OPERATION, y in YEAR: OutputActivityRatio[r,t,f,m,y] <> 0}:
  RateOfActivity[r,l,t,m,y]*OutputActivityRatio[r,t,f,m,y] = RateOfProductionByTechnologyByMode[r,l,t,m,f,y];

s.t. EBa4_RateOfFuelUse1{r in REGION, l in TIMESLICE, f in FUEL, t in TECHNOLOGY, m in MODE_OF_OPERATION, y in YEAR: InputActivityRatio[r,t,f,m,y] <> 0}:
  RateOfActivity[r,l,t,m,y]*InputActivityRatio[r,t,f,m,y] = RateOfUseByTechnologyByMode[r,l,t,m,f,y];
I want to put these two constraints into one, and I am thinking of inserting two conditional expressions (if). The first if would apply to the technologies (t) and fuels (f) where OutputActivityRatio <> 0, and the second one would, for the same technology (t), go through the fuels (f) again to check whether InputActivityRatio <> 0.
Like this:
s.t. RateOfProduction{r in REGION, l in TIMESLICE, f in FUEL, t in TECHNOLOGY, m in MODE_OF_OPERATION, y in YEAR: OutputActivityRatio[r,t,f,m,y] <> 0}:
  RateOfActivity[r,l,t,m,y]*OutputActivityRatio[r,t,f,m,y] = RateOfProductionByTechnologyByMode[r,l,t,m,f,y]
  If InputActivityRatio[r,t,ff,m,y] <> 0 then
    RateOfActivity[r,l,t,m,y]*InputActivityRatio[r,t,f,m,y] = RateOfUseByTechnologyByMode[r,l,t,m,f,y]
  else 0
  else 0;
My question is: is it possible to have two ifs in series (nested ifs), with an equation between them as well? How can I write something like that?
Thank you very much!
As described in your other question (regarding nested if-then-else in MathProg), there are no if-then-else statements in MathProg. The workaround with conditional for-loops is no solution for your problem either, since you can only use those in pre- or post-processing of your data (you can't use them in your constraints!).
But there are still ways to merge your constraints. I think something like the following would work, if your condition is that either the input ratio or the output ratio is 0:
s.t. RateOfProduction{r in REGION, l in TIMESLICE, f in FUEL, t in TECHNOLOGY, m in MODE_OF_OPERATION, y in YEAR}:
  (RateOfActivity[r,l,t,m,y]*OutputActivityRatio[r,t,f,m,y])
  + (RateOfActivity[r,l,t,m,y]*InputActivityRatio[r,t,f,m,y])
  = RateOfProductionByTechnologyByMode[r,l,t,m,f,y];
Here, one of the two products in the left-hand-side sum would always turn zero.
Since I don't know which parts are variables and which are parameters, this solution could also fail (for example, it could be problematic if there is input and output at the same time and the rest of the model doesn't contain the right bounds for that).

Understanding Bayes' Theorem

I'm working on an implementation of a Naive Bayes classifier. Programming Collective Intelligence introduces this subject by describing Bayes' theorem as:
Pr(A | B) = Pr(B | A) x Pr(A)/Pr(B)
As well as a specific example relevant to document classification:
Pr(Category | Document) = Pr(Document | Category) x Pr(Category) / Pr(Document)
I was hoping someone could explain the notation used here. What do Pr(A | B) and Pr(A) mean? It looks like some sort of function, but then what does the pipe ("|") mean, etc.?
Pr(A | B) = Probability of A happening given that B has already happened
Pr(A) = Probability of A happening
But the above concerns the calculation of conditional probability. What you want is a classifier, which uses this principle to decide whether something belongs to a category based on prior probabilities.
See http://en.wikipedia.org/wiki/Naive_Bayes_classifier for a complete example
I think they've got you covered on the basics.
Pr(A | B) = Pr(B | A) x Pr(A)/Pr(B)
reads: the probability of A given B is the same as the probability of B given A times the probability of A divided by the probability of B. It's usually used when you can measure the probability of B and you are trying to figure out if B is leading us to believe in A. Or, in other words, we really care about A, but we can measure B more directly, so let's start with what we can measure.
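With made-up numbers: if Pr(B | A) = 0.9, Pr(A) = 0.01 and Pr(B) = 0.1, then Pr(A | B) = 0.9 x 0.01 / 0.1 = 0.09. Observing B multiplied the probability of A by nine, because B is nine times more common under A than in general.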
Let me give you one derivation that makes this easier for writing code. It comes from Judea Pearl. I struggled with this a little, but after I realized how Pearl helps us turn theory into code, the light turned on for me.
Prior Odds:
O(H) = P(H) / (1 - P(H))
Likelihood Ratio:
L(e|H) = P(e|H) / P(e|¬H)
Posterior Odds:
O(H|e) = L(e|H)O(H)
In English, we are saying that the odds of something you're interested in (H for hypothesis) are simply the number of times you find it to be true divided by the number of times you find it not to be true. So, say one house out of 10,000 is robbed on any given day. That means you have a 1/10,000 chance of being robbed, before any other evidence is considered.
The next one measures the evidence you're looking at: what is the probability of seeing the evidence you're seeing when your hypothesis is true, divided by the probability of seeing that evidence when your hypothesis is not true? Say you hear your burglar alarm go off. How often does the alarm go off when it's supposed to (someone opens a window while the alarm is on) versus when it's not supposed to (the wind sets it off)? If you have a 95% chance of a burglar setting off the alarm and a 1% chance of something else setting it off, then you have a likelihood ratio of 95.0.
Your overall belief is just the likelihood ratio * the prior odds. In this case it is:
((0.95/0.01) * ((10**-4)/(1 - (10**-4))))
# => 0.0095009500950095
I don't know if this makes it any clearer, but it tends to be easier to have some code that keeps track of prior odds, other code to look at likelihoods, and one more piece of code to combine this information.
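For example, a minimal sketch of that split (the function names are my own, not from Pearl):

def prior_odds(p):
    # O(H) = P(H) / (1 - P(H))
    return p / (1 - p)

def likelihood_ratio(p_e_given_h, p_e_given_not_h):
    # L(e|H) = P(e|H) / P(e|not H)
    return p_e_given_h / p_e_given_not_h

def posterior_probability(prior_p, p_e_given_h, p_e_given_not_h):
    # O(H|e) = L(e|H) * O(H), then convert the odds back to a probability
    o = likelihood_ratio(p_e_given_h, p_e_given_not_h) * prior_odds(prior_p)
    return o / (1 + o)

# the burglary example from above: 1/10,000 prior, 95% vs 1% alarm rates
print(posterior_probability(1e-4, 0.95, 0.01))  # ~0.0094, still under 1%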
I have implemented it in Python. It's very easy to understand because all the formulas for Bayes' theorem are in separate functions:
# Bayes' Theorem
def get_outcomes(sample_space, f_name='', e_name=''):
    outcomes = 0
    for e_k, e_v in sample_space.items():
        if f_name == '' or f_name == e_k:
            for se_k, se_v in e_v.items():
                if e_name != '' and se_k == e_name:
                    outcomes += se_v
                elif e_name == '':
                    outcomes += se_v
    return outcomes

def p(sample_space, f_name):
    return get_outcomes(sample_space, f_name) / get_outcomes(sample_space, '', '')

def p_inters(sample_space, f_name, e_name):
    return get_outcomes(sample_space, f_name, e_name) / get_outcomes(sample_space, '', '')

def p_conditional(sample_space, f_name, e_name):
    return p_inters(sample_space, f_name, e_name) / p(sample_space, f_name)

def bayes(sample_space, f, given_e):
    total = 0
    for e_k, e_v in sample_space.items():
        total += p(sample_space, e_k) * p_conditional(sample_space, e_k, given_e)
    return p(sample_space, f) * p_conditional(sample_space, f, given_e) / total

sample_space = {'UK': {'Boy': 10, 'Girl': 20},
                'FR': {'Boy': 10, 'Girl': 10},
                'CA': {'Boy': 10, 'Girl': 30}}

print('Probability of being from FR:', p(sample_space, 'FR'))
print('Probability to be French Boy:', p_inters(sample_space, 'FR', 'Boy'))
print('Probability of being a Boy given a person is from FR:', p_conditional(sample_space, 'FR', 'Boy'))
print('Probability to be from France given person is Boy:', bayes(sample_space, 'FR', 'Boy'))

sample_space = {'Grow' : {'Up': 160, 'Down': 40},
                'Slows': {'Up': 30, 'Down': 70}}

print('Probability economy is growing when stock is Up:', bayes(sample_space, 'Grow', 'Up'))
Pr(A | B): conditional probability of A, i.e. the probability of A given that all we know is B.
Pr(A): prior probability of A.
Pr is the probability, Pr(A|B) is the conditional probability.
Check Wikipedia for details.
The pipe (|) means "given".
The probability of A given B is equal to the probability of B given A times Pr(A)/Pr(B).
Based on your question, I would strongly advise reading an undergraduate book on probability theory first. Without that you will not advance properly with your Naive Bayes classifier task.
I would recommend this book (http://www.athenasc.com/probbook.html) or looking at MIT OpenCourseWare.
The pipe is used to represent conditional probability.
Pr(A | B) = Probability of A given B
Example:
Let's say you are not feeling well and you surf the web for your symptoms, and the internet tells you that if you have these symptoms then you have disease XYZ.
In this case:
Pr(A | B) is what you are trying to find out, which is:
The probability of you having XYZ GIVEN THAT you have certain symptoms.
Pr(A) is the probability of having the disease XYZ
Pr(B) is the probability of having those symptoms
Pr(B | A) is what you find out from the internet, which is:
The probability of having the symptoms GIVEN THAT you have the disease.
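Plugging in made-up numbers: say Pr(A) = 0.001 (one person in a thousand has XYZ), Pr(B | A) = 0.9 (the symptoms are very likely if you have XYZ), and Pr(B) = 0.05 (the symptoms are fairly common overall). Then Pr(A | B) = 0.9 x 0.001 / 0.05 = 0.018, so even with the symptoms you probably don't have XYZ.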
