P-values for two tailed binomial test exceed 1 - statistics

Say I want to test if a coin is fair.
An experiment is performed to determine whether a coin flip is fair (50% chance
of landing heads or tails) or unfairly biased, either toward heads (> 50% chance of
landing heads) or toward tails (< 50% chance of landing heads). Since we consider both biased alternatives, a two-tailed test is performed.
H0 = Coin is fair
H1 = Coin is unfair
Here is the experiment result: 10 Heads and 10 Tails
Then I calculate the probability of events assume H0 the coin is fair
The probability of exactly, or more than, 10 Heads out of 20 tosses is p = .588
By symmetry, the probability of exactly, or more than, 10 Tails out of 20 tosses is the same, .588
Thus, the p-value for the coin turning up the same face 10 times out of 20 total flips is .588 + .588 = 1.176 > 1
But p-value cannot be larger than 1, may I know what is wrong here?
Ref:
PROBABILITY VALUE (p-Value)
Binomial Test Calculator

Case of 10 Heads + 10 Tails is accounted for in both probabilities.
You can see that P(10T) = 0.176, and 0.588 + 0.588 = 1.176 = 1 + P(10T + 10H)

The general relationship for summing event probabilities is
P(A∪B) = P(A) + P(B) - P(A∩B)
Expressed in words, you can only use a simple sum of probabilities for disjoint events.
Your events are A = {#Heads ≥ 10} and B = {#Tails ≥ 10}. {#Tails ≥ 10} => {#Heads ≤ 10}, which makes it clear that the two events are not disjoint because they both include the outcome {#Heads = 10}. Your claim that the probability exceeds 1 fails because you've neglected that intersection term, which has P{#Heads = 10} = 0.176.

Related

If we rolled the die (6-sided) 1000 time, what is the range of times we'd expect to see a 1 rolled?

I got a question listed below about the confidence interval for rolling a die 1000 times. I'm assuming that the question is using Binomial Distribution but not sure if I'm correct. I guess in the solution, the probability 0.94 comes from 1-0.06. But I'm not sure if we need the probability in this interval, except it is only used for the Z-score, 1.88. Could I assume this question like this?
Question:
Assume that we are okay with accidentally rejecting H0​ 6% of the time, assuming H0​ is true.
If we rolled the die (6-sided) 1000 times, what is the range of times we'd expect to see a 1 rolled? (H0​ is the die is fair.)
Answer:
The interval is (144.50135805579743, 188.8319752775359), with probability = 0.94, mu = 166.67, sigma = 11.785113019775793
We can treat this as a binomial distribution with a success chance p of 1/6 and number of trials n = 1000.
Mean value of such a distribution is np, and variance is np(1-p). sigma (or std) is sqrt(variance).
However, finding the interval is not so trivial since it requires an inverse CDF. The solution apparently uses normal approximation (p is low, n is high) with a Z-score table (like https://www.math.arizona.edu/~rsims/ma464/standardnormaltable.pdf) thus range = mu +- 1.88 * sigma. Obviously, binomial is discrete, so there cannot be '145.5 times' of rolling 1. scipy.stats.binom.ppf(0.97, 1000, 1/6) and scipy.stats.binom.ppf(0.03, 1000, 1/6) yield a sane 145..189 range.

Calculating a custom probability distribution in python (numerically)

I have a custom (discrete) probability distribution defined somewhat in the form: f(x)/(sum(f(x')) for x' in a given discrete set X). Also, 0<=x<=1.
So I have been trying to implement it in python 3.8.2, and the problem is that the numerator and denominator both come out to be really small and python's floating point representation just takes them as 0.0.
After calculating these probabilities, I need to sample a random element from an array, whose each index may be selected with the corresponding probability in the distribution. So if my distribution is [p1,p2,p3,p4], and my array is [a1,a2,a3,a4], then probability of selecting a2 is p2 and so on.
So how can I implement this in an elegant and efficient way?
Is there any way I could use the np.random.beta() in this case? Since the difference between the beta distribution and my actual distribution is only that the normalization constant differs and the domain is restricted to a few points.
Note: The Probability Mass function defined above is actually in the form given by the Bayes theorem and f(x)=x^s*(1-x)^f, where s and f are fixed numbers for a given iteration. So the exact problem is that, when s or f become really large, this thing goes to 0.
You could well compute things by working with logs. The point is that while both the numerator and denominator might underflow to 0, their logs won't unless your numbers are really astonishingly small.
You say
f(x) = x^s*(1-x)^t
so
logf (x) = s*log(x) + t*log(1-x)
and you want to compute, say
p = f(x) / Sum{ y in X | f(y)}
so
p = exp( logf(x) - log sum { y in X | f(y)}
= exp( logf(x) - log sum { y in X | exp( logf( y))}
The only difficulty is in computing the second term, but this is a common problem, for example here
On the other hand computing logsumexp is easy enough to to by hand.
We want
S = log( sum{ i | exp(l[i])})
if L is the maximum of the l[i] then
S = log( exp(L)*sum{ i | exp(l[i]-L)})
= L + log( sum{ i | exp( l[i]-L)})
The last sum can be computed as written, because each term is now between 0 and 1 so there is no danger of overflow, and one of the terms (the one for which l[i]==L) is 1, and so if other terms underflow, that is harmless.
This may however lose a little accuracy. A refinement would be to recognize the set A of indices where
l[i]>=L-eps (eps a user set parameter, eg 1)
And then compute
N = Sum{ i in A | exp(l[i]-L)}
B = log1p( Sum{ i not in A | exp(l[i]-L)}/N)
S = L + log( N) + B

How do you find the sample space of flipping unfair coins?

So, usually for unbiased coins, the probability of getting 2 heads out of 3 flips is - 3C2 * 1/2 * 1/2 * 1/2 = 3/8, since we know, the formula for probability is likely events divided by all possible events; we can say that there are 8 possible events here.
Now flip an unbiased coin with the probability of getting heads 80% of the time,
so the probability of getting 2 heads out of 3 flips is -
3C2 * 0.8 * 0.8 * 0.2 = 3/7.8125, so is the sample space 7.8125 here ?
It is still 8. 8 possible results. It's all about classical definition of probability.
In first example (p=50%) each possible result (for example {head, head, not_head}) has the same probability, that's why we can calculate
**total_prob = count_success/count_total = 3*1.000/8 = 0.375**
In the second (p=80%) we don't have this assumption anymore, so cannot use classical definition of probability (count_success/count_total), so we have to calculate
**total_prob = sum_success/count_total = 3*1.024/8 = 0.384**
In general, You can imagine, that in 1st example each result has weight=1.000, and in 2nd example results have different weights (for example {head, head, not_head} has weight=1.024 and {not_head, not_head, not_head} has weight=0.064)

Analyzing the time complexity of Coin changing

We're doing the classic problem of determining the number of ways that we can make change that amounts to Z given a set of coins.
For example, Amount=5 and Coins={1, 2, 3}. One way we can make 5 is {2, 3}.
The naive recursive solution has a time complexity of factorial time.
f(n) = n * f(n-1) = n!
My professor argued that it actually has a time complexity of O(2^n), because we only choose to use a coin or not. That intuitively makes sense. However how come my recurence doesn't work out to be O(2^n)?
EDIT:
My recurrence is as follows:
f(5, {1, 2, 3})
/ \ .....
f(4, {2, 3}) f(3, {1, 3}) .....
Notice how the branching factor decreases by 1 at every step.
Formally.
T(n) = n*F(n-1) = n!
The recurrence doesn't work out to what you expect it to work out to because it doesn't reflect the number of operations made by the algorithm.
If the algorithm decides for each coin whether to output it or not, then you can model its time complexity with the recurrence T(n) = 2*T(n-1) + O(1) with T(1)=O(1); the intuition is that for each coin you have two options---output the coin or not; this obviously solves to T(n)=O(2^n).
I too was trying to analyze the time complexity for the brute force which performs depth first search:
def countCombinations(coins, n, amount, k=0):
if amount == 0:
return 1
res = 0
for i in range(k, n):
if coins[k] <= amount:
remaining_amount = amount - coins[i] # considering this coin, try for remaining sum
# in next round include this coin too
res += countCombinations(coins, n, remaining_amount, i)
return res
but we can see that the coins which are used in one round is used again in the next round, so at least for 1st coin we have n items at each stage which is equivalent to permutation with repetition n^r for n items available to arrange into r positions at each stage.
ex: [1, 1, 1, 1]; sum = 4
This will generate a recursive tree where for first path we literally have solutions at each diverged subpath until we have the sum=0. so the time complexity is O(sum^n) ie for each stage in the path towards sum we have n different subpaths.
Note however there is another algorithm which uses take/not-take approach and at most there is 2 branch at a node in recursion tree. Hence the time complexity for this algorithm is O(2^(n*m))
ex: say coins = [1, 1] sum = 2 there are 11 nodes/points to visit in the recursion tree for 6 paths(leaves) then complexity is at most 2^(2*2) => 2^4 => 16 (Hence 11 nodes visiting for a max of 16 possibility is correct but little loose on upper bound).
def get_count(coins, n, sum):
if(n == 0): # no coins left, to try a combination that matches the sum
return 0
if(sum == 0): # no more sum left to match, means that we have completely co-incided with our trial
return 1 # (return success)
# don't-include the last coin in the sum calc so, leave it and try rest
excluded = get_count(coins, n-1, sum)
included = 0
if(coins[n-1] <= sum):
# include the last coin in the sum calc, so reduce by its quantity in the sum
# we assume here that n is constant ie, it is supplied in unlimited(we can choose same coin again and again),
included = get_count(coins, n, sum-coins[n-1])
return included+excluded

How to avoid impression bias when calculate the ctr?

When we train a ctr(click through rate) model, sometimes we need calcute the real ctr from the history data, like this
#(click)
ctr = ----------------
#(impressions)
We know that, if the number of impressions is too small, the calculted ctr is not real. So we always set a threshold to filter out the large enough impressions.
But we know that the higher impressions, the higher confidence for the ctr. Then my question is that: Is there a impressions-normalized statistic method to calculate the ctr?
Thanks!
You probably need a representation of confidence interval for your estimated ctr. Wilson score interval is a good one to try.
You need below stats to calculate the confidence score:
\hat p is the observed ctr (fraction of #clicked vs #impressions)
n is the total number of impressions
zα/2 is the (1-α/2) quantile of the standard normal distribution
A simple implementation in python is shown below, I use z(1-α/2)=1.96 which corresponds to a 95% confidence interval. I attached 3 test results at the end of the code.
# clicks # impressions # conf interval
2 10 (0.07, 0.45)
20 100 (0.14, 0.27)
200 1000 (0.18, 0.22)
Now you can set up some threshold to use the calculated confidence interval.
from math import sqrt
def confidence(clicks, impressions):
n = impressions
if n == 0: return 0
z = 1.96 #1.96 -> 95% confidence
phat = float(clicks) / n
denorm = 1. + (z*z/n)
enum1 = phat + z*z/(2*n)
enum2 = z * sqrt(phat*(1-phat)/n + z*z/(4*n*n))
return (enum1-enum2)/denorm, (enum1+enum2)/denorm
def wilson(clicks, impressions):
if impressions == 0:
return 0
else:
return confidence(clicks, impressions)
if __name__ == '__main__':
print wilson(2,10)
print wilson(20,100)
print wilson(200,1000)
"""
--------------------
results:
(0.07048879557839793, 0.4518041980521754)
(0.14384999046998084, 0.27112660859398174)
(0.1805388068716823, 0.22099327100894336)
"""
If you treat this as a binomial parameter, you can do Bayesian estimation. If your prior on ctr is uniform (a Beta distribution with parameters (1,1)) then your posterior is Beta(1+#click, 1+#impressions-#click). Your posterior mean is #click+1 / #impressions+2 if you want a single summary statistic of this posterior, but you probably don't, and here's why:
I don't know what your method for determining whether ctr is high enough, but let's say you're interested in everything with ctr > 0.9. You can then use the cumulative density function of the beta distribution to look at what proportion of probability mass is over the 0.9 threshold (this will just be 1 - the cdf at 0.9). In this way, your threshold will naturally incorporate uncertainty about the estimate because of limited sample size.
There are many ways to calculate this confidence interval. An alternative to the Wilson Score is the Clopper-Perrson interval, which I found useful in spreadsheets.
Upper Bound Equation
Lower Bound Equation
Where
B() is the the Inverse Beta Distribution
alpha is the confidence level error (e.g for 95% confidence-level, alpha is 5%)
n is the number of samples (e.g. impressions)
x is the number of successes (e.g. clicks)
In Excel an implementation for B() is provided by the BETA.INV formula.
There is no equivalent formula for B() in Google Sheets, but a Google Apps Script custom function can be adapted from the JavaScript Statistical Library (e.g search github for jstat)

Resources