Calculate probability of an event not by exclusion - statistics

I have a doubt about this kind of problem. For example:
"If we asked 20,000 people in a stadium to toss a coin 10 times, what is the probability of at least one person getting 10 heads?"
I took this example from Practical Statistics for Data Scientists.
The probability of at least one person getting 10 heads is calculated as: 1 - P(nobody in the stadium getting 10 heads).
So we are using a kind of exclusion procedure here: first I compute the probability of the opposite event, not of the actual event I want to measure (at least one person getting 10 heads).
Why do we do it this way?
How can I calculate the probability of at least someone getting 10 heads but without passing through the probability of no one getting 10 heads?

As @Robert Dodier mentioned in the comments, the reason is that the calculations are simpler. I will use a stadium of 20 people instead of 20000 as an example:
Method 1:
Probability of not getting 10 heads for one individual
= 1 - probability of getting 10 heads
= 1 - 10!/(10!0!)*0.5^10*(1-0.5)^0
= 0.9990234375
Probability of at least one person in the stadium getting 10 heads
= 1 - P(of nobody in the stadium getting 10 heads)
= 1 - 0.9990234375^20 (because all coin tosses are independent)
= 0.019351109194852834
Method 2:
Probability of getting 10 heads for one individual
= 10!/(10!0!)*0.5^10*(1-0.5)^0
= 0.0009765625
Probability of exactly 1, 2, 3, etc. persons in the stadium getting 10 heads:
p1 = 20!/(1!19!)*0.0009765625^1*(1-0.0009765625)^(20-1) = 0.019172021325613825
p2 = 20!/(2!18!)*0.0009765625^2*(1-0.0009765625)^(20-2) = 0.00017803929872270904
p3 = 20!/(3!17!)*0.0009765625^3*(1-0.0009765625)^(20-3) = 1.0442187608370032e-06
p4 = 20!/(4!16!)*0.0009765625^4*(1-0.0009765625)^(20-4) = 4.338152232216289e-09
p5 = 20!/(5!15!)*0.0009765625^5*(1-0.0009765625)^(20-5) = 1.3569977656981548e-11
p6 = 20!/(6!14!)*0.0009765625^6*(1-0.0009765625)^(20-6) = 3.316221323798032e-14
p7 = 20!/(7!13!)*0.0009765625^7*(1-0.0009765625)^(20-7) = 6.483326146232712e-17
p8 = 20!/(8!12!)*0.0009765625^8*(1-0.0009765625)^(20-8) = 1.029853859983202e-19
p9 = 20!/(9!11!)*0.0009765625^9*(1-0.0009765625)^(20-9) = 1.342266353839299e-22
p10 = 20!/(10!10!)*0.0009765625^10*(1-0.0009765625)^(20-10) = 1.443297154665913e-25
p11 = 20!/(11!9!)*0.0009765625^11*(1-0.0009765625)^(20-11) = 1.2825887804726853e-28
p12 = 20!/(12!8!)*0.0009765625^12*(1-0.0009765625)^(20-12) = 9.403143551852531e-32
p13 = 20!/(13!7!)*0.0009765625^13*(1-0.0009765625)^(20-13) = 5.656451493707817e-35
p14 = 20!/(14!6!)*0.0009765625^14*(1-0.0009765625)^(20-14) = 2.7646390487330485e-38
p15 = 20!/(15!5!)*0.0009765625^15*(1-0.0009765625)^(20-15) = 1.0809927854283668e-41
p16 = 20!/(16!4!)*0.0009765625^16*(1-0.0009765625)^(20-16) = 3.3021529369146104e-45
p17 = 20!/(17!3!)*0.0009765625^17*(1-0.0009765625)^(20-17) = 7.59508466888531e-49
p18 = 20!/(18!2!)*0.0009765625^18*(1-0.0009765625)^(20-18) = 1.2373875315877011e-52
p19 = 20!/(19!1!)*0.0009765625^19*(1-0.0009765625)^(20-19) = 1.2732289258503896e-56
p20 = 20!/(20!0!)*0.0009765625^20*(1-0.0009765625)^(20-20) = 6.223015277861142e-61
Probability of at least one person in the stadium getting 10 heads
= p1 + p2 + p3 + p4 + p5 + p6 + p7 + p8 + p9 + p10 +
p11 + p12 + p13 + p14 + p15 + p16 + p17 + p18 + p19 + p20
= 0.01935110919485281
So the result is the same (the tiny difference is due to floating point precision), but as you can see the first calculation is slightly simpler for 20 people, never mind for 20000 ;)
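Both methods can be reproduced in a few lines of Python. This is only a minimal sketch: the function names are made up for illustration, and it assumes fair, independent coin tosses as in the calculation above.

```python
from math import comb

def p_at_least_one_complement(n_people, n_tosses=10):
    """Method 1: 1 - P(nobody gets all heads)."""
    p = 0.5 ** n_tosses                      # one person gets all heads
    return 1 - (1 - p) ** n_people

def p_at_least_one_direct(n_people, n_tosses=10):
    """Method 2: sum of P(exactly k people get all heads) for k = 1..n."""
    p = 0.5 ** n_tosses
    return sum(
        comb(n_people, k) * p**k * (1 - p) ** (n_people - k)
        for k in range(1, n_people + 1)
    )

print(p_at_least_one_complement(20))   # Method 1, 20 people
print(p_at_least_one_direct(20))       # Method 2, 20 people
print(p_at_least_one_complement(20000))
```

For 20000 people the complement form is still a single expression, while the direct sum would need 20000 terms with very large binomial coefficients.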

Related

Both bayes_R2 and loo_R2 get weird estimate 1

I ran a brms model on a dataset containing almost 11000 species using the following command.
fit <- brm(formula =brl ~ LH + (1|species),
data = data_,
cov_ranef = list(species=phyloMat),
family = gaussian(),
save_all_pars = T,
chains = 2,
cores = 10,
backend = "cmdstanr",
threads = threading(15)
)
Then, we used bayes_R2 and loo_R2 methods to calculate the r2. However, we got a weird result: R2 estimates were 1 in both cases.
bayes_r2 = bayes_R2(fit)
loo_r2 = loo_R2(fit)
> print(bayes_r2)
Estimate Est.Error Q2.5 Q97.5
R2 1 2.769589e-09 1 1
> print(loo_r2)
Estimate Est.Error Q2.5 Q97.5
R2 1 1.131317e-10 1 1
Can you help us infer what caused this problem?
Thank you very much!

Round up a decimal

I have a problem and I need your help!
Here is the code:
kg_lemons = float(input())
kg_sugar = float(input())
water = float(input())
total_lemon_juice = kg_lemons * 980 #in mililiters need to multiply by 1000
total_lemonade = total_lemon_juice + 5 * 1000 + (0.3 * kg_sugar)
cups_made = total_lemonade / 150
money_made = cups_made * 1.20
print(f'All cups sold: {cups_made:.2f}')
print(f'Money earned: {money_made:.2f}')
At the end, after printing, it must show these numbers:
All cups sold: 66
Money earned: 79.20
But I got:
All cups sold: 66.01
Money earned: 79.21
So I need to round the result down (to the lower number). Should I use math.floor and, if so, how?
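One possible approach, as a minimal sketch: floor the number of cups before computing the money, so that only whole cups are counted. The function name and the inputs 5, 5, 5 are assumptions (they happen to reproduce the 66.01 shown above), and the sketch assumes the hard-coded 5 * 1000 was meant to be the water input in liters.

```python
import math

def lemonade(kg_lemons, kg_sugar, water_l):
    total_lemon_juice = kg_lemons * 980                 # milliliters of juice
    # assumption: water is given in liters, so convert to milliliters
    total_lemonade = total_lemon_juice + water_l * 1000 + 0.3 * kg_sugar
    cups_made = math.floor(total_lemonade / 150)        # only whole cups count
    money_made = cups_made * 1.20
    return cups_made, money_made

cups, money = lemonade(5, 5, 5)      # assumed inputs
print(f'All cups sold: {cups}')
print(f'Money earned: {money:.2f}')
```

Because the money is computed from the floored cup count, it no longer includes the partial cup.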

The concise 'how many beers' issue?

Where I'm at
I'm trying to figure out how many beers I can buy with 10 RMB after recycling every bottle I get. It's obvious to me that I'm doing something wrong, procedurally, but it's not occurring to me what that is. I'm currently reading "How To Think Like a Computer Scientist: Think Python" on chapter 9. I feel like this should be an easy program for me, but I'm not sure how to loop in the recycling portion of the app. What would be the most concise way to rinse and repeat beer purchases?
The question
Basically, one beer costs 2 RMB. 2 bins gets 1 RMB. 4 caps gets 1 RMB. I'm starting out with 10 RMB. How many beers can I buy (recycling all the bins and caps)?
#5 bottles 5 caps
#= 3 rmb + 1 caps 1 bottles
#6th bottle bought
#= 2rmb + 2 caps
#7th bottle bought
#= 0rmb + 3 caps 1 bottles.
import math
def countbeers(rmb):
    beers = 0
    caps = 0
    bins = 0
    bcost = 2
    for i in range(0, rmb):
        beers += 1/2
    for i in range(0, math.floor(beers)):
        caps += 1
        bins += 1
        rmb = rmb - bcost
    for i in range(0, caps):
        rmb += 1/4
    for i in range(0, bins):
        rmb += 1/2
    # if rmb > 2 what goes here, trying to loop back through
    return beers
print(countbeers(10))
Second attempt
#5 bottles 5 caps
#= 3 wallet + 1 caps 1 bottles
#6th bottle bought
#= 2wallet + 2 caps
#7th bottle bought
#= 0wallet + 3 caps 1 bottles.
import math
global beers
global caps
global bins
global bcost
beers = 0
caps = 0
bins = 0
bcost = 2
def buybeers(wallet):
    beers = 0
    for i in range(0, wallet):
        beers += 1/2
        wallet -= 2
    return beers
def drinkbeers(beers):
    for i in range(0, math.floor(beers)):
        caps += 1
        bins += 1
        wallet = wallet - bcost
    return wallet, caps, bins
def recycle(caps, bins):
    for i in range(0, caps):
        wallet += 1/4
    for i in range(0, bins):
        wallet += 1/2
    return wallet
def maxbeers(wallet):
    if wallet > 2:
        buybeers(wallet)
    if math.floor(beers) > 1:
        drinkbeers(beers)
    if caps > 4 | bins > 2:
        recycle(caps, bins)
    return wallet
wallet = int(input("How many wallet do you have?"))
maxbeers(wallet)
if wallet >= 2:
    maxbeers(wallet)
elif wallet < 2:
    print(beers)
Your main problem is that you are not looping. Every beer you buy gives you one more bottle and one more cap. This new bottle and cap might be enough to earn you another rmb, which might be enough for another beer. Your implementation handles this to a limited extent, since you call maxbeers multiple times, but it will not give the correct answer for a large amount of money, e.g. 25656 rmb.
If you know the number of rmb you have, you can do the calculation by hand on paper and write this:
def maxbeers(rmb):
    return 7  # totally correct, I promise. Checked this by hand.
but that's no fun. What if rmb is 25656?
Assuming we can exchange:
2 bottles -> 1 rmb
4 caps -> 1 rmb
2 rmb -> 1 beer + 1 cap + 1 bottle
we calculate it like this, through simulation:
def q(rmb):
    beers = 0
    caps = 0
    bottles = 0
    # a beer costs 2 rmb, so stop once fewer than 2 rmb are left
    while rmb >= 2:
        # buy a beer with rmb
        rmb -= 2
        beers += 1
        caps += 1
        bottles += 1
        # exchange all caps for rmb
        while caps >= 4:
            rmb += 1
            caps -= 4
        # exchange all bottles for rmb
        while bottles >= 2:
            rmb += 1
            bottles -= 2
    return beers

for a in range(20):
    print("rmb:", a, "beers:", q(a))
print("rmb:", 25656, "beers:", q(25656))
With 25656 rmb, we can then buy 20523 beers.

Calculate cubic bezier T value where tangent is perpendicular to anchor line

Project a cubic bezier p1,p2,p3,p4 onto the line p1,p4. When p2 or p3 does not project onto the line segment between p1 and p4, the curve will bulge out from the anchor points. Is there a way to calculate the T value where the tangent of the curve is perpendicular to the anchor line?
This could also be stated as finding the T values where the projected curve is farthest from the center of the line segment p1,p4. When p2 and p3 project onto the line segment, then the solutions are 0 and 1 respectively. Is there an equation for solving the more interesting case?
The T value seems to depend only on the distance of the mapped control points from the anchor line segment.
I can determine the value by refining guesses, but I am hoping there is a better way.
Edit:
Starting with p1,..,p4 in 2d with values x1,y1, ..., x4,y4 I use the following code based on the answer from Philippe:
dx = x4 - x1;
dy = y4 - y1;
d2 = dx*dx + dy*dy;
p1 = ( (x2-x1)*dx + (y2-y1)*dy ) / d2;
p2 = ( (x3-x1)*dx + (y3-y1)*dy ) / d2;
tr = sqrt( p1*p1 - p1*p2 - p1 + p2*p2 );
t1 = ( 2*p1 - p2 - tr ) / ( 3*p1 - 3*p2 + 1 );
t2 = ( 2*p1 - p2 + tr ) / ( 3*p1 - 3*p2 + 1 );
In the sample I looked at, t2 had to be subtracted from 1.0 before it was correct.
Let's assume you got a 1D cubic Bézier curve with P0 = 0 and P3 = 1 then the curve is:
P(t) = b0,3(t)*0 + b1,3(t)*P1 + b2,3(t)*P2 + b3,3(t)*1
Where bi,3(t) are the Bernstein polynomials of degree 3. Then we're looking for the value of t where this P(t) is minimal and maximal, so we derive:
P'(t) = b1,3'(t)*P1 + b2,3'(t)*P2 + b3,3'(t)
= (3 - 12t + 9t^2)*P1 + (6t - 9t^2)*P2 + 3t^2
= 0
This has a closed-form but nontrivial solution. According to WolframAlpha, when 3P1 - 3P2 +1 != 0 it's:
t = [2*P1 - P2 +/- sqrt(P1^2-P1*P2-P1+P2^2)] / (3*P1 - 3*P2 + 1)
Otherwise it's:
t = 3P1 / (6P1 - 2)
For a general n-dimensional cubic Bézier P0*, P1*, P2*, P3* compute:
P1 = proj(P1*, P03*) / |P3* - P0*|
P2 = proj(P2*, P03*) / |P3* - P0*|
Where proj(P, P03*) is the signed distance from P0* to the point P projected on the line passing through P0* and P3*.
(I haven't checked this, so please confirm there is nothing wrong in my reasoning.)
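The formulas above can be checked numerically with a short Python sketch. The function names are hypothetical, and the code solves the quadratic obtained by dividing P'(t) by 3, i.e. (3*P1 - 3*P2 + 1)*t^2 + (2*P2 - 4*P1)*t + P1 = 0, which is equivalent to the closed form quoted from WolframAlpha.

```python
import math

def project_onto_anchor(p0, p, p3):
    """Position of p along the line p0 -> p3 (0 at p0, 1 at p3)."""
    dx, dy = p3[0] - p0[0], p3[1] - p0[1]
    return ((p[0] - p0[0]) * dx + (p[1] - p0[1]) * dy) / (dx * dx + dy * dy)

def extreme_t_values(P1, P2, eps=1e-12):
    """Real roots of (3*P1 - 3*P2 + 1)*t^2 + (2*P2 - 4*P1)*t + P1 = 0."""
    a = 3 * P1 - 3 * P2 + 1
    b = 2 * P2 - 4 * P1
    c = P1
    if abs(a) < eps:
        # degenerate case: the equation is linear (or has no solution)
        return [] if abs(b) < eps else [-c / b]
    disc = b * b - 4 * a * c
    if disc < 0:
        return []
    r = math.sqrt(disc)
    return sorted([(-b - r) / (2 * a), (-b + r) / (2 * a)])

# Example: both control points project outside the anchor segment
p0, p1, p2, p3 = (0, 0), (-1, 1), (2, 1), (1, 0)
P1 = project_onto_anchor(p0, p1, p3)
P2 = project_onto_anchor(p0, p2, p3)
print(extreme_t_values(P1, P2))
```

The sketch does not filter the roots, so a caller interested only in the curve itself would still need to discard values outside the range 0 to 1.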

How do I convert the 2 control points of a cubic curve to the single control point of a quadratic curve?

Having searched the web, I see various people in various forums alluding to approximating a cubic curve with a quadratic one. But I can't find the formula.
What I want is this:
input: startX, startY, control1X, control1Y, control2X, control2Y, endX, endY
output: startX, startY, controlX, controlY, endX, endY
Actually, since the starting and ending points will be the same, all I really need is...
input: startX, startY, control1X, control1Y, control2X, control2Y, endX, endY
output: controlX, controlY
As mentioned, going from 4 control points to 3 is normally going to be an approximation. There's only one case where it will be exact - when the cubic bezier curve is actually a degree-elevated quadratic bezier curve.
You can use the degree elevation equations to come up with an approximation. It's simple, and the results are usually pretty good.
Let's call the control points of the cubic Q0..Q3 and the control points of the quadratic P0..P2. Then for degree elevation, the equations are:
Q0 = P0
Q1 = 1/3 P0 + 2/3 P1
Q2 = 2/3 P1 + 1/3 P2
Q3 = P2
In your case you have Q0..Q3 and you're solving for P0..P2. There are two ways to compute P1 from the equations above:
P1 = 3/2 Q1 - 1/2 Q0
P1 = 3/2 Q2 - 1/2 Q3
If this is a degree-elevated cubic, then both equations will give the same answer for P1. Since it's likely not, your best bet is to average them. So,
P1 = -1/4 Q0 + 3/4 Q1 + 3/4 Q2 - 1/4 Q3
To translate to your terms:
controlX = -0.25*startX + .75*control1X + .75*control2X -0.25*endX
Y is computed similarly - the dimensions are independent, so this works for 3d (or n-d).
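As a small Python sketch of the averaged formula (the function name is made up for illustration), applied per coordinate so it works in any dimension. The example degree-elevates a known quadratic and checks that its control point is recovered.

```python
def cubic_to_quadratic_control(q0, q1, q2, q3):
    """P1 = -1/4 Q0 + 3/4 Q1 + 3/4 Q2 - 1/4 Q3, applied per coordinate."""
    return tuple(
        -0.25 * a + 0.75 * b + 0.75 * c - 0.25 * d
        for a, b, c, d in zip(q0, q1, q2, q3)
    )

# Degree-elevate the quadratic P0=(0,0), P1=(1,2), P2=(2,0)
q0, q3 = (0.0, 0.0), (2.0, 0.0)
q1 = (2 / 3 * 1.0, 2 / 3 * 2.0)                 # 1/3 P0 + 2/3 P1
q2 = (2 / 3 * 1.0 + 1 / 3 * 2.0, 2 / 3 * 2.0)   # 2/3 P1 + 1/3 P2
control = cubic_to_quadratic_control(q0, q1, q2, q3)
print(control)   # approximately (1.0, 2.0)
```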
This will be an approximation. If you need a better approximation, one way to get it is by subdividing the initial cubic using the de Casteljau algorithm, and then degree-reducing each segment. If you need better continuity, there are other approximation methods that are less quick and dirty.
The cubic can have loops and cusps, which a quadratic cannot have. This means that there is almost never a simple solution. If the cubic is already a quadratic, then the simple solution exists. Normally you have to divide the cubic into parts that can be represented as quadratics, and you have to decide which critical points to subdivide at.
http://fontforge.org/bezier.html#ps2ttf says:
"Other sources I have read on the net suggest checking the cubic spline for points of inflection (which quadratic splines cannot have) and forcing breaks there. To my eye this actually makes the result worse, it uses more points and the approximation does not look as close as it does when ignoring the points of inflection. So I ignore them."
This is true: the inflection points (found from the second derivative of the cubic) are not enough. But if you also take into account the local extremes (min, max), which are found from the first derivative of the cubic, and force breaks at all of them, then the sub-curves should all be representable by quadratics.
I tested the functions below and they work as expected (they find all critical points of the cubic and divide the cubic into down-elevated cubics). When those sub-curves are drawn as cubics, the curve is exactly the same as the original cubic, but for some reason, when the sub-curves are drawn as quadratics, the result is nearly right, but not exactly.
So this answer is not a strict solution to the problem, but these functions provide a starting point for cubic-to-quadratic conversion.
To find both local extremes and inflection points, the following get_t_values_of_critical_points() should provide them.
function compare_num(a,b) {
if (a < b) return -1;
if (a > b) return 1;
return 0;
}
function find_inflection_points(p1x,p1y,p2x,p2y,p3x,p3y,p4x,p4y)
{
var ax = -p1x + 3*p2x - 3*p3x + p4x;
var bx = 3*p1x - 6*p2x + 3*p3x;
var cx = -3*p1x + 3*p2x;
var ay = -p1y + 3*p2y - 3*p3y + p4y;
var by = 3*p1y - 6*p2y + 3*p3y;
var cy = -3*p1y + 3*p2y;
var a = 3*(ay*bx-ax*by);
var b = 3*(ay*cx-ax*cy);
var c = by*cx-bx*cy;
var r2 = b*b - 4*a*c;
var firstIfp = 0;
var secondIfp = 0;
if (r2>=0 && a!==0)
{
var r = Math.sqrt(r2);
firstIfp = (-b + r) / (2*a);
secondIfp = (-b - r) / (2*a);
if ((firstIfp>0 && firstIfp<1) && (secondIfp>0 && secondIfp<1))
{
if (firstIfp>secondIfp)
{
var tmp = firstIfp;
firstIfp = secondIfp;
secondIfp = tmp;
}
if (secondIfp-firstIfp >0.00001)
return [firstIfp, secondIfp];
else return [firstIfp];
}
else if (firstIfp>0 && firstIfp<1)
return [firstIfp];
else if (secondIfp>0 && secondIfp<1)
{
firstIfp = secondIfp;
return [firstIfp];
}
return [];
}
else return [];
}
function get_t_values_of_critical_points(p1x, p1y, c1x, c1y, c2x, c2y, p2x, p2y) {
var a = (c2x - 2 * c1x + p1x) - (p2x - 2 * c2x + c1x),
b = 2 * (c1x - p1x) - 2 * (c2x - c1x),
c = p1x - c1x,
t1 = (-b + Math.sqrt(b * b - 4 * a * c)) / 2 / a,
t2 = (-b - Math.sqrt(b * b - 4 * a * c)) / 2 / a,
tvalues=[];
Math.abs(t1) > 1e12 && (t1 = 0.5);
Math.abs(t2) > 1e12 && (t2 = 0.5);
if (t1 >= 0 && t1 <= 1 && tvalues.indexOf(t1)==-1) tvalues.push(t1);
if (t2 >= 0 && t2 <= 1 && tvalues.indexOf(t2)==-1) tvalues.push(t2);
a = (c2y - 2 * c1y + p1y) - (p2y - 2 * c2y + c1y);
b = 2 * (c1y - p1y) - 2 * (c2y - c1y);
c = p1y - c1y;
t1 = (-b + Math.sqrt(b * b - 4 * a * c)) / 2 / a;
t2 = (-b - Math.sqrt(b * b - 4 * a * c)) / 2 / a;
Math.abs(t1) > 1e12 && (t1 = 0.5);
Math.abs(t2) > 1e12 && (t2 = 0.5);
if (t1 >= 0 && t1 <= 1 && tvalues.indexOf(t1)==-1) tvalues.push(t1);
if (t2 >= 0 && t2 <= 1 && tvalues.indexOf(t2)==-1) tvalues.push(t2);
var inflectionpoints = find_inflection_points(p1x, p1y, c1x, c1y, c2x, c2y, p2x, p2y);
if (inflectionpoints[0]) tvalues.push(inflectionpoints[0]);
if (inflectionpoints[1]) tvalues.push(inflectionpoints[1]);
tvalues.sort(compare_num);
return tvalues;
};
And when you have those critical t values (which lie in the range 0-1), you can divide the cubic into parts:
function CPoint()
{
var arg = arguments;
if (arg.length==1)
{
this.X = arg[0].X;
this.Y = arg[0].Y;
}
else if (arg.length==2)
{
this.X = arg[0];
this.Y = arg[1];
}
}
function subdivide_cubic_to_cubics()
{
var arg = arguments;
if (arg.length!=9) return [];
var m_p1 = {X:arg[0], Y:arg[1]};
var m_p2 = {X:arg[2], Y:arg[3]};
var m_p3 = {X:arg[4], Y:arg[5]};
var m_p4 = {X:arg[6], Y:arg[7]};
var t = arg[8];
var p1p = new CPoint(m_p1.X + (m_p2.X - m_p1.X) * t,
m_p1.Y + (m_p2.Y - m_p1.Y) * t);
var p2p = new CPoint(m_p2.X + (m_p3.X - m_p2.X) * t,
m_p2.Y + (m_p3.Y - m_p2.Y) * t);
var p3p = new CPoint(m_p3.X + (m_p4.X - m_p3.X) * t,
m_p3.Y + (m_p4.Y - m_p3.Y) * t);
var p1d = new CPoint(p1p.X + (p2p.X - p1p.X) * t,
p1p.Y + (p2p.Y - p1p.Y) * t);
var p2d = new CPoint(p2p.X + (p3p.X - p2p.X) * t,
p2p.Y + (p3p.Y - p2p.Y) * t);
var p1t = new CPoint(p1d.X + (p2d.X - p1d.X) * t,
p1d.Y + (p2d.Y - p1d.Y) * t);
return [[m_p1.X, m_p1.Y, p1p.X, p1p.Y, p1d.X, p1d.Y, p1t.X, p1t.Y],
[p1t.X, p1t.Y, p2d.X, p2d.Y, p3p.X, p3p.Y, m_p4.X, m_p4.Y]];
}
subdivide_cubic_to_cubics() in the code above divides an original cubic curve into two parts at the value t. Because get_t_values_of_critical_points() returns the t values as an array sorted by t value, you can easily traverse all t values and get the corresponding sub-curve. After each split, you have to divide the 2nd sub-curve at the next t value (re-mapped to that sub-curve's own parameter range, i.e. (t - t_prev) / (1 - t_prev)).
When all splitting is done, you have the control points of all sub-curves. What is left is only the conversion of the cubic control points to quadratic ones. Because all sub-curves are now down-elevated cubics, the corresponding quadratic control points are easy to calculate. The first and last quadratic control points are the same as the first and last control points of the cubic sub-curve, and the middle one is the point where lines P1-P2 and P4-P3 cross.
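The crossing point of lines P1-P2 and P4-P3 can be computed with a standard line-line intersection. Below is a minimal sketch in Python rather than the JavaScript used above; the function name and the tolerance are assumptions, and it returns None when the two lines are parallel or a direction vector is degenerate.

```python
def quad_control_from_tangents(p1, p2, p3, p4, eps=1e-12):
    """Intersection of line p1->p2 with line p4->p3, or None if parallel."""
    rx, ry = p2[0] - p1[0], p2[1] - p1[1]      # direction of P1-P2
    sx, sy = p3[0] - p4[0], p3[1] - p4[1]      # direction of P4-P3
    denom = rx * sy - ry * sx                  # 2D cross product r x s
    if abs(denom) < eps:
        return None
    # Solve p1 + u*r = p4 + v*s for u
    u = ((p4[0] - p1[0]) * sy - (p4[1] - p1[1]) * sx) / denom
    return (p1[0] + u * rx, p1[1] + u * ry)

print(quad_control_from_tangents((0, 0), (0.5, 0.5), (1.5, 0.5), (2, 0)))
```

In the example both tangent lines pass through (1, 1), which is therefore the quadratic control point.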
Conventions/terminology
Cubic defined by: P1/P2 - anchor points, C1/C2 - control points
|x| is the euclidean norm of x
mid-point approx of cubic: a quad that shares the same anchors with the cubic and has the control point at C = (3·C2 - P2 + 3·C1 - P1)/4
Algorithm
1. Pick an absolute precision (prec).
2. Compute Tdiv as the root of the (cubic) equation sqrt(3)/18 · |P2 - 3·C2 + 3·C1 - P1|/2 · Tdiv^3 = prec.
3. If Tdiv < 0.5, divide the cubic at Tdiv. The first segment [0..Tdiv] can be approximated by a quadratic, with a defect less than prec, using the mid-point approximation. Repeat from step 2 with the second resulting segment (corresponding to 1-Tdiv).
4. If 0.5 <= Tdiv < 1, simply divide the cubic in two. The two halves can be approximated by the mid-point approximation.
5. If Tdiv >= 1, the entire cubic can be approximated by the mid-point approximation.
The "magic formula" at step 2 is demonstrated (with interactive examples) on this page.
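The algorithm can be sketched in Python as follows. This is an illustration under stated assumptions, not the answer author's implementation: the helper names are made up, points are 2D tuples, the split uses de Casteljau subdivision, and "divide the cubic in two" is read as splitting at t = 0.5.

```python
import math

def lerp(a, b, t):
    return (a[0] + (b[0] - a[0]) * t, a[1] + (b[1] - a[1]) * t)

def split_cubic(p1, c1, c2, p2, t):
    """de Casteljau split of a cubic at t into two cubics."""
    a, b, c = lerp(p1, c1, t), lerp(c1, c2, t), lerp(c2, p2, t)
    d, e = lerp(a, b, t), lerp(b, c, t)
    m = lerp(d, e, t)
    return (p1, a, d, m), (m, e, c, p2)

def midpoint_quad(p1, c1, c2, p2):
    """Mid-point approximation: C = (3*C2 - P2 + 3*C1 - P1) / 4."""
    c = tuple((3 * c2[i] - p2[i] + 3 * c1[i] - p1[i]) / 4 for i in range(2))
    return (p1, c, p2)

def cubic_to_quads(p1, c1, c2, p2, prec):
    quads, cubic = [], (p1, c1, c2, p2)
    while True:
        a, b, c, d = cubic
        norm = math.hypot(d[0] - 3 * c[0] + 3 * b[0] - a[0],
                          d[1] - 3 * c[1] + 3 * b[1] - a[1])
        # Step 2: solve sqrt(3)/18 * norm/2 * Tdiv^3 = prec for Tdiv
        if norm == 0:
            tdiv = math.inf
        else:
            tdiv = (prec / (math.sqrt(3) / 18 * norm / 2)) ** (1 / 3)
        if tdiv >= 1:                       # step 5: one quad is enough
            quads.append(midpoint_quad(*cubic))
            return quads
        if tdiv >= 0.5:                     # step 4: split in two halves
            left, right = split_cubic(*cubic, 0.5)
            quads += [midpoint_quad(*left), midpoint_quad(*right)]
            return quads
        left, cubic = split_cubic(*cubic, tdiv)   # step 3: peel off [0..Tdiv]
        quads.append(midpoint_quad(*left))

quads = cubic_to_quads((0, 0), (0, 100), (100, 100), (100, -50), 0.1)
print(len(quads), "quadratic segments")
```

Each returned tuple is (start anchor, control point, end anchor), and consecutive segments share their anchors.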
Another derivation of tfinniga's answer:
First see Wikipedia Bezier curve
for the formulas for quadratic and cubic Bezier curves (also nice animations):
Q(t) = (1-t)^2 P0 + 2 (1-t) t Q + t^2 P3
P(t) = (1-t)^3 P0 + 3 (1-t)^2 t P1 + 3 (1-t) t^2 P2 + t^3 P3
Require these to match at the middle, t = 1/2:
(P0 + 2 Q + P3) / 4 = (P0 + 3 P1 + 3 P2 + P3) / 8
=> Q = P1 + P2 - (P0 + P1 + P2 + P3) / 4
(Q written like this has a geometric interpretation:
Pmid = middle of P0 P1 P2 P3
P12mid = midway between P1 and P2
draw a line from Pmid to P12mid, and that far again: you're at Q.
Hope this makes sense -- draw a couple of examples.)
In general, you'll have to use multiple quadratic curves - many cases of cubic curves can't be even vaguely approximated with a single quadratic curve.
There is a good article discussing the problem, and a number of ways to solve it, at http://www.timotheegroleau.com/Flash/articles/cubic_bezier_in_flash.htm (including interactive demonstrations).
I should note that Adrian's solution is great for single cubics, but when the cubics are segments of a smooth cubic spline, then using his midpoint approximation method causes slope continuity at the nodes of the segments to be lost. So the method described at http://fontforge.org/bezier.html#ps2ttf is much better if you are working with font glyphs or for any other reason you want to retain the smoothness of the curve (which is most probably the case).
Even though this is an old question, many people like me will see it in search results, so I'm posting this here.
I would probably draw a series of curves instead of trying to draw one curve using a different alg. Sort of like drawing two half circles to make up a whole circle.
Try looking for open-source PostScript font to TrueType font converters. I'm sure they have it. PostScript uses cubic Bezier curves, whereas TrueType uses quadratic Bezier curves. Good luck.
