What do you mean by Memoryless property of Exponential distribution? - statistics

I know memorylessness defines "the next state depends only on the current state and not on the sequence of events that preceded it". However, if some one could explain me how exponential distribution has this property.

THe Exponential distribution is called memoryless because the conditional distribution is the same as the unconditional distribution.
Consider the following example: The cdf of X is denoted by $f(x, \lambda) = \lambda e^{-\lambda x} & x \ge 0 $ ( I have not figured out how to write LaTex in Stackexchange, I think I wrote LaTex in Math Stackexchange in the same way I did here but it displays differently - maybe you can kindly edit it?) You want to know the probability that an event happens after time t. The integral from t to infinity is: $ P(X > t) = e^(-\lambda t) $. Now, the conditional probability of the event happening after time t, given that it did not happen unit time k is again: $ P(Y > t | X > k) = e^(-\lambda t) $. Hence, the distribution is called memoryless.

Via http://www.statlect.com/ucdexp1.htm, a simple explanation with the exponential distribution framed as time to event (X):
X is the time we need to wait before a certain event occurs. The Memoryless property says that the probability that the event happens during a time interval of length y is independent of how much time has already elapsed x without the event happening.

Related

Am I reading the notation of a heavysidestep function correclty for the leaky integrate-fire-or-burst neuron model?

As a passion project I'm recreating a neuronal model from XJ Wang's lab at NYU. The paper is Wei, W., & Wang, X. J. (2016). Inhibitory control in the cortico-basal ganglia-thalamocortical loop: complex regulation and interplay with memory and decision processes. Neuron, 92(5), 1093-1105.
The main problem I'm having is interpreting the equation for calculating the differential of the neurons membrane voltage. They have included a bursting neuronal model for cells in the basal ganglia and subthalamic nucleus. The differential equation for membrane voltage in these regions incorporates a hyperpolarization rebound which results in bursts and tonic spiking. The equation is on page 2 of a prior paper which uses basically the exact same model. I have linked to the paper below and I have provided an image link to the exact passage as well.
http://www.cns.nyu.edu/wanglab/publications/pdf/wei.jns2015.pdf
This is the equation I'm having trouble reading, don't worry about Isyn its the input current from the synapses
The equation is taken from this paper: https://www.physiology.org/doi/pdf/10.1152/jn.2000.83.1.588
Obviously the equation will need to be discritized so I can run it with numpy but I ill ignore that for now as it will be relatively easy to do so. The middle term with all the H's is whats giving me trouble. As I understand it I should be running code which dos the following:
gt * h * H(V-Vh) * (V-Vt)
Where H(V-Vh) is the heavyside step function, V is the membrane voltage at the prior timestep Vh = -60mV and Vt = 120mV. gt is a conductance efficacy constant in nanoSiemens. I think the correct way to interpret this for python is...
gt * h * heavyside(-60, 0.5)*(V-120)
But I'm not 100% sure I'm reading the notation correctly. Could someone please confirm I've read it as it is intended?
Secondly h is the deactivation term which gives rise to bursting as described in the final paragraph on page 2 of Smith et al., 2000 (the second pdf I've linked to). I understand the differential equations that govern the evolution of h well enough but what is the value of h? In Smith et al. 2000 the authors say that h relaxes to zero with a time constant of 20ms and it relaxes to unity with a time constant of 100ms. What value is h relaxing from and what does it mean to relax to unity?
For you x1 (of the numpy.heaviside) is = V-Vh; you are comparing that difference to zero. You might try writing your own version of the Heaviside function to deepen understanding, and then move back to the numpy version if you need it for speed or compatibility. The pseudo-code wordy version would be something like,
if (V<Vh): return(0); else: return(1);
You could probably just write (V>=Vh) in your code as Python will treat the boolean as 1 if true and 0 if false.
This ignores the possibility of V==Vh in the complete version of Heaviside, but for most practical work with real values (even discretized in a computer) that is unlikely to be a case to worth concerning yourself with, but you could easily add it in.

(in excel) randomly generating a power law distribution

I am trying to simulate a number of different distribution types for a project using Excel. Right now, I have generated a normal distribution with a mean of 35 and a standard deviation of 3.33. So far so good.
I would like to also generate some other distribution types.
One I have tried is a lognormal. To get that, I am using the following code:
=(LOGNORM.INV(RAND(),LN(45^2/SQRT(45^2+3.33^2)),SQRT(LN((45^2+3.33^2)/4.5^2))
It produces some output, but I would welcome anyone's input on the syntax.
What I really want to try to do is a power law distribution. From what I can tell, Excel does not have a built-in function to randomly generate this data. Does anyone know of a way to do it, besides switching software packages?
Thanks for any help you can provide.
E
For the (type I) Pareto distribution, if the parameters are a min value xm and an exponent alpha then the cdf is given by
p = 1 - (xm/x)^alpha
This gives the probability, p, that the random variable takes on a value which is <= x. This is easy to invert, so you can use inverse sampling to generate random variables which follow that distribution:
x = xm/(1-p)^(1/alpha) = xm*(1-p)^(-1/alpha)
If p is uniform over [0,1] then so is 1-p, so in the above you can just use RAND() to simulate 1/p. Thus, in Excel if you wanted to e.g. simulate a type-1 Pareto distribution with xm = 2 and alpha = 3, you would use the formula:
= 2 * RAND()^(-1/3)
If you are going to be doing this sort of thing a lot with different distributions, you might want to consider using R, which can be called directly from Excel using the REXcel add-in. R has a very large number of built-in distributions that it can directly sample from (and it also uses a better underlying random number generator than Excel does).

Problems with a function and odeint in python

For a few months I started working with python, considering the great advantages it has. But recently, i used odeint from scipy to solve a system of differential equations. But during the integration process the implemented function doesn't work as expected.
In this case, I want to solve a system of differential equations where one of the initial conditions (x[0]) varies (between 4-5) depending on the value that the variable reaches during the integration process (It is programmed inside of the function by means of the if structure).
#Control of oxygen
SO2_lower=4
SO2_upper=5
if x[0]<=SO2_lower:
x[0]=SO2_upper
When the function is used by odeint, some lines of code inside the function are obviated, even when the functions changes the value of x[0]. Here is all my code:
import numpy as np
from scipy.integrate import odeint
import matplotlib.pyplot as plt
plt.ion()
# Stoichiometric parameters
YSB_OHO_Ox=0.67 #Yield for XOHO growth per SB (Aerobic)
YSB_Stor_Ox=0.85 #Yield for XOHO,Stor formation per SB (Aerobic)
YStor_OHO_Ox=0.63 #Yield for XOHO growth per XOHO,Stor (Aerobic)
fXU_Bio_lys=0.2 #Fraction of XU generated in biomass decay
iN_XU=0.02 #N content of XU
iN_XBio=0.07 #N content of XBio
iN_SB=0.03 #N content of SB
fSTO=0.67 #Stored fraction of SB
#Kinetic parameters
qSB_Stor=5 #Rate constant for XOHO,Stor storage of SB
uOHO_Max=2 #Maximum growth rate of XOHO
KSB_OHO=2 #Half-saturation coefficient for SB
KStor_OHO=1 #Half-saturation coefficient for XOHO,Stor/XOHO
mOHO_Ox=0.2 #Endogenous respiration rate of XOHO (Aerobic)
mStor_Ox=0.2 #Endogenous respiration rate of XOHO,Stor (Aerobic)
KO2_OHO=0.2 #Half-saturation coefficient for SO2
KNHx_OHO=0.01 #Half-saturation coefficient for SNHx
#Other parameters
DT=1/86400.0
def f(x,t):
#Control of oxygen
SO2_lower=4
SO2_upper=5
if x[0]<=SO2_lower:
x[0]=SO2_upper
M=np.matrix([[-(1.0-YSB_Stor_Ox),-1,iN_SB,0,0,YSB_Stor_Ox],
[-(1.0-YSB_OHO_Ox)/YSB_OHO_Ox,-1/YSB_OHO_Ox,iN_SB/YSB_OHO_Ox-iN_XBio,0,1,0],
[-(1.0-YStor_OHO_Ox)/YStor_OHO_Ox,0,-iN_XBio,0,1,-1/YStor_OHO_Ox],
[-(1.0-fXU_Bio_lys),0,iN_XBio-fXU_Bio_lys*iN_XU,fXU_Bio_lys,-1,0],
[-1,0,0,0,0,-1]])
R=np.matrix([[DT*fSTO*qSB_Stor*(x[0]/(KO2_OHO+x[0]))*(x[1]/(KSB_OHO+x[1]))*x[4]],
[DT*(1-fSTO)*uOHO_Max*(x[0]/(KO2_OHO+x[0]))*(x[1]/(KSB_OHO+x[1]))* (x[2]/(KNHx_OHO+x[2]))*x[4]],
[DT*uOHO_Max*(x[0]/(KO2_OHO+x[0]))*(x[2]/(KNHx_OHO+x[2]))*((x[5]/x[4])/(KStor_OHO+(x[5]/x[4])))*(KSB_OHO/(KSB_OHO+x[1]))*x[4]],
[DT*mOHO_Ox*(x[0]/(KO2_OHO+x[0]))*x[4]],
[DT*mStor_Ox*(x[0]/(KO2_OHO+x[0]))*x[5]]])
Mt=M.transpose()
MxRm=Mt*R
MxR=MxRm.tolist()
return ([MxR[0][0],
MxR[1][0],
MxR[2][0],
MxR[3][0],
MxR[4][0],
MxR[5][0]])
#ODE solution
t=np.linspace(0.0,3600,3600)
#Initial conditions
y0=np.array([5,176,5,30,100,5])
Var=odeint(f,y0,t,args=(),h0=1,hmin=1,hmax=1,atol=1e-5,rtol=1e-5)
Sol=Var.tolist()
plt.plot(t,Var[:,0])
Thanks very much in advance!!!!!
Short answer:
You should not modify input state vector inside your ODE function. Instead try the following and verify your results:
x0 = x[0]
if x0<=SO2_lower:
x0=SO2_upper
# use x0 instead of x[0] in the rest of this function body
I suppose that this is your problem, but I am not sure, since you did not explain what exactly was wrong with the results. Moreover, you do not change "initial condition". Initial condition is
y0=np.array([5,176,5,30,100,5])
you just change the input state vector.
Detailed answer:
Your odeint integrator is probably using one of the higher order adaptive Runge-Kutta methods. This algorithm requires multiple ODE function evaluations to calculate single integration step, therefore changing the input state vector may lead to undefined results. In C++ boost::odeint this is even not possible to do so, because input variable is "const". Python however is not as strict as C++ and I suppose that it is possible to make this kind of bug unintentionally (I did not try it, though).
EDIT:
OK, I understand what you want to achieve.
Your variable x[0] is constrained by modular algebra and it is not possible to express in the form
x' = f(x,t)
which is one of the possible definitions of the Ordinary Differential Equation, that ondeint library is meant to solve. However, few possible "hacks" can be used here to bypass this limitation.
One possibility is to use a fixed step and low order (because for higher order solvers you need to know, which part of the algorithm you are actually in, see RK4 for example) solver and change your dx[0] equation (in your code it is MxR[0][0] element) to:
# at the beginning of your system
if (x[0] > S02_lower): # everything is normal here
x0 = x[0]
dx0 = # normal equation for dx0
else: # x[0] is too low, we must somehow force it to become S02_upper again
dx0 = (x[0] - S02_upper)/step_size // assuming that x0_{n+1} = x0_{n} + dx0*step_size
x0 = S02_upper
# remember to use x0 in the rest of your code and also remember to return dx0
However, I do not recommend this technique, because it makes you strongly dependent on the algorithm and you must know the exact step size (although, I may recommend it for saturation constraints). Another possibility is to perform a single integration step at a time and correct your x0 each time it is necessary:
// 1 do_step(sys, in, dxdtin, t, out, dt);
// 2 do something with output
// 3 in = out
// 4 return to 1 or finish
Sorry for C++ syntax, here is the exhaustive documentation (C++ odeint steppers), and here is its equivalent in python (Python ode class). C++ odeint interface is better for your task, however you may achieve exactly the same in python. Just look for:
integrate(t[, step, relax])
set_initial_value(y[, t])
in docs.

How do I efficiently estimate a probability based on a small amount of evidence?

I've been trying to find an answer to this for months (to be used in a machine learning application), it doesn't seem like it should be a terribly hard problem, but I'm a software engineer, and math was never one of my strengths.
Here is the scenario:
I have a (possibly) unevenly weighted coin and I want to figure out the probability of it coming up heads. I know that coins from the same box that this one came from have an average probability of p, and I also know the standard deviation of these probabilities (call it s).
(If other summary properties of the probabilities of other coins aside from their mean and stddev would be useful, I can probably get them too.)
I toss the coin n times, and it comes up heads h times.
The naive approach is that the probability is just h/n - but if n is small this is unlikely to be accurate.
Is there a computationally efficient way (ie. doesn't involve very very large or very very small numbers) to take p and s into consideration to come up with a more accurate probability estimate, even when n is small?
I'd appreciate it if any answers could use pseudocode rather than mathematical notation since I find most mathematical notation to be impenetrable ;-)
Other answers:
There are some other answers on SO that are similar, but the answers provided are unsatisfactory. For example this is not computationally efficient because it quickly involves numbers way smaller than can be represented even in double-precision floats. And this one turned out to be incorrect.
Unfortunately you can't do machine learning without knowing some basic math---it's like asking somebody for help in programming but not wanting to know about "variables" , "subroutines" and all that if-then stuff.
The better way to do this is called a Bayesian integration, but there is a simpler approximation called "maximum a postieri" (MAP). It's pretty much like the usual thinking except you can put in the prior distribution.
Fancy words, but you may ask, well where did the h/(h+t) formula come from? Of course it's obvious, but it turns out that it is answer that you get when you have "no prior". And the method below is the next level of sophistication up when you add a prior. Going to Bayesian integration would be the next one but that's harder and perhaps unnecessary.
As I understand it the problem is two fold: first you draw a coin from the bag of coins. This coin has a "headsiness" called theta, so that it gives a head theta fraction of the flips. But the theta for this coin comes from the master distribution which I guess I assume is Gaussian with mean P and standard deviation S.
What you do next is to write down the total unnormalized probability (called likelihood) of seeing the whole shebang, all the data: (h heads, t tails)
L = (theta)^h * (1-theta)^t * Gaussian(theta; P, S).
Gaussian(theta; P, S) = exp( -(theta-P)^2/(2*S^2) ) / sqrt(2*Pi*S^2)
This is the meaning of "first draw 1 value of theta from the Gaussian" and then draw h heads and t tails from a coin using that theta.
The MAP principle says, if you don't know theta, find the value which maximizes L given the data that you do know. You do that with calculus. The trick to make it easy is that you take logarithms first. Define LL = log(L). Wherever L is maximized, then LL will be too.
so
LL = hlog(theta) + tlog(1-theta) + -(theta-P)^2 / (2*S^2)) - 1/2 * log(2*pi*S^2)
By calculus to look for extrema you find the value of theta such that dLL/dtheta = 0.
Since the last term with the log has no theta in it you can ignore it.
dLL/dtheta = 0 = (h/theta) + (P-theta)/S^2 - (t/(1-theta)) = 0.
If you can solve this equation for theta you will get an answer, the MAP estimate for theta given the number of heads h and the number of tails t.
If you want a fast approximation, try doing one step of Newton's method, where you start with your proposed theta at the obvious (called maximum likelihood) estimate of theta = h/(h+t).
And where does that 'obvious' estimate come from? If you do the stuff above but don't put in the Gaussian prior: h/theta - t/(1-theta) = 0 you'll come up with theta = h/(h+t).
If your prior probabilities are really small, as is often the case, instead of near 0.5, then a Gaussian prior on theta is probably inappropriate, as it predicts some weight with negative probabilities, clearly wrong. More appropriate is a Gaussian prior on log theta ('lognormal distribution'). Plug it in the same way and work through the calculus.
You can use p as a prior on your estimated probability. This is basically the same as doing pseudocount smoothing. I.e., use
(h + c * p) / (n + c)
as your estimate. When h and n are large, then this just becomes h / n. When h and n are small, this is just c * p / c = p. The choice of c is up to you. You can base it on s but in the end you have to decide how small is too small.
You don't have nearly enough info in this question.
How many coins are in the box? If it's two, then in some scenarios (for example one coin is always heads, the other always tails) knowing p and s would be useful. If it's more than a few, and especially if only some of the coins are only slightly weighted then it is not useful.
What is a small n? 2? 5? 10? 100? What is the probability of a weighted coin coming up heads/tail? 100/0, 60/40, 50.00001/49.99999? How is the weighting distributed? Is every coin one of 2 possible weightings? Do they follow a bell curve? etc.
It boils down to this: the differences between a weighted/unweighted coin, the distribution of weighted coins, and the number coins in your box will all decide what n has to be for you to solve this with a high confidence.
The name for what you're trying to do is a Bernoulli trial. Knowing the name should be helpful in finding better resources.
Response to comment:
If you have differences in p that small, you are going to have to do a lot of trials and there's no getting around it.
Assuming a uniform distribution of bias, p will still be 0.5 and all standard deviation will tell you is that at least some of the coins have a minor bias.
How many tosses, again, will be determined under these circumstances by the weighting of the coins. Even with 500 tosses, you won't get a strong confidence (about 2/3) detecting a .51/.49 split.
In general, what you are looking for is Maximum Likelihood Estimation. Wolfram Demonstration Project has an illustration of estimating the probability of a coin landing head, given a sample of tosses.
Well I'm no math man, but I think the simple Bayesian approach is intuitive and broadly applicable enough to put a little though into it. Others above have already suggested this, but perhaps if your like me you would prefer more verbosity.
In this lingo, you have a set of mutually-exclusive hypotheses, H, and some data D, and you want to find the (posterior) probabilities that each hypothesis Hi is correct given the data. Presumably you would choose the hypothesis that had the largest posterior probability (the MAP as noted above), if you had to choose one. As Matt notes above, what distinguishes the Bayesian approach from only maximum likelihood (finding the H that maximizes Pr(D|H)) is that you also have some PRIOR info regarding which hypotheses are most likely, and you want to incorporate these priors.
So you have from basic probability Pr(H|D) = Pr(D|H)*Pr(H)/Pr(D). You can estimate these Pr(H|D) numerically by creating a series of discrete probabilities Hi for each hypothesis you wish to test, eg [0.0,0.05, 0.1 ... 0.95, 1.0], and then determining your prior Pr(H) for each Hi -- above it is assumed you have a normal distribution of priors, and if that is acceptable you could use the mean and stdev to get each Pr(Hi) -- or use another distribution if you prefer. With coin tosses the Pr(D|H) is of course determined by the binomial using the observed number of successes with n trials and the particular Hi being tested. The denominator Pr(D) may seem daunting but we assume that we have covered all the bases with our hypotheses, so that Pr(D) is the summation of Pr(D|Hi)Pr(H) over all H.
Very simple if you think about it a bit, and maybe not so if you think about it a bit more.

Probability of selecting an element from a set

The expected probability of randomly selecting an element from a set of n elements is P=1.0/n .
Suppose I check P using an unbiased method sufficiently many times. What is the distribution type of P? It is clear that P is not normally distributed, since cannot be negative. Thus, may I correctly assume that P is gamma distributed? And if yes, what are the parameters of this distribution?
Histogram of probabilities of selecting an element from 100-element set for 1000 times is shown here.
Is there any way to convert this to a standard distribution
Now supposed that the observed probability of selecting the given element was P* (P* != P). How can I estimate whether the bias is statistically significant?
EDIT: This is not a homework. I'm doing a hobby project and I need this piece of statistics for it. I've done my last homework ~10 years ago:-)
With repetitions, your distribution will be binomial. So let X be the number of times you select some fixed object, with M total selections
P{ X = x } = ( M choose x ) * (1/N)^x * (N-1/N)^(M-x)
You may find this difficult to compute for large N. It turns out that for sufficiently large N, this actually converges to a normal distribution with probability 1 (Central Limit theorem).
In case P{X=x} will be given by a normal distribution. The mean will be M/N and the variance will be M * (1/N) * ( N-1) / N.
This is a clear binomial distribution with p=1/(number of elements) and n=(number of trials).
To test whether the observed result differs significantly from the expected result, you can do the binomial test.
The dice examples on the two Wikipedia pages should give you some good guidance on how to formulate your problem. In your 100-element, 1000 trial example, that would be like rolling a 100-sided die 1000 times.
As others have noted, you want the Binomial distribution. Your question seems to imply an interest in a continuous approximation to it, though. It can actually be approximated by the normal distribution, and also by the Poisson distribution.
Is your distribution a discrete uniform distribution?

Resources