In Excel, I want to generate arrival times for a simulation (illustration) of an M/M/1 queue.
Jobs arrive according to a Poisson process. I found the POISSON and POISSON.DIST functions in Excel, but no inverse Poisson distribution function. Since a Normal distribution with mean λ and variance λ is supposed to be a good approximation of the Poisson distribution (given large enough time intervals), I tried to use the inverse Normal distribution function to simulate the intervals between arrivals:
=NORM.INV(RAND(), mean, SQRT(mean))
And to compute the arrival times (Excel stores times as fractions of a day):
=IFERROR(previous_time + interval_in_seconds/60/60/24, 0)
I am no expert in statistics, but my simulated intervals look a bit too regular for a Poisson process (see the illustration below for λ = 1/10s). What am I doing wrong?
After a good night's sleep I realized my mistake: there is an important distinction between these two concepts:
Poisson Process
A renewal process with exponentially distributed renewal intervals.
Poisson Distribution
A discrete probability distribution that expresses the probability of a given number of events occurring in a fixed interval of time.
So while the number of jobs that arrive according to a Poisson process during a time interval of length x follows a Poisson distribution with parameter λx, the inter-arrival times of this process are exponentially distributed with mean 1/λ.
The inverse of the exponential cumulative distribution function can be written in Excel as follows (here mean is the mean inter-arrival time, 1/λ):
=-LN(RAND()) * mean
Illustration for λ = 1/10s:
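For readers working outside Excel, the same inverse-transform idea can be sketched in Python. This is only an illustration under stated assumptions: NumPy is available, the mean inter-arrival time is 10 seconds (λ = 1/10s as above), and the variable names, the fixed seed, and the 60-second check window are choices made for this example, not part of the original answer.

```python
import numpy as np

rng = np.random.default_rng(42)  # fixed seed, only so the sketch is reproducible
mean_interval = 10.0             # mean inter-arrival time in seconds, i.e. 1/lambda
n_jobs = 10_000                  # number of simulated arrivals (assumed)

# Inverse-transform sampling, the Python analogue of =-LN(RAND()) * mean.
# rng.random() lies in [0, 1), so 1 - u lies in (0, 1] and log(0) cannot occur.
u = 1.0 - rng.random(n_jobs)
intervals = -np.log(u) * mean_interval

# Arrival times are the running sum of the inter-arrival intervals.
arrival_times = np.cumsum(intervals)

# Sanity check: arrivals counted in fixed 60 s windows should look Poisson,
# so their sample mean and sample variance should be close to each other.
window = 60.0
edges = np.arange(0.0, arrival_times[-1], window)
counts, _ = np.histogram(arrival_times, bins=edges)
print(intervals.mean(), counts.mean(), counts.var())
```

With these settings the mean interval should come out near 10 s, and the per-window counts should have a mean and a variance both near 6, which is the pattern a Poisson process produces.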
Related
I am studying a random walk with drift and an absorbing boundary. The system is well understood theoretically. My task is to simulate it numerically, in particular to generate random numbers from this distribution (see the formula). It is the distribution of the coordinate x at time t given the starting point x_0, the noise intensity \sigma and the drift \mu. How can I generate random numbers from this distribution? I can of course use inverse transform sampling, but it is slow. Maybe I can make use of the fact that the probability density function is the difference of two Gaussian functions? Can I somehow relate my distribution to the normal distribution?
I've got files with irradiance data measured every minute 24 hours a day.
On a day without any clouds in the sky, the data shows a nice continuous bell curve.
To find cloudless days in the data, I have so far plotted month after month with gnuplot and looked for nice bell curves.
I was wondering if there is a Python way to check whether the irradiance measurements form a continuous bell curve.
I don't know if the question is too vague, but I'm simply looking for some ideas on that quest :-)
For a normal distribution, there are normality tests.
In short, we abuse some knowledge we have of what normal distributions look like to identify them.
The kurtosis of any normal distribution is 3 (equivalently, its excess kurtosis is 0). Compute the kurtosis of your data and it should be close to 3.
The skewness of a normal distribution is zero, so your data should have a skewness close to zero.
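As a rough sketch of those two checks, assuming SciPy is available: the synthetic `data` array below is only a stand-in for real measurements, and how close to 0 and 3 counts as "close" is left open here.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)                     # fixed seed for reproducibility
data = rng.normal(loc=5.0, scale=2.0, size=5000)   # synthetic stand-in for real data

# Skewness of a normal distribution is 0.
skewness = stats.skew(data)

# fisher=False returns the plain (Pearson) kurtosis, which is 3 for a normal
# distribution; the default fisher=True returns excess kurtosis (0 for a normal).
kurt = stats.kurtosis(data, fisher=False)

print(skewness, kurt)
```

SciPy also provides `stats.normaltest`, which combines skewness and kurtosis into a single test statistic.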
More generally, you could compute a reference distribution and use a divergence measure (a Bregman divergence, for example) to assess the difference between the two distributions. Bin your data, create a histogram, and start with the Jensen-Shannon divergence.
With the divergence approach, you can compare to an arbitrary distribution. You might record a thousand sunny days and check if the divergence between the sunny day and your measured day is below some threshold.
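A minimal sketch of the histogram-plus-divergence idea, assuming SciPy's `scipy.spatial.distance.jensenshannon` is available. The helper name `js_divergence`, the bin count, and the synthetic samples are assumptions for illustration; a usable threshold would have to be calibrated on real sunny-day data.

```python
import numpy as np
from scipy.spatial.distance import jensenshannon

def js_divergence(sample_a, sample_b, bins=30):
    """Jensen-Shannon divergence (in bits) between two samples, via shared histogram bins."""
    lo = min(np.min(sample_a), np.min(sample_b))
    hi = max(np.max(sample_a), np.max(sample_b))
    edges = np.linspace(lo, hi, bins + 1)          # common bin edges for both samples
    p, _ = np.histogram(sample_a, bins=edges)
    q, _ = np.histogram(sample_b, bins=edges)
    # jensenshannon normalizes the counts and returns the JS *distance*,
    # which is the square root of the divergence, so square it.
    return jensenshannon(p, q, base=2) ** 2

rng = np.random.default_rng(1)
reference = rng.normal(0.0, 1.0, size=5000)   # stand-in for the reference day
similar = rng.normal(0.0, 1.0, size=5000)     # same shape as the reference
different = rng.uniform(-3.0, 3.0, size=5000) # clearly different shape

d_same = js_divergence(reference, similar)
d_diff = js_divergence(reference, different)
print(d_same, d_diff)
```

With base 2 the divergence lies between 0 and 1, so a day could be flagged as a match when its divergence from the reference falls below whatever cut-off you choose.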
Just to complement the given answer with a code example: one can use a Kolmogorov-Smirnov test to obtain a measure for the "distance" between two distributions. SciPy offers a neat interface for this, called kstest:
from scipy import stats
import numpy as np
data = np.random.normal(size=100) # Our (synthetic) dataset
D, p = stats.kstest(data, "norm") # Perform a one-sample Kolmogorov-Smirnov test against the standard normal
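In the above example, D denotes the distance between the empirical distribution of our data and a standard normal (norm) distribution (smaller means a closer match), and p denotes the corresponding p-value. Note that "norm" without further arguments means mean 0 and standard deviation 1, so real data should be standardized first or the parameters passed explicitly. Other distributions can be tested in the same way by substituting norm with those implemented in scipy.stats.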
I have objects which are randomly distributed along the X axis. Their positions are periodic with period m, with a slight variation around multiples of m.
The graph here shows a distribution for m=100.
Is there a way to calculate m using the statistics of distribution?
Thanks!
Do you know the error distribution? If you do (for example, if it is a zero-mean Gaussian with variance \sigma^2), then you can calculate the likelihood of the data as a function of the unknown period m. Once you can do this, you can solve an optimization problem to find the period with maximum likelihood.
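One possible way to carry this out is sketched below, under assumptions that are not in the answer above: the errors are zero-mean Gaussian with a common variance, so maximizing the likelihood is the same as minimizing the sum of squared distances to the nearest multiple of m, and the optimization is a plain grid search. The function name `estimate_period`, the candidate grid, and the synthetic data are hypothetical. The grid must bracket the true period and exclude its divisors, because m/2, m/3, and so on fit the data equally well.

```python
import numpy as np

def estimate_period(x, candidates):
    """Grid-search the period m that minimizes squared distance to the nearest multiple of m."""
    x = np.asarray(x, dtype=float)
    best_m, best_sse = None, np.inf
    for m in candidates:
        # Residual of each point relative to its nearest multiple of m.
        resid = x - m * np.round(x / m)
        sse = np.sum(resid ** 2)   # proportional to the Gaussian negative log-likelihood
        if sse < best_sse:
            best_m, best_sse = m, sse
    return best_m

# Synthetic data: positions near multiples of 100 with Gaussian jitter (assumed values).
rng = np.random.default_rng(0)
true_m = 100.0
k = rng.integers(1, 50, size=200)
x = true_m * k + rng.normal(0.0, 3.0, size=200)

# Candidate periods; the range is an assumption that brackets the true value.
candidates = np.arange(80.0, 120.0, 0.05)
m_hat = estimate_period(x, candidates)
print(m_hat)
```

A finer grid or a local optimizer started from the grid minimum could refine the estimate further.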
Hi, I am trying to run a cosinor analysis in Statistica but am at a loss as to how to do so. I need to calculate the MESOR, AMPLITUDE, and ACROPHASE of circadian rhythm data.
http://www.wepapers.com/Papers/73565/Cosinor_analysis_of_accident_risk_using__SPSS%27s_regression_procedures.ppt
The link above shows how to do it, with the formulas and such, but it has not helped me much. Does anyone know the code for it, either in Statistica or SPSS?
I really need to get this done because it is for an important paper.
I don't have SPSS or Statistica, so I can't tell you the exact "push-this-button" kind of steps, but perhaps this will help.
Cosinor analysis is fitting a cosine (or sine) curve with a known period. The main idea is that the non-linear problem of fitting a cosine function can be reduced to a problem that is linear in its parameters if the period is known. I will assume that your period T=24 hours.
You should already have two variables: Time at which the measurement is taken, and Value of the measurement (these, of course, might be called something else).
Now create two new variables: SinTime = sin(2 x pi x Time / 24) and CosTime = cos(2 x pi x Time / 24). This is described on p.11 of the presentation you linked (x is multiplication). Use pi=3.1415 if the exact value is not built in.
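Run multiple linear regression with Value as outcome and CosTime and SinTime as two predictors. You should get estimates of their coefficients, which we will call A (for CosTime) and B (for SinTime).
The intercept term of the regression model is the MESOR.
The AMPLITUDE is sqrt(A^2 + B^2) [square root of A squared plus B squared]
The ACROPHASE is arctan(- B / A), where arctan is the inverse function of tan; this corresponds to writing the model as MESOR + AMPLITUDE x cos(2 x pi x Time / 24 + ACROPHASE). Because arctan only returns angles between -pi/2 and pi/2, use the signs of A and B to pick the correct quadrant. The last two formulas are from p.14 of the presentation.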
The regression model should also give you an R-squared value to see how well the 24 hour circadian pattern fits the data, and an overall p-value that tests for the presence of a circadian component with period 24 hrs.
One can get standard errors on amplitude and phase using standard error propagation formulas, but that is not included in the presentation.
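For anyone who can use Python instead of SPSS or Statistica, the regression steps above can be sketched with NumPy's least-squares solver. This is an illustration, not the presentation's procedure: the function name `cosinor_fit` and the synthetic data are hypothetical, the model is assumed to be MESOR + AMPLITUDE * cos(2*pi*Time/24 + ACROPHASE), and `arctan2` is used so that the quadrant is resolved automatically.

```python
import numpy as np

def cosinor_fit(time_h, value, period=24.0):
    """Fit value ~ MESOR + A*cos(w*t) + B*sin(w*t) by ordinary least squares."""
    t = np.asarray(time_h, dtype=float)
    y = np.asarray(value, dtype=float)
    omega = 2.0 * np.pi / period
    # Design matrix: intercept, CosTime, SinTime.
    X = np.column_stack([np.ones_like(t), np.cos(omega * t), np.sin(omega * t)])
    (mesor, a_cos, b_sin), *_ = np.linalg.lstsq(X, y, rcond=None)
    amplitude = np.hypot(a_cos, b_sin)      # sqrt(A^2 + B^2)
    acrophase = np.arctan2(-b_sin, a_cos)   # quadrant-aware arctan(-B / A), in radians
    return mesor, amplitude, acrophase

# Synthetic, noise-free example with known parameters (assumed values).
t = np.arange(0.0, 48.0, 0.5)               # two days, sampled every 30 minutes
y = 10.0 + 3.0 * np.cos(2.0 * np.pi * t / 24.0 - 1.0)

mesor, amplitude, acrophase = cosinor_fit(t, y)
print(mesor, amplitude, acrophase)
```

On this synthetic series the fit should recover a MESOR of 10, an amplitude of 3, and an acrophase of -1 radian. Real data would add noise, and R-squared and p-values would come from a full regression package.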
The expected probability of randomly selecting an element from a set of n elements is P=1.0/n .
Suppose I estimate P using an unbiased method sufficiently many times. What type of distribution does the estimate of P follow? It is clear that P is not normally distributed, since it cannot be negative. May I therefore assume that P is gamma distributed? If so, what are the parameters of this distribution?
Histogram of probabilities of selecting an element from 100-element set for 1000 times is shown here.
Is there any way to convert this to a standard distribution?
Now suppose that the observed probability of selecting the given element was P* (P* != P). How can I estimate whether the bias is statistically significant?
EDIT: This is not homework. I'm doing a hobby project and I need this piece of statistics for it. I did my last homework ~10 years ago :-)
With repetitions, your distribution will be binomial. Let X be the number of times you select some fixed object out of N, with M total selections:
P{ X = x } = ( M choose x ) * (1/N)^x * ((N-1)/N)^(M-x)
You may find this difficult to compute for large M. It turns out that for a sufficiently large number of selections M, this distribution is well approximated by a normal distribution (Central Limit Theorem).
In that case P{X=x} can be approximated by a normal distribution. The mean will be M/N and the variance will be M * (1/N) * (N-1) / N.
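To get a feel for how good the approximation is, the exact binomial probabilities can be compared with the normal density. The sketch below assumes SciPy and uses the 100-element, 1000-trial example from the question; the range of x values and the variable names are choices made for this illustration.

```python
import numpy as np
from scipy import stats

N = 100        # number of elements in the set
M = 1000       # number of selections
p = 1.0 / N    # probability of picking the fixed element on one selection

mean = M * p               # M / N
var = M * p * (1.0 - p)    # M * (1/N) * (N-1) / N

x = np.arange(0, 31)       # counts at which to compare (assumed range)
exact = stats.binom.pmf(x, M, p)
approx = stats.norm.pdf(x, loc=mean, scale=np.sqrt(var))

max_abs_err = np.max(np.abs(exact - approx))
print(mean, var, max_abs_err)
```

Here the mean is 10 and the variance 9.9. Because p is small the binomial is noticeably skewed, so the normal curve is only a rough fit at this sample size.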
This is a clear binomial distribution with p=1/(number of elements) and n=(number of trials).
To test whether the observed result differs significantly from the expected result, you can do the binomial test.
The dice examples on the two Wikipedia pages should give you some good guidance on how to formulate your problem. In your 100-element, 1000 trial example, that would be like rolling a 100-sided die 1000 times.
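A minimal sketch of such a test, assuming SciPy 1.7 or later, which provides `scipy.stats.binomtest`. The observed count of 19 is a made-up value for illustration, not data from the question.

```python
from scipy import stats

n_trials = 1000          # number of selections
p_expected = 1.0 / 100   # expected probability for a 100-element set
observed = 19            # hypothetical number of times the element was selected

# Two-sided exact binomial test of H0: the true probability equals p_expected.
result = stats.binomtest(observed, n=n_trials, p=p_expected, alternative="two-sided")
p_value = result.pvalue
print(p_value)
```

A small p-value (below your chosen significance level, for example 0.05) would suggest that the observed frequency differs from 1/100 by more than chance alone would explain.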
As others have noted, you want the Binomial distribution. Your question seems to imply an interest in a continuous approximation to it, though. It can actually be approximated by the normal distribution, and also by the Poisson distribution.
Is your distribution a discrete uniform distribution?