Find the period in a random distribution - statistics

I have objects which are randomly distributed along the X axis. The objects have a periodic distribution with period m, with a slight variation of their position around the multiples of m.
The graph here shows a distribution for m=100.
Is there a way to calculate m using the statistics of the distribution?
Thanks!

Do you know the error distribution? If so (for example, a zero-mean Gaussian with variance \sigma^2), you can calculate the likelihood of the data as a function of the unknown period m. Once you can do this, you can solve an optimization problem to find the period with maximum likelihood.
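To make that concrete, here is a minimal sketch under the zero-mean Gaussian error model: for each candidate m, the residuals are the distances of the positions from their nearest multiple of m, and the log-likelihood is (up to a constant) minus the sum of squared residuals. The function name, grid, and noise level are illustrative:

```python
import numpy as np

def estimate_period(positions, m_grid, sigma=1.0):
    """Grid search for the period m that maximizes the Gaussian likelihood.

    Assumes each position is an integer multiple of m plus zero-mean
    Gaussian noise with standard deviation sigma.
    """
    best_m, best_ll = None, -np.inf
    for m in m_grid:
        # Residual of each position from its nearest multiple of m
        r = positions - m * np.round(positions / m)
        ll = -0.5 * np.sum((r / sigma) ** 2)  # log-likelihood up to a constant
        if ll > best_ll:
            best_m, best_ll = m, ll
    return best_m

# Synthetic data with true period m = 100 and noise sigma = 2
rng = np.random.default_rng(0)
positions = 100 * rng.integers(1, 50, size=200) + rng.normal(0, 2, size=200)
m_hat = estimate_period(positions, np.arange(80, 121, 0.5), sigma=2.0)
```

One caveat: integer divisors of the true period also give small residuals, so restrict the grid to a plausible range around the expected m.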

Related

How to generate a random number from a weird distribution

I am studying a random walk with drift and an absorbing boundary. The system is theoretically well understood. My task is to simulate it numerically, in particular to generate random numbers from this distribution, see the formula. It is the distribution of the coordinate x at time t given the starting point x_0, the noise intensity \sigma, and the drift \mu. How can I generate random numbers from this distribution? I can of course use inverse transform sampling, but it is slow. Maybe I can make use of the fact that the probability density function is the difference of two Gaussian functions? Can I somehow relate my distribution to the normal distribution?
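The formula itself is elided above, so here is only a generic sketch of how inverse transform sampling can be made fast: tabulate the CDF once on a grid and invert it for all draws at once with interpolation. The standard normal pdf below is just a stand-in for the actual density:

```python
import numpy as np

def sample_from_pdf(pdf, lo, hi, n, grid=10_000, rng=None):
    """Vectorized inverse-transform sampling from an arbitrary 1-D pdf.

    Tabulates the CDF on a grid once, then maps uniform draws through
    the inverse CDF by interpolation (fast even for very large n).
    """
    rng = np.random.default_rng() if rng is None else rng
    x = np.linspace(lo, hi, grid)
    cdf = np.cumsum(pdf(x))   # unnormalized CDF on the grid
    cdf /= cdf[-1]            # normalize so the last value is 1
    return np.interp(rng.random(n), cdf, x)

# Example: standard normal pdf as a stand-in for the actual density
pdf = lambda x: np.exp(-0.5 * x ** 2)
samples = sample_from_pdf(pdf, -6, 6, 100_000, rng=np.random.default_rng(0))
```

The per-draw cost is a single interpolation lookup, so generating millions of samples is cheap once the CDF table is built.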

Finding 1-hop, 2-hop, ..., k-hop neighbors in Python using Networkx

I am trying to find the 1-hop, 2-hop, and, if needed, k-hop neighbors of some specific nodes (let's say l nodes) in a graph using nx.single_source_dijkstra_path_length.
What is the time complexity for each step (1-hop, 2-hop, ...), and
is there a faster algorithm?
If you are looking at an unweighted graph, you can use breadth-first search; for small k the time complexity should be on average O(<k>^k), where <k> is the average degree of the graph.
If you want to calculate multiple distances in a weighted graph, you should rather use multi_source_dijkstra_path_length. I am not sure what runtime this algorithm has, but it is probably an improvement over multiple runs of Dijkstra, which is O(|V| log(|V|) + |E|) (depending on the implementation).
If you want to threshold the maximal distance in a weighted graph, the runtime probably depends on the weight distribution of your edges and on the minimal or average edge weight, which influence how many nodes must be visited before the threshold is reached.
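For the unweighted case, the BFS approach can be sketched with the standard library alone (for a NetworkX graph, nx.single_source_shortest_path_length(G, source, cutoff=k) returns the same mapping):

```python
from collections import deque

def k_hop_neighbors(adj, source, k):
    """BFS up to depth k: returns {node: hop distance} for all nodes
    within k hops of source (adjacency given as a dict of lists)."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        u = queue.popleft()
        if dist[u] == k:          # do not expand beyond k hops
            continue
        for v in adj.get(u, ()):
            if v not in dist:
                dist[v] = dist[u] + 1
                queue.append(v)
    return dist

# Small example graph: node 4 is 3 hops from node 0, so it is excluded
adj = {0: [1, 2], 1: [0, 3], 2: [0, 3], 3: [1, 2, 4], 4: [3]}
hops = k_hop_neighbors(adj, 0, 2)  # {0: 0, 1: 1, 2: 1, 3: 2}
```

This visits each node and edge within k hops at most once, which is where the O(<k>^k) average estimate above comes from.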

Check if numbers form bell curve (gauss distribution) Python 3

I've got files with irradiance data measured every minute 24 hours a day.
So if there is a day without any clouds in the sky, the data shows a nice continuous bell curve.
When looking for a day without any clouds in the data I always plotted month after month with gnuplot and checked for nice bell curves.
I was wondering if there is a Python way to check whether the irradiance measurements form a continuous bell curve.
Don't know if the question is too vague but I'm simply looking for some ideas on that quest :-)
For a normal distribution, there are normality tests.
In short, we abuse some knowledge we have of what normal distributions look like to identify them.
The kurtosis of any normal distribution is 3 (Pearson's definition), so compute the kurtosis of your data and check that it is close to 3.
The skewness of a normal distribution is zero, so your data should have a skewness close to zero.
More generally, you could compute a reference distribution and use a Bregman divergence to assess the difference (divergence) between the distributions. Bin your data, create a histogram, and start with the Jensen-Shannon divergence.
With the divergence approach, you can compare to an arbitrary distribution. You might record a thousand sunny days and check if the divergence between the sunny day and your measured day is below some threshold.
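As a quick sketch of the moment checks above (note that SciPy's kurtosis() returns the excess kurtosis by default, which is 0 for a normal distribution; fisher=False gives the value 3 quoted above):

```python
import numpy as np
from scipy import stats

# Synthetic stand-in for a cloudless-day measurement series
rng = np.random.default_rng(1)
data = rng.normal(loc=5.0, scale=2.0, size=100_000)

# SciPy returns *excess* kurtosis by default (0 for a normal distribution);
# fisher=False switches to Pearson's definition (3 for a normal distribution).
kurt = stats.kurtosis(data, fisher=False)
skewness = stats.skew(data)
```

For real irradiance data, you would compare kurt against 3 and skewness against 0 within some tolerance chosen from your noise level.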
Just to complement the given answer with a code example: one can use a Kolmogorov-Smirnov test to obtain a measure for the "distance" between two distributions. SciPy offers a neat interface for this, called kstest:
from scipy import stats
import numpy as np
data = np.random.normal(size=100) # Our (synthetic) dataset
D, p = stats.kstest(data, "norm") # Perform a (two-sided) Kolmogorov-Smirnov test
In the above example, D denotes the distance between our data and a Gaussian normal (norm) distribution (smaller is better), and p denotes the corresponding p-value. Other distributions can be similarly tested by substituting norm with those implemented in scipy.stats.

Determine Uncertainty in Peak Value of Spectrum (Standard Error or Parameter Error)

I want to extract the position of a peak from a spectrum (energy spectrum of scattered photons). To do so, I am using scipy.optimize.curve_fit to fit a Gaussian to the region of the spectrum that resembles the Gaussian.
How do I find the uncertainty of the peak value? The peak value itself will be given by the result for the mean parameter from the Gaussian regression.
There are two things that came to my mind:
I get covariance values from the minimisation routine from which I get the error on the mean parameter.
Also, I could think about using the sigma of the Gaussian to get the error of the mean.
My thinking is that using the error on the mean parameter cannot be the wrong way to go. I would also wager that the sigma of the Gaussian does not really tell us the uncertainty with which we know the peak value: it describes the shape of the distribution, not the uncertainty in the peak position (which, for simplicity, we assume to have a true, sharply defined value).
(This is a repost of a question I originally posted on stats.stackexchange where I did not get any answers after 2 days.)
The peak value is the mean of the Gaussian distribution, so the standard error of the mean parameter gives the uncertainty of the peak. The sigma parameter describes the width of the peak and has its own uncertainty. If you are measuring a wide peak and took a good measurement, you would get a large sigma but a low peak uncertainty (or standard error).
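A sketch of the covariance-matrix route with a synthetic spectrum (all numbers, including the peak position 661.7, are made up for illustration): curve_fit returns pcov, and the square roots of its diagonal give the 1-sigma uncertainties of the fitted parameters, including the mean:

```python
import numpy as np
from scipy.optimize import curve_fit

def gaussian(x, a, mu, sigma):
    return a * np.exp(-0.5 * ((x - mu) / sigma) ** 2)

# Synthetic spectrum: Gaussian peak at 661.7 plus measurement noise
rng = np.random.default_rng(2)
x = np.linspace(600, 720, 200)
y = gaussian(x, 100.0, 661.7, 10.0) + rng.normal(0, 2.0, x.size)

popt, pcov = curve_fit(gaussian, x, y, p0=[90.0, 660.0, 8.0])
perr = np.sqrt(np.diag(pcov))       # 1-sigma parameter uncertainties
peak, peak_err = popt[1], perr[1]   # peak position and its uncertainty
```

Here peak_err quantifies how well the peak position is known; popt[2] (the fitted sigma) describes the peak's width and is a separate quantity, as the answer above points out.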

Probability of selecting an element from a set

The expected probability of randomly selecting an element from a set of n elements is P=1.0/n .
Suppose I check P using an unbiased method sufficiently many times. What is the distribution of P? Clearly P is not normally distributed, since it cannot be negative. May I then correctly assume that P is gamma distributed? And if so, what are the parameters of this distribution?
Histogram of probabilities of selecting an element from 100-element set for 1000 times is shown here.
Is there any way to convert this to a standard distribution?
Now suppose that the observed probability of selecting the given element was P* (P* != P). How can I estimate whether the bias is statistically significant?
EDIT: This is not a homework. I'm doing a hobby project and I need this piece of statistics for it. I've done my last homework ~10 years ago:-)
With repetitions, your distribution will be binomial. Let X be the number of times you select some fixed object in M total selections; then
P{ X = x } = ( M choose x ) * (1/N)^x * ((N-1)/N)^(M-x)
You may find this difficult to compute for large M. It turns out that for sufficiently large M, this is well approximated by a normal distribution (central limit theorem).
In that case, P{X = x} is given approximately by a normal distribution with mean M/N and variance M * (1/N) * ((N-1)/N).
This is a clear binomial distribution with p=1/(number of elements) and n=(number of trials).
To test whether the observed result differs significantly from the expected result, you can do the binomial test.
The dice examples on the two Wikipedia pages should give you some good guidance on how to formulate your problem. In your 100-element, 1000-trial example, that would be like rolling a 100-sided die 1000 times.
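With SciPy (1.7 or newer), the exact binomial test is stats.binomtest; the observed count of 18 below is a made-up example value:

```python
from scipy import stats

N, M = 100, 1000   # 100-element set, 1000 selections
k = 18             # observed number of times the element was selected

# Two-sided exact test of H0: the selection probability is p = 1/N
res = stats.binomtest(k, n=M, p=1 / N)
significant = res.pvalue < 0.05  # reject H0 at the 5% level?
```

Under H0 the expected count is M/N = 10, so observing 18 is unusual and yields a small p-value.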
As others have noted, you want the Binomial distribution. Your question seems to imply an interest in a continuous approximation to it, though. It can actually be approximated by the normal distribution, and also by the Poisson distribution.
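A quick numeric check of both approximations for the 100-element, 1000-trial example (the count 18 at which the pmfs are compared is arbitrary):

```python
import numpy as np
from scipy import stats

N, M = 100, 1000
p = 1 / N     # success probability per trial
x = 18        # an example count at which to compare the pmfs

exact = stats.binom.pmf(x, M, p)
pois = stats.poisson.pmf(x, M * p)                         # Poisson(M/N)
norm = stats.norm.pdf(x, loc=M * p, scale=np.sqrt(M * p * (1 - p)))
```

The Poisson approximation is typically tighter here because p is small; the normal approximation improves as M*p grows.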
Is your distribution a discrete uniform distribution?
