How to generate a random number from a weird distribution - statistics

I am studying a random walk with drift and an absorbing boundary. The system is well understood theoretically. My task is to simulate it numerically, in particular to generate random numbers from this distribution, see the formula. It is the distribution of the coordinate x at time t given the starting point x_0, the noise intensity \sigma and the drift \mu. The question is: how do I generate random numbers from this distribution? I can of course use inverse transform sampling, but it is slow. Maybe I can make use of the fact that the probability density function is the difference of two Gaussian functions? Can I somehow relate my distribution to the normal distribution?
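One exact alternative to inverse transform sampling is rejection sampling from a single Gaussian. The following is a minimal sketch, assuming the absorbing boundary sits at x = 0 with x_0 > 0 and that the formula referred to above has the standard method-of-images form, p(x,t) proportional to N(x; x_0 + \mu t, \sigma^2 t) - exp(-2 \mu x_0 / \sigma^2) N(x; -x_0 + \mu t, \sigma^2 t) for x > 0. Under that assumption the density factors as N(x; x_0 + \mu t, \sigma^2 t) (1 - exp(-2 x x_0 / (\sigma^2 t))), so a normal draw can be accepted with the probability given by the second factor. The function name and the choice to condition on survival are illustrative and not taken from the question.

```python
import math
import random


def sample_surviving_position(x0, mu, sigma, t, rng=random):
    """Draw x(t) for a walker that has not yet been absorbed at x = 0.

    Assumes the method-of-images density described in the lead-in.
    """
    mean = x0 + mu * t
    std = sigma * math.sqrt(t)
    while True:
        x = rng.gauss(mean, std)  # proposal: the "free" Gaussian
        if x <= 0.0:
            continue  # the density is zero at or beyond the boundary
        # Acceptance probability 1 - exp(-2 x x0 / (sigma^2 t)),
        # written with expm1 for accuracy when the exponent is small.
        accept = -math.expm1(-2.0 * x * x0 / (sigma ** 2 * t))
        if rng.random() < accept:
            return x


# Usage: 1000 draws for one (hypothetical) parameter set.
draws = [sample_surviving_position(1.0, -0.2, 1.0, 2.0) for _ in range(1000)]
```

Each proposal costs one normal and one uniform draw, and the overall acceptance rate equals the survival probability, so this is efficient unless almost all walkers have been absorbed by time t.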

Related

Gaussian approximation of old states

I came across the following sentence referring to the usual Extended Kalman Filter and I'm trying to make sense of it:
States before the current state are approximated with a normal distribution
What does it mean?
The modeled quantity has uncertainty because it is derived from measurements: you can't be sure its value is exactly X. That's why the quantity is represented by a probability density function (or by a cumulative distribution function, which is the integral of the density).
A probability distribution can have an arbitrary shape, but there are many "simple" distributions that approximate the real world well. You've heard of the normal distribution (Gaussian), the uniform distribution (rectangle), ...
The normal distribution (parameters mu and sigma) occurs very often in nature, so it's likely that your measurements already fit a normal distribution quite well.
"A Gaussian" implies that your distribution is a single Gaussian, not a mixture (weighted sum) of Gaussians.

Why does additive noise need to be calibrated with sensitivity in differential privacy?

As a beginner in differential privacy, I would like to know why the variance of noise mechanisms needs to be calibrated with the sensitivity. What is the purpose of that? What happens if we don't calibrate it and add noise with an arbitrary variance?
Example scenario: with Laplacian noise, why is the scale parameter calibrated?
One way you can understand this intuitively is by imagining a function that returns either of two values, say 0 and a for some real a.
Suppose further that we have an additive noise mechanism, so that we end up with two probability distributions on the real line, as in the image from your attached link (this is an example of the setup above, with a=1):
In pure DP, we are interested in computing the maximum of the ratio of these distributions over the entire real line. As the calculation in your link shows, this ratio is bounded everywhere by e to the power of epsilon.
Now, imagine moving the centers of these distributions further apart, say by shifting the red distribution further to the right (i.e., increasing a). Clearly this will place less probability mass from the red distribution on the value 0, which is where the maximum of this ratio will be achieved. Therefore the ratio between these distributions at 0 will increase: a constant (the mass the blue distribution places on 0) is divided by a smaller number.
One way we could move the ratio back down would be to "fatten" the distributions out. This would correspond pictorially to moving the peaks of the distributions lower, and spreading the mass out over a wider area (since they have to integrate to 1, these two things are necessarily coupled for a distribution like the Laplace). Mathematically we would accomplish this by increasing the variance of the Laplace distribution (increasing b in the parameterization here), which has the effect of lowering the peak of the blue distribution at 0 and raising the mass the red distribution places at 0, thereby bringing the ratio between them back down (a smaller numerator and a larger denominator).
If you perform the calculations, you will find that the relationship between the scale parameter b and the sensitivity \Delta f of the function f is in fact linear; that is, setting b to be
b = \Delta f / \epsilon
fixes the maximum of this ratio to
e^{\epsilon}
which is precisely the definition of pure differential privacy.
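The bound can be checked numerically. This is a small sketch for the two-output example above (values 0 and a, so the sensitivity is a), with Laplace noise of scale b = a / \epsilon; the helper names and the grid limits are illustrative choices.

```python
import math


def laplace_pdf(x, loc, b):
    """Density of the Laplace distribution with location loc and scale b."""
    return math.exp(-abs(x - loc) / b) / (2.0 * b)


def max_density_ratio(a, b, lo=-50.0, hi=50.0, steps=10001):
    """Largest ratio p_0(x) / p_a(x) found on a grid over [lo, hi]."""
    worst = 0.0
    for i in range(steps):
        x = lo + (hi - lo) * i / (steps - 1)
        worst = max(worst, laplace_pdf(x, 0.0, b) / laplace_pdf(x, a, b))
    return worst


epsilon = 0.5
a = 1.0
calibrated = max_density_ratio(a, b=a / epsilon)         # about e^epsilon
uncalibrated = max_density_ratio(2 * a, b=a / epsilon)   # about e^(2 * epsilon)
```

Doubling the distance between the two outputs while keeping b fixed doubles the exponent of the worst-case ratio, which is why b has to grow in proportion to the sensitivity.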
If you add an arbitrary amount of random noise, you simply end up with random data. Sure, it preserves privacy, but it also destroys any real value in the data. The noise needs to be matched to how much a single record can change the output (the sensitivity), so that it preserves privacy without destroying the value of the data. That's what the calibration step does.

How can I build a good approximation of an unknown distribution when only having samples from it in order to draw from it in torch?

Say I just have random samples from the distribution and no other data, e.g. a list of numbers: [1,15,30,4,etc.]. What's the best way to estimate the distribution so that I can draw more samples from it in PyTorch?
I am currently assuming that all samples come from a normal distribution and just using the mean and std of the samples to build it and draw from it. The samples, however, can come from any distribution.
import torch
from torch.distributions import Normal
samples = torch.tensor([1., 2., 3., 4., 3., 2., 2., 1.])
Normal(samples.mean(), samples.std()).sample()
If you have enough samples (and preferably the sample dimension is higher than 1), you could model the distribution using a variational autoencoder or a generative adversarial network (though I would stick with the first approach as it's simpler).
Basically, after a correct implementation and training you would get a deterministic decoder that maps a hidden code you pass to it (say, a vector of size 10 drawn from a normal distribution) to a value from your target distribution.
Note that it might not be reliable at all, though, and it would be even harder if your samples are only 1D.
The best way depends on what you want to achieve. If you don't know the underlying distribution, you need to make assumptions about it and then fit a suitable distribution (that you know how to sample) to your samples. You may start with something simple like a Mixture of Gaussians (several normal distributions with different weightings).
Another way is to define a discrete distribution over the values you have. You will give each value the same probability, say p(x)=1/N. When you sample from it, you simply draw a random integer from [0,N) that points to one of your samples.
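Both suggestions above can be sketched with `torch.distributions`. This is an illustrative sketch, not a fitted model: the first function resamples the observed values with equal probability, and the second uses an equal-weight mixture with one Gaussian per observed value (a kernel density estimate) rather than a mixture fitted to the data. The function names and the `bandwidth` value are assumptions you would need to choose yourself.

```python
import torch
from torch.distributions import Categorical, MixtureSameFamily, Normal

samples = torch.tensor([1.0, 2.0, 3.0, 4.0, 3.0, 2.0, 2.0, 1.0])


def sample_empirical(data, n):
    """Discrete distribution with p(x) = 1/N over the observed values."""
    idx = torch.randint(0, data.numel(), (n,))  # random integers in [0, N)
    return data[idx]


def sample_kde(data, n, bandwidth):
    """Equal-weight mixture of Normal(data[i], bandwidth) components."""
    weights = Categorical(probs=torch.ones(data.numel()))
    components = Normal(data, torch.full_like(data, bandwidth))
    return MixtureSameFamily(weights, components).sample((n,))


new_discrete = sample_empirical(samples, 5)
new_smooth = sample_kde(samples, 5, bandwidth=0.5)
```

The empirical version can only ever return values that were already observed; the smoothed version can return values in between, at the cost of having to pick a bandwidth.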

What does N((1,0)^T, I) mean in relation to the Gaussian distribution?

Hi everyone, I am reading the book "The Elements of Statistical Learning" and came across the paragraph below, which I don't understand. (It explains how the training data was generated.)
We generated 10 means m_k from a bivariate Gaussian distribution N((1,0)^T, I) and labeled this class blue. Similarly, 10 more were drawn from N((0,1)^T, I) and labeled class orange. Then for each class we generated 100 observations as follows: for each observation, we picked an m_k at random with probability 1/10, and then generated a N(m_k, I/5), thus leading to a mixture of Gaussian clusters for each class.
I would appreciate it if you could explain the above paragraph, and especially N((0,1)^T, I).
By the way, the T in (0,1)^T stands for transpose.
Is this notation common in mathematics, or is it related to a specific computer language?
In the paragraph, N stands for the normal distribution; more specifically, in this case it stands for the multivariate normal distribution. It is not specific to any programming language. It comes from statistics and probability theory, but due to the numerous appealing properties and important applications of this probability distribution it is also widely used in programming, so you should be able to perform the described procedure in any language.
The part (0,1)^T is a vector of means. That is, we have in mind a random vector of length two, where the first element is 0 on average and the second one is 1 on average.
"I" stands for the 2x2 identity matrix, which plays the role of the variance-covariance matrix. That is, the variance of each component of the random vector is 1 (the diagonal entries), while the off-diagonal entries are 0 and correspond to the covariance between the two random variables.

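The generation procedure quoted in the question about N((0,1)^T, I) can be sketched with NumPy, where N(mean, covariance) maps directly onto `multivariate_normal(mean, cov)`. The seed and the helper name `make_class` are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)
identity = np.eye(2)  # the "I" in N(mean, I)


def make_class(center, n_means=10, n_obs=100):
    """Draw cluster means from N(center, I), then observations from N(m_k, I/5)."""
    means = rng.multivariate_normal(center, identity, size=n_means)
    picks = rng.integers(0, n_means, size=n_obs)  # each m_k with probability 1/10
    noise = rng.multivariate_normal(np.zeros(2), identity / 5, size=n_obs)
    return means, means[picks] + noise


blue_means, blue = make_class([1.0, 0.0])      # N((1,0)^T, I)
orange_means, orange = make_class([0.0, 1.0])  # N((0,1)^T, I)
```

Because the covariance is a multiple of the identity, the two coordinates are independent, so the same draws could also be made one coordinate at a time with a univariate normal generator.
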
Average and Measure of Spread of 3D Rotations

I've seen several similar questions, and have some ideas of what I might try, but I don't remember seeing anything about spread.
So: I am working on a measurement system, ultimately computer vision based.
I take N captures, and process them using a library which outputs pose estimations in the form of 4x4 affine transformation matrices of translation and rotation.
There's some noise in these pose estimations. The standard deviation in Euler angles for each axis of rotation is less than 2.5 degrees, so all orientations are pretty close to each other (for a case where all Euler angles are close to 0 or 180). Standard errors of less than 0.25 degrees are important to me. But I have already run into the problems endemic to Euler angles.
I want to average all these pretty-close-together pose estimates to get a single final pose estimate. And I also want to find some measure of spread so that I can estimate accuracy.
I'm aware that "average" isn't actually well defined for rotations.
(For the record, my code is in Numpy-heavy Python.)
I also may want to weight this average, since some captures (and some axes) are known to be more accurate than others.
My impression is that I can just take the mean and standard deviation of the translation vector, and that for the rotation I can convert to quaternions, take the mean, and re-normalize with OK accuracy since these quaternions are pretty close together.
I've also heard mentions of least-squares across all the quaternions, but most of my research into how this would be implemented has been a dismal failure.
Is this workable? Is there a reasonably well-defined measure of spread in this context?
Without more info about your geometry setup it is hard to answer. Anyway, for rotations I would:
1. Create 3 unit vectors
   x=(1,0,0),y=(0,1,0),z=(0,0,1)
   and apply the rotation to them, calling the output
   x(i),y(i),z(i)
   This is just applying matrix(i) with its position set to (0,0,0).
2. Do this for all the measurements you have.
3. Now average all the vectors:
   X=avg(x(1),x(2),...x(n))
   Y=avg(y(1),y(2),...y(n))
   Z=avg(z(1),z(2),...z(n))
4. Correct the vector values.
   Make each of X,Y,Z a unit vector again and take the axis that is closest to the rotation axis as the main axis. It stays as is; recompute the remaining two axes as cross products of the main axis and the other vector to ensure orthogonality. Beware of the multiplication order (the wrong order of operands will negate the output).
5. Construct the averaged transform matrix.
   See the anatomy of a transform matrix; as the origin you can use the average of the origins of the measured matrices.
Moakher wrote a paper that explains there are basically two ways to take an average of rotation matrices. The first is a weighted average followed by a projection back onto SO(3) using the SVD. The second is the Riemannian center of mass. That one is closer to the notion of a geometric mean, and it's more complicated to compute.
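The first of these (weighted average, then projection onto SO(3) with the SVD) can be sketched in NumPy as follows. As one possible measure of spread, the sketch also returns the angle of the relative rotation between each sample and the mean; that choice, and all function names, are assumptions for illustration and are not taken from the paper.

```python
import numpy as np


def project_to_so3(m):
    """Closest rotation matrix to m in the Frobenius norm, via the SVD."""
    u, _, vt = np.linalg.svd(m)
    d = 1.0 if np.linalg.det(u @ vt) >= 0 else -1.0  # avoid a reflection
    return u @ np.diag([1.0, 1.0, d]) @ vt


def mean_rotation(rotations, weights=None):
    """Weighted arithmetic mean of 3x3 rotations, projected back onto SO(3)."""
    rotations = np.asarray(rotations, dtype=float)
    if weights is None:
        w = np.ones(len(rotations))
    else:
        w = np.asarray(weights, dtype=float)
    return project_to_so3(np.tensordot(w / w.sum(), rotations, axes=1))


def angular_deviations(rotations, mean):
    """Rotation angle (radians) between each sample and the mean."""
    angles = []
    for r in rotations:
        cos_theta = (np.trace(mean.T @ r) - 1.0) / 2.0
        angles.append(np.arccos(np.clip(cos_theta, -1.0, 1.0)))
    return np.array(angles)


def rot_z(angle):
    """Helper for the usage example: rotation about the z axis."""
    c, s = np.cos(angle), np.sin(angle)
    return np.array([[c, -s, 0.0], [s, c, 0.0], [0.0, 0.0, 1.0]])


samples = [rot_z(0.02), rot_z(-0.02), rot_z(0.01)]
mean = mean_rotation(samples, weights=[1.0, 1.0, 2.0])
spread_deg = np.degrees(np.sqrt(np.mean(angular_deviations(samples, mean) ** 2)))
```

For 4x4 pose matrices, the rotation block would be `pose[:3, :3]` and the translation `pose[:3, 3]`, which can be averaged as an ordinary vector.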

Resources