In machine learning, and especially in the turning point detection problem, it is important to have the best estimate of the probability density function (PDF) of future samples. Let's say we have ${x_1, \cdots, x_n}$ as a time series, possibly a Gaussian one, with $f_{X_1, \cdots, X_n}(x_1, \cdots, x_n)$ as its joint distribution. We want to estimate the predictive distribution of the next $k$ uncertain samples given the previous $n$ samples, with or without the ARMA/ARIMA assumptions, i.e., we are looking for $f_{X_{n+1}, \cdots, X_{n+k}}(x_{n+1}, \cdots, x_{n+k} \mid x_1, \cdots, x_n)$. Under the normality assumption, what would be the mean and variance of the predictive distribution, and how could we estimate them from the known $n$ samples?
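For reference, when the samples are jointly Gaussian the predictive distribution has a closed form; this is a standard multivariate-normal fact, not specific to ARMA/ARIMA. Partition the vector as $X_a = (X_1, \cdots, X_n)$ and $X_b = (X_{n+1}, \cdots, X_{n+k})$ with mean blocks $\mu_a, \mu_b$ and covariance blocks $\Sigma_{aa}, \Sigma_{ab}, \Sigma_{ba}, \Sigma_{bb}$; then

$$X_b \mid X_a = x_a \;\sim\; \mathcal{N}\!\big(\mu_b + \Sigma_{ba}\Sigma_{aa}^{-1}(x_a - \mu_a),\;\; \Sigma_{bb} - \Sigma_{ba}\Sigma_{aa}^{-1}\Sigma_{ab}\big).$$

In practice the mean and covariance blocks are estimated from the observed $n$ samples, e.g., by the sample mean and sample autocovariances under a stationarity assumption.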
I came across the following sentence referring to the usual Extended Kalman Filter, and I'm trying to make sense of it:
States before the current state are approximated with a normal distribution
What does it mean?
The modeled quantity has uncertainty because it is derived from measurements. You can't be sure it's exactly value X; that's why the quantity is represented by a probability density function (or a cumulative distribution function, which is the integral of the density).
A probability distribution can look very arbitrary, but there are many "simple" distributions that approximate the real world. You've heard of the normal distribution (Gaussian), the uniform distribution (rectangle), and so on.
The normal distribution (parameters mu and sigma) occurs everywhere in nature, so it's likely that your measurements already fit a normal distribution very well.
"A Gaussian" implies that your distribution isn't a mixture (sum) of Gaussians but a single Gaussian.
Say I just have random samples from the distribution and no other data, e.g. a list of numbers: [1, 15, 30, 4, etc.]. What's the best way to estimate the distribution so I can draw more samples from it in PyTorch?
I am currently assuming that all samples come from a normal distribution, and I just use the mean and std of the samples to build it and draw from it. The underlying distribution, however, could be anything.
import torch
from torch.distributions import Normal

samples = torch.Tensor([1, 2, 3, 4, 3, 2, 2, 1])
Normal(samples.mean(), samples.std()).sample()
If you have enough samples (and preferably the sample dimension is higher than 1), you could model the distribution using a Variational Autoencoder or a Generative Adversarial Network (though I would stick with the first approach, as it's simpler).
Basically, after correct implementation and training, you would get a deterministic decoder able to map a hidden code you pass it (say, a vector of size 10 drawn from a normal distribution) into a value from your target distribution.
Note that it might not be reliable at all, and it would be even harder if your samples are 1D only.
The best way depends on what you want to achieve. If you don't know the underlying distribution, you need to make assumptions about it and then fit a suitable distribution (one you know how to sample from) to your samples. You may start with something simple like a mixture of Gaussians (several normal distributions with different weightings).
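As a concrete sketch of the mixture-of-Gaussians suggestion, scikit-learn's `GaussianMixture` can fit the components and then generate new draws (this assumes scikit-learn is available; the data and component count are made up for illustration):

```python
import numpy as np
from sklearn.mixture import GaussianMixture

# 1D observations; GaussianMixture expects shape (n_samples, n_features).
obs = np.array([1.0, 15.0, 30.0, 4.0, 2.0, 28.0, 16.0, 3.0]).reshape(-1, 1)

# Fit a two-component mixture of Gaussians to the observations.
gm = GaussianMixture(n_components=2, random_state=0).fit(obs)

# Draw new samples from the fitted mixture; sample() also returns
# the component index of each draw.
new_samples, _ = gm.sample(100)
```

With so few observations the fit is only illustrative; in practice you would also compare different numbers of components (e.g., by BIC).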
Another way is to define a discrete distribution over the values you have. You give each value the same probability, say $p(x) = 1/N$. To sample from it, you simply draw a random integer from $[0, N)$ that points to one of your samples.
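In PyTorch, sampling from this empirical distribution reduces to indexing the sample tensor with uniform random integers; a minimal sketch:

```python
import torch

samples = torch.tensor([1.0, 15.0, 30.0, 4.0])

# p(x) = 1/N for each observed value: drawing from the empirical
# distribution is just drawing a uniform random index into the samples.
idx = torch.randint(high=len(samples), size=(1000,))
draws = samples[idx]
```

This never produces values outside the observed set; if you need draws between observations, a fitted continuous distribution (as above) is the alternative.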
I am studying a random walk with drift and an absorbing boundary. The system is well understood theoretically. My task is to simulate it numerically, in particular to generate random numbers from this distribution (see the formula): the distribution of the coordinate $x$ at time $t$ given the starting point $x_0$, the noise intensity $\sigma$, and the drift $\mu$. How can I generate random numbers from this distribution? I can of course use inverse transform sampling, but it is slow. Maybe I can make use of the fact that the probability density function is the difference of two Gaussian functions? Can I somehow relate my distribution to the normal distribution?
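Yes: if the density is a Gaussian density minus a non-negative term (as with method-of-images densities for an absorbing boundary), the leading Gaussian is an envelope, so rejection sampling applies: propose from the Gaussian and accept with probability $f(x)/g(x)$. A sketch in NumPy; the example density at the end is a made-up illustration (start $x_0 = 1$, $\mu = 0$, $\sigma = 1$, $t = 1$, boundary at $0$), not the exact formula from the question:

```python
import numpy as np

def gauss_pdf(x, mu, sigma):
    return np.exp(-0.5 * ((x - mu) / sigma) ** 2) / (sigma * np.sqrt(2.0 * np.pi))

def rejection_sample(f, mu_env, sigma_env, size, seed=None):
    """Sample from f, assuming f(x) <= Normal(mu_env, sigma_env) pdf everywhere.

    Valid whenever f is the envelope Gaussian minus a non-negative term,
    as with method-of-images densities for an absorbing boundary.
    """
    rng = np.random.default_rng(seed)
    out = []
    while len(out) < size:
        x = rng.normal(mu_env, sigma_env, size)           # propose from envelope
        u = rng.uniform(0.0, gauss_pdf(x, mu_env, sigma_env))
        out.extend(x[u < f(x)].tolist())                  # accept w.p. f(x)/g(x)
    return np.array(out[:size])

# Illustrative density: difference of two Gaussians, zero beyond the boundary.
def f(x):
    return np.where(x > 0, gauss_pdf(x, 1.0, 1.0) - gauss_pdf(x, -1.0, 1.0), 0.0)

draws = rejection_sample(f, mu_env=1.0, sigma_env=1.0, size=20000, seed=0)
```

Note that the difference of two Gaussians is a sub-density (its mass is the survival probability); conditioning on acceptance normalizes it automatically, so this draws the position given the walk has not yet been absorbed.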
Why do we choose SSE (sum of squared errors) for deciding the best-fit line instead of the sum of residuals or the sum of absolute residuals?
The purpose is to allow linear algebra to solve directly for the equation coefficients in regression. The other fitting targets you mention cannot be used in this way. Using derivative calculus, it was found that a fitting target of lowest sum of squared errors allows a direct, non-iterative solution to the problem of fitting experimental data to equations that are linear in their coefficients, such as standard polynomial equations.
James is right that the ability to formulate the estimates of regression coefficients as a form of linear algebra is one large advantage of the least squares estimate (minimizing SSE), but using the least squares estimate provides a few other useful properties.
With the least squares estimate you're minimizing the variance of the errors - which is often desired. This gives us the best linear unbiased estimator (BLUE) of the coefficients (given the Gauss–Markov assumptions are met). (Gauss-Markov assumptions and a proof showing why this formulation gives us the best linear unbiased estimates can be found here.)
With the least squares, you also end up with a unique solution (assuming you have more observations than estimated coefficients and no perfect multicollinearity).
As for using the sum of residuals, this wouldn't work well, since it would be minimized by making all the residuals negative.
But the sum of absolute residuals is used in some linear models where you may want the estimates to be more robust to outliers and aren't necessarily concerned with the variance of the residuals.
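The "direct, non-iterative solution" mentioned above is the normal equations: setting the derivative of the SSE with respect to the coefficients to zero gives $(X^\top X)\hat\beta = X^\top y$, which one linear solve answers exactly. A minimal sketch on simulated data (the true coefficients 2 and 3 are made up for illustration):

```python
import numpy as np

rng = np.random.default_rng(0)
n = 200

# Design matrix with an intercept column, plus a noisy linear response.
X = np.column_stack([np.ones(n), rng.uniform(0.0, 10.0, n)])
y = X @ np.array([2.0, 3.0]) + rng.normal(0.0, 0.5, n)

# Minimizing SSE: d/d(beta) ||y - X beta||^2 = 0  =>  (X'X) beta = X'y.
beta_hat = np.linalg.solve(X.T @ X, X.T @ y)
```

By contrast, minimizing the sum of absolute residuals has no such closed form and requires linear programming or iterative methods.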
I am running an image-processing experiment in which I have a set of paper samples, and each sample has a set of lines. For each line in a paper sample, its strength, denoted by say 's', is calculated. For a given paper sample I have to find the variation among the strength values 's'. If the variation is above a certain limit, we have to discard that paper.
1) I started with the standard deviation of the values, but the problem I am facing is that the order of magnitude of s can differ from sample to sample (because of various properties of a line, like its length, sharpness, darkness, etc.), and the calculated standard deviations also differ a lot in magnitude. So I can't really use this method across different samples.
Is there any way I can find a suitable limit that is applicable to all samples?
Since I don't have any history of how the strength values should behave (in a sample where the strength values are of a larger order of magnitude, more variation could be tolerated, whereas in a sample where the magnitude is smaller, there should be less variation), I am thinking that I first need to find a way of baselining the variation across different samples. I don't know what approaches I could try to get started.
Please note that I have to measure the variation between lines within a sample, while the limit should be applicable to any good sample.
Please help me out.
You seem to have a set of samples. Then, for each sample you want to do two things: 1) compute a descriptive metric and 2) perform outlier detection. Both of these are vast subjects that require some knowledge of the phenomenology and statistics of the underlying problem. However, below are some ideas to get you going.
Compute a metric
Median Absolute Deviation. If your sample strength s has values that can jump by an order of magnitude across a sample, then it is understandable that the standard deviation was not a good metric: the standard deviation is notoriously sensitive to outliers. So, try a more robust estimate of the dispersion in your data. For example, the MAD estimate uses the median in the underlying computations, which is more robust to a large spread in the numbers.
Robust measures of scale. Read up on other robust measures like the Interquartile range.
Perform outlier detection
Thresholding. This is similar to what you are already doing, but you have to choose a suitable threshold for the metric computed above, and it helps to make the threshold itself robust. You can compute a robust estimate of the metric values' mean (e.g., the median) and a robust estimate of their standard deviation (e.g., 1.4826 * MAD), then identify as outliers any metric values more than some number of robust standard deviations above the robust mean.
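The robust thresholding described above takes only a few lines (NumPy assumed; the 1.4826 factor makes the MAD a consistent estimator of the standard deviation for normally distributed data, and the example values are made up):

```python
import numpy as np

def robust_outlier_mask(values, n_devs=3.0):
    """Flag values more than n_devs robust standard deviations above the median."""
    values = np.asarray(values, dtype=float)
    center = np.median(values)                  # robust estimate of the mean
    mad = np.median(np.abs(values - center))    # median absolute deviation
    robust_std = 1.4826 * mad                   # consistent for Gaussian data
    return values > center + n_devs * robust_std

mask = robust_outlier_mask([10.0, 11.0, 9.0, 10.0, 12.0, 10.0, 50.0])
```

Here the single large value is flagged while the ordinary spread of the rest barely moves the threshold, which is exactly where the plain standard deviation fails.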
Histogram. Another simple method is to histogram the metrics you computed in step #1. This is non-parametric, so it doesn't require you to model your data. You can histogram your metric values and then use the top 1% (or some other percentile) as your threshold limit.
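The top-percentile cutoff is a one-liner with NumPy (the 99th percentile corresponds to the top 1%; both the percentile and the simulated metrics are illustrative):

```python
import numpy as np

rng = np.random.default_rng(0)
metrics = rng.normal(10.0, 2.0, 1000)   # stand-in for the per-line metrics

# Values above the 99th percentile are treated as outliers.
threshold = np.percentile(metrics, 99)
outliers = metrics[metrics > threshold]
```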
Triangle Method. A neat and simple heuristic for thresholding is the triangle method, which performs a binary classification of a skewed distribution.
Anomaly detection. Read up on other outlier detection methods.