How do you calculate the standard deviation for data which is mainly discrete but has a probability of being continuous? - statistics

I’m having some issue with calculating the standard deviation of a game. In the game you can get several different discrete scores. The scores have a fixed probability which is given. There is also a 5% chance that your score is randomly generated. You do not know the distribution of the random variable you are only given the mean and variance.
I’ve calculated the variance of the main game (ignoring the random variable) to be 5.2. The variance of the random variable is 137. From this I get a standard deviation of
sqrt(5.2 + 5% *137) = 3.47
Is this the correct method?

Related

bollinger bands versus statistics: is 1 standard deviation supposed to be split into two halves by it's mean? or top and bottom bands from mean?

I have a question about how bollinger bands are plotted in relation to statistics. In statistics, once a standard deviation is calculated from a mean of a set of numbers, shouldn't interpreting a 1 standard deviation be done so that you divide this number is half, and plot each half above and below the mean? By doing so, you can then determine whether or not it's data points fall within this 1 standard deviation.
Then, correct me if I am wrong, but aren't bollinger bands NOT calculated this way?? Instead, it takes a 1 standard deviation (if you have set it to 1) and plots the WHOLE value both above and below the mean (not splitting in two), thereby doubling the size of this standard-deviation?
Bollinger bands loosely state that that 68% of data falls within the 1st band, 1 standard deviation (loosely because the empirical rule in statistics requires that distributions be normal distributions which most often stock prices are not). However if this empirical rule is from statistics where 1 standard deviation is split in half, that means that applying a 68% probability in to an entire bollinger band is wrong. ??? is this correct??
You can modify the deviation multiples to suite your purpose, you can use 0.5 for example.

90% Confidence ellipsoid of 3 dimensinal data

i did get to know confidence ellipses during university (but that has been some semesters ago).
In my current project, I'd like to calculate a 3 dimensional confidence ellipse/ellipsoid in which I can set the probability of success to e.g. 90%. The center of the data is shifted from zero.
At the moment i am calculating the variance-covariance matrix of the dataset and from it its eigenvalues and eigenvectors which i then represent as an ellipsoid.
here, however, I am missing the information on the probability of success, which I cannot specify.
What is the correct way to calculate a confidence ellipsoid with e.g. 90% probability of success ?

Weighted Least Squares vs Monte Carlo comparison

I have an experimental dataset of the following values (y, x1, x2, w), where y is the measured quantity, x1 and x2 are the two independet variables and w is the error of each measurement.
The function I've chosen to describe my data is
These are my tasks:
1) Estimate values of bi
2) Estimate their standard errors
3) Calculate predicted values of f(x1, x2) on a mesh grid and estimate their confidence intervals
4) Calculate predicted values of
and definite integral
and their confidence intervals on a mesh grid
I have several questions:
1) Can all of my tasks be solved by weighted least squares? I've solved task 1-3 using WLS in matrix form by linearisation of the chosen function, but I have no idea, how to solve step №4.
2) I've performed Monte Carlo simulations to estimate bi and their s.e. I've generated perturbated values y'i from normal distribution with mean yi and standard deviation wi. I did this operation N=5000 times. For each perturbated dataset I estimated b'i, and from 5000 values of b'i I calculated mean values and their standard distribution. In the end, bi estimated from Monte-Carlo simulation coincide with those found by WLS. Am I correct, that standard deviations of b'i must be devided by № of Degrees of freedom to obtain standard error?
3) How to estimate confidence bands for predicted values of y using Monte-Carlo approach? I've generated a bunch of perturbated bi values from normal distribution using their BLUE as mean and standard deviations. Then I calculated lots of predicted values of f(x1,x2), found their means and standard deviations. Values of f(x1,x2) found by WLS and MC coincide, but s.d. found from MC are 5-45 order higher than those from WLS. What is the scaling factor that I'm missing here?
4) It seems that some of parameters b are not independent of each other, since there are only 2 independent variables. Should I take this into account in question 3, when I generate bi values? If yes, how can this be done? Should I use Chi-squared test to decide whether generated values of bi are suitable for further calculations, or should they be rejected?
In fact, I not only want to solve tasks I've mentioned earlier, but also I want to compare the two methods for regression analysys. I would appreciate any help and suggestions!

Convert GMM-UBM scores to equicalent accuracy percent

I have constructed a GMM-UBM model for the speaker recognition purpose. The output of models adapted for each speaker some scores calculated by log likelihood ratio. Now I want to convert these likelihood scores to equivalent number between 0 and 100. Can anybody guide me please?
There is no straightforward formula. You can do simple things like
prob = exp(logratio_score)
but those might not reflect the true distribution of your data. The computed probability percentage of your samples will not be uniformly distributed.
Ideally you need to take a large dataset and collect statistics on what acceptance/rejection rate do you have for what score. Then once you build a histogram you can normalize the score difference by that spectrogram to make sure that 30% of your subjects are accepted if you see the certain score difference. That normalization will allow you to create uniformly distributed probability percentages. See for example How to calculate the confidence intervals for likelihood ratios from a 2x2 table in the presence of cells with zeroes
This problem is rarely solved in speaker identification systems because confidence intervals is not what you want actually want to display. You need a simple accept/reject decision and for that you need to know the amount of false rejects and accept rate. So it is enough to find just a threshold, not build the whole distribution.

How do I calculate the standard deviation between weighted measurements?

I have several weighted values for which I am taking a weighted average. I want to calculate a weighted standard deviation using the weighted values and weighted average. How would I modify the typical standard deviation to include weights on each measurement?
This is the standard deviation formula I am using.
When I simply use each weighted value for 'x' and the weighted average for '\bar{x}', the result seems smaller than it should be.
I just found this wikipedia page discussing data of equal significance vs weighted data. The correct way to calculate the biased weighted estimator of variance is
,
though the following, on-the-fly implementation, is more efficient computationally as it does not require calculating the weighted average before looping over the sum on the weighted differences squared
.
Despite my skepticism, I tried both and got the exact same results.
Note, be sure to use the weighted average
.

Resources