How to find the variance of the sample mean, given the population mean, standard deviation and sample size - statistics

Problem Statement : Given a sample of size n = 60 taken from a continuous population distribution with mean 56 and standard deviation 25, find the variance of the sample mean.
I tried the code below, but as expected, there is no fixed answer, and my answer is marked incorrect.
import numpy as np
import scipy.stats

# Draw one random sample of size 60 and take its variance
dist = scipy.stats.norm(loc=56, scale=25)
sample = dist.rvs(60)
x = np.var(sample)

import math

Err = math.sqrt(25/60)
dist = scipy.stats.norm(loc=56, scale=Err)
Variance = dist.var()  # np.variance does not exist; a frozen scipy distribution exposes .var()
It's something around 10.52.
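For what it's worth, no simulation is needed here: the variance of the sample mean is sigma^2 / n. A minimal sketch, assuming the given standard deviation of 25 and sample size of 60:
sigma = 25   # population standard deviation (given)
n = 60       # sample size (given)

# Variance of the sample mean: Var(sample mean) = sigma**2 / n
var_of_sample_mean = sigma**2 / n
print(var_of_sample_mean)   # 625 / 60 = 10.4166...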

Related

How to manually find residual/regression SS and Standard Error

I have been using the Analysis ToolPak add-in's built-in Regression tool to determine whether a stock's historical returns (Y-Range), regressed against the VFINX (X-Range), are statistically significant, as seen below:
In this case, the F value is < 4 and t-stat is < 1.67, implying that this stock should be skipped over.
I am interested in determining the steps this tool takes, given the inputs, to calculate these statistics, so that I can compute them manually within VBA.
Here is what I know so far:
F = Regression MS / Residual MS
MS = SS / df
Total SS = DEVSQ(Y-Range)
Regression df = 1
Residual df = Count(Y-Range) - 2
Total df = Count(Y-Range) - 1
t-stat = Beta / StandardError
beta = Slope(Y-Range, X-Range)
The calculations I am missing are:
Regression SS = ??
Residual SS = ??
StandardError = ??
I am hoping that there is a relatively easy function/formula that I can use to calculate these missing values, as I am looking to keep the process lean and fast.
After doing a bit more digging I found the LINEST function that handles this perfectly.
For the F Value:
=INDEX(LINEST(Y-Range, X-Range, TRUE, TRUE), 4, 1)
For the t-stat:
=INDEX(LINEST(Y-Range, X-Range, TRUE, TRUE), 1, 1) / INDEX(LINEST(Y-Range, X-Range, TRUE, TRUE), 2, 1)
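The missing quantities can also be written out directly: Residual SS is the sum of squared residuals, Regression SS = Total SS - Residual SS, and the slope's StandardError = sqrt(Residual MS / DEVSQ(X-Range)). Here is a small Python sketch of the same arithmetic (the x/y arrays are made-up stand-ins for X-Range and Y-Range), which may help as a reference when porting the calculation to VBA:
import numpy as np

# Hypothetical stand-ins for Y-Range and X-Range
y = np.array([1.2, 0.8, 1.5, 2.0, 1.7, 2.4])
x = np.array([1.0, 0.9, 1.4, 1.9, 1.8, 2.3])
n = len(y)

beta, intercept = np.polyfit(x, y, 1)          # SLOPE and INTERCEPT
y_hat = beta * x + intercept

total_ss = np.sum((y - y.mean())**2)           # DEVSQ(Y-Range)
residual_ss = np.sum((y - y_hat)**2)           # Residual SS
regression_ss = total_ss - residual_ss         # Regression SS

regression_ms = regression_ss / 1              # Regression df = 1
residual_ms = residual_ss / (n - 2)            # Residual df = n - 2

f_value = regression_ms / residual_ms
standard_error = np.sqrt(residual_ms / np.sum((x - x.mean())**2))  # SE of the slope
t_stat = beta / standard_error
print(f_value, t_stat)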

How to calculate the standard deviation from a histogram? (Python, Matplotlib)

Let's say I have a data set and used matplotlib to draw a histogram of said data set.
n, bins, patches = plt.hist(data, density=True)
How do I calculate the standard deviation, using the n and bins values that hist() returns? I'm currently doing this to calculate the mean:
s = 0
for i in range(len(n)):
    s += n[i] * ((bins[i] + bins[i+1]) / 2)
mean = s / numpy.sum(n)
which seems to work fine as I get pretty accurate results. However, if I try to calculate the standard deviation like this:
t = 0
for i in range(len(n)):
    t += (bins[i] - mean)**2
std = np.sqrt(t / numpy.sum(n))
my results are way off from what numpy.std(data) returns. Replacing the left bin limits with the central point of each bin doesn't change this either. I have the feeling that the problem is that the n and bins values don't actually contain any information on how the individual data points are distributed within each bin, but the assignment I'm working on clearly demands that I use them to calculate the standard deviation.
You haven't weighted the contribution of each bin with n[i]. Change the increment of t to
t += n[i]*(bins[i] - mean)**2
By the way, you can simplify (and speed up) your calculation by using numpy.average with the weights argument.
Here's an example. First, generate some data to work with. We'll compute the sample mean, variance and standard deviation of the input before computing the histogram.
In [54]: x = np.random.normal(loc=10, scale=2, size=1000)
In [55]: x.mean()
Out[55]: 9.9760798903061847
In [56]: x.var()
Out[56]: 3.7673459904902025
In [57]: x.std()
Out[57]: 1.9409652213499866
I'll use numpy.histogram to compute the histogram:
In [58]: n, bins = np.histogram(x)
mids is the midpoints of the bins; it has the same length as n:
In [59]: mids = 0.5*(bins[1:] + bins[:-1])
The estimate of the mean is the weighted average of mids:
In [60]: mean = np.average(mids, weights=n)
In [61]: mean
Out[61]: 9.9763028267760312
In this case, it is pretty close to the mean of the original data.
The estimated variance is the weighted average of the squared difference from the mean:
In [62]: var = np.average((mids - mean)**2, weights=n)
In [63]: var
Out[63]: 3.8715035807387328
In [64]: np.sqrt(var)
Out[64]: 1.9676136767004677
That estimate is within 2% of the actual sample standard deviation.
The following answer is equivalent to Warren Weckesser's, but maybe more familiar to those who prefer to think of the mean as the expected value:
counts, bins = np.histogram(x)
mids = 0.5*(bins[1:] + bins[:-1])
probs = counts / np.sum(counts)
mean = np.sum(probs * mids)
sd = np.sqrt(np.sum(probs * (mids - mean)**2))
Do take note that in certain contexts you may want the unbiased sample variance, where the sum of weighted squared deviations is divided by N - 1 rather than N.
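As a rough sketch of that correction (reusing counts, mids and mean from the snippet above), the denominator N is simply replaced by N - 1; note this is still only an approximation, since the histogram hides where points fall within each bin:
import numpy as np

N = np.sum(counts)
sq_dev = counts * (mids - mean)**2
var_biased = np.sum(sq_dev) / N          # divides by N, as above
var_unbiased = np.sum(sq_dev) / (N - 1)  # Bessel-corrected, divides by N - 1
sd_unbiased = np.sqrt(var_unbiased)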

Bit Error Probability

I want to ask for help solving my problem. I want to compute the BER (bit error rate) for a constellation of nine points, as illustrated below. I used the computation for the SER and then converted it to BER, but the result was incorrect. Any site or suggestion, please?
Many thanks
Othman
My code is:
clear all
clc
SNR = 0:40;
SNRL = 10.^(SNR./10);
Eb=1;
sigma = sqrt(2*Eb./SNRL);
d2 = 0.3;
Pe = 14/9*erfc((d2)./sqrt(2*sigma.^2))+2/9*erfc((0)./sqrt(2*sigma.^2));
semilogy(SNR, Pe)
grid on
hold on

Calculating 95 % confidence interval for the mean in python

I need a little help. If I have 30 random samples with a mean of 52 and a variance of 30, how can I calculate the 95% confidence interval for the mean with an estimated and with a true variance of 30?
Here's how you can combine numpy and statsmodels to get started:
To produce normally distributed floats with a mean of 52 and a variance of 30 you can use numpy.random.normal with numbers = np.random.normal(loc=52, scale=np.sqrt(30), size=30) (note that scale is the standard deviation, so a variance of 30 means scale = sqrt(30)), where the parameters are:
Parameters
----------
loc : float
Mean ("centre") of the distribution.
scale : float
Standard deviation (spread or "width") of the distribution.
size : int or tuple of ints, optional
Output shape. If the given shape is, e.g., ``(m, n, k)``, then
``m * n * k`` samples are drawn. Default is None, in which case a
single value is returned.
And here's a 95% confidence interval of the mean using DescrStatsW.tconfint_mean:
import statsmodels.stats.api as sms
conf = sms.DescrStatsW(numbers).tconfint_mean()
conf
# output: a (lower, upper) tuple that varies with the random draw
EDIT - 1
That's not the whole story, though... Depending on your sample size, you may want the Z score rather than the t score used by sms.DescrStatsW(numbers).tconfint_mean() here, and I have a feeling it's not coincidental that the rule-of-thumb threshold is 30 and that you have 30 observations in your question. Z vs. t also depends on whether you know the population standard deviation or have to rely on an estimate from your sample, and the two intervals are calculated differently. Take a look here. If this is something you'd like me to explain and demonstrate further, I'll gladly take another look at it over the weekend.
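As a rough sketch of that distinction (not part of the original answer), here is a z interval for the case where the variance of 30 is taken as known, and a t interval for the case where it is estimated from the sample, using the question's n = 30 and sample mean of 52:
import numpy as np
from scipy import stats

n = 30
xbar = 52
var = 30
se = np.sqrt(var / n)               # standard error of the mean

# True (known) variance -> normal (z) critical value
z_ci = stats.norm.interval(0.95, loc=xbar, scale=se)

# Variance estimated from the sample -> t distribution with n - 1 df
t_ci = stats.t.interval(0.95, df=n - 1, loc=xbar, scale=se)
print(z_ci, t_ci)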

lsqcurvefit when expecting small coefficients

I've generated a plot of the attenuation seen in an electrical trace up to a frequency of 14e10 rad/s. The ydata ranges from approximately 1-10 Np/m. I'm trying to generate a fit of the form
y = A*sqrt(x) + B*x + C*x^2.
I expect A to be around 10^-6, B to be around 10^-11, and C to be around 10^-23. However, the smallest coefficient lsqcurvefit will return is 10^-7. Also, it will only return a nonzero coefficient for A, while returning 0 for B and C. The fit actually looks really good; however, the physics indicates that B and C should not be 0.
Here is how I'm calling the function
% measurement estimate
x_alpha = [1e-6 1e-11 1e-23];
lb = [1e-7, 1e-13, 1e-25];
ub = [1e-3, 1e-6, 1e-15];
x_alpha = lsqcurvefit(@modelfun, x_alpha, omega, alpha_t, lb, ub)
Here is the model function
function [ yhat ] = modelfun( x, xdata )
    yhat = x(1)*xdata.^.5 + x(2)*xdata + x(3)*xdata.^2;
end
Is it possible to get lsqcurvefit to return such small coefficients? Is the error in rounding or is it something else? Any ways I can change the tolerance to see a fit closer to what I expect?
Found a stackoverflow page that seems to address this issue!
fit using lsqcurvefit
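One common workaround for this kind of scaling problem (just a sketch, not taken from the linked page) is to fit dimensionless coefficients of order 1 and fold the expected orders of magnitude into the model, then multiply them back out at the end. Shown here in Python with scipy.optimize.curve_fit and synthetic data, since the original omega/alpha_t measurements aren't available; the same rescaling idea applies to lsqcurvefit:
import numpy as np
from scipy.optimize import curve_fit

# Synthetic stand-ins for omega and alpha_t (the real measurement is not shown)
omega = np.linspace(1e8, 14e10, 200)
alpha_t = 1e-6*np.sqrt(omega) + 1e-11*omega + 1e-23*omega**2

scales = np.array([1e-6, 1e-11, 1e-23])   # expected orders of magnitude of A, B, C

def model_scaled(x, a, b, c):
    # a, b, c are O(1); the expected scales are folded into the model
    A, B, C = np.array([a, b, c]) * scales
    return A*np.sqrt(x) + B*x + C*x**2

popt, _ = curve_fit(model_scaled, omega, alpha_t, p0=[1.0, 1.0, 1.0])
A, B, C = popt * scales                    # coefficients back in their original units
print(A, B, C)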
