I tried to solve this question with n = 500, p = 0.9/100 and q = 1 - 0.9/100, but the z-score and mean I get are very large.
Paycheck Errors: The payroll department of a hospital has found that in one year, 0.9% of its paychecks are calculated incorrectly. The hospital has 500 employees.
(a) What is the probability that in one month’s records no paycheck errors are made?
(b) What is the probability that in one month’s records at least one paycheck error is made?
The Z transformation is a poor approximation to the binomial distribution when npq < 10. For your problem npq = 4.4595, so the Z approximation is a no-go.
You'd do better to calculate it exactly as a binomial using software, or approximate it as a Poisson with rate λ=np. Once you solve part (a), part (b) is just the complement.
I went ahead and calculated part (a) both ways. The Poisson approximation differs from the exact calculation by only 0.00022.
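For reference, here is a minimal sketch of both calculations in Python (assuming scipy is available), treating the monthly number of errors as Binomial(n = 500, p = 0.009) and using Poisson(λ = np = 4.5) as the approximation:

```python
from scipy.stats import binom, poisson

n, p = 500, 0.009              # 500 paychecks per month, 0.9% error rate
lam = n * p                    # Poisson rate, λ = np = 4.5

# (a) probability of no paycheck errors in a month
p_none_exact = binom.pmf(0, n, p)    # exact binomial: (1 - p)^n, about 0.0109
p_none_pois = poisson.pmf(0, lam)    # Poisson approximation: e^(-λ), about 0.0111

# (b) probability of at least one error is just the complement of (a)
p_at_least_one = 1 - p_none_exact

print(p_none_exact, p_none_pois, p_at_least_one)
print(abs(p_none_exact - p_none_pois))   # about 0.0002, matching the difference noted above
```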
You should use the binomial distribution formula rather than the sampling distribution (z-score) formula.
I'm wondering how I would calculate the sum of a cumulative probability in Excel.
I have attached the column of values that I am working with. Any help is appreciated.
I have tried finding the mean/average of the values and the standard deviation, then using the normal distribution function and summing those values, but it doesn't seem to produce the right value.
You can use NORM.DIST(x, mean, standard_dev, cumulative), which allows you to specify the mean and the standard deviation. If the last argument is TRUE, it returns the cumulative probability. This assumes, of course, that the distribution of your data corresponds to the Normal distribution. If you are not sure about that, you need to run a normality test first to confirm it (although many natural phenomena are approximately Normally distributed).
For the mean, you can use the AVERAGE function, and for the Standard Deviation STDEV.S.
So, in cell D4, put the following formula to calculate the cumulative probability for 0.25:
=NORM.DIST(D3,D1,D2, TRUE)
So if your data correspond to a Normal Distribution, then the cumulative probability for 0.25 will be 0.361494.
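If you want to sanity-check the spreadsheet outside Excel, the same calculation can be reproduced in Python (assuming scipy is available); the values list below is just a placeholder standing in for the attached column, and 0.25 is the cut-off used in the answer above:

```python
from statistics import mean, stdev   # stdev is the sample standard deviation, like STDEV.S
from scipy.stats import norm

values = [0.12, 0.31, 0.25, 0.18, 0.40]   # placeholder data; replace with your column
x = 0.25                                   # the value whose cumulative probability you want

mu = mean(values)       # equivalent of AVERAGE
sigma = stdev(values)   # equivalent of STDEV.S

# Equivalent of =NORM.DIST(x, mean, standard_dev, TRUE)
cumulative_p = norm.cdf(x, loc=mu, scale=sigma)
print(cumulative_p)
```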
A sample question from an actuarial science sample exam goes like this:
"Calculate the probability that there will be at least four months in which no accidents occur before the fourth month in which at least one accident occurs.
A company takes out an insurance policy to cover accidents that occur at its manufacturing plant. The probability that one or more accidents will occur during any given month is 3/5.
The number of accidents that occur in any given month is independent of the number of accidents that occur in all other months."
I interpreted this as: what is the probability P of no accidents in each of at least 3 months before one or more accidents occur in the following month?
I assumed a geometric distribution and calculated it two different ways, getting the same answer both times.
Given: "event" = one or more accidents in a month,
p = P(event) = 3/5; q = P(no event) = 1 - p = 2/5.
The first event occurs after 3 or more months of no events: P = q^3 · p · Σ_{k=0}^{∞} q^k = q^3 · p · 1/(1 - q) = q^3 = (2/5)^3 = 0.064.
Alternatively, P = 1 - P(one or more accidents occur in at least one of the first three months), which gives the same answer: 0.064.
But 0.064 is not among the answer choices.
The exam's published solution uses the negative binomial distribution, as follows:
"Solution: D
If a month with one or more accidents is regarded as success and k = the number of failures before the fourth success, then k follows a negative binomial distribution and the requested probability is:
Alternatively the solution is
which can be derived directly or by regarding the problem as a negative binomial distribution with
success taken as a month with no accidents
k = the number of failures before the fourth success, and calculating"
So my question is: how do I infer that the correct probability distribution to consider is the negative binomial? In my reading of the question, it is the first "success", not the fourth "success", that occurs after three failures, hence a geometric distribution (or, equivalently, a negative binomial distribution with r = 1).
What am I missing?
Thanks in advance.
I think they are asking for the probability of the event before the r-th success occurs, not the first. The whole point of the negative binomial distribution is to give the probabilities of the number of failures before the r-th success, whereas the geometric distribution is quite different: there you find the probability of the number of failures before the first success.
I hope my explanation was understandable; I just stumbled upon this.
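To make the contrast concrete, here is a small Python check of both readings (assuming scipy is available): the exam's reading asks for at least four no-accident months before the fourth month with an accident, while the geometric reading above stops at the first month with an accident.

```python
from scipy.stats import nbinom

p = 3 / 5   # P(a given month has one or more accidents) -- a "success" in the exam's reading

# Exam's reading: K = number of failure months (no accidents) before the 4th success.
# Requested probability: P(K >= 4) = 1 - P(K <= 3).
p_exam = 1 - nbinom.cdf(3, 4, p)
print(p_exam)        # roughly 0.2898

# Geometric reading (negative binomial with r = 1): failures before the 1st success.
p_geom = 1 - nbinom.cdf(2, 1, p)
print(p_geom)        # (2/5)^3 = 0.064, the answer computed above
```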
How do I do this sum: for each country, calculate the probability of death using an Excel formula?
Probability of death excluding death by natural causes:
=(D2-E2)/C2
Format as %.
Probability of death does not depend on citizenship or location - and it equals 100%, unfortunately.
The formula you are looking for is 1 (or =1), and you can format it as %.
The other answers are only guessing. It seems the correct question should somehow include the cause of death (otherwise it makes no sense to have "natural" and "disasters" separately) and some statement about the time interval.
Can we calculate the overall kth percentile if we have the kth percentile over each 1-minute window for the same time period?
The underlying data is not available. Only the kth percentile and the count of the underlying data points are available for each window.
Are there any existing algorithms available for this?
How approximate will the calculated kth percentile be?
No. If you have only one percentile (and count) for every time period, then you cannot reasonably estimate that same percentile for the entire time period.
This is because percentiles are only partial summary measures (like means): they don't implicitly tell you enough about the distribution above and below the measured value at each measurement time. There are a few exceptions to the above:
1. If the percentile that you have is the 50th percentile (i.e., the median), then you can do some extrapolation to the median of the whole period, but it's a bit sketchy and I'm not sure how bad the variance would be.
2. If all of your percentile measures are very close together (compared to the actual range of the measured population), then obviously you can use them as a reasonable estimate of the overall percentile.
3. If you can assume with high assurance that every minute's data is an independent sample from the exact same population distribution (i.e., there is no time-dependence), then you may be able to combine them, possibly even if the exact distribution is not fully known (it has parameters that are unknown, but still known to be fixed over the time period). Again, I am not sure what the valid formulas and variance calculations are for this.
4. If the distribution is known (or can be assumed) to be a specific function or shape with some unknown value or values, and time-dependence has a known role in that function, then you should be able to use weighting and time-adjustments to transform it into the same situation as #3 above. For instance, if each minute's distribution were an exponential distribution of the form pdf(x; t) = k(t)·e^(-k(t)·x), then I believe you could derive an overall percentile estimate by estimating the value of k(t) for each different minute t (there is a rough sketch of this idea after this answer).
Unfortunately I am not a professional statistician. I have a Math/CS background, enough to have some idea of what is mathematically possible or reasonable, but not enough to tell you exactly how to do it. If you think your situation falls into one of the above categories, then you might be able to take it to https://stats.stackexchange.com, but you will need to also provide the information I mentioned in those categories and/or detailed, specific information about what you are measuring and how you are measuring it.
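As a very rough illustration of point 4 above, here is a sketch in Python. It assumes, purely for illustration, that each minute's data is exponentially distributed with its own rate and that only a per-minute 95th percentile and count are available; all numbers are made up, and this is not a validated method.

```python
import math

q = 0.95   # the percentile reported per minute and wanted overall

# (p95 value, sample count) for each one-minute window -- made-up numbers
per_minute = [(0.120, 900), (0.150, 1100), (0.090, 800)]

# For an exponential distribution the q-th percentile is x_q = -ln(1 - q) / k,
# so each minute's rate k can be recovered from its observed percentile.
rates = [(-math.log(1 - q) / x_q, n) for x_q, n in per_minute]
total = sum(n for _, n in rates)

def mixture_cdf(x):
    # Count-weighted mixture of the per-minute exponential CDFs.
    return sum(n * (1 - math.exp(-k * x)) for k, n in rates) / total

# Solve mixture_cdf(x) = q by bisection to estimate the overall q-th percentile.
lo, hi = 0.0, 10 * max(x_q for x_q, _ in per_minute)
for _ in range(100):
    mid = (lo + hi) / 2
    if mixture_cdf(mid) < q:
        lo = mid
    else:
        hi = mid

print(f"Estimated overall {int(q * 100)}th percentile: {lo:.4f}")
```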
Based on statistical instinct, the error rate will be proportional to the standard deviation of the total set if you are building an approximation for a longer time span out of the discrete kth-percentile chunks. [Clarification may be needed to prove this.]
I have 2 columns and multiple rows of data in Excel. Each column represents an algorithm, and the values in the rows are the results of these algorithms with different parameters. I want to run a statistical significance test on these two algorithms in Excel. Can anyone suggest a function?
As a result, it would be nice to be able to state something like "Algorithm A performs 8% better than Algorithm B with 0.9 probability (or a 95% confidence interval)".
The Wikipedia article explains exactly what I need:
http://en.wikipedia.org/wiki/Statistical_significance
It seems like a very easy task, but I failed to find a suitable measurement function.
Any advice on a built-in Excel function or a function snippet is appreciated.
Thanks.
Edit:
After tharkun's comments, I realized I should clarify some points:
The results are simply real numbers between 1 and 100 (they are percentage values). Each row represents a different parameter, and the values in a row are each algorithm's result for that parameter. The results do not depend on each other.
When I take the average of all values for Algorithm A and Algorithm B, I see that the mean of Algorithm A's results is 10% higher than Algorithm B's. But I don't know whether this is statistically significant or not. In other words, maybe for one parameter Algorithm A scored 100 percent higher than Algorithm B while Algorithm B has higher scores for the rest, and the 10% difference in the averages comes from that one result alone.
And I want to do this calculation using just excel.
Thanks for the clarification. In that case you want to do an independent-samples t-test, meaning you want to compare the means of two independent data sets.
Excel has a function TTEST (T.TEST in newer versions); that's what you need.
For your example you should probably use two tails and type 2.
The formula outputs a probability value known as the alpha error probability (the p-value). This is the error you would make if you assumed the two datasets are different when they actually aren't. The lower the alpha error probability, the higher the chance that your sets really are different.
You should only accept that the two datasets differ if this value is lower than 0.01 (1%), or for critical outcomes even 0.001 or lower. You should also know that the t-test needs at least around 30 values per dataset to be reliable enough, and that the type 2 test assumes equal variances of the two datasets. If equal variances cannot be assumed, you should use the type 3 test.
http://depts.alverno.edu/nsmt/stats.htm
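If you want to double-check Excel's output outside the spreadsheet, the same independent two-sample t-test can be run in Python (assuming scipy is available); the two lists below are placeholders standing in for your two columns.

```python
from scipy.stats import ttest_ind

# Placeholder results for the two algorithms -- replace with your two columns
algorithm_a = [72.1, 68.4, 75.0, 70.3, 69.8, 74.2]
algorithm_b = [65.2, 61.9, 66.7, 63.5, 64.0, 62.8]

# Two-tailed test assuming equal variances (Excel's TTEST with tails=2, type=2)
t_stat, p_value = ttest_ind(algorithm_a, algorithm_b, equal_var=True)

# Welch's test for unequal variances (Excel's type=3)
t_welch, p_welch = ttest_ind(algorithm_a, algorithm_b, equal_var=False)

print(p_value, p_welch)   # small p-values suggest the difference in means is significant
```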