I have a hypothesis where I assume that respondents would ALWAYS choose answer A instead of answer B meaning I would have a probability of 1. How would I use a binomial test to prove significance?
A question from an actuarial science sample exam goes like this:
"Calculate the probability that there will be at least four months in which no accidents occur before the fourth month in which at least one accident occurs.
A company takes out an insurance policy to cover accidents that occur at its manufacturing plant. The probability that one or more accidents will occur during any given month is 3/5.
The number of accidents that occur in any given month is independent of the number of accidents that occur in all other months."
I interpreted this as asking for the probability (P) that at least 3 months pass with no accidents before the first month in which one or more accidents occur.
I assumed a geometric distribution and calculated it two different ways, getting the same answer both times:
Given: "event": "one or more accidents in a month"
p(event) = 3/5; q(non event) = 1-p = 2/5
One event occurs after 3 or more months of no events: P = q^3 * p * sum(k=0 to inf) q^k = q^3 * p * (1/(1-q)) = q^3 = (2/5)^3 = 0.064
P = 1 - Prob(one or more accidents occur in one or more of the first three months). Same answer: 0.064.
But 0.064 is not among the answer choices.
The exam offers its solution as using the negative binomial distribution as follows:
"Solution: D
If a month with one or more accidents is regarded as success and k = the number of failures before the fourth success, then k follows a negative binomial distribution and the requested probability is:
Alternatively the solution is
which can be derived directly or by regarding the problem as a negative binomial distribution with
success taken as a month with no accidents
k = the number of failures before the fourth success, and calculating"
So my question is: how do I infer that the correct probability distribution to consider is the negative binomial? In my reading of the question, it is the first "success", not the fourth "success", that occurs after three failures, hence the geometric distribution (or, equivalently, the NB distribution with parameters (1, p)).
What am I missing?
Thanks in advance.
I think they are asking for the probability of an event defined relative to the Rth success, not the first one. The question says "before the fourth month in which at least one accident occurs", so the accident-free months are counted up to the fourth success (R = 4). The negative binomial distribution models the number of failures that occur before the Rth success, whereas the geometric distribution is the special case R = 1, where you count the failures before the first success.
I hope my explanation was understandable, also I just stumbled upon this.
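To make the difference between the two readings concrete, here is a minimal sketch in Python. It assumes the standard "failures before the r-th success" form of the negative binomial; the helper name nb_pmf is made up for illustration and does not come from the exam solution.

```python
from math import comb

p = 3 / 5   # probability that a month has at least one accident ("success")
q = 1 - p   # probability that a month has no accidents ("failure")

def nb_pmf(k, r, p):
    """P(exactly k failures occur before the r-th success)."""
    return comb(k + r - 1, k) * p**r * (1 - p)**k

# Geometric reading from the question: at least 3 failures before the 1st success.
geometric_reading = q**3

# Negative binomial reading: at least 4 failures before the 4th success,
# computed as 1 - P(K <= 3).
nb_reading = 1 - sum(nb_pmf(k, 4, p) for k in range(4))

print(round(geometric_reading, 4))  # 0.064
print(round(nb_reading, 4))         # 0.2898
```

The second value is the one that follows from counting accident-free months up to the fourth month with an accident.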
How do you decide the critical value (alpha) and interpret the p-value?
example: stats.ttest_ind(early['assignment1_grade'], late['assignment1_grade'])
(2 series with score of their assignments)
I understand the concept that if the p-value is greater than the alpha value, then the null hypothesis can't be rejected.
I'm doing a course and the instructor said that the alpha value here is 0.05, but how do you determine it?
The alpha value cannot be determined in the sense that there is no formula to calculate it. Instead, it is arbitrarily chosen, ideally before the study is conducted.
The value alpha = 0.05 is a common choice that goes back to a suggestion by Ronald Fisher in his influential book Statistical Methods for Research Workers (first published in 1925). The only particular reason for this value is that if the test statistic has a normal distribution under the null hypothesis, then for a two-tailed test with alpha = 0.05 the critical values of the test statistic will be its mean plus/minus 2 (more exactly, 1.96) times its standard deviation.
In fact, you don't need alpha when you calculate the p value, because you can just publish the p value and then every reader can decide whether to consider it low enough for any given purpose or not.
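As a quick check of the 1.96 figure, here is a small sketch using only Python's standard library. The reject_null helper is a hypothetical name for the usual decision rule, not something taken from the course or from scipy.

```python
from statistics import NormalDist

alpha = 0.05

# Two-tailed test: put alpha/2 in each tail of the standard normal distribution.
z_crit = NormalDist().inv_cdf(1 - alpha / 2)
print(round(z_crit, 2))  # 1.96

def reject_null(p_value, alpha=0.05):
    """Usual decision rule: reject the null hypothesis when p < alpha."""
    return p_value < alpha
```

With a p-value from stats.ttest_ind in hand, the comparison against alpha is the only extra step.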
Excel provides a function for determining the left-tailed inverse of the Student's t-distribution.
T.INV(probability,deg_freedom)
If I needed the right-tailed inverse, can someone confirm if these are valid statistical operations (both lead to the same answer)
T.INV(1-probability,deg_freedom)
ABS(T.INV(probability,deg_freedom))
Thanks!
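Your first formula is valid. The second is not. Consider asking for the right-tailed inverse of p = .75. That should be a negative number, which the second formula can't be because of the ABS. Because the t-distribution is symmetric about zero, -T.INV(probability,deg_freedom) gives the same result as your first formula.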
I tried to solve this question using n = 500, p = 0.9/100 and q = 1 - 0.9/100, but I'm getting a very large z-score and mean.
Paycheck Errors: The payroll department of a hospital has found that in one year, 0.9% of its paychecks are calculated incorrectly. The hospital has 500 employees.
(a) What is the probability that in one month’s records no paycheck errors are made?
(b) What is the probability that in one month’s records at least one paycheck error is made?
The Z transformation (normal approximation) is a poor approximation to the binomial distribution for npq < 10. For your problem npq = 4.4595, so the Z approximation is a no-go.
You'd do better to calculate it exactly as a binomial using software, or approximate it as a Poisson with rate λ=np. Once you solve part (a), part (b) is just the complement.
I went ahead and calculated part (a) both ways. The Poisson approximation differs from the exact calculation by only 0.00022.
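For reference, here is a minimal sketch of both calculations. It assumes, as in the question's setup, 500 independent paychecks in the month, each wrong with probability 0.009.

```python
from math import exp

n = 500      # paychecks in one month (assumed: one per employee)
p = 0.009    # probability that a single paycheck is wrong

# Part (a), exact binomial: P(X = 0) = (1 - p)^n
exact = (1 - p) ** n

# Part (a), Poisson approximation with rate lambda = n * p
poisson = exp(-n * p)

# Part (b) is the complement of part (a).
at_least_one = 1 - exact

print(round(exact, 5), round(poisson, 5), round(poisson - exact, 5))
```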
You should use the binomial distribution formula rather than the sampling distribution formula.
I have 2 columns and multiple rows of data in Excel. Each column represents an algorithm, and the values in the rows are the results of these algorithms with different parameters. I want to run a statistical significance test on these two algorithms in Excel. Can anyone suggest a function?
As a result, it would be nice to be able to state something like "Algorithm A performs 8% better than Algorithm B with .9 probability (or 95% confidence interval)"
The wikipedia article explains accurately what I need:
http://en.wikipedia.org/wiki/Statistical_significance
It seems like a very easy task but I failed to find a scientific measurement function.
Any advice on a built-in Excel function, or function snippets, is appreciated.
Thanks..
Edit:
After tharkun's comments, I realized I should clarify some points:
The results are merely real numbers between 1 and 100 (they are percentage values). Each row represents a different parameter, so the values in a row represent each algorithm's result for that parameter. The results do not depend on each other.
When I take the average of all values for Algorithm A and Algorithm B, I see that the mean of all results that Algorithm A produced is 10% higher than Algorithm B's. But I don't know if this is statistically significant or not. In other words, maybe for one parameter Algorithm A scored 100 percent higher than Algorithm B and for the rest Algorithm B has higher scores, but just because of this one result the difference in average is 10%.
And I want to do this calculation using just Excel.
Thanks for the clarification. In that case you want to do an independent samples t-test, meaning you want to compare the means of two independent data sets.
Excel has a function TTEST; that's what you need.
For your example you should probably use two tails and type 2.
The formula outputs a probability value (the p-value), often described as the probability of an alpha error. It is the probability of seeing a difference at least this large if the two datasets actually had the same mean, so it reflects the risk of concluding that the datasets differ when they don't. The lower this value, the stronger the evidence that your sets are different.
You should only accept the difference of the two datasets if the value is lower than 0.01 (1%), or for critical outcomes even 0.001 or lower. You should also know that the t-test needs at least around 30 values per dataset to be reliable enough, and that the type 2 test assumes equal variances of the two datasets. If the variances are not equal, you should use the type 3 test.
http://depts.alverno.edu/nsmt/stats.htm
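To show what separates the type 2 and type 3 tests, here is a rough sketch in Python (not Excel) of the two test statistics. The function names and the sample numbers are invented for illustration; Excel's TTEST returns the p-value directly, which this sketch does not compute.

```python
from math import sqrt
from statistics import mean, variance

def pooled_t(a, b):
    """Equal-variance (type 2) t statistic and degrees of freedom."""
    na, nb = len(a), len(b)
    # Pool the two sample variances, weighted by their degrees of freedom.
    sp2 = ((na - 1) * variance(a) + (nb - 1) * variance(b)) / (na + nb - 2)
    t = (mean(a) - mean(b)) / sqrt(sp2 * (1 / na + 1 / nb))
    return t, na + nb - 2

def welch_t(a, b):
    """Unequal-variance (type 3, Welch) t statistic and degrees of freedom."""
    na, nb = len(a), len(b)
    sa, sb = variance(a) / na, variance(b) / nb
    t = (mean(a) - mean(b)) / sqrt(sa + sb)
    # Welch-Satterthwaite approximation for the degrees of freedom.
    df = (sa + sb) ** 2 / (sa**2 / (na - 1) + sb**2 / (nb - 1))
    return t, df

# Hypothetical scores for the two algorithms (made-up numbers).
algo_a = [72.0, 65.5, 80.1, 77.3, 69.8, 74.2]
algo_b = [61.0, 66.4, 58.9, 70.2, 63.5, 59.7]
print(pooled_t(algo_a, algo_b))
print(welch_t(algo_a, algo_b))
```

With equal sample sizes the two statistics coincide and only the degrees of freedom differ, so the choice between type 2 and type 3 matters most when the group sizes and variances are both unequal.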