What's considered a strong p-value? - statistics

Hi, I'm new to statistics and just wanted some clarification on p-values.
So far I've learned that if we use a 5% significance level, then we reject the null hypothesis and accept the alternative hypothesis if the p-value is less than 0.05.
If the p-value is greater than 0.05, then we say there is insufficient evidence and we can't reject the null hypothesis. I've learned that we can't accept the null hypothesis if the p-value is greater than 0.05, but at the same time, if we have a strong p-value we can't ignore it.
So my question is: what is considered a high p-value, where I should consider accepting the null hypothesis? Where should the cut-off be: at 0.7 and higher? 0.8? 0.9?

Can't argue with the link to the ASA statement.
An example that helped me with this:
If you are working to a 5% significance level (alpha=0.05), and calculate a p-value of 0.5, your data does not provide sufficient evidence to reject the null hypothesis.
There are two possible scenarios here:
The null hypothesis is indeed true
The null hypothesis is actually false (Type II error, false negative)
Once that point has been reached, you shouldn't do much more with the p-value. It is tempting to try to justify inconvenient results by saying that (for example) a p-value of 0.07 is quite close to 0.05, so there is some evidence to support the alternative hypothesis, but that is not a very robust approach. It is good practice to set your significance level in advance, and stick to it.
As a side-note, significance levels are an expression of how much uncertainty in your results you are willing to accept. A value of 5% indicates that you are willing (on average, over a large number of experiments) to be wrong about 5% of the time, or 1 in 20 experiments. In this case, by 'wrong' we mean falsely reject a true null hypothesis in favour of the alternative hypothesis (that is not true). By increasing the significance level we are saying we are willing to be wrong more often (with the trade-off of having to gather less data).
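To make the side-note concrete, here is a minimal simulation (entirely made-up data, using numpy and scipy) of what "wrong about 5% of the time" means: both samples are drawn from the same population, so the null hypothesis is true and every rejection is a false positive.

    # Simulate many experiments in which the null hypothesis is TRUE by
    # construction, and count how often a t-test at alpha = 0.05 rejects anyway.
    import numpy as np
    from scipy import stats

    rng = np.random.default_rng(0)
    alpha = 0.05
    n_experiments = 10_000

    false_rejections = 0
    for _ in range(n_experiments):
        a = rng.normal(loc=0.0, scale=1.0, size=30)  # both groups come from
        b = rng.normal(loc=0.0, scale=1.0, size=30)  # the same distribution
        _, p = stats.ttest_ind(a, b)
        if p < alpha:
            false_rejections += 1

    # Prints a rate close to 0.05 -- "wrong" about 1 in 20 experiments.
    print(f"False positive rate: {false_rejections / n_experiments:.3f}")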

Related

Interpreting Confidence Intervals and p-values

I'm having difficulties understanding the connection between p-values and CIs. Some of the results in the picture seem to be inconsistent. Starting with G1:
divide the difference between the two sample means by its standard error to acquire a t statistic, which is then used to test whether the difference between the population means is zero.
The H0 is: the difference between the two means is equal to 0 (there is no difference), with a significance level of 0.05. If the p-value is below 0.05 we reject the null hypothesis; if it's greater than 0.05 we fail to reject it. So:
G1 has a p-value 0.02 < 0.05 --> we reject the null hypothesis.
We can't find a 0 in the confidence interval, so we reject the null hypothesis: the population means are not equal. Finding a 0 in the interval would mean we couldn't rule out H0.
G2 --> the CI contains 0, and the p-value is 0.07 > 0.05 --> we fail to reject H0.
G3 --> there is no 0 in the CI --> we reject; but 0.09 > 0.05 --> we fail to reject H0. Inconsistency! We would need a confidence interval that includes 0.
G4 --> the CI doesn't include 0, indicating there is a difference between the means --> reject H0; but 0.13 > 0.05 --> we fail to reject H0. Inconsistency! Would a better p-value be one closer to 0.05?
Thanks for bearing with me and having read the whole text! You can open the screenshot I "interpreted" here.
I agree with your interpretations. In the case of G4, since the p-value is greater than 0.05 and the confidence interval is (23, 125), H0 cannot be that there is no difference between the means. Either that, or there is a typo somewhere.
Also, regarding:
A better p-value would be more close to 0.05?
Not really. This type of testing is binary: either we reject the null hypothesis, or we fail to reject it. Obviously this leaves open problems such as a p-value of 0.049999 versus 0.050001. Clearly these are essentially the same result, yet this kind of hypothesis testing would produce entirely different conclusions.
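For what it's worth, the agreement between the two tools can be checked numerically. Below is a small sketch (the samples are invented) of the duality for a two-sample t-test: the p-value falls below 0.05 exactly when the 95% confidence interval for the difference in means excludes 0.

    # p-value and 95% CI for the difference in means, computed side by side.
    import numpy as np
    from scipy import stats

    g1 = np.array([5.1, 4.8, 5.6, 5.0, 5.3, 4.9, 5.4, 5.2])
    g2 = np.array([4.2, 4.5, 4.1, 4.6, 4.0, 4.4, 4.3, 4.7])

    t, p = stats.ttest_ind(g1, g2)            # pooled-variance two-sample t-test
    diff = g1.mean() - g2.mean()

    # 95% CI for the mean difference, using the pooled standard error.
    n1, n2 = len(g1), len(g2)
    sp2 = ((n1 - 1) * g1.var(ddof=1) + (n2 - 1) * g2.var(ddof=1)) / (n1 + n2 - 2)
    se = np.sqrt(sp2 * (1 / n1 + 1 / n2))
    t_crit = stats.t.ppf(0.975, df=n1 + n2 - 2)

    print(f"p = {p:.4f}")
    print(f"95% CI = ({diff - t_crit * se:.2f}, {diff + t_crit * se:.2f})")
    # If p < 0.05 the interval excludes 0, and vice versa.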

Comparing significance level to p-value in a two sided test

I had a p-value of 0.608.
For a two-sided test scenario,
if I take 95% confidence, i.e. 5% significance or alpha = 0.05,
in that case,
should we say the answer to the task is:
"0.608 > 0.025, therefore we cannot reject the null hypothesis"?
OR should we say:
"0.608 > 0.05, therefore we cannot reject the null hypothesis"?
It was my understanding (mostly an inference) that, assuming the data is standard normally distributed, you divide alpha (the significance level) by two for a two-sided test before comparing it with the p-value.
Please correct my understanding? Many thanks!
I went on to check an online calculator for p-values and decisions.
While making the decision for a two-tailed test, the online calculator compared the p-value to the significance level itself.
Thus, we can say:
"0.608 > 0.05, therefore we cannot reject the null hypothesis" is the correct statement to make, considering 5% significance.
Note: we divide alpha by 2 for a two-tailed test only when we calculate z_critical or t_critical from the standard z-table or t-table; the two-sided p-value is compared to alpha itself.
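As a small numeric sketch of that note (the z statistic of 0.513 is hypothetical, picked so the two-sided p-value lands near 0.608), both decision rules give the same answer; alpha/2 appears only in the critical-value lookup:

    # Two equivalent decision rules for a two-sided z-test at alpha = 0.05.
    from scipy import stats

    alpha = 0.05
    z = 0.513                                        # hypothetical test statistic

    p_two_sided = 2 * (1 - stats.norm.cdf(abs(z)))   # ~0.608: compared to alpha
    z_crit = stats.norm.ppf(1 - alpha / 2)           # 1.96: alpha/2 used here

    print(f"p = {p_two_sided:.3f} > alpha = {alpha} -> fail to reject H0")
    print(f"|z| = {abs(z):.3f} < z_crit = {z_crit:.2f} -> same conclusion")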
Thanks.

Need for Bonferroni correction in A/B testing

I am a newbie in the field of data science. I came across the statements below, which read:
The more metrics we choose in our A/B testing, the higher the chance of getting a significant difference by chance.
To eliminate this problem we use the Bonferroni correction method.
What does the first statement mean? How does it increase the chances of getting false positives? And how does the Bonferroni correction method help us here?
With a significance level of 0.05 (a commonly used threshold), you will get a false positive result 5% of the time whenever the null hypothesis is actually true. Thus if your analysis has one test, your chance of a false positive is 5%. If you have two tests, you have 5% for the first AND 5% for the second, so the chance of at least one false positive grows to 1 - 0.95^2, about 9.75%. Et cetera.
So with each additional test, your risk increases. Since you want to keep your total risk level at 0.05, you either set a more strict significance threshold for each individual test or use a statistical method that corrects for multiple comparisons. The Bonferroni correction, which tests each hypothesis at alpha divided by the number of tests, is one such method.
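A quick back-of-the-envelope calculation (assuming independent tests) shows how fast the family-wise error rate grows with the number of tests m, and how the Bonferroni rule of testing each hypothesis at alpha/m reins it back in:

    # Family-wise error rate (FWER): the chance of at least one false positive
    # across m independent tests, with and without the Bonferroni correction.
    alpha = 0.05

    for m in (1, 2, 5, 10, 20):
        fwer_uncorrected = 1 - (1 - alpha) ** m      # each test at level alpha
        fwer_bonferroni = 1 - (1 - alpha / m) ** m   # each test at alpha / m
        print(f"{m:2d} tests: FWER = {fwer_uncorrected:.3f} uncorrected, "
              f"{fwer_bonferroni:.3f} with Bonferroni")
    # 20 tests: ~0.64 uncorrected vs ~0.049 with Bonferroni.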

Violation of PH assumption

Running a survival analysis, assume the p-value for a variable is statistically significant, say with a positive association with the outcome. However, according to the Schoenfeld residuals, the proportional hazards (PH) assumption is violated.
Which of the scenarios below could possibly happen after correcting for the PH violation?
The p-value may not be significant anymore.
The p-value stays significant, but the size of the HR may change.
The p-value stays significant, but the direction of the association may be altered (i.e., a positive association may end up being negative).
A PH assumption violation usually means that there is an interaction effect that needs to be included in the model. In simple linear regression, including a new variable may alter the direction of the existing variables' coefficients due to collinearity. Can we use the same rationale in the case above?
Therneau and Grambsch have written a very useful text, "Modeling Survival Data: Extending the Cox Model", which has an entire chapter on testing proportionality. At the end of that chapter is a section on causes and modeling alternatives, which I think can be used to answer this question. Since you mention interactions, your question about a particular p-value is rather ambiguous and vague.
1) Certainly, if you have chosen a particular measurement as the subject of your interest and it turns out that all of its effects are due to its interaction with another variable that you happened to also measure, then you may be in a position where the variable of interest's effect shrinks, possibly to zero, and its p-value is no longer significant.
2) It's almost certain that a model with a different structure (say, with the addition of time-varying covariates or a different treatment of time) will produce a different estimated HR for a particular covariate, and I think it would be impossible to predict the direction of the change.
3) As to whether the sign of the coefficient could change, I'm quite sure that would be possible as well. The scenario I'm thinking of is a mixture of two groups, say men and women, where one group has a sub-group whose early mortality is greatly increased (e.g., breast cancer), while the surviving members of that group have a more favourable survival expectation. The base model might show a positive coefficient (higher risk), while a model capable of identifying the subgroup at risk would allow the gender-related coefficient to become negative (lower risk).
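As a practical aside (my illustration, not from Therneau and Grambsch), here is a minimal sketch of the diagnostic workflow in Python's lifelines library, using the Rossi recidivism dataset that ships with it; the test it runs is the Schoenfeld-residual test the question refers to, and stratification is shown as one possible correction.

    # Fit a Cox model, test the PH assumption via Schoenfeld residuals, then
    # refit with stratification as one way of correcting a violation.
    from lifelines import CoxPHFitter
    from lifelines.datasets import load_rossi
    from lifelines.statistics import proportional_hazard_test

    rossi = load_rossi()
    cph = CoxPHFitter()
    cph.fit(rossi, duration_col="week", event_col="arrest")

    # Per-covariate test of proportional hazards based on Schoenfeld residuals.
    results = proportional_hazard_test(cph, rossi, time_transform="rank")
    results.print_summary()

    # One possible correction: stratify on an offending covariate, then compare
    # the remaining coefficients and p-values with the original fit.
    cph_strat = CoxPHFitter()
    cph_strat.fit(rossi, duration_col="week", event_col="arrest", strata=["wexp"])
    print(cph_strat.summary[["coef", "exp(coef)", "p"]])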

P-value, significance level and hypothesis

I am confused about the concept of the p-value. In general, if the p-value is greater than alpha (commonly 0.05), we fail to reject the null hypothesis, and if the p-value is less than alpha, we reject it. As I understand it, if the p-value is greater than alpha, the difference between the two groups just comes from sampling error, or chance. So far everything is okay. However, if the p-value is less than alpha, the result is statistically significant; I was expecting it to be statistically nonsignificant (because when the p-value is less than alpha we reject the null hypothesis).
Basically, if the result is statistically significant, we reject the null hypothesis. But how can a hypothesis be rejected if the result is statistically significant? From the words "statistically significant", I understand that the result is good.
You are mistaken about what significance means in terms of the p-value.
I will try to explain below:
Let's assume a test of whether the means of two populations are equal. We will perform a t-test by drawing one sample from each population and calculating the p-value.
The null hypothesis and the alternative:
H0: m1 - m2 = 0
H1: m1 - m2 != 0
Which is a two-tailed test (although not important for this).
Let's assume that you get a p-value of 0.01 and your alpha is 0.05. The p-value is the probability of observing a difference between the sample means at least as large as the one we got, assuming the population means (m1 and m2) really are equal. A p-value of 0.01 means that only about 1 out of 100 sample pairs would show a difference this large if the null hypothesis were true.
Such a low probability of seeing our data under the null hypothesis makes us confident that the means of the populations are not equal, and thus we consider the result to be statistically significant.
What is the threshold that makes us call a result significant? That is determined by the significance level (alpha), which in this case is 5%.
The p-value being less than the significance level is what makes us call the result significant, and therefore we reject the null hypothesis, since data this extreme would be very unlikely if the null hypothesis were true.
I hope that makes sense now!
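To tie the pieces together, here is a minimal sketch of the test described above with simulated data (the population means of 10 and 11.5, the spread, and the sample sizes are all made up):

    # Two-sample t-test of H0: m1 - m2 = 0 against H1: m1 - m2 != 0.
    import numpy as np
    from scipy import stats

    rng = np.random.default_rng(42)
    sample1 = rng.normal(loc=10.0, scale=2.0, size=50)   # drawn from population 1
    sample2 = rng.normal(loc=11.5, scale=2.0, size=50)   # drawn from population 2

    t_stat, p_value = stats.ttest_ind(sample1, sample2)  # two-tailed by default
    alpha = 0.05

    if p_value < alpha:
        print(f"p = {p_value:.4f} < {alpha}: reject H0 -- significant")
    else:
        print(f"p = {p_value:.4f} >= {alpha}: fail to reject H0")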
Let me give an example that I often use with my pupils to explain the concepts of the null hypothesis, alpha, and significance.
Let's say we're playing a round of poker. I deal the cards and we make our bets. Hey, lucky me! I got a flush on my first hand. You curse your luck and we deal again. I get another flush and win. Another round, and again, I get 4 aces. At this point you kick the table and call me a cheater: "This is BS! You're trying to rob me!"
Let's explain this in terms of probability. There is a probability associated with getting a flush on the first hand: anyone can get lucky. There's a smaller probability of getting that lucky twice in a row. Finally, there is a probability of getting really lucky three times in a row. And on that third hand, you are stating: "The probability that you get SO LUCKY is TOO SMALL. I REJECT the idea that you're just lucky. I'm calling you a cheater." That is, you rejected the null hypothesis (the hypothesis that nothing unusual is going on!).
The null hypothesis is, in all cases: "the thing we are observing is an effect of randomness". In our example, the null hypothesis states: "I'm just getting all these good hands one after the other because I'm lucky."
The p-value is the probability associated with an event, assuming it happens purely by chance. You can calculate the odds of getting good hands in poker after properly shuffling the deck. Or, for example: if I toss a fair coin 20 times, the odds that I get 20 heads in a row are 1/2^20 ≈ 0.00000095 (really small). That's the p-value for 20 heads in a row out of 20 tosses.
"Statistically significant" means: "this event seems weird; it has a really tiny probability of happening by chance, so I'll reject the null hypothesis."
Alpha, or the critical p-value, is the magic point where you "kick the table" and reject the null hypothesis. In experimental applications, you define this in advance (e.g., alpha = 0.05). In our poker example, you could call me a cheater after three lucky hands, or after 10 out of 12, and so on. It's a threshold of probability.
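The coin-toss version is easy to check exactly. A short sketch with scipy, where the null hypothesis is "the coin is fair":

    # Exact p-value for 20 heads in 20 tosses of a supposedly fair coin.
    from scipy.stats import binomtest

    result = binomtest(k=20, n=20, p=0.5, alternative="greater")
    print(f"p-value: {result.pvalue:.9f}")   # 1/2**20, about 0.000000954
    print("reject H0 -- kick the table!" if result.pvalue < 0.05
          else "fail to reject H0")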
OK, for the p-value you should at least know about the null hypothesis and the alternative hypothesis.
The null hypothesis, to take an example with 2 flowers, says there is no significant difference between them,
and the alternative hypothesis says that there is a significant difference between them.
And what is the significance level to compare the p-value against? Most data scientists take 0.05, but it depends on the research. Common significance levels are:
0.1
0.05
0.01
0.001
OK, now you have your p-value, but what do you do next?
If your model's p-value is 0.03 and the significance level you have taken is 0.05, then you reject the null hypothesis, meaning there is a significant difference between the 2 flowers. Or, simply stated:
if your model's p-value < the significance level, reject the null hypothesis;
if your model's p-value > the significance level, you fail to reject the null hypothesis.
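As a toy version of the two-flowers example (the petal-length numbers below are invented), the whole decision rule fits in a few lines:

    # Apply "p-value < significance level -> reject H0" to two made-up samples.
    from scipy import stats

    flower_a = [4.7, 4.5, 4.9, 4.0, 4.6, 4.5, 4.7, 4.4]  # hypothetical petal lengths
    flower_b = [5.1, 5.4, 5.0, 5.3, 5.5, 4.9, 5.2, 5.6]

    _, p_value = stats.ttest_ind(flower_a, flower_b)
    alpha = 0.05

    if p_value < alpha:
        print(f"p = {p_value:.4f} < {alpha}: reject H0 -- significant difference")
    else:
        print(f"p = {p_value:.4f} >= {alpha}: fail to reject H0")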
