F-statistic and p-value suggest different results

In a multiple linear regression, I would like to know whether there is a relationship between the response and the predictors.
The book says that how large the F-statistic needs to be before we reject H0 depends on the p-value. I just wondered whether there could be a case where the p-value and the F-statistic suggest different things. In that case, which one should we rely on?
Thanks!

The F-statistic and the p-value will not conflict with each other. The p-value measures how extreme the F-statistic is: it is the upper-tail probability of the observed F-statistic under the F distribution implied by the null hypothesis.
So, for example, suppose your criterion for rejecting the null hypothesis is a p-value less than or equal to 0.05 (a commonly used value), and the critical value of the F distribution (for your degrees of freedom) corresponding to this level is 2.2.
Then if the F-statistic is > 2.2, the null hypothesis is rejected. If this occurs, the p-value must be < 0.05.
Correspondingly, if the p-value is < 0.05, then the F-statistic must be > 2.2.
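As an illustration of why the two rules cannot disagree, here is a small sketch. It assumes the F-test has 2 numerator degrees of freedom (e.g. a regression with 2 predictors) and 30 denominator degrees of freedom, both chosen only for illustration, because for F(2, d2) the upper tail has the closed form (1 + 2f/d2)^(-d2/2). For general degrees of freedom, scipy.stats.f.sf(F, dfn, dfd) gives the p-value and scipy.stats.f.isf(alpha, dfn, dfd) the critical value.

```python
# Sketch: assumes d1 = 2 numerator degrees of freedom, where the F(2, d2)
# upper tail is (1 + 2f/d2) ** (-d2/2), so no external library is needed.


def f2_pvalue(f_stat: float, d2: int) -> float:
    """Upper-tail probability P(F > f_stat) for F ~ F(2, d2)."""
    return (1.0 + 2.0 * f_stat / d2) ** (-d2 / 2.0)


def f2_critical(alpha: float, d2: int) -> float:
    """Critical value c such that P(F > c) = alpha for F ~ F(2, d2)."""
    return (d2 / 2.0) * (alpha ** (-2.0 / d2) - 1.0)


alpha, d2 = 0.05, 30  # illustrative choices
crit = f2_critical(alpha, d2)  # about 3.32 for F(2, 30)
print(f"critical F at alpha={alpha}: {crit:.3f}")

for f_stat in (1.0, 2.0, 3.0, 4.0, 8.0):
    p = f2_pvalue(f_stat, d2)
    # The p-value decreases as F grows, so both decision rules always agree.
    print(f"F={f_stat:4.1f}  p={p:.4f}  "
          f"reject by F: {f_stat > crit}  reject by p: {p < alpha}")
```

The design point is monotonicity: the p-value is a strictly decreasing function of the F-statistic, so "F > critical value" and "p < alpha" are the same statement written two ways.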

Related

Estimate standard error from sample means

Random samples of 143 girls and 127 boys were selected from a large population. A measurement was taken of the haemoglobin level (measured in g/dl) of each child, with the following results:
girls: n = 143, mean = 11.35, sd = 1.41
boys: n = 127, mean = 11.01, sd = 1.32
Estimate the standard error of the difference between the sample means.
In essence, we combine the two standard errors by adding their squares (the variances of the two sample means) and taking the square root; the standard errors themselves do not add. This answers the question: what is the variation of the sampling distribution of the difference, considering both samples?
SE = sqrt( (sd₁**2 / n₁) + (sd₂**2 / n₂) )
SE = sqrt( (1.41**2 / 143) + (1.32**2 / 127) ) ≈ 0.1662
Notice that the standard deviation squared is simply the variance of each sample, so each term sd²/n is the variance of that sample's mean. In our case the standard error is quite small, which means that even a fairly modest difference between the sample means can be larger than what sampling variation alone would typically produce.
We'd calculate the difference between the means as 0.34 (or -0.34, depending on the direction of the question) and divide this difference by the standard error to get a t-value. In our case t ≈ 0.34 / 0.1662 ≈ 2.046 (or -2.046): the observed difference is about 2.05 standard errors away from zero, i.e. about twice the typical difference we would expect by chance given the variation we measured AND the size of our samples.
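The arithmetic above can be reproduced with a short sketch. The Welch-Satterthwaite degrees of freedom at the end are an addition (an assumption that an unequal-variance t-test is intended, which the question doesn't say); they are only needed for looking up t-critical.

```python
import math

# Summary statistics from the question (haemoglobin in g/dl).
n1, mean1, sd1 = 143, 11.35, 1.41  # girls
n2, mean2, sd2 = 127, 11.01, 1.32  # boys

# The variances of the two sample means add; the standard errors do not.
var1 = sd1**2 / n1
var2 = sd2**2 / n2
se_diff = math.sqrt(var1 + var2)

diff = mean1 - mean2
t_stat = diff / se_diff

# Welch-Satterthwaite degrees of freedom (assumes unequal variances).
df = (var1 + var2) ** 2 / (var1**2 / (n1 - 1) + var2**2 / (n2 - 1))

print(f"SE of difference: {se_diff:.4f}")
print(f"difference: {diff:.2f}, t = {t_stat:.3f}, Welch df ~ {df:.1f}")
```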
However, we need to verify whether this observation is statistically significant by determining the t-critical value. It can be read from a t-table: one needs to know the alpha (typically 0.05 unless otherwise stated), the degrees of freedom, and the original alternative hypothesis. If the alternative is along the lines of "there is a difference between genders", we use a two-tailed test; if it is along the lines of "gender X has a haemoglobin level larger/smaller than gender Y", we use a one-tailed test.
If |t-value| > t-critical, we would claim that the difference between means is statistically significant, giving us sufficient evidence to reject the null hypothesis (for a one-tailed test, compare t itself in the hypothesised direction). Otherwise, if |t-value| < t-critical, we do not have statistically significant evidence against the null hypothesis, so we fail to reject it.
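A sketch of the decision step, under one stated assumption: with roughly 267 degrees of freedom the t distribution is very close to the standard normal, so the standard library's NormalDist is used to approximate t-critical. The exact two-tailed t-critical value at this many degrees of freedom is slightly larger than 1.96 (below 1.98), so the conclusion here does not change.

```python
from statistics import NormalDist

# Normal approximation to the t distribution (reasonable for df ~ 267).
alpha = 0.05
t_stat = 2.046  # t-value from the haemoglobin example

z = NormalDist()
crit_two = z.inv_cdf(1 - alpha / 2)  # ~1.960: H1 "the means differ"
crit_one = z.inv_cdf(1 - alpha)      # ~1.645: H1 "girls' mean is larger"

print(f"two-tailed: reject H0? {abs(t_stat) > crit_two}")
print(f"one-tailed: reject H0? {t_stat > crit_one}")

# The same decision expressed through an (approximate) two-sided p-value.
p_two = 2 * (1 - z.cdf(abs(t_stat)))
print(f"approximate two-sided p-value: {p_two:.4f}")
```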

Interpreting Confidence Intervals and p-values

I'm having difficulty understanding the connection between p-values and CIs. Some of the results in the picture seem to be inconsistent. Starting with G1:
We divide the difference between the two sample means by its standard error to obtain a t statistic, which tells us whether the observed difference is large enough to conclude that the population means differ.
H0 is: the difference between the two population means is equal to 0 (there is no difference), tested at a significance level of 0.05. If the p-value is below 0.05, we reject the null hypothesis; if it is greater than 0.05, we fail to reject it. So:
G1 has a p-value of 0.02 < 0.05 --> we reject the null hypothesis.
We can't find a 0 in the confidence interval, so we reject the null hypothesis: the population means are not equal. Finding a 0 in the interval would mean that we cannot reject H0 (not that H0 has been proven true).
G2 --> the CI contains 0, and the p-value is 0.07 > 0.05 --> we fail to reject H0.
G3 --> there is no 0 in the CI --> we reject H0; but 0.09 > 0.05, so we fail to reject H0. Inconsistency! We would need a confidence interval that includes 0.
G4 --> the CI doesn't include 0, indicating a difference between the means, so we reject H0; but 0.13 > 0.05, so we fail to reject H0. Inconsistency! Would a better p-value be closer to 0.05?
Thanks for bearing with me and reading the whole text! You can open the screenshot I "interpreted" here.
I agree with your interpretations. In the case of G4, since the p-value is greater than 0.05 and yet the confidence interval (23, 125) excludes 0, H0 cannot be a test that there is no difference between the means. Either that, or there is a typo somewhere.
Also, regarding:
Would a better p-value be closer to 0.05?
Not really. This type of testing is binary: either we reject the null hypothesis, or we fail to reject it. Obviously this leaves open problems such as p-values of 0.049999 and 0.050001. Clearly these are essentially the same result, yet this kind of hypothesis testing would produce entirely different conclusions.
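The consistency the asker expects can be checked with a small sketch. It assumes the test and the interval are built from the same estimate, the same standard error, and a normal reference distribution (a Wald-type z-test); under that assumption the 95% CI excludes 0 exactly when the two-sided p-value is below 0.05. The (difference, standard error) pairs are hypothetical, not the values in the screenshot.

```python
from statistics import NormalDist


def z_test_and_ci(diff: float, se: float, alpha: float = 0.05):
    """Two-sided z-test of H0: true difference = 0, plus the matching CI."""
    z = NormalDist()
    z_stat = diff / se
    p = 2 * (1 - z.cdf(abs(z_stat)))
    half_width = z.inv_cdf(1 - alpha / 2) * se
    return p, (diff - half_width, diff + half_width)


# Hypothetical (difference, standard error) pairs for illustration only.
for diff, se in [(5.0, 2.0), (3.0, 2.0), (74.0, 25.0), (1.0, 2.0)]:
    p, (lo, hi) = z_test_and_ci(diff, se)
    excludes_zero = lo > 0 or hi < 0
    print(f"p={p:.3f}  95% CI=({lo:.2f}, {hi:.2f})  "
          f"CI excludes 0: {excludes_zero}  p < 0.05: {p < 0.05}")
```

If a report shows a 95% CI that excludes 0 alongside p > 0.05, the two numbers were probably not produced by the same test (or one of them is a typo).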

Comparing significance level to p-value in a two sided test

I had a p-value of 0.608 for a two-sided test.
If I take 95% confidence, i.e. 5% significance (alpha = 0.05),
should the answer to the task be:
"0.608 > 0.025, therefore we cannot reject the null hypothesis"?
Or should we say:
"0.608 > 0.05, therefore we cannot reject the null hypothesis"?
It was my understanding (an inference, mostly) that, assuming the test statistic is normally distributed, you divide alpha (the significance level) by two for a two-sided test before comparing it with the p-value.
Please correct my understanding. Many thanks!
I went on to check an online calculator for p-values and decision making.
When making the decision for a two-tailed test, the online calculator compared the p-value with the significance level itself (the reported p-value is already two-sided).
Thus, we can say:
"0.608 > 0.05, therefore we cannot reject the null hypothesis" is the correct statement to make at 5% significance.
Note: we use alpha/2 for a two-tailed test when we look up z_critical or t_critical in a standard z-table or t-table.
Thanks.
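The two equivalent decision rules can be sketched as follows, assuming a z test statistic (the same reasoning holds for a t statistic with t-quantiles). The |z| value is recovered from the two-sided p-value for illustration.

```python
from statistics import NormalDist

alpha = 0.05
p_value = 0.608  # two-sided p-value from the question

# Rule 1: compare the two-sided p-value with alpha itself.
reject_by_p = p_value < alpha

# Rule 2: compare |z| with z_{alpha/2}; alpha is split across the two tails.
z = NormalDist()
z_stat = z.inv_cdf(1 - p_value / 2)  # |z| that yields this two-sided p-value
z_crit = z.inv_cdf(1 - alpha / 2)    # about 1.96
reject_by_z = z_stat > z_crit

print(f"|z| = {z_stat:.3f}, z_crit = {z_crit:.3f}")
print(f"reject by p: {reject_by_p}, reject by z: {reject_by_z}")
```

Halving alpha happens only on the critical-value side; a two-sided p-value has already accounted for both tails.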

P-value, significance level and hypothesis

I am confused about the concept of the p-value. In general, if the p-value is greater than alpha (usually 0.05), we fail to reject the null hypothesis, and if the p-value is less than alpha, we reject the null hypothesis. As I understand it, if the p-value is greater than alpha, the difference between two groups could just be coming from sampling error, i.e. chance. So far everything is okay. However, if the p-value is less than alpha, the result is called statistically significant, whereas I was expecting it to be called statistically nonsignificant (because when the p-value is less than alpha, we reject the null hypothesis).
Basically: if a result is statistically significant, we reject the null hypothesis. But how can a hypothesis be rejected if it is statistically significant? From the words "statistically significant", I understand that the result is good.
You are misunderstanding what significance means in terms of the p-value.
I will try to explain below:
Let's assume a test about the means of two populations being equal. We will perform a t-test to test that by drawing one sample from each population and calculating the p-value.
The null hypothesis and the alternative:
H0: m1 - m2 = 0
H1: m1 - m2 != 0
This is a two-tailed test (although that is not important here).
Let's assume that you get a p-value of 0.01 and your alpha is 0.05. The p-value is the probability, assuming the null hypothesis is true (m1 = m2), of observing a difference between the sample means at least as large as the one we actually observed. So a p-value of 0.01 means that if the population means really were equal, only about 1 out of 100 pairs of samples would show a difference this extreme.
Such a low probability under the assumption of equal means makes us doubt that assumption, so we conclude that the population means are probably not equal, and we call the result statistically significant.
What is the threshold that makes us call a result significant? That is determined by the significance level (alpha), which in this case is 5%.
The p-value being less than the significance level is what makes the result significant, and therefore we reject the null hypothesis: the observed data would be very unlikely if the null hypothesis were true. Note that the p-value is not the probability that the null hypothesis is true, and rejecting it does not make us certain.
I hope that makes sense now!
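One way to see what alpha controls is a simulation where the null hypothesis is true by construction: about 5% of the p-values then fall below 0.05 purely by chance. This is a minimal sketch assuming a two-sample z-test with normal data and known standard deviation 1 in each group; the sample size, trial count, and seed are illustrative choices.

```python
import random
from statistics import NormalDist, fmean

random.seed(0)
z = NormalDist()
alpha, n, trials = 0.05, 10, 20_000

rejections = 0
for _ in range(trials):
    # H0 is true: both groups are drawn from the same N(0, 1) population.
    a = [random.gauss(0.0, 1.0) for _ in range(n)]
    b = [random.gauss(0.0, 1.0) for _ in range(n)]
    # z statistic for m1 - m2 with known sigma = 1: SE = sqrt(1/n + 1/n).
    z_stat = (fmean(a) - fmean(b)) / (2.0 / n) ** 0.5
    p = 2 * (1 - z.cdf(abs(z_stat)))
    if p < alpha:
        rejections += 1

print(f"fraction of p-values below {alpha}: {rejections / trials:.3f}")
```

So alpha is the rate of false alarms we accept when nothing is going on, which is why a p-value below it counts as evidence against H0.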
Let me give an example that I often use with my pupils to explain the concepts of null hypothesis, alpha, and significance.
Let's say we're playing a round of poker. I deal the cards and we make our bets. Hey, lucky me! I got a flush on my first hand. You curse your luck and we deal again. I get another flush and win. Another round, and again I get 4 aces: at this point you kick the table and call me a cheater: "This is bs! You're trying to rob me!"
Let's explain this in terms of probability. There is a probability associated with getting a flush on the first hand: anyone can get lucky. There is a smaller probability of getting that lucky twice in a row, and a smaller one still of getting really lucky three times in a row. After the third hand, you are stating: "The probability that you get SO LUCKY is TOO SMALL. I REJECT the idea that you're just lucky. I'm calling you a cheater." That is, you rejected the null hypothesis (the hypothesis that nothing is going on!).
The null hypothesis is, in all cases: "What we are observing is an effect of randomness". In our example, the null hypothesis states: "I'm getting all these good hands one after another just because I'm lucky."
The p-value is the probability of an outcome at least as extreme as the one observed, assuming it happens purely at random. You can calculate the odds of getting good hands in poker from a properly shuffled deck. Or, for example: if I toss a fair coin 20 times, the probability that I get 20 heads in a row is 1/(2^20) ≈ 0.00000095 (really small). That's the p-value for 20 heads in a row in 20 tosses.
"Statistically significant" means "This event seems weird. It has a really tiny probability of happening by chance, so I'll reject the null hypothesis."
Alpha (the significance level, sometimes loosely called the critical p-value) is the point where you "kick the table" and reject the null hypothesis. In experimental applications you define it in advance (e.g. alpha = 0.05). In our poker example, you could call me a cheater after three lucky hands, or after 10 out of 12, and so on. It's a threshold of probability.
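The coin example can be computed exactly with a short sketch. It computes a one-sided p-value: the probability of at least the observed number of heads when the coin is fair. For 20 heads out of 20 this equals the probability of that single outcome, since nothing is more extreme.

```python
from math import comb


def upper_tail_pvalue(heads: int, tosses: int) -> float:
    """P(at least `heads` heads in `tosses` tosses of a fair coin)."""
    return sum(comb(tosses, k) for k in range(heads, tosses + 1)) / 2**tosses


print(upper_tail_pvalue(20, 20))  # 1 / 2**20, about 0.00000095
print(upper_tail_pvalue(15, 20))  # about 0.0207
```

With alpha = 0.05, 15 or more heads out of 20 would already be enough to "kick the table".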
To understand the p-value, you should at least know about the null hypothesis and the alternative hypothesis.
Null hypothesis: take an example where we have 2 flowers; the null hypothesis says there is no difference between them.
The alternative hypothesis says that there is a difference between them.
What significance level should the p-value be compared with? Most data scientists take 0.05, but the choice depends on the research field. Common choices for the level of significance are:
0.05
0.01
0.001
Now that you have a p-value, what do you do next?
If your model's p-value is 0.03 and the significance level you have chosen is 0.05, you reject the null hypothesis, meaning there is a significant difference between the 2 flowers. Simply stated:
if the p-value of your model < the level of significance, reject the null hypothesis;
if the p-value of your model > the level of significance, fail to reject (in everyday terms, "accept") the null hypothesis.

How do i prove that my derived equation and the Monte-Carlo simulation are equivalent?

I have derived and implemented an equation for an expected value.
To show that my code is free of errors, I have run the Monte Carlo
computation a number of times to show that it converges to the same
value as the equation that I derived.
Now that I have the data, how can I visualize this?
Is this even the correct test to do?
Can I give a measure of how sure I am that the results are correct?
It's not clear what you mean by visualising the data, but here are some ideas.
If your Monte Carlo simulation is correct, then the Monte Carlo estimator for your quantity is just the mean of the samples. The variance of your estimator (how far from the 'correct' value the average will typically be) is inversely proportional to the number of samples you take, so as long as you take enough, you'll get arbitrarily close to the correct answer. So use a moderate number of samples (1000 should suffice if it's univariate) and look at the average. If this doesn't agree with your theoretical expectation, then you have an error somewhere, in one of your estimates.
You can also use a histogram of your samples, again if they're one-dimensional. The distribution of samples in the histogram should match the theoretical distribution you're taking the expectation of.
If you know the variance in the same way as you know the expectation, you can also look at the sample variance (the mean squared difference between the samples and their mean) and check that this matches as well.
EDIT: to put something more 'formal' in the answer!
If M(x) is your Monte Carlo estimator for E[X], then as n -> inf, abs(M(x) - E[X]) -> 0 (by the law of large numbers). The variance of M(x) is Var(X)/n, i.e. inversely proportional to n, but its exact value depends on the variance of the quantity M is an estimator for. You could construct a specific test based on the mean and variance of your samples to check that what you've done makes sense. For example, every 100 iterations you could compute the mean of your samples and take the difference between this and your theoretical E[X]. If this difference decreases (roughly like 1/sqrt(n)), you're probably error free. If not, you have issues either in your theoretical estimate or in your Monte Carlo estimator.
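A sketch of that convergence check. The target is hypothetical: E[U**2] = 1/3 for U ~ Uniform(0, 1) stands in for the asker's derived equation, which the post doesn't show. The running mean is compared with the theoretical value at a few checkpoints, alongside a standard error estimated from the sample variance.

```python
import math
import random

random.seed(1)
theoretical = 1.0 / 3.0  # hypothetical stand-in: E[U**2] for U ~ Uniform(0, 1)

total, total_sq = 0.0, 0.0
checkpoints = {100, 1_000, 10_000, 100_000}
errors = {}
for i in range(1, 100_001):
    x = random.random() ** 2  # one Monte Carlo draw of X = U**2
    total += x
    total_sq += x * x
    if i in checkpoints:
        mean = total / i
        var = total_sq / i - mean**2  # sample variance (biased; fine for large n)
        se = math.sqrt(var / i)       # standard error shrinks like 1/sqrt(n)
        errors[i] = abs(mean - theoretical)
        print(f"n={i:>6}  mean={mean:.5f}  |error|={errors[i]:.5f}  SE={se:.5f}")
```

Plotting |error| and SE against n on log-log axes is one way to visualize this: both should fall along a line of slope -1/2.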
Why not just do a simple t-test? From your theoretical equation you have the true mean mu_0, and your simulator has mean mu_1. Note that we can't calculate mu_1; we can only estimate it using the sample average. So our hypotheses are:
H_0: mu_0 = mu_1 and H_1: mu_0 does not equal mu_1
The test statistic is the usual one-sample test statistic, i.e.
T = (mu_0 - x)/(s/sqrt(n))
where
mu_0 is the value from your equation
x is the average from your simulator
s is the sample standard deviation of the simulated values
n is the number of values used to calculate the mean.
In your case, n is going to be large, so this is equivalent to a Normal (z) test. We reject H_0 when T falls outside (-3, 3). This is a stricter rule than p < 0.01: |T| > 3 corresponds to a two-sided p-value of about 0.0027.
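A sketch of this test, reusing the hypothetical target E[U**2] = 1/3 as the "derived" value mu_0. The deliberately biased simulator (offset +0.01) is invented to show that the test can detect a small systematic error; since n is large, the Normal distribution is used for the p-value.

```python
import math
import random
from statistics import NormalDist, fmean


def mc_vs_theory_test(samples, mu_0):
    """Large-sample test of H0: E[X] = mu_0 from Monte Carlo draws of X."""
    n = len(samples)
    x_bar = fmean(samples)
    # Sample standard deviation s (n - 1 in the denominator).
    s = math.sqrt(sum((x - x_bar) ** 2 for x in samples) / (n - 1))
    t_stat = (mu_0 - x_bar) / (s / math.sqrt(n))
    # n is large, so use the Normal distribution for the two-sided p-value.
    p = 2 * (1 - NormalDist().cdf(abs(t_stat)))
    return t_stat, p


random.seed(2)
mu_0 = 1.0 / 3.0  # hypothetical "derived" value: E[U**2] for U ~ Uniform(0, 1)
good = [random.random() ** 2 for _ in range(50_000)]          # correct simulator
buggy = [random.random() ** 2 + 0.01 for _ in range(50_000)]  # biased by +0.01

for name, samples in [("good", good), ("buggy", buggy)]:
    t_stat, p = mc_vs_theory_test(samples, mu_0)
    print(f"{name}: T = {t_stat:.2f}, p = {p:.4g}, "
          f"reject H0 (|T| > 3): {abs(t_stat) > 3}")
```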
A couple of comments:
You can't "prove" that the means are equal; failing to reject H_0 only means the data are consistent with equality.
You mentioned that you want to test a number of values. One possible solution is to use a Bonferroni-type correction: basically, you compare each p-value against alpha/N (instead of alpha), where N is the number of tests you are running.
Make your sample size as large as possible. Since we don't have any idea about the variability in your Monte Carlo simulation, it's impossible to recommend a specific n.
The link between |T| > 3 and the p-value just comes from the Normal distribution: P(|Z| > 3) ≈ 0.0027, which is below 0.01.
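A minimal sketch of the Bonferroni correction mentioned in the comments. The p-values are hypothetical, standing in for several comparisons between derived values and simulations.

```python
def bonferroni_reject(p_values, alpha=0.05):
    """Reject H0 for test i when p_i < alpha / N (Bonferroni correction)."""
    threshold = alpha / len(p_values)
    return [p < threshold for p in p_values]


# Hypothetical p-values from N = 5 separate comparisons.
p_values = [0.30, 0.004, 0.02, 0.75, 0.0001]
print(bonferroni_reject(p_values))  # threshold 0.05 / 5 = 0.01
```

Note that 0.02 would count as significant on its own at alpha = 0.05, but not after correcting for 5 tests.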
