estimate standard error from sample means - statistics

Random samples of 143 girls and 127 boys were selected from a large population. A measurement was taken of the haemoglobin level (measured in g/dl) of each child, with the following results.
girls: n = 143, mean = 11.35, sd = 1.41
boys: n = 127, mean = 11.01, sd = 1.32
Estimate the standard error of the difference between the sample means.

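In essence, we combine the two standard errors by adding their squares (the variances of the two sample means) and taking the square root. This implies that we're answering the question: what is the variation of the sampling distribution of the difference, considering both samples?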
SE = sqrt( (sd₁**2 / n₁) + (sd₂**2 / n₂) )
SE = sqrt( (1.41**2 / 143) + (1.32**2 / 127) ) ≈ 0.1662
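As a quick check of the arithmetic above, here is a minimal Python sketch. The function name se_diff is made up for illustration; it only assumes two independent samples summarised by their standard deviations and sizes.

```python
import math

def se_diff(sd1, n1, sd2, n2):
    """Standard error of the difference between two independent sample means."""
    # Each term sd**2 / n is the estimated variance of one sample mean.
    return math.sqrt(sd1**2 / n1 + sd2**2 / n2)

# Summary statistics from the question (girls first, then boys).
se = se_diff(1.41, 143, 1.32, 127)
print(round(se, 4))
```

The printed value should agree with the 0.1662 obtained by hand.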
Notice that the standard deviation squared is simply the variance of each sample. As you can see, in our case the standard error is quite small, which means that the difference between the sample means doesn't need to be very large to be larger than we would expect from sampling variation alone.
We'd calculate the difference between means as 0.34 (or -0.34, depending on the direction of the comparison) and divide this difference by the standard error to get a t-value. In our case, 2.046 (or -2.046) indicates that the observed difference is 2.046 times larger than the typical difference we would expect given the variation that we measured AND the size of our samples.
However, we need to verify whether this observation is statistically significant by determining the t-critical value. This t-critical value can be looked up in a t-table: one needs to know the alpha (typically 0.05 unless otherwise stated), the degrees of freedom, and the original alternative hypothesis (if it was something along the lines of "there is a difference between genders", we would apply a two-tailed test - if it was something along the lines of "gender X has a haemoglobin level larger/smaller than gender Y", we would use a one-tailed test).
If |t-value| > t-critical, we would claim that the difference between means is statistically significant, thereby having sufficient evidence to reject the null hypothesis. Alternatively, if |t-value| < t-critical, we would not have statistically significant evidence against the null hypothesis, thus we would fail to reject the null hypothesis.
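The steps above can be sketched in Python using only the standard library. Two things here are assumptions and not part of the original answer: the degrees of freedom are estimated with the Welch-Satterthwaite formula, and the critical value is taken from the normal distribution as an approximation to the t distribution, which is reasonable only because the degrees of freedom are large.

```python
import math
from statistics import NormalDist

# Summary statistics from the question.
girls = {"n": 143, "mean": 11.35, "sd": 1.41}
boys = {"n": 127, "mean": 11.01, "sd": 1.32}

# Estimated variance of each sample mean.
v1 = girls["sd"] ** 2 / girls["n"]
v2 = boys["sd"] ** 2 / boys["n"]

se = math.sqrt(v1 + v2)
t_value = (girls["mean"] - boys["mean"]) / se

# ASSUMPTION: Welch-Satterthwaite approximation for the degrees of freedom.
df = (v1 + v2) ** 2 / (v1 ** 2 / (girls["n"] - 1) + v2 ** 2 / (boys["n"] - 1))

alpha = 0.05
# ASSUMPTION: normal quantile used in place of the t quantile (large df).
critical = NormalDist().inv_cdf(1 - alpha / 2)
reject = abs(t_value) > critical

print(round(t_value, 3), round(df), round(critical, 3), reject)
```

For a two-tailed test at alpha = 0.05 the exact t-critical value for this many degrees of freedom is slightly larger than the normal value (about 1.97 instead of 1.96), which does not change the conclusion here.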

Related

Can a 2 sample statistical comparison have too large of a population size to be accurate?

I'm trying to do a simple comparison of two samples to determine if their means are different. Regardless of whether their standard deviations are equal/unequal, the formulas for a t-test or z-test are similar.
(I can't post images on a new account)
t-value w/ unequal variances:
https://www.biologyforlife.com/uploads/2/2/3/9/22392738/949234_orig.jpg
t-value w/ equal/pooled variances:
https://vitalflux.com/wp-content/uploads/2022/01/pooled-t-statistics-300x126.jpg
The issue here is the inverse and sqrt of sample size in the denominator that causes large samples to seem to have massive t-values.
For instance, I have 2 samples w/
size: N1=168,000 and N2=705,000
avgs: X1=89 and X2=49
stddev: S1=96 and S2=66 .
At first glance, these standard deviations are larger than the means and suggest a nonhomogeneous sample with a lot of internal variation. When comparing the two samples, however, the denominator of the t-test becomes approx 0.25, suggesting that a 1 unit difference in means is equivalent to about 4 standard errors. Thus my t-value here comes out to around 160(!!)
All this to say, I'm just plugging in numbers since I didn't do many of these problems in advanced stats and haven't seen this formula since Stats110.
It makes some sense that two massive populations need their variance biased downward before comparing, but this seems like not the best test out there for the magnitude of what I'm doing.
What other tests are out there that I could try? What is the logic behind this seemingly over-biased variance?
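The numbers in the question can be reproduced with a short Python sketch of the unequal-variance formula. The helper name se_diff and the 100-times-smaller comparison are only illustrative assumptions; the point is that the denominator shrinks with the square root of the sample sizes while the standard deviations stay the same.

```python
import math

def se_diff(sd1, n1, sd2, n2):
    """Denominator of the unequal-variance t statistic."""
    return math.sqrt(sd1**2 / n1 + sd2**2 / n2)

# Values from the question.
n1, mean1, sd1 = 168_000, 89, 96
n2, mean2, sd2 = 705_000, 49, 66

se = se_diff(sd1, n1, sd2, n2)
t_value = (mean1 - mean2) / se
print(round(se, 3), round(t_value))

# Hypothetical comparison: the same spread with samples 100 times smaller.
se_small = se_diff(sd1, n1 // 100, sd2, n2 // 100)
print(round(se_small / se, 2))  # ratio of the two denominators
```

With 100 times fewer observations the denominator is 10 times larger, so the same 40 unit difference in means would give a t-value 10 times smaller.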

How to determine minimum samples to consider to obtain nearly same average as all samples?

I would like to know if there is any standard algorithm or statistical parameter that can be used to determine the minimum number of samples, counted from the beginning, whose average nearly matches the average of all samples.
For Example : If 2000 samples are present and Average is 20
Acceptable average range is 20+-0.01
If we start taking average from first sample then by taking average of X samples we can get average within 20+-0.01
Problem is about find value of X
Just need guidance from logical perspective [Procedure or Algorithm to consider]
Thanks in advance
Alright, so if the standard deviation is known, then for 95% confidence that the sample mean will be within 0.01 of the true mean, for a normal distribution with standard deviation equal to s, we require that:
0.01 = z95 x s / sqrt(n)
Here, z95 is the two-sided 95% critical value of the standard normal distribution and is about 1.96 (from tables), s is the standard deviation and n is the number of samples required. We can solve for n in terms of s:
0.01 = 1.96 x s / sqrt(n)
<=> sqrt(n) = 196 x s
<=> n = 38416 x s^2
So, if s = 1, you'd expect to need about 38.4k samples to get 95% confidence that the sample mean would be within 0.01 of the true mean. The number of samples needed to achieve a given precision is proportional to the variance, that is, to the square of the standard deviation.
If the true population's standard deviation is not known, the calculation works in a similar fashion, except you would use the critical value from Student's t distribution (so instead of z95 you'd use t95) and you'd use the sample standard deviation. Because t95 itself depends on n through the degrees of freedom (n - 1), the equation then has to be solved iteratively.
If you want a different confidence level - higher or lower - you'd look up the corresponding two-sided critical value for whichever distribution you're using and use that value (so something besides 1.96).
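For the known-standard-deviation case, the calculation above can be sketched in Python as follows. The function name required_n and its parameters are illustrative assumptions; the critical value comes from the standard library's NormalDist instead of a table, so the result differs slightly from the one obtained with the rounded value 1.96.

```python
import math
from statistics import NormalDist

def required_n(s, margin, confidence=0.95):
    """Smallest n such that z * s / sqrt(n) <= margin (normal case, known s)."""
    # Two-sided critical value, e.g. about 1.96 for 95% confidence.
    z = NormalDist().inv_cdf(0.5 + confidence / 2)
    return math.ceil((z * s / margin) ** 2)

n_s1 = required_n(1.0, 0.01)
n_s2 = required_n(2.0, 0.01)  # doubling s roughly quadruples n
print(n_s1, n_s2)
```

Doubling the standard deviation roughly quadruples the required sample size, which reflects the dependence on s squared.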
The discussion on Wikipedia, Basic Steps section, is instructive: https://en.wikipedia.org/wiki/Confidence_interval

Is the akaike information criterion (AIC) unit-dependent?

One formula for AIC is:
AIC = 2k + n*Log(RSS/n)
Intuitively, if you add a parameter to your model, your AIC will decrease (and hence you should keep the parameter), if the increase in the 2k term due to the new parameter is offset by the decrease in the n*Log(RSS/n) term due to the decreased residual sum of squares. But isn't this RSS value unit-specific? So if I'm modeling money, and my units are in millions of dollars, the change in RSS with adding a parameter might be very small, and won't offset the increase in the 2k term. Conversely, if my units are pennies, the change in RSS would be very large, and could greatly offset the increase in the 2k term. This arbitrary change in units would lead to a change in my decision whether to keep the extra parameter.
So: does the RSS have to be in standardized units for AIC to be a useful criterion? I don't see how it could be otherwise.
No, I don't think so (partially rowing back from what I said in my earlier comment). For the simplest possible case (least squares regression for y = ax + b), from wikipedia, RSS = Syy - a x Sxy.
From their definitions given in that article, both a and Sxy grow by a factor of 100 and Syy grows by a factor of 100^2 if you change the unit for y from dollars to cents. So, after rescaling, the new RSS for that model will be 100^2 times the old one. I'm quite sure that the same result holds for models with k != 2 parameters.
Hence nothing changes for the AIC difference, where the key part is log(RSS_B/RSS_A). After rescaling, both RSS values will have grown by the same factor and you'll get the exact same AIC difference between model A and model B as before.
Edit:
I've just found this one:
"It is correct that the choice of units introduces a multiplicative
constant into the likelihood. Thence the log likelihood has an
additive constant which contributes (after doubling) to the AIC. The difference of AICs is unchanged."
Note that this comment even talks about the general case where the exact log-likelihood is used.
I had the same question, and I felt like the existing answer above could have been clearer and more direct. Hopefully the following clarifies it a bit for others as well.
When using the AIC to compare models, it is the difference that is of interest. The portion in question here is n*log(RSS/n). When we compare two models fitted to the same n observations, we get:
AIC1 - AIC2 = 2k1 + n*log(RSS1/n) - 2k2 - n*log(RSS2/n)
From our logarithmic identities, we know that log(a) - log(b) = log(a/b). AIC1 - AIC2 therefore simplifies to:
2k1 - 2k2 + n*log(RSS1/RSS2)
If we add a gain factor G to represent the effect of a change in units on the RSS, that difference becomes:
2k1 - 2k2 + n*log(G*RSS1/(G*RSS2)) = 2k1 - 2k2 + n*log(RSS1/RSS2)
As you can see, we are left with the same AIC difference, regardless of which units we choose.
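The cancellation can also be checked numerically. The sketch below is only an illustration under stated assumptions: the data are made up, model A is a constant (k = 1), model B is a least-squares line (k = 2), and the AIC formula is the n*log(RSS/n) + 2k version quoted in the question.

```python
import math

def rss_constant(y):
    """RSS of a model that predicts the mean of y (one parameter)."""
    m = sum(y) / len(y)
    return sum((v - m) ** 2 for v in y)

def rss_linear(x, y):
    """RSS of the least-squares line y = a*x + b (two parameters)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    sxx = sum((xi - mx) ** 2 for xi in x)
    a = sxy / sxx
    b = my - a * mx
    return sum((yi - (a * xi + b)) ** 2 for xi, yi in zip(x, y))

def aic(k, n, rss):
    return 2 * k + n * math.log(rss / n)

# Made-up data: y in dollars, then the same values in cents.
x = [1, 2, 3, 4, 5, 6, 7, 8]
dollars = [2.1, 3.9, 6.2, 7.8, 10.1, 12.2, 13.8, 16.1]
cents = [100 * v for v in dollars]
n = len(x)

diff_dollars = aic(2, n, rss_linear(x, dollars)) - aic(1, n, rss_constant(dollars))
diff_cents = aic(2, n, rss_linear(x, cents)) - aic(1, n, rss_constant(cents))
shift = aic(1, n, rss_constant(cents)) - aic(1, n, rss_constant(dollars))

print(diff_dollars, diff_cents)    # same up to rounding error
print(shift, n * math.log(100**2)) # each AIC shifts by n*log(100^2)
```

Each individual AIC value changes with the units, but both models shift by the same constant, so the difference used for model selection is unchanged.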

calculating reliability of measurements

I have many measurements of age of the same person. Let's say:
[23 25 32 23 25]
I would like to output a single value and a reliability score of this value. The single value can be the average.
Reliability, I don't know well how to calculate it. The value should be between 0 and 1, where 1 means all ages are equal and a very unreliable measurement should be near 0.
Probably the variance should be used here, but it's not clear to me how to normalize it between 0 and 1 in a meaningful way (1/(x+1) is not much meaningful :)).
Assume some probability distribution (or determine what probability distribution your data fits most accurately). A good choice is a normal distribution, which for discrete data requires a continuity correction. See example here: http://www.milefoot.com/math/stat/pdfc-normaldisc.htm
In your example, your reliability score for the average age of 26 (25.6 rounded to nearest integer), is simply the probability that X falls in the range (25.5, 26.5).
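A minimal Python sketch of this suggestion follows. It assumes a normal distribution fitted with the sample mean and sample standard deviation of the five measurements, which is one possible reading of the answer and not the only one.

```python
from statistics import NormalDist, mean, stdev

ages = [23, 25, 32, 23, 25]

mu = mean(ages)       # 25.6
sigma = stdev(ages)   # sample standard deviation
reported = round(mu)  # single value to report, 26

# ASSUMPTION: measurements follow a normal distribution N(mu, sigma).
dist = NormalDist(mu, sigma)

# Continuity correction: the integer 26 covers the interval (25.5, 26.5).
score = dist.cdf(reported + 0.5) - dist.cdf(reported - 0.5)
print(reported, round(score, 3))
```

The score gets closer to 1 as the measurements agree more closely. If all measurements are identical the standard deviation is 0 and that case would need separate handling, for example by returning 1 directly.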
The easiest way for assessing reliability (or internal consistency) is to use Cronbach's alpha. I guess most statistics software has this method built-in.
https://en.wikipedia.org/wiki/Cronbach%27s_alpha

P-value, significance level and hypothesis

I am confused about the concept of p-value. In general, if the p-value is greater than alpha, which is generally 0.05, we fail to reject the null hypothesis, and if the p-value is less than alpha, we reject the null hypothesis. As I understand it, if the p-value is greater than alpha, the difference between the two groups is just coming from sampling error or chance. So far everything is okay. However, if the p-value is less than alpha, the result is statistically significant; I was expecting it to be statistically nonsignificant (because when the p-value is less than alpha we reject the null hypothesis).
Basically, if the result is statistically significant, we reject the null hypothesis. But how can a hypothesis be rejected if the result is statistically significant? From the words "statistically significant", I understand that the result is good.
You are mistaking what the significance means in terms of the p-value.
I will try to explain below:
Let's assume a test about the means of two populations being equal. We will perform a t-test to test that by drawing one sample from each population and calculating the p-value.
The null hypothesis and the alternative:
H0: m1 - m2 = 0
H1: m1 - m2 != 0
Which is a two-tailed test (although not important for this).
Let's assume that you get a p-value of 0.01 and your alpha is 0.05. The p-value is the probability, assuming the null hypothesis is true (m1 = m2), of observing a difference between the sample means at least as large as the one you actually observed. This means that if the population means really were equal, only about 1 out of 100 sample pairs would show a difference this large or larger.
Such a low probability makes the observed data hard to reconcile with equal population means, and thus we consider the result to be statistically significant.
What is the threshold that makes us think that a result is significant? That is determined by the significance level (alpha), which in this case is 5%.
The p-value being less than the significance level is what makes us call the result significant, and therefore we reject the null hypothesis: data like ours would be very unlikely if the null hypothesis were true. Note that this is not the same as the probability of the null hypothesis being true.
I hope that makes sense now!
Let me give an example that I often use with my pupils, in order to explain the concepts of null hypothesis, alpha, & significance.
Let's say we're playing a round of Poker. I deal the cards & we make our bets. Hey, lucky me! I got a flush on my first hand. You curse your luck and we deal again. I get another flush and win. Another round, and again, I get 4 aces: at this point you kick the table and call me a cheater: "this is bs! You're trying to rob me!"
Let's explain this in terms of probability: There is a probability associated with getting a flush on the first hand: anyone can get lucky. There's a smaller probability of getting lucky twice in a row. There is finally an even smaller probability of getting really lucky three times in a row. But for the third shot, you are stating: "the probability that you get SO LUCKY is TOO SMALL. I REJECT the idea that you're just lucky. I'm calling you a cheater". That is, you rejected the null hypothesis (the hypothesis that nothing is going on!)
The null hypothesis is, in all cases: "This thing we are observing is an effect of randomness". In our example, the null hypothesis states: "I'm just getting all these good hands one after the other, because I'm lucky"
The p-value is the probability of seeing an outcome at least as extreme as the one observed, given that only chance is at work. You can calculate the odds of getting good hands in poker after properly shuffling the deck. Or for example: if I toss a fair coin 20 times, the probability that I get 20 heads in a row is 1/(2^20) ≈ 0.00000095 (really small). That's the p-value for 20 heads in a row, tossing 20 times.
"Statistically significant", means "This event seems to be weird. It has a really tiny probability of happening by chance. So, I'll reject the null hypothesis."
Alpha, or critical p-value, is the magic point where you "kick the table", and reject the null hypothesis. In experimental applications, you define this in advance (alpha=0.05, e.g.) In our poker example, you can call me a cheater after three lucky hands, or after 10 out of 12, and so on. It's a threshold of probability.
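The coin-toss example can be sketched in Python. The helper binom_tail is a made-up name; it computes a one-sided p-value, the probability of at least k heads in n tosses of a fair coin, and alpha = 0.05 is the threshold chosen in advance.

```python
from math import comb

def binom_tail(n, k, p=0.5):
    """P(at least k heads in n tosses) when each toss lands heads with probability p."""
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k, n + 1))

alpha = 0.05  # threshold fixed before looking at the data

p_20 = binom_tail(20, 20)  # 20 heads out of 20
p_15 = binom_tail(20, 15)  # 15 or more heads out of 20
p_12 = binom_tail(20, 12)  # 12 or more heads out of 20

for label, p_value in [("20/20", p_20), ("15+/20", p_15), ("12+/20", p_12)]:
    decision = "reject H0" if p_value < alpha else "fail to reject H0"
    print(label, p_value, decision)
```

Twenty heads in a row and 15 or more heads both fall below alpha, so the "fair coin" null hypothesis is rejected; 12 or more heads is unremarkable for a fair coin and does not lead to rejection.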
Okay, to understand the p-value you should at least know about the null hypothesis and the alternative hypothesis.
The null hypothesis: take an example where we have 2 flowers; it says there is no significant difference between them.
The alternative hypothesis says that there is a significant difference between them.
And yes, what about the level of significance that the p-value is compared against? Most data scientists take 0.05, but it depends on the research. Values such as
0.5
0.05
0.01
0.001
can be taken as the level of significance.
Okay, now you have chosen the level of significance, but what to do next?
If your model's p-value is 0.03 and the level of significance you have chosen is 0.05, you reject the null hypothesis, which means there is a significant difference between the 2 flowers. Or, simply stated:
if the p-value of your model < level of significance, reject the null hypothesis,
and if the p-value of your model > level of significance, you fail to reject the null hypothesis.
