I'm looking for a mathematical algorithm to prove significance in multivariate testing.
E.g. let's take a website test with 3 headlines, 2 images, and 2 buttons. This results in 3 x 2 x 2 = 12 variations:
h1-i1-b1, h2-i1-b1, h3-i1-b1,
h1-i2-b1, h2-i2-b1, h3-i2-b1,
h1-i1-b2, h2-i1-b2, h3-i1-b2,
h1-i2-b2, h2-i2-b2, h3-i2-b2.
The hypothesis is that one variation is better than others.
I'd like to know with what significance one of the variations is the winner, and how long I have to wait until I can be sure that I statistically have a winner, or at least have an indicator of how sure I can be that one variation is the winner.
So basically I'd like to get a probability for each variation telling me whether it is the winner or not. As the test runs longer, some variations drop in probability and the winner's increases.
Which algorithm would you use? What's the formula?
Are there any libs for this?
You can use a chi-square test. Your null hypothesis is that all outcomes are equally likely; when you plug in the measured counts for each of the 12 outcomes, you get out a number telling you the probability, if the null hypothesis were true, of getting a set of 12 counts at least as extreme (i.e. as far away from equally distributed) as this. If the probability is sufficiently small (typically < 5% or < 1%), you reject the null hypothesis.
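As a rough sketch of the test described above, the following uses only the Python standard library. The counts are hypothetical (they are not from the question), and the code assumes every variation received the same amount of traffic, so that "equally likely" translates into equal expected counts. The helper names chi2_sf and chi_square_uniform are made up for this illustration.

```python
import math

def chi2_sf(x, df):
    """Upper-tail probability P(X >= x) for a chi-square variable with integer df."""
    if x <= 0:
        return 1.0
    if df % 2 == 0:
        # Even df: exp(-x/2) * sum_{j < df/2} (x/2)^j / j!
        term, total = 1.0, 1.0
        for j in range(1, df // 2):
            term *= (x / 2) / j
            total += term
        return math.exp(-x / 2) * total
    # Odd df: erfc term plus a finite sum of x^j / (1*3*...*(2j+1))
    term, total = 1.0, 0.0
    for j in range((df - 1) // 2):
        if j > 0:
            term *= x / (2 * j + 1)
        total += term
    return (math.erfc(math.sqrt(x / 2))
            + math.sqrt(2 * x / math.pi) * math.exp(-x / 2) * total)

def chi_square_uniform(counts):
    """Goodness-of-fit test against the null 'all outcomes equally likely'."""
    n = sum(counts)
    expected = n / len(counts)
    stat = sum((c - expected) ** 2 / expected for c in counts)
    df = len(counts) - 1
    return stat, df, chi2_sf(stat, df)

# Hypothetical conversion counts for the 12 variations (assumed, not measured).
counts = [30, 25, 28, 40, 22, 27, 31, 26, 29, 24, 33, 21]
stat, df, p_value = chi_square_uniform(counts)
print(f"chi2 = {stat:.3f}, df = {df}, p = {p_value:.3f}")
```

With these made-up counts the p-value stays well above 5%, so the test would not yet point to a winner. Note that the test only says whether the variations differ at all, not which one is best.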
I'm trying to do a simple comparison of two samples to determine if their means are different. Regardless of whether their standard deviations are equal/unequal, the formulas for a t-test or z-test are similar.
(I can't post images on a new account.)
t-value w/ unequal variances:
https://www.biologyforlife.com/uploads/2/2/3/9/22392738/949234_orig.jpg
t-value w/ equal/pooled variances:
https://vitalflux.com/wp-content/uploads/2022/01/pooled-t-statistics-300x126.jpg
The issue here is the inverse and sqrt of sample size in the denominator that causes large samples to seem to have massive t-values.
For instance, I have 2 samples w/
size: N1=168,000 and N2=705,000
avgs: X1=89 and X2=49
stddev: S1=96 and S2=66.
At first glance, these standard deviations are larger than the mean and suggest a nonhomogeneous sample with a lot of internal variation. When comparing the two samples, however, the denominator of the t-test becomes approx 0.25, suggesting that a 1 unit difference in means is equivalent to 4 standard deviations. Thus my t-value here comes out to around 160(!!)
All this to say, I'm just plugging in numbers since I didn't do many of these problems in advanced stats and haven't seen this formula since Stats110.
It makes some sense that two massive populations need their variance biased downward before comparing, but this seems like not the best test out there for the magnitude of what I'm doing.
What other tests are out there that I could try? What is the logic behind this seemingly over-biased variance?
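The numbers quoted in the question can be reproduced with a few lines of Python. This is only a sketch of the unequal-variance t statistic; the function name welch_t and the smaller sample sizes in the second call are hypothetical and serve to show how the denominator scales with sample size.

```python
import math

def welch_t(mean1, sd1, n1, mean2, sd2, n2):
    """Return (t, se) for the unequal-variance two-sample t statistic."""
    # Standard error of the difference between the two sample means.
    se = math.sqrt(sd1 ** 2 / n1 + sd2 ** 2 / n2)
    return (mean1 - mean2) / se, se

# Values from the question.
t_value, se = welch_t(89, 96, 168_000, 49, 66, 705_000)
print(f"se = {se:.4f}, t = {t_value:.1f}")

# Same means and standard deviations, but hypothetical samples 1000x smaller.
t_small, se_small = welch_t(89, 96, 168, 49, 66, 705)
print(f"se = {se_small:.4f}, t = {t_small:.2f}")
```

The denominator is the standard error of the difference between means, which shrinks like 1/sqrt(n). Cutting both samples by a factor of 1000 makes the t-value about sqrt(1000) ≈ 31.6 times smaller even though the means and standard deviations are unchanged.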
A random sample of 143 girls and 127 boys was selected from a large population. A measurement was taken of the haemoglobin level (measured in g/dl) of each child, with the following results.
girls: n = 143, mean = 11.35, sd = 1.41
boys: n = 127, mean = 11.01, sd = 1.32
Estimate the standard error of the difference between the sample means.
In essence, we'd combine the two standard errors by adding their squares (the variances of the two sample means) and taking the square root. This implies that we're answering the question: what is the variation of the sampling distribution considering both samples?
SE = sqrt( (sd₁**2 / n₁) + (sd₂**2 / n₂) )
SE = sqrt( (1.41**2 / 143) + (1.32**2 / 127) ) ≈ 0.1662
Notice that the standard deviation squared is simply the variance of each sample. As you can see, in our case the value is quite small, which indicates that the difference between the sample means doesn't need to be very large to exceed what we would expect from sampling variation alone.
We'd calculate the difference between means as 0.34 (or -0.34 depending on the nature of the question) and divide this difference by the standard error to get a t-value. In our case 2.046 (or -2.046) indicates that the observed difference is 2.046 times larger than the typical difference we would expect given the variation that we measured AND the size of our samples.
However, we need to verify whether this observation is statistically significant by determining the t-critical value. This t-critical value can be looked up in a t-table: one needs to know the alpha (typically 0.05 unless otherwise stated), the degrees of freedom, and the original alternative hypothesis (if it was something along the lines of "there is a difference between genders" then we would apply a two-tailed test; if it was something along the lines of "gender X has a haemoglobin level larger/smaller than gender Y" then we would use a one-tailed test).
If |t-value| > t-critical then we would claim that the difference between means is statistically significant, thereby having sufficient evidence to reject the null hypothesis. Alternatively, if |t-value| < t-critical, we would not have statistically significant evidence against the null hypothesis, thus we would fail to reject the null hypothesis.
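The calculation in this answer can be checked with a short script. One assumption to flag: instead of a t-table, the sketch uses the standard normal distribution for the critical value and the p-value, which should be a close approximation here because both samples are large (over 100 each). The exact t-critical value would be slightly larger.

```python
import math
from statistics import NormalDist

# Summary statistics from the question.
n_girls, mean_girls, sd_girls = 143, 11.35, 1.41
n_boys, mean_boys, sd_boys = 127, 11.01, 1.32

# Standard error of the difference between the sample means.
se = math.sqrt(sd_girls ** 2 / n_girls + sd_boys ** 2 / n_boys)
t_value = (mean_girls - mean_boys) / se

# Normal approximation (an assumption, see above) for a two-tailed test.
alpha = 0.05
z_critical = NormalDist().inv_cdf(1 - alpha / 2)
p_two_sided = 2 * (1 - NormalDist().cdf(abs(t_value)))

print(f"se = {se:.4f}, t = {t_value:.3f}")
print(f"critical value = {z_critical:.3f}, two-sided p = {p_two_sided:.4f}")
print("reject H0" if abs(t_value) > z_critical else "fail to reject H0")
```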
I have two related questions on population statistics. I'm not a statistician, but would appreciate pointers to learn more.
I have a process that results from flipping a three sided coin (results: A, B, C) and I compute the statistic t=(A-C)/(A+B+C). In my problem, I have a set that randomly divides itself into sets X and Y, maybe uniformly, maybe not. I compute t for X and Y. I want to know whether the difference I observe in those two t values is likely due to chance or not.
Now if this were a simple binomial distribution (i.e., I'm just counting who ends up in X or Y), I'd know what to do: I compute n=|X|+|Y|, σ=sqrt(np(1-p)) (and I assume my p=.5), and then I compare to the normal distribution. So, for example, if I observed |X|=45 and |Y|=55, I'd say σ=5 and so I expect to have this variation from the mean μ=50 by chance 68.27% of the time. Alternately, I expect greater deviation from the mean 31.73% of the time.
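For reference, the arithmetic in the binomial example above can be written out as follows. It is a sketch of the normal approximation the paragraph describes, with p = 0.5 as assumed there.

```python
import math
from statistics import NormalDist

n, p = 100, 0.5          # |X| + |Y| = 100, fair split assumed
mu = n * p               # expected size of X
sigma = math.sqrt(n * p * (1 - p))

observed = 45            # |X| = 45, |Y| = 55
z = abs(observed - mu) / sigma

# Probability of landing within z standard deviations of the mean,
# and of deviating by more than that.
within = 2 * NormalDist().cdf(z) - 1
beyond = 1 - within

print(f"sigma = {sigma}, z = {z}")
print(f"within: {within:.4f}, beyond: {beyond:.4f}")
```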
There's an intermediate problem, which also interests me and which I think may help me understand the main problem, where I measure some property of members of A and B. Let's say 25% in A measure positive and 66% in B measure positive. (A and B aren't the same cardinality -- the selection process isn't uniform.) I would like to know if I expect this difference by chance.
As a first draft, I computed t as though it were measuring coin flips, but I'm pretty sure that's not actually right.
Any pointers on what the correct way to model this is?
First problem
For the three-sided coin problem, have a look at the multinomial distribution. It's the distribution to use for a "binomial" problem with more than 2 outcomes.
Here is the example from Wikipedia (https://en.wikipedia.org/wiki/Multinomial_distribution):
Suppose that in a three-way election for a large country, candidate A received 20% of the votes, candidate B received 30% of the votes, and candidate C received 50% of the votes. If six voters are selected randomly, what is the probability that there will be exactly one supporter for candidate A, two supporters for candidate B and three supporters for candidate C in the sample?
Note: Since we’re assuming that the voting population is large, it is reasonable and permissible to think of the probabilities as unchanging once a voter is selected for the sample. Technically speaking this is sampling without replacement, so the correct distribution is the multivariate hypergeometric distribution, but the distributions converge as the population grows large.
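The election example can be worked through with the multinomial probability mass function, n! / (x1! x2! x3!) * p1^x1 * p2^x2 * p3^x3. The helper name multinomial_pmf is made up for this sketch.

```python
import math

def multinomial_pmf(counts, probs):
    """Probability of observing exactly `counts` under category probabilities `probs`."""
    # Multinomial coefficient n! / (x1! * x2! * ... * xk!)
    coefficient = math.factorial(sum(counts))
    for c in counts:
        coefficient //= math.factorial(c)
    probability = float(coefficient)
    for c, p in zip(counts, probs):
        probability *= p ** c
    return probability

# One supporter of A, two of B, three of C among six sampled voters.
prob = multinomial_pmf([1, 2, 3], [0.2, 0.3, 0.5])
print(f"{prob:.3f}")
```

The coefficient is 6! / (1! 2! 3!) = 60, and 60 * 0.2 * 0.3^2 * 0.5^3 = 0.135.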
Second problem
The second problem seems to be a problem for cross-tabs. Use the "Chi-squared test for association" to test whether there is a significant association between your variables, and use the "standardized residuals" of your cross-tab to identify which of the associations are more likely to occur and which are less likely.
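A minimal sketch of such a cross-tab test for a 2x2 table follows. The counts are hypothetical: the question only gives the percentages (25% positive in A, 66% in B), so group sizes of 40 and 100 are assumed here. No continuity correction is applied, the residuals shown are plain Pearson residuals, (observed - expected) / sqrt(expected), and the p-value uses the fact that for 1 degree of freedom the chi-square tail equals erfc(sqrt(x / 2)).

```python
import math

def chi_square_2x2(table):
    """Chi-square test for association on a 2x2 table (df = 1, no correction)."""
    (a, b), (c, d) = table
    row_totals = [a + b, c + d]
    col_totals = [a + c, b + d]
    n = a + b + c + d
    stat = 0.0
    residuals = []
    for i, row in enumerate(table):
        residual_row = []
        for j, observed in enumerate(row):
            expected = row_totals[i] * col_totals[j] / n
            stat += (observed - expected) ** 2 / expected
            residual_row.append((observed - expected) / math.sqrt(expected))
        residuals.append(residual_row)
    p_value = math.erfc(math.sqrt(stat / 2))  # chi-square upper tail, df = 1
    return stat, p_value, residuals

# Rows: groups A and B; columns: positive, negative (hypothetical counts).
table = [[10, 30],   # A: 25% positive
         [66, 34]]   # B: 66% positive
stat, p_value, residuals = chi_square_2x2(table)
print(f"chi2 = {stat:.4f}, p = {p_value:.2e}")
print(residuals)
```

With these assumed group sizes the difference is far larger than chance would explain; with much smaller groups the same percentages could easily be non-significant.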
I am confused about the concept of the p-value. In general, if the p-value is greater than alpha, which is generally 0.05, we fail to reject the null hypothesis, and if the p-value is less than alpha, we reject the null hypothesis. As I understand it, if the p-value is greater than alpha, the difference between the two groups is just coming from sampling error or chance. So far everything is okay. However, if the p-value is less than alpha, the result is statistically significant; I was expecting it to be statistically nonsignificant (because when the p-value is less than alpha we reject the null hypothesis).
Basically, if the result is statistically significant, we reject the null hypothesis. But how can a hypothesis be rejected if the result is statistically significant? From the words "statistically significant", I understand that the result is good.
You are mistaking what the significance means in terms of the p-value.
I will try to explain below:
Let's assume a test about the means of two populations being equal. We will perform a t-test to test that by drawing one sample from each population and calculating the p-value.
The null hypothesis and the alternative:
H0: m1 - m2 = 0
H1: m1 - m2 != 0
Which is a two-tailed test (although not important for this).
Let's assume that you get a p-value of 0.01 and your alpha is 0.05. The p-value is the probability, computed under the assumption that the null hypothesis is true (m1 = m2), of observing a difference between the sample means at least as large as the one you actually observed. A p-value of 0.01 means that if the population means really were equal, only about 1 out of 100 sample pairs would show a difference this large or larger.
Such a low probability makes the observed data hard to reconcile with equal means, so we take it as evidence that the means of the populations are not equal, and thus we consider the result to be statistically significant.
What is the threshold that makes us think that a result is significant? That is determined by the significance level (alpha), which in this case is 5%.
The p-value being less than the significance level is what makes us call the result significant, and therefore we reject the null hypothesis: data this extreme would be very unlikely if the null hypothesis were true. Note that the p-value is not the probability that the null hypothesis itself is true.
I hope that makes sense now!
Let me give an example that I often use with my pupils in order to explain the concepts of null hypothesis, alpha, and significance.
Let's say we're playing a round of Poker. I deal the cards & we make our bets. Hey, lucky me! I got a flush on my first hand. You curse your luck and we deal again. I get another flush and win. Another round, and again, I get 4 aces: at this point you kick the table and call me a cheater: "this is bs! You're trying to rob me!"
Let's explain this in terms of probability: There is a probability associated with getting a flush on the first hand: anyone can get lucky. There's a smaller probability of getting that lucky twice in a row. There is finally a probability of getting really lucky three times in a row. But for the third shot, you are stating: "the probability that you get SO LUCKY is TOO SMALL. I REJECT the idea that you're just lucky. I'm calling you a cheater". That is, you rejected the null hypothesis (the hypothesis that nothing is going on!)
The null hypothesis is, in all cases: "This thing we are observing is an effect of randomness". In our example, the null hypothesis states: "I'm just getting all these good hands one after the other because I'm lucky".
A p-value is the probability of seeing a result at least as extreme as the one observed, given that only randomness is at work. You can calculate the odds of getting good hands in poker after properly shuffling the deck. Or for example: if I toss a fair coin 20 times, the odds that I get 20 heads in a row are 1/(2^20) ≈ 0.00000095 (really small). That's the p-value for 20 heads in a row, tossing 20 times.
"Statistically significant" means "This event seems to be weird. It has a really tiny probability of happening by chance. So, I'll reject the null hypothesis."
Alpha, or critical p-value, is the magic point where you "kick the table", and reject the null hypothesis. In experimental applications, you define this in advance (alpha=0.05, e.g.) In our poker example, you can call me a cheater after three lucky hands, or after 10 out of 12, and so on. It's a threshold of probability.
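The coin example can be turned into a small calculation. This sketch computes a one-sided p-value, the probability of at least k heads in n tosses of a fair coin, and compares it with alpha. The function name binomial_upper_tail and the 13 and 15 heads cases are illustrative additions, not part of the example above.

```python
import math

def binomial_upper_tail(n, k, p=0.5):
    """P(at least k successes in n trials) when each trial succeeds with probability p."""
    return sum(math.comb(n, i) * p ** i * (1 - p) ** (n - i)
               for i in range(k, n + 1))

alpha = 0.05  # the point where you "kick the table", fixed in advance

for heads in (13, 15, 20):
    p_value = binomial_upper_tail(20, heads)
    verdict = "reject" if p_value < alpha else "fail to reject"
    print(f"{heads} heads out of 20: p = {p_value:.3g} -> {verdict} the null hypothesis")
```

Thirteen heads out of 20 is unremarkable for a fair coin, 15 heads already falls below alpha = 0.05, and 20 heads gives the 1/(2^20) figure from the example.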
Okay, for the p-value you should at least know about the null hypothesis and the alternative hypothesis.
The null hypothesis: as an example, say we have 2 flowers; it says there is no difference between them.
The alternative hypothesis says that there is a difference between them.
As for the threshold the p-value is compared against (the level of significance), most data scientists use 0.05, but it depends on the research. Values such as
0.5
0.05
0.01
0.001
can be taken as the level of significance.
Okay, now you have chosen the level of significance, but what to do next?
If your model's p-value is 0.03 and the level of significance you have chosen is 0.05, you reject the null hypothesis, meaning there is a significant difference between the 2 flowers. Or, simply stated:
if the model's p-value < level of significance, reject the null hypothesis;
if the model's p-value > level of significance, you fail to reject the null hypothesis (which is not the same as proving it true).
I'd like to know what method is best suited for predicting event occurrences.
For example, given a set of data from 5 years of malaria infection occurrences and several other factors that affect the occurrences, I'd like to predict the next five years for malaria infection occurrences.
What I thought of doing was to derive a kind of occurrence factor using fuzzy logic rules, and then average the occurrences with the occurrence factor to get the first predicted occurrence, and then average all again with the predicted occurrence and keep on iterating for all five years, but I decided to seek help online.
There are many ways to do forecasting, and each has its own advantages and disadvantages. The science of determining the accuracy of a forecast often consists of trying to minimize error. All forecasting comes down to using the past as a predictor of the future, adjusting it by some amount. E.g. tomorrow the temperature will be the same as today, plus or minus some amount. How you decide the +/- is what varies.
Here are a range of techniques you might want to review:
Moving Averages (simple, single, double)
Exponential Smoothing
Decomposition (Trend + Seasonality + Cyclicals + Irregularities)
Linear Regression
Multiple Regression
Box-Jenkins (a.k.a. ARIMA, Auto-Regressive Integrated Moving Average)
Sorry for the vague answer, but forecasting is complex stuff.
What you describe about feeding your predictions back into the model to produce future predictions is standard stuff. I don't know if "fuzzy logic" gets you anything in particular. As any forecasting instructor will tell you, sometimes you just squint and look at the data. Context is everything.
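To make the first two techniques in the list and the feed-back idea concrete, here is a small sketch in plain Python. The yearly case counts are invented for illustration, and the window size and smoothing factor are arbitrary assumptions rather than recommended settings.

```python
def moving_average_forecast(history, window, steps):
    """Forecast `steps` periods ahead, feeding each prediction back into the series."""
    series = list(history)
    forecasts = []
    for _ in range(steps):
        prediction = sum(series[-window:]) / window
        forecasts.append(prediction)
        series.append(prediction)  # the prediction becomes input for the next step
    return forecasts

def exponential_smoothing_level(history, alpha):
    """Simple exponential smoothing; the final level is the flat forecast."""
    level = history[0]
    for value in history[1:]:
        level = alpha * value + (1 - alpha) * level
    return level

# Hypothetical yearly infection counts for five past years.
cases = [120, 135, 128, 150, 142]

ma_forecasts = moving_average_forecast(cases, window=3, steps=5)
ses_forecast = exponential_smoothing_level(cases, alpha=0.5)

print([round(f, 1) for f in ma_forecasts])
print(ses_forecast)
```

Neither method uses the "other factors" mentioned in the question; for that, the regression-based techniques in the list would be the place to look.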
I would use a logit or probit model to predict occurrence given a set of exogenous circumstances. Not sure why you want to iterate. That would basically be equivalent to including a lag in the regression formula. You could do it, and as long as the coefficient was <1, you wouldn't have the explosion problem.
If you want to introduce an element of endogeneity to the independent variables, you could use a VAR.
I think with your idea as stated, you'll have asymptotic behavior as time goes by. Either your data will converge to 0, or it will explode. That said, you'd probably have to give some data and/or describe its properties before anyone can help you. This is basically a simulation, and the factors are everything when it comes to extrapolation.
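The converge-or-explode behaviour can be seen in a toy iteration where each value is just a coefficient times the previous one, i.e. a pure lag term with no other inputs. The starting value and the coefficients 0.5 and 1.5 are arbitrary choices for illustration.

```python
def iterate_lag(start, coefficient, steps):
    """Repeatedly apply y[t+1] = coefficient * y[t], returning the whole path."""
    values = [start]
    for _ in range(steps):
        values.append(coefficient * values[-1])
    return values

stable = iterate_lag(100.0, 0.5, 50)     # coefficient < 1: decays toward 0
explosive = iterate_lag(100.0, 1.5, 50)  # coefficient > 1: grows without bound

print(stable[:4], stable[-1])
print(explosive[:4], explosive[-1])
```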