I'm running an A/B test, and I want to check for statistical significance with 90% confidence two-sided. I've calculated standard errors, z-scores and p-values.
I'm saying that I have significance when my p-value is lower than 0.1 or greater than 0.9. Am I right? I'm using this tool https://vwo.com/blog/ab-testing-significance-calculator-spreadsheet-in-excel/
I'm unsure whether the cutoffs should instead be lower than 0.05 or greater than 0.95.
I think I'm mixing things up in my head, because I have the p-value and I'm saying that my alpha is 0.1. I'm not calculating alpha/2 or p-value/2, nor do I think I need to. So, should I just check whether the p-value is lower than 0.1 and that's all? Not even greater than 0.9?
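A two-sided test at 90% confidence has a 10% rejection region, which is split into two equal halves of 5%, one in each tail.
Your second guess is correct if the p-value is one-sided, that is, computed directly from the normal CDF of the z-score so that it can range from 0 to 1: check whether it is less than 0.05 or greater than 0.95, since those two tails together make up the rejection region. If it is already a two-sided p-value, compare it with 0.1 only; a two-sided p-value above 0.9 simply means the two variations are nearly indistinguishable.
If the p-value falls in the rejection region, you can reject the null hypothesis.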
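To make the two cutoffs concrete, here is a minimal Python sketch. The function name `ab_test`, the unpooled standard error, and the visitor and conversion counts are assumptions for illustration; they are not taken from the linked spreadsheet. It computes both a one-sided and a two-sided p-value, so you can see that "one-sided p < 0.05 or p > 0.95" is the same decision as "two-sided p < 0.1".

```python
import math


def normal_cdf(z: float) -> float:
    """Standard normal CDF, computed from the error function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def ab_test(conv_a: int, n_a: int, conv_b: int, n_b: int, alpha: float = 0.10):
    """Two-proportion z-test (unpooled standard error; an assumption here).

    Returns (z, one-sided p, two-sided p, significant at level alpha).
    """
    p_a = conv_a / n_a
    p_b = conv_b / n_b
    se_a = math.sqrt(p_a * (1.0 - p_a) / n_a)
    se_b = math.sqrt(p_b * (1.0 - p_b) / n_b)
    z = (p_b - p_a) / math.sqrt(se_a ** 2 + se_b ** 2)

    # One-sided p-value: near 0 when B beats A, near 1 when A beats B.
    p_one_sided = 1.0 - normal_cdf(z)
    # Two-sided p-value: fold both tails together.
    p_two_sided = 2.0 * min(p_one_sided, 1.0 - p_one_sided)

    significant = p_two_sided < alpha
    return z, p_one_sided, p_two_sided, significant


# Hypothetical counts: 100/1000 conversions for A, 130/1000 for B.
z, p1, p2, sig = ab_test(100, 1000, 130, 1000, alpha=0.10)
print(f"z={z:.3f}  one-sided p={p1:.4f}  two-sided p={p2:.4f}  significant={sig}")
```

With a two-sided p-value there is nothing to check at the 0.9 end; the "greater than 0.95" check only makes sense for the one-sided value.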
Related
I have a question about how Bollinger bands are plotted in relation to statistics. In statistics, once a standard deviation is calculated from the mean of a set of numbers, shouldn't 1 standard deviation be interpreted by dividing this number in half and plotting each half above and below the mean? By doing so, you can then determine whether or not the data points fall within this 1 standard deviation.
Then, correct me if I am wrong, but aren't Bollinger bands NOT calculated this way? Instead, they take 1 standard deviation (if you have set it to 1) and plot the WHOLE value both above and below the mean (not splitting it in two), thereby doubling the size of this standard deviation?
Bollinger bands loosely state that 68% of data falls within the first band, 1 standard deviation (loosely, because the empirical rule in statistics requires normal distributions, which stock prices most often are not). However, if this empirical rule comes from statistics where 1 standard deviation is split in half, then applying a 68% probability to an entire Bollinger band is wrong. Is this correct?
You can modify the deviation multiple to suit your purpose; for example, you can use 0.5.
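For reference, here is a minimal Python sketch of how the bands are usually computed. The function name `bollinger_bands`, the use of the population standard deviation, and the sample prices are assumptions for illustration. A standard deviation is a distance from the mean, so the empirical 68% rule refers to mean ± 1 standard deviation (a total width of 2 standard deviations), which is what a multiple of 1 plots. A multiple of 0.5 gives the "split in half" band from the question.

```python
import statistics


def bollinger_bands(prices, window=20, k=2.0):
    """Return a list of (lower, middle, upper) tuples, one per full window.

    middle = moving average of the last `window` prices
    lower/upper = middle -/+ k * standard deviation of that window
    (population standard deviation is assumed here)
    """
    bands = []
    for i in range(window - 1, len(prices)):
        chunk = prices[i - window + 1 : i + 1]
        mid = statistics.fmean(chunk)
        sd = statistics.pstdev(chunk)
        bands.append((mid - k * sd, mid, mid + k * sd))
    return bands


# Hypothetical prices, small window so the arithmetic is easy to follow.
prices = [1, 2, 3, 4, 5]
for k in (1.0, 0.5):
    lower, mid, upper = bollinger_bands(prices, window=5, k=k)[0]
    print(f"k={k}: lower={lower:.3f} mid={mid:.3f} upper={upper:.3f} "
          f"width={upper - lower:.3f}")
```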
I got to know confidence ellipses at university (but that was some semesters ago).
In my current project, I'd like to calculate a 3-dimensional confidence ellipse/ellipsoid for which I can set the probability of success to, e.g., 90%. The center of the data is shifted from zero.
At the moment I am calculating the variance-covariance matrix of the dataset and, from it, the eigenvalues and eigenvectors, which I then represent as an ellipsoid.
Here, however, I am missing the probability of success, which I cannot specify.
What is the correct way to calculate a confidence ellipsoid with, e.g., a 90% probability of success?
I have a dataset with the low-water and high-water surface areas of lakes and ponds within a delta for each year. These lakes can change substantially from year to year and sometimes dry out completely, so surface area can be 0 during the low-water period. I'm trying to quantify the magnitude of spring flooding in terms of its effect on the surface areas of these lakes. Given the high interannual variation in surface area, I need to compare the low-water value from one year with the high-water value of the following year to quantify this magnitude; comparing to a mean isn't sensitive enough. However, because the low-water surface area is 0 for some lakes, I cannot compute percent change.
My current idea is to use an "inverse" of percent change (I don't know how else to describe it), where I divide the low-water value by the high-water value. This gives a scale where large change maps to 0 and little change maps to 1. However, small changes from a surface area of 0 will again be overrepresented. Any idea how I could accurately compare the magnitude of flooding in such a case?
I have access to my service's latency metrics at all percentiles. I now need to calculate the 10% trimmed mean of the service's latency. Is there a way to approximate the 10% trimmed mean using just the percentile data? I understand I can simply calculate the mean with a script over the transactions between the 10th and 90th percentiles, but since this figure is to be used directionally only, I was wondering whether there is an easy hack to guesstimate it, as doing it at scale would be expensive.
This is really more suitable for stats.stackexchange.com, but anyway, you can approximate the trimmed mean or any other sample statistic given percentiles. From the percentiles, construct the equivalent histogram. Each bar has a width running from one percentile value to the next, and a height equal to the difference between the two percentile levels (that is, the fraction of the data that falls between them). (So if you reversed the process and added up the bars, you would get the percentiles again.)
Now, with that histogram, calculate the sample statistic. The exact value is an integral. An easy approximation is to generate a number of data points from the span of each bar, and then use those points to calculate the sample statistic with the ordinary formula. The first thing to try is to generate data equal to the midpoint of each bar, with the number of values in each bin proportional to the bar height.
I don't know a package to do this, but with this description maybe you can look it up, or work out the details.
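The midpoint approach described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function name `trimmed_mean_from_percentiles` and the input format (a list of `(level, value)` pairs with levels from 0 to 100) are hypothetical, and values are assumed to vary linearly between adjacent percentiles. Instead of generating points, it weights each bar's midpoint by the bar's share of the data, which amounts to the same thing.

```python
def trimmed_mean_from_percentiles(percentiles, trim=10.0):
    """Approximate a trimmed mean from (level, value) percentile pairs.

    `trim` is the percentage cut from EACH tail, so trim=10 keeps the
    data between the 10th and 90th percentiles. Values are assumed to
    change linearly between neighbouring percentiles.
    """
    pts = sorted(percentiles)
    lo_cut, hi_cut = trim, 100.0 - trim
    weighted_sum = 0.0
    total_mass = 0.0
    for (l0, v0), (l1, v1) in zip(pts, pts[1:]):
        # Clip this bar to the part that survives trimming.
        a = max(l0, lo_cut)
        b = min(l1, hi_cut)
        if b <= a:
            continue  # bar lies entirely in a trimmed tail (or is empty)
        slope = (v1 - v0) / (l1 - l0)
        value_at_a = v0 + slope * (a - l0)
        value_at_b = v0 + slope * (b - l0)
        midpoint = 0.5 * (value_at_a + value_at_b)
        mass = b - a  # share of the data in the clipped bar
        weighted_sum += mass * midpoint
        total_mass += mass
    if total_mass == 0.0:
        raise ValueError("no percentile data inside the trimmed range")
    return weighted_sum / total_mass


# Hypothetical latency percentiles in milliseconds, with a heavy upper tail.
latency = [(0, 1.0), (10, 2.0), (50, 3.0), (90, 4.0), (100, 1000.0)]
print(trimmed_mean_from_percentiles(latency, trim=10.0))
```

The finer the percentile grid, the closer this gets to the exact trimmed mean; with only a few percentiles it is a directional estimate at best.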
Hey, this data does look correlated, correct? The Pearson coefficient says the correlation is only 0.2; I assume the value is this low because the relationship is not linear. Thanks.
This looks like an exponential decrease, while your x-values are discrete. You could log-transform your y-values and jitter your x-values (adding random numbers of +/- 0.2 or something of this order of magnitude) and then recheck the correlation.
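As a rough illustration of that suggestion, the Python sketch below compares the Pearson correlation before and after a log transform. The data is synthetic (an exponential decay with multiplicative noise, generated with a fixed seed), not the data from the question, and the helper name `pearson` and all parameters are assumptions chosen for the example.

```python
import math
import random


def pearson(xs, ys):
    """Plain Pearson correlation coefficient."""
    n = len(xs)
    mx = sum(xs) / n
    my = sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


rng = random.Random(0)  # fixed seed so the run is repeatable

# Synthetic data: discrete x in 1..10, y decaying exponentially with noise.
xs = [x for x in range(1, 11) for _ in range(20)]
ys = [100.0 * math.exp(-0.8 * x) * math.exp(rng.gauss(0.0, 0.3)) for x in xs]

# Jitter x by up to +/- 0.2 and log-transform y, as suggested above.
xs_jittered = [x + rng.uniform(-0.2, 0.2) for x in xs]
log_ys = [math.log(y) for y in ys]

r_raw = pearson(xs, ys)
r_log = pearson(xs_jittered, log_ys)
print(f"raw Pearson r = {r_raw:.3f}, after log transform r = {r_log:.3f}")
```

Note that the log transform requires strictly positive y-values, and that the jitter mainly helps when plotting overlapping points; it changes the coefficient very little.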