How to determine statistical significance using T-Test in Excel? - statistics

I have two groups of data sets, A and B. I would like to know weither the average value of A
significantly differs then B's average. How to do that in Excel 2007?
(I know there's a TTEST formula in excel, I also know I don't need to use the paired version of it, what other parameters do I need to set and how to interpert the result?)
Thanks,
Jon

=ttest(array1,array2,tails,type)
array1 is data set A
array2 is data set B
tails: 1= one tailed, 2 = two tailed. Use one tailed if you are testing whether A is higher than B, or whether A is lower than B. Use two tailed if you are testing whether A is either higher or lower than B. (Probably 1 for your situation.)
type: You said you don't need paired, which is Type 1. Type 2 is if your data sets have equal variance, and Type 3 is if they have unequal variance. For example if the data points in A are all pretty close, but in B they are wildly different, use Type 3.

Related

Calculate risk using Cox model coefficients and mean values

I'm trying to understand the example presented in Appendix C here
https://www.ncbi.nlm.nih.gov/pmc/articles/PMC6481149/
Equation C1 is clear to me.
But in Equation C2 they use the mean values.
Such mean values are clear to me in the case of categorical variables for example 1.548 is the mean value of the Sex variable (as shown in the Table 3). Please correct me if I'm wrong.
But in numerical variables I don't understand which mean values are they using. For example for the Age variable they use 3.768, if I understand right, that value is the log of the mean age, should be log(44.15)=1.64. Instead the used value is 3.768.
Please could anybody clarify where does this value come from?
In statistics log often means the natural logarithm, sometimes denoted ln. The four values they take the logarithms of are:
Variable
Reported Mean
ln(Mean)
Reported
Age
44.15
3.788
3.768
BMI
25.61
3.243
3.230
BP Syst
138.6
4.932
4.913
Pulse Rate
75.61
4.326
4.311
The calculated values are not exactly equal to the reported values. But it looks close enough that this is probably the calculation they used. Without the data and/or code they used it's hard to say why the results are different. The study mentions excluding 130 participants because of ethics protections. So, perhaps one table was calculated using a slightly different group of participants than the other table?

What test can I use to calculate significance of optimization lift?

Given the following data for 12 users:
username, number of deals for control, revenue from test, revenue from control
Here's an example of how the data looks like
Can you help me figure out how I can calculate the significance of the hypothesis that the test is more profitable (preferably using excel)?
The measure I was thinking of using was the % of lift in revenues for each customer.
P.s. I have a background in statistics but not an expert so please keep it as simple as possible.
Since each pair of incomes refers to the same individual, you can perform a paired t-test.
Variable 1: Control income
Variable 2: Deals income
Then follow these instructions (copied here for posterity):
In Excel, click Data Analysis on the Data tab.
From the Data Analysis popup, choose t-Test: Paired Two Sample for Means.
Under Input, select the ranges for both Variable 1 and Variable 2.
In Hypothesized Mean Difference, you’ll typically enter zero. This value is the null hypothesis value, which represents no effect. In
this case, a mean difference of zero represents no difference between
the two methods, which is no effect.
Check the Labels checkbox if you have meaningful variables labels in row 1. This option helps make the output easier to interpret. Ensure
that you include the label row in step #3.
Excel uses a default Alpha value of 0.05, which is usually a good value. Alpha is the significance level. Change this value only when
you have a specific reason for doing so.
Click OK.
Alternatively, you can indeed calculate the difference between the two incomes, and then perform a one sample t-test (assuming that the difference is zero). However, such a test is not available out-of-the-box in Excel; the procedure is described here.

Create exchanges with bounded random parameters and fixed sum to be used in Montecarlo

I have to run a montecarlo where, for some products, certain exchanges are relate to each other in the sense that my process can take as input any of the products in different (bounded) proportions but with fixed sum.
Example:
my product a takes as inputs a total of 10 kg of x,y, and z alltogheter and x has a uniform distribution that goes from 0 to 4 kg, y from 1 to 6 and z from 3 to 8 with their sum that must be equal to 10. So, every iteration I would need to get a random number for my three exchanges within their bounds making sure that their sum is always 10.
I have seen that in stats_array it is possible to set the bounds of the distributions and thus create values in a specified interval but this would not ensure that the sum of my random vector equals the fixed sum of 10.
Wondering if there is already a (relatively) straightforward way to implemented this in bw2
Otherwise the only way I see this feasible is to create all the uncertainity parameters with ParameterVectorLCA, tweak the value in the array for those products that must meet the aforementioned requirements (e.g with something like this or this) and then use this array with modified parameters to re-run my MC .
We are working on this in https://github.com/PascalLesage/brightway2-presamples, but it isn't ready yet. I don't know of any way to do this currently without hacking something together by subclassing the MonteCarloLCA.

How to obtain Incremental standard deviations from a set of standard deviations?

I have a data set containing three columns, first column represents number of trials, second column represents experimental values, and the third column represents corresponding standard deviation.
With each experiment there is an increment in my experimental values. To get the incremental values, I hold my first value as the reference value and subtract this reference value from each subsequent value and use them to create fourth column of these incremental values.
My problem begins right from here. How do I create a new set of incremental standard deviations for the incremental experimental values I got? My apology if the problem is not well defined but hopefully someone will eventually be able to help me out. Many thanks!
Below is my data set,
Trial Mean SD Incr Mean Incre SD
1 45.311 4.668 0
2 56.682 2.234 11.371
3 62.197 2.266 16.886
4 70.550 4.751 25.239
5 80.528 4.412 35.217
6 87.453 4.542 42.142
7 89.979 2.185 44.668
8 96.859 3.476 51.548
To be clear, for other readers, your incremental mean is actually the difference between trial 1 and the other trials.
Variances add directly when you subtract (or add) independent normal distributions. So you first want to convert that standard deviation to a variance by squaring it, and then you can add the variances, and then you can take the square root to turn it back into a standard deviation. Note when using this kind of Pythagorean combination, you are assuming that trial 1 is independent from the trials, so for example, you cannot do things like have some sample in both trials.
Logically this makes sense that your so called "incremental SD" will always be greater than the individual SDs, since the uncertainty of both distributions contributes towards the uncertainty of the difference.

Compute statistical significance with Excel

I have 2 columns and multiple rows of data in excel. Each column represents an algorithm and the values in rows are the results of these algorithms with different parameters. I want to make statistical significance test of these two algorithms with excel. Can anyone suggest a function?
As a result, it will be nice to state something like "Algorithm A performs 8% better than Algorithm B with .9 probability (or 95% confidence interval)"
The wikipedia article explains accurately what I need:
http://en.wikipedia.org/wiki/Statistical_significance
It seems like a very easy task but I failed to find a scientific measurement function.
Any advice over a built-in function of excel or function snippets are appreciated.
Thanks..
Edit:
After tharkun's comments, I realized I should clarify some points:
The results are merely real numbers between 1-100 (they are percentage values). As each row represents a different parameter, values in a row represents an algorithm's result for this parameter. The results do not depend on each other.
When I take average of all values for Algorithm A and Algorithm B, I see that the mean of all results that Algorithm A produced are 10% higher than Algorithm B's. But I don't know if this is statistically significant or not. In other words, maybe for one parameter Algorithm A scored 100 percent higher than Algorithm B and for the rest Algorithm B has higher scores but just because of this one result, the difference in average is 10%.
And I want to do this calculation using just excel.
Thanks for the clarification. In that case you want to do an independent sample T-Test. Meaning you want to compare the means of two independent data sets.
Excel has a function TTEST, that's what you need.
For your example you should probably use two tails and type 2.
The formula will output a probability value known as probability of alpha error. This is the error which you would make if you assumed the two datasets are different but they aren't. The lower the alpha error probability the higher the chance your sets are different.
You should only accept the difference of the two datasets if the value is lower than 0.01 (1%) or for critical outcomes even 0.001 or lower. You should also know that in the t-test needs at least around 30 values per dataset to be reliable enough and that the type 2 test assumes equal variances of the two datasets. If equal variances are not given, you should use the type 3 test.
http://depts.alverno.edu/nsmt/stats.htm

Resources