Non-Numeric Ttest? - statistics

What is a statistical test you can use to determine a relationship between two groups that are non-numeric (i.e., I am looking for the equivalent of a t-test for non-numeric data)?
Specifically, I am looking to compare data of plant growth. I want to compare the type of plant (non-numeric) to CO2 uptake (numeric) and determine if there is a statistical association.
Thanks!

Related

Calculate risk using Cox model coefficients and mean values

I'm trying to understand the example presented in Appendix C here
https://www.ncbi.nlm.nih.gov/pmc/articles/PMC6481149/
Equation C1 is clear to me.
But in Equation C2 they use the mean values.
Such mean values are clear to me in the case of categorical variables; for example, 1.548 is the mean value of the Sex variable (as shown in Table 3). Please correct me if I'm wrong.
But for the numerical variables I don't understand which mean values they are using. For example, for the Age variable they use 3.768. If I understand correctly, that value is the log of the mean age, which should be log(44.15) = 1.64; instead, the value used is 3.768.
Could anybody please clarify where this value comes from?
In statistics log often means the natural logarithm, sometimes denoted ln. The four values they take the logarithms of are:
Variable     Reported Mean   ln(Mean)   Reported Value
Age          44.15           3.788      3.768
BMI          25.61           3.243      3.230
BP Syst      138.6           4.932      4.913
Pulse Rate   75.61           4.326      4.311
The calculated values are not exactly equal to the reported values. But it looks close enough that this is probably the calculation they used. Without the data and/or code they used it's hard to say why the results are different. The study mentions excluding 130 participants because of ethics protections. So, perhaps one table was calculated using a slightly different group of participants than the other table?
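If it helps, you can check this with the natural-log and base-10-log functions of any spreadsheet or calculator; for example, for the reported mean age (the exact means the authors used are an assumption here, since only the rounded table values are available):
=LN(44.15) gives 3.788
=LOG10(44.15) gives 1.645
So the 1.64 in the question comes from taking the base-10 logarithm, while the value in the paper is (approximately) the natural logarithm of the mean.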

Is there any way to treat multicollinearity between two variables in dataset?

I am working on a regression problem where I have to predict the weekly sales of 45 outlets of a department store. I have a variable named temperature (values in Celsius), and as a feature-engineering exercise I came up with the idea of creating another temperature column whose values, in Fahrenheit, are derived from the Celsius column. But that will lead to multicollinearity between these two variables. Is there any way to treat multicollinearity between two variables, or should I go with another approach?
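For context on why this is perfect multicollinearity: Fahrenheit is just a linear transform of Celsius (F = C * 9/5 + 32), so the correlation between the two columns is exactly 1. A minimal spreadsheet sketch, assuming (hypothetically) the Celsius values are in A2:A46 and the derived Fahrenheit values in B2:B46:
=A2*9/5+32 (filled down column B)
=CORREL(A2:A46,B2:B46) gives 1
A correlation of exactly 1 between two predictors is the most extreme form of multicollinearity.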

Comparing two contingency tables

I have two 6x4 contingency tables of frequency data. They are based on the same sampling criteria for a number of discrete variables, but for two conditions (before and after). I would like to compare them statistically to see how much, or how little, they differ.
A chi-square-related test seems appropriate, but normally this compares the observed frequencies to theoretical (expected) frequencies to calculate the statistic. In other words, I would need to swap the theoretical frequencies for the second table. Of course it doesn't have to be a basic chi-square test; any other appropriate test would be fine.
I have access to XLSTAT, Excel and SPSS, and would appreciate some help on this.
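A rough sketch of the "swap the theoretical for the second table" idea in Excel uses CHISQ.TEST (CHITEST in older versions), which takes an observed range and an expected range; the ranges below are hypothetical, and the "before" counts would first need to be rescaled so they sum to the same total as the "after" counts:
=CHISQ.TEST(B10:E15, B2:E7)
where B10:E15 holds the "after" table and B2:E7 the rescaled "before" table. Note this treats the "before" table as a fixed expected distribution rather than as a second sample, so it is only an approximation of a proper test of homogeneity on the two tables combined.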

How to determine statistical significance using T-Test in Excel?

I have two data sets, A and B. I would like to know whether the average value of A
differs significantly from B's average. How do I do that in Excel 2007?
(I know there's a TTEST formula in Excel, and I also know I don't need to use the paired version of it. What other parameters do I need to set, and how do I interpret the result?)
Thanks,
Jon
=ttest(array1,array2,tails,type)
array1 is data set A
array2 is data set B
tails: 1 = one-tailed, 2 = two-tailed. Use one-tailed if you are testing whether A is higher than B, or whether A is lower than B. Use two-tailed if you are testing whether A is either higher or lower than B. (Probably 1 for your situation.)
type: You said you don't need paired, which is Type 1. Type 2 is if your data sets have equal variance, and Type 3 is if they have unequal variance. For example if the data points in A are all pretty close, but in B they are wildly different, use Type 3.
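For example, assuming (hypothetically) that data set A is in cells A2:A31 and data set B is in B2:B31, a one-tailed test with unequal variances would be:
=TTEST(A2:A31,B2:B31,1,3)
The result is a p-value; the smaller it is, the stronger the evidence that A's average really differs from B's in the direction you tested.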

Compute statistical significance with Excel

I have 2 columns and multiple rows of data in Excel. Each column represents an algorithm, and the values in the rows are the results of these algorithms with different parameters. I want to run a statistical significance test on these two algorithms in Excel. Can anyone suggest a function?
As a result, it would be nice to be able to state something like "Algorithm A performs 8% better than Algorithm B with .9 probability (or a 95% confidence interval)".
The Wikipedia article describes exactly what I need:
http://en.wikipedia.org/wiki/Statistical_significance
It seems like a very easy task, but I failed to find a suitable function.
Any advice on a built-in Excel function or function snippets would be appreciated.
Thanks.
Edit:
After tharkun's comments, I realized I should clarify some points:
The results are simply real numbers between 1 and 100 (they are percentage values). Each row represents a different parameter, so the values in a row are the two algorithms' results for that parameter. The results do not depend on each other.
When I take the average of all values for Algorithm A and for Algorithm B, I see that the mean of the results Algorithm A produced is 10% higher than Algorithm B's. But I don't know if this is statistically significant or not. In other words, maybe for one parameter Algorithm A scored 100 percent higher than Algorithm B while Algorithm B has higher scores for the rest, and the 10% difference in averages is due to that one result alone.
And I want to do this calculation using just Excel.
Thanks for the clarification. In that case you want to do an independent sample T-Test. Meaning you want to compare the means of two independent data sets.
Excel has a function TTEST, that's what you need.
For your example you should probably use two tails and type 2.
The formula outputs a probability value, known as the probability of alpha error. This is the error you would make if you concluded the two data sets are different when they actually aren't. The lower the alpha error probability, the higher the chance your sets really are different.
You should only accept that the two data sets differ if the value is lower than 0.01 (1%), or for critical outcomes even 0.001 or lower. You should also know that the t-test needs at least around 30 values per data set to be reliable, and that the type 2 test assumes equal variances of the two data sets. If the variances are not equal, you should use the type 3 test.
http://depts.alverno.edu/nsmt/stats.htm
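As a concrete sketch for this case (assuming, hypothetically, that Algorithm A's results are in A2:A31 and Algorithm B's in B2:B31):
=TTEST(A2:A31,B2:B31,2,2)
If this returns, say, a value below the 0.01 threshold mentioned above, you would treat the observed 10% difference in means as statistically significant; swap the last argument to 3 if the two sets of results have clearly unequal variances.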
