I have individual values and want to test each value's significance against a randomly generated dataset (20 values, but I could generate more if needed). What is the best statistical test to do this, please? Thanks!
I want to compare the output of my simulation model to the observed data in various ways, for example using an independent t-test to compare the means. However, when I run the independent t-test in SPSS, I get a different result than the independent t-test in Excel. I don't know why, so I don't know which one I should use. Can anybody tell me why the results are different?
Here is the independent t-test in SPSS (with t-value 0.181 and p-value 0.857):
Here is the t-Test: Two-Sample Assuming Equal Variances in Excel (the t-test assuming unequal variances also differs from the SPSS result):
After rerunning the t-test on your original data in SPSS, with the full dataset, I got perfectly matching results from SPSS and Excel (see below).
The problem has to do with one case apparently missing from your "observed" set (as noted by @TomSharpe).
This could be due to a simple error when copying the data into SPSS. In SPSS the two ranges have to be stacked in the same column; if you did that manually, an error may have crept in, in which case you should learn to use the restructure commands, namely VARSTOCASES, to avoid such mistakes.
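For comparison, the same stacking looks like this in Python/pandas (a sketch only; the column names simulated and observed are hypothetical stand-ins for the two ranges):

import pandas as pd

# Hypothetical wide-format data: one column per group, as it might sit in Excel.
wide = pd.DataFrame({
    "simulated": [10.2, 11.1, 9.8, 10.5],
    "observed":  [10.0, 10.9, 9.7, 10.4],
})

# Stack both ranges into a single value column plus a group indicator,
# the long layout that SPSS's VARSTOCASES restructuring produces.
long_format = wide.melt(var_name="group", value_name="value")
print(long_format)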
On the other hand, if the data is complete but a case still seems to be missing from the analysis, you should check whether the data is weighted by another variable. Weighted data can change the apparent number of observations in the analysis, and of course change the results.
My t-tests have been yielding just one output, but I want 10 t-tests.
Each t-test should compare one of the 10 values in the list to 0.
I have tried the below:
import scipy
from scipy import stats
list2=[0.10415380403918414, 0.09142102934943379, 0.08340408682911706, 0.07791383429638124, 0.0738177221067812, 0.07111840615962706, 0.0673345711222398, 0.06431875318226271, 0.06074216826770115, 0.052948996685723906]
print(scipy.stats.ttest_ind(list2,[0]*10))
That is, I should get 10 t-test comparisons, so 10 t-tests should be output.
All of this is to say: I am seeking 10 rows of output (each corresponding to a unique t-test, so 10 t-tests in total), but the code I have now provides only one row of output, i.e. just one test.
listofzeros=[0,0,0,0,0,0,0,0,0,0]
for i in range(10):
    print(scipy.stats.ttest_ind(list2, listofzeros))
Firstly, there is no need to use stats.ttest_ind and create a list of zeros with the same length as the sample. You can just use stats.ttest_1samp, as follows:
print(scipy.stats.ttest_1samp(list2, 0))
That gives the same t-statistic without the zero-list workaround (the p-value differs slightly because the degrees of freedom are 9 rather than 18), and it returns the t-statistic and the p-value for the mean of the input sample, not a result per value in the sample.
To be more comprehensive: the t-test is used to determine whether the sample mean is statistically significantly different from the population mean.
What you are trying to do is perform a two-sample t-test, which works on the means of the two lists, not on every pair of associated values from the two samples.
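As a quick check, here is a minimal sketch (reusing list2 from the question) that runs both calls side by side; the t-statistics should match, while the p-values differ slightly because of the different degrees of freedom:

from scipy import stats

list2 = [0.10415380403918414, 0.09142102934943379, 0.08340408682911706,
         0.07791383429638124, 0.0738177221067812, 0.07111840615962706,
         0.0673345711222398, 0.06431875318226271, 0.06074216826770115,
         0.052948996685723906]

# Two-sample t-test against a list of zeros (the original workaround); df = 18.
print(stats.ttest_ind(list2, [0] * 10))

# One-sample t-test of the mean of list2 against 0; same t-statistic, df = 9.
print(stats.ttest_1samp(list2, 0))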
I have the Iris dataset and I want to calculate the total count of features in my dataset. Which library or function can be used to calculate the result?
Please help me out.
If what you are asking is the number of columns in the dataset, then this will be a simple way to do so:
len(iris.columns)
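For context, a minimal runnable sketch; it assumes the dataset has been loaded as a pandas DataFrame (here via seaborn's built-in copy of Iris, which is an assumption, not something stated in the question):

import seaborn as sns

# Load the Iris dataset as a pandas DataFrame (seaborn ships a copy).
iris = sns.load_dataset("iris")

# Number of columns; note this counts the 'species' label column as well,
# so subtract it if you only want the four measurement features.
print(len(iris.columns))       # 5 columns in total
print(len(iris.columns) - 1)   # 4 feature columns

iris.shape[1] gives the same count as len(iris.columns).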
I have two 6x4 contingency tables of frequency data. They are based on the same sampling criteria over a number of discrete variables, but for two conditions (before and after). I would like to compare these statistically to see how much, or how little, they differ.
A chi-square related test seems appropriate, but normally this compares the observed counts against theoretical expected counts to calculate the statistic. So in other words I need to swap the theoretical table for the second table. Of course, it doesn't have to be a basic chi-square test; any other appropriate test would be fine.
I have access to XLSTAT, Excel and SPSS, and would appreciate some help on this.
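One way to set this up, sketched here in Python with scipy and made-up frequencies as placeholders for the real tables, is to treat condition (before/after) as an extra factor and run a chi-square test of homogeneity on the two flattened tables:

import numpy as np
from scipy.stats import chi2_contingency

# Placeholder 6x4 frequency tables; substitute the real before/after counts.
before = np.array([[12,  5,  9,  3],
                   [ 8,  7,  6,  4],
                   [10,  9,  5,  2],
                   [ 6,  8,  7,  5],
                   [11,  4,  8,  6],
                   [ 9,  6,  5,  3]])
after = np.array([[10,  6,  8,  4],
                  [ 9,  8,  5,  5],
                  [ 8, 10,  6,  3],
                  [ 7,  7,  8,  4],
                  [12,  5,  7,  5],
                  [ 8,  7,  6,  4]])

# Stack the flattened tables into a 2 x 24 (condition x cell) table and test
# whether the cell distribution differs between the two conditions.
stacked = np.vstack([before.ravel(), after.ravel()])
chi2, p, dof, expected = chi2_contingency(stacked)
print(chi2, p, dof)

This is a test of homogeneity rather than literally plugging the second table in as the expected counts, but it answers the same question of whether the two tables differ.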
I have a data matrix depicting the number of telephone calls from one telephone to another; all calls are unidirectional. The data is not a sample, it is the full population. Rows are days of the week, columns are one-hour blocks of a 24-hour clock, and the value in each cell is the number of telephone calls from telephone A to telephone B for that specific hour.
I would like to have a repeatable measure that enables me to tell my audience that the likelihood of this distribution occurring randomly is <x.
I'd like the formula for Excel 2007 or, as a last resort, VBA code.
I've searched and found answers that tell me how to statistically determine the significance of differences between two data sets, but not how to test a single data set against a random outcome.
Thanks in advance.
If the total number of calls in a given hour is T, and the total calling population is P, then the number of calls from A to B should be about T/P if the calls were "random". To test whether this is really the case you would use the chi-squared test. I'm afraid I don't have time to give you the full answer, but the test statistic is testvalue = sum((observed_i - T/P)^2 / (T/P)), where observed_i is the observed count in each cell. You check the test value against the chi-squared table, and you can pick off the probability from it too. Excel can calculate these values. Refer to the Chi-Squared Test for more details.
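For anyone working outside Excel, here is a small sketch of that calculation in Python with scipy; the 7x24 matrix of counts is a made-up placeholder, and the expected counts are taken as uniform across cells, which is the simplest notion of "random" (substitute the T/P expectations above if you have the population totals, keeping the expected total equal to the observed total):

import numpy as np
from scipy.stats import chisquare

# Placeholder 7x24 matrix of calls from A to B (rows = days, columns = hours);
# substitute the real counts here.
rng = np.random.default_rng(0)
calls = rng.poisson(3, size=(7, 24))

observed = calls.ravel()

# With no f_exp argument, scipy assumes a uniform expectation: every cell is
# expected to hold total/(7*24) calls, i.e. calls occur at random over the week.
stat, p = chisquare(observed)
print("chi-squared =", stat, "p-value =", p)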