stats, t-test, lists, and only one output - python-3.x

My t-tests have been yielding just one output, but I want 10 t-tests.
Each t-test should compare each one of the 10 values in list to 0.
I have tried the below:
import scipy
from scipy import stats
list2=[0.10415380403918414, 0.09142102934943379, 0.08340408682911706, 0.07791383429638124, 0.0738177221067812, 0.07111840615962706, 0.0673345711222398, 0.06431875318226271, 0.06074216826770115, 0.052948996685723906]
print(scipy.stats.ttest_ind(list2,[0]*10))
That is, I should get 10 t-test comparisons, so 10 t-tests should be output: one for each of the 10 values in the list compared to 0.
All of this is to say: I am seeking 10 rows of output (each corresponding to a unique t-test), but the code I have now provides just one row of output, i.e. just one test

listofzeros=[0,0,0,0,0,0,0,0,0,0]
for i in range(10):
    print(scipy.stats.ttest_ind(list2, listofzeros))

Firstly, there is no need to use stats.ttest_ind and create a list of zeros with the same length as the sample. You can just use stats.ttest_1samp, as follows:
print(scipy.stats.ttest_1samp(list2, 0))
That yields the same t-statistic without the zero-list workaround, returning the t-statistic and the p-value for the mean of the input sample rather than a result per value. (The degrees of freedom differ, 9 versus 18, so the p-value shifts slightly.)
To be more comprehensive: the t-test is used to determine whether a sample mean is statistically significantly different from a population mean.
What you are trying to do is perform a two-sample t-test, which works on the means of the two lists, not on every pair of associated values from the two samples.
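To make the comparison concrete, here is a minimal sketch contrasting the two calls on the same data (using the list2 from the question):
from scipy import stats

list2 = [0.10415380403918414, 0.09142102934943379, 0.08340408682911706,
         0.07791383429638124, 0.0738177221067812, 0.07111840615962706,
         0.0673345711222398, 0.06431875318226271, 0.06074216826770115,
         0.052948996685723906]

# One-sample t-test: is the mean of list2 different from 0?
print(stats.ttest_1samp(list2, 0))

# Two-sample t-test against a constant-zero "sample": the same t-statistic,
# but pooled degrees of freedom (18 instead of 9), so a different p-value.
print(stats.ttest_ind(list2, [0] * 10))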

Related

Statistically compare an experimental data and a theory value

I would appreciate it if someone could answer my question. I have two values. The first is experimental data deduced from a single measurement; the uncertainty for this value has been determined. The second value is a theoretical result. My question is: how can I statistically compare these two values? I tried to use a t-test, but failed because the number of degrees of freedom is df = 1 - 1 = 0 (only one experiment was conducted to measure the first value).
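One common approach when the measurement uncertainty σ is known (a suggestion not from the original thread) is to use a z-score instead of a t-statistic, which sidesteps the zero-degrees-of-freedom problem. A minimal sketch with made-up numbers:
from scipy import stats

x_exp = 5.2      # hypothetical measured value
sigma = 0.3      # hypothetical known measurement uncertainty
x_theory = 5.0   # hypothetical theoretical value

# With a known uncertainty, compare via a z-score rather than a t-statistic.
z = (x_exp - x_theory) / sigma
p = 2 * stats.norm.sf(abs(z))  # two-sided p-value
print(z, p)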

fmin for multiple sets of coefficients at once

Problem
I have a dataset of pricing information
I have a function of 2 variables to calculate a price
Given a price, I want to fit my 2 variables to the price so that when pushed through my function, they output the price
I can do the above using an fmin function of any description. But the twist is that I want to vectorize it, so that given a vector of 10_000 different prices, I get back 10_000 × 2 input variables.
The variations of fmin that I have used, e.g. scipy.optimize.fmin or scipy.optimize.fmin_bfgs, do not support vectorized outputs; i.e. they expect a cost function that returns a scalar, and they return a single set of variables rather than a matrix.
Any solutions or advice would be greatly appreciated
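For concreteness, the non-vectorized baseline the question wants to avoid just loops scipy.optimize.fmin over the prices. A minimal sketch, where price_fn is a hypothetical stand-in for the real pricing function (note that fitting two variables to a single price is underdetermined, so this finds only one of many minimizers):
import numpy as np
from scipy.optimize import fmin

def price_fn(a, b):
    # Hypothetical stand-in for the real two-variable pricing function.
    return a ** 2 + 3.0 * b

def fit_one(target_price):
    # Minimize the squared error between the model price and the target.
    cost = lambda v: (price_fn(v[0], v[1]) - target_price) ** 2
    return fmin(cost, x0=[1.0, 1.0], disp=False)

prices = np.array([10.0, 12.5, 20.0])            # 10_000 prices in practice
fitted = np.array([fit_one(p) for p in prices])  # shape (len(prices), 2)
print(fitted)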

Create exchanges with bounded random parameters and fixed sum to be used in Monte Carlo

I have to run a Monte Carlo simulation where, for some products, certain exchanges are related to each other, in the sense that my process can take as input any of the products in different (bounded) proportions but with a fixed sum.
Example:
my product a takes as inputs a total of 10 kg of x, y, and z altogether; x has a uniform distribution that goes from 0 to 4 kg, y from 1 to 6, and z from 3 to 8, and their sum must equal 10. So, every iteration I would need to get a random number for my three exchanges within their bounds, making sure that their sum is always 10.
I have seen that in stats_arrays it is possible to set the bounds of the distributions and thus create values in a specified interval, but this would not ensure that the sum of my random vector equals the fixed sum of 10.
Wondering if there is already a (relatively) straightforward way to implement this in bw2.
Otherwise, the only way I see this being feasible is to create all the uncertainty parameters with ParameterVectorLCA, tweak the values in the array for those products that must meet the aforementioned requirements (e.g. with something like this or this) and then use this array with modified parameters to re-run my MC.
We are working on this in https://github.com/PascalLesage/brightway2-presamples, but it isn't ready yet. I don't know of any way to do this currently without hacking something together by subclassing the MonteCarloLCA.
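Independent of bw2, the constrained draw itself can be done with rejection sampling: propose two of the amounts uniformly within their bounds, set the third to the remainder, and keep the draw only if the remainder stays within its bounds. A minimal sketch with the numbers from the example (note that after conditioning on the fixed sum, the marginal distributions are no longer uniform):
import numpy as np

rng = np.random.default_rng(42)
TOTAL = 10.0
bounds = {"x": (0.0, 4.0), "y": (1.0, 6.0), "z": (3.0, 8.0)}

def draw():
    # Propose x and y, derive z, and retry until z is within its bounds.
    while True:
        x = rng.uniform(*bounds["x"])
        y = rng.uniform(*bounds["y"])
        z = TOTAL - x - y
        if bounds["z"][0] <= z <= bounds["z"][1]:
            return x, y, z

for x, y, z in (draw() for _ in range(5)):
    print(f"x={x:.2f} y={y:.2f} z={z:.2f} sum={x + y + z:.2f}")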

Spotfire- calculated column with row ratios based on condition

I'm having trouble understanding whether Spotfire allows for conditional computations between arbitrary rows containing numerical data repeated over data groups. I could not find anything to point me toward a solution.
Context (simplified): I have data from a sensor reporting state of a process and this data is grouped into bursts/groups representing a measurement taking several minutes each.
Within each burst the sensor measures a signal, and if a predefined feature (signal shape) is detected, the sensor outputs a calculated value, V, quantifying this feature and also reports the RunTime at which this happened.
So in essence I have three columns: Burst number, a set of RTs within this burst and Values associated with these RTs.
I need to add a calculated column that computes the ratio of the Values in rows where RT equals specific numbers, let's say 1.89 and 2.76.
The high level logic would be:
If a Value exists at 1.89 Run Time and a Value exists at 2.76 Run Time then compute the ratio of these values. Repeat for every Burst.
I understand I can repeat the computation over groups using the OVER operator, but I'm struggling with the logic within each group...
Any tips would be appreciated.
Many thanks!
The first thing you need to do here is apply an order to your dataset. I assume the sample data is complete and encompasses the cases in your real data; thus, we create a calculated column:
RowID() as [ROWID]
Once this is done, we can create a calculated column which will compute your ratio over its respective groups. Just a note: your B4 example is incorrect compared to the other groups; that is, you have your numerator and denominator reversed.
If(([RT]=1.89) or ([RT]=2.76),[Value] / Max([Value]) OVER (Intersect([Burst],Previous([ROWID]))))
Breaking this down...
If(([RT]=1.89) or ([RT]=2.76), limits the rows to those where the RT = 1.89 or 2.76.
Next comes what is evaluated when the above condition is TRUE:
[Value] / Max([Value]) OVER (Intersect([Burst],Previous([ROWID])))) takes the value of the current row and divides it by the Max([Value]) over the grouping of [Burst] and Previous([ROWID]); the grouping is expressed by the Intersect() function. So the denominator will always be the previous value within the grouping. Note that Max() is just a simple aggregate here; any aggregate should do in this case, since we are only expecting a single value. All OVER() expressions require an aggregate to limit the result set to a single row.
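For anyone checking the group-wise logic outside Spotfire, here is a rough pandas equivalent (a sketch; the column names come from the question and the toy values are made up):
import pandas as pd

df = pd.DataFrame({
    "Burst": [1, 1, 2, 2],
    "RT":    [1.89, 2.76, 1.89, 2.76],
    "Value": [4.0, 2.0, 9.0, 3.0],
})

# One row per Burst, one column per RT; then the per-Burst ratio.
wide = df.pivot(index="Burst", columns="RT", values="Value")
print(wide[2.76] / wide[1.89])  # NaN wherever a Burst lacks one of the RTs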

Compute statistical significance with Excel

I have 2 columns and multiple rows of data in Excel. Each column represents an algorithm, and the values in the rows are the results of these algorithms with different parameters. I want to run a statistical significance test on these two algorithms in Excel. Can anyone suggest a function?
As a result, it would be nice to state something like "Algorithm A performs 8% better than Algorithm B with .9 probability (or a 95% confidence interval)".
The Wikipedia article on statistical significance describes accurately what I need:
http://en.wikipedia.org/wiki/Statistical_significance
It seems like a very easy task, but I have failed to find a suitable built-in function.
Any advice on a built-in Excel function, or function snippets, would be appreciated.
Thanks.
Edit:
After tharkun's comments, I realized I should clarify some points:
The results are merely real numbers between 1 and 100 (they are percentage values). As each row represents a different parameter, the values in a row represent each algorithm's result for that parameter. The results do not depend on each other.
When I take the average of all values for Algorithm A and Algorithm B, I see that the mean of all results that Algorithm A produced is 10% higher than Algorithm B's. But I don't know whether this is statistically significant. In other words, maybe for one parameter Algorithm A scored 100 percent higher than Algorithm B while for the rest Algorithm B has higher scores, and just because of this one result the difference in averages is 10%.
And I want to do this calculation using just excel.
Thanks for the clarification. In that case you want to do an independent-samples t-test, meaning you want to compare the means of two independent data sets.
Excel has a function, TTEST, which is exactly what you need.
For your example you should probably use two tails and type 2.
The formula outputs a probability value known as the alpha error probability. This is the error you would make if you assumed the two datasets are different when they actually are not. The lower the alpha error probability, the higher the chance that your sets are different.
You should only accept the difference of the two datasets if the value is lower than 0.01 (1%), or for critical outcomes even 0.001 or lower. You should also know that the t-test needs at least around 30 values per dataset to be reliable enough, and that the type 2 test assumes equal variances of the two datasets. If equal variances cannot be assumed, you should use the type 3 test.
http://depts.alverno.edu/nsmt/stats.htm
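To sanity-check the Excel result outside Excel, the same tests are available in scipy (a sketch with made-up numbers; TTEST's type 2 corresponds to equal_var=True and type 3 to equal_var=False):
from scipy import stats

algo_a = [78.1, 82.4, 75.0, 90.2, 68.9, 85.5]  # made-up percentage scores
algo_b = [70.3, 74.8, 69.1, 81.0, 66.2, 77.4]

# Two-tailed, equal-variance test (Excel: TTEST(a, b, 2, 2)).
print(stats.ttest_ind(algo_a, algo_b))

# Welch's test for unequal variances (Excel: TTEST(a, b, 2, 3)).
print(stats.ttest_ind(algo_a, algo_b, equal_var=False))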
