Problem
I have a dataset of pricing information
I have a function of 2 variables to calculate a price
Given a price, I want to fit my 2 variables so that, when pushed through my function, they output that price.
I can do the above using an fmin function of any description. The twist is that I want to vectorize it, so that given a vector of 10,000 different prices, I get back a 10,000×2 matrix of input variables.
The variants of fmin that I have used, e.g. scipy.optimize.fmin or scipy.optimize.fmin_bfgs, do not support vectorized outputs; i.e. they expect a cost function that returns a scalar, and they return a single set of variables rather than a matrix.
Any solutions or advice would be greatly appreciated.
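For concreteness, a minimal sketch of the per-price loop I can already do, with a hypothetical price_model standing in for my real two-variable function and scipy.optimize.minimize playing the role of fmin:

import numpy as np
from scipy.optimize import minimize

def price_model(a, b):
    # Hypothetical two-variable pricing function; stands in for the real one.
    return a * np.exp(b)

def fit_one(target_price):
    # Minimize the squared error between the model output and one target price.
    cost = lambda v: (price_model(v[0], v[1]) - target_price) ** 2
    return minimize(cost, x0=[1.0, 0.0], method="Nelder-Mead").x

prices = np.linspace(50.0, 150.0, 100)  # 10,000 in the real case
params = np.array([fit_one(p) for p in prices])  # shape (len(prices), 2)

This loop is exactly what I want to avoid: it is correct, but it runs one full optimization per price.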
Related
My t-tests have been yielding just one output, but I want 10 t-tests.
Each t-test should compare one of the 10 values in the list to 0.
I have tried the below:
import scipy
from scipy import stats
list2=[0.10415380403918414, 0.09142102934943379, 0.08340408682911706, 0.07791383429638124, 0.0738177221067812, 0.07111840615962706, 0.0673345711222398, 0.06431875318226271, 0.06074216826770115, 0.052948996685723906]
print(scipy.stats.ttest_ind(list2,[0]*10))
That is, I should get 10 t-test comparisons: 10 rows of output, each corresponding to a unique t-test. The code above gives me just one row of output, i.e. just one test. I have also tried:
listofzeros = [0, 0, 0, 0, 0, 0, 0, 0, 0, 0]
for i in range(10):
    print(scipy.stats.ttest_ind(list2, listofzeros))
Firstly, there is no need to use stats.ttest_ind and create a list of zeros with the same length as the sample. You can just use stats.ttest_1samp, as follows:
print(scipy.stats.ttest_1samp(list2, 0))
That will lead to the same result but without the workaround, returning the t-statistic and the p-value for the mean of the whole input sample, not a result per value of the sample.
To be more comprehensive: the t-test is used to determine whether the sample mean is statistically significantly different from the population mean.
What you are trying to do is a two-sample t-test, which works on the means of the two lists, not on each pair of associated values from the two samples.
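If, hypothetically, each of the 10 entries were itself a sample of repeated measurements rather than a single value, you could get 10 one-sample tests in a single call via ttest_1samp's axis argument; a sketch with made-up data:

import numpy as np
from scipy import stats

# Hypothetical data: 10 samples (rows) of 5 repeated measurements each.
rng = np.random.default_rng(0)
data = rng.normal(loc=0.08, scale=0.02, size=(10, 5))

# One t-test per row against a population mean of 0: 10 statistics, 10 p-values.
result = stats.ttest_1samp(data, popmean=0, axis=1)
print(result.statistic)  # array of 10 t-statistics
print(result.pvalue)     # array of 10 p-values

A t-test on a single value is undefined, since a one-element sample has no variance, which is why the per-value request cannot work as originally stated.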
I want to write a function that takes a List of double values and a value for the variance as parameters, and returns an adjusted List.
List<double> adjustTimeSeriesAsForecast(List<double> ts, double variance) {...}
This adjusted List should differ slightly or strongly from the original List, depending on the variance passed. However, the differences should be both above and below the original List, so that the mean percentage error (MPE) between the Lists remains the same and only the mean absolute percentage error (MAPE) changes. The returned List should look like a forecast.
With this function I want to test the stability of a system that takes a forecasted weather time series as an input parameter.
How can I adjust a List of real weather data to look like a forecast?
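A sketch of the kind of thing I have in mind, in Python for illustration (the function name and the noise model are my own assumptions): zero-mean multiplicative noise, centred so the relative errors cancel on average, keeps the MPE at zero while the MAPE grows with the variance.

import numpy as np

def adjust_time_series_as_forecast(ts, variance, seed=None):
    # Multiplicative noise: forecast[i] = ts[i] * (1 + e[i]), with e centred
    # on zero so the relative errors cancel on average (MPE ~ 0) while their
    # average absolute size (MAPE) grows with the requested variance.
    rng = np.random.default_rng(seed)
    ts = np.asarray(ts, dtype=float)
    e = rng.normal(0.0, np.sqrt(variance), size=ts.shape)
    e -= e.mean()  # force the mean relative error to exactly zero
    return ts * (1.0 + e)

forecast = adjust_time_series_as_forecast([20.1, 22.4, 19.8, 23.0], variance=0.01, seed=42)

Larger variance values make the "forecast" deviate more strongly, both above and below the original series, while its mean relative error stays at zero.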
I have to run a Monte Carlo simulation where, for some products, certain exchanges are related to each other, in the sense that my process can take as input any of the products in different (bounded) proportions but with a fixed sum.
Example:
My product a takes as input a total of 10 kg of x, y, and z altogether; x has a uniform distribution from 0 to 4 kg, y from 1 to 6, and z from 3 to 8, and their sum must equal 10. So, at every iteration I need to draw a random number for each of my three exchanges within its bounds, making sure that their sum is always 10.
I have seen that in stats_arrays it is possible to set the bounds of the distributions and thus generate values in a specified interval, but this would not ensure that the sum of my random vector equals the fixed total of 10.
Wondering if there is already a (relatively) straightforward way to implement this in bw2.
Otherwise, the only way I see this being feasible is to create all the uncertainty parameters with ParameterVectorLCA, tweak the values in the array for those products that must meet the aforementioned requirements (e.g. with something like this or this), and then use this array of modified parameters to re-run my MC.
We are working on this in https://github.com/PascalLesage/brightway2-presamples, but it isn't ready yet. I don't know of any way to do this currently without hacking something together by subclassing the MonteCarloLCA.
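Independent of Brightway, the constrained draw itself can be hacked together with rejection sampling; a rough sketch using the bounds from your example (note that after rejection the accepted values are no longer exactly uniform within their bounds):

import numpy as np

rng = np.random.default_rng()

def draw_fixed_sum(total=10.0):
    # Draw x and y uniformly within their bounds, derive z from the fixed
    # sum, and retry until z also falls inside its own bounds.
    while True:
        x = rng.uniform(0.0, 4.0)  # x in [0, 4]
        y = rng.uniform(1.0, 6.0)  # y in [1, 6]
        z = total - x - y          # the sum constraint fixes z
        if 3.0 <= z <= 8.0:        # z in [3, 8]
            return x, y, z

samples = np.array([draw_fixed_sum() for _ in range(1000)])
assert np.allclose(samples.sum(axis=1), 10.0)  # every row sums to 10

You would still need to feed these draws back into the LCA calculation yourself, e.g. via the subclassing route mentioned above.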
I have a mean value x and I want to model it into the future: I want to output a value of what it could be in 6 months. Assuming the value follows a normal distribution and we have the standard deviation, how do I randomize the value x while following a normal distribution? I'm doing this in Excel, but just understanding it would help too! Basically, I want to produce numbers within one standard deviation 68% of the time, within two standard deviations 95% of the time, etc.
You can use the Excel function NORMINV to convert a uniform random input, RAND(), into a normally distributed value.
=NORMINV(RAND(),Mean,Std Dev)
i.e. if you repeat this many times, then save and analyze the results, you'll see a bell curve centered on the input Mean value.
Does that get you started?
The tricky bit comes when you come up with the formula to predict what a value will be in the future using this.
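If it helps, here is the same experiment outside Excel, checking the 68%/95% coverage you mention (mean and standard deviation are made-up numbers):

import numpy as np

mean, sd = 100.0, 15.0  # hypothetical inputs
# Equivalent of filling a column with =NORMINV(RAND(), mean, sd).
draws = np.random.default_rng(1).normal(mean, sd, size=100_000)

# Fraction of draws within one and two standard deviations of the mean.
within_1sd = np.mean(np.abs(draws - mean) <= sd)
within_2sd = np.mean(np.abs(draws - mean) <= 2 * sd)
print(f"{within_1sd:.1%} within 1 sd, {within_2sd:.1%} within 2 sd")
# Expected: ~68.3% and ~95.4%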
I have 2 columns and multiple rows of data in Excel. Each column represents an algorithm, and the values in the rows are the results of these algorithms with different parameters. I want to run a statistical significance test on these two algorithms in Excel. Can anyone suggest a function?
As a result, it would be nice to state something like "Algorithm A performs 8% better than Algorithm B with 0.9 probability (or a 95% confidence interval)".
The Wikipedia article accurately explains what I need:
http://en.wikipedia.org/wiki/Statistical_significance
It seems like a very easy task, but I have failed to find a suitable statistical function.
Any advice on a built-in Excel function, or function snippets, would be appreciated.
Thanks.
Edit:
After tharkun's comments, I realized I should clarify some points:
The results are simply real numbers between 1 and 100 (they are percentage values). As each row represents a different parameter, the values in a row are the two algorithms' results for that parameter. The results do not depend on each other.
When I take the average of all values for Algorithm A and for Algorithm B, I see that the mean of Algorithm A's results is 10% higher than Algorithm B's. But I don't know whether this is statistically significant. In other words, maybe for one parameter Algorithm A scored 100 percent higher than Algorithm B while Algorithm B has higher scores for the rest, and the 10% difference in the averages is due to that one result alone.
And I want to do this calculation using just Excel.
Thanks for the clarification. In that case you want to do an independent-samples t-test, meaning you want to compare the means of two independent data sets.
Excel has a function, TTEST, which is what you need.
For your example you should probably use two tails and type 2.
The formula outputs a probability value known as the alpha error probability (the p-value). This is the error you would make if you assumed the two datasets are different when they aren't. The lower the alpha error probability, the higher the chance that your sets are different.
You should only accept a difference between the two datasets if this value is lower than 0.01 (1%), or for critical outcomes even 0.001 or lower. You should also know that the t-test needs at least around 30 values per dataset to be reliable, and that the type 2 test assumes equal variances in the two datasets. If the variances are not equal, you should use the type 3 test.
http://depts.alverno.edu/nsmt/stats.htm
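If you want to sanity-check Excel's output outside of Excel, the same tests can be run with scipy: equal_var=True below mirrors TTEST type 2, and equal_var=False (Welch's test) mirrors type 3. The numbers are made up for illustration.

from scipy import stats

# Hypothetical results (one row per parameter setting, in percent).
algo_a = [78.1, 82.4, 75.0, 90.2, 68.9, 84.3, 79.5, 88.0, 72.6, 81.7]
algo_b = [70.3, 75.8, 69.1, 85.4, 66.2, 77.9, 74.0, 80.5, 71.1, 73.8]

# Two-tailed, equal-variance test: Excel's TTEST(A, B, 2, 2).
t2, p2 = stats.ttest_ind(algo_a, algo_b, equal_var=True)
# Welch's test for unequal variances: Excel's TTEST(A, B, 2, 3).
t3, p3 = stats.ttest_ind(algo_a, algo_b, equal_var=False)
print(p2, p3)  # alpha error probabilities (p-values)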