I am using Excel to build a product recommendation tool, which will take user inputs and use simple calculations to recommend the right product. The results will include a.) a product and b.) a numeric spec of that product.
There are two potential equations I can use in the underlying calculations. In testing, they frequently produce the same result. However, there are cases where they differ. I would like to use simulation with random numbers to understand the situations that produce different results.
Here's a simple structure:
User Input I (random numbers)
User Input II (random numbers)
Equation A Results - Product recommendation and numeric spec
Equation B Results - Product recommendation and numeric spec
Simulation Needs
How often are the results different? Under what input conditions is this most likely to happen?
When the products are the same but the numeric spec is different, what are the stats (stdev, average) of the numeric spec?
Which equation is optimal to use in the recommender?
Monte Carlo simulation would seem suitable for this exercise; the formulas and macros are documented here and in videos. Where I'm struggling is with the structure/syntax for comparing the outputs of the two different equations.
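To make the comparison concrete, here is a rough Python/pandas sketch of the kind of simulation I have in mind. The two equations, the input ranges and the column names are placeholders, not my real calculations:

import numpy as np
import pandas as pd

rng = np.random.default_rng(0)
n = 100_000  # number of simulated trials

# Placeholder equations: each returns (product recommendation, numeric spec).
def equation_a(x1, x2):
    spec = 0.8 * x1 + 0.2 * x2
    return np.where(spec < 50, "Model S", "Model L"), spec

def equation_b(x1, x2):
    spec = np.sqrt(x1 * x2)
    return np.where(spec < 50, "Model S", "Model L"), spec

# User Input I and II as random numbers (ranges are assumptions).
df = pd.DataFrame({
    "input1": rng.uniform(0, 100, n),
    "input2": rng.uniform(0, 100, n),
})

df["prod_a"], df["spec_a"] = equation_a(df["input1"], df["input2"])
df["prod_b"], df["spec_b"] = equation_b(df["input1"], df["input2"])

# 1. How often do the recommendations differ, and under what input conditions?
df["diff_product"] = df["prod_a"] != df["prod_b"]
print("share of trials with different products:", df["diff_product"].mean())
print(df.groupby("diff_product")[["input1", "input2"]].describe())

# 2. Same product but different spec: stats of the spec gap.
same_prod = df[~df["diff_product"] & (df["spec_a"] != df["spec_b"])]
gap = same_prod["spec_a"] - same_prod["spec_b"]
print("spec gap average:", gap.mean(), "stdev:", gap.std())

In Excel the same layout would be columns of RAND()-based inputs, a pair of result columns per equation, and comparison columns feeding COUNTIF/AVERAGEIF-style summaries; it is that comparison step I'm unsure how to structure.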
Stata uses the quantile calculation method called R-2 (https://en.wikipedia.org/wiki/Quantile), whereas Excel uses R-7 in its PERCENTILE.INC function. My goal is to find a correct formula in Excel that gives results identical to Stata's R-2 method.
For now, I can see that PERCENTILE.INC matches the Stata results only for odd-sized samples of discrete data (I am dealing with discrete samples). However, the two diverge for even-sized samples, as shown here.
Conceptually, using PERCENTILE.INC in Excel does not seem correct, since it implements R-7, even though it happens to match R-2 for odd-sized discrete samples.
My question is: what is the simplest formula to use in Excel that matches the Stata percentile results?
So a fairly literal translation of R-2 into Excel for N=4 would look like this (assuming sorted data):
=(INDEX(A$2:A$5,CEILING(C2*4,1))+INDEX(A$2:A$5,FLOOR(C2*4+1,1)))/2
It does indeed go wrong if you try to put in a quantile of zero, so that would have to be a special case, as would a quantile of 1. I assume Stata gives the lowest and highest values in the set in those two cases?
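As a quick sanity check on that indexing, the R-2 rule itself, Q(p) = (x[ceiling(N*p)] + x[floor(N*p + 1)]) / 2 with 1-based indices into the sorted data, can be written out directly. A small Python sketch (the function name is just illustrative, and the p = 0 and p = 1 branches encode the assumption above about Stata returning the minimum and maximum):

import math

def quantile_r2(sorted_x, p):
    """R-2 quantile: average of x[ceil(N*p)] and x[floor(N*p + 1)], 1-based indices."""
    n = len(sorted_x)
    if p <= 0:
        return sorted_x[0]
    if p >= 1:
        return sorted_x[-1]
    lo = math.ceil(n * p)       # 1-based index of the lower neighbour
    hi = math.floor(n * p + 1)  # 1-based index of the upper neighbour
    return (sorted_x[lo - 1] + sorted_x[hi - 1]) / 2

data = [1, 2, 3, 4]             # N = 4, already sorted
for p in (0.25, 0.5, 0.75):
    print(p, quantile_r2(data, p))   # 1.5, 2.5, 3.5 (same as the CEILING/FLOOR formula)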
A more dynamic formula with all the checking would look like this:
=IFS(OR(C2<0,C2>1),"Out of range",C2=0,A$2,C2=1,INDEX(A:A,COUNT(A:A)+1),TRUE,(INDEX(A$2:INDEX(A:A,COUNT(A:A)+1),CEILING(C2*COUNT(A:A),1))+INDEX(A$2:INDEX(A:A,COUNT(A:A)+1),FLOOR(C2*COUNT(A:A)+1,1)))/2)
although you could make it shorter using the LET function in Microsoft 365.
It would probably be nice to implement this as a function in VBA that sorts the data as well as returning the quantile value, or of course you could do the sort in a Microsoft 365 formula as well:
=LET(N,COUNT(A:A),sortedRange,SORT(A$2:INDEX(A:A,N+1)),IFS(OR(C2<0,C2>1),"Out of range",C2=0,INDEX(sortedRange,1),C2=1,INDEX(sortedRange,N),TRUE,(INDEX(sortedRange,CEILING(C2*N,1))+INDEX(sortedRange,FLOOR(C2*N+1,1)))/2))
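If you want an independent cross-check of the worksheet formulas, recent versions of NumPy expose the Hyndman and Fan methods by name; as far as I can tell, method="averaged_inverted_cdf" is R-2 (this needs NumPy 1.22 or later, so treat the exact method name as something to verify against your Stata output):

import numpy as np

data = np.array([1, 2, 3, 4])    # same sorted N = 4 example as above
for p in (0.25, 0.5, 0.75):
    print(p, np.quantile(data, p, method="averaged_inverted_cdf"))  # R-2: 1.5, 2.5, 3.5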
I have the following 3 cases of a numeric metric on a time series (t, t1, t2, etc. denote different hourly comparisons across periods).
If you look at the 3 graphs, t (the period of interest) clearly has a drop-off in image 1 but not so much in images 2 and 3. Assume this is some sort of numeric metric (raw or derived), and I want to create a system/algorithm that specifically catches case 1 but not cases 2 or 3, with t being the point of interest. While visually this makes sense and is very intuitive, I am trying to design a way to do this in Python using the dataframes shown in the picture.
Generally the problem is how do I detect when the time series is behaving very differently from any of the prior weeks.
Edit: When I say "different", what I really mean is: my metric trends together across periods t1 to t4, but if the current period doesn't and tries to separate out of that envelope, that to me is an anomaly. If you look at chart 1 you can see t trying to split away from the rest of the tn; that is an anomaly for me. In the other cases t stays within the bounds of the other time periods. Hope this helps.
With small data, the best approach is to come up with a good transformation into a simpler representation.
In this case I would try the following (both ideas are sketched in code below):
The distance to the median along the time axis, then a summary of that distance: its median, mean squared error, etc.
The median of the cross-correlation of the signals
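A rough Python/pandas sketch of both ideas, assuming the DataFrame has one row per hour and columns t, t1, ..., t4 as in the screenshots (the column names and any thresholds are assumptions, and "cross-correlation" is approximated here by the Pearson correlation at lag zero):

import numpy as np
import pandas as pd

def anomaly_scores(df, current="t", references=("t1", "t2", "t3", "t4")):
    """Summarise how far the current period strays from the envelope of the prior weeks."""
    ref = df[list(references)]
    envelope_median = ref.median(axis=1)          # median of the prior weeks at each hour

    # Idea 1: distance to the median along the time axis, then a summary of it.
    dist = df[current] - envelope_median
    mse = (dist ** 2).mean()
    median_abs_dist = dist.abs().median()

    # Idea 2: median of the correlation between the current period and each prior week.
    corrs = [df[current].corr(ref[c]) for c in references]
    median_corr = float(np.median(corrs))

    return {"mse": mse, "median_abs_dist": median_abs_dist, "median_corr": median_corr}

# Flag period t as anomalous if it drifts far from the envelope or stops
# co-trending with the prior weeks (thresholds would be tuned by eye / validation):
# scores = anomaly_scores(df)
# is_anomaly = scores["mse"] > distance_threshold or scores["median_corr"] < 0.5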
I have two 6x4 contingency tables of frequency data. They are based on the same sampling criteria over a number of discrete variables, but for two conditions (before and after). I would like to compare them statistically to see how much, or how little, they differ.
A chi-square-related test seems appropriate, but normally it calculates the statistic by comparing the observed frequencies to theoretical ones. So in other words I need to swap the theoretical values for the second table. Of course it doesn't have to be a basic chi-square test; any other appropriate test would be fine.
I have access to XLSTAT, Excel and SPSS, and would appreciate some help on this.
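To show the mechanics of what I mean by swapping the theoretical values for the second table, here is a rough Python sketch (my tools are really Excel/XLSTAT/SPSS, and the counts below are placeholders): the "before" table, rescaled to the "after" total, plays the role of the expected frequencies. I realise this treats the "before" table as if it were a known theoretical distribution rather than a sample, which is part of what I'd like advice on.

import numpy as np
from scipy.stats import chisquare

rng = np.random.default_rng(0)
before = rng.integers(5, 40, size=(6, 4))   # placeholder counts for the "before" condition
after = rng.integers(5, 40, size=(6, 4))    # placeholder counts for the "after" condition

f_obs = after.ravel().astype(float)
# Rescale the "before" table so the expected and observed totals match.
f_exp = before.ravel() * f_obs.sum() / before.sum()

chi2, p = chisquare(f_obs, f_exp)           # 24 cells, df = 23 by default
print(f"chi2 = {chi2:.2f}, p = {p:.4f}")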
I have 2 columns and multiple rows of data in Excel. Each column represents an algorithm, and the values in the rows are the results of these algorithms with different parameters. I want to run a statistical significance test on these two algorithms in Excel. Can anyone suggest a function?
As a result, it would be nice to be able to state something like "Algorithm A performs 8% better than Algorithm B with .9 probability (or a 95% confidence interval)".
The wikipedia article explains accurately what I need:
http://en.wikipedia.org/wiki/Statistical_significance
It seems like a very easy task, but I have failed to find a function for this kind of scientific measurement.
Any advice on a built-in Excel function, or function snippets, is appreciated.
Thanks..
Edit:
After tharkun's comments, I realized I should clarify some points:
The results are simply real numbers between 1 and 100 (they are percentage values). Each row represents a different parameter, and the values in a row are the algorithms' results for that parameter. The results do not depend on each other.
When I take the average of all values for Algorithm A and Algorithm B, I see that the mean of the results Algorithm A produced is 10% higher than Algorithm B's. But I don't know whether this is statistically significant or not. In other words, maybe for one parameter Algorithm A scored 100 percent higher than Algorithm B while Algorithm B has higher scores for the rest, and only because of that one result the difference in averages is 10%.
And I want to do this calculation using just Excel.
Thanks for the clarification. In that case you want to do an independent-samples t-test, meaning you want to compare the means of two independent data sets.
Excel has a TTEST function (T.TEST in newer versions); that's what you need.
For your example you should probably use two tails and type 2 (two-sample, equal variance).
The formula outputs a probability value, known as the alpha error probability. This is the error you would make if you assumed the two data sets are different when they actually aren't. The lower the alpha error probability, the higher the chance that your sets really are different.
You should only accept that the two data sets differ if this value is lower than 0.01 (1%), or for critical outcomes even 0.001 or lower. You should also know that the t-test generally needs at least around 30 values per data set to be reliable, and that the type 2 test assumes equal variances of the two data sets. If equal variances cannot be assumed, you should use the type 3 test.
http://depts.alverno.edu/nsmt/stats.htm
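If it helps to sanity-check the Excel result outside of Excel, here is a small SciPy sketch; equal_var=True corresponds to Excel's type 2 test and equal_var=False to type 3 (Welch). The file name and column names are assumptions about your layout:

import pandas as pd
from scipy.stats import ttest_ind

# One row per parameter setting; two columns with the algorithms' scores (1-100).
df = pd.read_excel("results.xlsx")          # assumed file and column layout
a, b = df["algorithm_a"], df["algorithm_b"]

# Two-tailed, equal-variance t-test, i.e. Excel's TTEST(range_a, range_b, 2, 2).
t_stat, p_value = ttest_ind(a, b, equal_var=True)
print(f"t = {t_stat:.3f}, p = {p_value:.4f}")

# Welch's version (Excel's type 3) if equal variances are doubtful.
t_w, p_w = ttest_ind(a, b, equal_var=False)
print(f"Welch t = {t_w:.3f}, p = {p_w:.4f}")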