Left-skewed data generator (Excel)

I am looking for a function in Excel that will allow me to generate left-skewed data with the option of picking a mean and standard deviation.
E.g. if I want to generate 5 numbers with SD 2 and mean 45, the result should be 43, 44, 45, 46, 47.
I would also like to be able to pick the upper and lower limits. I don't want 0, 44, 46, 46, 90.

This isn't an Excel question (i.e. it has nothing to do with Excel).
I don't think there's a standard function in any application or language that will give you what you need; you're basically asking for someone to develop an algorithm for you. If you want to do this in Excel, you need some VBA code.
I don't mind having a crack at the algorithm, but you need to be more specific with your parameters:
Are there always 5 numbers returned? (This simplifies it hugely.)
Do they have to be integers?
Also, the (population) SD of 43, 44, 45, 46, 47 is 1.414, not 2; 2 is the variance.
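To check the SD remark above outside Excel, here is a minimal Python sketch using only the standard library. It is just a verification aid, not part of the requested generator; the name `summary` is made up for illustration.

```python
from statistics import mean, pstdev, pvariance, stdev

values = [43, 44, 45, 46, 47]

# Population statistics divide by n; sample statistics divide by n - 1.
summary = {
    "mean": mean(values),
    "population_sd": pstdev(values),
    "population_variance": pvariance(values),
    "sample_sd": stdev(values),
}

for name, value in summary.items():
    print(f"{name}: {value}")
```

The population SD is the square root of the population variance, which is why a variance of 2 corresponds to an SD of about 1.414.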

Related

Dividing people into groups based on strength and rank

Edited:
After using Solver, which Saulo suggested, I have managed to get Excel sorting them into groups, up to 8 groups of 3. However, I run into trouble when going further; ideally, at this time, I need to be able to do 18 groups of 3. Even with the same settings (adjusted for the increase in groups), Excel seems to go belly up on the process and fails. Any suggestions to adapt to this?
I am trying to figure out an easy and reasonably accurate way, without going too crazy with the math and formulas, as I am basic with my Excel coding (and coding in general), to calculate the ideal groups of 3 based on rank and strength for a video game.
I want to pair the strongest with the weakest and then fill the gaps evenly for the 3rd person, so that each team's overall strength is roughly the same.
The factors I have are a designated leader (rank) and an overall power level (strength).
Doing this manually isn't too hard, but trying to automate it is. Any thoughts or suggestions would be amazing!
I want something like this, but automated, which is where I am getting stuck, as I want to be able to add more players and adjust strengths as they come along.
Hope this makes sense.
Jordan, you have a classic situation for using "Solver". First of all, you have to make Solver available in your Excel: select File -> Options -> Add-ins -> Solver Add-in. The Solver button will then appear in the "Data" menu.
Solver is about solving a problem with specific conditions, by changing specific cells, for a specific purpose. In your case, the purpose is to create teams with the minimal strength difference. A condition of your problem is that the teams should have the same number of players. See how I organized the sheet.
(image: sheet layout)
When you open Solver, the first field is "Set Objective". Our goal (or purpose) is to make the difference between the teams as small as possible, so we select the cell containing the difference between the teams. Then we tell Excel that we want that cell to have the minimal value (by choosing "Min").
Then we have to tell Excel which cells it can change to achieve our goal. In this case, Excel can change each player's team, so we select the cells with the teams.
Last, we have to tell Excel the conditions (or constraints) of this problem. The first constraint is that the total number of players in team 1 is three. The total number of players in team 2 is also three. Then we have to tell Excel the limits (upper and lower) on the values of the variable cells. In this case, we chose greater than or equal to 1, AND (another constraint) less than or equal to 2.
OK. Now we have a problem with a specific goal, changing the values of specific cells, under specific constraints, so Solver can work. Then choose the solving method "Evolutionary". Honestly, I don't know the difference between the methods, but in my experience the Evolutionary method works better than the others.
I recommend reading the Excel tutorial on Solver. At first, all of us think that it is too difficult, but believe me, it is simpler than it seems.
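For anyone who wants to see the same objective outside Excel, here is a rough Python sketch of the two-team case described above: try every way of picking one team and keep the split with the smallest strength difference. This is plain exhaustive search, not Solver's Evolutionary method, and the function name and the strength values are made up for illustration. It only scales to small player counts.

```python
from itertools import combinations


def best_two_team_split(strengths, team_size):
    """Return (team1_indices, team2_indices, difference) with minimal difference.

    Assumes exactly two teams of equal size, as in the Solver walkthrough.
    """
    if len(strengths) != 2 * team_size:
        raise ValueError("need exactly two teams' worth of players")

    total = sum(strengths)
    indices = range(len(strengths))
    best = None
    for team1 in combinations(indices, team_size):
        team1_strength = sum(strengths[i] for i in team1)
        # Team 2 gets everyone else, so its strength is total - team1_strength.
        difference = abs(total - 2 * team1_strength)
        if best is None or difference < best[0]:
            best = (difference, team1)

    difference, team1 = best
    team2 = tuple(i for i in indices if i not in team1)
    return team1, team2, difference


# Hypothetical power levels for six players.
strengths = [90, 80, 70, 60, 50, 40]
team1, team2, difference = best_two_team_split(strengths, team_size=3)
print(team1, team2, difference)
```

With many groups (for example 18 groups of 3) the number of combinations explodes, which is presumably why a heuristic such as the Evolutionary method is used instead of exhaustive search.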

Excel Solver, way to maximize two values?

I am a user of Microsoft Excel's Solver, and I am pretty sure it cannot maximize two values at once. I was wondering if anyone might have another clever way to do this.
Basically, I have a column of numbers between 1 and 30 that I need to look over and pull out 9 to 10 values (out of 200) based on a couple of other constraints. I would like to maximize not just this value, but also a probability value (ranging from 0 to 1).
Adding them up won't work, as that would grossly undervalue the probability value, and multiplying may do the opposite by overvaluing the probability. Any strategies to handle this problem would be greatly appreciated.
This is an example of multi-objective optimization, which has an extensive literature. As the Wikipedia article shows, this can lead to some pretty deep waters.
By far the easiest approach is that of linear scalarization. This refers to replacing a vector of 2 (or more) objective functions by a single (hence scalar) objective function which is a linear combination of the objective functions. What you can do with the Solver is create 2 cells to hold the relative weights to assign to the two objectives. These will be 2 numbers between 0 and 1 which sum to 1. Then create a new objective function which is the SUMPRODUCT (linear combination) of these weights and the objectives. Then just use the Solver to optimize this objective function. If you aren't happy with the results, adjust the weights. There is no one right answer. One of the advantages of this approach is that it allows a decision maker to clarify the relative importance of the objectives.
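As a rough illustration of linear scalarization outside Excel, here is a small Python sketch. The candidate names and numbers are invented, and dividing the 1 to 30 value by 30 so that both objectives share a 0 to 1 scale is my own assumption, not something the answer prescribes.

```python
def scalarize(objectives, weights):
    """Weighted sum of the objectives (the SUMPRODUCT in the Solver setup)."""
    if abs(sum(weights) - 1.0) > 1e-9:
        raise ValueError("weights should sum to 1")
    return sum(w * o for w, o in zip(weights, objectives))


def best_candidate(candidates, weights):
    """Return the name of the candidate with the highest scalarized score."""
    return max(candidates, key=lambda name: scalarize(candidates[name], weights))


# Hypothetical candidates: (value scaled to 0..1, probability).
candidates = {
    "A": (30 / 30, 0.2),  # high value, low probability
    "B": (10 / 30, 0.9),  # low value, high probability
}

print(best_candidate(candidates, (0.5, 0.5)))  # equal weights
print(best_candidate(candidates, (0.8, 0.2)))  # value matters more
```

Changing the weights changes which candidate wins, which is the sense in which there is no one right answer.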

How do I use a standard distribution to guess where the value falls in the future?

I have a mean value x and I want to model it into the future. I want to output a value of what it could be in 6 months. Assuming the value follows a normal distribution and we have the standard deviation, how do I randomize the value x while following a normal distribution? I'm doing this in Excel, but just understanding it would help too! Basically, I want to produce numbers that fall within 1 standard deviation 68% of the time, within 2 standard deviations 95% of the time, and so on.
You can use the Excel function 'NORMINV' to convert a uniform random input 'RAND()' into a normally distributed value.
=NORMINV(RAND(),Mean,StdDev)
I.e. if you repeat this many times, save the results and analyze them, you'll see a bell curve centered on the input Mean value.
Does that get you started?
The tricky bit comes when you have to come up with the formula that uses this to predict what a value will be in the future.
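If it helps to understand what NORMINV(RAND(), ...) is doing, here is a minimal Python sketch of the same idea (inverse transform sampling) using only the standard library. The function names are made up for illustration, and the mean of 100 and SD of 15 are arbitrary example values.

```python
import random
from statistics import NormalDist


def normal_sample(mean, std_dev, rng=random):
    """Python analogue of =NORMINV(RAND(), mean, std_dev)."""
    u = rng.random()
    while u == 0.0:  # inv_cdf is undefined at exactly 0
        u = rng.random()
    return NormalDist(mean, std_dev).inv_cdf(u)


def fraction_within(samples, mean, std_dev, k):
    """Fraction of samples lying within k standard deviations of the mean."""
    inside = sum(1 for x in samples if abs(x - mean) <= k * std_dev)
    return inside / len(samples)


rng = random.Random(42)  # fixed seed so the run is repeatable
samples = [normal_sample(100, 15, rng) for _ in range(10_000)]
print(fraction_within(samples, 100, 15, 1))  # roughly 0.68
print(fraction_within(samples, 100, 15, 2))  # roughly 0.95
```

The uniform number plays the role of a cumulative probability, and the inverse CDF turns it into the value that sits at that probability on the bell curve.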

Generating random numbers with normal distribution in Excel

I want to produce 100 random numbers with normal distribution (with µ=10, σ=7) and then draw a quantity diagram for these numbers.
How can I produce random numbers with a specific distribution in Excel 2010?
One more question:
When I produce, for example, 20 random numbers with RANDBETWEEN(Bottom,Top), the numbers change every time the sheet recalculates. How can I keep this from happening?
Use the NORMINV function together with RAND():
=NORMINV(RAND(),10,7)
To keep your set of random values from changing, select all the values, copy them, and then paste (special) the values back into the same range.
Sample output (column A), 500 numbers generated with this formula:
If you have Excel 2007, you can use
=NORMSINV(RAND())*SD+MEAN
because the statistical functions were renamed in Excel 2010 (for example, to NORM.S.INV and NORM.INV).
As @osknows said in a comment above (rather than in an answer, which is why I am adding this), the Analysis ToolPak includes a Random Number Generation tool, and functions such as NORM.DIST and NORM.INV can be used to generate a set of numbers. A good summary link is at http://www.bettersolutions.com/excel/EUN147/YI231420881.htm.
RAND() does generate uniformly distributed random numbers between 0 and 1, but the NORMINV (or NORM.INV) function takes the uniformly distributed RAND() as an input and generates the normally distributed sample set.
About the recalculation:
You can keep your set of random values from changing every time you make an adjustment by switching the recalculation mode from automatic to manual. (Re)calculations are then only done when you press F9 or Shift+F9.
See this link (though it is for an older Excel version than the current 2013) for some info about it: https://support.office.com/en-us/article/Change-formula-recalculation-iteration-or-precision-73fc7dac-91cf-4d36-86e8-67124f6bcce4.
Take a look at the Wikipedia article on random numbers as it talks about using sampling techniques. You can find the equation for your normal distribution by plugging into this one
(equation via Wikipedia)
As for the second issue, go into Options under the circular Office icon, go to Formulas, and change the calculation setting to "Manual". That will maintain your sheet and not recalculate the formulas each time.
Another interesting way to do this is using the Box-Muller method. This lets you generate a normal distribution with a mean of 0 and a standard deviation σ (or variance σ²) of 1 using two uniform random distributions between 0 and 1. Then you can take this Norm(0,1) distribution and scale it to whatever mean and standard deviation you want.
Here's the formula in Excel for a Normal(0, 1) distribution:
=SQRT(-2*LN( RAND()))*COS(2 * PI()*RAND())
Then use this formula to scale your normal distribution to mean 10 and standard deviation of 7:
Norm(µ=b, σ=a) = a*Norm(µ=0, σ²=1) + b
This would make the equation in Excel:
=7* SQRT(-2*LN( RAND()))*COS(2 * PI()*RAND()) + 10
You can read more about the math behind the Box-Muller transform on Wikipedia.
Note that this equation only works if the cosine function is calculated in radians.
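For comparison, the same Box-Muller formula can be sketched in Python like this. The function name is made up for illustration; using `1 - random()` for the first uniform number is my own small tweak so that the logarithm never sees 0.

```python
import math
import random
from statistics import mean, pstdev


def box_muller(mu=0.0, sigma=1.0, rng=random):
    """One normal sample via Box-Muller, scaled to mean mu and SD sigma."""
    u1 = 1.0 - rng.random()  # in (0, 1], so log(u1) is always defined
    u2 = rng.random()
    # math.cos works in radians, matching the note above.
    z = math.sqrt(-2.0 * math.log(u1)) * math.cos(2.0 * math.pi * u2)
    return sigma * z + mu


rng = random.Random(2010)  # fixed seed so the run is repeatable
samples = [box_muller(10, 7, rng) for _ in range(20_000)]
print(mean(samples), pstdev(samples))  # should land near 10 and 7
```

Each call consumes two independent uniform numbers, just as the Excel formula calls RAND() twice.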
The numbers generated by
=NORMINV(RAND(),10,7)
are normally distributed, not uniformly distributed: RAND() on its own is uniform, but NORMINV maps it through the inverse normal CDF, so there is no need to write a custom function.

Statistically removing erroneous values

We have an application where users enter prices all day. These prices are recorded in a table with a timestamp and then used for producing charts of how the price has moved. Every now and then a user enters a price wrongly (e.g. puts in one zero too many or too few), which somewhat ruins the chart (you get big spikes). We've even put in an extra confirmation dialogue if the price moves by more than 20%, but this doesn't stop them entering wrong values.
What statistical method can I use to analyse the values before I chart them, to exclude any values that are way different from the rest?
EDIT: To add some meat to the bone, say the prices are share prices (they are not, but they behave in the same way). You could see prices moving significantly up or down during the day. On an average day we record about 150 prices, and sometimes one or two are way wrong. Other times they are all good.
Calculate and track the standard deviation for a while. After you have a decent backlog, you can disregard the outliers by seeing how many standard deviations away they are from the mean. Even better, if you've got the time, you could use the info to do some naive Bayesian classification.
That's a great question, but it may lead to quite a bit of discussion, as the answers could be very varied. It depends on:
how much effort you are willing to put into this;
whether some values could genuinely differ by +/-20% (or whatever test you invent), in which case there will always be a need for some human intervention;
and, to invent a relevant test, I'd need to know far more about the subject matter.
That being said the following are possible alternatives.
A simple test against the previous value (or the mean/mode of the previous 10 or 20 values) would be straightforward to implement.
The next level of complexity would involve some statistical measurement of all values (or the previous x values, or the values of the last 3 months). A normal or Gaussian distribution would enable you to give each value a degree of certainty as to it being a mistake vs. accurate. This degree of certainty would typically be expressed as a percentage.
See http://en.wikipedia.org/wiki/Normal_distribution and http://en.wikipedia.org/wiki/Gaussian_function; there are adequate links from these pages to help in programming these. Also, depending on the language you're using, there are likely to be functions and/or plugins available to help with this.
A more advanced method could be to have some sort of learning algorithm that takes other parameters into account (on top of the last x values). A learning algorithm could take the product type or manufacturer into account, for instance, or even monitor the time of day or the user that entered the figure. This option seems way over the top for what you need, however; it would require a lot of work to code it and also to train the learning algorithm.
I think the second option is the correct one for you. Using the standard deviation (a lot of languages contain a function for this) may be a simpler alternative: you simply measure how many standard deviations a value lies from the mean of the previous x values. I'd put the standard deviation option somewhere between options 1 and 2.
You could measure the standard deviation in your existing population and exclude those that are greater than 1 or 2 standard deviations from the mean?
It's going to depend on what your data looks like to give a more precise answer...
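A minimal Python sketch of that idea, assuming a threshold of 2 standard deviations and a made-up list of prices (the function name is also just for illustration):

```python
from statistics import mean, pstdev


def filter_outliers(prices, max_sd=2.0):
    """Keep only prices within max_sd standard deviations of the mean."""
    centre = mean(prices)
    spread = pstdev(prices)
    if spread == 0:
        return list(prices)  # all values identical, nothing to exclude
    return [p for p in prices if abs(p - centre) <= max_sd * spread]


# Hypothetical day of prices with one mistyped value (an extra zero).
prices = [100 + (i % 5) for i in range(20)] + [1000]
cleaned = filter_outliers(prices)
print(len(prices), len(cleaned))
```

One caveat: the bad value itself inflates the mean and the standard deviation, so with very few data points or several large errors a fixed cut-off like this can miss them.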
Or graph a moving average of prices instead of the actual prices.
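A simple moving average can be sketched in Python as follows; the window size of 3 and the sample prices are arbitrary choices for illustration.

```python
def moving_average(values, window):
    """Return the mean of each run of `window` consecutive values."""
    if window <= 0 or window > len(values):
        raise ValueError("window must be between 1 and len(values)")
    return [
        sum(values[i:i + window]) / window
        for i in range(len(values) - window + 1)
    ]


prices = [100, 101, 1000, 102, 103, 104]  # hypothetical, with one bad entry
print(moving_average(prices, 3))
```

Note that a wrong price still shows up as a bump in the averaged series; the spike is smoothed and spread out rather than removed.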
Quoting from here:
Statisticians have devised several methods for detecting outliers. All the methods first quantify how far the outlier is from the other values. This can be the difference between the outlier and the mean of all points, the difference between the outlier and the mean of the remaining values, or the difference between the outlier and the next closest value. Next, standardize this value by dividing by some measure of scatter, such as the SD of all values, the SD of the remaining values, or the range of the data. Finally, compute a P value answering this question: If all the values were really sampled from a Gaussian population, what is the chance of randomly obtaining an outlier so far from the other values? If the P value is small, you conclude that the deviation of the outlier from the other values is statistically significant.
Google is your friend, you know. ;)
For your specific question of plotting, and your specific scenario of an average of 1-2 errors per day out of 150, the simplest thing might be to plot trimmed means, or the range of the middle 95% of values, or something like that. It really depends on what value you want out of the plot.
If you are really concerned with the true max and true min of a day's prices, then you have to deal with the outliers as outliers and properly exclude them, probably using one of the outlier tests previously proposed (the data point is x% more than the next point, or than the last n points, or is more than 5 standard deviations away from the daily mean). Another approach is to look at what happens after the outlier. If it is an outlier, there will be a sharp upturn followed by a sharp downturn.
If, however, you care about the overall trend, plotting the daily trimmed mean, median, and 5% and 95% percentiles will portray history well.
Choose your display methods and how much outlier detection you need to do based on the analysis question. If you care about medians or percentiles, the outliers are probably irrelevant.
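To make the robust summaries concrete, here is a small Python sketch using the standard library. The function names, the 15% trim, and the sample prices are all made up for illustration; `statistics.quantiles` needs Python 3.8 or later.

```python
from statistics import mean, median, quantiles


def trimmed_mean(values, proportion):
    """Mean after dropping `proportion` of the values from each end."""
    ordered = sorted(values)
    k = int(len(ordered) * proportion)  # number of values cut from each end
    kept = ordered[k:len(ordered) - k] if k else ordered
    return mean(kept)


def daily_summary(prices, trim=0.15):
    """Median, 5th/95th percentiles and trimmed mean for one day's prices."""
    cuts = quantiles(prices, n=20, method="inclusive")  # cut points every 5%
    return {
        "median": median(prices),
        "p5": cuts[0],
        "p95": cuts[-1],
        "trimmed_mean": trimmed_mean(prices, trim),
    }


# Hypothetical day with one value too high and one too low.
prices = [100, 101, 102, 103, 104, 1000, 10]
print(daily_summary(prices))
```

The median and trimmed mean barely move when a mistyped price is present, which is why plotting them can avoid the need for explicit outlier detection.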

Resources