Generating random numbers with normal distribution in Excel - excel

I want to produce 100 random numbers with normal distribution (with µ=10, σ=7) and then draw a quantity diagram for these numbers.
How can I produce random numbers with a specific distribution in Excel 2010?
One more question:
When I produce, for example, 20 random numbers with RANDBETWEEN(Bottom,Top), the numbers change every time the sheet recalculates. How can I keep this from happening?

Use the NORMINV function together with RAND():
=NORMINV(RAND(),10,7)
To keep your set of random values from changing, select all the values, copy them, and then paste (special) the values back into the same range.
Sample output (column A), 500 numbers generated with this formula:

IF you have excel 2007, you can use
=NORMSINV(RAND())*SD+MEAN
Because there was a big change in 2010 about excel's function

As #osknows said in a comment above (rather than an answer which is why I am adding this), the Analysis Pack includes Random Number Generation functions (e.g. NORM.DIST, NORM.INV) to generate a set of numbers. A good summary link is at http://www.bettersolutions.com/excel/EUN147/YI231420881.htm.

Rand() does generate a uniform distribution of random numbers between 0 and 1, but the norminv (or norm.inv) function is taking the uniform distributed Rand() as an input to generate the normally distributed sample set.

About the recalculation:
You can keep your set of random values from changing every time you make an adjustment, by adjusting the automatic recalculation, to: manual recalculate. (Re)calculations are then only done when you press F9. Or shift F9.
See this link (though for older excel version than the current 2013) for some info about it: https://support.office.com/en-us/article/Change-formula-recalculation-iteration-or-precision-73fc7dac-91cf-4d36-86e8-67124f6bcce4.

Take a look at the Wikipedia article on random numbers as it talks about using sampling techniques. You can find the equation for your normal distribution by plugging into this one
(equation via Wikipedia)
As for the second issue, go into Options under the circle Office icon, go to formulas, and change calculations to "Manual". That will maintain your sheet and not recalculate the formulas each time.

Another interesting way to do this is using the Box-Muller Method. This lets you generate a normal distribution with mean of 0 and standard deviation σ (or variance σ2) of 1 using two uniform random distributions between 0 and 1. Then you can take this Norm(0,1) distribution and scale it to whatever mean and standard deviation you want.
Here's the formula in excel for a normal(0, 1) distribution:
=SQRT(-2*LN( RAND()))*COS(2 * PI()*RAND())
Then use this formula to scale your normal distribution to mean 10 and standard deviation of 7:
Norm(µ=b, σ=a) = a*Norm(µ=0, σ2=1) + b
This would make the equation in Excel:
=7* SQRT(-2*LN( RAND()))*COS(2 * PI()*RAND()) + 10
You can read more about the math behind this Box-Muller Equation on en.Wikipedia
Note that this equation only works if you calculate the cosine function using radians.

The numbers generated by
=NORMINV(RAND(),10,7)
are uniformally distributed. If you want the numbers to be normally distributed, you will have to write a function I guess.

Related

Adding a number to a date in excel (with an average and st dev)

I have a list of dates. They're dummy data for sign up dates.
I want to add another list that is dummy data for first usage dates.
To make the dates, I used this -
=RANDBETWEEN(DATE(2017,1,1),DATE(2017,6,30))
To make the first usages dates, I'm trying this -
=C6+RANDBETWEEN(0,100)
But, I don't want to just add a random number. I want to add a number from a normal distribution with a mean of 10 and a standard deviation of 30 (without going into negatives). Is that possible?
The following will generate a random number in normal distribution with a mean of 10 and deviation of 30:
= NORM.INV(RAND(),10,30)
The easiest way I can think of to exclude negative values is to just take the absolute value of this.
= ABS(NORM.INV(RAND(),10,30))
But, as already noted, if you exclude negative numbers (no matter how you decide to exclude them), then it isn't really a normal distribution anymore.
EDIT:
Another way to exclude negative numbers is the following:
Since NORM.INV(0.37,10,30) returns a value just above 0, you can use this knowledge to change the formula to only allow random values between 0.37 and 1 to be generated:
= NORM.INV(0.63*RAND()+0.37,10,30)
However, again I must point out this isn't a true normal distribution.

checking kurtois and skewnes for normal variable

i have generated random variable according to normal distribution in excel using technique of analysis toolpak
what i need to know is that why kurtois of this normal variable is not equal to 3? here excel formula for calculation of kurtois
=KURT(A2:A2001) and its corresponding value is equal to `0.14521546`
does it mean that 3 is only in case of infinity sample ? and for specific amount of sample range of amplitudes of kurtois varies between -1 and 1? here is also formula for calculation skewnes
=SKEW(A2:A2001)
-0.006510255
thanks in advance
Excel's KURT() function calculates excess kurtosis which is kurtosis-3.
Therefore it should be around 0 for a normally distributed random variable like in your case.

Left skewed data generator (Excel)

I am looking for a function in excel that will allow me to generator left skewed data with the option of picking a mean and standard deviation.
E.g If I want to generate 5 numbers with SD 2 and mean 45 , it must be 43,44,45,46,47.
I would also like to be able to pick the upper and lower limit. I don't want 0,44,46,46,90
This isn't an Excel question (i.e. it has nothing to do with Excel).
I don't think there's a standard function in any application or language that will give you what you need, you're basically asking for someone to develop an algorithm for you. If you want to do this in Excel you need some VBA code..
I don't mind having a crack at the algorithm but you need to be more specific with your parameters..
are there always 5 numbers returned? (this simplifies it hugely)
do they have to be integers?
also the SD of 43,44,45,46,47 is 1.414 not 2, 2 is the variance

How do I use a standard distribution to guess where the value falls in the future?

I have a mean value x and I want to model it into the future. I want to output a value of what it could be in 6 months. Assuming the value follows a normal distribution and we have the standard deviation how do I randomize the value x while following a normal distribution? I'm doing this in excel, but just understanding it would help too! Basically I want to produce numbers 68% of the time within 1 deviation, 95% of the time withing 2 deviation etc. etc.
You can use the excel function 'NORMINV' to convert a random input 'RAND()' to a normal distribution.
=NORMINV(RAND(),Mean,Std Dev)
i.e. if you repeat this many times, save and analyze the results, you'll see a bell curve over the input Mean value.
Does that get you started?
The tricky bit comes when you come up with the formula to predict what a value will be in the future using this.

Interpolating data points in Excel

I'm sure this is the kind of problem other have solved many times before.
A group of people are going to do measurements (Home energy usage to be exact).
All of them will do that at different times and in different intervals.
So what I'll get from each person is a set of {date, value} pairs where there are dates missing in the set.
What I need is a complete set of {date, value} pairs where for each date withing the range a value is known (either measured or calculated).
I expect that a simple linear interpolation would suffice for this project.
If I assume that it must be done in Excel.
What is the best way to interpolate in such a dataset (so I have a value for every day) ?
Thanks.
NOTE: When these datasets are complete I'll determine the slope (i.e. usage per day) and from that we can start doing home-to-home comparisons.
ADDITIONAL INFO After first few suggestions:
I do not want to manually figure out where the holes are in my measurement set (too many incomplete measurement sets!!).
I'm looking for something (existing) automatic to do that for me.
So if my input is
{2009-06-01, 10}
{2009-06-03, 20}
{2009-06-06, 110}
Then I expect to automatically get
{2009-06-01, 10}
{2009-06-02, 15}
{2009-06-03, 20}
{2009-06-04, 50}
{2009-06-05, 80}
{2009-06-06, 110}
Yes, I can write software that does this. I am just hoping that someone already has a "ready to run" software (Excel) feature for this (rather generic) problem.
I came across this and was reluctant to use an add-in because it makes it tough to share the sheet with people who don't have the add-in installed.
My officemate designed a clean formula that is relatively compact (at the expensive of using a bit of magic).
Things to note:
The formula works by:
using the MATCH function to find the row in the inputs range just before the value being searched for (e.g. 3 is the value just before 3.5)
using OFFSETs to select the square of that line and the next (in light purple)
using FORECAST to build a linear interpolation using just those two points, and getting the result
This formula cannot do extrapolations; make sure that your search value is between the endpoints (I do this in the example below by having extreme values).
Not sure if this is too complicated for folks; but it had the benefit of being very portable (and simpler than many alternate solutions).
If you want to copy-paste the formula, it is:
=FORECAST(F3,OFFSET(inputs,MATCH(F3,inputs)-1,1,2,1),OFFSET(inputs,MATCH(F3,inputs)-1,0,2,1
(inputs being a named range)
There are two functions, LINEST and TREND, that you can try to see which gives you the better results. They both take sets of known Xs and Ys along with a new X value, and calculate a new Y value. The difference is that LINEST does a simple linear regression, while TREND will first try to find a curve that fits your data before doing the regression.
The easiest way to do it probably is as follows:
Download Excel add-on here: XlXtrFun™ Extra Functions for Microsoft Excel
Use function intepolate().
=Interpolate($A$1:$A$3,$B$1:$B$3,D1,FALSE,FALSE)
Columns A and B should contain your input, and column G should contain all your date values. Formula goes into the column E.
A nice graphical way to see how well your interpolated results fit:
Take your date,value pairs and graph them using the XY chart in Excel (not the Line chart). Right-click on the resulting line on the graph and click 'Add trendline'. There are lots of different options to choose which type of curve fitting is used. Then you can go to the properties of the newly created trendline and display the equation and the R-squared value.
Make sure that when you format the trendline Equation label, you set the numerical format to have a high degree of precision, so that all of the significant digits of the equation constants are displayed.
The answer above by YGA doesn't handle end of range cases where the desired X value is the same as the reference range's X value. Using the example given by YGA, the excel formula would return #DIV/0! error if an interpolated value at 9999 was asked for. This is obviously part of the reason why YGA added the extreme endpoints of 9999 and -9999 to the input data range, and then assumes that all forecasted values are between these two numbers. If such padding is undesired or not possible, another way to avoid a #DIV/0! error is to check for an exact input value match using the following formula:
=IF(ISNA(MATCH(F3,inputs,0)),FORECAST(F3,OFFSET(inputs,MATCH(F3,inputs)-1,1,2,1),OFFSET(inputs,MATCH(F3,inputs)-1,0,2,1)),OFFSET(inputs,MATCH(F3,inputs)-1,1,1,1))
where F3 is the value where interpolated results are wanted.
Note: I would have just added this as a comment to the original YGA post, but I don't have enough reputation points yet.
alternatively.
=INDEX(yVals,MATCH(J7,xVals,1))+(J7-MATCH(J7,xVals,1))*(INDEX(yVals,MATCH(J7,xVals,1)+1)-INDEX(yVals,MATCH(J7,xVals,1)))/(INDEX(xVals,MATCH(J7,xVals,1)+1)-MATCH(J7,xVals,1))
where j7 is the x value.
xvals is range of x values
yvals is range of y values
easier to put this into code.
You can find out which formula fits best your data, using Excel's "trend line" feature. Using that formula, you can calculate y for any x
Create linear scatter (XY) for it (Insert => Scatter);
Create Polynominal or Moving Average trend line, check "Display Equation on
chart" (right-click on series => Add Trend Line);
Copy the equation into cell and replace x's with your desired x value
On screenshot below A12:A16 holds x's, B12:B16 holds y's, and C12 contains formula that calculates y for any x.
I first posted an answer here, but later found this question

Resources