How to generate random numbers within a normal distribution using Excel - excel

I want to use the RAND() function in Excel to generate a random number between 0 and 1.
However, I would like 80% of the values to fall between 0 and 0.2, 90% of the values to fall between 0 and 0.3, 95% of the values to fall between 0 and 0.5, etc.
This reminds me that I took an applied statistics course once upon a time, but not of what was actually in the course...
How is the best way to go about achieving this result using an Excel formula. Alternatively, what is this kind of statistical calculation called / any other pointers that I can Google around for.
=================
Use case:
I have a single column of meter readings, which I would like to duplicate 7 times (each column for a new month). each column has 55 000 rows. While the meter readings need to vary for each month, when taken as a time series, each meter number should have 7 realistic readings.
The aim is to produce realistic data to turn into heat maps (i.e. flag outlying meter readings)

I don't think that there is a formula which would fit exactly to your requirements. I would use a very straightforward solution:
Generate 80% of data using =RANDBETWEEN(0,20)/100
Generate 10% of data using =RANDBETWEEN(20,30)/100
Generate 5% of data using =RANDBETWEEN(30,50)/100
and so on
You can easily change the precision of generated data by modifying the parameters, for example: =RANDBETWEEN(0,2000)/10000 will generate data with up to 4 digits after decimal point.
UPDATE
Use a normal distribution for the use case, for example:
=NORMINV(RAND(), 20, 5)
where 20 is a mean value and 5 is a standard deviation.

Related

making big data set smaller in excel

I made a little test machine that accidentally created a 'big' data set:
6 columns with +/- 550.000 rows.
The end result I am looking for is a graph with 6 lines, horizontal axis 1 - 550.000 measurements and vertically the values in the rows. (capped at 200 or so). Data is a resistance measurement that should be between 0 - 30 or very big (borken), the software writes 'inf' in these cases.
My skill is limited to excel, so what have I done until now:
Imported in Excel. The measurements are valuable between 0 - 30 and inf is not good for a graph, so I did: if(cell>200){200}else{keep cell value}.
Now making a graph is a timely exercise and excel does not like this, result is not good.
So I would like to take the average value of 60 measurements to reduce the rows to below 10.000. So =AVERAGE(H1:H60)
But I cannot get this to work.
Questions:
How do I reduce this data set and get a good graph.
Should I switch
to other software that is more applicable?
FYI: I already changed the software of the testing device to take the average value of a bunch of measurements the next time... But I cannot repeat this test.
Download link of data set comma separated file 17MB
I think you are on the right track, however my guess is that you only want to get an average every 60 rows and are unsure how to do this.
Using MOD(Number, Divisor) inside an if statement will let you specify that the average should be calculated only once in every x number of cells.
Assuming you'll have one row above your data table for headers, you are looking for something along the lines of:
=IF(MOD(ROW(A61),60) = 1,AVERAGE(H2:H61),"")
Once you have this you can filter your average column to non-blank values and use this to create your graph.

How do I use a standard distribution to guess where the value falls in the future?

I have a mean value x and I want to model it into the future. I want to output a value of what it could be in 6 months. Assuming the value follows a normal distribution and we have the standard deviation how do I randomize the value x while following a normal distribution? I'm doing this in excel, but just understanding it would help too! Basically I want to produce numbers 68% of the time within 1 deviation, 95% of the time withing 2 deviation etc. etc.
You can use the excel function 'NORMINV' to convert a random input 'RAND()' to a normal distribution.
=NORMINV(RAND(),Mean,Std Dev)
i.e. if you repeat this many times, save and analyze the results, you'll see a bell curve over the input Mean value.
Does that get you started?
The tricky bit comes when you come up with the formula to predict what a value will be in the future using this.

How to generate random vallues in excel for arrival and service, with fixed mean?

i am simulating an M/M/1 queue in excel where i want to generate random values for arrival rate lambda and service rate (meu) in two columns such that :
arrival rate (lambda) = 2
service rate (meu) = 5 (but the average of meu always remains 1 in the column despite
generating the values for meu randomly using rand().
How can i generate random values using rand() for lambda and meu such that their averages can be restricted to a fixed value in the respective columns. I need to run the simulation for different values of utilization where utilization = lambda/meu ratio (i need to use 0.2, 0.4, 0.6 and 0.8 for lambda/meu). But the values in the column for both would be random and average of meu should be fixed at 1. I will change lambda for different utilization ratios then.
Excel lacks an exponential RV generating function, but you can generate your own with the desired rate via inversion: -log(U) / rate has an exponential distribution, where U is a uniform(0,1) random value and rate is the desired rate (mu or lambda).
That said, I really wouldn't recommend using Excel for anything other than toy problems unless you've read papers such as this one and understand the issues you may be dealing with.
As an aside in case you haven't seen it before, there's an easy way to model a first-come-first-served single-server queue using iterative logic.

How to obtain Incremental standard deviations from a set of standard deviations?

I have a data set containing three columns, first column represents number of trials, second column represents experimental values, and the third column represents corresponding standard deviation.
With each experiment there is an increment in my experimental values. To get the incremental values, I hold my first value as the reference value and subtract this reference value from each subsequent value and use them to create fourth column of these incremental values.
My problem begins right from here. How do I create a new set of incremental standard deviations for the incremental experimental values I got? My apology if the problem is not well defined but hopefully someone will eventually be able to help me out. Many thanks!
Below is my data set,
Trial Mean SD Incr Mean Incre SD
1 45.311 4.668 0
2 56.682 2.234 11.371
3 62.197 2.266 16.886
4 70.550 4.751 25.239
5 80.528 4.412 35.217
6 87.453 4.542 42.142
7 89.979 2.185 44.668
8 96.859 3.476 51.548
To be clear, for other readers, your incremental mean is actually the difference between trial 1 and the other trials.
Variances add directly when you subtract (or add) independent normal distributions. So you first want to convert that standard deviation to a variance by squaring it, and then you can add the variances, and then you can take the square root to turn it back into a standard deviation. Note when using this kind of Pythagorean combination, you are assuming that trial 1 is independent from the trials, so for example, you cannot do things like have some sample in both trials.
Logically this makes sense that your so called "incremental SD" will always be greater than the individual SDs, since the uncertainty of both distributions contributes towards the uncertainty of the difference.

Analyzing how noisy a data set using Excel

I have a set of data that has over 15,000 records in Excel that is from a measurement tool that finds trends over a large areas. I'm not interested in looking for trends within the data as whole but rather over the data closest to each other to get a sense of how noisy (variation with neighboring records). Almost like I want to know the average standard deviation of looking at the 15,000 or so records only at 20 records at a time. The hope is the data values trend gradually rather than sudden changes from record to record and thus looks noisy. If I add a Chart and use the "Moving Average" Trendline it kind of visually shows how noisy the data looks across the 15,000 + records. However, I was hoping to get a numeric value to rate how noisy the data is vs. other datasets. Any ideas on what I could do here with formula's built-in Excel or by adding some add-in? Let me know if I need to explain this any better.
Could you calculate your moving average for your 20 sample window, then use the difference between each point and the expected value to calculate a variance?
Hard to do tables here, but here is a sample of what I mean
Actual Measured Expected Variance
5 5.44 4.49 0.91
6 4.34 5.84 2.26
7 8.45 7.07 1.90
8 6.18 7.84 2.75
9 8.89 9.10 0.04
10 11.98 10.01 3.89
The "measured" values were determined as
measured = actual + (rand() - 0.5) * 4
The "expected" values were calculated from a moving average (the table was pulled from the middle of the data set).
The variance is simply the square of expected minus measured.
Then you could calculate an average variance as a summary statistic.
Moving average is the correct, but you need a critical element - order. Do you date/time variable or a sequence number?
Use the OFFSET function to setup your window. If you want 20, your formula will look something like AVERAGE(OFFSET(C15,-10,0,21)). This is your moving average.
Relate that to C15, whether additive or multiplicative, you'll have your distance. All we need now is your tolerance.

Resources