Excel Random number from a set of options - excel-formula

In MS Excel, how can I randomly sum up to a target number with numbers divisible by 5?
For example, I would like a completely random output of numbers divisible by 5 (5,10,15,20….) in cells B1:B100, to add up to 10000.
I initially looked at the CHOOSE(RANDBETWEEN) option, but I can't get the numbers to add up to 10000.

In Office 365, enter this formula in B1:
=LET(
rndArr,RANDARRAY(100,1),
Correction,INT(SEQUENCE(100,1,1,-1/100)),
INT(rndArr/SUM(rndArr)*2000)*5+IF(Correction=1,10000-SUM(INT(rndArr/SUM(rndArr)*2000)*5),0))
EDIT: the version below was added in response to the comment about constraining the values to a min/max. It's not foolproof for all min/max values, but it worked well enough for me with the values you supplied.
=LET(
Total, 10000,
Min, 10, Max, 300,
rndArr, RANDARRAY(100, 1),
Correction, SEQUENCE(100, 1, 1, 1) = MATCH(MIN(rndArr), rndArr, 0),
rndArr5, INT(rndArr/SUM(rndArr)*Total/5)*5,
rndArrMinMax, IFS(rndArr5 < Min, Min, rndArr5 > Max, Max, TRUE, rndArr5),
rndArrMinMax + (Total-SUM(rndArrMinMax)) * Correction
)
Explanation of what that does:
Enter Total, Min and Max variables
create rndArr, an array of random numbers (that is the correct size, 100 rows x 1 col)
create Correction, a boolean array of the same size as rndArr where the only TRUE value is the position of the smallest value in rndArr. This is because we'll need to add a figure in later to ensure the total is correct, and want to add it to the smallest number in the array (best possible chance that it won't go above our maximum, remember I said this wasn't foolproof for all values).
create rndArr5, which scales rndArr proportionately so it totals Total/5 (2000 here), rounds down to integers, then multiplies by 5. The result is an array of random multiples of 5 that totals at most 10000, usually somewhat below it
create rndArrMinMax by checking rndArr5 (our progress so far) against desired min and max values, editing any outside of our desired range to be the min or max value respectively.
Final output value is that clamped array, plus any difference needed to make the correct total: that's Total - SUM(rndArrMinMax), multiplied by our Correction boolean array so it only gets added to the smallest value in the array. Again, this may result in that smallest value going over the max if the totals are way out and/or the Max is very small, but there's not much you can do about that with random numbers.
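To check the logic of the steps above outside Excel, here is a minimal Python sketch of the same idea. It is not part of the original answer; the function name random_multiples and its parameters are hypothetical, and it shares the formula's limitation that the corrected cell can end up outside the min/max range.

```python
import random

def random_multiples(total=10000, count=100, lo=10, hi=300, step=5, seed=None):
    """Random multiples of `step` that sum to `total` (mirrors the LET formula)."""
    rng = random.Random(seed)
    rnd = [rng.random() for _ in range(count)]            # rndArr
    scale = sum(rnd)
    # rndArr5: scale to the target total, then round down to a multiple of `step`
    rnd5 = [int(r / scale * total / step) * step for r in rnd]
    # rndArrMinMax: clamp every value into [lo, hi]
    clamped = [min(max(v, lo), hi) for v in rnd5]
    # Correction: put the remaining difference on the position of the smallest random number
    idx = rnd.index(min(rnd))
    clamped[idx] += total - sum(clamped)
    return clamped

values = random_multiples(seed=1)
print(sum(values), all(v % 5 == 0 for v in values))
```

Because lo, hi and total are all multiples of 5 here, the correction is a multiple of 5 too, so every output value stays divisible by 5.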

Related

How can I use simulation tool in Excel for solving the following problem related to probability?

Trial number             1    2    3    4    5    ...   2000000 (two million)
Success on nth attempt   12   4    21   5    10   ...   12
Note: Imagine throwing a dice where each outcome has probability of 1/10 (not 1/6 as it is usual for dice). For us "success" means throwing a "3". For each trial (see above) we keep throwing dice until we get "3". For example, above I assume that during first trial we threw dice 12 times and could get "3" only on 12th attempt. The same for other trials. For instance, on 5th trial we threw dice 10 times and could get "3" only on 10th attempt.
We need to simulate this for 2 million times (or lower than that, let's say 500,000 times).
Eventually we need to calculate what percent of "success" happens in interval of 10-20 tries, 1-10 tries etc.
For example, out of 2000000 trials in 60% of cases (1200000) we get "3" in between 10th and 20th attempts of throwing a dice.
I performed a manual simulation, but could not create a simulation model. Can you please help?
This might not be a good solution for a dataset as large as the one you intend; Excel is probably not the most efficient tool for that. Anyway, here is a possible approach.
In cell A1, put the following formula:
=LET(maxNum, 10, trialNum, 5, maxRep, 20, event, 3, cols, SEQUENCE(trialNum,1),
rows, SEQUENCE(maxRep, 1), rArr, RANDARRAY(maxRep, trialNum, 1, maxNum, TRUE),
groupSize, 10, startGroups, SEQUENCE(maxRep/groupSize, 1,,groupSize),
GROUP_PROB, LAMBDA(matrix,start,end, LET(result, BYCOL(matrix, LAMBDA(arr,
LET(idx, IFERROR(XMATCH(event, arr),0), IF(AND(idx >= start, idx <= end), 1, 0)))),
AVERAGE(result))),
HREDUCE, LAMBDA(arr, LET(idx, XMATCH(event, arr),
IF(ISNUMBER(idx), FILTER(arr, rows <= idx),event &" not found"))),
trials, DROP(REDUCE(0, cols, LAMBDA(acc,col, HSTACK(acc,
HREDUCE(INDEX(rArr,,col))))),,1),
dataBlock, VSTACK("Trials", trials),
probBlock, DROP(REDUCE(0, startGroups, LAMBDA(acc,ss,
VSTACK(acc, LET(ee, ss+groupSize-1, HSTACK("%-Group "&ss&"-"&ee,
GROUP_PROB(trials, ss, ee))
))
)),1),
IFERROR(HSTACK(dataBlock, probBlock), "")
)
And here is the output:
Explanation
We use LET for easy reading and composition. We first define the parameters of the experiment:
maxNum, the maximum random number to be generated. The minimum will be 1.
trialNum, the number of trials (in our case the number of columns).
maxRep, the maximum number of repetitions (in our case the number of rows).
rows and cols, the sequences of row and column indexes, respectively.
event, our successful event, in our case 3.
groupSize, the size of each group for which a probability is calculated.
startGroups, the start index position of each group.
rArr, a random array of size maxRep x trialNum. The minimum random number will be 1 and the maximum maxNum. The last input argument of RANDARRAY ensures it generates only integer numbers.
GROUP_PROB is a user LAMBDA function that calculates the probability that our successful event (the number 3 being generated) first occurs within a given interval of attempts.
LAMBDA(matrix,start,end, LET(result, BYCOL(matrix, LAMBDA(arr,
LET(idx, IFERROR(XMATCH(event, arr),0), IF(AND(idx >= start, idx <= end), 1, 0)))),
AVERAGE(result)))
Basically, for each column (arr) of matrix, it finds the index position of the first occurrence of the event and checks whether that position falls within the reference interval from start to end; if so it returns 1, otherwise 0. Finally, the AVERAGE function turns these flags into the probability. If the event was not generated, the column counts as 0 too.
We use the DROP/REDUCE/VSTACK or HSTACK pattern. Please check the answer to the question: how to transform a table in Excel from vertical to horizontal but with different length provided by #DavidLeal.
The HREDUCE user LAMBDA function truncates each column of rArr at the first occurrence of the event. If the event was not found, it returns a string indicating that instead.
The name probBlock builds the array with all the probability groups.
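For run sizes that are impractical in a worksheet, the same experiment can be sketched in a few lines of Python. This is an illustrative sketch rather than a translation of the formula; first_success, group_shares and exact_share are hypothetical names. The exact_share function uses the closed form of the geometric distribution, P(a <= X <= b) = (1-p)^(a-1) - (1-p)^b, as a cross-check on the simulated shares.

```python
import random

def first_success(rng, sides=10, event=3, max_rep=None):
    """Throw a `sides`-sided die until `event` shows; return the attempt number.
    Returns None if `max_rep` throws pass without a success."""
    attempt = 0
    while max_rep is None or attempt < max_rep:
        attempt += 1
        if rng.randint(1, sides) == event:
            return attempt
    return None

def group_shares(trials, group_size=10, groups=2):
    """Share of all trials whose first success falls in 1-10, 11-20, ..."""
    shares = {}
    for g in range(groups):
        start, end = g * group_size + 1, (g + 1) * group_size
        hits = sum(1 for t in trials if t is not None and start <= t <= end)
        shares[(start, end)] = hits / len(trials)
    return shares

def exact_share(start, end, p=0.1):
    """Geometric distribution: P(start <= first success <= end)."""
    return (1 - p) ** (start - 1) - (1 - p) ** end

rng = random.Random(42)
trials = [first_success(rng, max_rep=20) for _ in range(20000)]
shares = group_shares(trials)
for (start, end), share in shares.items():
    print(f"{start}-{end}: simulated {share:.3f}, exact {exact_share(start, end):.3f}")
```

As in the formula, a trial that never produces the event within max_rep throws still counts in the denominator, so it lowers every group's share.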

What's the logic behind PERCENTILE.INC Excel function?

I would like to know how Excel calculates the values returned by the PERCENTILE.INC function. I'm studying percentiles and quartiles, and I got the results below:
How does Excel calculate the values in column F?
Here's the formulas I'm using:
=PERCENTILE.INC(B2:B21; 0,75) ==> F2
=PERCENTILE.INC(B2:B21; 0,50) ==> F3
=PERCENTILE.INC(B2:B21; 0,25) ==> F4
=PERCENTILE.INC(B2:B21; 0,00) ==> F5
Short answer - the position of a given percentile when the data is sorted in ascending order, using percentile.inc, is given by
(N-1)p+1
where p is the required percentile as a fraction from 0 to 1 and N is the number of points.
If this expression gives a whole number, you take the value at this position (e.g. percentile zero gives position 1, so its value is exactly 22). If it's not a whole number, you interpolate. For p=0.25 the position is 5.75, so you take the value at the position given by the whole number part (position 5, where the value is 52) and the value at the position one higher (position 6, where the value is 55), multiply the difference of the two values (3) by the fraction part (0.75), giving you 2.25, and finally add this to the lower of the two values, giving you 54.25. A shorter way of saying this is that you go three quarters of the way from the lower value to the upper one. So you have:
If you wished to show the logic as an Excel formula, you could implement the expression shown here on the right (where h, in the second column of the table, is the position calculated from the formula above and x is the value at that position)
like this:
=LET(p,J3,
range,I$2:I$21,
N,COUNT(range),
position,(N-1)*p+1,
lower,FLOOR(position,1),
fraction,MOD(position,1),
upper,CEILING(position,1),
lowerValue,INDEX(range,lower),
upperValue,INDEX(range,upper),
difference,upperValue-lowerValue,
lowerValue+fraction*difference)
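The same interpolation can be sketched in Python to check the arithmetic by hand. The function name percentile_inc and the sample list are hypothetical, chosen only for illustration; unlike the LET formula, which relies on the range already being in ascending order, this sketch sorts the data first.

```python
import math

def percentile_inc(values, p):
    """Inclusive percentile by linear interpolation at position (N-1)*p + 1."""
    data = sorted(values)
    n = len(data)
    position = (n - 1) * p + 1            # 1-based and possibly fractional
    lower = math.floor(position)
    upper = math.ceil(position)
    fraction = position - lower
    lower_value = data[lower - 1]         # convert 1-based position to 0-based index
    upper_value = data[upper - 1]
    return lower_value + fraction * (upper_value - lower_value)

sample = [10, 20, 30, 40]                 # made-up data
print(percentile_inc(sample, 0.25))       # position 1.75, so 10 + 0.75 * 10
```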

How can I express easily a formula that has a lot of nesting Ifs

I want to express a formula that says if a number in a column is 50 to 99, then return 50. If 100-149, then return 100, 150-199, then return 150, etc, etc. I need a more concise way to do that for numbers that could reach 2000 (in 50 increments).
Right now my formula is =if(and >50 <100),50,if >100,100,true,0) or something like that; I can't see it right now.
There's probably a faster way, but here's what I would do:
Create a new column that rounds down to the nearest 50:
Assume the numbers are in Column A:
=CONCAT(FLOOR(A2,50),"-",IF(FLOOR(A2,100)-1<FLOOR(A2,50),FLOOR(A2,100)+99,FLOOR(A2,100)-1))
This will produce, for every row, a label that runs from the number rounded down to the nearest 50 up to the end of its block of 100 (for example, 50-99). It also lets you go to 10,000, 50,000 or 100,000 without ever having to change the formula.
The only thing missing is special handling for numbers below 50, which would need another nested IF; that's up to you. Without it, the label shows as 0-99 for any number under 50 and as 50-99 for any number from 50 to 99.
Edit
I found out, after all that work, that you just wanted it rounded down to the nearest 50. Just use =FLOOR(A2, 50)
Divide the number by 50, then multiply the integer of that by 50:
=INT(A1/50)*50
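Or subtract half the increment (25) and use MROUND. Note that MROUND returns #NUM! when the number and the multiple have different signs, so this only works for values of 25 or more: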
=MROUND(A1-25,50)
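For comparison, the round-down-to-a-multiple rule behind =FLOOR(A2, 50) and =INT(A1/50)*50 can be sketched in Python. The helper name floor_to_multiple is hypothetical, and the sketch assumes non-negative inputs as in the question.

```python
def floor_to_multiple(value, step=50):
    """Round `value` down to the nearest multiple of `step`."""
    return int(value // step) * step

for n in (49, 50, 99, 100, 149, 1999):
    print(n, floor_to_multiple(n))
```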

Calculating an average while ignoring certain values

Let's say I have a set of numbers e.g. [10,45,3,0,0,0,27] and I want to average every number that isn't a 0. So in this case it would be (10 + 45 + 3 + 27) / 4. How can I do this in excel, given that I will change the 0's to non-0's at some point, so the average will need to be updated?
=AVERAGEIF(Range,"<>"&0)
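Alternatively, use Excel's =COUNTIF(Range, Criteria) function, e.g. in your case =COUNTIF(A1:A10,">0"), to count every value greater than 0, and then divide the sum of the range by that count: =SUM(A1:A10)/COUNTIF(A1:A10,">0"). This assumes there are no negative values in the range.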
Also this might help: https://www.ablebits.com/office-addins-blog/2014/07/02/excel-countif-examples/
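As a quick cross-check of the expected result, the same calculation in Python looks like the sketch below. The function name average_nonzero is hypothetical, and the list is the one from the question.

```python
def average_nonzero(values):
    """Average of every value that is not 0 (what AVERAGEIF with "<>0" computes)."""
    kept = [v for v in values if v != 0]
    return sum(kept) / len(kept)

print(average_nonzero([10, 45, 3, 0, 0, 0, 27]))   # (10 + 45 + 3 + 27) / 4
```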

Binning in Excel

Which formulae in MS Excel can we use for -
equi-depth binning
equi-width binning
Here's what I used. The data I was binning was in A2:A2001.
Equi-width:
I calculated the width in a separate cell (U2), using this formula:
=(MAX($A$2:$A$2001) - MIN($A$2:$A$2001) + 0.00000001)/10
10 is the number of bins. The + 0.00000001 is there because without it, values equal to the maximum were getting put into their own bin.
Then, for the actual binning, I used this:
=ROUNDDOWN(($A2-MIN($A$2:$A$2001))/$U$2, 0)
This function is finding how many bin-widths above the minimum your value is, by dividing (value - minimum) by the bin width. We only care about how many full bin-widths fit into the value, not fractional ones, so we use ROUNDDOWN to chop off all the fractional bin-widths (that is, show 0 decimal places).
Equi-depth
This one is simpler.
=ROUNDDOWN(PERCENTRANK($A$2:$A$2001, $A2)*10, 0)
First, get the percentile rank of the current cell ($A2) out of all the cells being binned ($A$2:$A$2001). This will be a value between 0 and 1, so to convert it into bins, just multiply by the total number of bins you want (I used 10). Then, chop off the decimals the same way as before.
For either of these, if you want your bins to start at 1 rather than 0, just add a +1 to the end of the formula.
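The two binning rules can be sketched in Python to see which bin each value lands in. This is a rough sketch under stated assumptions: percent_rank is a simplified stand-in for PERCENTRANK (count of smaller values divided by N-1, without Excel's default truncation to three significant digits), and all function names are hypothetical. One deliberate difference from the equi-depth formula above: the maximum has a percent rank of 1, which the formula places in bin 10, so the sketch caps the result at the last bin.

```python
def equi_width_bin(values, bins=10, eps=1e-8):
    """Bin index = number of full bin-widths above the minimum."""
    lo, hi = min(values), max(values)
    width = (hi - lo + eps) / bins        # eps keeps the maximum in the last bin
    return [int((v - lo) / width) for v in values]

def percent_rank(values, x):
    """Simplified PERCENTRANK: share of the other values that are smaller than x."""
    smaller = sum(1 for v in values if v < x)
    return smaller / (len(values) - 1)

def equi_depth_bin(values, bins=10):
    """Bin index from the percent rank, capped so the maximum stays in the last bin."""
    return [min(int(percent_rank(values, v) * bins), bins - 1) for v in values]

data = list(range(1, 101))                # made-up data: 1..100
print(equi_width_bin(data)[:12])
print(equi_depth_bin(data)[-12:])
```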
Best approach is to use the built-in method:
http://support.microsoft.com/kb/214269
I think the VBA version of the addin (step 3 with most versions) will also give you the code.
Put this formula in B1:
=MAX( ROUNDUP( PERCENTRANK($A$1:$A$8, A1) *4, 0),1)
Fill the formula down column B and you are done. The formula divides the range into 4 equal buckets and returns the number of the bucket that cell A1 falls into. The first bucket contains the lowest 25% of values.
General pattern is:
=MAX( ROUNDUP ( PERCENTRANK ([Range], [TestCell]) * [NumberOfBuckets], 0), 1)
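To see how the general pattern assigns bucket numbers, here is a small Python sketch. The names bucket and percent_rank are hypothetical, the eight sample values are made up, and percent_rank is again a simplified stand-in for PERCENTRANK (count of smaller values divided by N-1).

```python
import math

def percent_rank(values, x):
    """Simplified PERCENTRANK: share of the other values that are smaller than x."""
    smaller = sum(1 for v in values if v < x)
    return smaller / (len(values) - 1)

def bucket(values, x, buckets=4):
    """MAX(ROUNDUP(rank * buckets, 0), 1): bucket numbers run from 1 to `buckets`."""
    return max(math.ceil(percent_rank(values, x) * buckets), 1)

sample = [3, 8, 12, 15, 21, 30, 42, 50]   # made-up data standing in for A1:A8
print([bucket(sample, v) for v in sample])
```

The MAX(..., 1) step matters only for the minimum, whose percent rank of 0 would otherwise round up to bucket 0.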
You may have to build the matrix to graph.
For the bin bracket you could use =PERCENTILE() for equi-depth and a proportion of the difference =Max(Data) - Min(Data) for equi-width.
You could obtain the frequency with =COUNTIF(). The bin's Mean could be obtained using =SUMPRODUCT((Data>LOWER_BRACKET)*(Data<UPPER_BRACKET)*Data)/frequency
More complex statistics can be obtained by hacking around with SUMPRODUCT and/or array formulas (which I do not recommend, since they are very hard for a non-programmer to comprehend).
