How to generate random values (multinomial?) - excel

I have four groups with individual proportions: p1, p2, p3, p4. Taking a random sample from this, the sum of the four values should equal 20 for each generation.
I have seen how to create four random values that sum to a number, but how do I factor in that they have these initial probabilities of being chosen?
Edit: Maybe I should elaborate on the question. There are four groups of candy being chosen to create a package of 20. The proportions are 20%, 25%, 30%, and 25% of the candies on an assembly line. Candies are selected randomly from this process and placed in packages of 20 pieces. I am to simulate the process of creating 1000 of these packages. In each of the 1000 generations, the four counts should add up to 20.

Just wondering if we are talking about something like the below?
     Perc  =RAND()      =B2*C2       =D2/SUM($D$2:$D$5)  =E2*20
P1   5%    0.168440417  0.008422021  0.026888651         0.537773022
P2   15%   0.23130968   0.034696452  0.110773983         2.215479657
P3   25%   0.424406873  0.106101718  0.338746737         6.774934746
P4   55%   0.2981786    0.16399823   0.523590629         10.47181258
Where the formulas are pasted in row 2, and copied down?
After seeing the edits to your question, I'm thinking maybe the below would be better?
F3:Y1002 has the formula: =RANDBETWEEN(1,20) (one row per package, one column per candy)
A3 has the formula: =COUNTIFS($F3:$Y3,">=1",$F3:$Y3,"<=4") (equivalent of 20%)
B3 has the formula: =COUNTIFS($F3:$Y3,">=5",$F3:$Y3,"<=9") (equivalent of 25%)
C3 has the formula: =COUNTIFS($F3:$Y3,">=10",$F3:$Y3,"<=15") (equivalent of 30%)
D3 has the formula: =COUNTIFS($F3:$Y3,">=16",$F3:$Y3,"<=20") (equivalent of 25%)
Then copy A3:D3 down to A1002:D1002.
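For comparison, the same simulation can be sketched outside Excel. This is only an illustration in Python, assuming the proportions from the question; the names simulate_packages, PROPORTIONS, and the seed value are made up for the example. Each candy is drawn independently with the given weights, so every package automatically sums to 20.

```python
import random
from collections import Counter

# Proportions taken from the question: 20%, 25%, 30%, 25%.
PROPORTIONS = [0.20, 0.25, 0.30, 0.25]
PACKAGE_SIZE = 20
N_PACKAGES = 1000


def simulate_packages(n_packages=N_PACKAGES, size=PACKAGE_SIZE,
                      weights=PROPORTIONS, seed=None):
    """Return one [count_group1, ..., count_group4] list per package."""
    rng = random.Random(seed)  # seed is optional, only for repeatability
    groups = list(range(len(weights)))
    packages = []
    for _ in range(n_packages):
        # Draw `size` candies independently, each with the given weights.
        picks = rng.choices(groups, weights=weights, k=size)
        counts = Counter(picks)
        packages.append([counts[g] for g in groups])
    return packages


packages = simulate_packages(seed=42)
print(packages[:3])  # first three simulated packages
```

Because the counts come from 20 individual draws, each row is one multinomial sample, which is the same idea as counting the RANDBETWEEN results per range.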

Related

Rank order data

I have the loan dataset below:

Sector        Total Units  Bad units  Bad Rate
Retail Trade  16           5          31%
Construction  500          1100       20%
Healthcare    165          55         33%
Mining        3            2          67%
Utilities     56           19         34%
Other         300          44         15%
How can I create a ranking function to sort this data based on the bad_rate while also accounting for the number of units?
E.g. this is the result when I sort in descending order based on bad_rate:

Sector        Total Units  Bad units  Bad Rate
Mining        3            2          67%
Utilities     56           19         34%
Healthcare    165          55         33%
Retail Trade  16           5          31%
Construction  500          1100       20%
Other         300          44         15%

Here, Mining shows up first, but I don't really care about this sector as it only has a total of 3 units. I would like Construction, Other and Healthcare to show up at the top, as they have more total units as well as more bad units.
STEP 1) is easy...
Use SORT("Range","ByColNumber","Order")
Just put it in the top left cell of where you want your sorted data:
=SORT(B3:E8,4,-1)
STEP 2)
Here's the tricky part... you need to decide how to weight the outcome.
Here, I tried multiplying the Rate% by the Total Units rank.
I think this approach gives pretty good results... you just need to play with the formula!
Please let me know what formula you eventually use!
You need to define a sorting criterion, since the priority is not based on a single column but on a combination of columns. I would suggest defining a function that weights both columns: Total Units and Bad Rate. Before applying the weights, we need to normalize both columns, for example by mapping each one to the range 0-100, so that the weighted values are on a similar scale. Once the data is normalized, you can use a criterion like this, where x and y are the normalized columns and w_1, w_2 their weights:
w_1 * x + w_2 * y
This is the main idea. Now to put this logic in Excel: we create an additional temporary variable holding the previous calculation and name it crit. We define a user LAMBDA function SORT_BY for calculating crit as follows:
LAMBDA(a,b, wu*a + wbr*b)
and we use MAP to calculate it with the normalized data. For convenience we define another user LAMBDA function to normalize the data: NORM as follows:
LAMBDA(x, 100*(x-MIN(x))/(MAX(x) - MIN(x)))
Note: The above formula ensures a 0-100 range, but because we are going to use weights, it may be better to use a 1-100 range so that the weight also takes effect for the minimum value. In that case it can be defined as follows:
LAMBDA(x, ( 100*(x-MIN(x)) + (MAX(x)-x) )/(MAX(x)-MIN(x)))
Here is the formula normalizing for 0-100 range:
=LET(wu, 0.6, wbr, 0.8, u, B2:B7, br, D2:D7, SORT_BY, LAMBDA(a,b, wu*a + wbr*b),
NORM, LAMBDA(x, 100*(x-MIN(x))/(MAX(x) - MIN(x))),
crit, MAP(NORM(u), NORM(br), LAMBDA(a,b, SORT_BY(a,b))),
DROP(SORT(HSTACK(A2:D7, crit),5,-1),,-1))
You can customize how to weight each column (via wu for Total Units and wbr for Bad Rates columns). Finally, we present the result removing the sorting criteria (crit) via the DROP function. If you want to show it, then remove this step.
Put the formula in F2 and the sorted table spills from that cell.
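To experiment with the weights outside the worksheet, the normalize-then-weight logic can be sketched in Python. This is an illustration only: the function names min_max_0_100 and rank_sectors are made up, the data is copied from the question, and the sketch assumes each column has at least two distinct values (otherwise the normalization divides by zero).

```python
def min_max_0_100(values):
    """Rescale values linearly so the minimum maps to 0 and the maximum to 100."""
    lo, hi = min(values), max(values)
    return [100 * (v - lo) / (hi - lo) for v in values]


def rank_sectors(rows, w_units=0.6, w_bad_rate=0.8):
    """Sort rows by w_units * norm(total units) + w_bad_rate * norm(bad rate), descending."""
    units = min_max_0_100([row[1] for row in rows])
    rates = min_max_0_100([row[3] for row in rows])
    scored = [(w_units * u + w_bad_rate * b, row)
              for u, b, row in zip(units, rates, rows)]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [row for _, row in scored]


# (sector, total units, bad units, bad rate) as given in the question
rows = [
    ("Retail Trade", 16, 5, 0.31),
    ("Construction", 500, 1100, 0.20),
    ("Healthcare", 165, 55, 0.33),
    ("Mining", 3, 2, 0.67),
    ("Utilities", 56, 19, 0.34),
    ("Other", 300, 44, 0.15),
]

ranked = rank_sectors(rows)
for row in ranked:
    print(row)
```

With weights 0.6 and 0.8 on this data, Mining still scores highest because its normalized bad rate is 100, so the unit weight probably needs to be raised relative to the bad-rate weight to push small sectors down.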

Is there a way to use the offset function in Excel to sum the contribution from multiple cohorts over time?

I am trying to find a formula that will generate the total profit for a number of cohorts that generate a different periodic profit per unit, without having to create a line item for each cohort.
In this example, the profit contributed by each widget over time is shown in row 3, and the number of widgets issued in each cohort is shown vertically in column B. Each unit will contribute $25 in the first period, $60 in the second period, and so on. So year 1 total profit would be 100 x $25 = $2,500. Then in year 2, the Y1 cohort would generate 100 x $60 and the Y2 cohort would generate 200 x $25 for a total year 2 profit of $11,000.
Does someone know of a method in Excel that would work to consolidate the total profit calculation each year into a single formula? I am trying to model multiple line items over many periods, so looking for a more efficient solution.
Edit: In case this helps clarify the question, here is another inefficient way to solve the problem in one line for year 4 total profits, though it is still not scalable:
`Year 4 total profit =
Y1 units issued x P4 profit per unit +
Y2 units issued x P3 profit per unit +
Y3 units issued x P2 profit per unit +
Y4 units issued x P1 profit per unit`
Office 365, in C17:
=SUM(INDEX($B7:$B15,SEQUENCE(COUNT($C3:C3)))*INDEX($C3:C3,SEQUENCE(COUNT($C3:C3),,COUNT($C3:C3),-1)))
and copied right.
Ah well, I've just written an answer compatible with lower versions of Excel:
=MMULT(TRANSPOSE(B7:B15)^0,IF(ROW(B7:B15)-ROW(B7)<=COLUMN(C3:K3)-COLUMN(C3),INDEX(C3:K3,COLUMN(C3:K3)-COLUMN(C3)-(ROW(B7:B15)-ROW(B7))+1)*B7:B15,0))
It could be done a bit more easily in Excel 365 using SEQUENCE() instead of ROW() and COLUMN(), but the principle is the same: generate a 2D matrix by comparing row and column numbers, then obtain its column totals using the standard MMULT method.
I've filled in the intermediate results in C7:K15, but you only need the formula in C17.
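Both formulas compute the same thing: for each period, every cohort issued so far is multiplied by the per-unit profit for its current age, and the products are summed. A small Python sketch of that calculation, using only the two cohorts and two profit values stated in the question (the function name total_profit_by_period is made up for the example):

```python
def total_profit_by_period(units_issued, profit_per_unit):
    """Total profit per period, summed over all cohorts issued so far.

    units_issued[i]    -> units issued in period i (0-based)
    profit_per_unit[k] -> profit one unit earns in its (k+1)-th period of life
    """
    totals = []
    for period in range(len(units_issued)):
        total = 0
        for cohort in range(period + 1):
            age = period - cohort  # periods since this cohort was issued
            if age < len(profit_per_unit):
                total += units_issued[cohort] * profit_per_unit[age]
        totals.append(total)
    return totals


# Y1 = 100 units, Y2 = 200 units; P1 = $25, P2 = $60 (from the question)
print(total_profit_by_period([100, 200], [25, 60]))  # [2500, 11000]
```

The inner loop is the reversed pairing that the SEQUENCE(...,-1) argument produces in the Office 365 formula.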

Summation Changes Based on Result

My objective is to do a sum of previous values based on ranges.
I have an index of 0 to 25. I have a base rate at index 0 to be 0.142857.
The final result is a percentage. I know three rules:
Base Rate * 0.1 is the increment if the result is <30%
Base Rate * 0.02 is the increment if the result is <32%
If the result is above 32%, do nothing else, show it as a hard cap of 32%
The problem is that I hardcoded column C with the increment per index. I need this to be smart enough to know that if the result would go above 30%, it should use the next formula, and if it would go above 32%, it should cap it. Any ideas on how I could do this in Excel or Google Sheets without using scripts or VBA?
Paste in B3 and drag down:
=IF(IF(B2< 30%, B2+($B$2*0.1)) < 30%, B2+($B$2*0.1),
IF(IF(B2< 32%, B2+($B$2*0.02))< 32%, B2+($B$2*0.02),
IF(IF(B2>=32%, B2+($B$2*0.02))>=32%, 32%)))
Cell C2 could be (Google Sheets): =ARRAYFORMULA({TEXT(B2:B27, "00.00%")})
A simpler version that tests only the previous value:
=IF(B2< 30%, B2+($B$2*0.1),
IF(B2< 32%, B2+($B$2*0.02),
IF(B2>=32%, 32%)))
or combo:
=IF(B2< 30%, B2+($B$2*0.1),
IF(IF(B2< 32%, B2+($B$2*0.02))< 32%, B2+($B$2*0.02),
IF(IF(B2>=32%, B2+($B$2*0.02))>=32%, 32%)))
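To check which index each rule kicks in at, the stepping logic can be sketched in Python. This is one reading of the rules in the question, not a definitive one: use the 10% increment while the new value stays below 30%, otherwise use the 2% increment, and never exceed 32%. The names next_rate and rate_table are made up for the example.

```python
BASE_RATE = 0.142857  # value at index 0, from the question


def next_rate(prev, base=BASE_RATE, soft_cap=0.30, hard_cap=0.32):
    """Return the rate for the next index given the previous one."""
    big_step = prev + base * 0.1
    if big_step < soft_cap:
        return big_step  # 10% increment still keeps us under 30%
    small_step = prev + base * 0.02
    return min(small_step, hard_cap)  # 2% increment, capped at 32%


def rate_table(n_indices=26):
    """Rates for index 0 .. n_indices - 1 (0 to 25 by default)."""
    rates = [BASE_RATE]
    for _ in range(n_indices - 1):
        rates.append(next_rate(rates[-1]))
    return rates


rates = rate_table()
for index, rate in enumerate(rates):
    print(index, f"{rate:.2%}")
```

Under this reading the large increment applies through index 11, the small increment from index 12, and the cap is reached at index 19.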

Getting average of top 1/3, second 1/3, and last 1/3 of values in column

I have a column with numbers and a reference column. I'm trying to separate the numbers column into first third, second third, and last third and take the average of each.
Values Ref column
1.7 cow
2.3 cow
2.6 cow
1.8 sheep
1.3 sheep
2.2 sheep
1.5 sheep
1.2 sheep
2.3 sheep
1.5 goose
2.5 goose
So, for example, the average of the first two values for "sheep", second two, and last two. In other words, I want to take the average of each 1/3 of cells adjacent to "sheep".
Add a column to cumulatively count the instances of the word you're looking at, then check that row number in your AVERAGE.
C2 = =CountIf($B$2:$B2, $B2) and fill down => values should be {1,2,3,1,2,3,4,5,6,1,2}
E1 = sheep
E2 = =CountIf($B:$B, $E$1) => 6
E3 = {=Average(If(($B:$B = $E$1) * ($C:$C <= $E$2 / 3), $A:$A))} (note this is an array formula, as designated by the {} around it) => 1.55
E4 = {=Average(If(($B:$B = $E$1) * ($C:$C > $E$2 / 3) * ($C:$C <= 2 * $E$2 / 3), $A:$A))} => 1.85
E5 = {=Average(If(($B:$B = $E$1) * ($C:$C > 2 * $E$2 / 3), $A:$A))} => 1.75
Array formulas are typed the same way as normal formulas (don't include the {}, that gets added automatically), but you confirm them with Ctrl+Shift+Enter instead of Enter.
NB - these look at the entire column. You can speed them up by changing $A:$A to $A$2:$A$12 (likewise for $B:$B and $C:$C). Just bear in mind that for any data you append to this list, you'll need to update the formulas; but you can insert data into the middle of the list and it will update them automatically.
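The same bucketing can be sketched in Python to verify the three results. This is an illustration only; the function name thirds_averages is made up, and a third with no values in it is reported as None.

```python
def thirds_averages(values, refs, target):
    """Average the first, second and last third of the values whose ref equals target.

    Mirrors the helper-column approach: each matching row gets a running
    count (1, 2, 3, ...) and is placed in a third by comparing that count
    with n/3 and 2n/3.
    """
    selected = [v for v, r in zip(values, refs) if r == target]
    n = len(selected)
    buckets = [[], [], []]
    for count, value in enumerate(selected, start=1):
        if count <= n / 3:
            buckets[0].append(value)
        elif count <= 2 * n / 3:
            buckets[1].append(value)
        else:
            buckets[2].append(value)
    return [sum(b) / len(b) if b else None for b in buckets]


values = [1.7, 2.3, 2.6, 1.8, 1.3, 2.2, 1.5, 1.2, 2.3, 1.5, 2.5]
refs = ["cow"] * 3 + ["sheep"] * 6 + ["goose"] * 2

print(thirds_averages(values, refs, "sheep"))  # approximately [1.55, 1.85, 1.75]
```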
With the reference (e.g. sheep) in D2 and the number of the third (1, 2 or 3) in D3, use a formula like this:
=AVERAGE(INDEX(A:A,MATCH($D$2,B:B,0)+(D3-1)*COUNTIF(B:B,$D$2)/3):INDEX(A:A,MATCH($D$2,B:B,0)+((D3)*COUNTIF(B:B,$D$2)/3)-1))
This does require that the ref column be sorted and like references grouped.
This array formula will return the averages even if not sorted:
=AVERAGE(INDEX(INDEX(A:A,N(IF({1},MODE.MULT(IF($B$1:$B$12=$D$2,ROW($A$1:$A$12)*{1,1}))))),N(IF({1},ROW(INDEX(A:A,1+(D3-1)*COUNTIF(B:B,$D$2)/3):INDEX(A:A,D3*COUNTIF(B:B,$D$2)/3))))))
Array formulas need to be confirmed with Ctrl-Shift-Enter instead of Enter when exiting edit mode.
But what if there were 7 sheep values and you wanted a weighted mean (e.g. the first mean would be calculated from the first two sheep plus a third of the third one)?
I have attempted a general solution for this, dividing any number of animals into any number of fractions and finding their average values. My approach is to use the elegant overlap formula from @Barry Houdini and work out the overlap between the intervals (in the case of 7 animals divided into 3):
0 to 2.33
2.33 to 4.67
4.67 to 7
and the numbers of the animals
0 to 1
1 to 2
2 to 3
and so on.
In H4
=IF(ROWS($1:1)<=$H$2,ROWS($1:1)/$H$2*COUNTIF(B$2:B$16,$G$2),"")
In G4
=IF(H4="","",H4-COUNTIF(B$2:B$16,$G$2)/$H$2)
The main formula in I4 is
=IF(H4="","",SUM(TEXT(IF(C$2:C$16<H4,C$2:C$16,H4)-IF((C$2:C$16-1)>G4,C$2:C$16-1,G4),"general;\0")
*A$2:A$16*(B$2:B$16=$G$2))/(COUNTIF(B$2:B$16,$G$2)/$H$2))
entered as an array formula.
The fractions can be changed to halves, quarters etc. by changing the number in H2.
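The overlap idea can also be sketched in Python. This is a hedged illustration, not a translation of the formula above: item i is treated as occupying the interval from i to i+1, each fraction covers an interval of width n/parts, and every value is weighted by how much of its interval falls inside the fraction. The name fractional_means is made up for the example.

```python
def fractional_means(values, parts):
    """Weighted mean of each of `parts` equal-width slices of `values`.

    Item i occupies the interval [i, i + 1]; a slice covers
    [p * width, (p + 1) * width] with width = len(values) / parts.
    """
    n = len(values)
    width = n / parts
    means = []
    for p in range(parts):
        lo, hi = p * width, (p + 1) * width
        total = 0.0
        for i, value in enumerate(values):
            # Length of the intersection of [i, i + 1] with [lo, hi].
            overlap = max(0.0, min(i + 1, hi) - max(i, lo))
            total += overlap * value
        means.append(total / width)
    return means


# The six sheep values from the question split evenly into thirds.
print(fractional_means([1.8, 1.3, 2.2, 1.5, 1.2, 2.3], 3))
```

When the count divides evenly, as with the six sheep, this reduces to the plain averages of each third; with 7 values the third item contributes one third of its value to the first mean and two thirds to the second.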

Compare percentage value against decimal Excel

I am a bit stumped with this issue, I was wondering if anyone could suggest a solution. In Excel I have a table which looks like this:
1 2 3 4 5 Result Score
80% 85% 90% 95% 100% 92.5% 3.50
What I am trying to calculate is the proportional score, based on where the result falls within the preset 1-5 score scale.
Thanks.
In your case where each increment is 5% you could use a simple calculation like
=MAX(0,F2-75%)*20
[where result is in F2]
....but assuming that you want to interpolate the score given potentially less linear values in your table try this formula where your table is in A1:E2
=LOOKUP(F2,A2:E2,A1:E1+(F2-A2:D2)*(B1:E1-A1:D1)/(B2:E2-A2:D2))
For linear interpolation, this would be the general formula; just name the ranges or replace the names with cell references:
= (perc - minperc) / (maxperc - minperc) * (maxscore - minscore) + minscore
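The general formula can be checked with a short Python sketch that first finds the bracketing pair of thresholds and then applies it. The function name interpolate_score is made up, and clamping results outside the table to the first or last score is an assumption of this sketch, not something stated in the answers.

```python
def interpolate_score(result, thresholds, scores):
    """Piecewise-linear interpolation of a score from a threshold table.

    thresholds must be increasing and the same length as scores.
    Results outside the table are clamped to the end scores (assumption).
    """
    if result <= thresholds[0]:
        return scores[0]
    if result >= thresholds[-1]:
        return scores[-1]
    for i in range(len(thresholds) - 1):
        min_perc, max_perc = thresholds[i], thresholds[i + 1]
        if min_perc <= result <= max_perc:
            min_score, max_score = scores[i], scores[i + 1]
            # (perc - minperc) / (maxperc - minperc) * (maxscore - minscore) + minscore
            return ((result - min_perc) / (max_perc - min_perc)
                    * (max_score - min_score) + min_score)


thresholds = [0.80, 0.85, 0.90, 0.95, 1.00]
scores = [1, 2, 3, 4, 5]
print(interpolate_score(0.925, thresholds, scores))  # approximately 3.5
```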
