Front Loaded and Back Loaded | Normal Distribution Column Chart and S Curves in Excel - excel

Most of us are familiar with normal distribution curves. For those who are new to front-loaded and back-loaded distributions, I will give some background first and then state my problem.
Front-Loaded Distribution: As demonstrated below, it has a rapid start. For example, when more resources are assumed to be consumed early in a project, cost/hours are distributed aggressively at the start of the project.
Back-Loaded Distribution: Contrary to the front-loaded distribution, it starts out with a shallow slope and becomes increasingly steep towards the end of the project. For example, when most resources are assumed to be consumed late in the project.
In the above charts, the green line is the S-curve, which represents the cumulative distribution (utilization of resources over the proposed time), and the blue columns represent the isolated distribution of resources (cost/hours) in each period.
For reference, I am providing the Bell Curve / standard normal distribution (when Mean=Median) chart (below) and the associated formula to begin with.
Problem Statement: I was able to generate the normal distribution curve (see below, with formulae); however, I am unable to find a solution for front-loaded or back-loaded curves.
How can I skew a normal distribution to the right (front-loaded / positively skewed, meaning the mean is greater than the median) or to the left (back-loaded / negatively skewed, meaning the mean is less than the median)?
Formula Explained:
Cell B8 holds an arbitrarily chosen number of standard deviations at which the distribution starts. It determines how much of the tails is included and therefore how peaked the chart looks. In the above screenshot, I chose the range of the normal distribution to be from -3 SD to +3 SD.
Cells B9 to B18 hold the evenly spaced Z-scores, using the formula:
=B8-((2*$B$8)/Period)
Cells C9 to C18 hold the share of the Amount that falls in each period, based on the Z-scores, using the formula:
=(NORMSDIST(B9)-NORMSDIST(B8))*Amount/(1-2*NORMSDIST($B$8))
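For readers who want to check the two formulas outside Excel, here is a minimal Python sketch of the same calculation. The function name normal_buckets and its parameters are my own; it assumes B8 holds -3, that Period is the number of buckets, and that Amount is the total to distribute.

```python
from statistics import NormalDist


def normal_buckets(amount, periods, z_limit=3.0):
    """Split `amount` over `periods` buckets spanning -z_limit..+z_limit SD."""
    std = NormalDist()  # standard normal, the counterpart of NORMSDIST
    step = 2 * z_limit / periods  # same spacing as =B8-((2*$B$8)/Period)
    edges = [-z_limit + i * step for i in range(periods + 1)]
    # Probability mass that lies inside the chosen range, i.e. 1-2*NORMSDIST(B8)
    covered = 1 - 2 * std.cdf(-z_limit)
    return [
        (std.cdf(edges[i + 1]) - std.cdf(edges[i])) * amount / covered
        for i in range(periods)
    ]


if __name__ == "__main__":
    for period, value in enumerate(normal_buckets(1000.0, 10), start=1):
        print(period, round(value, 2))
```

Dividing by the covered mass is what makes the buckets add up to the full Amount even though the tails beyond the chosen range are cut off.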
Update: Following one of the links in the comments, the closest I got is the situation below. The issue is highlighted with the yellow pattern: because of the volatile RAND() function, the charts are not as smooth as they should be. Since my formula above does not create a zigzag pattern, I am sure a skewed normal distribution can be smooth too!
Note:
I am using Excel 2016, so any newly introduced function that solves my problem is welcome. I am also open to using UDFs.
The numbers in the front-loaded and back-loaded distributions are notional and could vary. I am only interested in the shape of the resulting chart.
Kindly help!

You can generate the curve using the methods below and use the numbers they produce for your requirement.
With formulae
The curve
Notes:
If you want to change the bins, you have to drag the cells down or up in order to complete the series.
If you want to change the total cost, you can change the multiplier.
If you want to change the tilt of the curve, you can change the divider in column C, which is currently set to 2. If it is -2, the tilt changes direction. You can experiment with different numbers; the direction depends on whether the divider is less than or greater than zero.
For copy-paste:
=A2+180/($G$3-1)
=RADIANS(A2)
=$G$4*SIN(B2 + SIN(B2)/2)
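The three formulas above can be sketched in Python to see how the divider tilts the curve. This is only an illustration: the names tilted_curve and cumulative are mine, and I am assuming $G$3 is the number of bins and $G$4 is the multiplier.

```python
import math


def tilted_curve(n_bins, multiplier, divider=2.0):
    """Column values multiplier*SIN(x + SIN(x)/divider) for x from 0 to 180 degrees."""
    step = 180.0 / (n_bins - 1)  # same increment as =A2+180/($G$3-1)
    values = []
    for i in range(n_bins):
        x = math.radians(i * step)  # =RADIANS(A2)
        values.append(multiplier * math.sin(x + math.sin(x) / divider))
    return values


def cumulative(values):
    """Running total of the column values, i.e. the S-curve."""
    total = 0.0
    out = []
    for v in values:
        total += v
        out.append(total)
    return out


if __name__ == "__main__":
    front = tilted_curve(11, 100.0, divider=2.0)   # peak before the midpoint
    back = tilted_curve(11, 100.0, divider=-2.0)   # peak after the midpoint
    print([round(v, 1) for v in front])
    print([round(v, 1) for v in back])
    print([round(v, 1) for v in cumulative(front)])
```

With a positive divider the peak lands before the midpoint, and with a negative divider it lands after it, which matches the note about the sign of the divider.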

I used the actual mathematical formulas to arrive at the result, which looks to me like what you wanted to achieve. The orange cells in the 'Skewed' section are the ones you can change to vary the degree and direction of skew. Some pictures for demonstration are below, followed by the formulas used.
Formulas in row 5, column
B:=(A5*$A$2)+0 (0 is the mean, you can change as you like)
C:=(1/($A$2* SQRT(2*PI())))*EXP(-(B5^2)/2)
D:=0.5*(1+ERF(B5/SQRT(2)))
E:=$A$1*C5
F: =(A5*$A$2*(1+$F$2*SIN((F4*PI())/(2*$F$4))))+0 (0 is the mean, you can change as you like)
G:=(1/($A$2* SQRT(2*PI())))*EXP(-((F5+$G$2)^2)/2)
H:=0.5*(1+ERF((B5+$G$2)/SQRT(2)))
I:=$A$1*G5

If you want to make sure the bins always have a value in them, you can use the following approach, which uses normal distributions and simply changes the mean and the standard deviation to get a curve that you want.
Changing the mean moves the peak to the left or right. Changing the standard deviation makes the quantities more uniform or more variable. I've used 0-1000 as my default range in the example below, but it should be easy to modify the formula to use any range you want. NOTE: to fulfill your requirement that all bins must be non-zero, you need to adjust the numbers manually until you get a curve that suits.
Yellow cells are for data entry, green cells are a count (so if you add bins, they would need to be numbered according to the sequence).
Formula in cell B7 (copied down to cell B16):
=NORMDIST($A7*1000/MAX($A$6:$A$17),$B$3,$B$4,TRUE)-NORMDIST($A6*1000/MAX($A$6:$A$17),$B$3,$B$4,TRUE)
Formula in cell C7 (copied down to cell C16):
=IF(A7=MAX($A$6:$A$17),$C$5-SUM(C$6:C6),ROUND(B7/SUM($B$7:$B$17)*$C$5,0))
Adding new bins is simple enough and is still based on a 0-1000 range, so you don't need to change any numbers other than adding rows and copying down the formulae:
The above example is also showing how a narrow standard deviation and a high mean combine to make the starting bins have very little quantity. But there is still a value (as long as count is big enough).
You may want to pre-define the different skewness selections if this is going to be used by other people (make column B dependent on a lookup, for example) but hopefully this is extensible enough for your needs.
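As a rough cross-check of the two formulas, the same logic can be sketched in Python. The function name bin_counts and its arguments are my own; mean, sd and total stand in for cells B3, B4 and C5, and the range is assumed to be 0-1000 as in the example. Note that Python's round() rounds exact halves to the nearest even number while Excel's ROUND rounds them away from zero, so individual bins can differ by one in rare cases.

```python
from statistics import NormalDist


def bin_counts(total, n_bins, mean, sd, scale=1000.0):
    """Spread an integer `total` over bins using differences of the normal CDF."""
    dist = NormalDist(mean, sd)
    edges = [i * scale / n_bins for i in range(n_bins + 1)]
    # Column B: NORMDIST(upper edge, ..., TRUE) - NORMDIST(lower edge, ..., TRUE)
    weights = [dist.cdf(edges[i + 1]) - dist.cdf(edges[i]) for i in range(n_bins)]
    weight_sum = sum(weights)
    # Column C: rounded share for every bin except the last one
    counts = [round(w / weight_sum * total) for w in weights[:-1]]
    # The last bin takes whatever is left, so the counts always add up to total
    counts.append(total - sum(counts))
    return counts


if __name__ == "__main__":
    print(bin_counts(500, 10, mean=300, sd=200))  # peak in the early bins
    print(bin_counts(500, 10, mean=700, sd=200))  # peak in the late bins
```

Giving the remainder to the last bin is what keeps the total exact after rounding, which is the purpose of the IF branch in the column C formula.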

If you are open to a Python answer, then I can give you code that uses the Python pandas library to generate random observations from a skewed normal distribution and then bin (bucket) them for you. The following is a Python script that captures the use case; it is registered as a COM server, so the class can also be created from VBA.
import numpy as np
import pandas as pd
from scipy.stats import skewnorm


class PythonSkewedNormal(object):
    _reg_clsid_ = "{1583241D-27EA-4A01-ACFB-4905810F6B98}"
    _reg_progid_ = 'SciPyInVBA.PythonSkewedNormal'
    _public_methods_ = ['GeneratePopulation', 'BinnedSkewedNormal']

    def GeneratePopulation(self, a, sz):
        # https://docs.scipy.org/doc/numpy-1.15.1/reference/generated/numpy.random.seed.html
        np.random.seed(10)
        # https://docs.scipy.org/doc/scipy-0.19.1/reference/generated/scipy.stats.skewnorm.html
        return skewnorm.rvs(a, size=sz).tolist()

    def BinnedSkewedNormal(self, a, sz, bins):
        # https://docs.scipy.org/doc/numpy-1.15.1/reference/generated/numpy.random.seed.html
        np.random.seed(10)
        # https://docs.scipy.org/doc/scipy-0.19.1/reference/generated/scipy.stats.skewnorm.html
        pop = skewnorm.rvs(a, size=sz)  # .tolist()
        bins2 = np.array(bins)
        bins3 = pd.cut(pop, bins2)
        table = pd.value_counts(bins3, sort=False)
        table.index = table.index.astype(str)
        return table.reset_index().values.tolist()


if __name__ == '__main__':
    print("Registering COM server...")
    import win32com.server.register
    win32com.server.register.UseCommandLine(PythonSkewedNormal)
And the VBA client code
Option Explicit

Sub TestPythonSkewedNormal()
    Dim skewedNormal As Object
    Set skewedNormal = CreateObject("SciPyInVBA.PythonSkewedNormal")

    Dim lSize As Long
    lSize = 100

    Dim shtData As Excel.Worksheet
    Set shtData = ThisWorkbook.Worksheets.Item("Sheet3") '<--- change sheet to your circumstances
    shtData.Cells.Clear

    Dim vBins
    vBins = Array(-5, -4, -3, -2, -1, 0, 1, 2, 3, 4, 5)
    'Stop

    Dim vBinnedData
    vBinnedData = skewedNormal.BinnedSkewedNormal(-5, lSize, vBins)

    Dim rngData As Excel.Range
    Set rngData = shtData.Cells(2, 1).Resize(UBound(vBins) - LBound(vBins), 2)
    rngData.Value2 = vBinnedData
    'Stop
End Sub
Sample output
(-5, -4] 0
(-4, -3] 0
(-3, -2] 4
(-2, -1] 32
(-1, 0] 57
(0, 1] 7
(1, 2] 0
(2, 3] 0
(3, 4] 0
(4, 5] 0
Original code deposited on my blog
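The counts above come from random draws, so they will not form a perfectly smooth curve. As a possible alternative, here is a minimal sketch that uses the skew-normal density directly instead of sampling. It is not part of the COM server above and does not use scipy: the density 2*phi(x)*Phi(a*x) is written by hand, each bucket is integrated with a simple midpoint rule, and the function names, the -3 to 3 range and the step count are my own assumptions.

```python
from statistics import NormalDist

_STD = NormalDist()  # standard normal used for phi and Phi


def skew_normal_pdf(x, a):
    """Skew-normal density with shape parameter a: 2 * phi(x) * Phi(a * x)."""
    return 2.0 * _STD.pdf(x) * _STD.cdf(a * x)


def skewed_buckets(amount, periods, a, lo=-3.0, hi=3.0, steps=200):
    """Deterministic bucket amounts from midpoint-rule integration of the density."""
    width = (hi - lo) / periods
    h = width / steps
    masses = []
    for i in range(periods):
        start = lo + i * width
        mass = sum(skew_normal_pdf(start + (j + 0.5) * h, a) for j in range(steps)) * h
        masses.append(mass)
    total = sum(masses)
    # Rescale so the buckets add up to the full amount despite the cut-off tails
    return [amount * m / total for m in masses]


if __name__ == "__main__":
    print([round(v, 1) for v in skewed_buckets(100.0, 10, a=-5)])
    print([round(v, 1) for v in skewed_buckets(100.0, 10, a=5)])
```

The sign of a decides on which side of the range the bulk of the amount sits, and a = 0 gives back the symmetric bell curve.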

Based on #usmanhaq's answer, I made a VBA macro for distribution curve simulation, corrected so that the front-loaded and back-loaded curves scale to 100%.
The code is in the linked GitHub library.

Related

Excel: How to find closest number in table, many times

Excel
Need to find nearest float in a table, for each integer 0..99
https://www.excel-easy.com/examples/closest-match.html explains a great technique for finding the CLOSEST number from an array to a constant cell.
I need to perform this for many values (specifically, find nearest to a vertical list of integers 0..99 from within a list of floats).
Array formulas don't allow the compare-to value (the integer) to change as we move down the list of integers; they treat it like a constant location.
I tried Tables, referring to the integers (works) but the formula from the above web site requires an Array operation (F2, control shift Enter), which are not permitted in Tables. Correction: You can enter the formula, control-enter the array function for one cell, copy the formulas, then insert table. Don't change the search cell reference!
Update:
I can still use array operations, but I manually have to copy the desired function into each 100 target cells. No biggie.
Fixed typo in formula. See end of question for details about "perfection".
Example code:
AI4=some integer
AJ4=MATCH(MIN(ABS(Table[float_column]-AI4)), ABS(Table[float_column]-AI4), 0)
repeat for subsequent integers in AI5...AI103
Example data:
0.1 <= matches 0
0.5
0.95 <= matches 1
1.51 <= matches 2
2.89
Consider the case where target=5, and 4.5, 5.5 exist in the list. One gives -0.5 and the other +0.5. Searching for ABS(-.5) will give the first one. Either one is decent, unless your data is non-monotonic.
This still needs a better solution.
Thanks in advance!
I had another problem, which pushed to a better solution.
Specifically, since the Y values for the X that I am interested in can be at varying distances in X, I will interpolate X between the X points before and after. I.e., search for less than or equal, also greater than or equal, interpolate the desired X, then interpolate the Y values.
I could go a step further and interpolate N - 1 to N + 1, which will give cleaner results for noisy data.
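To make the behavior of the MATCH(MIN(ABS(...))) formula concrete, here is a small Python sketch of the same lookup. The function name closest_index is hypothetical; it returns a 1-based position like MATCH, and on a tie it returns the first candidate, as described for the 4.5 / 5.5 case.

```python
def closest_index(values, target):
    """1-based position of the value nearest to target; ties go to the first one."""
    best = min(range(len(values)), key=lambda i: abs(values[i] - target))
    return best + 1


if __name__ == "__main__":
    floats = [0.1, 0.5, 0.95, 1.51, 2.89]
    for integer in range(3):
        position = closest_index(floats, integer)
        print(integer, position, floats[position - 1])
```

Because the target is a plain function argument, it can change for every row, which is the part that was awkward to express with a single array formula.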

cplex/opl model - 4 index parameter - data sheet connection with excel

I'm a total beginner with CPLEX and OPL, so maybe you can help me with the coding of a mixed integer programming model.
In my case, I have an optimization function that includes a transportation cost parameter, which is specific to the starting point (Hubs h), the destination (DCs i), the transported good (Products k) and the mode of transportation (TransportOptions r) used.
I wrote it like this:
float transportC_Hub_DC[Hubs][DCs][Products][TransportOptions] = ...;
//transport cost of one unit of good k from starting point h to destination i using transportation option r
I would like to fill this array with its multiple dimensions from an Excel spreadsheet. At the moment my spreadsheet has the four indexes in separate columns and the specific transportation cost in another column. It looks like this:
Excel Datasheet
My problem is that I do not know how to make the programme understand how the transportation cost data are ordered. How does the programme know that in the first cell of the column "transportation cost" is the cost for the specific combination of the different indexes? So how do I tell the programme that I used h=1, i=1, k=1, r=1 in the first cell and h=1, i=1, k=1, r=2 in the second cell and not h=1, i=1, k=2, r=1 in the second cell? What do I have to write in the model or the data file in CPLEX to make this clear?
See technote http://www-01.ibm.com/support/docview.wss?rs=0&context=SSCMS55&uid=swg21401340&loc=en_US&cs=utf-8&cc=us&lang=all
The idea is to read a tuple set and then turn your tuple set into a 4D array.
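The idea can be illustrated independently of OPL syntax. The sketch below is Python, not OPL, and the sample rows are made up: each row carries its own four indexes next to the cost, so the order of the rows in the sheet does not matter when the 4-index table is built.

```python
def build_cost_table(rows):
    """Turn (hub, dc, product, option, cost) rows into a table keyed by the 4 indexes."""
    cost = {}
    for hub, dc, product, option, value in rows:
        cost[(hub, dc, product, option)] = value
    return cost


if __name__ == "__main__":
    # Hypothetical rows as they might be read from the spreadsheet
    rows = [
        (1, 1, 1, 1, 4.0),
        (1, 1, 1, 2, 5.5),
        (1, 1, 2, 1, 3.2),
    ]
    table = build_cost_table(rows)
    print(table[(1, 1, 2, 1)])
```

In OPL the rows would be read into a tuple set and the array filled from it, as the technote describes; the dictionary here only plays the role of that array.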

Excel Calculating Relative Position

I'm new here, and I thought I would ask a question that certainly isn't found in the Microsoft Help Center and that I haven't been able to find a solution to either.
I am trying to calculate probability on things, and for the most part, Excel is very helpful in it. I'm running into problems though as I add additional variables.
My sheet currently is comparing dice rolls of 4 8-sided dice. 2 dice have certain symbols and the two other dice have different symbols. Some symbols negate each other, and in the end I come to a damage output number. When comparing 2 or 3 dice, the possible combinations are limited. 3 dice having 512 possibilities. With 4 now, there are 4096 possibilities and it's only going to get higher. This is why I need what I'm asking for.
Is there a way for a cell to understand its current position in reference to the block of cells it's currently in?
For example: I'm calculating a reroll possibility, but it will only happen half the time, meaning there are 12 possibilities of a single die with reroll option. So the current possibility table I'm developing is going to be 96 separate tables of 96 possible outcomes each. Table 1/1 is going to compare the first row of the 2 dice Attack roll table vs. the first row of the 2 dice Defense roll table. Row 1 Column 1 of this table is going to give the outcome of R1C1 of Attack table vs. R1C1 of Defense table. R1C2 of the table is going to give the outcome of R1C1 of Attack table vs. R1C2 of Defense table. R2C1 of this table will give the outcome of R1C2 of Attack table vs. R1C1 of Defense table, etc...
I know how to do the referencing to the tables, so I've made it so once I build one table, I can copy and paste it to build the other 96. But as I compare more dice rolls, this will quickly become too cumbersome to handle. If there is a way for a cell to understand where its relative position is in a given block of cells (i.e. R2C1 of my example table understands that it is R2C1), it would cut down on my load immensely, and allow me to continue building these probability tables so I can better understand tradeoffs in certain areas.
Any help is greatly appreciated.
Here's an Excel UDF I wrote for basic dice probability calculation. It may not work directly for your example with negative/conditional outcomes, but it does have flexibility for testing more than one die and the number of sides on the dice, so it might inspire you with some ideas. As previous comments suggested, if you gave exact parameters, you could probably get a specific example. My example returns the probability as a fraction (format the cell as a percentage). It currently only measures the probability of a single outcome, but you could use more than one call per cell, =DiceRollOdds(3,2)+DiceRollOdds(4,2) (to measure the probability of 3 or 4), or you can modify the code to get something more specific.
Function DiceRollOdds(OutcomeToCheck As Integer, NumberOfDice As Integer, Optional SidesOnDice As Integer) As Double
    ' Long counters avoid Integer overflow once there are more than 32,767 combinations
    Dim SuccessResult As Long, FailedResult As Long, SingleDice As Integer, RollResult As Long
    ' Default to six-sided dice when SidesOnDice is omitted
    If SidesOnDice = 0 Then
        SidesOnDice = 6
    End If
    Dim Rolls As Long
    ' Enumerate every combination; each base-SidesOnDice digit of Rolls is one die
    For Rolls = 1 To (SidesOnDice ^ NumberOfDice)
        RollResult = 0
        For SingleDice = 0 To NumberOfDice - 1
            RollResult = Int(Rolls / SidesOnDice ^ SingleDice) Mod SidesOnDice + 1 + RollResult
        Next SingleDice
        If RollResult = OutcomeToCheck Then
            SuccessResult = SuccessResult + 1
        Else
            FailedResult = FailedResult + 1
        End If
    Next Rolls
    DiceRollOdds = SuccessResult / (FailedResult + SuccessResult)
End Function
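For comparison, the same brute-force enumeration can be sketched in Python with itertools.product. The name dice_roll_odds is my own and, like the UDF, it only handles the plain sum of the dice, not symbols that negate each other.

```python
from itertools import product


def dice_roll_odds(outcome, number_of_dice, sides=6):
    """Probability that the dice sum to `outcome`, by listing every combination."""
    faces = range(1, sides + 1)
    hits = sum(1 for roll in product(faces, repeat=number_of_dice) if sum(roll) == outcome)
    return hits / sides ** number_of_dice


if __name__ == "__main__":
    print(dice_roll_odds(7, 2))
    print(dice_roll_odds(3, 2) + dice_roll_odds(4, 2))
    print(dice_roll_odds(18, 4, sides=8))
```

Each tuple produced by product is one row of the full outcome table, so a custom scoring rule could replace sum(roll) without changing the enumeration.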

Using MS Excel to simulate a formula using a variable?

I'm very confused on how to use Microsoft Excel to simulate a "problem", but I've been assured that it's possible.
I have the equation
v(t) = (mg/c)(1-e^((-c/m)(t)))
And I know the values of m, g, k, and c.
m = 170
g = 32 ft/s^2
k = 2.5 lb/ft
c = 1.2 lb/ft/s
So my formula changes into
v(t) = (170*32/1.2)(1-e^((-1.2/170)(t)))
v(t) = (4533.33)(1-e^((-.00706)(t)))
The problem is about a bungee jumper, and this is one function that I should use to find velocity, and another that is used for x (distance), but if I can learn how to properly implement this one, I should be able to easily figure out the other one.
I need to somehow implement this in Excel, as a spreadsheet simulation. I have no idea how to implement this in Excel, and I don't know the formulas to do it. I know I could just go through the formula manually and just substitute variables in for t (i.e., .5, 1, 1.5, 2, 2.5, ...), but I know there's supposed to be some way for Excel to do it for me. Additionally, I'm not sure how to simplify the powers and the "e" in my formula, and I actually don't know if I need it if I can just sub in variables like I think I can. Any help would be greatly appreciated.
EDIT: The other state equation, x(t), is below
x(t) = (mg/c)(t) + ((m^2 * g) / c^2)e^((-c/m)(t)) - (m^2 * g / c^2)
This formula as mentioned in the OP:
v(t) = (4533.33)(1-e^((-.00706)(t)))
needs a little adaptation, as suggested by #Tim Williams, to be suitable for Excel:
=4533.33*(1-EXP(-.00706*t))
Excel does not multiply letters by numbers but will attempt to interpret t above as a named range (which may contain one or more numbers) before objecting. So t may be the name given to a range starting at .5 and stepping by .5 up to and including 10 (which may easily be created with Fill, Series…).
If the above formula is then placed in the same row as 0.5 and copied down to suit the results should be as required.
It may however be worth noting that naming a range as a single letter is not best practice and for accuracy, convenience and versatility the constants (eg re gravity) and the variables (eg mass) would be better fed as parameters to the formula.
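To illustrate the point about feeding the constants in as parameters, here is a minimal Python sketch that tabulates v(t) for t = 0.5, 1, ..., 10. The function names are my own and the default values are simply the ones given in the question.

```python
import math


def velocity(t, m=170.0, g=32.0, c=1.2):
    """v(t) = (m*g/c) * (1 - e^(-(c/m)*t)), with the constants passed as parameters."""
    return (m * g / c) * (1.0 - math.exp(-(c / m) * t))


def velocity_table(t_max=10.0, step=0.5, **params):
    """List of (t, v(t)) pairs for t = step, 2*step, ..., t_max."""
    n = int(round(t_max / step))
    return [(i * step, velocity(i * step, **params)) for i in range(1, n + 1)]


if __name__ == "__main__":
    for t, v in velocity_table():
        print(t, round(v, 2))
```

Changing a constant, for example velocity_table(m=150.0), recomputes the whole table, which is the same benefit as keeping m, g and c in their own cells in the spreadsheet.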

Binning in Excel

Which formulae in MS Excel can we use for -
equi-depth binning
equi-width binning
Here's what I used. The data I was binning was in A2:A2001.
Equi-width:
I calculated the width in a separate cell (U2), using this formula:
=(MAX($A$2:$A$2001) - MIN($A$2:$A$2001) + 0.00000001)/10
10 is the number of bins. The + 0.00000001 is there because without it, values equal to the maximum were getting put into their own bin.
Then, for the actual binning, I used this:
=ROUNDDOWN(($A2-MIN($A$2:$A$2001))/$U$2, 0)
This function is finding how many bin-widths above the minimum your value is, by dividing (value - minimum) by the bin width. We only care about how many full bin-widths fit into the value, not fractional ones, so we use ROUNDDOWN to chop off all the fractional bin-widths (that is, show 0 decimal places).
Equi-depth
This one is simpler.
=ROUNDDOWN(PERCENTRANK($A$2:$A$2001, $A2)*10, 0)
First, get the percentile rank of the current cell ($A2) out of all the cells being binned ($A$2:$A$2001). This will be a value between 0 and 1, so to convert it into bins, just multiply by the total number of bins you want (I used 10). Then, chop off the decimals the same way as before.
For either of these, if you want your bins to start at 1 rather than 0, just add a +1 to the end of the formula.
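Both formulas can be sketched in Python to see which bin each value lands in. The function names are mine, and the percent rank is assumed to be (number of values below x) / (n - 1), which is how I understand PERCENTRANK for values that are in the list. The clamp in the equi-depth version is my own addition, so that the maximum does not end up alone in an extra bin.

```python
import math


def equi_width_bins(values, n_bins):
    """Bin index 0..n_bins-1 by how many full bin-widths a value lies above the minimum."""
    lo, hi = min(values), max(values)
    width = (hi - lo + 0.00000001) / n_bins  # small epsilon keeps the maximum in the last bin
    return [math.floor((v - lo) / width) for v in values]


def equi_depth_bins(values, n_bins):
    """Bin index 0..n_bins-1 from the percent rank of each value."""
    n = len(values)
    bins = []
    for v in values:
        rank = sum(1 for other in values if other < v) / (n - 1)
        # Clamp (an assumption, not in the Excel formula): rank 1.0 stays in the top bin
        bins.append(min(math.floor(rank * n_bins), n_bins - 1))
    return bins


if __name__ == "__main__":
    data = [1, 2, 3, 4, 50, 60, 70, 80, 900, 1000]
    print(equi_width_bins(data, 5))
    print(equi_depth_bins(data, 5))
```

On skewed data like this, equi-width puts most values into the first bin, while equi-depth gives every bin roughly the same number of values.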
Best approach is to use the built-in method:
http://support.microsoft.com/kb/214269
I think the VBA version of the addin (step 3 with most versions) will also give you the code.
Put this formula in B1:
=MAX( ROUNDUP( PERCENTRANK($A$1:$A$8, A1) *4, 0),1)
Fill down the formula all across B column and you are done. The formula divides the range into 4 equal buckets and it returns the bucket number which the cell A1 falls into. The first bucket contains the lowest 25% of values.
General pattern is:
=MAX( ROUNDUP ( PERCENTRANK ([Range], [TestCell]) * [NumberOfBuckets], 0), 1)
You may have to build the matrix to graph.
For the bin bracket you could use =PERCENTILE() for equi-depth and a proportion of the difference =Max(Data) - Min(Data) for equi-width.
You could obtain the frequency with =COUNTIF(). The bin's Mean could be obtained using =SUMPRODUCT((Data>LOWER_BRACKET)*(Data<UPPER_BRACKET)*Data)/frequency
More complex statistics could be obtained by hacking around with SUMPRODUCT and/or array formulas (which I do not recommend, since they are very hard for a non-programmer to comprehend).
