What is this formula trying to prove? - excel-formula

I have a large spreadsheet with a number of formulas and they all make complete sense apart from one, which is listed below. Does anyone have any idea what this NORMDIST calculation is trying to achieve or tell me? It has relevance to HE.
=MAX(1,NORMDIST(3,N18,N18/4,TRUE)-NORMDIST(0,N18,N18/4,TRUE) + 2*(NORMDIST(6,N18,N18/4,TRUE)-NORMDIST(3,N18,N18/4,TRUE)) + 3*(NORMDIST(9,N18,N18/4,TRUE)-NORMDIST(6,N18,N18/4,TRUE)) + 4*(NORMDIST(12,N18,N18/4,TRUE)-NORMDIST(9,N18,N18/4,TRUE)) + 5*(NORMDIST(15,N18,N18/4,TRUE)-NORMDIST(12,N18,N18/4,TRUE)) + 6*(NORMDIST(18,N18,N18/4,TRUE)-NORMDIST(15,N18,N18/4,TRUE)) + 7*(NORMDIST(21,N18,N18/4,TRUE)-NORMDIST(18,N18,N18/4,TRUE)) + 8*(NORMDIST(24,N18,N18/4,TRUE)-NORMDIST(21,N18,N18/4,TRUE)) + 9*(NORMDIST(27,N18,N18/4,TRUE)-NORMDIST(24,N18,N18/4,TRUE)) + 10*(NORMDIST(30,N18,N18/4,TRUE)-NORMDIST(27,N18,N18/4,TRUE)) + 11*(NORMDIST(33,N18,N18/4,TRUE)-NORMDIST(30,N18,N18/4,TRUE)) + 12*(NORMDIST(36,N18,N18/4,TRUE)-NORMDIST(33,N18,N18/4,TRUE)) + 13*(NORMDIST(39,N18,N18/4,TRUE)-NORMDIST(36,N18,N18/4,TRUE)) + 14*(NORMDIST(42,N18,N18/4,TRUE)-NORMDIST(39,N18,N18/4,TRUE)) + 15*(NORMDIST(45,N18,N18/4,TRUE)-NORMDIST(42,N18,N18/4,TRUE)) + 16*(NORMDIST(48,N18,N18/4,TRUE)-NORMDIST(45,N18,N18/4,TRUE)) + 17*(NORMDIST(51,N18,N18/4,TRUE)-NORMDIST(48,N18,N18/4,TRUE)) + 18*(NORMDIST(54,N18,N18/4,TRUE)-NORMDIST(51,N18,N18/4,TRUE)) + 19*(NORMDIST(57,N18,N18/4,TRUE)-NORMDIST(54,N18,N18/4,TRUE)) + 20*(NORMDIST(60,N18,N18/4,TRUE)-NORMDIST(57,N18,N18/4,TRUE)) + 21*(NORMDIST(63,N18,N18/4,TRUE)-NORMDIST(60,N18,N18/4,TRUE)) + 22*(NORMDIST(66,N18,N18/4,TRUE)-NORMDIST(63,N18,N18/4,TRUE)) + 23*(NORMDIST(69,N18,N18/4,TRUE)-NORMDIST(66,N18,N18/4,TRUE)))
Strange question I know, but I could not think of where else to ask!
Cheers

The equation has a series of terms of the form N*[NORMDIST(3N,mu,sigma)-NORMDIST(3N-3,mu,sigma)], where mu is the mean (N18 in the equation), sigma is the standard deviation (N18/4), and N runs from 1 to 23. Each bracketed difference is the probability that a normal variable X falls in the interval (3N-3, 3N], so the whole sum is the expected value of CEILING(X/3): an estimate involving the average of the normal distribution. It would be more rigorous for N to run from minus infinity to plus infinity, and it's not clear why this formula truncates the range to 1..23. Nevertheless, if the person who wrote the equation was calculating the average, then from the properties of the normal distribution you can derive a closed-form solution:
Total of all NORMDIST terms = mu/3 + 1/2
This will be accurate as long as mu (N18) is between 0 and 30, since sigma = mu/4 then keeps essentially all of the probability mass inside the truncated range 0..69. If you plug this into the equation you get
=MAX(1,N18/3+0.5)
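As a quick sanity check of that closed form, here is a small Python sketch (my own, not from the spreadsheet) that rebuilds the sum with scipy.stats.norm.cdf standing in for NORMDIST:

from scipy.stats import norm

def normdist_sum(mu):
    # reproduce the spreadsheet sum: N runs from 1 to 23, sigma = mu/4
    sigma = mu / 4
    return sum(n * (norm.cdf(3*n, mu, sigma) - norm.cdf(3*n - 3, mu, sigma))
               for n in range(1, 24))

for mu in (6, 12, 24, 30):
    print(mu, round(normdist_sum(mu), 4), round(mu/3 + 0.5, 4))

The two columns agree closely for mu in the 0..30 range.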
Hope that helps.

From the docs...
NORMDIST function
Returns the normal distribution for the specified mean and standard deviation. This function has a very wide range of applications in statistics, including hypothesis testing.
Important: This function has been replaced with one or more new functions that may provide improved accuracy and whose names better reflect their usage. Although this function is still available for backward compatibility, you should consider using the new functions from now on, because this function may not be available in future versions of Excel.
For more information about the new function, see NORM.DIST function.

Python erasure coding library that can handle larger strings?

I'm looking for a Python erasure coding library that works for larger inputs. So far I've checked out:
unireedsolomon: fails for 256-byte inputs; unmaintained
reedsolo/reedsolomon: fails silently for a 300-byte input; clearly a learning project, bug tracker disabled
pyeclib: fails for 100-byte input using Reed-Solomon encoding, and doesn't seem to provide any documentation on valid parameters, so I couldn't figure out how to test other algorithms (nor does liberasurecode)
I want something that can handle n=10,000 k=2,000 or so, ideally larger.
Only the field polynomial has to be prime or primitive, not the generating polynomial. If you wanted an RS(10000, 8000, 2000) code (n = 10000, k = 8000, n-k = 2000), GF(2^16) with the primitive reducing polynomial x^16 + x^12 + x^3 + x + 1 could be used. The generating polynomial would be of degree 2000. Assuming the first consecutive root is 2, the generating polynomial is (x-2)(x-4)(x-8)...(x-2^2000), with all of this math done in GF(2^16), where + and - are both XOR. Correction would involve generating 2000 syndromes and running a Berlekamp-Massey or Sugiyama extended-Euclid decoder. I don't know if there are Python libraries that support GF(2^16).
https://en.wikipedia.org/wiki/Reed%E2%80%93Solomon_error_correction#Berlekamp%E2%80%93Massey_decoder
https://en.wikipedia.org/wiki/Reed%E2%80%93Solomon_error_correction#Euclidean_decoder
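As a concrete illustration of the arithmetic described above, here is a minimal Python sketch (function names are mine) of GF(2^16) multiplication under that reducing polynomial, and of building the generator polynomial from consecutive roots 2, 4, 8, ...; a real codec would also need syndrome computation and a decoder:

def gf_mul(a, b, poly=0x1100B):  # 0x1100B = x^16 + x^12 + x^3 + x + 1
    # carry-less multiplication, reduced modulo the field polynomial
    result = 0
    while b:
        if b & 1:
            result ^= a
        b >>= 1
        a <<= 1
        if a & 0x10000:
            a ^= poly
    return result

def rs_generator(nroots):
    # generator polynomial (x-2)(x-4)...(x-2^nroots); '-' is XOR in GF(2^16)
    g = [1]                      # coefficients, lowest degree first
    root = 1
    for _ in range(nroots):
        root = gf_mul(root, 2)   # next consecutive root: 2, 4, 8, ...
        out = [0] * (len(g) + 1)
        for i, c in enumerate(g):
            out[i] ^= gf_mul(c, root)  # root * g(x) term
            out[i + 1] ^= c            # x * g(x) term
        g = out
    return g

print(rs_generator(2))  # [8, 6, 1], i.e. (x-2)(x-4) = x^2 + 6x + 8

For the RS(10000, 8000) case you would call rs_generator(2000).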
Large n and k can be avoided by interleaving. Tape drives like LTO treat large data blocks as matrices interleaved across rows (called C1) and down columns (called C2) using GF(2^8). LTO-8 uses RS(249,237,13), which I assume is the ECC used down columns to correct rows. With 32 read/write heads, there's an interleave of 32, probably across rows. I don't know what the RS() code is across rows, or what the interleave down columns is.
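The interleaving itself is independent of the code; a tiny Python sketch of the idea (stride and layout are illustrative assumptions, not the actual LTO format):

data = bytes(range(256)) * 32          # example payload, length divisible by 32
stride = 32                            # e.g. one stream per read/write head
streams = [data[i::stride] for i in range(stride)]
# each streams[j] would be chunked and RS-encoded with a small (n, k) code
recombined = bytes(b for group in zip(*streams) for b in group)
assert recombined == data              # interleaving is lossless

A burst of errors in the recombined stream is spread across many streams, so each small RS code only has to correct a few symbols.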

Pyomo: define objective Rule based on condition

In a transport problem, I'm trying to insert the following rule into the objective function:
If the supply of BC is < 19,000 tons, then we incur a penalty of $125/MT.
I added a constraint to check the condition but would like to apply the penalty in the objective function.
I was able to do this in Excel Solver, but the values do not match. I've already checked both and debugged the code, but I could not figure out what's wrong.
Here is the constraint:
def bc_rule(model):
    # total BC supply across all markets must be at least 19,000 tons
    return sum(model.x[supplier, market]
               for supplier in model.suppliers
               for market in model.markets
               if 'BC' in supplier) >= 19000

model.bc_rules = Constraint(rule=bc_rule, doc='Minimum production')
The problem is in the objective rule:
def objective_rule(model):
    PENALTY_THRESHOLD = 19000
    PENALTY_COST = 125
    cost = sum(model.costs[supplier, market] * model.x[supplier, market]
               for supplier in model.suppliers
               for market in model.markets)
    # what is the problem here?
    bc = sum(model.x[supplier, market]
             for supplier in model.suppliers
             for market in model.markets
             if 'BC' in supplier)
    if bc < PENALTY_THRESHOLD:
        cost += (PENALTY_THRESHOLD - bc) * PENALTY_COST
    return cost

model.objective = Objective(rule=objective_rule, sense=minimize, doc='Define objective function')
I'm getting a much lower value than found in Excel Solver.
Your condition (if) depends on a variable in your model.
Normally, ifs should never be used in a mathematical model, and that holds not only for Pyomo. Even in Excel, if statements in formulas are simply evaluated to a scalar value before optimization, so I would be very careful about calling the Excel result the real optimal value.
The good news is that if statements are easily converted into mathematical constraints.
For that, you need to add a binary (0/1) variable to your model. It will take the value 1 when bc falls below PENALTY_THRESHOLD. Let's call this variable y; it is defined as model.y = Var(domain=Binary).
You will add model.y * PENALTY_COST as a term of your objective function to include the penalty cost.
Then, for the constraint, add the following piece of code:
def y_big_M(model):
    # bigM should be big enough to exceed any attainable shortfall, but small
    # enough to stay around the same order of magnitude as the rest of the
    # model. Avoid needlessly huge values like 1e12, since coefficients that
    # large cause numerical problems.
    bigM = 10000
    PENALTY_THRESHOLD = 19000
    return PENALTY_THRESHOLD - sum(
        model.x[supplier, market]
        for supplier in model.suppliers
        for market in model.markets
        if 'BC' in supplier
    ) <= model.y * bigM

model.y_big_M = Constraint(rule=y_big_M)
The previous constraint ensures that y will take a value greater than 0 (i.e. 1) whenever the sum that calculates bc is smaller than PENALTY_THRESHOLD. Any positive value of that difference forces the model to set y = 1, since with y = 1 the right-hand side becomes 1 * bigM, a number chosen large enough that the difference will always be smaller than bigM.
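Note that model.y * PENALTY_COST charges the penalty once, as a fixed cost. If the penalty is really $125 per missing ton, as the question states, a continuous shortfall variable does the job without any big-M. Here is a minimal sketch under that reading, reusing the sets and parameters from the question (the hard bc_rule constraint must be dropped, otherwise no shortfall is ever allowed):

from pyomo.environ import Var, Constraint, Objective, NonNegativeReals, minimize

model.shortfall = Var(domain=NonNegativeReals)

def shortfall_rule(model):
    bc = sum(model.x[supplier, market]
             for supplier in model.suppliers
             for market in model.markets
             if 'BC' in supplier)
    # shortfall >= 19000 - bc; together with shortfall >= 0 and minimization,
    # shortfall settles at exactly max(0, 19000 - bc)
    return model.shortfall >= 19000 - bc

model.shortfall_con = Constraint(rule=shortfall_rule)

def objective_rule(model):
    cost = sum(model.costs[supplier, market] * model.x[supplier, market]
               for supplier in model.suppliers
               for market in model.markets)
    return cost + 125 * model.shortfall

model.objective = Objective(rule=objective_rule, sense=minimize)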
Please also check your Excel model to see whether your if statements really work during the solver computations. Last time I checked, Excel Solver does not convert if statements into big-M constraints. The modeling technique I showed you works for absolutely all mathematical programming tools, even Excel.

Excel VBA: Implementing Box-Muller, Ziggurat and Ratio of Uniforms Algorithms

This is a very specific question mixing up stochastic knowledge and VBA skills. So very exciting!
I'm trying to compare several methods for generating standard normally distributed numbers given a source of uniformly distributed random numbers. Therefore I'm implementing the Box-Muller algorithm, the Ziggurat algorithm and the Ratio of Uniforms algorithm. Every single implementation works great in terms of generating a clean standard normal distribution (checked with a Shapiro-Wilk test).
What I want to find out: which is the quickest method?
Testing every single program with a total of 10^7 generated numbers, these are the run times:
Box-Muller: 3.7 seconds
Ziggurat: 1.28 seconds
Ratio of Uniforms: 10.77 seconds
Actually I am very happy with those readings, because I didn't expect it to be that fast. Of course the run time of every single method also depends on my programming skills and VBA knowledge.
My problem: after doing some research I found out that the Ratio of Uniforms algorithm should be the quickest (about 3 to 4 times quicker than Box-Muller). This information leans on just one Stack Overflow answer:
I am curious whether this is simply a wrong claim by that user or (what I expect is more likely) whether my code is not perfectly implemented. Therefore I'll post my code and hope someone can tell me whether my code is just not good enough, or whether the Ratio of Uniforms simply doesn't work as quickly as claimed.
Sub RatioUniforms()
    Dim x(10000000) As Double
    Dim passing As Long
    Dim amount As Long: amount = 10000000
    Dim u1 As Double
    Dim u2 As Double
    Dim v2 As Double
    Do While passing < amount ' fill exactly "amount" entries
        Do
            u1 = Rnd ' Rnd = uniform random number in [0, 1)
        Loop Until u1 <> 0 ' u1 mustn't be 0, we divide by it below
        v2 = Rnd
        u2 = (2 * v2 - 1) * (2 * Exp(-1)) ^ (1 / 2) ' scale to (-Sqr(2/e), Sqr(2/e))
        ' accept u2/u1 if (u1, u2) lies inside the ratio-of-uniforms region
        If u1 ^ 2 <= Exp(-1 / 2 * u2 ^ 2 / u1 ^ 2) Then
            x(passing) = u2 / u1
            passing = passing + 1
        End If
    Loop
End Sub
Thank you very much for helping me on this topic. Maybe some of you have tried these algorithms in VBA or other languages and can share your experience with the run times? If you need to know anything else about my other implementations, just let me know. Have a great day!
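For a cross-language reference point, here is a vectorized NumPy sketch of the same ratio-of-uniforms accept-reject scheme (the oversampling factor and function name are my own choices). It isn't a fair speed comparison with scalar VBA, but it is useful as a correctness baseline:

import numpy as np

def ratio_of_uniforms(n, rng=np.random.default_rng()):
    # Kinderman-Monahan ratio of uniforms for standard normal variates
    out = np.empty(n)
    filled = 0
    while filled < n:
        m = int((n - filled) / 0.7) + 16   # acceptance rate is about 0.73
        u = rng.random(m)
        u = u[u > 0]                       # u mustn't be 0, we divide by it
        v = (2*rng.random(u.size) - 1) * np.sqrt(2/np.e)
        x = v / u
        accepted = x[u*u <= np.exp(-0.5 * x * x)]
        take = min(accepted.size, n - filled)
        out[filled:filled+take] = accepted[:take]
        filled += take
    return out

samples = ratio_of_uniforms(10**7)
print(samples.mean(), samples.std())       # should be close to 0 and 1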

Numerical differentiation using Cauchy's integral formula (CIF)

I am trying to create a module with a mathematical class for Taylor series, to have it easily accessible for other projects. Hence I wish to optimize it as far as I can.
For those who are not too familiar with Taylor series: it is a necessity to be able to differentiate a function at a point many times. Given that the usual limit definition of the derivative requires immense precision for higher-order derivatives, I've decided to use Cauchy's integral formula instead. With a little bit of work, I've managed to rearrange the formula into f^(n)(x) = n!/(2π) ∫ f(x + e^(iθ)) e^(-inθ) dθ, with θ running from 0 to 2π around the unit circle. This gives me much more accurate results on higher-order derivatives than the traditional definition of the derivative. Here is the function I am currently using to differentiate a function at a point:
import numpy as np
from math import factorial

def myDerivative(f, x, dTheta, degree):
    # Riemann approximation of the contour integral over the unit circle
    riemannSum = 0
    theta = 0
    while theta < 2*np.pi:
        functionArgument = np.complex128(x + np.exp(1j*theta))
        secondFactor = np.complex128(np.exp(-1j * degree * theta))
        riemannSum += f(functionArgument) * secondFactor * dTheta
        theta += dTheta
    return factorial(degree)/(2*np.pi) * riemannSum.real
I've tested this function in my main function with a carefully thought out mathematical function which I know the derivatives of, namely f(x) = sin(x).
def main():
    # f(x) = sin(x); the 16th derivative of sin at 0 should be sin(0) = 0
    print(myDerivative(np.sin, 0, 2*np.pi/(4*4096), 16))
These derivatives seem to freak out at around degree 16. I've also tried to play around with dTheta, but with no luck. I would like to reach higher orders as well, but I fear I've run into some kind of machine-precision limit.
My question in its simplest form: what can I do to improve this function in order to get higher-order derivatives?
I seem to have come up with a solution to the problem. I did this by rearranging Cauchy's integral formula in a different way, exploiting the fact that the contour integral can be taken over an arbitrarily large circle around the point of differentiation. Be aware that it is very important that the function is analytic in the complex plane for this to be valid.
The new formula is f^(n)(x) = n!/(2π r^n) ∫ f(x + r e^(iθ)) e^(-inθ) dθ, where r is the radius of the contour.
This also gives a new function for differentiation:
def myDerivative(f, x, dTheta, degree, contourRadius):
    # same contour integral as before, now over a circle of radius contourRadius
    riemannSum = 0
    theta = 0
    while theta < 2*np.pi:
        functionArgument = np.complex128(x + contourRadius*np.exp(1j*theta))
        secondFactor = (1/contourRadius)**degree * np.complex128(np.exp(-1j * degree * theta))
        riemannSum += f(functionArgument) * secondFactor * dTheta
        theta += dTheta
    return factorial(degree) * riemannSum.real / (2*np.pi)
This gives me very accurate differentiation of high orders. For instance, I am able to differentiate f(x) = e^x 50 times without a problem.
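As a quick check of that claim with the function above (the radius and step size here are my own choices):

# 50th derivative of e^x at x = 0 is e^0 = 1
print(myDerivative(np.exp, 0.0, 2*np.pi/8192, 50, contourRadius=50.0))
# prints approximately 1.0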
Well, since you are working with a discrete approximation of the derivative (via dTheta), sooner or later you must run into trouble. I'm surprised you were able to get at least 15 accurate derivatives -- good work! But to get derivatives of all orders, either you have to put a limit on what you're willing to accept and say it's good enough, or else compute the derivatives symbolically. Take a look at Sympy for that. Sympy probably has some functions for computing Taylor series too.
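For the symbolic route, a minimal SymPy illustration (exact, no dTheta involved):

import sympy as sp

x = sp.symbols('x')
print(sp.diff(sp.sin(x), x, 16))       # 16th derivative of sin(x): sin(x)
print(sp.series(sp.sin(x), x, 0, 8))   # Taylor series at 0 up to order 8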

Rounding Error: Harmonic mean with exponent of small numbers

Let us say I have log_a1=-1000, log_a2=-1001, and log_a3=-1002.
n=3
I want to get the harmonic mean (HM) of a1, a2 and a3 (not log_a1, log_a2 and log_a3) such that HM = n/[1/exp(log_a1) + 1/exp(log_a2) + 1/exp(log_a3)].
However, due to rounding error, exp(log_a1) = exp(-1000) evaluates to 0, and accordingly 1/exp(log_a1) = inf and HM = 0.
Is there any mathematical trick to do this? It is okay to get either HM or log(HM).
The best approach is probably to keep things in log scale. Many scientific languages have a log-add-exp function (e.g. numpy.logaddexp in python) that does what you want to high precision, with both the input and the result in log form.
The idea is that you want to compute e^-1000 + e^-1001 + e^-1002, so you factor it as e^-1000 * (1 + e^-1 + e^-2) and take the log. The result is -1000 + log(1 + e^-1 + e^-2), which can be computed without loss of precision.
For the harmonic mean, the same shift-by-the-max trick gives
log(HM) = log(n) + log_a_max - log(sum(exp(log_a_max - log_a_i)))
For log_a = [-1000, -1001, -1002]:
log(HM) = -1001.309
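In Python this is essentially one line with scipy.special.logsumexp, which performs the shift by the maximum internally (numpy.logaddexp does the same thing for pairs); a minimal sketch:

import numpy as np
from scipy.special import logsumexp

log_a = np.array([-1000.0, -1001.0, -1002.0])
n = len(log_a)

# HM = n / sum(1/a_i) = n / sum(exp(-log_a_i)), so in log space:
log_hm = np.log(n) - logsumexp(-log_a)
print(log_hm)   # -1001.309...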
