Why is POISSON function not consistent in Microsoft Excel? - excel

There is a definition in POISSON function that:
#NUM! error – Occurs if either:
The given value of x is less than zero;
The given value of mean is
less than zero.
But I try to do this in Excel 2013. It gave me differnt value. Here is my example:
=POISSON(0,-0.5,FALSE)
the result is: 1.648721271
instead of #NUM!
Any thoughts?

Speculatively, the bug might have come about as an optimization. Poisson(x,m,TRUE) is defined as e^(-m)*(m^x)/x!. One way to compute m^x when m is floating-point is as e^(x*Ln(m)). In a spreadsheet, you can observe that
=POISSON(A1,A2,TRUE) - EXP(-A2)*EXP(A1*LN(A2))/FACT(A1)
always evaluates to exactly 0 whenever A1,A2 are in the correct domain (and not e.g. 0.0000000001 as might be the case if the calculation had used a different approach).
Furthermore, EXP(-A2)*EXP(A1*LN(A2))/FACT(A1) fails when it should fail, giving #NUM! when fed 0, -0.5. My speculation is that the Excel programmers initially used a formula which failed when it should have failed, letting the called functions raise the error when appropriate. Then someone had the bright idea of just returning EXP(-mean) when x = 0 (since in that case the rest of the expression is 1 when it is defined at all). After all -- why bother to compute something when you know that it is 1?
What I find astonishing is that the bug is still there with POISSON.DIST Excel had been (and still is, although to a lesser extent) heavily criticized for the accuracy of its statistical functions and tests. So much so that "Friends don't let friends use Excel for statistics" is a relatively well-known saying among statisticians. See this for a discussion. The dotted statistical functions such as POISSON.DIST were explicitly designed to address the many complaints which had piled up. POISSON itself is just kept around for backwards compatibility. It is strange how this bug slipped through what should have been a thorough rewriting of these functions from the ground up.

Related

How do I use Excel Solver for exponents?

My equation is Q^Q^.25=6,512,786
I have the following input
And here is my solver
I keep getting errors.
The problem is with the non-standard way that Excel handles iterated exponents. Excel parses Q^Q^0.25 as (Q^Q)^0.25, which grows very rapidly. You probably intended Q^(Q^0.25), which, while still growing rapidly, doesn't explode in the same way. If you change your equation in D2 to
=C2^(C2^0.25)
and rerun the solver with the same settings (but a nonzero starting value at C2 since 0^0 is undefined) you will get convergence, to approximately 117.44.
On Edit: It turns out that in the programming world there is a greater diversity of how associativity of exponentiation is handled than I realized. This answer to a related question contains an interesting table surveying how different programming languages handle it.

Unclear why functions from Data.Ratio are not exposed and how to work around

I am implementing an algorithm using Data.Ratio (convergents of continued fractions).
However, I encounter two obstacles:
The algorithm starts with the fraction 1%0 - but this throws a zero denominator exception.
I would like to pattern match the constructor a :% b
I was exploring on hackage. An in particular the source seems to be using exactly these features (e.g. defining infinity = 1 :% 0, or pattern matching for numerator).
As beginner, I am also confused where it is determined that (%), numerator and such are exposed to me, but not infinity and (:%).
I have already made a dirty workaround using a tuple of integers, but it seems silly to reinvent the wheel about something so trivial.
Also would be nice to learn how read the source which functions are exposed.
They aren't exported precisely to prevent people from doing stuff like this. See, the type
data Ratio a = a:%a
contains too many values. In particular, e.g. 2/6 and 3/9 are actually the same number in ℚ and both represented by 1:%3. Thus, 2:%6 is in fact an illegal value, and so is, sure enough, 1:%0. Or it might be legal but all functions know how to treat them so 2:%6 is for all observable means equal to 1:%3 – I don't in fact know which of these options GHC chooses, but at any rate it's an implementation detail and could change in future releases without notice.
If the library authors themselves use such values for e.g. optimisation tricks that's one thing – they have after all full control over any algorithmic details and any undefined behaviour that could arise. But if users got to construct such values, it would result in brittle code.
So – if you find yourself starting an algorithm with 1/0, then you should indeed not use Ratio at all there but simply store numerator and denominator in a plain tuple, which has no such issues, and only make the final result a Ratio with %.

Why would more array accesses perform better?

I'm taking a course on coursera that uses minizinc. In one of the assignments, I was spinning my wheels forever because my model was not performing well enough on a hidden test case. I finally solved it by changing the following types of accesses in my model
from
constraint sum(neg1,neg2 in party where neg1 < neg2)(joint[neg1,neg2]) >= m;
to
constraint sum(i,j in 1..u where i < j)(joint[party[i],party[j]]) >= m;
I dont know what I'm missing, but why would these two perform any differently from eachother? It seems like they should perform similarly with the former being maybe slightly faster, but the performance difference was dramatic. I'm guessing there is some sort of optimization that the former misses out on? Or, am I really missing something and do those lines actually result in different behavior? My intention is to sum the strength of every element in raid.
Misc. Details:
party is an array of enum vars
party's index set is 1..real_u
every element in party should be unique except for a dummy variable.
solver was Gecode
verification of my model was done on a coursera server so I don't know what optimization level their compiler used.
edit: Since minizinc(mz) is a declarative language, I'm realizing that "array accesses" in mz don't necessarily have a direct corollary in an imperative language. However, to me, these two lines mean the same thing semantically. So I guess my question is more "Why are the above lines different semantically in mz?"
edit2: I had to change the example in question, I was toting the line of violating coursera's honor code.
The difference stems from the way in which the where-clause "a < b" is evaluated. When "a" and "b" are parameters, then the compiler can already exclude the irrelevant parts of the sum during compilation. If "a" or "b" is a variable, then this can usually not be decided during compile time and the solver will receive a more complex constraint.
In this case the solver would have gotten a sum over "array[int] of var opt int", meaning that some variables in an array might not actually be present. For most solvers this is rewritten to a sum where every variable is multiplied by a boolean variable, which is true iff the variable is present. You can understand how this is less efficient than an normal sum without multiplications.

Excel Solver Curve Fitting Failing - MatLab recast

I am having some strange problems with excel's solver. Basically what I am trying to do is curve fit my data. I have two different lines, one is my calibration line and the other is the derived line that I am attempting to match up to the calibration line. My line depends on 19 different variable parameters (Perhaps this is too many? I have tried fewer without result) and I am using solver to adjust these parameters to make the two lines as close as possible.
For Example:
The QP column contains the variables I would like changed, changing these will draw me closer or further from the calibration curve. Each subsequent value of QP must be greater than the first.
Col=B Col=C
Power .QP_'
1 ..... 57000
2 ..... 65000
3 ..... 70000
4 ..... 80000
5 ..... 80000
Therefore my excel solver parameters look like this: C1:C19>=0,C1:C19<=100000 and C2>=C1, C3>=C2,C4>=C3... I have also tried making another column of the differences between each value and then saying that these must be diff>=0.
To compare this with my calibration curve I have taken the calibration curve data and subtracted my data derived from QP and then squared that to create my sum of the squares error. For example:
(Calibration-DerivedQP)^2=SS(x) <- where x represents the row number
Sum(SS(x))=SSE
SSE is what I have set solver to minimize. And upon changing QP everything automatically updates. There are no if statements being used and no pivot tables are used.
If I remove the parameters similar to C2>=C1 everything works perfectly, except the derived values are not feasible. But when the solver is run with these parameters, nothing gets changed and no matter which guesses I used as starting values ( so that I can ensure I haven't guessed a local minimum), the solver cannot improve upon my solution. This has led me to believe that something in my parameters is being broken, since I can very easily improve on my solution by guess and check. The rest of solvers settings are at the defaults, and the evolutionary method is used since my curve isn't smooth (I don't think) I had this working in the past and now something seems to be broken. Any ideas are appreciated! Thank you so much! Sorry if I am missing any critical information. I am also familiar with matlab and R if there are better methods in those languages.
I found the solution to my problem. I don't know if this will be helpful to anyone else since my problem vague and pretty specific to me. That being said, my problem was in the constraints. I changed some data on my excel sheet to allow for fewer constraints. An example might look like this:
Guess..........Squared......Added..................Q
-12..............(-12)^2....... 0
-16..............(-16)^2.......=(-16)^2+0.............256
+7.................(7)^2..........=(7)^2+(-16)^2+0....305
Now I allow solver to guess any number subject to minimal constraints.
Essentially, what is happening now, is the excel sheet allows for any guess that solver makes to work. By squaring the numbers it give me positive values, and the added column ensures that each successive value is equal to or greater than the first. This means there are very few constraints. I also changed the solver option from evolutionary to GRG Nonlinear.
Tips for getting solver to work:
Try and use the spreadsheet to set constraints (other than bounds, bounds seem to be good) wherever possible, the more constraints that I set in solver, the less likely my solution was to work.
Hope that helps, sorry if I have provided any incorrect information.

Is my program Turing-complete?

I spent a week or two programming a simple logic solver. Having built it, I found myself wondering whether the language it solves is Turing-complete or not. So I coded up a small set of equations which accept any valid expression in the SKI combinator calculus, and produce a result set which contains the normal form of that expression. Since SKI is Turing-complete, proving that my language can execute SKI would demonstrate its Turing-completeness.
There is a glitch, however. The solver does not reduce the expression in normal order. Actually what it does is to try every possible reduction order. Which means that the solution set is typically huge. If a normal form exists, it will be in there somewhere, but it's difficult to tell where.
This brings me to two questions:
Is my language Turing-complete? Or do I need to find a better proof?
Is the number of solutions a computable function of the input?
(At first I assumed that the size of the solution set was exponential or factorial in the input size. But on closer inspection, this is not true. You can write huge expressions which are already in normal form, and tiny expressions which do not terminate. I have a feeling that determining the size of the solution set might be equivilent to solving the Halting Problem, but I'm not completely sure...)
A) As augustss says, your system is clearly turing complete.
B) You're right that determining solution size is the same as the halting problem. If a sequence doesn't terminate, then you get an infinite solution set. So to determine if the set is infinite, you need to determine if a reduction sequence terminates. But that's precisely the halting problem!
C) As I recall, a system that, given a set of instructions to a turing machine, merely says how many steps they take to terminate (which is, I suppose, the cardinality of your solution set) or fails to terminate if the instructions themselves fail to terminate is in itself turing complete. So that should help with the intuition here.
In answer to my own question... I found that by tweaking the source code, I can make it so that if the input SKI expression has a normal form, that normal form will always be solution #1. So if you just ignore any further solutions, the program reduces any SKI expression to normal form.
I believe this constitutes a "better proof". ;-)

Resources