Excel: Avoiding the use of IF formula - Insights

Suppose I have
=IF(A1>0, 100, 50)
which I can also write as
=(A1>0)*(100) + (A1<=0)*(50)
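One way to check that the two formulas agree is to model them in Python, where a comparison used in arithmetic behaves as 1 or 0, much like TRUE/FALSE inside an Excel product. The function names are hypothetical, and this only checks the logic, not Excel's calculation speed.

```python
def if_form(a1: float) -> int:
    """Models =IF(A1>0, 100, 50)."""
    return 100 if a1 > 0 else 50


def arithmetic_form(a1: float) -> int:
    """Models =(A1>0)*(100) + (A1<=0)*(50).

    Each comparison contributes 1 or 0, so exactly one of the two
    products is non-zero for any numeric a1."""
    return (a1 > 0) * 100 + (a1 <= 0) * 50
```

For example, both functions return 50 for 0 and 100 for 0.5, because A1>0 and A1<=0 can never be true at the same time.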
I am interested in their performance. So my question is: which one wins on the race track?
Anyway, I can think of two drawbacks (not related to performance).
The first drawback I had in mind is that the latter form cannot return text.
The second drawback is error handling. Suppose I have =IFERROR(A1/B1, 0); I cannot substitute it with =(B1=0)*(0) + (B1<>0)*(A1/B1), which will still return a #DIV/0! error.
In terms of performance drawbacks, there is one nagging at me:
The latter form computes both the IfTrue part and the IfFalse part, which seems to be a drawback unless IF also computes both parts, which I'm not sure about.
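The error-handling and evaluation-order points come from the same cause, which a small Python model makes visible. Python's conditional expression evaluates only the branch it selects, while the arithmetic form evaluates both operands of the sum, so the division still runs when B1 is 0. This sketch models IF(B1=0, 0, A1/B1) rather than IFERROR itself, and whether Excel's IF skips the unused branch is exactly what is being asked here, so treat it as a model and not a statement about Excel's engine. The function names are hypothetical.

```python
def if_divide(a1: float, b1: float) -> float:
    """Models =IF(B1=0, 0, A1/B1): only the selected branch is evaluated."""
    return 0 if b1 == 0 else a1 / b1


def arithmetic_divide(a1: float, b1: float) -> float:
    """Models =(B1=0)*(0) + (B1<>0)*(A1/B1).

    Both products are computed, so a1 / b1 runs even when b1 == 0."""
    return (b1 == 0) * 0 + (b1 != 0) * (a1 / b1)
```

Here if_divide(1, 0) returns 0, whereas arithmetic_divide(1, 0) raises ZeroDivisionError, the Python analogue of #DIV/0!. The same eagerness is what would cost time when both branches are expensive to compute.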
In terms of performance benefits:
The IF formula has to allow for either a number or a text result. The latter form simply returns a number, which might make it more lightweight.
Thanks!

Related

How would I construct an integer optimization model corresponding to a graph

Suppose we're given some sort of graph where the feasible region of our optimization problem is given. For example: here is an image
How would I go about constructing these constraints in an integer optimization problem? Does anyone have any tips? Thanks!

Mate, I agree with the others that you should be a little more specific than that paint-ish picture ;). In particular, you specify neither an objective nor its direction, and you give no context about what in this graph should involve integer variables, apart from the disjunctive feasible sets, which can be modeled with MIP techniques. It seems your real problem is formalizing what you have conceptualized. However, in case you are just being lazy and are only interested in modeling disjunctive regions, you should look into disjunctive programming techniques such as "big-M" (note: big-M reformulations can be problematic). You should aim for a convex-hull reformulation if you can obtain one (fairly easily).
Back to your picture, it is quite clear that you have a problem in two real dimensions (let's say in R^2), where the constraints bounding the feasible set are linear (the lines making up the feasible polygons).
So you know that you have two dimensions and need two real continuous variables, say x[1] and x[2], to formulate each of your linear constraints (a[i,1]*x[1]+a[i,2]*x[2]<=rhs[i], with one index i for each line in your graph). Additionally, your variables seem to be constrained to the first orthant, so x[1]>=0 and x[2]>=0 should hold. Now, to add disjunctions you want some constraints that only hold when a certain condition is true. Therefore, you can add two binary decision variables, say y[1] and y[2], and an additional constraint y[1]+y[2]=1, to enforce that exactly one set of constraints is active at a time. You should be able to implement this with big-M by reformulating the constraints as follows:
If you bound things from above with your line:
a[i,1]*x[1]+a[i,2]*x[2]-rhs[i]<=M*(1-y[1]) if i corresponds to the one polygon,
a[i,1]*x[1]+a[i,2]*x[2]-rhs[i]<=M*(1-y[2]) if i corresponds to the other polygon,
and if your line bounds things from below:
-M*(1-y[1])<=a[i,1]*x[1]+a[i,2]*x[2]-rhs[i] if i corresponds to the one polygon,
-M*(1-y[2])<=a[i,1]*x[1]+a[i,2]*x[2]-rhs[i] if i corresponds to the other polygon.
It is important that M is sufficiently large, but not so large that it causes numerical issues.
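In a real model the y variables would be chosen by a MIP solver. As a solver-free sketch, the Python below only checks whether a given point satisfies the big-M system for some one-hot choice of y, which is enough to see how y switches whole polygons on and off. Lower-bounding lines a.x >= rhs are stored negated as -a.x <= -rhs so every constraint has the same shape. The two squares and M = 10 are made-up illustration data, not taken from the question's picture, and the function names are hypothetical.

```python
from typing import List, Sequence, Tuple

# One linear constraint a1*x1 + a2*x2 <= rhs, stored as (a1, a2, rhs).
# A lower-bounding line a.x >= rhs is stored negated: (-a1, -a2, -rhs).
Constraint = Tuple[float, float, float]


def big_m_feasible(x: Sequence[float], y: Sequence[int],
                   polygons: List[List[Constraint]], M: float) -> bool:
    """Check the big-M constraints for a candidate point x and binary vector y.

    Each constraint of polygon k is relaxed to a.x - rhs <= M*(1 - y[k]),
    so it only binds when y[k] == 1."""
    if any(v not in (0, 1) for v in y) or sum(y) != 1:
        return False  # y must select exactly one polygon
    if any(xi < 0 for xi in x):
        return False  # first-orthant bounds x[1], x[2] >= 0
    for k, constraints in enumerate(polygons):
        for a1, a2, rhs in constraints:
            if a1 * x[0] + a2 * x[1] - rhs > M * (1 - y[k]):
                return False
    return True


def in_union(x: Sequence[float], polygons: List[List[Constraint]], M: float) -> bool:
    """x lies in the disjunctive region iff some one-hot y makes it feasible."""
    n = len(polygons)
    return any(
        big_m_feasible(x, [int(j == k) for j in range(n)], polygons, M)
        for k in range(n)
    )


# Hypothetical example: two unit squares, [0,1]x[0,1] and [2,3]x[0,1].
SQUARE_A = [(1, 0, 1), (0, 1, 1)]               # x1 <= 1, x2 <= 1
SQUARE_B = [(1, 0, 3), (-1, 0, -2), (0, 1, 1)]  # 2 <= x1 <= 3, x2 <= 1
POLYGONS = [SQUARE_A, SQUARE_B]
```

With M = 10, the point (2.5, 0.5) is accepted with y = [0, 1], while (1.5, 0.5), which lies between the squares, is rejected for both choices of y. If M were too small (say 1 here), the relaxed constraints of the inactive square would wrongly cut off parts of the active one.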
That being said, I am by no means an expert on these disjunctive programming techniques, so feel free to chime in, add corrections or make things clearer.
Also, a more elaborate question typically yields more elaborate and satisfying answers ;) If you had gone to the effort of making up a true small example problem you likely would have gotten a full formulation of your problem or even an executable piece of code in no time.

Why would more array accesses perform better?

I'm taking a course on Coursera that uses MiniZinc. In one of the assignments, I was spinning my wheels forever because my model was not performing well enough on a hidden test case. I finally solved it by changing the following type of access in my model
from
constraint sum(neg1,neg2 in party where neg1 < neg2)(joint[neg1,neg2]) >= m;
to
constraint sum(i,j in 1..u where i < j)(joint[party[i],party[j]]) >= m;
I don't know what I'm missing, but why would these two perform any differently from each other? It seems like they should perform similarly, with the former being maybe slightly faster, but the performance difference was dramatic. I'm guessing there is some sort of optimization that the former misses out on? Or am I really missing something, and do those lines actually result in different behavior? My intention is to sum the strength of every element in raid.
Misc. Details:
party is an array of enum vars
party's index set is 1..real_u
every element in party should be unique except for a dummy variable.
solver was Gecode
verification of my model was done on a coursera server so I don't know what optimization level their compiler used.
edit: Since MiniZinc (mz) is a declarative language, I'm realizing that "array accesses" in mz don't necessarily have a direct counterpart in an imperative language. However, to me, these two lines mean the same thing semantically. So I guess my question is more "Why are the above lines different semantically in mz?"
edit2: I had to change the example in question; I was close to violating Coursera's honor code.

The difference stems from the way in which the where-clause "a < b" is evaluated. When "a" and "b" are parameters, then the compiler can already exclude the irrelevant parts of the sum during compilation. If "a" or "b" is a variable, then this can usually not be decided during compile time and the solver will receive a more complex constraint.
In this case the solver would have gotten a sum over "array[int] of var opt int", meaning that some variables in the array might not actually be present. For most solvers this is rewritten to a sum where every variable is multiplied by a boolean variable, which is true iff the variable is present. You can see why this is less efficient than a normal sum without multiplications.
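To make the size difference concrete, here is a rough Python simulation, not MiniZinc's actual flattening. When the where-clause compares decision variables, every (i, j) pair survives as a term multiplied by a 0/1 indicator; when it compares parameters, the i < j filter is applied up front and only the surviving pairs remain, with no indicator. The helper names and the sample joint matrix are hypothetical; the two sums agree here only because joint is symmetric and the party values are distinct.

```python
from itertools import product
from typing import List, Sequence, Tuple


def guarded_terms(n: int) -> List[Tuple[int, int]]:
    """Where-clause on decision variables: the condition is unknown at
    compile time, so all n*n pairs are kept, each guarded by a boolean."""
    return list(product(range(n), repeat=2))


def plain_terms(n: int) -> List[Tuple[int, int]]:
    """Where-clause on parameters: i < j is decided at compile time,
    so only n*(n-1)/2 pairs reach the solver."""
    return [(i, j) for i, j in product(range(n), repeat=2) if i < j]


def guarded_sum(party: Sequence[int], joint: Sequence[Sequence[int]]) -> int:
    # Every term is multiplied by the indicator (party[i] < party[j]).
    return sum(joint[party[i]][party[j]] * (party[i] < party[j])
               for i, j in guarded_terms(len(party)))


def plain_sum(party: Sequence[int], joint: Sequence[Sequence[int]]) -> int:
    # Plain sum over the pre-filtered index pairs, no multiplications.
    return sum(joint[party[i]][party[j]] for i, j in plain_terms(len(party)))
```

For three party members this is 9 guarded products versus 3 plain terms, and the gap grows quadratically with the party size.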

partial functions vs input verification

I really love using total functions. That said, sometimes I'm not sure what the best approach is for guaranteeing that. Let's say that I'm writing a function similar to chunksOf from the split package, where I want to split up a list into sublists of a given size. Now I'd really rather say that the input for sublist size needs to be a positive int (so excluding 0). As I see it I have several options:
1) all-out: make a newtype for PositiveInt, hide the constructor, and only expose safe functions for creating a PositiveInt (perhaps returning a Maybe or some union of Positive | Negative | Zero or what have you). This seems like it could be a huge hassle.
2) what the split package does: just return an infinite list of size-0 sublists if the size <= 0. This risks bugs going uncaught, and worse, those bugs could hang your program indefinitely with no indication of what went wrong.
3) what most other languages do: error when the input is <= 0. I really prefer total functions though...
4) return an Either or Maybe to cover the case that the input might have been <= 0. Similar to #1, it seems like using this could just be a hassle.
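Options 1 and 4 can be sketched side by side. This is a Python analogue of the Haskell idea, so the "hidden constructor" is only a convention here: Python cannot enforce it the way a Haskell module export list can. PositiveInt, positive_int, chunks_of and chunks_of_maybe are hypothetical names, not the split package's API.

```python
from dataclasses import dataclass
from typing import List, Optional, Sequence, TypeVar

T = TypeVar("T")


@dataclass(frozen=True)
class PositiveInt:
    """Option 1: a validated wrapper. By convention, build it only via positive_int()."""
    value: int


def positive_int(n: int) -> Optional[PositiveInt]:
    """Smart constructor: the single place where the > 0 check happens."""
    return PositiveInt(n) if n > 0 else None


def chunks_of(size: PositiveInt, xs: Sequence[T]) -> List[Sequence[T]]:
    # size.value >= 1 is guaranteed by construction, so the range step
    # is never 0 and the result is always finite.
    n = size.value
    return [xs[i:i + n] for i in range(0, len(xs), n)]


def chunks_of_maybe(n: int, xs: Sequence[T]) -> Optional[List[Sequence[T]]]:
    """Option 4: accept a plain int and push the failure into the return type."""
    size = positive_int(n)
    return None if size is None else chunks_of(size, xs)
```

The trade-off is visible in the signatures: with option 1 the check happens once, where the size is created, and every later call is total; with option 4 every call site has to handle the None case.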
This seems similar to this post, but this has more to do with error conditions than just being as precise about types as possible. I'm looking for thoughts on how to decide what the best approach for a case like this is. I'm probably most inclined towards doing #1, and just dealing with the added overhead, but I'm concerned that I'll be kicking myself down the road. Is this a decision that needs to be made on a case-by-case basis, or is there a general strategy that consistently works best?

Why is POISSON function not consistent in Microsoft Excel?

The documentation of the POISSON function says:
#NUM! error – Occurs if either:
The given value of x is less than zero;
The given value of mean is less than zero.
But when I try this in Excel 2013, it gives me a different value. Here is my example:
=POISSON(0,-0.5,FALSE)
the result is: 1.648721271
instead of #NUM!
Any thoughts?

Speculatively, the bug might have come about as an optimization. POISSON(x,m,FALSE) is defined as e^(-m)*(m^x)/x!. One way to compute m^x when m is floating-point is as e^(x*Ln(m)). In a spreadsheet, you can observe that
=POISSON(A1,A2,FALSE) - EXP(-A2)*EXP(A1*LN(A2))/FACT(A1)
always evaluates to exactly 0 whenever A1 and A2 are in the valid domain (and not, e.g., 0.0000000001, as might be the case if the calculation had used a different approach).
Furthermore, EXP(-A2)*EXP(A1*LN(A2))/FACT(A1) fails when it should fail, giving #NUM! when fed 0, -0.5. My speculation is that the Excel programmers initially used a formula which failed when it should have failed, letting the called functions raise the error when appropriate. Then someone had the bright idea of just returning EXP(-mean) when x = 0 (since in that case the rest of the expression is 1 when it is defined at all). After all -- why bother to compute something when you know that it is 1?
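The speculated mechanism can be reproduced outside Excel. In this Python sketch, math.log raises ValueError for a non-positive mean, playing the role of LN's #NUM!, and a hypothetical x == 0 shortcut that returns e^(-mean) directly skips that check. Both functions are illustrations of the answer's guess, not Excel's actual implementation.

```python
import math


def pmf_via_logs(x: int, mean: float) -> float:
    """e^(-m) * m^x / x!, with m^x computed as e^(x*ln m).

    math.log raises ValueError for mean <= 0, analogous to LN returning
    #NUM! in Excel, so invalid means fail as they should."""
    return math.exp(-mean) * math.exp(x * math.log(mean)) / math.factorial(x)


def pmf_with_shortcut(x: int, mean: float) -> float:
    """Hypothetical optimization: for x == 0 the remaining factor is 1
    wherever it is defined, so return e^(-mean) without taking the log."""
    if x == 0:
        return math.exp(-mean)
    return pmf_via_logs(x, mean)
```

With these definitions, pmf_with_shortcut(0, -0.5) returns e^0.5 ≈ 1.648721271, the same value reported for =POISSON(0,-0.5,FALSE), while pmf_via_logs(0, -0.5) raises an error.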
What I find astonishing is that the bug is still there with POISSON.DIST. Excel has been (and still is, although to a lesser extent) heavily criticized for the accuracy of its statistical functions and tests. So much so that "Friends don't let friends use Excel for statistics" is a relatively well-known saying among statisticians. See this for a discussion. The dotted statistical functions such as POISSON.DIST were explicitly designed to address the many complaints that had piled up. POISSON itself is kept around only for backwards compatibility. It is strange how this bug slipped through what should have been a thorough rewrite of these functions from the ground up.

Functional alternative to caching known "answers"

I think the best way to frame this question is with an example. The actual reason I decided to ask about this is Problem 55 on Project Euler. The problem asks for the number of Lychrel numbers below 10,000. In an imperative language, I would get the list of numbers leading up to the final palindrome and push those numbers to a list outside of my function. I would then check each incoming number to see if it was part of that list, and if so, simply stop the test and conclude that the number is NOT a Lychrel number. I would do the same thing with non-Lychrel numbers and their preceding numbers.
I've done this before and it has worked out nicely. However, it seems like a big hassle to implement this in Haskell without adding a bunch of extra arguments to my functions to hold the predecessors, plus a top-level parent function to hold all of the numbers that I need to store.
I'm just wondering if there is some kind of tool I'm missing here, or if there is a standard way to do this. I've read that Haskell kind of "naturally caches" (for example, if I define odd numbers as odds = filter odd [1..], I can refer to that whenever I want), but it seems to get complicated when I need to dynamically add elements to a list.
Any suggestions on how to tackle this?
Thanks.
PS: I'm not asking for an answer to the Project Euler problem, I just want to get to know Haskell a bit better!

I believe you're looking for memoizing. There are a number of ways to do this. One fairly simple way is with the MemoTrie package. Alternatively if you know your input domain is a bounded set of numbers (e.g. [0,10000)) you can create an Array where the values are the results of your computation, and then you can just index into the array with your input. The Array approach won't work for you though because, even though your input numbers are below 10,000, subsequent iterations can trivially grow larger than 10,000.
That said, when I solved Problem 55 in Haskell, I didn't bother doing any memoization whatsoever. It turned out to just be fast enough to run (up to) 50 iterations on all input numbers. In fact, running that right now takes 0.2s to complete on my machine.
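For comparison, Python's functools.lru_cache gives the same kind of argument-keyed memoization with no extra bookkeeping arguments. This is a Python illustration of the memoization idea, not the Haskell packages mentioned above, and the helper names are hypothetical. The cache key includes the remaining iteration budget as well as the number, which keeps the 50-iteration cutoff correct at the price of fewer cache hits.

```python
from functools import lru_cache


def is_palindrome(n: int) -> bool:
    s = str(n)
    return s == s[::-1]


def reverse_and_add(n: int) -> int:
    return n + int(str(n)[::-1])


@lru_cache(maxsize=None)
def reaches_palindrome(n: int, budget: int) -> bool:
    """True if reverse-and-add yields a palindrome within `budget` steps.

    At least one step is always taken, so a palindromic start does not count
    by itself. Results are cached per (n, budget), so chains that pass through
    the same intermediate number with the same remaining budget are reused."""
    if budget == 0:
        return False
    m = reverse_and_add(n)
    return is_palindrome(m) or reaches_palindrome(m, budget - 1)
```

For example, 349 reaches the palindrome 7337 in three steps (349 → 1292 → 4213 → 7337), so reaches_palindrome(349, 3) is True but reaches_palindrome(349, 2) is False.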
