When to use sum vs. lpSum in pulp?

In the case study "A Set Partitioning Problem" in the pulp documentation, "sum" is used to express the constraints, for example:
# A guest must be seated at one and only one table
for guest in guests:
    seating_model += sum([x[table] for table in possible_tables
                          if guest in table]) == 1, "Must_seat_%s" % guest
Whereas "lpSum" seems to be used otherwise. Consider for example the following constraint in the case study A Transportation Problem
for b in Bars:
prob += lpSum([vars[w][b] for w in Warehouses])>=demand[b], "Sum_of_Products_into_Bar%s"%b
Why is "sum" used in the first example? And when to use "sum" v.s. "lpSum"?

You should use lpSum every time you have at least one pulp variable inside the sum expression. Using sum is not wrong, just less efficient. We should probably change the docs so they are consistent. Feel free to open an issue or, even better, submit a PR so we can correct the Set Partitioning Problem example.
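For illustration, here is a minimal, self-contained sketch of the same constraint written with lpSum. The guest list, table sizes, and objective are toy assumptions made only so the snippet runs; the variable names follow the documentation example.
import itertools
import pulp

guests = ["A", "B", "C", "D"]          # toy data, not the case study's data
max_table_size = 2
possible_tables = [tuple(c) for size in range(1, max_table_size + 1)
                   for c in itertools.combinations(guests, size)]

x = pulp.LpVariable.dicts("table", possible_tables, cat="Binary")
seating_model = pulp.LpProblem("Toy_Seating_Model", pulp.LpMinimize)
seating_model += pulp.lpSum([x[table] for table in possible_tables])  # minimise tables used

# Same constraint as above, but built with lpSum: one LpAffineExpression is
# assembled in a single pass, instead of Python's sum() creating a fresh
# intermediate expression for every term.
for guest in guests:
    seating_model += (pulp.lpSum([x[table] for table in possible_tables
                                  if guest in table]) == 1, "Must_seat_%s" % guest)
The same pattern applies to the transportation example: whenever the terms are pulp variables or expressions, lpSum generally avoids the repeated expression-copying that the built-in sum incurs.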

Related

How is it that, in Spark, reduce and aggregate are the same?

• In math, we think of "reduce" when a numerator and denominator share the same multiplier and that multiplier is "reduced" away, leaving a simpler form (as in "is divisible by").
• In math, an "aggregate", while similar, does not produce the same value in a reduced form; instead, an aggregate produces a single value that is representative of the whole, the whole being a derived state of the data used primarily for statistical purposes. For example: "Out of 10 sales people, we generated $60000 in capital sales."
• https://docs.databricks.com/sql/language-manual/functions/reduce.html
• https://docs.databricks.com/sql/language-manual/functions/aggregate.html
Apparently, in Apache Spark, reduce means the same thing as aggregate, as explained by Databricks.
Can someone clarify the difference, or explain how these two words (reduce and aggregate), which carry perceptibly different meanings, can be considered the same?
From the docs you posted, you can find the explanation of reduce, which is:
This function is a synonym for aggregate function.
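To see the synonymy concretely, here is a small PySpark check. Assumptions: a local Spark session; aggregate is available from Spark 2.4 onward, while the reduce synonym only exists on newer runtimes (such as recent Databricks versions), so the second column may fail on older Spark builds.
from pyspark.sql import SparkSession
from pyspark.sql.functions import expr

spark = SparkSession.builder.appName("reduce_vs_aggregate").getOrCreate()
df = spark.createDataFrame([([1, 2, 3, 4],)], ["xs"])

df.select(
    # aggregate: fold the array into one value, starting from the accumulator 0
    expr("aggregate(xs, 0, (acc, x) -> acc + x)").alias("agg_sum"),
    # reduce: documented as a synonym, so it produces the same value (10)
    expr("reduce(xs, 0, (acc, x) -> acc + x)").alias("reduce_sum"),
).show()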

How to limit the solution domain in Modelica

I have a very simple model in OpenModelica.
model doubleSolution
  Real x;
equation
  x^2 - 4 = 0;
end doubleSolution;
There are two mathematical solutions to this problem: x = {-2, +2}.
The OpenModelica solver will provide just one result, in this case +2.
What if I'm interested in the other solution?
Using a proper start value, e.g. Real x(start=-7), might help as a workaround, but I'm not sure if this is always a robust solution. I'd prefer to directly limit the solution range, e.g. by (x < 0). Are such boundary conditions possible?
As you already noticed, using a start value is one option. Whether that is a robust solution depends on how good the start value is. For this example the Newton-Raphson method is used, which depends heavily on a good start value.
You can use max and min to give a variable a range in which it is valid.
Check, for example, section 4.8.1 "Real Type" of the Modelica Language Specification to see what attributes the type Real has.
Together with a good start value this should be robust enough, and it will at least give you a warning if x becomes bigger than 0.0.
model doubleSolution
  Real x(max=0, start=-7);
equation
  x^2 - 4 = 0;
end doubleSolution;
Another option would be to add an assert to the equations:
assert(value >= min and value <= max , "Variable value out of limit");
For the min and max attributes this assert is added automatically.

PSPP: How can I perform a Wilcoxon test against a single reference value

I am using PSPP and want to compare a sample's median against a given median value, but all the options I find compare two variables against each other. I have tried a workaround by defining a variable that is filled with my reference value:
NPAR TEST
  /WILCOXON [Variable of my actual data set] WITH [Variable filled with my reference value].
NPAR TEST
  /SIGN [Variable of my actual data set] WITH [Variable filled with my reference value].
But this is a) a dodgy workaround, and b) the results are nowhere near the Wilcoxon test result I get with SPSS.
What would be the correct syntax, or a better workaround?
As far as I can see, PSPP does not have a direct implementation of this; however, your workaround of comparing against a second variable filled with a constant equal to the reference median appears to be correct rather than "a dodgy workaround".
This is confirmed in this IBM answer to the same question regarding SPSS:
The one-sample Wilcoxon test can also be handled as a special case of the Wilcoxon matched pairs test, with the second variable being a constant value equal to the null hypothesized value against which you want to test. Simply compute a constant variable, then use that along with your variable of interest in the paired samples test. For a discussion of why this is legitimate, refer to a nonparametric statistics text such as Section 5.1 of W. J. Conover's (1971) Practical Nonparametric Statistics (Wiley).
I don't have access to that textbook to check their answer, but I would consider IBM a reliable source anyway. I do not know why you would have found a different result using this method; it seems to behave correctly when I try it. Perhaps double-check that you entered the data in the same way under SPSS and PSPP?
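If you want an independent cross-check of the numbers outside PSPP/SPSS, a small SciPy sketch (made-up data and a hypothetical reference median, purely for illustration) does the same thing: the one-sample Wilcoxon against a reference is the signed-rank test on the differences from that reference, which is exactly what pairing with a constant variable computes.
import numpy as np
from scipy import stats

data = np.array([4.8, 5.1, 5.6, 4.9, 5.3, 5.7, 5.2, 5.4])  # made-up sample
reference = 5.0                                             # hypothesized median

# Paired test against a constant second "variable" (the PSPP workaround) ...
stat_paired, p_paired = stats.wilcoxon(data, np.full_like(data, reference))
# ... is the same as the signed-rank test on the differences:
stat_diff, p_diff = stats.wilcoxon(data - reference)

print(stat_paired, p_paired)
print(stat_diff, p_diff)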

Why would more array accesses perform better?

I'm taking a course on Coursera that uses MiniZinc. In one of the assignments, I was spinning my wheels forever because my model was not performing well enough on a hidden test case. I finally solved it by changing the following type of access in my model
from
constraint sum(neg1,neg2 in party where neg1 < neg2)(joint[neg1,neg2]) >= m;
to
constraint sum(i,j in 1..u where i < j)(joint[party[i],party[j]]) >= m;
I don't know what I'm missing, but why would these two perform any differently from each other? It seems like they should perform similarly, with the former maybe being slightly faster, but the performance difference was dramatic. I'm guessing there is some sort of optimization that the former misses out on? Or am I really missing something, and do those lines actually result in different behavior? My intention is to sum the joint strength of every pair of elements in party.
Misc. details:
• party is an array of enum vars
• party's index set is 1..real_u
• every element in party should be unique except for a dummy variable
• the solver was Gecode
• verification of my model was done on a Coursera server, so I don't know what optimization level their compiler used
edit: Since MiniZinc (mz) is a declarative language, I'm realizing that "array accesses" in mz don't necessarily have a direct counterpart in an imperative language. However, to me, these two lines mean the same thing semantically. So I guess my question is more "Why are the above lines different semantically in mz?"
edit2: I had to change the example in question; I was close to crossing the line into violating Coursera's honor code.
The difference stems from the way in which the where-clause "a < b" is evaluated. When "a" and "b" are parameters, the compiler can already exclude the irrelevant parts of the sum during compilation. If "a" or "b" is a variable, then this can usually not be decided at compile time, and the solver will receive a more complex constraint.
In this case the solver would have gotten a sum over "array[int] of var opt int", meaning that some variables in the array might not actually be present. For most solvers this is rewritten to a sum in which every variable is multiplied by a boolean variable that is true iff the variable is present. You can see how this is less efficient than a normal sum without multiplications.
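As a rough illustration (plain Python with toy numbers, not MiniZinc semantics), this is the structural difference between what the solver receives in the two cases:
# Parameter indices: the filter i < j is resolved while the model is flattened,
# so the solver only ever sees the terms that survive the where-clause.
u = 4
joint = [[10 * i + j for j in range(u)] for i in range(u)]   # toy coefficients
plain_terms = [joint[i][j] for i in range(u) for j in range(u) if i < j]
plain_sum = sum(plain_terms)

# Decision-variable indices: whether a term belongs to the sum is itself unknown,
# so every candidate term is paired with a 0/1 "is present" indicator and the
# solver has to reason about the products indicator * term.
present = [1, 0, 1, 1, 1, 0]          # stand-ins for the solver's booleans
opt_sum = sum(b * t for b, t in zip(present, plain_terms))

print(plain_sum, opt_sum)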

A reverse inference engine (find a random X for which foo(X) is true)

I am aware that languages like Prolog allow you to write things like the following:
mortal(X) :- man(X). % All men are mortal
man(socrates). % Socrates is a man
?- mortal(socrates). % Is Socrates mortal?
yes
What I want is something like this, but backwards. Suppose I have this:
mortal(X) :- man(X).
man(socrates).
man(plato).
man(aristotle).
I then ask it to give me a random X for which mortal(X) is true (thus it should give me one of 'socrates', 'plato', or 'aristotle' according to some random seed).
My questions are:
Does this sort of reverse inference have a name?
Are there any languages or libraries that support it?
EDIT
As somebody below pointed out, you can simply ask mortal(X) and it will return all X, from which you can pick a random one. What if, however, that list were very large, perhaps in the billions? Obviously, in that case, it wouldn't do to generate every possible result before picking one.
To see how this would be a practical problem, imagine a simple grammar that generates a random sentence of the form "adjective1 noun1 adverb transitive_verb adjective2 noun2". If the lists of adjectives, nouns, verbs, etc. are very large, you can see how the combinatorial explosion is a problem. If each list had 1000 words, you'd have 1000^6 possible sentences.
Instead of Prolog's depth-first search, a randomized depth-first search strategy could easily be implemented. All that is required is to randomize the program flow at choice points, so that every time a disjunction is reached a random branch of the search tree (= Prolog program) is selected instead of the first one.
Note, though, that this approach does not guarantee that all solutions are equally probable. To guarantee that, you would need to know in advance how many solutions each branch will generate, in order to weight the randomization accordingly.
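As a sketch of the idea (plain Python rather than Prolog, with the man/1 facts as a hypothetical list), randomizing the choice point looks like this:
import random

men = ["socrates", "plato", "aristotle"]   # the man/1 facts

def mortal_solutions(rng=random):
    alternatives = list(men)
    rng.shuffle(alternatives)              # randomize the choice point
    for candidate in alternatives:         # backtracking = trying the next one
        # a richer program would check further goals here and fall through
        # to the next alternative on failure
        yield candidate

print(next(mortal_solutions()))            # one of socrates / plato / aristotle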
I've never used Prolog or anything similar, but judging by what Wikipedia says on the subject, asking
?- mortal(X).
should list everything for which mortal is true. After that, just pick one of the results.
So, to answer your questions:
• I'd go with "a query with a variable in it".
• From what I can tell, Prolog itself supports it just fine.
I don't think that you can calculate the nth solution directly, but you can calculate the first n solutions (with n randomly picked) and take the last one. Of course, this would be problematic if n = 10^(big_number)...
You could also do something like
mortal(X) :- man(X).
man(X) :- random(1, 4, ID), man(ID, X).
man(1, socrates).
man(2, plato).
man(3, aristotle).
but the problem is that if not every man were mortal, for example if only 1 out of 1000000 were mortal, you would have to search a lot. It would be like searching for solutions to an equation by trying random numbers until you find one.
You could develop some sort of heuristic to find a solution close to the number, but that may negatively affect the randomness.
I suspect that there is no way to do it more efficiently: you either have to calculate the set of solutions and pick one, or keep picking members of the superset of all solutions until you find one that is a solution. But don't take my word for it.
