Choco Solver ICF constraint to define the standard deviation of an IntVar array within limit - constraint-programming

Say, I am having an IntVar array
int n = 10;
IntVar[] x = VariableFactory.boundedArray("x", n, 0, 100, solver);
I need to define a constraint that restricts the standard deviation(can be a number with decimal points) of this array less than a predefined real number, say 3.45.

The deviation constraint is not (yet) implemented in choco. My company can implement it and add it to the library for you if you wish. Contact us to get a commercial offer (https://www.cosling.com/#contact).
Otherwise, you may encode the deviation as a continuous constraint (as in this example https://github.com/chocoteam/choco-solver/blob/master/choco-samples/src/main/java/org/chocosolver/samples/real/SmallSantaClaude.java) but it requires to install Ibex solver, with the jni bridge (http://www.ibex-lib.org/doc/java-install.html).
Best,
Jean-Guillaume Fages
https://www.cosling.com/

Related

Simultaneous Subset sums

I am dealing with a problem which is a variant of a subset-sum problem, and I am hoping that the additional constraint could make it easier to solve than the classical subset-sum problem. I have searched for a problem with this constraint but I have been unable to find a good example with an appropriate algorithm either on StackOverflow or through googling elsewhere.
The problem:
Assume you have two lists of positive numbers A1,A2,A3... and B1,B2,B3... with the same number of elements N. There are two sums Sa and Sb. The problem is to find the simultaneous set Q where |sum (A{Q}) - Sa| <= epsilon and |sum (B{Q}) - Sb| <= epsilon. So, if Q is {1, 5, 7} then A1 + A5 + A7 - Sa <= epsilon and B1 + B5 + B7 - Sb <= epsilon. Epsilon is an arbitrarily small positive constant.
Now, I could solve this as two completely separate subset sum problems, but removing the simultaneity constraint results in the possibility of erroneous solutions (where Qa != Qb). I also suspect that the additional constraint should make this problem easier than the two NP complete problems. I would like to solve an instance with 18+ elements in both lists of numbers, and most subset-sum algorithms have a long run time with this number of elements. I have investigated the pseudo-polynomial run time dynamic programming algorithm, but this has the problems that a) the speed relies on a short bit-depth of the list of numbers (which does not necessarily apply to my instance) and b) it does not take into account the simultaneity constraint.
Any advice on how to use the simultaneity constraint to reduce the run time? Is there a dynamic programming approach I could use to take into account this constraint?
If I understand your description of the problem correctly (I'm confused about why you have the distance symbols around "sum (A{Q}) - Sa" and "sum (B{Q}) - Sb", it doesn't seem to fit the rest of the explanation), then it is in NP.
You can see this by making a reduction from Subset sum (SUB) to Simultaneous subset sum (SIMSUB).
If you have a SUB problem consisting of a set X = {x1,x2,...,xn} and a target called t and you have an algorithm that solves SIMSUB when given two sets A = {a1,a2,...,an} and B = {b1,b2,...,bn}, two intergers Sa and Sb and a value for epsilon then we can solve SUB like this:
Let A = X and let B be a set of length n consisting of only 0's. Set Sa = t, Sb = 0 and epsilon = 0. You can now run the SIMSUB algorithm on this problem and get the solution to your SUB problem.
This shows that SUBSIM is as least as hard as SUB and therefore in NP.

Numerical differentiation using Cauchy (CIF)

I am trying to create a module with a mathematical class for Taylor series, to have it easily accessible for other projects. Hence I wish to optimize it as far as I can.
For those who are not too familiar with Taylor series, it will be a necessity to be able to differentiate a function in a point many times. Given that the normal definition of the mathematical derivative of a function will require immense precision for higher order derivatives, I've decided to use Cauchy's integral formula instead. With a little bit of work, I've managed to rearrange the formula a little bit, as you can see on this picture: Rearranged formula. This provided me with much more accurate results on higher order derivatives than the traditional definition of the derivative. Here is the function i am currently using to differentiate a function in a point:
def myDerivative(f, x, dTheta, degree):
riemannSum = 0
theta = 0
while theta < 2*np.pi:
functionArgument = np.complex128(x + np.exp(1j*theta))
secondFactor = np.complex128(np.exp(-1j * degree * theta))
riemannSum += f(functionArgument) * secondFactor * dTheta
theta += dTheta
return factorial(degree)/(2*np.pi) * riemannSum.real
I've tested this function in my main function with a carefully thought out mathematical function which I know the derivatives of, namely f(x) = sin(x).
def main():
print(myDerivative(f, 0, 2*np.pi/(4*4096), 16))
pass
These derivatives seems to freak out at around the derivative of degree 16. I've also tried to play around with dTheta, but with no luck. I would like to have higher orders as well, but I fear I've run into some kind of machine precission.
My question is in it's simplest form: What can I do to improve this function in order to get higher order of my derivatives?
I seem to have come up with a solution to the problem. I did this by rearranging Cauchy's integral formula in a different way, by exploiting that the initial contour integral can be an arbitrarily large circle around the point of differentiation. Be aware that it is very important that the function is analytic in the complex plane for this to be valid.
New formula
Also this gives a new function for differentiation:
def myDerivative(f, x, dTheta, degree, contourRadius):
riemannSum = 0
theta = 0
while theta < 2*np.pi:
functionArgument = np.complex128(x + contourRadius*np.exp(1j*theta))
secondFactor = (1/contourRadius)**degree*np.complex128(np.exp(-1j * degree * theta))
riemannSum += f(functionArgument) * secondFactor * dTheta
theta += dTheta
return factorial(degree) * riemannSum.real / (2*np.pi)
This gives me a very accurate differentiation of high orders. For instance I am able to differentiate f(x)=e^x 50 times without a problem.
Well, since you are working with a discrete approximation of the derivative (via dTheta), sooner or later you must run into trouble. I'm surprised you were able to get at least 15 accurate derivatives -- good work! But to get derivatives of all orders, either you have to put a limit on what you're willing to accept and say it's good enough, or else compute the derivatives symbolically. Take a look at Sympy for that. Sympy probably has some functions for computing Taylor series too.

How do I define a collection of values with units?

I want to define a collection of data arrays and associate each with units. For instance, one member of the collection could be Vector Float to which I somehow associate the units meters. Another member could be Vector Int to which I associate the units counts. Another could be Vector Double to which I associate the units centimeters. My use case is that for each member of the collection, I want to:
Display the precision of the data (i.e. Int vs Float vs Double vs Int64)
Display the numeric values according to the precision (i.e. [1.0, 3.0, 4.0] or [1, 3, 4] as appropriate).
Display the significance of the values, including units (i.e. Length meters, Count, Length centimeters).
Each data array is read from a file that defines the array of values, the precision, and the units, so none of these are available at compile time.
How can I define the type for such a collection, say, as an IntMap? Can I do so using existing units related libraries or do I need to roll my own? I am open to using any GHC extensions needed to get the job done (such as GADTs).
I have encountered the following issues in trying to formulate a solution.
Vector Float, Vector Double, and Vector Int are all different types, so I cannot put them into the same collection without existential quantification such as in the answer to this question. However, if I use existential quantification using the Num class as a constraint, I remove the precision information.
The units in the dimensional package are at the type level, so arrays with quantity Length cannot be in the same collection as arrays with quantity Time. Again, this might be solved with existential quantification, and again, destroying the needed information. Other units packages I've found treat units on the type level as well.
One possible solution is to take the units and precision information out of the type level and into the data level:
data Quantity = Length LengthUnit
| Time TimeUnit
| Count
| Angle AngleUnit
etc
data TimeUnit = Sec | MSec | Hour | Day
data AngleUnit = Deg | Rad | etc
data LengthUnit = Meter | Centimeter | Mile
etc
data Precision = Int | Float | Double
data DataArray quant prec = forall e . Num e => DataArray Quantity Precision (Vector e)
(This simple units system could be improved by adding prefixes, conversion factors, etc, like units packages usually do. The example above is just for illustrative purposes.)
This presents a number of problems as well. For one, it's easy to define a DataArray that has inconsistent type such as DataArray Angle Int (Vector Float). It also smells of reinventing the wheel, both in implementing a whole new units system and redundantly defining the precision of the data.
I feel like I am overcomplicating something that should be simple. What is a type strategy that achieves my usage goals?

Robust higher statistical moments in IDL

I'm working with noisy data in IDL, so I've been using STDDEV and robust_sigma
. There are papers on robust skewness and kurtosis, for instance [1] and [2], but are there implementations, as for standard deviation? (in IDL or maybe C?)
The documentation of http://idlastro.gsfc.nasa.gov/ftp/pro/robust/robust_sigma.pro states:
; OPTIONAL OUPTUT KEYWORD:
; GOODVEC = Vector of non-trimmed indices of the input vector
So one calls robust_sigma with an extra parameter that keeps track of the "good indices" in the data, those used to compute the robust_sigma, as opposed to those ignored in its computation.
good_indices = lonarr(width)
robo_2 = robust_sigma(data[*], GOODVEC=good_indices)
Then use (only) those good indices to compute the other moments.
robo_3 = skewness(data[good_indices])
robo_4 = kurtosis(data[good_indices])
No need for a special implementation.

Is there a language with constrainable types?

Is there a typed programming language where I can constrain types like the following two examples?
A Probability is a floating point number with minimum value 0.0 and maximum value 1.0.
type Probability subtype of float
where
max_value = 0.0
min_value = 1.0
A Discrete Probability Distribution is a map, where: the keys should all be the same type, the values are all Probabilities, and the sum of the values = 1.0.
type DPD<K> subtype of map<K, Probability>
where
sum(values) = 1.0
As far as I understand, this is not possible with Haskell or Agda.
What you want is called refinement types.
It's possible to define Probability in Agda: Prob.agda
The probability mass function type, with sum condition is defined at line 264.
There are languages with more direct refinement types than in Agda, for example ATS
You can do this in Haskell with Liquid Haskell which extends Haskell with refinement types. The predicates are managed by an SMT solver at compile time which means that the proofs are fully automatic but the logic you can use is limited by what the SMT solver handles. (Happily, modern SMT solvers are reasonably versatile!)
One problem is that I don't think Liquid Haskell currently supports floats. If it doesn't though, it should be possible to rectify because there are theories of floating point numbers for SMT solvers. You could also pretend floating point numbers were actually rational (or even use Rational in Haskell!). With this in mind, your first type could look like this:
{p : Float | p >= 0 && p <= 1}
Your second type would be a bit harder to encode, especially because maps are an abstract type that's hard to reason about. If you used a list of pairs instead of a map, you could write a "measure" like this:
measure total :: [(a, Float)] -> Float
total [] = 0
total ((_, p):ps) = p + probDist ps
(You might want to wrap [] in a newtype too.)
Now you can use total in a refinement to constrain a list:
{dist: [(a, Float)] | total dist == 1}
The neat trick with Liquid Haskell is that all the reasoning is automated for you at compile time, in return for using a somewhat constrained logic. (Measures like total are also very constrained in how they can be written—it's a small subset of Haskell with rules like "exactly one case per constructor".) This means that refinement types in this style are less powerful but much easier to use than full-on dependent types, making them more practical.
Perl6 has a notion of "type subsets" which can add arbitrary conditions to create a "sub type."
For your question specifically:
subset Probability of Real where 0 .. 1;
and
role DPD[::T] {
has Map[T, Probability] $.map
where [+](.values) == 1; # calls `.values` on Map
}
(note: in current implementations, the "where" part is checked at run-time, but since "real types" are checked at compile-time (that includes your classes), and since there are pure annotations (is pure) inside the std (which is mostly perl6) (those are also on operators like *, etc), it's only a matter of effort put into it (and it shouldn't be much more).
More generally:
# (%% is the "divisible by", which we can negate, becoming "!%%")
subset Even of Int where * %% 2; # * creates a closure around its expression
subset Odd of Int where -> $n { $n !%% 2 } # using a real "closure" ("pointy block")
Then you can check if a number matches with the Smart Matching operator ~~:
say 4 ~~ Even; # True
say 4 ~~ Odd; # False
say 5 ~~ Odd; # True
And, thanks to multi subs (or multi whatever, really – multi methods or others), we can dispatch based on that:
multi say-parity(Odd $n) { say "Number $n is odd" }
multi say-parity(Even) { say "This number is even" } # we don't name the argument, we just put its type
#Also, the last semicolon in a block is optional
Nimrod is a new language that supports this concept. They are called Subranges. Here is an example. You can learn more about the language here link
type
TSubrange = range[0..5]
For the first part, yes, that would be Pascal, which has integer subranges.
The Whiley language supports something very much like what you are saying. For example:
type natural is (int x) where x >= 0
type probability is (real x) where 0.0 <= x && x <= 1.0
These types can also be implemented as pre-/post-conditions like so:
function abs(int x) => (int r)
ensures r >= 0:
//
if x >= 0:
return x
else:
return -x
The language is very expressive. These invariants and pre-/post-conditions are verified statically using an SMT solver. This handles examples like the above very well, but currently struggles with more complex examples involving arrays and loop invariants.
For anyone interested, I thought I'd add an example of how you might solve this in Nim as of 2019.
The first part of the questions is straightfoward, since in the interval since since this question was asked, Nim has gained the ability to generate subrange types on floats (as well as ordinal and enum types). The code below defines two new float subranges types, Probability and ProbOne.
The second part of the question is more tricky -- defining a type with constrains on a function of it's fields. My proposed solution doesn't directly define such a type but instead uses a macro (makePmf) to tie the creation of a constant Table[T,Probability] object to the ability to create a valid ProbOne object (thus ensuring that the PMF is valid). The makePmf macro is evaluated at compile time, ensuring that you can't create an invalid PMF table.
Note that I'm a relative newcomer to Nim so this may not be the most idiomatic way to write this macro:
import macros, tables
type
Probability = range[0.0 .. 1.0]
ProbOne = range[1.0..1.0]
macro makePmf(name: untyped, tbl: untyped): untyped =
## Construct a Table[T, Probability] ensuring
## Sum(Probabilities) == 1.0
# helper templates
template asTable(tc: untyped): untyped =
tc.toTable
template asProb(f: float): untyped =
Probability(f)
# ensure that passed value is already is already
# a table constructor
tbl.expectKind nnkTableConstr
var
totprob: Probability = 0.0
fval: float
newtbl = newTree(nnkTableConstr)
# create Table[T, Probability]
for child in tbl:
child.expectKind nnkExprColonExpr
child[1].expectKind nnkFloatLit
fval = floatVal(child[1])
totprob += Probability(fval)
newtbl.add(newColonExpr(child[0], getAst(asProb(fval))))
# this serves as the check that probs sum to 1.0
discard ProbOne(totprob)
result = newStmtList(newConstStmt(name, getAst(asTable(newtbl))))
makePmf(uniformpmf, {"A": 0.25, "B": 0.25, "C": 0.25, "D": 0.25})
# this static block will show that the macro was evaluated at compile time
static:
echo uniformpmf
# the following invalid PMF won't compile
# makePmf(invalidpmf, {"A": 0.25, "B": 0.25, "C": 0.25, "D": 0.15})
Note: A cool benefit of using a macro is that nimsuggest (as integrated into VS Code) will even highlight attempts to create an invalid Pmf table.
Modula 3 has subrange types. (Subranges of ordinals.) So for your Example 1, if you're willing to map probability to an integer range of some precision, you could use this:
TYPE PROBABILITY = [0..100]
Add significant digits as necessary.
Ref: More about subrange ordinals here.

Resources