absolute value in CPLEX C++

I have to use an absolute value in the cost function of a linear problem. The part that bothers me looks like this:
for (t = 0; t < T; t++)
    for (i = 0; i < I; i++) {
        for (j = 1; j < J; j++)
            Sum += |x[i][j][t] - x[i][j][t-1]| * L/2;
        Sum += |x[i][0][t] - x[i][0][t-1]| * V/2;
    }
I am writing my code in C++ and I don't know how to implement the absolute value; x is an integer variable.
I have tried cplex.getValue(x[i][j][t]) - cplex.getValue(x[i][j][t-1]) > 0, but it didn't work.

Since the absolute value function is nonlinear (the reason is explained in this math question), you need to linearize the objective function first.
Basically, you express each absolute-valued term of the summation with a new variable and optimize the sum of these new variables, subject to some additional constraints. The method is explained in detail in Section 7.2 of Thomas S. Ferguson's Linear Programming textbook.
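For concreteness, here is a minimal sketch of that construction in Python using PuLP (an assumption on my part: the question uses the CPLEX C++ API, and the dimensions, bounds, and weights below are placeholders):

import pulp

T, I, J = 3, 2, 2    # toy dimensions (hypothetical)
L, V = 4.0, 6.0      # toy cost weights (hypothetical)

prob = pulp.LpProblem("abs_linearization", pulp.LpMinimize)

# x[i][j][t]: the original integer variables (bounds are placeholders)
x = pulp.LpVariable.dicts("x", (range(I), range(J), range(T)), 0, 10, cat="Integer")
# z[i][j][t]: one auxiliary variable per absolute-valued term (t >= 1 only)
z = pulp.LpVariable.dicts("z", (range(I), range(J), range(1, T)), 0)

terms = []
for t in range(1, T):
    for i in range(I):
        for j in range(J):
            w = V/2 if j == 0 else L/2
            diff = x[i][j][t] - x[i][j][t-1]
            # z >= diff and z >= -diff together force z >= |diff|;
            # minimizing the objective then drives z down to exactly |diff|
            prob += z[i][j][t] >= diff
            prob += z[i][j][t] >= -diff
            terms.append(w * z[i][j][t])

prob += pulp.lpSum(terms)    # the linearized objective

The same trick carries over directly to the Concert C++ API: add one continuous variable and two linear constraints per absolute-valued term.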

Related

Discretizing a PDE in space for use with Modelica

I am currently taking a course called "Modeling of dynamic systems" and have been given the task of modeling a warm water tank in Modelica with a distributed temperature description.
Most of the tasks have gone well, and my group is left with the task of introducing the heat flux due to buoyancy effects into the model. Here is where we got stuck.
The equation given is this:
(image of the given PDE; judging from the discretization below, it relates the spatial derivative of the heat flux to the second spatial derivative of the temperature)
But how do we discretize this into something we can use in Modelica?
The discretized version we ended up with was this:
(Qd_pp_b[k+1] - Qd_pp_b[k]) / h_dz = -K_b *(T[k+1] - 2 * T[k] + T[k-1]) / h_dz^2
where Qd_pp_b is the left-hand-side variable, i.e. the heat flux, k is the current slice of the tank, and T is the temperature in the slices.
Are we on the right path, or completely wrong?
As written this is not a differential equation on its own, so it does not make sense without the surrounding problem. For second derivatives you should always create auxiliary variables, with a separate equation for each partial derivative. I added dummy values for the parameters and dummy equations for T[k]. This can be simulated; is this about what you expected?
model test
  constant Integer n = 10;
  Real[n] Qd_pp_b;
  Real[n] dT;
  Real[n] T;
  parameter Real K_b = 1;
equation
  for k in 1:n loop
    der(Qd_pp_b[k]) = -K_b * der(dT[k]);
    der(T[k]) = dT[k];
    T[k] = sin(time + k);
  end for;
end test;
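As a side note, the stencil from the question can be checked numerically outside Modelica. A minimal NumPy sketch, assuming a uniform grid and placeholder values for K_b, h_dz, and the temperature profile:

import numpy as np

n, h_dz, K_b = 50, 0.1, 1.0
z = np.arange(n) * h_dz
T = np.sin(z)    # placeholder temperature profile

# interior slices: d2T/dz2 ~ (T[k+1] - 2*T[k] + T[k-1]) / h_dz^2
d2T = (T[2:] - 2*T[1:-1] + T[:-2]) / h_dz**2

# (Qd_pp_b[k+1] - Qd_pp_b[k]) / h_dz = -K_b * d2T, integrated up
# from an arbitrary Q[0] = 0 to recover the heat flux profile
Q = np.concatenate(([0.0], np.cumsum(-K_b * d2T) * h_dz))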

Numerical differentiation using Cauchy (CIF)

I am trying to create a module with a mathematical class for Taylor series, to have it easily accessible for other projects. Hence I wish to optimize it as far as I can.
For those who are not too familiar with Taylor series: they require differentiating a function at a point many times. Given that the usual limit definition of the derivative requires immense precision for higher-order derivatives, I've decided to use Cauchy's integral formula instead. With a little bit of work I've managed to rearrange the formula (the original links an image here; parametrizing the unit circle as z = x + e^(i*theta) turns it into f^(n)(x) = (n!/(2*pi)) * integral from 0 to 2*pi of f(x + e^(i*theta)) * e^(-i*n*theta) dtheta, which is what the code below implements). This gave me much more accurate results on higher-order derivatives than the traditional definition. Here is the function I am currently using to differentiate a function at a point:
import numpy as np
from math import factorial

def myDerivative(f, x, dTheta, degree):
    riemannSum = 0
    theta = 0
    while theta < 2*np.pi:
        functionArgument = np.complex128(x + np.exp(1j*theta))
        secondFactor = np.complex128(np.exp(-1j * degree * theta))
        riemannSum += f(functionArgument) * secondFactor * dTheta
        theta += dTheta
    return factorial(degree)/(2*np.pi) * riemannSum.real
I've tested this function in my main function with a carefully chosen mathematical function whose derivatives I know, namely f(x) = sin(x).

def main():
    f = np.sin    # the test function, f(x) = sin(x)
    print(myDerivative(f, 0, 2*np.pi/(4*4096), 16))
These derivatives seem to freak out at around degree 16. I've also tried to play around with dTheta, but with no luck. I would like to reach higher orders as well, but I fear I've run into some kind of machine-precision limit.
My question in its simplest form: what can I do to improve this function in order to get higher-order derivatives?
I seem to have come up with a solution to the problem. I did this by rearranging Cauchy's integral formula in a different way, exploiting the fact that the contour can be an arbitrarily large circle around the point of differentiation. Be aware that it is very important that the function is analytic in the complex plane for this to be valid.
(The original links an image of the new formula here; with a contour of radius r it reads f^(n)(x) = (n!/(2*pi*r^n)) * integral from 0 to 2*pi of f(x + r*e^(i*theta)) * e^(-i*n*theta) dtheta, matching the code below.)
This also gives a new function for differentiation:
def myDerivative(f, x, dTheta, degree, contourRadius):
    riemannSum = 0
    theta = 0
    while theta < 2*np.pi:
        functionArgument = np.complex128(x + contourRadius*np.exp(1j*theta))
        secondFactor = (1/contourRadius)**degree * np.complex128(np.exp(-1j * degree * theta))
        riemannSum += f(functionArgument) * secondFactor * dTheta
        theta += dTheta
    return factorial(degree) * riemannSum.real / (2*np.pi)
This gives me very accurate differentiation at high orders. For instance, I am able to differentiate f(x) = e^x 50 times without a problem.
Well, since you are working with a discrete approximation of the derivative (via dTheta), sooner or later you must run into trouble. I'm surprised you were able to get at least 15 accurate derivatives -- good work! But to get derivatives of all orders, either you have to put a limit on what you're willing to accept and say it's good enough, or else compute the derivatives symbolically. Take a look at Sympy for that. Sympy probably has some functions for computing Taylor series too.
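For what it's worth, here is a minimal SymPy sketch of the symbolic route suggested above (the function and orders are just examples):

import sympy as sp

x = sp.symbols('x')
f = sp.sin(x)

d16 = sp.diff(f, x, 16)    # 16th symbolic derivative; exact, no roundoff
print(d16, d16.subs(x, 0))

print(sp.series(f, x, 0, 8))    # Taylor series of sin(x) around 0, to order 8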

Modelica Time Dependent Equations

I am new to Modelica, and I am wondering if it is possible to write a kind of dynamic-programming equation. Assume time is discretized by an integer i; in my specific application x is Boolean and f is a Boolean function of x.
x(t_i) = f(x(t_{i+d}))
Where d can be a positive or negative integer. Of course, I would initialize x accordingly, either true or false.
Any help or references would be greatly appreciated!
It is possible. In Modelica the discretization in time is usually carried out by the compiler; you only have to take care of the equations (the continuous dynamics). Otherwise, if you want to generate events at discrete time points, you can do it using when statements.
I suggest you take a look at Introduction to Object-Oriented Modeling and Simulation with OpenModelica (PDF format, 6.6 MB), a more recent tutorial (2012) by Peter Fritzson. There is a section on Discrete Events and Hybrid Systems that should clarify how to implement your equations in Modelica.
Below you can find an example from that tutorial: the model of a bouncing ball. As you can see, discretization in time is not considered when you write your dynamic equations. There is the continuous model of the ball, v = der(s) and a = der(v), and then the discrete part inside the when clause that handles the contact with the ground:
model BouncingBall "the bouncing ball model"
  parameter Real g=9.81; //gravitational acc.
  parameter Real c=0.90; //elasticity constant
  Real height(start=10), velocity(start=0);
equation
  der(height) = velocity;
  der(velocity) = -g;
  when height < 0 then
    reinit(velocity, -c*velocity);
  end when;
end BouncingBall;
Hope this helps,
Marco
If I understand your question, you want to use the last n evaluations of x to determine the next value of x. If so, this code shows how to do this:
model BooleanHistory
  parameter Integer n=10 "How many points to keep";
  parameter Modelica.SIunits.Time dt=1e-3;
protected
  Boolean x[n];
  function f
    input Integer n;
    input Boolean past[n-1];
    output Boolean next;
  algorithm
    next := not past[1]; // Example
  end f;
initial equation
  x = {false for i in 1:n};
equation
  when sample(0,dt) then
    x[2:n] = pre(x[1:(n-1)]);
    x[1] = f(n, x[2:n]);
  end when;
end BooleanHistory;

What is an efficient way to compute the Dice coefficient between 900,000 strings?

I have a corpus of 900,000 strings. They vary in length, but have an average character count of about 4,500. I need to find the most efficient way of computing the Dice coefficient of every string as it relates to every other string. Unfortunately, this results in the Dice coefficient algorithm being used some 810,000,000,000 times.
What is the best way to structure this program for increased efficiency? Obviously, I can prevent computing the Dice of sections A and B, and then B and A--but this only halves the work required. Should I consider taking some shortcuts or creating some sort of binary tree?
I'm using the following implementation of the Dice coefficient algorithm in Java:
public static double diceCoefficient(String s1, String s2) {
    Set<String> nx = new HashSet<String>();
    Set<String> ny = new HashSet<String>();
    for (int i = 0; i < s1.length() - 1; i++) {
        char x1 = s1.charAt(i);
        char x2 = s1.charAt(i + 1);
        String tmp = "" + x1 + x2;
        nx.add(tmp);
    }
    for (int j = 0; j < s2.length() - 1; j++) {
        char y1 = s2.charAt(j);
        char y2 = s2.charAt(j + 1);
        String tmp = "" + y1 + y2;
        ny.add(tmp);
    }
    Set<String> intersection = new HashSet<String>(nx);
    intersection.retainAll(ny);
    double totcombigrams = intersection.size();
    return (2 * totcombigrams) / (nx.size() + ny.size());
}
My ultimate goal is to output an ID for every section that has a Dice coefficient of greater than 0.9 with another section.
Thanks for any advice that you can provide!
Make a single pass over all the Strings, and build up a HashMap which maps each bigram to a set of the indexes of the Strings which contain that bigram. (Currently you are building the bigram set 900,000 times, redundantly, for each String.)
Then make a pass over all the sets, and build a HashMap of [index,index] pairs to common-bigram counts. (The latter Map should not contain redundant pairs of keys, like [1,2] and [2,1] -- just store one or the other.)
Both of these steps can easily be parallelized. If you need some sample code, please let me know.
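A minimal sketch of those two passes, in Python rather than the question's Java (strings is a hypothetical list of the 900,000 sections; memory use is not addressed):

from collections import defaultdict
from itertools import combinations

def bigrams(s):
    return {s[i:i+2] for i in range(len(s) - 1)}

def dice_pairs(strings, threshold=0.9):
    grams = [bigrams(s) for s in strings]    # build each bigram set exactly once

    # pass 1: inverted index, bigram -> indexes of strings containing it
    index = defaultdict(set)
    for i, g in enumerate(grams):
        for b in g:
            index[b].add(i)

    # pass 2: common-bigram counts per unordered pair (store [i,j] with i < j only)
    common = defaultdict(int)
    for ids in index.values():
        for i, j in combinations(sorted(ids), 2):
            common[(i, j)] += 1

    # Dice = 2*|A intersect B| / (|A| + |B|); keep pairs above the threshold
    return [(i, j) for (i, j), c in common.items()
            if 2 * c / (len(grams[i]) + len(grams[j])) > threshold]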
NOTE one thing, though: from the 26 letters of the English alphabet, a total of 26x26 = 676 bigrams can be formed. Many of these will never or almost never be found, because they don't conform to the rules of English spelling. Since you are building up sets of bigrams for each String, and the Strings are so long, you will probably find almost the same bigrams in each String. If you were to build up lists of bigrams for each String (in other words, if the frequency of each bigram counted), it's more likely that you would actually be able to measure the degree of similarity between Strings, but then the calculation of Dice's coefficient as given in the Wikipedia article wouldn't work; you'd have to find a new formula.
I suggest you continue researching algorithms for determining similarity between Strings, try implementing a few of them, and run them on a smaller set of Strings to see how well they work.
You could try to come up with some kind of triangle-like inequality: if D(X1,X2) > 1-p and D(X1,X3) < 1-q, then D(X2,X3) < 1-q+p, or something like that. Now, if 1-q+p < 0.9, then you probably don't have to evaluate D(X2,X3).
PS: I am not sure about this exact inequality, but I have a gut feeling that it might be right (I do not have enough time to actually do the derivations now). Look at the inequalities known for other similarity measures and see if any of them are valid for the Dice coefficient.
=== Also ===
If set A has a elements and your threshold is r (= 0.9), then the number of elements b in set B must satisfy r*a/(2-r) <= b <= (2-r)*a/r. This should eliminate the need for lots of comparisons IMHO. You can sort the strings by length and use the window described above to limit comparisons.
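(That window follows directly from the definition: since |A intersect B| <= min(a,b), Dice = 2*|A intersect B|/(a+b) <= 2b/(a+b) whenever b <= a, and requiring this upper bound to reach r gives b >= r*a/(2-r); swapping the roles of a and b gives b <= (2-r)*a/r. With r = 0.9 the window is roughly 0.818*a <= b <= 1.222*a.)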
Disclaimer first: This will not reduce the number of comparisons you'll have to make. But this should make a Dice comparison faster.
1) Don't build your HashSets every time you do a diceCoefficient() call! It should speed things up considerably if you just do it once for each string and keep the result around.
2) Since you only care about whether a particular bigram is present in the string, you can get away with a BitSet with one bit for each possible bigram, rather than a full HashMap. The coefficient calculation is then simplified to ANDing two bit sets and counting the number of set bits in the result (see the sketch after this list).
3) Or, if you have a huge number of possible bigrams (Unicode, perhaps?) -- or monotonous strings with only a handful of bigrams each -- a sorted array of bigrams might provide faster, more space-efficient comparisons.
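A rough sketch of suggestion 2, using Python ints as bit sets and assuming (hypothetically) a lowercase a-z alphabet, so there are 26*26 = 676 possible bigrams:

def bigram_bits(s):
    bits = 0
    for a, b in zip(s, s[1:]):
        # map each bigram to one of the 676 bit positions
        bits |= 1 << ((ord(a) - ord('a')) * 26 + (ord(b) - ord('a')))
    return bits

def dice(bits1, n1, bits2, n2):
    common = bin(bits1 & bits2).count("1")   # AND, then count set bits
    return 2 * common / (n1 + n2)

# precompute once per string:
# masks = [bigram_bits(s) for s in strings]
# sizes = [bin(m).count("1") for m in masks]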
Is the character set limited somehow? If it is, you can pre-compute the character counts, by code, for each string and compare those count vectors. Such a pre-computation will occupy 2*900K*S bytes of memory (if we assume no character occurs more than 65K times in the same string), where S is the number of distinct characters. Computing the coefficient would then take O(S) time per pair. Of course, this is only helpful if S < 4500.

MATLAB: fastest way to do a root-mean-squared error between a vector and array of vectors

I have a question regarding the fastest way to compute the RMSE between a single vector and an array of vectors. Specifically, I have a vector A representing a point and would like to find the index in a list B of points that A is closest to. Right now I am using:
tempmat = bsxfun(@minus, A, B);
tempmat1 = sqrt(sum(tempmat.^2, 2));
index = find(tempmat1 == min(tempmat1));
This takes about 0.058 seconds to calculate the index. Is there a faster way in MATLAB of doing this? I perform this calculation literally millions of times.
Many thanks for reading,
Joe
tempmat = bsxfun(@minus, A, B);
tempmat1 = sum(tempmat.^2, 2);
[m, index] = min(tempmat1);
m = sqrt(m); % optional, only if you need the actual numerical value
This avoids calculating sqrt on the whole array, since the minimum of the squared differences will have the same index. It also uses the second output of min to avoid the second pass of find.
You'll probably find that
tempmat = A - B(ones(1, size(A,1)), :)
is faster than the bsxfun version, unless size(A,1) is exceptionally large.
This assumes that A is your array and B is your vector. The RSS calculation implies that you have row vectors.
Also, I presume you know that you're calculating the RSS not RMS.
