SVM KKT condition violation and alpha choice in SMO

While reading John C. Platt's paper on SMO for SVM, the way it chooses an alpha that violates the KKT conditions confused me. I do not fully understand the code below.
if ((r2 < -tol && alph2 < C) || (r2 > tol && alph2 > 0))
Why does it check r2 < -tol when alph2 < C, and not r2 < tol?

The Importance of the KKT Conditions
First, note that the KKT conditions for SVM are especially helpful for weeding out vectors that are not support vectors. To see this, consider a (non-support) point with label 1 that lies far above the hyperplane, i.e., correctly classified. If you plug that point into the decision function, you will get a value like 2.5, precisely because it is not a support vector. Points like that are systematically weeded out during the optimization phase of SVM, so that the only points with nonzero weight are the support vectors close to the hyperplane.
Given: the KKT conditions are violated when:
[a_i < C and y_i (w·x_i + b) < 1], or
[a_i > 0 and y_i (w·x_i + b) > 1]
Now let's look at the pseudocode you mentioned from Platt.
Pseudocode Excerpt from Platt (1998):
E2 = SVM output on point[i2] - y2 (check in error cache)
r2 = E2*y2
if ((r2 < -tol && alph2 < C) || (r2 > tol && alph2 > 0))
Platt's Paper and the KKT Conditions
Given the values of r2 and E2, Platt's if-statement says the following mathematically:
if [y_i (w·x_i + b - y_i) < -tol && alpha2 < C] or [y_i (w·x_i + b - y_i) > tol && alpha2 > 0]
Let's set tol to 0. The first condition then reads:
y_i (w·x_i + b - y_i) < 0
Rewritten, this is essentially:
y_i (w·x_i + b) < 1, since y_i * y_i is always 1. The condition
[y_i (w·x_i + b) < 1 and a_i < C] exactly matches our definition of a KKT violation. If you expand the other condition, it likewise matches the other KKT violation.
What this code means, going further, is that Platt is checking for ("non-bound") violators of the KKT conditions: he is essentially trying to weed out the points that are obviously not close to the hyperplane at all.
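To make the check concrete, here is the same test as a small Python function (a sketch with my own names, not Platt's code; f_x2 stands for the SVM output w·x_2 + b):

def violates_kkt(f_x2, y2, alph2, C, tol=1e-3):
    # E2 is the prediction error, kept in Platt's error cache
    E2 = f_x2 - y2
    # r2 equals y2*f(x2) - 1, since y2*y2 = 1
    r2 = E2 * y2
    return (r2 < -tol and alph2 < C) or (r2 > tol and alph2 > 0)

# The far-away, correctly classified point from the example above
# (y = 1, output 2.5, alpha = 0) satisfies the KKT conditions:
print(violates_kkt(f_x2=2.5, y2=1, alph2=0.0, C=1.0))   # False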

Related

Conditional Constraint Solving

How would you approach the following constraint optimization problem:
I have a set of integer variables x[i] that can take only 4 values, in the range [1,4]
There are constraints of the form C <= x[i], x[i] <= C, and x[i] <= x[j]
There are also conditional constraints, but exclusively of the form "if 2 <= x[i] then 3 <= x[j]"
I want to minimize the number of variables that have the value 3
Edit: because I have a large number (thousands) of variables and constraints, and performance is critical, I'm looking for a dedicated algorithm, not a general-purpose constraint solver.
You could encode each variable as a pair of binary variables:
x[i] = 1 + 2*x2[i] + x1[i]
The inequality constraints can now be partly resolved:
1 <= x[i] can be ignored, as always true for any variable
2 <= x[i] implies (x2[i] or x1[i])
3 <= x[i] implies x2[i]
4 <= x[i] implies (x2[i] and x1[i])
1 >= x[i] implies (!x2[i] and !x1[i])
2 >= x[i] implies !x2[i]
3 >= x[i] implies (!x2[i] or !x1[i])
4 >= x[i] can be ignored, as always true for any variable
x[i] <= x[j] implies (!x2[i] or x2[j]) and
(!x1[i] or x2[j] or x1[j]) and
(!x2[i] or !x1[i] or x1[j])
The conditional constraint
if 2 <= x[i] then 3 <= x[j]
translates to
(x2[j] or !x1[i]) and (x2[j] or !x2[i])
The encoding shown above can be directly written as Conjunctive Normal Form (CNF) suitable for a SAT solver. Tools like SATInterface or bc2cnf help to automate this translation.
To minimize the number of variables which have value 3, a counting circuit combined with a digital comparator could be constructed/modelled.
Variable x[i] has value 3 if (x2[i] and !x1[i]) is true. These expressions could be the inputs of a counter. The counting result could then be compared to some value, which is decreased until no more solutions can be found.
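As an illustration, here is a sketch in Python of how the constraint clauses above could be generated as DIMACS-style integer literals (the variable numbering and helper names are my own; the counting circuit is omitted):

def x2(i): return 2 * i + 1   # boolean x2[i] as a positive literal
def x1(i): return 2 * i + 2   # boolean x1[i] as a positive literal

def leq_var(i, j):
    # clauses for x[i] <= x[j]
    return [[-x2(i), x2(j)],
            [-x1(i), x2(j), x1(j)],
            [-x2(i), -x1(i), x1(j)]]

def conditional(i, j):
    # clauses for: if 2 <= x[i] then 3 <= x[j]
    return [[-x1(i), x2(j)], [-x2(i), x2(j)]]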
Bottom line:
The problem can be solved with a general-purpose solver like a SAT solver (CaDiCaL, Z3, CryptoMiniSat) or a constraint solver like MiniZinc. I am not aware of a dedicated algorithm that would outperform the general-purpose solvers.
Actually, there is a fairly simple and efficient algorithm for this particular problem.
It is enough to maintain and propagate intervals and start propagating the conditional constraints when the lower bounds become >= 2.
At the end, if the interval is exactly [3,4], the optimal solution is to select 4.
More precisely:
initialize l[i]:=1, u[i]:=4
propagate constraints until fixpoint as follows:
Constraint "C<=x[i]": l[i]:=max(l[i],C)
Constraint "x[i]<=C": u[i]:=min(u[i],C)
Constraint "x[i]<=x[j]": l[j]:=max(l[j],l[i]) and u[i]:=min(u[i],u[j])
Constraint 2<=x[i] ==> 3<=x[j]: if 2<=l[i], then l[j]:=max(l[j], 3)
If u[i]<l[i], there is no solution
Otherwise, select:
x[i]=4 if l[i]=3 and u[i]=4 (i.e., the interval is exactly [3,4])
x[i]=l[i] otherwise
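For concreteness, here is a minimal Python sketch of this scheme (a naive fixpoint loop; the function and parameter names are my own):

def solve(n, lower_cs, upper_cs, leq_cs, cond_cs):
    # lower_cs: (C, i) meaning C <= x[i];  upper_cs: (i, C) meaning x[i] <= C
    # leq_cs:   (i, j) meaning x[i] <= x[j]
    # cond_cs:  (i, j) meaning 2 <= x[i] implies 3 <= x[j]
    l, u = [1] * n, [4] * n
    changed = True
    while changed:                        # propagate until fixpoint
        changed = False
        for C, i in lower_cs:
            if l[i] < C: l[i], changed = C, True
        for i, C in upper_cs:
            if u[i] > C: u[i], changed = C, True
        for i, j in leq_cs:
            if l[j] < l[i]: l[j], changed = l[i], True
            if u[i] > u[j]: u[i], changed = u[j], True
        for i, j in cond_cs:
            if l[i] >= 2 and l[j] < 3: l[j], changed = 3, True
    if any(u[i] < l[i] for i in range(n)):
        return None                       # no solution
    # select: intervals that are exactly [3,4] are bumped to 4, all others take l[i]
    return [4 if (l[i], u[i]) == (3, 4) else l[i] for i in range(n)]

# example: x0 >= 2 together with (2 <= x0 ==> 3 <= x1) forces l1 = 3,
# and x1's interval [3,4] is bumped to 4 to avoid a 3:
print(solve(2, [(2, 0)], [], [], [(0, 1)]))   # [2, 4]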
This is correct and optimal because:
any solution is such that l[i]<=x[i]<=u[i], so if u[i]<l[i], there is no solution
otherwise, x[i]=l[i] is clearly a solution (but NOT x[i]=u[i], because it can happen that u[i]>=2 while u[j] is not >=3)
bumping all x[i] from 3 to 4 when possible is still a solution because this change doesn't activate any new conditional constraints
what remains are the variables that are forced to be 3 (l[i]=u[i]=3), so we have found a solution with the minimal number of 3s
In more detail, here is a full proof:
assume that a solution x[i] is such that l[i]<=x[i]<=u[i] and let's prove that this invariant is preserved by application of any propagation rule:
Constraint "x[i]<=x[j]": x[i]<=x[j]<=u[j] and so x[i] is both <=u[i] and <=u[j] and hence <=min(u[i],u[j]). Similarly, l[i]<=x[i]<=x[j] so max(l[i],l[j])<=x[j]
The constraints "x[i]<=C" and "C<=x[i]" are similar
For the constraint "2<=x[i] ==> 3<=x[j]": either l[i]<2 and the propagation rule doesn't apply or 2<=l[i] and then 2<=l[i]<=x[i] implying 3<=x[j]. So 3<=x[j] and l[j]<=x[j] hence max(3,l[j])<=x[j]
as a result, when the fixpoint is reached and no rule can be applied anymore, if any i is such that u[i]<l[i], then there is no solution
otherwise, let's prove that the following assignment is a solution: x[i]=4 if l[i]=3 and u[i]=4, and x[i]=l[i] otherwise:
Note that x[i] is either l[i] or u[i], so l[i]<=x[i]<=u[i]
For all constraints "C<=x[i]", at fixpoint, we have l[i]=max(l[i],C), i.e., C<=l[i]<=x[i] and the constraint is satisfied
For all constraints "x[i]<=C", at fixpoint, we similarly have u[i]<=C and x[i]<=u[i]<=C and the constraint is satisfied
For all "x[i]<=x[j]", at fixpoint, we have: u[i] = min(u[i],u[j]) so u[i]<=u[j] and l[j] = max(l[j],l[i]), so l[i]<=l[j]. Then:
If x[i]=l[i], then x[i]=l[i]<=l[j]<=x[j]
If x[i]=4, then l[i]=3 and u[i]=4, so 3<=l[j] and u[j]=4; then either l[j]=4 and x[j]=l[j]=4, or l[j]=3 and the interval [3,4] selects x[j]=4; in both cases x[i]<=x[j]
For all "2<=x[i] ==> 3<=x[j]": assume 2<=x[i]:
If x[i]=l[i], then 2<=l[i], so the propagation rule has fired and at fixpoint l[j]=max(l[j],3), hence 3<=l[j]<=x[j]
If x[i]=4 with l[i]=3, then 2<=l[i] as well, and the same argument gives 3<=l[j]<=x[j]
Finally the solution is optimal because:
if l[i]=u[i]=3, any solution must have x[i]=3
otherwise, x[i] != 3: if l[i]=3 and u[i]=4, then x[i]=4 != 3; and if l[i] != 3, then x[i]=l[i] != 3

Linear programming - Non-Mutual Positivity Constraint

I am attempting a maximisation problem subject to various constraints.
i.e. max y = x1 + x2 + x3 + .... + xn
where each xi is a vector of values over time: x1 = (x11, x12, x13,...)
Some of the constraints state that specific values of xit cannot be positive in the same time period.
i.e. if(x1t > 0), x2t = 0; if(x2t > 0), x1t = 0
For context, the constraint is equivalent to "maximise the revenue of a shop, but you can't sell product A and B on the same day".
How do I go about formulating an LP model in Excel (using Solver) to solve this?
This is called a complementarity constraint. One way of modeling this is:
x(1,t) * x(2,t) = 0
x(i,t) ≥ 0
However, this is nonlinear (and in a somewhat nasty way). A linear approach, using an extra binary variable δ can look like:
x(1,t) ≤ UP(1,t) * δ(t)
x(2,t) ≤ UP(2,t) * (1-δ(t))
x(i,t) ∈ [0,UP(i,t)] 'UP is an upper bound on x'
δ(t) ∈ {0,1} 'δ is a binary variable'
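For instance, outside Excel, the same linearization could be sketched in Python with PuLP (the data here is made up; the UP values are assumed upper bounds):

import pulp

T, UP = range(3), {1: 100, 2: 80}       # 3 days, assumed upper bounds per product
prob = pulp.LpProblem("no_simultaneous_sales", pulp.LpMaximize)
x = {(i, t): pulp.LpVariable(f"x_{i}_{t}", lowBound=0, upBound=UP[i])
     for i in UP for t in T}
d = {t: pulp.LpVariable(f"delta_{t}", cat="Binary") for t in T}

prob += pulp.lpSum(x.values())              # maximize total sales
for t in T:
    prob += x[1, t] <= UP[1] * d[t]         # product 1 can sell only if delta = 1
    prob += x[2, t] <= UP[2] * (1 - d[t])   # product 2 can sell only if delta = 0
prob.solve()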

SVG Matrix: differentiate a flip, a rotation, and a flip PLUS a rotation

I am working with some SVG files, and I would like to know how to differentiate a simple rotation from a flip PLUS a rotation.
What I know:
Example matrix:
transform="matrix(a,b,c,d,e,f)" (theoretical)
transform="matrix(1.866 0 -0 1.866 91.480 278.480)" (practical)
We can determine whether an element is flipped by looking at the elements "a" and "d" of this matrix. A negative "a" means a horizontal flip and a negative "d" means a vertical flip.
My troubles arrive when I do a flip PLUS a rotation. When I do a simple rotation, "a" and "d" can be negative too! So how can we determine whether we have only a flip, only a rotation, or a rotation PLUS a flip?
Here is the matrix of an element on which I did a simple horizontal flip:
transform="matrix(-2.150 -0.012 -0.012 2.150 252.235 43.335)"
The "a" element (-2.150) is negative.
Here is the matrix of an element on which I did a rotation of 135 degrees anticlockwise:
transform="matrix(-1.560 -1.479 1.479 -1.560 245.655 46.646)"
The "a" element (-1.560) is negative too, but it is a simple rotation with no flip.
Here is the matrix of an element on which I did a horizontal flip PLUS a rotation of 135 degrees anticlockwise:
transform="matrix(1.674 -1.349 -1.349 -1.674 238.428 45.969)"
The "a" element (1.674) is positive despite the flip.
Do you know a method with which I could always tell whether there is a simple rotation, a simple flip, or a rotation PLUS a flip?
If I am not clear enough, do not hesitate to ask me for more details.
Short answer: if ad - bc < 0, it's a reflection.
Long answer: if I understand the Mozilla docs correctly, the matrix maps (x, y) -> (ax + cy, bx + dy), plus a translation we don't need to worry about.
So, what we do is imagine the vectors in 3 dimensions: (x,y,0) -> (ax+cy, bx+dy, 0). Take a unit i vector (1,0,0) and a unit j vector (0,1,0) and apply the transformation to each, to get i' and j'.
Now, the cunning bit: calculate the cross product of i' and j' and see whether it still points in the same direction as i x j (= k = (0,0,1)). If so, the pair i', j' has the same handedness as i, j and no reflection has taken place. If it is opposite (i.e. pointing along -k), a reflection has taken place.
Cranking through the numbers, i' = (a,b,0) and j' = (c,d,0), and i' x j' = (0,0, ad-bc). If ad-bc < 0, it points along -k and a reflection has taken place.
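In code, the test is a one-liner. A Python sketch (the function name is my own; a, b, c, d are the first four entries of the SVG matrix), applied to the three matrices from the question:

def has_flip(a, b, c, d):
    # reflection iff the determinant of the linear part is negative
    return a * d - b * c < 0

print(has_flip(-2.150, -0.012, -0.012, 2.150))   # True:  horizontal flip
print(has_flip(-1.560, -1.479, 1.479, -1.560))   # False: pure 135-degree rotation
print(has_flip(1.674, -1.349, -1.349, -1.674))   # True:  flip plus rotation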
Thanks for the answer, but it did not completely answer my question. I found a solution, so I post it here. I know my solution is not the best and could easily be improved, but I did not have time to optimize it.
Here is what I did:
With the a and b elements of the matrix, I calculated the angle in radians with atan2 (in Java, Math.atan2(b, a)), and converted it to degrees.
Once we have the angle in degrees, there are four cases, one for each quarter of the trigonometric circle:
First, I noticed that when you do a flip (horizontal or vertical), the "b" and "c" elements always have the same sign.
So here is the first condition: (sign of b) == (sign of c).
Depending on the sign of the "b" element (or "c", since they have the same sign) AND of the "d" element, we can determine whether it is a horizontal or a vertical flip.
So to determine precisely whether we have a flip (H or V), we need the signs of "b" and "c", PLUS the sign of "d".
The last condition is the sign of the degree angle calculated at the start. It helps us determine the angle of rotation.
What I understood is that the axes move when we do a flip.
Let me show you the complete solution. I found four conditions, one for each quarter of the trigonometric circle:
double radianAngle = Math.atan2(b, a);
double degreeAngle = Math.toDegrees(radianAngle);
// if "b" and "c" have the same sign, there is a flip (H or V)
if ((b < 0) == (c < 0)) {
    if (b < 0 && d > 0 && degreeAngle < 0) {
        // It is a horizontal flip
        // The new angle is (-180 - degreeAngle)
    } else if (b > 0 && d < 0 && degreeAngle > 0) {
        // It is a vertical flip
        // The new angle is (-degreeAngle)
    } else if (b > 0 && d > 0 && degreeAngle > 0) {
        // It is a horizontal flip
        // The new angle is (180 - degreeAngle)
    } else if (b < 0 && d < 0 && degreeAngle < 0) {
        // It is a horizontal flip
        // The new angle is (-degreeAngle)
    }
} else {
    // No flip (H or V)
}
There are two more conditions, which handle the limit cases:
if (degreeAngle == 180 && a < 0) {
    // It is a pure horizontal flip, with no rotation
} else if (degreeAngle == 0 && d < 0) {
    // It is a pure vertical flip, with no rotation
}
Here is what I finally used, and it works in all cases. I think it would have been easier to understand with more drawings, but I did not have time to write a more detailed answer.
Hope it helps.

Line segment intersection

I found this code snippet on raywenderlich.com; however, the link to the explanation is no longer valid. I "translated" the answer into Swift. I hope you can follow it; it's actually quite easy even without knowing the language. Could anyone explain what exactly is going on here? Thanks for any help.
class func linesCross(#line1: Line, line2: Line) -> Bool {
    let denominator = (line1.end.y - line1.start.y) * (line2.end.x - line2.start.x) -
                      (line1.end.x - line1.start.x) * (line2.end.y - line2.start.y)
    if denominator == 0 { return false } // lines are parallel
    let ua = ((line1.end.x - line1.start.x) * (line2.start.y - line1.start.y) -
              (line1.end.y - line1.start.y) * (line2.start.x - line1.start.x)) / denominator
    let ub = ((line2.end.x - line2.start.x) * (line2.start.y - line1.start.y) -
              (line2.end.y - line2.start.y) * (line2.start.x - line1.start.x)) / denominator
    // lines may touch each other - no test for equality here
    return ua > 0 && ua < 1 && ub > 0 && ub < 1
}
You can find a detailed segment-intersection algorithm in the book Computational Geometry in C, Sec. 7.7. The SegSegInt code described there is available here. I recommend avoiding slope calculations. There are several "degenerate" cases that require care: collinear segments overlapping or not, one segment endpoint in the interior of the other segment, etc. I wrote the code to return an indication of these special cases.
This is what the code is doing.
Every point P in the segment AB can be described as:
P = A + u(B - A)
for some constant 0 <= u <= 1. In fact, when u=0 you get P=A, and when u=1 you get P=B. Intermediate values of u give intermediate points P on the segment. For instance, when u = 0.5 you get the midpoint. In general, you can think of the parameter u as the ratio between the lengths of AP and AB.
Now, if you have another segment CD you can describe the points Q on it in the same way, but with a different u, which I will call v:
Q = C + v(D - C)
Again, keep in mind that Q lies between C and D if, and only if, 0 <= v <= 1 (same as above for P).
To find the intersection between the two segments you have to equate P=Q. In other words, you need to find u and v, both between 0 and 1 such that:
A + u(B - A) = C + v(D - C)
So, you have this equation and you have to see if it is solvable within the given constraints on u and v.
Given that A, B, C and D are points with two coordinates x,y each, you can open the equation above into two equations:
ax + u(bx - ax) = cx + v(dx - cx)
ay + u(by - ay) = cy + v(dy - cy)
where ax = A.x, ay = A.y, etc., are the coordinates of the points.
Now we are left with a 2x2 linear system. In matrix form:
|bx-ax  cx-dx| |u|   |cx-ax|
|by-ay  cy-dy| |v| = |cy-ay|
The determinant of the matrix is
det = (bx-ax)(cy-dy) - (by-ay)(cx-dx)
This quantity corresponds to the denominator of the code snippet (please check).
Now, multiplying both sides by the cofactor matrix:
|cy-dy dx-cx|
|ay-by bx-ax|
we get
det*u = (cy-dy)(cx-ax) + (dx-cx)(cy-ay)
det*v = (ay-by)(cx-ax) + (bx-ax)(cy-ay)
which correspond to the variables ua and ub defined in the code (check this too!)
Finally, once you have u and v you can check whether they are both between 0 and 1 and in that case return that there is intersection. Otherwise, there isn't.
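Transcribing this derivation directly into Python gives a sketch like the following (points as (x, y) tuples; it mirrors the strict inequalities of the Swift code):

def segments_cross(A, B, C, D):
    (ax, ay), (bx, by), (cx, cy), (dx, dy) = A, B, C, D
    det = (bx - ax) * (cy - dy) - (by - ay) * (cx - dx)
    if det == 0:
        return False                     # parallel (or collinear)
    u = ((cy - dy) * (cx - ax) + (dx - cx) * (cy - ay)) / det
    v = ((ay - by) * (cx - ax) + (bx - ax) * (cy - ay)) / det
    return 0 < u < 1 and 0 < v < 1       # strict: touching endpoints do not count

print(segments_cross((0, 0), (1, 1), (0, 1), (1, 0)))   # True: diagonals of a square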
For a given line the slope is
m = (y_end - y_start) / (x_end - x_start)
If two slopes are equal, the lines are parallel:
m1 = m2
(y1_end - y1_start)/(x1_end - x1_start) = (y2_end - y2_start)/(x2_end - x2_start)
Cross-multiplying shows that this equality holds exactly when the denominator in the code is zero, which is why denominator == 0 signals parallel lines.
Regarding the rest of the code, you can find the explanation on Wikipedia under "Given two points on each line".

Which method is best for computing the Matthews correlation coefficient (MCC) values for an unrelated data set?

I'm not sure what is meant by "best method" here, but given a confusion matrix the computation should be straightforward. In Python:
import math
# tp is true positives, fn is false negatives, etc
mcc = (tp*tn - fp*fn) / math.sqrt( (tp + fp)*(tp + fn)*(tn + fp)*(tn + fn) )
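As a quick sanity check (toy numbers of my own), a perfect classifier gives an MCC of 1:

tp, tn, fp, fn = 5, 5, 0, 0   # a perfect classifier on 10 samples
print((tp*tn - fp*fn) / math.sqrt((tp + fp)*(tp + fn)*(tn + fp)*(tn + fn)))   # 1.0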
The previous answer is correct; however, you might also want to consider the cases where any of the four sums in the denominator is zero. In such cases the denominator can arbitrarily be set to one.
For the sake of completeness, I'm adding R code below (the original code can be found here).
mcc <- function (actual, predicted)
{
  # Handles the zero-denominator case and overflow on large-ish products in the denominator.
  #
  # actual    = vector of true outcomes, 1 = Positive, 0 = Negative
  # predicted = vector of predicted outcomes, 1 = Positive, 0 = Negative
  # Returns the MCC.
  TP <- sum(actual == 1 & predicted == 1)
  TN <- sum(actual == 0 & predicted == 0)
  FP <- sum(actual == 0 & predicted == 1)
  FN <- sum(actual == 1 & predicted == 0)
  sum1 <- TP + FP; sum2 <- TP + FN; sum3 <- TN + FP; sum4 <- TN + FN
  denom <- as.double(sum1) * sum2 * sum3 * sum4   # as.double avoids integer overflow on large products
  if (any(sum1 == 0, sum2 == 0, sum3 == 0, sum4 == 0)) {
    denom <- 1
  }
  mcc <- ((TP * TN) - (FP * FN)) / sqrt(denom)
  return(mcc)
}
