I have a question about 'Isomap' nonlinear dimensionality reduction. Normally, when I pass in a 100 * 100 distance matrix
and apply Isomap [http://isomap.stanford.edu/][1], I get the coordinates of 100 points. In other cases I do not understand why, with a 150 * 150 matrix, I obtain just 35 or 50 points.
The first step of Isomap is usually to create a "nearest neighbor matrix" so that every point is connected to its 4 or 6 or 8 or something nearest neighbors.
So you may start with a 100 x 100 distance matrix in which every point has a distance to 99 other points; after this first step, the distances to anything but the (4 or 6 or 8) closest points are set to infinity.
Then Isomap computes a shortest path distance, hopping between nearby points to get to farther away points.
In your case, when you create a matrix of 150 points, I think that once you keep only the nearby points in the first step, the points become disconnected, and there is no path between distant points. The default behavior of many Isomap codes is to return the Isomap embedding of the largest collection of connected points.
How can you fix this?
1. You can increase the number of nearest neighbors that you use until you get all the points included.
Caveat: In many natural cases, if you include most or all neighbors, the shortest-path part of the procedure does nothing, and Isomap reduces to a problem called "multi-dimensional scaling", which gives a linear embedding.
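A minimal diagnostic sketch (assuming scikit-learn and SciPy are available; the two far-apart clusters below just stand in for your own 150 x 150 distance matrix): build the k-nearest-neighbor graph that Isomap starts from and count its connected components. If there is more than one component, most implementations embed only the largest, which is why fewer than 150 points come back.

    import numpy as np
    from scipy.sparse.csgraph import connected_components
    from sklearn.neighbors import kneighbors_graph

    rng = np.random.default_rng(0)
    X = np.vstack([rng.normal(0, 1, (75, 3)), rng.normal(20, 1, (75, 3))])  # two far-apart groups
    D = np.sqrt(((X[:, None, :] - X[None, :, :]) ** 2).sum(-1))             # 150 x 150 distance matrix

    for k in (4, 6, 8, 12):
        graph = kneighbors_graph(D, n_neighbors=k, metric="precomputed")
        n_components, labels = connected_components(graph, directed=False)
        print(f"k={k}: {n_components} component(s), largest has {np.bincount(labels).max()} points")
    # Increase k (or bridge the components some other way) until there is a single
    # component; only then will Isomap return coordinates for all 150 points.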
I am currently researching the traveling salesman problem, and was wondering if anyone would be able to simply explain the Held-Karp lower bound. I have been looking at a lot of papers and I am struggling to understand it. If someone could explain it simply, that would be great.
I also know there is the method of calculating a minimum spanning tree of the vertices, not including the starting vertex, and then adding the two minimum edges from the starting vertex.
I'll try to explain this without going into too much detail. I'll avoid formal proofs and technical jargon. However, you might want to go over everything again once you have read it through. And most importantly: try the algorithms out yourself.
Introduction
A 1-tree is a subgraph formed by attaching one vertex with 2 edges to a spanning tree of the remaining vertices. You can check for yourself that every TSP tour is a 1-tree.
There is also such a thing as a minimum-1-Tree of a graph. That is the resulting tree when you follow this algorithm:
Exclude a vertex from your graph
Calculate the minimum spanning tree of the resulting graph
Attach the excluded vertex with its 2 smallest edges to the minimum spanning tree
*For now I'll assume that you know that a minimum-1-tree is a lower bound for the optimal TSP tour. There is an informal proof at the end.
You will find that the resulting tree is different when you exclude different vertices. However, all of the resulting trees can be considered lower bounds for the optimal tour in the TSP. Therefore the largest of the minimum-1-trees you have found this way is a better lower bound than the others.
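Here is a minimal sketch of this construction, assuming SciPy is available (the function name and the example distance matrix are mine, not from the question): compute a minimum-1-tree for each choice of excluded vertex and keep the largest bound.

    import numpy as np
    from scipy.sparse.csgraph import minimum_spanning_tree

    def one_tree_bound(dist, excluded):
        # Minimum-1-tree: MST of the graph without `excluded`, plus that vertex's
        # two cheapest edges back into the tree.
        keep = [v for v in range(len(dist)) if v != excluded]
        mst_weight = minimum_spanning_tree(dist[np.ix_(keep, keep)]).sum()
        two_cheapest = np.sort(np.delete(dist[excluded], excluded))[:2]
        return mst_weight + two_cheapest.sum()

    dist = np.array([[0, 3, 5, 9],
                     [3, 0, 4, 7],
                     [5, 4, 0, 6],
                     [9, 7, 6, 0]], dtype=float)   # symmetric distance matrix of a complete graph

    # Every excluded vertex gives a valid lower bound; the largest one is the best.
    print(max(one_tree_bound(dist, v) for v in range(len(dist))))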
Held-Karp lower bound
The Held-Karp lower bound is an even tighter lower bound.
The idea is that you can alter the original graph in a special way. This modified graph will generate different minimum-1-trees than the original.
Furthermore (and this is important, so I'll repeat it throughout this paragraph in different words), the modification is such that the lengths of all valid TSP tours are changed by the same (known) constant. In other words, the length of a valid TSP solution in this new graph = the length of a valid solution in the original graph plus a known constant. For example: say the weight of the TSP tour visiting vertices A, B, C and D in that order in the original graph = 10. Then the weight of the TSP tour visiting the same vertices in the same order in the modified graph = 10 + a known constant.
This, of course, is true for the optimal TSP tour as well. Therefore the optimal TSP tour in the modified graph is also an optimal tour in the original graph. And a minimum-1-tree of the modified graph is a lower bound for the optimal tour in the modified graph. Again, I'll just assume you understand that this generates a lower bound for your modified graph's optimal TSP tour. By subtracting another known constant from the found lower bound of your modified graph, you have a new lower bound for your original graph.
There are infinitely many such modifications to your graph. These different modifications result in different lower bounds. The tightest of these lower bounds is the Held-Karp lower bound.
How to modify your graph
Now that I have explained what the Held-Karp lower bound is, I will show you how to modify your graph to generate different minimum-1-trees.
Use the following algorithm:
Give every vertex in your graph an arbitrary weight
Update the weight of every edge as follows: new edge weight = edge weight + starting vertex weight + ending vertex weight
For example, your original graph has the vertices A, B and C with edge AB = 3, edge AC = 5 and edge BC = 4. And for the algorithm you assign the (arbitrary) weights to the vertices A: 30, B: 40, C:50 then the resulting weights of the edges in your modified graph are AB = 3 + 30 + 40 = 73, AC = 5 + 30 + 50 = 85 and BC = 4 + 40 + 50 = 94.
The known constant for the modification is twice the sum of the weights given to the vertices. In this example the known constant is 2 * (30 + 40 + 50) = 240. Note: the tours in the modified graph are thus equal to the original tours + 240. In this example there is only one tour namely ABC. The tour in the original graph has a length of 3 + 4 + 5 = 12. The tour in the modified graph has a length of 73 + 85 + 94 = 252, which is indeed 240 + 12.
The reason why the constant equals twice the sum of the weights given to the vertices is because every vertex in a TSP tour has degree 2.
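A small sketch of this modification, assuming NumPy (it just reproduces the worked A, B, C example above and checks the constant):

    import numpy as np

    dist = np.array([[0.0, 3.0, 5.0],
                     [3.0, 0.0, 4.0],
                     [5.0, 4.0, 0.0]])          # A, B, C with AB = 3, AC = 5, BC = 4
    w = np.array([30.0, 40.0, 50.0])            # arbitrary vertex weights

    modified = dist + w[:, None] + w[None, :]   # new edge weight = edge + w[u] + w[v]
    np.fill_diagonal(modified, 0.0)

    constant = 2 * w.sum()                      # 240
    tour = [0, 1, 2]                            # the only tour: A-B-C-A
    orig_len = sum(dist[tour[i], tour[(i + 1) % 3]] for i in range(3))       # 12
    mod_len = sum(modified[tour[i], tour[(i + 1) % 3]] for i in range(3))    # 252
    print(orig_len, mod_len, mod_len == orig_len + constant)                 # 12.0 252.0 True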
You will need one more known constant: the constant you subtract from the (modified) weight of your minimum-1-tree to get a lower bound for the original graph. This is simply the same constant again: twice the sum of the weights given to the vertices. Equivalently, you can take the original weight of the found minimum-1-tree and add, for every vertex, (its degree in that 1-tree - 2) times its weight; in a real tour every degree is 2, so this correction term vanishes. For example, if you have given the weights A: 30, B: 40, C: 50, D: 60, you subtract 2 * (30 + 40 + 50 + 60) = 360 from the modified 1-tree weight; and if in your minimum-1-tree vertex A has degree 1, vertices B and C have degree 2 and vertex D has degree 3, the equivalent degree-based correction added to its original weight is (1 - 2) * 30 + (2 - 2) * 40 + (2 - 2) * 50 + (3 - 2) * 60 = 30.
How to find the Held-Karp lower bound
Now I believe there is one more question unanswered: how do I find the best modification to my graph, so that I get the tightest lower bound (and thus the Held-Karp lower bound)?
Well, that's the hard part. Without delving too deep: there are ways to get closer and closer to the Held-Karp lower bound. Basically, one keeps modifying the graph so that the degrees of all vertices in the resulting minimum-1-tree get closer and closer to 2, and thus closer and closer to a real tour.
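A rough sketch of that iterative idea, assuming NumPy (the step size, iteration count and helper names are arbitrary choices of mine, not a tuned implementation): compute the minimum-1-tree of the modified graph, turn it into a lower bound by subtracting twice the sum of the vertex weights, and nudge each vertex weight according to how far its degree is from 2.

    import numpy as np

    def min_one_tree(dist, excluded=0):
        # Minimum-1-tree: a minimum spanning tree (simple Prim's algorithm) of the
        # graph without `excluded`, plus that vertex's two cheapest edges.
        n = len(dist)
        keep = [v for v in range(n) if v != excluded]
        in_tree = {keep[0]}
        weight = 0.0
        degree = np.zeros(n, dtype=int)
        while len(in_tree) < len(keep):
            u, v = min(((a, b) for a in in_tree for b in keep if b not in in_tree),
                       key=lambda e: dist[e[0], e[1]])
            in_tree.add(v)
            weight += dist[u, v]
            degree[u] += 1
            degree[v] += 1
        for v in sorted(keep, key=lambda q: dist[excluded, q])[:2]:
            weight += dist[excluded, v]
            degree[excluded] += 1
            degree[v] += 1
        return weight, degree

    def held_karp_estimate(dist, iterations=100, step=1.0):
        pi = np.zeros(len(dist))                      # vertex weights, all zero to start
        best = -np.inf
        for _ in range(iterations):
            modified = dist + pi[:, None] + pi[None, :]
            weight, degree = min_one_tree(modified)
            best = max(best, weight - 2 * pi.sum())   # lower bound on the original optimum
            pi += step * (degree - 2)                 # push every degree towards 2
        return best

    dist = np.array([[0, 3, 5, 9],
                     [3, 0, 4, 7],
                     [5, 4, 0, 6],
                     [9, 7, 6, 0]], dtype=float)
    print(held_karp_estimate(dist))                   # approaches the Held-Karp bound (optimum here is 21)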
Minimum-1-tree is a lower bound
As promised, here is an informal proof that a minimum-1-tree is a lower bound for the optimal TSP solution. A minimum-1-tree is made of two parts: a minimum spanning tree and a vertex attached to it with 2 edges. A TSP tour must pass through the vertex attached to the minimum spanning tree, and the cheapest way to enter and leave it is through its two smallest edges. The tour must also visit all the vertices in the minimum spanning tree; removing the attached vertex from the tour leaves a path through those vertices, and any such path is itself a spanning tree, so it weighs at least as much as the minimum spanning tree. Combining these two facts, one can conclude that the minimum-1-tree is a lower bound for the optimal TSP tour.
Conclusion
When you modify a graph in a certain way, you can use the minimum-1-tree of the modified graph to calculate a lower bound for the original optimal tour. The best possible lower bound obtainable through these means is the Held-Karp lower bound.
I hope this answers your question.
Links
For a more formal approach and additional information I recommend the following links:
ieor.berkeley.edu/~kaminsky/ieor251/notes/3-16-05.pdf
http://www.sciencedirect.com/science/article/pii/S0377221796002147
I'm using a static KD-Tree for nearest neighbor search in 3D space. However, the client's specifications have now changed so that I'll need a weighted nearest neighbor search instead. For example, in 1D space, I have a point A with weight 5 at 0, and a point B with weight 2 at 4; the search should return A if the query point is from -5 to 5, and should return B if the query point is from 5 to 6. In other words, the higher-weighted point takes precedence within its radius.
Google hasn't been any help - all I get is information on the K-nearest neighbors algorithm.
I can simply remove points that are completely subsumed by a higher-weighted point, but this generally isn't the case (usually a lower-weighted point is only partially subsumed, like in the 1D example above). I could use a range tree to query all points in an N x N x N cube centered on the query point and pick the one with the greatest weight, but the naive implementation of this is wasteful: I would need to set N to the maximum weight in the entire tree, even though there may not be a point of that weight anywhere near the cube. For example, if the maximum weight in the tree is 25, I would need to set N to 25 for every query, even though the highest-weighted point near a given query probably has a much lower weight; in the 1D case, if I have a point located at 100 with weight 25, my naive algorithm would still use N = 25 even when the query is far outside that point's radius.
To sum up, I'm looking for a way that I can query the KD tree (or some alternative/variant) such that I can quickly determine the highest-weighted point whose radius covers the query point.
FWIW, I'm coding this in Java.
It would also be nice if I could dynamically change a point's weight without incurring too high of a cost - at present this isn't a requirement, but I'm expecting that it may be a requirement down the road.
Edit: I found a paper on a priority range tree, but this doesn't exactly address the same problem in that it doesn't account for higher-priority points having a greater radius.
Use an extra dimension for the weight. A point (x,y,z) with weight w is placed at (N-w,x,y,z), where N is the maximum weight.
Distances in 4D are defined by…
d((a, b, c, d), (e, f, g, h)) = |a - e| + d((b, c, d), (f, g, h))
…where the second d is whatever your 3D distance was.
To find all potential results for (x,y,z), query a ball of radius N about (0,x,y,z).
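A minimal sketch of this trick, assuming SciPy and a Manhattan (L1) 3D distance so that the combined 4D metric is itself plain L1 (for a Euclidean 3D distance you would need a tree that accepts a custom metric, e.g. sklearn's BallTree); the data points just recreate the 1D example from the question, padded to 3D:

    import numpy as np
    from scipy.spatial import cKDTree

    points = np.array([[0.0, 0.0, 0.0],
                       [4.0, 0.0, 0.0]])        # point A at 0, point B at 4
    weights = np.array([5.0, 2.0])              # A has weight 5, B has weight 2
    N = weights.max()

    lifted = np.column_stack([N - weights, points])   # place (x, y, z) with weight w at (N - w, x, y, z)
    tree = cKDTree(lifted)

    def best_covering_point(query_xyz):
        # (N - w) + dist3(p, q) <= N  <=>  dist3(p, q) <= w, so the L1 ball of
        # radius N about (0, x, y, z) returns exactly the points whose radius
        # covers the query; among those, keep the highest-weighted one.
        idx = tree.query_ball_point(np.r_[0.0, query_xyz], r=N, p=1)
        return max(idx, key=lambda i: weights[i]) if idx else None

    for q in ([1.0, 0.0, 0.0], [5.5, 0.0, 0.0], [30.0, 0.0, 0.0]):
        print(q, "->", best_covering_point(q))        # -> 0 (A), 1 (B), None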
I think I've found a solution: the nested interval tree, which is an implementation of a 3D interval tree. Rather than storing points with an associated radius that I then need to query, I instead store and query the radii directly. This has the added benefit that each dimension does not need to have the same weight (so that the radius is a rectangular box instead of a cubic box), which is not presently a project requirement but may become one in the future (the client only recently added the "weighted points" requirement, who knows what else he'll come up with).
Given a general planar 3D polygon, is there a general way to find the orthonormal basis for that planar polygon?
The most straightforward way is to take the first 3 points of the polygon, form two vectors from them, and orthonormalize those to get the two basis vectors we are looking for. The problem with this approach is that these 3 points may lie on the same line in the polygon, and hence instead of getting two independent vectors, we get only one.
Another approach to find the second basis vector is to loop through the polygon and find another point that forms a vector sufficiently different from the first one, but this approach is susceptible to numerical errors (e.g., what if the second vector is almost parallel to the first? The numerical error can be significant).
Is there any other better approach?
You can use the cross product of any two edge vectors between vertices of the polygon. If the cross product is too small, you're in degenerate territory.
You can also take the centroid (the average of the points, which is guaranteed to lie on the same plane) and pick the largest cross product of any two vectors from the centroid to the vertices. This will give the most accurate normal. Please note that if even the largest cross product is small, you may have an inaccurate normal.
If you can't find any cross product that isn't close to 0, your original poly is degenerate and a normal will be hard to find. You could use arbitrary precision or adaptive precision algebra in this case, but, of course, the round-off error is already significant in the source data, so this may not help. If possible, remove degenerate polys first, and if you have to, sew the mesh back up :).
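A short sketch of the centroid approach, assuming NumPy (the polygon is a made-up planar example): take the largest cross product of centroid-to-vertex vectors as the normal, then build the in-plane orthonormal basis from it.

    import numpy as np

    polygon = np.array([[0.0, 0.0, 0.0],
                        [2.0, 0.0, 0.0],
                        [2.0, 1.0, 0.5],
                        [0.0, 1.0, 0.5]])      # hypothetical planar polygon (z = y / 2)

    centroid = polygon.mean(axis=0)            # guaranteed to lie in the polygon's plane
    spokes = polygon - centroid

    # Largest cross product over all pairs of centroid-to-vertex vectors.
    normal = max((np.cross(spokes[i], spokes[j])
                  for i in range(len(spokes)) for j in range(i + 1, len(spokes))),
                 key=np.linalg.norm)
    if np.linalg.norm(normal) < 1e-12:
        raise ValueError("degenerate polygon: no reliable normal")
    normal = normal / np.linalg.norm(normal)

    u = spokes[0] / np.linalg.norm(spokes[0])  # first in-plane basis vector (assumes vertex 0 != centroid)
    v = np.cross(normal, u)                    # second in-plane vector, orthonormal to both
    print(normal, u, v)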
It's a bit ott but one way would be to compute the covariance matrix of the points, and then diagonalise that. If the points are indeed planar then one of the eigenvalues of the covariance matrix will be zero (or rather very small, due to finite precision arithmetic) and the corresponding eigenvector will be a normal to the plane; the other two eigenvectors will span the plane of the polygon.
If you have N points, and the i'th coordinate of the k'th point is p[k,i], then the mean (vector) and (3x3) covariance matrix can be computed by
m[i] = Sum{ k | p[k,i]}/N (i=1..3)
C[i,j] = Sum{ k | (p[k,i]-m[i])*(p[k,j]-m[j]) }/N (i,j=1..3)
Note that C is symmetric, so to find out how to diagonalise it you might want to look up the "symmetric eigenvalue problem".
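A sketch of the covariance approach, assuming NumPy (the polygon is a made-up planar example; np.linalg.eigh is a symmetric eigensolver and returns eigenvalues in ascending order):

    import numpy as np

    pts = np.array([[0.0, 0.0, 0.0],
                    [2.0, 0.0, 0.0],
                    [2.0, 1.0, 0.5],
                    [0.0, 1.0, 0.5]])           # hypothetical planar polygon

    m = pts.mean(axis=0)                         # m[i] = Sum_k p[k,i] / N
    C = (pts - m).T @ (pts - m) / len(pts)       # C[i,j] = Sum_k (p[k,i]-m[i]) * (p[k,j]-m[j]) / N

    eigvals, eigvecs = np.linalg.eigh(C)
    normal = eigvecs[:, 0]                       # smallest eigenvalue (~0 for planar points) -> plane normal
    basis = eigvecs[:, 1:]                       # the other two eigenvectors span the plane
    print(eigvals, normal, basis)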
I've a small set of data points (around 10) in a 2D space, and each of them have a category label. I wish to classify a new data point based on the existing data point labels and also associate a 'probability' for belonging to any particular label class.
Is it appropriate to label the new point based on the label of its nearest neighbor (like a K-nearest-neighbor classifier with K=1)? To get the probability, I wish to permute all the labels, each time calculate the minimum distance between the unknown point and the rest, and find the fraction of cases where that minimum distance is less than or equal to the distance that was used to label it.
Thanks
The Nearest Neighbour method already uses Bayes' theorem to estimate the probability, using the points in a ball containing your chosen K points. There is no need to transform anything: the number of points in that ball belonging to each label, divided by the total number of points in the ball, is already an approximation of the posterior probability of that label. In other words:
P(label|x) = P(x|label) P(label) / P(x) = K(label)/K
This is obtained by applying Bayes' rule to densities estimated from a subset of the data. In particular, using:
V P(x) = K/N (this gives you the probability of a point falling in a ball of volume V)
P(x) = K/(N V) (from above)
P(x|label) = K(label)/(N(label) V) (where K(label) and N(label) are the number of points of that class in the ball and in the whole sample, respectively)
and
P(label) = N(label)/N.
Therefore, just pick a K, calculate the distances, count the points in the ball, check their labels, and from those counts you have your probability.
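A minimal sketch of that recipe, assuming NumPy (the toy points, labels and query are made up, not from the question): the posterior for each label is simply K(label)/K among the K nearest neighbours.

    import numpy as np

    points = np.array([[0.0, 0.0], [0.2, 0.1], [1.0, 1.1], [0.9, 1.0], [1.2, 0.9]])
    labels = np.array(["red", "red", "blue", "blue", "blue"])
    query = np.array([0.8, 0.8])
    K = 3

    dist = np.linalg.norm(points - query, axis=1)   # distance to every training point
    nearest = labels[np.argsort(dist)[:K]]          # labels of the K nearest points
    for label, count in zip(*np.unique(nearest, return_counts=True)):
        print(label, count / K)                     # P(label | query) = K(label) / K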
Roweis uses a probabilistic framework with KNN in his publication Neighbourhood Component Analysis. The idea is to use a "soft" nearest neighbour classification, where the probability that a point i uses another point j as its neighbour is defined by
p_ij = exp(-d_ij^2) / sum over k != i of exp(-d_ik^2),
where d_ij is the euclidean distance between point i and j.
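A small sketch of that soft weighting, assuming NumPy (the points are illustrative): p_ij is a softmax over the negative squared distances from point i.

    import numpy as np

    points = np.array([[0.0, 0.0], [1.0, 0.0], [0.0, 1.0], [5.0, 5.0]])
    i = 0                                           # the point whose neighbours we weight

    d = np.linalg.norm(points - points[i], axis=1)  # Euclidean distances d_ij
    logits = -d ** 2
    logits[i] = -np.inf                             # a point never picks itself (p_ii = 0)
    p = np.exp(logits - logits.max())               # stabilised softmax
    p /= p.sum()
    print(p)                                        # p_ij for every j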
There are no probabilities for such a K-nearest classification method, because it is a discriminative classifier, just like an SVM. To get probabilities you should use a post-processing step that learns them on unseen data, for example with logistic regression:
1. Learn the K-nearest classifier.
2. Train a logistic regression on the distance and the average distance to the K nearest neighbours, using validation data.
For details, check the LibSVM article.
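A rough sketch of that post-processing, assuming scikit-learn (this is one possible reading of the recipe; the synthetic data and the choice of distance features are mine): fit the K-nearest classifier, then on validation data fit a logistic regression that maps distance features to the probability that the K-nearest label is correct.

    import numpy as np
    from sklearn.datasets import make_classification
    from sklearn.linear_model import LogisticRegression
    from sklearn.model_selection import train_test_split
    from sklearn.neighbors import KNeighborsClassifier

    X, y = make_classification(n_samples=300, n_features=2, n_informative=2,
                               n_redundant=0, flip_y=0.1, random_state=0)
    X_train, X_val, y_train, y_val = train_test_split(X, y, random_state=0)

    K = 5
    knn = KNeighborsClassifier(n_neighbors=K).fit(X_train, y_train)   # step 1

    def distance_features(points):
        # Per point: distance to the nearest training point and the average
        # distance to its K nearest training points.
        dist, _ = knn.kneighbors(points, n_neighbors=K)
        return np.column_stack([dist[:, 0], dist.mean(axis=1)])

    # Step 2: on validation data, learn P(K-nearest label is correct | distance features).
    correct = (knn.predict(X_val) == y_val).astype(int)
    calibrator = LogisticRegression().fit(distance_features(X_val), correct)

    new_point = X_val[:1]
    label = knn.predict(new_point)[0]
    prob = calibrator.predict_proba(distance_features(new_point))[0, 1]
    print(label, prob)     # predicted label and how much to trust it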
Sort the distances to the 10 centres; they could be
1 5 6 ... — one near, others far
1 1 1 5 6 ... — 3 near, others far
... lots of possibilities.
You could combine the 10 distances into a single number, e.g. 1 - (nearest / average) ** p, but that throws away information. (Different powers p make the hills around the centres steeper or flatter.)
If your centres are really Gaussian hills though, take a look at Multivariate kernel density estimation.
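A sketch of that idea, assuming SciPy (the two Gaussian "hills" are synthetic): fit a kernel density estimate per class and normalise the densities at the query point into class probabilities.

    import numpy as np
    from scipy.stats import gaussian_kde

    rng = np.random.default_rng(0)
    class_points = {
        "A": rng.normal(loc=[0.0, 0.0], scale=0.5, size=(20, 2)),
        "B": rng.normal(loc=[3.0, 3.0], scale=0.5, size=(20, 2)),
    }
    query = np.array([1.0, 1.0])

    # gaussian_kde expects data of shape (n_dims, n_points).
    densities = {c: gaussian_kde(pts.T)(query)[0] for c, pts in class_points.items()}
    total = sum(densities.values())
    for c, d in densities.items():
        print(c, d / total)        # P(class | query), assuming equal class priors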
Added: there are zillions of functions that go smoothly between 0 and 1, but that doesn't make them probabilities of something. "Probability" means either that chance, likelihood, is involved, as in probability of rain; or that you're trying to impress somebody.
Added again: scholar.google.com "(single|1) nearest neighbor classifier" gets > 300 hits; "k nearest neighbor classifier" gets almost 3000. It seems to me (non-expert) that, out of 10 different ways of mapping k-NN distances to labels, each one might be better than the 9 others, for some data, with some error measure. Anyway, you could try asking stats.stackexchange.com.
The answer is: it depends.
Imagine your labels are the surname of a person, and the X,Y coordinates represent some essential characteristics of the person's DNA sequence. Clearly, a closer DNA description increases the probability of having the same surname.
Now suppose X,Y is the lat/long of that person's work office. Working closer together isn't related to label (surname) sharing.
So, it depends on the semantics of your labels and axes.
HTH!
I have a reference set of n points, and another set which 'approximates' each of those points. How do I find the absolute/percentage error between the approximation and my reference set?
Put another way, I have a canned animation and a simulation. How do I express the 'drift' between the two as a single number? That is, how well is the simulation approximating the vertices compared to those of the animation?
I currently do something like this for all vertices: |actual - reference|/|actual| and then average out the errors by dividing by the number of verts. Is this correct at all?
Does this measurement really have to be a percentage value? I'm guessing you have one reference set, and then several sets that approximate this set and you want to pick the one that is "the best" in some sense.
I'd add the squared distances between the actual and the reference:
avgSquareDrift = sum(1..n, |actual - reference|^2) / numvertices
The main advantage of this approach is that we don't need to apply the square root, which is a costly operation.
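A one-liner version of that, assuming NumPy (the two small vertex arrays are placeholders for the simulation and the animation):

    import numpy as np

    actual = np.array([[0.0, 0.1, 0.0], [1.0, 1.0, 1.1], [2.0, 2.1, 1.9]])      # simulation
    reference = np.array([[0.0, 0.0, 0.0], [1.0, 1.0, 1.0], [2.0, 2.0, 2.0]])   # animation

    avg_square_drift = np.mean(np.sum((actual - reference) ** 2, axis=1))
    print(avg_square_drift)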
If you sum the formula you have over all vertices (and then divide by the number of verts) you will have calculated the average percentage error in position for all vertices.
However, this percentage error is probably not quite what you want, because vertices closer to the origin will have a greater "percentage error" for the same displacement because their magnitude is smaller.
If you don't divide by anything at all, you will have the average drift in world units, which may be exactly what you want:
average_drift = sum(1->numvertices, |actual - reference|) / numvertices
You may want to divide by something more appropriate to your particular situation to get a meaningful unitless number. If you divide average_drift by the height of your model, you will have the error as a percentage of the model size, which could be useful.
If individual vertices are likely to have more error if they are a long distance from a vertex 'parented' to them, as could be the case if they are vertices of a jointed model, you could divide each error by the length of their parent joint to get the average error normalised for joint orientation -- i.e. what the average drift would be if each joint were of unit length:
orientation_drift = sum(1->numvertices, |actual - reference| / jointlength) / numvertices
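A small sketch of these three variants, assuming NumPy (the vertex arrays, joint lengths and model height are placeholders):

    import numpy as np

    actual = np.array([[0.0, 0.1, 0.0], [1.0, 1.0, 1.1], [2.0, 2.1, 1.9]])      # simulation
    reference = np.array([[0.0, 0.0, 0.0], [1.0, 1.0, 1.0], [2.0, 2.0, 2.0]])   # animation
    joint_lengths = np.array([0.5, 0.5, 1.0])   # length of each vertex's parent joint (hypothetical)
    model_height = 2.0

    drift = np.linalg.norm(actual - reference, axis=1)   # |actual - reference| per vertex
    average_drift = drift.mean()                         # drift in world units
    relative_drift = average_drift / model_height        # as a fraction of the model size
    orientation_drift = (drift / joint_lengths).mean()   # normalised per joint length
    print(average_drift, relative_drift, orientation_drift)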