I have a region that's partitioned into a bunch of subregions called blocks. I have a graph that's encoded as follows: every block is given a node, and (i,j) is an edge iff blocks i and j touch. I have a (long, long) list of points, and for each point I want to find the block that contains that point. Is there a faster algorithm than just picking a random vertex on the graph and A* searching on Euclidean distance?
1. Start with a random block.
2. Determine whether the current block contains the point.
3. Determine which of the neighbouring blocks (reachable through the edges) is closest to the target point.
4. Move to that block.
5. Repeat steps 2-4 until the test in step 2 succeeds.

Note that if you track the block you came from, you don't need to reconsider it in step 3.
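A minimal sketch of this walk, assuming hypothetical Block objects that expose contains(point), a neighbors list, and a center coordinate (none of these names come from the question):

import random

def find_containing_block(blocks, point):
    # Hypothetical Block interface: block.contains(point) -> bool,
    # block.neighbors -> adjacent blocks (the graph edges),
    # block.center -> (x, y) representative point of the block.
    def dist2(block):
        dx = block.center[0] - point[0]
        dy = block.center[1] - point[1]
        return dx * dx + dy * dy

    current = random.choice(blocks)
    previous = None
    while not current.contains(point):
        # Move to the neighbour whose centre is closest to the target,
        # skipping the block we just came from (it was already tested).
        candidates = [b for b in current.neighbors if b is not previous]
        previous, current = current, min(candidates or current.neighbors, key=dist2)
    return current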
Related
Given a set P of n points in 2D, for any point x in P, what is the fastest way to find out the farthest neighbor of x? By farthest neighbor, we mean a point in P which has the maximum Euclidean distance to x.
To the best of my knowledge, the current standard kNN search algorithm for various trees (R-Trees, quadtrees, kd-trees) was developed by:
G. R. Hjaltason and H. Samet, "Distance browsing in spatial databases", ACM TODS 24(2):265-318, 1999.
It traverses the tree based on a priority queue of nearest nodes/entries. One key insight is that the algorithm also works for farthest-neighbor search.
The basic algorithm uses a priority queue. The queue can contain tree nodes as well as data entries, all sorted by their distance to your search point.
As an initial step it adds the root node to the priority queue. Then repeat the following until k entries have been found:

1. Take the first element from the queue. If it is an entry, return it. If it is a node, add all elements of the node to the priority queue.
2. Repeat step 1.
The paper describes an implementation for R-Trees, but they claim it can be applied to most tree-like structures. I have implemented the nearest neighbor version myself for R-Trees and PH-Trees (a special type of quadtree), both in Java. I think I know how to do it efficiently for KD-Trees but I believe it is somewhat complicated.
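A minimal sketch of that loop in Python, assuming a hypothetical tree interface (distance_to, is_entry, and children are made-up names; a real R-tree would use bounding-box distances for inner nodes, and farthest-neighbor search simply negates the distances):

import heapq
import itertools

def distance_browse(root, query, k, farthest=False):
    sign = -1.0 if farthest else 1.0      # negate distances for farthest-first
    tiebreak = itertools.count()          # keeps heapq from comparing elements
    queue = [(sign * root.distance_to(query), next(tiebreak), root)]
    results = []
    while queue and len(results) < k:
        _, _, element = heapq.heappop(queue)
        if element.is_entry:
            results.append(element)       # data entries pop in distance order
        else:
            for child in element.children:
                heapq.heappush(queue, (sign * child.distance_to(query),
                                       next(tiebreak), child))
    return results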
def calculateShortestPath(self, vertexList, edgeList, startVertex):
    startVertex.minDistance = 0
    for i in range(0, len(vertexList) - 1):  # N-1 iterations
        for edge in edgeList:
            # relaxation step
            u = edge.startVertex
            v = edge.targetVertex
            newDistance = u.minDistance + edge.weight
            if newDistance < v.minDistance:
                v.minDistance = newDistance
                v.predecessor = u
    for edge in edgeList:  # final pass to detect negative cycles
        if self.hasCycle(edge):
            print("NEGATIVE CYCLE DETECTED")
            self.HAS_CYCLE = True
            return
The above function is part of an implementation of the Bellman-Ford algorithm. My question is: how can one be sure that after N-1 iterations the minimum distances have been calculated? In the case of Dijkstra it was clear that once the priority queue is empty all the shortest paths have been found, but I can't understand the reasoning behind the N-1 here.

N - the length of the vertex list.
vertexList - contains the vertices.
edgeList - the list of edges.

The implementation may be wrong, since I took it from a tutorial video. Thanks for the help.
The outer loop executes N-1 times because a shortest path cannot contain more edges: a path with more than N-1 edges must visit some vertex twice, i.e. it contains a loop, which can be removed without making the path longer.

Minor: a path with N edges visits N+1 vertices, so if there are only N vertices, at least one vertex is used twice and such a path contains a loop.
The algorithm, unlike Dijkstra's, is not greedy but dynamic. The first iteration of the loop establishes some path between the start and each reachable vertex, and every subsequent iteration extends the set of correct shortest paths by at least one more edge. As a shortest path can use at most N-1 edges, N-1 iterations of the loop suffice to find all shortest paths.

For negative-cycle detection, the algorithm makes one more pass (the Nth) to check whether some edge can still decrease the weight of a shortest path. If it can, the graph must contain a negative cycle, because in a graph without negative cycles every shortest path consists of at most N-1 edges, not N.
You can take any graph without a negative-sum cycle and, by relaxing the edges in the right order (processing each edge only after the shortest path to its source vertex is final, e.g. in topological order when the graph is acyclic), arrive at the answer in a single iteration, relaxing every edge just once.

The N-1 term comes from the fact that we do not process the edges in such a favorable order but in an arbitrary one. The small experiment below shows the worst case.
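A runnable illustration of that worst case: with the edges of a simple chain deliberately listed in the least favorable order, each pass settles only one more edge of the path, so a 4-vertex chain needs N-1 = 3 improving passes.

INF = float("inf")
edges = [("c", "d", 1), ("b", "c", 1), ("a", "b", 1)]  # deliberately bad order
dist = {"a": 0, "b": INF, "c": INF, "d": INF}

passes = 0
changed = True
while changed:
    changed = False
    for u, v, w in edges:
        if dist[u] + w < dist[v]:   # relaxation step
            dist[v] = dist[u] + w
            changed = True
    passes += 1

print(dist)    # {'a': 0, 'b': 1, 'c': 2, 'd': 3}
print(passes)  # 4: three improving passes plus one confirming pass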
Let's assume I have a polygon and I have computed all of its self-intersections. How do I determine whether a specific edge is inside or outside according to the nonzero fill rule? By "outside edge" I mean an edge which lies between a filled region and a non-filled region.
Example:
On the left is an example polygon, filled according to the nonzero fill rule. On the right is the same polygon with its outside edges highlighted in red. I'm looking for an algorithm that, given the edges of the polygon and their intersections with each other, can mark each of the edges as either outside or inside.
Preferably, the solution should generalize to paths that are composed of e.g. Bezier curves.
[EDIT] two more examples to consider:
I've noticed that an "outside edge" that is enclosed within the shape must cross an even number of intersections before reaching the outside, while an enclosed "non-outside edge" must cross an odd number.
You might try an algorithm like this
isOutside = true
edge = find first outside edge*
edge.IsOutside = isOutside
while (not got back to start) {
    edge = next
    if (gone over intersection)
        isOutside = !isOutside
    edge.IsOutside = isOutside
}
*I think you can always find an outside edge by trying each line in turn: extend it infinitely, and if the extension does not cross another line, the line should be on the outside. This seems intuitively true, but I wonder whether there are pathological cases where you cannot find a start line using this rule. This method of finding the first line will not work with curves.
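A sketch of the same walk in Python, under the assumption that the edge list has already been split at the self-intersections and rotated so that a known outside edge comes first (ends_at_intersection is a made-up flag, not from the answer):

def mark_outside_edges(edges):
    # 'edges' is the polygon's edge list in traversal order, already split
    # at every self-intersection; edge.ends_at_intersection means the walk
    # passes over an intersection right after this edge.
    is_outside = True   # edges[0] is assumed to be a known outside edge
    for edge in edges:
        edge.is_outside = is_outside
        if edge.ends_at_intersection:
            is_outside = not is_outside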
I think your problem can be solved in two steps.

1. Triangulate the source polygon with an algorithm that supports self-intersecting polygons. A good start is Seidel's algorithm; section 5.2 of the linked PDF document describes self-intersecting polygons.
2. Merge the triangles into a single polygon with an algorithm that supports holes, e.g. the Weiler-Atherton algorithm. This algorithm can be used for both clipping and merging, so you need its "merging" case. You may be able to simplify it, because the triangles from the first step do not intersect each other.
I realized this can be determined in a fairly simple way, using a slight modification of the standard routine that computes the winding number. It is conceptually similar to evaluating the winding both immediately to the left and immediately to the right of the target edge. Here is the algorithm for arbitrary curves, not just line segments:
1. Pick a point on the target segment. Ensure the Y derivative at that point is nonzero.
2. Subdivide the target segment at the roots of its Y derivative. In the next step, ignore the Y-monotonic portion that contains the point picked in step 1.
3. Determine the winding number at the point picked in step 1. This can be done by casting a ray in the +X direction and seeing what intersects it, and in what direction: intersections at points where the Y component of the derivative is positive count as +1, and those where it is negative count as -1. While doing this, ignore the Y-monotonic portion that contains the point picked in step 1.
4. If the winding number is 0, we are done: this is definitely an outside edge. If its absolute value is greater than 1, we are also done: this is definitely an inside edge.
5. Otherwise, inspect the derivative at the point picked in step 1. If an intersection of the ray with that point would count as -1 and the winding number obtained in step 3 is +1, this is an outside edge; similarly for the +1/-1 case. Otherwise it is an inside edge.
In essence, we are checking whether the intersection of the ray with the target segment changes the winding number between zero and non-zero.
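For reference, a sketch of the ray-casting in step 3 for the simple case where the path consists of straight segments (the segment representation here is an assumption; curves would first be split into Y-monotonic pieces as described above):

def winding_number(point, segments):
    # Signed +X ray-crossing count. 'segments' is a list of
    # ((x1, y1), (x2, y2)) pairs.
    px, py = point
    winding = 0
    for (x1, y1), (x2, y2) in segments:
        if (y1 <= py) != (y2 <= py):          # segment crosses the line y = py
            t = (py - y1) / (y2 - y1)
            x_cross = x1 + t * (x2 - x1)
            if x_cross > px:                  # crossing lies on the +X ray
                winding += 1 if y2 > y1 else -1   # upward +1, downward -1
    return winding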
I'd suggest what I feel is a simpler implementation of your solution that has worked for me:
1. Pick ANY point on the target segment. (I arbitrarily pick the midpoint.)
2. Construct a ray from that point normal to the segment. (I use a left normal ray for a CW polygon and a right normal ray for a CCW polygon.)
3. Count the intersections of the ray with the polygon, ignoring the target segment itself. Here you can choose a NonZero winding rule [decrement for polygon segments crossing to the left (CCW) and increment for a crossing to the right (CW), where an inside edge yields a zero count] or an EvenOdd rule [count all crossings, where an inside edge yields an odd count]. For line segments, the crossing direction is determined with a simple left-or-right test for the segment's start and end points (see the sketch after this list). For arcs and curves it can be done with tangents at the intersection, an exercise for the reader.
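A sketch of that left-or-right test (the names are illustrative, not from the answer):

def side_of_ray(origin, direction, pt):
    # 2-D cross product: > 0 means pt lies to the left of the ray,
    # < 0 to the right, 0 exactly on the ray.
    return (direction[0] * (pt[1] - origin[1])
            - direction[1] * (pt[0] - origin[0]))

# A polygon segment whose start and end points get opposite signs has
# crossed the ray; the sign order tells you whether it crossed to the
# left (decrement) or to the right (increment), matching the NonZero
# convention in step 3.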
My purpose for this analysis is to divide a self-intersecting polygon into an equivalent set of non-self-intersecting polygons. To that end, it's useful to likewise analyze the ray in the opposite direction and determine whether the original polygon would be filled there or not. This yields an inside/outside determination for BOTH sides of the segment, giving four possible states. I suspect an OUTSIDE-OUTSIDE state might be valid only for a non-closed polygon, but for this analysis it might be desirable to temporarily close it. Segments with the same state can be collected into non-intersecting polygons by tracing their shared intersections. In some cases, such as with a pure fill, you might even decide to eliminate INSIDE-INSIDE polygons as redundant, since they fill an already-filled space.
And thanks for your original solution!!
I've implemented K-Means in Java and have a bit of a head-scratcher. I select my initial centroids by choosing a random value in each dimension within the range of the data points' values. I've run into cases where one or more of these centroids ends up not being the closest centroid to any data point. So what do I do for the next iteration? Leave it at its original randomized value? Pick a new random value? Compute it as an average of the other centroids? It seems this isn't accounted for in the original algorithm, but probably I've just missed something.
Most implementations of k-means define the initial centroids using actual data points, not random points in the bounding box spanned by the variables. However, some suggestions for solving your actual problem are below.
You could take another data-point at random and make it a new cluster centroid. This is very simple and fast to implement, and shouldn't affect the algorithm adversely.
You could also try making a smarter initial selection of cluster centroids using kmeans++. This algorithm chooses the first centroid at random from the data, then picks each of the remaining K-1 centroids from the data points with probability proportional to the squared distance from the nearest centroid already chosen, which tends to spread the centroids out. By picking smarter centroids, you are much less likely to encounter the problem of a centroid being assigned zero data points.
If you wanted to be slightly more clever, you could use the kmeans++ seeding rule to draw a new centroid whenever an existing centroid gets assigned zero data points.
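A minimal sketch of the kmeans++ seeding step, assuming points are plain tuples of coordinates:

import random

def kmeans_pp_init(points, k):
    # First centroid is a random data point; each further centroid is a
    # data point drawn with probability proportional to its squared
    # distance from the nearest centroid chosen so far.
    def d2(a, b):
        return sum((ai - bi) ** 2 for ai, bi in zip(a, b))

    centroids = [random.choice(points)]
    while len(centroids) < k:
        weights = [min(d2(p, c) for c in centroids) for p in points]
        centroids.append(random.choices(points, weights=weights, k=1)[0])
    return centroids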
The way I've used it, the initial values were taken as random points from the data set, not random points in the spanned space. That means each cluster has at least one point in it initially. You could still get unlucky with outliers but with any luck you'll be able to detect this and restart with different points. (Provided "K clusters of points" is an adequate description of your data)
Instead of picking random values (which can be pretty meaningless if the space of possible values is large in comparison to the clusters), many implementations pick random points from the dataset as the initial centroids.
I have an interesting problem coming up soon and I've started to think about the algorithm. The more I think about it, the more I get frightened because I think it's going to scale horribly (O(n^4)), unless I can get smart. I'm having trouble getting smart about this one. Here's a simplified description of the problem.
I have N polygons (where N can be huge, >10,000,000) that are stored as lists of M vertices (where M is on the order of 100). What I need to do is, for each polygon, create a list of any vertices that are shared with other polygons (think of the polygons as surrounding regions of interest; sometimes the regions butt up against each other). I envision something like this:
Polygon i | Vertex | Polygon j | Vertex
    1     |    1   |     2     |    2
    1     |    2   |     2     |    3
    1     |    5   |     3     |    1
    1     |    6   |     3     |    2
    1     |    7   |     3     |    3
This means that vertex 1 in polygon 1 is the same point as vertex 2 in polygon 2, that vertex 2 in polygon 1 is the same point as vertex 3 in polygon 2, and likewise that vertex 5 in polygon 1 is the same as vertex 1 in polygon 3, and so on.
For simplicity, we can assume that polygons never overlap, the closest they get is touching at the edge, and that all the vertices are integers (to make the equality easy to test).
The only approach I can think of right now is, for each polygon, to loop over all the other polygons and their vertices, giving a scaling of O(N^2*M^2), which is going to be very bad in my case. The files of polygons can be very large, so I can't even store it all in RAM, which would mean multiple reads of the file.
Here's my pseudocode so far
for i = 1 to N
    Pi = Polygon(i)
    for j = i+1 to N
        Pj = Polygon(j)
        for ii = 1 to Pi.VertexCount()
            Vi = Pi.Vertex(ii)
            for jj = 1 to Pj.VertexCount()
                Vj = Pj.Vertex(jj)
                if (Vi == Vj) AddToList(i, ii, j, jj)
            end for
        end for
    end for
end for
I'm assuming that this has come up in the graphics community (I don't spend much time there, so I don't know the literature). Any ideas?
This is a classic iteration-vs-memory problem. If you compare every polygon with every other polygon, you run into an O(n^2) solution. If you instead build a table as you step through all the polygons, then march through the table afterwards, you get a nice linear-time solution: two passes over the data. I ask a similar question during interviews.
Assuming you have the memory available, you want to create a multimap (one key, multiple entries) with each vertex as the key, and the polygon as the entry. Then you can walk each polygon exactly once, inserting the vertex and polygon into the map. If the vertex already exists, you add the polygon as an additional entry to that vertex key.
Once you've hit all the polygons, you walk the entire map once and do whatever you need to do with any vertex that has more than one polygon entry.
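A sketch of that multimap in Python, assuming each polygon is a list of integer (x, y) tuples as the question permits (all names here are illustrative):

from collections import defaultdict

def shared_vertices(polygons):
    # One pass to build the multimap, one pass to read it out. Integer
    # coordinate tuples can be used directly as dictionary keys.
    vertex_map = defaultdict(list)   # (x, y) -> [(polygon idx, vertex idx), ...]
    for p_idx, polygon in enumerate(polygons):
        for v_idx, vertex in enumerate(polygon):
            vertex_map[vertex].append((p_idx, v_idx))

    # Every key with more than one entry is a vertex shared between
    # polygons (or reused within one).
    return {v: users for v, users in vertex_map.items() if len(users) > 1}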
If you have indexed polygon/face data, you don't even need to look at the vertex coordinates:

Create an array of length M (where M is the total number of vertices).
Iterate over the polygons and increment the array entry for each vertex index used.
This gives you an array that describes how many times each vertex is used.*
You can then do another pass over the polygons and check the entry for each vertex. If it's > 1 you know that vertex is shared by another polygon.
You can build upon this strategy further if you need to store/find other information. For example instead of a count you could store polygons directly in the array allowing you to get a list of all faces that use a given vertex index. At this point you're effectively creating a map where vertex indices are the key.
(*this example assumes you have no degenerate polygons, but those could easily be handled).
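A sketch of the two passes, under this answer's assumption that polygons store indices into a shared vertex array:

def vertex_use_counts(polygons, num_vertices):
    # Counting pass: each polygon is a list of indices into a shared
    # vertex array of length num_vertices.
    counts = [0] * num_vertices
    for polygon in polygons:
        for v_index in polygon:
            counts[v_index] += 1
    return counts

# Second pass: any index v with counts[v] > 1 is used by more than one
# polygon (assuming no degenerate polygons that repeat a vertex).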
Well, one simple optimization would be to make a map (hashtable, probably) that maps each distinct vertex (identified by its coordinates) to a list of all polygons of which it is a part. That cuts down your runtime to something like O(NM) - still large but I have my doubts that you could do better, since I can't imagine any way to avoid examining all the vertices.