KDTree with periodic boundary conditions and pair distances in output - python-3.x

I want to run a nearest-neighbour search over >10k points that lie within a periodic box and have it return the distances between these points together with their indices.
So far I have tried sklearn.neighbors.KDTree(positions).query_radius(positions, r=maximum_distance, return_distance=True), which returns the nearest-neighbour distances within a maximum radius; however, it does not support periodic boundary conditions (PBC). Another method I have explored is scipy.spatial.cKDTree(positions, boxsize=box_size).query_pairs(r=maximum_distance), which works with PBC but does not return the distances between pairs.
Would it be possible to extend sklearn.neighbors.KDTree with the capability to handle PBC as scipy.spatial.cKDTree does?
Or
Would it be possible to extend scipy.spatial.cKDTree with the capability to return pair distances?

The answer is:
scipy.spatial.cKDTree().query()
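
A minimal sketch of that approach (the data below are made up; cKDTree.query() returns both distances and indices, the boxsize argument makes the search periodic, and the pair distances from query_pairs can be recovered with the minimum-image convention):

import numpy as np
from scipy.spatial import cKDTree

rng = np.random.default_rng(0)
box_size = 10.0
maximum_distance = 1.5
positions = rng.random((10_000, 3)) * box_size   # illustrative points in the box

tree = cKDTree(positions, boxsize=box_size)      # periodic boundary conditions

# Nearest-neighbour distances and indices, PBC respected (k=2 because the
# closest hit of each point is the point itself).
distances, indices = tree.query(positions, k=2)

# If the pairs from query_pairs are also needed, their PBC distances can be
# computed with the minimum-image convention for a cubic box:
pairs = tree.query_pairs(r=maximum_distance, output_type="ndarray")
delta = np.abs(positions[pairs[:, 0]] - positions[pairs[:, 1]])
delta = np.minimum(delta, box_size - delta)
pair_distances = np.linalg.norm(delta, axis=1)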

Related

How to find farthest neighbors in Euclidean space?

Given a set P of n points in 2D, for any point x in P, what is the fastest way to find out the farthest neighbor of x? By farthest neighbor, we mean a point in P which has the maximum Euclidean distance to x.
To the best of my knowledge, the current standard kNN search algorithm for various trees (R-Trees, quadtrees, kd-trees) was developed by:
G. R. Hjaltason and H. Samet, "Distance browsing in spatial databases," ACM TODS 24(2):265–318, 1999.
The algorithm traverses the tree based on a priority queue of nearest nodes/entries. One key insight is that it also works for farthest-neighbor search.
The basic algorithm uses a priority queue. The queue can contain tree nodes as well as data entries, all sorted by their distance to your search point.
As an initial step, add the root node to the priority queue. Then repeat the following until k entries have been found:
1. Take the first element from the queue. If it is an entry, return it. If it is a node, add all elements in the node to the priority queue.
2. Repeat step 1.
The paper describes an implementation for R-Trees, but they claim it can be applied to most tree-like structures. I have implemented the nearest neighbor version myself for R-Trees and PH-Trees (a special type of quadtree), both in Java. I think I know how to do it efficiently for KD-Trees but I believe it is somewhat complicated.
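
A minimal sketch of the queue-driven traversal described above, written in Python for a hypothetical tree interface (nodes expose .children and .is_entry; node_bound and point_dist are assumed callbacks: a lower bound on the distance to a node's region for nearest search, an upper bound for farthest search):

import heapq
import itertools

def incremental_search(root, q, k, node_bound, point_dist, farthest=False):
    sign = -1.0 if farthest else 1.0   # negated keys turn heapq into a max-heap
    tie = itertools.count()            # tie-breaker so heapq never compares tree objects
    heap = [(sign * node_bound(root, q), next(tie), root)]
    found = []
    while heap and len(found) < k:
        _, _, item = heapq.heappop(heap)
        if item.is_entry:
            found.append(item)         # a data entry: its distance is exact, report it
        else:
            for child in item.children:
                if child.is_entry:
                    key = point_dist(child, q)
                else:
                    key = node_bound(child, q)
                heapq.heappush(heap, (sign * key, next(tie), child))
    return found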

Calculating average curvature of a polygonal chain with non-uniform segment length

I'm using a polygonal chain to approximate a curve. I want to approximate the average of a function of curvature of all points that lie on the curve. One function of curvature that I need is, for example, the square of curvature.
I can get near that by choosing some points on the chain, calculating the curvature at those points, applying the function to it (for example squaring it), and then averaging the calculated values.
I need both accuracy and speed. I appreciate both fast-but-approximate and accurate-but-slow solutions. I'm working in Java, but the answer doesn't need to be written in Java; it doesn't even need to contain any code at all.
Polygonal chain with uniform segment length
If the polygonal chain's segments all have equal length, I can just calculate the curvature at the vertices and then average that. I see two ways to get the curvature at a vertex.
One way is to get the circle that goes through the selected vertex, the vertex before it, and the one after it. The curvature is then 1/radius of the circle.
The other way is to calculate the external angle (in radians) of the two segments connected at the selected vertex and then divide its absolute value by the length of a segment. In the accompanying image (not reproduced here), φ marks the external angle.
I am not sure if this method is correct, as I haven't mathematically derived it, but I've noticed through experimentation that it gives similar results to the above method.
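
For reference, a minimal sketch of the first method (curvature = 1/R of the circle through three consecutive vertices), written in Python/NumPy rather than Java; the function name is illustrative and 2D points are assumed:

import numpy as np

def circumcircle_curvature(p_prev, p, p_next):
    p_prev, p, p_next = map(np.asarray, (p_prev, p, p_next))
    # Side lengths of the triangle formed by the three vertices.
    a = np.linalg.norm(p - p_prev)
    b = np.linalg.norm(p_next - p)
    c = np.linalg.norm(p_next - p_prev)
    # Twice the triangle area from the 2D cross product.
    cross = (p[0] - p_prev[0]) * (p_next[1] - p_prev[1]) \
          - (p[1] - p_prev[1]) * (p_next[0] - p_prev[0])
    area = abs(cross) / 2.0
    if area == 0.0:
        return 0.0                    # collinear vertices: zero curvature
    return 4.0 * area / (a * b * c)   # 1/R, since R = abc / (4 * area)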
Polygonal chain with non-uniform segment length
Unfortunately, though, there's no guarantee that the segments have uniform length.
If I try using the first of the above methods, vertices connected to longer segments give lower curvatures, even if they are visibly sharper. I tried substituting previous and next vertices with a point x units before the selected vertex and a point x units after it. I don't know what to set the x constant to, to get accurate results. All the values I've tried seemed to give inaccurate results.
If I try using the second method, I don't know what length to divide the angle by. If I don't divide by anything at all, I actually get pretty good results for comparing two curves and determining which one is curvier, but I need to be able to determine the actual curvature in a point.
With both of these methods there's also the problem that parts with shorter segments (where points are denser) will affect the average more.
Another possible solution would be to ignore the vertices and instead use an array of points on the chain that are evenly spaced, treat them as a new polygonal chain (connect the points with straight lines), and then calculate curvatures on this new chain instead, using one of the methods I mentioned under the header titled "Polygonal chain with uniform segment length".
Finding such an array of points is not trivial, though, because I have to choose a segment length, and only after producing the points can I see whether the length of the resulting chain is divisible by the chosen segment length.
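
One way around the divisibility issue, sketched in Python/NumPy under the assumption of an open 2D chain stored as an (n, 2) array: pick a target spacing, round the number of samples to the total length, and interpolate by arc length.

import numpy as np

def resample_uniform(points, target_spacing):
    seg_len = np.linalg.norm(np.diff(points, axis=0), axis=1)
    cum = np.concatenate([[0.0], np.cumsum(seg_len)])    # arc length at each vertex
    total = cum[-1]
    # Choose the number of segments so the spacing divides the total length.
    n_segments = max(int(round(total / target_spacing)), 1)
    s = np.linspace(0.0, total, n_segments + 1)          # evenly spaced arc lengths
    x = np.interp(s, cum, points[:, 0])
    y = np.interp(s, cum, points[:, 1])
    return np.column_stack([x, y])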
If you aren't short on space, the last solution you mentioned would be the best, because the "sphere" approximation, as you've perhaps realized, would give awful results in more extreme cases, especially if the curvature is large or changes sign quickly.
There are many ways to do interpolation, the simplest being quadratic and cubic splines. However, if you have more pre-processing time, Lagrange polynomials produce very good results: https://en.wikipedia.org/wiki/Lagrange_polynomial.
Side note on your angle-division method (the original answer illustrated this with a diagram that is not reproduced here): treat two consecutive segments of length l as chords of a circle of radius R. From simple geometry, the central angle subtended by each chord equals the external angle θ, so l = 2R sin(θ/2) ≈ Rθ for small θ (a << l in the diagram). The curvature is therefore κ = 1/R ≈ θ/l.
So your approximation is in fact correct for small curvatures.
An alternative is to use a local parabola approximation to estimate the curvature. Basically, to estimate the curvature at point P(i), you take P(i-1), P(i) and P(i+1) and construct a parabola from these 3 points. Then, you compute the curvature at P(i) from the parabola. Remember to use chord-length (or centripetal) parametrization when constructing the parabola.
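
A minimal sketch of that estimate in Python/NumPy, assuming 2D points; the function and parameter names are illustrative:

import numpy as np

def parabola_curvature(p_prev, p, p_next, centripetal=False):
    p_prev, p, p_next = map(np.asarray, (p_prev, p, p_next))
    # Chord-length (or centripetal) parametrization of the three points.
    d1 = np.linalg.norm(p - p_prev)
    d2 = np.linalg.norm(p_next - p)
    if centripetal:
        d1, d2 = np.sqrt(d1), np.sqrt(d2)
    t = np.array([0.0, d1, d1 + d2])
    pts = np.vstack([p_prev, p, p_next])
    # Quadratic polynomials x(t), y(t) interpolating the three points.
    cx = np.polyfit(t, pts[:, 0], 2)
    cy = np.polyfit(t, pts[:, 1], 2)
    # First derivatives at the middle parameter value, i.e. at P(i).
    dx = np.polyval(np.polyder(cx), t[1])
    dy = np.polyval(np.polyder(cy), t[1])
    # Second derivatives of a quadratic are constant.
    ddx, ddy = 2.0 * cx[0], 2.0 * cy[0]
    # Curvature of the parametric curve (x(t), y(t)).
    return abs(dx * ddy - dy * ddx) / (dx * dx + dy * dy) ** 1.5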

Closest distance + vertices of two meshes

I would like to find two vertices of two meshes (1 vertex per mesh) that define the closest distance between them. Or the two triangles would be fine I guess.
However I'm not sure how to search for this in CGAL's documentation, I'm sure that this is doable with some existing tool (probably based on a 3d distance field and/or AABBs). Could I please get a hint (keywords/link) on what to look for?
I've been pointed to the Optimal Distances CGAL package, but it's not exactly what I want, since it outputs the distance and the coordinates, so finding the vertex ID is an additional computational overhead.
I've already implemented collision detection with CGAL to find triangle-triangle intersections in a triangle soup, using AABB trees. I guess that I should be somehow close to this, although now a simple soup with all my object triangles wouldn't do the job.
The solution found was this:
CGAL's Optimal Distances package can give an approximation of the closest distance between the convex hulls of two meshes, without explicitly computing the hulls. As a result one gets the shortest distance between these hulls, and the coordinates of the 2 points that lie on them and define this distance.
Then these coordinates can be used as a search query against kd-trees that contain the original vertices of the meshes, in order to find the closest vertices.
In case one mesh is non-convex, the hull that CGAL is using is very approximate, so convex decomposition might be necessary. In such a case one would have to check distances for each convex part and then take the shortest distance.
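
As an illustration of the second step only (mapping the returned closest-point coordinates back to vertex IDs), here is a sketch that uses SciPy's cKDTree instead of CGAL's search structures; the vertex arrays and query points below are made-up stand-ins for what the optimal-distance query would return:

import numpy as np
from scipy.spatial import cKDTree

rng = np.random.default_rng(0)
vertices_a = rng.random((1000, 3))                    # original vertices of mesh A
vertices_b = rng.random((1000, 3)) + [2.0, 0.0, 0.0]  # original vertices of mesh B
closest_on_a = np.array([1.0, 0.5, 0.5])              # point on hull of A (from the query)
closest_on_b = np.array([2.0, 0.5, 0.5])              # point on hull of B (from the query)

tree_a = cKDTree(vertices_a)
tree_b = cKDTree(vertices_b)
_, vertex_id_a = tree_a.query(closest_on_a)   # index of the nearest original vertex of A
_, vertex_id_b = tree_b.query(closest_on_b)   # index of the nearest original vertex of B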

Calculating the distance between each pair of a set of points

So I'm working on simulating a large number of n-dimensional particles, and I need to know the distance between every pair of points. Allowing for some error, and given that the distance isn't relevant at all if it exceeds some threshold, are there any good ways to accomplish this? I'm pretty sure that if I want dist(A,C) and already know dist(A,B) and dist(B,C), I can bound it by [|dist(A,B)-dist(B,C)|, dist(A,B)+dist(B,C)], and then store the results in a sorted array, but I'd like to not reinvent the wheel if there's something better.
I don't think the number of dimensions should greatly affect the logic, but maybe for some solutions it will. Thanks in advance.
If the problem were simply about calculating the distances between all pairs, it would be an O(n^2) problem with no chance of a better solution. However, you say that if the distance is greater than some threshold D, you are not interested in it. This opens up opportunities for a better algorithm.
For example, in the 2D case you can use the sweep-line technique. Sort your points lexicographically, first by y then by x. Then sweep the plane with a stripe of width D, bottom to top. As that stripe moves across the plane, new points will enter the stripe through its top edge and exit it through its bottom edge. Active points (i.e. points currently inside the stripe) should be kept in some incrementally modifiable linear data structure sorted by their x coordinate.
Now, every time a new point enters the stripe, you have to check the currently active points to the left and to the right no farther than D (measured along the x axis). That's all.
The purpose of this algorithm (as is typically the case with the sweep-line approach) is to push the practical complexity away from O(n^2) and towards O(m), where m is the number of interactions we are actually interested in. Of course, the worst-case performance will still be O(n^2).
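
A compact sketch of this idea in plain Python (2D points as (x, y) tuples); for brevity the active stripe is scanned linearly instead of being kept in an x-sorted structure:

import math

def pairs_within(points, D):
    # Process points bottom to top; each point is only tested against points
    # whose y coordinate is within D below it (the active stripe).
    order = sorted(range(len(points)), key=lambda i: (points[i][1], points[i][0]))
    result = []
    lo = 0                                    # start of the active stripe in `order`
    for pos, i in enumerate(order):
        xi, yi = points[i]
        while points[order[lo]][1] < yi - D:  # drop points that left the stripe
            lo += 1
        for j in order[lo:pos]:               # candidates currently in the stripe
            xj, yj = points[j]
            if abs(xj - xi) <= D and math.hypot(xj - xi, yj - yi) <= D:
                result.append((j, i))
    return result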
The above applies to the 2-dimensional case. For the n-dimensional case, I'd say you'll be better off with a different technique. Some sort of space partitioning should work well here, i.e. exploiting the fact that if the distance between partitions is known to be greater than D, then there's no reason to consider the specific points in those partitions against each other.
If the distance beyond a certain threshold is not relevant, and this threshold is not too large, there are common techniques to make this more efficient: limit the search for neighbouring points using space-partitioning data structures. Possible options are:
Binning.
Trees: quadtrees (2D), kd-trees.
Binning with spatial hashing.
Also, since the distance from point A to point B is the same as the distance from point B to point A, each distance should only be computed once. Thus, you should use the following loop:
for point i from 0 to n-1:
    for point j from i+1 to n-1:
        distance(point i, point j)
Combining these two techniques is very common in n-body simulations, for example, where particles affect each other if they are close enough. Here are some fun examples of that in 2D: http://forum.openframeworks.cc/index.php?topic=2860.0
Here's an explanation of binning (and hashing): http://www.cs.cornell.edu/~bindel/class/cs5220-f11/notes/spatial.pdf
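
And a minimal sketch of 2D binning in plain Python, with the grid cell side equal to the cutoff D so that only the 3x3 block of neighbouring cells has to be checked (names are illustrative):

from collections import defaultdict
from math import floor, hypot

def pairs_within_binned(points, D):
    # Hash each point index into a grid cell of side D.
    cells = defaultdict(list)
    for idx, (x, y) in enumerate(points):
        cells[(floor(x / D), floor(y / D))].append(idx)
    result = []
    for (cx, cy), members in cells.items():
        for dx in (-1, 0, 1):
            for dy in (-1, 0, 1):
                for i in members:
                    for j in cells.get((cx + dx, cy + dy), ()):
                        if j <= i:            # report each pair only once
                            continue
                        xi, yi = points[i]
                        xj, yj = points[j]
                        if hypot(xj - xi, yj - yi) <= D:
                            result.append((i, j))
    return result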

Intersecting points with a polygon in OpenCV

My inputs
I have a vector<Point2f> that contains the contours of a polygon. I also have a list of points that need to be intersected with this polygon.
The problem
I want to calculate how many of these points lie inside the polygon. I want to repeat this calculation on a number of polygons to see which one contains the highest number of points.
Does OpenCV implement such intersection functionality of its own or will I need to implement an intersection function myself? I'm worried that if I try to implement it myself, the result will be unnecessarily slow. If OpenCV can't do it, are there other free graphics libraries that can perform this task?
pointPolygonTest does exactly what you're looking for, and it's pretty well optimized. The contour parameter is a Mat, which you can make with the constructor that takes your vector of points.
The function determines whether the point is inside a contour, outside, or lies on an edge (or coincides with a vertex). It returns a positive (inside), negative (outside) or zero (on an edge) value, correspondingly. When measureDist=false, the return value is +1, -1 or 0, respectively. Otherwise, the return value is the signed distance between the point and the nearest contour edge.
Your problem seems easily parallelizable, though, i.e. each batch of candidate polygons could run on a different thread, so I'd definitely look into that if you're concerned about performance.
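
For example, counting the points that fall inside a single polygon with OpenCV's Python bindings (the C++ call is analogous; the polygon and points below are made up):

import numpy as np
import cv2

polygon = np.array([[0, 0], [100, 0], [100, 100], [0, 100]], dtype=np.float32)
points = [(10.0, 10.0), (50.0, 120.0), (99.0, 1.0)]

# pointPolygonTest returns +1 / 0 / -1 when measureDist is False;
# >= 0 counts points inside the polygon or on its boundary.
inside = sum(1 for pt in points if cv2.pointPolygonTest(polygon, pt, False) >= 0)
print(inside)   # 2 for the points above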
