KD Tree alternative/variant for weighted data


I'm using a static KD-Tree for nearest neighbor search in 3D space. However, the client's specifications have now changed so that I'll need a weighted nearest neighbor search instead. For example, in 1D space, I have a point A with weight 5 at 0, and a point B with weight 2 at 4; the search should return A if the query point is from -5 to 5, and should return B if the query point is from 5 to 6. In other words, the higher-weighted point takes precedence within its radius.
Google hasn't been any help - all I get is information on the K-nearest neighbors algorithm.
I can simply remove points that are completely subsumed by a higher-weighted point, but this generally isn't the case: usually a lower-weighted point is only partially subsumed, as in the 1D example above. I could use a range tree to query all points in an NxNxN cube centered on the query point and pick the one with the greatest weight, but the naive implementation of this is wasteful. N has to be set to the maximum weight in the entire tree, even though there may be no point with that weight within the cube. For example, if the point with the maximum weight in the tree has weight 25, then N must be 25 even though the highest weight in any given cube is probably much lower. In the 1D case, if I have a point located at 100 with weight 25, my naive algorithm would need N = 25 even when the query is outside that point's radius.
To sum up, I'm looking for a way that I can query the KD tree (or some alternative/variant) such that I can quickly determine the highest-weighted point whose radius covers the query point.
FWIW, I'm coding this in Java.
It would also be nice if I could dynamically change a point's weight without incurring too high a cost. At present this isn't a requirement, but I'm expecting that it may become one down the road.
Edit: I found a paper on a priority range tree, but this doesn't exactly address the same problem in that it doesn't account for higher-priority points having a greater radius.

Use an extra dimension for the weight. A point (x,y,z) with weight w is placed at (N-w,x,y,z), where N is the maximum weight.
Distances in 4D are defined by…
d((a, b, c, d), (e, f, g, h)) = |a - e| + d((b, c, d), (f, g, h))
…where the second d is whatever your 3D distance was.
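To find all potential results for (x,y,z), query a ball of radius N about (0,x,y,z). A stored point lies in that ball exactly when (N - w) + d((x,y,z), point) <= N, i.e. when its 3D distance to the query is at most its weight w. The answer is then the highest-weighted point that the query returns.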
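A quick brute-force sketch in Python can confirm that the embedding behaves as described, using the 1D example from the question. It makes three assumptions: Euclidean distance is used for the 3D part, points are given as (coordinates, weight) pairs, and the function names are invented for illustration. Because it scans every point, it only checks the distance arithmetic. In practice the radius-N ball query would run on a spatial index that supports this combined metric.

```python
import math

def embed(coords, weight, max_weight):
    """Weighted point (x, y, z) with weight w -> 4D point (N - w, x, y, z)."""
    return (max_weight - weight,) + tuple(coords)

def dist4(a, b):
    """|a0 - b0| plus the ordinary (here: Euclidean) distance of the rest."""
    return abs(a[0] - b[0]) + math.dist(a[1:], b[1:])

def heaviest_covering(points, query):
    """points: list of (coords, weight). Query the ball of radius N about
    (0, *query) and return the highest-weighted point inside it, or None."""
    n = max(w for _, w in points)
    centre = (0.0,) + tuple(query)
    best = None
    for coords, w in points:
        # (N - w) + d3 <= N is the same test as d3 <= w
        if dist4(embed(coords, w, n), centre) <= n:
            if best is None or w > best[1]:
                best = (coords, w)
    return best

# 1D example from the question: A (weight 5) at 0, B (weight 2) at 4.
points = [((0.0,), 5), ((4.0,), 2)]
print(heaviest_covering(points, (3.0,)))   # ((0.0,), 5), i.e. A
print(heaviest_covering(points, (5.5,)))   # ((4.0,), 2), i.e. B
```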

I think I've found a solution: the nested interval tree, which is an implementation of a 3D interval tree. Rather than storing points with an associated radius that I then need to query, I instead store and query the radii directly. This has the added benefit that each dimension does not need to have the same weight (so that the radius is a rectangular box instead of a cubic box), which is not presently a project requirement but may become one in the future (the client only recently added the "weighted points" requirement, who knows what else he'll come up with).
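The answer gives no code. The following is therefore only the 1D building block, offered as an illustration and not as the nested 3D structure itself. It is a centered interval tree in Python that answers stabbing queries ("which stored intervals contain q?") and returns the heaviest hit. Each weighted point at x with weight w becomes the interval [x - w, x + w]. The class and function names are hypothetical.

```python
class IntervalNode:
    """Centered interval tree node over (lo, hi, weight, label) tuples."""

    def __init__(self, intervals):
        endpoints = sorted(x for lo, hi, _, _ in intervals for x in (lo, hi))
        self.centre = endpoints[len(endpoints) // 2]
        here = [iv for iv in intervals if iv[0] <= self.centre <= iv[1]]
        left = [iv for iv in intervals if iv[1] < self.centre]
        right = [iv for iv in intervals if iv[0] > self.centre]
        # Every interval kept here contains the centre; store two sort orders.
        self.by_lo = sorted(here, key=lambda iv: iv[0])
        self.by_hi = sorted(here, key=lambda iv: iv[1], reverse=True)
        self.left = IntervalNode(left) if left else None
        self.right = IntervalNode(right) if right else None


def stab(node, q):
    """Yield every stored interval that contains q."""
    while node is not None:
        if q < node.centre:
            # These intervals all reach past q on the right; check left ends.
            for iv in node.by_lo:
                if iv[0] > q:
                    break
                yield iv
            node = node.left
        elif q > node.centre:
            # Symmetric case: check right ends, largest first.
            for iv in node.by_hi:
                if iv[1] < q:
                    break
                yield iv
            node = node.right
        else:
            yield from node.by_lo
            return


def heaviest(node, q):
    """Highest-weighted interval containing q, or None."""
    return max(stab(node, q), key=lambda iv: iv[2], default=None)


# Point A at 0 with weight 5, point B at 4 with weight 2.
tree = IntervalNode([(0 - 5, 0 + 5, 5, "A"), (4 - 2, 4 + 2, 2, "B")])
print(heaviest(tree, 3)[3])    # A
print(heaviest(tree, 5.5)[3])  # B
print(heaviest(tree, 7))       # None
```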


Converting intensities to probabilities in ppp

Apologies for the overlap with existing questions; mine is at a more basic skill level. I am working with very sparse occurrences spanning very large areas, so I would like to calculate probability at pixels using the density.ppp function (as opposed to relrisk.ppp, where specifying presences+absences would be computationally intractable). Is there a straightforward way to convert density (intensity) to probabilities at each point?
Maxdist=50
dtruncauchy=function(x,L=60) L/(diff(atan(c(-1,1)*Maxdist/L)) * (L^2 + x^2))
dispersfun=function(x,y) dtruncauchy(sqrt(x^2+y^2))
n=1e3; PPP=ppp(1:n,1:n, c(1,n),c(1,n), marks=rep(1,n));
density.ppp(PPP,cutoff=Maxdist,kernel=dispersfun,at="points",leaveoneout=FALSE) # convert to probabilities?
Thank you!!

I think there is a misunderstanding about fundamentals. The spatstat package is designed mainly for analysing "mapped point patterns", datasets which record the locations where events occurred or things were located. It is designed for "presence-only" data, not "presence/absence" data (with some exceptions).
The relrisk function expects input data about the presence of two different types of events, such as the mapped locations of trees belonging to two different species, and then estimates the spatially-varying probability that a tree will belong to each species.
If you have 'presence-only' data stored in a point pattern object X of class "ppp", then density(X, ....) will produce a pixel image of the spatially-varying intensity (expected number of points per unit area). For example if the spatial coordinates were expressed in metres, then the intensity values are "points per square metre". If you want to calculate the probability of presence in each pixel (i.e. for each pixel, the probability that there is at least one presence point in the pixel), you just need to multiply the intensity value by the area of one pixel, which gives the expected number of points in the pixel. If pixels are small (the usual case) then the presence probability is just equal to this value. For physically larger pixels the probability is 1 - exp(-m) where m is the expected number of points.
Example:
library(spatstat)
X <- redwood
D <- density(X, 0.2)
pixarea <- with(D, xstep * ystep)
M <- pixarea * D
p <- 1 - exp(-M)
then M and p are images which should be almost equal, and can both be interpreted as probability of presence.
For more information see Chapter 6 of the spatstat book.
If, instead, you had a pixel image of presence/absence data, with pixel values equal to 1 or 0 for presence or absence respectively, then you can just use the function blur in the spatstat package to perform kernel smoothing of the image, and the resulting pixel values are presence probabilities.

How to approximate coordinates basing on azimuths?

Suppose I have a series of (imperfect) azimuth readouts, giving me vague angles between a number of points. Lines projected from points A, B, C obviously do not converge in a single point to define the location of point D. Hence, the angles as viewed from A, B and C need to be adjusted.
To make it more fun, I might be more certain of the relative positions of specific points (suppose I locate them on a satellite image, or I know for a fact they are oriented perfectly north-south), so I might want to use that certainty in my calculations and NOT adjust certain angles at all.
By what technique should I average the resulting coordinates, to achieve a "mostly accurate" overall shape?
I considered treating the difference between non-adjusted and adjusted angles as "tension" and trying to "relieve" it in subsequent passes, but that approach gives priority to points calculated earlier.
Another approach could be to calculate the total "tension" in the set, then shake all angles by a random amount, see if that resulted in less tension, and repeat for possibly improved results, trying to evolve a possibly better solution.

As I understand it you have a bunch of unknown points (p[] say) and a number of measurements of azimuths, say Az[i,j] of p[j] from p[i]. You want to find the coordinates of the points.
You'll need to fix one point. This is because if p[] is a solution (i.e. it gives the measured azimuths), so too is q[] where, for some fixed x,
q[i] = p[i] + x
I'll suppose you fix p[0].
You'll also need to fix a distance. This is because if p[] is a solution, so too is q[] where now for some fixed s,
q[i] = p[0] + s*(p[i] - p[0])
I'll suppose you fix dist(p[0], p[1]), and that there is an azimuth Az[0,1]. You'd be best to choose p[0] and p[1] so that there is a reliable azimuth between them. Then we can compute p[1] from p[0], the fixed distance and Az[0,1].
The usual way to approach such problems is least squares. That is we seek p[] to minimise
Sum over all measured pairs (i,j) of square( (Az[i,j] - Azimuth(p[i], p[j])) / S[i,j] )
where Az[i,j] is your measurement data
Azimuth( r, s) is the function that gives the azimuth of the point s from the point r
S[i,j] is the 'sd' of the measurement Az[i,j]: the higher the sd of a particular observation is, relative to the others, the less it affects the final result.
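To make the objective concrete, here is a minimal Python sketch of the residuals and the cost. It relies on assumptions that are not in the answer:

- planar coordinates with x pointing east and y north;
- azimuths in radians measured clockwise from north;
- measurements stored in dicts keyed by (i, j);
- angle differences wrapped into [-π, π), so that a 359° reading against a 1° prediction counts as a 2° error.

The minimisation itself would be handed to a non-linear least squares solver (for example scipy.optimize.least_squares), with p[0] and p[1] held fixed.

```python
import math

def azimuth(r, s):
    """Azimuth of point s seen from point r, radians clockwise from north
    (x = east, y = north)."""
    return math.atan2(s[0] - r[0], s[1] - r[1]) % (2 * math.pi)

def wrap(angle):
    """Map an angle difference into [-pi, pi)."""
    return (angle + math.pi) % (2 * math.pi) - math.pi

def residuals(p, az, sd):
    """(Az[i,j] - Azimuth(p[i], p[j])) / S[i,j] for every measured pair."""
    return [wrap(az[i, j] - azimuth(p[i], p[j])) / sd[i, j] for (i, j) in az]

def cost(p, az, sd):
    """The weighted sum of squares to be minimised."""
    return sum(r * r for r in residuals(p, az, sd))
```

The cost is zero at the true positions and stays zero after translating and scaling every point. That invariance is exactly why one point and one distance have to be fixed before solving.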
The above is a non-linear least squares problem. There are many solvers available for this. Generally speaking, as well as providing the data (the Az[] and the S[]) and the observation model (the Azimuth function), you need to provide an initial estimate of the state, i.e. of the values sought, in your case p[2], p[3], ...
It is highly likely that if your initial estimate is wrong the solver will fail.
One way to find this estimate would be to start with a set K of known point indices and seek to expand it. You would start with K being {0,1}. Then look for points that have as many azimuths as possible to points in K, estimate their positions geometrically from the known points and the azimuths, and add them to K. If at the end all the points are in K, you can go on to the least squares. If not, it's possible that a different pair of initial fixed points would do better, or maybe you are stuck.
The latter case is a real possibility. For example, suppose you had points p[0], p[1], p[2], p[3] and azimuths Az[0,1], Az[1,2], Az[1,3], Az[2,3].
As above we fix the positions of p[0] and p[1]. But we can't compute positions of p[2] and p[3] because we do not know the distances of 2 or 3 from 1. The 1,2,3 triangle could be scaled arbitrarily and still give the same azimuths.
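The bootstrapping step can be sketched in Python as follows. It places p[1] from p[0], the fixed distance and Az[0,1]. It then repeatedly locates any unknown point that has azimuths from two known points, by intersecting the two rays. It assumes planar coordinates (x east, y north) and azimuths in radians measured clockwise from north, and a reverse reading Az[j,i] is turned around by adding π. As a simplification, it uses the first non-parallel pair of known points rather than all available azimuths. All names are illustrative.

```python
import itertools
import math

def along(origin, az, dist):
    """Point at distance dist from origin along azimuth az (radians from north)."""
    return (origin[0] + dist * math.sin(az), origin[1] + dist * math.cos(az))

def cross(a, b):
    return a[0] * b[1] - a[1] * b[0]

def intersect(a, az_a, b, az_b, eps=1e-12):
    """Point seen from a at azimuth az_a and from b at az_b; None if parallel."""
    da = (math.sin(az_a), math.cos(az_a))
    db = (math.sin(az_b), math.cos(az_b))
    denom = cross(da, db)
    if abs(denom) < eps:
        return None
    w = (b[0] - a[0], b[1] - a[1])
    t = cross(w, db) / denom  # solves a + t*da = b + u*db for t
    return (a[0] + t * da[0], a[1] + t * da[1])

def bearing(az, i, j):
    """Azimuth of p[j] from p[i]; a reverse reading Az[j,i] is turned by pi."""
    if (i, j) in az:
        return az[i, j]
    if (j, i) in az:
        return (az[j, i] + math.pi) % (2 * math.pi)
    return None

def grow_known(known, n, az):
    """Expand K: place any unknown point seen from two known points."""
    known = dict(known)
    changed = True
    while changed and len(known) < n:
        changed = False
        for j in range(n):
            if j in known:
                continue
            seen = [i for i in known if bearing(az, i, j) is not None]
            for i, k in itertools.combinations(seen, 2):
                pt = intersect(known[i], bearing(az, i, j), known[k], bearing(az, k, j))
                if pt is not None:
                    known[j] = pt
                    changed = True
                    break
    return known
```

Consider the four-point example with only Az[0,1], Az[1,2], Az[1,3] and Az[2,3]. There, grow_known never gets past {0, 1}, because points 2 and 3 are each seen from only one known point.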

Find all the planar surfaces in an rgbd image using depth and normal data

Many questions deal with generating normal from depth or depth from normal, but I want to ask about a simple way to generate all the planar surfaces given the depth and normal of an image.
I already have depth and normal of each pixel in the image. For each pixel (ui, vi), assume that we can get its 3D coordinates (xi, yi, zi) with zi as the depth and normal vector (nix, niy, niz). Thus, a unique tangent plane is defined by: nix(x - xi) + niy(y - yi) + niz(z - zi) = 0. Then, for each pixel we can define a unique planar surface by the above equation.
What is common practice for finding the function f such that f(u, v) = (x, y, z) (from pixel to 3D coordinates)? Is the pinhole model (plus the depth data) an effective and accurate one?
How does one generate all the planar surfaces efficiently? One way is to iterate through all the pixels in the image and find all the planes, but this seems like an inefficient method.

If it's a pinhole model:
1. Make sure your 3D data is not distorted by projection.
2. Group your points by normal.
This is easy or hard depending on the accuracy of the points/normals. Simply sort the points by normal, which takes O(n log n) time, where n is the number of points.
3. Test/group by planes within a single normal group.
The idea is to pick 3 points from a group, compute a plane from them and test which points of the group belong to it. If the count is too low, you picked wrong points (not belonging to the same plane) and need to pick different ones. Also, if the picked points are too close to each other or lie on the same line, you cannot get a correct plane from them.
The math function for plane is:
x*nx + y*ny + z*nz + d = 0
where (nx,ny,nz) is your normal of the group (unit vector) and (x,y,z) is your point position. So you just compute d from a known point (one of the picked ones (x0,y0,z0) ) ...
d = -x0*nx -y0*ny -z0*nz
and then just test which points satisfy this condition:
threshold = 1e-20; // accuracy margin; for real depth data raise it to about the measurement noise
fabs(x*nx + y*ny + z*nz + d) <= threshold
Now remove the matched points from the group (move them into the found plane object) and apply this step again to the remaining points until their count is low or no valid plane is found...
then test another group until no groups are left...
I think RANSAC could speed this up and avoid the brute force, but I have never used it myself, so look it up.
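The three steps can be strung together in a short Python sketch. It assumes a pinhole camera with known intrinsics fx, fy, cx, cy, depth measured along the optical axis, no lens distortion, and unit-length normals. A dict keyed by rounded normals stands in for the sort in step 2. Because the group's normal is fixed, a single picked point is enough to compute d; picking three points matters only when the normal is unknown. This is brute force, not RANSAC, and all names are hypothetical.

```python
import random

def backproject(u, v, z, fx, fy, cx, cy):
    """Pinhole model: pixel (u, v) with depth z -> camera coordinates (x, y, z).
    Assumes z is depth along the optical axis and distortion is removed."""
    return ((u - cx) * z / fx, (v - cy) * z / fy, z)

def dot(a, b):
    return sum(p * q for p, q in zip(a, b))

def group_by_normal(points, normals, decimals=2):
    """Bucket points whose normals agree after rounding. Crude: nearly equal
    normals on either side of a rounding boundary end up in different groups."""
    groups = {}
    for p, n in zip(points, normals):
        key = tuple(round(c, decimals) for c in n)
        groups.setdefault(key, []).append(p)
    return groups

def extract_planes(points, normal, threshold, min_count=3, tries=50, rng=random):
    """Peel planes x*nx + y*ny + z*nz + d = 0 off one normal group."""
    remaining = list(points)
    planes = []
    failures = 0
    while len(remaining) >= min_count and failures < tries:
        x0 = rng.choice(remaining)
        d = -dot(x0, normal)
        # |n . p + d| is the point-to-plane distance because n is a unit vector.
        inliers = [p for p in remaining if abs(dot(p, normal) + d) <= threshold]
        if len(inliers) >= min_count:
            planes.append((d, inliers))
            remaining = [p for p in remaining if abs(dot(p, normal) + d) > threshold]
            failures = 0
        else:
            failures += 1
    return planes, remaining
```

When calling extract_planes on a group, pass a unit normal (for example the re-normalised mean normal of the group) rather than the rounded dict key, and set threshold to roughly the depth noise.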

A possible approach for the planes is to consider the set of normal vectors and perform clustering on them (for instance by k-means). Then every cluster can correspond to several parallel surfaces. By evaluating the distance from the origin (a scalar function), you can form sub-clusters which will separate those surfaces. Finally, points at constant distance can belong to different coplanar patches, which you can separate by connected component labelling.
It is likely that clustering on the normal vectors and distance simultaneously (hence in a 4D space) will yield better results and be simpler. Be sure to normalize the vectors. Another option is to represent the vectors by just two parameters (such as spherical angles), but this will lead to a quite non-uniform mapping, and create phase wrapping issues.
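Here is a minimal Python sketch of the combined 4D clustering. It assumes plain k-means (Lloyd iterations) with a deterministic farthest-point start. Each pixel's feature is its unit normal plus its plane offset n · p (the distance term). The `dist_scale` factor, which balances offset units against normal components, is an assumption you would have to tune. The connected-component step is not shown.

```python
import math

def features(points, normals, dist_scale=1.0):
    """4D feature per pixel: normalised normal plus scaled plane offset n . p."""
    out = []
    for p, n in zip(points, normals):
        length = math.sqrt(sum(c * c for c in n))
        u = tuple(c / length for c in n)
        offset = sum(a * b for a, b in zip(u, p))
        out.append(u + (dist_scale * offset,))
    return out

def sqdist(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))

def kmeans(feats, k, iters=20):
    """Lloyd iterations; start from the first feature, then farthest points."""
    centres = [feats[0]]
    while len(centres) < k:
        centres.append(max(feats, key=lambda f: min(sqdist(f, c) for c in centres)))
    labels = [0] * len(feats)
    for _ in range(iters):
        labels = [min(range(k), key=lambda c: sqdist(f, centres[c])) for f in feats]
        for c in range(k):
            members = [f for f, lab in zip(feats, labels) if lab == c]
            if members:
                centres[c] = tuple(sum(col) / len(members) for col in zip(*members))
    return labels
```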

Uniform spatial bins on surface of a sphere

Is there a spatial lookup grid or binning system that works on the surface of a (3D) sphere? I have the requirements that
The bins must be uniform (so you can look up in constant time if there exists a point r distance away from any spot on the sphere, given constant r.)†
The number of bins must be at most linear with the surface area of the sphere. (Alternatively, increasing the surface resolution of the grid shouldn’t make it grow faster than the area it maps.)
I’ve already considered
Spherical coordinates: not good because the cells created are extremely nonuniform making it useless for proximity testing.
Cube meshes: Less distortion than spherical coordinates, but still very difficult to determine which cells to search for a given query.
3D voxel binning: Wastes the entire interior volume of the sphere with empty bins that will never be used (as well as the empty bins at the 6 corners of the bounding cube). Space requirements grow with O(n sqrt(n)) with increasing sphere surface area.
kd-Trees: perform poorly in 3D and are technically logarithmic complexity, not constant per query.
My best idea for a solution involves using the 3D voxel binning method, but somehow excluding the voxels that the sphere will never intersect. However I have no idea how to determine which voxels to exclude, nor how to calculate an index into such a structure given a query location on the sphere.
† For what it’s worth the points have a minimum spacing so a good grid really would guarantee constant lookup.

My suggestion would be a variant of spherical coordinates in which the polar angle φ is not sampled uniformly; instead its cosine (equivalently, the sine of the latitude) is sampled uniformly. Since the area element is sinφ dφ dΘ = -d(cosφ) dΘ, equal steps in cosφ and Θ give tiles of the same area (though variable aspect ratio).
At the poles, merge all tiles in a single disk-like polygon.
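Below is a Python sketch of the equal-area lookup. It assumes unit vectors and bands that are uniform in the cosine of the polar angle, which for a unit vector is simply its z coordinate (equivalently, uniform in the sine of the latitude). The first and last bands are merged into single polar caps. A merged cap then has n_sectors times the area of an ordinary tile. Function names and parameters are illustrative.

```python
import math

def bin_index(v, n_bands, n_sectors):
    """Equal-area bin (band, sector) of a unit vector v = (x, y, z).

    Bands are uniform in z = cos(polar angle); sectors are uniform in
    azimuth. The first and last bands are merged into single polar caps."""
    x, y, z = v
    band = min(max(int((z + 1.0) / 2.0 * n_bands), 0), n_bands - 1)
    if band in (0, n_bands - 1):
        return (band, 0)
    theta = math.atan2(y, x) % (2 * math.pi)
    sector = min(int(theta / (2 * math.pi) * n_sectors), n_sectors - 1)
    return (band, sector)

def tile_area(n_bands, n_sectors):
    """Area of a non-cap tile on the unit sphere: a band of height h has
    area 2*pi*h (Archimedes), split evenly into n_sectors tiles."""
    return 2 * math.pi * (2.0 / n_bands) / n_sectors
```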
Another possibility is to project a regular icosahedron onto the sphere and to triangulate the spherical triangles so obtained. This takes a little spherical trigonometry.

I had a similar problem and used "sparse" 3D voxel binning. Basically, my spatial index is a hash map from (x, y, z) coordinates to bins.
Because I also had a minimum distance constraint on my points, I chose the bin size such that a bin can contain at most one point. This is accomplished if the edge of the (cubic) bins is at most d / sqrt(3), where d is the minimum separation of two points on the sphere. The advantage is that you can represent a full bin as a single point, and an empty bin can just be absent from the hash map.
My only query was for points within a radius d (the same d), which then requires scanning the surrounding 125 bins (a 5×5×5 cube). You could technically leave off the 8 corners to get this down to 117, but I didn't bother.
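The sparse scheme can be sketched in Python, assuming the index is a plain dict from integer cell coordinates to the single point in that cell. The cell edge is d/√3 shrunk by a relative 1e-9. That shrink is an added safety margin against rounding: it keeps the cell diagonal strictly below d, so a second point landing in an occupied cell signals a violated spacing. Class and method names are hypothetical.

```python
import math

class SparseVoxelIndex:
    """Hash map from integer cell (i, j, k) to the single point it holds."""

    def __init__(self, min_sep):
        self.d = min_sep
        # Edge just under d / sqrt(3): the cell diagonal is below d, so no cell
        # can hold two points; d / edge is still < 2, so 5x5x5 cells suffice.
        self.edge = min_sep / math.sqrt(3) * (1 - 1e-9)
        self.cells = {}

    def _key(self, p):
        return tuple(math.floor(c / self.edge) for c in p)

    def insert(self, p):
        key = self._key(p)
        if key in self.cells:
            raise ValueError("two points closer than the minimum separation")
        self.cells[key] = p

    def within_d(self, q):
        """All stored points within distance d of q (scans the 5x5x5 block)."""
        i, j, k = self._key(q)
        found = []
        for di in range(-2, 3):
            for dj in range(-2, 3):
                for dk in range(-2, 3):
                    p = self.cells.get((i + di, j + dj, k + dk))
                    if p is not None and math.dist(p, q) <= self.d:
                        found.append(p)
        return found
```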
An alternative for the bin size is to optimize it for queries rather than storage size and simplicity, and choose it such that you always have to scan at most 27 bins (a 3×3×3 cube). That would require a bin edge length of d. I think (but haven't thought hard about it) that a bin could contain up to 4 points in that case. You could represent these with a fixed-size array to save one pointer indirection.
In either case, the memory usage of your spatial index will be O(n) for n points, so it doesn't get any better than that.

Calculating the Held Karp Lower bound For The Traveling Salesman(TSP)

I am currently researching the traveling salesman problem and was wondering if anyone could simply explain the Held-Karp lower bound. I have been looking at a lot of papers and I am struggling to understand it. If someone could simply explain it, that would be great.
I also know there is the method of calculating a minimum spanning tree of the vertices not including the starting vertex and then adding the two minimum edges from the starting vertex.

I'll try to explain this without going into too much detail. I'll avoid formal proofs and I'll try to avoid technical jargon. However, you might want to go over everything again once you have read it all through. And most importantly: try the algorithms out yourself.
Introduction
A 1-tree consists of a vertex attached with 2 edges to a spanning tree of the remaining vertices. (Despite the name it is not a tree, since it contains exactly one cycle.) You can check for yourself that every TSP tour is a 1-tree.
There is also such a thing as a minimum-1-Tree of a graph. That is the resulting tree when you follow this algorithm:
Exclude a vertex from your graph
Calculate the minimum spanning tree of the resulting graph
Attach the excluded vertex with its 2 smallest edges to the minimum spanning tree
*For now I'll assume that you know that a minimum-1-tree is a lower bound for the optimal TSP tour. There is an informal proof at the end.
You will find that the resulting tree is different when you exclude different vertices. However, all of the resulting trees can be considered lower bounds for the optimal tour in the TSP. Therefore the largest of the minimum-1-trees you have found this way is a better lower bound than the others.
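The three steps above, plus taking the best bound over every choice of excluded vertex, fit in a short Python sketch. It assumes a complete graph given as a symmetric weight matrix and uses Prim's algorithm for the minimum spanning tree. The brute-force tour is there only to check the bound on tiny inputs. Names are illustrative.

```python
import itertools

def mst(w, nodes):
    """Prim's algorithm on the complete graph over `nodes`.
    Returns (total weight, degree of each node in the tree)."""
    nodes = list(nodes)
    deg = {v: 0 for v in nodes}
    # best[v] = (cheapest edge from the current tree to v, its tree endpoint)
    best = {v: (w[nodes[0]][v], nodes[0]) for v in nodes[1:]}
    total = 0
    while best:
        v = min(best, key=lambda u: best[u][0])
        cost, parent = best.pop(v)
        total += cost
        deg[v] += 1
        deg[parent] += 1
        for u in best:
            if w[v][u] < best[u][0]:
                best[u] = (w[v][u], v)
    return total, deg

def min_one_tree(w, excluded):
    """MST of all other vertices plus the excluded vertex's two cheapest
    edges. Needs at least 3 vertices."""
    others = [v for v in range(len(w)) if v != excluded]
    total, deg = mst(w, others)
    for v in sorted(others, key=lambda u: w[excluded][u])[:2]:
        total += w[excluded][v]
        deg[v] += 1
    deg[excluded] = 2
    return total, deg

def one_tree_bound(w):
    """Largest minimum-1-tree weight over all choices of excluded vertex."""
    return max(min_one_tree(w, v)[0] for v in range(len(w)))

def optimal_tour(w):
    """Brute force over all tours starting at vertex 0; tiny inputs only."""
    n = len(w)
    return min(
        sum(w[a][b] for a, b in zip((0,) + perm, perm + (0,)))
        for perm in itertools.permutations(range(1, n))
    )
```

On a triangle with AB = 3, AC = 5 and BC = 4, every minimum-1-tree is the tour itself, so the bound equals the optimal length 12.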
Held-Karp lower bound
The Held-Karp lower bound is an even tighter lower bound.
The idea is that you can alter the original graph in a special way. This modified graph will generate different minimum-1-trees than the original.
Furthermore (and this is important so I'll repeat it throughout this paragraph with different words), the modification is such that the length of all the valid TSP tours are modified by the same (known) constant. In other words, the length of a valid TSP solution in this new graph = the length of a valid solution in the original graph plus a known constant. For example: say the weight of the TSP tour visiting vertices A, B, C and D in that order in the original graph = 10. Then the weight of the TSP tour visiting the same vertices in the same order in the modified graph = 10 + a known constant.
This, of course, is true for the optimal TSP tour as well. Therefore the optimal TSP tour in the modified graph is also an optimal tour in the original graph. And a minimum-1-tree of the modified graph is a lower bound for the optimal tour in the modified graph. Again, I'll just assume you understand that this generates a lower bound for your modified graph's optimal TSP tour. By subtracting a known constant from the lower bound found for your modified graph, you get a new lower bound for your original graph.
There are infinitely many such modifications to your graph. These different modifications result in different lower bounds. The tightest of these lower bounds is the Held-Karp lower bound.
How to modify your graph
Now that I have explained what the Held-Karp lower bound is, I will show you how to modify your graph to generate different minimum-1-trees.
Use the following algorithm:
Give every vertex in your graph an arbitrary weight
update the weight of every edge as follows: new edge weight = edge weight + starting vertex weight + ending vertex weight
For example, your original graph has the vertices A, B and C with edge AB = 3, edge AC = 5 and edge BC = 4. And for the algorithm you assign the (arbitrary) weights to the vertices A: 30, B: 40, C:50 then the resulting weights of the edges in your modified graph are AB = 3 + 30 + 40 = 73, AC = 5 + 30 + 50 = 85 and BC = 4 + 40 + 50 = 94.
The known constant for the modification is twice the sum of the weights given to the vertices. In this example the known constant is 2 * (30 + 40 + 50) = 240. Note: the tours in the modified graph are thus equal to the original tours + 240. In this example there is only one tour namely ABC. The tour in the original graph has a length of 3 + 4 + 5 = 12. The tour in the modified graph has a length of 73 + 85 + 94 = 252, which is indeed 240 + 12.
The reason why the constant equals twice the sum of the weights given to the vertices is because every vertex in a TSP tour has degree 2.
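To turn the minimum-1-tree of the modified graph into a lower bound for the original graph, subtract the same known constant as for the tours: twice the sum of the vertex weights. This works because the minimum-1-tree of the modified graph is a lower bound for the modified optimal tour, which is exactly that constant longer than the original optimal tour. The degrees of the found minimum-1-tree give an equivalent way to write the bound: the original weight of that 1-tree plus, for every vertex, its weight multiplied by (its degree - 2). For example, suppose you have given the weights A: 30, B: 40, C: 50, D: 60, and in your minimum-1-tree vertex A has degree 1, vertices B and C have degree 2 and vertex D has degree 3. Then the constant to subtract is 2 * (30 + 40 + 50 + 60) = 360. Equivalently, the lower bound is the original weight of that 1-tree plus 30 * (1 - 2) + 40 * 0 + 50 * 0 + 60 * (3 - 2), which is the original weight + 30.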
How to find the Held-Karp lower bound
Now I believe there is one more question unanswered: how do I find the best modification to my graph, so that I get the tightest lower bound (and thus the Held-Karp lower bound)?
Well, that's the hard part. Without delving too deep: there are ways to get closer and closer to the Held-Karp lower bound. Basically, one can keep modifying the graph so that the degrees of all vertices get closer and closer to 2, and thus the 1-tree gets closer and closer to a real tour.
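Here is a compact Python sketch of the whole procedure. It does four things:

1. modify the edge weights with vertex weights pi;
2. compute the minimum-1-tree of the modified graph;
3. subtract twice the sum of the vertex weights;
4. adjust each vertex weight by t * (degree - 2), so that vertices of degree above 2 become more expensive.

Step 4 is the usual subgradient update. The diminishing step size step / (k + 1) is just one simple choice among many, and the names, the matrix input and the fixed excluded vertex 0 are assumptions. The MST helpers are repeated so that the block runs on its own.

```python
import math

def mst(w, nodes):
    """Prim's algorithm; returns (total weight, degree of each node)."""
    nodes = list(nodes)
    deg = {v: 0 for v in nodes}
    best = {v: (w[nodes[0]][v], nodes[0]) for v in nodes[1:]}
    total = 0
    while best:
        v = min(best, key=lambda u: best[u][0])
        cost, parent = best.pop(v)
        total += cost
        deg[v] += 1
        deg[parent] += 1
        for u in best:
            if w[v][u] < best[u][0]:
                best[u] = (w[v][u], v)
    return total, deg

def min_one_tree(w, excluded):
    others = [v for v in range(len(w)) if v != excluded]
    total, deg = mst(w, others)
    for v in sorted(others, key=lambda u: w[excluded][u])[:2]:
        total += w[excluded][v]
        deg[v] += 1
    deg[excluded] = 2
    return total, deg

def modified(w, pi):
    """New edge weight = edge weight + weights of both end vertices."""
    n = len(w)
    return [[w[i][j] + pi[i] + pi[j] if i != j else 0 for j in range(n)]
            for i in range(n)]

def held_karp_bound(w, pi, excluded=0):
    """Minimum 1-tree of the modified graph minus twice the vertex weights."""
    total, deg = min_one_tree(modified(w, pi), excluded)
    return total - 2 * sum(pi), deg

def subgradient(w, iters=200, step=1.0):
    """Push every 1-tree degree towards 2; return the best bound seen."""
    n = len(w)
    pi = [0.0] * n
    best = -math.inf
    for k in range(iters):
        bound, deg = held_karp_bound(w, pi)
        best = max(best, bound)
        if all(deg[v] == 2 for v in range(n)):
            break  # the 1-tree is itself a tour, so the bound is tight
        t = step / (k + 1)
        pi = [pi[v] + t * (deg[v] - 2) for v in range(n)]
    return best
```

Take the triangle example with vertex weights 30, 40 and 50. There, modified gives edge weights 73, 85 and 94, and held_karp_bound returns 252 - 240 = 12.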
Minimum-1-tree is a lower bound
As promised, here is an informal proof that a minimum-1-tree is a lower bound for the optimal TSP solution. Take an optimal tour and look at the excluded vertex. The tour uses exactly two edges at that vertex, and together they weigh at least as much as its two cheapest edges. Removing that vertex from the tour leaves a path through all the other vertices. A path is a spanning tree, so it weighs at least as much as the minimum spanning tree of those vertices. Adding these two facts together, the tour weighs at least as much as the minimum-1-tree.
Conclusion
In short: modify the graph in this way, find the minimum-1-tree of the modified graph and subtract the known constant to get a lower bound. The best lower bound obtainable by these means is the Held-Karp lower bound.
I hope this answers your question.
Links
For a more formal approach and additional information I recommend the following links:
ieor.berkeley.edu/~kaminsky/ieor251/notes/3-16-05.pdf
http://www.sciencedirect.com/science/article/pii/S0377221796002147
