I have this problem: given a set of rectangles {R1, R2, ..., Rn} and a new rectangle Rq, find where to place Rq so that it intersects the maximum number of rectangles in the set (how much area overlaps does not matter). I'm looking for a simple solution that does not involve overly complex data structures; however, any working answer will be much appreciated. Thanks!
Here is an algorithm with O(n^2) worst-case complexity, or O(n (log n + m)) average-case complexity (where m is the maximum number of intersecting rectangles). See the O(n log n) method later on.
The first idea is to observe that there are only n places to look for solutions: for each rectangle Ri, the case where Ri is the right-most rectangle that Rq intersects is a candidate solution.
Scan from left to right, adding the rectangles in order of minimum x-coordinate to a buffer container, and also to a priority queue keyed on maximum x-coordinate. As you add each rectangle, use the priority queue to check which rectangles need to be removed, so that every rectangle remaining in the buffer could be intersected by Rq if you ignore the y-axis for now.
Before adding each rectangle Ri to the buffer, consider how many rectangles in the buffer can be intersected if you require Rq to intersect Ri (with Ri being the right-most rectangle that Rq intersects). This can be done in linear time.
Thus overall complexity is O(n^2).
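For reference, here is a minimal brute-force sketch of this candidate-enumeration idea in Java (the class and method names are invented for illustration). Rather than maintaining the sorted buffer described above, it re-scans all rectangles for each candidate Ri and then runs a 1D sweep over the y-intervals, so it costs roughly O(n^2 log n) rather than O(n^2):

import java.util.*;

class Rect {
    double minX, minY, maxX, maxY;
    Rect(double minX, double minY, double maxX, double maxY) {
        this.minX = minX; this.minY = minY; this.maxX = maxX; this.maxY = maxY;
    }
}

class MaxOverlapPlacement {
    // Returns the maximum number of rectangles that Rq (width w, height h) can intersect.
    static int bestPlacementCount(List<Rect> rects, double w, double h) {
        int best = 0;
        for (Rect ri : rects) {
            // Candidate: Ri is the right-most rectangle Rq intersects, so put Rq's
            // x-interval at [ri.minX - w, ri.minX] (its right edge touching Ri's left edge).
            double qMinX = ri.minX - w, qMaxX = ri.minX;
            // Gather y-events for every rectangle whose x-interval meets Rq's x-interval.
            List<double[]> events = new ArrayList<>();
            for (Rect rj : rects) {
                if (rj.maxX >= qMinX && rj.minX <= qMaxX) {
                    // Rq's bottom edge t intersects Rj iff t lies in [rj.minY - h, rj.maxY].
                    events.add(new double[]{rj.minY - h, +1});
                    events.add(new double[]{rj.maxY, -1});
                }
            }
            // 1D sweep over y: the maximum running sum is the best count with Ri right-most.
            events.sort((a, b) -> a[0] != b[0] ? Double.compare(a[0], b[0])
                                               : Double.compare(b[1], a[1]));  // starts before ends
            int running = 0;
            for (double[] e : events) {
                running += (int) e[1];
                best = Math.max(best, running);
            }
        }
        return best;
    }
}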
The runtime of the algorithm can be improved by storing the buffered rectangles in an interval tree. Since Wikipedia gives a good description of interval trees and how they may be implemented (the balancing in particular can be quite involved), I will not explain them here. This improves the average complexity to O(n (log n + m)), where m is the answer, i.e. the maximum number of intersected rectangles (which could still be as large as n in the worst case), because querying an interval tree takes O(log n + m) time. The worst case remains O(n^2), because the number of results returned by an interval tree query may be up to O(n) even when the final answer is not on the order of n.
Here follows an O(n log n) method, though it is quite complex.
Instead of storing the buffered rectangles as a sorted array, a self-balancing binary search tree (keyed only on minimum y), or an interval tree, they can be stored in a custom data structure.
Consider a self-balancing binary search tree keyed on maximum y. We can add some metadata to it. Notably, since we are trying to maximize the number of rectangles intersected in the buffer, and we keep looking at how that maximum changes as the buffer scans from left to right, we can consider, for each rectangle in the buffer, how many rectangles are intersected if that rectangle is the bottom-most one intersected.
If we can store that count for each rectangle, we might be able to query it faster.
The solution is to store it in a different way, to reduce update costs. Suppose each node should know how many rectangles can be intersected if that node's rectangle is the bottom-most one intersected. Instead of storing this number at the nodes, we store, on each edge of the tree, the change in this count between the two nodes it connects (call this the diff). For example, for the following tree (showing the min and max y of each rectangle):
        (3,4)
       /     \
   (2,3)     (5,6)
For Rq with a height of 2, if (2,3) is bottom-most, we can intersect 2 rectangles in total, for (3,4), it is 2, and for (5,6), it is 1.
Thus, the edges' diffs here are 0 and -1, each representing the change in the answer going from parent to child (the child's answer minus the parent's answer).
Using these edge weights, we can quickly find the best count if each node also stores the maximum sum of edge diffs along any downward path starting at that node, with the empty path counting as 0 (call this maxdiff). For the leaves it is 0; for (3,4), the maximum of 0 and -1 is 0, so we can store 0 as the maxdiff of the root node.
We therefore know that the optimal answer is 0 more than the answer at the root node. We can find the answer at the root node by keeping a little more metadata.
All this metadata can be queried and updated in O(log n) time. Adding a rectangle to the self-balancing binary search tree takes O(log n) time, since we may need to do up to O(log n) rotations, and each rotation requires only O(1) work to update the weights on the affected edges and the maxdiff values on the affected nodes.
In addition, the effect of the extra rectangle on the diff of the edges needs to be updated. However, at most O(log n) edges must be updated, specifically, if we search the tree by the min y and max y of the new rectangle, we will find all the edges that need to be updated (although whether an edge that is traversed must be updated depends on whether we took the left or right child - some careful working is needed here to determine the exact rule).
Deleting a rectangle is just the reverse, so takes O(log n) time.
In this way, we can update the buffer in O(log n) time, and query it in O(log n) time, for an overall algorithmic complexity of O(n log n).
Note: to find the maximum number of rectangles intersected in the buffer, we use maxdiff, which gives the difference between the optimal answer and the answer at the root node. To recover the actual answer, we need the answer at the root node, which we obtain by relating it to the lowest-ordered rectangle in the tree: walk down the left-most path and sum the diffs on its edges, which takes O(log n) time. Finally, we need the answer at that lowest-ordered rectangle itself. For this we keep a second self-balancing binary search tree, ordered by minimum y, which lets us count how many rectangles are intersected when the bottom-most rectangle is the lowest-ordered one under the maximum-y ordering.
Update of this extra BST and querying also take O(log n) time, so this extra work does not change the overall complexity.
Related
Suppose I have a quadtree of a heightmap, with the root the coarsest representation and the leaves the most refined. Near the camera I want to draw the most detail, so I traverse the tree based on each node's distance to the camera. At each node where I stop recursing, I want to draw that node (i.e., its heightfield) at that detail. I put these nodes in a list S.
Now that I know which nodes to draw at which detail, given an arbitrary point (x,y) in the world (ignoring height z), I want to know, quickly and on the GPU, which node in S the point falls in. Some ideas are:
Iterate through S and do a bounds/point intersection test. Seems slow.
Pack the quadtree with leaves in S in an array representation so I can traverse on the GPU. More complicated, still have to traverse tree, but faster than going through every node.
Create a uniform grid representing the most refined quadtree level, where each grid entry basically gives me the cell depth and cell offset. This would be a very fast grid lookup, but potentially a lot of memory (a minimal sketch of this lookup follows this list).
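For illustration, here is a hedged sketch of idea 3 in Java (the class NodeGrid and its methods are invented names, and it stores an index into S per finest-level cell rather than a depth/offset pair, which carries equivalent information). Filling the grid costs memory proportional to the finest resolution squared, but the point lookup is a single array read:

class NodeGrid {
    final int[] cellToNode;   // index into S for each finest-level cell
    final int res;            // finest-level resolution (cells per side)
    final float minX, minY, cellSize;

    NodeGrid(int res, float minX, float minY, float worldSize) {
        this.res = res;
        this.minX = minX;
        this.minY = minY;
        this.cellSize = worldSize / res;
        this.cellToNode = new int[res * res];
    }

    // Called once per node in S: fill every finest-level cell covered by that node.
    void fill(int nodeIndex, int cellX, int cellY, int cellsPerSide) {
        for (int y = cellY; y < cellY + cellsPerSide; y++)
            for (int x = cellX; x < cellX + cellsPerSide; x++)
                cellToNode[y * res + x] = nodeIndex;
    }

    // O(1) lookup: which node of S contains world point (x, y)?
    int lookup(float x, float y) {
        int cx = Math.min(res - 1, Math.max(0, (int) ((x - minX) / cellSize)));
        int cy = Math.min(res - 1, Math.max(0, (int) ((y - minY) / cellSize)));
        return cellToNode[cy * res + cx];
    }
}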
My question is: Any other ideas I am missing?
Given a triangle with vertices A, B and C in 3D space, and an axis-aligned bounding cuboid containing it with length*width*height = nd*md*ld (where n, m, l are integers and d is a float), partition the cuboid into n*m*l cubes of side d. How can I find the cubes that the triangle passes through?
There are many algorithms to detect whether a triangle and a cube intersect, so the problem can be solved by looping over all the cubes. However, the complexity of that approach is O(n*m*l), i.e. O(n^3). Is there an approach with complexity O(n^2), or even O(n log n)?
You cannot improve upon O(n m l) for the following reason: select m=1 and l=1.
Then one has a planar arrangement of n cubes, and your triangle could intersect every one. If you need to report each cube intersected, you would have to report all n cubes.
But clearly this is just a flaw in your problem statement. What you should ask is the situation where n=m=l. So now you have an n x n x n set of cubes, and one triangle can only intersect O(n^2) of them.
In this case, a triangle certainly might intersect Ω(n^2) cubes, so one cannot improve upon quadratic complexity. This rules out O(n log n).
So the question becomes: is there a subcubic algorithm for identifying the O(n^2) cubes intersected by a triangle? (And one may replace "triangle" with "plane.")
I believe the answer is Yes. One method is to construct an octree representing the cubes. Searches for "voxels" and "octree intersection" may lead you to explicit algorithms.
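To illustrate the octree idea, here is a hedged sketch in Java (all names are invented, and it assumes for simplicity a cubic grid whose side count is a power of two; a real implementation would pad or split unevenly). It subdivides the grid octree-style and only recurses into blocks that may touch the triangle. The overlap test is deliberately conservative (bounding-box overlap plus a plane/box test), so it can report a few extra cells; a full separating-axis triangle/box test (e.g. Akenine-Moller's) could be substituted for exact results:

import java.util.*;

class TriangleVoxels {
    // Collects the (ix, iy, iz) indices of grid cells of side d that the triangle (a, b, c)
    // may pass through, by octree-style subdivision of a grid with `size` cells per side.
    static void collect(double[] a, double[] b, double[] c, double d,
                        int x0, int y0, int z0, int size, List<int[]> out) {
        double[] lo = {x0 * d, y0 * d, z0 * d};
        double[] hi = {(x0 + size) * d, (y0 + size) * d, (z0 + size) * d};
        if (!mayOverlap(a, b, c, lo, hi)) return;        // prune the whole block at once
        if (size == 1) { out.add(new int[]{x0, y0, z0}); return; }
        int h = size / 2;                                 // recurse into the eight octants
        for (int dx = 0; dx <= 1; dx++)
            for (int dy = 0; dy <= 1; dy++)
                for (int dz = 0; dz <= 1; dz++)
                    collect(a, b, c, d, x0 + dx * h, y0 + dy * h, z0 + dz * h, h, out);
    }

    // Conservative triangle/box overlap test: bounding-box overlap plus a plane/box test.
    static boolean mayOverlap(double[] a, double[] b, double[] c, double[] lo, double[] hi) {
        for (int i = 0; i < 3; i++) {
            double tmin = Math.min(a[i], Math.min(b[i], c[i]));
            double tmax = Math.max(a[i], Math.max(b[i], c[i]));
            if (tmax < lo[i] || tmin > hi[i]) return false;    // AABBs are disjoint
        }
        double[] u = sub(b, a), v = sub(c, a);                  // triangle plane normal = u x v
        double[] n = {u[1]*v[2] - u[2]*v[1], u[2]*v[0] - u[0]*v[2], u[0]*v[1] - u[1]*v[0]};
        double dist = 0, radius = 0;
        for (int i = 0; i < 3; i++) {
            double center = (lo[i] + hi[i]) / 2, half = (hi[i] - lo[i]) / 2;
            dist += n[i] * (center - a[i]);           // signed distance (times |n|) of box center
            radius += Math.abs(n[i]) * half;          // box extent projected onto the normal
        }
        return Math.abs(dist) <= radius;              // the triangle's plane passes through the box
    }

    static double[] sub(double[] p, double[] q) {
        return new double[]{p[0] - q[0], p[1] - q[1], p[2] - q[2]};
    }
}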
I'm using a static KD-Tree for nearest neighbor search in 3D space. However, the client's specifications have now changed so that I'll need a weighted nearest neighbor search instead. For example, in 1D space, I have a point A with weight 5 at 0, and a point B with weight 2 at 4; the search should return A if the query point is from -5 to 5, and should return B if the query point is from 5 to 6. In other words, the higher-weighted point takes precedence within its radius.
Google hasn't been any help - all I get is information on the K-nearest neighbors algorithm.
I can simply remove points that are completely subsumed by a higher-weighted point, but this generally isn't the case (usually a lower-weighted point is only partially subsumed, as in the 1D example above). I could use a range tree to query all points in an NxNxN cube centered on the query point and pick the one with the greatest weight, but the naive implementation of this is wasteful: I would have to set N to the maximum weight in the entire tree, even though there may not be a point with that weight anywhere near the cube. For example, if the maximum weight in the tree is 25, I would need to set N to 25 even when the highest-weighted point near any given query has a much lower weight; in the 1D case, a point located at 100 with weight 25 forces N to 25 even if the query point is far outside that point's radius.
To sum up, I'm looking for a way that I can query the KD tree (or some alternative/variant) such that I can quickly determine the highest-weighted point whose radius covers the query point.
FWIW, I'm coding this in Java.
It would also be nice if I could dynamically change a point's weight without incurring too high of a cost - at present this isn't a requirement, but I'm expecting that it may be a requirement down the road.
Edit: I found a paper on a priority range tree, but this doesn't exactly address the same problem in that it doesn't account for higher-priority points having a greater radius.
Use an extra dimension for the weight. A point (x,y,z) with weight w is placed at (N-w,x,y,z), where N is the maximum weight.
Distances in 4D are defined by
d4((a, b, c, d), (e, f, g, h)) = |a - e| + d3((b, c, d), (f, g, h))
where d3 is whatever your 3D distance was.
To find all potential results for (x,y,z), query a ball of radius N about (0,x,y,z).
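To make the trick concrete, here is a minimal sketch in Java (the class and method names are invented, and a plain linear scan stands in for the KD-tree query). It shows why a 4D ball of radius N around (0, x, y, z) contains exactly the points whose radius covers the query point, from which the highest-weighted one can then be taken:

import java.util.*;

class WeightedPoint {
    double x, y, z, w;                          // position and weight (the weight is the radius)
    WeightedPoint(double x, double y, double z, double w) {
        this.x = x; this.y = y; this.z = z; this.w = w;
    }
}

class WeightedCoverQuery {
    // The 3D distance used inside the 4D metric (plain Euclidean here).
    static double dist3(double ax, double ay, double az, double bx, double by, double bz) {
        double dx = ax - bx, dy = ay - by, dz = az - bz;
        return Math.sqrt(dx * dx + dy * dy + dz * dz);
    }

    // Returns the highest-weighted point whose radius covers (qx, qy, qz), or null.
    // A point p is lifted to (N - p.w, p.x, p.y, p.z) and the query to (0, qx, qy, qz), so
    // d4 = (N - p.w) + dist3(q, p) <= N  holds exactly when  dist3(q, p) <= p.w.
    static WeightedPoint best(List<WeightedPoint> points, double maxWeightN,
                              double qx, double qy, double qz) {
        WeightedPoint best = null;
        for (WeightedPoint p : points) {
            double d4 = (maxWeightN - p.w) + dist3(qx, qy, qz, p.x, p.y, p.z);
            if (d4 <= maxWeightN && (best == null || p.w > best.w)) best = p;
        }
        return best;
    }
}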
I think I've found a solution: the nested interval tree, which is an implementation of a 3D interval tree. Rather than storing points with an associated radius that I then need to query, I instead store and query the radii directly. This has the added benefit that each dimension does not need to have the same weight (so that the radius is a rectangular box instead of a cubic box), which is not presently a project requirement but may become one in the future (the client only recently added the "weighted points" requirement, who knows what else he'll come up with).
So I'm working on simulating a large number of n-dimensional particles, and I need to know the distance between every pair of points. Allowing for some error, and given that the distance isn't relevant at all if it exceeds some threshold, are there any good ways to accomplish this? I'm pretty sure that if I want dist(A,C) and already know dist(A,B) and dist(B,C), I can bound it by [|dist(A,B) - dist(B,C)|, dist(A,B) + dist(B,C)] and then store the results in a sorted array, but I'd like to not reinvent the wheel if there's something better.
I don't think the number of dimensions should greatly affect the logic, but maybe for some solutions it will. Thanks in advance.
If the problem were simply about calculating the distances between all pairs, it would be an O(n^2) problem without any chance for a better solution. However, you are saying that if the distance is greater than some threshold D, you are not interested in it. This opens up the opportunity for a better algorithm.
For example, in 2D case you can use the sweep-line technique. Sort your points lexicographically, first by y then by x. Then sweep the plane with a stripe of width D, bottom to top. As that stripe moves across the plane new points will enter the stripe through its top edge and exit it through its bottom edge. Active points (i.e. points currently inside the stripe) should be kept in some incrementally modifiable linear data structure sorted by their x coordinate.
Now, every time a new point enters the stripe, you have to check the currently active points to the left and to the right no farther than D (measured along the x axis). That's all.
The purpose of this algorithm (as it is typically the case with sweep-line approach) is to push the practical complexity away from O(n^2) and towards O(m), where m is the number of interactions we are actually interested in. Of course, the worst case performance will be O(n^2).
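Here is a minimal sketch of that sweep in Java (names invented; it reports every pair within Euclidean distance D). The active set is a TreeSet ordered by x-coordinate, and points expire from it once their y-coordinate falls more than D below the sweep position:

import java.util.*;

class ClosePairs {
    // Reports all pairs (i, j) of points within distance D, via a bottom-to-top sweep.
    static List<int[]> findPairs(double[][] pts, double D) {
        Integer[] byY = new Integer[pts.length];
        for (int i = 0; i < pts.length; i++) byY[i] = i;
        Arrays.sort(byY, (a, b) -> Double.compare(pts[a][1], pts[b][1]));   // sweep order

        // Active set: indices of points within D (in y) of the sweep line, ordered by x.
        TreeSet<Integer> active = new TreeSet<>((a, b) -> {
            int c = Double.compare(pts[a][0], pts[b][0]);
            return c != 0 ? c : Integer.compare(a, b);   // tie-break so equal-x points coexist
        });

        List<int[]> result = new ArrayList<>();
        int tail = 0;                                    // next active point to expire
        for (int k = 0; k < byY.length; k++) {
            int i = byY[k];
            while (pts[byY[tail]][1] < pts[i][1] - D)    // drop points that left the stripe
                active.remove(byY[tail++]);
            for (int j : active.tailSet(i, false)) {     // actives at or right of x_i
                if (pts[j][0] > pts[i][0] + D) break;
                if (dist(pts[i], pts[j]) <= D) result.add(new int[]{i, j});
            }
            for (int j : active.headSet(i, false).descendingSet()) {   // actives at or left of x_i
                if (pts[j][0] < pts[i][0] - D) break;
                if (dist(pts[i], pts[j]) <= D) result.add(new int[]{i, j});
            }
            active.add(i);
        }
        return result;
    }

    static double dist(double[] p, double[] q) {
        double dx = p[0] - q[0], dy = p[1] - q[1];
        return Math.sqrt(dx * dx + dy * dy);
    }
}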
The above applies to the 2-dimensional case. For the n-dimensional case I'd say you'll be better off with a different technique. Some sort of space partitioning should work well here, i.e. exploiting the fact that if the distance between two partitions is known to be greater than D, then there's no reason to consider the points in those partitions against each other.
If the distance beyond a certain threshold is not relevant, and this threshold is not too large, there are common techniques to make this more efficient: limit the search for neighbouring points using space-partitioning data structures. Possible options are:
Binning.
Trees: quadtrees (2D), kd-trees.
Binning with spatial hashing.
Also, since the distance from point A to point B is the same as the distance from point B to point A, each distance should only be computed once. Thus, you should use the following loop:
for point i from 0 to n-1:
    for point j from i+1 to n-1:
        distance(point i, point j)
Combining these two techniques is very common for n-body simulation for example, where you have particles affect each other if they are close enough. Here are some fun examples of that in 2d: http://forum.openframeworks.cc/index.php?topic=2860.0
Here's an explanation of binning (and hashing): http://www.cs.cornell.edu/~bindel/class/cs5220-f11/notes/spatial.pdf
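For completeness, here is a minimal 2D binning sketch in Java (all names invented). It uses a hash map from cell coordinates to point indices, with the cell size equal to the threshold D, so each point only has to be checked against the 3x3 block of cells around it:

import java.util.*;

class GridBinning {
    // Returns all pairs (i, j), i < j, with Euclidean distance <= D, using a spatial hash.
    static List<int[]> closePairs(double[][] pts, double D) {
        Map<Long, List<Integer>> cells = new HashMap<>();
        for (int i = 0; i < pts.length; i++)
            cells.computeIfAbsent(key(cell(pts[i][0], D), cell(pts[i][1], D)),
                                  k -> new ArrayList<>()).add(i);

        List<int[]> result = new ArrayList<>();
        for (int i = 0; i < pts.length; i++) {
            long cx = cell(pts[i][0], D), cy = cell(pts[i][1], D);
            for (long dx = -1; dx <= 1; dx++)            // only the 3x3 neighbourhood of cells
                for (long dy = -1; dy <= 1; dy++)
                    for (int j : cells.getOrDefault(key(cx + dx, cy + dy),
                                                    Collections.emptyList()))
                        if (j > i && dist(pts[i], pts[j]) <= D)   // each pair counted once
                            result.add(new int[]{i, j});
        }
        return result;
    }

    static long cell(double coord, double D) { return (long) Math.floor(coord / D); }

    // Pack two cell coordinates into one map key (assumes they fit in 32 bits each).
    static long key(long cx, long cy) { return (cx << 32) ^ (cy & 0xffffffffL); }

    static double dist(double[] p, double[] q) {
        double dx = p[0] - q[0], dy = p[1] - q[1];
        return Math.sqrt(dx * dx + dy * dy);
    }
}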
This question is a little involved. I wrote an algorithm for breaking up a simple polygon into convex subpolygons, but now I'm having trouble proving that it's not optimal (optimal meaning the minimal number of convex polygons, with Steiner points (added vertices) allowed). My prof is adamant that it can't be done with a greedy algorithm such as this one, but I can't think of a counterexample.
So, if anyone can prove my algorithm is suboptimal (or optimal), I would appreciate it.
The easiest way to explain my algorithm is with pictures (these are from an older suboptimal version).
What my algorithm does is extend the line segments around the point i across the polygon until they hit a point on the opposite edge.
If there is no vertex within this range, it creates a new one (the red point) and connects to that:
If there are one or more vertices in the range, it connects to the closest one. This usually produces a decomposition with the fewest convex polygons:
However, in some cases it can fail -- in the following figure, if it happens to connect the middle green line first, this will create an extra, unneeded polygon. To fix this, I propose double-checking all the edges (diagonals) we've added and verifying that they are all still necessary. If not, remove them:
In some cases, however, this is not enough. See this figure:
Replacing a-b and c-d with a-c would yield a better solution. In this scenario, though, there are no edges to remove, so this poses a problem. In this case I suggest an order of preference: when deciding which vertex to connect a reflex vertex to, it should choose the vertex with the highest priority:
lowest) closest vertex
med) closest reflex vertex
highest) closest reflex that is also in range when working backwards (hard to explain) --
In this figure, we can see that the reflex vertex 9 chose to connect to 12 (because it was closest), when it would have been better to connect to 5. Both vertices 5 and 12 are in the range defined by the extended line segments 10-9 and 8-9, but vertex 5 should be given preference because 9 is within the range given by 4-5 and 6-5, but NOT in the range given by 13-12 and 11-12. That is, the edge 9-12 eliminates the reflex vertex at 9 but does NOT eliminate the reflex vertex at 12, whereas connecting to 5 CAN also eliminate the reflex vertex at 5, so 5 should be given preference.
It is possible that the edge 5-12 will still exist with this modified version, but it can be removed during post-processing.
Are there any cases I've missed?
Pseudo-code (requested by John Feminella) -- this is missing the bits under Figures 3 and 5
assume vertices in `poly` are given in CCW order
let 'good reflex' (better term??) mean that if poly[i] is being compared with poly[j], then poly[i] is in the range given by the rays poly[j-1], poly[j] and poly[j+1], poly[j]

for each vertex poly[i]
    if poly[i] is reflex
        find the closest point of intersection given by the ray starting at poly[i-1] and extending in the direction of poly[i] (call this the lower bound)
        repeat for the ray given by poly[i+1], poly[i] (call this the upper bound)
        if there are no vertices along the boundary of the polygon in the range given by the upper and lower bounds
            create a new vertex exactly half way between the lower and upper bound points (lower and upper will lie on the same edge)
            connect poly[i] to this new point
        else
            iterate along the vertices in the range given by the lower and upper bounds; for each vertex poly[j]
                if poly[j] is a 'good reflex'
                    if no other good reflexes have been found
                        save it (overwrite any other vertex found)
                    else
                        if it is closer than the other good reflex vertices, save it
                else
                    if no good reflexes have been found and it is closer than the other vertices found, save it
            connect poly[i] to the best candidate
        repeat entire algorithm for both halves of the polygon that was just split
// no reflex vertices found, so `poly` is convex
save poly
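As a concrete building block (not part of the original pseudocode), here is a small Java sketch of the 'poly[i] is reflex' test the pseudocode relies on, assuming the vertices are given in CCW order as stated; a vertex is reflex when the turn at it is clockwise, i.e. the cross product of the incoming and outgoing edges is negative:

class ReflexTest {
    // poly is an array of (x, y) vertices in counter-clockwise order.
    // Vertex i is reflex if its interior angle exceeds 180 degrees,
    // i.e. the turn from edge (i-1 -> i) to edge (i -> i+1) is clockwise.
    static boolean isReflex(double[][] poly, int i) {
        int n = poly.length;
        double[] prev = poly[(i - 1 + n) % n], cur = poly[i], next = poly[(i + 1) % n];
        double cross = (cur[0] - prev[0]) * (next[1] - cur[1])
                     - (cur[1] - prev[1]) * (next[0] - cur[0]);
        return cross < 0;   // negative z-component of the cross product = clockwise turn
    }
}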
Turns out there is one more case I didn't anticipate: [Figure 5]
My algorithm will attempt to connect vertex 1 to 4, unless I add another check to make sure it can. So I propose stuffing everything "in the range" onto a priority queue using the priority scheme I mentioned above, then take the highest priority one, check if it can connect, if not, pop it off and use the next. I think this makes my algorithm O(r n log n) if I optimize it right.
I've put together a website that loosely describes my findings. I tend to move stuff around, so get it while it's hot.
I believe the regular five pointed star (e.g. with alternating points having collinear segments) is the counterexample you seek.
Edit in response to comments
In light of my revised understanding, a revised answer: try an acute five pointed star (e.g. one with arms sufficiently narrow that only the three points comprising the arm opposite the reflex point you are working on are within the range considered "good reflex points"). At least working through it on paper it appears to give more than the optimal. However, a final reading of your code has me wondering: what do you mean by "closest" (i.e. closest to what)?
Note
Even though my answer was accepted, it isn't the counterexample we initially thought. As @Mark points out in the comments, it goes from four to five at exactly the same time as the optimal does.
Flip-flop, flip flop
On further reflection, I think I was right after all. The optimal bound of four can be retained in an acute star by simply ensuring that one pair of arms has collinear edges. But the algorithm finds five, even with the patch-up.
I get this:
[dead ImageShack link]
When the optimal is this:
[dead ImageShack link]
I think your algorithm cannot be optimal because it makes no use of any measure of optimality. You use other metrics like 'closest' vertices, and checking for 'necessary' diagonals.
To drive a wedge between yours and an optimal algorithm, we need to exploit that gap by looking for shapes with close vertices which would decompose badly. For example (ignore the lines, I found this on the intertubenet):
[image: concave polygon which forms a G or U shape] http://avocado-cad.wiki.sourceforge.net/space/showimage/2007-03-19_-_convexize.png
You have no protection against the centre-most point being connected across the concave 'gap', which is external to the polygon.
Your algorithm is also quite complex, and may be overdoing it - just like complex code, you may find bugs in it because complex code makes complex assumptions.
Consider a more extensive initial stage to break the shape into more, simpler shapes - like triangles - and then an iterative or genetic algorithm to recombine them. You will need a stage like this to combine any unnecessary divisions between your convex polys anyway, and by then you may have limited your possible decompositions to only sub-optimal solutions.
At a guess something like:
1. decompose into triangles
2. non-deterministically generate a number of recombinations
3. calculate a quality metric (number of polys)
4. select the best x% of the recombinations
5. partially decompose each using triangles, and generate a new set of recombinations
6. repeat from 4 until some measure of convergence is reached
"but vertex 5 should be given preference because 9 is within the range given by 4-5 and 6-5"
What would you do if 4-5 and 6-5 were even more convex so that 9 didn't lie within their range? Then by your rules the proper thing to do would be to connect 9 to 12 because 12 is the closest reflex vertex, which would be suboptimal.
Found it :( They're actually quite obvious.
[dead ImageShack image]
A four leaf clover will not be optimal if Steiner points are allowed... the red vertices could have been connected.
[dead ImageShack image]
It won't even be optimal without Steiner points... 5 could be connected to 14, removing the need for 3-14, 3-12 AND 5-12. This could have been two polygons better! Ouch!