Algorithm for finding unique edges from a polygon mesh - graphics

I'm looking for a good algorithm that can give me the unique edges from a set of polygon data. In this case, the polygons are defined by two arrays. One array is the number of points per polygon, and the other array is a list of vertex indices.
I have a version that works, but performance degrades past about 500,000 polys. My version walks over each face and inserts each edge's sorted vertex pair into a std::set. My data set will be primarily triangle and quad polys, and most edges will be shared.
Is there a smarter algorithm for this?

Yes
Use a double hash map.
Every edge has two indices A and B; let's say that A > B.
The first, top-level hash map maps A to another hash map, which in turn maps B to some value representing the information you want about the edge (or just a bool if you don't need to keep per-edge information).
Essentially this creates a two-level tree composed of hash maps.
To look up an edge in this structure, take the larger index and look it up in the top level, which gives you a second hash map; then take the smaller index and look it up in that second hash map.
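In C++ terms, a minimal sketch of that lookup (my own illustration, not the answerer's code; EdgeInfo is a hypothetical payload type and int indices are assumed):

#include <unordered_map>
#include <utility>

// Hypothetical per-edge payload; a plain bool works if no data is needed.
struct EdgeInfo { int faceCount = 0; };

// Top level is keyed on the larger index A; each nested map is keyed on B.
typedef std::unordered_map<int, std::unordered_map<int, EdgeInfo>> EdgeMap;

// Finds or creates the entry for edge (a, b), enforcing the A > B convention.
EdgeInfo& lookupOrInsert(EdgeMap& edges, int a, int b) {
    if (a < b) std::swap(a, b);  // a is now the larger index
    return edges[a][b];          // operator[] creates both levels on first use
}

Because operator[] default-constructs missing entries, the second time a face contributes the same edge, the existing entry is found and can be updated rather than duplicated.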

Just to clarify, you want, for a polygon list like this:
A +-----+ B
   \    |\
    \ 1 | \
     \  |  \
      \ | 2 \
       \|    \
      C +-----+ D
Then instead of edges like this:
A - B -+
B - C  +- first polygon
C - A -+
B - D -+
D - C  +- second polygon
C - B -+
you want to remove the duplicated B - C / C - B edge and share it?
What kind of performance problem are you seeing with your algorithm? I'd say a set with a reasonable hash implementation should perform pretty well. On the other hand, if your hash is a poor fit for the data, you'll get lots of collisions, which can hurt performance badly.

You are both correct. Using a good hash set has gotten the performance well beyond the required level. I ended up rolling my own little hash set.
The total number of unique edges will be between N/2 and N, where N is the total number of edges counted face by face (with duplicates): if every edge is shared by two faces there are N/2 unique edges, and if no edge is shared there are N. From there I allocate a buffer of uint64s and pack each edge's pair of indices into one of those values. Using a small set of tables keyed on those packed values, I can find the unique edges fast!
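The exact tables aren't shown above, but here is a rough sketch of the packing idea (my own, using a plain sort-and-unique pass in place of the custom hash set, and assuming 32-bit vertex indices):

#include <algorithm>
#include <cstdint>
#include <utility>
#include <vector>

// Pack a vertex index pair into one key; ordering the pair first makes
// (a, b) and (b, a) produce the same key.
static uint64_t packEdge(uint32_t a, uint32_t b) {
    if (a < b) std::swap(a, b);
    return (uint64_t(a) << 32) | b;
}

// counts[i] = number of points in polygon i; indices = the flattened vertex
// index list, matching the two-array layout in the question.
std::vector<uint64_t> uniqueEdgeKeys(const std::vector<int>& counts,
                                     const std::vector<uint32_t>& indices) {
    std::vector<uint64_t> keys;
    size_t base = 0;
    for (int n : counts) {
        for (int k = 0; k < n; ++k)  // edge from each vertex to the next
            keys.push_back(packEdge(indices[base + k],
                                    indices[base + (k + 1) % n]));
        base += n;
    }
    std::sort(keys.begin(), keys.end());
    keys.erase(std::unique(keys.begin(), keys.end()), keys.end());
    return keys;
}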

Here's a C implementation of edge hashing used in Blender for exactly this purpose (quickly creating edges from faces); it may give some hints to others writing their own.
http://gitorious.org/blenderprojects/blender/blobs/master/blender/source/blender/blenlib/intern/edgehash.c
http://gitorious.org/blenderprojects/blender/blobs/master/blender/source/blender/blenlib/BLI_edgehash.h
This uses BLI_mempool:
https://gitorious.org/blenderprojects/blender/blobs/master/blender/source/blender/blenlib/intern/BLI_mempool.c
https://gitorious.org/blenderprojects/blender/blobs/master/blender/source/blender/blenlib/BLI_mempool.h

First you need to make sure your vertices are unique, that is, if you want only one edge at a given position. Then I use this data structure:
typedef std::pair<int, int> Edge;
Edge sampleEdge;
std::map<Edge, bool> uniqueEdges;
Edge contains the vertex indices that make up the edge, in sorted order. Hence if sampleEdge is an edge made up of the vertices with index numbers 12 and 5, then sampleEdge.first = 5 and sampleEdge.second = 12.
Then you can just do
uniqueEdges[sampleEdge] = true;
for all the edges. uniqueEdges will hold all the unique edges.
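Putting the pieces together, a minimal sketch of the whole pass (mine, not the answerer's), assuming the two-array polygon layout from the original question (a count per polygon plus a flattened index list):

#include <algorithm>
#include <map>
#include <utility>
#include <vector>

typedef std::pair<int, int> Edge;

// counts[i] = number of points in polygon i; indices = the flattened vertex
// index list.
std::map<Edge, bool> collectUniqueEdges(const std::vector<int>& counts,
                                        const std::vector<int>& indices) {
    std::map<Edge, bool> uniqueEdges;
    size_t base = 0;
    for (int n : counts) {
        for (int k = 0; k < n; ++k) {
            int a = indices[base + k];
            int b = indices[base + (k + 1) % n];  // wrap to close the polygon
            uniqueEdges[Edge(std::min(a, b), std::max(a, b))] = true;
        }
        base += n;
    }
    return uniqueEdges;
}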

Related

Optimizations for Raycasting

I've been wanting to build a 3D engine starting from scratch as a coding challenge with the end objective of implementing it on a fantasy console.
The best (i.e. simplest?) way I found was raytracing/raycasting. I haven't found much by searching online for raycasting algorithms, only point-in-polygon problems (which only tell me whether a ray intersects a polygon or not; that's not quite what I'm after, since I'd have no information about the first intersection, nor the intersection points themselves).
The only solution I could think of is brute-forcing the ray by moving it in small steps and checking each time whether that point is occupied by something (which would require filled shapes and wouldn't let me have 2D shapes, since they would never be rendered, although neither of those is a deal-breaker). Still, it looks far too expensive performance-wise.
As far as I know, most of these problems are solved using linear algebra, but I'm not quite competent enough to build up a solution on my own. Does this problem have a practical solution?
EDIT: I just found an algebraic solution in 2D which could maybe be extended to 3D. The idea is:
For each edge, check whether either of its two vertices is in the field of view (i.e. if O is the origin of every ray and P is the vertex, first check that the point is within the far limit of sight, then check that its angle with the forward vector is less than the angle of vision). If at least one of the two vertices is inside the field of view, add the edge to an array E.
If we have an array R of rays to shoot and an array of arrays I with info about the hit points, we can loop over each ray in R and each edge in E, storing f(ray, edge) in I, where f is a function that tells us whether the ray and the edge collided and, if so, where.
f uses basic linear algebra: both the ray and the edge are, for all purposes, segments, and two segments are just parts of two lines. Say the edge has vertices A and B (AB is the vector from A to B) and the ray's far point is P (OP is the vector from O to P). We can create two lines, r and s, defined by A + ηAB and O + λOP. After checking whether r and s are parallel (the absolute value of the dot product of AB and OP equals the norm of AB times the norm of OP), it's trivial to get the values of η and λ.
Now, if η < 0 or η > 1, the two segments are not colliding (and likewise λ must lie between 0 and 1 for the hit to fall between O and the far point P).
After we've done this for every ray and every edge, we compare the elements of each array i in I to see which one has the lowest λ. The lowest λ marks the first collision and hence the data to show on screen.
Everything here is linear algebra, though I fear that it might still be computationally heavy, since there's a lot going on, and it's still only 2D.
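As a minimal C++ sketch of f (my own, not the poster's): it solves O + λ·OP = A + η·AB using 2D cross products; cross(OP, AB) = 0 is equivalent to the dot-product parallelism test above, since both detect |cos θ| = 1.

#include <cmath>
#include <optional>

struct Vec2 { double x, y; };

static Vec2 sub(Vec2 a, Vec2 b) { return {a.x - b.x, a.y - b.y}; }
static double cross(Vec2 a, Vec2 b) { return a.x * b.y - a.y * b.x; }

struct Hit { double lambda; double eta; };  // lambda along O->P, eta along A->B

// f(ray, edge): solves O + lambda*OP = A + eta*AB. Returns nothing when the
// lines are parallel or the intersection lies outside either segment.
std::optional<Hit> f(Vec2 O, Vec2 P, Vec2 A, Vec2 B) {
    Vec2 d = sub(P, O), e = sub(B, A);
    double denom = cross(d, e);
    if (std::abs(denom) < 1e-12) return std::nullopt;  // parallel lines
    Vec2 AO = sub(A, O);
    double lambda = cross(AO, e) / denom;
    double eta    = cross(AO, d) / denom;
    if (eta < 0.0 || eta > 1.0 || lambda < 0.0 || lambda > 1.0)
        return std::nullopt;  // misses the edge, or is behind/beyond the ray
    return Hit{lambda, eta};
}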

Why need to alternate dimension in kd-tree construction

I have a question regarding the way space is partitioned in the kd-tree algorithm.
Assume I have points in the plane with (x,y) coordinates, and that we're not in the degenerate situation where all the points lie on one line. I was wondering why we need to alternate the splitting coordinate: at one level use the x axis, at the next level the y axis. What would it matter if we used only the x direction to split space? We would still have a binary tree, and the search algorithm would still take O(log n) on average (assuming a relatively well-balanced tree).
What does splitting space with alternating directions give me? I wonder if it's related to some general probabilistic property in multiple dimensions?
I think the problem shows up when you start searching the tree, for example with a window query (a rectangular query).
Let's assume a rectangular data set with points evenly distributed between -1,000 and 1,000 in every direction.
If you sort only by x, then a query that should return all points with (-900 < x < 900) && (1 < y < 10) may have to scan almost the whole tree.
The log(n) search would only work if your query limits x but not y, i.e. (1 < x < 10) && (-inf < y < +inf).
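As a rough, self-contained illustration (my own sketch, not from the thread): build a kd-tree over 100,000 uniform points both ways, always splitting on x versus alternating axes, and count the nodes visited by the window query above. The x-only tree has to visit nearly every node whose x lies in (-900, 900), while the alternating tree prunes on y at every other level. (Nodes are deliberately leaked to keep the sketch short.)

#include <algorithm>
#include <cstdio>
#include <random>
#include <vector>

struct Pt { double x, y; };
struct Node { Pt p; int axis; Node* lo = nullptr; Node* hi = nullptr; };

// Builds a kd-tree over pts[lo, hi); 'alternate' picks between cycling the
// split axis per level and always splitting on x.
Node* build(std::vector<Pt>& pts, int lo, int hi, int depth, bool alternate) {
    if (lo >= hi) return nullptr;
    int axis = alternate ? depth % 2 : 0;
    int mid = (lo + hi) / 2;
    std::nth_element(pts.begin() + lo, pts.begin() + mid, pts.begin() + hi,
                     [axis](const Pt& a, const Pt& b) {
                         return axis == 0 ? a.x < b.x : a.y < b.y;
                     });
    Node* n = new Node{pts[mid], axis};  // leaked on purpose in this sketch
    n->lo = build(pts, lo, mid, depth + 1, alternate);
    n->hi = build(pts, mid + 1, hi, depth + 1, alternate);
    return n;
}

// Counts the nodes a window query [xlo,xhi] x [ylo,yhi] has to visit.
long visitCount(const Node* n, double xlo, double xhi, double ylo, double yhi) {
    if (!n) return 0;
    long visited = 1;
    double v   = n->axis == 0 ? n->p.x : n->p.y;
    double qlo = n->axis == 0 ? xlo : ylo;
    double qhi = n->axis == 0 ? xhi : yhi;
    if (qlo <= v) visited += visitCount(n->lo, xlo, xhi, ylo, yhi);
    if (v <= qhi) visited += visitCount(n->hi, xlo, xhi, ylo, yhi);
    return visited;
}

int main() {
    std::mt19937 rng(42);
    std::uniform_real_distribution<double> u(-1000.0, 1000.0);
    std::vector<Pt> pts(100000);
    for (Pt& p : pts) p = {u(rng), u(rng)};
    for (bool alternate : {false, true}) {
        std::vector<Pt> copy = pts;
        Node* root = build(copy, 0, (int)copy.size(), 0, alternate);
        std::printf("%s: visited %ld nodes\n",
                    alternate ? "alternating axes" : "x-only",
                    visitCount(root, -900, 900, 1, 10));
    }
    return 0;
}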

KD Tree alternative/variant for weighted data

I'm using a static KD-Tree for nearest neighbor search in 3D space. However, the client's specifications have now changed so that I'll need a weighted nearest neighbor search instead. For example, in 1D space, I have a point A with weight 5 at 0, and a point B with weight 2 at 4; the search should return A if the query point is from -5 to 5, and should return B if the query point is from 5 to 6. In other words, the higher-weighted point takes precedence within its radius.
Google hasn't been any help - all I get is information on the K-nearest neighbors algorithm.
I can simply remove points that are completely subsumed by a higher-weighted point, but generally that isn't the case (usually a lower-weighted point is only partially subsumed, as in the 1D example above). I could use a range tree to query all points in an NxNxN cube centered on the query point and determine the one with the greatest weight, but the naive implementation of this is wasteful: I'd need to set N to the maximum weight in the entire tree, even though there may not be a point with that weight anywhere near the cube. For example, in the 1D case, if the heaviest point in the tree has weight 25 and sits at 100, my naive algorithm would need to set N to 25 even when the query point is well outside that point's radius.
To sum up, I'm looking for a way that I can query the KD tree (or some alternative/variant) such that I can quickly determine the highest-weighted point whose radius covers the query point.
FWIW, I'm coding this in Java.
It would also be nice if I could dynamically change a point's weight without incurring too high of a cost - at present this isn't a requirement, but I'm expecting that it may be a requirement down the road.
Edit: I found a paper on a priority range tree, but it doesn't quite address the same problem, since it doesn't account for higher-priority points having a greater radius.
Use an extra dimension for the weight. A point (x,y,z) with weight w is placed at (N-w,x,y,z), where N is the maximum weight.
Distances in 4D are defined by…
d((a, b, c, d), (e, f, g, h)) = |a - e| + d((b, c, d), (f, g, h))
…where the second d is whatever your 3D distance was.
To find all potential results for (x,y,z), query a ball of radius N about (0,x,y,z).
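A minimal sketch of the lifting step (mine; the Euclidean 3D distance is an assumption, since the answer leaves the 3D metric open):

#include <array>
#include <cmath>

// Lift a weighted 3D point into 4D: the weight becomes the first coordinate.
// N is the maximum weight over the whole data set.
std::array<double, 4> lift(double x, double y, double z, double w, double N) {
    return {N - w, x, y, z};
}

// 4D distance: |a0 - b0| plus the original 3D distance (assumed Euclidean).
double dist4(const std::array<double, 4>& a, const std::array<double, 4>& b) {
    double d3 = std::sqrt((a[1] - b[1]) * (a[1] - b[1]) +
                          (a[2] - b[2]) * (a[2] - b[2]) +
                          (a[3] - b[3]) * (a[3] - b[3]));
    return std::abs(a[0] - b[0]) + d3;
}

A lifted point then lies inside the radius-N ball around the query (0, x, y, z) exactly when (N - w) + d3 <= N, i.e. when its 3D distance d3 to the query is at most its own weight w, which is precisely the "radius covers the query point" condition.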
I think I've found a solution: the nested interval tree, which is an implementation of a 3D interval tree. Rather than storing points with an associated radius that I then need to query, I store and query the radii directly. This has the added benefit that each dimension need not have the same weight (so the radius becomes a rectangular box instead of a cubic one), which isn't presently a project requirement but may become one in the future (the client only recently added the "weighted points" requirement; who knows what else he'll come up with).

Calculating the distance between each pair of a set of points

So I'm working on simulating a large number of n-dimensional particles, and I need to know the distance between every pair of points. Allowing for some error, and given that the distance isn't relevant at all once it exceeds some threshold, are there any good ways to accomplish this? I'm pretty sure that if I want dist(A,C) and already know dist(A,B) and dist(B,C), I can bound it by [|dist(A,B) - dist(B,C)|, dist(A,B) + dist(B,C)] via the triangle inequality and then store the results in a sorted array, but I'd rather not reinvent the wheel if there's something better.
I don't think the number of dimensions should greatly affect the logic, but maybe for some solutions it will. Thanks in advance.
If the problem were simply to calculate the distances between all pairs, it would be an O(n^2) problem with no chance of a better solution. However, you say that if the distance is greater than some threshold D, you are not interested in it. This opens the opportunity for a better algorithm.
For example, in the 2D case you can use the sweep-line technique. Sort your points lexicographically, first by y, then by x. Then sweep the plane with a stripe of width D, bottom to top. As the stripe moves across the plane, new points enter it through its top edge and leave through its bottom edge. Active points (i.e. points currently inside the stripe) should be kept in some incrementally modifiable linear data structure sorted by their x coordinate.
Now, every time a new point enters the stripe, you only have to check the currently active points to its left and to its right no farther than D away (measured along the x axis). That's all.
The purpose of this algorithm (as is typically the case with sweep-line approaches) is to push the practical complexity away from O(n^2) and towards O(m), where m is the number of interactions we actually care about. Of course, the worst-case performance is still O(n^2).
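A minimal C++ sketch of this sweep (mine, not the answerer's), using a std::multiset ordered by x as the "active" structure:

#include <algorithm>
#include <cmath>
#include <set>
#include <utility>
#include <vector>

using Pt = std::pair<double, double>;  // (x, y)

// Reports every pair of points closer than D. Points are processed in order
// of y; 'active' holds the points currently inside the stripe, ordered by x.
std::vector<std::pair<Pt, Pt>> sweepClosePairs(std::vector<Pt> pts, double D) {
    std::sort(pts.begin(), pts.end(), [](const Pt& a, const Pt& b) {
        return a.second < b.second;  // bottom-to-top sweep order
    });
    std::multiset<Pt> active;        // pair ordering sorts by x first
    std::vector<std::pair<Pt, Pt>> out;
    size_t tail = 0;                 // oldest point possibly still in the stripe
    for (const Pt& p : pts) {
        // Expire points that fell more than D below the sweep line.
        while (pts[tail].second < p.second - D)
            active.erase(active.find(pts[tail++]));
        // Only active points within D along x can be close enough.
        auto lo = active.lower_bound(Pt(p.first - D, -1e300));
        auto hi = active.upper_bound(Pt(p.first + D, 1e300));
        for (auto it = lo; it != hi; ++it)
            if (std::hypot(it->first - p.first, it->second - p.second) < D)
                out.push_back({*it, p});
        active.insert(p);
    }
    return out;
}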
The above applies to the 2-dimensional case. For the n-dimensional case, I'd say you'll be better off with a different technique: some sort of space partitioning, exploiting the fact that if the distance between two partitions is known to be greater than D, there's no reason to test the specific points in those partitions against each other.
If the distance beyond a certain threshold is not relevant, and this threshold is not too large, there are common techniques to make this more efficient: limit the search for neighbouring points using space-partitioning data structures. Possible options are:
Binning.
Trees: quadtrees(2d), kd-trees.
Binning with spatial hashing.
Also, since the distance from point A to point B is the same as the distance from point B to point A, each distance should only be computed once. Thus, you should use the following loop:
for i from 0 to n-2:
    for j from i+1 to n-1:
        distance(point i, point j)
Combining these two techniques is very common for n-body simulation for example, where you have particles affect each other if they are close enough. Here are some fun examples of that in 2d: http://forum.openframeworks.cc/index.php?topic=2860.0
Here's an explanation of binning (and hashing): http://www.cs.cornell.edu/~bindel/class/cs5220-f11/notes/spatial.pdf
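A minimal sketch combining binning with the j > i trick above (my own illustration, assuming 2D points and a cell size equal to the cutoff D, so only the 3x3 block of neighbouring cells needs checking):

#include <cmath>
#include <cstdint>
#include <unordered_map>
#include <utility>
#include <vector>

struct Pt { double x, y; };

// Key a grid cell; with cell size D, all neighbours closer than D lie in the
// 3x3 block of cells around a point.
static uint64_t cellKey(int cx, int cy) {
    return (uint64_t(uint32_t(cx)) << 32) | uint32_t(cy);
}

std::vector<std::pair<int, int>> binnedClosePairs(const std::vector<Pt>& pts,
                                                  double D) {
    std::unordered_map<uint64_t, std::vector<int>> grid;
    auto cellOf = [D](double v) { return int(std::floor(v / D)); };
    for (int i = 0; i < (int)pts.size(); ++i)
        grid[cellKey(cellOf(pts[i].x), cellOf(pts[i].y))].push_back(i);
    std::vector<std::pair<int, int>> out;
    for (int i = 0; i < (int)pts.size(); ++i)
        for (int dx = -1; dx <= 1; ++dx)
            for (int dy = -1; dy <= 1; ++dy) {
                auto it = grid.find(cellKey(cellOf(pts[i].x) + dx,
                                            cellOf(pts[i].y) + dy));
                if (it == grid.end()) continue;
                for (int j : it->second)       // j > i: each pair tested once
                    if (j > i && std::hypot(pts[i].x - pts[j].x,
                                            pts[i].y - pts[j].y) < D)
                        out.push_back({i, j});
            }
    return out;
}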

Finding shared vertices among polygons

I have an interesting problem coming up soon and I've started to think about the algorithm. The more I think about it, the more I get frightened because I think it's going to scale horribly (O(n^4)), unless I can get smart. I'm having trouble getting smart about this one. Here's a simplified description of the problem.
I have N polygons (where N can be huge, >10,000,000) that are stored as lists of M vertices (where M is on the order of 100). What I need to do is, for each polygon, create a list of any vertices that are shared with other polygons (think of the polygons as surrounding regions of interest; sometimes the regions butt up against each other). I envision something like this:
Polygon i | Vertex | Polygon j | Vertex
    1     |   1    |     2     |   2
    1     |   2    |     2     |   3
    1     |   5    |     3     |   1
    1     |   6    |     3     |   2
    1     |   7    |     3     |   3
This means that vertex 1 in polygon 1 is the same point as vertex 2 in polygon 2, and vertex 2 in polygon 1 is the same point as vertex 3 in polygon 2. Likewise, vertex 5 in polygon 1 is the same as vertex 1 in polygon 3, and so on.
For simplicity, we can assume that polygons never overlap, that the closest they get is touching at an edge, and that all the vertices are integers (to make equality easy to test).
The only thing I can think of right now is, for each polygon, looping over all the other polygons and their vertices, which gives a scaling of O(N^2 * M^2), and that is going to be very bad in my case. The files of polygons can be very large, so I can't even hold everything in RAM, which would mean multiple passes over the file.
Here's my pseudocode so far
for i = 1 to N
    Pi = Polygon(i)
    for j = i+1 to N
        Pj = Polygon(j)
        for ii = 1 to Pi.VertexCount()
            Vi = Pi.Vertex(ii)
            for jj = 1 to Pj.VertexCount()
                Vj = Pj.Vertex(jj)
                if (Vi == Vj) AddToList(i, ii, j, jj)
            end for
        end for
    end for
end for
I'm assuming that this has come up in the graphics community (I don't spend much time there, so I don't know the literature). Any ideas?
This is a classic iteration-vs-memory trade-off. If you compare every polygon with every other polygon, you end up with an O(n^2) solution. If instead you build a table as you step through all the polygons, then march through the table afterwards, you get a nice O(n) solution (two passes over the data). I ask a similar question during interviews.
Assuming you have the memory available, you want to create a multimap (one key, multiple entries) with each vertex as the key and the polygon as the entry. Then you can walk each polygon exactly once, inserting each vertex and the polygon into the map. If the vertex already exists, you add the polygon as an additional entry under that vertex key.
Once you've hit all the polygons, you walk the entire map once and do whatever you need to do with any vertex that has more than one polygon entry.
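A minimal sketch of this approach (mine, not the answerer's), assuming 2D integer coordinates as the question allows; the coordinate-packing key is just one convenient choice:

#include <cstdint>
#include <unordered_map>
#include <vector>

struct Vert { int x, y; };  // integer coordinates, as the question allows

// Key a vertex by its coordinates; packing works because they are integers.
static uint64_t key(const Vert& v) {
    return (uint64_t(uint32_t(v.x)) << 32) | uint32_t(v.y);
}

// Pass 1 builds vertex -> polygons; pass 2 walks the map once and returns,
// for every vertex used by more than one polygon, the sharing polygon list.
std::vector<std::vector<int>> sharedVertices(
        const std::vector<std::vector<Vert>>& polygons) {
    std::unordered_map<uint64_t, std::vector<int>> byVertex;
    for (int p = 0; p < (int)polygons.size(); ++p)
        for (const Vert& v : polygons[p])
            byVertex[key(v)].push_back(p);
    std::vector<std::vector<int>> shared;
    for (const auto& entry : byVertex)
        if (entry.second.size() > 1)
            shared.push_back(entry.second);
    return shared;
}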
If you have indexed polygon/face data, you don't even need to look at the vertex positions.
Create an array of size M (where M is the number of verts).
Iterate over the polygons and increment the array entry for each vertex index.
This gives you an array that describes how many times each vertex is used.*
You can then do another pass over the polygons and check the entry for each vertex. If it's > 1, you know that vertex is shared by another polygon. A sketch of both passes follows below.
You can build on this strategy further if you need to store or find other information. For example, instead of a count, you could store the polygons directly in the array, letting you retrieve a list of all faces that use a given vertex index. At that point you're effectively creating a map where vertex indices are the key.
(* This example assumes you have no degenerate polygons, but those could easily be handled.)
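A minimal sketch of the counting pass (mine), assuming the faces are already indexed into a shared vertex array of size M:

#include <vector>

// Given indexed faces (each face is a list of vertex indices into a shared
// vertex array of size M), count how many faces use each vertex index.
std::vector<int> vertexUseCounts(const std::vector<std::vector<int>>& faces,
                                 int M) {
    std::vector<int> counts(M, 0);
    for (const std::vector<int>& face : faces)
        for (int v : face)
            ++counts[v];             // one increment per (face, vertex) use
    return counts;
}
// Any index v with counts[v] > 1 is shared by more than one face.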
Well, one simple optimization would be to build a map (a hash table, probably) from each distinct vertex (identified by its coordinates) to a list of all polygons of which it is a part. That cuts your runtime down to something like O(NM), which is still large, but I doubt you can do better, since I can't see any way to avoid examining all the vertices.
