I have the above model represented as a face table list, where F1, F2, ..., F_n are the faces of the model and each face's number is its index in the list array. Each list element is another array of 3 vertices, and each vertex is an array of 3 integers representing its x, y, z coordinates.
I want to find all the neighbouring faces of the vertex with coordinates (x2, y2, z2). I came up with this code that I believe does the task:
int[][][] faceList;                // the face table from the picture above: faceList[i] holds the 3 vertices of face Fi, each an array {x, y, z}
int[] targetVertex = {x2, y2, z2}; // the vertex whose neighbouring faces I want to find
List<Integer> faceIndexFoundList = new ArrayList<>(); // the result: indices of the neighbouring faces of targetVertex

for (int i = 0; i < faceList.length; i++) {
    boolean vertexMatched = false;
    for (int j = 0; j < faceList[i].length; j++) {
        if (faceList[i][j][0] == targetVertex[0]
                && faceList[i][j][1] == targetVertex[1]
                && faceList[i][j][2] == targetVertex[2]) {
            vertexMatched = true; // this face contains the target vertex
            break;
        }
    }
    if (vertexMatched) {
        faceIndexFoundList.add(i);
    }
}
I was told that the complexity of this task is O(N^2). But with the code I have, it looks like only O(N). The inner loop runs only 3 times, since there are only 3 vertices per polygon, so it is merely a constant factor. That leaves only the outer for loop, which is O(N).
What is the complexity of the code that I have above? What could I have done wrong?
The complexity is (approximately) faceList.length * faceList[i].length. These are independent, but both can grow very large, and as they grow they each approach infinity, at which point (conceptually) they converge on n, resulting in a complexity of O(n^2).
If the number of vertices per face is explicitly limited to 3, then the complexity becomes faceList.length * 3, which is O(n).
It's pretty obvious that in the worst case you must look at each vertex of each polygon.
This is just O(size of the table) in your post, which in turn is the sum of all row lengths or the sum of all polygon vertex counts, whichever you prefer.
If you say polygons have no more than m vertices and there are n polygons, then the algorithm is O(mn).
FWIW it's possible to get the answer with no searching at all with a more sophisticated data structure. See for example the winged edge data structure and others. In this case, you just go to the vertex you're interested in and traverse the links that connect all adjacent polygons. Cost is constant for each polygon in the output.
These fancier data structures for polygonal meshes support lots of frequently used operations with wonderful efficiency.
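For a flavour of the idea without implementing a full winged-edge structure, here is a minimal sketch in Java of the simpler precomputed alternative (a vertex-to-face adjacency map, not winged edge itself), reusing the question's faceList layout; after the one-time build, each query costs O(1) plus the size of the output:

import java.util.*;

// Build once: map each vertex (as a coordinate triple) to the faces that use it.
Map<List<Integer>, List<Integer>> facesByVertex = new HashMap<>();
for (int i = 0; i < faceList.length; i++) {
    for (int[] v : faceList[i]) {
        List<Integer> key = Arrays.asList(v[0], v[1], v[2]);
        facesByVertex.computeIfAbsent(key, k -> new ArrayList<>()).add(i);
    }
}

// Query: all faces incident to the target vertex, with no searching.
List<Integer> neighbours = facesByVertex.getOrDefault(
        Arrays.asList(targetVertex[0], targetVertex[1], targetVertex[2]),
        Collections.emptyList());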
From Wikipedia:
Big O notation is used to describe the limiting behavior of a function when the argument tends towards a particular value or infinity.
In this single case you might only be running the one for loop. But what happens when the number of vertices of the polygon approaches infinity? Do the majority of the cases cause the second for loop to run, or to break? This will determine whether your function is O(n) or O(n^2).
I'm interested in a fast way to calculate the rotation-independent center of a simple, convex, (non-intersecting) 2D polygon.
The example below (on the left) shows the mean center (sum of all points divided by their total number), and the desired result on the right.
Some options I've already considered.
Bounding-box center (depends on rotation, and ignores points based on their relation to the axes).
Straight skeleton - too slow to calculate.
I've found a way which works reasonably well (weight the points by the edge lengths), but this means a square-root call for every edge, which I'd like to avoid. (I will post it as an answer, even though I'm not entirely satisfied with it.)
Note, I'm aware of this question's similarity with: What is the fastest way to find the "visual" center of an irregularly shaped polygon?
However, having to handle concave polygons increases the complexity of the problem significantly.
The points of the polygon can be weighted by their edge length, which compensates for uneven point distribution.
This works for concave polygons too, but in that case the center point isn't guaranteed to be inside the polygon.
Python:

from math import dist

def poly_center(poly):
    # poly: list of (x, y) points, ordered around the polygon
    sum_center = [0.0, 0.0]
    sum_weight = 0.0
    n = len(poly)
    for i, point in enumerate(poly):
        # weight each point by the total length of its two adjacent edges
        weight = (dist(point, poly[(i + 1) % n]) +
                  dist(point, poly[i - 1]))
        sum_center[0] += point[0] * weight
        sum_center[1] += point[1] * weight
        sum_weight += weight
    return (sum_center[0] / sum_weight, sum_center[1] / sum_weight)
Note, we can pre-calculate all edge lengths to halve the number of length calculations (each edge is shared by two vertices), or reuse the previous edge length inside the loop for half-plus-one as many calculations. This is just written as an example to show the logic.
Including this answer for completeness, since it's the best method I've found so far.
There is not much better to do than accumulating the coordinates weighted by edge length, which indeed takes N square roots.
If you accept an approximation, it is possible to skip some of the vertices by curve simplification, as follows:
decide on a deviation tolerance;
start from vertex 0 and jump to vertex M (say M = N/2);
check if the deviation along the polyline from 0 to M exceeds the tolerance (for this, compute the height of the triangle formed by the vertices 0, M/2 and M, as sketched below);
if the deviation is exceeded, repeat recursively with 0, M/4, M/2 and M/2, 3M/4, M;
if the deviation is not exceeded, assume that the shape is straight between 0 and M;
continue until the end of the polygon.
Where the points are dense (like the left edge on your example), you should get some speedup.
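A small sketch (in Java) of the deviation test in the steps above: the deviation is taken as the height of the triangle (P0, Pmid, PM) over the base P0-PM, i.e. twice the triangle's area divided by the base length.

// Height of pMid over the segment p0 -> pM (points are {x, y} pairs).
static double deviation(double[] p0, double[] pMid, double[] pM) {
    double bx = pM[0] - p0[0], by = pM[1] - p0[1];     // base vector P0 -> PM
    double mx = pMid[0] - p0[0], my = pMid[1] - p0[1]; // P0 -> Pmid
    double cross = Math.abs(bx * my - by * mx);        // 2 * triangle area
    return cross / Math.hypot(bx, by);                 // height over the base
}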
I think it's easiest to do something with the centers of mass of the Delaunay triangulation of the polygon points, i.e.
import numpy as np
from scipy import spatial

def _centroid_poly(poly):
    poly = np.asarray(poly)                   # (N, 2) array of points
    T = spatial.Delaunay(poly).simplices      # triangle vertex indices
    n = T.shape[0]
    W = np.zeros(n)
    C = np.zeros(poly.shape[1])
    for m in range(n):
        sp = poly[T[m, :], :]                 # the m-th triangle's points
        W[m] = spatial.ConvexHull(sp).volume  # in 2-D, .volume is the area
        C += W[m] * np.mean(sp, axis=0)       # area-weighted triangle centroid
    return C / np.sum(W)
This works well for me!
I'm developing a multiplayer game with node.js. Every second I get the coordinates (X, Y, Z) of every player. How can I get, for each player, a list of all players located closer than a given distance to him?
Any idea how to avoid an O(n²) calculation?
You are not looking for clustering algorithms.
Instead, you are looking for a database index that supports radius queries.
Examples:
R*-tree
kd-tree
M-tree
Gridfile
Octree (for 3d, quadtree for 2d)
Any of these should do the trick, and theoretically yield O(n log n) performance. In practice, it's not as easy as this. If all your objects are really close, "closer than a given distance" may mean every object, i.e. O(n^2).
What you are looking for is a quadtree in 3 dimensions, i.e. an octree. An octree is basically the same as a binary tree, but instead of two children per node it has 2^D = 2^3 = 8 children per node, where D is the dimension.
For example, imagine a cube. To create the next level below the root, each child node represents one of the 8 sub-cubes inside that cube, and so on.
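A minimal sketch (in Java; the field names are illustrative) of what one such node might look like:

class OctreeNode {
    double cx, cy, cz, halfSize;               // centre and half-extent of this cube
    OctreeNode[] children = new OctreeNode[8]; // 2^3 = 8 sub-cubes, created on demand

    // Index of the child octant containing point (x, y, z): one bit per axis.
    int octant(double x, double y, double z) {
        return (x >= cx ? 1 : 0) | (y >= cy ? 2 : 0) | (z >= cz ? 4 : 0);
    }
}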
This tree yields fast lookups, but be careful not to use it for many dimensions: I once built a polymorphic quadtree and wouldn't go beyond 8-10 dimensions, because the tree was becoming too flat.
The other approach would be the kd-tree, where you halve the dataset (the players) at every step, as in the sketch below.
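As an illustrative sketch (in Java; the names and the simple sort-based construction are assumptions, real libraries are more careful), building such a tree by splitting at the median of a cycling axis:

import java.util.*;

class KdNode {
    double[] point; // {x, y, z}
    KdNode left, right;
}

class KdBuilder {
    static KdNode build(List<double[]> pts, int depth) {
        if (pts.isEmpty()) return null;
        final int axis = depth % 3;                   // cycle through x, y, z
        pts.sort((a, b) -> Double.compare(a[axis], b[axis]));
        int mid = pts.size() / 2;                     // the median halves the dataset
        KdNode node = new KdNode();
        node.point = pts.get(mid);
        node.left  = build(new ArrayList<>(pts.subList(0, mid)), depth + 1);
        node.right = build(new ArrayList<>(pts.subList(mid + 1, pts.size())), depth + 1);
        return node;
    }
}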
You could use a library that provides nearest neighbour searching.
I'm answering my own question because I have the answer now. Thanks to G. Samaras and Anony-Mousse:
I use a kd-tree algorithm:
First I build the tree with all the players
Then, for each player, I calculate the list of all the players within a given range around this player
This is very fast and easy with the npm module kdtree: https://www.npmjs.org/package/kdtree
var kd = require('kdtree');
var tree = new kd.KDTree(3); // a new tree for 3-dimensional points
var players = loadPlayersPosition(); // players is an array containing all the positions

for (var p in players) { // let's build the tree
    tree.insert(players[p].x, players[p].y, players[p].z, players[p].username);
}

var RANGE = 1000; // 1 km range
var nearest = [];
for (var p in players) { // let's look for neighbours
    var close = tree.nearestRange(players[p].x, players[p].y, players[p].z, RANGE);
    nearest.push(close);
}
It returns nearest, which is an array containing, for each player, all his neighbours within a range of 1000 m. I ran some tests on my PC with 100,000 simulated players. It takes only 500 ms to build the tree and another 500 ms to find all the nearest-neighbour sets. I find this very fast for such a big number of players.
Bonus: if you need to do this with latitude and longitude instead of x, y, z, just convert lat/lon to Cartesian x, y, z, because for short distances the chord distance on a sphere approximates the great-circle distance.
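For concreteness, a small sketch of that conversion (written in Java here; the constant is the usual mean Earth radius approximation):

static final double EARTH_RADIUS_M = 6371000.0;

// Convert latitude/longitude (degrees) to Cartesian coordinates (metres), so
// that 3-D chord distance approximates great-circle distance for short ranges.
static double[] toCartesian(double latDeg, double lonDeg) {
    double lat = Math.toRadians(latDeg), lon = Math.toRadians(lonDeg);
    return new double[] {
        EARTH_RADIUS_M * Math.cos(lat) * Math.cos(lon),
        EARTH_RADIUS_M * Math.cos(lat) * Math.sin(lon),
        EARTH_RADIUS_M * Math.sin(lat)
    };
}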
I have an NSDictionary of about 2000 locations with lat and long, and I am dropping pins on the map based on whether they are in the visible map region.
Currently, every time I pan the map, I simply loop through my dictionary and calculate the distance to see if each location is visible; if so, I drop a pin.
CLLocationCoordinate2D centre = [self.map centerCoordinate];
CLLocation *mapCenter = [[CLLocation alloc] initWithLatitude:centre.latitude longitude:centre.longitude];

for (int i = 0; i < [self.dealersSource count]; i++) {
    CLLocation *d = [[CLLocation alloc] initWithLatitude:[[[self.dealersSource objectAtIndex:i] valueForKey:@"lat"] floatValue]
                                               longitude:[[[self.dealersSource objectAtIndex:i] valueForKey:@"long"] floatValue]];
    CLLocationDistance distance = [d distanceFromLocation:mapCenter];
    float dist = (distance / 1609.344); // metres to miles
    if (dist <= radius && dist != 0) {
        // this will be visible on the map, add to list of annotations
    }
}
This works but seems pretty inefficient and can be slow on older iPads, especially as more and more locations get added to this list. I would like to be able to use some sort of NSPredicate to filter my initial list before I start looping through them.
There is not really any standard Objective-C structure that is well-suited to finding values within a range -- you pretty much have to search one-by-one (though you can use "predicates" to "hide" the search inside filteredArray... operations, etc, and so write fewer lines of code).
The best structure for efficiently finding values between bounds on a line is probably an array sorted on the values, searched with a binary search algorithm. You'd do one binary search for the lower bound and another for the upper bound. This is log(n) complexity, so fairly efficient for large lists (if you don't have to sort the lists very often).
Precisely how one would do this for a 2-d surface is harder to figure out. Perhaps first use the above technique to find "candidates" in the X direction, then check their Y coordinates. It would not be log(n), though.
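A sketch of the bounded search on one axis (in Java; lowerBound/upperBound are hand-rolled here, not a standard API):

// First index with a[i] >= key, via binary search: O(log n).
static int lowerBound(double[] a, double key) {
    int lo = 0, hi = a.length;
    while (lo < hi) {
        int mid = (lo + hi) >>> 1;
        if (a[mid] < key) lo = mid + 1; else hi = mid;
    }
    return lo;
}

// First index with a[i] > key.
static int upperBound(double[] a, double key) {
    int lo = 0, hi = a.length;
    while (lo < hi) {
        int mid = (lo + hi) >>> 1;
        if (a[mid] <= key) lo = mid + 1; else hi = mid;
    }
    return lo;
}

// Values within [minX, maxX] occupy indices [lowerBound(xs, minX), upperBound(xs, maxX)).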
I have a set of m non-rotated, integer (pixel) aligned rectangles, each of which may or may not overlap. The rectangles cover thousands of pixels. I need to find the minimum sized bounding box that covers all areas that are covered by n of the m rectangles.
A (dirty) way of doing this is to paint a canvas that covers the area of all the targets. This is O(mk) where m is the number of rectangles and k is the number of pixels per rectangle. However since k is much greater than m I think there is a better solution out there.
This feels like a dynamic programming problem...but I am having trouble figuring out the recursion.
A solution which is better but still not great:
Sort the start and end points of all the rectangles in the X direction, O(m log m). Iterate over the x positions to find those that may have over n overlapping rectangles, an O(m) loop. For each such x position, take the rectangles present there and sort their starts and stops at that position (O(m log m)); find the region of overlap and keep track of the bounds that way. Overall, O(m^2 log m).
Hello MadScienceDreams,
Just to clarify, the bounding box is also non-rotated, correct?
If this is the case, then just keep track of four variables, minX, maxX, minY, maxY (the left-most, right-most, top-most, and bottom-most pixels), which define the bounding box: loop through each of the rectangles updating the four variables, and the new bounding box is given by those four variables, as in the sketch below.
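A minimal sketch of that scan (in Java; Rect is an assumed simple struct with x, y, width, height):

int minX = Integer.MAX_VALUE, minY = Integer.MAX_VALUE;
int maxX = Integer.MIN_VALUE, maxY = Integer.MIN_VALUE;
for (Rect r : rects) {
    minX = Math.min(minX, r.x);
    minY = Math.min(minY, r.y);
    maxX = Math.max(maxX, r.x + r.width);   // right edge
    maxY = Math.max(maxY, r.y + r.height);  // bottom edge
}
// The bounding box spans (minX, minY) to (maxX, maxY).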
EDIT
It looks like you are asking about finding the bounds of some subset of rectangles, not the whole set.
So you have M rectangles, and you choose N rectangles from them, and find the bounds within that.
Even in this situation, looping through the N rectangles and keeping track of their bounds would be at most O(m), which isn't bad at all.
I feel that I must be misunderstanding your question, since this response isn't what you are probably looking for. Is your question actually trying to ask how to precompute the bounds, so that given any subset you know the total bounds in constant time?
Is this what defines your question? For the bounding box => #rect_label >= n
How about we start with one box and find the next box whose furthest corner is nearest to it. Now we have a region with two boxes. Recursively find the next region, until we have n boxes.
While we need to start on every box, we only need to actively work on the currently smallest regions. The effect is that we start from the smallest cluster of boxes and expand out from there.
If n is closer to m than to 0, we can reverse the search tree so that we start from the omni-all-enclosing box, chopping off one bordering box to create each next search level. Assuming we only actively work on the smallest remaining region, the effect is that we chop off the emptiest region first.
Is it too complicated? Sorry, I can't remember the name of this search. I'm not good at maths, so I'll skip the O notation. >_<
I propose the following algorithm:

prepareData();
if (findBorder('left')) {
    foreach (direction in ['top', 'right', 'bottom']) {
        findBorder(direction);
    }
} else noIntersectionExists;
prepareData (O(m log m)):
Order the vertical bounds and the horizontal bounds.
Save the result as:
- two arrays that point to the rectangles (arrX and arrY)
- the index saved as a property of each rectangle (rectangle.leftIndex, rectangle.topIndex, etc.)
findBorder(left): // the other directions are similar
Best case O(n), worst case O(2m-n).

intersections = new Array;
// an intersection has a depth (number of rectangles intersected), a top and bottom border, and a list of rectangles
for (i = 0; i < 2*m-n-1; i++) { // or (i = 2*m-1; i > n; i--)
    if (isLeftBorder(arrX[i])) {
        addToIntersections(arrX[i].rectangle, direction);
        if (max(intersections.depth) == n) break;
    } else {
        removeFromIntersections(arrX[i].rectangle, direction);
    }
}
addToIntersections(rectangle, direction): // explanations for direction = left
Best case O(n), worst case O(m).

hasIntersected = false;
foreach (intersection in intersections) {
    if (intersect(intersection, rectangle)) {
        hasIntersected = true;
        intersections[] = {
            depth: intersection.depth,
            bottom: min(intersection.bottom, rectangle.bottom),
            top: max(...) };
        intersection.depth++;
        intersection.bottom = max(intersection.bottom, rectangle.bottom);
        intersection.top = max(...);
    }
}
if (!hasIntersected)
    intersections[] = { depth: 1, bottom: rectangle.bottom, top: rectangle.top };
This gives an overall order between O(n^2) and O(m*(m-n/2))
I hope my pseudo code is clear enough
I would like to do an algebraic curve fit of 2D data points, but for various reasons it isn't really possible to have much of the sample data in memory at once, and iterating through all of it is an expensive process.
(The reason for this is that actually I need to fit thousands of curves simultaneously based on gigabytes of data which I'm reading off disk, and which is therefore sloooooow).
Note that the number of polynomial coefficients will be limited (perhaps 5-10), so an exact fit will be extremely unlikely, but this is ok as I'm trying to find an underlying pattern in data with a lot of random noise.
I understand how one can use a genetic algorithm to fit a curve to a dataset, but this requires many passes through the sample data, and thus isn't practical for my application.
Is there a way to fit a curve with a single pass of the data, where the state that must be maintained from sample to sample is minimal?
I should add that the nature of the data is that the points may lie anywhere on the X axis between 0.0 and 1.0, but the Y values will always be either 1.0 or 0.0.
So, in Java, I'm looking for a class with the following interface:

public interface CurveFit {
    public void addData(double x, double y);
    public List<Double> getBestFit(); // Returns the polynomial coefficients
}
The class that implements this must not need to keep much data in its instance fields, no more than a kilobyte even for millions of data points. This means that you can't just store the data as you get it to do multiple passes through it later.
edit: Some have suggested that finding an optimal curve in a single pass may be impossible; however, an optimal fit is not required, just as close as we can get in a single pass.
The bare bones of an approach might be: start with some curve, then have a way to modify it to get slightly closer to new data points as they come in, effectively a form of gradient descent. The hope is that with sufficient data (and the data will be plentiful), we get a pretty good curve. Perhaps this inspires someone to a solution.
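To make that gradient-descent idea concrete, here is a minimal sketch against the interface above (the degree and learning rate are illustrative assumptions, and this yields an approximate fit, not an optimal one):

import java.util.*;

class StreamingPolyFit implements CurveFit {
    private final double[] coeffs; // c[0] + c[1]*x + ... + c[d]*x^d
    private final double rate;     // learning rate (assumed, needs tuning)

    StreamingPolyFit(int degree, double rate) {
        this.coeffs = new double[degree + 1];
        this.rate = rate;
    }

    public void addData(double x, double y) {
        double pred = 0, xp = 1;
        for (double c : coeffs) { pred += c * xp; xp *= x; } // evaluate polynomial
        double err = pred - y;                               // residual at this point
        xp = 1;
        for (int k = 0; k < coeffs.length; k++) {
            coeffs[k] -= rate * err * xp; // gradient step (constant folded into rate)
            xp *= x;
        }
    }

    public List<Double> getBestFit() {
        List<Double> out = new ArrayList<>();
        for (double c : coeffs) out.add(c);
        return out;
    }
}

The entire state is the degree+1 coefficients, comfortably under the kilobyte budget.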
Yes, it is a projection. For
y = X beta + error
where lowercased terms are vectors, and X is a matrix, you have the solution vector
\hat{beta} = inverse(X'X) X' y
as per the OLS page. You almost never want to compute this directly but rather use LR, QR or SVD decompositions. References are plentiful in the statistics literature.
If your problem has only one parameter (and x is hence a vector as well) then this reduces to just summation of cross-products between y and x.
If you don't mind that you'll get a straight line "curve", then you only need six variables for any amount of data. Here's the source code that's going into my upcoming book; I'm sure that you can figure out how the DataPoint class works:
Interpolation.h:
#ifndef __INTERPOLATION_H
#define __INTERPOLATION_H

#include "DataPoint.h"

class Interpolation
{
private:
    int m_count;
    double m_sumX;
    double m_sumXX; /* sum of X*X */
    double m_sumXY; /* sum of X*Y */
    double m_sumY;
    double m_sumYY; /* sum of Y*Y */

public:
    Interpolation();
    void addData(const DataPoint& dp);
    double slope() const;
    double intercept() const;
    double interpolate(double x) const;
    double correlate() const;
};

#endif // __INTERPOLATION_H
Interpolation.cpp:
#include <cmath>
#include "Interpolation.h"

Interpolation::Interpolation()
{
    m_count = 0;
    m_sumX = 0.0;
    m_sumXX = 0.0;
    m_sumXY = 0.0;
    m_sumY = 0.0;
    m_sumYY = 0.0;
}

void Interpolation::addData(const DataPoint& dp)
{
    m_count++;
    m_sumX += dp.getX();
    m_sumXX += dp.getX() * dp.getX();
    m_sumXY += dp.getX() * dp.getY();
    m_sumY += dp.getY();
    m_sumYY += dp.getY() * dp.getY();
}

double Interpolation::slope() const
{
    return (m_sumXY - (m_sumX * m_sumY / m_count)) /
           (m_sumXX - (m_sumX * m_sumX / m_count));
}

double Interpolation::intercept() const
{
    return (m_sumY / m_count) - slope() * (m_sumX / m_count);
}

double Interpolation::interpolate(double X) const
{
    return intercept() + slope() * X;
}

double Interpolation::correlate() const
{
    // Pearson correlation from the running sums; the raw ratio
    // m_sumXY / sqrt(m_sumXX * m_sumYY) would ignore the means.
    double cov = m_sumXY - (m_sumX * m_sumY / m_count);
    double varX = m_sumXX - (m_sumX * m_sumX / m_count);
    double varY = m_sumYY - (m_sumY * m_sumY / m_count);
    return cov / sqrt(varX * varY);
}
Why not use a ring buffer of some fixed size (say, the last 1000 points) and do a standard QR decomposition-based least squares fit to the buffered data? Once the buffer fills, each time you get a new point you replace the oldest and re-fit. That way you have a bounded working set that still has some data locality, without all the challenges of live stream (memoryless) processing.
Are you limiting the number of polynomial coefficients (i.e. fitting to a max power of x in your polynomial)?
If not, then you don't need a "best fit" algorithm - you can always fit N data points EXACTLY to a polynomial of N coefficients.
Just use matrices to solve N simultaneous equations for N unknowns (the N coefficients of the polynomial).
If you are limiting to a max number of coefficients, what is your max?
Following your comments and edit:
What you want is a low-pass filter to filter out noise, not fit a polynomial to the noise.
Given the nature of your data:
the points may lie anywhere on the X axis between 0.0 and 1.0, but the Y values will always be either 1.0 or 0.0.
Then you don't need even a single pass, as these two lines will pass exactly through every point:
X = [0.0 ... 1.0], Y = 0.0
X = [0.0 ... 1.0], Y = 1.0
Two short line segments, unit length, and every point falls on one line or the other.
Admittedly, an algorithm to find a good curve fit for arbitrary points in a single pass is interesting, but (based on your question), that's not what you need.
Assuming that you don't know which point should belong to which curve, something like a Hough Transform might provide what you need.
The Hough Transform is a technique that allows you to identify structure within a data set. One use is in computer vision, where it allows easy identification of lines and borders within the field of view.
Advantages for this situation:
Each point need only be considered once
You don't need to keep a data structure for each candidate line, just one (complex, multi-dimensional) structure
Processing of each line is simple
You can stop at any point and output a set of good matches
You never discard any data, so it's not reliant on any accidental locality of reference
You can trade off between accuracy and memory requirements
Isn't limited to exact matches, but will highlight partial matches too.
An approach
To find cubic fits, you'd construct a 4-dimensional Hough space, into which you'd project each of your data-points. Hotspots within Hough space would give you the parameters for the cubic through those points.
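For intuition, here is a minimal sketch (in Java) of the voting scheme for the simplest case, straight lines in (theta, rho) space; the bin counts and the unit-square data range are assumptions, and the cubic case extends the same accumulator to four dimensions:

class LineHough {
    static final int THETA_BINS = 180, RHO_BINS = 200;
    static final double MAX_RHO = Math.sqrt(2.0); // data assumed in the unit square
    final int[][] acc = new int[THETA_BINS][RHO_BINS];

    // Single pass: each point votes once for every line that could pass through it.
    void addData(double x, double y) {
        for (int t = 0; t < THETA_BINS; t++) {
            double theta = Math.PI * t / THETA_BINS;
            double rho = x * Math.cos(theta) + y * Math.sin(theta);
            int r = (int) ((rho + MAX_RHO) / (2 * MAX_RHO) * (RHO_BINS - 1));
            acc[t][r]++;
        }
    }
    // After all points are added, peaks in acc mark the best-supported lines.
}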
You need the solution to an overdetermined linear system. The popular methods are Normal Equations (not usually recommended), QR factorization, and singular value decomposition (SVD). Wikipedia has decent explanations, Trefethen and Bau is very good. Your options:
1. Out-of-core implementation via the normal equations. This requires the product A'A, where A has many more rows than columns (so the result is very small). The matrix A is completely defined by the sample locations, so you don't have to store it; thus computing A'A is reasonably cheap (very cheap if you don't need to hit memory for the node locations). Once A'A is computed, you get the solution in one pass through your input data, but the method can be unstable (see the sketch after this list).
2. Implement an out-of-core QR factorization. Classical Gram-Schmidt will be fastest, but you have to be careful about stability.
3. Do it in-core with distributed memory (if you have the hardware available). Libraries like PLAPACK and ScaLAPACK can do this; the performance should be much better than option 1. The parallel scalability is not fantastic, but will be fine if it's a problem size that you would even think about doing in serial.
4. Use iterative methods to compute an SVD. Depending on the spectral properties of your system (maybe after preconditioning), this could converge very fast and does not require storage for the matrix (which in your case has 5-10 columns, each the size of your input data). A good library for this is SLEPc; you only have to form the product of the Vandermonde matrix with a vector (so you only need to store the sample locations). This is very scalable in parallel.
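A sketch of option 1 for a polynomial basis (in Java; the degree and the naive elimination solver are illustrative, and as noted the normal equations can be unstable): every entry of A'A and A'y reduces to a running sum of powers of x, so one pass suffices.

class NormalEquationsFit {
    private final int n;            // number of coefficients (degree + 1)
    private final double[] sumXPow; // sums of x^0 .. x^(2d): the entries of A'A
    private final double[] rhs;     // sums of y*x^0 .. y*x^d: the entries of A'y

    NormalEquationsFit(int degree) {
        n = degree + 1;
        sumXPow = new double[2 * n - 1];
        rhs = new double[n];
    }

    // One pass over the data, O(degree) state per sample.
    void addData(double x, double y) {
        double xp = 1;
        for (int k = 0; k < sumXPow.length; k++) {
            sumXPow[k] += xp;
            if (k < n) rhs[k] += y * xp;
            xp *= x;
        }
    }

    // Solve (A'A) beta = A'y by Gaussian elimination with partial pivoting.
    double[] solve() {
        double[][] a = new double[n][n + 1]; // augmented system; A'A[i][j] = sum x^(i+j)
        for (int i = 0; i < n; i++) {
            for (int j = 0; j < n; j++) a[i][j] = sumXPow[i + j];
            a[i][n] = rhs[i];
        }
        for (int col = 0; col < n; col++) {
            int piv = col;
            for (int r = col + 1; r < n; r++)
                if (Math.abs(a[r][col]) > Math.abs(a[piv][col])) piv = r;
            double[] tmp = a[col]; a[col] = a[piv]; a[piv] = tmp;
            for (int r = col + 1; r < n; r++) {
                double f = a[r][col] / a[col][col];
                for (int c = col; c <= n; c++) a[r][c] -= f * a[col][c];
            }
        }
        double[] beta = new double[n]; // back-substitution
        for (int i = n - 1; i >= 0; i--) {
            double s = a[i][n];
            for (int j = i + 1; j < n; j++) s -= a[i][j] * beta[j];
            beta[i] = s / a[i][i];
        }
        return beta;
    }
}

Usage would be one addData call per sample and a solve() at the end, which is exactly the one-pass property option 1 promises.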
I believe I found the answer to my own question based on a modified version of this code. For those interested, my Java code is here.