Minimum cost arborescence of a specific subset of vertices

I can use the Chu-Liu/Edmonds algorithm to get the minimum cost arborescence of a directed weighted graph. I want to apply this to a pre-specified subset of k vertices. (I know exactly which k vertices must be included in the tree.)
What are the steps required to apply the Chu-Liu/Edmonds algorithm?
Similar to Construct a minimum spanning tree covering a specific subset of the vertices, but for directed graphs / minimum cost arborescence (vs. undirected graphs / minimum spanning tree).
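For reference, here is a minimal sketch (not a solution to the subset restriction itself) of how the base Chu-Liu/Edmonds computation is invoked through NetworkX; the example graph and weights are made up.

import networkx as nx

G = nx.DiGraph()
G.add_weighted_edges_from([
    (0, 1, 2.0), (0, 2, 5.0), (1, 2, 1.0), (1, 3, 4.0), (2, 3, 3.0),
])
# NetworkX exposes Chu-Liu/Edmonds as minimum_spanning_arborescence;
# it spans *all* vertices, so restricting the result to k required
# vertices still needs extra work (that is the question above).
arb = nx.minimum_spanning_arborescence(G, attr="weight")
print(sorted(arb.edges(data="weight")))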

Related

How to quickly find k nearest points to a plane in 3 or more dimensions

I need to quickly find the k nearest points to a plane (or hyperplane) in 3 (or more) dimensions. Is there a fast way to perform this search, using some sort of clever data structure (similar to how a kd-tree works for k nearest neighbors)?
I know I can rotate the plane and all the points so that the plane is orthogonal to one of the axes, then measure the distance of each point to the plane simply by reading off the coordinate along that axis. However, the time complexity of this brute-force approach is O(N), where N is the number of points. Since I have to find the k nearest neighbors for a large number of planes and a large number of points, I need to find a faster algorithm if possible.
I think you can simply use any spatial data structure (kd-tree, R-tree, ...) that supports custom distance functions. You should be able to define a distance function that simply uses the distance to the plane instead of distance to a center point.
How to calculate this distance is described by @Spektre.
I have no idea how that scales, because it may depend on the kNN search algorithm used by the implementation.
However, I believe the standard algorithm (Hjaltason and Samet: "Distance browsing in spatial databases.") should at least be better than O(n).
In case you are using Java, the R-Tree, Quadtree and PH-Tree indexes in my TinSpin library all use this algorithm.
Measure the distance by using the dot product with the hyperplane normal. So let:
n - the hyperplane unit normal vector
p0 - any point on the hyperplane
p[i] - the i-th point of your point cloud, i = { 0, 1, 2, ..., N-1 }
Then the distance to the hyperplane is:
d = |dot( p[i] - p0 , n )|
As you can see, there is no need to transform or align anything, and it's O(1) per point without any expensive operations. I expect that any pre-sorting of points or clever data structure will be slower than this in most cases...
Now you have two options: either compute d for each point and then sort, which leads to O(N log N) time and O(N) space complexity,
or keep track of the k closest points on the run, leading to O(k*N) time and O(k) space.
So if k is small (k < log N) or you don't have enough memory to spare, use the second approach; otherwise use the first one...
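A short NumPy sketch of the two pieces above, assuming a made-up point cloud and plane: the O(1)-per-point dot-product distance, and an average-O(N) top-k selection via np.argpartition as an alternative to a full sort.

import numpy as np

rng = np.random.default_rng(0)
p = rng.normal(size=(100_000, 3))   # assumed point cloud
n = np.array([0.0, 0.0, 1.0])       # hyperplane unit normal
p0 = np.array([0.0, 0.0, 0.5])      # any point on the hyperplane
k = 10
d = np.abs((p - p0) @ n)            # d[i] = |dot(p[i] - p0, n)|
idx = np.argpartition(d, k)[:k]     # k closest, average O(N), unordered
idx = idx[np.argsort(d[idx])]       # order the k winners if needed
print(p[idx])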

Finding 1-hop, 2-hop, ..., k-hop neighbors in Python using Networkx

I am trying to find the 1-hop, 2-hop, and, if needed, k-hop neighbors of some specific nodes (let's say l nodes) in a graph using nx.single_source_dijkstra_path_length.
What is the time complexity for each step (1-hop, 2-hop, ...), and
is there any faster algorithm?
If you are looking at an unweighted graph, you can use breadth-first search; for small k the time complexity should be on average O(<k>^k), where <k> is the average degree of the graph in question.
If you want to calculate multiple distances in a weighted graph, you should rather use multi_source_dijkstra_path_length. I am not sure what runtime this algorithm has, but it is probably an improvement over multiple runs of Dijkstra, which takes O(|V| log(|V|) + |E|) (depending on the implementation).
If you want to use a threshold on the maximal distance in a weighted graph, the runtime probably depends on the weight distribution of your edges and on the minimal or average edge weight, which determines how many nodes must be explored before the threshold is reached.
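A small sketch of both suggestions on a made-up graph: BFS with a hop cutoff for the unweighted case, and a single multi-source Dijkstra run for the weighted case.

import networkx as nx

G = nx.erdos_renyi_graph(100, 0.05, seed=1)
# Unweighted: all nodes within k hops of a source (BFS under the hood).
k = 2
hops = nx.single_source_shortest_path_length(G, source=0, cutoff=k)
k_hop = sorted(v for v, h in hops.items() if h == k)  # exactly k hops away
# Weighted: one Dijkstra run from several sources at once.
for u, v in G.edges:
    G[u][v]["weight"] = 1.0
dist = nx.multi_source_dijkstra_path_length(G, sources={0, 1, 2}, cutoff=3.0)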

Uniform spatial bins on surface of a sphere

Is there a spatial lookup grid or binning system that works on the surface of a (3D) sphere? I have the requirements that
The bins must be uniform (so you can look up in constant time whether a point exists within distance r of any spot on the sphere, for constant r.)†
The number of bins must be at most linear with the surface area of the sphere. (Alternatively, increasing the surface resolution of the grid shouldn’t make it grow faster than the area it maps.)
I’ve already considered
Spherical coordinates: not good because the cells created are extremely nonuniform making it useless for proximity testing.
Cube meshes: Less distortion than spherical coordinates, but still very difficult to determine which cells to search for a given query.
3D voxel binning: Wastes the entire interior volume of the sphere on empty bins that will never be used (as well as the empty bins in the 8 corner regions of the bounding cube). Space requirements grow as O(n sqrt(n)) with increasing sphere surface area.
kd-Trees: perform poorly in 3D and are technically logarithmic complexity, not constant per query.
My best idea for a solution involves using the 3D voxel binning method, but somehow excluding the voxels that the sphere will never intersect. However I have no idea how to determine which voxels to exclude, nor how to calculate an index into such a structure given a query location on the sphere.
† For what it’s worth the points have a minimum spacing so a good grid really would guarantee constant lookup.
My suggestion would be a variant of spherical coordinates, such that the polar angle is not sampled uniformly; instead its cosine (equivalently, the sine of the latitude) is sampled uniformly. This way, the element of area sin φ dφ dθ = -d(cos φ) dθ is kept constant, leading to tiles of the same area (though variable aspect ratio).
At the poles, merge all tiles into a single disk-like polygon.
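A sketch of the resulting binning, assuming a unit sphere: bands are uniform in z (the cosine of the polar angle), so every band, and hence every tile, covers the same area; the band and sector counts are arbitrary choices.

import math

N_BANDS, N_SECTORS = 32, 64

def sphere_bin(x, y, z):
    # Equal-area bands: the area between z1 and z2 on a unit sphere is
    # 2*pi*(z2 - z1), so slicing z uniformly gives equal-area bands.
    band = min(int((z + 1.0) / 2.0 * N_BANDS), N_BANDS - 1)
    lon = math.atan2(y, x)  # longitude in [-pi, pi]
    sector = min(int((lon + math.pi) / (2.0 * math.pi) * N_SECTORS), N_SECTORS - 1)
    return band, sector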
Another possibility is to project a regular icosahedron onto the sphere and to triangulate the spherical triangles so obtained. This takes a little spherical trigonometry.
I had a similar problem and used "sparse" 3D voxel binning. Basically, my spatial index is a hash map from (x, y, z) coordinates to bins.
Because I also had a minimum distance constraint on my points, I chose the bin size such that a bin can contain at most one point. This is accomplished if the edge of the (cubic) bins is at most d / sqrt(3), where d is the minimum separation of two points on the sphere. The advantage is that you can represent a full bin as a single point, and an empty bin can just be absent from the hash map.
My only query was for points within a radius d (the same d), which then requires scanning the surrounding 125 bins (a 5×5×5 cube). You could technically leave off the 8 corners to get this down to 117, but I didn't bother.
An alternative for the bin size is to optimize it for queries rather than storage size and simplicity, and choose it such that you always have to scan at most 27 bins (a 3×3×3 cube). That would require a bin edge length of d. I think (but haven't thought hard about it) that a bin could contain up to 4 points in that case. You could represent these with a fixed-size array to save one pointer indirection.
In either case, the memory usage of your spatial index will be O(n) for n points, so it doesn't get any better than that.
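A sketch of the sparse voxel hash described above, with an assumed minimum separation d: the bin edge is d/sqrt(3) so each bin holds at most one point, a dict serves as the hash map, and a radius-d query scans the surrounding 5x5x5 block.

import math

d = 0.05                    # assumed minimum point separation
EDGE = d / math.sqrt(3.0)   # bin edge: at most one point per bin
index = {}                  # (i, j, k) -> the single point in that bin

def bin_key(p):
    return tuple(int(math.floor(c / EDGE)) for c in p)

def insert(p):
    index[bin_key(p)] = p

def points_within_d(q):
    i0, j0, k0 = bin_key(q)
    hits = []
    for di in range(-2, 3):          # 5x5x5 = 125 bins
        for dj in range(-2, 3):
            for dk in range(-2, 3):
                p = index.get((i0 + di, j0 + dj, k0 + dk))
                if p is not None and math.dist(p, q) <= d:
                    hits.append(p)
    return hits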

Python: How to find the MAXIMUM spanning tree of a graph [duplicate]

Does the opposite of Kruskal's algorithm for minimum spanning tree work for it? I mean, choosing the max weight (edge) every step?
Any other idea to find maximum spanning tree?
Yes, it does.
One method for computing the maximum weight spanning tree of a network G (due to Kruskal) can be summarized as follows.
1. Sort the edges of G into decreasing order by weight. Let T be the set of edges comprising the maximum weight spanning tree; set T = ∅.
2. Add the first edge to T.
3. Add the next edge to T if and only if it does not form a cycle in T. If there are no remaining edges, exit and report G to be disconnected.
4. If T has n−1 edges (where n is the number of vertices in G), stop and output T. Otherwise go to step 3.
Source: https://web.archive.org/web/20141114045919/http://www.stats.ox.ac.uk/~konis/Rcourse/exercise1.pdf.
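A runnable sketch of these steps, using a small union-find for the cycle test; the example graph at the bottom is made up.

def max_spanning_tree(n, edges):
    # edges: list of (weight, u, v); vertices are 0..n-1.
    parent = list(range(n))
    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x
    T = []
    for w, u, v in sorted(edges, reverse=True):  # step 1: decreasing weight
        ru, rv = find(u), find(v)
        if ru != rv:               # steps 2-3: keep the edge iff no cycle
            parent[ru] = rv
            T.append((w, u, v))
            if len(T) == n - 1:    # step 4: done
                return T
    raise ValueError("G is disconnected")

print(max_spanning_tree(4, [(1, 0, 1), (5, 0, 2), (3, 1, 2), (4, 2, 3)]))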
From Maximum Spanning Tree at Wolfram MathWorld:
"A maximum spanning tree is a spanning tree of a weighted graph having maximum weight. It can be computed by negating the weights for each edge and applying Kruskal's algorithm (Pemmaraju and Skiena, 2003, p. 336)."
If you invert the weight on every edge and minimize, do you get the maximum spanning tree? If that works you can use the same algorithm. Zero weights will be a problem, of course.
Although this thread is quite old, I have another approach for finding the maximum spanning tree (MST) of a graph G = (V, E).
We can apply a variant of Prim's algorithm. For that I first have to state the cut property for the maximum-weight edge.
Cut property: Say at some point we have a set S containing the vertices that are already in the MST (for now, assume it has been computed somehow). Now consider the set V\S (the vertices not yet in the MST):
Claim: The edge from S to V\S with maximum weight is in every MST.
Proof: Say that at the point when we are adding vertices to our set S, the maximum-weight edge from S to V\S is e = (u, v), where u is in S and v is in V\S. Now consider an MST that does not contain e. Add the edge e to that MST; it creates a cycle. Traverse the cycle and find the vertices u' in S and v' in V\S such that u' is the last vertex in S after which the cycle enters V\S, i.e. e' = (u', v') is another edge of the cycle crossing from S to V\S.
Remove the edge e' = (u', v'); the resulting graph is still connected, but the weight of e is greater than that of e' (as e is the maximum-weight edge from S to V\S at this point), so this yields a spanning tree with a larger sum of weights than the original MST: a contradiction. This means that the edge e must be in every MST.
Algorithm to find MST:
S = {s}   // s is the start vertex
while S does not contain all vertices:
    among all edges e = (u, v) with u in S and v in V\S,
    pick one of maximum weight and add its endpoint v to S
Implementation:
We can implement this using a max-heap/priority queue where the key of a vertex is the maximum weight of an edge from a vertex in S to that vertex in V\S, and the value is the vertex itself. Adding a vertex to S amounts to an Extract_Max on the heap, and after every Extract_Max we update the keys of the vertices adjacent to the vertex just added.
So it takes m Change_Key operations and n Extract_Max operations.
Extract_Max and Change_Key can both be implemented in O(log n), where n is the number of vertices.
So this takes O(m log n) time, where m is the number of edges in the graph.
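A sketch of this Prim variant in Python, using heapq (a min-heap) with negated weights to emulate the max-heap, and lazy deletion in place of Change_Key; adj is an assumed adjacency map from each vertex to its (weight, neighbor) pairs.

import heapq

def prim_max(adj, start):
    in_tree = {start}
    heap = [(-w, start, v) for w, v in adj[start]]  # negate: max-heap
    heapq.heapify(heap)
    tree = []
    while heap and len(in_tree) < len(adj):
        neg_w, u, v = heapq.heappop(heap)  # heaviest edge leaving the tree
        if v in in_tree:
            continue                       # stale entry: lazy deletion
        in_tree.add(v)
        tree.append((u, v, -neg_w))
        for w, x in adj[v]:
            if x not in in_tree:
                heapq.heappush(heap, (-w, v, x))
    return tree

Lazy deletion keeps the heap code simple at the cost of up to m entries, which preserves the O(m log n) bound.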
Let me provide another algorithm:
First construct an arbitrary spanning tree (using BFS or DFS),
then pick an edge outside the tree, add it to the tree (it will form a cycle), and drop the smallest-weight edge in that cycle.
Continue doing this until all the remaining edges have been considered.
Thus, we'll get the maximum spanning tree.
This tree satisfies the property that for any edge outside the tree, adding it forms a cycle in which the outside edge's weight is <= every edge weight in the cycle.
In fact, this is a necessary and sufficient condition for a spanning tree to be a maximum spanning tree.
Proof:
Necessity: This is obvious; otherwise we could swap an edge to make a tree with a larger sum of edge weights.
Sufficiency: Suppose tree T1 satisfies this condition, and T2 is the maximum spanning tree.
Consider the edges of T1 ∪ T2: there are T1-only edges, T2-only edges, and T1 ∩ T2 edges. If we add a T1-only edge (x1, xk) to T2, we know it will form a cycle, and we claim that in this cycle there must exist a T2-only edge with the same weight as (x1, xk). Exchanging these two edges produces a tree with one more edge in common with T2 and the same sum of edge weights; repeating this, we eventually reach T2, so T1 is also a maximum spanning tree.
Proof of the claim:
Suppose it is not true. The cycle must contain a T2-only edge, since T1 is a tree. If none of the T2-only edges had a weight equal to that of (x1, xk), then each of the T2-only edges would form a cycle together with tree T1, and T1 would then contain a cycle itself, which is a contradiction.
This algorithm taken from UTD professor R. Chandrasekaran's notes. You can refer here: Single Commodity Multi-terminal Flows
Negating the weights of the original graph and computing the minimum spanning tree on the negated graph gives the right answer. Here is why: for the same spanning tree in both graphs, the weighted sum in one graph is the negation of the weighted sum in the other. So the minimum spanning tree of the negated graph is the maximum spanning tree of the original one.
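A sketch of the negation trick with NetworkX, on a made-up graph:

import networkx as nx

G = nx.Graph()
G.add_weighted_edges_from([(0, 1, 1.0), (0, 2, 5.0), (1, 2, 3.0), (2, 3, 4.0)])
H = G.copy()
for u, v in H.edges:
    H[u][v]["weight"] = -H[u][v]["weight"]   # negate every weight
mst = nx.minimum_spanning_tree(H)            # minimum on the negated graph
print(sorted((u, v, G[u][v]["weight"]) for u, v in mst.edges))
# NetworkX also provides nx.maximum_spanning_tree, which returns
# the same tree directly.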
Only reversing the sorting order and choosing the heaviest edge across a vertex cut does not guarantee a maximum spanning forest (Kruskal's algorithm generates a forest, not a tree). In case all edges have negative weights, the max spanning forest obtained from reversed Kruskal would still have negative total weight. However, the ideal answer there is a forest of disconnected vertices, i.e., a forest of |V| singleton trees, or |V| components, with a total weight of 0 (not the least negative).
Change the weights into reversed order (you can achieve this by negating each weight and adding a large constant, whose purpose is to ensure non-negativity), then run your favorite greedy minimum spanning tree algorithm.

find orthonormal basis for a planar 3D ( possibly degenerate) polygon

Given a general planar 3D polygon, is there a general way to find the orthonormal basis for that planar polygon?
The most straightforward way is to take the first 3 points of the polygon and form two vectors from them; after orthonormalizing, these are the two basis vectors we are looking for. But the problem with this approach is that the 3 points may lie on the same line of the polygon, and hence instead of two orthonormal vectors we get only one.
Another approach to finding the second basis vector is to loop through the polygon and find another point that forms a vector different from the first one, but this approach is susceptible to numerical errors (e.g., what if the second vector is almost parallel to the first? The numerical errors can be significant).
Is there any other better approach?
You can use the cross product of any two edge vectors between vertices. If the magnitude of the cross product is too small, you're in degenerate territory.
You can also take the centroid (the average of the points, which is guaranteed to lie in the same plane) and pick the largest of the cross products of any two vectors from the centroid to the vertices. This will give the most accurate normal. Please note that if even the largest cross product is small, you may have an inaccurate normal.
If you can't find any cross product that isn't close to 0, your original polygon is degenerate and a normal will be hard to find. You could use arbitrary-precision or adaptive-precision arithmetic in this case, but of course the round-off error is already significant in the source data, so this may not help. If possible, remove degenerate polygons first, and if you have to, sew the mesh back up :).
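A sketch of the centroid variant, using consecutive centroid-to-vertex vectors for brevity (the answer allows any pair); the quad below is a made-up example and the degeneracy threshold is an arbitrary choice.

import numpy as np

def polygon_normal(poly, eps=1e-12):
    c = poly.mean(axis=0)              # centroid lies in the polygon's plane
    rays = poly - c
    best, best_len = None, 0.0
    m = len(rays)
    for i in range(m):                 # consecutive pairs for brevity
        n = np.cross(rays[i], rays[(i + 1) % m])
        ln = np.linalg.norm(n)
        if ln > best_len:
            best, best_len = n, ln
    if best_len < eps:                 # every cross product ~ 0: degenerate
        return None
    return best / best_len

quad = np.array([[0, 0, 0], [1, 0, 0], [1, 1, 0], [0, 1, 0]], float)
print(polygon_normal(quad))            # -> [0. 0. 1.]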
It's a bit OTT, but one way would be to compute the covariance matrix of the points and then diagonalise it. If the points are indeed planar, then one of the eigenvalues of the covariance matrix will be zero (or rather very small, due to finite-precision arithmetic) and the corresponding eigenvector will be a normal to the plane; the other two eigenvectors will span the plane of the polygon.
If you have N points, and the i'th coordinate of the k'th point is p[k,i], then the mean (vector) and (3x3) covariance matrix can be computed by
m[i] = Sum{ k | p[k,i]}/N (i=1..3)
C[i,j] = Sum{ k | (p[k,i]-m[i])*(p[k,j]-m[j]) }/N (i,j=1..3)
Note that C is symmetric, so to find out how to diagonalise it you might want to look up the "symmetric eigenvalue problem".
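A sketch of this with NumPy, on a made-up planar quad: np.linalg.eigh handles the symmetric eigenvalue problem and returns eigenvalues in ascending order, so the first eigenvector is the normal and the other two span the plane.

import numpy as np

pts = np.array([[0, 0, 1], [2, 0, 1], [2, 2, 1], [0, 2, 1]], float)
m = pts.mean(axis=0)                    # m[i] = Sum{ k | p[k,i] } / N
C = (pts - m).T @ (pts - m) / len(pts)  # 3x3 covariance matrix
vals, vecs = np.linalg.eigh(C)          # symmetric eigenvalue problem
normal = vecs[:, 0]                     # eigenvector of smallest eigenvalue
basis = vecs[:, 1], vecs[:, 2]          # orthonormal basis of the plane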
