Python: How to find the MAXIMUM spanning tree of a graph [duplicate] - python-3.x

Does the opposite of Kruskal's algorithm for minimum spanning tree work for it? I mean, choosing the maximum-weight edge at every step?
Any other ideas for finding a maximum spanning tree?

Yes, it does.
One method for computing the maximum weight spanning tree of a network G, due to Kruskal, can be summarized as follows.
1. Sort the edges of G into decreasing order by weight. Let T be the set of edges comprising the maximum weight spanning tree. Set T = ∅.
2. Add the first edge to T.
3. Add the next edge to T if and only if it does not form a cycle in T. If there are no remaining edges, exit and report G to be disconnected.
4. If T has n−1 edges (where n is the number of vertices in G), stop and output T. Otherwise go to step 3.
Source: https://web.archive.org/web/20141114045919/http://www.stats.ox.ac.uk/~konis/Rcourse/exercise1.pdf.
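For concreteness, here is a minimal Python sketch of the steps above, using a disjoint-set (union-find) structure for the cycle test. The function name and the (weight, u, v) edge format are my own choices for illustration:

def maximum_spanning_tree(n, edges):
    # edges: list of (weight, u, v) tuples over vertices 0..n-1.
    parent = list(range(n))

    def find(x):
        # Union-find "find" with path halving.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    tree = []
    for w, u, v in sorted(edges, reverse=True):  # step 1: decreasing weight
        ru, rv = find(u), find(v)
        if ru != rv:                             # step 3: no cycle is formed
            parent[ru] = rv
            tree.append((w, u, v))
            if len(tree) == n - 1:               # step 4: n-1 edges found
                return tree
    raise ValueError("G is disconnected")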

From Maximum Spanning Tree at Wolfram MathWorld:
"A maximum spanning tree is a spanning tree of a weighted graph having maximum weight. It can be computed by negating the weights for each edge and applying Kruskal's algorithm (Pemmaraju and Skiena, 2003, p. 336)."

If you invert the weight on every edge and minimize, do you get the maximum spanning tree? If so, you can use the same algorithm. Zero weights will be a problem, of course, if inverting means taking reciprocals; note also that reciprocals do not reverse the ordering of total tree weights in general, whereas negating every weight does.

Although this thread is too old, I have another approach for finding the maximum spanning tree (MST) in a graph G=(V,E).
We can apply a variant of Prim's algorithm for finding the MST. For that I have to define the cut property for the maximum-weight edge.
Cut property: Suppose at some point we have a set S containing the vertices that are already in the MST (for now, assume it has been computed somehow). Now consider the set V∖S (the vertices not yet in the MST):
Claim: the maximum-weight edge from S to V∖S is in every MST (assuming edge weights are distinct).
Proof: Say that at the point when we are adding vertices to our set S, the maximum-weight edge from S to V∖S is e=(u,v), where u is in S and v is in V∖S. Now consider an MST that does not contain e, and add the edge e to it. This creates a cycle, and since the cycle crosses the cut between S and V∖S at least twice, it must contain another edge e'=(u',v') with u' in S and v' in V∖S.
Remove the edge e'. The resulting graph is still a spanning tree, but the weight of e is greater than the weight of e' (as e is the maximum-weight edge from S to V∖S at this point), so this produces a spanning tree whose total weight is greater than that of the original MST. This is a contradiction, so the edge e must be in every MST.
Algorithm to find the MST:

Start from S = {s}   // s is the start vertex
while S does not contain all vertices
do
{
    pick the maximum-weight edge e = (u, v) with u in S and v in V∖S
    add v to S (and e to the MST)
}
end while
Implementation:
We can implement this using a max-heap/priority queue in which the key of a vertex v in V∖S is the maximum weight of an edge from a vertex in S to v, and the value is the vertex itself. Adding a vertex to S amounts to an Extract_Max on the heap, and after every Extract_Max we update the keys of the vertices adjacent to the vertex just added.
So it takes m Change_Key operations and n Extract_Max operations.
Extract_Max and Change_Key can both be implemented in O(log n), where n is the number of vertices.
So this takes O(m log n) time, where m is the number of edges in the graph.
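Here is a short Python sketch of this approach. Python's heapq is a min-heap with no Change_Key, so this version pushes negated weights and uses lazy deletion instead (stale entries are skipped when popped), which keeps the same O(m log n) bound up to constants; the adjacency-dict format is my own choice for illustration:

import heapq

def prim_maximum_spanning_tree(graph, start):
    # graph: {u: [(v, weight), ...]} adjacency lists; returns tree edges.
    in_tree = {start}
    tree = []
    # Negate weights so popping the heap minimum yields the maximum edge.
    heap = [(-w, start, v) for v, w in graph[start]]
    heapq.heapify(heap)
    while heap and len(in_tree) < len(graph):
        neg_w, u, v = heapq.heappop(heap)   # Extract_Max
        if v in in_tree:
            continue                        # stale entry: lazy deletion
        in_tree.add(v)
        tree.append((u, v, -neg_w))
        for x, w in graph[v]:
            if x not in in_tree:
                heapq.heappush(heap, (-w, v, x))
    return tree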

Let me provide an improvement algorithm:
First construct an arbitrary spanning tree (using BFS or DFS).
Then pick an edge outside the tree and add it to the tree; it will form a cycle. Drop the smallest-weight edge in that cycle.
Continue doing this until all the remaining edges have been considered.
Thus, we'll get the maximum spanning tree.
This tree satisfies the property that for any edge outside the tree, adding it will form a cycle, and the weight of the added edge is <= the weight of every edge in that cycle.
In fact, this is a necessary and sufficient condition for a spanning tree to be a maximum spanning tree.
Proof:
Necessity: this is obvious, for otherwise we could swap an edge into the tree to obtain a tree with a larger sum of edge weights.
Sufficiency: Suppose tree T1 satisfies this condition, and T2 is a maximum spanning tree.
Among the edges of T1 ∪ T2 there are T1-only edges, T2-only edges, and shared T1 ∩ T2 edges. If we add a T1-only edge (x1, xk) to T2, we know it will form a cycle, and we claim that in this cycle there must exist a T2-only edge with the same weight as (x1, xk). Exchanging these two edges then produces a tree with one more edge in common with T1 and the same total weight; repeating this, we eventually turn T2 into T1, so T1 is also a maximum spanning tree.
Proof of the claim:
Suppose it is not true. The cycle must contain at least one T2-only edge, since T1 is a tree. If none of the T2-only edges had a weight equal to that of (x1, xk), then each of these T2-only edges would form a cycle with tree T1, and T1 would then contain a cycle itself, which is a contradiction.
This algorithm is taken from UTD professor R. Chandrasekaran's notes. You can refer to them here: Single Commodity Multi-terminal Flows
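A Python sketch of this procedure, under the assumption that the graph is given as a list of (u, v, w) edges over vertices 0..n-1 and is connected; the helper names are made up, and the tree-path lookup is a plain BFS rather than anything clever:

from collections import deque

def improve_to_maximum(n, edges):
    # Build adjacency lists for the whole graph.
    adj = {u: [] for u in range(n)}
    for u, v, w in edges:
        adj[u].append(v)
        adj[v].append(u)
    weight = {frozenset((u, v)): w for u, v, w in edges}

    # Step 1: an arbitrary spanning tree via BFS from vertex 0.
    tree, seen, q = set(), {0}, deque([0])
    while q:
        u = q.popleft()
        for v in adj[u]:
            if v not in seen:
                seen.add(v)
                tree.add(frozenset((u, v)))
                q.append(v)

    def tree_path(a, b):
        # Edges on the unique a-b tree path, by BFS over tree edges only.
        prev = {a: None}
        q = deque([a])
        while q:
            u = q.popleft()
            for v in adj[u]:
                if frozenset((u, v)) in tree and v not in prev:
                    prev[v] = u
                    q.append(v)
        path = []
        while b != a:
            path.append(frozenset((b, prev[b])))
            b = prev[b]
        return path

    # Step 2: for each non-tree edge, add it and drop the lightest edge
    # of the cycle it creates (which may be the new edge itself).
    for u, v, w in edges:
        e = frozenset((u, v))
        if e not in tree:
            lightest = min(tree_path(u, v), key=weight.get)
            if weight[lightest] < w:
                tree.remove(lightest)
                tree.add(e)
    return tree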

Negating the weights of the original graph and computing a minimum spanning tree on the negated graph gives the right answer. Here is why: for the same spanning tree in both graphs, the weighted sum in one graph is the negation of the sum in the other. So the minimum spanning tree of the negated graph is the maximum spanning tree of the original one.
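For example, assuming you have networkx available, the negation trick is only a few lines, and recent versions also expose a maximum spanning tree helper directly:

import networkx as nx

G = nx.Graph()
G.add_weighted_edges_from([(0, 1, 3), (0, 2, 5), (1, 2, 4)])

# Negate every weight and take the minimum spanning tree of the result.
H = nx.Graph()
H.add_weighted_edges_from((u, v, -w) for u, v, w in G.edges(data="weight"))
T = nx.minimum_spanning_tree(H)
print(sorted(T.edges()))    # [(0, 2), (1, 2)]: the maximum spanning tree

# networkx also provides this directly:
T2 = nx.maximum_spanning_tree(G)
print(sorted(T2.edges()))   # same edges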

Only reversing the sorting order and choosing a heavy edge across a vertex cut does not guarantee a maximum spanning forest (Kruskal's algorithm generates a forest, not necessarily a tree). In case all edges have negative weights, the maximum spanning forest obtained from reversed Kruskal would still have negative weight. However, the ideal answer is a forest of disconnected vertices, i.e., a forest of |V| singleton trees, with a total weight of 0 (not the least negative).

Reverse the order of the weights (you can achieve this by negating each weight and adding a large constant, whose purpose is to keep the results non-negative), then run your favorite greedy minimum spanning tree algorithm.

Related

Given a graph, is it possible to build a trapezoidal map in linear time?

[This is regarding computational geometry in CS]
Let's say that I have a graph G which contains v vertices and e edges, for instance a Voronoi diagram VD(G).
I'd like to build a trapezoidal map out of my given graph.
Is it possible to build a trapezoidal map in linear time for a given graph, instead of the regular O(n log n) construction time?
I have been thinking about a sweep-line trapezoidal map construction, which for each edge during the sweep would construct the upper and lower sites.
Thanks in advance
No: the graph may consist of v/2 horizontal segments stacked on top of each other. Building the trapezoidal map means you sort these segments by height, and that takes at least c·v·log v time.

Calculating the Held Karp Lower bound For The Traveling Salesman(TSP)

I am currently researching the traveling salesman problem, and was wondering if anyone would be able to simply explain the Held-Karp lower bound. I have been looking at a lot of papers and I am struggling to understand it. If someone could simply explain it, that would be great.
I also know there is the method of calculating a minimum spanning tree of the vertices not including the starting vertex and then adding the two minimum edges from the starting vertex.
I'll try to explain this without going in too much details. I'll avoid formal proofs and I'll try to avoid technical jargon. However, you might want to go over everything again once you have read everything through. And most importantly; try the algorithms out yourself.
Introduction
A 1-tree is a subgraph that consists of a spanning tree plus one extra vertex attached to it with 2 edges (so, despite the name, it contains exactly one cycle). You can check for yourself that every TSP tour is a 1-tree.
There is also such a thing as a minimum-1-Tree of a graph. That is the resulting tree when you follow this algorithm:
Exclude a vertex from your graph
Calculate the minimum spanning tree of the resulting graph
Attach the excluded vertex with its 2 smallest edges to the minimum spanning tree
*For now I'll assume that you know that a minimum-1-tree is a lower bound for the optimal TSP tour. There is an informal proof at the end.
You will find that the resulting tree is different when you exclude different vertices. However, all of the resulting trees can be considered lower bounds for the optimal tour in the TSP. Therefore the largest of the minimum-1-trees you have found this way is a better lower bound than the others found this way.
Held-Karp lower bound
The Held-Karp lower bound is an even tighter lower bound.
The idea is that you can alter the original graph in a special way. This modified graph will generate different minimum-1-trees than the original.
Furthermore (and this is important so I'll repeat it throughout this paragraph with different words), the modification is such that the length of all the valid TSP tours are modified by the same (known) constant. In other words, the length of a valid TSP solution in this new graph = the length of a valid solution in the original graph plus a known constant. For example: say the weight of the TSP tour visiting vertices A, B, C and D in that order in the original graph = 10. Then the weight of the TSP tour visiting the same vertices in the same order in the modified graph = 10 + a known constant.
This, of course, is true for the optimal TSP tour as well. Therefore the optimal TSP tour in the modified graph is also an optimal tour in the original graph. And a minimum-1-tree of the modified graph is a lower bound for the optimal tour in the modified graph. Again, I'll just assume you understand that this generates a lower bound for your modified graph's optimal TSP tour. By subtracting another known constant from the found lower bound of your modified graph, you have a new lower bound for your original graph.
There are infinitely many such modifications to your graph. These different modifications result in different lower bounds. The tightest of these lower bounds is the Held-Karp lower bound.
How to modify your graph
Now that I have explained what the Held-Karp lower bound is, I will show you how to modify your graph to generate different minimum-1-trees.
Use the following algorithm:
Give every vertex in your graph an arbitrary weight
update the weight of every edge as follows: new edge weight = edge weight + starting vertex weight + ending vertex weight
For example, your original graph has the vertices A, B and C with edge AB = 3, edge AC = 5 and edge BC = 4. And for the algorithm you assign the (arbitrary) weights to the vertices A: 30, B: 40, C:50 then the resulting weights of the edges in your modified graph are AB = 3 + 30 + 40 = 73, AC = 5 + 30 + 50 = 85 and BC = 4 + 40 + 50 = 94.
The known constant for the modification is twice the sum of the weights given to the vertices. In this example the known constant is 2 * (30 + 40 + 50) = 240. Note: the tours in the modified graph are thus equal to the original tours + 240. In this example there is only one tour namely ABC. The tour in the original graph has a length of 3 + 4 + 5 = 12. The tour in the modified graph has a length of 73 + 85 + 94 = 252, which is indeed 240 + 12.
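A quick Python check of the arithmetic in this example (since the graph is a triangle, the only tour uses every edge, so the tour weight is just the sum of all edge weights):

# Edge weights of the original graph and the arbitrary vertex weights.
w = {("A", "B"): 3, ("A", "C"): 5, ("B", "C"): 4}
pi = {"A": 30, "B": 40, "C": 50}

# Modified weights: each edge gains the weights of both endpoints.
modified = {(u, v): wt + pi[u] + pi[v] for (u, v), wt in w.items()}
assert modified == {("A", "B"): 73, ("A", "C"): 85, ("B", "C"): 94}

# Tour length in the modified graph = original length + 2 * sum(pi).
assert sum(modified.values()) == sum(w.values()) + 2 * sum(pi.values())  # 252 == 12 + 240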
The reason why the constant equals twice the sum of the weights given to the vertices is because every vertex in a TSP tour has degree 2.
You will need another known constant: the constant you subtract from your minimum-1-tree to get a lower bound. This one depends on the degrees of the vertices in the minimum-1-tree you found. You multiply the weight you have given each vertex by the degree of that vertex in the minimum-1-tree, and add it all up. For example, if you have given the weights A: 30, B: 40, C: 50, D: 60 and in your minimum-1-tree vertex A has degree 1, vertices B and C have degree 2, and vertex D has degree 3, then your constant to subtract to get a lower bound is 1 * 30 + 2 * 40 + 2 * 50 + 3 * 60 = 390. (When every vertex has degree 2, this equals twice the sum of the vertex weights, matching the constant above; subtracting twice the sum of the vertex weights always yields a valid lower bound.)
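Putting the pieces together, here is a sketch of one evaluation of this bound in Python, using networkx for the spanning tree. The function name, the dict-based pi penalties, and the choice of excluded vertex are all illustrative assumptions, and the subtracted constant is the twice-the-sum-of-vertex-weights one described above:

import networkx as nx

def one_tree_lower_bound(G, pi, excluded):
    # G: networkx Graph with 'weight' edge attributes.
    # pi: dict mapping each vertex to its (arbitrary) vertex weight.
    # Modified edge weights: w'(u, v) = w(u, v) + pi[u] + pi[v].
    H = nx.Graph()
    for u, v, w in G.edges(data="weight"):
        H.add_edge(u, v, weight=w + pi[u] + pi[v])

    # Minimum spanning tree of the modified graph minus the excluded vertex.
    T = nx.minimum_spanning_tree(H.subgraph(n for n in H if n != excluded))
    mst_weight = T.size(weight="weight")

    # Attach the excluded vertex by its two cheapest modified edges.
    cheapest = sorted(d["weight"] for _, _, d in H.edges(excluded, data=True))[:2]
    one_tree_weight = mst_weight + sum(cheapest)

    # Subtract the known constant to return to the original graph's scale.
    return one_tree_weight - 2 * sum(pi.values())

Maximizing this value over the choice of pi (and of the excluded vertex) approaches the Held-Karp lower bound; a common heuristic is to repeatedly increase pi[v] for vertices of degree greater than 2 in the current 1-tree and decrease it for vertices of degree 1, nudging the 1-tree toward a tour.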
How to find the Held-Karp lower bound
Now I believe there is one more question unanswered: how do I find the best modification to my graph, so that I get the tightest lower bound (and thus the Held-Karp lower bound)?
Well, that's the hard part. Without delving too deep: there are iterative ways (known as subgradient optimization) to get closer and closer to the Held-Karp lower bound. Basically, one keeps modifying the graph so that the degrees of all vertices get closer and closer to 2, and thus closer and closer to a real tour.
Minimum-1-tree is a lower bound
As promised I would give an informal proof that a minimum-1-tree is a lower bound for the optimal TSP solution. A minimum-1-Tree is made of two parts: a minimum-spanning-tree and a vertex attached to it with 2 edges. A TSP tour must go through the vertex attached to the minimum spanning tree. The shortest way to do so is through the attached edges. The tour must also visit all the vertices in the minimum spanning tree. That minimum spanning tree is a lower bound for the optimal TSP for the graph excluding the attached vertex. Combining these two facts one can conclude that the minimum-1-tree is a lower bound for the optimal TSP tour.
Conclusion
When you modify a graph in a certain way, the minimum-1-tree of the modified graph gives you a lower bound. The best possible lower bound obtainable through these means is the Held-Karp lower bound.
I hope this answers your question.
Links
For a more formal approach and additional information I recommend the following links:
ieor.berkeley.edu/~kaminsky/ieor251/notes/3-16-05.pdf
http://www.sciencedirect.com/science/article/pii/S0377221796002147

KD Tree alternative/variant for weighted data

I'm using a static KD-Tree for nearest neighbor search in 3D space. However, the client's specifications have now changed so that I'll need a weighted nearest neighbor search instead. For example, in 1D space, I have a point A with weight 5 at 0, and a point B with weight 2 at 4; the search should return A if the query point is from -5 to 5, and should return B if the query point is from 5 to 6. In other words, the higher-weighted point takes precedence within its radius.
Google hasn't been any help - all I get is information on the K-nearest neighbors algorithm.
I can simply remove points that are completely subsumed by a higher-weighted point, but this generally isn't the case (usually a lower-weighted point is only partially subsumed, like in the 1D example above). I could use a range tree to query all points in an NxNxN cube centered on the query point and determine the one with the greatest weight, but the naive implementation of this is wasteful - I'll need to set N to the point with the maximum weight in the entire tree, even though there may not be a point with that weight within the cube, e.g. let's say the point with the maximum weight in the tree is 25, then I'll need to set N to 25 even though the point with the highest weight for any given cube probably has a much lower weight; in the 1D case, if I have a point located at 100 with weight 25 then my naive algorithm would need to set N to 25 even if I'm outside of the point's radius.
To sum up, I'm looking for a way that I can query the KD tree (or some alternative/variant) such that I can quickly determine the highest-weighted point whose radius covers the query point.
FWIW, I'm coding this in Java.
It would also be nice if I could dynamically change a point's weight without incurring too high of a cost - at present this isn't a requirement, but I'm expecting that it may be a requirement down the road.
Edit: I found a paper on a priority range tree, but this doesn't exactly address the same problem in that it doesn't account for higher-priority points having a greater radius.
Use an extra dimension for the weight. A point (x,y,z) with weight w is placed at (N-w,x,y,z), where N is the maximum weight.
Distances in 4D are defined by…
d((a, b, c, d), (e, f, g, h)) = |a - e| + d((b, c, d), (f, g, h))
…where the second d is whatever your 3D distance was.
To find all potential results for (x,y,z), query a ball of radius N about (0,x,y,z).
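A brute-force Python sketch of this lifting, just to make the trick concrete (a real implementation would store the lifted 4D points in a KD-tree instead of a list; the names here are made up):

def dist3(p, q):
    # Substitute whatever 3D distance you are actually using.
    return sum(abs(a - b) for a, b in zip(p, q))

def lift(points, N):
    # points: list of ((x, y, z), weight); lifted coordinate is (N - w, x, y, z).
    return [((N - w,) + p, p, w) for p, w in points]

def covering_point_with_max_weight(lifted, q, N):
    # The 4D distance from (0, qx, qy, qz) to a lifted point is
    # (N - w) + dist3(p, q), which is <= N exactly when dist3(p, q) <= w,
    # i.e. when the weighted point's radius covers the query point q.
    best = None
    for lifted_p, p, w in lifted:
        if lifted_p[0] + dist3(p, q) <= N:
            if best is None or w > best[1]:
                best = (p, w)
    return best

# The 1D example from the question, padded to 3D: A has weight 5 at 0,
# B has weight 2 at 4. A query at 3 is covered by both; A's weight wins.
pts = [((0, 0, 0), 5), ((4, 0, 0), 2)]
L = lift(pts, N=5)
print(covering_point_with_max_weight(L, (3, 0, 0), 5))  # ((0, 0, 0), 5)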
I think I've found a solution: the nested interval tree, which is an implementation of a 3D interval tree. Rather than storing points with an associated radius that I then need to query, I instead store and query the radii directly. This has the added benefit that each dimension does not need to have the same weight (so that the radius is a rectangular box instead of a cubic box), which is not presently a project requirement but may become one in the future (the client only recently added the "weighted points" requirement, who knows what else he'll come up with).

Place rectangle maximizing rectangle intersections [closed]

I have this problem: given a set of rectangles {R1, R2, ..., Rn} and a new rectangle Rq, find where to put Rq so that it intersects (it does not matter with how much area) the maximum number of rectangles in the set. I'm searching for a simple solution not involving overly complex data structures; however, any working answer will be very appreciated, thanks!
Here is a O(n^2) or O(n (log n + m)) (average case only) complexity algorithm (where m is the maximum number of intersecting rectangles). See O(n log n) method later on.
The first idea, is to consider that there are n places to look for solutions. That is, for each rectangle Ri, the case where Ri is the right-most rectangle that intersects Rq is a candidate solution.
You will scan from left to right, adding the rectangles in order of minimum x-coordinate to a buffer container and also to a priority queue (keyed on maximum x-coordinate). As you add each rectangle, also check which rectangles need to be removed (based on the priority queue) so that every rectangle remaining in the buffer could be intersected by Rq along the x-axis, ignoring the y-axis for now.
Before adding each rectangle Ri to the buffer, you can consider how many rectangles in the buffer can be intersected if you require Rq to intersect Ri (with Ri being the right-most intersecting rectangle with Rq). This can be done in linear time.
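The y-axis part of that check reduces to interval stabbing: expand each buffer rectangle's y-interval by Rq's height, then find a point covered by the most intervals. A small Python sketch of that subroutine (standalone, so it sorts and is O(k log k) rather than the linear time available with a pre-sorted buffer; the helper name is made up):

def max_stabbed(intervals):
    # intervals: list of (lo, hi) closed intervals.
    # Returns the maximum number of intervals sharing a single point.
    events = []
    for lo, hi in intervals:
        events.append((lo, 1))    # interval opens
        events.append((hi, -1))   # interval closes
    best = current = 0
    # At equal coordinates, process openings before closings so that
    # intervals touching at a point count as overlapping.
    for _, delta in sorted(events, key=lambda e: (e[0], -e[1])):
        current += delta
        best = max(best, current)
    return best

# Rq of height h can hit a rectangle with y-range (lo, hi) when Rq's
# bottom edge lies in (lo - h, hi), so expand the intervals before calling:
print(max_stabbed([(-1, 2), (2, 5), (7, 10)]))  # -> 2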
Thus overall complexity is O(n^2).
The runtime of the algorithm can be improved by using an interval tree for the rectangles stored in the buffer. Since Wikipedia gives a good amount of information on this and how it may be implemented, I will not explain how it works here, particularly because balancing it can be quite complex. This improves the average complexity to O(n (log n + m)), where m is the answer, the maximum number of intersecting rectangles (which could still be as large as n in the worst case). This is due to the algorithmic complexity of querying an interval tree (O(log n + m) time per query). The worst case is still O(n^2), because the number of results m returned by an interval tree query may be up to O(n), even if the final answer is not on the order of n.
Here follows a method with O(n log n) time, but it is quite complex.
Instead of storing rectangles as a sorted array, self balancing binary search tree (based on only minimum y), or an interval tree, the rectangles can also be stored in a custom data structure.
Consider the self balancing binary search tree (based on maximum y). We can add some metadata to this. Notably, since we try to maximize the rectangles intersected in the buffer, and keep looking for how the maximum changes as the buffer scans from left to right, we can consider the number of rectangles intersected if each rectangle in the buffer is the bottom-most rectangle.
If we store that number for each rectangle, we might be able to query it faster.
The solution is to store it in a different way, to reduce update costs. A binary search tree has a number of edges. Suppose each node should know how many rectangles can be intersected if that node is the bottom-most rectangle to be intersected. Instead, we can store the change in the number of rectangles intersected by the nodes on either side of each edge (call this the diff). For example, for the following tree (with min & max y of the rectangles):
        (3,4)
       /     \
   (2,3)     (5,6)
For Rq with a height of 2, if (2,3) is bottom-most, we can intersect 2 rectangles in total, for (3,4), it is 2, and for (5,6), it is 1.
Thus, the edges' diff can be 0 and -1, representing the change in the answer from child to parent.
Using these diffs, we can quickly find the highest number if each node also stores the maximum sum of a subpath in its subtree (call this maxdiff). For the leaves it is 0, but for (3,4), the maximum of 0 and -1 is 0; thus, we can store 0 as the maximum sum of the edges of a subpath in the root node.
We therefore know that the optimal answer is 0 more than the answer of the root node. We can find the answer of the root node by keeping more metadata.
However, all this metadata can be queried and updated in log n time. This is because when we add a rectangle, adding it to the self balancing binary search tree takes log n time, as we may need to do up to log n rotations. When we do a rotation, we may need to update the weights on the edges, as well as maxdiff on the affected nodes - however this is O(1) for each rotation.
In addition, the effect of the extra rectangle on the diff of the edges needs to be updated. However, at most O(log n) edges must be updated, specifically, if we search the tree by the min y and max y of the new rectangle, we will find all the edges that need to be updated (although whether an edge that is traversed must be updated depends on whether we took the left or right child - some careful working is needed here to determine the exact rule).
Deleting a rectangle is just the reverse, so takes O(log n) time.
In this way, we can update the buffer in O(log n) time, and query it in O(log n) time, for an overall algorithmic complexity of O(n log n).
Note: to find the maximum rectangles intersected in the buffer, we use maxdiff, which is the difference in answer between optimal, and that of the root node. To find the actual answer, we must find the difference between the answer of the lowest ordered rectangle in the binary search tree, and the root node. This is easily done in O(log n) time, by going down the left-most path, and summing up diff on the edges. Finally, find the answer on the lowest ordered rectangle. This is done by storing a 2nd self balancing binary search tree, ordered by minimum y. We can use this to find the number of rectangles intersected if the bottom-most rectangle is the lowest ordered rectangle when sorting by maximum y.
Update of this extra BST and querying also take O(log n) time, so this extra work does not change the overall complexity.

Does an efficient algorithm exist to determine the points of intersection between the edges of two possibly non-convex polygons?

Here's the task I'm trying to solve:
Given a polygon A with N vertices and a polygon B with M vertices, find all intersections between a segment in A and a segment in B.
Both A and B may be non-convex.
So far, I have implemented the obvious solution (check every edge in A against every edge in B, O(M*N)).
Now, for certain polygons it is in fact possible that there are (almost) M*N intersections, so the worst case for any such algorithm is O(M*N).
My question is:
Does there exist an algorithm for determining the points of intersection between two non-convex polygons whose average-case complexity is lower than O(N*M)?
If yes, then please give me the name of the algorithm; if no, a resource that proves it to be impossible.
Excerpt from a paper on the Greiner-Hormann (PDF) polygon clipping algorithm:
... if we have a polygon with n edges and another with m edges, the number of intersections can be nm in the worst case. So the average number of intersections grows on the order of O(nm).
There is a well-known result in computational geometry based on the plane sweep algorithm, which says that if there are N line segments generating k intersections, then these intersections can be reported in time O((N+k) log(N)) [7]. Note that this relation yields an even worse complexity in the worst case.
I believe N in the second paragraph is m + n from the first paragraph. The average time depends on the average value of k, the number of intersections. If there are only a handful of intersections the time goes to O(N log N).
The reference to the "well-known" result is:
[7] F. P. Preparata and M. I. Shamos. Computational Geometry: An Introduction. Texts and Monographs in Computer Science. Springer, New York, 1985.
Here's a paper on the line sweep algorithm.
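If you want the intersection points in practice rather than worst-case guarantees, a geometry library may already be enough. A sketch assuming shapely is available (the boundary-intersection call returns the points, and possibly overlapping segments, where the two polygon outlines meet):

from shapely.geometry import Polygon

# Two (possibly non-convex) polygons given by their vertex lists.
a = Polygon([(0, 0), (4, 0), (4, 4), (2, 1), (0, 4)])   # non-convex
b = Polygon([(1, -1), (3, -1), (3, 5), (1, 5)])

# Intersection of the two boundaries: the edge-edge crossing points.
crossings = a.boundary.intersection(b.boundary)
if hasattr(crossings, "geoms"):   # multi-part result
    for g in crossings.geoms:
        print(g)
else:
    print(crossings)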
