Rope data structure: is a node's weight the length of its string plus the weight of the left subtree, or of both left and right subtrees?

Wikipedia entry says:
Each node has a "weight" equal to the length of its string plus the sum of all the weights in its left subtree. Thus a node with two children divides the whole string into two parts: the left subtree stores the first part of the string. The right subtree stores the second part and its weight is the sum of the two parts.
I'm a bit confused: it first says that a node's weight is the length of its string plus the sum of all the weights in its left subtree. Then it says that if a node has two children (and thus a left and a right subtree), its weight is the sum of both parts, not just of the left subtree. Looking at the diagram makes sense (the 9 directly below the 22 is a 9 and not larger because the right child/subtree of the 7 does not contribute to the weight), but the phrasing seems off to me. Or am I misunderstanding something?

Yeah, the phrasing is off. The "weight" is the partition point, so it only counts the left substring (or the node's own string, if that's what it holds instead).
You don't need to store a node's total length, but modifying the rope then requires notifying all ancestors of the changed node (which is O(log n), so that's OK).
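To make the convention concrete, here is a minimal Python sketch (mine, not from the Wikipedia article) of a rope node whose weight is the partition point, together with the index lookup that relies on it:

    class RopeNode:
        def __init__(self, left=None, right=None, text=""):
            self.left = left
            self.right = right
            self.text = text  # only leaves carry text in this sketch
            # weight = the partition point: the length of the left subtree's
            # string for internal nodes, or the leaf's own string length
            self.weight = len(text) if left is None else left.total_length()

        def total_length(self):
            if self.left is None:  # leaf
                return len(self.text)
            return self.weight + (self.right.total_length() if self.right else 0)

        def index(self, i):
            """Character at position i of the whole rope."""
            if self.left is None:  # leaf
                return self.text[i]
            if i < self.weight:  # position falls in the left part
                return self.left.index(i)
            return self.right.index(i - self.weight)  # skip the left part

    # The root's weight is 6 (the length of "Hello "), not the total 11:
    root = RopeNode(left=RopeNode(text="Hello "), right=RopeNode(text="world"))
    assert root.weight == 6 and root.total_length() == 11
    assert root.index(7) == "o"  # 7 - 6 = 1, i.e. "world"[1]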

Related

Kademlia: the relationship between distance and the height of the smallest subtree

I was reading Kademlia's paper and ran into a statement I couldn't understand.
In a fully-populated binary tree of 160-bit IDs, the magnitude of the distance between two IDs is the height of the smallest subtree containing them both.
d(101, 010) = 5 XOR 2 = 7,
but the height of the lowest common ancestor is 4 (counting from one) or 3 (counting from zero at the root).
The distance 7 is obviously not a height, so I must have misunderstood something. How should I interpret this sentence?
I am looking forward to your reply, thank you.
From "Pseudo Reliable Broadcast in the Kademlia P2P System":
Kademlia, in turn, organizes its nodes to a binary tree. (For an in-depth discussion of the internal mechanisms of Kademlia, please refer to [2].) Distance between nodes is calculated using the XOR (exclusive or) function, which essentially captures the idea of the binary tree topology. For any nodes A and B, the magnitude of their distance d(A,B) = A ⊕ B, e.g. the most significant nonzero bit of d, is the height of the smallest subtree containing both of them.
From "Kademlia: A Peer-to-peer Information System Based on the XOR Metric":
We next note that XOR captures the notion of distance implicit in our binary-tree-based sketch of the system. In a fully-populated binary tree of 160-bit IDs, the magnitude of the distance between two IDs is the height of the smallest subtree containing them both. When a tree is not fully populated, the closest leaf to an ID x is the leaf whose ID shares the longest common prefix of x. If there are empty branches in the tree, there might be more than one leaf with the longest common prefix. In that case, the closest leaf to x will be the closest leaf to ID x̃ produced by flipping the bits in x corresponding to the empty branches of the tree.
That sentence is talking about the magnitude of the distance, not the exact distance. The exact distance is simply the XOR of the two addresses.
In the particular case of 101 and 010 the distance is 111, the maximal possible distance; they share no common subtree other than the whole tree itself, so the magnitude is 3 bits (assuming a 3-bit keyspace), which is also the maximal height. The equivalent in CIDR subnetting would be the /0 mask, i.e. 0 shared prefix bits.
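In Python terms (a small sketch of mine, not from the paper):

    def distance(a: int, b: int) -> int:
        """Exact Kademlia distance: the XOR of the two IDs."""
        return a ^ b

    def magnitude(a: int, b: int) -> int:
        """Position of the most significant nonzero bit of the distance,
        i.e. the height of the smallest subtree containing both IDs."""
        return (a ^ b).bit_length()

    assert distance(0b101, 0b010) == 0b111  # 7, the maximal distance
    assert magnitude(0b101, 0b010) == 3     # only the whole 3-bit tree holds both
    assert magnitude(0b101, 0b100) == 1     # siblings: a subtree of height 1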

Quickest way to find the closest set of points

I have three arrays of points:
A=[[5,2],[1,0],[5,1]]
B=[[3,3],[5,3],[1,1]]
C=[[4,2],[9,0],[0,0]]
I need the most efficient way to find the three points (one for each array) that are closest to each other (within one pixel in each axis).
What I'm doing right now is taking one point as a reference, let's say A[0], and cycling through all the B and C points looking for a solution. If A[0] gives me no result I move the reference to A[1], and so on. This approach has a huge problem: if I increase the number of points per array and/or the number of arrays, it sometimes takes far too long to converge, especially if the solution lies in the last members of the arrays. So I'm wondering if there is a way to do this without a reference point, or any quicker way than looping over all the elements.
The rules that I must follow are the following:
the final solution has to be made by only one element from each array like: S=[A[n],B[m],C[j]]
each selected element has to be within 1 pixel in X and Y from ALL the other members of the solution (so |Xi - Xj| <= 1 and |Yi - Yj| <= 1 for every pair of members of the solution).
For example in this simplified case the solution would be: S=[A[1],B[2],C[1]]
To clarify further: what I wrote above is just a simplified example to explain what I need. In my real case I don't know a priori the length of the lists nor the number of lists I have to work with; it could be A, B, C, or A, B, C, D, E, ... (each with a different number of points), etc. So I also need the approach to be as general as possible.
This requirement:
each selected element has to be within 1 pixel in X and Y from ALL the other members of the solution (so |Xi - Xj| <= 1 and |Yi - Yj| <= 1 for every pair of members of the solution).
massively simplifies the problem, because it means that for any given (xi, yi), there are only nine possible choices of (xj, yj).
So I think the best approach is as follows:
Copy B and C into sets of tuples.
Iterate over A. For each point (xi, yi):
Iterate over the values of x from xi−1 to xi+1 and the values of y from yi−1 to yi+1. For each resulting point (xj, yj):
Check if (xj, yj) is in B. If so:
Iterate over the values of x from max(xi, xj)−1 to min(xi, xj)+1 and the values of y from max(yi, yj)−1 to min(yi, yj)+1. For each resulting point (xk, yk):
Check if (xk, yk) is in C. If so, we're done!
If we get to the end without having a match, that means there isn't one.
This requires roughly O(len(A) + len(B) + len(C)) time and O(len(B) + len(C)) extra space.
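Here is a minimal Python rendering of the three-list version (my code, written from the steps above, not taken from the original answer):

    def find_close_triple(A, B, C):
        """One point from each of A, B, C, all pairwise within 1 pixel
        on both axes, or None if no such triple exists."""
        b_set = {tuple(p) for p in B}
        c_set = {tuple(p) for p in C}
        for xi, yi in A:
            # only the nine integer points around (xi, yi) can be the B point
            for xj in range(xi - 1, xi + 2):
                for yj in range(yi - 1, yi + 2):
                    if (xj, yj) not in b_set:
                        continue
                    # the C point must be within 1 of both (xi, yi) and (xj, yj)
                    for xk in range(max(xi, xj) - 1, min(xi, xj) + 2):
                        for yk in range(max(yi, yj) - 1, min(yi, yj) + 2):
                            if (xk, yk) in c_set:
                                return (xi, yi), (xj, yj), (xk, yk)
        return None

    A = [[5, 2], [1, 0], [5, 1]]
    B = [[3, 3], [5, 3], [1, 1]]
    C = [[4, 2], [9, 0], [0, 0]]
    # Prints ((5, 2), (5, 3), (4, 2)); the question's S = [A[1], B[2], C[1]]
    # is another valid triple -- the example data admits more than one solution.
    print(find_close_triple(A, B, C))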
Edited to add (due to a follow-up question in the comments): if you have N lists instead of just 3, then instead of nesting N loops deep (which gives time exponential in N), you'll want to do something more like this:
Copy B, C, etc., into sets of tuples, as above.
Iterate over A. For each point (xi, yi):
Create a set containing (xi, yi) and its eight neighbors.
For each of the lists B, C, etc.:
For each element in the set of nine points, see if it's in the current list.
Update the set to remove any points that aren't in the current list and don't have any neighbors in the current list.
If the set still has at least one element, then — great, each list contained a point that's within one pixel of that element (with all of those points also being within one pixel of each other). So, we're done!
If we get to the end without having a match, that means there isn't one.
which is much more complicated to implement, but is linear in N instead of exponential in N.
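For the N-list case, rather than transcribing the set-pruning steps above literally, here is a sketch of mine built on the same observation, stated slightly differently: pairwise closeness within 1 pixel is equivalent to all chosen points fitting inside a single 2x2 pixel box, so for each point of the first list only the four 2x2 boxes containing it need to be checked:

    def find_close_tuple(lists):
        """One point per list, all pairwise within 1 pixel on both axes
        (integer coordinates assumed), or None if impossible."""
        sets = [{tuple(p) for p in lst} for lst in lists[1:]]
        for x, y in lists[0]:
            # any solution containing (x, y) fits in one of these four boxes
            for x0 in (x - 1, x):
                for y0 in (y - 1, y):
                    box = [(x0, y0), (x0 + 1, y0), (x0, y0 + 1), (x0 + 1, y0 + 1)]
                    picks = [(x, y)]
                    for s in sets:
                        hit = next((c for c in box if c in s), None)
                        if hit is None:
                            break          # this list has no point in the box
                        picks.append(hit)
                    else:
                        return picks       # every list contributed a point
        return None

This does constant work per list for each point of the first list (four boxes of four cells each), so the total time is O(len(A) * N) plus the cost of building the sets, i.e. linear in N as claimed.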
Currently you are finding the solution with a brute-force algorithm which has O(n^2) complexity. If your lists contain 1000 items each, your algorithm needs 1,000,000 iterations to run... (It's even O(n^3), as tobias_k pointed out.)
As you can see here: https://en.wikipedia.org/wiki/Closest_pair_of_points_problem, you could improve on this with a divide-and-conquer algorithm, which runs in O(n log n) time.
You should search for Delaunay triangulation and/or Voronoi diagram implementations.
NB: if you can use external libs, you should also consider taking a look at the scipy lib: https://docs.scipy.org/doc/scipy-0.14.0/reference/generated/scipy.spatial.Delaunay.html

Calculating the Held-Karp lower bound for the Traveling Salesman Problem (TSP)

I am currently researching the traveling salesman problem, and was wondering if anyone would be able to explain the Held-Karp lower bound simply. I have been looking at a lot of papers and I am struggling to understand it. If someone could explain it simply, that would be great.
I also know there is the method of calculating a minimum spanning tree of the vertices not including the starting vertex and then adding the two minimum edges from the starting vertex.
I'll try to explain this without going into too much detail. I'll avoid formal proofs and technical jargon. However, you might want to go over everything again once you have read it all the way through. And most importantly: try the algorithms out yourself.
Introduction
A 1-tree is a tree that consists of a vertex attached with 2 edges to a spanning tree. You can check for yourself that every TSP tour is a 1-tree.
There is also such a thing as a minimum-1-tree of a graph. That is the tree that results when you follow this algorithm:
Exclude a vertex from your graph
Calculate the minimum spanning tree of the resulting graph
Attach the excluded vertex with its 2 smallest edges to the minimum spanning tree
*For now I'll assume that you know that a minimum-1-tree is a lower bound for the optimal TSP tour. There is an informal proof at the end.
You will find that the resulting tree is different when you exclude different vertices. However, all of the resulting trees are lower bounds for the optimal tour in the TSP. Therefore the largest of the minimum-1-trees you have found this way is a better lower bound than the others found this way.
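A compact Python sketch of the minimum-1-tree construction (my own code; it uses a lazy variant of Prim's algorithm for the spanning tree, and also reports vertex degrees, which become useful further down):

    import heapq

    def minimum_one_tree(w, excluded=0):
        """Minimum 1-tree of a complete graph with symmetric weight matrix w:
        a minimum spanning tree over all vertices except `excluded`, plus
        `excluded` re-attached via its two cheapest edges.
        Returns (total weight, degree of each vertex in the 1-tree)."""
        n = len(w)
        rest = [v for v in range(n) if v != excluded]
        degree = [0] * n
        in_tree = {rest[0]}
        total = 0
        heap = [(w[rest[0]][v], rest[0], v) for v in rest[1:]]
        heapq.heapify(heap)
        while len(in_tree) < len(rest):
            cost, u, v = heapq.heappop(heap)
            if v in in_tree:
                continue                 # stale heap entry
            in_tree.add(v)
            total += cost
            degree[u] += 1
            degree[v] += 1
            for x in rest:
                if x not in in_tree:
                    heapq.heappush(heap, (w[v][x], v, x))
        # re-attach the excluded vertex with its two smallest edges
        for v in sorted(rest, key=lambda v: w[excluded][v])[:2]:
            total += w[excluded][v]
            degree[excluded] += 1
            degree[v] += 1
        return total, degree

The largest value of minimum_one_tree(w, v)[0] over all choices of the excluded vertex v is then the best of these simple lower bounds.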
Held-Karp lower bound
The Held-Karp lower bound is an even tighter lower bound.
The idea is that you can alter the original graph in a special way. This modified graph will generate different minimum-1-trees than the original.
Furthermore (and this is important, so I'll repeat it throughout this paragraph in different words), the modification is such that the lengths of all valid TSP tours are modified by the same (known) constant. In other words, the length of a valid TSP solution in the new graph = the length of the same solution in the original graph plus a known constant. For example: say the weight of the TSP tour visiting vertices A, B, C and D in that order in the original graph is 10; then the weight of the TSP tour visiting the same vertices in the same order in the modified graph is 10 + a known constant.
This, of course, is true for the optimal TSP tour as well. Therefore the optimal TSP tour in the modified graph is also an optimal tour in the original graph. And a minimum-1-tree of the modified graph is a lower bound for the optimal tour in the modified graph. Again, I'll just assume you understand that this generates a lower bound for your modified graph's optimal TSP tour. By subtracting the known constant from the lower bound found for your modified graph, you get a new lower bound for your original graph.
There are infinitely many such modifications to your graph. Different modifications result in different lower bounds. The tightest of these lower bounds is the Held-Karp lower bound.
How to modify your graph
Now that I have explained what the Held-Karp lower bound is, I will show you how to modify your graph to generate different minimum-1-trees.
Use the following algorithm:
Give every vertex in your graph an arbitrary weight
update the weight of every edge as follows: new edge weight = edge weight + starting vertex weight + ending vertex weight
For example, your original graph has the vertices A, B and C with edge AB = 3, edge AC = 5 and edge BC = 4. And for the algorithm you assign the (arbitrary) weights to the vertices A: 30, B: 40, C:50 then the resulting weights of the edges in your modified graph are AB = 3 + 30 + 40 = 73, AC = 5 + 30 + 50 = 85 and BC = 4 + 40 + 50 = 94.
The known constant for the modification is twice the sum of the weights given to the vertices. In this example the known constant is 2 * (30 + 40 + 50) = 240. Note: the tours in the modified graph are thus equal to the original tours + 240. In this example there is only one tour namely ABC. The tour in the original graph has a length of 3 + 4 + 5 = 12. The tour in the modified graph has a length of 73 + 85 + 94 = 252, which is indeed 240 + 12.
The reason why the constant equals twice the sum of the weights given to the vertices is because every vertex in a TSP tour has degree 2.
You also need to know what to subtract from the weight of the minimum-1-tree of the modified graph to get a lower bound for the original graph, and it is the same known constant: twice the sum of the weights given to the vertices. For example, if you have given the weights A: 30, B: 40, C: 50, D: 60, the constant to subtract is 2 * (30 + 40 + 50 + 60) = 360. The degrees of the vertices in the found minimum-1-tree still matter: multiplying each vertex's weight by its degree in that tree and summing (for example 1 * 30 + 2 * 40 + 2 * 50 + 3 * 60 = 390, if vertex A has degree 1, vertices B and C have degree 2 and vertex D has degree 3) and subtracting that instead recovers the tree's length in the original graph. So the lower bound can equivalently be written as the tree's original length plus the sum of (degree - 2) * weight over the vertices; here 390 - 360 = 30, the correction contributed by the degree-1 and degree-3 vertices.
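Putting the modification and the constant together, the bound for one particular choice of vertex weights might look like this (a sketch of mine, reusing the minimum_one_tree function from the earlier snippet):

    def lower_bound(w, pi):
        """Lower bound on the optimal tour of w for vertex weights pi:
        the minimum 1-tree of the modified graph, minus twice the sum
        of the vertex weights."""
        n = len(w)
        w_mod = [[w[i][j] + pi[i] + pi[j] for j in range(n)] for i in range(n)]
        tree_weight, _ = minimum_one_tree(w_mod)
        return tree_weight - 2 * sum(pi)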
How to find the Held-Karp lower bound
Now I believe there is one more question unanswered: how do I find the best modification to my graph, so that I get the tightest lower bound (and thus the Held-Karp lower bound)?
Well, that's the hard part. Without delving too deep: there are iterative ways to get closer and closer to the Held-Karp lower bound. Basically, one keeps modifying the graph so that the degrees of all vertices in the minimum-1-tree get closer and closer to 2, and thus the tree closer and closer to a real tour.
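One common scheme is subgradient ascent (the sketch below is mine and deliberately simple, not Held and Karp's exact procedure): raise the weight of vertices whose degree in the current minimum-1-tree is above 2 and lower it for degree-1 vertices, so that the next minimum-1-tree looks more like a tour:

    def held_karp_ascent(w, iterations=100, step=1.0):
        n = len(w)
        pi = [0.0] * n
        best = float("-inf")
        for _ in range(iterations):
            w_mod = [[w[i][j] + pi[i] + pi[j] for j in range(n)] for i in range(n)]
            tree_weight, degree = minimum_one_tree(w_mod)
            best = max(best, tree_weight - 2 * sum(pi))
            if all(d == 2 for d in degree):
                break                # the 1-tree is a tour: the bound is exact
            # nudge each vertex weight by its degree surplus or deficit
            pi = [p + step * (d - 2) for p, d in zip(pi, degree)]
            step *= 0.95             # shrink the step over time
        return best

The supremum of these bounds over all possible vertex weights is the Held-Karp lower bound; in practice one stops after a fixed number of iterations or once the step size becomes tiny.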
Minimum-1-tree is a lower bound
As promised I would give an informal proof that a minimum-1-tree is a lower bound for the optimal TSP solution. A minimum-1-Tree is made of two parts: a minimum-spanning-tree and a vertex attached to it with 2 edges. A TSP tour must go through the vertex attached to the minimum spanning tree. The shortest way to do so is through the attached edges. The tour must also visit all the vertices in the minimum spanning tree. That minimum spanning tree is a lower bound for the optimal TSP for the graph excluding the attached vertex. Combining these two facts one can conclude that the minimum-1-tree is a lower bound for the optimal TSP tour.
Conclusion
When you modify a graph in the way described above, the minimum-1-tree of the modified graph yields a lower bound for the original graph's optimal tour. The best possible lower bound obtainable through these means is the Held-Karp lower bound.
I hope this answers your question.
Links
For a more formal approach and additional information I recommend the following links:
ieor.berkeley.edu/~kaminsky/ieor251/notes/3-16-05.pdf
http://www.sciencedirect.com/science/article/pii/S0377221796002147

KD Tree alternative/variant for weighted data

I'm using a static KD-Tree for nearest neighbor search in 3D space. However, the client's specifications have now changed so that I'll need a weighted nearest neighbor search instead. For example, in 1D space, I have a point A with weight 5 at 0, and a point B with weight 2 at 4; the search should return A if the query point is from -5 to 5, and should return B if the query point is from 5 to 6. In other words, the higher-weighted point takes precedence within its radius.
Google hasn't been any help - all I get is information on the K-nearest neighbors algorithm.
I can simply remove points that are completely subsumed by a higher-weighted point, but this generally isn't the case; usually a lower-weighted point is only partially subsumed, like in the 1D example above. I could use a range tree to query all points in an NxNxN cube centered on the query point and pick the one with the greatest weight, but the naive implementation of this is wasteful: I'd need to set N to the maximum weight in the entire tree, even though there may not be a point of that weight anywhere near the cube. E.g. if the maximum weight in the tree is 25, I need to set N to 25 even though the highest-weighted point for any given cube probably has a much lower weight; in the 1D case, if I have a point located at 100 with weight 25, my naive algorithm would still need N = 25 even when the query is outside that point's radius.
To sum up, I'm looking for a way that I can query the KD tree (or some alternative/variant) such that I can quickly determine the highest-weighted point whose radius covers the query point.
FWIW, I'm coding this in Java.
It would also be nice if I could dynamically change a point's weight without incurring too high of a cost - at present this isn't a requirement, but I'm expecting that it may be a requirement down the road.
Edit: I found a paper on a priority range tree, but this doesn't exactly address the same problem in that it doesn't account for higher-priority points having a greater radius.
Use an extra dimension for the weight. A point (x,y,z) with weight w is placed at (N-w,x,y,z), where N is the maximum weight.
Distances in 4D are defined by d((a, b, c, d), (e, f, g, h)) = |a - e| + d((b, c, d), (f, g, h)), where the second d is whatever your 3D distance was.
To find all potential results for (x,y,z), query a ball of radius N about (0,x,y,z).
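A quick Python sketch of that embedding (my names, written dimension-agnostically; it transliterates easily to Java). The point of the algebra is that base_dist(point, query) <= weight holds exactly when the lifted distance is <= N:

    def lift(point, weight, N):
        """Embed a weighted point one dimension up: (N - weight, *point)."""
        return (N - weight,) + tuple(point)

    def lifted_dist(p, q, base_dist):
        # the extra coordinate combines additively with the original distance
        return abs(p[0] - q[0]) + base_dist(p[1:], q[1:])

    def covers(point, weight, query, N, base_dist):
        """True iff `query` lies within `point`'s radius (its weight)."""
        lifted_query = (0,) + tuple(query)
        return lifted_dist(lift(point, weight, N), lifted_query, base_dist) <= N

    # 1D example from the question: weight 5 at x = 0, weight 2 at x = 4
    d1 = lambda p, q: abs(p[0] - q[0])
    N = 5                                    # maximum weight in the data set
    assert covers((0,), 5, (3,), N, d1)      # the weight-5 point reaches x = 3
    assert not covers((4,), 2, (1,), N, d1)  # the weight-2 point misses x = 1

A ball query of radius N around (0, x, y, z) in a metric tree over the lifted points therefore returns exactly the points whose radius covers the query; picking the highest-weighted of them is a scan of that (hopefully small) result set.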
I think I've found a solution: the nested interval tree, which is an implementation of a 3D interval tree. Rather than storing points with an associated radius that I then need to query, I store and query the radii directly. This has the added benefit that the radius does not need to be the same in each dimension (so the covered region is a rectangular box instead of a cube), which is not presently a project requirement but may become one in the future (the client only recently added the "weighted points" requirement; who knows what else he'll come up with).

Place rectangle maximizing rectangle intersections [closed]

I have this problem: given a set of rectangles {R1, R2, ..., Rn} and a new rectangle Rq, find where to put Rq so that it intersects the maximum number of rectangles in the set (it does not matter by how much area). I'm searching for a simple solution that doesn't involve overly complex data structures; however, any working answer will be much appreciated, thanks!
Here is an algorithm with O(n^2), or O(n (log n + m)) average-case, complexity (where m is the maximum number of intersecting rectangles). See the O(n log n) method later on.
The first idea is to consider that there are n places to look for solutions: for each rectangle Ri, the case where Ri is the right-most rectangle that intersects Rq is a candidate solution.
You will scan from left to right, adding the rectangles in order of minimum x-coordinate to a buffer container, and also to a priority queue (keyed on maximum x-coordinate). As you add each rectangle, also check which rectangles need to be removed (based on the priority queue) so that all rectangles remaining in the buffer can be intersected by Rq simultaneously, if you ignore the y-axis for now.
Before adding each rectangle Ri to the buffer, you can count how many rectangles in the buffer can be intersected if you require Rq to intersect Ri (with Ri being the right-most rectangle intersecting Rq). This can be done in linear time.
Thus overall complexity is O(n^2).
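For illustration, here is a compact Python sketch of the sweep idea (my own code, O(n^2 log n) rather than the O(n^2) described above, and assuming axis-aligned rectangles given as (x1, y1, x2, y2) tuples, with touching counted as intersecting). It uses the equivalent view that placing Rq's lower-left corner at (x, y) intersects Ri exactly when (x, y) lies inside Ri expanded left by Rq's width and down by Rq's height:

    def best_placement(rects, wq, hq):
        """Lower-left corner (x, y) for a wq-by-hq rectangle Rq that
        intersects the maximum number of `rects`, plus that count."""
        # Rq at (x, y) intersects (x1, y1, x2, y2) iff
        # x1 - wq <= x <= x2 and y1 - hq <= y <= y2
        expanded = [(x1 - wq, y1 - hq, x2, y2) for x1, y1, x2, y2 in rects]
        best_count, best_pos = 0, None
        for ex, _, _, _ in expanded:        # candidate x: an expanded left edge
            events = []
            for fx1, fy1, fx2, fy2 in expanded:
                if fx1 <= ex <= fx2:        # rectangle is active at this x
                    events.append((fy1, 0)) # y-interval opens (0 sorts first,
                    events.append((fy2, 1)) # so touching intervals both count)
            count = 0
            for y, kind in sorted(events):
                count = count + 1 if kind == 0 else count - 1
                if kind == 0 and count > best_count:
                    best_count, best_pos = count, (ex, y)
        return best_count, best_pos

    rects = [(0, 0, 2, 2), (1, 1, 3, 3), (10, 10, 11, 11)]
    print(best_placement(rects, 1, 1))  # (2, (0, 0)): Rq at (0, 0) hits the first two

Some optimal placement always has its corner at an expanded left edge in x and an expanded bottom edge in y (slide any optimal corner left and down until it hits one), which is why scanning only these candidates suffices.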
The runtime of the algorithm can be improved by using an interval tree for the rectangles stored in the buffer. Since Wikipedia gives a good amount of information on this and how it may be implemented, I will not explain how it works here, particularly because the balancing can be quite complex. This improves the average complexity to O(n (log n + m)), where m is the answer, i.e. the maximum number of intersecting rectangles (which could still be as large as n in the worst case). This is due to the algorithmic complexity of querying an interval tree (O(log n + m) time per query). The worst case is still O(n^2), because m, the number of results returned by an interval-tree query, may be up to O(n) even when the final answer is not on the order of n.
Here follows a method with O(n log n) time, but it is quite complex.
Instead of storing the rectangles in a sorted array, a self-balancing binary search tree (keyed on minimum y only), or an interval tree, the rectangles can also be stored in a custom data structure.
Consider the self-balancing binary search tree (keyed on maximum y). We can add some metadata to it. Notably, since we try to maximize the number of rectangles intersected in the buffer, and keep looking at how the maximum changes as the buffer scans from left to right, we can consider the number of rectangles intersected if each rectangle in the buffer is the bottom-most one.
If we can store the number of rectangles that is, we might be able to query it faster.
The solution is to store this in a different way, to reduce update costs. A binary search tree has a number of edges. Suppose each node should know how many rectangles can be intersected if that node is the bottom-most rectangle to be intersected. Instead of storing that number directly, we can store, on each edge, the change in the number between the nodes on either side of the edge (call this the diff). For example, for the following tree (with the min and max y of each rectangle):
      (3,4)
     /     \
 (2,3)     (5,6)
For Rq with a height of 2, if (2,3) is bottom-most, we can intersect 2 rectangles in total, for (3,4), it is 2, and for (5,6), it is 1.
Thus, the edges' diff can be 0 and -1, representing the change in the answer from child to parent.
Using these diffs, we can quickly find the highest number if each node also stores the maximum sum along a subpath in its subtree (call this maxdiff). For the leaves it is 0, but for (3,4) the maximum of 0 and -1 is 0; thus we can store 0 as the maximum subpath sum in the root node.
We therefore know that the optimal answer is 0 more than the answer of the root node. We can find the answer of the root node by keeping more metadata.
However, all this metadata can be queried and updated in O(log n) time. This is because adding a rectangle to the self-balancing binary search tree takes O(log n) time, as we may need to do up to O(log n) rotations. When we do a rotation, we may need to update the diffs on the edges, as well as maxdiff on the affected nodes; however, this is O(1) per rotation.
In addition, the effect of the extra rectangle on the diff of the edges needs to be updated. However, at most O(log n) edges must be updated, specifically, if we search the tree by the min y and max y of the new rectangle, we will find all the edges that need to be updated (although whether an edge that is traversed must be updated depends on whether we took the left or right child - some careful working is needed here to determine the exact rule).
Deleting a rectangle is just the reverse, so takes O(log n) time.
In this way, we can update the buffer in O(log n) time, and query it in O(log n) time, for an overall algorithmic complexity of O(n log n).
Note: to find the maximum rectangles intersected in the buffer, we use maxdiff, which is the difference in answer between optimal, and that of the root node. To find the actual answer, we must find the difference between the answer of the lowest ordered rectangle in the binary search tree, and the root node. This is easily done in O(log n) time, by going down the left-most path, and summing up diff on the edges. Finally, find the answer on the lowest ordered rectangle. This is done by storing a 2nd self balancing binary search tree, ordered by minimum y. We can use this to find the number of rectangles intersected if the bottom-most rectangle is the lowest ordered rectangle when sorting by maximum y.
Update of this extra BST and querying also take O(log n) time, so this extra work does not change the overall complexity.

Resources