I came across a rope tree as an alternative data structure for a string.
http://en.wikipedia.org/wiki/Rope_(data_structure)
Concatenation is easy, but I am stuck on the split operation. The Wikipedia article states:
For example, to split the 22-character rope pictured in Figure 2.3 into two equal component ropes of length 11, query the 12th character to locate the node K at the bottom level. Remove the link between K and the right child of G. Go to the parent G and subtract the weight of K from the weight of G. Travel up the tree and remove any right links, subtracting the weight of K from these nodes (only node D, in this case). Finally, build up the newly-orphaned nodes K and H by concatenating them together and creating a new parent P with weight equal to the length of the left node K.
Locating the character and recombining the orphans is no problem. But I don't understand the "Travel up the tree and remove any right links, subtracting the weight of K from these nodes". The example stops at D, but if you follow these instructions verbatim you would continue onto B and remove D as well. What is the correct stopping requirement in this algorithm? And how do you avoid nodes with only one (left or right) child?
A pseudo-code algorithm explaining this part would help tremendously.
The Wikipedia article is not very explicit. If the current node is X and its parent is Y, you only travel up if X is the left child of Y. Visually, you're going up and to the right as far as you can.
I give Ruby code below. It's as close to executable pseudocode as you'll get. If Ruby isn't right for your implementation, you can always use it as a prototype. If you don't like the recursion, it's easy enough to use standard transformations to make it iterative with an explicit stack.
The natural implementation of split is recursive. The interesting case is near the bottom of the code.
class Rope
  # Cat two ropes by building a new binary node.
  # The parent count is the left child's length.
  def cat(s)
    if self.len == 0
      s
    elsif s.len == 0
      self
    else
      Node.new(self, s, len)
    end
  end

  # Insert a new string into a rope by splitting it
  # and concatenating twice with a new leaf in the middle.
  def insert(s, p)
    a, b = split_at(p)
    a.cat(Leaf.new(s)).cat(b)
  end
end

class Leaf < Rope
  # A leaf holds characters as a string.
  attr_accessor :chars

  # Construct a new leaf with given characters.
  def initialize(chars)
    @chars = chars
  end

  # The count in this rope is just the number of characters.
  def count
    chars.length
  end

  # The length is kind of obvious.
  def len
    chars.length
  end

  # Convert to a string by just returning the characters.
  def to_s
    chars
  end

  # Split by dividing the string.
  def split_at(p)
    [ Leaf.new(chars[0...p]), Leaf.new(chars[p..-1]) ]
  end
end

class Node < Rope
  # Fields of the binary node.
  attr_accessor :left, :right, :count

  # Construct a new binary node.
  def initialize(left, right, count)
    @left = left
    @right = right
    @count = count
  end

  # Length is our count plus right subtree length. Memoize for efficiency.
  def len
    @len ||= count + right.len
  end

  # The string rep is just concatenating the children's string reps.
  def to_s
    left.to_s + right.to_s
  end

  # Split recursively by splitting the left or right
  # subtree and recombining the results.
  def split_at(p)
    if p < count
      a, b = left.split_at(p)
      [ a, b.cat(right) ]
    elsif p > count
      a, b = right.split_at(p - count)
      [ left.cat(a), b ]
    else
      [ left, right ]
    end
  end
end
After some tinkering and consideration, I think the rules should be this:
First determine the starting point for traveling upwards.
A) If you end up in the middle of a node (node A), split the string at the right character index and create a left and a right node. The parent of these new nodes is node A. The left node is your starting point. The right node will be added to the orphans while traveling up.
B) If you end up at the beginning of a node (character-wise) and this node is a right node: split off this node (=> orphan node) and use the parent node as your starting point.
C) If you end up at the beginning of a node (character-wise) and this node is a left node: split off this node (=> orphan node) and use this node as your starting point.
D) If you end up at the end of a node (character-wise) and this node is a right node: use the parent node as your starting point.
E) If you end up at the end of a node (character-wise) and this node is a left node: use this node as your starting point.
During traveling:
If the node is a left node: move upwards and add its right sibling node to the list of orphans.
If the node is a right node: move upwards (to parent A), but do nothing with the left sibling node. (Or, because all the right nodes up to this point have been orphaned, you could set the starting point to the right node of the parent node A; the parent node A is then your new starting point. This avoids a bunch of nodes with only one child node.)
After traveling:
Concat all the accumulated orphans into a new node. This is the right part of your new root node. The left part is the endpoint of your travel sequence.
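For what it's worth, here is a small Python sketch of that "collect the orphans while traveling up" procedure. The node layout (an explicit weight equal to the left child's length, and an explicit downward path instead of parent pointers) and all names are my own assumptions, not taken from the Ruby above; note that a concat that skips empty pieces handles the boundary cases B through E automatically.

    class Leaf:
        def __init__(self, chars):
            self.chars = chars
        def length(self):
            return len(self.chars)

    class Node:
        def __init__(self, left, right):
            self.left = left
            self.right = right
            self.weight = left.length()      # weight = length of the left subtree
        def length(self):
            return self.weight + self.right.length()

    def concat(a, b):
        # Skip empty pieces so we never create nodes with only one child.
        if a.length() == 0:
            return b
        if b.length() == 0:
            return a
        return Node(a, b)

    def split(root, p):
        # Phase 1: walk down to the leaf containing position p, remembering
        # at each step whether we took the left or the right link.
        path = []                            # list of (node, went_right) pairs
        node = root
        while isinstance(node, Node):
            if p < node.weight:
                path.append((node, False))
                node = node.left
            else:
                path.append((node, True))
                p -= node.weight
                node = node.right
        # Phase 2: split the leaf itself; its right half is the first orphan.
        left_rope = Leaf(node.chars[:p])
        right_rope = Leaf(node.chars[p:])
        # Phase 3: travel back up.  A right sibling we never entered lies wholly
        # after the split point, so it is orphaned onto the right rope; a left
        # sibling lies wholly before it and is glued back onto the left rope.
        for parent, went_right in reversed(path):
            if went_right:
                left_rope = concat(parent.left, left_rope)
            else:
                right_rope = concat(right_rope, parent.right)
        return left_rope, right_rope

    def to_string(r):
        return r.chars if isinstance(r, Leaf) else to_string(r.left) + to_string(r.right)

    rope = concat(concat(Leaf("hel"), Leaf("lo ")), concat(Leaf("wor"), Leaf("ld")))
    a, b = split(rope, 5)
    print(to_string(a), "|", to_string(b))   # hello |  world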
Please correct me if I am wrong.
Related
I'm developing an optimization problem that is a variant on Traveling Salesman. In this case, you don't have to visit all the cities, there's a required start and end point, there's a min and max bound on the tour length, you can traverse each arc multiple times if you want, and you have a nonlinear objective function that is associated with the arcs traversed (and number of times you traverse each arc). Decision variables are integers, how many times you traverse each arc.
I've developed a nonlinear integer program in Pyomo and am getting results from the NEOS server. However I didn't put in subtour constraints and my results are two disconnected subtours.
I can find integer programming formulations of TSP that say how to formulate subtour constraints, but this is a little different from the standard TSP and I'm trying to figure out how to start. Any help that can be provided would be greatly appreciated.
EDIT: problem formulation
50 arcs, not exhaustive pairs between nodes. 50 decision variables N_ab, integer >= 0, corresponding to how many times you traverse from a to b. There is a length and a profit associated with each N_ab. There are two constraints requiring that the sum of length_ab * N_ab over all ab is between a min and a max distance. I have a constraint that the sum of N_ab into each node is equal to the sum of N_ab out of that node, so you can either not visit a node at all or visit it multiple times. The objective function is nonlinear and related to the interaction between pairs of arcs (not relevant for subtours).
Subtours: looking at math.uwaterloo.ca/tsp/methods/opt/subtour.htm, the formulation isn't applicable since I am not required to visit all cities, and may not be able to. For example, say I have 20 nodes and 50 arcs (all arcs of length 10). The distance constraints require a tour of exactly length 30, which means I can visit at most three nodes (start at A -> B -> C -> A = length 30). So I will not visit the other nodes at all. TSP subtour elimination would require that I have edges from the node subgroup ABC to the subgroup of unvisited nodes, which isn't needed for my problem.
Here is an approach that is adapted from the prize-collecting TSP (e.g., this paper). Let V be the set of all nodes. I am assuming V includes a depot node, call it node 1, that must be on the tour. (If not, you can probably add a dummy node that serves this role.)
Let x[i] be a decision variable that equals 1 if we visit node i at least once, and 0 otherwise. (You might already have such a decision variable in your model.)
Add these constraints, which define x[i]:
x[i] <= sum {j in V} N[i,j] for all i in V
M * x[i] >= N[i,j] for all i, j in V
In other words: x[i] cannot equal 1 if there are no edges coming out of node i, and x[i] must equal 1 if there are any edges coming out of node i.
(Here, N[i,j] is the number of times we go from i to j, and M is a sufficiently large number, perhaps equal to the maximum number of times you can traverse one edge.)
Here is the subtour-elimination constraint, defined for all subsets S of V such that S includes node 1, and for all nodes i in V \ S:
sum {j in S} (N[i,j] + N[j,i]) >= 2 * x[i]
In other words, if we visit node i, which is not in S, then there must be at least two edges into or out of S. (A subtour would violate this constraint for S equal to the nodes that are on the subtour that contains 1.)
We also need a constraint requiring node 1 to be on the tour:
x[1] = 1
I might be playing a little fast and loose with the directional indices, i.e., I'm not sure if your model sets N[i,j] = N[j,i] or something like that, but hopefully the idea is clear enough and you can modify my approach as necessary.
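Since the question mentions Pyomo, here is a hedged sketch of how the linking constraints and a lazily added subtour cut might look. The names V, arcs, max_traversals, model.N, and the two helper functions are assumptions about your model, not actual parts of it, and the exponential family of cuts is added one violated member at a time rather than up front.

    import pyomo.environ as pyo

    # V: list of nodes with node 1 as the depot; arcs: list of (i, j) pairs that
    # exist; max_traversals: the big-M bound on how often one arc can be used.
    # model.N[i, j] is assumed to be your existing integer traversal variable.
    def add_visit_vars(model, V, arcs, max_traversals):
        model.x = pyo.Var(V, domain=pyo.Binary)

        # x[i] cannot be 1 if no arc leaves node i ...
        def x_upper_rule(m, i):
            out = [m.N[a, b] for (a, b) in arcs if a == i]
            return m.x[i] <= (sum(out) if out else 0)
        model.x_upper = pyo.Constraint(V, rule=x_upper_rule)

        # ... and x[i] must be 1 if any arc leaves node i.
        def x_lower_rule(m, i, j):
            return max_traversals * m.x[i] >= m.N[i, j]
        model.x_lower = pyo.Constraint(arcs, rule=x_lower_rule)

        model.depot_visited = pyo.Constraint(expr=model.x[1] == 1)
        model.subtour_cuts = pyo.ConstraintList()

    # After a solve, if the traversed arcs fall apart into components, let S be
    # the component containing the depot; for each visited node i outside S,
    # add this cut and re-solve.
    def add_subtour_cut(model, arcs, S, i):
        crossing = sum(model.N[a, b] for (a, b) in arcs
                       if (a == i and b in S) or (b == i and a in S))
        model.subtour_cuts.add(crossing >= 2 * model.x[i])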
I am creating a Genetic Algorithm to solve the Traveling Salesman Problem.
Currently, two 2D lists represent the two parents that need to be crossed:
import numpy as np
path_1 = np.arange(12).reshape(6, 2)
np.random.shuffle(path_1)  # shuffles the rows of path_1 in place
path_2 = np.arange(12).reshape(6, 2)
Suppose each element in the list represents an (x, y) coordinate on a cartesian plane, and the 2D list represents the path that the "traveling salesman" must take (from index 0 to index -1).
Since the TSP requires that all points are included in a path, the resulting child of this crossover must have no duplicate points.
I have little idea as to how I can perform such a crossover and have the resulting child be representative of both parents.
You need to use an ordered crossover operator, like OX1.
OX1 is a fairly simple permutation crossover. Basically, a swath of consecutive alleles from parent 1 drops down, and the remaining values are placed in the child in the order in which they appear in parent 2.
I used to run TSP with these operators:
Crossover: Ordered Crossover (OX1).
Mutation: Reverse Sequence Mutation (RSM)
Selection: Roulette Wheel Selection
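Here is a minimal Python sketch of OX1 as described above. The flat list-of-city-indices representation (rather than the (x, y) pairs in the question) and the helper name ox1 are assumptions for illustration.

    import random

    def ox1(parent1, parent2):
        n = len(parent1)
        a, b = sorted(random.sample(range(n), 2))   # the swath [a, b] from parent 1
        child = [None] * n
        child[a:b + 1] = parent1[a:b + 1]           # the swath "drops down"
        taken = set(child[a:b + 1])
        # Remaining cities come from parent 2, in the order they appear there.
        fill = (city for city in parent2 if city not in taken)
        for i in range(n):
            if child[i] is None:
                child[i] = next(fill)
        return child

    # Example with six cities labelled 0..5.
    print(ox1([0, 1, 2, 3, 4, 5], [5, 3, 1, 0, 2, 4]))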
You can do something like this:
Choose half of the coordinates (or any random number between 0 and length - 1) from one parent using any approach; let's say those where i % 2 == 0.
These can be positioned in the child using multiple approaches: either randomly, or all at the start (or end), or at alternating positions.
Now the remaining coordinates need to come from the 2nd parent: traverse the 2nd parent and, if a coordinate has not been chosen yet, add it in the empty spaces.
For example,
I am choosing the even-positioned coordinates from parent 1, putting them at the even indices in the child, and then traversing parent 2 to put the remaining coordinates at the odd indices in the child.
def crossover(p1, p2, number_of_cities):
    chk = {}
    for i in range(number_of_cities):
        chk[i] = 0
    child = [-1] * number_of_cities
    for x in range(len(p1)):
        if x % 2 == 0:
            child[x] = p1[x]
            chk[p1[x]] = 1
    y = 1
    for x in range(len(p2)):
        if chk[p2[x]] == 0:
            child[y] = p2[x]
            y += 2
    return child
This approach preserves the order of cities visited from both parents.
Also, since it is not symmetric, p1 and p2 can be switched to give two children, and the better one (or both) can be chosen.
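For instance, driving the function with city indices rather than coordinate pairs (a simplification I'm assuming for the example):

    p1 = [0, 3, 1, 5, 2, 4]            # two example tours over six cities
    p2 = [5, 4, 3, 2, 1, 0]
    child_a = crossover(p1, p2, 6)     # -> [0, 5, 1, 4, 2, 3]
    child_b = crossover(p2, p1, 6)     # swap the parents for a second child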
The problem is as follows: given a directed acyclic graph, where each node is labeled with a character, find all the longest paths of nodes in the graph that form a palindrome.
The initial solution that I thought of was to simply enumerate all the paths in the graph. This effectively generates a bunch of strings, on which we can then apply Manacher's algorithm to find all the longest palindromes. However, this doesn't seem very efficient, since the number of paths in a graph is exponential in the number of nodes.
Then I started thinking of using dynamic programming directly on the graph, but my problem is that I cannot figure out how to structure my "dynamic programming array". My initial try was to use a 2D boolean array, where array[i][j] == true means that the path from node i to node j is a palindrome, but the problem is that there might be multiple paths from i to j.
I've been stuck on this problem for quite a while now and can't seem to figure it out; any help would be appreciated.
The linear-time trick of Manacher's algorithm relies on the fact that if you know that the longest palindrome centered at character 15 has length 5 (chars 13-17), and there's a palindrome centered at character 19 of length 13 (chars 13-25), then you can skip computing the longest palindrome centered at character 23 (23 = 19 + (19 - 15)) because you know it's just going to be the mirror of the one centered at character 15.
With a DAG, you don't have that kind of guarantee because the palindromes can go off in any direction, not just forwards and backwards. However, if you have a candidate palindrome path from node m to node n, whether you can extend that string to a longer palindrome doesn't depend on the path between m and n, but only on m and n (and the graph itself).
Therefore, I'd do this:
First, sort the graph nodes topologically, so that you have an array s[] of node indexes, and there being an edge from s[i] to s[j] implies that i < j.
I'll also assume that you build up an inverse array or hash structure sinv[] such that s[sinv[j]] == j and sinv[s[n]] == n for all integers j in 0..nodeCount-1 and all node indexes n.
Also, I'll assume that you have functions graphPredecessors, graphSuccessors, and graphLetter that take a node index and return the list of predecessors on the graph, the list of successors, or the letter at that node, respectively.
Then, make a two-dimensional array of integers of size nodeCount by nodeCount called r. When r[i][j] = y, and y > 0, it will mean that if there is a palindrome path from a successor of s[i] to a predecessor of s[j], then that path can be extended by adding s[i] to the front and s[j] to the back, and that the extension can be continued by y more nodes (including s[i] and s[j]) in each direction:
for (i = 0; i < nodeCount; i++) {
    for (j = i; j < nodeCount; j++) {
        if (graphLetter(s[i]) == graphLetter(s[j])) {
            r[i][j] = 1;
            for (pred in graphPredecessors(s[i])) {
                for (succ in graphSuccessors(s[j])) {
                    /* note that by our sorting, sinv[pred] < i <= j < sinv[succ] */
                    if (r[sinv[pred]][sinv[succ]] >= r[i][j]) {
                        r[i][j] = 1 + r[sinv[pred]][sinv[succ]];
                    }
                }
            }
        } else {
            r[i][j] = 0;
        }
    }
}
Then find the maximum value of r[x][x] for x in 0..nodeCount-1, and of r[lft][rgt] where there is an edge from s[lft] to s[rgt]. Call that maximum value M, and say you found it at location [i][j]. Each such i, j pair will represent the center of a longest palindrome path. As long as M is greater than 1, you then extend each center by finding a pred in graphPredecessors(s[i]) and a succ in graphSuccessors(s[j]) such that r[sinv[pred]][sinv[succ]] == M - 1 (the palindrome is now pred->s[i]->s[j]->succ). You then extend that by finding the appropriate index with an r value of M - 2, etc., stopping when you reach a spot where the value in r is 1.
I think this algorithm overall ends up with a runtime of O(V^2 + E^2), but I'm not entirely certain of that.
In my situation I have territory objects. Each territory knows which other territories it is connected to through an array. Here is a visualization of said territories as they would appear on a map:
If you were to map out the connections on a graph, they would look like this:
So say I have a unit stationed in territory [b] and I want to move it to territory [e]. I'm looking for a method of searching through this graph and returning a final array that represents the path my unit in territory [b] must take. In this scenario, I would be looking for it to return
[b, e].
If I wanted to go from territory [a] to territory [f] then it would return:
[a, b, e, f].
I would love examples, but even just posts pointing me in the right direction are appreciated. Thanks in advance! :)
Have you heard of Breadth-First Search (BFS) before?
Basically, you simply put your initial territory, "a" in your example, into an otherwise empty queue Q.
The second data structure you need is an array of booleans with as many elements as you have territories, in this case 9. It helps with remembering which territories we have already checked. We call it V (for "visited"). It needs to be initialized as follows: all elements equal false except the one corresponding to the initial territory. That is, for all territories t we have V[t] = false, but V[a] = true because "a" is already in the queue.
The third and final data structure you need is an array to store the parent nodes (i.e. which node we are coming from). It also has as many elements as you have territories. We call it P (for "parent"), and every element points to itself initially; that is, for all t in P, set P[t] = t.
Then, it is quite simple:
while Q is not empty:
    t = front element in the queue (remove it also from Q)
    if t = f we can break from the while loop   //because we have found the goal
    for all neighbors s of t for which V[s] = false:
        add s into the back of Q   //we have never explored the territory s yet as V[s] = false
        set V[s] = true            //we do not have to visit s again in the future
        //we found s because there is a connection from t to s
        //therefore, we need to remember that in s we are coming from the node t
        //to do this we simply set the parent of s to t:
        P[s] = t
How do you read the solution now?
Simply check the parent of f, then the parent of that, and then the parent of that, and so on until you find the beginning. You will know what the beginning is once you have found an element which has itself as the parent (remember that we let them point to themselves initially), or you can also just compare it to a.
Basically, you just need an empty list L, add f into it, and then
while f != a:
    f = P[f]
    add f into L
Note that this obviously fails if there exists no path because f will never equal a.
Therefore, this:
while f != P[f]:
    f = P[f]
    add f into L
is a bit nicer. It exploits the fact that initially all territories point to themselves in P.
If you try this on paper with your example above, then you will end up with
L = [f, e, b, a]
If you simply reverse this list, then you have what you wanted.
I don't know C#, so I didn't bother to use C# syntax. I assume that you know it is easiest to index your territories with integers and then use an array to access them.
You will realize quite quickly why this works. It's called breadth-first search because you consider only neighbors of the territory "a" first, with trivially shortest path to them (only 1 edge) and only once you processed all these, then territories that are further away will appear in the queue (only 2 edges from the start now) and so on. This is why we use a queue for this task and not something like a stack.
Also, this is linear in the number of territories and edges because you only need to look at every territory and edge (at most) once (though each edge is looked at from both directions).
The algorithm I have given to you is basically the same as https://en.wikipedia.org/wiki/Breadth-first_search with only the P data structure added to keep track where you are coming from to be able to figure out the path taken.
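If it helps, here is a small Python sketch of the approach above (the question asks about C#, but the structure carries over). The neighbors mapping, the territory-to-index assignment, and the example adjacency are all assumptions for illustration.

    from collections import deque

    # Territories are assumed to be indexed 0..n-1; neighbors[t] lists the
    # territories connected to t.
    def shortest_path(neighbors, start, goal):
        n = len(neighbors)
        visited = [False] * n
        parent = list(range(n))          # every territory starts as its own parent
        visited[start] = True
        q = deque([start])
        while q:
            t = q.popleft()
            if t == goal:
                break
            for s in neighbors[t]:
                if not visited[s]:
                    visited[s] = True
                    parent[s] = t
                    q.append(s)
        # Walk back from the goal until we hit a node that is its own parent.
        path = [goal]
        while goal != parent[goal]:
            goal = parent[goal]
            path.append(goal)
        return list(reversed(path))

    # Example with territories a..f mapped to 0..5; the adjacency here is made up
    # (a-b, b-c, c-d, b-e, e-f) but is consistent with the paths in the question.
    adj = {0: [1], 1: [0, 2, 4], 2: [1, 3], 3: [2], 4: [1, 5], 5: [4]}
    print(shortest_path(adj, 0, 5))      # -> [0, 1, 4, 5], i.e. [a, b, e, f]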
I have a data structures homework where, in addition to the regular AVL tree functions, I have to add a function that returns the minimum gap between any two numbers in the AVL tree (the nodes in the AVL tree actually represent numbers).
Let's say we have the numbers (as nodes) 1, 5, 12, 20, 23, 21 in the AVL tree; the function should return the minimum gap between any two numbers. In this situation it should return "1", which is |20-21| or |21-20|.
It should be done in O(1).
I tried to think a lot about it, and I know there is a trick, but I just couldn't find it; I have spent hours on this.
There was another task, to find the maximum gap, which is easy: it is the difference between the minimal and maximal numbers.
You need to extend the data structure, otherwise you cannot obtain an O(1) query of the minimum gap between the numbers stored in the tree.
You have the additional constraint of not increasing the time complexity of the insert/delete/search functions, and I assume that you don't want to increase the space complexity either.
Let us consider a generic node r, with a left subtree r.L and a right subtree r.R; we extend the information in node r with an additional number r.x, defined as the minimum of:
(only if r.L is not empty) the gap between r's value and the value of the rightmost node of r.L (the maximum of r.L)
(only if r.L is deeper than 1) the x value of the r.L root node
(only if r.R is not empty) the gap between r's value and the value of the leftmost node of r.R (the minimum of r.R)
(only if r.R is deeper than 1) the x value of the r.R root node
(or undefined if none of the previous conditions holds, as in the case of a leaf node)
Additionally, in order to make insert/delete fast, we need to store in each internal node references to its rightmost and leftmost descendants (the subtree maximum and minimum).
You can see that with these additions:
the space complexity increases by a constant factor only
the insert/delete functions need to update the x values and the leftmost/rightmost references of the root of every altered subtree, but it is trivial to implement this so that it needs no more than O(log(n))
the x value of the tree root is the value that the function needs to return, so you can implement it in O(1)
The minimum gap in the tree is the x value of the root node; more specifically, for each subtree, the minimum gap among that subtree's elements alone is the x value of the subtree's root.
The proof of this statement can be made by induction:
Let us consider a tree rooted at the node r, with a left subtree r.L and a right subtree r.R.
The inductive hypothesis is that the x values of the roots of r.L and r.R are the minimum gaps between the node values of their respective subtrees.
It is obvious that the minimum gap can be found by considering only the pairs of values that are adjacent in the sorted order; the pairs formed by values stored in the nodes of r.L have their minimum gap in the r.L root's x value, and the same is true for the right subtree. Given that (any value of a node in r.L) < (the value of r) < (any value of a node in r.R), the only pairs of adjacent values not yet considered are two:
the pair composed of the root node value and the highest value in r.L
the pair composed of the root node value and the lowest value in r.R
The r.L node with the highest value is its rightmost node by the tree's ordering property, and the r.R node with the lowest value is its leftmost node.
Assigning to r.x the minimum of the four values (the r.L root's x value, the r.R root's x value, the gap between r and the maximum of r.L, and the gap between r and the minimum of r.R) therefore assigns the smallest gap between consecutive node values in the whole subtree, which is equivalent to the smallest gap between any possible pair of node values.
The cases where one or both of the subtrees are empty are trivial.
For the base cases of a tree made of only one or three nodes, it is trivial to see that the x value of the tree root is the minimum gap value.
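Here is a small Python sketch of this augmentation, under the assumption that each node simply caches its subtree minimum, maximum, and minimum gap (the leftmost/rightmost references above can equally be cached values). update() is the hook an AVL insert/delete would call bottom-up on every node whose subtree changed, right where heights are already recomputed, so the extra work stays within O(log n) per operation.

    import math

    class AVLNode:
        def __init__(self, key):
            self.key = key
            self.left = None
            self.right = None
            self.height = 1
            self.min = key                # smallest key in this subtree
            self.max = key                # largest key in this subtree
            self.min_gap = math.inf       # minimum gap inside this subtree

        def update(self):
            # Recompute the cached fields from the (already updated) children.
            self.min = self.left.min if self.left else self.key
            self.max = self.right.max if self.right else self.key
            candidates = []
            if self.left:
                candidates += [self.left.min_gap, self.key - self.left.max]
            if self.right:
                candidates += [self.right.min_gap, self.right.min - self.key]
            self.min_gap = min(candidates) if candidates else math.inf

    def get_min_gap(root):
        # O(1): the answer is cached at the root.
        return root.min_gap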
This function might be helpful for you:
int getMinGap(Node N)
{
    int a = Integer.MAX_VALUE, b = Integer.MAX_VALUE, c = Integer.MAX_VALUE, d = Integer.MAX_VALUE;
    if (N.left != null) {
        a = N.left.minGap;
        c = N.key - N.left.max;
    }
    if (N.right != null) {
        b = N.right.minGap;
        d = N.right.min - N.key;
    }
    int minGap = Math.min(a, Math.min(b, Math.min(c, d)));
    return minGap;
}
Here is the Node data structure:
class Node
{
    int key, height, num, sum, min, max, minGap;
    Node left, right;

    Node(int d)
    {
        key = d;
        height = 1;
        num = 1;
        sum = d;
        min = d;
        max = d;
        minGap = Integer.MAX_VALUE;
    }
}