Find the minimum gap between two numbers in an AVL tree - search

I have a data structures homework assignment: in addition to the regular AVL tree functions, I have to add a function that returns the minimum gap between any two numbers in the AVL tree (the nodes in the AVL tree represent numbers).
Let's say we have the numbers (as nodes) 1 5 12 20 23 21 in the AVL tree; the function should return the minimum gap between any two numbers. In this case it should return "1", which is |20-21| or |21-20|.
It should be done in O(1).
I have thought a lot about it, and I know there is a trick, but I just couldn't find it; I have spent hours on this.
There was another task, which was to find the maximum gap. That one is easy: it is the difference between the minimal and maximal number.

You need to extend the data structure; otherwise you cannot obtain an O(1) lookup of the minimum gap between the numbers stored in the tree.
You have the additional constraint of not increasing the time complexity of the insert/delete/search functions, and I assume that you don't want to increase the space complexity either.
Let us consider a generic node r, with a left subtree r.L and a right subtree r.R; we extend the information in node r with an additional number r.x, defined as the minimum of:
(only if r.L is not empty) the difference between the value of r and the value of the rightmost node of r.L
(only if r.L is deeper than 1) the x value of the r.L root node
(only if r.R is not empty) the difference between the value of the leftmost node of r.R and the value of r
(only if r.R is deeper than 1) the x value of the r.R root node
(or undefined if none of the previous conditions applies, as in the case of a leaf node)
Additionally, to make insert/delete fast, each internal node needs references to the leftmost and rightmost nodes of its subtree. (These are the nodes holding the smallest and largest values; they are not necessarily leaves, since the rightmost node may still have a left child.)
You can see that with these additions:
the space complexity increases by a constant factor only
the insert/delete functions need to update the x values and the leftmost and rightmost references at the root of every altered subtree, but this is straightforward to implement in no more than O(log(n))
the x value of the tree root is the value that the function needs to return, therefore you can implement it in O(1)
The minimum gap in the tree is the x value of the root node; more specifically, for each subtree, the minimum gap among that subtree's elements is the x value of the subtree's root.
This statement can be proved by induction:
Consider a tree rooted at node r, with a left subtree r.L and a right subtree r.R.
The inductive hypothesis is that the x values of the roots of r.L and r.R are the minimum gaps between the node values of those subtrees.
It's clear that the minimum gap can be found by considering only the pairs of nodes whose values are adjacent in the sorted list of values; the pairs formed by values stored in the nodes of r.L have their minimum gap in the r.L root x value, and the same is true for the right subtree. Given that (any value of nodes in r.L) < value of r < (any value of nodes in r.R), the only pairs of adjacent values not yet considered are two:
the pair composed of the root node value and the highest r.L node value
the pair composed of the root node value and the lowest r.R node value
The r.L node with the highest value is its rightmost node, by the binary search tree property, and the r.R node with the lowest value is its leftmost node.
Assigning to r.x the minimum of the four values (the x value of the r.L root, the x value of the r.R root, the gap between r and the highest value in r.L, the gap between r and the lowest value in r.R) is the same as assigning the smallest gap between consecutive node values in the whole tree, which is equivalent to the smallest gap between any possible pair of node values.
The cases where one or both of the subtrees are empty are trivial.
For the base cases of a tree made of only one or three nodes, it is easy to see that the x value of the tree root is the minimum gap value (undefined for a single node).

This function might be helpful for you:
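int getMinGap(Node N)
{
    // a, b: minimum gaps already stored in the left and right subtrees.
    // c, d: gaps between N.key and its in-order neighbours (left max, right min).
    int a = Integer.MAX_VALUE, b = Integer.MAX_VALUE, c = Integer.MAX_VALUE, d = Integer.MAX_VALUE;
    if (N.left != null) {
        a = N.left.minGap;
        c = N.key - N.left.max;
    }
    if (N.right != null) {
        b = N.right.minGap;
        d = N.right.min - N.key;
    }
    // Stays Integer.MAX_VALUE for a leaf, which has no gap to report.
    return Math.min(Math.min(a, b), Math.min(c, d));
}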
Here is the Node data structure:
class Node
{
    int key, height, num, sum, min, max, minGap;
    Node left, right;
    Node(int d)
    {
        key = d;
        height = 1;
        num = 1;
        sum = d;
        min = d;
        max = d;
        minGap = Integer.MAX_VALUE;
    }
}
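As a rough illustration of how the min, max and minGap fields could be kept up to date, here is a small Python sketch. It is not the poster's code: the names refresh and insert are made up for this example, and it uses a plain, unbalanced BST insert to stay short. A real AVL tree would also have to call refresh on the nodes touched by each rotation.

```python
import math


class Node:
    def __init__(self, key):
        self.key = key
        self.left = None
        self.right = None
        self.min = key            # smallest key in this subtree
        self.max = key            # largest key in this subtree
        self.min_gap = math.inf   # no gap exists in a single-node subtree


def refresh(node):
    """Recompute min, max and min_gap of node from its children in O(1)."""
    node.min = node.left.min if node.left else node.key
    node.max = node.right.max if node.right else node.key
    gap = math.inf
    if node.left:
        gap = min(gap, node.left.min_gap, node.key - node.left.max)
    if node.right:
        gap = min(gap, node.right.min_gap, node.right.min - node.key)
    node.min_gap = gap


def insert(node, key):
    """Plain BST insert (no rotations) that refreshes every node on the path."""
    if node is None:
        return Node(key)
    if key < node.key:
        node.left = insert(node.left, key)
    else:
        node.right = insert(node.right, key)
    refresh(node)
    return node


root = None
for value in [1, 5, 12, 20, 23, 21]:
    root = insert(root, value)
print(root.min_gap)  # minimum gap is read straight off the root
```

With the numbers from the question, the gap between 20 and 21 is found when the node holding 20 is refreshed, and it then propagates up to the root.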

Related

Maximum Sum of XOR operation on a selected element with array elements with an optimize approach

Problem: Choose an element from the array so that the sum of its XOR with every element of the array is maximized.
Input for problem statement:
N=3
A=[15,11,8]
Output:
11
Approach:
(15^15)+(15^11)+(15^8)=11
My Code for brute force approach:
def compute(N,A):
    ans=0
    for i in A:
        xor_sum=0
        for j in A:
            xor_sum+=(i^j)
        if xor_sum>ans:
            ans=xor_sum
    return ans
The above approach gives the correct answer, but I want to optimize it to run in O(n) time. Please help me get there.
If you have integers with a fixed (constant) number c of bits, then it should be possible, because O(c) = O(1). For simplicity I assume unsigned integers and n to be odd. If n is even, then we sometimes have to check both paths in the tree (see the solution below). You can adapt the algorithm to cover even n and negative numbers.
find max in array with length n O(n)
if max == 0 return 0 (just 0s in array)
find the position p of the most significant bit of max O(c) = O(1)
p = -1
while (max != 0)
p++
max /= 2
so 1 << p gives a mask for the highest set bit
build a tree where the leaves are the numbers and every level stands for a bit position. If there is an edge to the left from the root, then there is a number that has bit p set; if there is an edge to the right, there is a number that has bit p not set. For the next level we have an edge to the left if there is a number with bit p - 1 set and an edge to the right if bit p - 1 is not set, and so on. This can be done in O(cn) = O(n)
go through the array and count how many times a bit at position i (i from 0 to p) is set => sum array O(cn) = O(n)
assign the root of the tree to node x
now for each i from p to 0 do the following:
if x has only one edge => x becomes its only child node
else if sum[i] > n / 2 => x becomes its right child node
else x becomes its left child node
in this step we choose the best path through the tree that gives us the most ones when xoring O(cn) = O(n)
xor all the elements in the array with the value of x and sum them up to get the result; actually, you could already have built the result in the previous step by adding (n - sum[i]) * (1 << i) to the result if going left (bit i of x is set, so the xor has bit i set for every element where it is not set) and sum[i] * (1 << i) if going right O(n)
All the sequential steps are O(n) and therefore overall the algorithm is also O(n).
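The per-bit counts (the sum array above) can also be used without building the tree. The Python sketch below is a simpler variant, not the tree walk described in this answer: it scores every element directly from the counts, which costs O(c * n) for c-bit values. The function name max_xor_sum and the default of 32 bits are assumptions made for the example, and it expects non-negative integers.

```python
def max_xor_sum(values, bits=32):
    """Largest sum of (v ^ a for a in values) over all choices of v in values."""
    n = len(values)
    # ones[i] = how many values have bit i set (the "sum array" of the answer)
    ones = [0] * bits
    for v in values:
        for i in range(bits):
            if (v >> i) & 1:
                ones[i] += 1

    best = 0
    for v in values:
        total = 0
        for i in range(bits):
            if (v >> i) & 1:
                # bit i of v ^ a is set for every a that has bit i clear
                total += (n - ones[i]) << i
            else:
                # bit i of v ^ a is set for every a that has bit i set
                total += ones[i] << i
        best = max(best, total)
    return best


print(max_xor_sum([15, 11, 8]))  # 11, matching the example in the question
```

Because every element is scored exactly, this version does not need the assumption that n is odd.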

TSP / CPP variant - subtour constraint

I'm developing an optimization problem that is a variant on Traveling Salesman. In this case, you don't have to visit all the cities, there's a required start and end point, there's a min and max bound on the tour length, you can traverse each arc multiple times if you want, and you have a nonlinear objective function that is associated with the arcs traversed (and number of times you traverse each arc). Decision variables are integers, how many times you traverse each arc.
I've developed a nonlinear integer program in Pyomo and am getting results from the NEOS server. However I didn't put in subtour constraints and my results are two disconnected subtours.
I can find integer programming formulations of TSP that say how to formulate subtour constraints, but this is a little different from the standard TSP and I'm trying to figure out how to start. Any help that can be provided would be greatly appreciated.
EDIT: problem formulation
50 arcs, not exhaustive pairs between nodes. The 50 decision variables N_ab are integers >= 0 and correspond to how many times you traverse from a to b. There is a length and a profit associated with each N_ab. There are two constraints requiring that the sum of length_ab * N_ab over all ab lies between a min and a max distance. I have a constraint that the sum of N_ab into each node is equal to the sum of N_ab out of the node; you can either not visit a node at all, or visit it multiple times. The objective function is nonlinear and related to the interaction between pairs of arcs (not relevant for subtours).
Subtours: looking at math.uwaterloo.ca/tsp/methods/opt/subtour.htm , the formulation isn't applicable since I am not required to visit all cities, and may not be able to. So for example, let's say I have 20 nodes and 50 arcs (all arcs length 10). Distance constraints are for a tour of exactly length 30, which means I can visit at most three nodes (start at A -> B -> C ->A = length 30). So I will not visit the other nodes at all. TSP subtour elimination would require that I have edges from node subgroup ABC to subgroup of nonvisited nodes - which isn't needed for my problem
Here is an approach that is adapted from the prize-collecting TSP (e.g., this paper). Let V be the set of all nodes. I am assuming V includes a depot node, call it node 1, that must be on the tour. (If not, you can probably add a dummy node that serves this role.)
Let x[i] be a decision variable that equals 1 if we visit node i at least once, and 0 otherwise. (You might already have such a decision variable in your model.)
Add these constraints, which define x[i]:
x[i] <= sum {j in V} N[i,j] for all i in V
M * x[i] >= N[i,j] for all i, j in V
In other words: x[i] cannot equal 1 if there are no edges coming out of node i, and x[i] must equal 1 if there are any edges coming out of node i.
(Here, N[i,j] is the number of times we go from i to j, as in your formulation, and M is a sufficiently large number, perhaps equal to the maximum number of times you can traverse one edge.)
Here is the subtour-elimination constraint, defined for all subsets S of V such that S includes node 1, and for all nodes i in V \ S:
sum {j in S, k in V \ S} (N[j,k] + N[k,j]) >= 2 * x[i]
In other words, if we visit node i, which is not in S, then there must be at least two edges into or out of S. (A subtour would violate this constraint for S equal to the nodes that are on the subtour that contains 1.)
We also need a constraint requiring node 1 to be on the tour:
x[1] = 1
I might be playing a little fast and loose with the directional indices, i.e., I'm not sure if your model sets N[i,j] = N[j,i] or something like that, but hopefully the idea is clear enough and you can modify my approach as necessary.
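Since there are exponentially many subsets S, these constraints are usually added only when a solution violates them. As a hedged sketch of that check (the function name split_by_depot and the dictionary layout of N are assumptions for this example, not part of Pyomo or of the model above), the following Python code takes a solved N and the depot, and returns the set S of nodes connected to the depot plus any visited nodes left outside it. If the second list is non-empty, no arc joins S to the rest, so the constraint for that S and any stranded node i is violated and can be added before re-solving.

```python
from collections import defaultdict, deque


def split_by_depot(N, depot):
    """N maps (i, j) -> number of traversals. Returns (S, stranded)."""
    neighbours = defaultdict(set)
    for (i, j), count in N.items():
        if count > 0:
            neighbours[i].add(j)
            neighbours[j].add(i)
    used = set(neighbours)  # every node touched by at least one traversed arc

    # Breadth-first search over traversed arcs, ignoring direction.
    S = {depot}
    queue = deque([depot])
    while queue:
        u = queue.popleft()
        for v in neighbours.get(u, ()):
            if v not in S:
                S.add(v)
                queue.append(v)
    return S, sorted(used - S)


solution = {("A", "B"): 1, ("B", "A"): 1, ("C", "D"): 2, ("D", "C"): 2, ("A", "C"): 0}
S, stranded = split_by_depot(solution, "A")
print(S, stranded)  # C and D form a subtour that never reaches the depot
```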

Dynamic programming algorithm to find palindromes in a directed acyclic graph

The problem is as follows: given a directed acyclic graph, where each node is labeled with a character, find all the longest paths of nodes in the graph that form a palindrome.
The initial solution that I thought of was to simply enumerate all the paths in the graph. This effectively generates a bunch of strings, to which we can then apply Manacher's algorithm to find all the longest palindromes. However, this doesn't seem that efficient, since the number of paths in a graph can be exponential in the number of nodes.
Then I started thinking of using dynamic programming directly on the graph, but my problem is that I cannot figure out how to structure my "dynamic programming array". My initial try was to use a 2d boolean array, where array[i][j] == true means that node i to node j is a palindrome, but the problem is that there might be multiple paths from i to j.
I've been stuck on this problem for quite a while now and can't seem to figure it out; any help would be appreciated.
The linear-time trick of Manacher's algorithm relies on the fact that if you know that the longest palindrome centered at character 15 has length 5 (chars 13-17), and there's a palindrome centered at character 19 of length 13 (chars 13-25), then you can skip computing the longest palindrome centered at character 23 (23 = 19 + (19 - 15)) because you know it's just going to be the mirror of the one centered at character 15.
With a DAG, you don't have that kind of guarantee because the palindromes can go off in any direction, not just forwards and backwards. However, if you have a candidate palindrome path from node m to node n, whether you can extend that string to a longer palindrome doesn't depend on the path between m and n, but only on m and n (and the graph itself).
Therefore, I'd do this:
First, sort the graph nodes topologically, so that you have an array s[] of node indexes, and there being an edge from s[i] to s[j] implies that i < j.
I'll also assume that you build up an inverse array or hash structure sinv[] such that s[sinv[j]] == j and sinv[s[n]] == n for all integers j in 0..nodeCount-1 and all node indexes n.
Also, I'll assume that you have functions graphPredecessors, graphSuccessors, and graphLetter that take a node index and return the list of predecessors on the graph, the list of successors, or the letter at that node, respectively.
Then, make a two-dimensional array of integers of size nodeCount by nodeCount called r. When r[i][j] = y, and y > 0, it will mean that if there is a palindrome path from a successor of s[i] to a predecessor of s[j], then that path can be extended by adding s[i] to the front and s[j] to the back, and that the extension can be continued by y more nodes (including s[i] and s[j]) in each direction:
for (i=0; i < nodeCount; i++) {
    for (j=i; j < nodeCount; j++) {
        if (graphLetter(s[i]) == graphLetter(s[j])) {
            r[i][j] = 1;
            for (pred in graphPredecessors(s[i])) {
                for (succ in graphSuccessors(s[j])) {
                    /* note that by our sorting, sinv[pred] < i <= j < sinv[succ] */
                    if (r[sinv[pred]][sinv[succ]] >= r[i][j]) {
                        r[i][j] = 1 + r[sinv[pred]][sinv[succ]];
                    }
                }
            }
        } else {
            r[i][j] = 0;
        }
    }
}
Then find the maximum value of r[x][x] for x in 0..nodeCount-1, and of r[lft][rgt] where there is an edge from s[lft] to s[rgt]. Call that maximum value M, and say you found it at location [i][j]. Each such i, j pair will represent the center of a longest palindrome path. As long as M is greater than 1, you then extend each center by finding a pred in graphPredecessors(s[i]) and a succ in graphSuccessors(s[j]) such that r[sinv[pred]][sinv[succ]] == M - 1 (the palindrome is now pred->s[i]->s[j]->succ). You then extend that by finding the appropriate index with an r value of M - 2, etc., stopping when you reach a spot where the value in r is 1.
I think this algorithm overall ends up with a runtime of O(V^2 + E^2), but I'm not entirely certain of that.
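To make the table concrete, here is a Python sketch of the same recurrence that returns only the length (in nodes) of a longest palindrome path, leaving out the reconstruction step. The function name and the input format (a dict of node letters plus an edge list) are assumptions for this example; the topological order comes from the standard-library graphlib module (Python 3.9+). A center r[x][x] = y corresponds to 2*y - 1 nodes, and a center on an edge corresponds to 2*y nodes.

```python
from graphlib import TopologicalSorter  # standard library, Python 3.9+


def longest_palindrome_length(letters, edges):
    """letters: dict node -> character; edges: list of (u, v) arcs of a DAG."""
    preds = {n: [] for n in letters}
    succs = {n: [] for n in letters}
    for u, v in edges:
        succs[u].append(v)
        preds[v].append(u)

    # s is a topological order; sinv maps a node back to its index in s.
    s = list(TopologicalSorter(preds).static_order())
    sinv = {node: k for k, node in enumerate(s)}
    count = len(s)

    r = [[0] * count for _ in range(count)]
    for i in range(count):
        for j in range(i, count):
            if letters[s[i]] != letters[s[j]]:
                continue  # r[i][j] stays 0
            r[i][j] = 1
            for pred in preds[s[i]]:
                for succ in succs[s[j]]:
                    # sinv[pred] < i <= j < sinv[succ], so this entry is final
                    r[i][j] = max(r[i][j], 1 + r[sinv[pred]][sinv[succ]])

    best = 0
    for x in range(count):          # odd-length palindromes, centered on a node
        best = max(best, 2 * r[x][x] - 1)
    for u, v in edges:              # even-length palindromes, centered on an edge
        best = max(best, 2 * r[sinv[u]][sinv[v]])
    return best


print(longest_palindrome_length({1: "a", 2: "b", 3: "a"}, [(1, 2), (2, 3)]))  # 3
```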

Finding median in AVL tree

I have an AVL tree in which I want to return the median element in O(1).
I know I can save a pointer to it every time I insert new element without changing the runtime of the insertion (by saving the size of the subtrees and traversing until I find the n/2'th size subtree).
But I want to know if I can do this using the fact that in every insertion the median shifts "to the right", and in every deletion the median shifts "to the left".
In a more general manner: How can I keep track of the i'th element in an AVL tree using predecessor and successor?
Given an AVL tree (a self-balancing binary search tree), find the median. Remember that you can't just take the root element even if the tree is balanced, because balance alone doesn't tell you whether the median is the root or lies in its left or right subtree. The following iterative algorithm finds the median of an AVL tree. It relies on a property of every binary search tree: an in-order traversal yields the elements in sorted order. Using this property we can get a sorted collection of nodes and then pick the median from it. The complexity of this algorithm is O(N) in both time and space, where N is the number of nodes in the tree.
import java.util.List;
public class AvlTreeMedian {
    BinaryTreeInOrder binaryTreeInOrder;
    public AvlTreeMedian() {
        this.binaryTreeInOrder = new BinaryTreeInOrder();
    }
    public double find(BinaryNode<Integer> root) {
        if (root == null) {
            throw new IllegalArgumentException("You can't pass a null binary tree to this method.");
        }
        // In-order traversal of a search tree yields the elements in sorted order.
        List<BinaryNode<Integer>> sortedElements = binaryTreeInOrder.getIterative(root);
        double median = 0;
        if (sortedElements.size() % 2 == 0) {
            // Divide by 2.0 so the average of the two middle values is not truncated.
            median = (sortedElements.get(sortedElements.size() / 2).getData() + sortedElements.get(
                sortedElements.size() / 2 - 1).getData()) / 2.0;
        } else {
            median = sortedElements.get(sortedElements.size() / 2).getData();
        }
        return median;
    }
}
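The question also mentions storing subtree sizes. A minimal Python sketch of that idea follows, assuming each node keeps a size field that the tree's insert, delete and rotation code maintains; the class and function names are invented for this example. It finds the k-th smallest key, and hence the median, in time proportional to the tree height, so O(log n) on an AVL tree rather than the O(N) of the traversal above. It does not give the O(1) lookup asked for, which would need a cached pointer to the median.

```python
class Node:
    def __init__(self, key, left=None, right=None):
        self.key = key
        self.left = left
        self.right = right
        # Assumed to be kept up to date by the tree's insert/delete/rotations.
        self.size = 1 + size(left) + size(right)


def size(node):
    return node.size if node else 0


def select(node, k):
    """Return the k-th smallest key (0-based) of the subtree rooted at node."""
    while node:
        left_size = size(node.left)
        if k < left_size:
            node = node.left
        elif k == left_size:
            return node.key
        else:
            k -= left_size + 1   # skip the left subtree and this node
            node = node.right
    raise IndexError("k out of range")


def median(root):
    n = size(root)
    if n % 2:
        return select(root, n // 2)
    return (select(root, n // 2 - 1) + select(root, n // 2)) / 2


odd = Node(4, Node(2, Node(1), Node(3)), Node(6, Node(5), Node(7)))
even = Node(2, Node(1), Node(4, Node(3)))
print(median(odd), median(even))  # 4 2.5
```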

How to split a rope tree?

I came across a rope tree as an alternative data structure for a string.
http://en.wikipedia.org/wiki/Rope_(data_structure)
To concat is easy, but I am stuck on the split operation. The wikipedia article states:
For example, to split the 22-character rope pictured in Figure 2.3 into two equal component ropes of length 11, query the 12th character to locate the node K at the bottom level. Remove the link between K and the right child of G. Go to the parent G and subtract the weight of K from the weight of G. Travel up the tree and remove any right links, subtracting the weight of K from these nodes (only node D, in this case). Finally, build up the newly-orphaned nodes K and H by concatenating them together and creating a new parent P with weight equal to the length of the left node K.
Locating the character and recombining the orphans is no problem. But I don't understand the "Travel up the tree and remove any right links, subtracting the weight of K from these nodes". The example stops at D, but if you follow these instructions verbatim you would continue onto B and remove D as well. What is the correct stopping requirement in this algorithm? And how do you avoid nodes with only one (left or right) child?
A pseudo-code algorithm explaining this part would help tremendously.
The wikipedia article is not very explicit. If the current node is X and its parent is Y you would only travel up if X is the left child of Y. Visually you're going up and to the right as far as you can.
I give Ruby code below. It's as close to executable pseudocode as you'll get. If Ruby isn't right for your implementation, you can always use it as a prototype. If you don't like the recursion, it's easy enough to use standard transformations to make it iterative with an explicit stack.
The natural implementation of split is recursive. The interesting case is near the bottom of the code.
class Rope
  # Cat two ropes by building a new binary node.
  # The parent count is the left child's length.
  def cat(s)
    if self.len == 0
      s
    elsif s.len == 0
      self
    else
      Node.new(self, s, len)
    end
  end
  # Insert a new string into a rope by splitting it
  # and concatenating twice with a new leaf in the middle.
  def insert(s, p)
    a, b = split_at(p)
    a.cat(Leaf.new(s)).cat(b)
  end
end
class Leaf < Rope
  # A leaf holds characters as a string.
  attr_accessor :chars
  # Construct a new leaf with given characters.
  def initialize(chars)
    @chars = chars
  end
  # The count in this rope is just the number of characters.
  def count
    chars.length
  end
  # The length is kind of obvious.
  def len
    chars.length
  end
  # Convert to a string by just returning the characters.
  def to_s
    chars
  end
  # Split by dividing the string.
  def split_at(p)
    [ Leaf.new(chars[0...p]), Leaf.new(chars[p..-1]) ]
  end
end
class Node < Rope
  # Fields of the binary node.
  attr_accessor :left, :right, :count
  # Construct a new binary node.
  def initialize(left, right, count)
    @left = left
    @right = right
    @count = count
  end
  # Length is our count plus right subtree length. Memoize for efficiency.
  def len
    @len ||= count + right.len
  end
  # The string rep is just concatenating the children's string reps.
  def to_s
    left.to_s + right.to_s
  end
  # Split recursively by splitting the left or right
  # subtree and recombining the results.
  def split_at(p)
    if p < count
      a, b = left.split_at(p)
      [ a, b.cat(right) ]
    elsif p > count
      a, b = right.split_at(p - count)
      [ left.cat(a), b ]
    else
      [ left, right ]
    end
  end
end
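For readers who prefer Python, the same recursive split can be sketched as below. This is an illustrative port rather than a separate algorithm: the names Leaf, Node, cat and weight are chosen for the example, and weight plays the role of count in the Ruby code (the length of the left subtree). The sample rope has 22 characters and is split at index 11, like the Wikipedia example from the question, although its tree shape is not the one in that figure.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Leaf:
    chars: str

    def length(self):
        return len(self.chars)

    def text(self):
        return self.chars

    def split_at(self, p):
        # Split by dividing the string.
        return Leaf(self.chars[:p]), Leaf(self.chars[p:])


@dataclass(frozen=True)
class Node:
    left: object
    right: object
    weight: int  # length of the left subtree

    def length(self):
        return self.weight + self.right.length()

    def text(self):
        return self.left.text() + self.right.text()

    def split_at(self, p):
        if p < self.weight:
            a, b = self.left.split_at(p)
            return a, cat(b, self.right)
        if p > self.weight:
            a, b = self.right.split_at(p - self.weight)
            return cat(self.left, a), b
        return self.left, self.right


def cat(a, b):
    """Concatenate two ropes, dropping empty ones so no one-child nodes arise."""
    if a.length() == 0:
        return b
    if b.length() == 0:
        return a
    return Node(a, b, a.length())


rope = cat(cat(Leaf("Hello_"), Leaf("my_")), cat(Leaf("name_"), Leaf("is_Simon")))
head, tail = rope.split_at(11)
print(head.text(), tail.text())  # Hello_my_na me_is_Simon
```

Because cat discards empty pieces, the result never contains a node with only one child, which is one answer to the second question above.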
After some tinkering and consideration, I think the rules should be as follows.
First, determine the starting point for traveling upwards:
A) If you end up in the middle of a node (node A), split the string at the appropriate character index and create a left and a right node. The parent of these new nodes is node A. The left node is your starting point. The right node will be added to the orphans while traveling up.
B) If you end up at the beginning of a node (character-wise) and this node is a right node: split off this node (=> orphan node) and use the parent node as your starting point.
C) If you end up at the beginning of a node (character-wise) and this node is a left node:
split off this node (=> orphan node) and use this node as your starting point.
D) If you end up at the end of a node (character-wise) and this node is a right node:
use the parent node as your starting point.
E) If you end up at the end of a node (character-wise) and this node is a left node:
use this node as your starting point.
During traveling:
If the node is a left node: move upwards and add its right sibling node to the list of orphans.
If the node is a right node: move upwards (to parent A), but do nothing with the left sibling node. (Or, because all the right nodes up to this point have been orphaned, you could set the starting point as the right node of the parent node A. The parent node A is then your new starting point. This avoids a bunch of nodes with only one child node.)
After traveling:
Concatenate all the accumulated orphans into a new node. This is the right part of your new root node. The left part is the endpoint of your travel sequence.
Please correct me if I am wrong.

Resources