Finding median in AVL tree

I have an AVL tree from which I want to return the median element in O(1).
I know I can keep a pointer to it on every insertion without changing the insertion's runtime (by storing subtree sizes and traversing down to the subtree containing the n/2'th element).
But I want to know whether I can do this using the fact that on every insertion the median shifts "to the right", and on every deletion it shifts "to the left".
In a more general manner: how can I keep track of the i'th element in an AVL tree using predecessor and successor?
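The shifting rule described above can be sketched as follows. This is a minimal illustration, not an AVL implementation: java.util.TreeSet stands in for the tree, its lower()/higher() calls stand in for the O(1) predecessor/successor pointers an AVL node would carry, and distinct keys are assumed.

```java
import java.util.TreeSet;

// Sketch: maintain a pointer to the lower median (the element at index
// floor((size - 1) / 2)), shifting it by at most one predecessor/successor
// step per insertion.
class MedianTracker {
    private final TreeSet<Integer> tree = new TreeSet<>();
    private Integer median = null;

    void insert(int x) {
        tree.add(x);
        int size = tree.size();
        if (size == 1) { median = x; return; }
        if (size % 2 == 0) {
            // size was odd before the insert: the median moves left
            // only if the new element landed on its left side
            if (x < median) median = tree.lower(median);
        } else {
            // size was even before the insert: the median moves right
            // only if the new element landed on its right side
            if (x >= median) median = tree.higher(median);
        }
    }

    int median() { return median; }
}
```

Deletion is symmetric: the pointer moves one step in the opposite direction, depending on the parity of the size and which side of the median the removed element was on.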

Given an AVL tree (a self-balancing binary search tree), find the median. Remember that you can't just take the root element: even though the tree is balanced, balance alone doesn't guarantee that the median sits exactly at the root rather than somewhere in its left or right subtree.

Below is an iterative algorithm to find the median of an AVL tree. It is based on a property of every AVL tree: an in-order traversal yields the elements in sorted order. Using this property we can get a sorted collection of the nodes and then pick out the median. The complexity of this algorithm is O(N) in both time and space, where N is the number of nodes in the tree.
public class AvlTreeMedian {

    BinaryTreeInOrder binaryTreeInOrder;

    public AvlTreeMedian() {
        this.binaryTreeInOrder = new BinaryTreeInOrder();
    }

    public double find(BinaryNode<Integer> root) {
        if (root == null) {
            throw new IllegalArgumentException("You can't pass a null binary tree to this method.");
        }
        List<BinaryNode<Integer>> sortedElements = binaryTreeInOrder.getIterative(root);
        double median = 0;
        if (sortedElements.size() % 2 == 0) {
            // Divide by 2.0 so the average of the two middle elements is
            // computed in floating point instead of truncating integer division.
            median = (sortedElements.get(sortedElements.size() / 2).getData()
                    + sortedElements.get(sortedElements.size() / 2 - 1).getData()) / 2.0;
        } else {
            median = sortedElements.get(sortedElements.size() / 2).getData();
        }
        return median;
    }
}


Implement Breadth First Search using Java: I don't understand some of the code

class CheckBFS {
    public static String bfs(Graph g) {
        String result = "";
        // Check whether the graph has no vertices
        if (g.vertices < 1) {
            return result;
        }
        // Boolean array to hold the history of visited nodes (false by default)
        boolean[] visited = new boolean[g.vertices];
        for (int i = 0; i < g.vertices; i++) {
            // Check whether the node has been visited yet
            if (!visited[i]) {
                result = result + bfsVisit(g, i, visited);
            }
        }
        return result;
    }

    public static String bfsVisit(Graph g, int source, boolean[] visited) {
        String result = "";
        // Create a queue for the breadth-first traversal and enqueue the source in it
        Queue<Integer> queue = new Queue<>(g.vertices);
        queue.enqueue(source);
        visited[source] = true;
        // Traverse while the queue is not empty
        while (!queue.isEmpty()) {
            // Dequeue a vertex/node from the queue and add it to the result
            int current_node = queue.dequeue();
            result += String.valueOf(current_node);
            // Get the vertices adjacent to current_node from the adjacency list,
            // and enqueue those that have not been visited yet
            DoublyLinkedList<Integer>.Node temp = null;
            if (g.adjacencyList[current_node] != null)
                temp = g.adjacencyList[current_node].headNode;
            while (temp != null) {
                if (!visited[temp.data]) {
                    queue.enqueue(temp.data);
                    visited[temp.data] = true; // mark the node as visited
                }
                temp = temp.nextNode;
            }
        } // end of while
        return result;
    }

    public static void main(String[] args) {
        Graph g = new Graph(5);
        g.addEdge(0, 1);
        g.addEdge(0, 2);
        g.addEdge(1, 3);
        g.addEdge(1, 4);
        System.out.println("Graph1:");
        g.printGraph();
        System.out.println("BFS traversal of Graph1 : " + bfs(g));
        System.out.println();
        Graph g2 = new Graph(5);
        g2.addEdge(0, 1);
        g2.addEdge(0, 4);
        g2.addEdge(1, 2);
        g2.addEdge(3, 4);
        System.out.println("Graph2:");
        g2.printGraph();
        System.out.println("BFS traversal of Graph2 : " + bfs(g2));
    }
}
Hello guys! I know this may sound stupid, but I am very new to coding and have no one in real life to ask for help.
My question is: could anybody explain the bfsVisit method to me?
Especially the code below. We created a queue to store all the nodes/vertices that we have visited, and the part below is how we "extract" nodes/vertices from the doubly linked list one by one and put them into the queue?
DoublyLinkedList<Integer>.Node temp = null;
if (g.adjacencyList[current_node] != null)
    temp = g.adjacencyList[current_node].headNode;
while (temp != null) {
    if (!visited[temp.data]) {
        queue.enqueue(temp.data);
        visited[temp.data] = true; //Visit the current Node
    }
    temp = temp.nextNode;
}
I am not sure I get the logic of the two methods in class CheckBFS.
Why should we separate the methods into bfs and bfsVisit?
So bfs is the primary method that counts/stores/records the nodes we visited, and bfsVisit is actually the method that helps us traverse the array of linked lists?
Thanks very much!
could anybody explain bfsVisit method to me?
bfsVisit intends to visit all nodes that can be reached from the given source node. Here are some aspects of that part of the algorithm:
visited is important, as it makes sure that the algorithm will not fall into an infinite loop when it encounters a cycle in the graph. Imagine for instance three nodes A, B, and C, where A connects to B, and B connects to C, and C connects to A. We don't want the algorithm to just keep running in circles, revisiting the same nodes over and over again. And so when A is visited the first time, it is marked as visited in the visited boolean array. When the algorithm would meet A again, it would skip it.
queue is something that is typical for breadth-first traversals: it ensures that the nodes that are closest (in terms of edges) to the source node are visited first. So first the nodes that are only one edge apart from the source node are visited, then those that are two edges apart, ...etc. To achieve this order we need a first-in-first-out data structure, which is what queue is.
especially the code below, so we created a queue to store all the nodes/vertices that we have visited and the part below is how we "extract" nodes/ vertices from the doublyLinkedList one by one and put them into the queue?
Yes. The aspect of the doubly linked list is not that important: it happens to be the data structure that the Graph class uses to give access to a node's neighboring nodes. But that is an implementation detail of Graph and is not essential for the breadth-first algorithm. What matters is that somehow you can get hold of the neighboring nodes. This could have been a vector or an array, or any collection data type that could provide you with the neighbors. It is determined by the Graph class.
temp represents that list of neighbors. I find the name of that variable not helpful. Its name could have been more descriptive, like neighbor.
That inner loop intends to loop over all direct neighbors of the given node current_node. For each neighbor it is first checked that it was not visited before; in that case it is marked as visited and put at the end of the queue. The queue will hold it until it becomes the first in the queue, at which point it will play the role of current_node and expand to its own neighbors, etc.
temp = temp.nextNode just picks the next neighbor from the list of neighbors. As I stated before, this is an implementation detail and would have looked different if the collection of neighbors had been provided in a different data structure. What matters is that this loop goes through the list of neighbors and deals with those that were not visited before.
why should we separate the methods into bfs and bfsVisit?
Some of the reasons:
bfsVisit can only visit nodes that are connected to the source node. But the graph may consist of several disconnected components, in which case it is impossible for a single call to bfsVisit to reach all nodes of the graph. That's why we have the loop in bfs: it makes sure to run bfsVisit on each component of the graph; otherwise we would only have visited one component.
bfs actually does not perform a BFS traversal itself. I should stress here that the name bfs for the main function is thus a bit misleading: BFS is a term that only makes sense for the collection of nodes in one component. The loop we find in bfs would actually look exactly the same if the purpose was to perform a dfs traversal! That loop's purpose is only to ensure that all nodes of a graph are visited even when the graph has multiple components. Notice how bfs does not use a queue for its own purposes. It just iterates the nodes in their numerical order without any regard of edges. This is entirely different from bfsVisit which looks for connected nodes, following edges.
bfsVisit is also a kind of helper function: it receives arguments that the main function bfs does not receive itself but manages on its own:
visited is the most important of those. bfs starts out by the assumption that no nodes have been visited at all. bfsVisit however needs to tell bfs which nodes it had visited once it completes.
source is the starting point for bfsVisit: it is always a node belonging to a graph component that was not visited before. This is different from bfs, which itself chooses which nodes to visit without being given any specific node by the caller.
I hope this clarifies it.

Stacking and dynamic programming

Basically I'm trying to solve this problem:
Given N unit cube blocks, find the smallest number of piles to make in order to use all the blocks. A pile is either a cube or a pyramid. For example, two valid piles are the cube 4*4*4 = 64 using 64 blocks, and the pyramid 1²+2²+3²+4² = 30 using 30 blocks.
However, I can't find the right angle to approach it. I feel like it's similar to the knapsack problem, but I couldn't find an implementation.
Any help would be much appreciated!
First I will give a recurrence relation which permits solving the problem recursively. Given N, let
CUBE_NUMS
PYRAMID_NUMS
be the subsets of cube numbers (k³, the sizes of valid cube piles) and square pyramidal numbers (1²+⋯+k², the sizes of valid pyramid piles) in {1,...,N} respectively. Let PERMITTED_SIZES be the union of these. Note that, as 1 occurs in PERMITTED_SIZES, any instance is feasible and yields a nonnegative optimum.
The following function in pseudocode will solve the problem in the question recursively.
int MinimumNumberOfPiles(int N)
{
    if (N == 0) return 0; // base case: no blocks left
    int Result = 1 + min { MinimumNumberOfPiles(N-i) }
                 where i in PERMITTED_SIZES and i not larger than N;
    return Result;
}
The idea is to choose a permitted bin size for the items, remove these items (which makes the problem instance smaller) and solve recursively for the smaller instances. To use dynamic programming in order to circumvent multiple evaluation of the same subproblem, one would use a one-dimensional state space, namely an array A[N] where A[i] is the minimum number of piles needed for i unit blocks. Using this state space, the problem can be solved iteratively as follows.
for (int i = 0; i <= N; i++)
{
    if i is 0, set A[i] to 0;
    else if i occurs in PERMITTED_SIZES, set A[i] to 1;
    else set A[i] to positive infinity;
}
This initializes the states which are known beforehand and correspond to the base cases in the above recursion. Next, the missing states are filled using the following loop.
for (int i = 0; i <= N; i++)
{
    if (A[i] is positive infinity)
    {
        A[i] = 1 + min { A[i-j] : j is in PERMITTED_SIZES and j is smaller than i }
    }
}
The desired optimal value will be found in A[N]. Note that this algorithm only calculates the minimum number of piles, but not the piles themselves; if a suitable partition is needed, it has to be found either by backtracking or by maintaining additional auxiliary data structures.
In total, provided that PERMITTED_SIZES is known, the problem can be solved in O(N²) steps, since PERMITTED_SIZES contains at most N values (in fact it contains only O(N^(1/3)) values, so this bound is quite loose).
The problem can be seen as an adaptation of the Rod Cutting Problem where each permitted (cube or pyramid) size has value 1, every other size is disallowed, and the objective is to minimize the total value.
An additional computation cost is necessary to generate PERMITTED_SIZES from the input.
More precisely, the corresponding choice of piles, once A is filled, can be generated using backtracking as follows.
int i = N; // i is the total amount still to be distributed
while ( i > 0 )
{
    choose j such that
        j is in PERMITTED_SIZES and j is not larger than i
        and
        A[i] == 1 + A[i-j]; // i.e. find out how the value in A[i] was generated
    Output "Take a pile of size " + j; // or just output j, which is the pile size
    set i = i-j; // decrease amount to distribute
}
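The recurrence and the table-filling loop above can be sketched concretely. This is a minimal Java version (class and method names are my own) that also generates PERMITTED_SIZES from the problem statement:

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class MinimumPiles {
    // A[i] = minimum number of piles that use exactly i blocks.
    static int minimumNumberOfPiles(int N) {
        // PERMITTED_SIZES: cube numbers k^3 and square pyramidal
        // numbers 1^2 + ... + k^2, both up to N.
        List<Integer> permitted = new ArrayList<>();
        for (int k = 1; k * k * k <= N; k++) permitted.add(k * k * k);
        for (int k = 1, s = 1; s <= N; k++, s += k * k) permitted.add(s);

        int[] A = new int[N + 1];
        Arrays.fill(A, Integer.MAX_VALUE);
        A[0] = 0; // base case: zero blocks need zero piles
        for (int i = 1; i <= N; i++)
            for (int j : permitted)
                if (j <= i && A[i - j] != Integer.MAX_VALUE)
                    A[i] = Math.min(A[i], 1 + A[i - j]);
        return A[N];
    }
}
```

For instance, for N = 94 this returns 2 (the cube of 64 plus the pyramid of 30).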

search NSDictionary for latitude longitude with a certain distance

I have an NSDictionary of about 2000 locations with lat and long, and I am dropping pins on the map based on whether they are in the visible map region.
Currently, every time I pan the map I simply loop through my dictionary and calculate the distance to see if the location is visible; if so, I drop a pin.
CLLocationCoordinate2D centre = [self.map centerCoordinate];
CLLocation *mapCenter = [[CLLocation alloc] initWithLatitude:centre.latitude longitude:centre.longitude];
for (int i = 0; i < [self.dealersSource count]; i++) {
    CLLocation *d = [[CLLocation alloc] initWithLatitude:[[[self.dealersSource objectAtIndex:i] valueForKey:@"lat"] floatValue]
                                               longitude:[[[self.dealersSource objectAtIndex:i] valueForKey:@"long"] floatValue]];
    CLLocationDistance distance = [d distanceFromLocation:mapCenter];
    float dist = (distance / 1609.344); // metres to miles
    if (dist <= radius && dist != 0) {
        // this will be visible on the map, add to list of annotations
    }
}
This works but seems pretty inefficient and can be slow on older iPads - especially if more and more locations get added to this list. I would like to be able to use some sort of NSPredicate to filter my initial list before I start looping though them.
There is not really any standard Objective-C structure that is well suited to finding values within a range -- you pretty much have to search one by one (though you can use "predicates" to "hide" the search inside filteredArray... operations, etc., and so write fewer lines of code).
The best structure for efficiently finding values between bounds on a line is probably an array sorted on the values, searched with a binary search algorithm. You'd do one binary search for the lower bound and another for the upper bound. This is O(log n) complexity, so fairly efficient for large lists (if you don't have to sort the lists very often).
Precisely how one would do this for a 2-D surface is harder to figure out. Perhaps first use the above technique to find "candidates" in the X direction, then check their Y coordinates. That would not be O(log n), though.
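The two-binary-search idea can be sketched like this (a Java sketch for brevity, since the rest of this page is Java; the class and method names are my own). Assume the latitudes are kept in an array sorted ascending:

```java
public class RangeSearch {
    // First index whose value is >= key.
    static int lowerBound(double[] a, double key) {
        int lo = 0, hi = a.length;
        while (lo < hi) {
            int mid = (lo + hi) >>> 1;
            if (a[mid] < key) lo = mid + 1; else hi = mid;
        }
        return lo;
    }

    // First index whose value is > key.
    static int upperBound(double[] a, double key) {
        int lo = 0, hi = a.length;
        while (lo < hi) {
            int mid = (lo + hi) >>> 1;
            if (a[mid] <= key) lo = mid + 1; else hi = mid;
        }
        return lo;
    }
}
```

Candidates whose latitude lies in [minLat, maxLat] then occupy the index range [lowerBound(lats, minLat), upperBound(lats, maxLat)), and only that slice needs the per-element longitude/distance check.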

Find the minimum gap between two numbers in an AVL tree

I have a data structures homework assignment: in addition to the regular AVL tree functions, I have to add a function that returns the minimum gap between any two numbers in the AVL tree (the nodes in the AVL tree represent numbers).
Let's say we have the numbers (as nodes) 1 5 12 20 23 21 in the AVL tree; the function should return the minimum gap between any two of them. In this situation it should return 1, which is |21 - 20| or |20 - 21|.
It should be done in O(1).
I've tried to think a lot about it, and I know there is a trick, but I just couldn't find it; I have spent hours on this.
There was another task, finding the maximum gap, which is easy: it is the difference between the minimal and maximal numbers.
You need to extend the data structure; otherwise you cannot obtain O(1) retrieval of the minimum gap between the numbers composing the tree.
You have the additional constraint of not increasing the time complexity of the insert/delete/search functions, and I assume you don't want to increase the space complexity either.
Consider a generic node r, with a left subtree r.L and a right subtree r.R; we extend node r with an additional number r.x, defined as the minimum of:
(only if r.L is not empty) the gap between r's value and the value of the rightmost node of r.L
(only if r.L has more than one node) the x value of r.L's root
(only if r.R is not empty) the gap between r's value and the value of the leftmost node of r.R
(only if r.R has more than one node) the x value of r.R's root
(x is undefined if none of the previous conditions holds, i.e. for a leaf node)
Additionally, in order to make insert/delete fast, we store in each internal node references to the smallest and largest nodes of its subtree (its leftmost and rightmost nodes).
You can see that with these additions:
the space complexity increase by a constant factor only
the insert/delete functions need to update the x values and the leftmost/rightmost references of the roots of every altered subtree, but it is straightforward to implement this in no more than O(log(n)) time
the x value of the tree root is the value that the function needs to return, therefore you can implement it in O(1)
The minimum gap in the tree is the x value of the root node, more specifically, for each subtree the minimum gap in the subtree elements only is the subtree root x value.
The proof of this statement can be made by induction:
Consider a tree rooted at the node r, with a left subtree r.L and a right subtree r.R.
The inductive hypothesis is that the x values of the roots of r.L and r.R are the minimum gaps between the node values of their respective subtrees.
It's clear that the minimum gap can be found by considering only pairs of values that are adjacent in the sorted order of all values. The pairs formed by values stored in the nodes of r.L have their minimum gap in the x value of r.L's root, and the same is true for the right subtree. Given that (any value of nodes in r.L) < r's value < (any value of nodes in r.R), the only pairs of adjacent values not yet considered are two:
the pair composed of r's value and the largest value in r.L
the pair composed of r's value and the smallest value in r.R
By the BST ordering properties, the largest value in r.L is stored at its rightmost node, and the smallest value in r.R at its leftmost node.
Assigning to r.x the minimum of the four values (the x value of r.L's root, the x value of r.R's root, the gap between r and r.L's maximum, the gap between r and r.R's minimum) therefore assigns the smallest gap between consecutive node values in the whole tree, which equals the smallest gap between any possible pair of node values.
The cases where one or both subtrees are empty are trivial.
For the base cases of a tree made of only one or three nodes, it is easy to see that the x value of the tree root is the minimum gap value.
This function might be helpful for you:
// Recomputes the minimum gap for node N from its (already up-to-date)
// children; store the result in N.minGap while rebalancing bottom-up.
int getMinGap(Node N)
{
    int a = Integer.MAX_VALUE, b = Integer.MAX_VALUE, c = Integer.MAX_VALUE, d = Integer.MAX_VALUE;
    if (N.left != null) {
        a = N.left.minGap;      // best gap inside the left subtree
        c = N.key - N.left.max; // gap between N and its predecessor
    }
    if (N.right != null) {
        b = N.right.minGap;      // best gap inside the right subtree
        d = N.right.min - N.key; // gap between N and its successor
    }
    int minGap = min(a, min(b, min(c, d)));
    return minGap;
}
Here is the Node data structure:
class Node
{
    int key, height, num, sum, min, max, minGap;
    Node left, right;

    Node(int d)
    {
        key = d;
        height = 1;
        num = 1;
        sum = d;
        min = d;
        max = d;
        minGap = Integer.MAX_VALUE;
    }
}
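Putting the pieces together, here is a minimal sketch of how the augmented fields would be refreshed bottom-up after an insert or delete. The class is a stripped-down stand-in for the Node above (no height/rotations; the tree shape is assumed to be maintained by the usual AVL code), and update() folds the getMinGap logic in:

```java
// Sketch: refreshing the augmented fields (min, max, minGap) bottom-up.
// In a real AVL tree, update() would run on every node along the
// insertion/deletion path after rebalancing, at O(1) per node.
class AugmentedNode {
    int key, min, max, minGap = Integer.MAX_VALUE;
    AugmentedNode left, right;

    AugmentedNode(int d) { key = d; min = d; max = d; }

    void update() {
        min = (left != null) ? left.min : key;
        max = (right != null) ? right.max : key;
        int best = Integer.MAX_VALUE;
        if (left != null) {
            best = Math.min(best, left.minGap);
            best = Math.min(best, key - left.max);  // gap to predecessor
        }
        if (right != null) {
            best = Math.min(best, right.minGap);
            best = Math.min(best, right.min - key); // gap to successor
        }
        minGap = best;
    }
}
```

Building the example tree 1 5 12 20 21 23 by hand and calling update() from the leaves upward leaves minGap == 1 at the root, matching the expected answer |21 - 20|.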

What is the complexity of this code?

I have the above model represented in a face table list, where F1, F2, ..., F_n are the faces of the model and their face number is the index into the list array. Each list element is another array of 3 vertices, and each vertex is an array of 3 integers representing its x, y, z coordinates.
I want to find all the neighbouring faces of the vertex with coordinates (x2, y2, z2). I came up with this code that I believe would do the task:
List faceList; // say faceList is the table in the picture above
int[] targetVertex = {x2, y2, z2}; // the vertex I want to find, with coordinates (x2, y2, z2)
List faceIndexFoundList; // the result: a list of indices of the neighbouring faces of targetVertex

for (int i = 0; i < faceList.length; i++) {
    bool vertexMatched = true;
    for (int j = 0; j < faceList[i].length; j++) {
        if (faceList[i][j][0] != targetVertex[0] && faceList[i][j][1] != targetVertex[1] && faceList[i][j][2] != targetVertex[2]) {
            vertexMatched = false;
            break;
        }
    }
    if (vertexMatched == true) {
        faceIndexFoundList.add(i);
    }
}
I was told that the complexity of this task is O(N²). But with the code that I have, it looks like only O(N): each face has only 3 vertices, so the inner loop is merely a constant factor, and I am left with only the outer for loop, which is O(N).
What is the complexity of the code that I have above? What could I have done wrong?
The complexity is (approximately) faceList.length * faceList[i].length. These are independent, but both can grow very large; as they do, each approaches infinity, at which point (conceptually) they both converge on n, giving a complexity of O(n^2).
If the vertex list is explicitly limited to 3, then the complexity becomes faceList.length * 3, which is O(n).
It's pretty obvious that in the worst case you must look at each vertex of each polygon.
This is just O(size of the table) in your post, which in turn is the sum of all row lengths or the sum of all polygon vertex counts, whichever you prefer.
If you say polygons have no more than m vertices and there are n polygons, then the algorithm is O(mn).
FWIW it's possible to get the answer with no searching at all with a more sophisticated data structure. See for example the winged edge data structure and others. In this case, you just go to the vertex you're interested in and traverse the links that connect all adjacent polygons. Cost is constant for each polygon in the output.
These fancier data structures for polygonal meshes support lots of frequently used operations with wonderful efficiency.
From Wikipedia:
Big O notation is used to describe the limiting behavior of a function when the argument tends towards a particular value or infinity.
In this single case you might only be running the one for loop. But what happens when the number of vertices of the polygon approaches infinity? Do the majority of the cases cause the second for loop to run, or to break? This will determine whether your function is O(n) or O(n^2).
