I am trying to figure out how to delete the root node from this BST.
Swap the values of the root with the values of either its predecessor (the maximum value in the left sub-tree) or its successor (the minimum value in the right sub-tree). Once that is done, you perform a delete on the swapped node.
So this would look like this if you used predecessor:
Swap 8 and 9 in the tree
Delete the node that now contains 9
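A minimal Python sketch of the predecessor approach described above. The names `Node`, `delete_root` and `inorder` are hypothetical, and the sample tree in the usage note is made up because the original tree is not shown. The sketch copies the predecessor's value into the root instead of swapping, since the old root value is discarded anyway.

```python
class Node:
    def __init__(self, value, left=None, right=None):
        self.value = value
        self.left = left
        self.right = right


def delete_root(root):
    """Remove the root's value from the BST and return the new root."""
    if root is None:
        return None
    if root.left is None:
        # No predecessor exists; the right subtree becomes the whole tree.
        return root.right
    # The predecessor is the right-most node of the left subtree.
    parent, pred = root, root.left
    while pred.right is not None:
        parent, pred = pred, pred.right
    root.value = pred.value
    # The predecessor has no right child, so splice in its left child.
    if parent is root:
        parent.left = pred.left
    else:
        parent.right = pred.left
    return root


def inorder(node):
    """Return the values in sorted order (used to check the result)."""
    if node is None:
        return []
    return inorder(node.left) + [node.value] + inorder(node.right)
```

For example, with `Node(9, Node(5, Node(3), Node(8, Node(7))), Node(12))`, calling `delete_root` leaves 8 at the root and the in-order traversal stays sorted.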
Note: in the future, please try to do your due diligence in terms of researching your question before posting it here. Here is a simple answer to your question.
In VS2015 there is a useful tool for finding what is holding on to a reference to an object. However, in the spaghetti-like code that I am working in expanding this tree out has many leaves that end with [cycle detected] (e.g. A has a reference to B and B has a reference to A). For example:
Is there a way to filter out the branches that contain just leaves with cycles? Or a way to export the data here?
I am currently storing a large number of unsigned 32-bit integers in a bit trie (effectively forming a binary tree with a node for each bit in the 32-bit value.) This is very efficient for fast lookup of exact values.
I now want to be able to search for keys that may or may not be in the trie and find the value for the first key less than or equal to the search key. Is this efficiently possible with a bit trie, or should I use a different data structure?
I am using a trie due to its speed and cache locality, and ideally want to sacrifice neither.
For example, suppose the trie has two keys added:
0x00AABBCC
0x00AABB00
and I am now searching for a key that is not present, 0x00AABB11. I would like to find the first key present in the tree with a value <= the search key, which in this case would be the node for 0x00AABB00.
While I've thought of a possible algorithm for this, I am seeking concrete information on if it is efficiently possible and/or if there are known algorithms for this, which will no doubt be better than my own.
We can think of a bit trie as a binary search tree. In fact, it is a binary search tree. Take the 32-bit trie for example, and treat the left child as 0 and the right child as 1. At the root, the left subtree holds the numbers less than 0x80000000 and the right subtree holds the numbers no less than 0x80000000, and so on down the levels. So you can use the same method you would use in a binary search tree to find the largest item not larger than the search key. Don't worry about the backtracking: it won't backtrack much and won't change the search complexity.
When the match fails in the bit trie, backtrack to the nearest ancestor (starting with the node where the match failed) at which the search key takes the 1 branch and a 0 branch also exists, then follow the right-most path down that 0 subtree. The leaf you reach is the largest key that is less than or equal to the search key.
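As a rough illustration of that backtracking idea, here is a Python sketch. The node layout (a two-element list `[child0, child1]`) and the names `insert` and `floor_key` are assumptions made for the example, not the asker's implementation. Instead of walking back up, it remembers the deepest 0 branch it passed while following a 1 bit, which is the same ancestor the backtrack would find.

```python
BITS = 32  # key width, as in the question


def insert(root, key):
    """Insert a key into a bit trie whose nodes are [child0, child1] lists."""
    node = root
    for i in range(BITS - 1, -1, -1):
        bit = (key >> i) & 1
        if node[bit] is None:
            node[bit] = [None, None]
        node = node[bit]


def floor_key(root, key):
    """Return the largest stored key <= key, or None if there is none."""
    node, prefix = root, 0
    fallback = None  # (0-subtree, its prefix, bits left below it)
    for i in range(BITS - 1, -1, -1):
        bit = (key >> i) & 1
        if bit == 1 and node[0] is not None:
            # Everything under this 0 branch is smaller than the key.
            fallback = (node[0], prefix << 1, i)
        if node[bit] is None:
            break  # match failed at this level
        node = node[bit]
        prefix = (prefix << 1) | bit
    else:
        return prefix  # all bits matched: the key itself is stored
    if fallback is None:
        return None
    node, prefix, remaining = fallback
    # Take the right-most (largest) path through the remembered subtree.
    for _ in range(remaining):
        bit = 1 if node[1] is not None else 0
        node = node[bit]
        prefix = (prefix << 1) | bit
    return prefix
```

With the two keys from the question inserted, `floor_key(root, 0x00AABB11)` returns `0x00AABB00`. The search touches at most about 2 × 32 nodes, so the cost stays proportional to the key width.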
If the data is static--you're not adding or removing items--then I'd take a good look at using a simple array with binary search. You sacrifice cache locality, but that might not be catastrophic. I don't see cache locality as an end in itself, but rather a means of making the data structure fast.
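For static data, the sorted-array lookup could look roughly like this in Python. The function name `floor_in_sorted` is made up for the example; `bisect.bisect_right` is the standard-library binary search.

```python
import bisect


def floor_in_sorted(sorted_keys, key):
    """Return the largest element <= key in a sorted list, or None."""
    # bisect_right gives the index just past the last element <= key.
    i = bisect.bisect_right(sorted_keys, key)
    return sorted_keys[i - 1] if i else None


keys = sorted([0x00AABBCC, 0x00AABB00])
print(hex(floor_in_sorted(keys, 0x00AABB11)))  # prints 0xaabb00
```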
You might get better cache locality by creating a balanced binary tree in an array. Position 0 is the root node, position 1 is left node, position 2 is right node, etc. It's the same structure you'd use for a binary heap. If you're willing to allocate another 4 bytes per node, you could make it a left-threaded binary tree so that if you search for X and end up at the next larger value, following that left thread would give you the next smaller value. All told, though, I don't see where this can outperform the plain array in the general case.
A lot depends on how sparse your data is and what the range is. If you're looking at a few thousand possible values in the range 0 to 4 billion, then the binary search looks pretty attractive. If you're talking about 500 million distinct values, then I'd look at allocating a bit array (500 megabytes) and doing a direct lookup with linear backward scan. That would give you very good cache locality.
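A small Python sketch of the bit-array idea, using a tiny universe so it runs instantly. The class name `BitSet` and its methods are hypothetical. For the full 32-bit range the `bytearray` would be 2^32 / 8 bytes, and a real implementation would probably skip whole zero bytes during the backward scan rather than test one bit at a time.

```python
class BitSet:
    """One bit per possible key; floor() scans backward from the search key."""

    def __init__(self, universe_size):
        self.bits = bytearray((universe_size + 7) // 8)

    def add(self, key):
        self.bits[key >> 3] |= 1 << (key & 7)

    def contains(self, key):
        return bool(self.bits[key >> 3] & (1 << (key & 7)))

    def floor(self, key):
        """Return the largest stored key <= key, or None."""
        while key >= 0:
            if self.contains(key):
                return key
            key -= 1
        return None
```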
A bit trie walks 32 nodes in the best case when the item is found.
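A million entries in a red-black tree like std::map or java.util.TreeMap would require only about log2(1,000,000), or roughly 20, nodes per query when the tree is well balanced (a red-black tree's height is bounded by about 2 log2(n), so the strict worst case is closer to 40). You also do not always need to go to the bottom of the tree, which makes the average case appealing.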
When backtracking to find <= the difference is even more pronounced.
The fewer entries you have, the better the case for a red-black tree.
At a minimum, I would compare any solution to a red-black tree.
I have a 4x4 undirected graph, with links/paths between each node vertically, horizontally, and diagonally. In my example I've simplified the contents of these nodes to integers. Given a series of numbers of any length, I want to determine if a path exists on the board that consists of these numbers. No node can be used twice. For example, searching 789, 548 and 734 on the graph below would return true, but 111, 7343 and 98989 would return false.
I currently have what is essentially a depth first search, but I realized it is missing some paths. In the above example, 12234 could be missed. If the search starts at 1, moves diagonally to 2, and left to 2, there is nowhere else to go. The search then backtracks, marking the rightmost 2 as visited and blocking the only correct path.
The improvement I've been able to come up with is to add additional state to each node to record the depth at which they were visited. That would eliminate this case, and certainly make it more correct. But this is still a problem for 27979 on the graph above. If the search starts at the left-most 2, goes down and right to 7, up-right to 9, up-left to 7, it will again block the correct path.
It seems like I'm using the wrong kind of search here, but what's the right one?
It seems I've come up with a solution, which I'll share in case someone else comes across this in the future.
On each search I build a tree with bidirectional links, so that at each place the path can go multiple ways it branches, like a breadth-first search. This differs in that each branch of the tree can use the nodes of any other unconnected branch. As each node is added to the tree, I follow the links back to the root and check the node against each link, which eliminates cyclical paths while still allowing nodes to be reused by other paths. Once a branch has reached the desired depth, I backtrack to the root to record the path.
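For comparison, here is a different and more common way to get the same "visited per path, not per search" behaviour: a depth-first search that un-marks a cell when it backtracks. This is not the tree-building approach described above. The function name `path_exists` and the board in the usage note are made up, since the original board is not shown.

```python
def path_exists(board, digits):
    """Return True if `digits` can be traced through adjacent cells
    (including diagonals) without using any cell twice."""
    rows, cols = len(board), len(board[0])

    def search(r, c, index, used):
        if board[r][c] != digits[index]:
            return False
        if index == len(digits) - 1:
            return True
        used.add((r, c))  # mark only for the current path
        for dr in (-1, 0, 1):
            for dc in (-1, 0, 1):
                nr, nc = r + dr, c + dc
                if (dr or dc) and 0 <= nr < rows and 0 <= nc < cols \
                        and (nr, nc) not in used:
                    if search(nr, nc, index + 1, used):
                        return True
        used.discard((r, c))  # un-mark so other paths may use this cell
        return False

    if not digits:
        return False
    return any(search(r, c, 0, set())
               for r in range(rows) for c in range(cols))
```

On a made-up board such as `[[1, 2, 3, 4], [5, 6, 7, 8], [9, 0, 1, 2], [3, 4, 5, 6]]`, the sequence `[7, 8, 2]` is found while `[6, 7, 6]` is not, because the second 6 would have to reuse the first cell.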
I am going through a Udacity course and in one of the lectures (https://www.youtube.com/watch?v=gPQ-g8xkIAQ&feature=player_embedded), the professor gives the function high_common_bits which (taken verbatim from the lecture) looks like this in pseudocode:
function high_common_bits(a,b):
return:
- high order bits that a+b in common
- highest differing bit set
- all remaining bits clear
As an example:
a = 10101
b = 10011
high_common_bits(a,b) => 10100
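One way to write that function in Python, as a sketch of the pseudocode above rather than the lecturer's actual code. Returning `a` unchanged when the inputs are equal is an assumption, since the pseudocode does not cover that case.

```python
def high_common_bits(a, b):
    """Keep the high bits a and b share, set the highest differing bit,
    and clear everything below it."""
    diff = a ^ b
    if diff == 0:
        return a  # assumption: no differing bit, so return the value as is
    top = 1 << (diff.bit_length() - 1)   # highest differing bit
    common = a & ~((top << 1) - 1)       # bits above the differing bit
    return common | top


print(bin(high_common_bits(0b10101, 0b10011)))  # prints 0b10100
```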
He then says that this function is used in highly-optimized implementations of tries. Does anyone happen to know which exact implementation he's referring to?
If you are looking for a highly optimized bitwise compressed trie (aka radix tree), the BSD routing table uses one in its implementation. The code is not easy to read, though.
He was talking about Succinct Tries, tries in which each node requires only two bits to store (the theoretical minimum).
Steve Hanov wrote a very approachable blog post on Succinct Tries here. You can also read the original paper by Guy Jacobson (written as recently as 1989) which introduced them here.
A compressed trie stores a prefix in one node, then branches from that node to each possible item that's been seen that starts with that prefix.
In this case he's apparently doing a bit-wise trie, so it's storing a prefix of bits -- i.e., the bits at the beginning that the items have in common go in one node, then there are two branches from that node, one to a node for the next bit being a 0, and the other for the next bit being a 1. Presumably those nodes will be compressed as well, so they won't just store the next single bit, but instead store a number of bits that all matched in the items inserted into the trie so far.
In fact, the next bit following a given node may not be stored in the following nodes at all. That bit can be implicit in the link that's followed, so the next nodes store only bits after that.
So I need some help brainstorming, from a theoretical standpoint. Right now I have some code that just draws some objects. The objects lie in the leaves of a quadtree. Now as the objects move I want to keep them placed in the correct leaf of the quadtree.
Right now I am just reconstructing the quadtree on the objects after I change their position. I was trying to figure out a way to correct the tree without rebuilding it completely. All I can think of is having a bunch of pointers to adjacent leaf nodes.
Does anyone have an idea of how to figure out the node into which an object moves without just having a ton of pointers everywhere or a link to articles on this? All I could find was different ways to build the quadtree, nothing about updating it.
If I understand your question, you want some way of mapping between spatial coordinates and leaves on the quadtree.
Here's one possible solution I've been looking at:
For simplicity, let's do the 1D case first, and let's assume we have 32 grid points in x. Every grid point then corresponds to some leaf of a tree of depth five (depth 0 = the whole grid, depth 1 = 2 points, depth 2 = 4 points, ..., depth 5 = 32 points).
Each leaf could be represented by the branch indices leading to the leaf. At each level there are two branches we can label A and B. So, a particular leaf might be labeled BBABB, which would mean: go down the B branch, then the B branch, then the A branch, then the B branch, and then the B branch.
So, how do you map e.g. BBABB to an x grid point between 0..31? Just convert it to binary, so that BBABB->11011 = 27. Thus, the mapping from gridpoint to leaf-node is simply a matter of translating the letters A and B into 0s and 1s and then interpreting the result as a binary number.
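The 1D mapping can be sketched in a few lines of Python. The function names `path_to_index` and `index_to_path` are made up for the example.

```python
def path_to_index(path):
    """Read a branch string such as 'BBABB' as a binary number (A=0, B=1)."""
    index = 0
    for letter in path:
        index = (index << 1) | "AB".index(letter)
    return index


def index_to_path(index, depth):
    """Inverse mapping: grid point -> branch string of the given depth."""
    return "".join("AB"[(index >> level) & 1]
                   for level in range(depth - 1, -1, -1))


print(path_to_index("BBABB"))   # prints 27
print(index_to_path(27, 5))     # prints BBABB
```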
For the 2D case, it's only slightly more complicated. Now we have four branches from each node, so we can label each branch path using a four-letter alphabet. For example, starting from the root and taking the third branch, then the fourth, then the first, then the second, and then the second again, we would generate the string CDABB.
Now to convert the string (e.g. 'CDABB') into a pair of gridvalues (x,y).
Let's assume A is lower-left, B is lower right, C is upper left and D is upper right. Then, symbolically, we could write, A.x=0, A.y=0 / B.x=1, B.y=0 / C.x=0, C.y=1 / D.x=1, D.y=1.
Taking the example CDABB, we first look at its x values (CDABB).x = (01011), which gives us the x grid point. And similarly for y.
Finally, if you want to find out e.g. the node immediately to the right of CDABB, then simply convert it to a pair of binary numbers in x and y, add +1 to the x value and convert the new pair of binary numbers back into a string.
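A Python sketch of the 2D mapping and the right-neighbor lookup, using the A/B/C/D layout given above (A lower-left, B lower-right, C upper-left, D upper-right). The function names are hypothetical, and returning `None` at the right edge of the grid is an assumption about how to handle the boundary.

```python
LETTERS = "ABCD"  # position in this string encodes (y bit, x bit): A=00, B=01, C=10, D=11


def path_to_xy(path):
    """Split a branch string into its x and y grid coordinates."""
    x = y = 0
    for letter in path:
        code = LETTERS.index(letter)
        x = (x << 1) | (code & 1)    # low bit is the x choice
        y = (y << 1) | (code >> 1)   # high bit is the y choice
    return x, y


def xy_to_path(x, y, depth):
    """Interleave the bits of x and y back into a branch string."""
    letters = []
    for level in range(depth - 1, -1, -1):
        xbit = (x >> level) & 1
        ybit = (y >> level) & 1
        letters.append(LETTERS[2 * ybit + xbit])
    return "".join(letters)


def right_neighbor(path):
    """Return the leaf one grid step to the right, or None at the edge."""
    x, y = path_to_xy(path)
    if x + 1 >= 1 << len(path):
        return None
    return xy_to_path(x + 1, y, len(path))


print(path_to_xy("CDABB"))      # prints (11, 24)
print(right_neighbor("CDABB"))  # prints CDBAA
```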
I'm sure this has all been discovered, but I haven't yet found this information on the web.
If you have the spatial data necessary to insert an element into the quad-tree in the first place (ex: its point or rectangle), then you have the same data needed to remove it.
An easy way is to remove the element from the quad-tree before you move it, using the same data you used to originally insert it, then move it, then re-insert it.
Removal from the quad-tree can first remove the element from the leaf node(s), then if the leaf nodes become empty, remove them from their parents. If the parents become empty, remove them from their parents, and so forth.
This simple method is efficient enough for a complex world of objects moving every frame as long as you implement the quad-tree efficiently (ex: use a free list for the nodes). There shouldn't have to be a heap allocation on a per-node basis to insert it, nor a heap deallocation involved in removing every single node. Most node allocations/deallocations should be a simple constant-time operation just involving, say, the manipulation of a couple of integers or pointers.
You can also make this a little more complex if you like. You can start off storing the previous position of an object and then move it. If the new position occupies nodes other than the previous position, then remove the object from the nodes it no longer occupies and insert it to the new ones. Otherwise just keep it in the same node(s).
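The "only touch the tree when the occupied node changes" idea can be sketched as follows. To keep the example short, this uses a flat table of leaf cells as a stand-in for a real quadtree, and it handles points only. The class `LeafIndex`, its methods, and the fixed `cell_size` are all assumptions made for illustration.

```python
class LeafIndex:
    """Flat stand-in for a quadtree: maps a leaf cell (col, row) to object ids."""

    def __init__(self, cell_size):
        self.cell_size = cell_size
        self.leaves = {}      # (col, row) -> set of object ids
        self.positions = {}   # object id -> (x, y), the data used to insert

    def _cell(self, x, y):
        return (int(x // self.cell_size), int(y // self.cell_size))

    def insert(self, obj_id, x, y):
        self.positions[obj_id] = (x, y)
        self.leaves.setdefault(self._cell(x, y), set()).add(obj_id)

    def remove(self, obj_id):
        # The stored position is the same data that was used to insert.
        x, y = self.positions.pop(obj_id)
        cell = self._cell(x, y)
        self.leaves[cell].discard(obj_id)
        if not self.leaves[cell]:
            del self.leaves[cell]  # drop the leaf once it is empty

    def move(self, obj_id, new_x, new_y):
        """Move an object; return True only if it changed leaf."""
        old_x, old_y = self.positions[obj_id]
        if self._cell(old_x, old_y) == self._cell(new_x, new_y):
            self.positions[obj_id] = (new_x, new_y)
            return False  # same leaf, nothing to restructure
        self.remove(obj_id)
        self.insert(obj_id, new_x, new_y)
        return True
```

In a real quadtree the cell comparison would be replaced by comparing the set of leaves the old and new bounds overlap, and removing an empty leaf would continue upward to empty parents.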
Update
I usually try to avoid linking my previous answers, but in this case I ended up doing a pretty comprehensive write up on the topic which would be hard to replicate anywhere else. Here it is: https://stackoverflow.com/a/48330314/4842163