Binary Search Tree Generation from increasing index - search

I have a vector of parent pointers [ 0 1 1 2 2 3 3 5 5 ....] which is basically a binary tree. The index is the child and the corresponding value represents the index of its parent in the same vector.
For example, in the above vector (counting indices from 1), the element at index 5 is 2, which means that its parent lies at index 2. At index 2 the element is 1, which means that its parent lies at index 1. At index 1 the element is 0, which marks the root node.
How can I create a binary search tree from this?
OR,
I am generating data in binary tree format in which I know the parent and corresponding children, how can I store them in a binary search tree?
Index for children will always be greater than the parent, as shown in the vector above.
An example is: I take node 1, divide it into two nodes, 2 and 3. Then take node 2 and divide it into 4 and 5. Then I take node 4 and divide it into 6 and 7 and so on.
I want to keep the parent child relationship in the binary search tree.
Best Regards
Wajahat

Generate a binary tree with empty elements according to your specification in the vector.
Upon new element arrival, find a place to put it: traverse the tree according to binary search tree rules - all the children in the left subtree are smaller than an element and all the children in the right subtree are larger. Fill the node corresponding to the element in the binary tree.
E.g., if you have this tree at some point of time:
and new value 3 arrives, it will fill the right child of node with value 2.
However, if 5 arrives, there is no place to put it in the predefined tree structure.
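A minimal sketch of this idea follows. The names `ShapedTree` and `insert` are hypothetical, and two things are assumptions not stated in the question: the parent vector is padded with a dummy entry at index 0 so that nodes are numbered from 1, and the first child listed for a parent is treated as its left child.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical container: a fixed tree shape whose nodes start out empty.
struct ShapedTree {
    std::vector<int> left, right;  // child node numbers; 0 means "no child in the shape"
    std::vector<int> key;          // value stored in each node (valid only if filled)
    std::vector<bool> filled;      // whether a node already holds a value

    // parent[i] is the parent of node i (1-based); parent[0] is a dummy,
    // and parent[1] == 0 marks the root.
    explicit ShapedTree(const std::vector<int>& parent)
        : left(parent.size(), 0), right(parent.size(), 0),
          key(parent.size(), 0), filled(parent.size(), false) {
        for (std::size_t i = 2; i < parent.size(); ++i) {
            int p = parent[i];
            // Assumption: the first child seen is the left one, the second the right one.
            if (left[p] == 0) left[p] = static_cast<int>(i);
            else right[p] = static_cast<int>(i);
        }
    }

    // Walks down by binary search tree rules and fills the first empty node reached.
    // Returns the node number used, or 0 if the predefined shape has no place for value.
    int insert(int value) {
        int node = key.size() > 1 ? 1 : 0;
        while (node != 0) {
            if (!filled[node]) { key[node] = value; filled[node] = true; return node; }
            if (value == key[node]) return node;  // already present
            node = value < key[node] ? left[node] : right[node];
        }
        return 0;
    }
};
```

With the question's vector, inserting 4 and then 2 fills the root and its left child; a later 3 lands in the right child of the node holding 2, and a value that would need a child the shape does not have is rejected with 0.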

Related

Find largest set of lines which are parallel but not colinear

I have the 5 lines below, with each line giving a, b, c in ax+by+c = 0
1 0 0
1 2 3
3 4 5
30 40 0
30 40 50
I want to find the largest set of non colinear parallel lines from these lines. The result in the above case will be:-
set of 2 lines
3 4 5
30 40 0
The brute force approach would be to go through all the possible combinations which would be O(n*(n+1)/2) and update the largest possible set after each iteration.
Is there any way to find the set size faster?
A solution would be to transform the coordinates into (angle, distance from origin). Finding the largest set will then be O(nlogn).
Assuming that (a,b) are never both 0, find the distance to origin d using d = c/|(a,b)|. Then, find the angle θ using θ = atan2(b,a). You then have a list of coordinates that looks like this:
[[θ0,d0],
[θ1,d1],
...
]
Sort this list using θ as the key.
Remove all elements that you consider colinear given a threshold. Simply scan the list to check whether pairs of consecutive elements have approximately the same value. Do not forget to test the last element against the first element to account for 360° = 0°. If you encounter a colinear pair, remove one of the elements.
Using a minimum and maximum index starting at 0, increase the maximum index until the difference between the first angle (at min index) and the last angle (at max index) passes the angle tolerance that you can accept as being parallel (do not forget to consider that 359.999° is close to 0°). If the size of the set (max index - min index) is bigger than your current best set, note it as the current best. Then, increase the minimum index by one and continue increasing the maximum index until the angle difference test fails again. Continue to do so until the minimum index reaches the end of the list, and do not forget to make the maximum index wrap to 0 to consider the cases close to 0° and 360°.
To make it easier to find the elements in the user provided list, you can add the original index to the transformed list, e.g., [[θ0,d0,0],[θ1,d1,1],...].
Some implementation details to consider to avoid making it accidentally O(n^2): removing an element from a contiguous array is O(n), so instead of removing colinear elements every time you encounter one, note the index in a separate list and recreate the array in a second pass. If you instead use a linked list, the min/max index should be replaced by iterators to avoid the O(n) random access needed to reach an element by index.
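The transformation above can be sketched as follows. This is a simplified variant, not the exact procedure described: it groups lines whose angles lie within a tolerance of the first angle in the group instead of sliding a min/max window, and it ignores the wrap-around at the ends of the angle range. `Line`, `largest_parallel_set` and `eps` are hypothetical names, and the sign flip (so that (a,b,c) and (-a,-b,-c) map to the same point) is an added assumption.

```cpp
#include <algorithm>
#include <cmath>
#include <cstddef>
#include <utility>
#include <vector>

struct Line { double a, b, c; };  // ax + by + c = 0, with (a, b) != (0, 0)

// Size of the largest set of parallel, pairwise non-coincident lines.
std::size_t largest_parallel_set(const std::vector<Line>& lines, double eps = 1e-9) {
    std::vector<std::pair<double, double>> polar;  // (theta, d) for each line
    polar.reserve(lines.size());
    for (Line l : lines) {
        // Assumption: normalise the sign so the same line always gives the same (theta, d).
        if (l.b < 0 || (l.b == 0 && l.a < 0)) l = Line{-l.a, -l.b, -l.c};
        polar.emplace_back(std::atan2(l.b, l.a), l.c / std::hypot(l.a, l.b));
    }
    std::sort(polar.begin(), polar.end());  // theta is the primary sort key

    std::size_t best = 0;
    for (std::size_t lo = 0; lo < polar.size();) {
        // Gather the distances of all lines whose angle is within eps of polar[lo].
        std::vector<double> dist;
        std::size_t hi = lo;
        while (hi < polar.size() && polar[hi].first - polar[lo].first <= eps)
            dist.push_back(polar[hi++].second);
        std::sort(dist.begin(), dist.end());
        // Lines with (almost) equal distance are colinear and count only once.
        std::size_t distinct = 1;
        double kept = dist[0];
        for (double d : dist)
            if (d - kept > eps) { ++distinct; kept = d; }
        best = std::max(best, distinct);
        lo = hi;
    }
    return best;
}
```

For the five lines in the question this returns 2: the lines 3 4 5 and 30 40 50 collapse into one, leaving it together with 30 40 0. Both sorts keep the whole thing at O(n log n).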

Proof by induction that every non-empty tree of height h contains fewer than 2^(h+1) nodes

I am stuck on the induction case of a problem.
The problem:
Define the height of a tree as the maximum number of edges between the
root and any leaf. We consider the height of an empty tree to be -1, and
the height of a tree consisting of a single node to be 0. Prove by
induction that every non-empty binary tree of height h contains
fewer than 2^(h+1) nodes.
So I started:
Base case: h = 0 (since a non-empty tree consists of a single node or
more, the first case is a tree with a single node)
2^(0+1) = 2^1 = 2
When the height is 0 the tree consists of a single node, so yes, 1 node is
fewer than 2 nodes.
Inductive step: h greater than or equal to 0
This is where I am stuck... I know that the statement is true, since
the height will always be 1 less than the number of nodes, I just
don't know how to prove it algebraically.
Thanks in advance.
Suppose you have a tree of height n+1.
Both its left subtree and its right subtree have their height bounded by n.
By (strong) induction, each non-empty subtree has fewer than 2^(n+1) nodes, meaning at most 2^(n+1) - 1 nodes. An empty subtree has 0 nodes, so it satisfies the same bound.
Since we have two subtrees, we have at most 2 * (2^(n+1) - 1 ) = 2^(n+2) - 2.
Add one for the root, and the tree of height n+1 has at most 2^(n+2) - 1, which is less than 2^(n+2), as required.
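The counting in the inductive step can be written as one chain of inequalities. Here T is the tree of height n+1 and L, R are its left and right subtrees; these symbols are introduced only for this summary.

```latex
\begin{align*}
|T| &= 1 + |L| + |R| \\
    &\le 1 + 2\left(2^{n+1} - 1\right) \\
    &= 2^{n+2} - 1 \\
    &< 2^{n+2} = 2^{(n+1)+1}.
\end{align*}
```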

Generate all "without-replacement" subsets series

I'm looking for a way to generate all possible subcombinations of a set, where each element can be used at most one time.
For example, the set {1,2,3} would yield
{{1},{2},{3}}
{{1},{2,3}}
{{1,2},{3}}
{{2},{1,3}}
{{1,2,3}}
A pseudocode hint would be great. Also, if there is a term for this, or a terminology that applies, I would love to learn it.
First, a few pointers.
The separation of a set into disjoint subsets is called a set partition (Wikipedia, MathWorld).
A common way to encode a set partition is a restricted growth string.
The number of set partitions is a Bell number, and they grow fast: for a set of 20 elements, there are 51,724,158,235,372 set partitions.
Here is how encoding works.
Look at the elements in increasing order: 1, 2, 3, 4, ... .
Let c be the current number of subsets, initially 0.
Whenever the current element is the lowest element of its subset, we assign this set the number c, and then increase c by 1.
Regardless, we write down the number of the subset which contains the current element.
It follows from the procedure that the first element of the string will be 0, and each next element is no greater than the maximum so far plus one. Hence the name, "restricted growth strings".
For example, consider the partition {1,3},{2,5},{4}.
Element 1 is the lowest in its subset, so subset {1,3} is labeled by 0.
Element 2 is the lowest in its subset, so subset {2,5} is labeled by 1.
Element 3 is in the subset already labeled by 0.
Element 4 is the lowest in its subset, so subset {4} is labeled by 2.
Element 5 is in the subset already labeled by 1.
Thus we get the string 01021.
The string tells us:
Element 1 is in subset 0.
Element 2 is in subset 1.
Element 3 is in subset 0.
Element 4 is in subset 2.
Element 5 is in subset 1.
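Reading a string back into subsets can be sketched like this. `decode_rgs` is a hypothetical name, and the sketch assumes its input is a valid restricted growth string (it does not check that).

```cpp
#include <cstddef>
#include <vector>

// Turns a restricted growth string into the set partition it encodes.
// Elements are numbered from 1, as in the example above.
std::vector<std::vector<int>> decode_rgs(const std::vector<int>& rgs) {
    std::vector<std::vector<int>> subsets;
    for (std::size_t i = 0; i < rgs.size(); ++i) {
        std::size_t label = static_cast<std::size_t>(rgs[i]);
        // A label equal to the current subset count opens a new subset.
        if (label == subsets.size()) subsets.emplace_back();
        subsets[label].push_back(static_cast<int>(i) + 1);
    }
    return subsets;
}
```

Called on 0, 1, 0, 2, 1 it gives back {1,3},{2,5},{4}.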
To get a feel for it from a different angle, here are all partitions of a four-element set, along with the respective restricted growth strings:
0000 {1,2,3,4}
0001 {1,2,3},{4}
0010 {1,2,4},{3}
0011 {1,2},{3,4}
0012 {1,2},{3},{4}
0100 {1,3,4},{2}
0101 {1,3},{2,4}
0102 {1,3},{2},{4}
0110 {1,4},{2,3}
0111 {1},{2,3,4}
0112 {1},{2,3},{4}
0120 {1,4},{2},{3}
0121 {1},{2,4},{3}
0122 {1},{2},{3,4}
0123 {1},{2},{3},{4}
As for pseudocode, it's relatively straightforward to generate all such strings.
We do it recursively.
Maintain the value c, assign every number from 0 to c inclusive to the current position, and for each such choice, recursively construct the rest of the string.
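One possible shape for that recursion is sketched below; `extend` and `restricted_growth_strings` are hypothetical names, and the strings are stored as vectors of integers rather than characters.

```cpp
#include <cstddef>
#include <vector>

// Extends prefix in every allowed way; c is the number of subsets used so far.
void extend(std::vector<int>& prefix, std::size_t n, int c,
            std::vector<std::vector<int>>& out) {
    if (prefix.size() == n) {
        out.push_back(prefix);
        return;
    }
    for (int label = 0; label <= c; ++label) {
        prefix.push_back(label);
        // Using label == c opens a new subset, so the count grows by one.
        extend(prefix, n, label == c ? c + 1 : c, out);
        prefix.pop_back();
    }
}

// All restricted growth strings of length n, in lexicographic order.
std::vector<std::vector<int>> restricted_growth_strings(std::size_t n) {
    std::vector<std::vector<int>> out;
    std::vector<int> prefix;
    extend(prefix, n, 0, out);
    return out;
}
```

For n = 4 this produces the 15 strings from 0000 to 0123 in the same order as the table above.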
Also it is possible to generate them lazily, starting with a string with all zeroes and repeatedly finding the lexicographically next such string, akin to how next_permutation is used to list all permutations.
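A sketch of such a "next" step, under the hypothetical name `next_rgs`: find the rightmost position that may still grow (its value does not exceed the maximum of everything before it), increase it by one, and reset everything after it to zero.

```cpp
#include <algorithm>
#include <cstddef>
#include <vector>

// Advances a to the lexicographically next restricted growth string.
// Returns false (leaving a unchanged) if a was already the last one.
bool next_rgs(std::vector<int>& a) {
    // prefix_max[i] is the largest value strictly before position i (-1 if none).
    std::vector<int> prefix_max(a.size(), -1);
    for (std::size_t i = 1; i < a.size(); ++i)
        prefix_max[i] = std::max(prefix_max[i - 1], a[i - 1]);

    // Position 0 must stay 0, so the scan stops before reaching it.
    for (std::size_t i = a.size(); i-- > 1;) {
        if (a[i] <= prefix_max[i]) {
            ++a[i];
            std::fill(a.begin() + static_cast<std::ptrdiff_t>(i) + 1, a.end(), 0);
            return true;
        }
    }
    return false;
}
```

Starting from all zeroes and calling it until it returns false visits every string exactly once, for example 0012 is followed by 0100.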
Lastly, if you'd like to see more than that (along with the mentioned next function), here's a bit of self-promotion.
Recently, we did a learning project at my university, which required the students to implement various functions for combinatorial objects with reasonable efficiency.
Here is the part we got for restricted growth strings; I linked the header part which describes the functions in English.

Binary search - worst/avg case

I'm finding it difficult to understand why/how the worst and average case for searching for a key in an array/list using binary search is O(log(n)).
log(1,000,000) is only 6. log(1,000,000,000) is only 9 - I get that, but I don't understand the explanation. If one did not test it, how do we know that the avg/worst case is actually log(n)?
I hope you guys understand what I'm trying to say. If not, please let me know and I'll try to explain it differently.
Worst case
Every time the binary search code makes a decision, it eliminates half of the remaining elements from consideration. So you're dividing the number of elements by 2 with each decision.
How many times can you divide by 2 before you are down to only a single element? If n is the starting number of elements and x is the number of times you divide by 2, we can write this as:
n / (2 * 2 * 2 * ... * 2) = 1 [the '2' is repeated x times]
or, equivalently,
n / 2^x = 1
or, equivalently,
n = 2^x
So log base 2 of n gives you x, which is the number of decisions being made.
Finally, you might ask, if I used log base 2, why is it also OK to write it as log base 10, as you have done? The base does not matter because the difference is only a constant factor which is "ignored" by Big O notation.
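To see the numbers concretely, here is a small sketch that counts the halvings directly; `halvings` is a hypothetical helper, and it uses integer division, so it computes the floor of log base 2 of n.

```cpp
// Number of times n can be halved (integer division) before reaching 1.
int halvings(unsigned long long n) {
    int x = 0;
    while (n > 1) {
        n /= 2;
        ++x;
    }
    return x;
}
```

It gives 19 for 1,000,000 and 29 for 1,000,000,000: roughly 3.3 times the base 10 values 6 and 9 from the question, which is the constant factor mentioned above.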
Average case
I see that you also asked about the average case. Consider:
There is only one element in the array that can be found on the first try.
There are only two elements that can be found on the second try. (Because after the first try, we chose either the right half or the left half.)
There are only four elements that can be found on the third try.
You can see the pattern: 1, 2, 4, 8, ... , n/2. To express the same pattern going in the other direction:
Half the elements take the maximum number of decisions to find.
A quarter of the elements take one fewer decision to find.
etc.
Since half of the elements take the maximum amount of time, it doesn't matter how much less time the other elements take. We could assume that all elements take the maximum amount of time, and even if half of them actually take 0 time, our assumption would not be more than double whatever the true average is. We can ignore "double" since it is a constant factor. So the average case is the same as the worst case, as far as Big O notation is concerned.
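The claim can be checked by brute force. The sketch below (hypothetical names `ProbeStats` and `probe_stats`) searches for every key in a sorted array holding 0..n-1 and records the total and the largest number of probes; because the element at index mid equals mid, no actual array is needed.

```cpp
#include <algorithm>

struct ProbeStats {
    long total;  // sum of probes over all n successful searches
    int worst;   // largest number of probes for any single key
};

// Simulates binary search for each key in the sorted array 0, 1, ..., n-1.
ProbeStats probe_stats(int n) {
    ProbeStats stats{0, 0};
    for (int key = 0; key < n; ++key) {
        int from = 0, to = n - 1, probes = 0;
        while (from <= to) {
            int mid = from + (to - from) / 2;
            ++probes;
            if (key == mid) break;  // found
            if (key > mid) from = mid + 1;
            else to = mid - 1;
        }
        stats.total += probes;
        stats.worst = std::max(stats.worst, probes);
    }
    return stats;
}
```

For n = 15 the total is 1·1 + 2·2 + 4·3 + 8·4 = 49 probes, an average of about 3.27 against a worst case of 4, so the worst case is indeed less than double the average.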
For binary search, the array should be arranged in ascending or descending order.
In each step, the algorithm compares the search key value with the key value of the middle element of the array.
If the keys match, then a matching element has been found and its index, or position, is returned.
Otherwise, if the search key is less than the middle element's key, then the algorithm repeats its action on the sub-array to the left of the middle element.
Or, if the search key is greater, then the algorithm repeats its action on the sub-array to the right.
If the remaining array to be searched is empty, then the key cannot be found in the array and a special "not found" indication is returned.
So, a binary search is a dichotomic divide and conquer search algorithm. It therefore takes logarithmic time to perform the search, as the number of remaining elements is halved in each iteration.
For sorted lists, on which we can do a binary search, each "decision" made by the binary search compares your key to the middle element. If the key is greater, it takes the right half of the list; if the key is less, it takes the left half (and if it's a match, it returns the element at that position). You effectively reduce your list by half with every decision, yielding O(logn).
Binary search, however, only works for sorted lists. For unsorted lists you can do a straight search starting with the first element, yielding a complexity of O(n).
O(logn) < O(n)
That said, the best approach depends on how many searches you'll be doing, your inputs, and so on.
For Binary search the prerequisite is a sorted array as input.
• As the list is sorted:
• Certainly we don't have to check every word in the dictionary to look up a word.
• A basic strategy is to repeatedly halve our search range until we find the value.
• For example, look for 5 in the list of 9 #s below. v = 1 1 3 5 8 10 18 33 42
• We would first start in the middle: 8
• Since 5<8, we know we can look at just the first half: 1 1 3 5
• Looking at the middle # again, narrow down to 3 5
• Then we stop when we're down to one #: 5
How many comparisons are needed: 4 = floor(log(base 2)(9)) + 1, which is O(log(base 2) n)
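#include <vector>

// Returns the index of val in the sorted vector v, or -1 if it is absent.
int binary_search(const std::vector<int>& v, int val) {
    int from = 0;
    int to = static_cast<int>(v.size()) - 1;
    while (from <= to) {
        int mid = from + (to - from) / 2;  // same midpoint, without overflowing from + to
        if (val == v[mid])
            return mid;
        else if (val > v[mid])
            from = mid + 1;  // continue in the right half
        else
            to = mid - 1;    // continue in the left half
    }
    return -1;  // not found
}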

How to understand segmented binomial heaps described in <Purely Functional Data Structures>

Chapter 6.3.1 of the thesis Purely Functional Data Structures says:
Then, whenever we create a new tree from a new element and a segment
of trees of ranks 0... r-1, we simply compare the new element with the
first root in the segment (i.e.,the root of the rank 0 tree). The
smaller element becomes the new root and the larger element becomes
the rank 0 child of the root.
1. T0' is the new tree, which has rank 0
2. T0..T(r-1) are the original trees of rank 0 to r-1
3. The smaller element becomes the new root and the larger element becomes the rank 0 child of the root
My problem is that step 3 results in two rank 1 trees, which conflicts with the definition of binomial heaps.
Am I misunderstanding?
We are creating a tree of rank r. The structure of a tree of rank r is a root node with r children of ranks 0..r-1.
What the part you quoted means is this.
When we get a new element x we compare it to the element in T0
We create a new tree T0' of rank 0 containing the greater of the two compared elements
We create a new node T containing the lesser of the two compared elements and with T0',T1,T2...T(r-1) as children
Now T is a binomial tree of rank r and it is in heap order.
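The construction can be sketched as below. `Tree` and `link_segment` are hypothetical names, elements are plain integers, and the sketch assumes (this is not stated in the quoted passage) that the root of the rank 0 tree is no larger than the roots of the other trees in the segment, so that comparing against it alone is enough for heap order.

```cpp
#include <algorithm>
#include <utility>
#include <vector>

struct Tree {
    int root;
    std::vector<Tree> children;  // children[i] has rank i
};

// Builds a rank r tree from a new element x and a segment of trees of ranks 0..r-1.
Tree link_segment(int x, std::vector<Tree> segment) {
    if (segment.empty()) return Tree{x, {}};  // r = 0: just a single node
    int smaller = std::min(x, segment[0].root);
    int larger = std::max(x, segment[0].root);
    // T0' replaces T0 as the rank 0 child and holds the larger element.
    segment[0] = Tree{larger, {}};
    // The smaller element becomes the root, with T0', T1, ..., T(r-1) as children.
    return Tree{smaller, std::move(segment)};
}
```

Only one comparison is made, and the result has exactly one child of each rank 0..r-1, so no second rank 1 tree appears.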
