comparison of binary and ternary search - python-3.x

Considering this post which says: "big O time for Ternary Search is Log_3 N instead of Binary Search's Log_2 N"
this should be because classical ternary search would require 3 comparisons instead of two, but would this implementation be any more inefficient than a binary search?
#!/usr/bin/python3
List = sorted([1,3,5,6,87,9,56, 0])
print("The list is: {}".format(List))
def ternary_search(L, R, search):
"""iterative ternary search"""
if L > R:
L, R = R, L
while L < R:
mid1 = L + (R-L)//3
mid2 = R - (R-L)//3
if search == List[mid1]:
L = R = mid1
elif search == List[mid2]:
L = R = mid2
elif search < List[mid1]:
R = mid1
elif search < List[mid2]:
L = mid1
R = mid2
else:
L = mid2
if List[L] == search:
print("search {} found at {}".format(List[L], L))
else:
print("search not found")
if __name__ == '__main__':
ternary_search(0, len(List)-1, 6)
This implementation effectively takes only two comparison per iteration. So, ignoring the time taken to compute the mid points, wouldn't it be as effective as a binary search?
Then why not take this further to a n- ary search?
(Although, then the major concern of the search would be the number of midpoint calculations rather than the number of the comparisons, though I don't know if that's the correct answer).

While both have logarithmic complexity, a ternary search will be faster than binary search for a large enough tree.
Though the binary search have 1 less comparison each node, it is deeper than Ternary search tree. For a tree of 100 million nodes, assuming both trees are properly balanced, depth of BST would be ~26, for ternary search tree it is ~16. Again, this speed-up will not be felt unless you have a very big tree.
The answer to your next question "why not take this further to a n-ary search?" is interesting. There are actually trees which take it further, for example b-tree and b+-tree. They are heavily used in database or file systems and can have 100-200 child nodes spawning from a parent.
Why? Because for these trees, you need to invoke an IO operation for every single node you access; which you are probably aware costs a lot, lot more than in memory operations your code performs. So although you'll now have to perform n-1 comparisons in each node, the cost of these in memory operations pales into comparison to the IO cost proportional to tree's depth.
As for in memory operations, remember that when you have a n-ary tree for n elements, you basically have an array, which has have O(n) search complexity, because all elements are now in same node. So, while increasing ary, at one point it stops being more efficient and starts actually harming performance.
So why do we always prefer BSt i.e 2-ary and not 3 or 4 or such? because it's lot simpler to implement. That's it really, no big mystery here.

Related

Algorithm, that finds the k-greatest number in O(n*log(k))

was wondering, if you have given an unsorted list of arrays of any length n >= k,
what is your idea, to find the k-greatest number in O(n*log(k)) time. So the k = 2 -greatest number of an Array containing the numbers 1 to 9 would be 8 for example.
I'm trying to code this in python, if you have an idea how in that time complexity :)
My answer is not python-specific, however you should be able to implement the used concepts in python, or find libraries already implementing them.
The basic idea is to iterate over the list and store the current greatest, second greatest, ... , k-greatest number in a separate data structure. Since you will be iterating over all n entries in your array, the complexity of this is in O(n * insertion_step_complexity)
As seen above, the insertion step needs to not exceed a complexity of O(log(k)) to achieve this you can use a AVL-Tree that has a complexity of O(log(m)) for inserting and deleting items, where m is the number of items stored within the avl-tree.
An algorithm would look like this:
def find_k_greatest_number(k, array):
avl_tree = initialize AVL tree here
avl_items = 0
for number in array:
if (number > avl_tree.smallest_number()):
if (avl_itmes >= k):
avl_tree.delete_smallest_number()
else:
avl_items++
avl_tree.insert(number)
return avl_tree.smallest_number()
Finding the smallest number in a sorted tree is dependent on its height. Since the AVL tree can't exceed the height of log(k) the complexity of finding the smallest number is O(log(k)).

What's the Big-O-notation for this algorithm for printing the prime numbers?

I am trying to figure out the time complexity of the below problem.
import math
def prime(n):
for i in range(2,n+1):
for j in range(2, int(math.sqrt(i))+1):
if i%j == 0:
break
else:
print(i)
prime(36)
This problem prints the prime numbers until 36.
My understanding of the above program:
for every n the inner loop runs for sqrt(n) times so on until n.
so the Big-o-Notation is O(n sqrt(n)).
Does my understanding is right? Please correct me if I am wrong...
Time complexity measures the increase in number or steps (basic operations) as the input scales up:
O(1) : constant (hash look-up)
O(log n) : logarithmic in base 2 (binary search)
O(n) : linear (search for an element in unsorted list)
O(n^2) : quadratic (bubble sort)
To determine the exact complexity of an algorithm requires a lot of math and algorithms knowledge. You can find a detailed description of them here: time complexity
Also keep in mind that these values are considered for very large values of n, so as a rule of thumb, whenever you see nested for loops, think O(n^2).
You can add a steps counter inside your inner for loop and record its value for different values of n, then print the relation in a graph. Then you can compare your graph with the graphs of n, log n, n * sqrt(n) and n^2 to determine exactly where your algorithm is placed.

What is the time complexity of this agorithm (that solves leetcode question 650) (question 2)?

Hello I have been working on https://leetcode.com/problems/2-keys-keyboard/ and came upon this dynamic programming question.
You start with an 'A' on a blank page and you get a number n when you are done you should have n times 'A' on the page. The catch is you are allowed only 2 operations copy (and you can only copy the total amount of A's currently on the page) and paste --> find the minimum number of operations to get n 'A' on the page.
I solved this problem but then found a better solution in the discussion section of leetcode --> and I can't figure out it's time complexity.
def minSteps(self, n):
factors = 0
i=2
while i <= n:
while n % i == 0:
factors += i
n /= i
i+=1
return factors
The way this works is i is never gonna be bigger than the biggest prime factor p of n so the outer loop is O(p) and the inner while loop is basically O(logn) since we are dividing n /= i at each iteration.
But the way I look at it we are doing O(logn) divisions in total for the inner loop while the outer loop is O(p) so using aggregate analysis this function is basically O(max(p, logn)) is this correct ?
Any help is welcome.
Your reasoning is correct: O(max(p, logn)) gives the time complexity, assuming that arithmetic operations take constant time. This assumption is not true for arbitrary large n, that would not fit in the machine's fixed-size number storage, and where you would need Big-Integer operations that have non-constant time complexity. But I will ignore that.
It is still odd to express the complexity in terms of p when that is not the input (but derived from it). Your input is only n, so it makes sense to express the complexity in terms of n alone.
Worst Case
Clearly, when n is prime, the algorithm is O(n) -- the inner loop never iterates.
For a prime n, the algorithm will take more time than for n+1, as even the smallest factor of n+1 (i.e. 2), will halve the number of iterations of the outer loop, and yet only add 1 block of constant work in the inner loop.
So O(n) is the worst case.
Average Case
For the average case, we note that the division of n happens just as many times as n has prime factors (counting duplicates). For example, for n = 12, we have 3 divisions, as n = 2·2·3
The average number of prime factors for 1 < n < x approaches loglogn + B, where B is some constant. So we could say the average time complexity for the total execution of the inner loop is O(loglogn).
We need to add to that the execution of the outer loop. This corresponds to the average greatest prime factor. For 1 < n < x this average approaches C.n/logn, and so we have:
O(n/logn + loglogn)
Now n/logn is the more important term here, so this simplifies to:
O(n/logn)

Longest repeated substring with at least k occurrences correctness

The algorithms for finding the longest repeated substring is formulated as follows
1)build the suffix tree
2)find the deepest internal node with at least k leaf children
But I cannot understand why is this works,so basically what makes this algorithm correct?Also,the source where I found this algorithm says that is find the repeated substring in O(n),where n is the length of the substring,this is also not clear to me!Let's consider the following tree,here the longest repeated substring is "ru" and if we apply DFS it will find it in 5 step but not in 2
Can you explain this stuff to me?
Thanks
image
I suppose you perfectly know O(n) (Big O notation) refers to the order of growth of some quantity as a function of n, and not the equivalence of the quantity with n.
I write this becase reading the question I was in doubt...
I'm writing this as an aswer and not a comment since it's a bit too long for a comment (I suppose...)
Given a string S of N characters, building the corresponding suffix tree is O(N) (using an algorithm such as Ukkonen's).
Now, such a suffix tree can have at most 2N - 1 nodes (root and leaves included).
If you traverse your tree and compute the number of leaves reachable from a given node along with its depth, you'll find the desired result. To do so, you start from the root and explore each of its children.
Some pseudo-code:
traverse(node, depth):
nb_leaves <-- 0
if empty(children(node)):
nb_leaves <-- 1
else:
for child in children(node):
nb_leaves <-- nb_leaves + traverse(child, depth+1)
node.setdepth(depth)
node.setoccurrences(nb_leaves)
return nb_leaves
The initial call is traverse(root, 0). Since the structure is a tree, there is only one call to traverse for each node. This means the maximum number of call to traverse is 2N - 1, therefore the overall traversal is only O(N). Now you just have to keep track of the node with the maximum depth that also verifies: depth > 0 && nb_leaves >= k by adding the relevant bookkeeping mechanism. This does not hinder the overall complexity.
In the end, the complexity of the algorithm to find such a substring is O(N) where N is the length of the input string (and not the length of the matching substring!).
Note: The traversal described above is basically a DFS on the suffix tree.

Suboptimal solution given by A* search

I don't understand how the following graph gives a suboptimal solution with A* search.
The graph above was given as an example where A* search gives a suboptimal solution, i.e the heuristic is admissible but not consistent. Each node has a heuristic value corresponding to it and the weight of traversing a node is given. I don't understand how A* search will expand the nodes.
The heuristic h(n) is not consistent.
Let me first define when a heuristic function is said to be consistent.
h(n) is consistent if
– for every node n
– for every successor n' due to legal action a
– h(n) <= c(n,a,n') + h(n')
Here clearly 'A' is a successor to node 'B'
but h(B) > h(A) + c(A,a,B)
Therefore the heuristic function is not consistent/monotone, and so A* need not give an optimal solution.
Honestly I don't see how A* could return a sub-optimal solution with the given heuristic.
This is for a simple reason: the given heuristic is admissible (and even monotone/consistent).
h(s) <= h*(s) for each s in the graph
You can check this yourself comparing the h value in each node to the cost of shortest path to g.
Given the optimality property of A* I don't see how it could return a sub-optimal solution, which should be S -> A -> G of course.
The only way it could return a suboptimal solution is if it would stop once an action from a node in the frontier leading to the goal is found (so to have a path to the goal), but this would not be A*.
In graph search, we don't expand the already discovered nodes.
f(node) = cost to node from that path + h(node)
At first step:
fringe = {S} - explored = {}
S is discarded from the fringe and A and B are added.
fringe = {A, B} - explored = {S}
Then f(A) = 5 and f(B) = 7. So we discard A from the fringe and add G.
fringe = {G, B} - explored = {S, A}
Then f(G) = 8 and f(B) = 7 so we discard B from the fringe but don't add A since we already explored it.
fringe = {G} - explored = {S, A, B}
Finally we have only G left in the fringe So we discard G from the fringe and we have reached our goal.
The path would be S->A->G.
If this was a tree search problem, then we would find the answer S->B->A->G since we would reconsider the A node.

Resources