Is Binary Search O(log n) or O(n log n)?

I have encountered times when it was O(log n), and times when it was O(n log n). This is also a very popular interview question. So when the interviewer asks, with no context, what the run time of a binary search is, what should you say?

Sounds like a trick question, since there is no context. It looks like the interviewer wants to cover the cases when binary search is good and when it's not.
Binary search is great when you have a sorted list of elements and you search for a single element; in that case it costs O(log n) (see the sketch below).
If the array isn't sorted, sorting it first costs O(n log n), after which the first case applies. In that situation it's better to place the values in a set or map and then search (O(n) total time to insert the elements, O(1) average time per lookup).
Both of these cases rest on a single search. Binary search is not meant for searching for many elements in one run (any count that grows with n, like n/2, n/4, or even log n elements; a fixed number of searches is fine). For such cases there are better tools (sets and maps).
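A minimal sketch of the O(log n) case (an iterative binary search in Python; the function name and return convention are just for illustration):

    def binary_search(arr, target):
        # arr must already be sorted; each step halves the search range,
        # so the loop runs O(log n) times
        lo, hi = 0, len(arr) - 1
        while lo <= hi:
            mid = (lo + hi) // 2
            if arr[mid] == target:
                return mid
            elif arr[mid] < target:
                lo = mid + 1
            else:
                hi = mid - 1
        return -1  # not found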

O(log n), for both the average and the worst case. I have never heard anyone claim it is O(n log n).

Related

Is there any way to get the k smallest elements from a list without sorting it in Python?

I want to retrieve the k smallest elements from a list in Python, but with better than O(n log n) complexity (that is, without sorting the list). Is there any way to do so in Python? If yes, please let me know. Thanks in advance.
I think Quickselect is what you are looking for.
Quickselect uses the same overall approach as quicksort, choosing one element as a pivot and partitioning the data in two based on the pivot, accordingly as less than or greater than the pivot. However, instead of recursing into both sides, as in quicksort, quickselect only recurses into one side – the side with the element it is searching for. This reduces the average complexity from O(n log n) to O(n), with a worst case of O(n^2).
-- https://en.wikipedia.org/wiki/Quickselect
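A minimal quickselect-style sketch in Python (the function name is illustrative; a randomized pivot keeps the average cost at O(n)):

    import random

    def quickselect_k_smallest(items, k):
        # Returns the k smallest elements (in arbitrary order).
        # Average O(n): each call partitions once, then recurses into one side.
        if k <= 0:
            return []
        if k >= len(items):
            return list(items)
        pivot = random.choice(items)
        less = [x for x in items if x < pivot]
        equal = [x for x in items if x == pivot]
        greater = [x for x in items if x > pivot]
        if k <= len(less):
            return quickselect_k_smallest(less, k)
        if k <= len(less) + len(equal):
            return less + equal[:k - len(less)]
        return less + equal + quickselect_k_smallest(greater, k - len(less) - len(equal))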
I can think of a few sorting/non-sorting solutions for this problem:
Rank selection algorithm - like quicksort, we can find the pivot's rank and then decide whether to go left or right; O(N) average time
Build a min-heap in O(N), then extract k times - O(N + k log N) time
Priority queue - keep a max-heap of size k (remove the biggest element whenever a smaller one arrives) while looping through the entire array - O(N log k) time
Bubble sort - bubble the smallest few elements upwards - O(k*N) time
Use heapq.nsmallest, which performs a partial heap-sort (usage example below)
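For the last option, a quick usage sketch (the sample data is made up):

    import heapq

    data = [9, 4, 7, 1, 8, 2, 6]
    # nsmallest keeps an internal heap of size k: roughly O(n log k) time
    print(heapq.nsmallest(3, data))  # [1, 2, 4]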

Average Case Big O and the Impact of Sorting

I'm looking at the time complexity for implementations of a method which determines if a String contains all unique characters.
The basic, brute-force approach would be to iterate through the String one character at a time, maintaining a HashSet of seen characters. For each character in the iteration we check if the set already contains it, and if so return false. We return true if the entire String has been searched (sketch below). This would be O(n) in the worst case. What would be the average case? O(n/2)?
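A sketch of that brute-force approach in Python (a set standing in for the HashSet; the function name is illustrative):

    def all_unique(s):
        seen = set()
        for ch in s:
            if ch in seen:   # duplicate found, stop early
                return False
            seen.add(ch)
        return True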
If we try to optimise this by sorting the String into a char array, would it be more or less efficient? Sorting typically takes O(n log n) which is worse than O(n), but a sorted String allows for duplicate characters to be detected much earlier (especially for long strings).
Do we say the worst case is O(n^2 log n) but the average case is better? If so, what is it?
In the un-sorted case, the average case depends entirely on the string! Without knowing or assuming a distribution, it's hard to say anything concrete.
A simple case, for a string with randomly-placed characters, where one of the characters repeats once:
the number of ways the two copies of the repeated character can be arranged is n*(n-1)/2
the probability it is detected in exactly k steps is (k-1) / (n*(n-1)/2)
the probability it is detected in at most k steps is (k*(k-1))/(n*(n-1)), meaning that you will typically detect it (for large n) in about 0.7071*n steps: setting k*(k-1)/(n*(n-1)) = 1/2 and solving for k gives k ≈ n/sqrt(2)
For multiple characters that occur with different frequencies, or under different assumptions about how characters are distributed in the string, you'll get different probabilities.
Hopefully someone can extend on my answer! :)
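As a sanity check on that 0.7071*n figure, a small simulation sketch (n and the trial count are arbitrary):

    import random, statistics

    def detection_step(n):
        # positions of the two copies of the repeated character, chosen
        # uniformly; the scan detects the duplicate at the later of the two
        i, j = random.sample(range(1, n + 1), 2)
        return max(i, j)

    n = 10_000
    trials = [detection_step(n) for _ in range(20_000)]
    print(statistics.median(trials) / n)  # approximately 0.7071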
If the string is sorted, then you don't need the HashSet.
However, the average case still depends on the distribution of characters in the string: if you get two a's right at the beginning, it's pretty efficient; if the only duplicates are two z's at the end, you haven't gained anything.
The worst case is sorting plus detecting duplicates, so O(n log n + n), or just O(n log n).
So it appears it's not advantageous to sort the string beforehand, due to the increased complexity in both the average and the worst case (sorted variant sketched below).
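A sketch of the sorted variant in Python (sorting dominates at O(n log n); the name is illustrative):

    def all_unique_sorted(s):
        chars = sorted(s)                   # O(n log n)
        for a, b in zip(chars, chars[1:]):  # duplicates are now adjacent
            if a == b:
                return False
        return True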

Most efficient way to print an AVL tree of strings?

I'm thinking that an in-order traversal will run in O(n) time. The only thing better than that would be something running in O(log n) time. But I don't see how that could be possible, considering we have to visit at least n nodes.
Is O(n) the fastest we can do here?
Converting and expanding @C.B.'s comment to an answer:
If you have an AVL tree with n strings in it and you want to print all of them, then you have to do at least Θ(n) total work simply because you have to print out each of the n strings. You can often lower-bound the amount of work required to produce a list or otherwise output a sequence of values simply by counting up how many items are going to be in the list.
We can be even more precise here. Suppose the combined length of all the strings in the tree is L. The time required to print out all the strings in the tree has to be at least Θ(L), since it costs some computational effort to output each individual character. Therefore, we can say that we have to do at least Θ(n + L) work to print out all the strings in the tree.
The bound given here just says that any correct algorithm has to do at least this much work, not that there actually is an algorithm that does this much work. But if you look closely at any of the major tree traversals - inorder, preorder, postorder, level-order - you'll find that they all match this time bound.
Now, one area where you can look for savings is in space complexity. A level-order traversal of the tree might require Ω(n) total space if the tree is perfectly balanced (since it holds a whole layer of the tree in memory and the bottommost layer can have Θ(n) nodes in it), while an inorder, preorder, or postorder traversal would only require O(log n) memory because you only need to store the current access path, which has logarithmic height in an AVL tree.
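For reference, a minimal in-order traversal sketch in Python (the node fields left/value/right are assumed):

    def print_inorder(node):
        # Visits each node exactly once: Theta(n) node visits plus Theta(L)
        # characters printed; recursion depth is O(log n) in an AVL tree.
        if node is None:
            return
        print_inorder(node.left)
        print(node.value)
        print_inorder(node.right)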

finding element in very big list in less than O(n)

I want to check if an element exists in a list (a very big one, on the order of 10,000,000 elements) in O(1) instead of O(n). With lists, elem x ys takes O(n).
So I want to use another data type/constructor, but it has to be in the Prelude (not Array); any suggestions? And if I have to build my own data type, what would it look like?
I also want to sort a big list of numbers of the same size (10,000,000) and index into it in the shortest time possible.
The only way to search for an item in a data set in O(1) time is if you already know where it is, but then you don't need to search for it. For unsorted data, search is O(n) time; for sorted data, O(log n). (A hash table gives expected O(1) lookups, but you must build it first.)
You should use either a Bloom filter or a hash table. Neither of them is in the Prelude; moreover, both rely on Array being available.
The only remaining option is some kind of tree; I would suggest a heap. It's not hard to implement, and it also gives you sorting for free.
UPDATE: oops! I forgot that a heap doesn't provide lookup. A BST is your choice, then.
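To illustrate what the BST buys you (sketched in Python rather than Haskell, purely to show the structure; an unbalanced insert/lookup pair):

    class Node:
        def __init__(self, key):
            self.key, self.left, self.right = key, None, None

    def insert(root, key):
        # O(log n) per insert on random input, O(n) worst case without balancing
        if root is None:
            return Node(key)
        if key < root.key:
            root.left = insert(root.left, key)
        elif key > root.key:
            root.right = insert(root.right, key)
        return root

    def member(root, key):
        while root is not None:
            if key == root.key:
                return True
            root = root.left if key < root.key else root.right
        return False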

Algorithm to Find if M28K is unique

Today my younger brother asked me a question; it goes as follows:
Given a list of strings & a string M28K, where M28K represents a string which starts
with M, ends with K, and has 28 chars in between. Find whether M28K is unique in the
list of strings or not.
I came up with the following algorithm to solve the problem (written out in runnable Python below; target stands for the given M...K string):
    matches = 0
    for s in strings:                 # strings: the input list
        if len(s) == 30 and s[0] == 'M' and s[-1] == 'K':
            if s[1:29] == target[1:29]:   # verify the middle 28 characters
                matches += 1
    is_unique = (matches == 1)
This solution doesn't seem to be efficient in terms of time complexity. Can anyone give a better algorithm to solve this problem?
I would go with hashing. That said, since this sounds like an algorithms homework problem: in my experience we were not allowed to answer with hashing, because it really depends on your hash function; if it is not good enough, you won't get unique values for each string.
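A hashing-based sketch in Python (collections.Counter is the hash table here; names are illustrative):

    from collections import Counter

    def is_unique(strings, target):
        # one pass to hash and count every string: O(total characters) expected
        return Counter(strings)[target] == 1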
I would build the list of strings into a binary search tree keyed on the strings themselves: if a string comes before the current node in alphabetical order, place it to the left, and if it comes after, place it to the right, recursively. Now we have a tree. Granted, in the worst case this degenerates into a linked list and a lookup takes O(n) time, but with a reasonably balanced tree (a root somewhere in the middle of the alphabet helps) a lookup completes in O(log n). Building the tree from n strings would then take O(n log n) time overall.
Your provided algorithm can degrade badly in the worst case. Say every string is 30 characters, begins with M and ends with K, and they are all identical except for the second-to-last character: then we effectively compare up to 28 characters of every provided string, so the scan costs O(n * L) for n strings of length L (which approaches O(n^2) if string length grows with n). In my algorithm, each comparison halves the remaining candidates, which gives a much quicker search.
