Binary Search Complexity - search

What is the time complexity of binary search on an array of n elements that is taken as input from the user?
The time complexity of binary search itself is O(log n),
whereas the time complexity of taking the array as input from the user is O(n).

For the whole program, the answer is: O(log n) + O(n) = O(n)
Why? Because reading the n numbers into an array and running the binary search are independent steps, so their costs add, and the O(n) term dominates.
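As a rough sketch of what such a program looks like in Python (my own illustration; it assumes the user enters the numbers in sorted order, since binary search needs a sorted array, and it uses the binarySearch function defined further below):

n = int(input())                          # how many numbers
arr = [int(input()) for _ in range(n)]    # reading the array: O(n)
x = int(input())                          # the value to search for
result = binarySearch(arr, x, 0, n - 1)   # the search itself: O(log n)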
Now we need to understand time complexity in general:
In the simplest terms, for a problem where the input size is n:
Best case = fastest time to complete, with optimal inputs chosen.
For example, the best case for a sorting algorithm would be data that's already sorted.
Worst case = slowest time to complete, with pessimal inputs chosen.
For example, the worst case for a sorting algorithm might be data that are sorted in reverse order (but it depends on the particular algorithm).
Average case = arithmetic mean. Run the algorithm many times, using many different inputs of size n drawn from some distribution (in the simplest case, all possible inputs are equally likely), add up the individual running times, and divide by the number of trials. You may also need to normalize the results based on the size of the input sets.
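As an illustration of that definition, here is a small sketch (all names are my own) that estimates the average number of comparisons binary search makes, by averaging over many uniformly random targets:

import random

def comparisons(arr, x):
    # Count the comparisons binary search makes to find x in sorted arr.
    low, high, count = 0, len(arr) - 1, 0
    while low <= high:
        mid = (low + high) // 2
        count += 1
        if arr[mid] == x:
            return count
        elif x > arr[mid]:
            low = mid + 1
        else:
            high = mid - 1
    return count

n = 1024
arr = list(range(n))
trials = [comparisons(arr, random.randrange(n)) for _ in range(10000)]
print(sum(trials) / len(trials))  # close to log2(1024) - 1 = 9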
Example (Binary search):
Suppose we have the following binary search function:
def binarySearch(arr, x, low, high):
    # Repeat until the search interval [low, high] is empty.
    while low <= high:
        mid = (low + high) // 2
        if x == arr[mid]:
            return mid
        elif x > arr[mid]:   # x is in the right half
            low = mid + 1
        else:                # x is in the left half
            high = mid - 1
    return -1                # x is not in arr
Now, let's analyze its time complexity.
Best Case Time Complexity of Binary Search
The best case of Binary Search occurs when:
The element to be searched is in the middle of the list
In this case, the element is found in the first step itself and this involves 1 comparison.
Therefore, Best Case Time Complexity of Binary Search is O(1).
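For example, calling the function above with the target at the middle index:

arr = [1, 3, 5, 7, 9]
print(binarySearch(arr, 5, 0, len(arr) - 1))  # prints 2: found in one comparison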
Average Case Time Complexity of Binary Search
Let the input be N distinct numbers: a1, a2, ..., a(N-1), aN
We need to find element P.
There are two cases:
Case 1: The element P can be in N distinct indexes from 0 to N-1.
Case 2: There will be a case when the element P is not present in the list.
There are N instances of case 1 and one instance of case 2. So, there are N+1 distinct cases to consider in total.
If element P is found in the K-th step, Binary Search performs K comparisons.
This is because:
The element at index N/2 can be found in 1 comparison, as Binary Search starts from the middle.
Similarly, in the 2nd comparison, elements at indexes N/4 and 3N/4 are compared, based on the result of the 1st comparison.
Along the same lines, in the 3rd comparison, elements at indexes N/8, 3N/8, 5N/8, and 7N/8 are compared, based on the result of the 2nd comparison.
Based on this, we know that:
Elements requiring 1 comparison: 1
Elements requiring 2 comparisons: 2
Elements requiring 3 comparisons: 4
Therefore, Elements requiring I comparisons: 2^(I-1)
The maximum number of comparisons = the number of times N can be halved until the result is 1 = logN comparisons.
I can vary from 1 to logN.
Total number of comparisons = 1 * (Elements requiring 1 comparison) + 2 * (Elements requiring 2 comparisons) + ... + logN * (Elements requiring logN comparisons)
Total number of comparisons = 1 * (1) + 2 * (2) + 3 * (4) + ... + logN * (2^(logN-1))
Total number of comparisons = 1 + 4 + 12 + 32 + ... = 2^logN * (logN - 1) + 1
Total number of comparisons = N * (logN - 1) + 1
Total number of cases = N+1
Therefore, average number of comparisons = ( N * (logN - 1) + 1 ) / (N+1)
Average number of comparisons = N * logN / (N+1) - N/(N+1) + 1/(N+1)
The dominant term is N * logN / (N+1), which is approximately logN. Therefore, the Average Case Time Complexity of Binary Search is O(logN).
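As a quick sanity check on the algebra above, a short sketch (my own, assuming N is an exact power of 2 so that logN is an integer):

# Verify: sum of I * 2^(I-1) for I = 1..logN equals N * (logN - 1) + 1
for logN in range(1, 11):
    N = 2 ** logN
    total = sum(I * 2 ** (I - 1) for I in range(1, logN + 1))
    assert total == N * (logN - 1) + 1
print("formula holds for N = 2 .. 1024")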
Worst Case Time Complexity of Binary Search
The worst case of Binary Search occurs when:
The element to be searched is at the first or last index, or is not present in the list at all.
In this case, logN comparisons are required.
Therefore, the Worst Case Time Complexity of Binary Search is O(logN).

Related

Given a binary string "10110", find the count of all substrings whose set bit count is >= n

We could solve this question by brute force, taking all possible substrings and checking whether the set bit count is at least n.
I was asked to solve this in O(n). I could not find any answer which could achieve this in O(n).
Is it possible to get all possible substrings of a binary string in O(n)?
Answer changed (noticed >= in problem statement).
Make two indices, left and right.
We want to count the substrings starting at position left that contain at least k ones.
At first, move right until the bit count reaches k.
Now we have some "good" substrings starting at left and ending at position right-1 or later, so we can add len(s) - right + 1 to the result.
Then increment left by 1, adding the same count for each starting position, until a one drops out of the window.
Repeat moving right, and so on. The algorithm is linear, since each index only moves forward.
Python example:
s = '0010110010'
#s = '110010110010'
k = 2
left = 0
right = 0
res = 0
cnt = 0
while right < len(s):
    # extend the window to the right until it contains k ones
    while (right < len(s)) and (cnt < k):
        if s[right] == "1":
            cnt += 1
        right += 1
    # every start position left with cnt >= k contributes all endings from right-1 on
    while (left <= right) and (cnt >= k):
        addend = len(s) + 1 - right
        res += addend
        print(left, right, addend, res)  # intermediate debug output
        if s[left] == "1":
            cnt -= 1
        left += 1
print(res)
0 5 6 6
1 5 6 12
2 5 6 18
3 6 5 23
4 6 5 28
5 9 2 30
30
A useful approach is to ask yourself how many substrings have less than n bits set.
If you can answer this question, then the answer to the original question is right around the corner.
Why is the modified question easier to grasp? Because when you have a substring, say S, with exactly n bits set, then any substring that contains S will have at least n bits set, so you don't need to examine any of those.
So let's say you have a substring. If it has fewer than n bits set, you can grow it to accommodate more bits. If it has n or more bits set, it cannot grow; you must shrink it.
Suppose you start from the leftmost empty substring, start index 0, end index 0, length 0. (Of course it's a half-open interval). It has no bits set, so you can grow it. The only direction it can grow is to the right, by increasing its end index. It grows and grows and grows until it eats n 1-bits; now it must shrink. How should it shrink? Obviously shrinking it in the opposite direction (decreasing its end index) would accomplish nothing. You would arrive at a substring you have just examined! So you should shrink it from the left, by increasing its start index. So it shrinks and shrinks and shrinks until it excretes a 1-bit from its rear end. Now it has n-1 1-bits, and it can grow again.
It is not difficult to show that you would enumerate all strings with less than n bits set this way.
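A minimal Python sketch of this grow/shrink idea (the function name and the final subtraction are my own additions; the answer above only describes enumerating the complement):

def count_at_least(s, n):
    # Substrings with >= n ones = all substrings - substrings with < n ones.
    total = len(s) * (len(s) + 1) // 2   # number of non-empty substrings
    fewer = 0                            # substrings with fewer than n ones
    ones = 0
    start = 0
    for end in range(len(s)):            # grow the window to the right
        if s[end] == '1':
            ones += 1
        while ones >= n:                 # shrink from the left until < n ones
            if s[start] == '1':
                ones -= 1
            start += 1
        fewer += end - start + 1         # windows ending at end with < n ones
    return total - fewer

print(count_at_least('0010110010', 2))   # 30, matching the answer above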
Let N = count of '1'
and let M = count of '0'.
int sum = 0;
for (int i = n; i <= N; i++) sum += C(N, i);
sum *= 1 << M;
sum is your answer.

How to calculate the time complexity for nested for loops in the following example?

So in the following code, I am passing a (huge) number-string to the function, where I have to find the maximum product of m consecutive digits.
So, first, I am looping through the (let's say) n-digit string, and then the inner loop loops through m digits.
The inner loop is affected by the if-statement, which makes the outer index jump ahead when a digit in the window is 0.
EDIT : 1
Actual Problem Question:
The four adjacent digits in the 1000-digit number that have the greatest product are 9 × 9 × 8 × 9 = 5832.
731671765313306249192251....(1000digits)
Find the thirteen adjacent digits in the 1000-digit number that have the greatest product. What is the value of this product?
Example:
m = 12 number = "1234567891120123456704832...(1000 digits)"
So in the 1st iteration the function will calculate the product of the first 12 digits (i.e. from index-11 down to index-0) of "1234567891120123456704832...".
Now, in the 2nd iteration, when it checks the value at index-12, which is 0, the index will jump to index-13. This way the loop will skip 11 iterations.
For the 3rd iteration, the inner loop will execute 4 iterations until it finds the 0 ("0123456704832...").
def LargestProductInSeries_1(number, m):
    max = -1
    product = 1
    index = 0
    x = 0
    while index < len(number) - (m - 1):
        # multiply the m digits of the window [index, index+m-1], right to left
        for j in range(index + (m - 1), index - 1, -1):
            num = int(number[j])
            if not num:
                # a 0 at position j: jump the window start past it
                index = j
                break
            product = product * int(number[j])
        max = product if max < product else max
        product = 1
        index += 1
    return max
So according to me, the Worst Case Time Complexity would be O(n*m)
I think the Best Time would be O(n/m) if only once the inner loop is completely iterated or every mth digit is 0 which will make the outer loop execute but the index will jump to every mth digit.
Is my analysis correct?
What will be the Average Time for this case?
Will it be O(n * log m)? Can anyone explain how? Or how to find the complexity in such cases?

Finding the middle index value of an array in a binary search algorithm in Python

I am new to Python and implementing a binary search algorithm. Here is the algorithm:
def binary_search(list, item):
    low = 0
    high = len(list) - 1
    while low <= high:
        mid = (low + high)
        guess = list[mid]
        if guess == item:
            return mid
        if guess > item:
            high = mid - 1
        else:
            low = mid + 1
    return None
My question is in regard to the line mid = (low + high). The algorithm returns the correct index location for any item in the array whether I use mid = (low + high) or mid = (low + high)/2. Why is that? I have searched everywhere for an explanation for this specific question and cannot find one. All I've found is that Python 3 automatically rounds down numbers that are not evenly divisible, so for an array with an odd number of elements, like 13, the index of the middle element will be 6. But how does the algorithm above get to the middle index element without dividing by 2 every time?
It's because your algorithm isn't a binary search.
mid = (low + high)
Since high starts at len(arr) - 1 and low always starts at 0, mid always starts at len(arr) - 1.
if guess > item:
high = mid - 1
In this case, the next recalculation of mid decreases the value of mid by one. So if the guess is too high, the search moves down one element. The thing is, since mid always starts at len(arr) - 1 and the list is sorted, this condition will keep being True: guess starts out as the largest element and then goes down one by one, until you hit:
if guess == item:
return mid
In which case you just return the item. Your algorithm searches for the item linearly, one element at a time, from the last element to the first.
If you actually add a print(low, high, mid) you'll get an output like this:
0 6 6
0 5 5
0 4 4
0 3 3
0 2 2
0 1 1
0 0 0
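For comparison, here is a sketch of the intended binary search, with the midpoint computed using Python 3's floor division (this is the standard fix, not code from the question):

def binary_search(list, item):
    low = 0
    high = len(list) - 1
    while low <= high:
        mid = (low + high) // 2   # floor division keeps mid an integer index
        guess = list[mid]
        if guess == item:
            return mid
        if guess > item:
            high = mid - 1
        else:
            low = mid + 1
    return None

With // the search interval halves on every iteration, which is what makes the algorithm O(log n).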

Explanation of normalized edit distance formula

Based on this paper:
IEEE TRANSACTIONS ON PATTERN ANALYSIS: Computation of Normalized Edit Distance and Applications. In this paper, Normalized Edit Distance is defined as follows:
Given two strings X and Y over a finite alphabet, the normalized edit distance between X and Y, d(X, Y), is defined as the minimum of W(P) / L(P), where P is an editing path between X and Y, W(P) is the sum of the weights of the elementary edit operations of P, and L(P) is the number of these operations (length of P).
Can I safely translate the normalized edit distance algorithm explained above as this:
normalized edit distance =
levenshtein(query 1, query 2)/max(length(query 1), length(query 2))
You are probably misunderstanding the metric. There are two issues:
The normalization step is to divide W(P), which is the weight of the edit procedure, by L(P), which is the length of the edit procedure, not by the max length of the strings as you did;
Also, the paper showed that (Example 3.1) normalized edit distance cannot be simply computed with levenshtein distance. You probably need to implement their algorithm.
An explanation of Example 3.1 (c):
From aaab to abbb, the paper used the following transformations:
match a with a;
skip a in the first string;
skip a in the first string;
skip b in the second string;
skip b in the second string;
match the final bs.
These are 6 operations, which is why L(P) is 6; from the matrix in (a), matching has cost 0 and skipping has cost 2, so we have a total cost of 0 + 2 + 2 + 2 + 2 + 0 = 8, which is exactly W(P), and W(P) / L(P) = 1.33. Similar results can be obtained for (b), which I'll leave to you as an exercise :-)
The 3 in figure 2(a) refers to the cost of changing "a" to "b" or the cost of changing "b" to "a". The columns with lambdas in figure 2(a) mean that it costs 2 in order to insert or delete either an "a" or a "b".
In figure 2(b), W(P) = 6 because the algorithm does the following steps:
keep first a (cost 0)
convert first b to a (cost 3)
convert second b to a (cost 3)
keep last b (cost 0)
The sum of the costs of the steps is W(P). The number of steps is 4 which is L(P).
In figure 2(c), the steps are different:
keep first a (cost 0)
delete first b (cost 2)
delete second b (cost 2)
insert a (cost 2)
insert a (cost 2)
keep last b (cost 0)
In this path there are six steps so the L(P) is 6. The sum of the costs of the steps is 8 so W(P) is 8. Therefore the normalized edit distance is 8/6 = 4/3 which is about 1.33.
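To make the arithmetic concrete, here is a small sketch (my own illustration, using the costs from figure 2(a): match 0, substitution 3, insertion/deletion 2):

def normalized_cost(path):
    # path is the list of per-operation costs along an editing path P;
    # this returns W(P) / L(P).
    return sum(path) / len(path)

path_b = [0, 3, 3, 0]        # figure 2(b): W(P) = 6, L(P) = 4
path_c = [0, 2, 2, 2, 2, 0]  # figure 2(c): W(P) = 8, L(P) = 6

print(normalized_cost(path_b))  # 1.5
print(normalized_cost(path_c))  # 1.333..., the minimum, so d(X, Y) = 4/3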

number of executions in an algorithm

For this algorithm:
i = 1
while (i <= 2n) {
    x = x + 1
    i = i + 2
}
can someone tell me how to find the formula for the number of times x = x + 1 is executed?
i goes from 1 to 2n (inclusive), so the first thought is 2n executions.
But we see that i increments by two at a time instead of one, so it's half of that: n.
For n < 1, the number of times executed is 0.
For n >= 1, the number of times executed is n.
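A quick empirical check of this formula (my own sketch):

def count_executions(n):
    # Simulate the loop and count how many times x = x + 1 runs.
    i = 1
    count = 0
    while i <= 2 * n:
        count += 1  # stands in for x = x + 1
        i = i + 2
    return count

for n in range(0, 6):
    print(n, count_executions(n))  # 0 executions for n < 1, otherwise n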
