What is the worst case for binary search - search

Where should an element be located in the array so that the run time of the Binary search algorithm is O(log n)?

The first or last element will give the worst case complexity in binary search as you'll have to do maximum no of comparisons.
Example:
1 2 3 4 5 6 7 8 9
Here searching for 1 will give you the worst case, with the result coming in 4th pass.
1 2 3 4 5 6 7 8
In this case, searching for 8 will give the worst case, with the result coming in 4 passes.
Note that in the second case searching for 1 (the first element) can be done in just 3 passes. (compare 1 & 4, compare 1 & 2 and finally 1)
So, if no. of elements are even, the last element gives the worst case.
This is assuming all arrays are 0 indexed. This happens due to considering the mid as float of (start + end) /2.

// Java implementation of iterative Binary Search
class BinarySearch
{
// Returns index of x if it is present in arr[],
// else return -1
int binarySearch(int arr[], int x)
{
int l = 0, r = arr.length - 1;
while (l <= r)
{
int m = l + (r-l)/2;
// Check if x is present at mid
if (arr[m] == x)
return m;
// If x greater, ignore left half
if (arr[m] < x)
l = m + 1;
// If x is smaller, ignore right half
else
r = m - 1;
}
// if we reach here, then element was
// not present
return -1;
}
// Driver method to test above
public static void main(String args[])
{
BinarySearch ob = new BinarySearch();
int arr[] = {2, 3, 4, 10, 40};
int n = arr.length;
int x = 10;
int result = ob.binarySearch(arr, x);
if (result == -1)
System.out.println("Element not present");
else
System.out.println("Element found at " +
"index " + result);
}
}
Time Complexity:
The time complexity of Binary Search can be written as
T(n) = T(n/2) + c
The above recurrence can be solved either using Recurrence T ree method or Master method. It falls in case II of Master Method and solution of the recurrence is Theta(Logn).
Auxiliary Space: O(1) in case of iterative implementation. In case of recursive implementation, O(Logn) recursion call stack space.

Related

Optimizing a primality test based on runtime in Python

I'm pretty new to algorithms and runtimes, and I'm trying to optimise a bit of my code for a personal project.
import math
for num in range(0, 10000000000000000000000):
if all((num**(num+1)+(num+1)**(num))%i!=0 for i in range(2,int(math.sqrt((num**(num+1)+(num+1)**(num))))+1)):
print(num)
What can I do to speed this up? I know that num=80 should work but my code isn't getting past num=0, 1, 2 (it's not fast enough).
First I define my range, then I say if 'such-and-such' is prime from range 2 to sqrt(such-and-such) + 1, then return that number. Sqrt(n) + 1 is the minimum number of factors to test for the primality of n.
This is a primality test of sequence A051442
You would probably get a minor boost from computing (num**(num+1)+(num+1)**(num)) only once per iteration instead of sqrt(num**(num+1)+(num+1)**(num)) times. As you can see, this will greatly reduce the constant factor in your complexity. It won't change the fundamental complexity because you still need to compute the remainder. Change
if all((num**(num+1)+(num+1)**(num))%i!=0 for i in range(2,int(math.sqrt((num**(num+1)+(num+1)**(num))))+1)):
to
k = num**(num+1)+(num+1)**(num)
if all(k%i for i in range(2,int(math.sqrt(k))+1)):
The != 0 is implicit in Python.
Update
All this is just trivial improvement to an extremely inefficieny algorithm. The biggest speedup I can think of is to reduce the check k % i to only prime i. For any composite i = a * b such that k % i == 0, it must be the case that k % a == 0 and k % b == 0 (if k is divisible by i, it must also be divisible by the factors of i).
I am assuming that you don't want to use any kind of pre-computed prime tables in your code. In that case, you can compute the table yourself. This will involve checking all the numbers up to a given sqrt(k) only once ever, instead of once per iteration of num, since we can stash the previously computed primes in say a list. That will effectively increase the lower limit of the range in your current all from 2 to the square root of the previous k.
Let's define a function to extend our set of primes using the seive of Eratosthenes:
from math import sqrt
def extend(primes, from_, to):
"""
primes: a sequence containing prime numbers from 2 to `from - 1`, in order
from_: the number to start checking with
to: the number to end with (inclusive)
"""
if not primes:
primes.extend([2, 3])
return
for k in range(max(from_, 5), to + 1):
s = int(sqrt(k)) # No need to compute this more than once per k
for p in primes:
if p > s:
# Reached sqrt(k) -> short circuit success
primes.append(k)
break
elif not k % p:
# Found factor -> short circuit failure
break
Now we can use this function to extend our list of primes at every iteration of the original loop. This allows us to check the divisibility of k only against the slowly growing list of primes, not against all numbers:
primes = []
prev = 0
for num in range(10000000000000000000000):
k = num**(num + 1) + (num + 1)**num
lim = int(sqrt(k)) + 1
extend(primes, prev, lim)
#print('Num={}, K={}, checking {}-{}, p={}'.format(num, k, prev, lim, primes), end='... ')
if k <= 3 and k in primes or all(k % i for i in primes):
print('{}: {} Prime!'.format(num, k))
else:
print('{}: {} Nope'.format(num, k))
prev = lim + 1
I am not 100% sure that my extend function is optimal, but I am able to get to num == 13, k == 4731091158953433 in <10 minutes on my ridiculously old and slow laptop, so I guess it's not too bad. That means that the algorithm builds a complete table of primes up to ~7e7 in that time.
Update #2
A sort-of-but-not-really optimization you could do would be to check all(k % i for i in primes) before calling extend. This would save you a lot of cycles for numbers that have small prime factors, but would probably catch up to you later on, when you would end up having to compute all the primes up to some enormous number. Here is a sample of how you could do that:
primes = []
prev = 0
for num in range(10000000000000000000000):
k = num**(num + 1) + (num + 1)**num
lim = int(sqrt(k)) + 1
if not all(k % i for i in primes):
print('{}: {} Nope'.format(num, k))
continue
start = len(primes)
extend(primes, prev, lim)
if all(k % i for i in primes[start:]):
print('{}: {} Prime!'.format(num, k))
else:
print('{}: {} Nope'.format(num, k))
prev = lim + 1
While this version does not do much for the long run, it does explain why you were able to get to 15 so quickly in your original run. The prime table does note get extended after num == 3, until num == 16, which is when the terrible delay occurs in this version as well. The net runtime to 16 should be identical in both versions.
Update #3
As #paxdiablo suggests, the only numbers we need to consider in extend are multiples of 6 +/- 1. We can combine that with the fact that only a small number of primes generally need to be tested, and convert the functionality of extend into a generator that will only compute as many primes as absolutely necessary. Using Python's lazy generation should help. Here is a completely rewritten version:
from itertools import count
from math import ceil, sqrt
prime_table = [2, 3]
def prime_candidates(start=0):
"""
Infinite generator of prime number candidates starting with the
specified number.
Candidates are 2, 3 and all numbers that are of the form 6n-1 and 6n+1
"""
if start <= 3:
if start <= 2:
yield 2
yield 3
start = 5
delta = 2
else:
m = start % 6
if m < 2:
start += 1 - m
delta = 4
else:
start += 5 - m
delta = 2
while True:
yield start
start += delta
delta = 6 - delta
def isprime(n):
"""
Checks if `n` is prime.
All primes up to sqrt(n) are expected to already be present in
the generated `prime_table`.
"""
s = int(ceil(sqrt(n)))
for p in prime_table:
if p > s:
break
if not n % p:
return False
return True
def generate_primes(max):
"""
Generates primes up to the specified maximum.
First the existing table is yielded. Then, the new primes are
found in the sequence generated by `prime_candidates`. All verified
primes are added to the existing cache.
"""
for p in prime_table:
if p > max:
return
yield p
for k in prime_candidates(prime_table[-1] + 1):
if isprime(k):
prime_table.append(k)
if k > max:
# Putting the return here ensures that we always stop on a prime and therefore don't do any extra work
return
else:
yield k
for num in count():
k = num**(num + 1) + (num + 1)**num
lim = int(ceil(sqrt(k)))
b = all(k % i for i in generate_primes(lim))
print('n={}, k={} is {}prime'.format(num, k, '' if b else 'not '))
This version gets to 15 almost instantly. It gets stuck at 16 because the smallest prime factor for k=343809097055019694337 is 573645313. Some future expectations:
17 should be a breeze: 16248996011806421522977 has factor 19
18 will take a while: 812362695653248917890473 has factor 22156214713
19 is easy: 42832853457545958193355601 is divisible by 3
20 also easy: 2375370429446951548637196401 is divisible by 58967
21: 138213776357206521921578463913 is divisible by 13
22: 8419259736788826438132968480177 is divisible by 103
etc... (link to sequence)
So in terms of instant gratification, this method will get you much further if you can make it past 18 (which will take >100 times longer than getting past 16, which in my case took ~1.25hrs).
That being said, your greatest speedup at this point would be re-writing this in C or some similar low-level language that does not have as much overhead for loops.
Update #4
Just for giggles, here is an implementation of the latest Python version in C. I chose to go with GMP for arbitrary precision integers, because it is easy to use and install on my Red Hat system, and the docs are very clear:
#include <stdio.h>
#include <stdlib.h>
#include <gmp.h>
typedef struct {
size_t alloc;
size_t size;
mpz_t *numbers;
} PrimeTable;
void init_table(PrimeTable *buf)
{
buf->alloc = 0x100000L;
buf->size = 2;
buf->numbers = malloc(buf->alloc * sizeof(mpz_t));
if(buf == NULL) {
fprintf(stderr, "No memory for prime table\n");
exit(1);
}
mpz_init_set_si(buf->numbers[0], 2);
mpz_init_set_si(buf->numbers[1], 3);
return;
}
void append_table(PrimeTable *buf, mpz_t number)
{
if(buf->size == buf->alloc) {
size_t new = 2 * buf->alloc;
mpz_t *tmp = realloc(buf->numbers, new * sizeof(mpz_t));
if(tmp == NULL) {
fprintf(stderr, "Ran out of memory for prime table\n");
exit(1);
}
buf->alloc = new;
buf->numbers = tmp;
}
mpz_set(buf->numbers[buf->size], number);
buf->size++;
return;
}
size_t print_table(PrimeTable *buf, FILE *file)
{
size_t i, n;
n = fprintf(file, "Array contents = [");
for(i = 0; i < buf->size; i++) {
n += mpz_out_str(file, 10, buf->numbers[i]);
if(i < buf->size - 1)
n += fprintf(file, ", ");
}
n += fprintf(file, "]\n");
return n;
}
void free_table(PrimeTable *buf)
{
for(buf->size--; ((signed)(buf->size)) >= 0; buf->size--)
mpz_clear(buf->numbers[buf->size]);
free(buf->numbers);
return;
}
int isprime(mpz_t num, PrimeTable *table)
{
mpz_t max, rem, next;
size_t i, d, r;
mpz_inits(max, rem, NULL);
mpz_sqrtrem(max, rem, num);
// Check if perfect square: definitely not prime
if(!mpz_cmp_si(rem, 0)) {
mpz_clears(rem, max, NULL);
return 0;
}
/* Normal table lookup */
for(i = 0; i < table->size; i++) {
// Got to sqrt(n) -> prime
if(mpz_cmp(max, table->numbers[i]) < 0) {
mpz_clears(rem, max, NULL);
return 1;
}
// Found a factor -> not prime
if(mpz_divisible_p(num, table->numbers[i])) {
mpz_clears(rem, max, NULL);
return 0;
}
}
/* Extend table and do lookup */
// Start with last found prime + 2
mpz_init_set(next, table->numbers[i - 1]);
mpz_add_ui(next, next, 2);
// Find nearest number of form 6n-1 or 6n+1
r = mpz_fdiv_ui(next, 6);
if(r < 2) {
mpz_add_ui(next, next, 1 - r);
d = 4;
} else {
mpz_add_ui(next, next, 5 - r);
d = 2;
}
// Step along numbers of form 6n-1/6n+1. Check each candidate for
// primality. Don't stop until next prime after sqrt(n) to avoid
// duplication.
for(;;) {
if(isprime(next, table)) {
append_table(table, next);
if(mpz_divisible_p(num, next)) {
mpz_clears(next, rem, max, NULL);
return 0;
}
if(mpz_cmp(max, next) <= 0) {
mpz_clears(next, rem, max, NULL);
return 1;
}
}
mpz_add_ui(next, next, d);
d = 6 - d;
}
// Return can only happen from within loop.
}
int main(int argc, char *argv[])
{
PrimeTable table;
mpz_t k, a, b;
size_t n, next;
int p;
init_table(&table);
mpz_inits(k, a, b, NULL);
for(n = 0; ; n = next) {
next = n + 1;
mpz_set_ui(a, n);
mpz_pow_ui(a, a, next);
mpz_set_ui(b, next);
mpz_pow_ui(b, b, n);
mpz_add(k, a, b);
p = isprime(k, &table);
printf("n=%ld k=", n);
mpz_out_str(stdout, 10, k);
printf(" p=%d\n", p);
//print_table(&table, stdout);
}
mpz_clears(b, a, k, NULL);
free_table(&table);
return 0;
}
While this version has the exact same algorithmic complexity as the Python one, I expect it to run a few orders of magnitude faster because of the relatively minimal overhead incurred in C. And indeed, it took about 15 minutes to get stuck at n == 18, which is ~5 times faster than the Python version so far.
Update #5
This is going to be the last one, I promise.
GMP has a function called mpz_nextprime, which offers a potentially much faster implementation of this algorightm, especially with caching. According to the docs:
This function uses a probabilistic algorithm to identify primes. For practical purposes it’s adequate, the chance of a composite passing will be extremely small.
This means that it is probably much faster than the current prime generator I implemented, with a slight cost offset of some false primes being added to the cache. This cost should be minimal: even adding a few thousand extra modulo operations should be fine if the prime generator is faster than it is now.
The only part that needs to be replaced/modified is the portion of isprime below the comment /* Extend table and do lookup */. Basically that whole section just becomes a series of calls to mpz_nextprime instead of recursion.
At that point, you may as well adapt isprime to use mpz_probab_prime_p when possible. You only need to check for sure if the result of mpz_probab_prime_p is uncertain:
int isprime(mpz_t num, PrimeTable *table)
{
mpz_t max, rem, next;
size_t i, r;
int status;
status = mpz_probab_prime_p(num, 50);
// Status = 2 -> definite yes, Status = 0 -> definite no
if(status != 1)
return status != 0;
mpz_inits(max, rem, NULL);
mpz_sqrtrem(max, rem, num);
// Check if perfect square: definitely not prime
if(!mpz_cmp_si(rem, 0)) {
mpz_clears(rem, max, NULL);
return 0;
}
mpz_clear(rem);
/* Normal table lookup */
for(i = 0; i < table->size; i++) {
// Got to sqrt(n) -> prime
if(mpz_cmp(max, table->numbers[i]) < 0) {
mpz_clear(max);
return 1;
}
// Found a factor -> not prime
if(mpz_divisible_p(num, table->numbers[i])) {
mpz_clear(max);
return 0;
}
}
/* Extend table and do lookup */
// Start with last found prime + 2
mpz_init_set(next, table->numbers[i - 1]);
mpz_add_ui(next, next, 2);
// Step along probable primes
for(;;) {
mpz_nextprime(next, next);
append_table(table, next);
if(mpz_divisible_p(num, next)) {
r = 0;
break;
}
if(mpz_cmp(max, next) <= 0) {
r = 1;
break;
}
}
mpz_clears(next, max, NULL);
return r;
}
Sure enough, this version makes it to n == 79 in a couple of seconds at most. It appears to get stuck on n == 80, probably because mpz_probab_prime_p can't determine if k is a prime for sure. I doubt that computing all the primes up to ~10^80 is going to take a trivial amount of time.

Maximum element in array which is equal to product of two elements in array

We need to find the maximum element in an array which is also equal to product of two elements in the same array. For example [2,3,6,8] , here 6=2*3 so answer is 6.
My approach was to sort the array and followed by a two pointer method which checked whether the product exist for each element. This is o(nlog(n)) + O(n^2) = O(n^2) approach. Is there a faster way to this ?
There is a slight better solution with O(n * sqrt(n)) if you are allowed to use O(M) memory M = max number in A[i]
Use an array of size M to mark every number while you traverse them from smaller to bigger number.
For each number try all its factors and see if those were already present in the array map.
Here is a pseudo code for that:
#define M 1000000
int array_map[M+2];
int ans = -1;
sort(A,A+n);
for(i=0;i<n;i++) {
for(j=1;j<=sqrt(A[i]);j++) {
int num1 = j;
if(A[i]%num1==0) {
int num2 = A[i]/num1;
if(array_map[num1] && array_map[num2]) {
if(num1==num2) {
if(array_map[num1]>=2) ans = A[i];
} else {
ans = A[i];
}
}
}
}
array_map[A[i]]++;
}
There is an ever better approach if you know how to find all possible factors in log(M) this just becomes O(n*logM). You have to use sieve and backtracking for that
#JerryGoyal 's solution is correct. However, I think it can be optimized even further if instead of using B pointer, we use binary search to find the other factor of product if arr[c] is divisible by arr[a]. Here's the modification for his code:
for(c=n-1;(c>1)&& (max==-1);c--){ // loop through C
for(a=0;(a<c-1)&&(max==-1);a++){ // loop through A
if(arr[c]%arr[a]==0) // If arr[c] is divisible by arr[a]
{
if(binary_search(a+1, c-1, (arr[c]/arr[a]))) //#include<algorithm>
{
max = arr[c]; // if the other factor x of arr[c] is also in the array such that arr[c] = arr[a] * x
break;
}
}
}
}
I would have commented this on his solution, unfortunately I lack the reputation to do so.
Try this.
Written in c++
#include <vector>
#include <algorithm>
using namespace std;
int MaxElement(vector< int > Input)
{
sort(Input.begin(), Input.end());
int LargestElementOfInput = 0;
int i = 0;
while (i < Input.size() - 1)
{
if (LargestElementOfInput == Input[Input.size() - (i + 1)])
{
i++;
continue;
}
else
{
if (Input[i] != 0)
{
LargestElementOfInput = Input[Input.size() - (i + 1)];
int AllowedValue = LargestElementOfInput / Input[i];
int j = 0;
while (j < Input.size())
{
if (Input[j] > AllowedValue)
break;
else if (j == i)
{
j++;
continue;
}
else
{
int Product = Input[i] * Input[j++];
if (Product == LargestElementOfInput)
return Product;
}
}
}
i++;
}
}
return -1;
}
Once you have sorted the array, then you can use it to your advantage as below.
One improvement I can see - since you want to find the max element that meets the criteria,
Start from the right most element of the array. (8)
Divide that with the first element of the array. (8/2 = 4).
Now continue with the double pointer approach, till the element at second pointer is less than the value from the step 2 above or the match is found. (i.e., till second pointer value is < 4 or match is found).
If the match is found, then you got the max element.
Else, continue the loop with next highest element from the array. (6).
Efficient solution:
2 3 8 6
Sort the array
keep 3 pointers C, B and A.
Keeping C at the last and A at 0 index and B at 1st index.
traverse the array using pointers A and B till C and check if A*B=C exists or not.
If it exists then C is your answer.
Else, Move C a position back and traverse again keeping A at 0 and B at 1st index.
Keep repeating this till you get the sum or C reaches at 1st index.
Here's the complete solution:
int arr[] = new int[]{2, 3, 8, 6};
Arrays.sort(arr);
int n=arr.length;
int a,b,c,prod,max=-1;
for(c=n-1;(c>1)&& (max==-1);c--){ // loop through C
for(a=0;(a<c-1)&&(max==-1);a++){ // loop through A
for(b=a+1;b<c;b++){ // loop through B
prod=arr[a]*arr[b];
if(prod==arr[c]){
System.out.println("A: "+arr[a]+" B: "+arr[b]);
max=arr[c];
break;
}
if(prod>arr[c]){ // no need to go further
break;
}
}
}
}
System.out.println(max);
I came up with below solution where i am using one array list, and following one formula:
divisor(a or b) X quotient(b or a) = dividend(c)
Sort the array.
Put array into Collection Col.(ex. which has faster lookup, and maintains insertion order)
Have 2 pointer a,c.
keep c at last, and a at 0.
try to follow (divisor(a or b) X quotient(b or a) = dividend(c)).
Check if a is divisor of c, if yes then check for b in col.(a
If a is divisor and list has b, then c is the answer.
else increase a by 1, follow step 5, 6 till c-1.
if max not found then decrease c index, and follow the steps 4 and 5.
Check this C# solution:
-Loop through each element,
-loop and multiply each element with other elements,
-verify if the product exists in the array and is the max
private static int GetGreatest(int[] input)
{
int max = 0;
int p = 0; //product of pairs
//loop through the input array
for (int i = 0; i < input.Length; i++)
{
for (int j = i + 1; j < input.Length; j++)
{
p = input[i] * input[j];
if (p > max && Array.IndexOf(input, p) != -1)
{
max = p;
}
}
}
return max;
}
Time complexity O(n^2)

Minimum no. of comparisons to find median of 3 numbers

I was implementing quicksort and I wished to set the pivot to be the median or three numbers. The three numbers being the first element, the middle element, and the last element.
Could I possibly find the median in less no. of comparisons?
median(int a[], int p, int r)
{
int m = (p+r)/2;
if(a[p] < a[m])
{
if(a[p] >= a[r])
return a[p];
else if(a[m] < a[r])
return a[m];
}
else
{
if(a[p] < a[r])
return a[p];
else if(a[m] >= a[r])
return a[m];
}
return a[r];
}
If the concern is only comparisons, then this should be used.
int getMedian(int a, int b , int c) {
int x = a-b;
int y = b-c;
int z = a-c;
if(x*y > 0) return b;
if(x*z > 0) return c;
return a;
}
int32_t FindMedian(const int n1, const int n2, const int n3) {
auto _min = min(n1, min(n2, n3));
auto _max = max(n1, max(n2, n3));
return (n1 + n2 + n3) - _min - _max;
}
You can't do it in one, and you're only using two or three, so I'd say you've got the minimum number of comparisons already.
Rather than just computing the median, you might as well put them in place. Then you can get away with just 3 comparisons all the time, and you've got your pivot closer to being in place.
T median(T a[], int low, int high)
{
int middle = ( low + high ) / 2;
if( a[ middle ].compareTo( a[ low ] ) < 0 )
swap( a, low, middle );
if( a[ high ].compareTo( a[ low ] ) < 0 )
swap( a, low, high );
if( a[ high ].compareTo( a[ middle ] ) < 0 )
swap( a, middle, high );
return a[middle];
}
I know that this is an old thread, but I had to solve exactly this problem on a microcontroller that has very little RAM and does not have a h/w multiplication unit (:)). In the end I found the following works well:
static char medianIndex[] = { 1, 1, 2, 0, 0, 2, 1, 1 };
signed short getMedian(const signed short num[])
{
return num[medianIndex[(num[0] > num[1]) << 2 | (num[1] > num[2]) << 1 | (num[0] > num[2])]];
}
If you're not afraid to get your hands a little dirty with compiler intrinsics you can do it with exactly 0 branches.
The same question was discussed before on:
Fastest way of finding the middle value of a triple?
Though, I have to add that in the context of naive implementation of quicksort, with a lot of elements, reducing the amount of branches when finding the median is not so important because the branch predictor will choke either way when you'll start tossing elements around the the pivot. More sophisticated implementations (which don't branch on the partition operation, and avoid WAW hazards) will benefit from this greatly.
remove max and min value from total sum
int med3(int a, int b, int c)
{
int tot_v = a + b + c ;
int max_v = max(a, max(b, c));
int min_v = min(a, min(b, c));
return tot_v - max_v - min_v
}
There is actually a clever way to isolate the median element from three using a careful analysis of the 6 possible permutations (of low, median, high). In python:
def med(a, start, mid, last):
# put the median of a[start], a[mid], a[last] in the a[start] position
SM = a[start] < a[mid]
SL = a[start] < a[last]
if SM != SL:
return
ML = a[mid] < a[last]
m = mid if SM == ML else last
a[start], a[m] = a[m], a[start]
Half the time you have two comparisons otherwise you have 3 (avg 2.5). And you only swap the median element once when needed (2/3 of the time).
Full python quicksort using this at:
https://github.com/mckoss/labs/blob/master/qs.py
You can write up all the permutations:
1 0 2
1 2 0
0 1 2
2 1 0
0 2 1
2 0 1
Then we want to find the position of the 1. We could do this with two comparisons, if our first comparison could split out a group of equal positions, such as the first two lines.
The issue seems to be that the first two lines are different on any comparison we have available: a<b, a<c, b<c. Hence we have to fully identify the permutation, which requires 3 comparisons in the worst case.
Using a Bitwise XOR operator, the median of three numbers can be found.
def median(a,b,c):
m = max(a,b,c)
n = min(a,b,c)
ans = m^n^a^b^c
return ans

Google Interview : Find Crazy Distance Between Strings

This Question was asked to me at the Google interview. I could do it O(n*n) ... Can I do it in better time.
A string can be formed only by 1 and 0.
Definition:
X & Y are strings formed by 0 or 1
D(X,Y) = Remove the things common at the start from both X & Y. Then add the remaining lengths from both the strings.
For e.g.
D(1111, 1000) = Only First alphabet is common. So the remaining string is 111 & 000. Therefore the result length("111") & length("000") = 3 + 3 = 6
D(101, 1100) = Only First two alphabets are common. So the remaining string is 01 & 100. Therefore the result length("01") & length("100") = 2 + 3 = 5
It is pretty that obvious that do find out such a crazy distance is going to be linear. O(m).
Now the question is
given n input, say like
1111
1000
101
1100
Find out the maximum crazy distance possible.
n is the number of input strings.
m is the max length of any input string.
The solution of O(n2 * m) is pretty simple. Can it be done in a better way?
Let's assume that m is fixed. Can we do this in better than O(n^2) ?
Put the strings into a tree, where 0 means go left and 1 means go right. So for example
1111
1000
101
1100
would result in a tree like
Root
1
0 1
0 1* 0 1
0* 0* 1*
where the * means that an element ends there. Constructing this tree clearly takes O(n m).
Now we have to find the diameter of the tree (the longest path between two nodes, which is the same thing as the "crazy distance"). The optimized algorithm presented there hits each node in the tree once. There are at most min(n m, 2^m) such nodes.
So if n m < 2^m, then the the algorithm is O(n m).
If n m > 2^m (and we necessarily have repeated inputs), then the algorithm is still O(n m) from the first step.
This also works for strings with a general alphabet; for an alphabet with k letters build a k-ary tree, in which case the runtime is still O(n m) by the same reasoning, though it takes k times as much memory.
I think this is possible in O(nm) time by creating a binary tree where each bit in a string encodes the path (0 left, 1 right). Then finding the maximum distance between nodes of the tree which can be done in O(n) time.
This is my solution, I think it works:
Create a binary tree from all strings. The tree will be constructed in this way:
at every round, select a string and add it to the tree. so for your example, the tree will be:
<root>
<1> <empty>
<1> <0>
<1> <0> <1> <0>
<1> <0> <0>
So each path from root to a leaf will represent a string.
Now the distance between each two leaves is the distance between two strings. To find the crazy distance, you must find the diameter of this graph, that you can do it easily by dfs or bfs.
The total complexity of this algorithm is:
O(n*m) + O(n*m) = O(n*m).
I think this problem is something like "find prefix for two strings", you can use trie(http://en.wikipedia.org/wiki/Trie) to accerlate searching
I have a google phone interview 3 days before, but maybe I failed...
Best luck to you
To get an answer in O(nm) just iterate across the characters of all string (this is an O(n) operation). We will compare at most m characters, so this will be done O(m). This gives a total of O(nm). Here's a C++ solution:
int max_distance(char** strings, int numstrings, int &distance) {
distance = 0;
// loop O(n) for initialization
for (int i=0; i<numstrings; i++)
distance += strlen(strings[i]);
int max_prefix = 0;
bool done = false;
// loop max O(m)
while (!done) {
int c = -1;
// loop O(n)
for (int i=0; i<numstrings; i++) {
if (strings[i][max_prefix] == 0) {
done = true; // it is enough to reach the end of one string to be done
break;
}
int new_element = strings[i][max_prefix] - '0';
if (-1 == c)
c = new_element;
else {
if (c != new_element) {
done = true; // mismatch
break;
}
}
}
if (!done) {
max_prefix++;
distance -= numstrings;
}
}
return max_prefix;
}
void test_misc() {
char* strings[] = {
"10100",
"10101110",
"101011",
"101"
};
std::cout << std::endl;
int distance = 0;
std::cout << "max_prefix = " << max_distance(strings, sizeof(strings)/sizeof(strings[0]), distance) << std::endl;
}
Not sure why use trees when iteration gives you the same big O computational complexity without the code complexity. anyway here is my version of it in javascript O(mn)
var len = process.argv.length -2; // in node first 2 arguments are node and program file
var input = process.argv.splice(2);
var current;
var currentCount = 0;
var currentCharLoc = 0;
var totalCount = 0;
var totalComplete = 0;
var same = true;
while ( totalComplete < len ) {
current = null;
currentCount = 0;
for ( var loc = 0 ; loc < len ; loc++) {
if ( input[loc].length === currentCharLoc) {
totalComplete++;
same = false;
} else if (input[loc].length > currentCharLoc) {
currentCount++;
if (same) {
if ( current === null ) {
current = input[loc][currentCharLoc];
} else {
if (current !== input[loc][currentCharLoc]) {
same = false;
}
}
}
}
}
if (!same) {
totalCount += currentCount;
}
currentCharLoc++;
}
console.log(totalCount);

calculating the sum of nodes in a single verticle line of a binary tree

For a binary tree i want to get the sum of all nodes that fall in a single verticle line. I want the sum of nodes in each verticle node
A
/ \
B C
/ \ / \
D E F G
/ \
H I
IF you look at above tee
line 0 A E F so sum = A+E+F
line -1 B I so sum = B +I
line 1 C so sum = C
line 2 G so sum = G
I implemented following algorithm
Map<Integer,Integere> mp = new HashMap<Integer,Integer>()
calculate(root,0);
void calculate(Node node, int pos){
if(node==null)
return ;
if(mp.containsKey(pos) ){
int val = mp.get(pos) + node.data;
mp.put(pos,val);
}
else{
mp.put(pos,node.data);
}
calculate(node.left,pos-1);
calculate(node.right,pos+1);
}
I think the above algo is fine.Can
any one confirm?
Also how can i do it without using
HashMap,arraylist or any such
collection datatype of java.One
method is two is 2 arrays one for
storing negative indexes(mapped to
positive) and one for positive
indexs(right side of root).But we
dont know what the size of array
will be.
One approach is to use doubly link
list and add a node on right/left
movement if necessary. Am not
getting how can i implement this
approach? Any other simple/more time
efficient approach?
Is the complexity of the above code
i imolmeted is O(n)? (am not good at
analysing time complexity , so
asking )
C++ code
int vertsum(Node* n, int cur_level, int target_level)
{
if (!n)
return 0;
int sum = 0;
if (cur_level == target_level)
sum = n->value;
return sum +
vertsum(n->left, cur_level-1, target_level) +
vertsum(n->right, cur_level+1, target_level);
}
invocation example:
vertsum(root, 0, 1);
EDIT:
After clarifying the requirements, here the suggested code. Note that this is C++'ish and not exactly using Java's or C++'s standard API for lists, but you should get the idea. I assume that addNodeBefore and addNodeAfter initialize node's data (i.e. ListNode::counter)
void vertsum(TreeNode* n, int level, ListNode& counter)
{
if (!n)
return;
counter.value += n->value;
counter.index = level;
if (! counter.prev)
addNodeBefore(counter);
vertsum(n->left, level-1, counter.prev);
if (! counter.next)
addNodeAfter(counter);
vertsum(n->right, level+1, counter.next);
return;
}
You could visit the binary tree in depth-first postorder, and use an offset to keep track of how far you moved to the left/right with respect to your starting node. Every time you move to the left, you decrement the offset, and every time you move to the right you increment the offset. If your visit procedure is called with an offset of 0, then it means that the node being visited has the same offset of your starting node (i.e. it's in the same column), and so you must add its value.
Pseudocode:
procedure visit (node n, int offset) {
sumleft = 0
sumright = 0
if (n.left != null)
sumleft = visit(n.left, offset - 1)
if (n.right != null)
sumright = visit(n.right, offset + 1)
if (offset == 0)
return n.value + sumleft + sumright
else
return sumleft + sumright;
}
For example, if you call
visit(A, 0)
you will get the following calls:
visit(A, 0) -> E.value + F.value + A.value
visit(B, -1) -> E.value
visit(D, -2) -> 0
visit(H, -3) -> 0
visit(I, +2) -> 0
visit(E, 0) -> E.value
visit(C, +1) -> F.value
visit(F, 0) -> F.value
visit(G, +1) -> 0
Another example, starting from node B:
visit(B, 0)
visit(D, -1)
visit(H, -2)
visit(I, 0) -> here we return I.value
visit(E, +1)
when recursion goes back to the initial call visit(B, 0) we have sumleft = I.value and sumright = 0, so we return the final result B.value + I.value, as expected.
Complexity of O(n), because you visit once all nodes of your tree rooted at the starting node.
After think about the above algorithm, I realize it has a limitation, which becomes evident when we consider a more complex tree like the following:
In this case visit(B, 0) would still return B.value + I.value, but this is not the expected result, because N is also on the same column. The following algorithm should cope with this problem:
procedure visit(node n, int c, int t) {
sumleft = 0;
sumright = 0;
if (n.left != null)
sumleft = visit(n.left, c - 1, t)
if (n.right != null)
sumright = visit(n.right, c + 1, t)
if (c == t)
return n.value + sumleft + sumright;
else
return sumleft + sumright;
}
The idea is essentially the same, but we have now a parameter c which gives the current column, and a parameter t which is the target column. If we want the sum of the elements in the B column, then we can call visit(A, 0, -1), that is we always start our visit from node A (the root's tree), which is at column 0, and our target is column -1. We get the following:
Therefore visit(A, 0, -1) = B + I + N as expected.
Complexity is always O(n), where n is the number of nodes in the tree, because we visit the entire tree with depth-first postorder, and we process each node only once.
If we want to compute the sum of every column, we can use the following algorithm
procedure visit(node n, int c) {
if (n.left != null)
S{c} += n.value;
visit(n.left, c - 1)
visit(n.right, c + 1)
}
and call once visit(A, 0), where A is the root node. Note that S{...} in the algorithm is a map whose keys are the columns numbers (..., -2, -1, 0, 1, 2, ...) and whose values (at the end of the algorithm) are the sums of the values of nodes in that column (S{1} will be the sum of nodes in column 1). We can also use an array, instead of a map, provided that we pay attention to the indexes (arrays have no negative indexes). The algorithm is still O(n), because we traverse the entire tree only once. However, in this case we need additional space to store the sum for all columns (the map, or the array). If I'm not mistaken a binary tree of height h will have 2*h + 1 columns.
What about the following? (Inside your node class, assuming getData returns the integer value, hasRight() and hasLeft() are boolean values indicating whether a right/left branch exists and getRight() and getLeft() return the next node in the right/left branch.
public int calculateVerticalLine(int numLine) {
return calculateVerticalLineRecursive(numLine, 0);
}
protected int calculateVerticalLineRecursive(int numLine, int curPosition) {
int result = 0;
if(numLine == curPosition) result += this.getData();
if(hasRight()) result += getRight().calculateVerticalLineRecursive(numLine, curPosition+1);
if(hasLeft()) result += getLeft().calculateVerticalLineRecursive(numLine, curPosition-1);
return result;
}
public class Solution {
public static int getSum(BinaryTreeNode<Integer> root) {
//Your code goes here.
if(root==null){
return 0;
}
int leftnodesum=getSum(root.left);
int rightnodesum=getSum(root.right);
return root.data+leftnodesum+rightnodesum;
}
}

Resources