Minimum no. of comparisons to find median of 3 numbers - median

I was implementing quicksort and I wished to set the pivot to be the median or three numbers. The three numbers being the first element, the middle element, and the last element.
Could I possibly find the median in less no. of comparisons?
median(int a[], int p, int r)
{
int m = (p+r)/2;
if(a[p] < a[m])
{
if(a[p] >= a[r])
return a[p];
else if(a[m] < a[r])
return a[m];
}
else
{
if(a[p] < a[r])
return a[p];
else if(a[m] >= a[r])
return a[m];
}
return a[r];
}

If the concern is only comparisons, then this should be used.
int getMedian(int a, int b , int c) {
int x = a-b;
int y = b-c;
int z = a-c;
if(x*y > 0) return b;
if(x*z > 0) return c;
return a;
}

int32_t FindMedian(const int n1, const int n2, const int n3) {
auto _min = min(n1, min(n2, n3));
auto _max = max(n1, max(n2, n3));
return (n1 + n2 + n3) - _min - _max;
}

You can't do it in one, and you're only using two or three, so I'd say you've got the minimum number of comparisons already.

Rather than just computing the median, you might as well put them in place. Then you can get away with just 3 comparisons all the time, and you've got your pivot closer to being in place.
T median(T a[], int low, int high)
{
int middle = ( low + high ) / 2;
if( a[ middle ].compareTo( a[ low ] ) < 0 )
swap( a, low, middle );
if( a[ high ].compareTo( a[ low ] ) < 0 )
swap( a, low, high );
if( a[ high ].compareTo( a[ middle ] ) < 0 )
swap( a, middle, high );
return a[middle];
}

I know that this is an old thread, but I had to solve exactly this problem on a microcontroller that has very little RAM and does not have a h/w multiplication unit (:)). In the end I found the following works well:
static char medianIndex[] = { 1, 1, 2, 0, 0, 2, 1, 1 };
signed short getMedian(const signed short num[])
{
return num[medianIndex[(num[0] > num[1]) << 2 | (num[1] > num[2]) << 1 | (num[0] > num[2])]];
}

If you're not afraid to get your hands a little dirty with compiler intrinsics you can do it with exactly 0 branches.
The same question was discussed before on:
Fastest way of finding the middle value of a triple?
Though, I have to add that in the context of naive implementation of quicksort, with a lot of elements, reducing the amount of branches when finding the median is not so important because the branch predictor will choke either way when you'll start tossing elements around the the pivot. More sophisticated implementations (which don't branch on the partition operation, and avoid WAW hazards) will benefit from this greatly.

remove max and min value from total sum
int med3(int a, int b, int c)
{
int tot_v = a + b + c ;
int max_v = max(a, max(b, c));
int min_v = min(a, min(b, c));
return tot_v - max_v - min_v
}

There is actually a clever way to isolate the median element from three using a careful analysis of the 6 possible permutations (of low, median, high). In python:
def med(a, start, mid, last):
# put the median of a[start], a[mid], a[last] in the a[start] position
SM = a[start] < a[mid]
SL = a[start] < a[last]
if SM != SL:
return
ML = a[mid] < a[last]
m = mid if SM == ML else last
a[start], a[m] = a[m], a[start]
Half the time you have two comparisons otherwise you have 3 (avg 2.5). And you only swap the median element once when needed (2/3 of the time).
Full python quicksort using this at:
https://github.com/mckoss/labs/blob/master/qs.py

You can write up all the permutations:
1 0 2
1 2 0
0 1 2
2 1 0
0 2 1
2 0 1
Then we want to find the position of the 1. We could do this with two comparisons, if our first comparison could split out a group of equal positions, such as the first two lines.
The issue seems to be that the first two lines are different on any comparison we have available: a<b, a<c, b<c. Hence we have to fully identify the permutation, which requires 3 comparisons in the worst case.

Using a Bitwise XOR operator, the median of three numbers can be found.
def median(a,b,c):
m = max(a,b,c)
n = min(a,b,c)
ans = m^n^a^b^c
return ans

Related

find the number of ways you can form a string on size N, given an unlimited number of 0s and 1s

The below question was asked in the atlassian company online test ,I don't have test cases , this is the below question I took from this link
find the number of ways you can form a string on size N, given an unlimited number of 0s and 1s. But
you cannot have D number of consecutive 0s and T number of consecutive 1s. N, D, T were given as inputs,
Please help me on this problem,any approach how to proceed with it
My approach for the above question is simply I applied recursion and tried for all possiblity and then I memoized it using hash map
But it seems to me there must be some combinatoric approach that can do this question in less time and space? for debugging purposes I am also printing the strings generated during recursion, if there is flaw in my approach please do tell me
#include <bits/stdc++.h>
using namespace std;
unordered_map<string,int>dp;
int recurse(int d,int t,int n,int oldd,int oldt,string s)
{
if(d<=0)
return 0;
if(t<=0)
return 0;
cout<<s<<"\n";
if(n==0&&d>0&&t>0)
return 1;
string h=to_string(d)+" "+to_string(t)+" "+to_string(n);
if(dp.find(h)!=dp.end())
return dp[h];
int ans=0;
ans+=recurse(d-1,oldt,n-1,oldd,oldt,s+'0')+recurse(oldd,t-1,n-1,oldd,oldt,s+'1');
return dp[h]=ans;
}
int main()
{
int n,d,t;
cin>>n>>d>>t;
dp.clear();
cout<<recurse(d,t,n,d,t,"")<<"\n";
return 0;
}
You are right, instead of generating strings, it is worth to consider combinatoric approach using dynamic programming (a kind of).
"Good" sequence of length K might end with 1..D-1 zeros or 1..T-1 of ones.
To make a good sequence of length K+1, you can add zero to all sequences except for D-1, and get 2..D-1 zeros for the first kind of precursors and 1 zero for the second kind
Similarly you can add one to all sequences of the first kind, and to all sequences of the second kind except for T-1, and get 1 one for the first kind of precursors and 2..T-1 ones for the second kind
Make two tables
Zeros[N][D] and Ones[N][T]
Fill the first row with zero counts, except for Zeros[1][1] = 1, Ones[1][1] = 1
Fill row by row using the rules above.
Zeros[K][1] = Sum(Ones[K-1][C=1..T-1])
for C in 2..D-1:
Zeros[K][C] = Zeros[K-1][C-1]
Ones[K][1] = Sum(Zeros[K-1][C=1..T-1])
for C in 2..T-1:
Ones[K][C] = Ones[K-1][C-1]
Result is sum of the last row in both tables.
Also note that you really need only two active rows of the table, so you can optimize size to Zeros[2][D] after debugging.
This can be solved using dynamic programming. I'll give a recursive solution to the same. It'll be similar to generating a binary string.
States will be:
i: The ith character that we need to insert to the string.
cnt: The number of consecutive characters before i
bit: The character which was repeated cnt times before i. Value of bit will be either 0 or 1.
Base case will: Return 1, when we reach n since we are starting from 0 and ending at n-1.
Define the size of dp array accordingly. The time complexity will be 2 x N x max(D,T)
#include<bits/stdc++.h>
using namespace std;
int dp[1000][1000][2];
int n, d, t;
int count(int i, int cnt, int bit) {
if (i == n) {
return 1;
}
int &ans = dp[i][cnt][bit];
if (ans != -1) return ans;
ans = 0;
if (bit == 0) {
ans += count(i+1, 1, 1);
if (cnt != d - 1) {
ans += count(i+1, cnt + 1, 0);
}
} else {
// bit == 1
ans += count(i+1, 1, 0);
if (cnt != t-1) {
ans += count(i+1, cnt + 1, 1);
}
}
return ans;
}
signed main() {
ios_base::sync_with_stdio(false), cin.tie(nullptr);
cin >> n >> d >> t;
memset(dp, -1, sizeof dp);
cout << count(0, 0, 0);
return 0;
}

What is the worst case for binary search

Where should an element be located in the array so that the run time of the Binary search algorithm is O(log n)?
The first or last element will give the worst case complexity in binary search as you'll have to do maximum no of comparisons.
Example:
1 2 3 4 5 6 7 8 9
Here searching for 1 will give you the worst case, with the result coming in 4th pass.
1 2 3 4 5 6 7 8
In this case, searching for 8 will give the worst case, with the result coming in 4 passes.
Note that in the second case searching for 1 (the first element) can be done in just 3 passes. (compare 1 & 4, compare 1 & 2 and finally 1)
So, if no. of elements are even, the last element gives the worst case.
This is assuming all arrays are 0 indexed. This happens due to considering the mid as float of (start + end) /2.
// Java implementation of iterative Binary Search
class BinarySearch
{
// Returns index of x if it is present in arr[],
// else return -1
int binarySearch(int arr[], int x)
{
int l = 0, r = arr.length - 1;
while (l <= r)
{
int m = l + (r-l)/2;
// Check if x is present at mid
if (arr[m] == x)
return m;
// If x greater, ignore left half
if (arr[m] < x)
l = m + 1;
// If x is smaller, ignore right half
else
r = m - 1;
}
// if we reach here, then element was
// not present
return -1;
}
// Driver method to test above
public static void main(String args[])
{
BinarySearch ob = new BinarySearch();
int arr[] = {2, 3, 4, 10, 40};
int n = arr.length;
int x = 10;
int result = ob.binarySearch(arr, x);
if (result == -1)
System.out.println("Element not present");
else
System.out.println("Element found at " +
"index " + result);
}
}
Time Complexity:
The time complexity of Binary Search can be written as
T(n) = T(n/2) + c
The above recurrence can be solved either using Recurrence T ree method or Master method. It falls in case II of Master Method and solution of the recurrence is Theta(Logn).
Auxiliary Space: O(1) in case of iterative implementation. In case of recursive implementation, O(Logn) recursion call stack space.

Optimizing a primality test based on runtime in Python

I'm pretty new to algorithms and runtimes, and I'm trying to optimise a bit of my code for a personal project.
import math
for num in range(0, 10000000000000000000000):
if all((num**(num+1)+(num+1)**(num))%i!=0 for i in range(2,int(math.sqrt((num**(num+1)+(num+1)**(num))))+1)):
print(num)
What can I do to speed this up? I know that num=80 should work but my code isn't getting past num=0, 1, 2 (it's not fast enough).
First I define my range, then I say if 'such-and-such' is prime from range 2 to sqrt(such-and-such) + 1, then return that number. Sqrt(n) + 1 is the minimum number of factors to test for the primality of n.
This is a primality test of sequence A051442
You would probably get a minor boost from computing (num**(num+1)+(num+1)**(num)) only once per iteration instead of sqrt(num**(num+1)+(num+1)**(num)) times. As you can see, this will greatly reduce the constant factor in your complexity. It won't change the fundamental complexity because you still need to compute the remainder. Change
if all((num**(num+1)+(num+1)**(num))%i!=0 for i in range(2,int(math.sqrt((num**(num+1)+(num+1)**(num))))+1)):
to
k = num**(num+1)+(num+1)**(num)
if all(k%i for i in range(2,int(math.sqrt(k))+1)):
The != 0 is implicit in Python.
Update
All this is just trivial improvement to an extremely inefficieny algorithm. The biggest speedup I can think of is to reduce the check k % i to only prime i. For any composite i = a * b such that k % i == 0, it must be the case that k % a == 0 and k % b == 0 (if k is divisible by i, it must also be divisible by the factors of i).
I am assuming that you don't want to use any kind of pre-computed prime tables in your code. In that case, you can compute the table yourself. This will involve checking all the numbers up to a given sqrt(k) only once ever, instead of once per iteration of num, since we can stash the previously computed primes in say a list. That will effectively increase the lower limit of the range in your current all from 2 to the square root of the previous k.
Let's define a function to extend our set of primes using the seive of Eratosthenes:
from math import sqrt
def extend(primes, from_, to):
"""
primes: a sequence containing prime numbers from 2 to `from - 1`, in order
from_: the number to start checking with
to: the number to end with (inclusive)
"""
if not primes:
primes.extend([2, 3])
return
for k in range(max(from_, 5), to + 1):
s = int(sqrt(k)) # No need to compute this more than once per k
for p in primes:
if p > s:
# Reached sqrt(k) -> short circuit success
primes.append(k)
break
elif not k % p:
# Found factor -> short circuit failure
break
Now we can use this function to extend our list of primes at every iteration of the original loop. This allows us to check the divisibility of k only against the slowly growing list of primes, not against all numbers:
primes = []
prev = 0
for num in range(10000000000000000000000):
k = num**(num + 1) + (num + 1)**num
lim = int(sqrt(k)) + 1
extend(primes, prev, lim)
#print('Num={}, K={}, checking {}-{}, p={}'.format(num, k, prev, lim, primes), end='... ')
if k <= 3 and k in primes or all(k % i for i in primes):
print('{}: {} Prime!'.format(num, k))
else:
print('{}: {} Nope'.format(num, k))
prev = lim + 1
I am not 100% sure that my extend function is optimal, but I am able to get to num == 13, k == 4731091158953433 in <10 minutes on my ridiculously old and slow laptop, so I guess it's not too bad. That means that the algorithm builds a complete table of primes up to ~7e7 in that time.
Update #2
A sort-of-but-not-really optimization you could do would be to check all(k % i for i in primes) before calling extend. This would save you a lot of cycles for numbers that have small prime factors, but would probably catch up to you later on, when you would end up having to compute all the primes up to some enormous number. Here is a sample of how you could do that:
primes = []
prev = 0
for num in range(10000000000000000000000):
k = num**(num + 1) + (num + 1)**num
lim = int(sqrt(k)) + 1
if not all(k % i for i in primes):
print('{}: {} Nope'.format(num, k))
continue
start = len(primes)
extend(primes, prev, lim)
if all(k % i for i in primes[start:]):
print('{}: {} Prime!'.format(num, k))
else:
print('{}: {} Nope'.format(num, k))
prev = lim + 1
While this version does not do much for the long run, it does explain why you were able to get to 15 so quickly in your original run. The prime table does note get extended after num == 3, until num == 16, which is when the terrible delay occurs in this version as well. The net runtime to 16 should be identical in both versions.
Update #3
As #paxdiablo suggests, the only numbers we need to consider in extend are multiples of 6 +/- 1. We can combine that with the fact that only a small number of primes generally need to be tested, and convert the functionality of extend into a generator that will only compute as many primes as absolutely necessary. Using Python's lazy generation should help. Here is a completely rewritten version:
from itertools import count
from math import ceil, sqrt
prime_table = [2, 3]
def prime_candidates(start=0):
"""
Infinite generator of prime number candidates starting with the
specified number.
Candidates are 2, 3 and all numbers that are of the form 6n-1 and 6n+1
"""
if start <= 3:
if start <= 2:
yield 2
yield 3
start = 5
delta = 2
else:
m = start % 6
if m < 2:
start += 1 - m
delta = 4
else:
start += 5 - m
delta = 2
while True:
yield start
start += delta
delta = 6 - delta
def isprime(n):
"""
Checks if `n` is prime.
All primes up to sqrt(n) are expected to already be present in
the generated `prime_table`.
"""
s = int(ceil(sqrt(n)))
for p in prime_table:
if p > s:
break
if not n % p:
return False
return True
def generate_primes(max):
"""
Generates primes up to the specified maximum.
First the existing table is yielded. Then, the new primes are
found in the sequence generated by `prime_candidates`. All verified
primes are added to the existing cache.
"""
for p in prime_table:
if p > max:
return
yield p
for k in prime_candidates(prime_table[-1] + 1):
if isprime(k):
prime_table.append(k)
if k > max:
# Putting the return here ensures that we always stop on a prime and therefore don't do any extra work
return
else:
yield k
for num in count():
k = num**(num + 1) + (num + 1)**num
lim = int(ceil(sqrt(k)))
b = all(k % i for i in generate_primes(lim))
print('n={}, k={} is {}prime'.format(num, k, '' if b else 'not '))
This version gets to 15 almost instantly. It gets stuck at 16 because the smallest prime factor for k=343809097055019694337 is 573645313. Some future expectations:
17 should be a breeze: 16248996011806421522977 has factor 19
18 will take a while: 812362695653248917890473 has factor 22156214713
19 is easy: 42832853457545958193355601 is divisible by 3
20 also easy: 2375370429446951548637196401 is divisible by 58967
21: 138213776357206521921578463913 is divisible by 13
22: 8419259736788826438132968480177 is divisible by 103
etc... (link to sequence)
So in terms of instant gratification, this method will get you much further if you can make it past 18 (which will take >100 times longer than getting past 16, which in my case took ~1.25hrs).
That being said, your greatest speedup at this point would be re-writing this in C or some similar low-level language that does not have as much overhead for loops.
Update #4
Just for giggles, here is an implementation of the latest Python version in C. I chose to go with GMP for arbitrary precision integers, because it is easy to use and install on my Red Hat system, and the docs are very clear:
#include <stdio.h>
#include <stdlib.h>
#include <gmp.h>
typedef struct {
size_t alloc;
size_t size;
mpz_t *numbers;
} PrimeTable;
void init_table(PrimeTable *buf)
{
buf->alloc = 0x100000L;
buf->size = 2;
buf->numbers = malloc(buf->alloc * sizeof(mpz_t));
if(buf == NULL) {
fprintf(stderr, "No memory for prime table\n");
exit(1);
}
mpz_init_set_si(buf->numbers[0], 2);
mpz_init_set_si(buf->numbers[1], 3);
return;
}
void append_table(PrimeTable *buf, mpz_t number)
{
if(buf->size == buf->alloc) {
size_t new = 2 * buf->alloc;
mpz_t *tmp = realloc(buf->numbers, new * sizeof(mpz_t));
if(tmp == NULL) {
fprintf(stderr, "Ran out of memory for prime table\n");
exit(1);
}
buf->alloc = new;
buf->numbers = tmp;
}
mpz_set(buf->numbers[buf->size], number);
buf->size++;
return;
}
size_t print_table(PrimeTable *buf, FILE *file)
{
size_t i, n;
n = fprintf(file, "Array contents = [");
for(i = 0; i < buf->size; i++) {
n += mpz_out_str(file, 10, buf->numbers[i]);
if(i < buf->size - 1)
n += fprintf(file, ", ");
}
n += fprintf(file, "]\n");
return n;
}
void free_table(PrimeTable *buf)
{
for(buf->size--; ((signed)(buf->size)) >= 0; buf->size--)
mpz_clear(buf->numbers[buf->size]);
free(buf->numbers);
return;
}
int isprime(mpz_t num, PrimeTable *table)
{
mpz_t max, rem, next;
size_t i, d, r;
mpz_inits(max, rem, NULL);
mpz_sqrtrem(max, rem, num);
// Check if perfect square: definitely not prime
if(!mpz_cmp_si(rem, 0)) {
mpz_clears(rem, max, NULL);
return 0;
}
/* Normal table lookup */
for(i = 0; i < table->size; i++) {
// Got to sqrt(n) -> prime
if(mpz_cmp(max, table->numbers[i]) < 0) {
mpz_clears(rem, max, NULL);
return 1;
}
// Found a factor -> not prime
if(mpz_divisible_p(num, table->numbers[i])) {
mpz_clears(rem, max, NULL);
return 0;
}
}
/* Extend table and do lookup */
// Start with last found prime + 2
mpz_init_set(next, table->numbers[i - 1]);
mpz_add_ui(next, next, 2);
// Find nearest number of form 6n-1 or 6n+1
r = mpz_fdiv_ui(next, 6);
if(r < 2) {
mpz_add_ui(next, next, 1 - r);
d = 4;
} else {
mpz_add_ui(next, next, 5 - r);
d = 2;
}
// Step along numbers of form 6n-1/6n+1. Check each candidate for
// primality. Don't stop until next prime after sqrt(n) to avoid
// duplication.
for(;;) {
if(isprime(next, table)) {
append_table(table, next);
if(mpz_divisible_p(num, next)) {
mpz_clears(next, rem, max, NULL);
return 0;
}
if(mpz_cmp(max, next) <= 0) {
mpz_clears(next, rem, max, NULL);
return 1;
}
}
mpz_add_ui(next, next, d);
d = 6 - d;
}
// Return can only happen from within loop.
}
int main(int argc, char *argv[])
{
PrimeTable table;
mpz_t k, a, b;
size_t n, next;
int p;
init_table(&table);
mpz_inits(k, a, b, NULL);
for(n = 0; ; n = next) {
next = n + 1;
mpz_set_ui(a, n);
mpz_pow_ui(a, a, next);
mpz_set_ui(b, next);
mpz_pow_ui(b, b, n);
mpz_add(k, a, b);
p = isprime(k, &table);
printf("n=%ld k=", n);
mpz_out_str(stdout, 10, k);
printf(" p=%d\n", p);
//print_table(&table, stdout);
}
mpz_clears(b, a, k, NULL);
free_table(&table);
return 0;
}
While this version has the exact same algorithmic complexity as the Python one, I expect it to run a few orders of magnitude faster because of the relatively minimal overhead incurred in C. And indeed, it took about 15 minutes to get stuck at n == 18, which is ~5 times faster than the Python version so far.
Update #5
This is going to be the last one, I promise.
GMP has a function called mpz_nextprime, which offers a potentially much faster implementation of this algorightm, especially with caching. According to the docs:
This function uses a probabilistic algorithm to identify primes. For practical purposes it’s adequate, the chance of a composite passing will be extremely small.
This means that it is probably much faster than the current prime generator I implemented, with a slight cost offset of some false primes being added to the cache. This cost should be minimal: even adding a few thousand extra modulo operations should be fine if the prime generator is faster than it is now.
The only part that needs to be replaced/modified is the portion of isprime below the comment /* Extend table and do lookup */. Basically that whole section just becomes a series of calls to mpz_nextprime instead of recursion.
At that point, you may as well adapt isprime to use mpz_probab_prime_p when possible. You only need to check for sure if the result of mpz_probab_prime_p is uncertain:
int isprime(mpz_t num, PrimeTable *table)
{
mpz_t max, rem, next;
size_t i, r;
int status;
status = mpz_probab_prime_p(num, 50);
// Status = 2 -> definite yes, Status = 0 -> definite no
if(status != 1)
return status != 0;
mpz_inits(max, rem, NULL);
mpz_sqrtrem(max, rem, num);
// Check if perfect square: definitely not prime
if(!mpz_cmp_si(rem, 0)) {
mpz_clears(rem, max, NULL);
return 0;
}
mpz_clear(rem);
/* Normal table lookup */
for(i = 0; i < table->size; i++) {
// Got to sqrt(n) -> prime
if(mpz_cmp(max, table->numbers[i]) < 0) {
mpz_clear(max);
return 1;
}
// Found a factor -> not prime
if(mpz_divisible_p(num, table->numbers[i])) {
mpz_clear(max);
return 0;
}
}
/* Extend table and do lookup */
// Start with last found prime + 2
mpz_init_set(next, table->numbers[i - 1]);
mpz_add_ui(next, next, 2);
// Step along probable primes
for(;;) {
mpz_nextprime(next, next);
append_table(table, next);
if(mpz_divisible_p(num, next)) {
r = 0;
break;
}
if(mpz_cmp(max, next) <= 0) {
r = 1;
break;
}
}
mpz_clears(next, max, NULL);
return r;
}
Sure enough, this version makes it to n == 79 in a couple of seconds at most. It appears to get stuck on n == 80, probably because mpz_probab_prime_p can't determine if k is a prime for sure. I doubt that computing all the primes up to ~10^80 is going to take a trivial amount of time.

How do I return the smallest value using a for loop?

I am given a limit, and I have to return the smallest value for n to make it true: 1+2+3+4+...+n >= limit. I feel like there's one thing missing, but I can't tell.
public int whenToReachLimit(int limit) {
int sum = 0;
for (int i = 1; sum < limit; i++) {
sum = sum + i;
}
return sum;
}
The output would be:
1 : 1
4 : 3
10 : 4
You get avoid the loop to compute the sum of the n first integers, using:
Thus the inequality becomes:
Notice that the left-hand side is positive (if n is negative, the sum is empty) and strictly increasing. Notice also that you are looking for the first integer satisfying the inequality. The idea here is first to replace the inequality by an equality which will allow us to solve the equation for n. In a second step, the possibly non-integer solution will be rounder to the closest integer.
Solving this equation for n should give you two solutions. The negative one can be discarded (remember n is positive). That is:
Finally, let's round this solution to the closest integer that will also satisfy the inequality:
NB: it can be overkilled for small inputs
I'm not sure if I know exactly what you want to do. But I would recommend to make a "practice run".
If Limit = 0 the function returns 0
If Limit = 1 the function returns 1
If Limit = 2 the function return 3
If Limit = 3 the function return 3
If Limit = 4 the function return 6
If Limit = 5 the function return 6
Now you decide by your own if the functions does what you're expecting.
I've found the answer. Turns out it doesn't work with a for loop which I find odd. But this is the answer to my own question.
public int whenToReachLimit(int limit) {
int n = 0;
int sum = 0;
while (sum < limit) {
sum += n;
n++;
}
return n-1;
}
You don't want to return sum, you want to return n (smallest possible value satisfying the given requirement).
return i-1 instead of sum.

Google Interview : Find Crazy Distance Between Strings

This Question was asked to me at the Google interview. I could do it O(n*n) ... Can I do it in better time.
A string can be formed only by 1 and 0.
Definition:
X & Y are strings formed by 0 or 1
D(X,Y) = Remove the things common at the start from both X & Y. Then add the remaining lengths from both the strings.
For e.g.
D(1111, 1000) = Only First alphabet is common. So the remaining string is 111 & 000. Therefore the result length("111") & length("000") = 3 + 3 = 6
D(101, 1100) = Only First two alphabets are common. So the remaining string is 01 & 100. Therefore the result length("01") & length("100") = 2 + 3 = 5
It is pretty that obvious that do find out such a crazy distance is going to be linear. O(m).
Now the question is
given n input, say like
1111
1000
101
1100
Find out the maximum crazy distance possible.
n is the number of input strings.
m is the max length of any input string.
The solution of O(n2 * m) is pretty simple. Can it be done in a better way?
Let's assume that m is fixed. Can we do this in better than O(n^2) ?
Put the strings into a tree, where 0 means go left and 1 means go right. So for example
1111
1000
101
1100
would result in a tree like
Root
1
0 1
0 1* 0 1
0* 0* 1*
where the * means that an element ends there. Constructing this tree clearly takes O(n m).
Now we have to find the diameter of the tree (the longest path between two nodes, which is the same thing as the "crazy distance"). The optimized algorithm presented there hits each node in the tree once. There are at most min(n m, 2^m) such nodes.
So if n m < 2^m, then the the algorithm is O(n m).
If n m > 2^m (and we necessarily have repeated inputs), then the algorithm is still O(n m) from the first step.
This also works for strings with a general alphabet; for an alphabet with k letters build a k-ary tree, in which case the runtime is still O(n m) by the same reasoning, though it takes k times as much memory.
I think this is possible in O(nm) time by creating a binary tree where each bit in a string encodes the path (0 left, 1 right). Then finding the maximum distance between nodes of the tree which can be done in O(n) time.
This is my solution, I think it works:
Create a binary tree from all strings. The tree will be constructed in this way:
at every round, select a string and add it to the tree. so for your example, the tree will be:
<root>
<1> <empty>
<1> <0>
<1> <0> <1> <0>
<1> <0> <0>
So each path from root to a leaf will represent a string.
Now the distance between each two leaves is the distance between two strings. To find the crazy distance, you must find the diameter of this graph, that you can do it easily by dfs or bfs.
The total complexity of this algorithm is:
O(n*m) + O(n*m) = O(n*m).
I think this problem is something like "find prefix for two strings", you can use trie(http://en.wikipedia.org/wiki/Trie) to accerlate searching
I have a google phone interview 3 days before, but maybe I failed...
Best luck to you
To get an answer in O(nm) just iterate across the characters of all string (this is an O(n) operation). We will compare at most m characters, so this will be done O(m). This gives a total of O(nm). Here's a C++ solution:
int max_distance(char** strings, int numstrings, int &distance) {
distance = 0;
// loop O(n) for initialization
for (int i=0; i<numstrings; i++)
distance += strlen(strings[i]);
int max_prefix = 0;
bool done = false;
// loop max O(m)
while (!done) {
int c = -1;
// loop O(n)
for (int i=0; i<numstrings; i++) {
if (strings[i][max_prefix] == 0) {
done = true; // it is enough to reach the end of one string to be done
break;
}
int new_element = strings[i][max_prefix] - '0';
if (-1 == c)
c = new_element;
else {
if (c != new_element) {
done = true; // mismatch
break;
}
}
}
if (!done) {
max_prefix++;
distance -= numstrings;
}
}
return max_prefix;
}
void test_misc() {
char* strings[] = {
"10100",
"10101110",
"101011",
"101"
};
std::cout << std::endl;
int distance = 0;
std::cout << "max_prefix = " << max_distance(strings, sizeof(strings)/sizeof(strings[0]), distance) << std::endl;
}
Not sure why use trees when iteration gives you the same big O computational complexity without the code complexity. anyway here is my version of it in javascript O(mn)
var len = process.argv.length -2; // in node first 2 arguments are node and program file
var input = process.argv.splice(2);
var current;
var currentCount = 0;
var currentCharLoc = 0;
var totalCount = 0;
var totalComplete = 0;
var same = true;
while ( totalComplete < len ) {
current = null;
currentCount = 0;
for ( var loc = 0 ; loc < len ; loc++) {
if ( input[loc].length === currentCharLoc) {
totalComplete++;
same = false;
} else if (input[loc].length > currentCharLoc) {
currentCount++;
if (same) {
if ( current === null ) {
current = input[loc][currentCharLoc];
} else {
if (current !== input[loc][currentCharLoc]) {
same = false;
}
}
}
}
}
if (!same) {
totalCount += currentCount;
}
currentCharLoc++;
}
console.log(totalCount);

Resources