Compare strings lexicographically in multiple queries

Compare two numbers in the form of strings lexicographically, multiple times.
What I tried
The question is straightforward: compare the strings after each change. But I am getting a Time Limit Exceeded error because of the multiple queries.
I was searching the internet and came across segment trees for solving range queries. But alas, I am not able to visualise how they can help here.
Any hint is appreciated.

A segment tree seems overkill for this problem. When will B be lexicographically larger than A? When at some index i, A[i] = 0, B[i] = 1 and A[0:i] = B[0:i]. Iterate over both strings at the same time and keep in a set all indexes where they differ.
For each query at index i, update B[i] to 1. Then check if B[i] = A[i]. If they are equal, erase i from the index set; otherwise, add it to the set. If there is no index left in the set, A and B are now equal => answer YES.
If there is at least one element, get the lowest index. If this index is j, that means A[0:j] = B[0:j] but A[j] != B[j]. So either A[j] is 0 and B[j] is 1, or A[j] is 1 and B[j] is 0. Depending on that, answer YES or NO.
This has complexity O(Q log N), where Q is the number of queries and N the length of the strings.
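A minimal C++ sketch of this idea (assuming binary strings, 0-indexed queries that set B[i] = '1', and that YES means B is now greater than or equal to A; the I/O format is my own assumption):

#include <bits/stdc++.h>
using namespace std;

int main() {
    string A, B;
    cin >> A >> B;                        // two equal-length binary strings

    set<int> diff;                        // indexes where A and B differ
    for (int i = 0; i < (int)A.size(); i++)
        if (A[i] != B[i]) diff.insert(i);

    int q;
    cin >> q;                             // number of queries
    while (q--) {
        int i;
        cin >> i;                         // query: set B[i] = '1' (0-indexed)
        B[i] = '1';
        if (B[i] == A[i]) diff.erase(i);  // positions match again
        else diff.insert(i);              // positions differ now

        if (diff.empty())
            cout << "YES\n";              // A == B
        else {
            int j = *diff.begin();        // lowest differing index decides
            cout << ((B[j] == '1') ? "YES" : "NO") << '\n';
        }
    }
    return 0;
}

Each query does only O(log N) set work, so the total is O(Q log N) as stated above.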


How to find all numbers whose distance from a given point is less than or equal to an integer n

Given a set of points D and some number K, I want to find all numbers in D such that the distance between K and any found number is less than or equal to an integer N.
Example:
Suppose we have D = {5, 9, 0, 6, 7}, K = 8 and N = 1; then the result should be {9, 7}.
I was thinking of using a k-d tree or a VP tree, but both, as I understand (correct me if I am wrong, please), find nearest neighbors and do not care about N in my example.
To summarize all the comments:
Solving this problem by brute force takes O(n) time: iterate over each element in D and check if its distance from k is at most n.
If you have a big data set but a lot of queries, it is better to pre-process D: sort it once, in O(n log n), and then you can answer each query in O(log n + |output|).
Now, for a given query, binary-search for k. Notice that the binary search will fail if the number is missing, but it still stops at the closest value. From that index, spread out to both sides of D, and for each element check whether it is still within range n. The spreading is allowed because its cost is covered by the O(|output|) term.
In your example, sorted D yields [0, 5, 6, 7, 9]. Trying to find k = 8 will fail, but at index 3 or 4 (depending on the implementation). Let's say it returns index 3. From index 3 up to the last index, check if arr[i] - k <= n: if so, print it; if bigger, stop. For the other side, check k - arr[i] <= n: if so, print it; if bigger, stop. This will give you 7, 9.
Hope that helps!
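A short C++ sketch of this approach. Instead of spreading outwards from k's position, it binary-searches for the lower end k - n with lower_bound and scans right until it leaves the range, which is equivalent:

#include <bits/stdc++.h>
using namespace std;

// Pre-processing: sort D once in O(n log n).
// Query: all values within distance n of k, in O(log n + |output|).
vector<int> withinRange(const vector<int>& sortedD, int k, int n) {
    vector<int> result;
    // First element >= k - n; everything from here on is a candidate.
    auto it = lower_bound(sortedD.begin(), sortedD.end(), k - n);
    for (; it != sortedD.end() && *it <= k + n; ++it)
        result.push_back(*it);            // stop once we leave [k-n, k+n]
    return result;
}

int main() {
    vector<int> D = {5, 9, 0, 6, 7};
    sort(D.begin(), D.end());             // pre-processing: [0, 5, 6, 7, 9]
    for (int v : withinRange(D, 8, 1))    // k = 8, n = 1
        cout << v << ' ';                 // prints: 7 9
    cout << '\n';
    return 0;
}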

Coin Change Algorithm - DP with 1D array

I came across a solution to the Coin Change problem here: Coin Change. I was able to understand the first recursive method, and the second method which uses DP with a 2D array. But I am not able to understand the logic behind the third solution.
As far as I have thought, the last method works for problems in which the sequence of coins used in the coin change is considered. Am I correct? Can anyone please explain where I am wrong?
Well I figured it out myself!
This can be easily proved using induction. Let table[k] denote the number of ways change can be given for a total of k. The algorithm consists of two loops: one controlled by i, which iterates through the array containing all the different coins, and one controlled by j, which, for a given i, updates all the values in table.

Now suppose that for a fixed i we have calculated the number of ways change can be given for all values from 1 to n using only the first i coins, and that these values are stored in table[1] to table[n]. When the i-controlled loop iterates for i+1, the value in table[j] for an arbitrary j is incremented by table[j - S[i+1]] (S is the array which stores the coin values), which is nothing but the number of ways we can create j using at least one coin of value S[i+1].

Thus the total value in table[j] equals the number of ways we can make change for j with coins of value S[1]...S[i] (which was already stored) plus the value table[j - S[i+1]]. This is the same optimal substructure of the problem as is used in the recursive algorithm.
#include <bits/stdc++.h>
using namespace std;

int main() {
    int n, sum;
    cin >> n;                           // number of coin denominations
    cin >> sum;                         // target amount

    vector<int> a(n);                   // coin values
    for (int i = 0; i < n; i++)
        cin >> a[i];

    vector<long long> arr(sum + 1, 0);  // arr[j] = ways to make amount j
    arr[0] = 1;                         // one way to make 0: take no coins

    for (int i = 0; i < n; i++)         // consider one denomination at a time
        for (int j = a[i]; j <= sum; j++)
            arr[j] += arr[j - a[i]];    // add ways using at least one coin a[i]

    cout << arr[sum] << endl;
    return 0;
}
The array arr is initialised to 0 to show that, before any coin is considered, the number of ways a sum of i can be represented is zero. However, the number of ways in which a sum of 0 can be represented is 1 (the empty selection of coins), so arr[0] = 1.
Further, we take each coin and start updating positions in the array from that coin's denomination onwards.
arr[j] += arr[j - a[i]] means that we are incrementing the number of ways to represent the sum j by the number of ways to represent the remainder j - a[i].
In the end, we output arr[sum].

Use dynamic programming to find a subset of numbers whose sum is closest to given number M

Given a set A of n positive integers a1, a2, ..., an and another positive integer M, I want to find a subset of numbers of A whose sum is closest to M. In other words, I'm trying to find a subset A′ of A such that the absolute value |M - Σ_{a∈A′} a| is minimized, where Σ_{a∈A′} a is the total sum of the numbers of A′. I only need to return the sum of the elements of the solution subset A′ without reporting the actual subset A′.
For example, if we have A = {1, 4, 7, 12} and M = 15, then the solution subset is A′ = {4, 12}, and thus the algorithm only needs to return 4 + 12 = 16 as the answer.
The dynamic programming algorithm for the problem should run in
O(nK) time in the worst case, where K is the sum of all numbers of A.
You construct a dynamic programming table of size n*K where
D[i][j] = can you get sum j using the first i elements?
The recursive relation you can use is: D[i][j] = D[i-1][j-a[i]] OR D[i-1][j]. This relation can be derived by considering that the i-th element can either be included or left out.
Time complexity: O(nK), where K = sum of all elements.
Lastly, you iterate over every sum you can possibly get, i.e. D[n][j] for j = 1..K. Whichever is closest to M will be your answer.
For a dynamic programming algorithm, we:
Define the value we will work on
The set of values here is actually a table.
For this problem, we define the value DP[i, j] as an indicator of whether we can obtain sum j using the first i elements (1 means yes, 0 means no).
Here 0 <= i <= n and 0 <= j <= K, where K is the sum of all elements in A.
Define the recursive relation
DP[i+1, j] = 1 if (DP[i, j] == 1 || DP[i, j - A[i+1]] == 1);
otherwise, DP[i+1, j] = 0.
Don't forget to first initialize the table to 0 and set DP[i, 0] = 1 for every i (the empty subset always gives sum 0). This handles the boundary and trivial cases.
Calculate the value you want
Through bottom-up implementation, you can finally fill the whole table.
Now, things become easy. You just need to find the closest value to M in the table whose entry is one.
Here, just work on DP[n][j], since n covers the whole set: find the j closest to M for which DP[n][j] is 1.
Time complexity is O(Kn), since you fill K*n table entries in total.
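A bottom-up C++ sketch of this table for the example above (the variable names are mine; it prints the closest achievable sum, 16):

#include <bits/stdc++.h>
using namespace std;

int main() {
    vector<int> A = {1, 4, 7, 12};                // the example set
    int M = 15;
    int K = accumulate(A.begin(), A.end(), 0);    // sum of all elements
    int n = A.size();

    // dp[i][j] = true if some subset of the first i elements sums to j
    vector<vector<bool>> dp(n + 1, vector<bool>(K + 1, false));
    for (int i = 0; i <= n; i++)
        dp[i][0] = true;                          // empty subset gives sum 0

    for (int i = 1; i <= n; i++)
        for (int j = 0; j <= K; j++) {
            dp[i][j] = dp[i - 1][j];              // leave element i out
            if (j >= A[i - 1] && dp[i - 1][j - A[i - 1]])
                dp[i][j] = true;                  // or include it
        }

    // Scan the last row for the reachable sum closest to M.
    int best = 0;
    for (int j = 0; j <= K; j++)
        if (dp[n][j] && abs(M - j) < abs(M - best))
            best = j;
    cout << best << '\n';                         // prints 16 for this example
    return 0;
}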

Binary search - worst/avg case

I'm finding it difficult to understand why/how the worst and average case for searching for a key in an array/list using binary search is O(log(n)).
log(1,000,000) is only 6. log(1,000,000,000) is only 9 - I get that, but I don't understand the explanation. If one did not test it, how do we know that the avg/worst case is actually log(n)?
I hope you guys understand what I'm trying to say. If not, please let me know and I'll try to explain it differently.
Worst case
Every time the binary search code makes a decision, it eliminates half of the remaining elements from consideration. So you're dividing the number of elements by 2 with each decision.
How many times can you divide by 2 before you are down to only a single element? If n is the starting number of elements and x is the number of times you divide by 2, we can write this as:
n / (2 * 2 * 2 * ... * 2) = 1 [the '2' is repeated x times]
or, equivalently,
n / 2^x = 1
or, equivalently,
n = 2^x
So log base 2 of n gives you x, which is the number of decisions being made.
Finally, you might ask, if I used log base 2, why is it also OK to write it as log base 10, as you have done? The base does not matter because the difference is only a constant factor which is "ignored" by Big O notation.
Average case
I see that you also asked about the average case. Consider:
There is only one element in the array that can be found on the first try.
There are only two elements that can be found on the second try. (Because after the first try, we chose either the right half or the left half.)
There are only four elements that can be found on the third try.
You can see the pattern: 1, 2, 4, 8, ... , n/2. To express the same pattern going in the other direction:
Half the elements take the maximum number of decisions to find.
A quarter of the elements take one fewer decision to find.
etc.
Since half of the elements take the maximum amount of time, it doesn't matter how much less time the other elements take. We could assume that all elements take the maximum amount of time, and even if half of them actually take 0 time, our assumption would not be more than double whatever the true average is. We can ignore "double" since it is a constant factor. So the average case is the same as the worst case, as far as Big O notation is concerned.
For binary search, the array should be arranged in ascending or descending order.
In each step, the algorithm compares the search key value with the key value of the middle element of the array.
If the keys match, then a matching element has been found and its index, or position, is returned.
Otherwise, if the search key is less than the middle element's key, then the algorithm repeats its action on the sub-array to the left of the middle element.
Or, if the search key is greater, then the algorithm repeats its action on the sub-array to the right.
If the remaining array to be searched is empty, then the key cannot be found in the array and a special "not found" indication is returned.
So, binary search is a dichotomic divide-and-conquer search algorithm. It takes logarithmic time to perform the search because the number of candidate elements is reduced by half in each iteration.
For sorted lists on which we can do a binary search, each "decision" compares your key to the middle element: if it is greater, the search takes the right half of the list; if it is less, the left half; if it is a match, the element at that position is returned. You effectively reduce your list by half with every decision, yielding O(log n).
Binary search, however, only works for sorted lists. For unsorted lists you can do a straight linear search starting with the first element, yielding a complexity of O(n).
O(log n) < O(n)
Although which approach is best depends entirely on how many searches you'll be doing, your inputs, etc.
For Binary search the prerequisite is a sorted array as input.
• As the list is sorted, we certainly don't have to check every word in the dictionary to look up a word.
• A basic strategy is to repeatedly halve our search range until we find the value.
• For example, look for 5 in the list of 9 numbers below: v = 1 1 3 5 8 10 18 33 42
• We would first start in the middle: 8
• Since 5 < 8, we know we can look at just the first half: 1 1 3 5
• Looking at the middle number again, we narrow down to: 3 5
• Then we stop when we're down to one number: 5
How many comparisons are needed? At most floor(log2(9)) + 1 = 4, i.e. O(log2 n).
#include <vector>

// Classic binary search on a vector sorted in ascending order.
// Returns the index of val, or -1 if val is not present.
int binary_search(const std::vector<int>& v, int val) {
    int from = 0;
    int to = (int)v.size() - 1;
    while (from <= to) {
        int mid = from + (to - from) / 2;  // middle element; avoids overflow
        if (val == v[mid])
            return mid;                    // found it
        else if (val > v[mid])
            from = mid + 1;                // discard the left half
        else
            to = mid - 1;                  // discard the right half
    }
    return -1;                             // not found
}
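A quick usage sketch for the function above (the input vector must already be sorted in ascending order):

#include <iostream>
// uses binary_search(v, val) as defined above

int main() {
    std::vector<int> v = {1, 1, 3, 5, 8, 10, 18, 33, 42};
    std::cout << binary_search(v, 5) << '\n';   // prints 3 (index of 5)
    std::cout << binary_search(v, 4) << '\n';   // prints -1 (not present)
    return 0;
}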

String pre-processing step, to answer further queries in O(1) time

A string is given to you, and it consists of only 3 distinct characters, say x, y, z.
There will be a million queries given to you.
Query format: x z i j
Now we need to find all possible different substrings which begin with x and end with z. Here i and j denote the lower and upper bounds of the range within which the substring must lie; it must not cross this range.
My Logic:-
Read the string. Keep 3 arrays which store the running counts of x, y and z respectively, for i = 0 up to strlen.
Store the indexes of each character separately in 3 more arrays: xlocation[], ylocation[], zlocation[].
Now, according to the query (a b i j), find all the indices of b within the range i to j.
Calculate the answer for each index of b and sum the results.
Is it possible to pre-process this string before the queries, so that each query takes O(1) time to answer?
As the others suggested, you can do this with a divide and conquer algorithm.
Optimal substructure:
If we are given the left half of the string and the right half, and we know how many such substrings there are in the left half and how many there are in the right half, then we can add the two numbers together. We will be undercounting by all the strings that begin in the left half and end in the right half; this is simply the number of x's in the left half multiplied by the number of z's in the right half.
Therefore we can use a recursive algorithm.
This would be a problem, however, if we tried to solve for every single i and j combination, as the bottom-level subproblems would be solved many, many times.
You should look into implementing this with a dynamic programming algorithm, keeping track of the number of substrings, x's, and z's in each range i, j. A prefix-sum sketch of the same counting idea follows below.
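In fact, the same x-count times z-count argument collapses into plain prefix sums, which gives the O(1)-per-query behaviour the question asks for. Below is a C++ sketch of that alternative (not the recursive scheme above). It assumes a query counts occurrences, i.e. pairs of positions with an x strictly before a z inside [i, j], matching the multiplication argument; the sample string is my own:

#include <bits/stdc++.h>
using namespace std;

int main() {
    string s = "xyzxz";                  // hypothetical sample input
    int n = s.size();

    // Prefix counts: X[p] = #x in s[0..p-1], Z[p] = #z in s[0..p-1],
    // P[p] = #pairs (u, v) with u < v < p, s[u] == 'x', s[v] == 'z'.
    vector<long long> X(n + 1, 0), Z(n + 1, 0), P(n + 1, 0);
    for (int p = 0; p < n; p++) {
        X[p + 1] = X[p] + (s[p] == 'x');
        Z[p + 1] = Z[p] + (s[p] == 'z');
        P[p + 1] = P[p] + (s[p] == 'z' ? X[p] : 0);
    }

    // O(1) query: pairs lying entirely inside [i, j] (0-indexed, inclusive).
    auto query = [&](int i, int j) -> long long {
        // All pairs ending at or before j, minus pairs ending before i,
        // minus pairs that end inside [i, j] but start before i.
        return P[j + 1] - P[i] - X[i] * (Z[j + 1] - Z[i]);
    };

    cout << query(0, 4) << '\n';         // x..z pairs in the whole string: 3
    cout << query(1, 4) << '\n';         // pairs inside s[1..4]: 1
    return 0;
}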
