Longest “increasing” subsequence with two consecutive numbers whose average is less than the third number

Problem Statement
Given an array of integers, find the length of the longest subsequence in which, for every three consecutive elements, the average of the first two is less than the third. The algorithm should run in O(n^3) time.
Example:
[20, 10, 5, 0, 6, 4, 15, 6, 9, 8], the longest subsequence that satisfies the requirement is 5, 0, 6, 4, 6, 9, 8, and the length of that sequence is 7. (5 + 0) / 2 = 2.5 < 6, (0 + 6) / 2 = 3.0 < 4, (6 + 4) / 2 = 5.0 < 6, etc.
What I tried
1st approach: O(n^2)
A generic dynamic programming approach: I defined the DP array to hold the length of the longest subsequence that satisfies the condition.
If the average of the (i-2)th and (i-1)th integers is less than the ith integer, we add one to the DP array. The solution is the last element of the DP array.
This didn't work, because it only considers numbers that are adjacent in the original array, not in the subsequence I am trying to build. So this approach only gave me 5 as the answer for the example input above, corresponding to 5, 0, 6, 4, 15. The approach did not account for joining disjoint parts of the original sequence into a new subsequence.
1.5th approach
While writing out the problem in my notes, I realized the corresponding average subsequence for the example input is the longest. Following the idea of the LIS problem, I created an array of all the averages and looked for the longest increasing subsequence in that array. This solved the example input but failed on more complicated inputs.
2nd approach: O(n^3)
Taking the hint from the problem statement that the algorithm can be O(n^3), I tried to come up with a definition for a 2D DP array and a loop that makes the whole thing O(n^3). I defined DP[i][j] to be the length of the longest subsequence from the start element to the ith element, while also considering the jth element.
Considering the example input, DP[2][6] = 3, because the subsequence would be 10, 5, 15. From the first element to the element at index 2 we consider the subsequence 10, 5, and the element at index 6 is 15, so the subsequence here is 10, 5, 15, and its length is 3. Repeat until the half of the table above the main diagonal is filled; the solution is the last element (last row, last column) in that half.
I thought this was it, but I ran into problems: I did not know which part of the DP table I should be reusing, and I did not know exactly what the last two numbers of the subsequence I am building are. Ultimately, I didn't know where to go next.
Other thoughts
I think a 3D DP array could also work, but I haven’t really thought about how I would define the array…
Any help would be greatly appreciated!
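One possible way to resolve the "what are my last two numbers" problem is to make the last two numbers the DP state itself. The sketch below is an assumption about how that could look, not a confirmed reference solution: dp[i][j] is taken to be the length of the longest valid subsequence whose last two elements are a[i] and a[j] (i < j), and the function name is made up for illustration. It also assumes that any subsequence of fewer than three elements counts as valid.

```python
def longest_avg_subsequence(a):
    """Length of the longest subsequence in which the average of every two
    consecutive elements is less than the element that follows them."""
    n = len(a)
    if n < 3:
        # Assumption: with fewer than three elements nothing can be violated.
        return n
    # dp[i][j] (i < j): longest valid subsequence ending with a[i], a[j].
    # Any pair on its own is a valid subsequence of length 2.
    dp = [[2] * n for _ in range(n)]
    best = 2
    for j in range(1, n):            # middle element of the triple
        for i in range(j):           # element before it
            length = dp[i][j]        # final here: all smaller j were processed
            for k in range(j + 1, n):
                # (a[i] + a[j]) / 2 < a[k], written without division
                if a[i] + a[j] < 2 * a[k] and length + 1 > dp[j][k]:
                    dp[j][k] = length + 1
                    best = max(best, dp[j][k])
    return best


print(longest_avg_subsequence([20, 10, 5, 0, 6, 4, 15, 6, 9, 8]))  # 7
```

The three nested loops over i, j and k give the O(n^3) bound from the problem statement, and the table uses O(n^2) space.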

Related

How to get the second largest value in a column

Recently I discovered the LARGE and SMALL worksheet functions, which one can use to determine the first, second, third, ... largest or smallest value in an array.
At least, that's what I thought:
When having a look at the array [1, 3, 5, 7, 9] (in one column or row), the LARGE(...;2) gives 7 as expected, but:
When having a look at the array [1, 1, 5, 9, 9], I expect LARGE(...;2) to give 5 but instead I get 9.
Now this makes sense: it seems that the function LARGE(...;2) takes the largest entry in the array (the value 9 in the last but one place), deletes it, and gives the largest entry of the reduced array (which still contains another 9), but this is not what one might expect intuitively.
In order to get 5 from [1, 1, 5, 9, 9], I would need something like:
=LARGE_OF_UNIQUE_VALUES_OF(...;2)
I didn't find this in LARGE documentation.
Does anybody know an easy way to achieve this?

If you have the new Dynamic Array formulas:
=LARGE(UNIQUE(...),2)
If not use AGGREGATE:
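=AGGREGATE(14,7,A1:A5/(MATCH(A1:A5,A1:A5,0)=ROW(A1:A5)),2)
(The third argument 0 makes MATCH do an exact match, so the formula also works on unsorted data; the comparison with ROW assumes the data starts in row 1.)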

This is a bit of a hack.
=LARGE(IF(YOUR_DATA=LARGE(YOUR_DATA,1),SMALL(YOUR_DATA,1)-1,YOUR_DATA),1)
The idea is to (a) take any value in your data that is equal to the largest element and set it to less than the smallest element, then (b) find the (new) largest element. It's OK if you want the 2nd largest, but extending to 3rd largest etc. gets progressively uglier.
Hope that helps

Interview question about "largest range" makes no sense

Here's the question. I'm actually dumbfounded. I don't even get the question. What are they on about?
What even is a largest range? What do they mean by largest? What's a range? They say a range is a collection of numbers that come right after each other in the set of real integers. Okay, so 1, 2, 3, 4, stuff like that, right? But then they say the numbers need not be ordered or even adjacent.... but then they're not coming right after each other!! They are contradicting their own previous statement. Now I have no idea what a range is.
Their example doesn't help either. Why is [0, 15, 5, 2, 4, 10, 7] the largest range in that vector?
What is going on?

It's not very clear in the question, but I'm pretty sure the interviewer means a "range" is a set of consecutive numbers (n, n+1).
The range [0,7] is actually [0,1,2,3,4,5,6,7] since all of those appear in the full set.
The actual order doesn't matter.

In the example you were given in the interview, which you list in your question as well, the input array is: [1, 11, 3, 0, 15, 5, 2, 4, 10, 7, 12, 6]. The "largest range" is identified as [0, 7] because all the numbers between 0 and 7 are included in that array.
There isn't another range in the input array that is longer than 0 to 7. For instance, there is a [10, 12] range in the input array, but that range has a length of 3, which is smaller than the length of the [0, 7] range, which is 8.
In this case, a range is understood as a continuous list of integers, and the largest range is the list with the greatest number of integers.

It means:
Find the largest continuous range of numbers.
For example, in the array [0,1,2,5,6,7,8,9,10]
there are 2 continuous lists,
[0,1,2] and [5,6,7,8,9,10], and the larger range is the second one, so the output must be [5,10],
i.e. the smallest and largest numbers of the largest range.
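The interpretation given in the answers above can be sketched in a few lines. This is only an illustration under that interpretation (a range is a run of consecutive integers that all occur somewhere in the input, in any order); the function name is hypothetical, and ties are resolved in favour of whichever run is found first.

```python
def largest_range(nums):
    """Return (low, high) of the longest run of consecutive integers
    that all appear in nums, or None for an empty input."""
    values = set(nums)
    best = None
    for v in values:
        # Only start counting at the smallest number of a run.
        if v - 1 in values:
            continue
        end = v
        while end + 1 in values:
            end += 1
        if best is None or end - v > best[1] - best[0]:
            best = (v, end)
    return best


print(largest_range([1, 11, 3, 0, 15, 5, 2, 4, 10, 7, 12, 6]))  # (0, 7)
print(largest_range([0, 1, 2, 5, 6, 7, 8, 9, 10]))              # (5, 10)
```

Because each run is only walked from its smallest element, every number is visited a constant number of times, so the expected running time is linear in the input size.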

Delete as few as possible digits to make number divisible by 3

I was solving this question: we are given a number N, which can be very big (it can have up to 100000 digits), and we have to delete as few digits as possible so that it becomes divisible by 3.
Now I want to know the most efficient way to find those digits. I think that even for big numbers I will need to delete at most 3 digits to make the number divisible by 3.
I know that a number is divisible by three if the sum of its digits is divisible by three, but I can't think of how to use this.
My idea is to brute force over the string and check, for each digit, whether deleting it makes the number divisible by 3, but my solution fails on complex examples. Please give me some hints.
Thanks in advance.

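If the sum of the digits modulo 3 is equal to 1, you want to delete a single 1, 4, or 7. If the sum of the digits modulo 3 is equal to 2, you want to delete a single 2, 5, or 8.
If that can't be done, then you have to delete two digits from the other group: two digits congruent to 2 when the sum is 1 modulo 3, or two digits congruent to 1 when the sum is 2 modulo 3.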
To avoid scanning the list twice, you could remember the indices of up to two digits congruent to 1, and the indices of up to two digits congruent to 2, so when you compute the final modulus you know where to look.
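A minimal sketch of this single-pass idea, assuming the number is given as a string of decimal digits. The function name is hypothetical, and the sketch deliberately ignores two questions the original post leaves open: leading zeros in the result, and the case where deleting the chosen digits would leave nothing.

```python
def digits_to_delete(number):
    """Return the indices of a smallest set of digits whose removal makes
    the digit sum (and hence the number) divisible by 3."""
    digits = [int(c) for c in number]
    remainder = sum(digits) % 3
    if remainder == 0:
        return []
    # Remember the indices of up to two digits congruent to 1 and to 2.
    seen = {1: [], 2: []}
    for index, digit in enumerate(digits):
        r = digit % 3
        if r != 0 and len(seen[r]) < 2:
            seen[r].append(index)
    # Prefer deleting one digit with the same residue as the sum.
    if seen[remainder]:
        return seen[remainder][:1]
    # Otherwise two digits of the other residue are needed; they always
    # exist here, because the sum could not have this remainder otherwise.
    return seen[3 - remainder][:2]


print(digits_to_delete("124"))   # [0]
print(digits_to_delete("1133"))  # [0, 1]
```

The sketch always picks the leftmost candidates; a real solution may want a different choice, for example to keep the remaining number as large as possible.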

The number 3 has some special properties relative to a base-10 number system that you can leverage.
10 is 1 more than 9, and 9 is evenly divisible by 3, so the "1" in "10" acts as a sort of remainder from adding 1 to 9. As a result, if the sum of all digits in the number is evenly divisible by 3 then that number is also divisible by 3.
So if you begin by figuring out what the modulo is after adding all the digits, then you'll know whether the number is divisible by three (i.e. results in a modulo of zero) or not. If not, then you can subtract one digit at a time, recalculating the modulo of the resulting number until you end up with a modulo of zero.

You should check what makes a number divisible by 3. If you find it you should divide the problem into smaller problems

Subsequences whose sum of digits is divisible by 6

Say I have a string whose characters are nothing but digits in [0 - 9] range. E.g: "2486". Now I want to find out all the subsequences whose sum of digits is divisible by 6. E.g: in "2486", the subsequences are - "6", "246" ( 2+ 4 + 6 = 12 is divisible by 6 ), "486" (4 + 8 + 6 = 18 is divisible by 6 ) etc. I know generating all 2^n combinations we can do this. But that's very costly. What is the most efficient way to do this?
Edit:
I found the following solution somewhere in quora.
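#include <stdio.h>
#include <string.h>

#define MAXLEN 1000 /* illustrative bound, not given in the original post */
#define MAXN 1000   /* illustrative bound, not given in the original post */

int len, n, ar[MAXLEN], dp[MAXLEN][MAXN];

/* Number of ways to choose digits from ar[idx..len-1] so that the value
   built so far (currently m modulo n) ends up as 0 modulo n. */
int fun(int idx, int m)
{
    if (idx == len)
        return (m == 0);
    if (dp[idx][m] != -1)
        return dp[idx][m];
    int ans = fun(idx + 1, m);                   /* skip ar[idx] */
    ans += fun(idx + 1, (m * 10 + ar[idx]) % n); /* append ar[idx] */
    return dp[idx][m] = ans;
}

int main()
{
    // input len , n , array
    memset(dp, -1, sizeof(dp));
    printf("%d\n", fun(0, 0));
    return 0;
}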
Can someone please explain the logic behind the expression '(m*10+ar[idx])%n'? Why is m multiplied by 10 here?
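One way to see what the multiplication by 10 does is to compare two counting DPs side by side. Appending a digit d to a number x gives 10*x + d, so a state updated as (m*10 + d) % n tracks the remainder of the chosen digits read as a number, whereas a state updated as (m + d) % n tracks the remainder of their digit sum. The sketch below is an illustration of that difference, not a transcription of the quoted code; both function names are made up, and both functions count non-empty subsequences only.

```python
def count_by_digit_sum(s, n=6):
    """Count non-empty subsequences of s whose digit sum is divisible by n."""
    counts = [0] * n
    counts[0] = 1                      # the empty subsequence
    for ch in s:
        d = int(ch)
        new = counts[:]                # subsequences that skip this digit
        for r, c in enumerate(counts):
            new[(r + d) % n] += c      # subsequences that take this digit
        counts = new
    return counts[0] - 1               # drop the empty subsequence


def count_by_value(s, n=6):
    """Count non-empty subsequences of s whose value as a number is divisible by n."""
    counts = [0] * n
    counts[0] = 1
    for ch in s:
        d = int(ch)
        new = counts[:]
        for r, c in enumerate(counts):
            new[(r * 10 + d) % n] += c  # appending a digit: x -> 10*x + d
        counts = new
    return counts[0] - 1


print(count_by_digit_sum("2486"), count_by_value("2486"))  # 5 5
print(count_by_digit_sum("33"), count_by_value("33"))      # 1 0
```

For "2486" the two counts happen to agree, but for "33" they differ: the digit sum 3 + 3 = 6 is divisible by 6, while none of 3, 3 and 33 is.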

Say you have a sequence of 16 digits. You could generate all 2^16 subsequences and test them, which is 65536 operations.
Or you could take the first 8 digits and generate the 2^8 possible subsequences, and sort them based on the result of their sum modulo 6, and do the same for the last 8 digits. This is only 512 operations.
Then you can generate all subsequences of the original 16 digit string that are divisible by 6 by taking each subsequence of the first list with a modulo value equal to 0 (including the empty subsequence) and concatenating it with each subsequence of the last list with a modulo value equal to 0.
Then take each subsequence of the first list with a modulo value equal to 1 and concatenate it with each subsequence of the last list with a modulo value equal to 5. Then 2 with 4, 3 with 3, 4 with 2 and 5 with 1.
So after an initial cost of 512 operations you can generate just those subsequences whose sum is divisible by 6. You can apply this algorithm recursively for larger sequences.
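The split-and-combine idea described above might look roughly like this. It is a sketch with hypothetical function names: it splits the string once in the middle (no recursion), groups each half's subsequences by digit sum modulo n, and skips the result in which both halves contribute the empty subsequence.

```python
def subsequences_by_residue(digits, n):
    """Group every subsequence of digits (including the empty one)
    by its digit sum modulo n."""
    buckets = {r: [] for r in range(n)}
    for mask in range(1 << len(digits)):
        picked = "".join(d for i, d in enumerate(digits) if mask >> i & 1)
        buckets[sum(int(d) for d in picked) % n].append(picked)
    return buckets


def divisible_subsequences(s, n=6):
    """All non-empty subsequences of s whose digit sum is divisible by n."""
    mid = len(s) // 2
    left = subsequences_by_residue(s[:mid], n)
    right = subsequences_by_residue(s[mid:], n)
    result = []
    for r in range(n):
        # A left part with residue r needs a right part with residue -r mod n.
        for a in left[r]:
            for b in right[(-r) % n]:
                if a or b:
                    result.append(a + b)
    return result


print(sorted(divisible_subsequences("2486")))  # ['24', '246', '48', '486', '6']
```

Subsequences that differ only in which positions they use (for example the two single 3s in "33") are reported separately.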

Create an array with a 6-bit bitmap for each position in the string. Bit s of the bitmap at a position is set when some subsequence starting just after that position has a digit sum of s mod 6. You can fill the array from right to left, deriving each bitmap from the one just after the current position. For example, if you see a 3 and the bitmap just after the current position is 010001 (reading the bits as sums 0 to 5 from left to right), then sums 1 and 5 are already reachable by just skipping the 3. Using the 3 makes sums 4 and 2 reachable as well, so the new bitmap is 011011.
Now do a depth-first search for subsequences from left to right, where the choice at each character is either to take that character or not. As you go, keep track of the mod 6 sum of the characters taken so far. Use the bitmaps to work out whether there is a subsequence to the right of the current position that, added to the sum so far, yields zero mod 6. Carry on as long as the current sum can still lead to a subsequence with sum zero; otherwise stop and backtrack.
The first stage has cost linear in the size of the input (for fixed values of 6). The second stage has cost linear in the number of subsequences produced. In fact, if you have to actually write out the subsequences visited (E.g. by maintaining an explicit stack and writing out the contents of the stack) THAT will be the most expensive part of the program.
The worst case is of course input 000000...0000 when all 2^n subsequences are valid.
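A rough sketch of the two stages follows. It departs from the description in two labeled ways: it uses Python sets of reachable residues instead of 6-bit bitmaps, and it treats the empty subsequence as reachable with sum 0, so the table has one extra entry for the position past the end of the string. The function names are hypothetical.

```python
def reachable_sums(digits, n=6):
    """reach[i] = residues obtainable as the digit sum (mod n) of some
    possibly empty subsequence of digits[i:]. Filled from right to left."""
    reach = [set() for _ in range(len(digits) + 1)]
    reach[len(digits)] = {0}
    for i in range(len(digits) - 1, -1, -1):
        d = digits[i]
        reach[i] = reach[i + 1] | {(r + d) % n for r in reach[i + 1]}
    return reach


def enumerate_divisible(s, n=6):
    """List the non-empty subsequences of s whose digit sum is divisible by n,
    pruning every branch that cannot reach a sum of zero mod n."""
    digits = [int(c) for c in s]
    reach = reachable_sums(digits, n)
    found = []

    def dfs(i, total, chosen):
        # Stop as soon as no completion to the right can cancel the sum so far.
        if (-total) % n not in reach[i]:
            return
        if i == len(digits):
            if chosen:
                found.append("".join(chosen))
            return
        dfs(i + 1, total, chosen)                    # skip s[i]
        chosen.append(s[i])
        dfs(i + 1, (total + digits[i]) % n, chosen)  # take s[i]
        chosen.pop()

    dfs(0, 0, [])
    return found


print(sorted(enumerate_divisible("2486")))  # ['24', '246', '48', '486', '6']
```

Every call that survives the pruning check lies on a path to at least one valid subsequence, which is what keeps the search cost tied to the size of the output.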

I'm pretty sure a user named amit recently answered a similar question for combinations rather than subsequences, where the divisor is 4, although I can't find it right now. His answer was to create, in this case, six arrays (call them Array_i) in O(n), where each array contains the array elements with a modular relationship i with 6. With subsequences we also need a way to record element order. For example, in your case of 2486, our arrays could be:
Array_0 = [null,null,null,6]
Array_1 = []
Array_2 = [2,null,8,null]
Array_3 = []
Array_4 = [null,4,null,null]
Array_5 = []
Now just cross-combine the appropriate arrays, maintaining element order: Array_0, Array_2 & Array_4, Array_0 & any other combination of arrays:
6, 24, 48, 246, 486

Unable to understand the concept mentioned in http://pine.cs.yale.edu/pinewiki/SuffixArrays

Please explain:
Suppose we have a suffix array corresponding to an n-character text and we want to find all occurrences in the text of an m-character pattern. Since the suffixes are ordered, the easiest solution is to do binary search for the first and last occurrences of the pattern (if any) using O(log n) comparisons.
I need to know how I can get all occurrences of pattern after determining the first and last occurrence of pattern.

The text you quoted is slightly confusing, perhaps even misleading, in two ways:
1. It says it suffices to find the first and last occurrence of the pattern, but it should say, more precisely: the first and last occurrence of the pattern in the suffix array. That is not the same as the first and last occurrence in the underlying text.
2. It says you need O(log n) comparisons. This is only true if "comparison" refers to a string comparison of up to m characters. Since comparing up to m characters takes O(m) time, the number of computational steps (e.g. in the standard RAM model) is O(m*log n). It can be improved if auxiliary data structures are built and used, such as the LCP (longest-common-prefix) array.
Now, to answer your question: Taking (1.) above into account, you get all occurrences of the pattern easily because the suffix array is sorted lexicographically. This means the first occurrence is the lexicographically smallest, and the last occurrence is the lexicographically greatest. Hence, the remaining occurrences must be in between the first and the last.
Example. Consider the string bcfabcabxbbcabcgdebcd. Its suffix array (represented as starting positions of suffixes, counting from 0) is
[3, 12, 6, 9, 10, 4, 18, 0, 13, 7, 11, 5, 19, 1, 14, 20, 16, 17, 2, 15, 8]
which corresponds to the following list of suffixes:
3 : abcabxbbcabcgdebcd
12 : abcgdebcd
6 : abxbbcabcgdebcd
9 : bbcabcgdebcd
10 : bcabcgdebcd <======= first occurrence of 'bc'
4 : bcabxbbcabcgdebcd
18 : bcd
0 : bcfabcabxbbcabcgdebcd
13 : bcgdebcd <======= last occurrence of 'bc'
7 : bxbbcabcgdebcd
11 : cabcgdebcd
5 : cabxbbcabcgdebcd
19 : cd
1 : cfabcabxbbcabcgdebcd
14 : cgdebcd
20 : d
16 : debcd
17 : ebcd
2 : fabcabxbbcabcgdebcd
15 : gdebcd
8 : xbbcabcgdebcd
Suppose the pattern you are looking for is 'bc'. I have marked the first and last occurrences of that pattern in the suffix array. Because of the lexicographical sorting, all entries in between must start with 'bc' as well, and any entry starting with 'bc' must be somewhere in between. Therefore all suffixes starting with 'bc', i.e. all positions of occurrences of 'bc', must be between this first and last occurrence.
Expressed as position integers, the range we identified is
[10, 4, 18, 0, 13]
Hence positions 10, 4, 18, 0 and 13 mark occurrences of the pattern.
(Note that in practice the full string list of the suffixes is not used – only the integer position list.)
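The example above can be reproduced with a short sketch. The suffix array is built here by naively sorting all suffixes, which is fine for a demonstration but far too slow for large texts, and the function names are hypothetical. The search runs two binary searches over the first m characters of each suffix, matching the O(m*log n) bound discussed earlier.

```python
def build_suffix_array(text):
    """Naive construction: sort the start positions by the suffix they begin."""
    return sorted(range(len(text)), key=lambda i: text[i:])


def find_occurrences(text, suffix_array, pattern):
    """Return the slice of suffix_array whose suffixes start with pattern."""
    m = len(pattern)

    def prefix(rank):
        start = suffix_array[rank]
        return text[start:start + m]

    # First binary search: first suffix whose m-character prefix is >= pattern.
    lo, hi = 0, len(suffix_array)
    while lo < hi:
        mid = (lo + hi) // 2
        if prefix(mid) < pattern:
            lo = mid + 1
        else:
            hi = mid
    first = lo
    # Second binary search: first suffix whose m-character prefix is > pattern.
    hi = len(suffix_array)
    while lo < hi:
        mid = (lo + hi) // 2
        if prefix(mid) <= pattern:
            lo = mid + 1
        else:
            hi = mid
    return suffix_array[first:lo]


text = "bcfabcabxbbcabcgdebcd"
sa = build_suffix_array(text)
print(find_occurrences(text, sa, "bc"))  # [10, 4, 18, 0, 13]
```

The returned positions come out in suffix-array order; sort them if the occurrences are needed in text order.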
