Longest Common Substring (using Intuition for Longest Common Subsequence)


I've been trying to learn dynamic programming, and I have come across two seemingly similar problems: "Longest Common Subsequence" and "Longest Common Substring".
Assume we have two strings, str1 and str2.
For Longest Common Subsequence, we fill the dp table like this:
    if str1[i] != str2[j]:
        dp[i][j] = max(dp[i-1][j], dp[i][j-1])
    else:
        dp[i][j] = 1 + dp[i-1][j-1]
Following the same intuition, can we do the following for "Longest Common Substring"?
    if str1[i] != str2[j]:
        dp[i][j] = max(dp[i-1][j], dp[i][j-1])
    else:
        if str1[i-1] == str2[j-1]:
            dp[i][j] = 1 + dp[i-1][j-1]
        else:
            dp[i][j] = 1 + dp[i-1][j-1]
The check if str1[i-1] == str2[j-1] is meant to confirm that we are checking for substrings and not subsequences.

I didn't understand exactly what you are asking, but I'll try to give a good explanation for the Longest Common Substring.
Let DP[x][y] be the maximum common substring, considering str1[0..x] and str2[0..y].
Suppose we are computing DP[a][b]. We always have the possibility of not using the current character, which gives max( DP[a][b - 1], DP[a - 1][b] ). If str1[a] == str2[b], we can also take DP[a - 1][b - 1] + 1 (the +1 is there because we have found a new matching character).
    // This does not depend on str1[i] == str2[j]
    dp[i][j] = max( dp[i][j - 1], dp[i - 1][j] )
    if str1[i] == str2[j]:
        dp[i][j] = max( dp[i][j], dp[i - 1][j - 1] + 1 )
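For comparison, a common formulation of Longest Common Substring (not taken from the answer above) tracks the length of the longest common suffix ending at each pair of positions and resets it to 0 on a mismatch, instead of carrying the max over from neighbouring cells. Carrying the max over is what allows gaps, so it yields the subsequence length. A minimal Python sketch, where the function name is chosen here for illustration:

```python
def longest_common_substring(a, b):
    """Return one longest string that occurs contiguously in both a and b."""
    best_len, best_end = 0, 0          # best match is a[best_end - best_len:best_end]
    prev = [0] * (len(b) + 1)          # previous row of the suffix-length table
    for i in range(1, len(a) + 1):
        cur = [0] * (len(b) + 1)       # cur[j] = longest common suffix of a[:i] and b[:j]
        for j in range(1, len(b) + 1):
            if a[i - 1] == b[j - 1]:
                cur[j] = prev[j - 1] + 1   # extend the match ending at (i-1, j-1)
                if cur[j] > best_len:
                    best_len, best_end = cur[j], i
            # on a mismatch cur[j] stays 0: a substring cannot skip characters
        prev = cur
    return a[best_end - best_len:best_end]
```

The answer is the maximum over the whole table, not the last cell, because the longest common substring can end anywhere in either string.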

Related

Number of palindromic subsequences of length 5

Given a String s, return the number of palindromic subsequences of length 5.
Test case 1:
input : "abcdba"
Output : 2
"abcba" and "abdba"
Test case 2:
input : "aabccba"
Output : 4
"abcba" , "abcba" , "abcba" , "abcba"
Max length of String: 700
My TLE Approach: O(2^n)
https://www.online-java.com/5YegWkAVad
Any inputs are highly appreciated...

Whenever 2 characters match, we only have to find how many palindromes of length 3 are possible in between these 2 characters.
For example:
a bcbc a
^ ^
|_ _ _ |
In the above example, you can find 2 palindromes of length 3, which are bcb and cbc. Hence, we can make palindromic subsequences of length 5 as abcba or acbca, and the answer is 2.
Computing how many palindromes of length 3 are possible for every substring can be time consuming if we don't cache the results the first time we compute them. So, cache those results and reuse them for the queries generated by other pairs of matching characters (a.k.a. dynamic programming).
This way, the solution takes quadratic O(n^2) time, where n is the length of the string.
Snippet:
private static long solve(String s){
    long ans = 0;
    int len = s.length();
    long[][] dp = new long[len][len];
    /* compute how many palindromes of length 3 are possible for every 2 characters match */
    for(int i = len - 2;i >= 0; --i){
        for(int j = i + 2; j < len; ++j){
            dp[i][j] = dp[i][j-1] + (dp[i + 1][j] == dp[i + 1][j-1] ? 0 : dp[i + 1][j] - dp[i + 1][j - 1]);
            if(s.charAt(i) == s.charAt(j)){
                dp[i][j] += j - i - 1;
            }
        }
    }
    /* re-use the above data to calculate for palindromes of length 5*/
    for(int i = 0; i < len; ++i){
        for(int j = i + 4; j < len; ++j){
            if(s.charAt(i) == s.charAt(j)){
                ans += dp[i + 1][j - 1];
            }
        }
    }
    //for(int i=0;i<len;++i) System.out.println(Arrays.toString(dp[i]));
    return ans;
}
Update:
dp[i][j] = dp[i][j-1] + (dp[i + 1][j] == dp[i + 1][j-1] ? 0 : dp[i + 1][j] - dp[i + 1][j - 1]);
The above line basically means this:
For any substring, say bcbcb, with matching first and last b, the total number of length-3 palindromes is the sum of
The total count possible for bcbc.
The total count possible for cbcb.
The count of palindromes that use both the first and the last b of bcbcb (which is (j - i - 1) in the if condition).
Term by term:
dp[i][j] is the count for the current substring at hand.
dp[i][j-1] adds the counts of the previous substring. In this example, bcbc.
dp[i + 1][j] adds the counts of the substring ending at the current index, excluding the first character (here, cbcb).
dp[i + 1][j] == dp[i + 1][j-1] ? 0 : dp[i + 1][j] - dp[i + 1][j - 1] subtracts the count of the inner substring (here, cbc), which is contained in both of the previous two, to avoid counting it twice. When the two values are equal the difference is 0 anyway, so the expression is the same as dp[i + 1][j] - dp[i + 1][j - 1].
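The inclusion-exclusion step can be written without the equality test. The following Python sketch is a port of the same idea, not the answerer's code; the function name is made up for illustration, and it assumes dp[i][j] counts the length-3 palindromic subsequences inside s[i..j].

```python
def count_palindromic_subsequences_len5(s):
    """Count length-5 palindromic subsequences of s in O(n^2) time."""
    n = len(s)
    # dp[i][j] = number of length-3 palindromic subsequences within s[i..j]
    dp = [[0] * n for _ in range(n)]
    for i in range(n - 3, -1, -1):
        for j in range(i + 2, n):
            # counts for s[i..j-1] and s[i+1..j], minus the overlap s[i+1..j-1]
            dp[i][j] = dp[i][j - 1] + dp[i + 1][j] - dp[i + 1][j - 1]
            if s[i] == s[j]:
                # s[i] and s[j] can wrap any of the j - i - 1 characters between them
                dp[i][j] += j - i - 1
    total = 0
    for i in range(n):
        for j in range(i + 4, n):
            if s[i] == s[j]:
                # matching outer pair around every length-3 palindrome strictly inside
                total += dp[i + 1][j - 1]
    return total
```

Called on the two test cases from the question, "abcdba" and "aabccba", it should give 2 and 4 respectively.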

Observation:
The preceding method is neat, but it runs in O(n^2) using a 2D DP. Can we reduce this to O(n) by using a 3D DP? Yes, because only the first dimension depends on the length n of the string; the other two dimensions range over 0 to 25.
Eg: dp[i][j][k], where j and k are between 0 and 25 and i is between 0 and the length of the string.
The idea is hard to get directly from this observation, so let's build the intuition.
Intuition:
Of length 3
Fix the middle index i. The number of palindromic subsequences of length 3 centered at i is, summed over every letter, the number of occurrences of that letter to the left of i multiplied by its number of occurrences to the right of i.
Eg: _ s[i] _ => for each letter x, every x before the index pairs with every x after the index to form the palindrome x s[i] x.
Time complexity : O(n)
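The length-3 case can be sketched in a few lines of Python. This is an illustration of the counting idea only, with a made-up function name; it keeps running letter counts on each side of the middle index.

```python
from collections import Counter

def count_palindromic_subsequences_len3(s):
    """Count length-3 palindromic subsequences by fixing the middle index."""
    left = Counter()       # letter counts strictly before the middle index
    right = Counter(s)     # letter counts at or after the middle index
    total = 0
    for ch in s:
        right[ch] -= 1     # the middle character itself is not on the right side
        # each letter x contributes (x's before) * (x's after) palindromes x ch x
        total += sum(left[x] * right[x] for x in left)
        left[ch] += 1
    return total
```

With a fixed alphabet of 26 letters the inner sum is bounded by 26 terms, so the whole pass stays linear in the length of the string.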
Of length 5
Similarly, for length 5 => _ _ s[i] _ _ : count the occurrences of each ordered pair of characters before the index and multiply by the occurrences of the mirrored pair after the index, so that together they form a palindrome of length 5.
Eg: x y s[i] y x ; x, y belong to a to z. Here we need to store the occurrences of xy before the index and of yx after the index.
Time complexity : O(26 * 26 * n)
Of length 7
Similarly, for length 7 => _ _ _ s[i] _ _ _ : count the occurrences of each ordered triple of characters before the index and multiply by the occurrences of the mirrored triple after the index, so that together they form a palindrome of length 7.
Eg: x y z s[i] z y x ; x, y, z belong to a to z. Here we need to store the occurrences of xyz before the index and of zyx after the index.
Time complexity : O(26 * 26 * 26 * n)
Code
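#include <cstring>
#include <string>
using namespace std;

int pre[10000][26][26], suf[10000][26][26], cnts[26] = {};

int countPalindromes(string s) {
    int mod = 1e9 + 7, n = s.size(), ans = 0;
    // pre[i][j][k]: number of ways to pick letter j and then letter k within s[0..i]
    for (int i = 0; i < n; i++) {
        int c = s[i] - 'a'; // letters a to z; the digit version of the problem uses '0' here
        if (i)
            for (int j = 0; j < 26; j++)
                for (int k = 0; k < 26; k++) {
                    pre[i][j][k] = pre[i - 1][j][k];
                    if (k == c) pre[i][j][k] += cnts[j];
                }
        cnts[c]++;
    }
    memset(cnts, 0, sizeof(cnts));
    // suf[i][j][k]: number of ways to pick letter k and then letter j within s[i..n-1]
    for (int i = n - 1; i >= 0; i--) {
        int c = s[i] - 'a';
        if (i < n - 1)
            for (int j = 0; j < 26; j++)
                for (int k = 0; k < 26; k++) {
                    suf[i][j][k] = suf[i + 1][j][k];
                    if (k == c) suf[i][j][k] += cnts[j];
                }
        cnts[c]++;
    }
    // fix the middle index i and match each pair on the left with its mirror on the right
    for (int i = 2; i < n - 2; i++)
        for (int j = 0; j < 26; j++)
            for (int k = 0; k < 26; k++)
                ans = (ans + 1LL * pre[i - 1][j][k] * suf[i + 1][j][k]) % mod;
    return ans;
}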
Reference
The related problem uses digits 0 to 9 instead of letters a to z; see the most-voted discussion post for that problem.

How to reduce the cognitive complexity in Python3

I have a question regarding this piece of code:
doc = nlp(text)
words = nlp(text).ents[0]
for entity in doc.ents:
    self.entity_list = [entity]
left = [
    {'Left': str(words[entity.start - 1])}
    if words[entity.start - 1] and not words[entity.start - 1].is_punct and not words[entity.start - 1].is_space
    else {'Left': str(words[entity.start - 2])}
    if words[entity.start - 2] and not words[entity.start - 2].is_punct and not words[entity.start - 2].is_space
    else {'Left': str(words[entity.start - 3])}
    for entity in nlp(text).ents
]
entities = [{'Entity': str(entity)} for entity in doc.ents]
right = [
    {'Right': str(words[entity.end])}
    if (entity.end < self.entity_list[-1].end) and not words[entity.end].is_punct and not words[entity.end].is_space
    else {'Right': str(words[entity.end + 1])}
    if (entity.end + 1 < self.entity_list[-1].end) and not words[entity.end + 1].is_punct and not words[entity.end + 1].is_space
    else {'Right': str(words[entity.end + 2])}
    if (entity.end + 2 < self.entity_list[-1].end) and not words[entity.end + 2].is_punct and not words[entity.end + 2].is_space
    else {'Right': 'null'}
    for entity in nlp(text).ents
]
A few days ago I asked how to obtain the words next to an entity with SpaCy in Python3.
I found a solution and updated my question with the answer. However, it looks very complicated and ugly.
My question is:
How can I reduce the cognitive complexity here to get cleaner, more readable code?
Maybe with an iterator, or something else that Python3 offers to handle this kind of structure better?
If anyone has a solution or suggestion, I would appreciate it.

You should both move the computation of the indexes to dedicated functions and iterate instead of manually listing each case:
def get_left_index(entity, words):
    for i in range(1, 3):
        if (
            words[entity.start - i]
            and not words[entity.start - i].is_punct
            and not words[entity.start - i].is_space
        ):
            return entity.start - i
    return entity.start - (i + 1)

def get_right_index(entity, entity_list, words):
    for i in range(3):
        if (
            (entity.end + i < entity_list[-1].end)
            and not words[entity.end + i].is_punct
            and not words[entity.end + i].is_space
        ):
            return entity.end + i

left = [
    {"Left": str(words[get_left_index(entity, words)])} for entity in nlp(text).ents
]
entities = [{"Entity": str(entity)} for entity in doc.ents]
right = [
    {"Right": str(words[get_right_index(entity, self.entity_list, words)])}
    if get_right_index(entity, self.entity_list, words) is not None
    else {"Right": "null"}
    for entity in nlp(text).ents
]

Which algorithm to use for sorting a sequence of strings into blocks avoiding beginning with a certain element?

Let's say there is a list: ['a', 'a', 'a', 'b', 'a', 'a', 'b', 'b', 'b'].
Which algorithm can you use to split it into groups of 2 to 4 elements each?
There should be as few groups beginning with 'b' as possible.
In this example: aa | aba | abbb
This is for homework: find an optimal algorithm.

If we are allowed to change the order of the elements, the problem is trivial: just group all the as and insert three bs after every a while possible, creating groups of size 4 (or less if there aren't enough bs). After that, if there are any remaining bs, group them in groups of size 4; otherwise, group the remaining as as you like.
If we are obliged to keep the element order, the problem becomes more interesting and we can solve it in O(n) where n is the number of elements using the following recurrence: let f(i, j) represent the number of groups starting with b in an optimal collection up to index i, where this element is the jth one of its group. (Since j ranges from 2 to 4, the complexity is O(3n) = O(n).) Then:
f(i, j) =
    if A[i - j + 1] == 'b':
        return 1 + min(f(i - j, k)), for 1 < k < 5
    else:
        return min(f(i - j, k)), for 1 < k < 5
Naive top-down in JavaScript:
function f(A, i, j){
  // a group of size j ending at index i must fit inside A[0..i]
  if (j > i + 1)
    return Infinity
  const startsWithB = A[i - j + 1] == 'b' ? 1 : 0
  // the group covers the whole prefix, so nothing precedes it
  if (j == i + 1)
    return startsWithB
  let prev = Infinity
  for (let k=2; k<5; k++)
    prev = Math.min(f(A, i - j, k), prev)
  return prev + startsWithB
}

var A = "aaabaabbb"
console.log(A)
for (let j=2; j<5; j++)
  console.log(`If the last char is ${j + [,,'nd','rd','th'][j]}, then optimal is ${f(A, A.length-1, j)}`)

A = "aaabaabbbb"
console.log('\n' + A)
for (let j=2; j<5; j++)
  console.log(`If the last char is ${j + [,,'nd','rd','th'][j]}, then optimal is ${f(A, A.length-1, j)}`)
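The same recurrence can be evaluated bottom-up in linear time. The Python sketch below is one possible tabulation, not the answerer's code: it collapses the j dimension by storing, for each prefix length, the best value over all allowed sizes of the last group. The function name and the lo/hi parameters are introduced here for illustration.

```python
import math

def min_groups_starting_with_b(seq, lo=2, hi=4):
    """Fewest groups starting with 'b' when seq is cut, in order, into groups of lo..hi items."""
    n = len(seq)
    # best[i] = fewest 'b'-led groups covering seq[:i]; inf when no valid split exists
    best = [math.inf] * (n + 1)
    best[0] = 0
    for i in range(1, n + 1):
        for size in range(lo, hi + 1):
            start = i - size           # the last group would be seq[start:i]
            if start < 0:
                break
            cost = best[start] + (1 if seq[start] == 'b' else 0)
            best[i] = min(best[i], cost)
    return best[n]
```

A result of math.inf means the sequence cannot be split at all, which only happens for a single element when the group sizes are 2 to 4.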

total substrings with k ones

Given a binary string s, we need to find the number of its substrings, containing exactly k characters that are '1'.
For example: s = "1010" and k = 1, answer = 6.
Now, I solved it using binary search technique over the cumulative sum array.
I also used another approach to solve it. The approach is as follows:
For each position i, find the total substrings that end at i containing
exactly k characters that are '1'.
The substrings that end at i and contain exactly k '1's correspond to the set of indices j such that the substring from j to i contains exactly k '1's; the answer for position i is the size of that set. To find all such j for a given i, we can rephrase the condition as
(number of ones from 1 to j - 1) = (number of ones from 1 to i) - (number of ones from j to i), where the last term must equal k,
i.e. (number of ones from 1 to j - 1) = C[i] - k,
which is equal to
C[j - 1] = C[i] - k,
where C is the cumulative sum array, where
C[i] = sum of the characters of the string from 1 to i.
Now the problem is easy, because we can find all the possible values of j from this equation by counting the prefixes that sum to C[i] - k.
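The prefix-counting idea can be sketched in Python as follows. This is an illustration rather than the code the question goes on to ask about; the function name is made up, and a Counter holds how many earlier prefixes have each cumulative sum.

```python
from collections import Counter

def count_substrings_with_k_ones(s, k):
    """Count substrings of the binary string s that contain exactly k '1' characters."""
    seen = Counter({0: 1})   # cumulative sums seen so far; the empty prefix has sum 0
    ones = 0                 # cumulative sum C[i] for the current position
    total = 0
    for ch in s:
        ones += ch == '1'
        # every earlier prefix with sum C[i] - k starts a valid substring ending here
        total += seen[ones - k]
        seen[ones] += 1
    return total
```

For the example in the question, s = "1010" and k = 1, this returns 6.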
But I found this solution,
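#include <iostream>
#include <string>
using namespace std;

// Declarations assumed here; the original snippet omits them.
const int MAXN = 1000005; // assumed upper bound on the string length
long long C[MAXN], a;
int k, s;
string S;

int main() {
    cin >> k >> S;
    C[0] = 1; // the empty prefix contains zero ones
    for (int i = 0; S[i]; ++i) {
        s += S[i] == '1';
        ++C[s]; // one more prefix containing exactly s ones
    }
    for (int i = k; i <= s; ++i) {
        if (k == 0) {
            a += (C[i] - 1) * C[i] / 2;
        } else {
            a += C[i] * C[i - k];
        }
    }
    cout << a << endl;
    return 0;
}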
In the code, S is the given string and k is as described above, C is the cumulative sum array and a is the answer.
I don't understand what exactly the code is doing with the multiplication.
Could anybody explain the algorithm?

If you look at how C[i] is calculated, C[i] is the number of prefixes of S that contain exactly i ones. For i >= 1 that is the length of the block made of the i-th 1 and the zeros that follow it, up to but not including the (i+1)-st 1.
If you take an example S = 1001000
C[0] = 1 // the empty prefix
C[1] = 3 // length of 100
C[2] = 4 // length of 1000
So, coming to your doubt: why multiplication?
Say k = 1, so you want the substrings which have only one 1. You know that after the first 1 there are two zeros, since C[1] = 3. So the number of substrings will be 3, because you have to include this 1.
{1,10,100}
But when you come to the second part: C[2] = 4
Looking at 1000, you can make 4 substrings (which is equal to C[2])
{1,10,100,1000}
and you should also notice that there are C[1]-1 zeros before this 1.
So by including those zeros you can make more substrings, in this case by including 0 once
0{1,10,100,1000}
=> {01,010,0100,01000}
and 00 once
00{1,10,100,1000}
=> {001,0010,00100,001000}
So essentially you make C[i] substrings starting with the 1, and you can prepend z zeros before this 1, where z varies from 1 to C[i-k]-1 (-1 because we want to leave out the earlier 1), which gives another C[i] * (C[i-k]-1) substrings.
((C[i-k]-1)* C[i]) +C[i]
=> C[i-k]*C[i]
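The product formula can be checked with a short Python sketch. It mirrors the structure of the C++ code in the question under the reading that C[i] counts the prefixes with exactly i ones; the names count_by_prefix_groups and groups are made up for this illustration.

```python
def count_by_prefix_groups(s, k):
    """Count substrings with exactly k ones by pairing prefix groups."""
    total_ones = s.count('1')
    # groups[i] = number of prefixes of s (including the empty one) with exactly i ones
    groups = [0] * (total_ones + 1)
    groups[0] = 1
    ones = 0
    for ch in s:
        ones += ch == '1'
        groups[ones] += 1
    if k == 0:
        # both ends come from the same group: choose 2 distinct prefixes out of it
        return sum(g * (g - 1) // 2 for g in groups)
    # a start prefix with i - k ones pairs with any end prefix with i ones
    return sum(groups[i] * groups[i - k] for i in range(k, total_ones + 1))
```

For S = 1001000 and k = 1 the groups are [1, 3, 4], so the result is 3 * 1 + 4 * 3 = 15, matching the 3 + 12 substrings listed above.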

Change strings to make them equal

Referring to question HERE
We have two strings A and B with the same super set of characters. We need to change these strings to obtain two equal strings. In each move we can perform one of the following operations:
1- swap two consecutive characters of a string
2- swap the first and the last characters of a string
A move can be performed on either string. What is the minimum number of moves that we need in order to obtain two equal strings?
Input Format and Constraints: The first and the second line of the input contain the two strings A and B. It is guaranteed that the supersets of their characters are equal. 1 <= length(A) = length(B) <= 2000. All the input characters are between 'a' and 'z'.
It looks like this will have to be solved using dynamic programming, but I am not able to come up with the equations. Someone suggested the following in an answer, but it does not look right:
dp[i][j] =
Min{
    dp[i + 1][j - 1] + 1, if str1[i] = str2[j] && str1[j] = str2[i]
    dp[i + 2][j] + 1, if str1[i] = str2[i + 1] && str1[i + 1] = str2[i]
    dp[i][j - 2] + 1, if str1[j] = str2[j - 1] && str1[j - 1] = str2[j]
}
In short, it's
dp[i][j] = Min(dp[i + 1][j - 1], dp[i + 2][j], dp[i][j - 2]) + 1.
Here dp[i][j] means the minimum number of swaps needed to turn str1[i, j] into str2[i, j], where str1[i, j] is the substring of str1 from pos i to pos j :)
Here is an example like the one in the question,
str1 = "aab",
str2 = "baa"
dp[1][1] = 0 since str1[1] == str2[1];
dp[0][2] = dp[0 + 1][2 - 1] + 1 since str1[0] = str2[2] && str1[2] = str2[0].

You have two atomic operations:
1. swap consecutive with cost 1
2. swap first and last with cost 1
One interesting fact:
1. and 2. are the same if the end of the string were attached to its beginning (circular string)
So we can derive a more generic operation:
move a character with cost = |from - to| (across borders)
The problem does not seem 2-dimensional to me, or at least I cannot determine the dimensions yet. Take this algorithm as a naive approach:
import java.util.ArrayList;
import java.util.List;

private static int transform(String from, String to) {
    int commonLength = to.length();
    List<Solution> worklist = new ArrayList<>();
    worklist.add(new Solution(0,from));
    while (!worklist.isEmpty()) {
        Solution item = worklist.remove(0);
        if (item.remainder.length() == 0) {
            return item.cost;
        } else {
            int toPosition = commonLength - item.remainder.length();
            char expected = to.charAt(toPosition);
            nextpos : for (int i = 0; i < item.remainder.length(); i++) {
                if (item.remainder.charAt(i) == expected) {
                    Solution nextSolution = item.moveCharToBegin(i, commonLength);
                    for (Solution solution : worklist) {
                        if (solution.remainder.equals(nextSolution.remainder)) {
                            solution.cost = Math.min(solution.cost, nextSolution.cost);
                            continue nextpos;
                        }
                    }
                    worklist.add(nextSolution);
                }
            }
        }
    }
    return Integer.MAX_VALUE;
}

private static class Solution {
    public int cost;
    public String remainder;

    public Solution(int cost, String remainder) {
        this.cost = cost;
        this.remainder = remainder;
    }

    public Solution moveCharToBegin(int i, int length) {
        int costOffset = Math.min(i, length - i); //minimum of forward and backward circular move
        String newRemainder = remainder.substring(0, i) + remainder.substring(i + 1);
        return new Solution(cost + costOffset, newRemainder);
    }
}
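For short strings, any candidate algorithm can be checked against an exhaustive breadth-first search. The Python sketch below is such a checker and is not part of the answer above. It relies on the fact that every move is its own inverse, so moves applied to B can be replayed in reverse on A, and the minimum number of moves equals the shortest sequence that turns A into B. The function name is made up, and the search is exponential in the string length, so it is only usable for tiny inputs.

```python
from collections import deque

def min_moves_bfs(a, b):
    """Minimum number of circular adjacent swaps that turn a into b, or None if impossible."""
    if sorted(a) != sorted(b):
        return None                      # different multisets can never be made equal
    seen = {a}
    queue = deque([(a, 0)])
    while queue:
        cur, dist = queue.popleft()
        if cur == b:
            return dist
        n = len(cur)
        for i in range(n):
            j = (i + 1) % n              # i = n - 1 gives the first/last swap
            if i == j:
                continue                 # single-character string: nothing to swap
            chars = list(cur)
            chars[i], chars[j] = chars[j], chars[i]
            nxt = ''.join(chars)
            if nxt not in seen:
                seen.add(nxt)
                queue.append((nxt, dist + 1))
    return None
```

On the example from the question, min_moves_bfs("aab", "baa") gives 1, because a single first/last swap is enough.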
