Number of palindromic subsequences of length 5 - string

Given a String s, return the number of palindromic subsequences of length 5.
Test case 1:
input : "abcdba"
Output : 2
"abcba" and "abdba"
Test case 2:
input : "aabccba"
Output : 4
"abcba" , "abcba" , "abcba" , "abcba"
Max length of String: 700
My TLE Approach: O(2^n)
https://www.online-java.com/5YegWkAVad
Any inputs are highly appreciated...

Whenever 2 characters match, we only have to find how many palindromes of length 3 are possible in between these 2 characters.
For example:
a bcbc a
^ ^
|_ _ _ |
In the above example, you can find 2 palindromes of length 3 which is bcb and cbc. Hence, we can make palindromic sequence of length 5 as abcba or acbca. Hence, the answer is 2.
Computing how many palindromes of length 3 are possible for every substring can be time consuming if we don't cache the results when we do it the first time. So, cache those results and reuse them for queries generated by other 2 character matches. (a.k.a dynamic programming)
This way, the solution becomes quadratic O(n^2) time where n is length of the string.
Snippet:
private static long solve(String s){
    long ans = 0;
    int len = s.length();
    long[][] dp = new long[len][len];
    /* compute how many palindromes of length 3 are possible for every 2 characters match */
    for(int i = len - 2;i >= 0; --i){
        for(int j = i + 2; j < len; ++j){
            dp[i][j] = dp[i][j-1] + (dp[i + 1][j] == dp[i + 1][j-1] ? 0 : dp[i + 1][j] - dp[i + 1][j - 1]);
            if(s.charAt(i) == s.charAt(j)){
                dp[i][j] += j - i - 1;
            }
        }
    }
    /* re-use the above data to calculate for palindromes of length 5*/
    for(int i = 0; i < len; ++i){
        for(int j = i + 4; j < len; ++j){
            if(s.charAt(i) == s.charAt(j)){
                ans += dp[i + 1][j - 1];
            }
        }
    }
    //for(int i=0;i<len;++i) System.out.println(Arrays.toString(dp[i]));
    return ans;
}
Online Demo
Update:
dp[i][j] = dp[i][j-1] + (dp[i + 1][j] == dp[i + 1][j-1] ? 0 : dp[i + 1][j] - dp[i + 1][j - 1]);
The above line basically means this.
For any substring, say bcbcb, with matching first and last b, the total number of length-3 palindromes is the sum of:
The total count possible for bcbc.
The total count possible for cbcb, minus the count for cbc, which is already included in the count for bcbc.
The count of palindromes that use both the first and the last b (which is (j - i - 1) in the if condition).
dp[i][j] - The count for the current substring at hand.
dp[i][j-1] - Adding the counts of the substring without its last character. In this example, bcbc.
dp[i + 1][j] - Adding the counts of the substring without its first character. (Here, cbcb).
dp[i + 1][j] == dp[i + 1][j-1] ? 0 : dp[i + 1][j] - dp[i + 1][j - 1] - This avoids double counting: palindromes lying entirely inside the middle part (here, cbc) are counted in both dp[i][j-1] and dp[i + 1][j], so dp[i + 1][j - 1] is subtracted once. The ternary is equivalent to the plain difference dp[i + 1][j] - dp[i + 1][j - 1], since that difference is 0 when the two values are equal.
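For readers following along in C++, here is a minimal sketch of the same quadratic idea, with the recurrence written as a plain inclusion-exclusion step. The function name countLen5Quadratic is made up for this illustration; it is a port of the approach described above, not the original poster's code.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical helper: counts palindromic subsequences of length 5
// using the quadratic DP described above.
long long countLen5Quadratic(const std::string& s) {
    const int n = static_cast<int>(s.size());
    // dp[i][j] = number of length-3 palindromic subsequences inside s[i..j]
    std::vector<std::vector<long long>> dp(n, std::vector<long long>(n, 0));
    for (int i = n - 3; i >= 0; --i) {
        for (int j = i + 2; j < n; ++j) {
            // inclusion-exclusion over "drop last char" and "drop first char"
            dp[i][j] = dp[i][j - 1] + dp[i + 1][j] - dp[i + 1][j - 1];
            // palindromes that use both s[i] and s[j]: any middle character works
            if (s[i] == s[j]) dp[i][j] += j - i - 1;
        }
    }
    long long ans = 0;
    for (int i = 0; i < n; ++i)
        for (int j = i + 4; j < n; ++j)
            if (s[i] == s[j]) ans += dp[i + 1][j - 1];  // wrap a length-3 palindrome
    return ans;
}
```

On the two test cases from the question this gives 2 for "abcdba" and 4 for "aabccba".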

Observation:
The preceding method is neat, but it takes O(n^2) time using a 2-D DP table. Can we reduce that to O(n) by using a 3-D DP table? Yes, we can, because only the first dimension depends on the length of the string; the other two dimensions range over 0 to 25.
Eg: dp[i][j][k], where j and k are between 0 and 25, and i is between 0 and the length of the string.
The idea does not follow directly from this observation, so let's build up the intuition.
Intuition:
Of length 3
The number of palindromic subsequences of length 3 centered at an index is found by taking, for each letter, its number of occurrences to the left of the index multiplied by its number of occurrences to the right of the index, and summing over all letters.
Eg: _ s[i] _ => any letter x that occurs both before and after index i gives the palindrome x s[i] x.
Time complexity: O(26 * n), i.e. O(n)
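The length-3 case can be sketched in a few lines. This is only an illustration of the left-count times right-count idea; the name countLen3 is invented here, and the input is assumed to contain only lowercase letters a to z.

```cpp
#include <cassert>
#include <string>

// Hypothetical helper: counts palindromic subsequences of length 3.
// Assumes s contains only 'a'..'z'.
long long countLen3(const std::string& s) {
    long long left[26] = {0}, right[26] = {0};
    for (char ch : s) ++right[ch - 'a'];
    long long total = 0;
    for (char ch : s) {
        --right[ch - 'a'];  // the current character is the middle, not part of the right side
        for (int c = 0; c < 26; ++c)
            total += left[c] * right[c];  // letter c before the middle and again after it
        ++left[ch - 'a'];   // the current character now belongs to the left side
    }
    return total;
}
```

For "bcbc" this counts bcb and cbc, the same two palindromes used in the earlier example.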
Of length 5
Similarly, for length 5 => _ _ s[i] _ _ : for each pair of characters, multiply the number of occurrences of that pair (as a subsequence) before the index by the number of occurrences of the reversed pair after the index, so that the result is a palindrome of length 5.
Eg: x y s[i] y x ; x, y belong to a to z. Here we need to store the occurrences of xy before the index and of yx after the index.
Time complexity: O(26 * 26 * n)
Of length 7
Similarly, for length 7 => _ _ _ s[i] _ _ _ : for each triple of characters, multiply the number of occurrences of that triple (as a subsequence) before the index by the number of occurrences of the reversed triple after the index, so that the result is a palindrome of length 7.
Eg: x y z s[i] z y x ; x, y, z belong to a to z. Here we need to store the occurrences of xyz before the index and of zyx after the index.
Time complexity: O(26 * 26 * 26 * n)
Code
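#include <cstring>
#include <string>
using namespace std;

// pre[i][j][k]: number of index pairs (p < q <= i) with s[p] == 'a' + j and s[q] == 'a' + k
// suf[i][j][k]: number of index pairs (i <= p < q) with s[p] == 'a' + k and s[q] == 'a' + j
int pre[10000][26][26], suf[10000][26][26], cnts[26] = {};
int countPalindromes(string s) {
    int mod = 1e9 + 7, n = s.size(), ans = 0;
    memset(cnts, 0, sizeof(cnts)); // reset in case the function is called more than once
    for (int i = 0; i < n; i++) {
        int c = s[i] - 'a'; // letters a to z map to 0..25
        if (i)
            for (int j = 0; j < 26; j++)
                for (int k = 0; k < 26; k++) {
                    pre[i][j][k] = pre[i - 1][j][k];
                    if (k == c) pre[i][j][k] += cnts[j];
                }
        cnts[c]++;
    }
    memset(cnts, 0, sizeof(cnts));
    for (int i = n - 1; i >= 0; i--) {
        int c = s[i] - 'a';
        if (i < n - 1)
            for (int j = 0; j < 26; j++)
                for (int k = 0; k < 26; k++) {
                    suf[i][j][k] = suf[i + 1][j][k];
                    if (k == c) suf[i][j][k] += cnts[j];
                }
        cnts[c]++;
    }
    // s[i] is the middle character; pair (j, k) on the left must be mirrored on the right
    for (int i = 2; i < n - 2; i++)
        for (int j = 0; j < 26; j++)
            for (int k = 0; k < 26; k++)
                ans = (ans + 1LL * pre[i - 1][j][k] * suf[i + 1][j][k]) % mod;
    return ans;
}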
Reference
Here is a link to a related problem, where the characters are the digits 0 to 9, and to the most-voted post for that problem.

Related

Confused about a string hash function

As I was looking through some string hash functions, I came across this one (code below). The function processes the string four bytes at a time, and interprets each of the four-byte chunks as a single long integer value. The integer values for the four-byte chunks are added together. In the end, the resulting sum is converted to the range 0 to M-1 using the modulus operator.
The following is the function code :
// Use folding on a string, summed 4 bytes at a time
long sfold(String s, int M) {
    int intLength = s.length() / 4;
    long sum = 0;
    for (int j = 0; j < intLength; j++) {
        char c[] = s.substring(j * 4, (j * 4) + 4).toCharArray();
        long mult = 1;
        for (int k = 0; k < c.length; k++) {
            sum += c[k] * mult;
            mult *= 256;
        }
    }
    char c[] = s.substring(intLength * 4).toCharArray();
    long mult = 1;
    for (int k = 0; k < c.length; k++) {
        sum += c[k] * mult;
        mult *= 256;
    }
    return(Math.abs(sum) % M);
}
The confusion for me is this chunk of code, especially the first line.
char c[] = s.substring(intLength * 4).toCharArray();
long mult = 1;
for (int k = 0; k < c.length; k++) {
sum += c[k] * mult;
mult *= 256;
To my knowledge, the substring function used in this line takes one argument, the begin index (inclusive): the substring starts at the specified beginIndex and extends to the end of the string.
For the sake of example, let's assume we want to hash the following string: aaaabbbb. In this case intLength is going to be 2 (second line of the function code). Replacing the value of intLength in s.substring(intLength * 4).toCharArray() gives us s.substring(8).toCharArray(), which should mean the string index is out of bounds, given that the string to be hashed has 8 characters.
I don't quite understand what's going on !
This hash function is awful, but to answer your question:
There is no IndexOutOfBoundsException, because "aaaabbbb".substring(8) is ""
The purpose of that last loop is to deal with leftovers when the string length isn't a multiple of 4. When s is "aaaabbbbcc", for example, then intLength == 2, and s.substring(8) is "cc".
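The same boundary behaviour can be checked in C++, where std::string::substr, like Java's substring, accepts a start position equal to the string length and returns an empty string. The sketch below is only a C++ analogue of the last step of sfold; the helper name leftoverChunk is made up for this illustration.

```cpp
#include <cassert>
#include <string>

// Hypothetical helper: returns the characters left over after the string
// has been cut into full 4-character chunks (the input to sfold's last loop).
std::string leftoverChunk(const std::string& s) {
    const std::size_t fullChunks = s.size() / 4;
    // A start position equal to s.size() is valid and yields "".
    return s.substr(fullChunks * 4);
}
```

With "aaaabbbb" the leftover is the empty string, so the final loop simply runs zero times; with "aaaabbbbcc" the leftover is "cc".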

Optimum solution for splitting a string into three palindromes with earliest cuts

I was asked this question in an interview:
Given a string (1<=|s|<=10^5), check if it is possible to partition it into three palindromes. If there are multiple answers possible, output the one where the cuts are made the earliest. If no answer is possible, print "Impossible".
**Input:**
radarnoonlevel
aabab
abcdefg
**Output:**
radar noon level
a a bab (Notice how a, aba, b is also an answer, but we will output the one with the earliest cuts)
Impossible
I was able to give a brute force solution, running two loops and checking palindrome property for every 3 substrings ( 0-i, i-j, j-end). This was obviously not optimal, but I have not been able to find a better solution since then.
I need a way of knowing, given the palindrome property of a string, how removing a character from the start or adding one at the end affects the property of the new string, without having to redo the check for the whole string. I am thinking of using three maps where each character key is mapped to its number of occurrences, but that doesn't lead me anywhere either.
This is still an O(n^2) solution, but you can store which substrings are palindromes in a table and use that table to get to the answer.
#include <string>
#include <vector>
using namespace std;

vector<string> threePalindromicSubstrings(string word) {
    int n = word.size();
    // dp[i][j] is true when word[i..j] is a palindrome
    vector<vector<bool>> dp (n,vector<bool>(n,false));
    for(int i = 0 ; i < n ; ++i)
        dp[i][i] = 1;
    for(int l = 2 ; l <= n ; ++l){
        for(int i = 0 ; i < n - l +1 ; ++i){
            int j = i + l - 1;
            if(l == 2)
                dp[i][j] = (word[i] == word[j]);
            else
                dp[i][j] = (word[i] == word[j]) && (dp[i+1][j-1]);
        }
    }
    vector<string> ans;
    // try the earliest first cut, then the earliest second cut
    for(int i = 0 ; i < n - 2 ; ++i){
        if(dp[0][i]) {
            for(int j = i+1 ; j < n - 1 ; ++j){
                if(dp[i+1][j] && dp[j+1][n-1]){
                    ans.push_back(word.substr(0,i + 1));
                    ans.push_back(word.substr(i+1,j-i));
                    ans.push_back(word.substr(j+1,n-j-1));
                    return ans;
                }
            }
        }
    }
    if(ans.empty())
        ans.push_back("Impossible");
    return ans;
}

total substrings with k ones

Given a binary string s, we need to find the number of its substrings, containing exactly k characters that are '1'.
For example: s = "1010" and k = 1, answer = 6.
Now, I solved it using binary search technique over the cumulative sum array.
I also used another approach to solve it. The approach is as follows:
For each position i, find the total substrings that end at i containing
exactly k characters that are '1'.
The substrings that end at i and contain exactly k characters that are '1' correspond to the set of indices j such that the substring from j to i contains exactly k '1's. The answer for position i is the size of that set. To find all such j for a given position i, we can rephrase the problem as finding all j such that
number of ones in [1, j - 1] = (number of ones in [1, i]) - (number of ones in [j, i]), where the number of ones in [j, i] must be k,
i.e. number of ones in [1, j - 1] = C[i] - k,
which is equal to
C[j - 1] = C[i] - k,
where C is the cumulative sum array, with
C[i] = sum of characters of the string from 1 to i.
Now the problem is easy, because we can find all the possible values of j by counting all the prefixes that sum to C[i] - k.
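The prefix-counting idea above can be sketched as follows. This is an illustrative version rather than the code I actually submitted; the names countSubstringsWithKOnes and seen are invented for the sketch, and k is assumed to be non-negative.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical helper: counts substrings of a binary string with exactly k ones.
// Assumes k >= 0.
long long countSubstringsWithKOnes(const std::string& s, int k) {
    // seen[c] = number of prefixes so far (including the empty one) with exactly c ones
    std::vector<long long> seen(s.size() + 1, 0);
    seen[0] = 1;
    long long answer = 0;
    int ones = 0;  // number of ones in the prefix ending at the current position
    for (char ch : s) {
        if (ch == '1') ++ones;
        // every earlier prefix with (ones - k) ones gives a valid starting point
        if (ones >= k) answer += seen[ones - k];
        ++seen[ones];
    }
    return answer;
}
```

For s = "1010" and k = 1 this returns 6, matching the example in the question.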
But I found this solution,
int main() {
    cin >> k >> S;
    C[0] = 1;
    for (int i = 0; S[i]; ++i) {
        s += S[i] == '1';
        ++C[s];
    }
    for (int i = k; i <= s; ++i) {
        if (k == 0) {
            a += (C[i] - 1) * C[i] / 2;
        } else {
            a += C[i] * C[i - k];
        }
    }
    cout << a << endl;
    return 0;
}
In the code, S is the given string and K as described above, C is the cumulative sum array and a is the answer.
What is the code exactly doing by using multiplication, I don't know.
Could anybody explain the algorithm?
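If you look at the way C is filled in, C[i] is the number of prefixes of S that contain exactly i ones. For i >= 1 that is the length of the block that starts at the ith 1 and runs up to (but not including) the (i+1)st 1; C[0] also counts the empty prefix.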
If you take an example S = 1001000
C[0] = 1
C[1] = 3 // length of 100
C[2] = 4 // length of 1000
So, coming to your doubt: why multiplication?
Say K=1, so you want the substrings that have exactly one 1. You know that the first 1 is followed by two zeros, since C[1] = 3. So the number of substrings starting at this 1 is 3, because you have to include this 1.
{1,10,100}
But when you come to the second part: C[2] =4
now if you see 1000 and you know that you can make 4 substrings (which is equal to C[2])
{1,10,100,1000}
and also you should notice that there are C[1]-1 zeroes before this 1.
So by including those zeroes you can make more substring, in this case by including 0 once
0{1,10,100,1000}
=> {01,010,0100,01000}
and 00 once
00{1,10,100,1000}
=> {001,0010,00100,001000}
So essentially you make C[i] substrings starting at this 1, and you can also prepend z zeroes before it, where z varies from 1 to C[i-k]-1 (-1 because we want to leave out the previous 1), which makes another (C[i-k]-1) * C[i] substrings. In total:
((C[i-k]-1)* C[i]) +C[i]
=> C[i-k]*C[i]
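To check the multiplication argument on the example, here is a self-contained sketch of the solution from the question, wrapped in a function. The variable names follow the original snippet; the function name countByHistogram and the use of std::vector are choices made for this illustration only.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical wrapper around the solution quoted in the question. Assumes k >= 0.
long long countByHistogram(const std::string& S, int k) {
    // C[i] = number of prefixes (including the empty one) with exactly i ones
    std::vector<long long> C(S.size() + 1, 0);
    C[0] = 1;
    int s = 0;  // running number of ones
    for (char ch : S) {
        s += (ch == '1');
        ++C[s];
    }
    long long a = 0;
    for (int i = k; i <= s; ++i) {
        if (k == 0) {
            a += (C[i] - 1) * C[i] / 2;  // choose two different prefixes with the same count
        } else {
            a += C[i] * C[i - k];        // end positions times start positions
        }
    }
    return a;
}
```

For S = 1001000 and K = 1 this computes C[1] * C[0] + C[2] * C[1] = 3 * 1 + 4 * 3 = 15.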

Suffix Array Implementation Bugs

I've coded a Suffix Array implementation and discovered an issue in my implementation. Concretely I've outputted the first few suffix array ranks RA[0..7] of this string(length = 10^5) and had the following output:
80994
84360
87854
91517
95320
99277
83068
But the correct one had to be (everything shifted by 23):
81017
84383
87877
91540
95343
99300
83091
I know two ways how to fix it, but I don't know why it worked.
The first way was adding S[N++] = '$'; to the top of the buildSA() function (then the output was 1 less than the correct one, but it doesn't matter)
I also found another solution by decreasing the MAX_N constant to 1e5 + 10!
This is so much magic for me and I really need to know why this bug happened because I don't want to have this bug again.
#include <cstdio>
#include <cstring>
#include <algorithm>
using std::max;
const int MAX_N = 2e5 + 10;
int SA[MAX_N]; // The ith element is the index of the suffix
int RA[MAX_N]; // The rank of the suffix at i
int tmp[MAX_N]; // A temporary array
int B[MAX_N]; // An array for the buckets
int N;
char S[MAX_N];
void bucketSort(int k){
    int i, m = max(256, N);
    for(i = 0; i < m; i++)
        B[i] = 0;
    for(i = 0; i < N; i++)
        B[i + k < N ? RA[i + k] : 0] ++;
    for(i = 1; i < m; i++)
        B[i] += B[i - 1];
    for(i = N - 1; i >= 0; i--)
        tmp[--B[SA[i] + k < N ? RA[SA[i] + k] : 0]] = SA[i];
    for(i = 0; i < N; i++)
        SA[i] = tmp[i];
}
void buildSA(){
    for(int i = 0; i < N; i++){
        SA[i] = i;
        RA[i] = S[i];
    }
    for(int k = 1; k < N; k <<= 1){
        bucketSort(k);
        bucketSort(0);
        int norder = 0;
        tmp[SA[0]] = 0;
        for(int i = 1; i < N; i++){
            if(RA[SA[i]] == RA[SA[i - 1]] && RA[SA[i] + k] == RA[SA[i - 1] + k])
            {} else norder++;
            tmp[SA[i]] = norder;
        }
        for(int i = 0; i < N; i++)
            RA[i] = tmp[i];
        if(norder == N)
            break;
    }
}
void printSA(){
    for(int i = 0; i < N; i++){
        printf("%d: %s\n", SA[i], S + SA[i]);
    }
}
int main(){
    scanf("%s", S);
    N = strlen(S);
    buildSA();
    for(int i = 0; i < 7; i++){
        printf("%d\n",RA[i]);
    }
    return 0;
}
In the following line:
if(RA[SA[i]] == RA[SA[i - 1]] && RA[SA[i] + k] == RA[SA[i - 1] + k])
SA[i] + k can be >= N (the same goes for SA[i - 1] + k).
It should be (SA[i] + k) % N instead.
I think I got it after many wasted hours. Sometimes the smallest mistakes can literally result in wrong answers.
The "bad" code line is:
if(RA[SA[i]] == RA[SA[i - 1]] && RA[SA[i] + k] == RA[SA[i - 1] + k])
{} else norder++;
I verified this by using a very simple testcase (I couldn't generate randomly...) like:
abab
The resulting suffix array was
0: abab
2: ab
3: b
1: bab
which is clearly wrong.
At step k = 2, if we are comparing two suffixes like ab and abab then, we realize that they have the same rank, since their first k = 2 characters match. ab is suffix #2, by adding k = 2, we are out of range.
I've often coded it like this because I've always appended an auxiliary character (e.g. '$') to the end. If I don't put such a character (like in my case), SA[i] + k could actually be >= N and this code crashes.
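For comparison, here is a small sketch of prefix doubling in which the second key is bounds-checked explicitly, so no sentinel character is needed. It swaps the bucket sort for std::sort (so it runs in O(n log^2 n)) and treats an out-of-range second half as rank -1; both are choices made for this illustration, not part of the code above, and the name buildSuffixArraySimple is made up.

```cpp
#include <algorithm>
#include <cassert>
#include <numeric>
#include <string>
#include <vector>

// Hypothetical helper: suffix array by prefix doubling with std::sort.
std::vector<int> buildSuffixArraySimple(const std::string& s) {
    const int n = static_cast<int>(s.size());
    std::vector<int> sa(n), rank(n), tmp(n);
    std::iota(sa.begin(), sa.end(), 0);
    for (int i = 0; i < n; ++i) rank[i] = static_cast<unsigned char>(s[i]);
    for (int k = 1; k < n; k <<= 1) {
        // rank of the second half, or -1 when it lies past the end of the string
        auto second = [&](int i) { return i + k < n ? rank[i + k] : -1; };
        auto less = [&](int a, int b) {
            if (rank[a] != rank[b]) return rank[a] < rank[b];
            return second(a) < second(b);
        };
        std::sort(sa.begin(), sa.end(), less);
        tmp[sa[0]] = 0;
        for (int i = 1; i < n; ++i)
            tmp[sa[i]] = tmp[sa[i - 1]] + (less(sa[i - 1], sa[i]) ? 1 : 0);
        rank = tmp;
        if (rank[sa[n - 1]] == n - 1) break;  // all ranks are distinct
    }
    return sa;
}
```

On the abab test case this yields the order 2, 0, 3, 1 (ab, abab, b, bab), where the buggy comparison produced 0, 2, 3, 1.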

Longest Common Prefix property

I was going through suffix array and its use to compute longest common prefix of two suffixes.
The source says:
"The lcp between two suffixes is the minimum of the lcp's of all pairs of adjacent suffixes between them on the array"
i.e. lcp(x,y)=min{ lcp(x,x+1),lcp(x+1,x+2),.....,lcp(y-1,y) }
where x and y are two index of the string from where the two suffix of the string starts.
I am not convinced with the statement as in example of string "abca".
lcp(1,4)=1 (considering 1 based indexing)
but if I apply the above equation then
lcp(1,4)=min{lcp(1,2),lcp(2,3),lcp(3,4)}
and I think lcp(1,2)=0.
so the answer must be 0 according to the equation.
Am I getting it wrong somewhere?
I think the index referred by the source is not the index of the string itself, but index of the sorted suffixes.
a
abca
bca
ca
Hence
lcp(1,2) = lcp(a, abca) = 1
lcp(1,4) = min(lcp(1,2), lcp(2,3), lcp(3,4)) = 0
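To make the indexing concrete, here is a deliberately naive sketch that sorts the suffixes and takes the minimum of the adjacent LCPs between two ranks. It is meant only to illustrate the property (it is far too slow for real inputs), and the names commonPrefix and lcpByRank are invented for this example.

```cpp
#include <algorithm>
#include <cassert>
#include <string>
#include <vector>

// Length of the common prefix of two strings.
int commonPrefix(const std::string& a, const std::string& b) {
    std::size_t len = 0;
    while (len < a.size() && len < b.size() && a[len] == b[len]) ++len;
    return static_cast<int>(len);
}

// Hypothetical helper: lcp of the x-th and y-th suffixes in sorted order
// (1-based ranks, x < y), computed as the minimum over adjacent pairs.
int lcpByRank(const std::string& s, int x, int y) {
    std::vector<std::string> suffixes;
    for (std::size_t i = 0; i < s.size(); ++i) suffixes.push_back(s.substr(i));
    std::sort(suffixes.begin(), suffixes.end());
    int best = static_cast<int>(s.size());
    for (int r = x; r < y; ++r)  // adjacent pair (r, r + 1) in 1-based ranks
        best = std::min(best, commonPrefix(suffixes[r - 1], suffixes[r]));
    return best;
}
```

For "abca" the sorted suffixes are a, abca, bca, ca, so lcpByRank("abca", 1, 2) is 1 and lcpByRank("abca", 1, 4) is 0, which agrees with comparing a and ca directly.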
You can't find the LCP of two suffixes by taking the minimum over adjacent starting positions in the string; the minimum has to be taken over the adjacent-pair LCPs of the sorted suffixes, so i and j below are ranks in the suffix array.
We can calculate the LCP of any two suffixes (i,j) with the help of the following:
LCP(suffix i,suffix j)=LCP[RMQ(i + 1, j)]
Also note that the formula needs i<j, so swap the arguments if i>j; the LCP itself is the same in either order.
RMQ is Range Minimum Query.
Page 3 of this paper.
Details:
Step 1:
First calculate the LCP of adjacent (consecutive) suffix pairs.
n= Length of string.
suffixArray[] is Suffix array.
void calculateadjacentsuffixes(int n)
{
    for (int i=0; i<n; ++i) Rank[suffixArray[i]] = i;
    Height[0] = 0;
    for (int i=0, h=0; i<n; ++i)
    {
        if (Rank[i] > 0)
        {
            int j = suffixArray[Rank[i]-1];
            while (i + h < n && j + h < n && str[i+h] == str[j+h])
            {
                h++;
            }
            Height[Rank[i]] = h;
            if (h > 0) h--;
        }
    }
}
Note: Height[i] = LCP of (suffix i-1, suffix i) in sorted order, i.e. the Height array contains the LCP of each pair of adjacent suffixes.
Step 2:
Calculate LCP of Any two suffixes i,j using RMQ concept.
RMQ pre-compute function:
void preprocesses(int N)
{
    int i, j;
    //initialize M for the intervals with length 1
    for (i = 0; i < N; i++)
        M[i][0] = i;
    //compute values from smaller to bigger intervals
    for (j = 1; 1 << j <= N; j++)
    {
        for (i = 0; i + (1 << j) - 1 < N; i++)
        {
            if (Height[M[i][j - 1]] < Height[M[i + (1 << (j - 1))][j - 1]])
            {
                M[i][j] = M[i][j - 1];
            }
            else
            {
                M[i][j] = M[i + (1 << (j - 1))][j - 1];
            }
        }
    }
}
Step 3: Calculate LCP between any two Suffixes i,j
int LCP(int i,int j)
{
    /* Make sure we always have i<j.
       For example, a call LCP(5,4) is converted to LCP(4,5).
    */
    if(i>j)
        swap(i,j);
    /* confirmation over */
    if(i==j)
    {
        return (Length_of_str-suffixArray[i]);
    }
    else
    {
        return Height[RMQ(i+1,j)];
        //LCP(suffix i,suffix j)=LCPadj[RMQ(i + 1; j)]
        //LCPadj=LCP of adjacent suffix =Height.
    }
}
Where RMQ function is:
int RMQ(int i,int j)
{
    int k=log((double)(j-i+1))/log((double)2);
    int vv= j-(1<<k)+1 ;
    if(Height[M[i][k]]<=Height[ M[vv][ k] ])
        return M[i][k];
    else
        return M[ vv ][ k];
}
Refer Topcoder tutorials for RMQ.
You can check the complete implementation in C++ at my blog.
