String permutations rank + data structure - string

The problem at hand is:
Given a string. Tell its rank among all its permutations sorted
lexicographically.
The question can be attempted mathematically, but I was wondering if there was some other algorithmic method to calculate it?
Also, if we have to store all the string permutations rankwise, how can we generate them efficiently (and what would be the complexity)? What would be a good data structure for storing the permutations that is also efficient for retrieval?
EDIT
Thanks for the detailed answers on the permutation generation part. Could someone also suggest a good data structure? I have only been able to think of a trie.

There is an O(n|Σ|) algorithm to find the rank of a string of length n in the list of its permutations. Here, Σ is the alphabet.
Algorithm
Every permutation which is ranked below s can be written uniquely in the form pcx; where:
p is a proper prefix of s
c is a character ranked below the character appearing just after p in s. And c is also a character occurring in the part of s not included in p.
x is any permutation of the remaining characters occurring in s; i.e. not included in p or c.
We can count the permutations included in each of these classes by iterating through each prefix of s in increasing order of length, while maintaining the frequency of the characters appearing in the remaining part of s, as well as the number of permutations x represents. The details are left to the reader.
This assumes the arithmetic operations involved take constant time, which they won't, since the numbers involved can have O(n log|Σ|) digits. With this consideration, the algorithm will run in O(n^2 log|Σ| log(n log|Σ|)), since we can add, subtract, multiply and divide two d-digit numbers in O(d log d).
C++ Implementation
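#include <string>
#include <vector>
typedef long long int lli;
// Assumes s contains only 'a'..'z'. The factorials overflow lli once n > 20.
lli rank(std::string s){
    int n = s.length();
    std::vector<lli> factorial(n+1,1);
    for(int i = 1; i <= n; i++)
        factorial[i] = i * factorial[i-1];
    std::vector<int> freq(26);
    lli den = 1; // product of freq[c]! over the suffix s[i..n-1]
    lli ret = 0; // number of permutations ranked below s
    for(int i = n-1; i >= 0; i--){
        int si = s[i]-'a';
        freq[si]++;
        den *= freq[si];
        // count permutations of s[i..n-1] that start with a smaller character c
        for(int c = 0; c < si; c++)
            if(freq[c] > 0)
                ret += factorial[n-i-1] / (den / freq[c]);
    }
    return ret + 1;
}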
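For comparison, here is a minimal Python sketch of the same suffix-based counting. It relies on Python's arbitrary-precision integers, so it does not overflow for long strings, and it accepts any characters rather than only 'a'..'z'. The function name permutation_rank is an assumption made for this illustration.

```python
from collections import Counter
from math import factorial

def permutation_rank(s):
    """Return the 1-based rank of s among the sorted distinct permutations of its characters."""
    n = len(s)
    freq = Counter()  # character counts in the suffix s[i:]
    den = 1           # product of freq[c]! over the suffix s[i:]
    below = 0         # number of permutations ranked below s
    for i in range(n - 1, -1, -1):
        ch = s[i]
        freq[ch] += 1
        den *= freq[ch]
        rest = n - i - 1  # characters left after fixing position i
        for c, count in freq.items():
            if c < ch:
                # permutations of s[i:] that start with the smaller character c
                below += factorial(rest) * count // den
    return below + 1
```

For the example string used later in this thread, permutation_rank("bcaa") evaluates to 9.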

This is similar to the quickselect algorithm. In an unsorted array of integers, find the index of some particular array element. The partition element would be the given string.
Edit:
Actually it is similar to the partition step in QuickSort, with the given string as the partition element. Once all permutations are generated, finding the rank by comparing against each of them takes O(nk), where n is the number of permutations and k is the string length. You can generate the string permutations using recursion and store them in a linked list, then pass this linked list to the partition method.
Here's the java code to generate all String permutations:
// list and swapCharsInString are defined elsewhere (not shown here)
private static int generateStringPermutations(String name,int currIndex) {
    int sum = 0;
    for(int j=name.length()-1;j>=0;j--) {
        for(int i=j-1;((i<j) && (i>currIndex));i--) {
            String swappedString = swapCharsInString(name,i,j);
            list.add(swappedString);
            //System.out.println(swappedString);
            sum++;
            sum = sum + generateStringPermutations(swappedString,i);
        }
    }
    return sum;
}
Edit:
Generating all permutations is costly. If a string contains distinct characters, the rank can be determined without generating all permutations. Here's the link.
This can be extended for cases where there are repeating characters.
Instead of x * (n-1)! which is for distinct cases mentioned as in the link,
For repeating characters it will be:
if there is 1 character which is repeating twice,
x* (n-1)!/2!
Let's take an example. For the string abca the permutations are:
aabc,aacb,abac,abca,acab,acba,baac,baca,bcaa,caab,caba,cbaa (in sorted order)
Total permutations = 4!/2! = 12
If we want to find the rank of 'bcaa', we know all strings starting with 'a' come before it, which is 3! = 6.
Note that because 'a' is the starting character, the remaining characters are a,b,c with no repetitions, so the count is 3!. We also know the strings starting with 'ba' come before it, which is 2! = 2, so its rank is 6 + 2 + 1 = 9.
Another example. If we want to find the rank of 'caba':
All strings starting with a come before = 6.
All strings starting with b come before = 3!/2! = 3 (because once we choose b, we are left with a,a,c, and because of the repetition it is 3!/2!).
All strings starting with caa come before, which is 1.
So the final rank is 6 + 3 + 1 + 1 = 11.
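The two worked ranks can be checked by brute force. The sketch below generates every distinct permutation in sorted order and looks the string up. It is only practical for short strings, since the list grows factorially. The function names are assumptions made for this illustration.

```python
from itertools import permutations

def sorted_unique_permutations(s):
    """All distinct permutations of s, in lexicographic order."""
    return sorted(set("".join(p) for p in permutations(s)))

def brute_force_rank(s):
    """1-based rank of s, found by scanning the full sorted list."""
    return sorted_unique_permutations(s).index(s) + 1
```

For "abca" this reproduces the 12 strings listed above, with 'bcaa' at position 9 and 'caba' at position 11.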

From GeeksforGeeks:
Given a string, find its rank among all its permutations sorted
lexicographically. For example, rank of “abc” is 1, rank of “acb” is
2, and rank of “cba” is 6.
For simplicity, let us assume that the string does not contain any
duplicated characters.
One simple solution is to initialize rank as 1, generate all
permutations in lexicographic order. After generating a permutation,
check if the generated permutation is same as given string, if same,
then return rank, if not, then increment the rank by 1. The time
complexity of this solution will be exponential in worst case.
Following is an efficient solution.
Let the given string be “STRING”. In the input string, ‘S’ is the
first character. There are total 6 characters and 4 of them are
smaller than ‘S’. So there can be 4 * 5! smaller strings where first
character is smaller than ‘S’, like following
R X X X X X
I X X X X X
N X X X X X
G X X X X X
Now let us fix ‘S’ and find the smaller strings starting with ‘S’.
Repeat the same process for T, rank is 4*5! + 4*4! +…
Now fix T and repeat the same process for R, rank is 4*5! + 4*4! +
3*3! +…
Now fix R and repeat the same process for I, rank is 4*5! + 4*4! +
3*3! + 1*2! +…
Now fix I and repeat the same process for N, rank is 4*5! + 4*4! +
3*3! + 1*2! + 1*1! +…
Now fix N and repeat the same process for G, rank is 4*5! + 4*4! + 3*3!
+ 1*2! + 1*1! + 0*0!
Rank = 4*5! + 4*4! + 3*3! + 1*2! + 1*1! + 0*0! = 597
Since the value of rank starts from 1, the final rank = 1 + 597 = 598
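The quoted procedure for strings without duplicate characters can be sketched in a few lines of Python. For each position it counts the smaller characters to its right and multiplies by the factorial of the number of remaining positions. The name rank_distinct is an assumption for this illustration, and the sketch deliberately uses the simple quadratic count rather than any optimized variant.

```python
from math import factorial

def rank_distinct(s):
    """1-based lexicographic rank of s, assuming all characters of s are distinct."""
    n = len(s)
    rank = 1  # ranks start from 1
    for i, ch in enumerate(s):
        # characters to the right of position i that are smaller than s[i]
        smaller = sum(1 for other in s[i + 1:] if other < ch)
        rank += smaller * factorial(n - i - 1)
    return rank
```

With the example above, rank_distinct("STRING") gives 598.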

Related

Given a string, counting the number of permutations of the string with no repetitions(and forbidden characters)

I have been hitting my head against an algorithmic problem for a couple of hours now.
The (fancy) statement of the problem is as follows:
Our garden contains a single row of flowers. You are given the current contents of the row in the String garden. Each character in garden represents one flower. Different characters represent different colors. Flowers of the same color all look the same. You may rearrange the flowers in your garden into any order you like. (Formally, you may swap any two flowers in your garden, and you can do so arbitrarily many times.) You are also given a String forbid of the same length as garden. You want to rearrange garden into a new string G that will satisfy the following conditions:
No two adjacent flowers will have the same color. Formally, for each valid i, G[i] and G[i + 1] must differ.
For each valid i, G[i] must not be equal to forbid[i].
Let X be the number of different strings G that satisfy all conditions given above. Compute and return the number (X modulo 1,000,000,007).
Just to clarify with an example: X("aaabbb", "cccccc") = 2 ("ababab" and "bababa")
I have been trying by counting how many times each letter occurs in the string ('a'->3, 'b'->3 in the example) and then recursively counting the different possibilities (skipping if there is a repetition or a forbidden letter). Something along these lines:
#include <cstddef>
#include <map>
#include <string>
using Map = std::map < char, size_t > ;
Map hist;            // remaining count of each letter
std::string forbid;
size_t countRecursive(std::string s, size_t len)
{
    if (len == 0)
        return 1;
    size_t curPos = s.size();
    size_t count(0);
    for (auto &p : hist) {
        auto key = p.first;
        if (hist[key] == 0) continue;                      // letter used up
        if (forbid[curPos] == key) continue;               // forbidden here
        if (curPos > 0 && s[curPos - 1] == key) continue;  // same as previous
        hist[key]--;
        count += countRecursive(s + key, len - 1);
        hist[key]++;
    }
    return count;
}
Where hist and forbid are previously initialized. However, this appears to be n! and since n can be <= 15, it really explodes in complexity.
I am not really looking for a complete solution. Only, if you had any kind of suggestion about the way I should approach the problem, I would be highly thankful!
I'd approach it as follows: your 'forbidden' string is as long as the 'garden'. This means that given an alphabet of N characters, each position G[i] can have at most N-1 possible characters (since one will be forbidden). This gives you an upper bound that's limited only by N. If that bound is less than the modulo it might lead to some interesting considerations, but let's move forward.
Now a very basic approach is counting the combinations: if the garden is K characters long, the first item G[0] will have N-1 possibilities; the second one G[1] will have N-2 possibilities if forbidden[1] is different than G[0], N-1 if forbidden[1] == G[0]. The third character G[2] will too have N-2 possibilities depending on forbidden[2] and G[1], and so on.
For clarity: N-2 comes from the fact that of the N-1 possibilities, another one must be removed that is the value of the character preceding it in the string, unless such character matches the forbidden character of the current position.
So if forbidden[i+1] is always different from G[i] you have (N-1) * (N-2) * (N-2) * ... * (N-2), with K factors. This is your lower bound.
Now between upper and lower bound there is a number of strings where e.g. forbidden[i+1] is equal to G[i] only for the second position; for the second and for the third; etc. So your number of strings is:
(N-1) * (N-2) * (N-2) * (N-2) ... (K factors)
(N-1) * (N-1) * (N-2) * (N-2) ... (K factors)
(N-1) * (N-1) * (N-1) * (N-2) ... (K factors)
and so on until you have a string where each character can have N-1 possibilities.
In other words,
(N-1) * (N-2)^(K-1)
(N-1)^2 * (N-2)^(K-2)
(N-1)^3 * (N-2)^(K-3)
How many of those strings can you have? It depends how big is K, i.e. how large is your garden :)
That is, assuming I understood the problem correctly.
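A different technique from the bounding argument above, offered only as a hedged sketch: the recursion in the question depends only on how many flowers of each color remain and on the last color placed, so those two things can serve as a memoization key. With n <= 15 the number of distinct keys stays small. The names count_gardens and ways are assumptions for this illustration.

```python
from collections import Counter
from functools import lru_cache

MOD = 1_000_000_007

def count_gardens(garden, forbid):
    """Count valid rearrangements of garden, modulo MOD, by memoizing on (remaining counts, last color)."""
    letters = sorted(set(garden))
    counter = Counter(garden)
    start = tuple(counter[c] for c in letters)
    n = len(garden)

    @lru_cache(maxsize=None)
    def ways(counts, last):
        pos = n - sum(counts)  # index of the next position to fill
        if pos == n:
            return 1
        total = 0
        for idx, c in enumerate(letters):
            # skip colors that are used up, equal to the previous one, or forbidden here
            if counts[idx] == 0 or idx == last or forbid[pos] == c:
                continue
            nxt = counts[:idx] + (counts[idx] - 1,) + counts[idx + 1:]
            total += ways(nxt, idx)
        return total % MOD

    return ways(start, -1)
```

On the example from the question, count_gardens("aaabbb", "cccccc") returns 2.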

Efficiently counting the number of substrings of a digit string that are divisible by k?

We are given a string which consists of digits 0-9. We have to count number of sub-strings divisible by a number k. One way is to generate all the sub-strings and check if it is divisible by k but this will take O(n^2) time. I want to solve this problem in O(n*k) time.
1 <= n <= 100000 and 2 <= k <= 1000.
I saw a similar question here. But k was fixed as 4 in that question. So, I used the property of divisibility by 4 to solve the problem.
Here is my solution to that problem:
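#include <iostream>
#include <string>
#include <vector>
using namespace std;
int main()
{
    string s;
    vector<int> v[5];
    int i;
    int x;
    long long int cnt = 0;
    cin>>s;
    x = 0;
    // single digits divisible by 4
    for(i = 0; i < s.size(); i++) {
        if((s[i]-'0') % 4 == 0) {
            cnt++;
        }
    }
    // if the last two digits form a multiple of 4, every longer substring ending here counts
    for(i = 1; i < s.size(); i++) {
        int f = s[i-1]-'0';
        int s1 = s[i] - '0';
        if((10*f+s1)%4 == 0) {
            cnt = cnt + (long long)(i);
        }
    }
    cout<<cnt;
}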
But I wanted a general algorithm for any value of k.
This is a really interesting problem. Rather than jumping into the final overall algorithm, I thought I'd start with a reasonable algorithm that doesn't quite cut it, then make a series of modifications to it to end up with the final, O(nk)-time algorithm.
This approach combines together a number of different techniques. The major technique is the idea of computing a rolling remainder over the digits. For example, let's suppose we want to find all prefixes of the string that are multiples of k. We could do this by listing off all the prefixes and checking whether each one is a multiple of k, but that would take time at least Θ(n^2), since there are n prefixes whose lengths add up to Θ(n^2). However, we can do this in time Θ(n) by being a bit more clever. Suppose we know that we've read the first h characters of the string and we know the remainder of the number formed that way. We can use this to say something about the remainder of the first h+1 characters of the string as well, since by appending that digit we're taking the existing number, multiplying it by ten, and then adding in the next digit. This means that if we had a remainder of r, then our new remainder is (10r + d) mod k, where d is the digit that we uncovered.
Here's quick pseudocode to count up the number of prefixes of a string that are multiples of k. It runs in time Θ(n):
remainder = 0
numMultiples = 0
for i = 1 to n: // n is the length of the string
    remainder = (10 * remainder + str[i]) % k
    if remainder == 0
        numMultiples++
return numMultiples
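The pseudocode above translates almost directly into runnable Python. This is a minimal sketch that assumes the input is a string of decimal digits and k >= 1. The function name count_prefix_multiples is chosen for this illustration.

```python
def count_prefix_multiples(digits, k):
    """Count the prefixes of the digit string that are multiples of k, in one pass."""
    remainder = 0
    num_multiples = 0
    for ch in digits:
        # appending a digit multiplies the number so far by ten and adds the digit
        remainder = (10 * remainder + int(ch)) % k
        if remainder == 0:
            num_multiples += 1
    return num_multiples
```

For the string 14917 and k = 7 it counts the three prefixes 14, 1491 and 14917.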
We're going to use this initial approach as a building block for the overall algorithm.
So right now we have an algorithm that can find the number of prefixes of our string that are multiples of k. How might we convert this into an algorithm that finds the number of substrings that are multiples of k? Let's start with an approach that doesn't quite work. What if we count all the prefixes of the original string that are multiples of k, then drop off the first character of the string and count the prefixes of what's left, then drop off the second character and count the prefixes of what's left, etc? This will eventually find every substring, since each substring of the original string is a prefix of some suffix of the string.
Here's some rough pseudocode:
numMultiples = 0
for i = 1 to n:
    remainder = 0
    for j = i to n:
        remainder = (10 * remainder + str[j]) % k
        if remainder == 0
            numMultiples++
return numMultiples
For example, running this approach on the string 14917 looking for multiples of 7 will turn up these strings:
String 14917: Finds 14, 1491, 14917
String 4917: Finds 49,
String 917: Finds 91, 917
String 17: Finds nothing
String 7: Finds 7
The good news about this approach is that it will find all the substrings that work. The bad news is that it runs in time Θ(n^2).
But let's take a look at the strings we're seeing in this example. Look, for example, at the substrings found by searching for prefixes of the entire string. We found three of them: 14, 1491, and 14917. Now, look at the "differences" between those strings:
The difference between 14 and 14917 is 917.
The difference between 14 and 1491 is 91
The difference between 1491 and 14917 is 7.
Notice that the difference of each of these strings is itself a substring of 14917 that's a multiple of 7, and indeed if you look at the other strings that we've matched later on in the run of the algorithm we'll find these other strings as well.
This isn't a coincidence. If you have two numbers with a common prefix that are multiples of the same number k, then the "difference" between them will also be a multiple of k. (It's a good exercise to check the math on this.)
So this suggests another route we can take. Suppose that we find all prefixes of the original string that are multiples of k. If we can find all of them, we can then figure out how many pairwise differences there are among those prefixes and potentially avoid rescanning things multiple times. This won't find everything, necessarily, but it will find all substrings that can be formed by computing the difference of two prefixes. Repeating this over all suffixes - and being careful not to double-count things - could really speed things up.
First, let's imagine that we find r different prefixes of the string that are multiples of k. How many total substrings did we just find if we include differences? Well, we've found r strings, plus one extra string for each (unordered) pair of elements, which works out to r + r(r-1)/2 = r(r+1)/2 total substrings discovered. We still need to make sure we don't double-count things, though.
To see whether we're double-counting something, we can use the following technique. As we compute the rolling remainders along the string, we'll store the remainders we find after each entry. If in the course of computing a rolling remainder we rediscover a remainder we've already computed at some point, we know that the work we're doing is redundant; some previous scan over the string will have already computed this remainder and anything we've discovered from this point forward will have already been found.
Putting these ideas together gives us this pseudocode:
numMultiples = 0
seenRemainders = array of n sets, all initially empty
for i = 1 to n:
    remainder = 0
    prefixesFound = 0
    for j = i to n:
        remainder = (10 * remainder + str[j]) % k
        if seenRemainders[j] contains remainder:
            break
        add remainder to seenRemainders[j]
        if remainder == 0
            prefixesFound++
    numMultiples += prefixesFound * (prefixesFound + 1) / 2
return numMultiples
So how efficient is this? At first glance, this looks like it runs in time O(n^2) because of the outer loops, but that's not a tight bound. Notice that each element can only be passed over in the inner loop at most k times, since after that there aren't any remainders that are still free. Therefore, since each element is visited at most O(k) times and there are n total elements, the runtime is O(nk), which meets your runtime requirements.
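As a cross-check, here is a sketch of a different O(nk) technique, not the early-exit scan described above. It keeps, for every remainder r, the number of substrings ending at the current position whose value is r modulo k, and updates those counts digit by digit. Tracing the early-exit pseudocode by hand on "12" with k = 2 appears to give 1 rather than 2, so it may undercount when k shares a factor with 10. The table-based version below does not rely on that shortcut. The function name is an assumption, and substrings with leading zeros are counted by their numeric value.

```python
def count_divisible_substrings(digits, k):
    """Count substrings of the digit string whose numeric value is a multiple of k, in O(n*k)."""
    counts = [0] * k  # counts[r]: substrings ending at the previous position with remainder r
    total = 0
    for ch in digits:
        d = int(ch)
        new_counts = [0] * k
        for r in range(k):
            if counts[r]:
                # extend every substring that ended at the previous position by digit d
                new_counts[(10 * r + d) % k] += counts[r]
        new_counts[d % k] += 1  # the one-digit substring starting here
        total += new_counts[0]
        counts = new_counts
    return total
```

On the running example, count_divisible_substrings("14917", 7) returns 7, matching the seven substrings listed earlier.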

How to determine string S can be made from string T by deleting some characters, but at most K successive characters

Sorry for the long title :)
In this problem, we have string S of length n, and string T of length m. We can check whether S is a subsequence of string T in time complexity O(n+m). It's really simple.
I am curious about: what if we can delete at most K successive characters? For example, if K = 2, we can make "ab" from "accb", but not from "abcccb". I want to check if it's possible very fast.
I could only find obvious O(nm): check if it's possible for every suffix pairs in string S and string T. I thought maybe greedy algorithm could be possible, but if K = 2, the case S = "abc" and T = "ababbc" is a counterexample.
Is there any fast solution to solve this problem?
(Update: I've rewritten the opening of this answer to include a discussion of complexity and to discuss some alternative methods and potential risks.)
(Short answer, the only real improvement above the O(nm) approach that I can think of is to observe that we don't usually need to compute all n times m entries in the table. We can calculate only those cells we need. But in practice it might be very good, depending on the dataset.)
Clarify the problem: We have a string S of length n, and a string T of length m. The maximum allowed gap is k - this gap is to be enforced at the beginning and end of the string also. The gap is the number of unmatched characters between two matched characters - i.e. if the letters are adjacent, that is a gap of 0, not 1.
Imagine a table with n+1 rows and m+1 columns.
0 1 2 3 4 ... m
--------------------
0 | ? ? ? ? ? ?
1 | ? ? ? ? ? ?
2 | ? ? ? ? ? ?
3 | ? ? ? ? ? ?
... |
n | ? ? ? ? ? ?
At first, we could define the entry in row r and column c to be a binary flag that tells us whether the first r characters of S form a valid k-subsequence of the first c characters of T. (Don't worry yet how to compute these values, or even whether these values are useful, we just need to define them clearly first.)
However, this binary-flag table isn't very useful. It's not possible to easily calculate one cell as a function of nearby cells. Instead, we need each cell to store slightly more information. As well as recording whether the relevant strings are a valid subsequence, we need to record the number of consecutive unmatched characters at the end of our substring of T (the substring with c characters). For example, if the first r=2 characters of S are "ab" and the first c=3 characters of T are "abb", then there are two possible matches here: The first characters obviously match with each other, but the b can match with either of the latter b. Therefore, we have a choice of leaving one or zero unmatched bs at the end. Which one do we record in the table?
The answer is that, if a cell has multiple valid values, then we take the smallest one. It's logical that we want to make life as easy as possible for ourselves while matching the remainder of the string, and therefore that the smaller the gap at the end, the better. Be wary of other incorrect optimizations - we do not want to match as many characters as possible or as few characters. That can backfire. But it is logical, for a given pair of strings S,T, to find the match (if there are any valid matches) that minimizes the gap at the end.
One other observation is that if the string S is much shorter than T, then it cannot match. This depends on k also, obviously. The first r characters of S can cover at most r + (r+1)k characters of T (r matched characters plus r+1 gaps of at most k each); if this is less than c, then we can easily mark (r,c) as -1.
(Any other optimization statements that can be made?)
We do not need to compute all the values in this table. The number of different possible states is k+3. They start off in an 'undefined' state (?). If a matching is not possible for the pair of (sub)strings, the state is -1. If a matching is possible, then the score in the cell will be a number between 0 and k inclusive, recording the smallest possible number of unmatched consecutive characters at the end. This gives us a total of k+3 states.
We are interested only in the entry in the bottom right of the table. If f(r,c) is the function that computes a particular cell, then we are interested only in f(n,m). The value for a particular cell can be computed as a function of the values nearby. We can build a recursive algorithm that takes r and c as input and performs the relevant calculations and lookups in term of the nearby values. If this function looks up f(r,c) and finds a ?, it will go ahead and compute it and then store the answer.
It is important to store the answer as the algorithm may query the same cell many times. But also, some cells will never be computed. We just start off attempting to calculate one cell (the bottom right) and just lookup-and-calculate-and-store as necessary.
This is the "obvious" O(nm) approach. The only optimization here is the observation that we don't need to calculate all the cells, therefore this should bring the complexity below O(nm). Of course, with really nasty datasets, you may end up calculating almost all of the cells! Therefore, it's difficult to put an official complexity estimate on this.
Finally, I should say how to compute a particular cell f(r,c):
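If r==0 and c <= k, then f(r,c) = c. An empty string can match any string with up to k characters in it, and all c of those characters are unmatched.
If r==0 and c > k, then f(r,c) = -1. Too long for a match.
If r > 0 and c==0, then f(r,c) = -1. A non-empty string cannot match an empty one.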
There are only two other ways a cell can have a successful state. We first try:
If S[r]==T[c] and f(r-1,c-1) != -1, then f(r,c) = 0. This is the best case - a match with no trailing gap.
If that didn't work, we try the next best thing. If f(r,c-1) != -1 and f(r,c-1) < k, then f(r,c) = f(r,c-1)+1.
If neither of those work, then f(r,c) = -1.
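The cell rules can be put together as a bottom-up table fill. This is a hedged sketch rather than the answer's own code: it computes every cell instead of only the needed ones, it uses 0-based string indexing, and it stores c in the empty-prefix row so that each cell holds the trailing gap as defined earlier. The name is_k_subsequence is an assumption for this illustration.

```python
def is_k_subsequence(s, t, k):
    """True if s can be obtained from t by deleting runs of at most k consecutive characters."""
    n, m = len(s), len(t)
    # f[r][c]: smallest trailing gap when s[:r] is matched inside t[:c], or -1 if impossible
    f = [[-1] * (m + 1) for _ in range(n + 1)]
    for c in range(min(k, m) + 1):
        f[0][c] = c  # the empty prefix leaves all c characters unmatched
    for r in range(1, n + 1):
        for c in range(1, m + 1):
            if s[r - 1] == t[c - 1] and f[r - 1][c - 1] != -1:
                f[r][c] = 0  # best case: match with no trailing gap
            elif f[r][c - 1] != -1 and f[r][c - 1] < k:
                f[r][c] = f[r][c - 1] + 1  # leave t[c-1] unmatched, growing the gap
    return f[n][m] != -1
```

With K = 2, this accepts ("ab", "accb") and ("abc", "ababbc") and rejects ("ab", "abcccb"), in line with the examples in the question.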
The rest of this answer is my initial, Haskell-based approach. One advantage of it is that it 'understands' that it needn't compute every cell, only computing cells where necessary. But it has the inefficiency of possibly calculating one cell many times.
*Also note that the Haskell approach is effectively approaching the problem in a mirror image - it is trying to build matches from the end substrings of S and T with a minimal leading bunch of unmatched characters. I don't have the time to rewrite it in its 'mirror image' form!
A recursive approach should work. We want a function that will take three arguments, int K, String S, and String T. However, we don't just want a boolean answer as to whether S is a valid k-subsequence of T.
For this recursive approach, if S is a valid k-subsequence, we also want to know about the best subsequence possible by returning how few characters from the start of T can be dropped. We want to find the 'best' subsequence. If a k-subsequence is not possible for S and T, then we return -1, but if it is possible then we want to return the smallest number of characters we can pull from T while retaining the k-subsequence property.
helloworld
  l    r d
This is a valid 4-subsequence, since the biggest gap has four characters (lowo). This is the best subsequence because it leaves a gap of just two characters at the start (he). Alternatively, here is another valid k-subsequence with the same strings, but it's not as good because it leaves a gap of three at the start:
helloworld
   l   r d
This is written in Haskell, but it should be easy enough to rewrite in any other language. I'll break it down in more detail below.
best :: Int -> String -> String -> Int
--      K      S         T         return
--          where len(S) <= len(T)
best k [] t_string    -- empty S is a subsequence of anything!
    | length(t_string) <= k = length(t_string)
    | length(t_string) >  k = -1
best k sss@(s:ss) [] = (-1) -- if T is empty, and S is non-empty, then no subsequence is possible
best k sss@(s:ss) tts@(t:ts) -- both are non-empty. Various possibilities:
    | s == t && best k ss ts /= -1 = 0 -- if s==t, and if best k ss ts != -1, then we have the best outcome
    | best k sss ts /= -1
         && best k sss ts < k = 1+ (best k sss ts) -- this is the only other possibility for a valid k-subsequence
    | otherwise = -1 -- no more options left, return -1 for failure.
A line-by-line analysis:
(A comment in Haskell starts with --)
best :: Int -> String -> String -> Int
A function that takes an Int, and two Strings, and that returns an Int. The return value is to be -1 if a k-subsequence is not possible. Otherwise it will return an integer between 0 and K (inclusive) telling us the smallest possible gap at the start of T.
We simply deal with the cases in order.
best k [] t -- empty S is a subsequence of anything!
| length(t) <= k = length(t)
| length(t) > k = -1
Above, we handle the case where S is empty ([]). This is simple, as an empty string is always a valid subsequence. But to test if it is a valid k-subsequence, we must calculate the length of T.
best k sss@(s:ss) [] = (-1)
-- if T is empty, and S is non-empty, then no subsequence is possible
That comment explains it. This leaves us with the situations where both strings are non-empty:
best k sss@(s:ss) tts@(t:ts) -- both are non-empty. Various possibilities:
    | s == t && best k ss ts /= -1 = 0 -- if s==t, and if best k ss ts != -1, then we have the best outcome
    | best k sss ts /= -1
         && best k sss ts < k = 1+ (best k sss ts) -- this is the only other possibility for a valid k-subsequence
    | otherwise = -1 -- no more options left, return -1 for failure.
tts@(t:ts) matches a non-empty string. The name of the string is tts. But there is also a convenient trick in Haskell to allow you to give names to the first letter in the string (t) and the remainder of the string (ts). Here ts should be read aloud as the plural of t - the s suffix here means 'plural'. We say we have a t and some ts and together they make the full (non-empty) string.
That last block of code deals with the case where both strings are non-empty. The two strings are called sss and tts. But to save us the hassle of writing head sss and tail sss to access the first letter, and the remainder, of the string, we simply use @(s:ss) to tell the compiler to store those quantities into variables s and ss. If this was C++ for example, you'd get the same effect with char s = sss[0]; as the first line of your function.
The best situation is that the first characters match (s==t) and the remainders of the strings form a valid k-subsequence (best k ss ts /= -1). This allows us to return 0.
The only other possibility for success is if the current complete string (sss) is a valid k-subsequence of the remainder of the other string (ts). We add 1 to this and return, but make an exception if the gap would grow too big.
It's very important not to change the order of those last five lines. They are ordered in decreasing order of how 'good' the score is. We want to test for, and return, the very best possibilities first.
Naive recursive solution. Bonus := return value is the number of ways that the string can be matched.
#include <stdio.h>
#include <string.h>
unsigned skipneedle(char *haystack, char *needle, unsigned skipmax)
{
    unsigned found,skipped;
    // fprintf(stderr, "skipneedle(%s,%s,%u)\n", haystack, needle, skipmax);
    if ( !*needle) return strlen(haystack) <= skipmax ? 1 : 0 ;
    found = 0;
    for (skipped=0; skipped <= skipmax ; haystack++,skipped++ ) {
        if ( !*haystack ) break;
        if ( *haystack == *needle) {
            found += skipneedle(haystack+1, needle+1, skipmax);
        }
    }
    return found;
}
int main(void)
{
    char *ab = "ab";
    char *test[] = {"ab" , "accb" , "abcccb" , "abcb", NULL}
        , **cpp;
    for (cpp = test; *cpp; cpp++ ) {
        printf( "[%s,%s,%u]=%u \n"
            , *cpp, ab, 2
            , skipneedle(*cpp, ab, 2) );
    }
    return 0;
}
An O(p*n) solution where p = number of subsequences possible of S in T.
Scan the string T and maintain a list of possible subsequences of S that would have
1. Index of last character found and
2. Number of characters to be deleted found
Continue to update this list at each character of T.
Not sure if this is what you're asking for, but you could create a list of characters from each String, and search for instances of the one list in the other, then if(list2.length-K > list1.length) return false.
Following is a proposed algorithm : - O(|T|*k) average case
1> scan T and store character indices in Hash Table :-
eg. S = "abc" T = "ababbc"
Symbol table entries : -
a = 1 3
b = 2 4 5
c = 6
2.> as we know isValidSub(S,T) = isValidSub(S(0,j),T) && (isValidSub(S(j+1,N),T)||....isValidSub(S(j+K,T),T))
a.> we will use the bottom up approach to solve above problem
b.> we will maintain an valid array Valid(len(S)) where each record points to a Hash Table (Explained as we go along solving further)
c.> Start from the last element of S, Look up for the indices stored corresponding to the character in Symbol Table
eg. in above example S[last] = "c"
in Symbol Table c = 6
Now we put records like (5,6) , (4,6) ,.... (6-k-1,6) into Hash table at Valid(last)
Explanation : - as s(6,len(S)) is valid subsequence hence s(0,6-i) ++ s(6,len(S)) (where i is in range(1,k+1)) is also valid subsequence provided s(0,6-i) is valid subsequence.
3.> start filling up Valid Array from last to 0 element : -
a.> take a indice from hash table entry corresponding to S[j] where j is current indice of Valid Array we are analysing.
b.> Check whether indice is in Valid(j+1) if less then add (indice-i,indice) where i in range(1,k+1) into Valid(j) Hash Table
example:-
S = "abc" T = "ababbc"
iteration 1 :
j = len(S) = 3
S[3] = 'c'
Symbol Table : c = 6
add (5,6),(4,6),(3,6) as K = 2 in Valid(j)
Valid(3) = {(5,6),(4,6),(3,6)}
j = 2
iteration 2 :
S[j] = 'b'
Symbol table: b = 2 4 5
Look up 2 in Valid(3) => not found => skip
Look up 4 in Valid(3) => found => add Valid(2) = {(3,4),(2,4),(1,4)}
Look up 5 in Valid(3) => found => add Valid(2) = {(3,4),(2,4),(1,4),(4,5)}
j = 1
iteration 3:
S[j] = "a"
Symbol Table : a = 1 3
Look up 1 in Valid(2) => not found
Look up 3 in Valid(2) => found => stop as it is last iteration
END
As 3 is found in Valid(2), there exists a valid subsequence starting at index 3 in T
Start = 3
4.> Reconstruct the solution moving downwards in Valid Array :-
example :
Start = 3
Look up 3 in Valid(2) => found (3,4)
Look up 4 in Valid(3) => found (4,6)
END
reconstructed solution (3,4,6) which is indeed valid subsequence
Remember (3,5,6) can also be a solution if we had added (3,5) instead of (3,4) in that iteration
Analysis of Time complexity & Space complexity : -
Time Complexity :
Step 1 : Scan T = O(|T|)
Step 2 : fill up all Valid entries O(|T|*k), assuming a hash table lookup is approximately O(1)
Step 3 : Reconstruct solution O(|S|)
Overall average case Time : O(|T|*k)
Space Complexity:
Symbol table = O(|T|+|S|)
Valid table = O(|T|*k) can be improved with optimizations
Overall space = O(|T|*k)
Java Implementation: -
import java.util.ArrayList;
import java.util.HashMap;
public class Subsequence {
private ArrayList[] SymbolTable = null;
private HashMap[] Valid = null;
private String S;
private String T;
public ArrayList<Integer> getSubsequence(String S,String T,int K) {
this.S = S;
this.T = T;
if(S.length()>T.length())
return(null);
S = S.toLowerCase();
T = T.toLowerCase();
SymbolTable = new ArrayList[26];
for(int i=0;i<26;i++)
SymbolTable[i] = new ArrayList<Integer>();
char[] s1 = T.toCharArray();
char[] s2 = S.toCharArray();
//Calculate Symbol table
for(int i=0;i<T.length();i++) {
SymbolTable[s1[i]-'a'].add(i);
}
/* for(int j=0;j<26;j++) {
System.out.println(SymbolTable[j]);
}
*/
Valid = new HashMap[S.length()];
for(int i=0;i<S.length();i++)
Valid[i] = new HashMap<Integer,Integer >();
int Start = -1;
for(int j = S.length()-1;j>=0;j--) {
int index = s2[j] - 'a';
//System.out.println(index);
for(int m = 0;m<SymbolTable[index].size();m++) {
if(j==S.length()-1||Valid[j+1].containsKey(SymbolTable[index].get(m))) {
int value = (Integer)SymbolTable[index].get(m);
if(j==0) {
Start = value;
break;
}
for(int t=1;t<=K+1;t++) {
Valid[j].put(value-t, value);
}
}
}
}
/* for(int j=0;j<S.length();j++) {
System.out.println(Valid[j]);
}
*/
if(Start != -1) { //Solution exists
ArrayList subseq = new ArrayList<Integer>();
subseq.add(Start);
int prev = Start;
int next;
// Reconstruct solution
for(int i=1;i<S.length();i++) {
next = (Integer)Valid[i].get(prev);
subseq.add(next);
prev = next;
}
return(subseq);
}
return(null);
}
public static void main(String[] args) {
Subsequence sq = new Subsequence();
System.out.println(sq.getSubsequence("abc","ababbc", 2));
}
}
Consider a recursive approach: let int f(int i, int j) denote the minimum possible gap at the beginning for S[i...n] matching T[j...m]. f returns -1 if such matching does not exist. Here's the implementation of f:
int f(int i, int j){
    if(j == m){
        if(i == n)
            return 0;
        else
            return -1;
    }
    if(i == n){
        return m - j;
    }
    if(S[i] == T[j]){
        int tmp = f(i + 1, j + 1);
        if(tmp >= 0 && tmp <= k)
            return 0;
    }
    // skip T[j]; propagate failure instead of turning -1 into 0
    int rest = f(i, j + 1);
    return rest < 0 ? -1 : rest + 1;
}
If we convert this recursive approach to a dynamic programming approach, then we can have a time complexity of O(nm).
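As a rough sketch of that conversion (the name `min_leading_gap` and the table layout are my own, not from the answer above), a bottom-up table in Python might look like this. It follows the same recurrence as f, with one assumption made explicit: a failed sub-match (-1) is propagated as -1 rather than incremented.

```python
def min_leading_gap(S, T, k):
    """Bottom-up version of f(i, j): f[i][j] is the minimum leading gap
    when matching S[i:] against T[j:], or -1 if no matching exists."""
    n, m = len(S), len(T)
    f = [[-1] * (m + 1) for _ in range(n + 1)]

    # Base cases: S exhausted, so the rest of T counts as the gap.
    for j in range(m + 1):
        f[n][j] = m - j

    for i in range(n - 1, -1, -1):
        # f[i][m] stays -1: T exhausted but S is not.
        for j in range(m - 1, -1, -1):
            if S[i] == T[j] and 0 <= f[i + 1][j + 1] <= k:
                f[i][j] = 0                 # match here, next gap fits in k
            elif f[i][j + 1] >= 0:
                f[i][j] = f[i][j + 1] + 1   # skip T[j], gap grows by one
    return f[0][0]


print(min_leading_gap("abc", "ababbc", 2))
```

Each of the (n+1)(m+1) cells is filled once in constant time, which is where the O(nm) bound comes from.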
Here's an implementation that usually* runs in O(N) and takes O(m) space, where m is length(S).
It uses the idea of a surveyor's chain:
Imagine a series of poles linked by chains of length k.
Anchor the first pole at the beginning of the string.
Now carry the next pole forward until you find a character match.
Place that pole. If there is slack, move on to the next character;
else the previous pole has been dragged forward, and you need to go back
and move it to the next nearest match.
Repeat until you reach the end or run out of slack.
typedef struct chain_t{
int slack;
int pole;
} chainlink;
int subsequence_k_impl(char* t, char* s, int k, chainlink* link, int len)
{
char* match=s;
int extra = k; //total slack in the chain
//for all chars to match, including final null
while (match<=s+len){
//advance until we find spot for this post or run out of chain
while (t[link->pole] && t[link->pole]!=*match ){
link->pole++; link->slack--;
if (--extra<0) return 0; //no more slack, can't do it.
}
//if we ran out of ground, it's no good
if (t[link->pole] != *match) return 0;
//if this link has slack, go to next pole
if (link->slack>=0) {
link++; match++;
//if next pole was already placed,
while (link[-1].pole < link->pole) {
//recalc slack and advance again
extra += link->slack = k-(link->pole-link[-1].pole-1);
link++; match++;
}
//if not done
if (match<=s+len){
//current pole is out of order (or unplaced), move it next to prev one
link->pole = link[-1].pole+1;
extra+= link->slack = k;
}
}
//else drag the previous pole forward to the limit of the chain.
else if (match>=s) {
int drag = (link->pole - link[-1].pole -1)- k;
link--;match--;
link->pole+=drag;
link->slack-=drag;
}
}
//all poles planted. good match
return 1;
}
int subsequence_k(char* t, char* s, int k)
{
int l = strlen(s);
if (strlen(t)>(l+1)*(k+1))
return -1; //easy exit
else {
chainlink* chain = calloc(sizeof(chainlink),l+2);
chain[0].pole=-1; //first pole is anchored before the string
chain[0].slack=0;
chain[1].pole=0; //start searching at first char
chain[1].slack=k;
l = subsequence_k_impl(t,s,k,chain+1,l);
l=l?chain[1].pole:-1; //pos of first match or -1;
free(chain);
}
return l;
}
* I'm not sure of the big-O. I initially thought it was something like O(km+N). In testing, it averages less than 2N for good matches and less than N for failed matches.
...but there is a strange degenerate case. For random strings selected from an alphabet of size A, it gets much slower when k = 2A+1. Even in this case it's better than O(Nm), and the performance returns to O(N) when k is increased or decreased slightly. Gist Here if anyone is curious.

Given a palindromic string, in how many ways can we convert it to a non-palindrome by removing one or more characters from it?

Given a palindromic string, in how many ways can we convert it to a non-palindrome by removing one or more characters from it?
For example if the string is "b99b". Then we can do it in 6 ways,
i) Remove 1st character : "99b"
ii) Remove 1st, 2nd characters : "9b"
iii) Remove 1st, 3rd characters : "9b"
iv) Remove 2nd, 4th characters : "b9"
v) Remove 3rd, 4th characters : "b9"
vi) Remove 4th character : "b99"
How to approach this one?
PS:Two ways are considered different if there exists an i such that character at index i is removed in one way and not removed in another.
There's an $O(n^2)$ dynamic programming algorithm for counting the number of palindromic subsequences of a string; you can use that to count the number of non-palindromic subsequences by subtracting the number of palindromic subsequences from the number of subsequences (which is simply $2^n$).
This algorithm counts subsequences by the criterion in the OP; two subsequences are considered different if there is a difference in the list of indices used to select the elements, even if the resulting subsequences have the same elements.
To count palindromic subsequences, we build up the count based on intervals of the sequence. Specifically, we define:
$S_{i,j}$ = the substring of $S$ starting at index $i$ and ending at index $j$ (inclusive)
$P_{i,j}$ = the number of palindromic subsequences of $S_{i,j}$
Now, every one-element interval is a palindrome, so:
$P_{i,i} = 1$ for all $i < n$
If a substring does not begin and end with the same element (i.e., $S_i \ne S_j$) then the palindromic subsequences consist of:
Those which contain $S_i$ but do not contain $S_j$
Those which contain $S_j$ but do not contain $S_i$
Those which contain neither $S_i$ nor $S_j$
Now, note that $P_{i,j-1}$ includes both the first and the third set of subsequences, while $P_{i+1,j}$ includes both the second and the third set; $P_{i+1,j-1}$ is precisely the third set. Consequently:
$P_{i,j} = P_{i+1,j} + P_{i,j-1} - P_{i+1,j-1}$ if $S_i \ne S_j$
But what if $S_i = S_j$? In that case, we have to add the palindromes consisting of $S_i$ followed by a subsequence palindrome from $S_{i+1,j-1}$ followed by $S_j$, as well as the palindromic subsequence consisting of just the start and end characters. (Technically, an empty sequence is a palindrome, but we don't count those here.) The number of subsequences we add is $P_{i+1,j-1} + 1$, which cancels out the subtracted double count in the above equation. So:
$P_{i,j} = P_{i+1,j} + P_{i,j-1} + 1$ if $S_i = S_j$.
In order to save space, we can actually compute $P_{i,i+k}$ for $0 \le i < |S|-k$ for increasing values of $k$; we only need to retain two of these vectors in order to generate the final result $P_{0,|S|-1}$.
EDIT:
Here's a little Python program; the first function computes the number of palindromic subsequences, as above, and the driver computes the number of non-palindromic subsequences (i.e. the number of ways to remove zero or more elements and produce a non-palindrome; if the original sequence is a palindrome, then it's the number of ways to remove one or more elements).
# Count the palindromic subsequences of s
def pcount(s):
    length = len(s)
    p0 = [0] * (length + 1)
    p1 = [1] * length
    for l in range(1, length):
        for i in range(length - l):
            p0[i] = p1[i]
            if s[i] == s[i + l]:
                p1[i] += p1[i+1] + 1
            else:
                p1[i] += p1[i+1] - p0[i+1]
    # The "+ 1" is to account for the empty sequence, which is a palindrome.
    return p1[0] + 1

# Count the non-palindromic subsequences of s
def npcount(s):
    return 2**len(s) - pcount(s)
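For small inputs the count can be cross-checked by brute force. The sketch below is not part of the program above; `npcount_bruteforce` is a name I made up. It enumerates every non-empty set of kept indices, so two selections count as different whenever their index sets differ, which is the criterion from the question.

```python
from itertools import combinations


def npcount_bruteforce(s):
    """Count non-palindromic subsequences of s, distinguishing
    subsequences by the set of indices kept (exponential time)."""
    n = len(s)
    count = 0
    for size in range(1, n + 1):
        for kept in combinations(range(n), size):
            sub = [s[i] for i in kept]
            if sub != sub[::-1]:   # not equal to its reverse
                count += 1
    return count


print(npcount_bruteforce("b99b"))  # 6, matching the example in the question
```

This is only practical for short strings, but it is a handy way to test the dynamic programming version on random inputs.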
this is not a complete answer, just a suggestion.
i would count the number of ways you can remove one or more characters and keep the string a palindrome. then subtract that from the total number of ways you can modify the string.
the most obvious way to modify a palindrome and keep it a palindrome is to remove the i'th and the (n-i)'th characters (n being the length of the string). there are 2^(n/2) ways you can do that.
the problem with this approach is that it assumes only a symmetric modification can keep the string a palindrome, you need to find a way to handle cases such as "aaaa" where any sort of modification will still result in a palindrome.
Brute force with memoization is pretty straightforward:
numWays(str):
    return 0 if str is empty
    return memo[str] if it exists
    memo[str] = numWays(str - firstChar) +
                numWays(str - secondChar) +
                ... +
                1 if str is not a palindrome
    return memo[str]
Basically, you remove every character in turn and save the answer for the resulting string. The more identical characters you have in the string, the faster this is.
I'm not sure how to do it more efficiently, I will update this if I figure it out.
For a string with N elements, there are 2^N possible subsequences (including the whole string and the empty subsequence). Thus we can encode every subsequence by a number with a '1' bit at the bit position for every omitted (or present) character, and a '0' bit otherwise. (This assumes the length of the string is smaller than the number of bits in an int (size_t here); otherwise you would need another representation for the bitstring):
#include <stdio.h>
#include <string.h>
char string[] = "AbbA";
int is_palindrome (char *str, size_t len, size_t mask);
int main(void)
{
    size_t len, mask, count;
    len = strlen(string);
    count = 0;
    /* a 1 bit in mask marks an omitted character; skip mask 0 (nothing
       removed) and the all-ones mask (everything removed) */
    for (mask = 1; mask < (1ul << len) - 1; mask++) {
        if (is_palindrome(string, len, mask)) continue;
        count++;
    }
    fprintf(stderr, "Len:=%u, Count=%u \n"
        , (unsigned) len, (unsigned) count);
    return 0;
}
int is_palindrome (char *str, size_t len, size_t mask)
{
    size_t l, r;
    for (l = 0, r = len - 1; l < r; ) {
        if (mask & (1ul << l)) { l++; continue; } /* omitted on the left */
        if (mask & (1ul << r)) { r--; continue; } /* omitted on the right */
        if (str[l] != str[r]) return 0; /* mismatched pair: not a palindrome */
        l++, r--;
    }
    return 1; /* every remaining pair matched */
}
Here's a Haskell version:
import Data.List
listNonPalindromes string =
  filter (isNotPalindrome) (subsequences string)
  where isNotPalindrome str
          | fst substr == snd substr = False
          | otherwise = True
          where substr = let a = splitAt (div (length str) 2) str
                         in (reverse (fst a), if even (length str)
                                                then snd a
                                                else drop 1 (snd a))
howManyNonPalindromes string = length $ listNonPalindromes string
*Main> listNonPalindromes "b99b"
["b9","b9","b99","9b","9b","99b"]
*Main> howManyNonPalindromes "b99b"
6

Non increasing and Non Decreasing Subsequence

Finding the longest non-decreasing subsequence is a well-known problem.
This question is a slight variant of it. Here we have to find the length of the longest subsequence which is made up of 2 disjoint sequences: 1. a non-decreasing one, 2. a non-increasing one.
e.g. in string "aabcazcczba" longest such sequence is aabczcczba. aabczcczba is made up of 2 disjoint subsequence aabcZccZBA. (capital letter shows non-increasing sequence)
My algorithm is
length = 0
For i = 0 to length of given string S
    let s' = find the longest non-decreasing subsequence starting at position i
    let s" = find the longest non-increasing subsequence from S-s'.
    if (length of s' + length of s") > length
        length = (length of s' + length of s")
But I am not sure whether this gives the correct answer. Can you find a bug in this algorithm and, if there is one, suggest a correct algorithm? I also need to optimize the solution: my algorithm would take roughly O(n^4) steps.
Your solution is definitely incorrect. E.g. take addddbc. The longest non-decreasing sequence is adddd, but that would never give you a non-increasing sequence. The optimal solution is abc and dddd (or ab and ddddc, or ac and ddddb).
One solution is to use dynamic programming.
F(i, x, a, b) = 1, if there is a non-decreasing and non-increasing combo from first i letters of x ( x[:i]) such that last letter of non-decreasing part is a, and non-increasing part is b. Both of these letters equal to NULL if the corresponding sub-sequence is empty.
Otherwise F(i, x, a, b) = 0.
F(i+1,x,x[i+1],b) = 1 if there exists an a such that
(a<=x[i+1] or a=NULL) and F(i,x,a,b)=1. 0 otherwise.
F(i+1,x,a,x[i+1]) = 1 if there exists a b such that
(b>=x[i+1] or b=NULL) and F(i,x,a,b)=1. 0 otherwise.
Initialize F(0,x,NULL,NULL)=1 and iterate from i=1..n
As you can see, you can get F(i+1, x, a, b) from F(i, x, a, b). Complexity: Linear in length, polynomial in size of the alphabet.
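To get the length asked for in the question rather than a yes/no answer, the same states can carry a best length instead of a 0/1 flag. The following Python sketch is my own variation on the recurrence above, not the answerer's code: `longest_split` and the dictionary `best` are hypothetical names, and it adds the option of skipping a letter.

```python
def longest_split(x):
    """Length of the longest subsequence of x that splits into a
    non-decreasing part and a non-increasing part (either may be empty)."""
    # best[(a, b)] = max length so far, where a / b is the last letter of the
    # non-decreasing / non-increasing part (None if that part is empty).
    best = {(None, None): 0}
    for c in x:
        new = dict(best)                       # option: skip c
        for (a, b), length in best.items():
            if a is None or a <= c:            # append c to non-decreasing part
                key = (c, b)
                new[key] = max(new.get(key, 0), length + 1)
            if b is None or b >= c:            # append c to non-increasing part
                key = (a, c)
                new[key] = max(new.get(key, 0), length + 1)
        best = new
    return max(best.values())


print(longest_split("aabcazcczba"))  # 10, the example from the question
print(longest_split("addddbc"))      # 7: abc plus dddd
```

There are at most (|alphabet|+1)^2 states per position, so this stays linear in the length of the string and polynomial in the alphabet size, as stated above.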
I got the answer, and here is how it works, thanks to #ElKamina
Maintain a table of dimension 27x27, where 27 = 1 null character + 26 letters of the alphabet
table[i][j] denotes the length of the sub sequence whose non decreasing subsequence has last character 'i' and non increasing subsequence has last character 'j' (0th index denote null character and kth index denotes character 'k')
for i = 0 to length of string S
    //Among subsequences whose non-decreasing part ends in a character smaller than S[i], find one of maximum length. S[i] can then become part of this subsequence's non-decreasing part.
    int lim = S[i] - 'a' + 1;
    for(int k=0; k<27; k++){
        if(lim == k) continue;
        int tmax = 0;
        for(int j=0; j<=lim; j++){
            if(table[k][j] > tmax) tmax = table[k][j];
        }
        if(k == 0 && tmax == 0) table[0][lim] = 1;
        else if (tmax != 0) table[k][lim] = tmax + 1;
    }
    //Similarly for the non-increasing subsequence
Time complexity is O(lengthOf(S)*27*27) and space complexity is O(27*27)

Resources