Build string of length L with exactly N palindromes in it - string

Given length of a string L and N - number of palindromes, build a string with exactly N palindromic substrings in it. For example,
L = 4
N = 2
S = 'aabb' or 'abba'
L = 4
N = 3,4,5
S = impossible
L = 4
N = 6
S = 'aaaa' (palindromes are substrings S[0:2], S[2:4], S[1:3], S[0:3], S[1:4], S[0:4])
UPDATE: all target palindromes should be of length > 1

You can introduce a variable s_i = {0, 1} for each character in the string S, then if a substring S[a..b] is a palindrome, it must be that
(s_a = s_b) and (s_{a+1} = s_{b-1}) and ...
so for each substring you have a clause and exactly N of them must be satisfied. This reduces the problem to satisfiability.
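To make the per-substring clause concrete, here is a minimal Python sketch. It checks the chain of equalities for one substring S[a..b] and counts how many substrings of length > 1 satisfy it. The function names are illustrative only, and this is just a checker for candidate strings, not a SAT encoding.

```python
def is_palindrome(s: str, a: int, b: int) -> bool:
    """True if s[a..b] (inclusive) satisfies s[a]==s[b], s[a+1]==s[b-1], ..."""
    while a < b:
        if s[a] != s[b]:
            return False
        a, b = a + 1, b - 1
    return True


def count_palindromes(s: str) -> int:
    """Number of palindromic substrings of length > 1, counted by position."""
    n = len(s)
    return sum(is_palindrome(s, a, b) for a in range(n) for b in range(a + 1, n))


print(count_palindromes("aaaa"))  # the six substrings listed in the question
```

A candidate string S of length L is a solution exactly when count_palindromes(S) == N.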
I would also be curious if you can solve it as an optimization problem:
for each substring introduce a variable x_i ∈ {0, 1} that stands for the fact that substring number i is a palindrome (let it be S[a..b]). Then introduce a variable y_i for the clause of that substring:
(s_a-s_b)^2 + (s_{a+1}-s_{b-1})^2 + ...
Then you need to satisfy \sum_i x_i = N and minimize \sum_i x_i * y_i. The objective is always non-negative, and its minimum is 0 if a solution exists.
Edit
The optimisation idea seems to be false, since you would also need to enforce that y_i = 0 implies x_i = 1 (otherwise more than N substrings could be palindromes), but the satisfiability formulation should work.

Related

Minimum number of subsequences required to interleave one string to another

I saw the original question "Minimum of subsequences required to convert one string to another", and it is very similar to LeetCode question 1055, Shortest Way to Form String.
Description:
Given two strings source and target, return the minimum number of subsequences of source such that their concatenation equals target. If the task is impossible, return -1.
Example 1:
Input: source = "abc", target = "abcbc"
Output: 2
Explanation: The target "abcbc" can be formed by "abc" and "bc", which are subsequences of source "abc".
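For the concatenation version quoted above (not the interleaving variant asked about further down), a commonly used greedy approach scans source repeatedly and consumes as much of target as possible on each pass. The sketch below assumes that reading of the problem. The function name is illustrative.

```python
def shortest_way(source: str, target: str) -> int:
    count = 0
    j = 0  # index of the next character of target still to be covered
    while j < len(target):
        start = j
        # One pass over source builds one subsequence, as long as possible.
        for ch in source:
            if j < len(target) and ch == target[j]:
                j += 1
        if j == start:
            # No progress: target[j] does not occur in source at all.
            return -1
        count += 1
    return count


print(shortest_way("abc", "abcbc"))  # 2, as in Example 1
```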
Suppose s' is a subsequence of source; the problem above is then to find s'_1 s'_2 ... s'_k whose concatenation forms target. My question is how to find the minimum number of subsequences required to interleave one string into another.
eg:
Input: source = "adbsc", target = "addsbc"
Output: 2
Explanation:
step1: s'1 = adbc, then target' = ds
step2: s'2 = ds, then target' = ""
I don't know whether this problem can be solved by removing the longest common subsequence of source and target' to form t', and repeating that until t' = "".
Here is my code:
class Solution:
    def shortestWay(self, s: str, t: str) -> int:
        def lcs(s: str, t: str) -> str:
            # dp[i][j] is the LCS length of s[:i] and t[:j].
            m, n = len(s), len(t)
            dp = [[0] * (n + 1) for _ in range(m + 1)]
            for i in range(1, m + 1):
                for j in range(1, n + 1):
                    if s[i-1] == t[j-1]:
                        dp[i][j] = dp[i-1][j-1] + 1
                    else:
                        dp[i][j] = max(dp[i-1][j], dp[i][j-1])
            # Walk back through the table, marking the characters of t that belong to the LCS.
            choose = [False] * n
            i, j = m, n
            # Stop once either prefix is empty; otherwise s[-1] or t[-1] would be compared.
            while i > 0 and j > 0:
                if s[i-1] == t[j-1]:
                    choose[j-1] = True
                    i, j = i - 1, j - 1
                elif dp[i][j] == dp[i-1][j]:
                    i -= 1
                elif dp[i][j] == dp[i][j-1]:
                    j -= 1
            print(f'lcs({s}, {t}) is', ''.join(t[i] for i, v in enumerate(choose) if v))
            # Return what is left of t after removing the LCS.
            return ''.join(t[i] for i, v in enumerate(choose) if not v)

        ans = 0
        while t:
            ans += 1
            t1 = lcs(s, t)
            print(repr(t1))
            if t1 == t:
                # Nothing could be removed, so t cannot be formed from s.
                return -1
            t = t1
        return ans
"""
lcs(adbsc, addsbc) is adbc
'ds'
lcs(adbsc, ds) is ds
''
"""
Can anybody help me prove or solve this problem? Thanks!
Your working for your second example (source = "abc", target = "abcbc") doesn't make sense to me. Your algorithm idea (repeatedly removing the LCS from target) produces an optimal solution -- you just need to prove that no other solution can be better.
Proof sketch
Consider an optimal solution, containing OPT segments. If it equals your algorithm's solution then we're done; otherwise, it must consist of some number of common subsequence segments, the first k (possibly k=0) of which match the segments produced by your algorithm, followed by a (k+1)-th segment that is strictly shorter than your algorithm's (k+1)-th segment (since your algorithm always chooses the longest possible segment to add at each stage), followed by some number (possibly zero) of remaining segments.
Notice that if some subsequence of a string S is equal to a string T, then for any given suffix of T, we can certainly find a subsequence of S equal to it -- all we need to do is drop some of the initial characters from the subsequence.
So, getting back to our original problem: Some initial part of the remaining segments (possibly involving multiple segments) can be trimmed off to produce a list of segments that, when appended to the first k+1 segments produced by your algorithm, gives a solution that:
agrees with the first k+1 segments of your algorithm's solution, and
has segment count no worse than OPT.
The optimal solution we started with agreed with your algorithm's solution on the first k segments; this new optimal solution agrees on at least one more segment. If the new solution is not yet completely equal to your algorithm's solution, then the analysis can be repeated, producing a new solution that agrees with it on at least the first k+2 segments. This can be repeated until we ultimately produce a solution that agrees completely with your algorithm's solution and has length OPT. Since we made no assumptions about the input instance, this proves your algorithm produces an optimal solution on every input instance.

Find the maximum value of K such that sub-sequences A and B exist and should satisfy the mentioned conditions

Given a string S of length n. Choose an integer K and two non-empty sub-sequences A and B of length K such that it satisfies the following conditions:
A = B i.e. for each i the ith character in A is same as the ith character in B.
Let's denote the indices used to construct A as a1,a2,a3,...,aK and the indices used to construct B as b1,b2,b3,...,bK, where every ai and bi is an index into S. If we denote the number of common indices in A and B by M then M + 1 <= K.
Find the maximum value of K such that it is possible to find the sub-sequences A and B which satisfies the above conditions.
Constraints:
0 < N <= 10^5
Things which I observed are:
The value of K = 0 if the number of characters in the given string are all distinct i.e S = abcd.
K = length of S - 1 if all the characters in the string are same i.e. S = aaaa.
The value of M cannot be equal to K because then M + 1 <= K would not be true, i.e. you cannot have sub-sequences A and B that satisfy A = B and a1 = b1, a2 = b2, a3 = b3, ..., aK = bK.
If the string S is a palindrome then K = (total number of occurrences of the characters that appear more than once) - 1, e.g. for S = tenet, t is repeated 2 times and e is repeated 2 times, so the total is 4 and K = 4 - 1 = 3.
I am having trouble designing the algorithm to solve the above problem.
Let me know in the comments if you need more clarification.
(Update: see O(n) answer.)
We can modify the classic longest common subsequence recurrence to take an extra parameter.
JavaScript code (not memoised) that I hope is self explanatory:
function f(s, i, j, haveUncommon){
  if (i < 0 || j < 0)
    return haveUncommon ? 0 : -Infinity

  if (s[i] == s[j]){
    if (haveUncommon){
      return 1 + f(s, i-1, j-1, true)
    } else if (i == j){
      return Math.max(
        1 + f(s, i-1, j-1, false),
        f(s, i-1, j, false),
        f(s, i, j-1, false)
      )
    } else {
      return 1 + f(s, i-1, j-1, true)
    }
  }

  return Math.max(
    f(s, i-1, j, haveUncommon),
    f(s, i, j-1, haveUncommon)
  )
}
var s = "aabcde"
console.log(f(s, s.length-1, s.length-1, false))
I believe we are just looking for the closest equal pair of characters since the only characters excluded from A and B would be one of the characters in the pair and any characters in between.
Here's O(n) in JavaScript:
function f(s){
  let map = {}
  let best = -1

  for (let i=0; i<s.length; i++){
    if (!map.hasOwnProperty(s[i])){
      map[s[i]] = i
      continue
    }

    best = Math.max(best, s.length - i + map[s[i]])
    map[s[i]] = i
  }

  return best
}

var strs = [
  "aabcde", // 5
  "aaababcd", // 7
  "aebgaseb", // 4
  "aefttfea",
  // aeft fea
  "abcddbca",
  // abcd bca,
  "a" // -1
]

for (let s of strs)
  console.log(`${ s }: ${ f(s) }`)
O(n) solution in Python3:
def compute_maximum_k(word):
    last_occurences = {}
    max_k = -1
    for i in range(len(word)):
        if word[i] not in last_occurences:
            last_occurences[word[i]] = i
            continue
        max_k = max(max_k, (len(word) - i) + last_occurences[word[i]])
        last_occurences[word[i]] = i
    return max_k

def main():
    words = ["aabcde","aaababcd","aebgaseb","aefttfea","abcddbca","a","acbdaadbca"]
    for word in words:
        print(compute_maximum_k(word))

if __name__ == "__main__":
    main()
A solution for the maximum length substring would be the following:
After building a Suffix Array you can derive the LCP Array. The maximum value in the LCP array corresponds to the K you are looking for. The overall complexity of both constructions is O(n).
A suffix array sorts all suffixes of your string S in ascending order. The longest common prefix array then holds the lengths of the longest common prefixes (LCPs) between all pairs of consecutive suffixes in the sorted suffix array. Thus the maximum value in this array is the length of the longest substring that occurs at least twice in S.
For a nice example using the word "banana", check out the LCP Array Wikipage
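As a rough illustration of the suffix array and LCP idea, here is a deliberately naive Python sketch. It sorts the suffixes directly, which is far from the O(n) constructions mentioned above, and it answers the substring variant only. The function name is made up for this example.

```python
def longest_repeated_substring_length(s: str) -> int:
    # Naive suffix array: sort suffix start positions by the suffix text.
    suffix_array = sorted(range(len(s)), key=lambda i: s[i:])
    best = 0
    for prev, cur in zip(suffix_array, suffix_array[1:]):
        # Longest common prefix of two neighbouring suffixes in sorted order.
        lcp = 0
        while (prev + lcp < len(s) and cur + lcp < len(s)
               and s[prev + lcp] == s[cur + lcp]):
            lcp += 1
        best = max(best, lcp)
    return best


print(longest_repeated_substring_length("banana"))  # 3, from "ana"
```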
I deleted my previous answer as I don't think we need an LCS-like solution (LCS=longest Common Subsequence).
It is sufficient to find the couple of subsequences (A, B) that differ in one character and share all the others.
The code below finds the solution in O(N) time.
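def function(word):
    dp = [0]*len(word)
    lastOccurences = {}
    for i in range(len(dp)-1, -1, -1):
        if i < len(dp)-1 and dp[i+1] > 0:
            # Case 1: word[i] is shared by A and B.
            dp[i] = 1 + dp[i+1]
        if word[i] in lastOccurences:
            # Case 2: A uses word[i], B uses its next occurrence, and both
            # share everything after that occurrence. Keep the better case.
            dp[i] = max(dp[i], len(word)-lastOccurences[word[i]])
        lastOccurences[word[i]] = i
    return dp[0]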
dp[i] is equal to 0 when all characters from i to the end of the string are different.
I will explain my code by an example.
For "abcack", there are two cases:
Either the first 'a' will be shared by the two subsequences A and B, in this case the solution will be = 1 + function("bcack")
Or 'a' will not be shared between A and B. In this case the result will be 1 + len("ck"). Why 1 + len("ck")? Because we have already satisfied M+1<=K, so we just add all the remaining characters. In terms of indices, the subsequences are [0, 4, 5] and [3, 4, 5].
We take the maximum of these two cases.
The reason I'm scanning right to left is to avoid an O(N) search for the current character in the rest of the string: I maintain the index of the most recently visited occurrence of each character in the dict lastOccurences.

String lexicographical permutation and inversion

Consider the following function on a string:
int F(string S)
{
    int N = S.size();
    int T = 0;
    for (int i = 0; i < N; i++)
        for (int j = i + 1; j < N; j++)
            if (S[i] > S[j])
                T++;
    return T;
}
A string S0 of length N with all pairwise distinct characters has a total of N! unique permutations.
For example "bac" has the following 6 permutations:
bac
abc
cba
bca
acb
cab
Consider these N! strings in lexicographical order:
abc
acb
bac
bca
cab
cba
Now consider the application of F to each of these strings:
F("abc") = 0
F("acb") = 1
F("bac") = 1
F("bca") = 2
F("cab") = 2
F("cba") = 3
Given some string S1 of this set of permutations, we want to find the next string S2 in the set, that has the following relationship to S1:
F(S2) == F(S1) + 1
For example, if S1 == "acb" (F = 1) then S2 == "bca" (F = 1 + 1 = 2).
One way to do this would be to start at one past S1 and iterate through the list of permutations looking for F(S) = F(S1)+1. This is unfortunately O(N!).
By what O(N) function on S1 can we calculate S2 directly?
Suppose the length of S1 is n. The biggest value F(S1) can take is n(n-1)/2. If F(S1) = n(n-1)/2, it is the last string and there is no next one for it. But if F(S1) < n(n-1)/2, there is at least one char x that is bigger than a char y, where x comes right after y. Find such an x with the lowest index and swap x and y. Let's see it by example:
S1 == "acb" (F = 1), 1 < 3, so there is a char x that is bigger than another char y and whose index is bigger than y's. Here the x with the smallest index is c, and on the first try you swap it with a (which is smaller than x, so the algorithm finishes here) ==> S2 = "cab", F(S2) = 2.
Now let's test it with S2, cab: x = b, y = a ==> S3 = "cba".
Finding x is not hard: iterate over the input and keep a variable named min. While the currently visited character is smaller than min, set min to the newly visited char and visit the next character. The first time you visit a character that is bigger than min, stop iterating; this is x:
This is pseudocode in C# (the Substring calls keep everything before and after the swapped pair):
string NextString(string input)
{
    var min = input[0];
    int i = 1;
    while (i < input.Length && input[i] < min)
    {
        min = input[i];
        i++;
    }
    if (i == input.Length) return "There isn't next item";
    var x = input[i], y = input[i-1];
    // Keep input[0..i-2], swap the pair at positions i-1 and i, keep the rest.
    return input.Substring(0, i-1) + x + y + input.Substring(i+1);
}
Here's the outline of an algorithm for a solution to your problem.
I'll assume that you have a function to directly return the n-th permutation (given n) and its inverse, ie a function to return n given a permutation. Let these be perm(n) and perm'(n) respectively.
If I've figured it correctly, when you have a 4-letter string to permute the function F goes like this:
F("abcd") = 0
F("abdc") = 1
F(perm(3)) = 1
F(...) = 2
F(...) = 2
F(...) = 3
F(perm(7)) = 1
F(...) = 2
F(...) = 2
F(...) = 3
F(...) = 3
F(...) = 4
F(perm(13)) = 2
F(...) = 3
F(...) = 3
F(...) = 4
F(...) = 4
F(...) = 5
F(perm(19)) = 3
F(...) = 4
F(...) = 4
F(...) = 5
F(...) = 5
F(perm(24)) = 6
In words, when you go from 3 letters to 4 you get 4 copies of the table of values of F, adding (0,1,2,3) to the (1st,2nd,3rd,4th) copy respectively. In the 2nd case, for example, you already have one inversion from putting the 2nd letter in the 1st place; this simply gets added to the other inversions in the same pattern as would be true for the original 3-letter strings.
From this outline it shouldn't be too difficult (but I haven't got time right now) to write the function F. Strictly speaking the inverse of F isn't a function as it would be multi-valued, but given n and F(n) there are only a few cases for finding m such that F(m)==F(n)+1. These cases are:
n == N! where N is the number of letters in the string, there is no next permutation;
F(n+1) < F(n), the sought-for solution is perm(n+(N-1)!);
F(n+1) == F(n), the solution is perm(n+2);
F(n+1) > F(n), the solution is perm(n+1).
I suspect that some of this might only work for 4 letter strings, that some of these terms will have to be adjusted for K-letter permutations.
This is not O(n), but it is at most O(n²) (where n is the number of elements in the permutation, in your example 3).
First, notice that whenever you place a character in your string, you already know how much of an increase in F that's going to mean -- it's however many characters smaller than that one that haven't been added to the string yet.
This gives us another algorithm to calculate F(n):
used = set()
def get_inversions(S1):
    inv = 0
    for index, ch in enumerate(S1):
        character = ord(ch)-ord('a')
        cnt = sum(1 for x in range(character) if x not in used)
        inv += cnt
        used.add(character)
    return inv
This is not much better than the original version, but it is useful when inverting F. You want to know the first string that is lexicographically smaller -- therefore, it makes sense to copy your original string and only change it whenever mandatory. When such changes are required, we should also change the string by the least amount possible.
To do so, let's use the information that the biggest value of F for a string with n letters is n(n-1)/2. Whenever the number of required inversions would be bigger than this amount if we didn't change the original string, this means we must swap a letter at that point. Code in Python:
used = set()

def get_inversions(S1):
    inv = 0
    for index, ch in enumerate(S1):
        character = ord(ch)-ord('a')
        # Smaller characters not placed yet each contribute one inversion.
        cnt = sum(1 for x in range(character) if x not in used)
        inv += cnt
        used.add(character)
    return inv

def f_recursive(n, S1, inv, ign):
    if n == 0: return ""
    # Inversions that cannot be supplied by the remaining n-1 positions.
    delta = inv - (n-1)*(n-2)//2
    if ign:
        cnt = 0
        ch = 0
    else:
        ch = ord(S1[len(S1)-n])-ord('a')
        cnt = sum(1 for x in range(ch) if x not in used)
    for letter in range(ch, len(S1)):
        if letter not in used:
            if cnt < delta:
                cnt += 1
                continue
            used.add(letter)
            if letter != ch: ign = True
            return chr(letter+ord('a'))+f_recursive(n-1, S1, inv-cnt, ign)

def F_inv(S1):
    used.clear()
    inv = get_inversions(S1)
    used.clear()
    return f_recursive(len(S1), S1, inv+1, False)

print(F_inv("acb"))
It can also be made to run in O(n log n) by replacing the innermost loop with a data structure such as a binary indexed tree.
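To illustrate the data structure mentioned above, here is a minimal sketch of a binary indexed (Fenwick) tree used to compute F itself in O(n log n). It is not a drop-in replacement for the inner loop of f_recursive. The function name and the ranking of characters are assumptions made for this example.

```python
def count_inversions_bit(s: str) -> int:
    # Map each distinct character to a rank 1..k so it can index the tree.
    ranks = {ch: r for r, ch in enumerate(sorted(set(s)), start=1)}
    tree = [0] * (len(ranks) + 1)

    def add(pos: int) -> None:
        # Record one occurrence of the character with rank pos.
        while pos < len(tree):
            tree[pos] += 1
            pos += pos & -pos

    def prefix_count(pos: int) -> int:
        # Number of recorded characters with rank <= pos.
        total = 0
        while pos > 0:
            total += tree[pos]
            pos -= pos & -pos
        return total

    inversions = 0
    for ch in reversed(s):
        r = ranks[ch]
        # Characters to the right of ch that are strictly smaller than ch.
        inversions += prefix_count(r - 1)
        add(r)
    return inversions


print(count_inversions_bit("cba"))  # 3, matching F("cba") in the question
```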
Did you try swapping two neighbouring characters in the string? It seems that this can help solve the problem. If you swap S[i] and S[i+1], where S[i] < S[i+1], then F(S) increases by exactly one, because no other pair of positions changes its relative order.
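The effect of such a swap can be checked with a small Python sketch. It swaps the first adjacent pair that is in increasing order, so it raises F by one, but it makes no claim that the result is the lexicographically next string with that property. Both function names are illustrative.

```python
def count_inversions(s: str) -> int:
    # Same quantity as F in the question: pairs i < j with s[i] > s[j].
    return sum(1 for i in range(len(s)) for j in range(i + 1, len(s)) if s[i] > s[j])


def swap_first_ascent(s: str):
    """Swap the first adjacent pair with s[i] < s[i+1]; None if there is no such pair."""
    for i in range(len(s) - 1):
        if s[i] < s[i + 1]:
            return s[:i] + s[i + 1] + s[i] + s[i + 2:]
    return None


t = swap_first_ascent("acb")
print(t, count_inversions("acb"), count_inversions(t))
```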
If I'm not mistaken, F calculates the number of inversions of the permutation.

Non increasing and Non Decreasing Subsequence

Finding the longest non-decreasing subsequence is a well-known problem.
This question is a slight variant of it. Here we have to find the length of the longest subsequence that is made up of 2 disjoint subsequences: 1. a non-decreasing one and 2. a non-increasing one.
E.g. in the string "aabcazcczba" the longest such subsequence is aabczcczba. It is made up of the 2 disjoint subsequences shown in aabcZccZBA (capital letters mark the non-increasing subsequence).
My algorithm is
length = 0
For i = 0 to length of given string S
    let s' = find the longest non-decreasing subsequence starting at position i
    let s" = find the longest non-increasing subsequence from S-s'.
    if (length of s' + length of s") > length
        length = (length of s' + length of s")
But I am not sure whether this gives the correct answer. Can you find a bug in this algorithm and, if there is one, suggest a correct algorithm? I also need to optimize the solution: my algorithm takes roughly O(n^4) steps.
Your solution is definitely incorrect. Eg. addddbc. The longest non-decreasing sequence is adddd, but that would never give you a non-increasing sequence. The optimal solution is abc and dddd ( or ab ddddc, or ac ddddb).
One solution is to use dynamic programming.
F(i, x, a, b) = 1 if there is a non-decreasing and non-increasing combo from the first i letters of x (x[:i]) such that the last letter of the non-decreasing part is a and the last letter of the non-increasing part is b. Either of these letters equals NULL if the corresponding sub-sequence is empty.
Otherwise F(i, x, a, b) = 0.
F(i+1,x,x[i+1],b) = 1 if there exists a such that
(a<=x[i+1] or a=NULL) and F(i,x,a,b)=1. 0 otherwise.
F(i+1,x,a,x[i+1]) = 1 if there exists b such that
(b>=x[i+1] or b=NULL) and F(i,x,a,b)=1. 0 otherwise.
Initialize F(0,x,NULL,NULL)=1 and iterate over i=1..n.
As you can see, the values of F for i+1 can be computed from the values of F for i alone. Complexity: linear in the length of the string, polynomial in the size of the alphabet.
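The state space above can be sketched in Python as follows. This variant stores the best length per state instead of a 0/1 flag, so that characters may be skipped, which is an adaptation rather than the exact recurrence above. It assumes lowercase letters a-z, with code 0 standing for NULL, and the function name is illustrative.

```python
def longest_two_way_subsequence(s: str) -> int:
    UNREACHABLE = -1
    # best[a][b]: longest combo whose non-decreasing part ends in letter code a
    # and whose non-increasing part ends in letter code b (0 means empty part).
    best = [[UNREACHABLE] * 27 for _ in range(27)]
    best[0][0] = 0
    for ch in s:
        c = ord(ch) - ord('a') + 1
        # Copying the table corresponds to skipping ch.
        new = [row[:] for row in best]
        for a in range(27):
            for b in range(27):
                length = best[a][b]
                if length == UNREACHABLE:
                    continue
                if a <= c:  # also true for a == 0, the empty part
                    new[c][b] = max(new[c][b], length + 1)
                if b == 0 or b >= c:
                    new[a][c] = max(new[a][c], length + 1)
        best = new
    return max(max(row) for row in best)


print(longest_two_way_subsequence("aabcazcczba"))  # 10, as in the question
```

Each character costs 27*27 state updates, so the running time stays linear in the length of the string.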
I got the answer, and here is how it works, thanks to @ElKamina.
Maintain a table of dimension 27x27, where 27 = 1 null character + 26 letters.
table[i][j] denotes the length of the subsequence whose non-decreasing subsequence has last character 'i' and whose non-increasing subsequence has last character 'j' (index 0 denotes the null character and index k denotes character 'k').
for i = 0 to length of string S
    // Among the subsequences whose non-decreasing part ends in a character no larger than S[i], find one of maximum length. S[i] can then become part of that subsequence's non-decreasing part.
    int lim = S[i] - 'a' + 1;
    for(int k=0; k<27; k++){
        if(lim == k) continue;
        int tmax = 0;
        for(int j=0; j<=lim; j++){
            if(table[k][j] > tmax) tmax = table[k][j];
        }
        if(k == 0 && tmax == 0) table[0][lim] = 1;
        else if (tmax != 0) table[k][lim] = tmax + 1;
    }
    // Similarly for the non-increasing subsequence
Time complexity is O(lengthOf(S)*27*27) and space complexity is O(27*27).

How to find all combinations of a multiset in a string in linear time?

I am given a bag B (multiset) of characters with the size m and a string text S of size n. Is it possible to find all substrings that can be created by B (4!=24 combinations) in S in linear time O(n)?
Example:
S = abdcdbcdadcdcbbcadc (n=19)
B = {b, c, c, d} (m=4)
Result: {cdbc (Position 3), cdcb (Position 10)}
The fastest solution I found is to keep a counter for each character and compare it with the Bag in each step, thus the runtime is O(n*m). Algorithm can be shown if needed.
There is a way to do it in O(n), assuming we're only interested in substrings of length m (otherwise it's impossible, because for the bag that has all characters in the string, you'd have to return all substrings of s, which means a O(n^2) result that can't be computed in O(n)).
The algorithm is as follows:
Convert the bag to a histogram:
hist = []
for c in B do:
    hist[c] = hist[c] + 1
Initialize a running histogram that we're going to modify (histrunsum is the total count of characters in histrun):
histrun = []
histrunsum = 0
We need two operations: add a character to the histogram and remove it. They operate as follows:
add(c):
    if hist[c] > 0 and histrun[c] < hist[c] then:
        histrun[c] = histrun[c] + 1
        histrunsum = histrunsum + 1
remove(c):
    if histrun[c] > 0 then:
        histrun[c] = histrun[c] - 1
        histrunsum = histrunsum - 1
Essentially, histrun captures the amount of characters that are present in B in current substring. If histrun is equal to hist, our substring has the same characters as B. histrun is equal to hist iff histrunsum is equal to length of B.
Now add first m characters to histrun; if histrunsum is equal to length of B; emit first substring; now, until we reach the end of string, remove the first character of the current substring and add the next character.
add, remove are O(1) since hist and histrun are arrays; checking if hist is equal to histrun is done by comparing histrunsum to length(B), so it's also O(1). Loop iteration count is O(n), the resulting running time is O(n).
Thanks for the answer. The add() and remove() methods have to be changed to make the algorithm work correctly.
add(c):
    if hist[c] > 0 and histrun[c] < hist[c] then
        histrunsum++
    else
        histrunsum--
    histrun[c] = histrun[c] + 1
remove(c):
    if histrun[c] > hist[c] then
        histrunsum++
    else
        histrunsum--
    histrun[c] = histrun[c] - 1
Explanation:
histrunsum can be seen as a score of how identical both multisets are.
add(c): when there are fewer occurrences of a char in the histrun multiset than in the hist multiset, the additional occurrence of that char has to be "rewarded", since the histrun multiset is getting closer to the hist multiset. If the histrun multiset already contains at least as many occurrences of that char as hist, an additional one is counted negatively.
remove(c): like add(c), where the removal of a char is weighted positively when its count in the histrun multiset is greater than in the hist multiset.
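The scoring scheme just described can be sketched in Python as follows. This is an illustrative port with assumed names (score plays the role of histrunsum, and the bag is passed as a string), not the poster's code.

```python
from collections import Counter


def multiset_substrings(text: str, bag: str) -> list[int]:
    hist = Counter(bag)      # target multiset
    histrun = Counter()      # multiset of the current window
    m = len(bag)
    score = 0                # equals m exactly when histrun == hist
    result = []
    for i, c in enumerate(text):
        if i >= m:
            # Remove the character that slides out of the window.
            old = text[i - m]
            score += 1 if histrun[old] > hist[old] else -1
            histrun[old] -= 1
        # Add the character that slides into the window.
        score += 1 if histrun[c] < hist[c] else -1
        histrun[c] += 1
        if score == m:
            result.append(i + 1 - m)
    return result


print(multiset_substrings("abdcdbcdadcdcbbcadc", "bccd"))  # [3, 10], as in the question
```

Every character is added once and removed at most once, each in O(1), so the whole scan is O(n).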
Sample Code (PHP):
function multisetSubstrings($sequence, $mset)
{
    $multiSet = array();
    $substringLength = 0;
    foreach ($mset as $char)
    {
        $multiSet[$char]++;
        $substringLength++;
    }

    $sum = 0;
    $currentSet = array();
    $result = array();

    for ($i=0;$i<strlen($sequence);$i++)
    {
        if ($i>=$substringLength)
        {
            $c = $sequence[$i-$substringLength];

            if ($currentSet[$c] > $multiSet[$c])
                $sum++;
            else
                $sum--;

            $currentSet[$c]--;
        }

        $c = $sequence[$i];

        if ($currentSet[$c] < $multiSet[$c])
            $sum++;
        else
            $sum--;

        $currentSet[$c]++;

        echo $sum."<br>";

        if ($sum==$substringLength)
            $result[] = $i+1-$substringLength;
    }

    return $result;
}
Use hashing. Assign a UNIQUE prime number to each character in the multiset. Compute the hash of any string by multiplying together the primes associated with its characters, each prime appearing as many times as its character occurs.
Example: CATTA. Let C = 2, A = 3, T = 5. Hash = 2*3*5*5*3 = 450
Hash the multiset (treat it as a string). Now go through the input string and compute the hash of each substring of length k (where k is the number of characters in the multiset). Check whether this hash matches the multiset hash. If yes, it is one such occurrence.
The hashes can be computed very easily in linear time as follows :
Let multiset = { A, A, B, C }, A=2, B=3, C=5.
Multiset hash = 2*2*3*5 = 60
Let text = CABBAACCA
(i) CABB = 5*2*3*3 = 90
(ii) Now, the next letter is A, and the letter discarded is the first one, C. So the new hash = ( 90/5 )*2 = 36
(iii) Now, A is discarded, and A is also added, so new hash = ( 36/2 ) * 2= 36
(iv) Now B is discarded, and C is added, so hash = ( 36/3 ) * 5 = 60 = multiset hash. Thus we have found one such required occurence - BAAC
This procedure will obviously take O( n ) time.
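The rolling product can be sketched in Python as follows. Two choices here are assumptions of this sketch, not part of the answer: primes are assigned to every distinct character of the text as well as the multiset, and Python's arbitrary-precision integers are relied on so the product cannot overflow. With large multisets the products get big, so each multiplication is no longer strictly constant time.

```python
def first_primes(count: int) -> list[int]:
    # Simple trial division; enough for a small alphabet.
    primes = []
    candidate = 2
    while len(primes) < count:
        if all(candidate % p for p in primes):
            primes.append(candidate)
        candidate += 1
    return primes


def matches_by_prime_hash(text: str, bag: str) -> list[int]:
    alphabet = sorted(set(text) | set(bag))
    prime_of = dict(zip(alphabet, first_primes(len(alphabet))))
    k = len(bag)
    result = []
    if k == 0 or k > len(text):
        return result
    target = 1
    for ch in bag:
        target *= prime_of[ch]
    window = 1
    for ch in text[:k]:
        window *= prime_of[ch]
    if window == target:
        result.append(0)
    for i in range(k, len(text)):
        # Divide out the discarded character, multiply in the new one.
        window = window // prime_of[text[i - k]] * prime_of[text[i]]
        if window == target:
            result.append(i - k + 1)
    return result


print(matches_by_prime_hash("CABBAACCA", "AABC"))  # [3], the window BAAC
```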
