Changing letters of a string to obtain maximum score - string

You are given a string and can change at most Q letters in the string. You are also given a list of substrings (each two characters long), with a corresponding score. Each occurance of the substring within the string adds to your total score. What is the maximum possible attainable score?
String length <= 150, Q <= 100, Number of Substrings <= 700
Example:
String = bpdcg
Q = 2
Substrings:
bz - score: 2
zd - score: 5
dm - score: 7
ng - score: 10
In this example, you can achieve the maximum score b changing the "p" in the string to a "z" and the "c" to an "n". Thus, your new string is "bzdng" which has a score of 2+5+10 = 17.
I know that given a string which already has the letters changed, the score can be checked in linear time using a dictionary matching algorithm such as aho-corasick (or with a slightly worse complexity, Rabin Karp). However, trying each two letter substitution will take too long and then checking will take too long.
Another possible method I thought was to work backwards, to construct the ideal string from the given substrings and then check whether it differs by at most two characters from the original string. However, I am not sure how to do this, and even if it could be done, I think that it would also take too long.
What is the best way to go about this?

An efficient way to solve this is to use dynamic programming.
Let L be the set of letters that start any of the length-2 scoring substrings, and a special letter "*" which stands for any other letter than these.
Let S(i, j, c) be the maximum score possible in the string (up to index i) using j substitutions, where the string ends with character c (where c in L).
The recurrence relations are a bit messy (or at least, I didn't find a particularly beautiful formulation of them), but here's some code that computes the largest score possible:
infinity = 100000000
def S1(L1, L2, s, i, j, c, scores, cache):
key = (i, j, c)
if key not in cache:
if i == 0:
if c != '*' and s[0] != c:
v = 0 if j >= 1 else -infinity
else:
v = 0 if j >= 0 else -infinity
else:
v = -infinity
for d in L1:
for c2 in [c] if c != '*' else L2 + s[i]:
jdiff = 1 if s[i] != c2 else 0
score = S1(L1, L2, s, i-1, j-jdiff, d, scores, cache)
score += scores.get(d+c2 , 0)
v = max(v, score)
cache[key] = v
return cache[key]
def S(s, Q, scores):
L1 = ''.join(sorted(set(w[0] for w in scores))) + '*'
L2 = ''.join(sorted(set(w[1] for w in scores)))
return S1(L1, L2, s + '.', len(s), Q, '.', scores, {})
print S('bpdcg', 2, {'bz': 2, 'zd': 5, 'dm': 7, 'ng': 10})
There's some room for optimisation:
the computation isn't terminated early if j goes negative
when given a choice, every value of L2 is tried, whereas only letters that can complete a scoring word from d need trying.
Overall, if there's k different letters in the scoring words, the algorithm runs in time O(QN*k^2). With the second optimisation above, this can be reduced to O(QNw) where w is the number of scoring words.

Related

The algorithm receives a natural number N > 1 as input and builds a new number R from it as follows:

Python.
It's a problem:
The algorithm receives a natural number N > 1 as input and builds a new number R from it as follows:
We translate the number N into binary notation.
Invert all bits of the number except the first one.
Convert to decimal notation.
Add the result with the original number N.
The resulting number is the desired number R. Indicate the smallest odd number N for which the result of this algorithm is greater than 310. In your answer, write this number in decimal notation.
This is my solution:
for n in range(2, 10000):
s = bin(n)[2:]
for i in range(len(s)):
if s[i+1] == 0:
s[i] = '1'
else:
s[i] = 'k'
for i in range(len(s)):
if s[i] == 'k':
s[i] = '0'
h = int(s, 2)
r = h + n
if n % 2 == 1 and r > 310:
print(n)
break
So it doesn't work and i dont know why. I am now preparing for the exam, so I would be grateful if you could explain the reason to me
the bin function returns a string and my idea is to go through the binary elements of this string, starting from the second element, to replace 0 with 1, and 1 with k. Then iterate over the elements of a new line again and replace k with 0
Took me longer than I expected but feels good.
Comments might make it look chaotic but will make it easily understandable.
#since N is supposed to be odd and >1 the loop is being run from 3
for N in range(3, 10000,2):
#appending binary numbers to the list bin_li
bin_li=[]
bin_li.append((bin(N)[2:]))
for i in bin_li:
#print("bin_li item :",i)
#storing 1st digit to be escaped in j
j=i[:1]
#reversing the digits
for k in i[1:]:
if k=='0':
#putting together the digits after reversing
j=j+'1'
else:
j=j+'0'
#print("reversed item :",j) #note first digit is escaped
#converting back to decimal
dec=int(j,2)
R=dec+N
#print("current sum:---------" ,R)
if R > 310:
print("The number N :",N)
print("The reversed binary number:",dec)
print("Sum :",R)
break
#break will only break the inner loop
# for reference https://www.geeksforgeeks.org/how-to-break-out-of-multiple-loops-in-python/
else:
continue
break

Minimum number of subsequences required to interleave one string to another

I saw the original question Minimum of subsequences required to convert one string to another, and it's very similar to this quesion 1055. Shortest Way to Form String on LeetCode.
Description:
Given two strings source and target, return the minimum number of subsequences of source such that their concatenation equals target. If the task is impossible, return -1.
Example 1:
Input: source = "abc", target = "abcbc"
Output: 2
Explanation: The target "abcbc" can be formed by "abc" and "bc", which are subsequences of source "abc".
Supposed s' is a subsequence of source, so this problem is to find s'_1s'_2...s'_k to form target. My question is how to find the minimum number of subsequences required to interleave one string to another.
eg:
Input: source = "adbsc", target = "addsbc"
Output: 2
Explanation:
step1: s'1 = adbc, then target' = ds
step2: s'2 = ds, then target' = ""
I don't know whether it can slove this problem by removing the longest common subsequence of source and target' to form t', and repeat it untill t' = "".
Here is my code:
class Solution:
def shortestWay(self, s: str, t: str) -> int:
def lcs(s: str, t: str) -> str:
m, n = len(s), len(t)
dp = [[0] * (n + 1) for _ in range(m + 1)]
for i in range(1, m + 1):
for j in range(1, n + 1):
if s[i-1] == t[j-1]:
dp[i][j] = dp[i-1][j-1] + 1
else:
dp[i][j] = max(dp[i-1][j], dp[i][j-1])
choose = [False] * n
i, j = m, n
while i + j != 0:
if s[i-1] == t[j-1]:
choose[j-1] = True
i, j = i - 1, j - 1
elif dp[i][j] == dp[i-1][j]:
i -= 1
elif dp[i][j] == dp[i][j-1]:
j -= 1
print(f'lcs({s}, {t}) is', ''.join(t[i] for i, v in enumerate(choose) if v))
return ''.join(t[i] for i, v in enumerate(choose) if not v)
ans = 0
while t:
ans += 1
t1 = lcs(s, t)
print(repr(t1))
if t1 == t:
return -1
t = t1
return ans
"""
lcs(adbsc, addsbc) is adbc
'ds'
lcs(adbsc, ds) is ds
''
"""
Is anybody can help me to proof/solve this problem, thanks!
Your working for your second example (source = "abc", target = "abcbc") doesn't make sense to me. Your algorithm idea (repeatedly removing the LCS from target) produces an optimal solution -- you just need to prove that no other solution can be better.
Proof sketch
Consider an optimal solution, containing OPT segments. If it equals your algorithm's solution then we're done; otherwise, it must consist of some number of common subsequence segments, the first k (possibly k=0) of which match the segments produced by your algorithm, followed by a (k+1)-th segment that is strictly shorter than your algorithm's (k+1)-th segment (since your algorithm always chooses the longest possible segment to add at each stage), followed by some number (possibly zero) of remaining segments.
Notice that if some subsequence of a string S is equal to a string T, then for any given suffix of T, we can certainly find a subsequence of S equal to it -- all we need to do is drop some of the initial characters from the subsequence.
So, getting back to our original problem: Some initial part of the remaining segments (possibly involving multiple segments) can be trimmed off to produce a list of segments that, when appended to the first k+1 segments produced by your algorithm, gives a solution that:
agrees with the first k+1 segments of your algorithm's solution, and
has segment count no worse than OPT.
The optimal solution we started with agreed with your algorithm's solution on the first k segments; this new optimal solution agrees on at least one more segment. If the new solution is not yet completely equal to your algorithm's solution, then the analysis can be repeated, producing a new solution that agrees with it on at least the first k+2 segments. This can be repeated until we ultimately produce a solution that agrees completely with your algorithm's solution and has length OPT. Since we made no assumptions about the input instance, this proves your algorithm produces an optimal solution on every input instance.

Find the maximum value of K such that sub-sequences A and B exist and should satisfy the mentioned conditions

Given a string S of length n. Choose an integer K and two non-empty sub-sequences A and B of length K such that it satisfies the following conditions:
A = B i.e. for each i the ith character in A is same as the ith character in B.
Let's denote the indices used to construct A as a1,a2,a3,...,an where ai belongs to S and B as b1,b2,b3,...,bn where bi belongs to S. If we denote the number of common indices in A and B by M then M + 1 <= K.
Find the maximum value of K such that it is possible to find the sub-sequences A and B which satisfies the above conditions.
Constraints:
0 < N <= 10^5
Things which I observed are:
The value of K = 0 if the number of characters in the given string are all distinct i.e S = abcd.
K = length of S - 1 if all the characters in the string are same i.e. S = aaaa.
The value of M cannot be equal to K because then M + 1 <= K will not be true i.e you cannot have a sub-sequence A and B that satifies A = B and a1 = b1, a2 = b2, a3 = b3, ..., an = bn.
If the string S is palindrome then K = (Total number of times a character is repeated in the string if the repeatation count > 1) - 1. i.e. S = tenet then t is repeated 2 times, e is repeated 2 times, Total number of times a character is repeated = 4, K = 4 - 1 = 3.
I am having trouble designing the algorithm to solve the above problem.
Let me know in the comments if you need more clarification.
(Update: see O(n) answer.)
We can modify the classic longest common subsequence recurrence to take an extra parameter.
JavaScript code (not memoised) that I hope is self explanatory:
function f(s, i, j, haveUncommon){
if (i < 0 || j < 0)
return haveUncommon ? 0 : -Infinity
if (s[i] == s[j]){
if (haveUncommon){
return 1 + f(s, i-1, j-1, true)
} else if (i == j){
return Math.max(
1 + f(s, i-1, j-1, false),
f(s, i-1, j, false),
f(s, i, j-1, false)
)
} else {
return 1 + f(s, i-1, j-1, true)
}
}
return Math.max(
f(s, i-1, j, haveUncommon),
f(s, i, j-1, haveUncommon)
)
}
var s = "aabcde"
console.log(f(s, s.length-1, s.length-1, false))
I believe we are just looking for the closest equal pair of characters since the only characters excluded from A and B would be one of the characters in the pair and any characters in between.
Here's O(n) in JavaScript:
function f(s){
let map = {}
let best = -1
for (let i=0; i<s.length; i++){
if (!map.hasOwnProperty(s[i])){
map[s[i]] = i
continue
}
best = Math.max(best, s.length - i + map[s[i]])
map[s[i]] = i
}
return best
}
var strs = [
"aabcde", // 5
"aaababcd", // 7
"aebgaseb", // 4
"aefttfea",
// aeft fea
"abcddbca",
// abcd bca,
"a" // -1
]
for (let s of strs)
console.log(`${ s }: ${ f(s) }`)
O(n) solution in Python3:
def compute_maximum_k(word):
last_occurences = {}
max_k = -1
for i in range(len(word)):
if(not last_occurences or not word[i] in last_occurences):
last_occurences[word[i]] = i
continue
max_k = max(max_k,(len(word) - i) + last_occurences[word[i]])
last_occurences[word[i]] = i
return max_k
def main():
words = ["aabcde","aaababcd","aebgaseb","aefttfea","abcddbca","a","acbdaadbca"]
for word in words:
print(compute_maximum_k(word))
if __name__ == "__main__":
main()
A solution for the maximum length substring would be the following:
After building a Suffix Array you can derive the LCP Array. The maximum value in the LCP array corresponds to the K you are looking for. The overall complexity of both constructions is O(n).
A suffix array will sort all prefixes in you string S in ascending order. The longest common prefix array then computes the lengths of the longest common prefixes (LCPs) between all pairs of consecutive suffixes in the sorted suffix array. Thus the maximum value in this array corresponds to the length of the two maximum length substrings of S.
For a nice example using the word "banana", check out the LCP Array Wikipage
I deleted my previous answer as I don't think we need an LCS-like solution (LCS=longest Common Subsequence).
It is sufficient to find the couple of subsequences (A, B) that differ in one character and share all the others.
The code below finds the solution in O(N) time.
def function(word):
dp = [0]*len(word)
lastOccurences = {}
for i in range(len(dp)-1, -1, -1):
if i == len(dp)-1:
dp[i] = 0
else:
if dp[i+1] > 0:
dp[i] = 1 + dp[i+1]
elif word[i] in lastOccurences:
dp[i] = len(word)-lastOccurences[word[i]]
lastOccurences[word[i]] = i
return dp[0]
dp[i] is equal to 0 when all characters from i to the end of the string are different.
I will explain my code by an example.
For "abcack", there are two cases:
Either the first 'a' will be shared by the two subsequences A and B, in this case the solution will be = 1 + function("bcack")
Or 'a' will not be shared between A and B. In this case the result will be 1 + "ck". Why 1 + "ck" ? It's because we have already satisfied M+1<=K so just add all the remaining characters. In terms of indices, the substrings are [0, 4, 5] and [3, 4, 5].
We take the maximum between these two cases.
The reason I'm scanning right to left is to not have O(N) search for the current character in the rest of the string, I maintain the index of the last visited occurence of the character in the dict lastOccurences.

String permutations rank + data structure

The problem at hand is:
Given a string. Tell its rank among all its permutations sorted
lexicographically.
The question can be attempted mathematically, but I was wondering if there was some other algorithmic method to calculate it ?
Also if we have to store all the string permutations rankwise , how can we generate them efficiently (and what would be the complexity) . What would be a good data structure for storing the permutations and which is also efficient for retrieval?
EDIT
Thanks for the detailed answers on the permutations generation part, could someone also suggest a good data structure? I have only been able to think of trie tree.
There is an O(n|Σ|) algorithm to find the rank of a string of length n in the list of its permutations. Here, Σ is the alphabet.
Algorithm
Every permutation which is ranked below s can be written uniquely in the form pcx; where:
p is a proper prefix of s
c is a character ranked below the character appearing just after p in s. And c is also a character occurring in the part of s not included in p.
x is any permutation of the remaining characters occurring in s; i.e. not included in p or c.
We can count the permutations included in each of these classes by iterating through each prefix of s in increasing order of length, while maintaining the frequency of the characters appearing in the remaining part of s, as well as the number of permutations x represents. The details are left to the reader.
This is assuming the arithmetic operations involved take constant time; which it wont; since the numbers involved can have nlog|Σ| digits. With this consideration, the algorithm will run in O(n2 log|Σ| log(nlog|Σ|)). Since we can add, subtract, multiply and divide two d-digit numbers in O(dlogd).
C++ Implementation
typedef long long int lli;
lli rank(string s){
int n = s.length();
vector<lli> factorial(n+1,1);
for(int i = 1; i <= n; i++)
factorial[i] = i * factorial[i-1];
vector<int> freq(26);
lli den = 1;
lli ret = 0;
for(int i = n-1; i >= 0; i--){
int si = s[i]-'a';
freq[si]++;
den *= freq[si];
for(int c = 0; c < si; c++)
if(freq[c] > 0)
ret += factorial[n-i-1] / (den / freq[c]);
}
return ret + 1;
}
This is similar to the quickselect algorithm. In an unsorted array of integers, find the index of some particular array element. The partition element would be the given string.
Edit:
Actually it is similar to partition method done in QuickSort. The given string is the partition element.Once all permutations are generated, the complexity to find the rank for strings with length k would be O(nk). You can generate string permutations using recursion and store them in a linked list. You can pass this linked list to the partition method.
Here's the java code to generate all String permutations:
private static int generateStringPermutations(String name,int currIndex) {
int sum = 0;
for(int j=name.length()-1;j>=0;j--) {
for(int i=j-1;((i<j) && (i>currIndex));i--) {
String swappedString = swapCharsInString(name,i,j);
list.add(swappedString);
//System.out.println(swappedString);
sum++;
sum = sum + generateStringPermutations(swappedString,i);
}
}
return sum;
}
Edit:
Generating all permutations is costly. If a string contains distinct characters, the rank can be determined without generating all permutations. Here's the link.
This can be extended for cases where there are repeating characters.
Instead of x * (n-1)! which is for distinct cases mentioned as in the link,
For repeating characters it will be:
if there is 1 character which is repeating twice,
x* (n-1)!/2!
Let's take an example. For string abca the combinations are:
aabc,aacb,abac,abca,acab,acba,baac,baca,bcaa,caab,caba,cbaa (in sorted order)
Total combinations = 4!/2! = 12
if we want to find rank of 'bcaa' then we know all strings starting with 'a' are before which is 3! = 6.
Note that because 'a' is the starting character, the remaining characters are a,b,c and there are no repetitions so it is 3!. We also know strings starting with 'ba' will be before which is 2! = 2 so it's rank is 9.
Another example. If we want to find the rank of 'caba':
All strings starting with a are before = 6.
All strings starting with b are before = 3!/2! = 3 (Because once we choose b, we are left with a,a,c and because there are repetitions it is 3!/2!.
All strings starting with caa will be before which is 1
So the final rank is 11.
From GeeksforGeeks:
Given a string, find its rank among all its permutations sorted
lexicographically. For example, rank of “abc” is 1, rank of “acb” is
2, and rank of “cba” is 6.
For simplicity, let us assume that the string does not contain any
duplicated characters.
One simple solution is to initialize rank as 1, generate all
permutations in lexicographic order. After generating a permutation,
check if the generated permutation is same as given string, if same,
then return rank, if not, then increment the rank by 1. The time
complexity of this solution will be exponential in worst case.
Following is an efficient solution.
Let the given string be “STRING”. In the input string, ‘S’ is the
first character. There are total 6 characters and 4 of them are
smaller than ‘S’. So there can be 4 * 5! smaller strings where first
character is smaller than ‘S’, like following
R X X X X X I X X X X X N X X X X X G X X X X X
Now let us Fix S’ and find the smaller strings staring with ‘S’.
Repeat the same process for T, rank is 4*5! + 4*4! +…
Now fix T and repeat the same process for R, rank is 4*5! + 4*4! +
3*3! +…
Now fix R and repeat the same process for I, rank is 4*5! + 4*4! +
3*3! + 1*2! +…
Now fix I and repeat the same process for N, rank is 4*5! + 4*4! +
3*3! + 1*2! + 1*1! +…
Now fix N and repeat the same process for G, rank is 4*5! + 4*4 + 3*3!
+ 1*2! + 1*1! + 0*0!
Rank = 4*5! + 4*4! + 3*3! + 1*2! + 1*1! + 0*0! = 597
Since the value of rank starts from 1, the final rank = 1 + 597 = 598

Non increasing and Non Decreasing Subsequence

Finding non-decreasing subsequence is well known problem.
But this Question is a slight variant of the finding longest non-decreasing subsequence. In this problem we have to find the length of longest subsequence which comprises 2 disjoint sequences 1. non decreasing 2. non-increasing.
e.g. in string "aabcazcczba" longest such sequence is aabczcczba. aabczcczba is made up of 2 disjoint subsequence aabcZccZBA. (capital letter shows non-increasing sequence)
My algorithm is
length = 0
For i = 0 to length of given string S
let s' = find the longest non-decreasing subsequence starting at position i
let s" = find the longest non-increasing subsequence from S-s'.
if (length of s' + length of s") > length
length = (length of s' + length of s")
enter code here
But I am not sure whether this would give correct answer or not. Can you find a bug in this algo and if there is bug also suggest correct algorithm. Also I need to optimize the solution. My algorithm would take roughly o(n^4) steps.
Your solution is definitely incorrect. Eg. addddbc. The longest non-decreasing sequence is adddd, but that would never give you a non-increasing sequence. The optimal solution is abc and dddd ( or ab ddddc, or ac ddddb).
One solution is to use dynamic programming.
F(i, x, a, b) = 1, if there is a non-decreasing and non-increasing combo from first i letters of x ( x[:i]) such that last letter of non-decreasing part is a, and non-increasing part is b. Both of these letters equal to NULL if the corresponding sub-sequence is empty.
Otherwise F(i, x, a, b) = 0.
F(i+1,x,x[i+1],b) = 1 if there exists a and b such that
a<=x[i+1] or a=NULL and F(i,x,a,b)=1. 0 otherwise.
F(i+1,x,a,x[i+1]) = 1 if there exists a and b such that
b>=x[i+1] or b=NULL and F(i,x,a,b)=1. 0 otherwise.
Initialize F(0,x,NULL,NULL)=1 and iterate from i=1..n
As you can see, you can get F(i+1, x, a, b) from F(i, x, a, b). Complexity: Linear in length, polynomial in size of the alphabet.
I got the answer, And here is how it works, thanx to #ElKamina
maintain a table of 27X27 dimension. 27 = (1 Null character + 26 (alphabets))
table[i][j] denotes the length of the sub sequence whose non decreasing subsequence has last character 'i' and non increasing subsequence has last character 'j' (0th index denote null character and kth index denotes character 'k')
for i = 0 to length of string S
//subsequence whose non decreasing subsequence's last character is smaller than S[i], find such a subsequence of maximum length. Now S[i] can be part of this subsequence's non-decreasing part.
int lim = S[i] - 'a' + 1;
for(int k=0; k<27; k++){
if(lim == k) continue;
int tmax = 0;
for(int j=0; j<=lim; j++){
if(table[k][j] > tmax) tmax = table[k][j];
}
if(k == 0 && tmax == 0) table[0][lim] = 1;
else if (tmax != 0) table[k][lim] = tmax + 1;
}
//Simillarly for non-increasing subsequence
Time complexity is o(lengthOf(S)*27*27) and space complexity is o(27*27)

Resources