How to find the lexicographically smallest string by reversing a substring?

I have a string S which consists of a's and b's. Perform the operation below exactly once; the objective is to obtain the lexicographically smallest string.
Operation: Reverse exactly one substring of S
e.g.
if S = abab then Output = aabb (reverse ba of string S)
if S = abba then Output = aabb (reverse bba of string S)
My approach
Case 1: If all characters of the input string are same then output will be the string itself.
Case 2: if S is of the form aaaaaaa....bbbbbb.... then answer will be S itself.
Otherwise: find the first occurrence of b in S, say at position i. String S will look like
aa...bbb...aaaa...bbbb....aaaa....bbbb....aaaaa...
     |
     i
In order to obtain the lexicographically smallest string the substring that will be reversed starts from index i. See below for possible ending j.
aa...bbb...aaaa...bbbb....aaaa....bbbb....aaaaa...
     |        |              |                |
     i        j              j                j
Reverse substring S[i:j] for every j and find the smallest string.
The complexity of the algorithm will be O(|S|*|S|) where |S| is the length of the string.
Is there a better way to solve this problem, perhaps an O(|S|) solution?
What I am thinking is that if we can pick the correct j in linear time, then we are done. We would pick the j that ends the longest run of a's. If there is a single maximum then the problem is solved, but what if there are several? I have tried a lot. Please help.
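As a baseline, the quadratic approach described in the question can be sketched in Python as follows. The function name `smallest_after_one_reversal` is made up for illustration, and the sketch tries every a after the first b as an end-point rather than only the ends of runs, which is a superset of the candidates listed above.

```python
def smallest_after_one_reversal(s):
    """Brute-force baseline: fix i at the first 'b', try every later 'a' as j."""
    i = s.find('b')
    if i == -1:
        return s  # only a's: every reversal leaves the string unchanged
    best = s      # reversing a single character is always allowed
    for j in range(i + 1, len(s)):
        if s[j] != 'a':
            continue  # a reversal ending in 'b' cannot put an 'a' at position i
        candidate = s[:i] + s[i:j + 1][::-1] + s[j + 1:]
        if candidate < best:
            best = candidate
    return best


print(smallest_after_one_reversal("abab"))  # aabb
print(smallest_after_one_reversal("abba"))  # aabb
```

Each candidate costs O(|S|) to build and compare, and there are up to |S| candidates, which is where the O(|S|*|S|) bound comes from.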

So, I came up with an algorithm that seems to be more efficient than O(|S|^2), but I'm not quite sure of its complexity. Here's a rough outline:
1) Strip off the leading a's, storing them in the variable start.
2) Group the rest of the string into letter chunks.
3) Find the indices of the groups with the longest sequences of a's.
4) If only one index remains, proceed to 10.
5) Filter these indices so that the length of the [first] group of b's after reversal is at a minimum.
6) If only one index remains, proceed to 10.
7) Filter these indices so that the length of the [first] group of a's (not including the leading a's) after reversal is at a maximum.
8) If only one index remains, proceed to 10.
9) Go back to 5, except inspect the [second/third/...] groups of a's and b's this time.
10) Return start, plus the reversed groups up to index, plus the remaining groups.
Since any substring that is being reversed begins with a b and ends in an a, no two hypothesized reversals are palindromes and thus two reversals will not result in the same output, guaranteeing that there is a unique optimal solution and that the algorithm will terminate.
My intuition says this approach is probably O(log(|S|)*|S|), but I'm not too sure. An example implementation (albeit not a very good one) in Python is provided below.
from itertools import groupby

def get_next_bs(i, groups, off):
    d = 1 + 2*off
    before_bs = len(groups[i-d]) if i >= d else 0
    after_bs = len(groups[i+d]) if i <= d and len(groups) > i + d else 0
    return before_bs + after_bs

def get_next_as(i, groups, off):
    d = 2*(off + 1)
    return len(groups[d+1]) if i < d else len(groups[i-d])

def maximal_reversal(s):
    # example input: 'aabaababbaababbaabbbaa'
    first_b = s.find('b')
    start, rest = s[:first_b], s[first_b:]
    # 'aa', 'baababbaababbaabbbaa'
    groups = [''.join(g) for _, g in groupby(rest)]
    # ['b', 'aa', 'b', 'a', 'bb', 'aa', 'b', 'a', 'bb', 'aa', 'bbb', 'aa']
    try:
        max_length = max(len(g) for g in groups if g[0] == 'a')
    except ValueError:
        return s # no a's after the start, no reversal needed
    indices = [i for i, g in enumerate(groups) if g[0] == 'a' and len(g) == max_length]
    # [1, 5, 9, 11]
    off = 0
    while len(indices) > 1:
        min_bs = min(get_next_bs(i, groups, off) for i in indices)
        indices = [i for i in indices if get_next_bs(i, groups, off) == min_bs]
        # off 0: [1, 5, 9], off 1: [5, 9], off 2: [9]
        if len(indices) == 1:
            break
        max_as = max(get_next_as(i, groups, off) for i in indices)
        indices = [i for i in indices if get_next_as(i, groups, off) == max_as]
        # off 0: [1, 5, 9], off 1: [5, 9]
        off += 1
    i = indices[0]
    groups[:i+1] = groups[:i+1][::-1]
    return start + ''.join(groups)
    # 'aaaabbabaabbabaabbbbaa'

TL;DR: Here's an algorithm that only iterates over the string once (with O(|S|)-ish complexity for limited string lengths). The example with which I explain it below is a bit long-winded, but the algorithm is really quite simple:
Iterate over the string, and update its value interpreted as a reverse (lsb-to-msb) binary number.
If you find the last zero of a sequence of zeros that is longer than the current maximum, store the current position, and the current reverse value. From then on, also update this value, interpreting the rest of the string as a forward (msb-to-lsb) binary number.
If you find the last zero of a sequence of zeros that is as long as the current maximum, compare the current reverse value with the current value of the stored end-point; if it is smaller, replace the end-point with the current position.
So you're basically comparing the value of the string if it were reversed up to the current point, with the value of the string if it were only reversed up to a (so-far) optimal point, and updating this optimal point on-the-fly.
Here's a quick code example; it could undoubtedly be coded more elegantly:
function reverseSubsequence(str) {
    var reverse = 0, max = 0, first, last, value, len = 0, unit = 1;
    for (var pos = 0; pos < str.length; pos++) {
        var digit = str.charCodeAt(pos) - 97;                    // read next digit
        if (first == undefined) {
            if (digit == 0) continue;                            // skip leading zeros
            first = pos;                                         // end of leading zeros
        }
        reverse += unit * digit;                                 // update reverse value
        unit <<= 1;
        value = value * 2 + digit;                               // update endpoint value
        // compare only after both values include the current digit,
        // as in the step-by-step example below
        if (digit == 0) {
            if (++len > max || len == max && reverse < value) {  // better endpoint found
                max = len;
                last = pos;
                value = reverse;
            }
        } else {
            len = 0;
        }
    }
    return {from: first || 0, to: last || 0};
}
var result = reverseSubsequence("aaabbaabaaabbabaaabaaab");
document.write(result.from + "→" + result.to);
(The code could be simplified by comparing reverse and value whenever a zero is found, and not just when the end of a maximally long sequence of zeros is encountered.)
You can create an algorithm that only iterates over the input once, and can process an incoming stream of unknown length, by keeping track of two values: the value of the whole string interpreted as a reverse (lsb-to-msb) binary number, and the value of the string with one part reversed. Whenever the reverse value goes below the value of the stored best end-point, a better end-point has been found.
Consider this string as an example:
aaabbaabaaabbabaaabaaab
or, written with zeros and ones for simplicity:
00011001000110100010001
We iterate over the leading zeros until we find the first one:
0001
^
This is the start of the sequence we'll want to reverse. We will start interpreting the stream of zeros and ones as a reversed (lsb-to-msb) binary number and update this number after every step:
reverse = 1, unit = 1
Then at every step, we double the unit and update the reverse number:
0001 reverse = 1
00011 unit = 2; reverse = 1 + 1 * 2 = 3
000110 unit = 4; reverse = 3 + 0 * 4 = 3
0001100 unit = 8; reverse = 3 + 0 * 8 = 3
At this point we find a one, and the sequence of zeros comes to an end. It contains 2 zeros, which is currently the maximum, so we store the current position as a possible end-point, and also store the current reverse value:
endpoint = {position = 6, value = 3}
Then we go on iterating over the string, but at every step, we update the value of the possible endpoint, but now as a normal (msb-to-lsb) binary number:
00011001 unit = 16; reverse = 3 + 1 * 16 = 19
endpoint.value = 3 * 2 + 1 = 7
000110010 unit = 32; reverse = 19 + 0 * 32 = 19
endpoint.value = 7 * 2 + 0 = 14
0001100100 unit = 64; reverse = 19 + 0 * 64 = 19
endpoint.value = 14 * 2 + 0 = 28
00011001000 unit = 128; reverse = 19 + 0 * 128 = 19
endpoint.value = 28 * 2 + 0 = 56
At this point we find that we have a sequence of 3 zeros, which is longer than the current maximum of 2, so we throw away the end-point we had so far and replace it with the current position and reverse value:
endpoint = {position = 10, value = 19}
And then we go on iterating over the string:
000110010001 unit = 256; reverse = 19 + 1 * 256 = 275
endpoint.value = 19 * 2 + 1 = 39
0001100100011 unit = 512; reverse = 275 + 1 * 512 = 787
endpoint.value = 39 * 2 + 1 = 79
00011001000110 unit = 1024; reverse = 787 + 0 * 1024 = 787
endpoint.value = 79 * 2 + 0 = 158
000110010001101 unit = 2048; reverse = 787 + 1 * 2048 = 2835
endpoint.value = 158 * 2 + 1 = 317
0001100100011010 unit = 4096; reverse = 2835 + 0 * 4096 = 2835
endpoint.value = 317 * 2 + 0 = 634
00011001000110100 unit = 8192; reverse = 2835 + 0 * 8192 = 2835
endpoint.value = 634 * 2 + 0 = 1268
000110010001101000 unit = 16384; reverse = 2835 + 0 * 16384 = 2835
endpoint.value = 1268 * 2 + 0 = 2536
Here we find that we have another sequence with 3 zeros, so we compare the current reverse value with the end-point's value, and find that the stored endpoint has a lower value:
endpoint.value = 2536 < reverse = 2835
so we keep the end-point set to position 10 and we go on iterating over the string:
0001100100011010001 unit = 32768; reverse = 2835 + 1 * 32768 = 35603
endpoint.value = 2536 * 2 + 1 = 5073
00011001000110100010 unit = 65536; reverse = 35603 + 0 * 65536 = 35603
endpoint.value = 5073 * 2 + 0 = 10146
000110010001101000100 unit = 131072; reverse = 35603 + 0 * 131072 = 35603
endpoint.value = 10146 * 2 + 0 = 20292
0001100100011010001000 unit = 262144; reverse = 35603 + 0 * 262144 = 35603
endpoint.value = 20292 * 2 + 0 = 40584
And we find another sequence of 3 zeros, so we compare this position to the stored end-point:
endpoint.value = 40584 > reverse = 35603
and we find it has a smaller value, so we replace the possible end-point with the current position:
endpoint = {position = 21, value = 35603}
And then we iterate over the final digit:
00011001000110100010001 unit = 524288; reverse = 35603 + 1 * 524288 = 559891
endpoint.value = 35603 * 2 + 1 = 71207
So at the end we find that position 21 gives us the lowest value, so it is the optimal solution:
00011001000110100010001 -> 00000010001011000100111
^ ^
start = 3 end = 21
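The walkthrough above can be condensed into a short Python sketch. It is a simplified variant rather than a port of the JavaScript: it compares the two values at every zero (the simplification mentioned earlier) instead of tracking run lengths, and it starts from the unreversed string as the initial candidate. The name `best_reversal` is made up for illustration. Python integers have arbitrary precision, so there is no 64-character limit, but arithmetic on very long numbers is no longer constant time, so this should not be read as a strictly linear algorithm.

```python
def best_reversal(s):
    """Return (first, last): reversing s[first..last] gives the smallest string."""
    first = s.find('b')
    if first == -1:
        return (0, 0)          # only a's, nothing to improve
    last = first               # reversing one character changes nothing
    reverse = 0                # s[first..pos] reversed, read as a binary number
    value = 0                  # best candidate so far: s[first..last] reversed,
                               # followed by s[last+1..pos] read forward
    unit = 1
    for pos in range(first, len(s)):
        digit = 1 if s[pos] == 'b' else 0
        reverse += unit * digit
        unit <<= 1
        value = value * 2 + digit
        # both numbers now cover s[first..pos], so they can be compared directly
        if digit == 0 and reverse < value:
            last, value = pos, reverse
    return (first, last)


print(best_reversal("aaabbaabaaabbabaaabaaab"))  # (3, 21)
```

Because every candidate is extended by the same remaining digits, whichever candidate is smaller at position pos stays smaller for the rest of the string, which is why a single stored end-point is enough.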
Here's a C++ version that uses a vector of bool instead of integers. It can parse strings longer than 64 characters, but the complexity is probably quadratic.
#include <string>
#include <vector>

struct range {unsigned int first; unsigned int last;};

range lexiLeastRev(std::string const &str) {
    unsigned int len = str.length(), first = 0, last = 0, run = 0, max_run = 0;
    std::vector<bool> forward(0), reverse(0);
    bool leading_zeros = true;
    for (unsigned int pos = 0; pos < len; pos++) {
        bool digit = str[pos] - 'a';
        if (leading_zeros) {
            if (!digit) continue;               // skip leading zeros
            leading_zeros = false;
            first = pos;
        }
        // update both values first, so they cover the same digits when compared
        forward.push_back(digit);
        reverse.insert(reverse.begin(), digit);
        if (!digit) {
            if (++run > max_run || (run == max_run && reverse < forward)) {
                max_run = run;
                last = pos;
                forward = reverse;
            }
        }
        else {
            run = 0;
        }
    }
    return range {first, last};
}

Related

Smallest window (substring) that has both uppercase and corresponding lowercase characters

I was asked the following question in an onsite interview:
A string is considered "balanced" when every letter in the string appears both in uppercase and lowercase. For e.g., CATattac is balanced (a, c, t occur in both cases), while Madam is not (a, d only appear in lowercase). Write a function that, given a string, returns the shortest balanced substring of that string. For e.g.,:
“azABaabza” should return “ABaab”
“TacoCat” should return -1 (not balanced)
“AcZCbaBz” should return the entire string
Doing it with the brute force approach is trivial - calculating all the pairs of substrings and then checking if they are balanced, while keeping track of the size and starting index of the smallest one.
How do I optimize? I have a strong feeling it can be done with a sliding-window/two-pointer approach, but I am not sure how. When to update the pointers of the sliding window?
Edit: Removing the sliding-window tag since this is not a sliding-window problem (as discussed in the comments).

This relies on a special property of the input: there are only 26 uppercase letters and 26 lowercase letters.
For each of the 26 letters j and each starting position i, let len[i][j] be the minimum length of a substring starting at position i that contains both the uppercase and the lowercase form of letter j.
Demo C++ code:
string s = "CATattac";
// if len[i] >= s.size() + 1, it denotes there is no matching
vector<vector<int>> len(s.size(), vector<int>(26, 0));
for (int i = 0; i < 26; ++i) {
    int upperPos = s.size() * 2;
    int lowerPos = s.size() * 2;
    for (int j = s.size() - 1; j >= 0; --j) {
        if (s[j] == 'A' + i) {
            upperPos = j;
        } else if (s[j] == 'a' + i) {
            lowerPos = j;
        }
        len[j][i] = max(lowerPos - j + 1, upperPos - j + 1);
    }
}
We also keep track of the count of characters.
// cnt[i][j] denotes the number of characters j in substring s[0..i-1]
// cnt[0][j] is always 0
vector<vector<int>> cnt(s.size() + 1, vector<int>(26, 0));
for (int i = 0; i < s.size(); ++i) {
    for (int j = 0; j < 26; ++j) {
        cnt[i + 1][j] = cnt[i][j];
        if (s[i] == 'A' + j || s[i] == 'a' + j) {
            ++cnt[i + 1][j];
        }
    }
}
Then we can loop over s.
int m = s.size() + 1;
for (int i = 0; i < s.size(); ++i) {
    bool done = false;
    int minLen = 1;
    while (!done && i + minLen <= s.size()) {
        // execute at most 26 times, a new character must be added to change minLen
        int prevMinLen = minLen;
        done = true;
        for (int j = 0; j < 26 && i + minLen <= s.size(); ++j) {
            if (cnt[i + minLen][j] - cnt[i][j] > 0) {
                // character j exists in the substring, have to find pair of it
                minLen = max(minLen, len[i][j]);
            }
        }
        if (prevMinLen != minLen) done = false;
    }
    // find overall minLen
    if (i + minLen <= s.size())
        m = min(m, minLen);
    cout << minLen << '\n';
}
Output: (if i + minLen <= s.size(), it is valid. Otherwise substring doesn't exist if starting at that position)
The invalid output difference is due to how the array len is generated.
8
4
15
14
13
12
11
10
I'm not sure whether there is a simpler solution but it is the best I could think of right now.
Time complexity: O(N) with a constant of 26 * 26

Edit: I previously had O(nlog(n)) due to an unnecessary binary search.
I thought of a solution, which is technically O(n), where n is the length of the string, but the constant is pretty large.
For simplicity's sake, let's consider an analogous situation with only two letters, A and B (and their lowercase counterparts), and let l be the size of the alphabet for future reference. I worked on an example string ABabBaaA.
We start by computing the prefix counts of the number of occurrences of each letter. In this case, we get
i: 0, 1, 2, 3, 4, 5, 6, 7, 8
----------------------------
A: 0, 1, 1, 1, 1, 1, 1, 1, 2
a: 0, 0, 0, 1, 1, 1, 2, 3, 3
B: 0, 0, 1, 1, 1, 2, 2, 2, 2
b: 0, 0, 0, 0, 1, 1, 1, 1, 1
This way, assuming we are indexing the string starting from 1 (for implementation's sake you can add an extra character to the beginning, like a dollar sign $), we can get the number of occurrences of each letter on any substring in constant time (or rather -- in O(l), but in my case l is set to 2 and in your case l = 26 so technically this is constant time).
OK now we prepare arrays / vectors / queues of character indices, so if the character A appears on indices 1 and 8, the structure will consist of 1 and 8. We get
A: 1, 8
a: 3, 6, 7
B: 2, 5
b: 4
What is important, is that in arrays and vectors, we can look up certain "lowest element greater than" in amortized constant time by discarding indices which are smaller than every index one by one.
Now, the algorithm. Starting at each (left) index greater than 0, we will find the earliest right index for which the substring bounded by [left_index, right_index] is balanced. We do that as follows:
0) Start with left_index = right_index = i for i = 1, ..., n.
1) Read the array of prefix counts for right_index and subtract the prefix counts for left_index - 1, receiving the counts for the substring [left_index, right_index]. Find any letter which fails the "balance" check. If there is none, you have found the shortest balanced substring starting at left_index.
2) Find the first occurrence of the "missing" letter greater than left_index. If there is no such occurrence, no balanced substring starts at left_index. Otherwise set right_index to the index of that occurrence and go to step 1, keeping the modified right_index.
For example: starting with left_index = right_index = 1 we see that the number of occurrences of each letter in the substring is 1, 0, 0, 0, so a fails the check. The earliest occurrence of a is 3, so we set right_index = 3. We go back to step 1 receiving a new array of occurrences: 1, 1, 1, 0. Now b fails the check, and its earliest occurrence greater than 1 is 4, so we set right_index to 4. We go to step 1 receiving an array of occurrences 1, 1, 1, 1, which passes the balance check.
Another example: starting with left_index = right_index = 2 we get in step 1 an array of occurrences 0, 0, 1, 0. Now b fails the check. The earliest occurrence of b greater than left_index is 4, so we set right_index to 4. Now we get an array of occurrences 0, 1, 1, 1, so A fails the check. The earliest occurrence of A greater than left_index is 8, so we set right_index to that. Now, the array of occurrences is 2-1, 3-0, 2-0, 1-0, which is 1, 3, 2, 1 and it passes the balance check.
Ultimately we will find the shortest balanced substring to be bB with left_index = 4.
The complexity of this algorithm is O(nl^2) because: we start at n different indices and we perform a maximum of l lookups (for l different letters which can fail the check) in O(1). For each lookup, we have to calculate l differences of prefix sums. But as l is constant (albeit it may be large, like 26), this simplifies to O(n).
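A rough Python sketch of this jumping idea follows. It is a simplification, not the answer's exact data structures: instead of prefix counts plus index queues it uses one hypothetical table `nxt`, where `nxt[i][k]` is the first index at or after i holding letter k, which answers both "is the letter in the window?" and "where is its next occurrence?". It uses 0-based indices and assumes the input contains ASCII letters only.

```python
def shortest_balanced(s):
    """Return the shortest balanced substring of s, or None if there is none."""
    n = len(s)
    INF = n  # sentinel meaning "this letter does not occur again"
    # nxt[i][k]: first index >= i holding letter k (0-25 lowercase, 26-51 uppercase)
    nxt = [[INF] * 52 for _ in range(n + 1)]
    for i in range(n - 1, -1, -1):
        nxt[i] = nxt[i + 1][:]
        ch = s[i]
        if ch.islower():
            nxt[i][ord(ch) - ord('a')] = i
        else:
            nxt[i][26 + ord(ch) - ord('A')] = i

    best = None
    for left in range(n):
        right = left
        changed = True
        while changed and right < INF:
            changed = False
            for k in range(26):
                lo, up = nxt[left][k], nxt[left][26 + k]
                # exactly one case of letter k lies inside s[left..right]
                if (lo <= right) != (up <= right):
                    right = max(lo, up)  # jump to the missing partner (INF if none)
                    changed = True
                    break
        if right < INF and (best is None or right - left < best[1] - best[0]):
            best = (left, right)
    return None if best is None else s[best[0]:best[1] + 1]


print(shortest_balanced("azABaabza"))  # ABaab
print(shortest_balanced("ABabBaaA"))   # bB
```

For each left index the window grows at most once per letter, so the work per start is bounded by a constant that depends only on the alphabet size, matching the O(n) argument above at the cost of O(52n) memory for the table.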

I'm using a recursive approach to this; I'm not sure what its time complexity is, though.
The idea is that we check which characters in the string are present in both their lowercase and uppercase forms. Any character that isn't present in both forms is replaced with a space ' '. We then split the remaining string on ' ' into a list.
In the first case, if we have only one string left after that, we return its length.
In the second case, if we have no characters left, we return -1.
In the third case, if we have more than one string left, we recursively evaluate each of those strings and return the largest of the resulting lengths.
from collections import Counter

def findMutual(s):
    lower = dict(Counter( [x for x in s if x.lower() == x] ))
    upper = dict(Counter( [x for x in s if x.upper() == x] ))
    mutual = {}
    for charr in lower:
        if charr.upper() in upper:
            mutual[charr] = upper[charr.upper()] + lower[charr]
    matching_charrs = ''.join([x if x.lower() in mutual else ' ' for x in s ]).split()
    print(s)
    print(matching_charrs)
    return matching_charrs

def smallestSubstring(s):
    matching_charrs = findMutual(s)
    if len(matching_charrs) == 1:
        return(len(matching_charrs[0]))
    elif len(matching_charrs) == 0:
        return(-1)
    else:
        list_lens = []
        for i in matching_charrs:
            list_lens.append(smallestSubstring(i))
        return max(list_lens)

print(smallestSubstring('azABaabza'))
print(smallestSubstring('dAcZCbaBz'))
print(smallestSubstring('TacoCat'))
print(smallestSubstring('Tt'))
print(smallestSubstring('T'))
print(smallestSubstring('TaCc'))

Sequence Of Zero

Consider the sequence of numbers from 1 to 𝑁. For example, for 𝑁 = 9,
we have 1, 2, 3, 4, 5, 6, 7, 8, 9.
Now, place among the numbers one of the three following operators:
"+" sum
"-" subtraction
"#" Paste Operator --> paste the previous and the next operands.
For example, 1#2 = 12
How can I calculate the number of possible sequences that yield zero?
Example for N = 7:
1+2-3+4-5-6+7
1+2-3-4+5+6-7
1-2#3+4+5+6+7
1-2#3-4#5+6#7
1-2+3+4-5+6-7
1-2-3-4-5+6+7
See the fourth sequence, it is same as 1-23-45+67 and the result is 0.
All of the above sequences evaluate to zero.
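To make the operator semantics concrete, here is a small brute-force sketch in Python that tries all 3^(N-1) operator assignments. The function name `zero_sequences` is made up for illustration, and the sketch assumes that "#" may be chained (so 1#2#3 reads as 123). It is only practical for small N.

```python
from itertools import product


def zero_sequences(n):
    """List every expression over 1..n built from '+', '-', '#' that evaluates to 0."""
    found = []
    for ops in product("+-#", repeat=n - 1):
        total, sign, current = 0, 1, 1
        for k, op in enumerate(ops, start=2):
            if op == "#":
                current = int(str(current) + str(k))  # paste k onto the current operand
            else:
                total += sign * current               # close the current operand
                sign = 1 if op == "+" else -1
                current = k
        total += sign * current
        if total == 0:
            found.append("1" + "".join(op + str(k) for k, op in enumerate(ops, start=2)))
    return found


for expr in zero_sequences(7):
    print(expr)
```

For N = 7 this enumerates 729 candidates and prints the six expressions listed above.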

Here is my recursion based solution just to build your intuition so that you can approach and improve this solution using dynamic programming on your own (implemented in c++):
#include <iostream>
using namespace std;

// N is the input
// index_count is the index count in the given sequence
// sum is the total sum of a given sequence
// Note: this only pastes pairs of adjacent numbers, never chains such as 2#3#4,
// and the leading 1 is never pasted
int isEvaluteToZero(int N, int index_count, int sum){
    // if N==1, then the sequence only contains 1 which is not 0, so return 0
    if(N==1){
        return 0;
    }
    // Base case
    // if index_count is equal to N and total sum is 0, return 1, else 0
    if(index_count==N){
        if(sum==0){
            return 1;
        }
        return 0;
    }
    // recursively call by considering '+' between index_count and index_count+1
    // increase index_count by 1
    int placeAdd = isEvaluteToZero(N, index_count+1, sum+index_count+1);
    // recursively call by considering '-' between index_count and index_count+1
    // increase index_count by 1
    int placeMinus = isEvaluteToZero(N, index_count+1, sum-index_count-1);
    // place '#'
    int placePaste;
    if(index_count+2<=N){
        // paste the previous and the next operands
        // For e.g., (8#9) = 8*(10^1)+9 = 89
        // (9#10) = 9*(10^2)+10 = 910
        // (99#100) = 99*(10^3)+100 = 99100
        // (999#1000) = 999*(10^4)+1000 = 9991000
        int num1 = index_count+1;
        int num2 = index_count+2;
        // scale = 10^(number of decimal digits of num2)
        int scale = 10;
        while(scale<=num2){
            scale *= 10;
        }
        int concat_num = num1*scale+num2;
        placePaste = isEvaluteToZero(N, index_count+2, sum+concat_num) + isEvaluteToZero(N, index_count+2, sum-concat_num);
    }else{
        // in case index_count+2>N
        placePaste = 0;
    }
    return (placeAdd+placeMinus+placePaste);
}

int main(){
    int N, res=1, index_count=1;
    cout<<"Enter N:";
    cin>>N;
    cout<<isEvaluteToZero(N, index_count, res)<<endl;
    return 0;
}
output:
N=1 output=0
N=2 output=0
N=3 output=1
N=4 output=1
N=7 output=6

Find the number of subsequences of a n-digit number, that are divisible by 8

Given n = 1 to 10^5, stored as a string in decimal format.
Example: If n = 968, then out of all subsequences i.e 9, 6, 8, 96, 68, 98, 968 there are 3 sub-sequences of it, i.e 968, 96 and 8, that are divisible by 8. So, the answer is 3.
Since the answer can be very large, print the answer modulo (10^9 + 7).

You can use dynamic programming. Let f(len, sum) be the number of subsequences of the prefix of length len such that their sum is sum modulo 8 (sum ranges from 0 to 7).
The value of f for len = 1 is obvious. The transitions go as follows:
We can start a new subsequence in the new position: f(len, a[i] % 8) += 1.
We can continue any subsequence from the shorter prefix:
for old_sum = 0..7
    f(len, (old_sum * 10 + a[i]) % 8) += f(len - 1, old_sum) // take the new element
    f(len, old_sum) += f(len - 1, old_sum) // ignore the new element
Of course, you can perform all computations modulo 10^9 + 7 and use a standard integer type.
The answer is f(n, 0) (all elements are taken into account and the sum modulo 8 is 0).
The time complexity of this solution is O(n) (as there are O(n) states and 2 transitions from each of them).
Note: if the numbers can't have leading zeros, you can just add one more parameter to the state: a flag that indicates whether the first element of the subsequence is zero (such sequences should never be extended). The rest of the solution stays the same.
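A minimal Python sketch of this recurrence, keeping only the current row of f. The function name is made up for illustration, and the sketch assumes leading zeros are allowed (the extra flag from the note is left out).

```python
MOD = 10**9 + 7


def count_subsequences_divisible_by_8(digits):
    """Count non-empty subsequences of the digit string whose value is divisible by 8."""
    counts = [0] * 8  # counts[r]: subsequences of the prefix seen so far with value % 8 == r
    for ch in digits:
        d = int(ch)
        nxt = counts[:]  # ignore the new digit
        for r in range(8):
            nr = (r * 10 + d) % 8
            nxt[nr] = (nxt[nr] + counts[r]) % MOD  # append the new digit
        nxt[d % 8] = (nxt[d % 8] + 1) % MOD        # start a new subsequence here
        counts = nxt
    return counts[0]


print(count_subsequences_divisible_by_8("968"))  # 3
```

The copy of `counts` before the inner loop matters: updating the row in place would let a subsequence use the same digit twice.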

Note: This answer assumes you mean contiguous subsequences.
A number is divisible by 8 exactly when the number formed by its last three digits is divisible by 8. Using this, a simple O(n) algorithm can be obtained where n is the number of digits in the number.
Let N=a_0a_1...a_(n-1) be the decimal representation of N with n digits.
Let the number of sequences so far be s = 0
For each set of three digits, a_i a_(i+1) a_(i+2), check if the number is divisible by 8. If so, add i + 1 to the number of sequences, i.e., s = s + i + 1. This is because all strings a_k..a_(i+2) will be divisible by 8 for k ranging from 0..i.
Loop i from 0 to n-3 and continue.
So, if you have 1424968, the subsequences divisible are at:
i=1 (424 yielding i+1 = 2 numbers: 424 and 1424)
i=3 (496 yielding i+1 = 4 numbers: 496, 2496, 42496, 142496)
i=4 (968 yielding i+1 = 5 numbers: 968, 4968, 24968, 424968, 1424968)
Note that some small modifications will be needed to consider numbers lesser than three digits in length.
Hence the total number of sequences = 2 + 4 + 5 = 11. Total complexity = O(n) where n is the number of digits.
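The counting rule above, together with the "small modifications" for one- and two-digit substrings, might look like this in Python. The function name is made up for illustration, and the sketch assumes substrings with leading zeros count as valid numbers.

```python
def count_substrings_divisible_by_8(digits):
    """Count contiguous substrings of the digit string whose value is divisible by 8."""
    n = len(digits)
    total = 0
    for i in range(n):
        if int(digits[i]) % 8 == 0:                        # one-digit substrings (0 or 8)
            total += 1
        if i + 1 < n and int(digits[i:i + 2]) % 8 == 0:    # two-digit substrings
            total += 1
        if i + 2 < n and int(digits[i:i + 3]) % 8 == 0:
            total += i + 1                                 # every start k in 0..i works
    return total


print(count_substrings_divisible_by_8("1424968"))  # 14
```

For 1424968 the three-digit rule contributes the 11 substrings counted above; the remaining three are 8, 24 and 96.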

One can use the fact that for any three-digit number abc the following holds:
abc % 8 = ((ab % 8) * 10 + c) % 8
Or in other words: the test for a number with a fixed start-index can be cascaded:
int div8(String s){
    int total = 0, mod = 0;
    for(int i = 0; i < s.length(); i++)
    {
        mod = (mod * 10 + s.charAt(i) - '0') % 8;
        if(mod == 0)
            total++;
    }
    return total;
}
But we don't have fixed start-indices!
Well, that's pretty easy to fix:
Suppose two sequences a and b, such that int(a) % 8 = int(b) % 8 and b is a suffix of a. No matter how the sequence continues, the values of a and b modulo 8 will always remain equal. Thus it's sufficient to keep track of the number of sequences that share the property of having an equal value modulo 8.
final int RESULTMOD = 1000000000 + 7;
int div8(String s){
    int total = 0;
    //modTable[i] is the number of subsequences with int(sequence) % 8 = i
    int[] modTable = new int[8];
    for(int i = 0; i < s.length(); i++){
        int[] nextTable = new int[8];
        //transform table from last loop-run (shared modulo)
        for(int j = 0; j < 8; j++){
            int next = (j * 10 + s.charAt(i) - '0') % 8;
            //several values of j can land in the same bucket, so accumulate
            nextTable[next] = (nextTable[next] + modTable[j]) % RESULTMOD;
        }
        //add the sequence that starts at this index to the appropriate bucket
        nextTable[(s.charAt(i) - '0') % 8]++;
        //add the count of all sequences with int(sequence) % 8 = 0 to the result
        total += nextTable[0];
        total %= RESULTMOD;
        //table for next run
        modTable = nextTable;
    }
    return total;
}
Runtime is O(n).

There are 10 possible states a subsequence can be in. The first is empty. The second is that there was a leading 0. And the other 8 are an ongoing number that is 0-7 mod 8. You start at the beginning of the string with 1 way of being empty, no way to be anything else. At the end of the string your answer is the number of ways to have a leading 0 plus an ongoing number that is 0 mod 8.
The transition table should be obvious. The rest is just normal dynamic programming.

Find non-unique characters in a given string in O(n) time with constant space i.e with no extra auxiliary array

Given a string s containing only lower case alphabets (a - z), find (i.e print) the characters that are repeated.
For ex, if string s = "aabcacdddec"
Output: a c d
3 approaches to this problem exist:
1) [brute force] Check every char of the string (i.e. compare s[i] with every other char and print it if both are the same)
Time complexity: O(n^2)
Space complexity: O(1)
2) [sort and then compare adjacent elements] After sorting (in O(n log(n)) time), traverse the string and check if s[i] and s[i + 1] are equal
Time complexity: O(n log n) + O(n) = O(n log n)
Space complexity: O(1)
3) [store the character count in an array] Create an array of size 26 (to keep track of a - z) and for every s[i], increment the value stored at index = s[i] - 'a' in the array. Finally traverse the array and print all elements (i.e. 'a' + i) with value greater than 1
Time complexity: O(n)
Space complexity: O(1) but we have a separate array for storing the frequency of each element.
Is there a O(n) approach that DOES NOT use any array/hash table/map (etc)?
HINT: Use BIT Vectors

This is the element distinctness problem, so generally speaking - no there is no way to solve it in O(n) without extra space.
However, if you regard the alphabet as constant size (a-z characters only is pretty constant) you can either create a bitset of these characters, in O(1) space [ it is constant!] or check for each character in O(n) if it repeats more than once, it will be O(constant*n), which is still in O(n).
Pseudo code for 1st solution:
bit seen[] = new bit[SIZE_OF_ALPHABET] //constant!
bit printed[] = new bit[SIZE_OF_ALPHABET] //so is this!
for each i in seen.length: //init:
    seen[i] = 0
    printed[i] = 0
for each character c in string: //traverse the string:
    i = intValue(c)
    //already seen it and didn't print it? print it now!
    if seen[i] == 1 and printed[i] == 0:
        print c
        printed[i] = 1
    else:
        seen[i] = 1
Pseudo code for 2nd solution:
for each character c from a-z: //constant number of repeats is O(1)
    count = 0
    for each character x in the string: //O(n)
        if x==c:
            count += 1
    if count > 1:
        print c
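The first pseudo code maps naturally onto two integers used as bit vectors. A minimal Python sketch, assuming the input contains only the letters a-z; the function name is made up for illustration, and the result list exists only so the function can return something (printing each letter directly would avoid it).

```python
def repeated_letters(s):
    """Return the letters that occur more than once, in order of their first repeat."""
    seen = 0     # bit k set: letter chr(ord('a') + k) has been seen
    printed = 0  # bit k set: that letter has already been reported
    out = []
    for ch in s:
        bit = 1 << (ord(ch) - ord('a'))
        if seen & bit and not printed & bit:
            out.append(ch)
            printed |= bit
        seen |= bit
    return out


print(repeated_letters("aabcacdddec"))  # ['a', 'c', 'd']
```

Only 26 bits of each integer are ever used, so the state is constant size regardless of the length of the string.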

Implementation in Java
public static void findDuplicate(String str) {
    int checker = 0;
    char c = 'a';
    for (int i = 0; i < str.length(); ++i) {
        int val = str.charAt(i) - c;
        if ((checker & (1 << val)) > 0) {
            System.out.println((char)(c+val));
        }else{
            checker |= (1 << val);
        }
    }
}
This uses an int as storage and bitwise operators to find the duplicates.
It is O(n); the explanation follows.
Input is "abddc"
i==0
STEP #1 : val = 97 - 97 (0) str.charAt(0) is a and its int value is 97 (ASCII of 'a')
STEP #2 : 1 << val equal to ( 1 << 0 ) equal to 1 finally 0 & 1 is 0
STEP #3 : checker = 0 | ( 1 << 0 ) equal to 0 | 1 equal to 1 checker is 1
i==1
STEP #1 : val = 98 - 97 (1) str.charAt(1) is b and its int value is 98 (ASCII of 'b')
STEP #2 : 1 << val equal to ( 1 << 1 ) equal to 2 finally 1 & 2 is 0
STEP #3 : checker = 1 | ( 1 << 1 ) equal to 1 | 2 equal to 3 checker is 3
i==2
STEP #1 : val = 100 - 97 (3) str.charAt(2) is d and its int value is 100 (ASCII of 'd')
STEP #2 : 1 << val equal to ( 1 << 3 ) equal to 8 finally 3 & 8 is 0
STEP #3 : checker = 3 | ( 1 << 3 ) equal to 3 | 8 equal to 11 checker is 11
i==3
STEP #1 : val = 100 - 97 (3) str.charAt(3) is d and its int value is 100 (ASCII of 'd')
STEP #2 : 1 << val equal to ( 1 << 3 ) equal to 8 finally 11 & 8 is 8
Now print 'd' since the value > 0
You can also use a bit vector; depending on the language it may be more space efficient. In Java I would prefer to use an int for this fixed (just 26 letters) case.

The size of the character set is a constant, so you could scan the input 26 times. All you need is a counter to store the number of times you've seen the character corresponding to the current iteration. At the end of each iteration, print that character if your counter is greater than 1.
It's O(n) in runtime and O(1) in auxiliary space.

Implementation in C# (recursive solution)
static void getNonUniqueElements(string s, string nonUnique)
{
    if (s.Length > 0)
    {
        char ch = s[0];
        s = s.Substring(1);
        // LastIndexOf returns -1 when ch is absent, so index 0 is a valid hit
        if (s.LastIndexOf(ch) >= 0)
        {
            if (nonUnique.LastIndexOf(ch) < 0)
                nonUnique += ch;
        }
        getNonUniqueElements(s, nonUnique);
    }
    else
    {
        Console.WriteLine(nonUnique);
        return;
    }
}
static void Main(string[] args)
{
    getNonUniqueElements("aabcacdddec", "");
    Console.ReadKey();
}

Modified longest common substring

Given two strings what is an efficient algorithm to find the number and length of longest common sub-strings with the sub-strings being called common if :
1) they have at-least x% characters same and at same position.
2) the start and end indexes of the sub-strings being same.
Ex :
String 1 -> abedefkhj
String 2 -> kbfdfjhlo
suppose the x% being asked is 40,then, ans is,
5 1
where 5 is the longest length and 1 is the number of sub-strings in each string satisfying the given property. Sub-String is "abede" in string 1 and "kbfdf" in string 2.

You can use something like Levenshtein distance without deletions and insertions.
Build a table where every element [i, j] is the error (number of mismatched positions) for the substring from position [i] to position [j].
foo(string a, string b, int x):
    n = min(a.length, b.length)
    for (end: [0 -> n-1]):
        mismatch = 0 if a[end] == b[end] else 1
        error[end][end] = mismatch
        for (start: [end-1 -> 0]):
            error[start][end] = error[start][end - 1] + mismatch
    best_len = 0
    best_pos = 0
    for (end: [0 -> n-1]):
        for (start: [end -> 0]):
            len = end - start + 1
            error_percent = 100 * error[start][end] / len
            // at least x% matching means at most (100 - x)% mismatching
            if (error_percent <= 100 - x and len > best_len):
                best_len = len
                best_pos = start
    return (best_len, best_pos)
