Which character to append to string in suffix array? - string

I was solving the following problem at SPOJ:
https://www.spoj.com/problems/BEADS/
I have stated the relevant information below:
Problem Statement:
The description of the necklace is a string A = a1a2 ... am specifying sizes of the particular beads, where the last character am is considered to precede character a1 in circular fashion.
The disjoint point i is said to be worse than the disjoint point j if and only if the string aiai+1 ... ana1 ... ai-1 is lexicographically smaller than the string ajaj+1 ... ana1 ... aj-1. String a1a2 ... an is lexicographically smaller than the string b1b2 ... bn if and only if there exists an integer i, i <= n, so that aj=bj, for each j, 1 <= j < i and ai < bi.
Output:
For each test case, print exactly one line containing only one integer -- number of the bead which is the first at the worst possible disjoining, i.e. such i, that the string A[i] is lexicographically smallest among all the n possible disjoinings of a necklace. If there are more than one solution, print the one with the lowest i.
The solution uses a suffix array. Since I have to sort the cyclic suffixes of the input string s, I concatenate it with itself, s' = s + s. Then I build a suffix array on s' and output the first entry that points to a suffix starting inside the original s, i.e., index < len(s).
But there is a problem. I was appending the '$' character before building the SA, and I was getting Wrong Answer. After looking online, I found a solution that appends '}' to the string instead.
I found that ascii('$') < ascii('a') < ascii('z') < ascii('}').
But I don't understand why this makes a difference or why that version is accepted, and I haven't found a test case where the two differ. The accepted (AC) solution is below:
#include <bits/stdc++.h>
using namespace std;
string s; int n;
bool cmp_init(int a, int b)
{
    return s[a]<s[b] || (s[a]==s[b] && a<b);
}
int jmp;
vector<int> pos;
bool cmp(int a, int b)
{
    return pos[a]<pos[b] || (pos[a]==pos[b] && pos[(a+jmp)%n]<pos[(b+jmp)%n]);
}
int main() {
    int tc; cin>>tc;
    while(tc--){
        cin>>s;
        int m=s.size();
        s=s+s+"{";
        n=s.size();
        vector<int> SA(n,0);
        for(int i=0;i<n;i++) SA[i]=i;
        sort(SA.begin(), SA.end(), cmp_init);
        pos.assign(n,0);
        for(int i=1, c=0;i<n;i++) pos[SA[i]]=(s[SA[i]]==s[SA[i-1]])?c:++c;
        for(jmp=1;jmp<=n;jmp*=2)
        {
            sort(SA.begin(), SA.end(), cmp);
            vector<int> tmp(n,0);
            for(int i=1, c=0;i<n;i++) tmp[SA[i]]=(pos[SA[i]]==pos[SA[i-1]] && pos[(SA[i]+jmp)%n]==pos[(SA[i-1]+jmp)%n])?c:++c;
            for(int i=0;i<n;i++) pos[i]=tmp[i];
        }
        for(int i=0;i<n;i++) if(SA[i]<m){cout<<SA[i]+1<<"\n";break;}
    }
}
P.S.: I have found that the SA construction code is correct; the only problem is the character appended at the end. Normally we append '$' in SA construction.

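The difference is in the last condition:
If there are more than one solution, print the one with the lowest i.
Consider the input "abab".
The correct answer is index 0 (printed as 1), which you get when you append '}', because "abababab}" is smaller than all of its proper suffixes: at the first mismatch the longer suffix has a letter where the shorter one has '}', and '}' sorts after every letter.
If you append '$', you get the wrong answer, because '$' sorts before every letter, so the shorter suffixes come first: "ab$" < "abab$" < "ababab$" < "abababab$". The first of these that starts inside the original string is "ababab$" at index 2, not index 0.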
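The effect of the sentinel can be checked with a small brute-force sketch. It is not the suffix array construction from the question: it simply sorts all suffixes of s + s + sentinel with direct string comparison, which is quadratic or worse and only suitable for tiny inputs. The name firstRotation is hypothetical.

```cpp
#include <algorithm>
#include <cassert>
#include <numeric>
#include <string>
#include <vector>

// Returns the 0-based start of the first suffix, in sorted order, that
// begins inside the original string s. Brute force, for illustration only.
int firstRotation(const std::string& s, char sentinel) {
    assert(!s.empty());
    const std::string doubled = s + s + sentinel;

    // order[k] = start index of the k-th smallest suffix of `doubled`.
    std::vector<int> order(doubled.size());
    std::iota(order.begin(), order.end(), 0);
    std::sort(order.begin(), order.end(), [&](int a, int b) {
        return doubled.compare(a, std::string::npos,
                               doubled, b, std::string::npos) < 0;
    });

    for (int start : order) {
        if (start < static_cast<int>(s.size())) return start;
    }
    return -1;  // not reached for a non-empty s
}
```

For "abab" this gives index 0 with '}' and index 2 with '$', matching the reasoning above.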

Related

find the number of ways you can form a string on size N, given an unlimited number of 0s and 1s

The question below was asked in an Atlassian online test. I don't have the test cases; I took the question from this link:
Find the number of ways you can form a string of size N, given an unlimited number of 0s and 1s. But
you cannot have D consecutive 0s or T consecutive 1s. N, D, T are given as inputs.
Please help me with this problem; any approach on how to proceed is welcome.
My approach is simply to apply recursion, try all possibilities, and memoize the results using a hash map.
But it seems to me there must be some combinatorial approach that solves this in less time and space. For debugging purposes I am also printing the strings generated during the recursion. If there is a flaw in my approach, please tell me.
#include <bits/stdc++.h>
using namespace std;
unordered_map<string,int> dp;
int recurse(int d, int t, int n, int oldd, int oldt, string s)
{
    if(d<=0)
        return 0;
    if(t<=0)
        return 0;
    cout<<s<<"\n";
    if(n==0&&d>0&&t>0)
        return 1;
    string h=to_string(d)+" "+to_string(t)+" "+to_string(n);
    if(dp.find(h)!=dp.end())
        return dp[h];
    int ans=0;
    ans+=recurse(d-1,oldt,n-1,oldd,oldt,s+'0')+recurse(oldd,t-1,n-1,oldd,oldt,s+'1');
    return dp[h]=ans;
}
int main()
{
    int n,d,t;
    cin>>n>>d>>t;
    dp.clear();
    cout<<recurse(d,t,n,d,t,"")<<"\n";
    return 0;
}
You are right: instead of generating strings, it is worth considering a combinatorial approach using (a kind of) dynamic programming.
A "good" sequence of length K might end with a run of 1..D-1 zeros or a run of 1..T-1 ones.
To make a good sequence of length K+1, you can append a zero to every sequence except those ending in D-1 zeros: sequences of the first kind then end in 2..D-1 zeros, and sequences of the second kind end in 1 zero.
Similarly, you can append a one to every sequence of the first kind, and to every sequence of the second kind except those ending in T-1 ones: the first kind then ends in 1 one, and the second kind ends in 2..T-1 ones.
Make two tables,
Zeros[N][D] and Ones[N][T], where Zeros[K][C] counts the good sequences of length K that end in exactly C zeros (and likewise for Ones).
Fill the first row with zero counts, except for Zeros[1][1] = 1, Ones[1][1] = 1.
Fill row by row using the rules above:
Zeros[K][1] = Sum(Ones[K-1][C=1..T-1])
for C in 2..D-1:
Zeros[K][C] = Zeros[K-1][C-1]
Ones[K][1] = Sum(Zeros[K-1][C=1..D-1])
for C in 2..T-1:
Ones[K][C] = Ones[K-1][C-1]
The result is the sum of the last row in both tables.
Also note that you only ever need two active rows of each table, so you can reduce the size to Zeros[2][D] and Ones[2][T] after debugging.
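The table-filling rules can be sketched in C++ as follows, keeping only the previous row. The function name countStrings is hypothetical. The sketch assumes N, D, T are all at least 1 and that the answer fits in a long long; the counts grow exponentially, so a real judge would most likely ask for the result modulo some number.

```cpp
#include <cassert>
#include <numeric>
#include <vector>

// Counts binary strings of length n with no run of d zeros and no run of t ones.
// zeros[c] = number of good strings of the current length ending in exactly c zeros.
// ones[c]  = number of good strings of the current length ending in exactly c ones.
long long countStrings(int n, int d, int t) {
    assert(n >= 1 && d >= 1 && t >= 1);
    std::vector<long long> zeros(d, 0), ones(t, 0);
    if (d > 1) zeros[1] = 1;  // the string "0"
    if (t > 1) ones[1] = 1;   // the string "1"

    for (int k = 2; k <= n; ++k) {
        const long long sumZeros = std::accumulate(zeros.begin(), zeros.end(), 0LL);
        const long long sumOnes = std::accumulate(ones.begin(), ones.end(), 0LL);
        std::vector<long long> nextZeros(d, 0), nextOnes(t, 0);

        // Start a new run of zeros after any string ending in ones, and vice versa.
        if (d > 1) nextZeros[1] = sumOnes;
        if (t > 1) nextOnes[1] = sumZeros;

        // Extend an existing run by one character, staying below the limit.
        for (int c = 2; c < d; ++c) nextZeros[c] = zeros[c - 1];
        for (int c = 2; c < t; ++c) nextOnes[c] = ones[c - 1];

        zeros.swap(nextZeros);
        ones.swap(nextOnes);
    }
    return std::accumulate(zeros.begin(), zeros.end(), 0LL) +
           std::accumulate(ones.begin(), ones.end(), 0LL);
}
```

For example, with N = 3 and D = T = 3 only "000" and "111" are excluded, so the count is 6.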
This can be solved using dynamic programming. I'll give a recursive (memoized) solution. It is similar to generating a binary string.
The states are:
i: the index of the character that we need to insert into the string next.
cnt: the number of consecutive equal characters immediately before i.
bit: the character that was repeated cnt times before i. The value of bit is either 0 or 1.
Base case: return 1 when i reaches n, since we start from 0 and end at n-1.
Define the size of the dp array accordingly. The time complexity is O(2 x N x max(D,T)).
#include<bits/stdc++.h>
using namespace std;
int dp[1000][1000][2];
int n, d, t;
int count(int i, int cnt, int bit) {
    if (i == n) {
        return 1;
    }
    int &ans = dp[i][cnt][bit];
    if (ans != -1) return ans;
    ans = 0;
    if (bit == 0) {
        ans += count(i+1, 1, 1);
        if (cnt != d - 1) {
            ans += count(i+1, cnt + 1, 0);
        }
    } else {
        // bit == 1
        ans += count(i+1, 1, 0);
        if (cnt != t-1) {
            ans += count(i+1, cnt + 1, 1);
        }
    }
    return ans;
}
signed main() {
    ios_base::sync_with_stdio(false), cin.tie(nullptr);
    cin >> n >> d >> t;
    memset(dp, -1, sizeof dp);
    cout << count(0, 0, 0);
    return 0;
}

Remove occurrences of substring recursively

Here's a problem:
Given a string A and a substring B, remove the first occurrence of B in A for as long as it is possible to do so. Note that removing a substring can create a new occurrence of the same substring. E.g., removing 'hell' from 'hehelllloworld' once yields 'helloworld', which after one more removal becomes 'oworld', the desired string.
Write a program for the above for input constraints of length 10^6 for A and length 100 for B.
This question was asked to me in an interview. I gave them a simple algorithm that does exactly what the statement says, removing occurrences iteratively (to decrease call overhead). I later came to know that there is a much faster solution. What would it be? I've thought of a few optimizations, but my approach is still not as fast as the fastest solution for the problem (according to the company), so can anyone tell me a faster way to solve it?
P.S.: I know the Stack Overflow rules and that having code is better, but for this problem I don't think code would be in any way beneficial...
Your approach has pretty bad complexity. In a very bad case the string a will be aaaaaaaaabbbbbbbbb and the string b will be ab, in which case you will need O(|a|) searches, each taking O(|a| + |b|) (assuming a sophisticated search algorithm), for a total complexity of O(|a|^2 + |a| * |b|). With their constraints that is on the order of 10^12 operations, which is far too slow.
For their constraints a good complexity to aim for would be O(|a| * |b|), which is around 100 million operations and finishes in under a second. Here's one way to approach it. For each position i in the string a, let's compute the largest length n_i such that a[i - n_i : i] = b[0 : n_i] (in other words, the longest suffix of a ending at that position which is also a prefix of b). We can compute all n_i in O(|a| + |b|) using the Knuth-Morris-Pratt algorithm.
After we have n_i computed, finding the first occurrence of b in a is just a matter of finding the first n_i that is equal to |b|. This will be the right end of one of the occurrences of b in a.
Finally, we need to modify Knuth-Morris-Pratt slightly. We logically remove an occurrence of b as soon as we compute an n_i that is equal to |b|. To account for the letters removed from a, we use the fact that Knuth-Morris-Pratt only depends on the last value of n_i (plus the values computed for b) and the current letter of a, so we just need a fast way of retrieving the last value of n_i after we logically remove an occurrence of b. That can be done with a deque that stores all the still-valid values of n_i. Each value is pushed into the deque once and popped from it at most once, so the cost of maintaining it is O(|a|), while the cost of Knuth-Morris-Pratt is O(|a| + |b|), resulting in O(|a| + |b|) total complexity.
Here's a C++ implementation. It could have some off-by-one errors, but it works on your sample, and it flies on the worst case that I described at the beginning.
#include <deque>
#include <string>
#include <iostream>
#include <vector>
#include <algorithm>
using namespace std;
int main() {
    string a, b;
    cin >> a >> b;
    size_t blen = b.size();
    // make a = b$a
    a = b + "$" + a;
    vector<size_t> n(a.size()); // array for knuth-morris-pratt
    vector<bool> removals(a.size()); // positions of right ends at which we remove `b`s
    deque<size_t> lastN;
    n[0] = 0;
    // For the first blen + 1 iterations just do vanilla knuth-morris-pratt
    for (size_t i = 1; i < blen + 1; ++ i) {
        size_t z = n[i - 1];
        while (z && a[i] != a[z]) {
            z = n[z - 1];
        }
        if (a[i] != a[z]) n[i] = 0;
        else n[i] = z + 1;
        lastN.push_back(n[i]);
    }
    // For the remaining iterations some characters could have been logically
    // removed from `a`, so use lastN to get the last value of n instead
    // of actually getting it from `n[i - 1]`
    for (size_t i = blen + 1; i < a.size(); ++ i) {
        size_t z = lastN.back();
        while (z && a[i] != a[z]) {
            z = n[z - 1];
        }
        if (a[i] != a[z]) n[i] = 0;
        else n[i] = z + 1;
        if (n[i] == blen) // found a match
        {
            removals[i] = true;
            // kill last |b| - 1 `n_i`s
            for (size_t j = 0; j < blen - 1; ++ j) {
                lastN.pop_back();
            }
        }
        else {
            lastN.push_back(n[i]);
        }
    }
    string ret;
    size_t toRemove = 0;
    // Walk backwards over the original `a` (everything after the '$'),
    // skipping the characters that belong to removed occurrences
    for (size_t pos = a.size() - 1; a[pos] != '$'; -- pos) {
        if (removals[pos]) toRemove += blen;
        if (toRemove) -- toRemove;
        else ret.push_back(a[pos]);
    }
    reverse(ret.begin(), ret.end());
    cout << ret << endl;
    return 0;
}
[in] hehelllloworld
[in] hell
[out] oworld
[in] abababc
[in] ababc
[out] ab
[in] caaaaa ... aaaaaabbbbbb ... bbbbc
[in] ab
[out] cc
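The same idea can also be sketched more compactly by keeping the surviving characters on a stack, together with the matched prefix length after each of them. This is a variant of the approach above, not the code from the answer; the name removeAll is hypothetical, and it assumes b is non-empty.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Repeatedly removes the first occurrence of b from a, in O(|a| + |b|).
std::string removeAll(const std::string& a, const std::string& b) {
    assert(!b.empty());

    // Standard KMP prefix function of b.
    std::vector<std::size_t> pi(b.size(), 0);
    for (std::size_t i = 1; i < b.size(); ++i) {
        std::size_t z = pi[i - 1];
        while (z && b[i] != b[z]) z = pi[z - 1];
        if (b[i] == b[z]) ++z;
        pi[i] = z;
    }

    std::string out;                   // characters that have survived so far
    std::vector<std::size_t> matched;  // matched[k]: prefix of b matched after out[k]
    for (char c : a) {
        // Resume from the state of the last surviving character.
        std::size_t z = matched.empty() ? 0 : matched.back();
        while (z && c != b[z]) z = pi[z - 1];
        if (c == b[z]) ++z;
        out.push_back(c);
        matched.push_back(z);
        if (z == b.size()) {
            // A full occurrence of b ends here: pop it from both stacks.
            out.resize(out.size() - b.size());
            matched.resize(matched.size() - b.size());
        }
    }
    return out;
}
```

Each character is pushed once and popped at most once, which is where the linear bound comes from.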

Finding maximum substring that is cyclic equivalent

This is a problem from a programming contest that was held recently.
Two strings a[0..n-1] and b[0..n-1] are called cyclic equivalent if and only if there exists an offset d, such that for all 0 <= i < n, a[i] = b[(i + d) mod n].
Given two strings s[0..L-1] and t[0..L-1] with the same length L, you need to find the maximum p such that s[0..p-1] and t[0..p-1] are cyclic equivalent. Print 0 if no such valid p exists.
Input
The first line contains an integer T indicating the number of test cases.
For each test case, there are two lines in total. The first line contains s. The second line contains t.
All strings contain only lower case alphabets.
Output
Output T lines in total. Each line should start with "Case #: " and followed by the maximum p. Here "#" is the number of the test case starting from 1.
Constraints
1 ≤ T ≤ 10
1 ≤ L ≤ 1000000
Example
Input:
2
abab
baba
abab
baac
Output:
Case 1: 4
Case 2: 3
Explanation
Case 1, d can be 1.
Case 2, d can be 2.
My approach :
For every prefix length i, take the prefixes of S and T of length i, concatenate the prefix of S with itself, and check whether the prefix of T is a substring of the result. If it is, the maximum P becomes i.
#include <algorithm>
#include <cstdio>
#include <iostream>
#include <string>
using namespace std;
// s and t have equal length; t is a rotation of s iff t occurs in s+s
bool isCyclic( string s, string t ){
    string str = s;
    str.append(s);
    if( str.find(t) != string::npos )
        return true;
    return false;
}
int main(){
    string s, t;
    int t1, l, o=1;
    scanf("%d", &t1);
    while( t1-- ){
        cin>>s>>t;
        l = min( s.length(), t.length());
        int i, maxP = 0;
        for( i=1; i<=l; i++ ){
            if( isCyclic(s.substr(0,i), t.substr(0,i)) ){
                maxP = i;
            }
        }
        printf("Case %d: %d\n", o++, maxP);
    }
    return 0;
}
I know that this is not the most optimized approach for this problem, since I got Time Limit Exceeded. I came to know that the prefix function can be used to get an O(n) algorithm. I don't know about the prefix function. Could someone explain the O(n) approach?
Contest link http://www.codechef.com/ACMKGP14/problems/ACM14KP3
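For reference, the prefix function mentioned above is the one used by the Knuth-Morris-Pratt algorithm: for each position i it gives the length of the longest proper prefix of s[0..i] that is also a suffix of s[0..i]. The sketch below only computes that function in O(n); it is not a solution to the contest problem, and the name prefixFunction is hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// pi[i] = length of the longest proper prefix of s[0..i]
// that is also a suffix of s[0..i].
std::vector<std::size_t> prefixFunction(const std::string& s) {
    std::vector<std::size_t> pi(s.size(), 0);
    for (std::size_t i = 1; i < s.size(); ++i) {
        std::size_t z = pi[i - 1];
        // Fall back to shorter borders until s[i] extends one of them.
        while (z && s[i] != s[z]) z = pi[z - 1];
        if (s[i] == s[z]) ++z;
        pi[i] = z;
    }
    return pi;
}
```

For "abab" the result is 0, 0, 1, 2: the whole string has the border "ab" of length 2.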

Char Array Returning Integers

I've been working through this exercise, and my output is not what I expect.
(Check substrings) You can check whether a string is a substring of another string
by using the indexOf method in the String class. Write your own method for
this function. Write a program that prompts the user to enter two strings, and
checks whether the first string is a substring of the second.
My code deviates from the problem's specification in two ways: it can only display matching substrings of 3 letters, and it cannot work on strings with fewer than 4 letters. I mistakenly began writing the program without using the suggested method, indexOf. My program's objective (although it shouldn't entirely deviate from the assignment's objective) is to determine whether two strings share at least three consecutive letters.
The program's primary error is that it prints numbers instead of characters. I've run through several unsuccessful ideas to discover what the logical error is. I first tried to identify whether the char values (which, from my understanding, are backed by Unicode) were converted to integers, considering that the printed numbers are also three digits long. Without consulting a reference, I know this isn't true. A comparison between java and javac printed permutations of 312, and a comparison between abab and ababbab printed combinations of 219. j should be > b. My next thought was that the outputs were indexes of the arrays I used. Once again, this isn't true. A comparison between java and javac would output 0 if my reasoning were true.
public class Substring {
    public static char [] array;
    public static char [] array2;
    public static void main (String[]args){
        java.util.Scanner input = new java.util.Scanner (System.in);
        System.out.println("Enter your two strings here, the longer one preceding the shorter one");
        String container1 = input.next();
        String container2 = input.next();
        char [] placeholder = container1.toCharArray();
        char [] placeholder2 = container2.toCharArray();
        array = placeholder;
        array2 = placeholder2;
        for (int i = 0; i < placeholder2.length; i++){
            for (int j = 0; j < placeholder.length; j ++){
                if (array[j] == array2[i]) matcher(j,i);
            }
        }
    }
    public static void matcher(int higher, int lower){
        if ((higher < array.length - 2) && (lower < array2.length - 2))
            if (( array[higher+1] == array2[lower+1]) && (array[higher+2] == array2[lower+2]))
                System.out.println(array[higher] + array[higher+1] + array[higher+2] );
    }
}
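The + operator promotes short, char, and byte operands to int, so
array[higher] + array[higher+1] + array[higher+2]
has type int, not type char, which means that
System.out.println(...)
binds to
System.out.println(int)
which displays its argument as a decimal number, instead of binding to
System.out.println(char)
which outputs the given character using the PrintStream's encoding. The number you see is the sum of the three character codes. Starting the expression with an empty string, as in "" + array[higher] + array[higher+1] + array[higher+2], makes + perform string concatenation and prints the three characters instead.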

Finding the number of permutations for a three letter string with ABC and 123

I know from algebra class that with ABC and 123 we can make 216 different permutations for a three-letter string, right? (6 x 6 x 6) I'd like to create a console program in C++ that displays every possible permutation for the example above. The thing is, how would I even begin to enumerate them? Perhaps:
AAA
BAA
CAA
1BA
2BA
3CA
1AB
2BC
3CA
etc.
This is really hard to ask, but what would I have to do to ensure that I include every permutation? I know there are 216, but I don't know how to actually go through all of them.
Any suggestions would be greatly appreciated!!!
If you need strings of a fixed length N, you can use N nested loops (three in your case).
string parts = "ABC123";
for (int i = 0 ; i != parts.size() ; i++)
    for (int j = 0 ; j != parts.size() ; j++)
        for (int k = 0 ; k != parts.size() ; k++)
            cout << parts[i] << parts[j] << parts[k] << endl;
If N is not fixed, you would need a more general recursive solution.
It's really easy to do using recursion. Provided you have an array of all six elements, here's Java code to do it. I am sure you can translate it to C++ easily.
void getAllCombinations(List<String> output, char[] chrs, String prefix, int length) {
    if (prefix.length() == length) {
        output.add(prefix);
    } else {
        for (int i = 0;i < chrs.length;i++) {
            getAllCombinations(output, chrs, prefix + chrs[i], length);
        }
    }
    return;
}
This is not perfect, but it should give you the general idea.
Run it with these parameters: an empty list, the array of available characters, an empty string, and the length of the desired strings.
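A possible C++ translation of the recursive idea is sketched below. The names collectStrings and allStrings are hypothetical, and the prefix is passed by reference and undone after each call instead of being copied, which is a small deviation from the Java version.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Appends to `output` every string of the given length that starts with
// `prefix` and uses only characters from `chars`.
void collectStrings(std::vector<std::string>& output, const std::string& chars,
                    std::string& prefix, std::size_t length) {
    if (prefix.size() == length) {
        output.push_back(prefix);
        return;
    }
    for (char c : chars) {
        prefix.push_back(c);   // choose the next character
        collectStrings(output, chars, prefix, length);
        prefix.pop_back();     // undo the choice before trying the next one
    }
}

// Convenience wrapper: all strings of `length` characters over `chars`.
std::vector<std::string> allStrings(const std::string& chars, std::size_t length) {
    assert(!chars.empty());
    std::vector<std::string> output;
    std::string prefix;
    collectStrings(output, chars, prefix, length);
    return output;
}
```

Calling allStrings("ABC123", 3) yields the 216 strings from "AAA" through "333", in the same order as the nested loops.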
With three nested loops (one per character position), each iterating over the 6 allowed characters, it's hard not to see that every possible combination has a corresponding set of loop indices, and that every set of legal loop indices has a corresponding 3-letter string. And that 1-1 correspondence between loop indices and strings is what you're looking for, I gather.
