How to find the longest continuous sub-string in a string? - string

For example, there is a given string which is consisted of 1s and 0s:
s = "00000000001111111111100001111111110000";
What is the efficient way to get the count of longest 1s substring in s? (11)
What is the efficient way to get the count of longest 0s substring in s? (10)
I appreciate the question would be answered from an algorithmic perspective.

I think the most straight-forward way is to walk through the bit-string while recording the max lengths for all 0 and all 1 sub-strings. This is of O (n) complexity as suggested by others.
If you can afford some sort of a data-parallel computation, you might want to look at parallel patterns as explained here. Specifically, take a look at parallel reduction. I think this problem can be implemented in O (log n) time if you can afford one of those methods.
I'm trying to think of a parallel reduction for this problem:
On the first level of the reduction, each thread will process chunks of 8 bit strings (depending on the number of threads you have and the length of the string) and produce a summary of the bit string like: 0 -> x, 1 -> y, 0 -> z, ....
On the next level each thread will merge two of these summaries into one, any possible joins will be performed at this phase (basically, if the previous summary ended with a 0 (1) and the next summary begins with a 0 (1), then the last entry and the first entry of the two summaries can be collapsed into one).
On the top level there will be just one structure with the overall summary of the bit string, which you'll have to step through to figure out the largest sequences (but this time they are all in summary form, so it should be faster). Or, you can make each summary structure keep track of the larges 0 and 1 sub-strings, this will make it unnecessary to walk through the final structure.
I guess this approach only makes sense in a very limited scope, but since you seem to be very keen on getting better than O (n)...

OK, here is one solution I come up with, I'm not sure whether this is bug-free. Correct me if you discover a bug or suggest a better way to do it. Vote it if you agree with this solution. Thanks!
#include <iostream>
using namespace std;
int main(){
int s[] = {0,0,0,0,0,0,0,0,0,0,1,1,1,1,1,1,1,1,1,1,1,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,1,1,1,1,1,1,1,1,1,1,1,1,1,0,0,0,0};
int length = sizeof(s) / sizeof(s[0]);
int one_start = 0;
int one_n = 0;
int max_one_n = 0;
int zero_start = 0;
int zero_n = 0;
int max_zero_n = 0;
for(int i=0; i<length; i++){
// Calculate 1s
if(one_start==0 && s[i]==1){
one_start = 1;
one_n++;
}
else if(one_start==1 && s[i]==1){
one_n++;
}
else if(one_start==1 && s[i]==0){
one_start = 0;
if(one_n > max_one_n){
max_one_n = one_n;
}
one_n = 0; // Reset
}
// Calculate 0s
if(zero_start==0 && s[i]==0){
zero_start = 1;
zero_n++;
}
else if(zero_start==1 && s[i]==0){
zero_n++;
}
else if(one_start==1 && s[i]==1){
zero_start = 0;
if(zero_n > max_zero_n){
max_zero_n = zero_n;
}
zero_n = 0; // Reset
}
}
if(one_n > max_one_n){
max_one_n = one_n;
}
if(zero_n > max_zero_n){
max_zero_n = zero_n;
}
cout << "max_one_n: " << max_one_n << endl;
cout << "max_zero_n: " << max_zero_n << endl;
return 0;
}

Worst case is always O(n), you can always find input which forces the algorithm to check every bit.
But you can probably get average slightly better than that (more simply if you scan just for 0 or 1, not both), because you can skip the length of currently found longest sequence and scan backwards. At the very least this will reduce the constant factor of O(n), but at least with random input, more items also means longer sequences, and thus longer and longer skips. But the difference to O(n) will not be much...

Related

find the number of ways you can form a string on size N, given an unlimited number of 0s and 1s

The below question was asked in the atlassian company online test ,I don't have test cases , this is the below question I took from this link
find the number of ways you can form a string on size N, given an unlimited number of 0s and 1s. But
you cannot have D number of consecutive 0s and T number of consecutive 1s. N, D, T were given as inputs,
Please help me on this problem,any approach how to proceed with it
My approach for the above question is simply I applied recursion and tried for all possiblity and then I memoized it using hash map
But it seems to me there must be some combinatoric approach that can do this question in less time and space? for debugging purposes I am also printing the strings generated during recursion, if there is flaw in my approach please do tell me
#include <bits/stdc++.h>
using namespace std;
unordered_map<string,int>dp;
int recurse(int d,int t,int n,int oldd,int oldt,string s)
{
if(d<=0)
return 0;
if(t<=0)
return 0;
cout<<s<<"\n";
if(n==0&&d>0&&t>0)
return 1;
string h=to_string(d)+" "+to_string(t)+" "+to_string(n);
if(dp.find(h)!=dp.end())
return dp[h];
int ans=0;
ans+=recurse(d-1,oldt,n-1,oldd,oldt,s+'0')+recurse(oldd,t-1,n-1,oldd,oldt,s+'1');
return dp[h]=ans;
}
int main()
{
int n,d,t;
cin>>n>>d>>t;
dp.clear();
cout<<recurse(d,t,n,d,t,"")<<"\n";
return 0;
}
You are right, instead of generating strings, it is worth to consider combinatoric approach using dynamic programming (a kind of).
"Good" sequence of length K might end with 1..D-1 zeros or 1..T-1 of ones.
To make a good sequence of length K+1, you can add zero to all sequences except for D-1, and get 2..D-1 zeros for the first kind of precursors and 1 zero for the second kind
Similarly you can add one to all sequences of the first kind, and to all sequences of the second kind except for T-1, and get 1 one for the first kind of precursors and 2..T-1 ones for the second kind
Make two tables
Zeros[N][D] and Ones[N][T]
Fill the first row with zero counts, except for Zeros[1][1] = 1, Ones[1][1] = 1
Fill row by row using the rules above.
Zeros[K][1] = Sum(Ones[K-1][C=1..T-1])
for C in 2..D-1:
Zeros[K][C] = Zeros[K-1][C-1]
Ones[K][1] = Sum(Zeros[K-1][C=1..T-1])
for C in 2..T-1:
Ones[K][C] = Ones[K-1][C-1]
Result is sum of the last row in both tables.
Also note that you really need only two active rows of the table, so you can optimize size to Zeros[2][D] after debugging.
This can be solved using dynamic programming. I'll give a recursive solution to the same. It'll be similar to generating a binary string.
States will be:
i: The ith character that we need to insert to the string.
cnt: The number of consecutive characters before i
bit: The character which was repeated cnt times before i. Value of bit will be either 0 or 1.
Base case will: Return 1, when we reach n since we are starting from 0 and ending at n-1.
Define the size of dp array accordingly. The time complexity will be 2 x N x max(D,T)
#include<bits/stdc++.h>
using namespace std;
int dp[1000][1000][2];
int n, d, t;
int count(int i, int cnt, int bit) {
if (i == n) {
return 1;
}
int &ans = dp[i][cnt][bit];
if (ans != -1) return ans;
ans = 0;
if (bit == 0) {
ans += count(i+1, 1, 1);
if (cnt != d - 1) {
ans += count(i+1, cnt + 1, 0);
}
} else {
// bit == 1
ans += count(i+1, 1, 0);
if (cnt != t-1) {
ans += count(i+1, cnt + 1, 1);
}
}
return ans;
}
signed main() {
ios_base::sync_with_stdio(false), cin.tie(nullptr);
cin >> n >> d >> t;
memset(dp, -1, sizeof dp);
cout << count(0, 0, 0);
return 0;
}

Maximum repeating substring of size n

Find the substring of length n that repeats a maximum number of times in a given string.
Input: abbbabbbb# 2
Output: bb
My solution:
public static String mrs(String s, int m) {
int n = s.length();
String[] suffixes = new String[n-m+1];
for (int i = 0; i < n-m+1; i++) {
suffixes[i] = s.substring(i, i+m);
}
Arrays.sort(suffixes);
String ans = "", tmp=suffixes[0].substring(0,m);
int cnt = 1, max=0;
for (int i = 0; i < n-m; i++) {
if (suffixes[i].equals(suffixes[i+1])){
cnt++;
}else{
if(cnt>max){
max = cnt;
ans =tmp;
}
cnt=0;
tmp = suffixes[i];
}
}
return ans;
}
Can it be done better than the above O(nm) time and O(n) space solution?
For a string of length L and a given length k (not to mess up with n and m which the question interchanges at times), we can compute polynomial hashes of all substrings of length k in O(L) (see Wikipedia for some elaboration on this subproblem).
Now, if we map the hash values to the number of times they occur, we get the value which occurs most frequently in O(L) (with a HashMap with high probability, or in O(L log L) with a TreeMap).
After that, just take the substring which got the most frequent hash as the answer.
This solution does not take hash collisions into account.
The idea is to just reduce the probability of collisions enough for the application (if it's too high, use multiple hashes, for example).
If the application demands that we absolutely never give a wrong answer, we can check the answer in O(L) with another algorithm (KMP, for example), and re-run the whole solution with a different hash function as long as the answer turns out to be wrong.

Counter for two binary strings C++

I am trying to count two binary numbers from string. The maximum number of counting digits have to be 253. Short numbers works, but when I add there some longer numbers, the output is wrong. The example of bad result is "10100101010000111111" with "000011010110000101100010010011101010001101011100000000111000000000001000100101101111101000111001000101011010010111000110".
#include <iostream>
#include <stdlib.h>
using namespace std;
bool isBinary(string b1,string b2);
int main()
{
string b1,b2;
long binary1,binary2;
int i = 0, remainder = 0, sum[254];
cout<<"Get two binary numbers:"<<endl;
cin>>b1>>b2;
binary1=atol(b1.c_str());
binary2=atol(b2.c_str());
if(isBinary(b1,b2)==true){
while (binary1 != 0 || binary2 != 0){
sum[i++] =(binary1 % 10 + binary2 % 10 + remainder) % 2;
remainder =(binary1 % 10 + binary2 % 10 + remainder) / 2;
binary1 = binary1 / 10;
binary2 = binary2 / 10;
}
if (remainder != 0){
sum[i++] = remainder;
}
--i;
cout<<"Result: ";
while (i >= 0){
cout<<sum[i--];
}
cout<<endl;
}else cout<<"Wrong input"<<endl;
return 0;
}
bool isBinary(string b1,string b2){
bool rozhodnuti1,rozhodnuti2;
for (int i = 0; i < b1.length();i++) {
if (b1[i]!='0' && b1[i]!='1') {
rozhodnuti1=false;
break;
}else rozhodnuti1=true;
}
for (int k = 0; k < b2.length();k++) {
if (b2[k]!='0' && b2[k]!='1') {
rozhodnuti2=false;
break;
}else rozhodnuti2=true;
}
if(rozhodnuti1==false || rozhodnuti2==false){ return false;}
else{ return true;}
}
One of the problems might be here: sum[i++]
This expression, as it is, first returns the value of i and then increases it by one.
Did you do it on purporse?
Change it to ++i.
It'd help if you could also post the "bad" output, so that we can try to move backward through the code starting from it.
EDIT 2015-11-7_17:10
Just to be sure everything was correct, I've added a cout to check what binary1 and binary2 contain after you assing them the result of the atol function: they contain the integer numbers 547284487 and 18333230, which obviously dont represent the correct binary-to-integer transposition of the two 01 strings you presented in your post.
Probably they somehow exceed the capacity of atol.
Also, the result of your "math" operations bring to an even stranger result, which is 6011111101, which obviously doesnt make any sense.
What do you mean, exactly, when you say you want to count these two numbers? Maybe you want to make a sum? I guess that's it.
But then, again, what you got there is two signed integer numbers and not two binaries, which means those %10 and %2 operations are (probably) misused.
EDIT 2015-11-07_17:20
I've tried to use your program with small binary strings and it actually works; with small binary strings.
It's a fact(?), at this point, that atol cant handle numerical strings that long.
My suggestion: use char arrays instead of strings and replace 0 and 1 characters with numerical values (if (bin1[i]){bin1[i]=1;}else{bin1[i]=0}) with which you'll be able to perform all the math operations you want (you've already written a working sum function, after all).
Once done with the math, you can just convert the char array back to actual characters for 0 and 1 and cout it on the screen.
EDIT 2015-11-07_17:30
Tested atol on my own: it correctly converts only strings that are up to 10 characters long.
Anything beyond the 10th character makes the function go crazy.

Finding the number of permutations for a three letter string with ABC and 123

I know from Algebra class that with ABC and 123 we can make 216 different permutations for a three letter string, right? (6 x 6 x 6) I'd like to create a console program in C++ that displays ever possible permutation for the example above. The thing is, how would I even begin trying to calculate them. Perhaps:
AAA
BAA
CAA
1BA
2BA
3CA
1AB
2BC
3CA
etc.
This is really hard to ask, but what would I have to do to ensure that I include every permutation? I know there are 216 but I don't know how to actually go about going through all of them.
Any suggestions would be greatly appreciated!!!
If you need a fixed-number strings, you can use N nested loops (three in your case).
string parts = "ABC123";
for (int i = 0 ; i != parts.size() ; i++)
for (int j = 0 ; j != parts.size() ; j++)
for (int k = 0 ; k != parts.size() ; k++)
cout << parts[i] << parts[j] << parts[k] << endl;
If N is not fixed, you would need a more general recursive solution.
It's really easy to do using recursion. Provided you have an array of all six elements, here's java code to do it. I am sure you can translate it to C++ easily.
void getAllCombinations(List<String> output, char[] chrs, String prefix, int length) {
if (prefix.length() == length) {
output.add(prefix);
} else {
for (int i = 0;i < chrs.length;i++) {
getAllCombinations(output, chrs, prefix + chrs[i], length);
}
}
return;
}
This is not perfect, but it should give you the general idea.
Run it with parameters: empty list, array of available characters, empty string and length of desired strings.
With three nested loops (one per character position) iterating over each of the 6 allowed characters it's hard not to see that every possibly combination has a corresponding set of loop indices, and that every set of legal loop indices has a corresponding 3 letter string. And that 1-1 correspondence between loop indices and strings is what you're looking for, I gather.

Time Complexity of Ternary Search Algorithm

I have an assignment that wants me to write an ternary search algorithm and compute its time complexity afterwards. I was able to write an algorithm for it but I couldn't come up with any ideas how to compute its complexity. I think I didn't understand the concept of big-theta notation.
Here is my code: It works like binary search but only divides the list into there pieces and continues the search like that.
*some list which contains n increasingly-ordered integers;*
int num;
int min = 1;
int max = n;
int middle1 = (2*min+max)/3;
int middle2 = (min+2*max)/3;
cin >> num; //num is the number that is wanted to be found
while (middle1 != middle2)
{
middle1 = (2*min+max)/3;
middle2 = (min+2*max)/3;
if(num <= list[middle1])
max = middle1;
else if(num >list[middle1] && num <= list[middle2])
{
min= middle1;
max = middle2;
}
else
min = middle2;
}
if(num == list[max])
cout << "your number is found in the "<< max <<"th location\n";
else
cout << "number cannot be found";
If you could explain how to determine its complexity in terms of big-theta notation, it would be very helpful for me.
At each step, you are reducing the size of the searchable range by a constant factor (in this case 3). If you find your element after n steps, then the searchable range has size N = 3n. Inversely, the number of steps that you need until you find the element is the logarithm of the size of the collection. That is, the runtime is O(log N). A little further thought shows that you can also always construct situations where you need all those steps, so the worst-case runtime is actually Θ(log N).
It is Θ(log3(N)).
To check how to calculate complexity just check http://en.wikipedia.org/wiki/Big_O_notation
To read more about ternary search, just check the wikipedia page also: http://en.wikipedia.org/wiki/Ternary_search

Resources