LeetCode 3: find the longest substring without repeating characters - string

The goal is simple: find the longest substring without repeating characters. Here is the code:
class Solution {
public:
    int lengthOfLongestSubstring(string s) {
        int ans = 0;
        int dic[256];
        memset(dic, -1, sizeof(dic));
        int len = s.size();
        int idx = -1;
        for (int i = 0; i < len; i++) {
            char c = s[i];
            if (dic[c] > idx)
                idx = dic[c];
            ans = max(ans, i - idx);
            dic[c] = i;
        }
        return ans;
    }
};
Given how concise it is, I think this is a high-performance method, and its time complexity is just O(n). But I'm confused about how it works, even though I worked through some examples to try to understand it. Can anyone give me some tips or ideas?

What it is doing is recording, in dic, the position where each character was last seen.
As you step through, each newly encountered character limits the non-repeating run that ends at the current index: the run can reach back only as far as that character's last-seen position. Runs ending at later indices can't reach back any further either, because they would contain the duplicate we have just seen.
So idx holds the largest last-seen position among all duplicates encountered so far, and the candidate for the longest non-duplicating sequence ending at i starts just after it, which is why its length is i - idx.
I'm certain that the ans = max() code could be optimised slightly: after encountering a new duplicate, you have to go forward at least ans chars from the start of that duplicate before ans can be improved again. You still need to do the rest of the work maintaining dic and idx, but you could avoid that particular test for ans for a few iterations. You would have to do a lot of unrolling to benefit, though.
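To make the bookkeeping above concrete, here is a commented sketch of the same idea. The names lastSeen and windowStart are illustrative stand-ins for dic and idx, and the cast to unsigned char is an addition that is not in the posted code (it keeps bytes above 127 from turning into negative array indices).

```cpp
#include <algorithm>
#include <array>
#include <cassert>
#include <string>

// Length of the longest substring of s that contains no repeated byte.
int lengthOfLongestSubstring(const std::string& s) {
    std::array<int, 256> lastSeen;  // lastSeen[c] = index where byte c last occurred
    lastSeen.fill(-1);              // -1 means "not seen yet"
    int windowStart = -1;           // index just before the current duplicate-free run
    int best = 0;
    for (int i = 0; i < static_cast<int>(s.size()); ++i) {
        // Assumption: treat the string as raw bytes; the cast avoids negative indices.
        const unsigned char c = static_cast<unsigned char>(s[i]);
        // If c already occurs inside the run, the run must start after that occurrence.
        windowStart = std::max(windowStart, lastSeen[c]);
        // The run ending at i covers indices windowStart + 1 .. i.
        best = std::max(best, i - windowStart);
        lastSeen[c] = i;
    }
    assert(best <= static_cast<int>(s.size()));  // sanity check on the invariant
    return best;
}
```

For "abba", windowStart moves to 1 at the second b, and the final a does not pull it back to 0, so the answer stays at 2.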

Why does the word ladder problem work differently on GFG and LeetCode?

In this question we have to print all of the minimum-length sequences of words that reach the target word. When I solve this question on GFG, it runs fine, but not on LeetCode.
Here is my code:
class Solution {
public:
    vector<vector<string>> findSequences(string beginWord, string endWord, vector<string>& wordList) {
        unordered_set<string> st(wordList.begin(), wordList.end());
        queue<vector<string>> p;
        p.push({beginWord});
        vector<string> usedOnLevel;
        usedOnLevel.push_back(beginWord);
        int level = 0;
        vector<vector<string>> ans;
        while (!p.empty()) {
            vector<string> vec = p.front();
            p.pop();
            if (vec.size() > level) {
                level++;
                for (auto it : usedOnLevel) {
                    st.erase(it);
                }
            }
            string word = vec.back();
            if (word == endWord) {
                if (ans.size() == 0) {
                    ans.push_back(vec);
                } else if (ans.size() > 0 && ans[0].size() == vec.size()) {
                    ans.push_back(vec);
                }
            }
            for (int i = 0; i < word.length(); i++) {
                char original = word[i];
                for (char ch = 'a'; ch <= 'z'; ch++) {
                    word[i] = ch;
                    if (st.find(word) != st.end()) {
                        usedOnLevel.push_back(word);
                        vec.push_back(word);
                        p.push(vec);
                        vec.pop_back();
                    }
                }
                word[i] = original;
            }
        }
        return ans;
    }
};
The difference is that LeetCode throws bigger problems at you, so correct code with poor performance is going to break. And your code has poor performance.
Why? Well, for a start, for each word you find a path to, and for each possible substitution, you're looking through all words to find yours. So suppose I start with all of the 5-letter words in the official Scrabble dictionary. There are about 9000 of those. For each word you find, you're going to come up with 26*5 = 130 possible new words, then search the entire 9000-word list for each of them, for 130 * 9000 = 1_170_000 word comparisons per word, mostly to find nothing. Your algorithm wanted to do more than just that, but it has already timed out.
How could you make that faster? Here is one idea. Create a data structure to answer the following question:
    by position of the deleted letter:
        by the resulting string:
            list of words that matched
For the entire Scrabble dictionary this data structure only has around 45_000 entries. And makes it easy to find all words next to a given word in the word ladder.
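One way the lookup structure described above could be built is sketched here. The names NeighborIndex, buildIndex and neighbors are illustrative, and the sketch assumes every word has the same length.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <unordered_map>
#include <vector>

// index[pos][pattern] = every word that turns into `pattern`
// when the letter at position `pos` is deleted.
using NeighborIndex =
    std::vector<std::unordered_map<std::string, std::vector<std::string>>>;

NeighborIndex buildIndex(const std::vector<std::string>& words, std::size_t wordLength) {
    NeighborIndex index(wordLength);
    for (const std::string& w : words) {
        assert(w.size() == wordLength);  // assumption: all words share one length
        for (std::size_t pos = 0; pos < wordLength; ++pos) {
            std::string pattern = w;
            pattern.erase(pos, 1);
            index[pos][pattern].push_back(w);
        }
    }
    return index;
}

// All indexed words that differ from `word` in exactly one position.
std::vector<std::string> neighbors(const NeighborIndex& index, const std::string& word) {
    assert(word.size() == index.size());
    std::vector<std::string> result;
    for (std::size_t pos = 0; pos < index.size(); ++pos) {
        std::string pattern = word;
        pattern.erase(pos, 1);
        const auto it = index[pos].find(pattern);
        if (it == index[pos].end()) continue;
        for (const std::string& candidate : it->second) {
            if (candidate != word) result.push_back(candidate);
        }
    }
    return result;
}
```

With this in place, finding the words next to a given word costs one hash lookup per letter position rather than one per candidate substitution.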
OK, great! Is that enough? Well...probably not. You're starting from startWord and finding all chains of words you can find from there. Most of which are going nowhere near endWord and represent wasted work. If the minimum length chain is fairly long, this can easily be an exponential amount of wasted effort. How can we avoid it?
The answer is to do a breadth-first search from endWord to find out how far away each word is from endWord. In this search we can also record for each word which words moved you closer. Again, even for all of the Scrabble dictionary, this data structure will be of manageable size. And you can break it off as soon as you've found how to get to startWord.
But now with this pre-processing, it is easy to start with startWord and recursively find all solutions. Because all of the work you'll be doing is enumerating paths that you already know will work.
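The two phases described above can be sketched as follows. This is a minimal illustration, not a tuned submission: it uses plain 26-letter substitution to find neighbours (assuming lowercase ASCII words), and the names findLadders, forEachNeighbor and collect are illustrative.

```cpp
#include <cassert>
#include <queue>
#include <string>
#include <unordered_map>
#include <vector>

using Ladders = std::vector<std::vector<std::string>>;
using DistanceMap = std::unordered_map<std::string, int>;  // -1 = not reached yet

// Calls visit(neighbor, distance) for every known word one substitution from `word`.
template <typename Visit>
void forEachNeighbor(std::string word, const DistanceMap& dist, Visit visit) {
    for (std::size_t i = 0; i < word.size(); ++i) {
        const char original = word[i];
        for (char ch = 'a'; ch <= 'z'; ++ch) {
            if (ch == original) continue;
            word[i] = ch;
            const auto it = dist.find(word);
            if (it != dist.end()) visit(it->first, it->second);
        }
        word[i] = original;
    }
}

// Phase 2: only follow steps that get exactly one closer to endWord.
void collect(const std::string& word, const std::string& endWord, const DistanceMap& dist,
             std::vector<std::string>& path, Ladders& out) {
    if (word == endWord) { out.push_back(path); return; }
    const int wanted = dist.at(word) - 1;
    forEachNeighbor(word, dist, [&](const std::string& next, int d) {
        if (d != wanted) return;
        path.push_back(next);
        collect(next, endWord, dist, path, out);
        path.pop_back();
    });
}

Ladders findLadders(const std::string& beginWord, const std::string& endWord,
                    const std::vector<std::string>& wordList) {
    DistanceMap dist;
    for (const std::string& w : wordList) dist[w] = -1;
    if (dist.find(endWord) == dist.end()) return {};
    dist[beginWord] = -1;
    dist[endWord] = 0;

    // Phase 1: breadth-first search from endWord, stopping once beginWord is reached.
    std::queue<std::string> frontier;
    frontier.push(endWord);
    while (!frontier.empty() && dist[beginWord] == -1) {
        const std::string word = frontier.front();
        frontier.pop();
        const int d = dist[word];
        forEachNeighbor(word, dist, [&](const std::string& next, int seen) {
            if (seen != -1) return;
            dist[next] = d + 1;
            frontier.push(next);
        });
    }
    Ladders out;
    if (dist[beginWord] == -1) return out;  // endWord is unreachable
    std::vector<std::string> path{beginWord};
    collect(beginWord, endWord, dist, path, out);
    return out;
}
```

Because the second phase only ever steps to a word whose recorded distance is exactly one smaller, every path it explores ends at endWord and has minimum length.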

How do I simply remove duplicates from my vector?

I am new to coding and struggling with a section of my code. I am at the part where I want to remove duplicate int values from my vector.
My duplicated vector contains: 1, 1, 2, 1, 4.
My goal is to get a deduplicated vector: 1, 2, 4.
This is what I have so far. It also needs to be a rather simple solution: no pointers or fancy stuff, as I still need to study those in the future.
for(int i = 0; i < duplicatedVector.size(); i++) {
    int temp = duplicatedVector.at(i);
    int counter = 0;
    if(temp == duplicatedVector.at(i)) {
        counter++;
        if(counter > 1) {
            deduplicatedVector.push_back(temp);
        }
    }
}
Could anyone tell me what I am doing wrong? I genuinely am trying to iterate through the vector and delete duplicated ints, keeping the given order.
Your algorithm is not thought out well enough.
Break it up:
(1) for each element of the original vector:
(2)     is it in the result vector?
            yes: do nothing
            no: add it to the result vector
You have your (1) loop, but the (2) part is confused. The result vector is not the same as the original vector, and is not to be indexed the same.
To determine whether an element is in a vector, you need a loop. Loop through your result vector to see if the element is in it. If you find it, it is, so break the inner loop. If you do not, you don't.
You can tell whether or not you found a duplicate by the final value of your inner loop index (the index into the result vector). If it equals result.size() then no duplicate was found.
Clearer variable naming might help as well. You are calling your original/source vector duplicatedVector, and your result vector deduplicatedVector. Even hasDuplicates and noDuplicates would be easier to mentally parse.
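Put together, the steps above might look like the sketch below. The function name removeDuplicates and the variable names source and result are illustrative; the inner loop searches the result vector, and its final index tells you whether a duplicate was found.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Returns the elements of `source` with later duplicates dropped,
// keeping the order in which values first appear.
std::vector<int> removeDuplicates(const std::vector<int>& source) {
    std::vector<int> result;
    for (std::size_t i = 0; i < source.size(); ++i) {
        // Inner loop: search the result vector (not the source) for source[i].
        std::size_t j = 0;
        while (j < result.size() && result[j] != source[i]) {
            ++j;
        }
        // j == result.size() means the search ran off the end: no duplicate found.
        if (j == result.size()) {
            result.push_back(source[i]);
        }
    }
    assert(result.size() <= source.size());  // sanity check: we only ever drop elements
    return result;
}
```

Called with 1, 1, 2, 1, 4, this returns 1, 2, 4 in that order.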
You could use a set, since it eliminates duplicates (note that std::set iterates in sorted order, so the original order is not necessarily preserved):
#include <bits/stdc++.h>
using namespace std;
int main () {
    vector<int> vec = vector<int>();
    vector<int> dedupl = vector<int>();
    vec.push_back(2);
    vec.push_back(4);
    vec.push_back(2);
    vec.push_back(7);
    vec.push_back(34);
    vec.push_back(34);
    set<int> mySet = set<int>();
    for (int i = 0; i < vec.size(); i++) {
        mySet.insert(vec[i]);
    }
    for (int elem : mySet) {
        dedupl.push_back(elem);
    }
    for (int elem : dedupl) {
        cout << elem << " ";
    }
}

Find the number of ways you can form a string of size N, given an unlimited number of 0s and 1s

The question below was asked in an Atlassian online test. I don't have the test cases. I took the question from this link:
Find the number of ways you can form a string of size N, given an unlimited number of 0s and 1s. But
you cannot have D consecutive 0s or T consecutive 1s. N, D, T are given as inputs.
Please help me with this problem; any approach on how to proceed would be welcome.
My approach was simply to apply recursion, try all possibilities, and memoize the results using a hash map.
But it seems to me there must be some combinatoric approach that solves this in less time and space. For debugging purposes I am also printing the strings generated during recursion. If there is a flaw in my approach, please do tell me.
#include <bits/stdc++.h>
using namespace std;
unordered_map<string,int> dp;
int recurse(int d, int t, int n, int oldd, int oldt, string s)
{
    if (d <= 0)
        return 0;
    if (t <= 0)
        return 0;
    cout << s << "\n";
    if (n == 0 && d > 0 && t > 0)
        return 1;
    string h = to_string(d) + " " + to_string(t) + " " + to_string(n);
    if (dp.find(h) != dp.end())
        return dp[h];
    int ans = 0;
    ans += recurse(d-1, oldt, n-1, oldd, oldt, s+'0') + recurse(oldd, t-1, n-1, oldd, oldt, s+'1');
    return dp[h] = ans;
}
int main()
{
    int n, d, t;
    cin >> n >> d >> t;
    dp.clear();
    cout << recurse(d, t, n, d, t, "") << "\n";
    return 0;
}
You are right: instead of generating strings, it is worth considering a combinatoric approach using (a kind of) dynamic programming.
A "good" sequence of length K might end with 1..D-1 zeros (the first kind) or with 1..T-1 ones (the second kind).
To make a good sequence of length K+1, you can append a zero to every sequence of the first kind except those ending in D-1 zeros, and to every sequence of the second kind. This gives 2..D-1 trailing zeros for the first kind of precursors and 1 trailing zero for the second kind.
Similarly, you can append a one to every sequence of the first kind, and to every sequence of the second kind except those ending in T-1 ones. This gives 1 trailing one for the first kind of precursors and 2..T-1 trailing ones for the second kind.
Make two tables
Zeros[N][D] and Ones[N][T]
Fill the first row with zero counts, except for Zeros[1][1] = 1, Ones[1][1] = 1
Fill row by row using the rules above.
Zeros[K][1] = Sum(Ones[K-1][C=1..T-1])
for C in 2..D-1:
    Zeros[K][C] = Zeros[K-1][C-1]
Ones[K][1] = Sum(Zeros[K-1][C=1..D-1])
for C in 2..T-1:
    Ones[K][C] = Ones[K-1][C-1]
Result is sum of the last row in both tables.
Also note that you really need only two active rows of the table, so you can optimize size to Zeros[2][D] after debugging.
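A sketch of the table-filling scheme above, keeping only the current row of each table. The function name countStrings is illustrative. It assumes that "cannot have D consecutive 0s" means runs of zeros are at most D-1 long (likewise for T and ones), and it does not guard against overflow for large N.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Counts binary strings of length n with no run of d zeros and no run of t ones.
std::uint64_t countStrings(int n, int d, int t) {
    assert(n >= 1 && d >= 1 && t >= 1);
    // zeros[c] = number of good strings of the current length ending in exactly
    // c zeros (1 <= c <= d-1); ones[c] is the same for a trailing run of c ones.
    std::vector<std::uint64_t> zeros(d, 0), ones(t, 0);
    if (d > 1) zeros[1] = 1;  // the string "0"
    if (t > 1) ones[1] = 1;   // the string "1"
    for (int len = 2; len <= n; ++len) {
        std::uint64_t allZeros = 0, allOnes = 0;
        for (int c = 1; c < d; ++c) allZeros += zeros[c];
        for (int c = 1; c < t; ++c) allOnes += ones[c];
        std::vector<std::uint64_t> nextZeros(d, 0), nextOnes(t, 0);
        if (d > 1) nextZeros[1] = allOnes;  // a zero appended after a run of ones
        for (int c = 2; c < d; ++c) nextZeros[c] = zeros[c - 1];
        if (t > 1) nextOnes[1] = allZeros;  // a one appended after a run of zeros
        for (int c = 2; c < t; ++c) nextOnes[c] = ones[c - 1];
        zeros.swap(nextZeros);
        ones.swap(nextOnes);
    }
    std::uint64_t total = 0;
    for (int c = 1; c < d; ++c) total += zeros[c];
    for (int c = 1; c < t; ++c) total += ones[c];
    return total;
}
```

For N = 4, D = 2, T = 3 the five good strings are 0101, 0110, 1010, 1011 and 1101, and the function returns 5.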
This can be solved using dynamic programming. I'll give a recursive solution. It is similar to generating a binary string.
The states are:
i: the index of the character that we need to insert into the string next.
cnt: the number of consecutive identical characters immediately before i.
bit: the character that was repeated cnt times before i. The value of bit is either 0 or 1.
Base case: return 1 when i reaches n, since we start from 0 and end at n-1.
Define the size of the dp array accordingly. The time complexity is O(2 x N x max(D,T)).
#include<bits/stdc++.h>
using namespace std;
int dp[1000][1000][2];
int n, d, t;
int count(int i, int cnt, int bit) {
    if (i == n) {
        return 1;
    }
    int &ans = dp[i][cnt][bit];
    if (ans != -1) return ans;
    ans = 0;
    if (bit == 0) {
        ans += count(i+1, 1, 1);
        if (cnt != d - 1) {
            ans += count(i+1, cnt + 1, 0);
        }
    } else {
        // bit == 1
        ans += count(i+1, 1, 0);
        if (cnt != t-1) {
            ans += count(i+1, cnt + 1, 1);
        }
    }
    return ans;
}
signed main() {
    ios_base::sync_with_stdio(false), cin.tie(nullptr);
    cin >> n >> d >> t;
    memset(dp, -1, sizeof dp);
    cout << count(0, 0, 0);
    return 0;
}

Maximum repeating substring of size n

Find the substring of length n that repeats a maximum number of times in a given string.
Input: abbbabbbb# 2
Output: bb
My solution:
public static String mrs(String s, int m) {
    int n = s.length();
    String[] suffixes = new String[n-m+1];
    for (int i = 0; i < n-m+1; i++) {
        suffixes[i] = s.substring(i, i+m);
    }
    Arrays.sort(suffixes);
    String ans = "", tmp = suffixes[0].substring(0, m);
    int cnt = 1, max = 0;
    for (int i = 0; i < n-m; i++) {
        if (suffixes[i].equals(suffixes[i+1])) {
            cnt++;
        } else {
            if (cnt > max) {
                max = cnt;
                ans = tmp;
            }
            cnt = 0;
            tmp = suffixes[i];
        }
    }
    return ans;
}
Can it be done better than the above O(nm) time and O(n) space solution?
For a string of length L and a given length k (to avoid confusion with n and m, which the question uses interchangeably at times), we can compute polynomial hashes of all substrings of length k in O(L) (see Wikipedia for some elaboration on this subproblem).
Now, if we map the hash values to the number of times they occur, we get the value which occurs most frequently in O(L) (with a HashMap with high probability, or in O(L log L) with a TreeMap).
After that, just take the substring which got the most frequent hash as the answer.
This solution does not take hash collisions into account.
The idea is to just reduce the probability of collisions enough for the application (if it's too high, use multiple hashes, for example).
If the application demands that we absolutely never give a wrong answer, we can check the answer in O(L) with another algorithm (KMP, for example), and re-run the whole solution with a different hash function as long as the answer turns out to be wrong.
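As a rough illustration of the hashing approach (written in C++ rather than the question's Java), the sketch below slides a polynomial hash over the string and counts how often each hash value occurs. The base 131 and the reliance on 64-bit wrap-around as the modulus are arbitrary choices for the sketch, and, like the description above, it ignores collisions.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <string>
#include <unordered_map>

// Returns a substring of length k whose hash occurs most often in s.
// Assumes no hash collisions; verify the result separately if that matters.
std::string mostFrequentSubstring(const std::string& s, std::size_t k) {
    assert(k >= 1 && k <= s.size());
    const std::uint64_t base = 131;  // arbitrary base; arithmetic wraps modulo 2^64
    std::uint64_t highPower = 1;     // base^(k-1): weight of the character leaving the window
    for (std::size_t i = 1; i < k; ++i) highPower *= base;

    std::unordered_map<std::uint64_t, std::size_t> countByHash;
    std::uint64_t hash = 0;
    std::size_t bestCount = 0, bestStart = 0;
    for (std::size_t i = 0; i < s.size(); ++i) {
        // Drop the oldest character once the window is full, then append the newest.
        if (i >= k) hash -= static_cast<unsigned char>(s[i - k]) * highPower;
        hash = hash * base + static_cast<unsigned char>(s[i]);
        if (i + 1 < k) continue;  // window not full yet
        const std::size_t count = ++countByHash[hash];
        if (count > bestCount) {
            bestCount = count;
            bestStart = i + 1 - k;  // start of the window that just set the record
        }
    }
    return s.substr(bestStart, k);
}
```

Each character is added to and removed from the hash once, so the loop does O(L) hash-map operations in total.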

Longest Common Substring non-DP solution with O(m*n)

The definition of the problem is:
Given two strings, find the longest common substring.
Return the length of it.
I was solving this problem and I think I solved it with O(m*n) time complexity. However I don't know why when I look up the solution, it's all talking about the optimal solution being dynamic programming - http://www.geeksforgeeks.org/longest-common-substring/
Here's my solution, you can test it here: http://www.lintcode.com/en/problem/longest-common-substring/
int longestCommonSubstring(string &A, string &B) {
int ans = 0;
for (int i=0; i<A.length(); i++) {
int counter = 0;
int k = i;
for (int j=0; j<B.length() && k <A.length(); j++) {
if (A[k]!=B[j]) {
counter = 0;
k = i;
} else {
k++;
counter++;
ans = max(ans, counter);
}
}
}
return ans;
}
My idea is simple, start from the first position of string A and see what's the longest substring I can match with string B, then start from the second position of string A and see what's the longest substring I can match....
Is there something wrong with my solution? Or is it not O(m*n) complexity?
Good news: your algorithm is O(mn). Bad news: it doesn't work correctly.
Your inner loop is wrong: it's intended to find the longest initial substring of A[i:] in B, but it works like this:
j = 0
While j < len(B)
    Match as much of A[i:] against B[j:]. Call it s.
    Remember s if it's the longest so far found.
    j += len(s)
This fails to find the longest match. For example, when A = "XXY" and B = "XXXY" and i=0 it'll find "XX" as the longest match instead of the complete match "XXY".
Here's a runnable version of your code (lightly transcribed into C) that shows the faulty result:
#include <string.h>
#include <stdio.h>
int lcs(const char* A, const char* B) {
    int al = strlen(A);
    int bl = strlen(B);
    int ans = 0;
    for (int i=0; i<al; i++) {
        int counter = 0;
        int k = i;
        for (int j=0; j<bl && k<al; j++) {
            if (A[k]!=B[j]) {
                counter = 0;
                k = i;
            } else {
                k++;
                counter++;
                if (counter >= ans) ans = counter;
            }
        }
    }
    return ans;
}
int main(int argc, char**argv) {
    printf("%d\n", lcs("XXY", "XXXY"));
    return 0;
}
Running this program outputs "2".
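For comparison, here is a sketch of the usual dynamic-programming formulation, which does find the full match in that example. It is not the asker's method: it tracks, for every pair of end positions, the length of the longest common suffix, and keeps only two rows of the table.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Length of the longest common substring of a and b.
// O(m*n) time, O(n) extra space, where m = a.size() and n = b.size().
int longestCommonSubstring(const std::string& a, const std::string& b) {
    // After processing a[i], prev[j + 1] holds the length of the longest
    // common suffix of a[0..i] and b[0..j].
    std::vector<int> prev(b.size() + 1, 0), cur(b.size() + 1, 0);
    int best = 0;
    for (std::size_t i = 0; i < a.size(); ++i) {
        for (std::size_t j = 0; j < b.size(); ++j) {
            // A match extends the common suffix ending one character earlier in both.
            cur[j + 1] = (a[i] == b[j]) ? prev[j] + 1 : 0;
            best = std::max(best, cur[j + 1]);
        }
        prev.swap(cur);
    }
    assert(best <= static_cast<int>(std::min(a.size(), b.size())));
    return best;
}
```

For A = "XXY" and B = "XXXY" this returns 3, because the match ending at the final Y extends the two-character common suffix "XX" that ends just before it.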
Your solution has O(nm) complexity, and if you compare its structure to the provided algorithm, it's exactly the same; however, yours does not memoize.
One advantage of the dynamic programming algorithm in the link is that, within the same time complexity class, it can recall different substring lengths in O(1); otherwise, it looks good to me.
This kind of thing will happen from time to time, because storing subproblem solutions will not always give a better runtime on the first call; it can land in the same complexity class instead (e.g. try computing the nth Fibonacci number with a dynamic solution and compare that to a tail-recursive solution). Note that in that case, as in yours, once the array has been filled the first time, it is faster to return an answer on each successive call.
