Complexity Analysis: Remove consecutive(3+) duplicates from string - string

I have written a solution for removing 3(let's call this k) or more consecutive same characters from string.
eg.
in: daabbcccbbaad out: dd
in: aaabbb out:
in: aaabbbc out: c
code:
void removeDup(string& s, int start) {
if (s.length() < 3) {
return;
}
if (start > s.length() - 2) return;
int count = 1;
char c = s[start];
int i = start + 1;
while (i < s.length() && s[i] == c) {
++count, ++i;
}
if (count >= 3) {
start = i - count - 2;
if (start < 0) start = 0;
s.erase(i - count, count);
} else {
start = i;
}
return removeDup(s, start);
}
For each match start pointer will move 2 step behind. Also problem size is reducing by no of matched characters.
I want to understand how to do complexity analysis for this. Also for any k(let's say k=n/2) how will the complexity change.
ps: if you have a better solution with constant space, kindly post.

Related

cs50x 2020 - pset2 - substitution - duplicate characters in key

I keep getting an error around handling duplicate characters in key when checking my code for the substitution problem within pset2 of the cs50 course 2020. My code and further details are below - can anyone please help with this? Thanks
The error message it gives me is
:( handles duplicate characters in key
timed out while waiting for program to exit
When I check my code for duplicate characters it seems to work fine (printing Usage: ./substitution key and ending the program)
Code below
# include <stdio.h>
# include <cs50.h>
# include <string.h>
# include <stdlib.h>
# include <ctype.h>
int main(int argc, string argv[])
{
// Check that only one argument submitted
if (argc == 2)
{
// Check that key contains 26 characters
int keylen = strlen(argv[1]);
if (keylen == 26)
{
// Check that all characters are letters
for (int i = 0; i < keylen; i++)
{
bool lettercheck = isalpha(argv[1][i]);
if (lettercheck == true)
{
// THIS IS CAUSING ERROR - Check that no letters have been repeated - put all in lowercase to do so
for (int n = 0; n < i; n++)
{
char currentletter = argv[1][i];
char previousletter = argv[1][i - 1];
if (tolower(currentletter) == tolower(previousletter))
{
printf("Usage: ./substitution key\n");
return 1;
}
}
}
else
{
printf("Usage: ./substitution key\n");
return 1;
}
}
}
else
{
printf("Key must contain 26 characters.\n");
return 1;
}
}
else
{
printf("Usage: ./substitution key\n");
return 1;
}
// Get user input
string input = get_string("plaintext: ");
//Transform input using key
for(int i = 0; i < strlen(input); i++)
{
char currentletter = input[i];
int testlower = islower(currentletter);
int testupper = isupper(currentletter);
if (testupper > 0)
{
int j = input[i] - 65;
input[i] = toupper(argv[1][j]);
}
else if (testlower > 0)
{
int j = input[i] - 97;
input[i] = tolower(argv[1][j]);
}
}
printf("ciphertext: %s\n", input);
}
Edit:
Figured out solution - problem was with the second for loop was iterating against i - 1 times instead of n times
Code should have been
charpreviouslletter = argv[1][n]
instead of
charpreviousletter = argv[1][i - 1]
for (int n = 0; n < i; n++)
{
char currentletter = argv[1][i];
char previousletter = argv[1]**[i - 1]**
In this loop-
// THIS IS CAUSING ERROR - Check that no letters have been repeated - put all in lowercase to do so
for (int n = 0; n < i; n++)
{
char currentletter = argv[1][i];
char previousletter = argv[1][i - 1];
if (tolower(currentletter) == tolower(previousletter))
{
printf("Usage: ./substitution key\n");
return 1;
}
}
you're comparing only the current character to the previous character. This doesn't work for strings like abcdefca
Notice how, c and a have duplicates - but they're not right next to their originals and hence your logic won't find these duplicates. Your logic will only work for duplicates that are next to each other such as aabcddef.
Instead, you need to take a note of which characters you've encountered whilst looping through. If you encounter a character that you have already encountered, you know there's a duplicate.
Thankfully, the key is only expected to contain all 26 characters of the alphabet without any duplicates. This means you can simply have an int array of 26 slots - each slot counts the number of appearances of the letter at that index. 0th index stands for 'a', 1st for 'b' and so on.
This way, you can very easily get the index of an alphabetic character using letter - 'a', where letter is the alphabetic character. So if the letter was a, you'd get 0, which is indeed the index of 'a'
Also, you have a nested loop while traversing the key, this nested loop also traverses through the key. Except it does it only up until a certain index, the index being the current index of the outer loop. This seems wasteful and weird. Why not simply loop through once, check if current character is an alphabetic letter and also check if this letter has been encountered before. That's all you have to do!
int letter_presence[26];
char upperletter;
string key = argv[1];
if (strlen(key) == KEY_LEN)
{
for (int index = 0; index < KEY_LEN; index++)
{
if (!isalpha(key[index]))
{
// Wrong key - invalid character
printf("Usage: ./substitution key\n");
return 1;
}
if (letter_presence[tolower(key[index]) - 'a'] == 0)
{
// This letter has not been encountered before
letter_presence[upperletter - 'A'] = 1;
}
else
{
// Wrong key - Duplicate letters
return 1;
}
}
// All good
}

Minimum number of swaps to convert a string to palindrome

We are given a string and we have to find out the minimum number of swaps to convert it into a palindrome.
Ex-
Given string: ntiin
Palindrome: nitin
Minimum number of swaps: 1
If it is not possible to convert it into a palindrome, return -1.
I am unable to think of any approach except brute force. We can check on the first and last characters, if they are equal, we check for the smaller substring, and then apply brute force on it. But this will be of a very high complexity, and I feel this question can be solved in another way. Maybe dynamic programming. How to approach it?
First you could check if the string can be converted to a palindrome.
Just have an array of letters (26 chars if all letters are latin lowercase), and count the number of each letter in the input string.
If string length is even, all letters counts should be even.
If string length is odd, all letters counts should be even except one.
This first pass in O(n) will already treat all -1 cases.
If the string length is odd, start by moving the element with odd count to the middle.
Then you can apply following procedure:
Build a weighted graph with the following logic for an input string S of length N:
For every element from index 0 to N/2-1:
- If symmetric element S[N-index-1] is same continue
- If different, create edge between the 2 characters (alphabetic order), or increment weight of an existing one
The idea is that when a weight is even you can do a 'good swap' by forming two pairs in one swap.
When weight is odd, you cannot place two pairs in one swap, your swaps need to form a cycle
1. For instance "a b a b"
One edge between a,b of weight 2:
a - b (2)
Return 1
2. For instance: "a b c b a c"
a - c (1)
b - a (1)
c - b (1)
See the cycle: a - b, b - c, c - a
After a swap of a,c you get:
a - a (1)
b - c (1)
c - b (1)
Which is after ignoring first one and merge 2 & 3:
c - b (2)
Which is even, you get to the result in one swap
Return 2
3. For instance: "a b c a b c"
a - c (2)
One swap and you are good
So basically after your graph is generated, add to the result the weight/2 (integer division e.g. 7/3 = 3) of each edge
Plus find the cycles and add to the result length-1 of each cycle
there is the same question as asked!
https://www.codechef.com/problems/ENCD12
I got ac for this solution
https://www.ideone.com/8wF9DT
//minimum adjacent swaps to make a string to its palindrome
#include<bits/stdc++.h>
using namespace std;
bool check(string s)
{
int n=s.length();
map<char,int> m;
for(auto i:s)
{
m[i]++;
}
int cnt=0;
for(auto i=m.begin();i!=m.end();i++)
{
if(i->second%2)
{
cnt++;
}
}
if(n%2&&cnt==1){return true;}
if(!(n%2)&&cnt==0){return true;}
return false;
}
int main()
{
string a;
while(cin>>a)
{
if(a[0]=='0')
{
break;
}
string s;s=a;
int n=s.length();
//first check if
int cnt=0;
bool ini=false;
if(n%2){ini=true;}
if(check(s))
{
for(int i=0;i<n/2;i++)
{
bool fl=false;
int j=0;
for(j=n-1-i;j>i;j--)
{
if(s[j]==s[i])
{
fl=true;
for(int k=j;k<n-1-i;k++)
{
swap(s[k],s[k+1]);
cnt++;
// cout<<cnt<<endl<<flush;
}
// cout<<" "<<i<<" "<<cnt<<endl<<flush;
break;
}
}
if(!fl&&ini)
{
for(int k=i;k<n/2;k++)
{
swap(s[k],s[k+1]);
cnt++;
}
// cout<<cnt<<" "<<i<<" "<<endl<<flush;
}
}
cout<<cnt<<endl;
}
else{
cout<<"Impossible"<<endl;
}
}
}
Hope it helps!
Technique behind my code is Greedy
first check if palindrome string can exist for the the string and if it can
there would be two cases one is when the string length would be odd then only count of one char has be odd
and if even then no count should be odd
then
from index 0 to n/2-1 do the following
fix this character and search for this char from n-i-1 to i+1
if found then swap from that position (lets say j) to its new position n-i-1
if the string length is odd then every time you encounter a char with no other occurence shift it to n/2th position..
My solution revolves around the palindrome property that first element and last element should match and if their adjacent elements also do not match then its not a palindrome. Keep comparing and swapping till both reach the same element or adjacent elements.
Written solution in java as below:
public static void main(String args[]){
String input = "natinat";
char[] arr = input.toCharArray();
int swap = 0;
int i = 0;
int j = arr.length-1;
char temp;
while(i<j){
if(arr[i] != arr[j]){
if(arr[i+1] == arr[j]){
//swap i and i+1 and increment i, decrement j, swap++
temp = arr[i];
arr[i] = arr[i+1];
arr[i+1] = temp;
i++;j--;
swap++;
} else if(arr[i] == arr[j-1]){
//swap j and j-1 and increment i, decrement j, swap++
temp = arr[j];
arr[j] = arr[j-1];
arr[j-1] = temp;
i++;j--;
swap++;
} else if(arr[i+1] == arr[j-1] && i+1 != j-1){
//swap i and i+1, swap j and j-1 and increment i, decrement j, swap+2
temp = arr[j];
arr[j] = arr[j-1];
arr[j-1] = temp;
temp = arr[i];
arr[i] = arr[i+1];
arr[i+1] = temp;
i++;j--;
swap = swap+2;
}else{
swap = -1;break;
}
} else{
//increment i, decrement j
i++;j--;
}
}
System.out.println("No Of Swaps: "+swap);
}
My solution in java for any type of string i.e Binary String, Numbers
public int countSwapInPalindrome(String s){
int length = s.length();
if (length == 0 || length == 1) return -1;
char[] str = s.toCharArray();
int start = 0, end = length - 1;
int count = 0;
while (start < end) {
if (str[start] != str[end]){
boolean isSwapped = false;
for (int i = start + 1; i < end; i++){
if (str[start] == str[i]){
char temp = str[i];
str[i] = str[end];
str[end] = temp;
count++;
isSwapped = true;
break;
}else if (str[end] == str[i]){
char temp = str[i];
str[i] = str[start];
str[start] = temp;
count++;
isSwapped = true;
break;
}
}
if (!isSwapped) return -1;
}
start++;
end--;
}
return (s.equals(String.valueOf(str))) ? -1 : count;
}
I hope it helps
string s;
cin>>s;
int n = s.size(),odd=0;
vi cnt(26,0);
unordered_map<int,set<int>>mp;
for(int i=0;i<n;i++){
cnt[s[i]-'a']++;
mp[s[i]-'a'].insert(i);
}
for(int i=0;i<26;i++){
if(cnt[i]&1) odd++;
}
int ans=0;
if((n&1 && odd == 1)|| ((n&1) == 0 && odd == 0)){
int left=0,right=n-1;
while(left < right){
if(s[left] == s[right]){
cnt[left]--;
cnt[right]--;
mp[s[left]-'a'].erase(left);
mp[s[right]-'a'].erase(right);
left++;
right--;
}else{
if(cnt[left]&1 == 0){
ans++;
int index = *mp[s[left]-'a'].rbegin();
mp[s[left]-'a'].erase(index);
mp[s[right]-'a'].erase(right);
mp[s[right]-'a'].insert(index);
swap(s[right],s[index]);
cnt[left]-=2;
}else{
ans++;
int index = *mp[s[right]-'a'].begin();
mp[s[right]-'a'].erase(index);
mp[s[left]-'a'].erase(left);
mp[s[left]-'a'].insert(index);
swap(s[left],s[index]);
cnt[right]-=2;
}
left++;
right--;
}
}
}else{
// cout<<odd<<" ";
cout<<"-1\n";
return;
}
cout<<ans<<"\n";

Finding the ranking of a word (permutations) with duplicate letters

I'm posting this although much has already been posted about this question. I didn't want to post as an answer since it's not working. The answer to this post (Finding the rank of the Given string in list of all possible permutations with Duplicates) did not work for me.
So I tried this (which is a compilation of code I've plagiarized and my attempt to deal with repetitions). The non-repeating cases work fine. BOOKKEEPER generates 83863, not the desired 10743.
(The factorial function and letter counter array 'repeats' are working correctly. I didn't post to save space.)
while (pointer != length)
{
if (sortedWordChars[pointer] != wordArray[pointer])
{
// Swap the current character with the one after that
char temp = sortedWordChars[pointer];
sortedWordChars[pointer] = sortedWordChars[next];
sortedWordChars[next] = temp;
next++;
//For each position check how many characters left have duplicates,
//and use the logic that if you need to permute n things and if 'a' things
//are similar the number of permutations is n!/a!
int ct = repeats[(sortedWordChars[pointer]-64)];
// Increment the rank
if (ct>1) { //repeats?
System.out.println("repeating " + (sortedWordChars[pointer]-64));
//In case of repetition of any character use: (n-1)!/(times)!
//e.g. if there is 1 character which is repeating twice,
//x* (n-1)!/2!
int dividend = getFactorialIter(length - pointer - 1);
int divisor = getFactorialIter(ct);
int quo = dividend/divisor;
rank += quo;
} else {
rank += getFactorialIter(length - pointer - 1);
}
} else
{
pointer++;
next = pointer + 1;
}
}
Note: this answer is for 1-based rankings, as specified implicitly by example. Here's some Python that works at least for the two examples provided. The key fact is that suffixperms * ctr[y] // ctr[x] is the number of permutations whose first letter is y of the length-(i + 1) suffix of perm.
from collections import Counter
def rankperm(perm):
rank = 1
suffixperms = 1
ctr = Counter()
for i in range(len(perm)):
x = perm[((len(perm) - 1) - i)]
ctr[x] += 1
for y in ctr:
if (y < x):
rank += ((suffixperms * ctr[y]) // ctr[x])
suffixperms = ((suffixperms * (i + 1)) // ctr[x])
return rank
print(rankperm('QUESTION'))
print(rankperm('BOOKKEEPER'))
Java version:
public static long rankPerm(String perm) {
long rank = 1;
long suffixPermCount = 1;
java.util.Map<Character, Integer> charCounts =
new java.util.HashMap<Character, Integer>();
for (int i = perm.length() - 1; i > -1; i--) {
char x = perm.charAt(i);
int xCount = charCounts.containsKey(x) ? charCounts.get(x) + 1 : 1;
charCounts.put(x, xCount);
for (java.util.Map.Entry<Character, Integer> e : charCounts.entrySet()) {
if (e.getKey() < x) {
rank += suffixPermCount * e.getValue() / xCount;
}
}
suffixPermCount *= perm.length() - i;
suffixPermCount /= xCount;
}
return rank;
}
Unranking permutations:
from collections import Counter
def unrankperm(letters, rank):
ctr = Counter()
permcount = 1
for i in range(len(letters)):
x = letters[i]
ctr[x] += 1
permcount = (permcount * (i + 1)) // ctr[x]
# ctr is the histogram of letters
# permcount is the number of distinct perms of letters
perm = []
for i in range(len(letters)):
for x in sorted(ctr.keys()):
# suffixcount is the number of distinct perms that begin with x
suffixcount = permcount * ctr[x] // (len(letters) - i)
if rank <= suffixcount:
perm.append(x)
permcount = suffixcount
ctr[x] -= 1
if ctr[x] == 0:
del ctr[x]
break
rank -= suffixcount
return ''.join(perm)
If we use mathematics, the complexity will come down and will be able to find rank quicker. This will be particularly helpful for large strings.
(more details can be found here)
Suggest to programmatically define the approach shown here (screenshot attached below) given below)
I would say David post (the accepted answer) is super cool. However, I would like to improve it further for speed. The inner loop is trying to find inverse order pairs, and for each such inverse order, it tries to contribute to the increment of rank. If we use an ordered map structure (binary search tree or BST) in that place, we can simply do an inorder traversal from the first node (left-bottom) until it reaches the current character in the BST, rather than traversal for the whole map(BST). In C++, std::map is a perfect one for BST implementation. The following code reduces the necessary iterations in loop and removes the if check.
long long rankofword(string s)
{
long long rank = 1;
long long suffixPermCount = 1;
map<char, int> m;
int size = s.size();
for (int i = size - 1; i > -1; i--)
{
char x = s[i];
m[x]++;
for (auto it = m.begin(); it != m.find(x); it++)
rank += suffixPermCount * it->second / m[x];
suffixPermCount *= (size - i);
suffixPermCount /= m[x];
}
return rank;
}
#Dvaid Einstat, this was really helpful. It took me a WHILE to figure out what you were doing as I am still learning my first language(C#). I translated it into C# and figured that I'd give that solution as well since this listing helped me so much!
Thanks!
using System;
using System.Collections.Generic;
using System.Linq;
using System.Text;
using System.Threading.Tasks;
using System.Text.RegularExpressions;
namespace CsharpVersion
{
class Program
{
//Takes in the word and checks to make sure that the word
//is between 1 and 25 charaters inclusive and only
//letters are used
static string readWord(string prompt, int high)
{
Regex rgx = new Regex("^[a-zA-Z]+$");
string word;
string result;
do
{
Console.WriteLine(prompt);
word = Console.ReadLine();
} while (word == "" | word.Length > high | rgx.IsMatch(word) == false);
result = word.ToUpper();
return result;
}
//Creates a sorted dictionary containing distinct letters
//initialized with 0 frequency
static SortedDictionary<char,int> Counter(string word)
{
char[] wordArray = word.ToCharArray();
int len = word.Length;
SortedDictionary<char,int> count = new SortedDictionary<char,int>();
foreach(char c in word)
{
if(count.ContainsKey(c))
{
}
else
{
count.Add(c, 0);
}
}
return count;
}
//Creates a factorial function
static int Factorial(int n)
{
if (n <= 1)
{
return 1;
}
else
{
return n * Factorial(n - 1);
}
}
//Ranks the word input if there are no repeated charaters
//in the word
static Int64 rankWord(char[] wordArray)
{
int n = wordArray.Length;
Int64 rank = 1;
//loops through the array of letters
for (int i = 0; i < n-1; i++)
{
int x=0;
//loops all letters after i and compares them for factorial calculation
for (int j = i+1; j<n ; j++)
{
if (wordArray[i] > wordArray[j])
{
x++;
}
}
rank = rank + x * (Factorial(n - i - 1));
}
return rank;
}
//Ranks the word input if there are repeated charaters
//in the word
static Int64 rankPerm(String word)
{
Int64 rank = 1;
Int64 suffixPermCount = 1;
SortedDictionary<char, int> counter = Counter(word);
for (int i = word.Length - 1; i > -1; i--)
{
char x = Convert.ToChar(word.Substring(i,1));
int xCount;
if(counter[x] != 0)
{
xCount = counter[x] + 1;
}
else
{
xCount = 1;
}
counter[x] = xCount;
foreach (KeyValuePair<char,int> e in counter)
{
if (e.Key < x)
{
rank += suffixPermCount * e.Value / xCount;
}
}
suffixPermCount *= word.Length - i;
suffixPermCount /= xCount;
}
return rank;
}
static void Main(string[] args)
{
Console.WriteLine("Type Exit to end the program.");
string prompt = "Please enter a word using only letters:";
const int MAX_VALUE = 25;
Int64 rank = new Int64();
string theWord;
do
{
theWord = readWord(prompt, MAX_VALUE);
char[] wordLetters = theWord.ToCharArray();
Array.Sort(wordLetters);
bool duplicate = false;
for(int i = 0; i< theWord.Length - 1; i++)
{
if(wordLetters[i] < wordLetters[i+1])
{
duplicate = true;
}
}
if(duplicate)
{
SortedDictionary<char, int> counter = Counter(theWord);
rank = rankPerm(theWord);
Console.WriteLine("\n" + theWord + " = " + rank);
}
else
{
char[] letters = theWord.ToCharArray();
rank = rankWord(letters);
Console.WriteLine("\n" + theWord + " = " + rank);
}
} while (theWord != "EXIT");
Console.WriteLine("\nPress enter to escape..");
Console.Read();
}
}
}
If there are k distinct characters, the i^th character repeated n_i times, then the total number of permutations is given by
(n_1 + n_2 + ..+ n_k)!
------------------------------------------------
n_1! n_2! ... n_k!
which is the multinomial coefficient.
Now we can use this to compute the rank of a given permutation as follows:
Consider the first character(leftmost). say it was the r^th one in the sorted order of characters.
Now if you replace the first character by any of the 1,2,3,..,(r-1)^th character and consider all possible permutations, each of these permutations will precede the given permutation. The total number can be computed using the above formula.
Once you compute the number for the first character, fix the first character, and repeat the same with the second character and so on.
Here's the C++ implementation to your question
#include<iostream>
using namespace std;
int fact(int f) {
if (f == 0) return 1;
if (f <= 2) return f;
return (f * fact(f - 1));
}
int solve(string s,int n) {
int ans = 1;
int arr[26] = {0};
int len = n - 1;
for (int i = 0; i < n; i++) {
s[i] = toupper(s[i]);
arr[s[i] - 'A']++;
}
for(int i = 0; i < n; i++) {
int temp = 0;
int x = 1;
char c = s[i];
for(int j = 0; j < c - 'A'; j++) temp += arr[j];
for (int j = 0; j < 26; j++) x = (x * fact(arr[j]));
arr[c - 'A']--;
ans = ans + (temp * ((fact(len)) / x));
len--;
}
return ans;
}
int main() {
int i,n;
string s;
cin>>s;
n=s.size();
cout << solve(s,n);
return 0;
}
Java version of unrank for a String:
public static String unrankperm(String letters, int rank) {
Map<Character, Integer> charCounts = new java.util.HashMap<>();
int permcount = 1;
for(int i = 0; i < letters.length(); i++) {
char x = letters.charAt(i);
int xCount = charCounts.containsKey(x) ? charCounts.get(x) + 1 : 1;
charCounts.put(x, xCount);
permcount = (permcount * (i + 1)) / xCount;
}
// charCounts is the histogram of letters
// permcount is the number of distinct perms of letters
StringBuilder perm = new StringBuilder();
for(int i = 0; i < letters.length(); i++) {
List<Character> sorted = new ArrayList<>(charCounts.keySet());
Collections.sort(sorted);
for(Character x : sorted) {
// suffixcount is the number of distinct perms that begin with x
Integer frequency = charCounts.get(x);
int suffixcount = permcount * frequency / (letters.length() - i);
if (rank <= suffixcount) {
perm.append(x);
permcount = suffixcount;
if(frequency == 1) {
charCounts.remove(x);
} else {
charCounts.put(x, frequency - 1);
}
break;
}
rank -= suffixcount;
}
}
return perm.toString();
}
See also n-th-permutation-algorithm-for-use-in-brute-force-bin-packaging-parallelization.

Find the index of a specific combination without generating all ncr combinations

I am trying to find the index of a specific combination without generating the actual list of all possible combinations. For ex: 2 number combinations from 1 to 5 produces, 1,2;1,3,1,4,1,5;2,3,2,4,2,5..so..on. Each combination has its own index starting with zero,if my guess is right. I want to find that index without generating the all possible combination for a given combination. I am writing in C# but my code generates all possible combinations on fly. This would be expensive if n and r are like 80 and 9 and i even can't enumerate the actual range. Is there any possible way to find the index without producing the actual combination for that particular index
public int GetIndex(T[] combination)
{
int index = (from i in Enumerable.Range(0, 9)
where AreEquivalentArray(GetCombination(i), combination)
select i).SingleOrDefault();
return index;
}
I found the answer to my own question in simple terms. It is very simple but seems to be effective in my situation.The choose method is brought from other site though which generates the combinations count for n items chosen r:
public long GetIndex(T[] combinations)
{
long sum = Choose(items.Count(),atATime);
for (int i = 0; i < combinations.Count(); i++)
{
sum = sum - Choose(items.ToList().IndexOf(items.Max())+1 - (items.ToList().IndexOf(combinations[i])+1), atATime - i);
}
return sum-1;
}
private long Choose(int n, int k)
{
long result = 0;
int delta;
int max;
if (n < 0 || k < 0)
{
throw new ArgumentOutOfRangeException("Invalid negative parameter in Choose()");
}
if (n < k)
{
result = 0;
}
else if (n == k)
{
result = 1;
}
else
{
if (k < n - k)
{
delta = n - k;
max = k;
}
else
{
delta = k;
max = n - k;
}
result = delta + 1;
for (int i = 2; i <= max; i++)
{
checked
{
result = (result * (delta + i)) / i;
}
}
}
return result;
}

How to find smallest substring which contains all characters from a given string?

I have recently come across an interesting question on strings. Suppose you are given following:
Input string1: "this is a test string"
Input string2: "tist"
Output string: "t stri"
So, given above, how can I approach towards finding smallest substring of string1 that contains all the characters from string 2?
To see more details including working code, check my blog post at:
http://www.leetcode.com/2010/11/finding-minimum-window-in-s-which.html
To help illustrate this approach, I use an example: string1 = "acbbaca" and string2 = "aba". Here, we also use the term "window", which means a contiguous block of characters from string1 (could be interchanged with the term substring).
i) string1 = "acbbaca" and string2 = "aba".
ii) The first minimum window is found.
Notice that we cannot advance begin
pointer as hasFound['a'] ==
needToFind['a'] == 2. Advancing would
mean breaking the constraint.
iii) The second window is found. begin
pointer still points to the first
element 'a'. hasFound['a'] (3) is
greater than needToFind['a'] (2). We
decrement hasFound['a'] by one and
advance begin pointer to the right.
iv) We skip 'c' since it is not found
in string2. Begin pointer now points to 'b'.
hasFound['b'] (2) is greater than
needToFind['b'] (1). We decrement
hasFound['b'] by one and advance begin
pointer to the right.
v) Begin pointer now points to the
next 'b'. hasFound['b'] (1) is equal
to needToFind['b'] (1). We stop
immediately and this is our newly
found minimum window.
The idea is mainly based on the help of two pointers (begin and end position of the window) and two tables (needToFind and hasFound) while traversing string1. needToFind stores the total count of a character in string2 and hasFound stores the total count of a character met so far. We also use a count variable to store the total characters in string2 that's met so far (not counting characters where hasFound[x] exceeds needToFind[x]). When count equals string2's length, we know a valid window is found.
Each time we advance the end pointer (pointing to an element x), we increment hasFound[x] by one. We also increment count by one if hasFound[x] is less than or equal to needToFind[x]. Why? When the constraint is met (that is, count equals to string2's size), we immediately advance begin pointer as far right as possible while maintaining the constraint.
How do we check if it is maintaining the constraint? Assume that begin points to an element x, we check if hasFound[x] is greater than needToFind[x]. If it is, we can decrement hasFound[x] by one and advancing begin pointer without breaking the constraint. On the other hand, if it is not, we stop immediately as advancing begin pointer breaks the window constraint.
Finally, we check if the minimum window length is less than the current minimum. Update the current minimum if a new minimum is found.
Essentially, the algorithm finds the first window that satisfies the constraint, then continue maintaining the constraint throughout.
You can do a histogram sweep in O(N+M) time and O(1) space where N is the number of characters in the first string and M is the number of characters in the second.
It works like this:
Make a histogram of the second string's characters (key operation is hist2[ s2[i] ]++).
Make a cumulative histogram of the first string's characters until that histogram contains every character that the second string's histogram contains (which I will call "the histogram condition").
Then move forwards on the first string, subtracting from the histogram, until it fails to meet the histogram condition. Mark that bit of the first string (before the final move) as your tentative substring.
Move the front of the substring forwards again until you meet the histogram condition again. Move the end forwards until it fails again. If this is a shorter substring than the first, mark that as your tentative substring.
Repeat until you've passed through the entire first string.
The marked substring is your answer.
Note that by varying the check you use on the histogram condition, you can choose either to have the same set of characters as the second string, or at least as many characters of each type. (Its just the difference between a[i]>0 && b[i]>0 and a[i]>=b[i].)
You can speed up the histogram checks if you keep a track of which condition is not satisfied when you're trying to satisfy it, and checking only the thing that you decrement when you're trying to break it. (On the initial buildup, you count how many items you've satisfied, and increment that count every time you add a new character that takes the condition from false to true.)
Here's an O(n) solution. The basic idea is simple: for each starting index, find the least ending index such that the substring contains all of the necessary letters. The trick is that the least ending index increases over the course of the function, so with a little data structure support, we consider each character at most twice.
In Python:
from collections import defaultdict
def smallest(s1, s2):
assert s2 != ''
d = defaultdict(int)
nneg = [0] # number of negative entries in d
def incr(c):
d[c] += 1
if d[c] == 0:
nneg[0] -= 1
def decr(c):
if d[c] == 0:
nneg[0] += 1
d[c] -= 1
for c in s2:
decr(c)
minlen = len(s1) + 1
j = 0
for i in xrange(len(s1)):
while nneg[0] > 0:
if j >= len(s1):
return minlen
incr(s1[j])
j += 1
minlen = min(minlen, j - i)
decr(s1[i])
return minlen
I received the same interview question. I am a C++ candidate but I was in a position to code relatively fast in JAVA.
Java [Courtesy : Sumod Mathilakath]
import java.io.*;
import java.util.*;
class UserMainCode
{
public String GetSubString(String input1,String input2){
// Write code here...
return find(input1, input2);
}
private static boolean containsPatternChar(int[] sCount, int[] pCount) {
for(int i=0;i<256;i++) {
if(pCount[i]>sCount[i])
return false;
}
return true;
}
public static String find(String s, String p) {
if (p.length() > s.length())
return null;
int[] pCount = new int[256];
int[] sCount = new int[256];
// Time: O(p.lenght)
for(int i=0;i<p.length();i++) {
pCount[(int)(p.charAt(i))]++;
sCount[(int)(s.charAt(i))]++;
}
int i = 0, j = p.length(), min = Integer.MAX_VALUE;
String res = null;
// Time: O(s.lenght)
while (j < s.length()) {
if (containsPatternChar(sCount, pCount)) {
if ((j - i) < min) {
min = j - i;
res = s.substring(i, j);
// This is the smallest possible substring.
if(min==p.length())
break;
// Reduce the window size.
sCount[(int)(s.charAt(i))]--;
i++;
}
} else {
sCount[(int)(s.charAt(j))]++;
// Increase the window size.
j++;
}
}
System.out.println(res);
return res;
}
}
C++ [Courtesy : sundeepblue]
#include <iostream>
#include <vector>
#include <string>
#include <climits>
using namespace std;
string find_minimum_window(string s, string t) {
if(s.empty() || t.empty()) return;
int ns = s.size(), nt = t.size();
vector<int> total(256, 0);
vector<int> sofar(256, 0);
for(int i=0; i<nt; i++)
total[t[i]]++;
int L = 0, R;
int minL = 0; //gist2
int count = 0;
int min_win_len = INT_MAX;
for(R=0; R<ns; R++) { // gist0, a big for loop
if(total[s[R]] == 0) continue;
else sofar[s[R]]++;
if(sofar[s[R]] <= total[s[R]]) // gist1, <= not <
count++;
if(count == nt) { // POS1
while(true) {
char c = s[L];
if(total[c] == 0) { L++; }
else if(sofar[c] > total[c]) {
sofar[c]--;
L++;
}
else break;
}
if(R - L + 1 < min_win_len) { // this judge should be inside POS1
min_win_len = R - L + 1;
minL = L;
}
}
}
string res;
if(count == nt) // gist3, cannot forget this.
res = s.substr(minL, min_win_len); // gist4, start from "minL" not "L"
return res;
}
int main() {
string s = "abdccdedca";
cout << find_minimum_window(s, "acd");
}
Erlang [Courtesy : wardbekker]
-module(leetcode).
-export([min_window/0]).
%% Given a string S and a string T, find the minimum window in S which will contain all the characters in T in complexity O(n).
%% For example,
%% S = "ADOBECODEBANC"
%% T = "ABC"
%% Minimum window is "BANC".
%% Note:
%% If there is no such window in S that covers all characters in T, return the emtpy string "".
%% If there are multiple such windows, you are guaranteed that there will always be only one unique minimum window in S.
min_window() ->
"eca" = min_window("cabeca", "cae"),
"eca" = min_window("cfabeca", "cae"),
"aec" = min_window("cabefgecdaecf", "cae"),
"cwae" = min_window("cabwefgewcwaefcf", "cae"),
"BANC" = min_window("ADOBECODEBANC", "ABC"),
ok.
min_window(T, S) ->
min_window(T, S, []).
min_window([], _T, MinWindow) ->
MinWindow;
min_window([H | Rest], T, MinWindow) ->
NewMinWindow = case lists:member(H, T) of
true ->
MinWindowFound = fullfill_window(Rest, lists:delete(H, T), [H]),
case length(MinWindow) == 0 orelse (length(MinWindow) > length(MinWindowFound)
andalso length(MinWindowFound) > 0) of
true ->
MinWindowFound;
false ->
MinWindow
end;
false ->
MinWindow
end,
min_window(Rest, T, NewMinWindow).
fullfill_window(_, [], Acc) ->
%% window completed
Acc;
fullfill_window([], _T, _Acc) ->
%% no window found
"";
fullfill_window([H | Rest], T, Acc) ->
%% completing window
case lists:member(H, T) of
true ->
fullfill_window(Rest, lists:delete(H, T), Acc ++ [H]);
false ->
fullfill_window(Rest, T, Acc ++ [H])
end.
REF:
http://articles.leetcode.com/finding-minimum-window-in-s-which/#comment-511216
http://www.mif.vu.lt/~valdas/ALGORITMAI/LITERATURA/Cormen/Cormen.pdf
Please have a look at this as well:
//-----------------------------------------------------------------------
bool IsInSet(char ch, char* cSet)
{
char* cSetptr = cSet;
int index = 0;
while (*(cSet+ index) != '\0')
{
if(ch == *(cSet+ index))
{
return true;
}
++index;
}
return false;
}
void removeChar(char ch, char* cSet)
{
bool bShift = false;
int index = 0;
while (*(cSet + index) != '\0')
{
if( (ch == *(cSet + index)) || bShift)
{
*(cSet + index) = *(cSet + index + 1);
bShift = true;
}
++index;
}
}
typedef struct subStr
{
short iStart;
short iEnd;
short szStr;
}ss;
char* subStringSmallest(char* testStr, char* cSet)
{
char* subString = NULL;
int iSzSet = strlen(cSet) + 1;
int iSzString = strlen(testStr)+ 1;
char* cSetBackUp = new char[iSzSet];
memcpy((void*)cSetBackUp, (void*)cSet, iSzSet);
int iStartIndx = -1;
int iEndIndx = -1;
int iIndexStartNext = -1;
std::vector<ss> subStrVec;
int index = 0;
while( *(testStr+index) != '\0' )
{
if (IsInSet(*(testStr+index), cSetBackUp))
{
removeChar(*(testStr+index), cSetBackUp);
if(iStartIndx < 0)
{
iStartIndx = index;
}
else if( iIndexStartNext < 0)
iIndexStartNext = index;
else
;
if (strlen(cSetBackUp) == 0 )
{
iEndIndx = index;
if( iIndexStartNext == -1)
break;
else
{
index = iIndexStartNext;
ss stemp = {iStartIndx, iEndIndx, (iEndIndx-iStartIndx + 1)};
subStrVec.push_back(stemp);
iStartIndx = iEndIndx = iIndexStartNext = -1;
memcpy((void*)cSetBackUp, (void*)cSet, iSzSet);
continue;
}
}
}
else
{
if (IsInSet(*(testStr+index), cSet))
{
if(iIndexStartNext < 0)
iIndexStartNext = index;
}
}
++index;
}
int indexSmallest = 0;
for(int indexVec = 0; indexVec < subStrVec.size(); ++indexVec)
{
if(subStrVec[indexSmallest].szStr > subStrVec[indexVec].szStr)
indexSmallest = indexVec;
}
subString = new char[(subStrVec[indexSmallest].szStr) + 1];
memcpy((void*)subString, (void*)(testStr+ subStrVec[indexSmallest].iStart), subStrVec[indexSmallest].szStr);
memset((void*)(subString + subStrVec[indexSmallest].szStr), 0, 1);
delete[] cSetBackUp;
return subString;
}
//--------------------------------------------------------------------
Edit: apparently there's an O(n) algorithm (cf. algorithmist's answer). Obviously this have this will beat the [naive] baseline described below!
Too bad I gotta go... I'm a bit suspicious that we can get O(n). I'll check in tomorrow to see the winner ;-) Have fun!
Tentative algorithm:
The general idea is to sequentially try and use a character from str2 found in str1 as the start of a search (in either/both directions) of all the other letters of str2. By keeping a "length of best match so far" value, we can abort searches when they exceed this. Other heuristics can probably be used to further abort suboptimal (so far) solutions. The choice of the order of the starting letters in str1 matters much; it is suggested to start with the letter(s) of str1 which have the lowest count and to try with the other letters, of an increasing count, in subsequent attempts.
[loose pseudo-code]
- get count for each letter/character in str1 (number of As, Bs etc.)
- get count for each letter in str2
- minLen = length(str1) + 1 (the +1 indicates you're not sure all chars of
str2 are in str1)
- Starting with the letter from string2 which is found the least in string1,
look for other letters of Str2, in either direction of str1, until you've
found them all (or not, at which case response = impossible => done!).
set x = length(corresponding substring of str1).
- if (x < minLen),
set minlen = x,
also memorize the start/len of the str1 substring.
- continue trying with other letters of str1 (going the up the frequency
list in str1), but abort search as soon as length(substring of strl)
reaches or exceed minLen.
We can find a few other heuristics that would allow aborting a
particular search, based on [pre-calculated ?] distance between a given
letter in str1 and some (all?) of the letters in str2.
- the overall search terminates when minLen = length(str2) or when
we've used all letters of str1 (which match one letter of str2)
as a starting point for the search
Here is Java implementation
public static String shortestSubstrContainingAllChars(String input, String target) {
int needToFind[] = new int[256];
int hasFound[] = new int[256];
int totalCharCount = 0;
String result = null;
char[] targetCharArray = target.toCharArray();
for (int i = 0; i < targetCharArray.length; i++) {
needToFind[targetCharArray[i]]++;
}
char[] inputCharArray = input.toCharArray();
for (int begin = 0, end = 0; end < inputCharArray.length; end++) {
if (needToFind[inputCharArray[end]] == 0) {
continue;
}
hasFound[inputCharArray[end]]++;
if (hasFound[inputCharArray[end]] <= needToFind[inputCharArray[end]]) {
totalCharCount ++;
}
if (totalCharCount == target.length()) {
while (needToFind[inputCharArray[begin]] == 0
|| hasFound[inputCharArray[begin]] > needToFind[inputCharArray[begin]]) {
if (hasFound[inputCharArray[begin]] > needToFind[inputCharArray[begin]]) {
hasFound[inputCharArray[begin]]--;
}
begin++;
}
String substring = input.substring(begin, end + 1);
if (result == null || result.length() > substring.length()) {
result = substring;
}
}
}
return result;
}
Here is the Junit Test
#Test
public void shortestSubstringContainingAllCharsTest() {
String result = StringUtil.shortestSubstrContainingAllChars("acbbaca", "aba");
assertThat(result, equalTo("baca"));
result = StringUtil.shortestSubstrContainingAllChars("acbbADOBECODEBANCaca", "ABC");
assertThat(result, equalTo("BANC"));
result = StringUtil.shortestSubstrContainingAllChars("this is a test string", "tist");
assertThat(result, equalTo("t stri"));
}
//[ShortestSubstring.java][1]
public class ShortestSubstring {
public static void main(String[] args) {
String input1 = "My name is Fran";
String input2 = "rim";
System.out.println(getShortestSubstring(input1, input2));
}
private static String getShortestSubstring(String mainString, String toBeSearched) {
int mainStringLength = mainString.length();
int toBeSearchedLength = toBeSearched.length();
if (toBeSearchedLength > mainStringLength) {
throw new IllegalArgumentException("search string cannot be larger than main string");
}
for (int j = 0; j < mainStringLength; j++) {
for (int i = 0; i <= mainStringLength - toBeSearchedLength; i++) {
String substring = mainString.substring(i, i + toBeSearchedLength);
if (checkIfMatchFound(substring, toBeSearched)) {
return substring;
}
}
toBeSearchedLength++;
}
return null;
}
private static boolean checkIfMatchFound(String substring, String toBeSearched) {
char[] charArraySubstring = substring.toCharArray();
char[] charArrayToBeSearched = toBeSearched.toCharArray();
int count = 0;
for (int i = 0; i < charArraySubstring.length; i++) {
for (int j = 0; j < charArrayToBeSearched.length; j++) {
if (String.valueOf(charArraySubstring[i]).equalsIgnoreCase(String.valueOf(charArrayToBeSearched[j]))) {
count++;
}
}
}
return count == charArrayToBeSearched.length;
}
}
This is an approach using prime numbers to avoid one loop, and replace it with multiplications. Several other minor optimizations can be made.
Assign a unique prime number to any of the characters that you want to find, and 1 to the uninteresting characters.
Find the product of a matching string by multiplying the prime number with the number of occurrences it should have. Now this product can only be found if the same prime factors are used.
Search the string from the beginning, multiplying the respective prime number as you move into a running product.
If the number is greater than the correct sum, remove the first character and divide its prime number out of your running product.
If the number is less than the correct sum, include the next character and multiply it into your running product.
If the number is the same as the correct sum you have found a match, slide beginning and end to next character and continue searching for other matches.
Decide which of the matches is the shortest.
Gist
charcount = { 'a': 3, 'b' : 1 };
str = "kjhdfsbabasdadaaaaasdkaaajbajerhhayeom"
def find (c, s):
Ns = len (s)
C = list (c.keys ())
D = list (c.values ())
# prime numbers assigned to the first 25 chars
prmsi = [ 2, 3, 5, 7, 11, 13, 17, 19, 23, 29, 31, 37, 41, 43, 47, 53, 59, 61, 67, 71, 73, 79, 83, 89 , 97]
# primes used in the key, all other set to 1
prms = []
Cord = [ord(c) - ord('a') for c in C]
for e,p in enumerate(prmsi):
if e in Cord:
prms.append (p)
else:
prms.append (1)
# Product of match
T = 1
for c,d in zip(C,D):
p = prms[ord (c) - ord('a')]
T *= p**d
print ("T=", T)
t = 1 # product of current string
f = 0
i = 0
matches = []
mi = 0
mn = Ns
mm = 0
while i < Ns:
k = prms[ord(s[i]) - ord ('a')]
t *= k
print ("testing:", s[f:i+1])
if (t > T):
# included too many chars: move start
t /= prms[ord(s[f]) - ord('a')] # remove first char, usually division by 1
f += 1 # increment start position
t /= k # will be retested, could be replaced with bool
elif t == T:
# found match
print ("FOUND match:", s[f:i+1])
matches.append (s[f:i+1])
if (i - f) < mn:
mm = mi
mn = i - f
mi += 1
t /= prms[ord(s[f]) - ord('a')] # remove first matching char
# look for next match
i += 1
f += 1
else:
# no match yet, keep searching
i += 1
return (mm, matches)
print (find (charcount, str))
(note: this answer was originally posted to a duplicate question, the original answer is now deleted.)
C# Implementation:
public static Tuple<int, int> FindMinSubstringWindow(string input, string pattern)
{
Tuple<int, int> windowCoords = new Tuple<int, int>(0, input.Length - 1);
int[] patternHist = new int[256];
for (int i = 0; i < pattern.Length; i++)
{
patternHist[pattern[i]]++;
}
int[] inputHist = new int[256];
int minWindowLength = int.MaxValue;
int count = 0;
for (int begin = 0, end = 0; end < input.Length; end++)
{
// Skip what's not in pattern.
if (patternHist[input[end]] == 0)
{
continue;
}
inputHist[input[end]]++;
// Count letters that are in pattern.
if (inputHist[input[end]] <= patternHist[input[end]])
{
count++;
}
// Window found.
if (count == pattern.Length)
{
// Remove extra instances of letters from pattern
// or just letters that aren't part of the pattern
// from the beginning.
while (patternHist[input[begin]] == 0 ||
inputHist[input[begin]] > patternHist[input[begin]])
{
if (inputHist[input[begin]] > patternHist[input[begin]])
{
inputHist[input[begin]]--;
}
begin++;
}
// Current window found.
int windowLength = end - begin + 1;
if (windowLength < minWindowLength)
{
windowCoords = new Tuple<int, int>(begin, end);
minWindowLength = windowLength;
}
}
}
if (count == pattern.Length)
{
return windowCoords;
}
return null;
}
I've implemented it using Python3 at O(N) efficiency:
def get(s, alphabet="abc"):
seen = {}
for c in alphabet:
seen[c] = 0
seen[s[0]] = 1
start = 0
end = 0
shortest_s = 0
shortest_e = 99999
while end + 1 < len(s):
while seen[s[start]] > 1:
seen[s[start]] -= 1
start += 1
# Constant time check:
if sum(seen.values()) == len(alphabet) and all(v == 1 for v in seen.values()) and \
shortest_e - shortest_s > end - start:
shortest_s = start
shortest_e = end
end += 1
seen[s[end]] += 1
return s[shortest_s: shortest_e + 1]
print(get("abbcac")) # Expected to return "bca"
String s = "xyyzyzyx";
String s1 = "xyz";
String finalString ="";
Map<Character,Integer> hm = new HashMap<>();
if(s1!=null && s!=null && s.length()>s1.length()){
for(int i =0;i<s1.length();i++){
if(hm.get(s1.charAt(i))!=null){
int k = hm.get(s1.charAt(i))+1;
hm.put(s1.charAt(i), k);
}else
hm.put(s1.charAt(i), 1);
}
Map<Character,Integer> t = new HashMap<>();
int start =-1;
for(int j=0;j<s.length();j++){
if(hm.get(s.charAt(j))!=null){
if(t.get(s.charAt(j))!=null){
if(t.get(s.charAt(j))!=hm.get(s.charAt(j))){
int k = t.get(s.charAt(j))+1;
t.put(s.charAt(j), k);
}
}else{
t.put(s.charAt(j), 1);
if(start==-1){
if(j+s1.length()>s.length()){
break;
}
start = j;
}
}
if(hm.equals(t)){
t = new HashMap<>();
if(finalString.length()<s.substring(start,j+1).length());
{
finalString=s.substring(start,j+1);
}
j=start;
start=-1;
}
}
}
JavaScript solution in bruteforce way:
function shortestSubStringOfUniqueChars(s){
var uniqueArr = [];
for(let i=0; i<s.length; i++){
if(uniqueArr.indexOf(s.charAt(i)) <0){
uniqueArr.push(s.charAt(i));
}
}
let windoww = uniqueArr.length;
while(windoww < s.length){
for(let i=0; i<s.length - windoww; i++){
let match = true;
let tempArr = [];
for(let j=0; j<uniqueArr.length; j++){
if(uniqueArr.indexOf(s.charAt(i+j))<0){
match = false;
break;
}
}
let checkStr
if(match){
checkStr = s.substr(i, windoww);
for(let j=0; j<uniqueArr.length; j++){
if(uniqueArr.indexOf(checkStr.charAt(j))<0){
match = false;
break;
}
}
}
if(match){
return checkStr;
}
}
windoww = windoww + 1;
}
}
console.log(shortestSubStringOfUniqueChars("ABA"));
# Python implementation
s = input('Enter the string : ')
s1 = input('Enter the substring to search : ')
l = [] # List to record all the matching combinations
check = all([char in s for char in s1])
if check == True:
for i in range(len(s1),len(s)+1) :
for j in range(0,i+len(s1)+2):
if (i+j) < len(s)+1:
cnt = 0
b = all([char in s[j:i+j] for char in s1])
if (b == True) :
l.append(s[j:i+j])
print('The smallest substring containing',s1,'is',l[0])
else:
print('Please enter a valid substring')
Java code for the approach discussed above:
private static Map<Character, Integer> frequency;
private static Set<Character> charsCovered;
private static Map<Character, Integer> encountered;
/**
* To set the first match index as an intial start point
*/
private static boolean hasStarted = false;
private static int currentStartIndex = 0;
private static int finalStartIndex = 0;
private static int finalEndIndex = 0;
private static int minLen = Integer.MAX_VALUE;
private static int currentLen = 0;
/**
* Whether we have already found the match and now looking for other
* alternatives.
*/
private static boolean isFound = false;
private static char currentChar;
public static String findSmallestSubStringWithAllChars(String big, String small) {
if (null == big || null == small || big.isEmpty() || small.isEmpty()) {
return null;
}
frequency = new HashMap<Character, Integer>();
instantiateFrequencyMap(small);
charsCovered = new HashSet<Character>();
int charsToBeCovered = frequency.size();
encountered = new HashMap<Character, Integer>();
for (int i = 0; i < big.length(); i++) {
currentChar = big.charAt(i);
if (frequency.containsKey(currentChar) && !isFound) {
if (!hasStarted && !isFound) {
hasStarted = true;
currentStartIndex = i;
}
updateEncounteredMapAndCharsCoveredSet(currentChar);
if (charsCovered.size() == charsToBeCovered) {
currentLen = i - currentStartIndex;
isFound = true;
updateMinLength(i);
}
} else if (frequency.containsKey(currentChar) && isFound) {
updateEncounteredMapAndCharsCoveredSet(currentChar);
if (currentChar == big.charAt(currentStartIndex)) {
encountered.put(currentChar, encountered.get(currentChar) - 1);
currentStartIndex++;
while (currentStartIndex < i) {
if (encountered.containsKey(big.charAt(currentStartIndex))
&& encountered.get(big.charAt(currentStartIndex)) > frequency.get(big
.charAt(currentStartIndex))) {
encountered.put(big.charAt(currentStartIndex),
encountered.get(big.charAt(currentStartIndex)) - 1);
} else if (encountered.containsKey(big.charAt(currentStartIndex))) {
break;
}
currentStartIndex++;
}
}
currentLen = i - currentStartIndex;
updateMinLength(i);
}
}
System.out.println("start: " + finalStartIndex + " finalEnd : " + finalEndIndex);
return big.substring(finalStartIndex, finalEndIndex + 1);
}
private static void updateMinLength(int index) {
if (minLen > currentLen) {
minLen = currentLen;
finalStartIndex = currentStartIndex;
finalEndIndex = index;
}
}
private static void updateEncounteredMapAndCharsCoveredSet(Character currentChar) {
if (encountered.containsKey(currentChar)) {
encountered.put(currentChar, encountered.get(currentChar) + 1);
} else {
encountered.put(currentChar, 1);
}
if (encountered.get(currentChar) >= frequency.get(currentChar)) {
charsCovered.add(currentChar);
}
}
private static void instantiateFrequencyMap(String str) {
for (char c : str.toCharArray()) {
if (frequency.containsKey(c)) {
frequency.put(c, frequency.get(c) + 1);
} else {
frequency.put(c, 1);
}
}
}
public static void main(String[] args) {
String big = "this is a test string";
String small = "tist";
System.out.println("len: " + big.length());
System.out.println(findSmallestSubStringWithAllChars(big, small));
}
def minimum_window(s, t, min_length = 100000):
d = {}
for x in t:
if x in d:
d[x]+= 1
else:
d[x] = 1
tot = sum([y for x,y in d.iteritems()])
l = []
ind = 0
for i,x in enumerate(s):
if ind == 1:
l = l + [x]
if x in d:
tot-=1
if not l:
ind = 1
l = [x]
if tot == 0:
if len(l)<min_length:
min_length = len(l)
min_length = minimum_window(s[i+1:], t, min_length)
return min_length
l_s = "ADOBECODEBANC"
t_s = "ABC"
min_length = minimum_window(l_s, t_s)
if min_length == 100000:
print "Not found"
else:
print min_length

Resources