Source: https://www.geeksforgeeks.org/number-substrings-count-character-k/
Given a string and an integer k, find the number of substrings in which every distinct character occurs exactly k times.
Looking for a solution in O(n), using two pointers/sliding window approach. I'm able to find only longest substrings satisfying this criteria but not substrings within that long substring.
For ex: ababbaba, k = 2
My solution finds abab, ababba etc, but not bb within ababba.
Can someone help me with the logic?
If you could edit your question to include your solution code, I'd be happy to help you with that.
For now I'm sharing my solution code (in Java), which runs in O(n^2). I've added enough comments to make the code self-explanatory. Nonetheless, the logic for the solution is as follows:
As you correctly pointed out, the problem can be approached with a sliding window (of variable size). The solution below considers all possible substrings, using nested for loops to set the start and end indices. For each substring, we check whether every element in it occurs exactly k times.
To avoid recounting for every substring, we maintain the counts in a map and keep adding new elements to it as we increment the end index (slide the window). This ensures that our solution runs in O(n^2) and not O(n^3).
To further improve efficiency, we only check the count of individual elements if the sub-string's size matches our requirement. e.g. for n unique elements (keys in the map), the size of required sub-string would be n*k. If the sub-string's size doesn't match this value, there's no need to check how many times the individual characters occur.
import java.util.*;
/**
* Java program to count the number of perfect substrings in a given string. A
* substring is considered perfect if all the elements within the substring
* occur exactly k number of times.
*
* @author Codextor
*/
public class PerfectSubstring {
public static void main(String[] args) {
String s = "aabbcc";
int k = 2;
System.out.println(perfectSubstring(s, k));
s = "aabccc";
k = 2;
System.out.println(perfectSubstring(s, k));
}
/**
* Returns the number of perfect substrings in the given string for the
* specified value of k
*
* @param s The string to check for perfect substrings
* @param k The number of times every element should occur within the substring
* @return int The number of perfect substrings
*/
public static int perfectSubstring(String s, int k) {
int finalCount = 0;
/*
* Set the initial starting index for the subarray as 0, and increment it with
* every iteration, till the last index of the string is reached.
*/
for (int start = 0; start < s.length(); start++) {
/*
* Use a HashMap to store the count of every character in the subarray. We'll
* start with an empty map everytime we update the starting index
*/
Map<Character, Integer> frequencyMap = new HashMap<>();
/*
* Set the initial ending index for the subarray equal to the starting index and
* increment it with every iteration, till the last index of the string is
* reached.
*/
for (int end = start; end < s.length(); end++) {
/*
* Get the count of the character at end index and increase it by 1. If the
* character is not present in the map, use 0 as the default count
*/
char c = s.charAt(end);
int count = frequencyMap.getOrDefault(c, 0);
frequencyMap.put(c, count + 1);
/*
* Check if the length of the subarray equals the desired length. The desired
* length is the number of unique characters we've seen so far (size of the map)
* multiplied by k (the number of times each character should occur). If the
* length is as per requirements, check if each element occurs exactly k times
*/
if (frequencyMap.size() * k == (end - start + 1)) {
if (check(frequencyMap, k)) {
finalCount++;
}
}
}
}
return finalCount;
}
/**
* Returns true if every value in the map is equal to k
*
* @param map The map whose values are to be checked
* @param k The required value for keys in the map
* @return true if every value in the map is equal to k
*/
public static boolean check(Map<Character, Integer> map, int k) {
/*
* Iterate through all the values (frequency of each character), comparing them
* with k
*/
for (Integer i : map.values()) {
if (i != k) {
return false;
}
}
return true;
}
}
For a given value k and a string s of length n with alphabet size D, we can solve the problem in O(n*D).
We need to find substrings in which each character occurs exactly k times.
Minimum size of such sub-string = k (when only one character is there)
Maximum size of such sub-string = k*D (when all characters are there)
So we will check for all sub-strings of sizes in range [k, k*D]
from collections import defaultdict

ALPHABET_SIZE = 26

def check(count, k):
    # every character in the window must occur exactly k times (or not at all)
    for v in count.values():
        if v != k and v != 0:
            return False
    return True

def countSubstrings(s, k):
    total = 0
    for d in range(1, ALPHABET_SIZE + 1):
        size = d * k
        count = defaultdict(int)
        l = r = 0
        while r < len(s):
            count[s[r]] += 1
            # if window size exceeds `size`, advance the left pointer and fix the count
            if r - l + 1 > size:
                count[s[l]] -= 1
                l += 1
            # if window size is adequate then check and update count
            if r - l + 1 == size:
                total += check(count, k)
            r += 1
    return total

def main():
    string1 = "aabbcc"
    k1 = 2
    print(countSubstrings(string1, k1))  # output: 6
    string2 = "bacabcc"
    k2 = 2
    print(countSubstrings(string2, k2))  # output: 2

main()
I can't give you an O(n) solution, but I can give you an O(k*n) solution (better than the O(n^2) mentioned on the geeksforgeeks page).
The idea is that there are at most 26 distinct characters. So we don't have to check all the substrings; we only have to check substrings with length <= 26*k (length 26*k is the case where all 26 characters occur k times each; if the length is more than that, at least one character has to occur at least k+1 times). Also, we only need to check substrings whose lengths are a multiple of k.
So we check about 26*k*l candidate substrings, where l is the length of the string (assuming k << l). Thus the solution is O(k*n), with a fairly high constant (26).
A few observations help optimize the solution:
You don't need to check substrings of every possible size; you only need to check substrings of size k, 2k, 3k and so on up to ALPHABET_SIZE * k (remember the pigeonhole principle).
You can precompute the frequency of each letter up to every index (from either end) and later use it to find the letter frequencies between any two indexes in O(26).
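As a rough illustration of the prefix-frequency observation, here is a small Python sketch. The helper names `build_prefix_counts` and `is_perfect` are hypothetical, and the input is assumed to consist of lowercase letters a-z only.

```python
def build_prefix_counts(s):
    """prefix[i][c] is the number of occurrences of letter c in s[:i].

    Assumes s contains only lowercase letters a-z.
    """
    prefix = [[0] * 26]
    for ch in s:
        row = prefix[-1][:]  # copy the previous row, then count the new letter
        row[ord(ch) - ord('a')] += 1
        prefix.append(row)
    return prefix


def is_perfect(prefix, i, j, k):
    """True if every letter occurring in s[i:j] occurs exactly k times (26 comparisons)."""
    return all(b - a in (0, k) for a, b in zip(prefix[i], prefix[j]))


prefix = build_prefix_counts("ababbaba")
print(is_perfect(prefix, 0, 4, 2))  # "abab" -> True
print(is_perfect(prefix, 3, 5, 2))  # "bb"   -> True
print(is_perfect(prefix, 0, 3, 2))  # "aba"  -> False
```

The table costs O(26 * n) memory, which is the price paid for answering each range query without rescanning the substring.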
C++ Implementation of your problem in O(n * ALPHABET_SIZE^2)
I have added comments and diagrams to help you understand the code quickly.
[diagram 1 and diagram 2: images not available]
#include <bits/stdc++.h>
#define ll long long
#define ALPHABET_SIZE 26
using namespace std;
int main()
{
ios_base::sync_with_stdio(false);
cin.tie(NULL);
cout.tie(NULL);
int n, k;
string s;
cin >> n >> k;
cin >> s;
ll cnt = 0;
/**
* It will be storing frequency of each alphabets
**/
vector<int> f(ALPHABET_SIZE, 0);
/**
* It will store alphabets frequency till that index
**/
vector<vector<int>> v;
v.push_back(f);
/**
* Scan array from left to right and calculate the frequency of each alphabets till that index
* Now push that frequency array in v
* This loop will run for n times
**/
for (int i = 1; i <= n; i++)
{
f[s[i - 1] - 'a']++;
v.push_back(f);
}
/**
* This loop will run for k times
**/
for (int i = 0; i < k; i++)
{
/**
* start is the lower bound (left end from where window will start sliding)
**/
int start = i;
/**
* end is the upper bound (right end till where window will be sliding)
**/
int end = (n / k) * k + i;
if (end > n)
{
end -= k;
}
/**
* This loop will run for n/k times
**/
for (int j = start; j <= end; j += k)
{
/**
* This is a ALPHABET_SIZE * k size window
* It will be sliding between start and end (inclusive)
* This loop will run for at most ALPHABET_SIZE times
**/
for (int d = j + k; d <= min(ALPHABET_SIZE * k + j, end); d += k)
{
/**
* A flag to check whether the substring is valid or not
**/
bool flag = true;
/**
* Check if frequencies at two different indexes differ only by zero or k (element wise)
* Note that frequencies at two different index can't be same
* This loop will run for ALPHABET_SIZE times
**/
for (int idx = 0; idx < ALPHABET_SIZE; idx++)
{
if (abs(v[j][idx] - v[d][idx]) != k && abs(v[j][idx] - v[d][idx]) != 0)
{
flag = false;
}
}
/**
* Increase the total count if flag is true
**/
if (flag)
{
cnt++;
}
}
}
}
/**
* Print the total count
**/
cout << cnt;
return 0;
}
If you want a simple solution and are not worried about time complexity, here it is.
import java.util.*;
public class PerfecSubstring {
public static void main(String[] args) {
String st = "aabbcc";
int k = 2;
System.out.println(perfect(st, k));
}
public static int perfect(String st, int k) {
int count = 0;
for (int i = 0; i < st.length(); i++) {
for (int j = st.length(); j > i; j--) {
String sub = st.substring(i, j);
// A substring of length exactly k (one character repeated k times) is valid too
if (sub.length() >= k && check(sub, k)) {
System.out.println(sub);
count++;
}
}
}
return count;
}
public static boolean check(String st, int k) {
Map<Character, Integer> map = new HashMap<>();
for (int i = 0; i < st.length(); i++) {
Character c = st.charAt(i);
map.put(c, map.getOrDefault(c, 0) + 1);
}
return map.values().iterator().next() == k && new HashSet<>(map.values()).size() == 1;
}
}
Here is an answer I did in C#. It is a brute-force approach: it re-counts the characters of every slice, so it runs in O(n^3) rather than O(n^2). I probably should have used a helper method to avoid having a large chunk of code, but it does the job. :)
namespace CodingChallenges
{
using System;
using System.Collections.Generic;
class Solution
{
// Returns the number of perfect substrings of repeating character value 'num'.
public static int PerfectSubstring(string str, int num)
{
int count = 0;
for (int startOfSliceIndex = 0; startOfSliceIndex < str.Length - 1; startOfSliceIndex++)
{
for (int endofSliceIndex = startOfSliceIndex + 1; endofSliceIndex < str.Length; endofSliceIndex++)
{
Dictionary<char, int> dict = new Dictionary<char, int>();
string slice = str.Substring(startOfSliceIndex, (endofSliceIndex - startOfSliceIndex) + 1);
for (int i = 0; i < slice.Length; i++)
{
if (dict.ContainsKey(slice[i]))
{
dict[slice[i]]++;
}
else
{
dict[slice[i]] = 1;
}
}
bool isPerfect = true;
foreach (var entry in dict)
{
if (entry.Value != num)
{
isPerfect = false;
}
}
if (isPerfect)
{
Console.WriteLine(slice);
count++;
}
}
}
if (count == 1)
{
Console.WriteLine(count + " perfect substring.");
}
else
{
Console.WriteLine(count + " perfect substrings.");
}
return count;
}
public static void Main(string[] args)
{
string test = "1102021222";
PerfectSubstring(test, 2);
}
}
}
This solution works in O(n*D)
I think it can be upgraded to be O(n) by replacing the hash_map(frozenset(head_sum_mod_k.items())) with a map implementation that updates its hash rather than recalculating it -
this can be done because only one entry of head_sum_mod_k is changed per iteration.
from copy import deepcopy

def countKPerfectSequences(string:str, k):
    print(f'Processing \'{string}\', k={k}')
    # init running sum
    head_sum = {char: 0 for char in string}
    tail_sum = deepcopy(head_sum)
    tail_position = 0
    # to match both 0 & k sequence lengths, test for mod k == 0
    head_sum_mod_k = deepcopy(head_sum)
    occurrence_positions = {frozenset(head_sum_mod_k.items()): [0]}
    # iterate over string
    perfect_counter = 0
    for i, val in enumerate(string):
        head_sum[val] += 1
        head_sum_mod_k[val] = head_sum[val] % k
        while head_sum[val] - tail_sum[val] > k:
            # update tail to avoid longer than k sequences
            tail_sum[string[tail_position]] += 1
            tail_position += 1
        # print(f'str[{tail_position}..{i}]=\'{string[tail_position:i+1]}\', head_sum_mod_k={head_sum_mod_k} occurrence_positions={occurrence_positions}')
        # get matching sequences between head and tail
        indices = list(filter(lambda i: i >= tail_position, occurrence_positions.get(frozenset(head_sum_mod_k.items()), [])))
        # for start in indices:
        #     print(f'{string[start:i+1]}')
        perfect_counter += len(indices)
        # add head
        indices.append(i+1)
        occurrence_positions[frozenset(head_sum_mod_k.items())] = indices
    return perfect_counter
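The suggestion of a map whose hash is updated rather than recalculated could look roughly like the sketch below. It is not part of the answer above: the class name `ModKSignature` and the random-weight scheme are assumptions made for illustration. The signature is the sum of `count mod k` times a per-character random weight, so changing one character's count updates it in O(1). Like any hash, two different count vectors can collide, so a real solution would have to tolerate or verify collisions.

```python
import random


class ModKSignature:
    """Incrementally maintained signature of per-character counts modulo k.

    Hypothetical helper: value == sum(counts[c] * weights[c]) over the alphabet.
    Distinct count vectors may collide, although that is unlikely with 61-bit weights.
    """

    def __init__(self, k, alphabet, seed=0):
        rng = random.Random(seed)
        self.k = k
        # Nonzero random weight per character
        self.weights = {c: rng.randrange(1, 1 << 61) for c in alphabet}
        self.counts = {c: 0 for c in alphabet}
        self.value = 0

    def add(self, c):
        """Account for one more occurrence of c and return the updated signature."""
        old = self.counts[c]
        new = (old + 1) % self.k
        self.counts[c] = new
        # Only one entry changed, so only one term of the sum needs updating
        self.value += (new - old) * self.weights[c]
        return self.value


sig = ModKSignature(2, "ab")
sig.add("a")
sig.add("a")
print(sig.value)  # 0: two occurrences of 'a' cancel out modulo k = 2
```

The integer `value` could then serve as the dictionary key in place of `frozenset(head_sum_mod_k.items())`.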
I am writing a piece of code in C# to retrieve the number of tablets for a given dosage. For example, if DrugA comes in 10 mg, 5 mg and 2 mg tablets and the dosage is 20 mg, the code should return (2). If the dosage is 15, it should return (1 & 1). If the dosage is 3, an Invalid Dosage message is returned. The code must use the highest denominations first, i.e. 10 mg tablets, then 5 mg tablets for the remainder, and so on. I am using a recursive function (GetDispenseBreakdownForSingleDosage) to achieve this. My code works for most of the scenarios I tested. The one scenario where it incorrectly returns Invalid Dosage is an 8 mg dosage: the code should return (4), since four 2 mg tablets is a valid option. I have given my code below. My questions are:
1) Is there a better way of achieving my objective than my current code?
2) What changes should I make to avoid treating 8 mg as an invalid dosage? It is reported as invalid because the code divides 8 by 5 during the second recursive call, leaving a remainder of 3; on the third recursive call 3 is not divisible by 2, so the code returns invalid dosage.
My code is given below:
public string GetDispenseBreakdown(PrescriptionsBLL Prescription, double[] IndexAndNonIndexDosageForBreakdown)
{
int[] NoOfTablets = new int[Prescription.SelectedDrug.PrescriptionsDrugWeights.Count];
for (int Index = 1; Index <= IndexAndNonIndexDosageForBreakdown.Length; Index++)
{
GetDispenseBreakdownForSingleDosage(Prescription, ref NoOfTablets, IndexAndNonIndexDosageForBreakdown[(Index - 1)], Prescription.SelectedDrug.PrescriptionsDrugWeights[0].Weight, 1);//assuming that index 0 will always contain the highest weight i.e. if a drug has 2, 5, 10 as drug weights then index 0 should always contain 10 as we are sorting by Desc
}
return ConvertNumberOfTabletsIntoString(NoOfTablets);
}
public void GetDispenseBreakdownForSingleDosage(PrescriptionsBLL Prescription, ref int[] NoOfTablets, double Dosage, double Weight, int WeightCount)
{
int LoopIteration;
string TempLoopIteration = (Dosage / Weight).ToString();
if (TempLoopIteration.Contains("."))
LoopIteration = (int)Math.Floor(Dosage / Weight);
else
LoopIteration = int.Parse(TempLoopIteration);
double TempDosage = Weight * LoopIteration;
int WeightTablets = LoopIteration;
double RemainingDosage = Math.Round((Dosage - TempDosage), 2);
NoOfTablets[(WeightCount - 1)] = NoOfTablets[(WeightCount - 1)] + WeightTablets;
if (WeightCount == Prescription.SelectedDrug.PrescriptionsDrugWeights.Count && RemainingDosage > 0.0)
{
NoOfTablets[0] = -99999;//Invalid Dosage
return;
}
if (LoopIteration == 0 && Dosage > 0.0 && WeightCount == Prescription.SelectedDrug.PrescriptionsDrugWeights.Count)
{
NoOfTablets[0] = -99999;//Invalid Dosage
return;
}
if (WeightCount == Prescription.SelectedDrug.PrescriptionsDrugWeights.Count)
return;
GetDispenseBreakdownForSingleDosage(Prescription, ref NoOfTablets, RemainingDosage, Prescription.SelectedDrug.PrescriptionsDrugWeights[WeightCount].Weight, ++WeightCount);
}
public bool IsDosageValid(int[] NoOfTablets)
{
if (NoOfTablets[0] == -99999)
return false;
else
return true;
}
public string ConvertNumberOfTabletsIntoString(int[] NoOfTablets)
{
if (!IsDosageValid(NoOfTablets))
return "Dosage is Invalid";
string DispenseBreakDown = "(";
int ItemsAdded = 0;
for (int Count = 0; Count < NoOfTablets.Length; Count++)
{
if (NoOfTablets[Count] != 0)
{
if (ItemsAdded > 0)
DispenseBreakDown += " & " + NoOfTablets[Count];
else
DispenseBreakDown += NoOfTablets[Count];
ItemsAdded = ItemsAdded + 1;
}
}
DispenseBreakDown += ")";
return DispenseBreakDown;
}
This sounds like a version of the same logic required for coin change.
This site goes through that logic:
http://www.geeksforgeeks.org/dynamic-programming-set-7-coin-change/
You will also need to make a few adjustments:
You'll need to get back the possible results and accept the one that has highest number of larger pills.
You'll need to handle the possibility of no "correct change".
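One way to combine the two adjustments above is a backtracking search that tries the largest tablet first and backs off when the remainder cannot be filled. The Python sketch below is only an illustration under assumptions: the function name `dispense` is made up, and dosages and tablet sizes are taken to be whole numbers (for fractional sizes, scaling to integer units first would avoid floating-point comparisons).

```python
def dispense(dosage, sizes):
    """Return tablet counts per size (largest size first), or None if impossible.

    Hypothetical helper: assumes non-negative integer dosage and positive integer sizes.
    Prefers as many of the larger tablets as possible, backtracking when needed.
    """
    sizes = sorted(sizes, reverse=True)

    def solve(remaining, index):
        if remaining == 0:
            # Nothing left to fill: zero tablets of every remaining size
            return [0] * (len(sizes) - index)
        if index == len(sizes):
            return None  # ran out of tablet sizes with dosage left over
        # Try the highest possible count of this size first, then fewer
        for count in range(remaining // sizes[index], -1, -1):
            rest = solve(remaining - count * sizes[index], index + 1)
            if rest is not None:
                return [count] + rest
        return None

    return solve(dosage, 0)


print(dispense(20, [10, 5, 2]))  # [2, 0, 0]
print(dispense(15, [10, 5, 2]))  # [1, 1, 0]
print(dispense(8, [10, 5, 2]))   # [0, 0, 4]
print(dispense(3, [10, 5, 2]))   # None
```

The worst case is exponential in the number of tablet sizes, which is usually acceptable for a handful of sizes; the dynamic-programming formulation from the coin change article is the alternative for larger inputs.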
Here is a simple recursive method. Pass it the desired dosage and an empty list:
// Test if 2 floats are "equal", the difference between them
// is less than some predefined value (epsilon)
// Must be static so that the static CalcDose method can call it
static bool floatIsEqual(float f1, float f2)
{
float epsilon = 0.001f;
return Math.Abs(f1 - f2) <= epsilon;
}
static bool CalcDose(float desired, List<float> list)
{
// Order of array is important. Larger values will be attempted first
float[] sizes = new float[] { 8, 2, .4f, .2f };
// This path isn't working, return
if (desired < sizes[sizes.Length - 1])
{
return false;
}
// Try all combos
for (int i = 0; i < sizes.Length; i++)
{
if (floatIsEqual(desired, sizes[i]))
{
// Final step: perfect match
list.Add(sizes[i]);
return true;
}
if (sizes[i] <= desired)
{
// Attempt recursive call
if (true == CalcDose( desired - sizes[i], list))
{
// Success
list.Add(sizes[i]);
return true;
}
// On failure, fall through and try the next (smaller) size instead of giving up
}
}
return false;
}
I have recently come across an interesting question on strings. Suppose you are given following:
Input string1: "this is a test string"
Input string2: "tist"
Output string: "t stri"
Given the above, how can I find the smallest substring of string1 that contains all the characters from string2?
To see more details including working code, check my blog post at:
http://www.leetcode.com/2010/11/finding-minimum-window-in-s-which.html
To help illustrate this approach, I use an example: string1 = "acbbaca" and string2 = "aba". Here, we also use the term "window", which means a contiguous block of characters from string1 (could be interchanged with the term substring).
i) string1 = "acbbaca" and string2 = "aba".
ii) The first minimum window is found. Notice that we cannot advance the begin pointer, as hasFound['a'] == needToFind['a'] == 2. Advancing would mean breaking the constraint.
iii) The second window is found. The begin pointer still points to the first element 'a'. hasFound['a'] (3) is greater than needToFind['a'] (2). We decrement hasFound['a'] by one and advance the begin pointer to the right.
iv) We skip 'c' since it is not found in string2. The begin pointer now points to 'b'. hasFound['b'] (2) is greater than needToFind['b'] (1). We decrement hasFound['b'] by one and advance the begin pointer to the right.
v) The begin pointer now points to the next 'b'. hasFound['b'] (1) is equal to needToFind['b'] (1). We stop immediately, and this is our newly found minimum window.
The idea is mainly based on the help of two pointers (begin and end position of the window) and two tables (needToFind and hasFound) while traversing string1. needToFind stores the total count of a character in string2 and hasFound stores the total count of a character met so far. We also use a count variable to store the total characters in string2 that's met so far (not counting characters where hasFound[x] exceeds needToFind[x]). When count equals string2's length, we know a valid window is found.
Each time we advance the end pointer (pointing to an element x), we increment hasFound[x] by one. We also increment count by one if hasFound[x] is less than or equal to needToFind[x]. Why? When the constraint is met (that is, count equals to string2's size), we immediately advance begin pointer as far right as possible while maintaining the constraint.
How do we check if it is maintaining the constraint? Assume that begin points to an element x, we check if hasFound[x] is greater than needToFind[x]. If it is, we can decrement hasFound[x] by one and advancing begin pointer without breaking the constraint. On the other hand, if it is not, we stop immediately as advancing begin pointer breaks the window constraint.
Finally, we check if the minimum window length is less than the current minimum. Update the current minimum if a new minimum is found.
Essentially, the algorithm finds the first window that satisfies the constraint, then continues to maintain the constraint throughout.
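The walkthrough above can be condensed into a short Python sketch. It follows the same needToFind / hasFound / count bookkeeping, but the function name `min_window` and the choice to return an empty string when no window exists are assumptions made for this example.

```python
from collections import Counter, defaultdict


def min_window(s1, s2):
    """Smallest substring of s1 containing all characters of s2 (with multiplicity).

    Returns "" when no such window exists (an assumption of this sketch).
    """
    if not s2:
        return ""
    need_to_find = Counter(s2)     # required count per character
    has_found = defaultdict(int)   # count per needed character inside the window
    count = 0                      # needed characters matched so far
    begin = 0
    best = None
    for end, x in enumerate(s1):
        if need_to_find[x] == 0:
            continue  # character not in s2, it never affects the constraint
        has_found[x] += 1
        if has_found[x] <= need_to_find[x]:
            count += 1
        if count == len(s2):
            # Advance begin as far right as possible while keeping the constraint
            while True:
                b = s1[begin]
                if need_to_find[b] == 0:
                    begin += 1
                elif has_found[b] > need_to_find[b]:
                    has_found[b] -= 1
                    begin += 1
                else:
                    break
            if best is None or end - begin + 1 < len(best):
                best = s1[begin:end + 1]
    return best if best is not None else ""


print(min_window("acbbaca", "aba"))                 # baca
print(min_window("this is a test string", "tist"))  # t stri
```

Each character of string1 is visited at most once by `end` and at most once by `begin`, which is where the linear running time comes from.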
You can do a histogram sweep in O(N+M) time and O(1) space where N is the number of characters in the first string and M is the number of characters in the second.
It works like this:
Make a histogram of the second string's characters (key operation is hist2[ s2[i] ]++).
Make a cumulative histogram of the first string's characters until that histogram contains every character that the second string's histogram contains (which I will call "the histogram condition").
Then move forwards on the first string, subtracting from the histogram, until it fails to meet the histogram condition. Mark that bit of the first string (before the final move) as your tentative substring.
Move the front of the substring forwards again until you meet the histogram condition again. Move the end forwards until it fails again. If this is a shorter substring than the first, mark that as your tentative substring.
Repeat until you've passed through the entire first string.
The marked substring is your answer.
Note that by varying the check you use on the histogram condition, you can choose either to have the same set of characters as the second string, or at least as many characters of each type. (It's just the difference between a[i]>0 && b[i]>0 and a[i]>=b[i].)
You can speed up the histogram checks if you keep a track of which condition is not satisfied when you're trying to satisfy it, and checking only the thing that you decrement when you're trying to break it. (On the initial buildup, you count how many items you've satisfied, and increment that count every time you add a new character that takes the condition from false to true.)
Here's an O(n) solution. The basic idea is simple: for each starting index, find the least ending index such that the substring contains all of the necessary letters. The trick is that the least ending index increases over the course of the function, so with a little data structure support, we consider each character at most twice.
In Python:
from collections import defaultdict

def smallest(s1, s2):
    assert s2 != ''
    d = defaultdict(int)
    nneg = [0]  # number of negative entries in d
    def incr(c):
        d[c] += 1
        if d[c] == 0:
            nneg[0] -= 1
    def decr(c):
        if d[c] == 0:
            nneg[0] += 1
        d[c] -= 1
    for c in s2:
        decr(c)
    minlen = len(s1) + 1
    j = 0
    for i in range(len(s1)):  # range replaces Python 2's xrange
        while nneg[0] > 0:
            if j >= len(s1):
                return minlen
            incr(s1[j])
            j += 1
        minlen = min(minlen, j - i)
        decr(s1[i])
    return minlen
I received the same interview question. I am a C++ candidate but I was in a position to code relatively fast in JAVA.
Java [Courtesy : Sumod Mathilakath]
import java.io.*;
import java.util.*;
class UserMainCode
{
public String GetSubString(String input1,String input2){
// Write code here...
return find(input1, input2);
}
private static boolean containsPatternChar(int[] sCount, int[] pCount) {
for(int i=0;i<256;i++) {
if(pCount[i]>sCount[i])
return false;
}
return true;
}
public static String find(String s, String p) {
if (p.length() > s.length())
return null;
int[] pCount = new int[256];
int[] sCount = new int[256];
// Time: O(p.length)
for(int i=0;i<p.length();i++) {
pCount[(int)(p.charAt(i))]++;
sCount[(int)(s.charAt(i))]++;
}
int i = 0, j = p.length(), min = Integer.MAX_VALUE;
String res = null;
// Time: O(s.length) window moves, each with a 256-entry comparison
while (true) {
if (containsPatternChar(sCount, pCount)) {
if ((j - i) < min) {
min = j - i;
res = s.substring(i, j);
// This is the smallest possible substring.
if(min==p.length())
break;
}
// Reduce the window size (always, otherwise the loop can stall).
sCount[(int)(s.charAt(i))]--;
i++;
} else {
// Stop once the window can no longer grow.
if (j == s.length())
break;
// Increase the window size.
sCount[(int)(s.charAt(j))]++;
j++;
}
}
System.out.println(res);
return res;
}
}
C++ [Courtesy : sundeepblue]
#include <iostream>
#include <vector>
#include <string>
#include <climits>
using namespace std;
string find_minimum_window(string s, string t) {
if(s.empty() || t.empty()) return ""; // the function returns a string, so return an empty one
int ns = s.size(), nt = t.size();
vector<int> total(256, 0);
vector<int> sofar(256, 0);
for(int i=0; i<nt; i++)
total[t[i]]++;
int L = 0, R;
int minL = 0; //gist2
int count = 0;
int min_win_len = INT_MAX;
for(R=0; R<ns; R++) { // gist0, a big for loop
if(total[s[R]] == 0) continue;
else sofar[s[R]]++;
if(sofar[s[R]] <= total[s[R]]) // gist1, <= not <
count++;
if(count == nt) { // POS1
while(true) {
char c = s[L];
if(total[c] == 0) { L++; }
else if(sofar[c] > total[c]) {
sofar[c]--;
L++;
}
else break;
}
if(R - L + 1 < min_win_len) { // this judge should be inside POS1
min_win_len = R - L + 1;
minL = L;
}
}
}
string res;
if(count == nt) // gist3, cannot forget this.
res = s.substr(minL, min_win_len); // gist4, start from "minL" not "L"
return res;
}
int main() {
string s = "abdccdedca";
cout << find_minimum_window(s, "acd");
}
Erlang [Courtesy : wardbekker]
-module(leetcode).
-export([min_window/0]).
%% Given a string S and a string T, find the minimum window in S which will contain all the characters in T in complexity O(n).
%% For example,
%% S = "ADOBECODEBANC"
%% T = "ABC"
%% Minimum window is "BANC".
%% Note:
%% If there is no such window in S that covers all characters in T, return the emtpy string "".
%% If there are multiple such windows, you are guaranteed that there will always be only one unique minimum window in S.
min_window() ->
"eca" = min_window("cabeca", "cae"),
"eca" = min_window("cfabeca", "cae"),
"aec" = min_window("cabefgecdaecf", "cae"),
"cwae" = min_window("cabwefgewcwaefcf", "cae"),
"BANC" = min_window("ADOBECODEBANC", "ABC"),
ok.
min_window(T, S) ->
min_window(T, S, []).
min_window([], _T, MinWindow) ->
MinWindow;
min_window([H | Rest], T, MinWindow) ->
NewMinWindow = case lists:member(H, T) of
true ->
MinWindowFound = fullfill_window(Rest, lists:delete(H, T), [H]),
case length(MinWindow) == 0 orelse (length(MinWindow) > length(MinWindowFound)
andalso length(MinWindowFound) > 0) of
true ->
MinWindowFound;
false ->
MinWindow
end;
false ->
MinWindow
end,
min_window(Rest, T, NewMinWindow).
fullfill_window(_, [], Acc) ->
%% window completed
Acc;
fullfill_window([], _T, _Acc) ->
%% no window found
"";
fullfill_window([H | Rest], T, Acc) ->
%% completing window
case lists:member(H, T) of
true ->
fullfill_window(Rest, lists:delete(H, T), Acc ++ [H]);
false ->
fullfill_window(Rest, T, Acc ++ [H])
end.
REF:
http://articles.leetcode.com/finding-minimum-window-in-s-which/#comment-511216
http://www.mif.vu.lt/~valdas/ALGORITMAI/LITERATURA/Cormen/Cormen.pdf
Please have a look at this as well:
//-----------------------------------------------------------------------
#include <cstring>
#include <vector>
bool IsInSet(char ch, char* cSet)
{
char* cSetptr = cSet;
int index = 0;
while (*(cSet+ index) != '\0')
{
if(ch == *(cSet+ index))
{
return true;
}
++index;
}
return false;
}
void removeChar(char ch, char* cSet)
{
bool bShift = false;
int index = 0;
while (*(cSet + index) != '\0')
{
if( (ch == *(cSet + index)) || bShift)
{
*(cSet + index) = *(cSet + index + 1);
bShift = true;
}
++index;
}
}
typedef struct subStr
{
short iStart;
short iEnd;
short szStr;
}ss;
char* subStringSmallest(char* testStr, char* cSet)
{
char* subString = NULL;
int iSzSet = strlen(cSet) + 1;
int iSzString = strlen(testStr)+ 1;
char* cSetBackUp = new char[iSzSet];
memcpy((void*)cSetBackUp, (void*)cSet, iSzSet);
int iStartIndx = -1;
int iEndIndx = -1;
int iIndexStartNext = -1;
std::vector<ss> subStrVec;
int index = 0;
while( *(testStr+index) != '\0' )
{
if (IsInSet(*(testStr+index), cSetBackUp))
{
removeChar(*(testStr+index), cSetBackUp);
if(iStartIndx < 0)
{
iStartIndx = index;
}
else if( iIndexStartNext < 0)
iIndexStartNext = index;
else
;
if (strlen(cSetBackUp) == 0 )
{
iEndIndx = index;
if( iIndexStartNext == -1)
break;
else
{
index = iIndexStartNext;
ss stemp = {iStartIndx, iEndIndx, (iEndIndx-iStartIndx + 1)};
subStrVec.push_back(stemp);
iStartIndx = iEndIndx = iIndexStartNext = -1;
memcpy((void*)cSetBackUp, (void*)cSet, iSzSet);
continue;
}
}
}
else
{
if (IsInSet(*(testStr+index), cSet))
{
if(iIndexStartNext < 0)
iIndexStartNext = index;
}
}
++index;
}
int indexSmallest = 0;
for(int indexVec = 0; indexVec < subStrVec.size(); ++indexVec)
{
if(subStrVec[indexSmallest].szStr > subStrVec[indexVec].szStr)
indexSmallest = indexVec;
}
subString = new char[(subStrVec[indexSmallest].szStr) + 1];
memcpy((void*)subString, (void*)(testStr+ subStrVec[indexSmallest].iStart), subStrVec[indexSmallest].szStr);
memset((void*)(subString + subStrVec[indexSmallest].szStr), 0, 1);
delete[] cSetBackUp;
return subString;
}
//--------------------------------------------------------------------
Edit: apparently there's an O(n) algorithm (cf. algorithmist's answer). Obviously this will beat the [naive] baseline described below!
Too bad I gotta go... I'm a bit suspicious that we can get O(n). I'll check in tomorrow to see the winner ;-) Have fun!
Tentative algorithm:
The general idea is to sequentially try and use a character from str2 found in str1 as the start of a search (in either/both directions) of all the other letters of str2. By keeping a "length of best match so far" value, we can abort searches when they exceed this. Other heuristics can probably be used to further abort suboptimal (so far) solutions. The choice of the order of the starting letters in str1 matters much; it is suggested to start with the letter(s) of str1 which have the lowest count and to try with the other letters, of an increasing count, in subsequent attempts.
[loose pseudo-code]
- get count for each letter/character in str1 (number of As, Bs etc.)
- get count for each letter in str2
- minLen = length(str1) + 1 (the +1 indicates you're not sure all chars of
str2 are in str1)
- Starting with the letter from string2 which is found the least in string1,
look for other letters of Str2, in either direction of str1, until you've
found them all (or not, at which case response = impossible => done!).
set x = length(corresponding substring of str1).
- if (x < minLen),
set minlen = x,
also memorize the start/len of the str1 substring.
- continue trying with other letters of str1 (going up the frequency
list in str1), but abort search as soon as length(substring of str1)
reaches or exceeds minLen.
We can find a few other heuristics that would allow aborting a
particular search, based on [pre-calculated ?] distance between a given
letter in str1 and some (all?) of the letters in str2.
- the overall search terminates when minLen = length(str2) or when
we've used all letters of str1 (which match one letter of str2)
as a starting point for the search
Here is a Java implementation
public static String shortestSubstrContainingAllChars(String input, String target) {
int needToFind[] = new int[256];
int hasFound[] = new int[256];
int totalCharCount = 0;
String result = null;
char[] targetCharArray = target.toCharArray();
for (int i = 0; i < targetCharArray.length; i++) {
needToFind[targetCharArray[i]]++;
}
char[] inputCharArray = input.toCharArray();
for (int begin = 0, end = 0; end < inputCharArray.length; end++) {
if (needToFind[inputCharArray[end]] == 0) {
continue;
}
hasFound[inputCharArray[end]]++;
if (hasFound[inputCharArray[end]] <= needToFind[inputCharArray[end]]) {
totalCharCount ++;
}
if (totalCharCount == target.length()) {
while (needToFind[inputCharArray[begin]] == 0
|| hasFound[inputCharArray[begin]] > needToFind[inputCharArray[begin]]) {
if (hasFound[inputCharArray[begin]] > needToFind[inputCharArray[begin]]) {
hasFound[inputCharArray[begin]]--;
}
begin++;
}
String substring = input.substring(begin, end + 1);
if (result == null || result.length() > substring.length()) {
result = substring;
}
}
}
return result;
}
Here is the JUnit test
@Test
public void shortestSubstringContainingAllCharsTest() {
String result = StringUtil.shortestSubstrContainingAllChars("acbbaca", "aba");
assertThat(result, equalTo("baca"));
result = StringUtil.shortestSubstrContainingAllChars("acbbADOBECODEBANCaca", "ABC");
assertThat(result, equalTo("BANC"));
result = StringUtil.shortestSubstrContainingAllChars("this is a test string", "tist");
assertThat(result, equalTo("t stri"));
}
// ShortestSubstring.java
public class ShortestSubstring {
public static void main(String[] args) {
String input1 = "My name is Fran";
String input2 = "rim";
System.out.println(getShortestSubstring(input1, input2));
}
private static String getShortestSubstring(String mainString, String toBeSearched) {
int mainStringLength = mainString.length();
int toBeSearchedLength = toBeSearched.length();
if (toBeSearchedLength > mainStringLength) {
throw new IllegalArgumentException("search string cannot be larger than main string");
}
for (int j = 0; j < mainStringLength; j++) {
for (int i = 0; i <= mainStringLength - toBeSearchedLength; i++) {
String substring = mainString.substring(i, i + toBeSearchedLength);
if (checkIfMatchFound(substring, toBeSearched)) {
return substring;
}
}
toBeSearchedLength++;
}
return null;
}
private static boolean checkIfMatchFound(String substring, String toBeSearched) {
// Count how often each required character is needed (case-insensitively).
int[] needed = new int[Character.MAX_VALUE + 1];
int remaining = 0;
for (char c : toBeSearched.toCharArray()) {
needed[Character.toLowerCase(c)]++;
remaining++;
}
// Consume the required counts with the characters of the candidate substring.
for (char c : substring.toCharArray()) {
char lower = Character.toLowerCase(c);
if (needed[lower] > 0) {
needed[lower]--;
remaining--;
}
}
// A match means every required character was found often enough.
return remaining == 0;
}
}
This approach uses prime numbers to replace one loop with multiplications. Several other minor optimizations are possible.
Assign a unique prime number to each character you want to find, and 1 to every uninteresting character.
Compute the target product of a matching string by raising each character's prime to the number of occurrences it should have and multiplying the results. By unique factorization, a window has this product only if it contains exactly those characters with exactly those counts.
Scan the string from the beginning, multiplying each character's prime into a running product as you go.
If the running product is greater than the target product, remove the first character and divide its prime out of the running product.
If the running product is less than the target product, include the next character and multiply its prime into the running product.
If the running product equals the target product, you have found a match; slide the beginning and end to the next character and continue searching for other matches.
Finally, pick the shortest of the matches.
Gist
charcount = {'a': 3, 'b': 1}
str = "kjhdfsbabasdadaaaaasdkaaajbajerhhayeom"

def find(c, s):
    Ns = len(s)
    C = list(c.keys())
    D = list(c.values())
    # prime numbers assigned to the 26 lowercase letters 'a'..'z'
    prmsi = [2, 3, 5, 7, 11, 13, 17, 19, 23, 29, 31, 37, 41, 43, 47, 53, 59, 61, 67, 71, 73, 79, 83, 89, 97, 101]
    # primes used in the key, all others set to 1
    prms = []
    Cord = [ord(c) - ord('a') for c in C]
    for e, p in enumerate(prmsi):
        if e in Cord:
            prms.append(p)
        else:
            prms.append(1)
    # Product of match
    T = 1
    for c, d in zip(C, D):
        p = prms[ord(c) - ord('a')]
        T *= p**d
    print("T=", T)
    t = 1  # product of current string
    f = 0
    i = 0
    matches = []
    mi = 0
    mn = Ns
    mm = 0
    while i < Ns:
        k = prms[ord(s[i]) - ord('a')]
        t *= k
        print("testing:", s[f:i+1])
        if t > T:
            # included too many chars: move start
            # (integer division keeps the running product exact)
            t //= prms[ord(s[f]) - ord('a')]  # remove first char, usually division by 1
            f += 1  # increment start position
            t //= k  # will be retested, could be replaced with bool
        elif t == T:
            # found match
            print("FOUND match:", s[f:i+1])
            matches.append(s[f:i+1])
            if (i - f) < mn:
                mm = mi
                mn = i - f
            mi += 1
            t //= prms[ord(s[f]) - ord('a')]  # remove first matching char
            # look for next match
            i += 1
            f += 1
        else:
            # no match yet, keep searching
            i += 1
    return (mm, matches)

print(find(charcount, str))
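The property the steps above rely on is unique factorization: two strings have the same prime product exactly when they contain the same letters with the same counts. The snippet below is only a small sketch of that property, not part of the answer's code. The names `PRIMES` and `product_of` are made up for illustration, and unlike the answer it gives every lowercase letter a prime instead of mapping uninteresting letters to 1.

```python
from math import prod  # available in Python 3.8+

# Hypothetical mapping for illustration: the i-th lowercase letter gets the i-th prime.
PRIMES = [2, 3, 5, 7, 11, 13, 17, 19, 23, 29, 31, 37, 41,
          43, 47, 53, 59, 61, 67, 71, 73, 79, 83, 89, 97, 101]


def product_of(text):
    """Multiply the primes of all lowercase letters in text."""
    return prod(PRIMES[ord(ch) - ord('a')] for ch in text)


target = product_of("aaab")            # 2 * 2 * 2 * 3
print(target)                          # 24
print(product_of("abaa") == target)    # True: same letters, different order
print(product_of("aabb") == target)    # False: 2 * 2 * 3 * 3 = 36
```

Because order does not affect the product, the comparison against the target only tells you that a window is some rearrangement of the wanted letters, which is all the search needs.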
(note: this answer was originally posted to a duplicate question, the original answer is now deleted.)
C# Implementation:
public static Tuple<int, int> FindMinSubstringWindow(string input, string pattern)
{
Tuple<int, int> windowCoords = new Tuple<int, int>(0, input.Length - 1);
int[] patternHist = new int[256];
for (int i = 0; i < pattern.Length; i++)
{
patternHist[pattern[i]]++;
}
int[] inputHist = new int[256];
int minWindowLength = int.MaxValue;
int count = 0;
for (int begin = 0, end = 0; end < input.Length; end++)
{
// Skip what's not in pattern.
if (patternHist[input[end]] == 0)
{
continue;
}
inputHist[input[end]]++;
// Count letters that are in pattern.
if (inputHist[input[end]] <= patternHist[input[end]])
{
count++;
}
// Window found.
if (count == pattern.Length)
{
// Remove extra instances of letters from pattern
// or just letters that aren't part of the pattern
// from the beginning.
while (patternHist[input[begin]] == 0 ||
inputHist[input[begin]] > patternHist[input[begin]])
{
if (inputHist[input[begin]] > patternHist[input[begin]])
{
inputHist[input[begin]]--;
}
begin++;
}
// Current window found.
int windowLength = end - begin + 1;
if (windowLength < minWindowLength)
{
windowCoords = new Tuple<int, int>(begin, end);
minWindowLength = windowLength;
}
}
}
if (count == pattern.Length)
{
return windowCoords;
}
return null;
}
I've implemented it using Python3 at O(N) efficiency:
def get(s, alphabet="abc"):
    # Assumes every character of s is in alphabet.
    seen = {}
    for c in alphabet:
        seen[c] = 0
    seen[s[0]] = 1
    start = 0
    end = 0
    shortest_s = 0
    shortest_e = 99999
    while end < len(s):
        # Drop characters from the left that occur more than once in the window.
        while seen[s[start]] > 1:
            seen[s[start]] -= 1
            start += 1
        # Check costs O(len(alphabet)), i.e. constant for a fixed alphabet:
        if sum(seen.values()) == len(alphabet) and all(v == 1 for v in seen.values()) and \
                shortest_e - shortest_s > end - start:
            shortest_s = start
            shortest_e = end
        end += 1
        # Guard so the window ending at the last character is also checked above.
        if end < len(s):
            seen[s[end]] += 1
    return s[shortest_s: shortest_e + 1]

print(get("abbcac"))  # Expected to return "bca"
String s = "xyyzyzyx";
String s1 = "xyz";
String finalString ="";
Map<Character,Integer> hm = new HashMap<>();
if(s1!=null && s!=null && s.length()>s1.length()){
for(int i =0;i<s1.length();i++){
if(hm.get(s1.charAt(i))!=null){
int k = hm.get(s1.charAt(i))+1;
hm.put(s1.charAt(i), k);
}else
hm.put(s1.charAt(i), 1);
}
Map<Character,Integer> t = new HashMap<>();
int start =-1;
for(int j=0;j<s.length();j++){
if(hm.get(s.charAt(j))!=null){
if(t.get(s.charAt(j))!=null){
if(!t.get(s.charAt(j)).equals(hm.get(s.charAt(j)))){ // compare Integer values, not references
int k = t.get(s.charAt(j))+1;
t.put(s.charAt(j), k);
}
}else{
t.put(s.charAt(j), 1);
if(start==-1){
if(j+s1.length()>s.length()){
break;
}
start = j;
}
}
if(hm.equals(t)){
t = new HashMap<>();
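// Keep the shortest match seen so far (the stray ';' after the if made this block run unconditionally).
if(finalString.isEmpty() || s.substring(start,j+1).length()<finalString.length())
{
finalString=s.substring(start,j+1);
}
j=start;
start=-1;
}
}
}
}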
A brute-force JavaScript solution:
function shortestSubStringOfUniqueChars(s){
var uniqueArr = [];
for(let i=0; i<s.length; i++){
if(uniqueArr.indexOf(s.charAt(i)) <0){
uniqueArr.push(s.charAt(i));
}
}
let windoww = uniqueArr.length;
while(windoww <= s.length){
for(let i=0; i<=s.length - windoww; i++){
// Candidate window of the current size starting at i.
let checkStr = s.substr(i, windoww);
let match = true;
// The window matches only if it contains every unique character of s.
for(let j=0; j<uniqueArr.length; j++){
if(checkStr.indexOf(uniqueArr[j])<0){
match = false;
break;
}
}
if(match){
return checkStr;
}
}
windoww = windoww + 1;
}
}
console.log(shortestSubStringOfUniqueChars("ABA"));
# Python implementation
s = input('Enter the string : ')
s1 = input('Enter the substring to search : ')
l = []  # List to record all the matching combinations
check = all([char in s for char in s1])
if check == True:
    # Try window lengths from shortest to longest, so l[0] is a shortest match.
    for i in range(len(set(s1)), len(s) + 1):
        # j is the window start; s[j:i+j] is the window of length i.
        for j in range(0, len(s) - i + 1):
            b = all([char in s[j:i+j] for char in s1])
            if (b == True):
                l.append(s[j:i+j])
    print('The smallest substring containing', s1, 'is', l[0])
else:
    print('Please enter a valid substring')
Java code for the approach discussed above:
private static Map<Character, Integer> frequency;
private static Set<Character> charsCovered;
private static Map<Character, Integer> encountered;
/**
* To set the first match index as an initial start point
*/
private static boolean hasStarted = false;
private static int currentStartIndex = 0;
private static int finalStartIndex = 0;
private static int finalEndIndex = 0;
private static int minLen = Integer.MAX_VALUE;
private static int currentLen = 0;
/**
* Whether we have already found a match and are now looking for other
* alternatives.
*/
private static boolean isFound = false;
private static char currentChar;
public static String findSmallestSubStringWithAllChars(String big, String small) {
if (null == big || null == small || big.isEmpty() || small.isEmpty()) {
return null;
}
frequency = new HashMap<Character, Integer>();
instantiateFrequencyMap(small);
charsCovered = new HashSet<Character>();
int charsToBeCovered = frequency.size();
encountered = new HashMap<Character, Integer>();
for (int i = 0; i < big.length(); i++) {
currentChar = big.charAt(i);
if (frequency.containsKey(currentChar) && !isFound) {
if (!hasStarted && !isFound) {
hasStarted = true;
currentStartIndex = i;
}
updateEncounteredMapAndCharsCoveredSet(currentChar);
if (charsCovered.size() == charsToBeCovered) {
currentLen = i - currentStartIndex;
isFound = true;
updateMinLength(i);
}
} else if (frequency.containsKey(currentChar) && isFound) {
updateEncounteredMapAndCharsCoveredSet(currentChar);
if (currentChar == big.charAt(currentStartIndex)) {
encountered.put(currentChar, encountered.get(currentChar) - 1);
currentStartIndex++;
while (currentStartIndex < i) {
if (encountered.containsKey(big.charAt(currentStartIndex))
&& encountered.get(big.charAt(currentStartIndex)) > frequency.get(big
.charAt(currentStartIndex))) {
encountered.put(big.charAt(currentStartIndex),
encountered.get(big.charAt(currentStartIndex)) - 1);
} else if (encountered.containsKey(big.charAt(currentStartIndex))) {
break;
}
currentStartIndex++;
}
}
currentLen = i - currentStartIndex;
updateMinLength(i);
}
}
System.out.println("start: " + finalStartIndex + " finalEnd : " + finalEndIndex);
return big.substring(finalStartIndex, finalEndIndex + 1);
}
private static void updateMinLength(int index) {
if (minLen > currentLen) {
minLen = currentLen;
finalStartIndex = currentStartIndex;
finalEndIndex = index;
}
}
private static void updateEncounteredMapAndCharsCoveredSet(Character currentChar) {
if (encountered.containsKey(currentChar)) {
encountered.put(currentChar, encountered.get(currentChar) + 1);
} else {
encountered.put(currentChar, 1);
}
if (encountered.get(currentChar) >= frequency.get(currentChar)) {
charsCovered.add(currentChar);
}
}
private static void instantiateFrequencyMap(String str) {
for (char c : str.toCharArray()) {
if (frequency.containsKey(c)) {
frequency.put(c, frequency.get(c) + 1);
} else {
frequency.put(c, 1);
}
}
}
public static void main(String[] args) {
String big = "this is a test string";
String small = "tist";
System.out.println("len: " + big.length());
System.out.println(findSmallestSubStringWithAllChars(big, small));
}
def minimum_window(s, t, min_length=100000):
    # Count how often each character occurs in t.
    d = {}
    for x in t:
        if x in d:
            d[x] += 1
        else:
            d[x] = 1
    tot = sum(d.values())
    l = []
    ind = 0
    for i, x in enumerate(s):
        if ind == 1:
            l = l + [x]
        if x in d:
            tot -= 1
            if not l:
                # First character of t seen: start collecting the window.
                ind = 1
                l = [x]
        if tot == 0:
            if len(l) < min_length:
                min_length = len(l)
            # Continue the search in the rest of the string.
            min_length = minimum_window(s[i+1:], t, min_length)
    return min_length

l_s = "ADOBECODEBANC"
t_s = "ABC"
min_length = minimum_window(l_s, t_s)
if min_length == 100000:
    print("Not found")
else:
    print(min_length)
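The snippet above only reports a length and decrements its counter for any character of t, so it does not track how many times each character is still needed. For comparison, here is a minimal Python 3 sketch of the counting sliding window used in the Java and C# answers earlier in the thread. The name `min_window` and the choice to return `None` when no window exists are assumptions made for this illustration, not something taken from those answers.

```python
from collections import Counter


def min_window(s, t):
    """Return the shortest substring of s that contains every character of t
    (respecting multiplicity), or None if no such substring exists."""
    if not s or not t:
        return None
    need = Counter(t)      # how many of each character the window still needs
    missing = len(t)       # total number of required characters still missing
    best = None            # (start, end) of the best window, end exclusive
    start = 0
    for end, ch in enumerate(s, 1):
        if need[ch] > 0:
            missing -= 1
        need[ch] -= 1      # a negative count means a surplus inside the window
        if missing == 0:
            # Drop surplus characters from the left edge.
            while need[s[start]] < 0:
                need[s[start]] += 1
                start += 1
            if best is None or end - start < best[1] - best[0]:
                best = (start, end)
            # Give up the leftmost required character and look for the next window.
            need[s[start]] += 1
            missing += 1
            start += 1
    return None if best is None else s[best[0]:best[1]]


print(min_window("ADOBECODEBANC", "ABC"))  # BANC
print(min_window("acbbaca", "aba"))        # baca
```

Each index enters and leaves the window at most once, so the scan is linear in the length of s.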