finding the subsequence of strings in c# urjent

finding the subsequence of strings in c# urjent - string

Alice has two strings, initial and goal. She can remove some number of characters from initial, which will give her a subsequence of that string. A string with no deletions is still considered a subsequence of itself. Given these two strings, can you find the minimum number of subsequences of initial that, when appended together, will form goal?

function minimumConcat(initial, goal) {
let result = 0;
let pattern = ''
let count1 = Array.apply(null, Array(26)).map(Number.prototype.valueOf, 0);
let count2 = Array.apply(null, Array(26)).map(Number.prototype.valueOf, 0);
initial.split('').forEach(c => {
pattern = pattern + c
});
pattern = "^[" + pattern + "]*$"
if (!RegExp(pattern).test(goal)) return -1
for (let i = 0; i < initial.length; i++) {
count1[initial.charCodeAt(i) - 97]++;
}
for (let i = 0; i < goal.length; i++) {
count2[goal.charCodeAt(i) - 97]++;
}
for (let i = 0; i < 26; i++) {
result += Math.abs(count1[i] - count2[i]);
}
return result;
}
var initial = readline();
var goal = readline();
print(minimumConcat(initial, goal));

Related

LC: Concatenated Words

LC Question: https://leetcode.com/problems/concatenated-words/
var findAllConcatenatedWordsInADict = function(words) {
let m = new Map(), memo = new Map();
let res = [];
for (let i = 0; i < words.length; i++) {
m.set(words[i], 1);
}
for (let i = 0; i < words.length; i++) {
if (isConcat(words[i], m, memo)) res.push(words[i]);
}
return res;
};
function isConcat(word, m, memo) {
if (memo.has(word)) return memo.get(word);
for (let i = 1; i < word.length; i++) {
let prefix = word.slice(0, i);
let suffix = word.slice(i);
if (m.has(prefix) && (m.has(suffix) || isConcat(suffix, m, memo))) {
memo.set(word, true);
return true;
}
}
memo.set(word, false);
return false;
};
Still trying to wrap my head around the solution why do we call the isConcat function ONLY on the suffix that we generate from words[i] and not prefix?
Further, I have tried running various test cases ["cat","cats","dog","dogcatsdog","rat","ratcatdogcat"].
It seems like we do not call the isConcat function on the first suffix that is generated from word[0] ('at'). However, we do seem to call it on 'catsdog' as part of the "dogcatsdog" word so I'm not sure how the logic works as to how it chooses which suffixes to call on....

My mistake was misreading the code and assuming that the following conditions had to return true for a valid concatenated word:
Map has both prefix and suffix; OR
isConcat(suffix) returns true
In fact, prefix is already known to be in map (since map.has(prefix) must be true in order to reach the recursive call in the first place!). Thus, a concatenated word is only valid if the suffix is also an element of map or it's another concatenated word. map.has(suffix) tests if the suffix is in map, while isConcat(suffix, map, memo) tests whether the suffix is also a concatenated word.

Longest Substring without repeating characters issue with edge case

I was trying to solve this problem: Longest substring without repeating characters. The issue is, it's failing in couple test cases, I don't know how to fix it. I would need your help to see where I'm going wrong.
Question:
Given a string, find the length of the longest substring without
repeating characters.
Examples:
Given "abcabcbb", the answer is "abc", which the length is 3.
Given "bbbbb", the answer is "b", with the length of 1.
Given "pwwkew", the answer is "wke", with the length of 3. Note that
the answer must be a substring, "pwke" is a subsequence and not a
substring.
This is my code:
function longestSubString(arr){
let localSum=0,globalSum=0;
let set = new Set();
for(let i=0; i<arr.length; i++){
let current = arr[i];
//if the key is present in the store.
if(set.has(current)){
set.clear();
localSum = 1;
set.add(current);
} else {
localSum +=1;
set.add(current);
}
if(globalSum < localSum){
globalSum = localSum;
}
}
return globalSum;
}
Tests:
let test = "abcabc"; //returns 3 - correct
let test2 = "bbb"; //returns 1 - correct
let test5 = "dvdf"; //returns 2 - INCORRECT! it should return 3 (i.e for vdf) since I'm doing set.clear() I'm not able to store previous elements.
longestSubString(test5); //incorrect
Live:
https://repl.it/Jo5Z/10

Not fully tested!
function longestSubString(arr){
let localSum=0,globalSum=0;
let set = new Set();
for(let i=0; i<arr.length; i++){
let current = arr[i];
//if the key is present in the store.
if(set.has(current)){
let a = Array.from(set);
a.splice(0,a.indexOf(current)+1);
set = new Set(a);
set.add(current);
localSum = set.size;
} else {
localSum +=1;
set.add(current);
}
if(globalSum < localSum){
globalSum = localSum;
}
}
return globalSum;
}
The idea is that when you get duplicate, you should start from the charachter after the first duplicated character, in your case dvdf, when you reach the second d you should continue from vd not from d!

You have to consider that the substring might start from any character in the string. Erasing the set only when you're finding a duplicate makes you only consider a substring starting from characters that are equal to to the first character.
An O(logn*n^2) solution modifying yours just a bit:
function longestSubString(arr){
let globalSum=0;
for(let i=0; i<arr.length; i++){
let set = new Set();
let localSum=0;
for(let j=i; j<arr.lenght; j++){
let current = arr[j];
//if the key is present in the store.
if(set.has(current)){
break;
} else {
localSum +=1;
set.add(current);
}
}
if(globalSum < localSum){
globalSum = localSum;
}
}
return globalSum;
}
There's also a O(n + d) (almost linear) solution, d being the number of characters in the alphabet. See http://www.geeksforgeeks.org/length-of-the-longest-substring-without-repeating-characters/.

There seem to be a lot of long answers here. The implementation I've thought of is simplified due to two observations:
Whenever you encounter a duplicate character, you need to start the next substring just after the previous occurrence of the current character.
Set() creates an array in insertion order when iterated.
function longestSubstring(str) {
let maxLength = 0
let current = new Set()
for (const character of str) {
if (current.has(character)) {
const substr = Array.from(current)
maxLength = Math.max(maxLength, substr.length)
current = new Set(substr.slice(substr.indexOf(character) + 1))
}
current.add(character)
}
return Math.max(maxLength, current.size)
}
const tests = [
"abcabc",
"bbb",
"pwwkew",
"geeksforgeeks",
"dvdf"
]
tests.map(longestSubstring).forEach(result => console.log(result))
A simple edit allows us to keep the first occurrence of the largest substring instead of the maximum length.
function longestSubstring(str) {
let maxSubstr = []
let current = new Set()
for (const character of str) {
if (current.has(character)) {
const substr = Array.from(current)
maxSubstr = maxSubstr.length < substr.length ? substr: maxSubstr
current = new Set(substr.slice(substr.indexOf(character) + 1))
}
current.add(character)
}
const substr = maxSubstr.length < current.size ? Array.from(current) : maxSubstr
return substr.join('')
}
const tests = [
"abcabc",
"bbb",
"pwwkew",
"geeksforgeeks",
"dvdf"
]
tests.map(longestSubstring).forEach(result => console.log(result))
As we can see, the last test yields vdf, as expected.

Below solution gets the length in O(n+d) time and also prints the longest non repeating character substring as well:
public void longestNonRepeatingLength(String a){
a="dvdf";
int visitedIndex[] = new int[256];
int curr_len = 0, max_len = 0, prev_ind = 0, start = 0, end = 1;
for(int i =0;i<256;i++)
visitedIndex[i] = -1;
visitedIndex[a.charAt(0)] = 0;
curr_len++;
int i = 0;
for( i=1;i<a.length();i++){
prev_ind = visitedIndex[a.charAt(i)];
if(prev_ind == -1 || i > prev_ind + curr_len)
curr_len++;
else{
if(curr_len>max_len){
start = prev_ind + 1;
end = i;
max_len = curr_len;
}
curr_len = i - prev_ind;
}
visitedIndex[a.charAt(i)] = i;
}
if(curr_len>max_len){
end = i-1;
max_len = curr_len;
}
for( i = start;i<=end;i++)
System.out.print(a.charAt(i));
System.out.println("");
System.out.println("Length = "+max_len);
}

As set contains the largest set of non-repeating characters of a String ending on index i, this means that when you encounter a previously seen character, rather than starting over with an empty set as your codes does now, you should just remove all characters from your set until that duplicate one.
Say for example your input is "abXcXdef". When the second "X" is encountered, you'll want to drop "a" and "b" from your set, leaving a set of ("c","X") as the longest set up to that point. Adding all other characters (as none are duplicates) you then end up with a max length of 5.
Something like this should work:
function longestSubString(arr) {
let globalSum = 0;
let set = new Set();
for (let i=0; i<arr.length; i++) {
let current = arr[i];
if (set.has(current)) {
while (true) {
let removeChar = arr[i - set.count];
if (removeChar != current)
set.remove(removeChar);
else
break;
}
} else {
set.add(current);
if (set.count > globalSum)
globalSum = set.count;
}
}
return globalSum;
}
As every character is added at most once and deleted at most once, this is an O(N) algorithm.

Encode string "aaa" to "3[a]"

give a string s, encode it by the format: "aaa" to "3[a]". The length of encoded string should the shortest.
example: "abbabb" to "2[a2[b]]"
update: suppose the string only contains lowercase letters
update: here is my code in c++, but it's slow. I know one of the improvement is using KMP to compute if the current string is combined by a repeat string.
// this function is used to check if a string is combined by repeating a substring.
// Also Here can be replaced by doing KMP algorithm for whole string to improvement
bool checkRepeating(string& s, int l, int r, int start, int end){
if((end-start+1)%(r-l+1) != 0)
return false;
int len = r-l+1;
bool res = true;
for(int i=start; i<=end; i++){
if(s[(i-start)%len+l] != s[i]){
res = false;
break;
}
}
return res;
}
// this function is used to get the length of the current number
int getLength(int l1, int l2){
return (int)(log10(l2/l1+1)+1);
}
string shortestEncodeString(string s){
int len = s.length();
vector< vector<int> > res(len, vector<int>(len, 0));
//Initial the matrix
for(int i=0; i<len; i++){
for(int j=0; j<=i; j++){
res[j][i] = i-j+1;
}
}
unordered_map<string, string> record;
for(int i=0; i<len; i++){
for(int j=i; j>=0; j--){
string temp = s.substr(j, i-j+1);
/* if the current substring has showed before, then no need to compute again
* Here is a example for this part: if the string is "abcabc".
* if we see the second "abc", then no need to compute again, just use the
* result from first "abc".
**/
if(record.find(temp) != record.end()){
res[j][i] = record[temp].size();
continue;
}
string ans = temp;
for(int k=j; k<i; k++){
string str1 = s.substr(j, k-j+1);
string str2 = s.substr(k+1, i-k);
if(res[j][i] > res[j][k] + res[k+1][i]){
res[j][i] = res[j][k]+res[k+1][i];
ans = record[str1] + record[str2];
}
if(checkRepeating(s, j, k, k+1, i) == true && res[j][i] > 2+getLength(k-j+1, i-k)+res[j][k]){
res[j][i] = 2+getLength(k-j+1, i-k)+res[j][k];
ans = to_string((i-j+1)/(k-j+1)) + '[' + record[str1] +']';
}
}
record[temp] = ans;
}
}
return record[s];
}

With very little to start with in terms of a question statement, I took a quick stab at this using JavaScript because it's easy to demonstrate. The comments are in the code, but basically there are alternating stages of joining adjacent elements, run-length checking, joining adjacent elements, and on and on until there is only one element left - the final encoded value.
I hope this helps.
function encode(str) {
var tmp = str.split('');
var arr = [];
do {
if (tmp.length === arr.length) {
// Join adjacent elements
arr.length = 0;
for (var i = 0; i < tmp.length; i += 2) {
if (i < tmp.length - 1) {
arr.push(tmp[i] + tmp[i + 1]);
} else {
arr.push(tmp[i]);
}
}
tmp.length = 0;
} else {
// Swap arrays and clear tmp
arr = tmp.slice();
tmp.length = 0;
}
// Build up the run-length strings
for (var i = 0; i < arr.length;) {
var runlength = runLength(arr, i);
if (runlength > 1) {
tmp.push(runlength + '[' + arr[i] + ']');
} else {
tmp.push(arr[i]);
}
i += runlength;
}
console.log(tmp);
} while (tmp.length > 1);
return tmp.join();
}
// Get the longest run length from a given index
function runLength(arr, ind) {
var count = 1;
for (var i = ind; i < arr.length - 1; i++) {
if (arr[i + 1] === arr[ind]) {
count++;
} else {
break;
}
}
return count;
}
<input id='inp' value='abbabb'>
<button type="submit" onClick='javascript:document.getElementById("result").value=encode(document.getElementById("inp").value)'>Encode</button>
<br>
<input id='result' value='2[a2[b]]'>

About KMP algorithm preprocessing function implementation

just try to implement a KMP algorithm, but when I try to check on the Internet, it turns out there are two different versions here:
Solution 1:
function computeLPSArray(str){
var j = -1, i = 0;
var arr = [];
arr[0] = j;
while(i < str.length){
if(j == -1||str[i]==str[j]){
i++;
j++;
arr[i] = j;
} else {
j = arr[j];
}
}
return arr;
}
Solution 2:
function computeLPSArray(pat){
var lps = [];
var len = 0, i;
lps[0] = 0;
i = 1;
while(i < pat.length){
if(pat[i] == pat[len]){
len++;
lps[i] = len;
i++;
} else {
if(len != 0){
len = lps[len-1];
} else {
lps[i++] = 0;
}
}
}
return lps;
}
The solution2 came from geeksforgeeks. Why not first solution?
Is there any corner case will failed when I use the Solution1?
Thx...

Not really - both versions can be used to perform the same tasks. The usage of the failure links array is a bit different, but the complexity of the algorithm is the same and both approaches are correct.
In one of the approaches the fail link is the length of longest proper suffix that is also a proper prefix(this would be version 2), while in the first version it is 1 less. As you can figure the two arrays are equivalent and can be converted from one to the other by adding/subtracting 1.

How to find the longest substring with no repeated characters?

I want an algorithm to find the longest substring of characters in a given string containing no repeating characters. I can think of an O(n*n) algorithm which considers all the substrings of a given string and calculates the number of non-repeating characters. For example, consider the string "AABGAKG" in which the longest substring of unique characters is 5 characters long which corresponds to BGAKG.
Can anyone suggest a better way to do it ?
Thanks
Edit: I think I'm not able to explain my question properly to others. You can have repeating characters in a substring (It's not that we need all distinct characters in a substring which geeksforgeeks solution does). The thing which I have to find is maximum no of non-repeating characters in any substring (it may be a case that some characters are repeated).
for eg, say string is AABGAKGIMN then BGAKGIMN is the solution.

for every start = 0 ... (n-1), try to expend end to the right-most position.
keep a bool array used[26] to remember if any character is already used.
suppose currently we finished (start, end)
for start+1,
first clear by set: used[str[start]] = false;
while ((end+1 < n) && (!used[str[end+1]])) { used[str[end+1]]=true; ++end;}
now we have check new (start, end). Total Complexity is O(N).

Here is the solution in C#. I tested in in Visual studio 2012 and it works
public static int LongestSubstNonrepChar(string str) {
int curSize = 0;
int maxSize = 0;
int end = 0;
bool[] present = new bool[256];
for (int start = 0; start < str.Length; start++) {
end = start;
while (end < str.Length) {
if (!present[str[end]] && end < str.Length)
{
curSize++;
present[str[end]] = true;
end++;
}
else
break;
}
if (curSize > maxSize) {
maxSize = curSize;
}
//reset current size and the set all letter to false
curSize = 0;
for (int i = 0; i < present.Length; i++)
present[i] = false;
}
return maxSize;
}

Pretty tricky question, I give you an O(n) solution based on C#.
public string MaxSubStringKUniqueChars(string source, int k)
{
if (string.IsNullOrEmpty(source) || k > source.Length) return string.Empty;
var start = 0;
var ret = string.Empty;
IDictionary<char, int> dict = new Dictionary<char, int>();
for (var i = 0; i < source.Length; i++)
{
if (dict.ContainsKey(source[i]))
{
dict[source[i]] = 1 + dict[source[i]];
}
else
{
dict[source[i]] = 1;
}
if (dict.Count == k + 1)
{
if (i - start > ret.Length)
{
ret = source.Substring(start, i - start);
}
while (dict.Count > k)
{
int count = dict[source[start]];
if (count == 1)
{
dict.Remove(source[start]);
}
else
{
dict[source[start]] = dict[source[start]] - 1;
}
start++;
}
}
}
//just for edge case like "aabbcceee", should return "cceee"
if (dict.Count == k && source.Length - start > ret.Length)
{
return source.Substring(start, source.Length - start);
}
return ret;
}
`
//This is the test case.
public void TestMethod1()
{
var ret = Item001.MaxSubStringKUniqueChars("aabcd", 2);
Assert.AreEqual("aab", ret);
ret = Item001.MaxSubStringKUniqueChars("aabbccddeee", 2);
Assert.AreEqual("ddeee", ret);
ret = Item001.MaxSubStringKUniqueChars("abccccccccaaddddeeee", 3);
Assert.AreEqual("ccccccccaadddd", ret);
ret = Item001.MaxSubStringKUniqueChars("ababcdcdedddde", 2);
Assert.AreEqual("dedddde", ret);
}

How about this:
public static String getLongestSubstringNoRepeats( String string ){
int iLongestSoFar = 0;
int posLongestSoFar = 0;
char charPrevious = 0;
int xCharacter = 0;
int iCurrentLength = 0;
while( xCharacter < string.length() ){
char charCurrent = string.charAt( xCharacter );
iCurrentLength++;
if( charCurrent == charPrevious ){
if( iCurrentLength > iLongestSoFar ){
iLongestSoFar = iCurrentLength;
posLongestSoFar = xCharacter;
}
iCurrentLength = 1;
}
charPrevious = charCurrent;
xCharacter++;
}
if( iCurrentLength > iLongestSoFar ){
return string.substring( posLongestSoFar );
} else {
return string.substring( posLongestSoFar, posLongestSoFar + iLongestSoFar );
}
}

Let s be the given string, and n its length.
Define f(i) to be the longest [contiguous] substring of s ending at s[i] with distinct letters. That's unique and well-defined.
Compute f(i) for each i. It's easy to deduce from f(i-1) and s[i]:
If the letter s[i] is in f(i-1), let j be the greatest position j < i such that s[j] = s[i]. Then f(i) is s[j+1 .. i] (in Python notation)
Otherwise, f(i) is f(i-1) with s[i] appended.
The solution to your problem is any f(i) of maximal length (not necessarily unique).
You could implement this algorithm to run in O(n * 26) time, where 26 is the number of letters in the alphabet.

public static int longestNonDupSubstring(char[] str) {
int maxCount = 0;
int count = 0;
int maxEnd = 0;
for(int i=1;i < str.length;i++) {
if(str[i] != str[i-1]) {
count++;
}
if (str[i] == str[i-1]) {
if(maxCount<count) {
maxCount = count;
maxEnd = i;
}
count = 0;
}
if ( i!=str.length-1 && str[i] == str[i+1]) {
if(maxCount<count) {
maxCount = count - 1;
maxEnd = i-1;
}
count = 0;
}
}
int startPos = maxEnd - maxCount + 1;
for(int i = 0; i < maxCount; i++) {
System.out.print(str[startPos+i]);
}
return maxCount;
}

//Given a string ,find the longest sub-string with all distinct characters in it.If there are multiple such strings,print them all.
#include<iostream>
#include<cstring>
#include<array>
using namespace std;
//for a string with all small letters
//for capital letters use 65 instead of 97
int main()
{
array<int ,26> count ;
array<string,26>largest;
for(int i = 0 ;i <26;i++)
count[i]=0;
string s = "abcdefghijrrstqrstuvwxyzprr";
string out = "";
int k = 0,max=0;
for(int i = 0 ; i < s.size() ; i++)
{
if(count[s[i] - 97]==1)
{
int loc = out.find(s[i]);
for(int j=0;j<=loc;j++) count[out[j] - 97]=0;
if(out.size() > max)
{
max = out.size();
k=1;
largest[0] = out;
}
else if(out.size()==max) largest[k++]=out;
out.assign(out,loc+1,out.size()-loc-1);
}
out = out + s[i];
count[s[i] - 97]++;
}
for(int i=0;i<k;i++) cout<<largest[i] << endl;
//output will be
// abcdefghijr
// qrstuvwxyzp
}

Let me contribute a little as well. I have this solution with complexity will be O(N). The algorithm’s space complexity will be O(K), where K is the number of distinct characters in the input string.
public static int NoRepeatSubstring(string str)
{
int start = 0;
int maxLen = 0;
Dictionary<char, int> dic = new Dictionary<char, int>();
for (int i = 0; i < str.Length; i++)
{
char rightChar = str[i];
// if the map already contains the 'rightChar', shrink the window from the beginning so that
// we have only one occurrence of 'rightChar'
if (dic.ContainsKey(rightChar))
{
// this is tricky; in the current window, we will not have any 'rightChar' after its previous index
// and if 'start' is already ahead of the last index of 'rightChar', we'll keep 'windowStart'
start = Math.Max(start, dic[rightChar] + 1);
}
if (dic.ContainsKey(str[i]))
dic[str[i]] = i;
else
dic.Add(str[i], i);
maxLen = Math.Max(maxLen, i - start + 1);
}
return maxLen;
}
And here some Unit Tests:
Assert.Equal(3, SlideWindow.NoRepeatSubstring("aabccbb"));
Assert.Equal(2, SlideWindow.NoRepeatSubstring("abbbb"));
Assert.Equal(3, SlideWindow.NoRepeatSubstring("abccde"));

string MaximumSubstringNonRepeating(string text)
{
string max = null;
bool isCapture = false;
foreach (string s in Regex.Split(text, #"(.)\1+"))
{
if (!isCapture && (max == null || s.Length > max.Length))
{
max = s;
}
isCapture = !isCapture;
}
return max;
}
. matches any character. ( ) captures that character. \1 matches the captured character again. + repeats that character. The whole pattern matches two or more repetitions of any one character. "AA" or ",,,,".
Regex.Split() splits the string at every match of the pattern, and returns an array of the pieces that are in between. (One caveat: It also includes the captured substrings. In this case, the one character that are being repeated. The captures will show up in between the pieces. This is way I just added the isCapture flag.)
The function cuts out all the repeated characters, and returns the longest piece that where in between the repeated each set of repeated characters.
>>> MaximumSubstringNonRepeating("AABGAKG") // "AA" is repeated
"BGAKG"
>>> MaximumSubstringNonRepeating("AABGAKGIMNZZZD") // "AA" and "ZZZ" are repeated.
"BGAKGIMN"

Develop Reference

node.js excel linux python-3.x azure haskell apache-spark rust .htaccess string

finding the subsequence of strings in c# urjent - string

Related

LC: Concatenated Words

Longest Substring without repeating characters issue with edge case

Encode string "aaa" to "3[a]"

About KMP algorithm preprocessing function implementation

How to find the longest substring with no repeated characters?

Categories

Resources