About KMP algorithm preprocessing function implementation


I'm trying to implement the KMP algorithm, but when I searched online I found two different versions of the preprocessing function:
Solution 1:
function computeLPSArray(str){
var j = -1, i = 0;
var arr = [];
arr[0] = j;
while(i < str.length){
if(j == -1||str[i]==str[j]){
i++;
j++;
arr[i] = j;
} else {
j = arr[j];
}
}
return arr;
}
Solution 2:
function computeLPSArray(pat){
var lps = [];
var len = 0, i;
lps[0] = 0;
i = 1;
while(i < pat.length){
if(pat[i] == pat[len]){
len++;
lps[i] = len;
i++;
} else {
if(len != 0){
len = lps[len-1];
} else {
lps[i++] = 0;
}
}
}
return lps;
}
Solution 2 comes from GeeksforGeeks. Why not use the first solution?
Is there any corner case that fails with Solution 1?
Thanks.

Not really. Both versions can be used for the same task. The failure array is indexed a bit differently, but the complexity of the algorithm is the same and both approaches are correct.
In version 2, lps[i] is the length of the longest proper prefix of pat[0..i] that is also a suffix of it. Version 1 stores the same values shifted one position to the right, with a -1 sentinel at index 0, so arr[i + 1] equals lps[i] and arr has one more entry than the pattern has characters. The two arrays are therefore equivalent: dropping the sentinel from the first gives the second, and the matching loop only has to index them accordingly.
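To compare the two tables directly, here is a minimal sketch that runs both preprocessing functions on a sample pattern. The names failureWithSentinel and lpsTable are made up for this illustration; the logic follows Solution 1 and Solution 2 from the question.

```javascript
// Solution 1 from the question: failure table with a -1 sentinel (length n + 1).
function failureWithSentinel(str) {
  let j = -1, i = 0;
  const arr = [j];
  while (i < str.length) {
    if (j === -1 || str[i] === str[j]) {
      i++;
      j++;
      arr[i] = j;
    } else {
      j = arr[j];
    }
  }
  return arr;
}

// Solution 2 from the question: lps table (length n).
function lpsTable(pat) {
  const lps = [0];
  let len = 0, i = 1;
  while (i < pat.length) {
    if (pat[i] === pat[len]) {
      lps[i++] = ++len;
    } else if (len !== 0) {
      len = lps[len - 1];
    } else {
      lps[i++] = 0;
    }
  }
  return lps;
}

console.log(failureWithSentinel("abab")); // [-1, 0, 0, 1, 2]
console.log(lpsTable("abab"));            // [0, 0, 1, 2]
```

For this pattern the first array is the second one with -1 prepended, so for a non-empty pattern either table can drive the matching phase as long as the lookups use the matching index convention.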

Related

NPOI Formula evaluation error IndexOutOfRangeException

I have an Excel .xls file in which many cells contain complex formulas. When I evaluated several of these formulas, an exception occurred.
var a = wk.GetCreationHelper().CreateFormulaEvaluator();
a.ClearAllCachedResultValues();
var cellv = a.EvaluateInCell(cell);
The evaluation function EvaluateInCell threw IndexOutOfRangeException: Index was outside the bounds of the array.
It is worth noting that the formula in this cell is very long and has multiple levels of nesting.
The formula is: =IF(SUM(D22:D10000)+SUM(M28:M193)+SUM(M22:M23)+SUM(D22:D10000)+SUM(E22:E10000)*1.2=0,"",SUM(D22:D10000)+SUM(M28:M193)+SUM(M22:M23)+SUM(E22:E10000)*1.2)
First of all, I don't think the formula itself is the problem, because Excel calculates the result without issue, but evaluating it through NPOI raises the error.
I looked through the source code and suspect that the error is caused by GetHashCode returning a negative number. This is only my guess, and I still don't know how to solve the problem. The source code is as follows:
int startIx = cce.GetHashCode() % arr.Length;
for (int i = startIx; i < arr.Length; i++)
{
CellCacheEntry item = arr[i];
if (item == cce)
{
// already present
return false;
}
if (item == null)
{
arr[i] = cce;
return true;
}
}
for (int i = 0; i < startIx; i++)
{
CellCacheEntry item = arr[i];
if (item == cce)
{
// already present
return false;
}
if (item == null)
{
arr[i] = cce;
return true;
}
}
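If the guess about a negative hash code is right, the arithmetic behind it can be sketched as follows. In C#, the % operator on integers takes the sign of the dividend, and JavaScript's % behaves the same way, so the example is written in JavaScript. The helper names rawStartIndex and safeStartIndex are hypothetical and are not part of NPOI; this only illustrates the remainder behavior, not the library's actual fix.

```javascript
// Hypothetical helpers: they illustrate the arithmetic only, not NPOI's code.
function rawStartIndex(hash, length) {
  // The sign of the result follows the dividend, so a negative hash gives a negative index.
  return hash % length;
}

function safeStartIndex(hash, length) {
  // Shift a negative remainder back into the range [0, length).
  return ((hash % length) + length) % length;
}

console.log(rawStartIndex(-7, 5));  // -2, not a valid array index
console.log(safeStartIndex(-7, 5)); // 3
console.log(safeStartIndex(7, 5));  // 2
```

With a negative start index, the first loop in the quoted source would read arr[i] at a negative position, which would be consistent with the reported exception.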

Finding the subsequence of strings in C# (urgent)

Alice has two strings, initial and goal. She can remove some number of characters from initial, which will give her a subsequence of that string. A string with no deletions is still considered a subsequence of itself. Given these two strings, can you find the minimum number of subsequences of initial that, when appended together, will form goal?

function minimumConcat(initial, goal) {
let result = 0;
let pattern = ''
let count1 = Array.apply(null, Array(26)).map(Number.prototype.valueOf, 0);
let count2 = Array.apply(null, Array(26)).map(Number.prototype.valueOf, 0);
initial.split('').forEach(c => {
pattern = pattern + c
});
pattern = "^[" + pattern + "]*$"
if (!RegExp(pattern).test(goal)) return -1
for (let i = 0; i < initial.length; i++) {
count1[initial.charCodeAt(i) - 97]++;
}
for (let i = 0; i < goal.length; i++) {
count2[goal.charCodeAt(i) - 97]++;
}
for (let i = 0; i < 26; i++) {
result += Math.abs(count1[i] - count2[i]);
}
return result;
}
var initial = readline();
var goal = readline();
print(minimumConcat(initial, goal));
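The attempt above compares letter counts, which ignores the order of characters in goal. One possible alternative, offered as a sketch rather than a verified solution to the original judge, is a greedy scan: walk through initial repeatedly and consume as many characters of goal as possible on each pass. The function name minimumConcatGreedy is made up for this example.

```javascript
function minimumConcatGreedy(initial, goal) {
  // If goal needs a character that initial lacks, no answer exists.
  const available = new Set(initial);
  for (const ch of goal) {
    if (!available.has(ch)) return -1;
  }
  let passes = 0;
  let g = 0; // index of the next character of goal still to be produced
  while (g < goal.length) {
    passes++;
    // One pass over initial yields one subsequence; take every character that fits.
    for (let i = 0; i < initial.length && g < goal.length; i++) {
      if (initial[i] === goal[g]) g++;
    }
  }
  return passes;
}

console.log(minimumConcatGreedy("abc", "abcbc")); // 2
console.log(minimumConcatGreedy("abc", "acdbc")); // -1
console.log(minimumConcatGreedy("xyz", "xzyxz")); // 3
```

Each pass advances through goal by at least one character once the availability check has passed, so the loop always terminates. The running time is O(initial.length * answer).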

Longest Substring without repeating characters issue with edge case

I was trying to solve this problem: longest substring without repeating characters. My code fails a couple of test cases and I don't know how to fix it. I need help seeing where I'm going wrong.
Question:
Given a string, find the length of the longest substring without
repeating characters.
Examples:
Given "abcabcbb", the answer is "abc", which has length 3.
Given "bbbbb", the answer is "b", with a length of 1.
Given "pwwkew", the answer is "wke", with a length of 3. Note that
the answer must be a substring; "pwke" is a subsequence and not a
substring.
This is my code:
function longestSubString(arr){
let localSum=0,globalSum=0;
let set = new Set();
for(let i=0; i<arr.length; i++){
let current = arr[i];
//if the key is present in the store.
if(set.has(current)){
set.clear();
localSum = 1;
set.add(current);
} else {
localSum +=1;
set.add(current);
}
if(globalSum < localSum){
globalSum = localSum;
}
}
return globalSum;
}
Tests:
let test = "abcabc"; //returns 3 - correct
let test2 = "bbb"; //returns 1 - correct
let test5 = "dvdf"; //returns 2 - INCORRECT! it should return 3 (i.e for vdf) since I'm doing set.clear() I'm not able to store previous elements.
longestSubString(test5); //incorrect
Live:
https://repl.it/Jo5Z/10

Not fully tested!
function longestSubString(arr){
let localSum=0,globalSum=0;
let set = new Set();
for(let i=0; i<arr.length; i++){
let current = arr[i];
//if the key is present in the store.
if(set.has(current)){
let a = Array.from(set);
a.splice(0,a.indexOf(current)+1);
set = new Set(a);
set.add(current);
localSum = set.size;
} else {
localSum +=1;
set.add(current);
}
if(globalSum < localSum){
globalSum = localSum;
}
}
return globalSum;
}
The idea is that when you get a duplicate, you should start from the character after the first duplicated character. In your case, dvdf, when you reach the second d you should continue from vd, not from d!

You have to consider that the substring might start from any character in the string. Erasing the set only when you find a duplicate means you only consider substrings that start either at the first character or at a character where a duplicate was found.
An O(n^2) solution (assuming constant-time Set operations; O(n^2 log n) if they are logarithmic) that modifies yours just a bit:
function longestSubString(arr){
let globalSum=0;
for(let i=0; i<arr.length; i++){
let set = new Set();
let localSum=0;
for(let j=i; j<arr.length; j++){
let current = arr[j];
//if the key is present in the store.
if(set.has(current)){
break;
} else {
localSum +=1;
set.add(current);
}
}
if(globalSum < localSum){
globalSum = localSum;
}
}
return globalSum;
}
There's also a O(n + d) (almost linear) solution, d being the number of characters in the alphabet. See http://www.geeksforgeeks.org/length-of-the-longest-substring-without-repeating-characters/.
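A minimal sketch of such a near-linear approach is shown below, assuming a Map from each character to the index where it was last seen. The name longestUniqueLength is made up for this example, and the code is one common way to write the idea, not necessarily the version in the linked article.

```javascript
function longestUniqueLength(str) {
  const lastSeen = new Map(); // character -> index of its most recent occurrence
  let windowStart = 0;        // left edge of the current duplicate-free window
  let best = 0;
  for (let i = 0; i < str.length; i++) {
    const ch = str[i];
    const prev = lastSeen.get(ch);
    // Only a previous occurrence inside the current window forces the window to shrink.
    if (prev !== undefined && prev >= windowStart) {
      windowStart = prev + 1;
    }
    lastSeen.set(ch, i);
    best = Math.max(best, i - windowStart + 1);
  }
  return best;
}

console.log(longestUniqueLength("abcabc")); // 3
console.log(longestUniqueLength("bbb"));    // 1
console.log(longestUniqueLength("dvdf"));   // 3
```

Each character is visited once, and the map holds at most one entry per distinct character.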

There seem to be a lot of long answers here. The implementation I've thought of is simplified due to two observations:
Whenever you encounter a duplicate character, you need to start the next substring just after the previous occurrence of the current character.
A Set iterates in insertion order, so Array.from(set) gives its elements in the order they were added.
function longestSubstring(str) {
let maxLength = 0
let current = new Set()
for (const character of str) {
if (current.has(character)) {
const substr = Array.from(current)
maxLength = Math.max(maxLength, substr.length)
current = new Set(substr.slice(substr.indexOf(character) + 1))
}
current.add(character)
}
return Math.max(maxLength, current.size)
}
const tests = [
"abcabc",
"bbb",
"pwwkew",
"geeksforgeeks",
"dvdf"
]
tests.map(longestSubstring).forEach(result => console.log(result))
A simple edit allows us to return the first occurrence of the longest substring instead of the maximum length.
function longestSubstring(str) {
let maxSubstr = []
let current = new Set()
for (const character of str) {
if (current.has(character)) {
const substr = Array.from(current)
maxSubstr = maxSubstr.length < substr.length ? substr: maxSubstr
current = new Set(substr.slice(substr.indexOf(character) + 1))
}
current.add(character)
}
const substr = maxSubstr.length < current.size ? Array.from(current) : maxSubstr
return substr.join('')
}
const tests = [
"abcabc",
"bbb",
"pwwkew",
"geeksforgeeks",
"dvdf"
]
tests.map(longestSubstring).forEach(result => console.log(result))
As we can see, the last test yields vdf, as expected.

The solution below gets the length in O(n+d) time and also prints the longest substring without repeating characters:
public void longestNonRepeatingLength(String a){
// assumes a is non-empty and only contains characters with codes below 256
int visitedIndex[] = new int[256];
int curr_len = 0, max_len = 0, prev_ind = 0, start = 0, end = 1;
for(int i =0;i<256;i++)
visitedIndex[i] = -1;
visitedIndex[a.charAt(0)] = 0;
curr_len++;
int i = 0;
for( i=1;i<a.length();i++){
prev_ind = visitedIndex[a.charAt(i)];
if(prev_ind == -1 || i > prev_ind + curr_len)
curr_len++;
else{
if(curr_len>max_len){
// the window that just ended covers indices i - curr_len .. i - 1
start = i - curr_len;
end = i - 1;
max_len = curr_len;
}
curr_len = i - prev_ind;
}
visitedIndex[a.charAt(i)] = i;
}
if(curr_len>max_len){
start = i - curr_len;
end = i-1;
max_len = curr_len;
}
for( i = start;i<=end;i++)
System.out.print(a.charAt(i));
System.out.println("");
System.out.println("Length = "+max_len);
}

As set contains the largest set of non-repeating characters of a String ending on index i, this means that when you encounter a previously seen character, rather than starting over with an empty set as your code does now, you should just remove all characters from your set up to that duplicate one.
Say for example your input is "abXcXdef". When the second "X" is encountered, you'll want to drop "a" and "b" from your set, leaving a set of ("c","X") as the longest set up to that point. Adding all other characters (as none are duplicates) you then end up with a max length of 5.
Something like this should work:
function longestSubString(arr) {
let globalSum = 0;
let set = new Set();
for (let i=0; i<arr.length; i++) {
let current = arr[i];
if (set.has(current)) {
while (true) {
// the current window starts at index i - set.size
let removeChar = arr[i - set.size];
if (removeChar != current)
set.delete(removeChar);
else
break;
}
} else {
set.add(current);
if (set.size > globalSum)
globalSum = set.size;
}
}
return globalSum;
}
As every character is added at most once and deleted at most once, this is an O(N) algorithm.

Encode string "aaa" to "3[a]"

Given a string s, encode it using the format "aaa" to "3[a]". The encoded string should be as short as possible.
Example: "abbabb" to "2[a2[b]]"
Update: assume the string only contains lowercase letters.
Update: here is my code in C++, but it's slow. I know one possible improvement is to use KMP to check whether the current string is formed by repeating a substring.
// This function checks whether s[start..end] is formed by repeating the substring s[l..r].
// It could be replaced by running the KMP preprocessing on the whole string as an improvement.
bool checkRepeating(string& s, int l, int r, int start, int end){
if((end-start+1)%(r-l+1) != 0)
return false;
int len = r-l+1;
bool res = true;
for(int i=start; i<=end; i++){
if(s[(i-start)%len+l] != s[i]){
res = false;
break;
}
}
return res;
}
// this function is used to get the length of the current number
int getLength(int l1, int l2){
return (int)(log10(l2/l1+1)+1);
}
string shortestEncodeString(string s){
int len = s.length();
vector< vector<int> > res(len, vector<int>(len, 0));
//Initial the matrix
for(int i=0; i<len; i++){
for(int j=0; j<=i; j++){
res[j][i] = i-j+1;
}
}
unordered_map<string, string> record;
for(int i=0; i<len; i++){
for(int j=i; j>=0; j--){
string temp = s.substr(j, i-j+1);
/* if the current substring has showed before, then no need to compute again
* Here is a example for this part: if the string is "abcabc".
* if we see the second "abc", then no need to compute again, just use the
* result from first "abc".
**/
if(record.find(temp) != record.end()){
res[j][i] = record[temp].size();
continue;
}
string ans = temp;
for(int k=j; k<i; k++){
string str1 = s.substr(j, k-j+1);
string str2 = s.substr(k+1, i-k);
if(res[j][i] > res[j][k] + res[k+1][i]){
res[j][i] = res[j][k]+res[k+1][i];
ans = record[str1] + record[str2];
}
if(checkRepeating(s, j, k, k+1, i) == true && res[j][i] > 2+getLength(k-j+1, i-k)+res[j][k]){
res[j][i] = 2+getLength(k-j+1, i-k)+res[j][k];
ans = to_string((i-j+1)/(k-j+1)) + '[' + record[str1] +']';
}
}
record[temp] = ans;
}
}
return record[s];
}
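The KMP-based improvement mentioned in the question could look roughly like the sketch below, written in JavaScript like the other snippets on this page. It relies on a standard property of the lps table: a string of length n is a repetition of a shorter block exactly when lps[n-1] > 0 and n is divisible by n - lps[n-1]. The function name smallestRepeatingUnit is made up for this example, and wiring it into the dynamic program above is left open.

```javascript
// Returns the length of the shortest block whose repetition forms s,
// or s.length when s is not a repetition of any shorter block.
function smallestRepeatingUnit(s) {
  const n = s.length;
  if (n === 0) return 0;
  // Standard KMP preprocessing: lps[i] is the length of the longest
  // proper prefix of s[0..i] that is also a suffix of it.
  const lps = new Array(n).fill(0);
  let len = 0;
  for (let i = 1; i < n; ) {
    if (s[i] === s[len]) {
      lps[i++] = ++len;
    } else if (len !== 0) {
      len = lps[len - 1];
    } else {
      lps[i++] = 0;
    }
  }
  const period = n - lps[n - 1];
  return n % period === 0 ? period : n;
}

console.log(smallestRepeatingUnit("abbabb")); // 3
console.log(smallestRepeatingUnit("aaa"));    // 1
console.log(smallestRepeatingUnit("abcab"));  // 5
```

This answers the repetition question for one string in O(n) time, compared with one character-by-character scan per candidate block length in checkRepeating.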

With very little to start with in terms of a question statement, I took a quick stab at this using JavaScript because it's easy to demonstrate. The comments are in the code, but basically there are alternating stages of joining adjacent elements, run-length checking, joining adjacent elements, and on and on until there is only one element left - the final encoded value.
I hope this helps.
function encode(str) {
var tmp = str.split('');
var arr = [];
do {
if (tmp.length === arr.length) {
// Join adjacent elements
arr.length = 0;
for (var i = 0; i < tmp.length; i += 2) {
if (i < tmp.length - 1) {
arr.push(tmp[i] + tmp[i + 1]);
} else {
arr.push(tmp[i]);
}
}
tmp.length = 0;
} else {
// Swap arrays and clear tmp
arr = tmp.slice();
tmp.length = 0;
}
// Build up the run-length strings
for (var i = 0; i < arr.length;) {
var runlength = runLength(arr, i);
if (runlength > 1) {
tmp.push(runlength + '[' + arr[i] + ']');
} else {
tmp.push(arr[i]);
}
i += runlength;
}
console.log(tmp);
} while (tmp.length > 1);
return tmp.join();
}
// Get the longest run length from a given index
function runLength(arr, ind) {
var count = 1;
for (var i = ind; i < arr.length - 1; i++) {
if (arr[i + 1] === arr[ind]) {
count++;
} else {
break;
}
}
return count;
}
<input id='inp' value='abbabb'>
<button type="submit" onClick='javascript:document.getElementById("result").value=encode(document.getElementById("inp").value)'>Encode</button>
<br>
<input id='result' value='2[a2[b]]'>

How to find the longest substring with no repeated characters?

I want an algorithm to find the longest substring of characters in a given string containing no repeating characters. I can think of an O(n*n) algorithm which considers all the substrings of a given string and calculates the number of non-repeating characters. For example, consider the string "AABGAKG" in which the longest substring of unique characters is 5 characters long which corresponds to BGAKG.
Can anyone suggest a better way to do it?
Thanks
Edit: I think I was not able to explain my question properly. A substring may contain repeated characters (we do not need all characters in the substring to be distinct, which is what the geeksforgeeks solution computes). What I have to find is the maximum number of non-repeating characters in any substring (it may be the case that some characters are repeated).
For example, if the string is AABGAKGIMN, then BGAKGIMN is the solution.

For every start = 0 ... (n-1), try to extend end to the right-most position.
Keep a bool array used[26] to remember which characters are already in the current window.
Suppose we have just finished the window (start, end).
For start+1:
first clear the character that leaves the window: used[str[start]] = false;
while ((end+1 < n) && (!used[str[end+1]])) { used[str[end+1]]=true; ++end;}
Now we have the new window (start+1, end) to check. Since start and end only ever move forward, the total complexity is O(N).
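The steps above can be sketched in JavaScript as follows. This is one possible reading of the description: it uses a Set in place of the used[26] array and a half-open window [start, end). The name longestDistinctWindow is made up for the example.

```javascript
function longestDistinctWindow(str) {
  const used = new Set(); // characters inside the current window [start, end)
  let end = 0;
  let best = 0;
  for (let start = 0; start < str.length; start++) {
    // Extend the right edge while the next character is not yet in the window.
    while (end < str.length && !used.has(str[end])) {
      used.add(str[end]);
      end++;
    }
    best = Math.max(best, end - start);
    // The leftmost character leaves the window before start moves forward.
    used.delete(str[start]);
  }
  return best;
}

console.log(longestDistinctWindow("dvdf"));    // 3
console.log(longestDistinctWindow("AABGAKG")); // 4
```

Note that this treats the task as finding a substring whose characters are all distinct, so for "AABGAKG" it reports 4 (for example "BGAK").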

Here is the solution in C#. I tested it in Visual Studio 2012 and it works.
public static int LongestSubstNonrepChar(string str) {
int curSize = 0;
int maxSize = 0;
int end = 0;
bool[] present = new bool[256];
for (int start = 0; start < str.Length; start++) {
end = start;
while (end < str.Length) {
if (!present[str[end]] && end < str.Length)
{
curSize++;
present[str[end]] = true;
end++;
}
else
break;
}
if (curSize > maxSize) {
maxSize = curSize;
}
//reset current size and the set all letter to false
curSize = 0;
for (int i = 0; i < present.Length; i++)
present[i] = false;
}
return maxSize;
}

Pretty tricky question. Here is an O(n) solution in C#.
public string MaxSubStringKUniqueChars(string source, int k)
{
if (string.IsNullOrEmpty(source) || k > source.Length) return string.Empty;
var start = 0;
var ret = string.Empty;
IDictionary<char, int> dict = new Dictionary<char, int>();
for (var i = 0; i < source.Length; i++)
{
if (dict.ContainsKey(source[i]))
{
dict[source[i]] = 1 + dict[source[i]];
}
else
{
dict[source[i]] = 1;
}
if (dict.Count == k + 1)
{
if (i - start > ret.Length)
{
ret = source.Substring(start, i - start);
}
while (dict.Count > k)
{
int count = dict[source[start]];
if (count == 1)
{
dict.Remove(source[start]);
}
else
{
dict[source[start]] = dict[source[start]] - 1;
}
start++;
}
}
}
//just for edge case like "aabbcceee", should return "cceee"
if (dict.Count == k && source.Length - start > ret.Length)
{
return source.Substring(start, source.Length - start);
}
return ret;
}
//This is the test case.
public void TestMethod1()
{
var ret = Item001.MaxSubStringKUniqueChars("aabcd", 2);
Assert.AreEqual("aab", ret);
ret = Item001.MaxSubStringKUniqueChars("aabbccddeee", 2);
Assert.AreEqual("ddeee", ret);
ret = Item001.MaxSubStringKUniqueChars("abccccccccaaddddeeee", 3);
Assert.AreEqual("ccccccccaadddd", ret);
ret = Item001.MaxSubStringKUniqueChars("ababcdcdedddde", 2);
Assert.AreEqual("dedddde", ret);
}

How about this:
// Returns the longest substring in which no two adjacent characters are equal.
public static String getLongestSubstringNoRepeats( String string ){
int iLongestSoFar = 0;
int posLongestSoFar = 0;
int posCurrent = 0; // start of the run currently being scanned
int xCharacter = 0;
while( xCharacter < string.length() ){
// an adjacent repeat ends the current run; the repeated character starts a new one
if( xCharacter > 0 && string.charAt( xCharacter ) == string.charAt( xCharacter - 1 ) ){
posCurrent = xCharacter;
}
int iCurrentLength = xCharacter - posCurrent + 1;
if( iCurrentLength > iLongestSoFar ){
iLongestSoFar = iCurrentLength;
posLongestSoFar = posCurrent;
}
xCharacter++;
}
return string.substring( posLongestSoFar, posLongestSoFar + iLongestSoFar );
}

Let s be the given string, and n its length.
Define f(i) to be the longest [contiguous] substring of s ending at s[i] with distinct letters. That's unique and well-defined.
Compute f(i) for each i. It's easy to deduce from f(i-1) and s[i]:
If the letter s[i] is in f(i-1), let j be the greatest position j < i such that s[j] = s[i]. Then f(i) is s[j+1 .. i], both ends inclusive (s[j+1:i+1] as a Python slice).
Otherwise, f(i) is f(i-1) with s[i] appended.
The solution to your problem is any f(i) of maximal length (not necessarily unique).
You could implement this algorithm to run in O(n * 26) time, where 26 is the number of letters in the alphabet.
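The recurrence above can be sketched directly, keeping f(i-1) as a string and deriving f(i) from it. The function name longestDistinctSubstring is made up for this example. The indexOf call scans at most as many characters as the alphabet has letters, which is where the factor of 26 comes from for lowercase input.

```javascript
function longestDistinctSubstring(s) {
  let current = ""; // f(i - 1): longest distinct-letter substring ending at i - 1
  let best = "";
  for (let i = 0; i < s.length; i++) {
    // pos is -1 when s[i] does not occur in f(i - 1), so slice(0) keeps everything.
    const pos = current.indexOf(s[i]);
    // Otherwise restart just after the previous occurrence of s[i].
    current = current.slice(pos + 1) + s[i];
    if (current.length > best.length) best = current;
  }
  return best;
}

console.log(longestDistinctSubstring("dvdf"));     // "vdf"
console.log(longestDistinctSubstring("pwwkew"));   // "wke"
console.log(longestDistinctSubstring("abcabcbb")); // "abc"
```

When several substrings share the maximal length, this version returns the first one it meets.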

// Prints the longest stretch of characters that are not part of an adjacent repeated run
// and returns its length.
public static int longestNonDupSubstring(char[] str) {
int maxCount = 0;
int count = 0;
int maxEnd = -1;
for(int i=0;i < str.length;i++) {
// a character belongs to a repeated run if it equals either neighbour
boolean repeated = (i > 0 && str[i] == str[i-1])
|| (i != str.length-1 && str[i] == str[i+1]);
if(repeated) {
count = 0;
} else {
count++;
if(maxCount<count) {
maxCount = count;
maxEnd = i;
}
}
}
int startPos = maxEnd - maxCount + 1;
for(int i = 0; i < maxCount; i++) {
System.out.print(str[startPos+i]);
}
return maxCount;
}

// Given a string, find the longest substring with all distinct characters in it. If there are multiple such strings, print them all.
#include<iostream>
#include<string>
#include<array>
using namespace std;
//for a string with all small letters
//for capital letters use 65 instead of 97
int main()
{
array<int ,26> count ;
array<string,26>largest;
for(int i = 0 ;i <26;i++)
count[i]=0;
string s = "abcdefghijrrstqrstuvwxyzprr";
string out = "";
int k = 0,max=0;
for(int i = 0 ; i < s.size() ; i++)
{
if(count[s[i] - 97]==1)
{
int loc = out.find(s[i]);
for(int j=0;j<=loc;j++) count[out[j] - 97]=0;
if(out.size() > max)
{
max = out.size();
k=1;
largest[0] = out;
}
else if(out.size()==max) largest[k++]=out;
out.assign(out,loc+1,out.size()-loc-1);
}
out = out + s[i];
count[s[i] - 97]++;
}
// the last window is never followed by a duplicate, so compare it here as well
if(out.size() > max)
{
max = out.size();
k=1;
largest[0] = out;
}
else if(out.size()==max) largest[k++]=out;
for(int i=0;i<k;i++) cout<<largest[i] << endl;
//output will be
// abcdefghijr
// qrstuvwxyzp
}

Let me contribute a little as well. This solution runs in O(N) time. Its space complexity is O(K), where K is the number of distinct characters in the input string.
public static int NoRepeatSubstring(string str)
{
int start = 0;
int maxLen = 0;
Dictionary<char, int> dic = new Dictionary<char, int>();
for (int i = 0; i < str.Length; i++)
{
char rightChar = str[i];
// if the map already contains the 'rightChar', shrink the window from the beginning so that
// we have only one occurrence of 'rightChar'
if (dic.ContainsKey(rightChar))
{
// this is tricky; in the current window, we will not have any 'rightChar' after its previous index
// and if 'start' is already ahead of the last index of 'rightChar', we'll keep 'start'
start = Math.Max(start, dic[rightChar] + 1);
}
if (dic.ContainsKey(str[i]))
dic[str[i]] = i;
else
dic.Add(str[i], i);
maxLen = Math.Max(maxLen, i - start + 1);
}
return maxLen;
}
And here are some unit tests:
Assert.Equal(3, SlideWindow.NoRepeatSubstring("aabccbb"));
Assert.Equal(2, SlideWindow.NoRepeatSubstring("abbbb"));
Assert.Equal(3, SlideWindow.NoRepeatSubstring("abccde"));

string MaximumSubstringNonRepeating(string text)
{
string max = null;
bool isCapture = false;
foreach (string s in Regex.Split(text, @"(.)\1+"))
{
if (!isCapture && (max == null || s.Length > max.Length))
{
max = s;
}
isCapture = !isCapture;
}
return max;
}
. matches any character. ( ) captures that character. \1 matches the captured character again. + repeats that match. The whole pattern matches two or more repetitions of any one character, such as "AA" or ",,,,".
Regex.Split() splits the string at every match of the pattern and returns an array of the pieces in between. (One caveat: it also includes the captured substrings, in this case the one character that is being repeated. The captures show up in between the pieces, which is why I added the isCapture flag.)
The function cuts out all the repeated characters and returns the longest piece found between the sets of repeated characters.
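The same splitting idea can be sketched in JavaScript, where String.prototype.split also places captured groups between the pieces. The camelCase function name is chosen only for this illustration, and unlike the C# version it returns an empty string rather than null when nothing is found.

```javascript
function maximumSubstringNonRepeating(text) {
  // With a capturing group, split returns: piece, capture, piece, capture, ...
  const pieces = text.split(/(.)\1+/);
  let max = "";
  // Even indices hold the pieces between matches; odd indices hold the captured characters.
  for (let i = 0; i < pieces.length; i += 2) {
    if (pieces[i].length > max.length) max = pieces[i];
  }
  return max;
}

console.log(maximumSubstringNonRepeating("AABGAKG"));        // "BGAKG"
console.log(maximumSubstringNonRepeating("AABGAKGIMNZZZD")); // "BGAKGIMN"
```

Stepping through the even indices plays the role of the isCapture flag in the C# code.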
>>> MaximumSubstringNonRepeating("AABGAKG") // "AA" is repeated
"BGAKG"
>>> MaximumSubstringNonRepeating("AABGAKGIMNZZZD") // "AA" and "ZZZ" are repeated.
"BGAKGIMN"
