Numbered string to int with leading zeros

For a numbered string (1234) I normally use the package strconv with the function Atoi to convert from string to int in Go. However, what is the idiomatic way of approaching this if the numbered string starts with leading zeros (e.g. 01234)?
Slicing the string and then converting the []string to []int is one approach, but is it the best/idiomatic way of Go-ing about it?
Update 1:
From the input string 01234, the expected output is 01234 in some int-based form (any simple or composite type built from int), e.g. foo := "01234" becoming bar := []int{0, 1, 2, 3, 4}.
Is there an idiomatic way (or a standard-package func) to approach this problem, or is the string-to-rune/byte conversion (doing some business logic and then converting back) necessary when the variable has one or more leading zeros (e.g. 01234 or 00001)?
Update 2: Just to make it completely clear, as I can feel the question can be clarified further: foo, _ := strconv.Atoi("01234") returns 1234 as an int (and the same result can be obtained in other ways with the strings package etc.).
The question here is: how (if possible, in idiomatic Go) can I get the result with the leading zeros, that is, 01234 (and NOT 1234) in type int?

Use strings.TrimLeft(s, "0") to remove leading zeroes from the string.
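For instance, a minimal sketch of that suggestion (my own variable names; note that strconv.Atoi already ignores leading zeros, so the trim mainly matters if you want the zero-free string itself):

package main

import (
    "fmt"
    "strconv"
    "strings"
)

func main() {
    s := "01234"
    trimmed := strings.TrimLeft(s, "0") // "1234"; an all-zero input would become ""
    n, err := strconv.Atoi(trimmed)     // 1234 as an int
    if err != nil {
        panic(err)
    }
    fmt.Println(trimmed, n) // 1234 1234
}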

fmt.Printf("%01d ", 5) // 5
fmt.Printf("%02d ", 5) // 05
fmt.Printf("%03d ", 5) // 005
fmt.Printf("%04d ", 5) // 0005
myInt := fmt.Sprintf("%05d", 5)
fmt.Println(myInt) // 00005
https://pkg.go.dev/fmt#pkg-overview

To convert a string of decimals to a slice of integers for those decimal values, loop over the string and convert the rune to the corresponding integer using subtraction:
func toIntSlice(s string) ([]int, error) {
    var result []int
    for _, r := range s {
        if r < '0' || r > '9' {
            return nil, fmt.Errorf("not an integer %s", s)
        }
        result = append(result, int(r-'0'))
    }
    return result, nil
}
Example use:
foo := "01234"
bar, _ := toIntSlice(foo)
// bar is []int{0, 1, 2, 3, 4}
https://go.dev/play/p/etHtApYoWUi

Related

CS50 Problem Set 2: Substitution (Need Help)

I'm facing an issue here. Can anyone tell me what is wrong with my code?
This is the check50 result:
:) substitution.c exists
:) substitution.c compiles
:( encrypts "A" as "Z" using ZYXWVUTSRQPONMLKJIHGFEDCBA as key
expected "ciphertext: Z...", not ""
:( encrypts "a" as "z" using ZYXWVUTSRQPONMLKJIHGFEDCBA as key
expected "ciphertext: z...", not ""
:( encrypts "ABC" as "NJQ" using NJQSUYBRXMOPFTHZVAWCGILKED as key
expected "ciphertext: NJ...", not ""
:( encrypts "XyZ" as "KeD" using NJQSUYBRXMOPFTHZVAWCGILKED as key
expected "ciphertext: Ke...", not ""
:( encrypts "This is CS50" as "Cbah ah KH50" using YUKFRNLBAVMWZTEOGXHCIPJSQD as key
expected "ciphertext: Cb...", not ""
:( encrypts "This is CS50" as "Cbah ah KH50" using yukfrnlbavmwzteogxhcipjsqd as key
expected "ciphertext: Cb...", not ""
:( encrypts "This is CS50" as "Cbah ah KH50" using YUKFRNLBAVMWZteogxhcipjsqd as key
expected "ciphertext: Cb...", not ""
:( encrypts all alphabetic characters using DWUSXNPQKEGCZFJBTLYROHIAVM as key
expected "ciphertext: Rq...", not ""
:( does not encrypt non-alphabetical characters using DWUSXNPQKEGCZFJBTLYROHIAVM as key
expected "ciphertext: Yq...", not ""
:) handles lack of key
:) handles too many arguments
:) handles invalid key length
:) handles invalid characters in key
:) handles duplicate characters in key
:) handles multiple duplicate characters in key
This is my code:
#include <cs50.h>
#include <stdio.h>
#include <ctype.h>
#include <string.h>

int main(int argc, string argv[])
{
    string alphabet = "abcdefghijklmnopqrstuvwxyz";
    if (argc != 2)
    {
        printf("missing/more than 1 command-line argument\n");
        return 1;
    }
    // check if there are 26 characters
    int a = strlen(argv[1]);
    if (a != 26)
    {
        printf("key must contain 26 characters\n");
        return 1;
    }
    // Check if characters are all alphabetic
    for (int i = 0, n = strlen(argv[1]); i < n; i++)
    {
        if (!isalpha(argv[1][i]))
        {
            printf("only alphabetic characters allowed\n");
            return 1;
        }
        // check if each letter appear only once
        for (int j = 1; j < n; j++)
        {
            if (argv[1][i] == argv[1][j])
            {
                printf("repeated alphabets not allowed\n");
                return 1;
            }
        }
    }
    // prompt user for plaintext
    string b = get_string("plaintext: \n");
    int m = strlen(b);
    char ciphertxt[m + 1];
    // find out the alphabetical position of each character in string b (i.e character c in string b has alphabetical position of 3)
    for (int k = 0; k < m; k++)
    {
        for (int p = 0, q = strlen(alphabet); p < q; p++)
        {
            if (b[k] == alphabet[p])
            {
                ciphertxt[k] = tolower(argv[1][p]);
                break;
            }
            else if (b[k] == (alphabet[p] - 32))
            {
                ciphertxt[k] = toupper(argv[1][p]);
                break;
            }
            else
            {
                ciphertxt[k] = b[k];
            }
        }
    }
    ciphertxt[m] = '\0';
    // print ciphertext
    printf("ciphertext: %s\n", ciphertxt);
    return 0;
}
Did you run your code using the tests cs50 shows you? I did; it does not encrypt anything; it always gives the "repeated alphabets not allowed" message.
The problem is in the j loop. It will always report the 2nd letter of argv[1] as a duplicate, because when i and j are both 1 the test if(argv[1][i]==argv[1][j]) compares a character with itself and therefore always evaluates to true.
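To illustrate the point outside of C, here is a tiny Go sketch (my own code, not the poster's) of a duplicate check whose inner index starts at i+1 so a character is never compared with itself:

package main

import "fmt"

// hasDuplicate reports whether any character occurs more than once in key.
func hasDuplicate(key string) bool {
    for i := 0; i < len(key); i++ {
        for j := i + 1; j < len(key); j++ { // j starts at i+1, never at a fixed 1
            if key[i] == key[j] {
                return true
            }
        }
    }
    return false
}

func main() {
    fmt.Println(hasDuplicate("ZYXWVUTSRQPONMLKJIHGFEDCBA")) // false
    fmt.Println(hasDuplicate("ZZXWVUTSRQPONMLKJIHGFEDCBA")) // true
}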
There are several approaches to solving this problem. I will not solve it for you in the C programming language; you must do that yourself. Following is an approach that works very efficiently in the Ada programming language, but is not easily accomplished in C.
In Ada a string is defined as
type string is array (Positive range <>) of Character;
Thus, a string is an unconstrained array type, meaning instances of an array may be any length. Ada arrays require the programmer to define the range of values for the array index. Index values need not start at 0. Index values may start at any value which is valid for the type declared to be the index type. Index types may be integer types or enumeration types. Ada characters are an enumeration type, which allows the programmer to index an array using characters.
The following example uses many of the features described above.
with Ada.Command_Line;        use Ada.Command_Line;
with Ada.Text_IO;             use Ada.Text_IO;
with Ada.Characters.Handling; use Ada.Characters.Handling;

procedure substitution is
   subtype lower is Character range 'a' .. 'z';
   subtype upper is Character range 'A' .. 'Z';
   subtype sequence is String (1 .. 26);

   alphabet : constant array (lower) of Positive :=
     (1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20,
      21, 22, 23, 24, 25, 26);

   function substitute (Char : Character; Key : sequence) return Character is
   begin
      if Char in lower then
         return To_Lower (Key (alphabet (To_Lower (Char))));
      elsif Char in upper then
         return To_Upper (Key (alphabet (To_Lower (Char))));
      else
         return Char;
      end if;
   end substitute;

   function is_duplicate (char : Character; Key : sequence) return Boolean is
      count : Natural := 0;
   begin
      for C of Key loop
         if C = char then
            count := count + 1;
         end if;
      end loop;
      return count > 1;
   end is_duplicate;

   Key : String (1 .. 26);
   Invalid_Argument_Error : exception;
begin
   if Argument_Count /= 1 then
      Put_Line ("There must be exactly one command line argument.");
      raise Invalid_Argument_Error;
   end if;
   if Argument (1)'Length /= 26 then
      Put_Line ("The argument must contain 26 characters.");
      raise Invalid_Argument_Error;
   else
      Key := Argument (1);
   end if;
   for C of Key loop
      if is_duplicate (C, Key) then
         Put_Line ("The argument cannot contain duplicate values.");
         raise Invalid_Argument_Error;
      end if;
   end loop;
   for C of Key loop
      if not (C in lower or else C in upper) then
         Put_Line ("The argument must contain only alphabetic characters.");
         raise Invalid_Argument_Error;
      end if;
   end loop;
   Put_Line ("Enter plain text:");
   declare
      input  : String := Get_Line;
      cipher : String (input'Range);
   begin
      for I in input'Range loop
         cipher (I) := substitute (input (I), Key);
      end loop;
      Put_Line ("cipher text: " & cipher);
   end;
end substitution;
Ada allows the starting procedure for a program to be named whatever the programmer wants to name it. In C the starting function must be named "main". In this example the starting procedure is named "substitution". Ada characters are full eight-bit characters and represent the Latin-1 character set. The lower half of Latin-1 (the seven-bit values) is the same as the ASCII character set. Thus, there are some lower case characters in the Latin-1 character set which are not part of the ASCII character set. For this reason the program restricts itself to the upper case and lower case characters of the ASCII character set by declaring two subtypes of the type Character.
subtype lower is Character range 'a' .. 'z';
subtype upper is Character range 'A' .. 'Z';
The syntax 'a' .. 'z' defines a range of values and includes all the characters starting with 'a' and ending with 'z'.
A subtype of the Ada string type is named sequence and is declared to be a string indexed by the value range 1 .. 26. Thus, each instance of the subtype sequence must contain a 26 character string. Ada does not append a null character to the end of its strings.
The array named alphabet is defined to be a constant array indexed by the subtype lower. Each element of the array is an integer with a minimum value of 1. The array is initialized to the numbers 1 through 26 with 1 indexed by 'a' and 26 indexed by 'z'. This array is used as a look-up table for indexing into the key entered on the program command line.
The function named substitute takes two parameters: Char, which is a Character, and Key, which is a sequence (a 26-character string). Substitute returns the encrypted character value.
The function returns the character in the Key parameter indexed by the number which is indexed by the letter in the parameter Char. The array named alphabet becomes the look-up table for the index value corresponding to the character contained in the parameter Char.
The function named is_duplicate is used to determine if a character occurs more than once in a Key sequence. It simply counts the number of times the character in the char parameter occurs in the Key sequence. The function returns True if the count is greater than 1 and False otherwise.
After performing the necessary checks on the command-line parameter the program prompts for a string to encrypt and then simply assigns to the string cipher the encrypted character corresponding to each input character.

Leetcode Problem: First Unique Character in a String

I am struggling to understand how the LeetCode solution for the above problem works. Any help on how the post-increment operator works on the array values would be great.
class Solution {
    public int firstUniqChar(String s) {
        int[] charArr = new int[26];
        for (int i = 0; i < s.length(); i++) {
            charArr[s.charAt(i) - 'a']++;
        }
        for (int i = 0; i < s.length(); i++) {
            if (charArr[s.charAt(i) - 'a'] == 1) return i;
        }
        return -1;
    }
}
The problem link is here: https://leetcode.com/problems/first-unique-character-in-a-string/submissions/
First, you need to understand there are 26 letters in the English alphabet. So the code creates an array of 26 integers that will hold the count of each letter in the string.
int [] charArr = new int[26];
The count of all a's will be at index 0, the count of all b's at index 1, etc. The default value for int is 0, so this gives an array of 26 zeros to start with.
Each letter has two character codes; one for upper case and one for lower case. The function String.charAt() returns a char but char is an integral type so you can do math on it. When doing math on a char, it uses the char code. So for example:
char c = 'B';
c -= 'A';
System.out.println((int)c); // Will print 1 since char code of 'A' = 65, 'B' = 66
So this line:
charArr[s.charAt(i)-'a']++;
Takes the char at i and subtracts 'a' from it. The range of lowercase codes is 97-122. Subtracting 'a' shifts those values to 0-25, which gives the indexes into the array. (Note this code only checks lower case letters.)
After converting the character to an index, it increments the value at that index. So each item in the array represents the character count of the corresponding letter.
For example, the string "aabbee" will give the array {2, 2, 0, 0, 2, 0, 0....0}
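For comparison, here is a small Go sketch of the same counting idea (my own code; like the Java solution, it assumes the string contains only lowercase ASCII letters):

package main

import "fmt"

func firstUniqChar(s string) int {
    var counts [26]int
    for i := 0; i < len(s); i++ {
        counts[s[i]-'a']++ // same role as charArr[s.charAt(i)-'a']++
    }
    for i := 0; i < len(s); i++ {
        if counts[s[i]-'a'] == 1 {
            return i
        }
    }
    return -1
}

func main() {
    fmt.Println(firstUniqChar("leetcode"))     // 0
    fmt.Println(firstUniqChar("loveleetcode")) // 2
}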

find minimum steps required to change one binary string to another

Given two strings str1 and str2 which contain only 0 or 1, there are some steps to change str1 to str2:
step 1: find a substring of str1 of length 2 and reverse it; str1 becomes str1' (str1' != str1)
step 2: find a substring of str1' of length 3 and reverse it; str1' becomes str1'' (str1'' != str1')
the following steps are similar.
The string length is in the range [2, 30].
Requirement: each step must be performed once and we cannot skip previous steps and perform the next step.
If it is possible to change str1 to str2, output the minimum steps required, otherwise, output -1
Example 1
str1 = "1010", str2 = "0011", the minimum step required is 2
first, choose substring in range [2, 3], "1010" --> "1001",
then choose substring in the range [0, 2], "1001" --> "0011"
Example 2
str1 = "1001", str2 = "0110", it is impossible to change str1 to str2,
because in step1, str1 can be changed to "0101" or "1010", but in step3, it is impossible to change a length3 substring to make it different. So the output is -1.
Example 3
str1 = "10101010", str2 = "00101011", output is 7
I cannot figure out example 3, because there are too many possibilities. Can anyone give some hints on how to solve this problem? What type of problem is this? Is it dynamic programming?
This is in fact a dynamic programming problem. To solve it, we are going to try all possible permutations, but memoize the results along the way. It could seem that there are way too many options (there are 2^30 different binary strings of length 30), but keep in mind that reversing a substring doesn't change the number of zeroes and ones we have, so the upper bound is in fact 30 choose 15 = 155117520, for a string of 15 zeroes and 15 ones. Around 150 million possible results is not too bad.
So, starting with our start string, we are going to derive all possible strings from each string we have derived so far, until we generate the end string. We are also going to track predecessors to reconstruct the generation. Here's my code:
start = '10101010'
end = '00101011'

dp = [{} for _ in range(31)]
dp[1][start] = ''  # Originally only start string is reachable
for i in range(2, len(start) + 1):
    for s in dp[i - 1].keys():
        # Try all possible reversals for each string in dp[i - 1]
        for j in range(len(start) - i + 1):
            newstr = s
            newstr = newstr[:j] + newstr[j:j+i][::-1] + newstr[j+i:]
            dp[i][newstr] = s
    if end in dp[i]:
        ans = []
        cur = end
        for j in range(i, 0, -1):
            ans.append(cur)
            cur = dp[j][cur]
        print(ans[::-1])
        exit(0)
print('Impossible!')
And for your third example, this gives us sequence ['10101010', '10101001', '10101100', '10100011', '00101011'] - from your str1 to str2. If you check differences between the strings, you'll see which transitions were made. So this transformation can be done in 4 steps rather than 7 like you suggested.
Lastly, this will be a bit slow for 30 in python, but if you rewrite it into C++, it's going to be a couple of seconds tops.
This question can be solved using backtracking. Here is my C++ code, which runs smoothly on my test cases. This question came in an OA of Persistent Systems and I was a bit confused about the steps, but it is simple backtracking. I'd welcome suggestions on whether DP can optimize my solution!
// prabaljainn
#include <bits/stdc++.h>
using namespace std;

string s1, s2;
int ans = 1e9;
int n;

void rec(string s1, int level) {
    if (s1 == s2) {
        ans = min(ans, level - 2);
        return;
    }
    for (int i = 0; i <= n - level; i++) {
        reverse(s1.begin() + i, s1.begin() + i + level);
        rec(s1, level + 1);
        reverse(s1.begin() + i, s1.begin() + i + level);
    }
}

int main() {
    cin >> s1 >> s2;
    n = s1.size();
    rec(s1, 2);
    if (ans == 1e9)
        cout << "-1" << endl;
    else
        cout << ans << endl;
}
Happy coding
This problem can be solved using breadth-first search. The following solution uses a queue which stores pairs having the current string as the first member and the current operation length (initially 2) as the second member. A set is used to store already visited strings to prevent entering redundant states. For the current string, we reverse every substring of length k, where k is the current operation length, and add it to the queue if it hasn't been seen before. If the current string equals the desired string then the answer is 'current operation length - 2'. If the queue becomes empty, then the answer isn't possible.
#include <bits/stdc++.h>
using namespace std;

int main() {
    string str1, str2;
    cin >> str1 >> str2;
    queue<pair<string, int>> q;
    set<string> s;
    q.push({str1, 2});
    s.insert(str1);
    while (!q.empty()) {
        auto p = q.front();
        q.pop();
        if (p.first == str2) {
            cout << p.second - 2;
            return 0;
        }
        if (p.second <= p.first.size()) {
            for (int i = 0; i <= p.first.size() - p.second; i++) {
                string x = p.first;
                reverse(x.begin() + i, x.begin() + i + p.second);
                if (s.find(x) == s.end()) {
                    q.push({x, p.second + 1});
                    s.insert(x);
                }
            }
        }
    }
    cout << -1;
    return 0;
}
Save str1 as the start of the BFS. At each step, reverse all substrings of the current required length (2 for the first step, 3 for the second, and so on) and check whether the new strings formed after reversing have been seen previously. If not seen, push them onto the queue and also maintain the count of steps. If the string at the front of the queue is str2 at any time, that step count is the answer.
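For concreteness, here is a minimal Go sketch of that BFS (my own code and names; the reversal length grows by one per level, and a set prunes strings that were already generated, as in the C++ answer above):

package main

import "fmt"

func minSteps(str1, str2 string) int {
    type state struct {
        s      string
        length int // length of the substring to reverse in the next step
    }
    seen := map[string]bool{str1: true}
    queue := []state{{str1, 2}}
    for len(queue) > 0 {
        cur := queue[0]
        queue = queue[1:]
        if cur.s == str2 {
            return cur.length - 2 // number of steps already performed
        }
        if cur.length > len(cur.s) {
            continue
        }
        for i := 0; i+cur.length <= len(cur.s); i++ {
            b := []byte(cur.s)
            // reverse the substring b[i : i+cur.length]
            for l, r := i, i+cur.length-1; l < r; l, r = l+1, r-1 {
                b[l], b[r] = b[r], b[l]
            }
            next := string(b)
            if next != cur.s && !seen[next] {
                seen[next] = true
                queue = append(queue, state{next, cur.length + 1})
            }
        }
    }
    return -1
}

func main() {
    fmt.Println(minSteps("1010", "0011")) // 2
    fmt.Println(minSteps("1001", "0110")) // -1
}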

Pattern matching a string in linear time

Given two strings S and T, where T is the pattern string, find whether any scrambled form of the pattern string exists as a substring of S and, if present, return the start index.
Example:
String S: abcdef
String T: efd
String S has "def", a combination of search string T: "efd".
I have found a solution with a run time of O(m*n). I am working on a linear-time solution where I use two HashMaps (a static one maintained for string T, and a dynamic copy of it used for checking the current substring of S). I restart checking at the next character whenever it fails, but this still runs in O(m*n) in the worst case.
I'd like to get some pointers to make it work in O(m+n) time. Any help would be appreciated.
First of all, I would like to know the bounds on the string S length (m) and the pattern T length (n).
There is one general idea, but the complexity of the solution based on it depends on the pattern length. The complexity varies from O(m) to O(m*n^2) for short patterns (length <= 100), and O(n) for long patterns.
Fundamental theorem of arithmetic states that every integer number can be uniquely represented as a product of prime numbers.
Idea: I guess your alphabet is English letters, so the alphabet size is 26. Let's replace the first letter with the first prime, the second letter with the second, and so on. I mean the following replacement: a->2, b->3, c->5, d->7, e->11 and so on.
Let's denote the product of the primes corresponding to the letters of some string as primeProduct(string). For example, primeProduct(z) will be 101 as 101 is the 26th prime number, primeProduct(abc) will be 2*3*5=30, and primeProduct(cba) will also be 5*3*2=30.
Why do we choose prime numbers? If we replaced a->2, b->3, c->4, we wouldn't be able to decipher, for example, 4: is it "c" or "aa"?
Solution for the short patterns case:
For the string S, we should calculate in linear time the prime product of every prefix. I mean we have to create an array A such that A[0] = primeProduct(S[0]), A[1] = primeProduct(S[0]S[1]), ..., A[N] = primeProduct(S). Sample implementation:
A[0] = getPrime(S[0]);
for (int i = 1; i < S.length; i++)
    A[i] = A[i-1] * getPrime(S[i]);
Searching for pattern T: calculate primeProduct(T). For all 'windows' in S which have the same length as the pattern, compare their primeProduct with primeProduct(pattern). If currentWindow is equal to the pattern, or currentWindow is a scrambled form (anagram) of the pattern, the primeProducts will be the same.
Important note! We have prepared array A for fast computation of primeProduct for any substring of S: primeProduct(S[i]S[i+1]...S[j]) = getPrime(S[i])*...*getPrime(S[j]) = A[j]/A[i-1].
Complexity: if the pattern length is <=9, even 'zzzzzzzzz' gives 101^9 <= MAX_LONG_INT; all calculations fit in the standard long type and the complexity is O(N)+O(M), where N is for calculating the primeProduct of the pattern and M is for iterating over all windows in S. If the length is <=100 you have to add the complexity of multiplying/dividing long numbers, which is why the complexity becomes O(m*n^2): the number 101^length has O(N) digits, and mul/div of such long numbers is O(N^2).
For long patterns with length >= 1000 it's better to store a hash map (prime -> exponent). The array of prefixes becomes an array of hash maps, and the A[j]/A[i-1] trick becomes a difference between the key sets of the A[j] and A[i-1] hash maps.
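As an illustration, here is a small Go sketch of the prime-product idea for the short-pattern case (my own code; it uses a sliding window rather than the prefix array, and assumes lowercase letters and a pattern of length at most 9 so the product fits in a uint64):

package main

import "fmt"

// The n-th lowercase letter maps to the n-th prime.
var primes = [26]uint64{2, 3, 5, 7, 11, 13, 17, 19, 23, 29, 31, 37, 41, 43,
    47, 53, 59, 61, 67, 71, 73, 79, 83, 89, 97, 101}

// findScrambled returns the start index of the first window of s whose prime
// product equals that of t (i.e. the first anagram of t), or -1 if none exists.
func findScrambled(s, t string) int {
    if len(t) == 0 || len(s) < len(t) {
        return -1
    }
    var target, window uint64 = 1, 1
    for i := 0; i < len(t); i++ {
        target *= primes[t[i]-'a']
        window *= primes[s[i]-'a']
    }
    for i := 0; ; i++ {
        if window == target {
            return i
        }
        if i+len(t) >= len(s) {
            return -1
        }
        // Slide the window: divide out s[i], multiply in s[i+len(t)].
        window = window / primes[s[i]-'a'] * primes[s[i+len(t)]-'a']
    }
}

func main() {
    fmt.Println(findScrambled("abcdef", "efd")) // 3 ("def" starts at index 3)
}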
Would this JavaScript example be linear time?
<script>
function matchT(t, s) {
    var tMap = [], answer = []
    // map the character count in t
    for (var i = 0; i < t.length; i++) {
        var chr = t.charCodeAt(i)
        if (tMap[chr]) tMap[chr]++
        else tMap[chr] = 1
    }
    // traverse string
    for (var i = 0; i < s.length; i++) {
        if (tMap[s.charCodeAt(i)]) {
            var start = i, j = i + 1, tmp = []
            tmp[s.charCodeAt(i)] = 1
            while (tMap[s.charCodeAt(j)]) {
                var chr = s.charCodeAt(j++)
                if (tmp[chr]) {
                    if (tMap[chr] > tmp[chr]) tmp[chr]++
                    else break
                }
                else tmp[chr] = 1
            }
            if (areEqual(tmp, tMap)) {
                answer.push(start)
                i = j - 1
            }
        }
    }
    return answer
}

// function to compare arrays
function areEqual(arr1, arr2) {
    if (arr1.length != arr2.length) return false
    for (var i in arr1)
        if (arr1[i] != arr2[i]) return false
    return true
}
</script>
Output:
console.log(matchT("edf","ghjfedabcddef"))
[3, 10]
If the alphabet is not too large (say, ASCII), then there is no need to use a hash to take care of strings.
Just use a big array which is of the same size as the alphabet, and the existence checking becomes O(1). Thus the whole algorithm becomes O(m+n).
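A minimal Go sketch of that fixed-alphabet idea (my own code; it keeps one count array of 256 entries and slides a window of length |T| over S, tracking how many byte values still have mismatched counts):

package main

import "fmt"

// findAnagram returns the start index in s of the first window whose byte
// counts match those of t, or -1 if there is none.
func findAnagram(s, t string) int {
    n, m := len(s), len(t)
    if m == 0 || n < m {
        return -1
    }
    // count[c] = (occurrences of c in the current window) - (occurrences in t)
    var count [256]int
    for i := 0; i < m; i++ {
        count[t[i]]--
        count[s[i]]++
    }
    diff := 0 // number of byte values whose counts do not cancel out
    for _, c := range count {
        if c != 0 {
            diff++
        }
    }
    for i := 0; ; i++ {
        if diff == 0 {
            return i
        }
        if i+m >= n {
            return -1
        }
        // Slide the window: drop s[i], add s[i+m], updating diff as we go.
        out, in := s[i], s[i+m]
        if count[out] == 0 {
            diff++
        }
        count[out]--
        if count[out] == 0 {
            diff--
        }
        if count[in] == 0 {
            diff++
        }
        count[in]++
        if count[in] == 0 {
            diff--
        }
    }
}

func main() {
    fmt.Println(findAnagram("abcdef", "efd")) // 3
}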
Let us consider for the given example,
String S: abcdef
String T: efd
Create a HashSet which consists of the characters present in the substring T. So, the set consists of {e, f, d}.
Generate a label for the substring T: 1e1f1d (the number of occurrences of each character + the character itself; this can be done using a technique similar to counting sort).
Now we have to generate labels over the input string, for windows of the substring's length.
Let us start from the first position, which has character a. Since it is not present in the set we do not create any substring and move to the next character b; similarly for character c, and then we stop at d.
Since d is present in the HashSet, start generating labels (of the substring length) each time such a character appears. We can do this in a different function to avoid clearing the count array (doing this reduces the complexity from O(m*n) to O(m+n)). If at any point the input string stops matching characters of the substring T, we can restart the label generation from the next position (since the positions up to where the break occurred cannot be part of the anagram).
So, by generating the labels we can solve the problem in linear O(m+n) time complexity.
m: length of the input string,
n: length of the sub string.
The code below is what I used for the pattern-searching question on GFG; it is accepted in all test cases and works in linear time.
// { Driver Code Starts
import java.util.*;

class Implement_strstr
{
    public static void main(String args[])
    {
        Scanner sc = new Scanner(System.in);
        int t = sc.nextInt();
        sc.nextLine();
        while (t > 0)
        {
            String line = sc.nextLine();
            String a = line.split(" ")[0];
            String b = line.split(" ")[1];
            GfG g = new GfG();
            System.out.println(g.strstr(a, b));
            t--;
        }
    }
} // } Driver Code Ends

class GfG
{
    // Function to locate the occurrence of the string x in the string s.
    int strstr(String a, String d)
    {
        if (a.equals("") && d.equals("")) return 0;
        if (a.length() == 1 && d.length() == 1 && a.equals(d)) return 0;
        if (d.length() == 1 && a.charAt(a.length() - 1) == d.charAt(0)) return a.length() - 1;
        int t = 0;
        int pl = -1;
        boolean b = false;
        int fl = -1;
        for (int i = 0; i < a.length(); i++)
        {
            if (pl != -1)
            {
                if (i == pl + 1 && a.charAt(i) == d.charAt(t))
                {
                    t++;
                    pl++;
                    if (t == d.length())
                    {
                        b = true;
                        break;
                    }
                }
                else
                {
                    fl = -1;
                    pl = -1;
                    t = 0;
                }
            }
            else
            {
                if (a.charAt(i) == d.charAt(t))
                {
                    fl = i;
                    pl = i;
                    t = 1;
                }
            }
        }
        return b ? fl : -1;
    }
}
Here is the link to the question https://practice.geeksforgeeks.org/problems/implement-strstr/1

How to tokenize a stripped string based on a list of patterns

Given a string S and a list L of patterns [L1, ..., Ln], how would you find the list of all tokens in S matching a pattern in L and so that the total number of matched letters in S is maximized?
A dummy example would be S = "thenuke", L = {"the", "then", "nuke"} and we would like to retrieve ["the", "nuke"] as if we start by matching "then", we do not get the solution maximizing the total number of letters in S being matched.
I have been looking at other SO questions and string-matching algorithms, but found nothing that efficiently solves the maximization part of the problem.
This must have been studied e.g. in bioinformatics but I'm not in the field so any help (including link to academic papers) deeply appreciated!
This can be solved in O(|S| + |L| + k) time, where k is the total number of matches of all strings from L in S. There are two major steps:
1. Run Aho-Corasick. This will give you all matches of any string from L in S. This runs in the same time as mentioned above.
2. Initialize an array, A, of integers of length |S| + 1 to all zeros. March through the array; at position i, set A[i] to A[i-1] if it is larger, then for every match, M, from L in S at position i, set A[i+|M|] to the max of A[i+|M|] and A[i] + |M|.
Here is some code, in Go, that does exactly this. It uses a package I wrote that has a convenient wrapper for calling Aho-Corasick.
package main

import (
    "fmt"

    "github.com/runningwild/stringz"
)

func main() {
    input := []byte("thenuke")
    patterns := [][]byte{[]byte("hen"), []byte("thenu"), []byte("uke")}

    // This runs Aho-Corasick on the input string and patterns, it returns a
    // map, matches, such that matches[i] is a list of indices into input where
    // patterns[i] matches the input string. The running time of this is
    // O(|input| + |patterns| + k) and uses O(|patterns| + k) auxillary storage,
    // where k is the total number of matches found.
    find := stringz.FindSet(patterns)
    matches := find.In(input)

    // We want to invert the map so that it maps from index to all strings that
    // match at that index.
    at_pos := make([][]int, len(input)+1)
    for index, hits := range matches {
        for _, hit := range hits {
            at_pos[hit] = append(at_pos[hit], index)
        }
    }

    // Now we do a single pass through the string, at every position we will
    // keep track of how many characters in the input string we can match up to
    // that point.
    max := make([]int, len(input)+1)
    for i := range max {
        // If this position isn't as good as the previous position, then we'll use
        // the previous position. It just means that there is a character that we
        // were unable to match anything to.
        if i > 0 && max[i-1] > max[i] {
            max[i] = max[i-1]
        }
        // Look through any string that matches at this position, if its length is
        // L, then L positions later in the string we can have matched L more
        // character if we use this string now. This might mean that we don't use
        // another string that earlier we thought we'd be matching right now, we'll
        // find out later which one was better.
        for _, hit := range at_pos[i] {
            L := len(patterns[hit])
            if i+L < len(max) && max[i+L] < max[i]+L {
                max[i+L] = max[i] + L
            }
        }
    }
    fmt.Printf("%v\n", max)
}
You can solve this in time O(|L||S|) by dynamic programming: build up iteratively a table giving the best match for each initial substring of S = s1s2...sn:
B(0), the best match for the zero-length initial substring of S, is the empty match.
Suppose we have already computed the best match, B(i), for each i < k, and we want now to compute B(k). Let p be a pattern in L, with length |p|, and let j = k − |p| + 1. If p = sj...sk then there is a match for s1s2...sk that consists of B(j − 1) followed by p. Let B(k) be the best such match found after considering all the patterns in L.
B(n) is the best match for the whole of S.
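A small Go sketch of this dynamic program (my own code and names; it returns only the maximized number of matched letters, but recording which pattern produced each B(k) would let you reconstruct the token list as well):

package main

import (
    "fmt"
    "strings"
)

// maxMatched returns the maximum total number of characters of s that can be
// covered by non-overlapping occurrences of the given patterns.
func maxMatched(s string, patterns []string) int {
    best := make([]int, len(s)+1) // best[0] = 0: empty prefix, empty match
    for k := 1; k <= len(s); k++ {
        best[k] = best[k-1] // worst case: leave s[k-1] unmatched
        for _, p := range patterns {
            j := k - len(p)
            if j >= 0 && strings.HasSuffix(s[:k], p) && best[j]+len(p) > best[k] {
                best[k] = best[j] + len(p)
            }
        }
    }
    return best[len(s)]
}

func main() {
    fmt.Println(maxMatched("thenuke", []string{"the", "then", "nuke"})) // 7
}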
