What is the maximum number of comparisons required to search for a string of length L in a text of length T using the first (brute-force) pattern matching technique?
The brute-force technique may need up to L*(T-L+1) character comparisons in the worst case, i.e. O(L*T). The Knuth–Morris–Pratt algorithm brings this down to O(L+T).
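To make the contrast concrete, here is a minimal sketch (in Python; the naive_search helper is just for illustration) that counts the character comparisons of the brute-force method. With a pattern like "aaab" against a text of all 'a's it performs roughly L*(T-L+1) comparisons, which is the worst case:

```python
def naive_search(pattern, text):
    """Brute-force matcher; returns the match positions and the number of
    character comparisons performed."""
    L, T = len(pattern), len(text)
    comparisons, matches = 0, []
    for i in range(T - L + 1):            # try every alignment of the pattern
        j = 0
        while j < L:
            comparisons += 1
            if text[i + j] != pattern[j]:
                break
            j += 1
        if j == L:
            matches.append(i)
    return matches, comparisons

# Worst case: almost the whole pattern is compared at every alignment.
print(naive_search("aaab", "a" * 20))     # ([], 68) = L * (T - L + 1)
```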
I could compute the longest common substring of two strings at a time. But consider the 3 strings below:
ABZDCC
ABZDEC
EFGHIC
Here the LCS of the first two strings is "ABZD". But when that is compared against the third string, the LCS length drops to zero, even though the LCS of all three strings is clearly "C". How can I find the longest common substring of n strings using a suffix array?
If you have a suffix array that contains all the suffixes of every input string, then for any string X that is a (contiguous) substring of all the input strings, there is a contiguous subarray in which every suffix starts with X and which includes a suffix from every input string.
Furthermore, if X is a longest common substring, that subarray can be chosen minimal, so that its first and last suffixes are the only suffixes from their respective input strings inside it.
To find the longest common substring, then:
For each position in the suffix array, find the smallest subarray starting at that position that includes a suffix from every string. You can use the incrementing-two-pointers technique to do this in amortized constant time per entry.
For each such subarray, find the longest common prefix of the first and last entries, which will be a common prefix of every item in the subarray.
Remember the longest such common prefix you find; it is the longest substring that occurs in every input.
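A minimal sketch of this approach, assuming the inputs contain no control characters (they are used here as per-string sentinels) and using a naive suffix array construction just for illustration:

```python
def longest_common_substring(strings):
    """Sketch: LCS of n strings via a suffix array over their concatenation.
    Assumes the inputs contain no control characters (used as sentinels) and
    builds the suffix array by naive sorting, which is fine for a sketch."""
    n = len(strings)
    pieces, owner = [], []
    for idx, s in enumerate(strings):
        pieces.append(s + chr(idx + 1))      # unique sentinel per string
        owner.extend([idx] * (len(s) + 1))
    text = "".join(pieces)

    sa = sorted(range(len(text)), key=lambda i: text[i:])

    def common_prefix(i, j):
        k = 0
        while i + k < len(text) and j + k < len(text) and text[i + k] == text[j + k]:
            k += 1
        return k

    best = ""
    count = [0] * n          # how many suffixes of each string are in the window
    covered = 0              # how many distinct strings the window contains
    right = 0
    for left in range(len(sa)):
        # Grow the window until it holds a suffix of every input string.
        while covered < n and right < len(sa):
            o = owner[sa[right]]
            if count[o] == 0:
                covered += 1
            count[o] += 1
            right += 1
        if covered < n:
            break            # no window starting here can cover all strings
        # The LCP of the first and last suffix is shared by the whole window.
        lcp = common_prefix(sa[left], sa[right - 1])
        if lcp > len(best):
            best = text[sa[left]:sa[left] + lcp]
        # Drop the leftmost suffix before trying the next starting position.
        o = owner[sa[left]]
        count[o] -= 1
        if count[o] == 0:
            covered -= 1
    return best

print(longest_common_substring(["ABZDCC", "ABZDEC", "EFGHIC"]))  # "C"
```

Because the suffixes in a window are sorted, the common prefix of the first and last entries equals the minimum LCP over the whole window, which is why checking just those two entries suffices.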
When working with more than two strings, find all common substrings between the two shortest input strings and then start eliminating common substrings that aren't included in the other input strings.
When done, return the longest remaining common substring.
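A short sketch of this elimination idea (quadratic in the length of the shortest string, so suitable only for small inputs):

```python
def lcs_by_elimination(strings):
    """Sketch: enumerate the substrings shared by the two shortest inputs,
    drop those missing from the remaining inputs, keep the longest."""
    strings = sorted(strings, key=len)
    a, b, rest = strings[0], strings[1], strings[2:]
    shared = {a[i:j]
              for i in range(len(a))
              for j in range(i + 1, len(a) + 1)
              if a[i:j] in b}
    survivors = [c for c in shared if all(c in s for s in rest)]
    return max(survivors, key=len, default="")

print(lcs_by_elimination(["ABZDCC", "ABZDEC", "EFGHIC"]))  # "C"
```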
I encountered this question while studying for an algorithms test:
Given a set of k words (strings) with a total character count of n (i.e. the sum of all word lengths is n), preprocess the words in O(n) time so that whenever two words are compared, the answer (whether they are identical or not) can be returned in O(1) time.
It's an interesting question but I could not find any direction to deal with it...
Construct a trie of all of the words, and for each word store a reference to the trie node reached by its last character. This is an O(n) operation.
Given two words, they are identical if and only if they end at the same trie node.
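A minimal sketch of this idea, using nested dicts as trie nodes and recording, for each word, the node at which it ends (the function names are just for illustration):

```python
def preprocess(words):
    """Sketch: build a trie of all words in O(n) total time and record,
    for each word, the node (a dict) at which it ends."""
    root = {}
    end_nodes = []
    for w in words:
        node = root
        for ch in w:
            node = node.setdefault(ch, {})
        end_nodes.append(node)          # identical words end at the same node
    return end_nodes

def identical(end_nodes, i, j):
    """O(1) comparison: words i and j are equal iff they end at the same node."""
    return end_nodes[i] is end_nodes[j]

words = ["cat", "car", "cat", "dog"]
ends = preprocess(words)
print(identical(ends, 0, 2), identical(ends, 0, 1))   # True False
```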
I have the string "hrhrhrhrhr".
I want to find the smallest substring of it such that the whole string can be built by repeatedly appending that substring to itself.
In this example I can build the string "hrhrhrhrhr" by appending "hr" to itself four times.
How do I find this kind of substring?
For example,
"abcabcabc" -> "abc" is the answer.
"ttttttt" -> "t" is the answer.
"abcd" -> "abcd" is the answer.
Which algorithm or specific method should I use?
I would suggest you take a look at string matching/search algorithms. In particular, if you use the KMP (Knuth-Morris-Pratt) algorithm and build its failure (prefix) table for the string itself, that table yields the pattern: the last entry is the length of the longest border, and n minus that value is the length of the repeating unit (provided it divides n, i.e. the string really is composed of repetitions of one substring).
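Concretely, if fail is the KMP failure (prefix) table of the string, the candidate period is n - fail[n-1]; a sketch:

```python
def smallest_repeating_unit(s):
    """Sketch: smallest substring whose repetition builds s, via the KMP
    failure (prefix) table."""
    n = len(s)
    if n == 0:
        return s
    fail = [0] * n              # fail[i]: longest proper prefix of s[:i+1]
    k = 0                       # that is also a suffix of s[:i+1]
    for i in range(1, n):
        while k and s[i] != s[k]:
            k = fail[k - 1]
        if s[i] == s[k]:
            k += 1
        fail[i] = k
    period = n - fail[-1]       # candidate length of the repeating unit
    return s[:period] if n % period == 0 else s

print(smallest_repeating_unit("hrhrhrhrhr"))  # "hr"
print(smallest_repeating_unit("abcabcabc"))   # "abc"
print(smallest_repeating_unit("ttttttt"))     # "t"
print(smallest_repeating_unit("abcd"))        # "abcd"
```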
If I want to find the longest common substring of 2 strings, which approach is more efficient in terms of time/space complexity: suffix arrays or DP?
DP incurs O(m*n) space and O(m*n) time; what will be the time complexity of the suffix array approach?
1) Calculate the suffixes: O(m) + O(n)
2) Sort them: O((m+n) log(m+n))
3) Find the longest common prefix for the m+n-1 adjacent pairs of suffixes? [I'm not sure how to count the comparisons]
Suffix arrays let us do many more things with the substrings (like searching for a substring, etc.), but since the rest of those operations are not needed here, would DP be considered the easier/cleaner approach? Which one should be used when comparing 2 strings?
Also, what if we have more than 2 strings?
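For reference, a minimal sketch of the DP formulation mentioned in the question (O(m*n) time, kept to two rows of the table so only O(n) extra space):

```python
def lcs_dp(a, b):
    """Sketch: longest common substring of two strings by DP, where
    cur[j] is the length of the common suffix of a[:i] and b[:j]."""
    best_len, best_end = 0, 0
    prev = [0] * (len(b) + 1)
    for i in range(1, len(a) + 1):
        cur = [0] * (len(b) + 1)
        for j in range(1, len(b) + 1):
            if a[i - 1] == b[j - 1]:
                cur[j] = prev[j - 1] + 1
                if cur[j] > best_len:
                    best_len, best_end = cur[j], i
        prev = cur
    return a[best_end - best_len:best_end]

print(lcs_dp("ABZDCC", "ABZDEC"))  # "ABZD"
```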
A suffix array would be better. The LCS (longest common substring of n strings) problem can be solved as follows:
Concatenate S1, S2, ..., Sn as follows:
S = S1 $1 S2 $2 ... Sn $n, where the $i are special symbols (sentinels) that are pairwise distinct and lexicographically smaller than every symbol of the original alphabet.
Compute the suffix array. A suffix array is usually built in O(n log n), but the DC3 algorithm computes it in O(n), where n is the total length of the N strings; descriptions of it are easy to find online.
Compute the LCP of all adjacent suffixes, then take the maximum LCP over a window of the suffix array that contains suffixes of every input string (for two strings, simply the maximum LCP between adjacent suffixes that come from different strings).
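A sketch of the two-string case under this scheme, with control characters standing in for $1 and $2, a naive suffix array build for brevity, and Kasai's algorithm for the LCP step; for n strings you would add the sliding-window step described in the earlier answer:

```python
def lcs_two(s1, s2):
    """Sketch: concatenate with sentinels, build the suffix array (naively
    here; DC3/SA-IS give O(n)), compute LCPs of adjacent suffixes with
    Kasai's algorithm, and take the maximum LCP over adjacent suffixes
    coming from different input strings."""
    text = s1 + "\x01" + s2 + "\x02"      # control chars stand in for $1, $2
    n = len(text)
    sa = sorted(range(n), key=lambda i: text[i:])
    rank = [0] * n
    for r, i in enumerate(sa):
        rank[i] = r

    lcp = [0] * n                          # lcp[r] = LCP of suffixes sa[r-1], sa[r]
    h = 0
    for i in range(n):                     # Kasai's algorithm
        if rank[i] > 0:
            j = sa[rank[i] - 1]
            while i + h < n and j + h < n and text[i + h] == text[j + h]:
                h += 1
            lcp[rank[i]] = h
            if h:
                h -= 1
        else:
            h = 0

    def origin(i):                         # which input string a suffix starts in
        return 0 if i <= len(s1) else 1

    best_len, best_pos = 0, 0
    for r in range(1, n):
        if origin(sa[r]) != origin(sa[r - 1]) and lcp[r] > best_len:
            best_len, best_pos = lcp[r], sa[r]
    return text[best_pos:best_pos + best_len]

print(lcs_two("ABZDCC", "ABZDEC"))  # "ABZD"
```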
Is it still possible to achieve O(n) time when searching for multiple occurrences with the Knuth–Morris–Pratt algorithm?
Suppose we have a string S[0,...,N]. Recall that the ith entry of the prefix array stores the length of the longest proper prefix of S[0,...,i] that is also a suffix of S[0,...,i].
We can calculate the prefix array P for pattern$subject (assuming that $ does not occur in subject). It then remains to find all indices i such that P[i] == length(pattern), which can be done in linear time.
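A minimal sketch of this construction (the separator is assumed not to occur in either string):

```python
def find_all(pattern, subject, sep="\x00"):
    """Sketch: prefix function of pattern + sep + subject; every position
    where the value reaches len(pattern) marks an occurrence."""
    s = pattern + sep + subject
    p = [0] * len(s)
    k = 0
    for i in range(1, len(s)):
        while k and s[i] != s[k]:
            k = p[k - 1]
        if s[i] == s[k]:
            k += 1
        p[i] = k
    m = len(pattern)
    # p[i] == m: a match ends at s[i]; its start index within subject is i - 2*m.
    return [i - 2 * m for i in range(len(s)) if p[i] == m]

print(find_all("ab", "abxabab"))  # [0, 3, 5] -- overlapping matches included
```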