How to detect palindrome cycle length in a string?

Suppose a string is like this "abaabaabaabaaba", the palindrome cycle here is 3, because you can find the string aba at every 3rd position and you can augment the palindrome by concatenating any number of "aba"s to the string.
I think it's possible to detect this efficiently using Manacher's Algorithm but how?

You can find it easily by searching for the string S in S+S. The first index you find (other than 0) is the cycle length you want; it may be the length of the entire string. In Python 3 it would be something like:
In [1]: s = "abaabaabaabaaba"
In [2]: print((s + s).index(s, 1))
3
The 1 makes index() start searching at index 1, which skips index 0, where the match would be trivial.
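As a minimal Python 3 sketch of the same idea wrapped in a function (the name `cycle_length` and the handling of the empty string are assumptions made here, not part of the answer):

```python
def cycle_length(s: str) -> int:
    """Return the smallest shift k >= 1 at which s reappears inside s + s."""
    if not s:
        return 0  # assumption: treat the empty string as having no cycle
    # Starting the search at index 1 skips the trivial match at index 0.
    # A match always exists at index len(s), so find() never returns -1 here.
    return (s + s).find(s, 1)


print(cycle_length("abaabaabaabaaba"))  # 3
print(cycle_length("abc"))              # 3, the entire string
```

When the string has no shorter cycle, the result equals `len(s)`, which is the "may be the entire string" case mentioned above.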

Related

Finding position of first letter in substring in list of strings (Python 3)

I have a list of strings, and I'm trying to find the position of the first letter of the substring I am searching for in the list of strings. I'm using the find() method to do this. However, when I print the position of the first letter, Python prints the correct position but then prints a -1 after it, as if it couldn't find the substring, but only after it had already found it. I want to know how to return the position of the first letter of the substring without getting a -1 after the correct value.
Here is my code:
mylist = ["blasdactiverehu", "sdfsfgiuyremdn"]
word = "active"
if any(word in x for x in mylist) == True:
    for x in mylist:
        position = x.find(word)
        print(position)
The output is:
5
-1
I expected the output to just be:
5
I think it may be related to the fact that the loop searches for the substring in every string in the list. After it has found the position, it keeps searching the remaining strings and prints -1 for them, as there is only one occurrence of the substring "active". However, I'm not sure how to stop searching after successfully finding one substring. Any help is appreciated, thank you.
Indeed, your code will not work as you want it to: once any of the strings contains the substring, the loop calls find() on each and every one of them, and find() returns -1 for the strings that do not contain it.
A good way to avoid that is to use a generator expression together with next():
default_val = -1
position = next((x.find(word) for x in mylist if word in x), default_val)
print(position)
This gives you the position of the substring word in the first string x in mylist that satisfies the condition word in x, or default_val if no string contains it.
By the way, no need to check for == True when using any(), it already returns True/False, so you can simply do if any(): ...
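If it is also useful to know which string matched, the same next() pattern can be combined with enumerate(). This is only a sketch; the helper name `first_match` and its return shape are assumptions chosen for illustration:

```python
from typing import Optional, Sequence, Tuple


def first_match(strings: Sequence[str], word: str) -> Optional[Tuple[int, int]]:
    """Return (list index, character position) for the first string containing word."""
    return next(
        ((i, s.find(word)) for i, s in enumerate(strings) if word in s),
        None,  # default when no string contains the word
    )


mylist = ["blasdactiverehu", "sdfsfgiuyremdn"]
print(first_match(mylist, "active"))   # (0, 5)
print(first_match(mylist, "missing"))  # None
```

Because next() stops at the first item the generator yields, the remaining strings are never searched.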

find number of repeating substrings in a string

I am looking for an algorithm that will find the number of repeating substrings in a single string.
For this, I was looking for some dynamic programming algorithms but didn't find any that would help me. I just want some tutorial on how to do this.
Let's say I have a string ABCDABCDABCD. The expected output for this would be 3, because there is ABCD 3 times.
For input AAAA, output would be 4, since A is repeated 4 times.
For input ASDF, output would be 1, since every individual character is repeated 1 time only.
I hope that someone can point me in the right direction. Thank you.
I am taking the following assumptions:
The repeating substrings must be consecutive. That is, in case of ABCDABC, ABC would not count as a repeating substring, but it would in case of ABCABC.
The repeating substrings must be non-overlapping. That is, in case of ABABA, ABA would not count as a repeating substring, because its two occurrences overlap.
In case of multiple possible answers, we want the one with the maximum value. That is, in the case of AAAA, the answer should be 4 (a is the substring) rather than 2 (aa is the substring).
Under these assumptions, the algorithm is as follows:
Let the input string be denoted as inputString.
Calculate the KMP failure function array for the input string. Let this array be denoted as failure[]. This operation is of linear time complexity with respect to the length of the string. By definition, failure[i] denotes the length of the longest proper prefix of the substring inputString[0....i] that is also a proper suffix of the same substring.
Let len = inputString.length - failure.lastIndexValue. At this point, we know that if there is any repeating string at all, then it has to be of this length len. But we'll need to check for that. First, check whether len perfectly divides inputString.length (that is, inputString.length % len == 0). If yes, then check whether every consecutive (non-overlapping) substring of len characters is the same; this operation is again of linear time complexity with respect to the length of the input string.
If it turns out that every consecutive non-overlapping substring is the same, then the answer is inputString.length / len. Otherwise, the answer is simply 1, as there is no such repeating substring present and the whole string occurs exactly once (which matches the expected output for ASDF).
The overall time complexity would be O(n), where n is the number of characters in the input string.
A sample code for calculating the KMP failure array is given here.
For example,
Let the input string be abcaabcaabca.
Its KMP failure array would be - [0, 0, 0, 1, 1, 2, 3, 4, 5, 6, 7, 8].
So, our len = (12 - 8) = 4.
And every consecutive non-overlapping substring of length 4 is the same (abca).
Therefore the answer is 12/4 = 3. That is, abca is repeated 3 times.
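The steps above can be sketched in Python roughly as follows. The function names are assumptions made for illustration, and the sketch returns 1 when no shorter unit tiles the whole string, to agree with the expected output for ASDF in the question:

```python
from typing import List


def failure_function(s: str) -> List[int]:
    """KMP failure array: fail[i] is the length of the longest proper prefix
    of s[0..i] that is also a proper suffix of s[0..i]."""
    fail = [0] * len(s)
    k = 0  # length of the currently matched prefix
    for i in range(1, len(s)):
        while k > 0 and s[i] != s[k]:
            k = fail[k - 1]  # fall back to the next shorter border
        if s[i] == s[k]:
            k += 1
        fail[i] = k
    return fail


def repeat_count(s: str) -> int:
    """Number of times the shortest repeating unit tiles s (1 if none does)."""
    n = len(s)
    if n == 0:
        return 0  # assumption: an empty string has no repetitions
    unit = n - failure_function(s)[-1]
    # Check divisibility, then confirm every block of `unit` characters is equal.
    if n % unit == 0 and s == s[:unit] * (n // unit):
        return n // unit
    return 1


print(failure_function("abcaabcaabca"))  # [0, 0, 0, 1, 1, 2, 3, 4, 5, 6, 7, 8]
print(repeat_count("abcaabcaabca"))      # 3
print(repeat_count("AAAA"))              # 4
print(repeat_count("ASDF"))              # 1
```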
The solution for this with C# is:
using System;
using System.Text;
using System.Text.RegularExpressions;

class Program
{
    public static string CountOfRepeatedSubstring(string str)
    {
        if (str.Length < 2)
        {
            return "-1";
        }
        StringBuilder substr = new StringBuilder();
        // Length of the substring cannot be greater than half of the actual string
        for (int i = 0; i < str.Length / 2; i++)
        {
            // We will iterate through half of the actual string and
            // build a candidate substring by appending the current character to the previous ones
            substr.Append(str[i]);
            string clearedOfNewSubstrings = str.Replace(substr.ToString(), "");
            // We will remove the newly created substring from the actual string and
            // check if the length of the actual string, cleared of the newly created substring, is 0.
            // If 0 it tells us that it is only made of its substring
            if (clearedOfNewSubstrings.Length == 0)
            {
                // Next we will return the count of the newly created substring in the actual string.
                // Regex.Escape makes any regex metacharacters in the input match literally.
                var countOccurences = Regex.Matches(str, Regex.Escape(substr.ToString())).Count;
                return countOccurences.ToString();
            }
        }
        return "-1";
    }

    static void Main(string[] args)
    {
        // Input: "abcdaabcdaabcda"
        // Output: 3
        // Input: "barrybarrybarry"
        // Output: 3
        var s = "asdf"; // Output will be -1
        Console.WriteLine(CountOfRepeatedSubstring(s));
    }
}
How do you want to specify the "repeating string"? Is it simply the first group of characters up until either a) the first character is found again, b) the pattern begins to repeat, or c) some other criteria?
So, if your string is "ABBAABBA", is that a 2 because "ABBA" repeats twice or is it 1 because you have "ABB" followed by "AAB"? What about "ABCDABCE" -- does "ABC" count (despite the "D" in between repetitions?) In "ABCDABCABCDABC", is the repeating string "ABCD" (1) or "ABCDABC" (2)?
What about "AAABBAAABB" -- is that 3 ("AAA") or 2 ("AAABB")?
If the end of the repeating string is another instance of the first letter, it's pretty simple:
Work your way through the string character by character, putting each character into another variable as you go, until the next character matches the first one. Then, given the length of the substring in your second variable, check the next bit of your string to see if it matches. Continue until it doesn't match or you hit the end of the string.
If you just want to find any length pattern that repeats regardless of whether the first character is repeated within the pattern, it gets more complicated (but, fortunately, it's the sort of thing computers are good at).
You'll need to go character by character building a pattern in another variable as above, but you'll also have to watch for the first character to reappear and start building a second substring as you go, to see if it matches the first. This should probably go in an array as you might encounter a third (or more) instance of the first character which would trigger the need to track yet another possible match.
It's not difficult but there is a lot to keep track of and it's a rather annoying problem. Is there a particular reason you're doing this?

Trying to understand a Python line code

I am new to Python, and when I searched for a way to get a string length without using "len()", I found this answer:
sum([1 for _ in "your string goes here"])
Can someone help me understand this line? What's the '1' doing there, for example?
This is basically equivalent to this:
lst = []
for dontCareAboutTheName in "your string goes here":
    lst.append(1)
print(sum(lst))
The list comprehension basically collects the number 1 for each character it finds while looping through the string. So the list will contain exactly as many elements as the length of the string. And since all those list elements are 1, when calculating the sum of all those elements, you end up with the length of the string.
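As a small aside, the intermediate list is not strictly needed. A sketch of the same count using a generator expression, which feeds the 1s to sum() one at a time (the variable names are chosen here for illustration):

```python
text = "your string goes here"

# List comprehension: builds the full list [1, 1, ..., 1] and then sums it.
length_via_list = sum([1 for _ in text])

# Generator expression: yields one 1 per character without storing a list.
length_via_generator = sum(1 for _ in text)

print(length_via_list == length_via_generator == len(text))  # True
```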

How to write a method that takes a string and returns the longest valid substring

I have been practicing interview questions with a friend, and he threw me this one he made up:
Given a method that tells you if a string is valid, write a method that takes a string, and returns the longest valid substring (without reordering the characters).
My first brute force solution would be to find all of the subsets of the input string, and then plug them through (longest to shortest) the given method till a valid string is found and return that.
But that obviously isn't good enough.
So I was trying to think of it this way:
Check the input string
Check all of the subsets of the inputString, with length == inputString length - 1
And so on and so forth until all of the subsets with length 1 are checked, and then return false
The problem in my head, then, is that in order for this to be optimal, we want to utilize the fact that we only care for the longest valid string. If I were to check each subset recursively, then I would be doing a depth-first traversal of the subsets, when I'm really looking for a breadth-first, so I can find the longest quicker.
Once I realized that, I got stuck. I couldn't even come up with pseudo code to tackle this problem.
Is a "breadth-first" search of the subsets of a string even possible?
The closest solution I could find was on the math stackexchange, somebody posted a promising looking answer-- https://math.stackexchange.com/questions/89419/algorithm-wanted-enumerate-all-subsets-of-a-set-in-order-of-increasing-sums
but it unfortunately is pretty hard for me to comprehend.
Would the best solution just be a depth-first recursive iteration through all of the subsets and return the longest valid string from there?
string in
for int sub_len in len(in) , 1    // the length of the substring must be smaller than/equal to
                                  // the length of the input and at least 1
    for int sub_offset in 0 , len(in) - sub_len
        // the offset of the substring must be in [0 , n]
        // where n is the number of characters that are not in the
        // substring
        string sub = substring(in , sub_offset , sub_len)
        if isValid(sub)
            return sub
This generates all possible substrings for a given input (in) and returns the first/longest valid substring.
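A runnable Python sketch of the pseudocode above might look like this. The validity check is passed in as a parameter; `is_palindrome` is only a stand-in predicate assumed for the demonstration, since the question does not say what "valid" means:

```python
from typing import Callable, Optional


def longest_valid_substring(text: str, is_valid: Callable[[str], bool]) -> Optional[str]:
    """Try substrings from longest to shortest and return the first valid one."""
    n = len(text)
    for sub_len in range(n, 0, -1):               # lengths n, n-1, ..., 1
        for offset in range(0, n - sub_len + 1):  # every start position for that length
            candidate = text[offset:offset + sub_len]
            if is_valid(candidate):
                return candidate
    return None  # no valid substring at all


def is_palindrome(s: str) -> bool:
    # Hypothetical validity rule, used only to exercise the search.
    return s == s[::-1]


print(longest_valid_substring("xabcbay", is_palindrome))  # abcba
```

Iterating by length first gives the "breadth-first" order the question asks about: every substring of a given length is checked before any shorter one, at a cost of O(n^2) calls to the validity check in the worst case.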

Given a word, convert it into a palindrome with minimum addition of letters to it [closed]

Closed 10 years ago.
Here is a pretty interesting interview question:
Given a word, append the fewest number of letters to it to convert it into a palindrome.
For example, if "hello" is the string given, the result should be "hellolleh." If "coco" is given, the result should be "cococ."
One approach I can think of is to append the reverse of the string to the end of the original string, then try to eliminate the extra characters from the end. However, I can't figure out how to do this efficiently. Does anyone have any ideas?
Okay! Here's my second attempt.
The idea is that we want to find how many of the characters at the end of the string can be reused when appending the extra characters to complete the palindrome. In order to do this, we will use a modification of the KMP string matching algorithm. Using KMP, we search the original string for its reverse. Once we get to the very end of the string, we will have as much a match as possible between the reverse of the string and the original string that occurs at the end of the string. For example:
HELLO
    O

1010
 010

3202
 202

1001
1001
At this point, KMP normally would say "no match" unless the original string was a palindrome. However, since we currently know how much of the reverse of the string was matched, we can instead just figure out how many characters are still missing and then tack them on to the end of the string. In the first case, we're missing LLEH. In the second case, we're missing 1. In the third, we're missing 3. In the final case, we're not missing anything, since the initial string is a palindrome.
The runtime of this algorithm is the runtime of a standard KMP search plus the time required to reverse the string: O(n) + O(n) = O(n).
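A minimal Python sketch of this idea, assuming a standard KMP failure table built for the reversed string (the function name is an assumption made here): the original string is scanned as the text, and whatever prefix of the reverse is still matched at the end is the part that can be reused.

```python
def shortest_palindrome_by_appending(s: str) -> str:
    """Append the fewest characters to s so that the result is a palindrome."""
    pattern = s[::-1]  # we search the original string for its reverse
    n = len(s)

    # Standard KMP failure table for the pattern.
    fail = [0] * n
    k = 0
    for i in range(1, n):
        while k > 0 and pattern[i] != pattern[k]:
            k = fail[k - 1]
        if pattern[i] == pattern[k]:
            k += 1
        fail[i] = k

    # Scan the text; `matched` is how much of the pattern currently matches
    # a suffix of the text read so far. It can only reach n on the last character.
    matched = 0
    for ch in s:
        while matched > 0 and ch != pattern[matched]:
            matched = fail[matched - 1]
        if ch == pattern[matched]:
            matched += 1

    # The unmatched tail of the reverse is exactly what is still missing.
    return s + pattern[matched:]


print(shortest_palindrome_by_appending("hello"))  # hellolleh
print(shortest_palindrome_by_appending("coco"))   # cococ
print(shortest_palindrome_by_appending("1001"))   # 1001
```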
So now to argue correctness. This is going to require some effort. Consider the optimal answer:
| original string | | extra characters |
Let's suppose that we are reading this backward from the end, which means that we'll read at least the reverse of the original string. Part of this reversed string extends backwards into the body of the original string itself. In fact, to minimize the number of characters added, this has to be the largest possible number of characters that ends back into the string itself. We can see this here:
| original string | | extra characters |
       | overlap |
Now, what happens in our KMP step? Well, when looking for the reverse of the string inside itself, KMP will keep as long of a match as possible at all times as it works across the string. This means that when the KMP hits the end of the string, the matched portion it maintains will be the longest possible match, since KMP only moves the starting point of the candidate match forward on a failure. Consequently, we have this longest possible overlap, so we'll get the shortest possible number of characters required at the end.
I'm not 100% sure that this works, but it seems like this works in every case I can throw at it. The correctness proof seems reasonable, but it's a bit hand-wavy because the formal KMP-based proof would probably be a bit tricky.
Hope this helps!
To answer I would take this naive approach:
When do we need 0 characters? When the string is a palindrome.
When do we need 1 character? When the string, except for its first character, is a palindrome.
When do we need 2 characters? When the string, except for its first 2 characters, is a palindrome.
And so on.
So an algorithm could be
for index from 0 to length - 1
    if string.substring(index) is palindrome    // the string without its first index characters
        return string + reverse(string.left(index))
    end
next
edit
I'm not much of a Python guy, but a simple-minded implementation of the above pseudocode could be
>>> def rev(s): return s[::-1]
...
>>> def pal(s): return s==rev(s)
...
>>> def mpal(s):
...     for i in range(0,len(s)):
...         if pal(s[i:]): return s+rev(s[:i])
...
>>> mpal("cdefedcba")
'cdefedcbabcdefedc'
>>> pal(mpal("cdefedcba"))
True
Simple linear time solution.
Let's call our string S.
Let f(X, P) be the length of the longest common prefix of X and P. Compute f(S[0], rev(S)), f(S[1], rev(S)), ... where S[k] is the suffix of S starting at position k. You want to choose the minimum k such that k + f(S[k], rev(S)) = len(S). That means that you just have to append k characters at the end, namely the reverse of the first k characters of S. If k is 0, the string is already a palindrome. In the worst case, k = len(S) - 1, you need to append the reverse of everything except the last character.
We need to compute f(S[i], P) for all S[i] quickly. This is the tricky part. Create a suffix tree of S. Traverse the tree and update every node with the length of the longest common prefix with P. The values at the leaves correspond to f(S[i], P).
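One way to get all the f values in linear time without building a suffix tree is the Z-function. This is a different technique from the one described above, swapped in here only because it is short to write down. In the sketch below (function names are assumptions), the Z-array of rev(S) + S gives f(S[k], rev(S)) at index len(S) + k:

```python
from typing import List


def z_array(t: str) -> List[int]:
    """z[i] is the length of the longest common prefix of t and t[i:]."""
    n = len(t)
    z = [0] * n
    if n:
        z[0] = n
    left = right = 0  # [left, right) is the rightmost match window found so far
    for i in range(1, n):
        if i < right:
            z[i] = min(right - i, z[i - left])  # reuse earlier work inside the window
        while i + z[i] < n and t[z[i]] == t[i + z[i]]:
            z[i] += 1
        if i + z[i] > right:
            left, right = i, i + z[i]
    return z


def min_append_palindrome(s: str) -> str:
    n = len(s)
    z = z_array(s[::-1] + s)
    for k in range(n):
        # z[n + k] is f(S[k], rev(S)); it cannot exceed n - k, the suffix length.
        if k + z[n + k] == n:
            return s + s[:k][::-1]  # append the reverse of the first k characters
    return s  # only reached for the empty string


print(min_append_palindrome("hello"))  # hellolleh
print(min_append_palindrome("coco"))   # cococ
```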
First make a function to test string for palindrome-ness, keeping in mind that "a" and "aa" are palindromes. They are palindromes, right???
If the input is a palindrome, return it (0 chars needed to be added)
Loop from x[length] down to x[1], checking whether the substring x[i]..x[length] is a palindrome, to find the longest palindrome that ends the string.
Take the substring of the input that comes before that longest palindrome; reversing it and adding it to the end gives the shortest palindrome that can be made by appending.
coco => c+oco => c+oco+c
mmmeep => mmmee+p => mmmee+p+eemmm
