Given a word, convert it into a palindrome with minimum addition of letters to it [closed]

Closed 10 years ago.
Here is a pretty interesting interview question:
Given a word, append the fewest number of letters to it to convert it into a palindrome.
For example, if "hello" is the string given, the result should be "hellolleh." If "coco" is given, the result should be "cococ."
One approach I can think of is to append the reverse of the string to the end of the original string, then try to eliminate the extra characters from the end. However, I can't figure out how to do this efficiently. Does anyone have any ideas?

Okay! Here's my second attempt.
The idea is that we want to find how many of the characters at the end of the string can be reused when appending the extra characters to complete the palindrome. In order to do this, we will use a modification of the KMP string matching algorithm. Using KMP, we search the original string for its reverse. Once we get to the very end of the string, we will have the longest possible match between a prefix of the reversed string and the end of the original string. For example:
HELLO
    O

1010
 010

3202
 202

1001
1001
At this point, KMP normally would say "no match" unless the original string was a palindrome. However, since we currently know how much of the reverse of the string was matched, we can instead just figure out how many characters are still missing and then tack them on to the end of the string. In the first case, we're missing LLEH. In the second case, we're missing 1. In the third, we're missing 3. In the final case, we're not missing anything, since the initial string is a palindrome.
The runtime of this algorithm is the runtime of a standard KMP search plus the time required to reverse the string: O(n) + O(n) = O(n).
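A minimal Python sketch of this KMP idea, assuming plain Python strings; `prefix_function` and `shortest_palindrome_by_appending` are hypothetical helper names, not code from the answer. It runs the KMP matcher for `rev(s)` over `s` and treats the final match length as the length of the longest palindromic suffix:

```python
def prefix_function(p):
    """pi[i] = length of the longest proper prefix of p[:i+1] that is also a suffix of it."""
    pi = [0] * len(p)
    k = 0
    for i in range(1, len(p)):
        while k > 0 and p[i] != p[k]:
            k = pi[k - 1]
        if p[i] == p[k]:
            k += 1
        pi[i] = k
    return pi


def shortest_palindrome_by_appending(s):
    """Append the fewest characters to the end of s so that the result is a palindrome."""
    if not s:
        return s
    pattern = s[::-1]              # KMP searches the original string for its reverse
    pi = prefix_function(pattern)
    q = 0                          # length of the current partial match of pattern
    for ch in s:
        while q > 0 and ch != pattern[q]:
            q = pi[q - 1]          # standard KMP fallback on a mismatch
        if ch == pattern[q]:
            q += 1
    # Pattern and text have the same length, so a full match (q == len(s)) can only
    # happen on the last character and never needs a reset. The last q characters of s
    # equal the first q characters of rev(s), i.e. they form a palindromic suffix.
    return s + s[:len(s) - q][::-1]
```

For example, `shortest_palindrome_by_appending("coco")` returns `'cococ'`, and `"1010"` becomes `'10101'`, matching the cases above.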
So now to argue correctness. This is going to require some effort. Consider the optimal answer:
| original string | | extra characters |
Let's suppose that we are reading this backward from the end, which means that we'll read at least the reverse of the original string. Part of this reversed string extends backwards into the body of the original string itself. In fact, to minimize the number of characters added, this overlap has to be as long as possible. We can see this here:
| original string | | extra characters |
        | overlap |
Now, what happens in our KMP step? Well, when looking for the reverse of the string inside itself, KMP will keep as long a match as possible at all times as it works across the string. This means that when KMP hits the end of the string, the matched portion it maintains will be the longest possible match, since KMP only moves the starting point of the candidate match forward on a failure. Consequently, we have this longest possible overlap, so we'll get the fewest characters required at the end.
I'm not 100% sure that this works, but it seems like this works in every case I can throw at it. The correctness proof seems reasonable, but it's a bit hand-wavy because the formal KMP-based proof would probably be a bit tricky.
Hope this helps!

To answer I would take this naive approach:
When do we need 0 characters? When the string is already a palindrome.
When do we need 1 character? When the string minus its first character is a palindrome.
When do we need 2 characters? When the string minus its first 2 characters is a palindrome.
And so on...
So an algorithm could be
for index from 0 to length - 1
    if string.right(length - index) is palindrome    // the string without its first index characters
        return string + reverse(string.left(index))
    end
next
Edit:
I'm not much of a Python guy, but a simple-minded implementation of the above pseudocode could be:
>>> def rev(s): return s[::-1]
...
>>> def pal(s): return s==rev(s)
...
>>> def mpal(s):
...     for i in range(0,len(s)):
...         if pal(s[i:]): return s+rev(s[:i])
...
>>> mpal("cdefedcba")
'cdefedcbabcdefedc'
>>> pal(mpal("cdefedcba"))
True

Simple linear time solution.
Let's call our string S.
Let f(X, P) be the length of the longest common prefix of X and P. Compute f(S[0], rev(S)), f(S[1], rev(S)), ... where S[k] is the suffix of S starting at position k. Obviously, you want to choose the minimum k such that k + f(S[k], rev(S)) = len(S), because then the suffix S[k] is a palindrome. That means that you just have to append k characters (the reverse of the first k characters) at the end. If k is 0, the string is already a palindrome. In the worst case k = len(S) - 1 (a single character is always a palindrome), and you append the reverse of everything except the last character.
We need to compute f(S[i], P) for all S[i] quickly. This is the tricky part. Create a suffix tree of S. Traverse the tree and update every node with the length of the longest common prefix with P. The values at the leaves correspond to f(S[i], P).
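The suffix-tree step is the hard part to code. As a swapped-in alternative (not the suffix tree described above), the Z-function of `rev(S) + separator + S` also yields f(S[k], rev(S)) for every k in linear time. A minimal sketch, assuming the separator (here `"\0"`, a hypothetical choice) never occurs in S; `z_function` and `min_append_palindrome` are hypothetical names:

```python
def z_function(t):
    """z[i] = length of the longest common prefix of t and t[i:] (z[0] = len(t))."""
    n = len(t)
    z = [0] * n
    if n:
        z[0] = n
    l, r = 0, 0                    # [l, r) is the rightmost window known to match a prefix
    for i in range(1, n):
        if i < r:
            z[i] = min(r - i, z[i - l])
        while i + z[i] < n and t[z[i]] == t[i + z[i]]:
            z[i] += 1
        if i + z[i] > r:
            l, r = i, i + z[i]
    return z


def min_append_palindrome(s, sep="\0"):
    """Find the smallest k with k + f(S[k], rev(S)) == len(S) and append rev(s[:k])."""
    if sep in s:
        raise ValueError("separator must not occur in s")
    rev = s[::-1]
    z = z_function(rev + sep + s)
    offset = len(rev) + 1          # index in the combined string where s starts
    n = len(s)
    for k in range(n):
        if k + z[offset + k] == n:  # z[offset + k] == f(S[k], rev(S))
            return s + s[:k][::-1]
    return s                       # only reached for the empty string
```

The scan over k stops at the first k that satisfies the condition, which is the minimum k the answer asks for.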

First make a function to test a string for palindrome-ness, keeping in mind that "a" and "aa" are palindromes. They are palindromes, right???
If the input is a palindrome, return it (0 chars need to be added).
Loop i from x[length] down to x[1], checking whether the substring x[i]..x[length] is a palindrome, and keep the longest palindromic suffix found.
Take the part of the input string before that longest palindromic suffix, reverse it, and append it to the end; this gives the shortest palindrome obtainable by appending.
coco => c+oco => c+oco+c
mmmeep => mmmee+p => mmmee+p+eemmm

Related

Is there anything to use instead of slicing the string?

This is one of the practice problems from the Problem Solving section of HackerRank. The problem statement says:
Steve has a string of lowercase characters in range ascii[‘a’..’z’]. He wants to reduce the string to its shortest length by doing a series of operations. In each operation he selects a pair of adjacent lowercase letters that match, and he deletes them.
For example : 'aaabbccc' -> 'ac' , 'abba' -> ''
I have tried solving this using slicing, but it gives me a timeout error on larger strings. Is there anything else I can use?
My code:
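s = list(input())
i = 1
while i < len(s):
    if s[i] == s[i-1]:
        s = s[:i-1] + s[i+1:]   # copies the whole list on every deletion
        i = max(i - 2, 0)       # step back (never below 0) to recheck the new neighbours
    i += 1
if len(s) == 0:
    print("Empty String")
else:
    print(''.join(s))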
This gives me a "Terminated due to timeout" message.
Thanks for your time :)
Creating a new list (or immutable string) on every deletion can be expensive, as each slice-and-concatenate copy has O(N) linear cost in the length of the string.
Consider processing "aa" * int(1e6). You will write on the order of 1e12 characters to memory by the time you're finished.
Take a moment (well, take linear time) to copy each character over to a mutable list element:
[c for c in giant_string]
Then you can perform dup processing by writing a tombstone of "" to each character you wish to delete, using just constant time.
Finally, in linear time you can scan through the survivors using "".join( ... )
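With tombstones you still have to find the nearest surviving character after each deletion. A common linear-time alternative (swapped in here, not the tombstone method itself) keeps the survivors on a stack, so the "previous survivor" is always the top. A minimal sketch; `super_reduce` is a hypothetical function name:

```python
def super_reduce(s):
    """Delete adjacent equal pairs until none remain, using a stack of survivors."""
    stack = []
    for ch in s:
        if stack and stack[-1] == ch:
            stack.pop()            # ch cancels the surviving character just before it
        else:
            stack.append(ch)
    return "".join(stack)
```

Each character is pushed and popped at most once, so the whole pass is O(N). To match the expected output, print `super_reduce(input()) or "Empty String"`.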
One other possible solution is to use regex. The pattern ([a-z])\1 matches a duplicate lowercase letter. The implementation would involve something like this:
import re
pattern = re.compile(r'([a-z])\1')
while pattern.search(s):    # While a match is found
    s = pattern.sub('', s)  # Remove all matches from "s"
I'm not an expert at efficiency, but this seems to write fewer strings to memory than your solution. For the case of "aa" * int(1e6) that J_H mentioned, it will only write one, thanks to pattern.sub replacing all occurrences at once. Nested pairs such as "abccba" still need one pass per nesting level, though, so the worst case remains quadratic.

Minimum characters that need to be deleted

Original Problem:
A word was K-good if for every two letters in the word, if the first appears x times and the second appears y times, then |x - y| ≤ K.
Given some word w, how many letters does he have to remove to make it K-good?
Problem Link.
I have solved the above problem, and I am not asking for its solution.
I misread the statement the first time and wondered how the misread version could be solved in linear time, which gives rise to a new problem.
Modified Problem
A word was K-good if for every two consecutive letters in the word, if the first appears x times and the second appears y times, then |x - y| ≤ K.
Given some word w, how many letters does he have to remove to make it K-good?
Is this problem solvable in linear time? I thought about it but could not find any valid solution.
Solution
My Approach: I could not approach my crush, but here is my approach to this problem: try everything (from the movie Zootopia)
i.e.
for i in range(0, 1<<n):          // n = length of the string
    for j in range(0, n):
        if (i & (1<<j)) != 0: delete character j
    now check whether the string is K-good
For N up to 10^5, time complexity: time does not exist in that dimension.
Is there any linear solution to this problem, simple and sweet like the people of Stack Overflow?
For Ex:
String S = AABCBBCB and K=1
If we delete 'B' at index 5 so String S = AABCBCB which is good string
F[A]-F[A]=0
F[B]-F[A]=1
F[C]-F[B]=1
and so on
I guess this is a simple example; there can be more complex ones, since deleting the element at index I makes elements I-1 and I+1 consecutive.
Is there any linear solution to this problem?
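To make the modified definition concrete, here is a minimal brute-force sketch in Python, assuming "K-good" means every pair of adjacent letters has counts (taken in the word after deletion) differing by at most K. The helper names are hypothetical, and the search is exponential, so it only serves to check small examples such as AABCBBCB:

```python
from collections import Counter
from itertools import combinations


def is_k_good_consecutive(word, k):
    """Modified definition: only letters that sit next to each other are compared."""
    freq = Counter(word)
    return all(abs(freq[a] - freq[b]) <= k for a, b in zip(word, word[1:]))


def min_deletions_brute_force(word, k):
    """Try deleting 0, 1, 2, ... characters; exponential, so tiny inputs only."""
    n = len(word)
    for d in range(n + 1):
        for removed in combinations(range(n), d):
            gone = set(removed)
            kept = "".join(c for i, c in enumerate(word) if i not in gone)
            if is_k_good_consecutive(kept, k):
                return d
    return n  # unreachable: the empty word is trivially K-good
```

Because deletion counts are tried in increasing order, the first hit is the minimum; for AABCBBCB with K = 1 that is a single deletion, as in the example.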
Consider the word DDDAAABBDC. This word is 3-good, because D and C are consecutive and card(D) - card(C) = 3, and removing the last D makes it 1-good by making D and C non-consecutive.
Conversely, if I consider DABABABBDC, which is 2-good, removing the last D makes C and B consecutive and increases the K-value of the word to 3.
This means that in the modified problem, the K-value of a word is determined both by the cardinality of each letter and by the cardinalities of each couple of consecutive letters.
By removing a letter, I reduce its cardinality as well as the cardinalities of the pairs to which it belongs, but I also increase the cardinality of another pair (potentially creating a new one).
It is also important to notice that in the original problem all occurrences of a letter are equivalent (I can remove any of them indifferently), which is no longer the case in the modified problem.
As a conclusion, I think we can safely assume that the "consecutive letters" constraint makes the problem not solvable in linear time for any alphabet/word.
Instead of finding the linear time solution, which I think doesn't exist (among other reasons because there seem to be a multitude of alternative solutions to each K request), I'd like to present the totally geeky solution.
Namely, take the parallel array processing language Dyalog APL and create these two tiny dynamic functions:
good←{1≥⍴⍵:¯1 ⋄ b←(⌈/a←(∪⍵)⍳⍵)⍴0 ⋄ b[a]+←1 ⋄ ⌈/|2-/b[a]}
make←{⍵,(good ⍵),a,⍺,(l-⍴a←⊃b),⍴b←(⍺=good¨b/¨⊂⍵)⌿(b←↓⍉~(l⍴2)⊤0,⍳2⊥(l←⍴⍵)⍴1)/¨⊂⍵}
good tells us the K-goodness of a string. A few examples below:
// fn¨ means the fn executes on each of the right args
good¨ 'AABCBBCB' 'DDDAAABBDC' 'DDDAAABBC' 'DABABABBDC' 'DABABABBC' 'STACKOVERFLOW'
2 3 1 2 3 1
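For readers without an APL interpreter, here is a rough Python reading of `good` (an interpretation, not a verified line-by-line translation): it counts each letter, then takes the largest absolute count difference over adjacent letters, returning -1 for words of length 0 or 1 as the APL guard does:

```python
from collections import Counter


def good(word):
    """Largest |count(a) - count(b)| over adjacent letters a, b; -1 for words shorter than 2."""
    if len(word) <= 1:
        return -1                  # mirrors the APL guard 1≥⍴⍵:¯1
    freq = Counter(word)
    return max(abs(freq[a] - freq[b]) for a, b in zip(word, word[1:]))
```

Applied to the six example words, this gives `[2, 3, 1, 2, 3, 1]`, the same values as the APL output.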
make takes as arguments
[desired K] make [any string]
and returns
- original string
- K for original string
- reduced string for desired K
- desired K
- how many characters were removed to achieve desired K
- how many possible solutions there are to achieve desired K
For example:
3 make 'DABABABBDC'
┌──────────┬─┬─────────┬─┬─┬──┐
│DABABABBDC│2│DABABABBC│3│1│46│
└──────────┴─┴─────────┴─┴─┴──┘
A little longer string:
1 make 'ABCACDAAFABBC'
┌─────────────┬─┬────────┬─┬─┬────┐
│ABCACDAAFABBC│4│ABCACDFB│1│5│3031│
└─────────────┴─┴────────┴─┴─┴────┘
It is possible to both increase and decrease the K-goodness.
Unfortunately, this is brute force. We generate the base-2 representation of all integers from 0 to 2^[length of string] - 1, for example:
0 1 0 1 1
Then we test the goodness of the substring, for example of:
0 1 0 1 1 / 'STACK' // Substring is now 'TCK'
We pick only those results (substrings) that match the desired K-good. Finally, out of the multitude of possible results, we pick the first one, which is the one with most characters left.
At least this was fun to code :-).

Find the minimal lexicographical string formed by merging two strings

Suppose we are given two strings s1 and s2 (both lowercase). We have to find the lexicographically smallest string that can be formed by merging the two strings.
At the beginning, it looks pretty simple, like the merge step of the merge sort algorithm. But let us see what can go wrong.
s1: zyy
s2: zy
Now if we perform merge on these two we must decide which z to pick as they are equal, clearly if we pick z of s2 first then the string formed will be:
zyzyy
If we pick z of s1 first, the string formed will be:
zyyzy which is correct.
As we can see the merge of mergesort can lead to wrong answer.
Here's another example:
s1:zyy
s2:zyb
Now the correct answer will be zybzyy which will be got only if pick z of s2 first.
There are plenty of other cases in which the simple merge will fail. My question is: is there any standard algorithm used to perform the merge for such output?
You could use dynamic programming. In f[x][y] store the minimal lexicographical string such that you've taken x characters from the first string s1 and y characters from the second s2. You can calculate f in a bottom-up manner using the update:
f[x][y] = min(f[x-1][y] + s1[x], f[x][y-1] + s2[y])   \\ '+' is the concatenation of a string and a character;
                                                      \\ s1[x] is the x-th character (1-based), and a term is
                                                      \\ skipped when x = 0 or y = 0
You start with f[0][0] = "" (empty string).
For efficiency you can store the strings in f as references. That is, you can store in f the objects
class StringRef {
    StringRef prev;
    char c;
}
To extract what string you have at a certain f[x][y], you just follow the references. To update, you point back to either f[x-1][y] or f[x][y-1] depending on what your update step says.
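A minimal Python sketch of this DP, storing whole strings rather than `StringRef` chains for simplicity (so it uses more memory than the reference trick); `min_merge_dp` is a hypothetical name:

```python
def min_merge_dp(s1, s2):
    """f[x][y] = smallest merge of s1[:x] and s2[:y]; builds the table bottom-up."""
    n, m = len(s1), len(s2)
    f = [[""] * (m + 1) for _ in range(n + 1)]     # f[0][0] = "" (empty string)
    for x in range(n + 1):
        for y in range(m + 1):
            if x == 0 and y == 0:
                continue
            candidates = []
            if x > 0:
                candidates.append(f[x - 1][y] + s1[x - 1])  # last char taken from s1
            if y > 0:
                candidates.append(f[x][y - 1] + s2[y - 1])  # last char taken from s2
            f[x][y] = min(candidates)
    return f[n][m]
```

All candidates for one cell have the same length x + y, so taking the minimum of "best shorter merge plus one character" is enough. On the two examples from the question it returns `'zyyzy'` and `'zybzyy'`.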
It seems that the solution can be almost the same as you described (the "mergesort"-like approach), except with special handling of equality. So long as the first characters of both strings are equal, you look ahead at the second character, 3rd, etc. If the end is reached for some string, consider the first character of the other string as the next character in the string for which the end is reached, etc. for the 2nd character, etc. If the ends for both strings are reached, then it doesn't matter from which string to take the first character. Note that this algorithm is O(N) because after a look-ahead on equal prefixes you know the whole look-ahead sequence (i.e. string prefix) to include, not just one first character.
EDIT: you look ahead so long as the current i-th characters from both strings are equal and alphabetically not larger than the first character in the current prefix.
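A closely related and commonly used greedy (swapped in here; it is not exactly the look-ahead rule above) compares the remaining suffixes of both strings, each followed by a sentinel larger than every letter, and takes the next character from the smaller one. The sentinel is what makes the algorithm take from zyy first in the first example, since plain comparison would rank the prefix zy lower. A minimal sketch, assuming the sentinel `chr(0x10FFFF)` never appears in the input; `min_merge_greedy` is a hypothetical name, and the slicing makes it quadratic in the worst case:

```python
def min_merge_greedy(s1, s2):
    """Take the next character from whichever remaining suffix (plus sentinel) is smaller."""
    sentinel = chr(0x10FFFF)       # assumption: larger than, and absent from, every input char
    a, b = s1 + sentinel, s2 + sentinel
    i = j = 0
    out = []
    while i < len(s1) and j < len(s2):
        if a[i:] <= b[j:]:         # ties: identical remainders give the same result either way
            out.append(a[i])
            i += 1
        else:
            out.append(b[j])
            j += 1
    out.append(s1[i:])             # at most one of these two is non-empty
    out.append(s2[j:])
    return "".join(out)
```

On the question's examples it produces `'zyyzy'` for zyy/zy and `'zybzyy'` for zyy/zyb.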

How to write a method that takes a string and returns the longest valid substring

I have been practicing interview questions with a friend, and he threw me this one he made up:
Given a method that tells you if a string is valid, write a method that takes a string, and returns the longest valid substring (without reordering the characters).
My first brute force solution would be to find all of the subsets of the input string, and then plug them through (longest to shortest) the given method till a valid string is found and return that.
But that obviously isn't good enough.
So I was trying to think of it this way:
Check the input string
Check all of the subsets of the inputString, with length == inputString length - 1
And so on and so forth, until all of the subsets with length 1 are checked, and then return false
The problem in my head, then, is that in order for this to be optimal, we want to utilize the fact that we only care for the longest valid string. If I were to check each subset recursively, then I would be doing a depth-first traversal of the subsets, when I'm really looking for a breadth-first, so I can find the longest quicker.
Once I realized that, I got stuck. I couldn't even come up with pseudo code to tackle this problem.
Is a "breadth-first" search of the subsets of a string even possible?
The closest solution I could find was on the math stackexchange, somebody posted a promising looking answer-- https://math.stackexchange.com/questions/89419/algorithm-wanted-enumerate-all-subsets-of-a-set-in-order-of-increasing-sums
but it unfortunately is pretty hard for me to comprehend.
Would the best solution just be a depth-first recursive iteration through all of the subsets and return the longest valid string from there?
string in
for int sub_len in len(in) down to 1          // the substring length must be at most the input length and at least 1
    for int sub_offset in 0 to len(in) - sub_len
        // the offset must be in [0, n], where n is the number
        // of characters that are not in the substring
        string sub = substring(in, sub_offset, sub_len)
        if isValid(sub)
            return sub
return null                                    // no valid substring exists
This generates all possible substrings for a given input (in) and returns the first/longest valid substring.
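A minimal Python version of this pseudocode, assuming the given checker is passed in as a function (`is_valid` and `longest_valid_substring` are hypothetical names). Trying lengths from longest to shortest is exactly the "breadth-first" order the question asks about, at a cost of O(n^2) calls to the checker:

```python
def longest_valid_substring(s, is_valid):
    """Check substrings longest-first; return the first one is_valid accepts, else None."""
    n = len(s)
    for length in range(n, 0, -1):             # longest candidates first
        for start in range(n - length + 1):    # every offset for this length
            sub = s[start:start + length]
            if is_valid(sub):
                return sub
    return None                                # no valid substring exists
```

For example, with a palindrome check as the validity test, `longest_valid_substring("abacdc", lambda t: t == t[::-1])` returns `'aba'`, the first of the two length-3 palindromes.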

algorithms for fast string approximate matching

Given a source string s and n equal length strings, I need to find a quick algorithm to return those strings that have at most k characters that are different from the source string s at each corresponding position.
What is a fast algorithm to do so?
PS: I should mention that this is an academic question. I want to find the most efficient algorithm if possible.
Also I missed one very important piece of information. The n equal length strings form a dictionary, against which many source strings s will be queried upon. There seems to be some sort of preprocessing step to make it more efficient.
My gut instinct is just to iterate over each String n, maintaining a counter of how many characters are different than s, but I'm not claiming it is the most efficient solution. However it would be O(n) so unless this is a known performance problem, or an academic question, I'd go with that.
Sedgewick, in his book "Algorithms", writes that a ternary search tree allows one "to locate all words within a given Hamming distance of a query word". Article in Dr. Dobb's
Given that the strings are fixed length, you can compute the Hamming distance between two strings to determine the similarity; this is O(n) on the length of the string. So, worst case is that your algorithm is O(nm) for comparing your string against m words.
As an alternative, a fast solution that's also a memory hog is to preprocess your dictionary into a map; keys are a tuple (p, c) where p is the position in the string and c is the character in the string at that position, values are the strings that have characters at that position (so "the" will be in the map at {(0, 't'), "the"}, {(1, 'h'), "the"}, {(2, 'e'), "the"}). To query the map, iterate through the query string's characters and construct a result map with the retrieved strings; keys are strings, values are the number of times the strings have been retrieved from the primary map, i.e. their number of matching positions (so with the query string "the", the key "thx" will have a value of 2, and so will the key "tee"). Finally, iterate through the result map and discard strings whose values are less than K, where K is the number of positions that must match (the string length minus the allowed number of differences).
You can save memory by discarding keys that can't possibly reach K while the result map is being built. For example, if K is 5 and N is 8, then when you've reached the 5th-8th characters of the query string you can discard any retrieved strings that aren't already in the result map, since they can't possibly have 5 matching characters. Or, when you've finished with the 6th character of the query string, you can iterate through the result map and remove all keys whose values are less than 3.
If need be, you can offload the primary precomputed map to a NoSQL key-value database or something along those lines in order to save on main memory (and also so that you don't have to precompute the dictionary every time the program restarts).
Rather than storing a tuple (p, c) as the key in the primary map, you can instead concatenate the position and character into a string (so (5, 't') becomes "5t", and (12, 'x') becomes "12x").
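A minimal in-memory sketch of this (position, character) index in Python; `build_index` and `query` are hypothetical names, `max_mismatches` is the k from the question, and the dictionary words are assumed distinct and of the same length as the query. Words with zero matching positions never enter the result map, so the sketch requires fewer than `len(q)` allowed mismatches:

```python
from collections import Counter, defaultdict


def build_index(words):
    """Primary map: (position, character) -> words having that character at that position."""
    index = defaultdict(list)
    for w in words:
        for p, c in enumerate(w):
            index[(p, c)].append(w)
    return index


def query(index, q, max_mismatches):
    """Words differing from q in at most max_mismatches positions (0 <= max_mismatches < len(q))."""
    if not 0 <= max_mismatches < len(q):
        raise ValueError("max_mismatches must be in [0, len(q))")
    need = len(q) - max_mismatches     # K in the answer: positions that must match
    hits = Counter()                   # result map: word -> number of matching positions
    for p, c in enumerate(q):
        for w in index.get((p, c), ()):
            hits[w] += 1
    return sorted(w for w, matches in hits.items() if matches >= need)
```

With the dictionary `["the", "thx", "tee", "abc"]`, querying "the" with one allowed mismatch returns `['tee', 'the', 'thx']`.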
Without knowing where in each input string the mismatched characters will be, for a particular string, you might need to check every character no matter what order you check them in. Therefore it makes sense to just iterate over each string character by character and keep a running count of mismatches. If i is the number of mismatches so far, return false as soon as i exceeds k, and return true as soon as the number of unchecked characters remaining is at most k - i.
Note that depending on how long the strings are and how many mismatches you'll allow, it might be faster to iterate over the whole string rather than performing these checks, or perhaps to perform them only after every couple characters. Play around with it to see how you get the fastest performance.
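A minimal sketch of the per-string check with both early exits, assuming equal-length strings; `within_k_mismatches` is a hypothetical name:

```python
def within_k_mismatches(s, t, k):
    """True if equal-length s and t differ in at most k positions, stopping as early as possible."""
    mismatches = 0
    n = len(s)
    for pos in range(n):
        if n - pos <= k - mismatches:
            return True            # even if every remaining character differs, still <= k
        if s[pos] != t[pos]:
            mismatches += 1
            if mismatches > k:
                return False       # already too many differences
    return True
```

Filtering a dictionary is then `[w for w in words if within_k_mismatches(source, w, k)]`.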
My method if we're thinking out loud :P I can't see a way to do this without going through each n string, but I'm happy to be corrected. On that it would begin with a pre-process to save a second set of your n strings so that the characters are in ascending order.
The first part of the comparison would then be to walk through each sorted n string one character at a time (say n') alongside each character of the sorted s (say s').
If s' is less than n', then they are not equal; move to the next s'. If n' is less than s', then go to the next n'. Otherwise, record a matching character and advance both. Repeat this until k mismatches are found or enough matches are found, and mark n accordingly.
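A minimal sketch of this sorted two-pointer comparison, used only as a cheap filter (the names are hypothetical). Sorting throws away positions, so `len(s)` minus the shared-character count is a lower bound on the real number of mismatches: a `False` result rules the word out, while a `True` result still needs a positional check:

```python
def sorted_common_count(a_sorted, b_sorted):
    """Two-pointer walk over two sorted sequences; counts shared characters (with multiplicity)."""
    i = j = common = 0
    while i < len(a_sorted) and j < len(b_sorted):
        if a_sorted[i] < b_sorted[j]:
            i += 1
        elif b_sorted[j] < a_sorted[i]:
            j += 1
        else:
            common += 1            # matching character: advance both sides
            i += 1
            j += 1
    return common


def may_be_within_k(s, word_sorted, k):
    """Filter: False means the word certainly has more than k positional mismatches."""
    lower_bound = len(s) - sorted_common_count(sorted(s), word_sorted)
    return lower_bound <= k
```

For instance, "abc" against "cab" passes with k = 0 even though every position differs, which is why this can only prune candidates.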
For further consideration, additional preprocessing could be done on each adjacent string in n to count the total number of characters that differ. This could then be used when comparing strings n to s: if sufficient difference exists between these and the adjacent n, there may not be a need to compare it?
