Find the substring that creates the whole string by repetition - string

I have the string "hrhrhrhrhr".
I want to find the smallest substring of it such that the whole string can be built by appending that substring to itself several times.
In this example, I can build "hrhrhrhrhr" by appending "hr" to itself four times.
How do I find this kind of substring?
For example,
"abcabcabc" -> "abc" is the answer.
"ttttttt" -> "t" is the answer.
"abcd" -> "abcd" is the answer.
Which algorithm or specific method should I use?

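I would suggest taking a look at string matching/search algorithms. In particular, if you build the KMP (Knuth-Morris-Pratt) lookup table for the string, as if searching for the string in itself, the table yields the pattern. The last entry of the table is the length of the longest proper prefix of the string that is also a suffix; subtracting it from the length of the string gives the length of the candidate substring. That candidate is the answer whenever its length divides the length of the string evenly. Otherwise the string is not composed of repetitions of a shorter substring, and the answer is the whole string.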
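As a minimal sketch of the KMP idea above (written in Python; the function names prefix_function and smallest_repeating_unit are illustrative, not from the original answer), the table is built once and the period is read off its last entry:

```python
def prefix_function(s):
    """KMP lookup table: table[i] is the length of the longest proper
    prefix of s[:i + 1] that is also a suffix of s[:i + 1]."""
    table = [0] * len(s)
    k = 0
    for i in range(1, len(s)):
        # Fall back to shorter borders until the next character matches.
        while k > 0 and s[i] != s[k]:
            k = table[k - 1]
        if s[i] == s[k]:
            k += 1
        table[i] = k
    return table


def smallest_repeating_unit(s):
    """Return the shortest substring whose repetition builds s."""
    if not s:
        return s
    n = len(s)
    period = n - prefix_function(s)[-1]
    # The candidate only counts if it tiles the whole string exactly.
    return s[:period] if n % period == 0 else s


print(smallest_repeating_unit("hrhrhrhrhr"))
```

The table is built in linear time, so the whole check is O(n) in the length of the string.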

Related

Extract Number from string into a list in Scala

I have the following string :
var myStr = "abc12ef4567gh90ijkl789"
The length of the string is not fixed and it contains numbers in between the letters. I want to extract the numbers and store them in the form of a list in this manner:
List(12,4567,90,789)
I tried the solution mentioned here but cannot extend it to my case. I just want to know if there is a faster or more efficient solution than traversing the string and extracting the numbers one by one using brute force. Also, the string can be of arbitrary length.
It seems you may just collect the numbers using
("""\d+""".r findAllIn myStr).toList
See the Scala demo. \d+ matches one or more digits, findAllIn searches for multiple occurrences of the pattern inside a string (and also un-anchors the pattern so that partial matches could be found).
If you prefer a splitting approach, you might use
myStr.split("\\D+").filter(_.nonEmpty).toList
See another demo. Here, \D+ matches one or more non-digit chars, and these chunks are used to split on (texts between these chunks land in the result). .filter(_.nonEmpty) will remove empty items that usually appear due to matches at the start/end of the string.

Delete all occurrences of substring in minimal steps

I want to find the minimum number of deletions I need to make in order for a substring to no longer appear in a given string. Both the string and substring are composed of only lower case letters.
For example, for string "recorerecore" and substring "recore" I would need 2 deletions.
For string "recorecore" and substring "recore" I would need only 1.
For string "recorecorecorecore" and substring "recore" I would need 2, either the first and third or the second and fourth.
For string "rerecorecore" I would need to take out 1, the second occurrence, as taking the first out would lead to having recore again.
I only can think of the brute force solution which involves actually deleting in every combination possible and finding the minimum, but this takes too long.
Does anyone know a way to do this faster?
Recursively run Boyer–Moore on the string with the substring, and delete the occurrences as you find them.
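Deleting occurrences in the order they are found does not always give the minimum: in the "rerecorecore" example from the question, removing the first occurrence recreates "recore". As a baseline for checking faster approaches, here is a minimal sketch of the exhaustive search the question describes, written in Python as a breadth-first search that remembers strings it has already visited. The function name min_deletions is illustrative, and the sketch assumes that a deletion removes one whole occurrence and closes the gap.

```python
from collections import deque


def min_deletions(text, pattern):
    """Fewest whole-occurrence deletions until pattern no longer appears."""
    if not pattern:
        return 0
    seen = {text}
    queue = deque([(text, 0)])
    while queue:
        current, steps = queue.popleft()
        start = current.find(pattern)
        if start == -1:
            # Breadth-first order guarantees this is the minimum.
            return steps
        # Try deleting every occurrence, including overlapping ones.
        while start != -1:
            shorter = current[:start] + current[start + len(pattern):]
            if shorter not in seen:
                seen.add(shorter)
                queue.append((shorter, steps + 1))
            start = current.find(pattern, start + 1)
    return 0


print(min_deletions("rerecorecore", "recore"))
```

This is still exponential in the worst case; it only avoids re-exploring strings that several deletion orders have in common.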

Similar String Comparison Algorithm

Got this question in a recent interview: a basic string comparison with a little twist. I have an input string, STR1 = 'ABC'. I should return "Same/Similar" when the string to compare, STR2, has any one of these values: 'ACB', 'BAC', 'ABC', 'BCA', 'CAB', 'CBA' (that is, the same characters, the same length and the same number of occurrences). The only answer that struck me at that moment was to proceed with merge sort or quicksort, since such sorts typically run in O(n log n) time. Is there a better algorithm to achieve the above result?
Sorting both, and comparing the results for equality, is not a bad approach for strings of reasonable lengths.
Another approach is to use a map/dictionary/object (depending on language) from character to number of occurrences. First check that the two strings have the same length. You then iterate over the first string, incrementing the counts, and iterate over the second string, decrementing them. You can return false as soon as you get a negative number.
And if your set of possible characters is small enough to be considered constant, you can use an array as the "map", resulting in O(n) worst-case complexity.
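A minimal sketch of the counting approach, written in Python with a dictionary as the map (the function name same_characters is illustrative). The length check up front is what makes "no negative count" sufficient:

```python
def same_characters(first, second):
    """True if both strings hold the same characters the same number of times."""
    if len(first) != len(second):
        return False
    counts = {}
    for ch in first:
        counts[ch] = counts.get(ch, 0) + 1
    for ch in second:
        remaining = counts.get(ch, 0) - 1
        if remaining < 0:
            # second has more of this character than first.
            return False
        counts[ch] = remaining
    return True


print(same_characters("ABC", "BCA"))
```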
Supposing you can use any language, I would opt for a Python dictionary solution. You could build two dictionaries, one per string, with the string's characters as keys and their numbers of occurrences as values. Then you can compare the dictionaries and return the respective result. This also works for strings with characters that appear more than once.
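One way to sketch the two-dictionary idea is with collections.Counter from the Python standard library, which is a dictionary subclass mapping each character to its count (the function name is_similar is illustrative):

```python
from collections import Counter


def is_similar(first, second):
    """Compare the per-character counts of both strings."""
    return Counter(first) == Counter(second)


print(is_similar("ABC", "CAB"))
```

Two Counter objects compare equal only when every character has the same count in both, so strings of different lengths are rejected as well.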

Permutations of a string of non-unique characters

While there are a lot of solutions for how to find all the (unique) permutations of a string of unique characters, I haven't found solutions that work when the characters are non-unique. I have listed out my idea below and would appreciate feedback, but also feel free to provide your own ideas.
My idea:
To illustrate my algorithm, I'm using the example of the string ABBC, which I want to find all permutations of. Since there are two B's I will be labelling them B1 and B2.
1. Create a new string by removing all duplicate characters from the original string (e.g. turn AB1B2C into AB1C).
2. Find all possible permutations of the new string (e.g. AB1C, ACB1, B1AC, etc.). There are many algorithms to do this, since the string's characters are all unique.
3. Choose one duplicate character. For each permutation, insert the chosen duplicate character at every "position" of the permutation, except when the character just before the duplicate character has the same value as the duplicate character (e.g. for the permutation AB1C, since the duplicate character is B2, insert it to get B2AB1C, AB2B1C, AB1CB2. Exception: don't do AB1B2C, since that's just a duplicate of AB2B1C).
4. Continue to do step 3 but now choose a different duplicate character. (Do this until all duplicate characters have been chosen exactly once.)
Prior research: The answer by Prakhar on this SO question claims to work for duplicates: Generate list of all possible permutations of a string. It might, but I suspect there's a bug in the code.
How about this: suppose that the string with duplicates is of length N. Now consider the sequence 0,1,...,N-1. Find all its permutations using one of the known algorithms. For each permutation in this list, generate a corresponding string by using each number in the permutation as an index into the original string. For example, if the string is ABBC, then the sequences will be 0,1,2,3; 0,1,3,2; etc. The sequence 3,0,1,2, as an example, is one of the permutations, and it yields the string CABB. Note that index sequences which differ only in the positions of equal characters (such as 0,1,2,3 and 0,2,1,3 here) yield the same string, so repeated results have to be discarded.
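A minimal sketch of the index-permutation idea in Python, using itertools.permutations for the index sequences. The function name unique_permutations is illustrative, and filtering repeats with a set is an addition here: without it, the two B's in ABBC would make every string appear twice.

```python
from itertools import permutations


def unique_permutations(text):
    """Permute the indices 0..N-1 and map each sequence back to a string."""
    seen = set()
    result = []
    for order in permutations(range(len(text))):
        candidate = "".join(text[i] for i in order)
        # Index sequences that only swap equal characters give the same string.
        if candidate not in seen:
            seen.add(candidate)
            result.append(candidate)
    return result


print(len(unique_permutations("ABBC")))
```

This still enumerates all N! index sequences, so it is only practical for short strings.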

Compare a string to a cell array of strings in MATLAB and find the most similar

I have a list of images stored in a directory. They are all named. My GUI reads all the images and saves their names in a cell array. Now I have added an editable box in which the user can type a name, and the program will show that image. The problem is that I want the program to take typos and misspellings into account and find the file name most similar to the word the user typed. Can you please help me?
Many Thanks,
Hamid
You should read this WP article: Approximate string matching and look at "Calculation of distance between strings" on FEx.
I think you should use the longest common subsequence algorithm to approximately compare strings.
Here is a matlab implementation:
http://www.mathworks.com/matlabcentral/fileexchange/24559-longest-common-subsequence
Afterwards, just do something like this:
[~,ind]=max(cellfun( @(x) LCS(lower(userInput),lower(x)), allFileNames)); % assumes LCS returns the length of the common subsequence
chosenFile=allFileNames{ind};
(the function LCS is the longest common subsequence algorithm, and the function lower converts to lower case)
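To illustrate the selection logic independently of the File Exchange code, here is a minimal sketch in Python. Both lcs_length and most_similar are hypothetical helpers written for this example, not part of the linked MATLAB submission; the file name with the longest common subsequence with the typed text wins.

```python
def lcs_length(a, b):
    """Length of the longest common subsequence of a and b (row-by-row DP)."""
    previous = [0] * (len(b) + 1)
    for ch_a in a:
        current = [0]
        for j, ch_b in enumerate(b, start=1):
            if ch_a == ch_b:
                current.append(previous[j - 1] + 1)
            else:
                current.append(max(previous[j], current[j - 1]))
        previous = current
    return previous[-1]


def most_similar(user_input, names):
    """Pick the name sharing the longest subsequence with the input, ignoring case."""
    return max(names, key=lambda name: lcs_length(user_input.lower(), name.lower()))


files = ["sunset.png", "mountain.png", "beach.png"]
print(most_similar("Sunst", files))
```

A longer common subsequence means a closer match, which is why the maximum is taken rather than the minimum.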
Not exactly what you are looking for, but you can compare the first few characters of the strings ignoring case to find a close match. See the command strncmpi:
strncmpi Compare first N characters of strings ignoring case.
TF = strncmpi(S,C,N) performs a case-insensitive comparison between the
first N characters of string S and the first N characters in each element
of cell array C. Input S is a character vector (or 1-by-1 cell array), and
input C is a cell array of strings. The function returns TF, a logical
array that is the same size as C and contains logical 1 (true) for those
elements of C that are a match, except for letter case, and logical 0
(false) for those elements that are not. The order of the two input
arguments is not important.
