How to find if a string S is contained inside a string made of S inserted at any position in S (only once) itself


As a first check, since a valid input must be formed by inserting the string S into itself, its length must be exactly twice the length of S.
E.g. if S = abc, then ababcc or aabcbc should return True, but inputs such as abcab, abcxa, or abcabcabc should return False.
I have already tried the naive approach: find the substring, cut that part out if it exists, and check whether the remaining string matches S. But this fails for some inputs.
private static void printResult(String s, String p) {
    int x = p.indexOf(s);
    if (x < 0) {
        System.out.println("False");
        return;
    }
    String s1 = "";
    if (p.length() >= s.length() * 2) {
        s1 = p.substring(0, x) + p.substring(x + s.length());
        if (s1.equals(s)) {
            System.out.println("True");
        }
        else {
            System.out.println("False");
        }
        return;
    }
    System.out.println("False");
}

Looking only at the first occurrence of s may not be appropriate in some cases.
Suppose your original word is w = xyx (for some words x and y). You can then insert w into itself to produce xyXYXx (uppercase marks the insertion). If you search for xyx, your algorithm finds it at the first position and is left with yxx as the remaining part, which does not match.
So you need to try every possible position before concluding.
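A minimal Java sketch of that idea, trying each occurrence of s in turn. The class and method names (SelfInsertion, isSelfInsertion) are made up for illustration, and the strict length check is an assumption taken from the question's "twice the size" remark.

```java
class SelfInsertion {
    // Returns true if p equals s with one copy of s inserted at some position.
    static boolean isSelfInsertion(String s, String p) {
        // Assumption from the question: a valid p is exactly twice as long as s.
        if (p.length() != 2 * s.length()) {
            return false;
        }
        // Try every occurrence of s in p, not just the first one.
        for (int x = p.indexOf(s); x >= 0; x = p.indexOf(s, x + 1)) {
            String rest = p.substring(0, x) + p.substring(x + s.length());
            if (rest.equals(s)) {
                return true;
            }
        }
        return false;
    }
}
```

With w = aba (x = a, y = b), the input ababaa is rejected when only the first occurrence is removed, but isSelfInsertion("aba", "ababaa") accepts it through the occurrence at index 2.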

Related

Substring to extract before or/and after specific character in text

I'm currently writing a Groovy script that extracts characters based on a given condition, but I'm struggling to extract a specific string after a specific number of characters. For example:
If (text = 'ABCDEF')
{
Return (start from C and print only CDE)
}
I already used substring, but it didn't give me the right output:
If (text = 'ABCDEF')
{
Return(text.substring(2));
}

Try this:
if (text == 'ABCDEF')
{
return text.substring(2, 5)
}
= assigns a value to a variable.
== checks whether two values are equal.

Your capitalization is all out of whack
if (text == 'ABCDEF') {
text.substring(2)
}
There are probably also issues with using return, but that depends on context you haven't shown in your question.

Your substring call isn't complete. To grab a specific range of indices (here, index 2 up to 5), you also need to pass the index you want to end at. If you don't, the result starts at index 2 and includes all the remaining characters in the string. You need to write this:
if(text == 'ABCDEF') {
return text.substring(2, 5);
}
Also, keep in mind that the end index (index 5) is exclusive, so the character at index 5 is not included.
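To see the difference between the one-argument and two-argument forms side by side, here is a small sketch in Java (Groovy strings are java.lang.String, so substring behaves the same way there). The class and method names are hypothetical.

```java
class SubstringDemo {
    // Everything from index 2 to the end of the string.
    static String fromIndexTwo(String text) {
        return text.substring(2);
    }

    // Indices 2, 3 and 4 only; the end index 5 is exclusive.
    static String indicesTwoToFour(String text) {
        return text.substring(2, 5);
    }
}
```

For the text ABCDEF, fromIndexTwo gives CDEF and indicesTwoToFour gives CDE.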

Efficient algorithm for phrase anagrams

What is an efficient way to produce phrase anagrams given a string?
The problem I am trying to solve
Assume you have a word list with n words. Given an input string, say, "peanutbutter", produce all phrase anagrams. Some contenders are: pea nut butter, A But Ten Erupt, etc.
My solution
I have a trie that contains all words in the given word list. Given an input string, I calculate all permutations of it. For each permutation, I have a recursive solution (something like this) to determine if that specific permuted string can be broken into words. For example, if one of the permutations of peanutbutter was "abuttenerupt", I used this method to break it into "a but ten erupt". I use the trie to determine if a string is a valid word.
What sucks
My problem is that because I calculate all permutations, my solution runs very slowly for phrases longer than 10 characters, which is a big letdown. I want to know if there is a different way to do this.
Websites like https://wordsmith.org/anagram/ can do the job in less than a second and I am curious to know how they do it.

Your problem can be decomposed into 2 sub-problems:
1. Find combinations of words that use up all characters of the input string
2. Find all permutations of the words found in the first sub-problem
Subproblem #2 is a basic algorithm and you can find an existing standard implementation in most programming languages. Let's focus on subproblem #1.
First convert the input string to a "character pool". We can implement the character pool as an array oc, where oc[c] = number of occurrences of character c.
Then we use a backtracking algorithm to find words that fit in the charpool, as in this pseudo-code:
result = empty;
function findAnagram(pool) {
    if (pool empty) then print result;
    for (word in dictionary) {
        if (word fit in charpool) {
            result = result + word;
            update pool to exclude characters in word;
            findAnagram(pool);
            // as with any backtracking algorithm, we have to restore global states
            restore pool;
            restore result;
        }
    }
}
Note: If we pass the charpool by value then we don't have to restore it. But as it is quite big, I prefer passing it by reference.
Now we remove redundant results and apply some optimizations:
Assume A comes before B in the dictionary. If we choose B as the first word, then we don't have to consider A in the following steps, because those results (the ones that contain A) were already produced in the case where A was chosen as the first word.
If the character set is small enough (< 64 characters is best), we can use a bitmask to quickly filter out words that cannot fit in the pool. The bitmask records which characters occur in a word, no matter how many times each one occurs.
Updated pseudo-code reflecting those optimizations:
function findAnagram(charpool, minDictionaryIndex) {
    pool_bitmask <- bitmask(charpool);
    if (pool empty) then print result;
    for (word in dictionary AND word's index >= minDictionaryIndex) {
        // bitmask of every words in the dictionary should be pre-calculated
        word_bitmask <- bitmask(word)
        if (word_bitmask contains bit(s) that is not in pool_bitmask)
            then skip this for iteration
        if (word fit in charpool) {
            result = result + word;
            update charpool to exclude characters in word;
            findAnagram(charpool, word's index);
            // as with any backtracking algorithm, we have to restore global states
            restore pool;
            restore result;
        }
    }
}
My C++ implementation of subproblem #1 where the character set contains only lowercase 'a'..'z': http://ideone.com/vf7Rpl .
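For readers who prefer Java, the pseudo-code above could be sketched roughly as follows. This is not a port of the linked C++ code; the class name PhraseAnagrams and its methods are invented for illustration, and it assumes the input and every dictionary word use only lowercase 'a'..'z'. It returns each word combination once (subproblem #1) instead of printing it.

```java
import java.util.ArrayList;
import java.util.List;

class PhraseAnagrams {
    private final List<String> dictionary = new ArrayList<>();
    private final List<int[]> wordCounts = new ArrayList<>();
    private final List<Integer> wordMasks = new ArrayList<>();

    PhraseAnagrams(List<String> words) {
        for (String w : words) {
            if (w.isEmpty()) continue; // an empty word would recurse forever
            int[] c = counts(w);
            dictionary.add(w);
            wordCounts.add(c);
            wordMasks.add(mask(c)); // pre-calculated once per word
        }
    }

    // Character pool: c[i] = number of occurrences of letter ('a' + i).
    static int[] counts(String w) {
        int[] c = new int[26];
        for (char ch : w.toCharArray()) c[ch - 'a']++;
        return c;
    }

    // Bit i is set when letter ('a' + i) occurs at least once.
    static int mask(int[] c) {
        int m = 0;
        for (int i = 0; i < 26; i++) if (c[i] > 0) m |= 1 << i;
        return m;
    }

    List<List<String>> find(String input) {
        List<List<String>> out = new ArrayList<>();
        search(counts(input), input.length(), 0, new ArrayList<>(), out);
        return out;
    }

    private void search(int[] pool, int remaining, int minIndex,
                        List<String> current, List<List<String>> out) {
        if (remaining == 0) { // pool is empty: record the combination
            out.add(new ArrayList<>(current));
            return;
        }
        int poolMask = mask(pool);
        for (int i = minIndex; i < dictionary.size(); i++) {
            // Quick reject: the word uses a letter the pool does not have.
            if ((wordMasks.get(i) & ~poolMask) != 0) continue;
            int[] wc = wordCounts.get(i);
            if (!fits(wc, pool)) continue;
            for (int c = 0; c < 26; c++) pool[c] -= wc[c];
            current.add(dictionary.get(i));
            // Passing i (not i + 1) lets the same word be used again.
            search(pool, remaining - dictionary.get(i).length(), i, current, out);
            // Backtrack: restore the shared state.
            current.remove(current.size() - 1);
            for (int c = 0; c < 26; c++) pool[c] += wc[c];
        }
    }

    private static boolean fits(int[] wc, int[] pool) {
        for (int c = 0; c < 26; c++) if (wc[c] > pool[c]) return false;
        return true;
    }
}
```

The pool is a single shared array that is modified and restored around each recursive call, which mirrors the "pass by reference and restore" choice described in the note above.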

Instead of a two stage solution where you generate permutations and then try and break them into words, you could speed it up by checking for valid words as you recursively generate the permutations. If at any point your current partially-complete permutation does not correspond to any valid words, stop there and do not recurse any further. This means you don't waste time generating useless permutations. For example, if you generate "tt", there is no need to permute "peanubuter" and append all the permutations to "tt" because there are no English words beginning with tt.
Suppose you are doing basic recursive permutation generation, keep track of the current partial word you have generated. If at any point it is a valid word, you can output a space and start a new word, and recursively permute the remaining character. You can also try adding each of the remaining characters to the current partial word, and only recurse if doing so results in a valid partial word (i.e. a word exists starting with those characters).
Something like this (pseudo-code):
void generateAnagrams(String partialAnagram, String currentWord, String remainingChars)
{
    // at each point, you can either output a space, or each of the remaining chars:
    // if the current word is a complete valid word, you can output a space
    if(isValidWord(currentWord))
    {
        // if there are no more remaining chars, output the anagram:
        if(remainingChars.length == 0)
        {
            outputAnagram(partialAnagram);
        }
        else
        {
            // output a space and start a new word
            generateAnagrams(partialAnagram + " ", "", remainingChars);
        }
    }
    // for each of the chars in remainingChars, check if it can be
    // added to currentWord, to produce a valid partial word (i.e.
    // there is at least 1 word starting with these characters)
    for(i = 0 to remainingChars.length - 1)
    {
        char c = remainingChars[i];
        if(isValidPartialWord(currentWord + c))
        {
            generateAnagrams(partialAnagram + c, currentWord + c,
                remainingChars.remove(i));
        }
    }
}
You could call it like this
generateAnagrams("", "", "peanutbutter");
You could optimize this algorithm further by passing the node in the trie corresponding to the current partially completed word, as well as passing currentWord as a string. This would make your isValidPartialWord check even faster.
You can enforce uniqueness by changing your isValidWord check to only return true if the word is in ascending (greater or equal) alphabetic order compared to the previous word output. You might also need another check for dupes at the end, to catch cases where two of the same word can be output.
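A runnable Java sketch of this pruned generation might look like the following. It is only an illustration under some assumptions: the class name PrunedAnagrams is invented, a sorted TreeSet stands in for the trie (its ceiling lookup answers the "is there a word starting with this prefix" question), and repeated letters in the remaining characters are skipped so the same anagram is not emitted twice. The word-order uniqueness check from the last paragraph is not implemented, so "a b" and "b a" are both produced.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.TreeSet;

class PrunedAnagrams {
    private final TreeSet<String> words;
    private final List<String> results = new ArrayList<>();

    PrunedAnagrams(List<String> dictionary) {
        words = new TreeSet<>(dictionary);
    }

    boolean isValidWord(String w) {
        return words.contains(w);
    }

    // True if at least one dictionary word starts with the given prefix.
    boolean isValidPartialWord(String prefix) {
        String next = words.ceiling(prefix); // smallest word >= prefix
        return next != null && next.startsWith(prefix);
    }

    List<String> generate(String input) {
        results.clear();
        generate("", "", input);
        return new ArrayList<>(results);
    }

    private void generate(String partial, String current, String remaining) {
        if (isValidWord(current)) {
            if (remaining.isEmpty()) {
                results.add(partial);
            } else {
                // Output a space and start a new word.
                generate(partial + " ", "", remaining);
            }
        }
        for (int i = 0; i < remaining.length(); i++) {
            char c = remaining.charAt(i);
            // The same letter was already tried at an earlier index.
            if (remaining.indexOf(c) != i) continue;
            // Prune: only recurse when some word starts with current + c.
            if (isValidPartialWord(current + c)) {
                generate(partial + c, current + c,
                        remaining.substring(0, i) + remaining.substring(i + 1));
            }
        }
    }
}
```

Swapping the TreeSet for the trie node mentioned above would avoid rebuilding and re-searching the prefix string at every step.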

Check if a substring exists at the beginning, middle and end of a string while allowing intersections

It sounds easy: you can simply iterate and check them. But the problem here is optimization: don't do any needless checks, create needless objects, or perform needless operations.
The algorithm will be tested against a huge set of test cases to verify its efficiency.
Examples:
"aaaa" contains "aa" at the beginning, middle and end.
"baabaabaaaabbaab" contains "baab" at the beginning, middle and end. See the intersection.
And one more thing I forgot to say:
You are not given the substring to check for, you need to find if such a substring exists, if it doesn't return false, if it does return true.
Find the longest substring satisfying those conditions and return it, or print it (your choice).
A simple Boolean function, right?
Update:
The substring needs to be at least 2 characters shorter than the main string.
Sorry, it was my mistake in the "aaa" example, I fixed it.

You can solve it with KMP, a string matching algorithm. Use it to generate an array fail[]:
fail[i] = max {k | k < i and S[1:k] == S[i-k+1:i]}
Then you can enumerate all values reachable from fail[n] (fail[n], fail[fail[n]], fail[fail[fail[n]]], ...) and check whether each one also occurs in the middle.
The complexity is O(n).
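One way this could look in Java is sketched below. The names (BorderSearch, prefixFunction, longestStartMiddleEnd) are hypothetical, the array is 0-indexed rather than 1-indexed as in the formula above, and the middle check is done by taking the maximum failure value over positions 1..n-2, which is one possible reading of "check whether it exists in the middle".

```java
class BorderSearch {
    // pi[i] = length of the longest proper prefix of s[0..i]
    // that is also a suffix of s[0..i] (the KMP failure function).
    static int[] prefixFunction(String s) {
        int n = s.length();
        int[] pi = new int[n];
        for (int i = 1; i < n; i++) {
            int k = pi[i - 1];
            while (k > 0 && s.charAt(i) != s.charAt(k)) {
                k = pi[k - 1];
            }
            if (s.charAt(i) == s.charAt(k)) {
                k++;
            }
            pi[i] = k;
        }
        return pi;
    }

    // Longest string that is a prefix, a suffix, and also occurs starting
    // after index 0 and ending before the last character. Returns "" if none.
    static String longestStartMiddleEnd(String s) {
        int n = s.length();
        if (n < 3) {
            return "";
        }
        int[] pi = prefixFunction(s);
        // A prefix of length k ends somewhere in the middle
        // exactly when some pi[i] with 1 <= i <= n - 2 is at least k.
        int maxMiddle = 0;
        for (int i = 1; i <= n - 2; i++) {
            maxMiddle = Math.max(maxMiddle, pi[i]);
        }
        // Walk the chain pi[n-1], pi[pi[n-1]-1], ... of prefix/suffix lengths.
        int k = pi[n - 1];
        while (k > maxMiddle) {
            k = pi[k - 1];
        }
        return s.substring(0, k);
    }
}
```

For the two examples in the question this yields aa for aaaa and baab for baabaabaaaabbaab; because the result length never exceeds n - 2, the "at least 2 characters shorter" rule holds automatically.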

Let's jump the shark:
function the_best_match_at_the_beginning_the_middle_and_the_end( s ){
print( s );
return true;
}

That's one of these "you might get significantly better in terms of theoretical complexity, but in reality, linear operation is always faster" answers:
Assuming in is your input string, pattern is what you're looking for, and you're able to read or look up C-standard-lib-style methods like strncmp. Let l_in be the number of characters in the input, l_pattern the number of characters in the pattern.
Simply check the start explicitly (strncmp(in, pattern, l_pattern)); then use a bog-normal linear search from the second letter on (strstr(in+1, pattern)):
If strstr didn't find anything, there's neither a middle match nor an end match.
If it's at the end (result of strstr is in + l_in - l_pattern), you've got no middle match.
If it's not found at the end, you've got a middle match. Manually check (strncmp(in+l_in-l_pattern, pattern, l_pattern)) for the end match.
Why is this faster? Because modern computers are pretty optimized for searching through data linearly, see Bjarne "C++" Stroustrup's why you should avoid linked lists. Simply put, letting your CPU run on a contiguous block of memory prefetched to a CPU cache is much, much faster than being "clever" about avoiding a few duplicate checks.
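The same three checks can be written with Java's String methods instead of the C library calls. This is only a sketch for a known pattern (which is what this answer assumes); the class and method names are invented.

```java
class ThreePlaces {
    // True if pattern occurs at the start, at the end, and somewhere between.
    static boolean atStartMiddleEnd(String in, String pattern) {
        if (pattern.isEmpty() || !in.startsWith(pattern) || !in.endsWith(pattern)) {
            return false;
        }
        // Linear search from the second character on, like strstr(in + 1, pattern).
        int first = in.indexOf(pattern, 1);
        // If the first hit is the end match itself, there is no middle match.
        return first >= 0 && first < in.length() - pattern.length();
    }
}
```

Here startsWith and endsWith play the role of the two strncmp calls, and indexOf(pattern, 1) plays the role of strstr.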

One clean way to approach this is to just check all substrings in the input from the beginning. Compare each substring to see that it exists at the end, and then check to see if it exists in the middle. For the middle check, you can compare against the input string with its first and last characters removed.
public boolean subStrings(String input) {
    if (input == null || input.equals("")) {
        return false;
    }
    if (input.length() == 1) {
        System.out.println(input + " is a match!");
        return true;
    }
    boolean foundIt = false;
    String longestMatch = "";
    // try every prefix; later (longer) matches overwrite earlier ones
    for (int i = 1; i < input.length(); ++i) {
        String substring = input.substring(0, i);
        boolean endMatch = input.substring(input.length() - i, input.length()).equals(substring);
        boolean midMatch = input.substring(1, input.length() - 1).contains(substring);
        if (endMatch && midMatch) {
            longestMatch = substring;
            foundIt = true;
        }
    }
    if (foundIt) {
        System.out.println(longestMatch + " is a match!");
        return true;
    }
    else {
        return false;
    }
}
subStrings("baabaabaaaabbaab");
Output:
baab is a match!

How to detect a number in my Linked List of Strings, and get the value

I need to sort my linked list. The problem is that each of its elements is a String containing a sentence. So the question is: how do I detect the number in each element and get its value?
I tried to split each element of my linked list so I can pass through each word.
private LinkedList<String> list = new LinkedList<String>();
list.add("Number One: 1");
list.add("Number Three: 3");
list.add("Number two:2");
for (Iterator<String> iterator = list.iterator(); iterator.hasNext(); )
{
    String string = iterator.next();
    for (String word : string.split(" ")) {
    }
}
I also tried "if((word.contains("1") || (word.contains("2")...." inside the for loop, and then passing the value "word" to Double... but I don't think that is very smart.
So my goal is this output (Number One: 1, Number Two: 2, Number Three: 3), and therefore I need the value of each number first.

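Why not try to parse each word as a number? Java has no int.TryParse (that is C#); the closest equivalent is Integer.parseInt inside a try/catch:
for (String word : string.split(" ")) {
    try {
        int outvalue = Integer.parseInt(word);
        // do something with the result
    } catch (NumberFormatException e) {
        // word is not a number, skip it
    }
}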
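Since the stated goal is to sort the list by those numbers, here is one possible sketch. It swaps in a different technique from the word-by-word parse: a regular expression that pulls out the first run of digits, which also copes with "Number two:2" where there is no space before the number. The class and method names are hypothetical, and it assumes every element contains at least one non-negative integer.

```java
import java.util.Comparator;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class NumberedSentences {
    private static final Pattern DIGITS = Pattern.compile("\\d+");

    // Returns the first run of digits in s as an int.
    static int firstNumber(String s) {
        Matcher m = DIGITS.matcher(s);
        if (!m.find()) {
            throw new IllegalArgumentException("no number in: " + s);
        }
        return Integer.parseInt(m.group());
    }

    // Sorts the list in place by the number found in each element.
    static void sortByNumber(List<String> list) {
        list.sort(Comparator.comparingInt(NumberedSentences::firstNumber));
    }
}
```

Calling sortByNumber on the LinkedList from the question reorders it to "Number One: 1", "Number two:2", "Number Three: 3".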

Best way to compare multiple string in java

Suppose I have a string "That question is on the minds of every one.".
I want to compare each word in the string with a set of words, i.e. (to, is, on, of), and if any of those words occurs I want to append some string to the existing string.
E.g.
to = append "Hi";
is = append "Hello";
And so on.
To be more specific, I have used StringTokenizer to get each word and compared them with if/else statements. We could also use switch, but switching on strings is only available from JDK 1.7.

I don't know if this is what you mean, but:
You could use String.split() to separate the words from your string like
String[] words = myString.split(" ");
and then, for each word, compare it with the given set
for(String s : words)
{
    switch(s)
    {
        case("to"):
            [...]
    }
}
Or you could just use the String.contains() method without even splitting your string, but I don't know if that's what you wanted.

Use a HashMap<String,String> variable to store your set of words and the replacement words you want. Then split your string with split(), loop through the resulting String[] and for each String in the String[], check whether the HashMap containsKey() that String. Build your output/resulting String in the loop - if the word is contained in the HashMap, replace it with the value of the corresponding key in the HashMap, otherwise use the String you are currently on from the String[].
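A short sketch of that HashMap approach, with invented names (WordReplacer, replaceWords). It follows this answer and replaces each matching word with its mapped value; if the goal is to append instead, the mapped value could be concatenated after the word. It assumes words are separated by single spaces.

```java
import java.util.Map;

class WordReplacer {
    // Replaces every word that is a key in replacements with its mapped value.
    static String replaceWords(String sentence, Map<String, String> replacements) {
        String[] words = sentence.split(" ");
        for (int i = 0; i < words.length; i++) {
            // Keep the original word when the map has no entry for it.
            words[i] = replacements.getOrDefault(words[i], words[i]);
        }
        return String.join(" ", words);
    }
}
```

The map lookup is a single hash probe per word, so adding more words to the set does not add more comparisons per token the way a chain of if/else statements would.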
