Word Break time complexity

I came across the word break problem, which goes something like this:
Given an input string and a dictionary of words, segment the input
string into a space-separated sequence of dictionary words if
possible.
For example, if the input string is "applepie" and the dictionary contains a standard set of English words, then we would return the string "apple pie" as output.
I came up with a quadratic-time solution myself, and I have come across various other quadratic-time solutions using DP.
However, a user on Quora posted a linear-time solution to this problem.
I can't figure out how it comes out to be linear. Is there some mistake in the time complexity calculation? What is the best possible worst-case time complexity for this problem? I am posting the most common DP solution here:
String SegmentString(String input, Set<String> dict) {
    int len = input.length();
    for (int i = 1; i < len; i++) {
        String prefix = input.substring(0, i);
        if (dict.contains(prefix)) {
            String suffix = input.substring(i, len);
            if (dict.contains(suffix)) {
                return prefix + " " + suffix;
            }
        }
    }
    return null;
}

The 'linear' time algorithm that you linked works as follows:
If the string is sharperneedle and the dictionary is sharp, sharper, needle,
it first pushes sharp onto its list of words.
Then it sees that er is not in the dictionary, but that combining it with the last word added gives sharper, which is. Hence it pops the last element and pushes sharper instead.
IMO the above logic fails for the string eaterror and the dictionary eat, eater, error.
Here er pops eat from the list and pushes eater. The remaining string ror is not recognized and is discarded, even though eat error is a valid segmentation.
As regards the code you posted, as mentioned in the comments, it only works for two words with a single partition point.
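For comparison, here is a minimal sketch of the usual bottom-up DP that handles any number of words. The class name WordBreakDp and the cut array are illustrative names, not taken from the question or the Quora post. It makes O(n^2) dictionary lookups; each lookup also pays for building and hashing a substring.

```java
import java.util.ArrayDeque;
import java.util.Arrays;
import java.util.Deque;
import java.util.Set;

final class WordBreakDp {
    /** Returns a space-separated segmentation of input, or null if none exists. */
    static String segment(String input, Set<String> dict) {
        int n = input.length();
        // cut[j] = start index of the last word in one segmentation of input[0..j),
        // or -1 if that prefix cannot be segmented at all.
        int[] cut = new int[n + 1];
        Arrays.fill(cut, -1);
        cut[0] = 0; // the empty prefix is trivially segmentable
        for (int j = 1; j <= n; j++) {
            for (int i = 0; i < j; i++) {
                if (cut[i] != -1 && dict.contains(input.substring(i, j))) {
                    cut[j] = i;
                    break;
                }
            }
        }
        if (n == 0 || cut[n] == -1) {
            return null;
        }
        // Walk the cut points backwards to rebuild the words in order.
        Deque<String> words = new ArrayDeque<>();
        for (int j = n; j > 0; j = cut[j]) {
            words.addFirst(input.substring(cut[j], j));
        }
        return String.join(" ", words);
    }
}
```

With the dictionary eat, eater, error, calling WordBreakDp.segment("eaterror", dict) finds the split after eat, which is the case the greedy pop-and-push approach misses.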

Related

How do I find unique characters in users input?

This algorithm creates a string by taking each unique character in the message in the order they first appear and putting that letter and the number of times it appears in the original message into the shortened string. Your algorithm should ignore any spaces in the message, and any characters which it has already put into the shortened string. For example, the string "I will arrive in Mississippi really soon" becomes "8i1w4l2a3r1v2e2n1m5s2p1y2o".
Here's my code for determining how many unique characters there are. I'm having trouble creating the nested loop to scan the whole string. Help pls!!
boolean used = false;
for (int j = 0; j<i; j++){
    if (input.substring(j,j+1).equals(ltr)){
        used = true;
    }
}
if (!used){
    num++;
    int count = 0;
    for(int k=i; k<input.length(); k++){
        if(input.substring(k,k+1).equals(ltr))
            count++;
    }
}

I am not sure about that. Maybe your nested loop is not right.
Do you actually use a nested loop?
Your code is structured like this: for(){} for(){}
and not like this: for(){ for(){ }}

Your program only scans the current character and the character in the next position to decide whether it is unique or not; that's the problem.
Here is exactly where your problem is:
if (input.substring(j,j+1).equals(ltr)){
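One way to avoid the hand-written nested loops is to count characters in a map that preserves insertion order. This is only a sketch: the class and method names are made up, and it assumes letters are compared case-insensitively, as the "8i" in the expected output suggests.

```java
import java.util.LinkedHashMap;
import java.util.Locale;
import java.util.Map;

final class LetterCounts {
    /** Builds the shortened string: count followed by letter, in order of first appearance. */
    static String shorten(String message) {
        // LinkedHashMap keeps keys in the order they were first inserted.
        Map<Character, Integer> counts = new LinkedHashMap<>();
        for (char c : message.toLowerCase(Locale.ROOT).toCharArray()) {
            if (c == ' ') {
                continue; // spaces are ignored
            }
            counts.merge(c, 1, Integer::sum);
        }
        StringBuilder out = new StringBuilder();
        for (Map.Entry<Character, Integer> e : counts.entrySet()) {
            out.append(e.getValue()).append(e.getKey());
        }
        return out.toString();
    }
}
```

For the message in the question, LetterCounts.shorten("I will arrive in Mississippi really soon") produces 8i1w4l2a3r1v2e2n1m5s2p1y2o.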

Using a trie for string segmentation - time complexity?

Problem to be solved:
Given a non-empty string s and a string array wordArr containing a list
of non-empty words, determine if s can be segmented into a
space-separated sequence of one or more dictionary words. You may
assume the dictionary does not contain duplicate words.
For example, given s = "leetcode", wordArr = ["leet", "code"].
Return true because "leetcode" can be segmented as "leet code".
In the above problem, would it work to build a trie containing each string in wordArr? Then, for each char in the given string s, work down the trie. If a trie branch terminates, that substring is complete, so pass the remaining string back up to the root and do the exact same thing recursively.
This should be O(N) time and O(N) space, correct? I ask because the problem I'm working on says the optimal solution takes O(N^2) time, and I'm not sure what's wrong with my approach.
For example, if s = "hello" and wordArr = ["he", "ll", "ee", "zz", "o"], then "he" will be completed in the first branch of the trie, "llo" will be passed up to the root recursively. Then, "ll" will be completed, so "o" gets passed up to root of trie. Then "o" is completed, which is the end of s, so return true. If the end of s isn't completed, return false.
Is this correct?

Your example would indeed suggest a linear time complexity, but look at this example:
s = "hello"
wordArr = ["hell", "he", "e", "ll", "lo", "l", "h"]
Now, first "hell" is tried, but in the next recursion cycle no solution is found (there is no "o"), so the algorithm needs to backtrack and assume "hell" is not suitable (pun not intended). So you try "he", and at the next level you find "ll", but then it fails again, as there is no "o". Again backtracking is needed: you use "l" instead of "ll", and now the rest matches "lo", so a solution is available: "he l lo".
So no, this does not have O(n) time complexity.

I suspect off-hand that the issue is backtracking. What if the word is not segmentable based on a particular dictionary, or what if there are multiple possible substrings with a common prefix? E.g., suppose the dictionary contains he, llenic, and llo. Failure down one branch of the trie would require backtracking, with some corresponding increase in time complexity.
This is similar to a regex-match problem: the example you give is like testing an input word against
^(he|ll|ee|zz|o)+$
(any number of dictionary members, in any order, and nothing else). I don't know the time complexity of regex matchers offhand, but I know backtracking can get you into serious time trouble.
I did find this answer which says:
Running a DFA-compiled regular expression against a string is indeed O(n), but can require up to O(2^m) construction time/space (where m = regular expression size).
So maybe it is O(n^2) with reduced construction effort.
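The regex view can be tried directly in Java. This is just an illustration of the equivalence using the pattern quoted above; the class name is made up. As far as I know, java.util.regex is a backtracking matcher, so this says nothing about linear time.

```java
import java.util.regex.Pattern;

final class RegexSegmentCheck {
    // Any number of dictionary members, in any order, and nothing else.
    static final Pattern WORDS = Pattern.compile("^(he|ll|ee|zz|o)+$");

    static boolean canSegment(String s) {
        return WORDS.matcher(s).matches();
    }
}
```

RegexSegmentCheck.canSegment("hello") is true (he, ll, o), while canSegment("help") is false.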

Let's start by converting the trie to an NFA. We create an accept node at the root and add an empty-character (epsilon) edge from every word end in the trie back to the root node.
Time complexity: at each step in the trie we can move along at most two edges, the one that represents the current char of the input string and, at a word end, the one back to the root. So
T(n) = 2×T(n-1) + c
That gives us O(2^n).
Indeed not O(n), but you can do better using dynamic programming.
We will use a top-down approach.
Before we solve the problem for any string, we check whether we have already solved it.
We can use another HashMap to store the results for already solved strings.
Whenever a recursive call returns false, store that string in the HashMap.
The idea is to compute the result for every suffix of the word only once. There are only n suffixes, so we end up with O(n^2).
Code from algorithms.tutorialhorizon.com:
Map<String, String> memoized;
Set<String> dict;
String SegmentString(String input) {
    if (dict.contains(input)) return input;
    if (memoized.containsKey(input)) {
        return memoized.get(input);
    }
    int len = input.length();
    for (int i = 1; i < len; i++) {
        String prefix = input.substring(0, i);
        if (dict.contains(prefix)) {
            String suffix = input.substring(i, len);
            String segSuffix = SegmentString(suffix);
            if (segSuffix != null) {
                memoized.put(input, prefix + " " + segSuffix);
                return prefix + " " + segSuffix;
            }
        }
    }
    // Remember failures too, so that each suffix is solved only once.
    memoized.put(input, null);
    return null;
}
And you can do better!
Map<String, String> memoized;
Trie<String> dict; // Trie is not a JDK class; any trie implementation will do
String SegmentString(String input)
{
    if (dict.contains(input))
        return input;
    if (memoized.containsKey(input))
        return memoized.get(input);
    int len = input.length();
    // GetAll(input) is assumed to return every dictionary word that is a prefix of input
    for (String word : dict.GetAll(input))
    {
        String prefix = input.substring(0, word.length());
        String suffix = input.substring(word.length(), len);
        String segSuffix = SegmentString(suffix);
        if (segSuffix != null)
        {
            memoized.put(input, prefix + " " + segSuffix);
            return prefix + " " + segSuffix;
        }
    }
    // Remember failures too, so that each suffix is solved only once.
    memoized.put(input, null);
    return null;
}
By using the trie so that recursive calls are made only when the trie reaches a word end, you get O(z×n), where z is the length (depth) of the trie.
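Since the Trie type above is left undefined, here is a self-contained sketch of the same idea for the true/false version of the problem. All names (TrieWordBreak, Node, canSegment) are illustrative, and the memo is indexed by start position rather than by suffix string; both are my choices, not part of the quoted code.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;

final class TrieWordBreak {
    private static final class Node {
        final Map<Character, Node> next = new HashMap<>();
        boolean wordEnd;
    }

    private final Node root = new Node();

    TrieWordBreak(List<String> words) {
        for (String w : words) {
            Node cur = root;
            for (char c : w.toCharArray()) {
                cur = cur.next.computeIfAbsent(c, k -> new Node());
            }
            cur.wordEnd = true;
        }
    }

    boolean canSegment(String s) {
        return canSegment(s, 0, new Boolean[s.length() + 1]);
    }

    // memo[start] caches whether the suffix beginning at start can be segmented.
    private boolean canSegment(String s, int start, Boolean[] memo) {
        if (start == s.length()) {
            return true;
        }
        if (memo[start] != null) {
            return memo[start];
        }
        Node cur = root;
        boolean ok = false;
        // Walk down the trie; recurse only where a dictionary word ends.
        for (int i = start; i < s.length() && !ok; i++) {
            cur = cur.next.get(s.charAt(i));
            if (cur == null) {
                break; // no dictionary word continues with this character
            }
            if (cur.wordEnd) {
                ok = canSegment(s, i + 1, memo);
            }
        }
        memo[start] = ok;
        return ok;
    }
}
```

Each start position is solved at most once, and each solve walks at most the trie depth, which matches the O(z×n) estimate above.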

How to find all cyclic shifted strings in a given input?

This is a coding exercise. Suppose I have to decide if one string is created by a cyclic shift of another. For example: cab is a cyclic shift of abc but cba is not.
Given two strings s1 and s2 we can do that as follows:
if (s1.length() != s2.length())
    return false;
for (int i = 0; i < s1.length(); i++)
    if ((s1.substring(i) + s1.substring(0, i)).equals(s2))
        return true;
return false;
Now what if I have an array of strings and want to find all strings that are cyclic shift of one another? For example: ["abc", "xyz", "yzx", "cab", "xxx"] -> ["abc", "cab"], ["xyz", "yzx"], ["xxx"]
It looks like I have to check all pairs of the strings. Is there a "better" (more efficient) way to do that?

As a start, you can tell whether a string s1 is a rotation of a string s2 with a length check and a single call to contains(), like this:
public boolean isRotation(String s1, String s2){
    // Without this check, "ab" would count as a rotation of "abab".
    if (s1.length() != s2.length()) return false;
    String s2twice = s2+s2;
    return s2twice.contains(s1);
}
Namely, if s1 is "rotation" and s2 is "otationr", the concatenation gives you "otationrotationr", which does indeed contain s1.
Now, even if we assume this is linear, or close to it (which is not impossible using Rabin-Karp, for instance), you are still left with O(n^2) pair comparisons, which may be too much.
What you could do is build a hashtable where the sorted word is the key, and the posting list contains all the words from your list that, if sorted, give the key (i.e. key("bca") and key("cab") should both return "abc"):
private Map<String, List<String>> index;
/* ... */
public void buildIndex(String[] words){
    for(String word : words){
        String sortedWord = sortWord(word);
        if(!index.containsKey(sortedWord)){
            index.put(sortedWord, new ArrayList<String>());
        }
        index.get(sortedWord).add(word);
    }
}
CAVEAT: The hashtable will contain, for each key, all the words that have exactly the same letters occurring the same amount of times (not just the rotations, ie. "abba" and "baba" will have the same key but isRotation("abba", "baba") will return false).
But once you have built this index, you can significantly reduce the number of pairs you need to consider: if you want all the rotations for "bca" you just need to sort("bca"), look it up in the hashtable, and check (using the isRotation method above, if you want) if the words in the posting list are the result of a rotation or not.

If strings are short compared to the number of strings in the list, you can do significantly better by rotating all strings to some normal form (lexicographic minimum, for example). Then sort lexicographically and find runs of the same string. That's O(n log n), I think... neglecting string lengths. Something to try, maybe.

Concerning the way to find the pairs in the table, there could be many better ways, but what I came up with as a first thought is to sort the table and apply the check to each adjacent pair.
This is much better and simpler than checking every string against every other string in the table.

Consider building an automaton for each string against which you wish to test.
Each automaton should have one entry point for each possible character in the string, and transitions for each character, plus an extra transition from the end to the start.
You could improve performance even further if you amalgamated the automata.

I think a combination of the answers by Patrick87 and savinos would make a fair amount of sense. Specifically, in a Java-esque pseudo-code:
List<String> inputs = Arrays.asList("abc", "xyz", "yzx", "cab", "xxx");
Map<String,List<String>> uniques = new HashMap<String,List<String>>();
for(String value : inputs) {
    String normalized = normalize(value);
    if(!uniques.containsKey(normalized)) {
        uniques.put(normalized, new ArrayList<String>());
    }
    uniques.get(normalized).add(value);
}
// you now have a Map of normalized strings to every string in the input
// that is "equal to" that normalized version
Normalizing the string, as stated by Patrick87, might be best done by picking the rotation of the string that comes first in lexicographic order.
It's worth noting, however, that the "best" algorithm probably relies heavily on the inputs... the number of strings, the length of those string, how many duplicates there are, etc.

You can rotate all the strings to a normalized form using Booth's algorithm (https://en.wikipedia.org/wiki/Lexicographically_minimal_string_rotation) in O(s) time, where s is the length of the string.
You can then use the normalized form as a key in a HashMap (where the value is the set of rotations seen in the input). You can populate this HashMap in a single pass over the data. i.e., for each string
calculate the normalized form
check if the HashMap contains the normalized form as a key - if not insert the empty Set at this key
add the string to the Set in the HashMap
You then just need to output the values of the HashMap. This makes the total runtime of the algorithm O(n * s) - where n is the number of words and s is the average word length. The total space usage is also O(n * s).
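The steps above can be sketched as follows. Note that normalize here is not Booth's algorithm itself: I have swapped in a common two-pointer minimal-rotation routine that is also linear in the string length. The class and method names are illustrative.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

final class RotationGroups {
    /** Returns the lexicographically smallest rotation of s (two-pointer method, not Booth's). */
    static String normalize(String s) {
        int n = s.length();
        int i = 0, j = 1, k = 0; // i and j are candidate start positions, k is the matched length
        while (i < n && j < n && k < n) {
            char a = s.charAt((i + k) % n);
            char b = s.charAt((j + k) % n);
            if (a == b) {
                k++;
                continue;
            }
            // The candidate that compares larger, and the k starts after it, cannot be minimal.
            if (a > b) {
                i += k + 1;
            } else {
                j += k + 1;
            }
            if (i == j) {
                j++;
            }
            k = 0;
        }
        int start = Math.min(i, j);
        return s.substring(start) + s.substring(0, start);
    }

    /** Groups words that are rotations of one another, in order of first appearance. */
    static List<List<String>> group(List<String> words) {
        Map<String, List<String>> byKey = new LinkedHashMap<>();
        for (String w : words) {
            byKey.computeIfAbsent(normalize(w), key -> new ArrayList<>()).add(w);
        }
        return new ArrayList<>(byKey.values());
    }
}
```

For the input from the question, RotationGroups.group(Arrays.asList("abc", "xyz", "yzx", "cab", "xxx")) yields [[abc, cab], [xyz, yzx], [xxx]].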

Looping in a String to find Unicode characters is taking too much time

I am creating a custom field where I want to replace some Unicode characters with pictures. It's like doing emoticons for a BlackBerry device. I have a problem looping over the characters in the edit field and replacing the Unicode characters with images: when the text becomes too long, the loop takes too much time.
My code is as follows:
String aabb = "";
char[] chara = this.getText().toCharArray();
for (int i = loc; i < chara.length; i ++) {
    Character cc = new Character(chara[i]);
    aabb += cc.toString();
    if (unicodeCaracter) {
        //Get the location
        //draw the image in the appropriate X and Y
    }
}
This works fine, and the images end up in the right place. But the problem is that when the text becomes large, the looping takes too much time, and entering text on the device becomes unfriendly.
How can I find the Unicode characters in a text without having to loop over it each time? Is there another way that I missed?
I need help with this issue. Thanks in advance.

Well you're creating a new Character and a new String in each iteration of the loop, and converting the string to a character array to start with. You're also using string concatenation in a loop rather than using a StringBuffer. All of these will be hurting performance.
It's not obvious what you mean by "Unicode characters" here - all characters in Java are Unicode characters. I suspect you really want something like:
String text = this.getText();
StringBuffer buffer = new StringBuffer(text.length());
for (int i = 0; i < text.length(); i++) {
    char c = text.charAt(i);
    buffer.append(c);
    if (c > 127) { // Or whatever
        // Take some action
    }
}
I'm assuming the "take some action" will be changing the buffer in some respect, otherwise the buffer is pointless of course... but fundamentally that's likely to be the sort of change you want.
The string concatenation in a loop is a particularly bad idea - see my article on it for more details.

What takes time is the string concatenation.
Strings are immutable in Java. Each time you do
aabb += cc.toString();
you create a new String object containing all the chars of the previous one, which must be garbage collected, plus the new ones. Use a StringBuilder to build your string:
StringBuilder builder = new StringBuilder(this.getText().length() + 100); // size estimation
char[] chara = this.getText().toCharArray();
for (int i = loc; i < chara.length; i++) {
    builder.append(chara[i]);
    if (unicodeCaracter) {
        //Get the location
        //draw the image in the appropriate X and Y
    }
}
String aabb = builder.toString();

Well, besides speeding up your loop, you could also try and minimize the work load.
If the user is appending text, you could store the last position you scanned the previous time and start from there.
On inserts/deletes you'd need to get the caret position and scan the deleted/inserted part and maybe surrounding characters (if you have character groups instead of single characters that get replaced).
However, fixing loop performance is likely to give you a better improvement in your case, as I doubt you'll have that long strings to make that algorithmic change worthwhile.
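A rough sketch of the "remember where you stopped" idea for the append-only case. Everything here is hypothetical: the class name, the c > 127 test borrowed from the earlier answer as a stand-in for "character that gets an image", and the use of generics, which assumes a modern Java rather than the BlackBerry runtime.

```java
import java.util.ArrayList;
import java.util.List;

final class IncrementalScanner {
    private int scannedUpTo = 0;                           // first index not yet scanned
    private final List<Integer> hits = new ArrayList<>(); // positions of special characters

    /** Scans only the part of text added since the previous call and returns all hits so far. */
    List<Integer> scanAppended(String text) {
        if (text.length() < scannedUpTo) {
            // Text got shorter (a delete): fall back to a full rescan.
            scannedUpTo = 0;
            hits.clear();
        }
        for (int i = scannedUpTo; i < text.length(); i++) {
            if (text.charAt(i) > 127) { // placeholder test, as in the answer above
                hits.add(i);
            }
        }
        scannedUpTo = text.length();
        return hits;
    }
}
```

Inserts in the middle of the text are not handled here; as noted above, those need the caret position and a rescan of the affected region.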

The most important performance enhancements have already been stated but looping backwards will also help in BlackBerry apps.
Programming Tips: General Coding Tips

What is an easy way to tell if a list of words are anagrams of each other?

How would you list words that are anagrams of each other?
I was asked this question when I applied for my current job.
For example, orchestra can be rearranged into carthorse with all original letters used exactly once; therefore, the words are anagrams of each other.

Put all the letters in alphabetical order in the string (sorting algorithm) and then compare the resulting string.
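A minimal sketch of the sort-and-compare approach in Java. The names are illustrative, and ignoring case is an assumption on my part.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Locale;
import java.util.Map;

final class AnagramGroups {
    /** Sorted letters of a word; two words are anagrams exactly when their keys are equal. */
    static String key(String word) {
        char[] letters = word.toLowerCase(Locale.ROOT).toCharArray();
        Arrays.sort(letters);
        return new String(letters);
    }

    /** True if every word in the list is an anagram of the first one. */
    static boolean allAnagrams(List<String> words) {
        for (String w : words) {
            if (!key(w).equals(key(words.get(0)))) {
                return false;
            }
        }
        return true;
    }

    /** Lists the words that are anagrams of each other, grouped by their sorted key. */
    static List<List<String>> group(List<String> words) {
        Map<String, List<String>> byKey = new LinkedHashMap<>();
        for (String w : words) {
            byKey.computeIfAbsent(key(w), k -> new ArrayList<>()).add(w);
        }
        return new ArrayList<>(byKey.values());
    }
}
```

AnagramGroups.allAnagrams(Arrays.asList("orchestra", "carthorse")) returns true, since both words sort to the same key.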

Good thing we all live in the C# reality of in-place sorting of short words on quad core machines with oozles of memory. :-)
However, if you happen to be memory constrained and can't touch the original data and you know that those words contain characters from the lower half of the ASCII table, you could go for a different algorithm that counts the occurrence of each letter in each word instead of sorting.
You could also opt for that algorithm if you want to do it in O(N) and don't care about the memory usage (a counter for each Unicode char can be quite expensive).

Sort each element (removing whitespace) and compare against the previous. If they are all the same, they're all anagrams.

Interestingly enough, Eric Lippert's Fabulous Adventures In Coding Blog dealt with a variation on this very problem on February 4, 2009 in this post.

The following algorithm should work:
Sort the letters in each word.
Sort the sorted lists of letters in each list.
Compare each element in each list for equality.

Sort the letters of each word in the list.
if abc, bca, cab, cba are the inputs, then the sorted list will be abc, abc, abc, abc.
Now all of their Hash codes are equal. Compare the HashCodes.

Sorting the letters and comparing (letter by letter, string compare, ...) is the first thing that comes to mind.

compare length (if not equal, not a chance)
make a bit vector of the length of the strings
for each char in the first string find occurrences of it in the second
set the bit for the first unset occurrence
if you can't find one, stop with fail
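The steps above can be sketched with java.util.BitSet as the bit vector. The class name is made up, and the comparison is case-sensitive as written. It needs no sorting and leaves both strings untouched, at the cost of O(n^2) time.

```java
import java.util.BitSet;

final class BitVectorAnagram {
    static boolean isAnagram(String a, String b) {
        if (a.length() != b.length()) {
            return false; // not equal, not a chance
        }
        BitSet used = new BitSet(b.length()); // one bit per position of the second string
        for (int i = 0; i < a.length(); i++) {
            char c = a.charAt(i);
            // Find the first occurrence of c in b whose bit is still unset.
            int j = b.indexOf(c);
            while (j >= 0 && used.get(j)) {
                j = b.indexOf(c, j + 1);
            }
            if (j < 0) {
                return false; // no unused occurrence left: stop with fail
            }
            used.set(j);
        }
        return true;
    }
}
```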

public static boolean isAnagram(String s, String s1) {
    if (s.length() != s1.length()) {
        return false; // strings of different lengths can never be anagrams
    }
    char[] aArr = s.toLowerCase().toCharArray();
    char[] bArr = s1.toLowerCase().toCharArray();
    // An array to hold the number of occurrences of each character (assumes only a-z)
    int[] counts = new int[26];
    for (int i = 0; i < aArr.length; i++){
        counts[aArr[i]-'a']++; // Increment the count for the character from the first string
        counts[bArr[i]-'a']--; // Decrement the count for the character from the second string
    }
    // If the strings are anagrams, the counts array will be all zeros; otherwise it won't
    for (int i = 0; i<26; i++){
        if (counts[i] != 0)
            return false;
    }
    return true;
}

I tried the hash code logic for anagrams, and it gives me false output:
public static Boolean anagramLogic(String s,String s2){
    char[] ch1 = s.toLowerCase().toCharArray();
    Arrays.sort(ch1);
    char[] ch2= s2.toLowerCase().toCharArray();
    Arrays.sort(ch2);
    // wrong: toString() on a char[] does not return the array's contents
    return ch1.toString().hashCode()==ch2.toString().hashCode();
}
To rectify this code, below is the only option I see; I'd appreciate any recommendations:
public static Boolean anagramLogic(String s,String s2){
    char[] ch1 = s.toLowerCase().toCharArray();
    Arrays.sort(ch1);
    char[] ch2= s2.toLowerCase().toCharArray();
    Arrays.sort(ch2);
    return Arrays.equals(ch1,ch2);
}
