Consider searching a string for an exact match of another string. Is it safe to continue the search at the position where a partial match stopped matching, without getting wrong results?
In code:
int indexOf(string target, string search){
    for(int i=0; i + search.length <= target.length; i++){
        int f=0;
        for(; f < search.length && search[f] == target[i + f]; f++); //empty loop
        if(f == search.length) return i;
        i += f; //is it safe to do this without worrying about missing a match?
    }
    return -1; //no match found
}
The worry is that an exact match starting inside the partial match (somewhere between i and i + f in the code above) could be missed. But I couldn't come up with an example that proves the worry justified. Can you?
There are various string search algorithms here.
I think what you want is the algorithm known as KMP (Knuth-Morris-Pratt).
Yes, you need to worry about it, and an example of why you need to worry about it would be searching for the substring "ananas" in the string "anananas".
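To make the "ananas" counterexample concrete, here is a minimal Java sketch. The class and method names are made up for illustration: indexOfSkipping mirrors the skipping loop from the question, indexOfNaive restarts one position after every failed start.

```java
class SkipSearchDemo {
    // Plain search: after a failed start at i, try again at i + 1.
    static int indexOfNaive(String target, String search) {
        for (int i = 0; i + search.length() <= target.length(); i++) {
            int f = 0;
            while (f < search.length() && search.charAt(f) == target.charAt(i + f)) f++;
            if (f == search.length()) return i;
        }
        return -1;
    }

    // Search from the question: jump past the partial match before retrying.
    static int indexOfSkipping(String target, String search) {
        for (int i = 0; i + search.length() <= target.length(); i++) {
            int f = 0;
            while (f < search.length() && search.charAt(f) == target.charAt(i + f)) f++;
            if (f == search.length()) return i;
            i += f; // the skip under discussion
        }
        return -1;
    }
}
```

For "anananas" and "ananas" the plain search returns 2, while the skipping search matches "anana" at position 0, jumps past it, and returns -1.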
This algorithm creates a string by taking each unique character in the message in the order they first appear and putting that letter and the number of times it appears in the original message into the shortened string. Your algorithm should ignore any spaces in the message, and any characters which it has already put into the shortened string. For example, the string "I will arrive in Mississippi really soon" becomes "8i1w4l2a3r1v2e2n1m5s2p1y2o".
Here's my code for determining how many unique characters there are. I'm having trouble creating the nested loop to scan the whole string. Help pls!!
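// Fragment: i, ltr, input and num are defined by enclosing code not shown here
boolean used = false;
// Has ltr already appeared before position i?
for (int j = 0; j<i; j++){
    if (input.substring(j,j+1).equals(ltr)){
        used = true;
    }
}
if (!used){
    num++;
    int count = 0;
    // Count the occurrences of ltr from position i onward
    for(int k=i; k<input.length(); k++){
        if(input.substring(k,k+1).equals(ltr))
            count++;
    }
}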
I am not sure about that. Maybe your nested loop is not right.
Are you using a nested loop?
Your code is structured like this: for(){} for(){}
and not like this: for(){ for(){ }}
Your program only compares the current character with the one at the next position to decide whether it is unique; that is the problem.
The problem is exactly here:
if (input.substring(j,j+1).equals(ltr)){
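For comparison, here is a hedged sketch of the whole encoding that swaps the nested loops for a LinkedHashMap, which remembers the order in which keys were first inserted. The names LetterCounts and shorten are made up, and lower-casing the message is an assumption inferred from the Mississippi example (where "I" and "i" are counted together).

```java
import java.util.LinkedHashMap;
import java.util.Map;

class LetterCounts {
    // Builds "<count><char>" pairs in order of first appearance, ignoring spaces.
    static String shorten(String message) {
        Map<Character, Integer> counts = new LinkedHashMap<>();
        // Assumption: counting is case-insensitive, as in the example output.
        for (char c : message.toLowerCase().toCharArray()) {
            if (c == ' ') continue;           // spaces are ignored
            counts.merge(c, 1, Integer::sum); // first sight inserts 1, later sights add 1
        }
        StringBuilder out = new StringBuilder();
        for (Map.Entry<Character, Integer> e : counts.entrySet()) {
            out.append(e.getValue()).append(e.getKey());
        }
        return out.toString();
    }
}
```

Calling shorten("I will arrive in Mississippi really soon") yields "8i1w4l2a3r1v2e2n1m5s2p1y2o".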
String s1="Welcome To Java";
String s2="Wela To Java";
Write a Java program whose output shows that "come" is replaced by "a".
If you are sure that a single character replaces multiple characters, you can use this:
String s1="Welcome To Java";
String s2="Wela To Java";
for (int i = 1; i < s2.length()+1; i++){
    char c = s2.charAt(i-1);
    String part1= s2.substring (0,i-1);
    String part2=s2.substring(i);
    if (s1.contains(part1)&&(s1.contains(part2))){
        int t1 = s1.lastIndexOf(part1)+part1.length();
        int t2 = s1.indexOf(part2);
        System.out.println("Found "+s1.substring(t1,t2)+ " is replaced by "+c);
    }
}
Assuming your question is how to replace all instances of the word 'come' with 'a', you should read up on pattern matching and the .replace() method in Java. See the docs for this:
https://docs.oracle.com/javase/7/docs/api/java/lang/String.html#replace(java.lang.CharSequence,%20java.lang.CharSequence)
The above works nicely for either a specific string or more general patterns in the data you're working with.
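A minimal sketch of that call, assuming the goal really is to turn s1 into s2 (the wrapper class and method names are only for illustration):

```java
class ReplaceDemo {
    // String.replace(CharSequence, CharSequence) replaces every literal occurrence.
    static String replaceCome(String s) {
        return s.replace("come", "a");
    }
}
```

Here replaceCome("Welcome To Java") gives "Wela To Java". Note that replace() treats its arguments literally; replaceAll() is the regex-based variant.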
P.S. Your question is unlikely to get a good answer because it's a pretty straightforward thing that requires just a quick search. I can tell you now that asking on Stack Overflow as a last resort is a good policy to follow.
So the idea here is that I'm taking a .csv into a string and each value needs to be stored into a variable. I am unsure how to properly parse a string to do this.
My idea is a function that looks like
final char delim = ',';
int nextItem(String data, int startFrom) {
    if (data.charAt(startFrom) != delim) {
        return data.charAt(startFrom);
    } else {
        return nextItem(data, startFrom + 1);
    }
}
so if I passed it something like
nextItem("45,621,9", 0);
it would return 45
and if I passed it
nextItem("45,621,9", 3);
it would return 621
I'm not sure if I have that set up properly to be recursive, but I could also use a for loop, I suppose. The only real stipulation is that I can't use the substring method.
Please don't use recursion for a matter that can be easily done iteratively. Recursion is expensive in terms of stack and calling frames: A very long string could produce a StackOverflowError.
I suggest you take a look at the standard indexOf method of java.lang.String.
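As a rough sketch of the iterative approach, the following uses indexOf(int, int) to find the next delimiter and charAt to build the number, so substring is not needed. It assumes every item consists only of decimal digits; the class name CsvItems is made up.

```java
class CsvItems {
    static final char DELIM = ',';

    // Returns the number that starts at startFrom and runs to the next delimiter
    // (or to the end of the string). Assumes the item contains only digits 0-9.
    static int nextItem(String data, int startFrom) {
        int end = data.indexOf(DELIM, startFrom);
        if (end == -1) end = data.length(); // last item has no trailing delimiter
        int value = 0;
        for (int i = startFrom; i < end; i++) {
            value = value * 10 + (data.charAt(i) - '0');
        }
        return value;
    }
}
```

With this, nextItem("45,621,9", 0) returns 45 and nextItem("45,621,9", 3) returns 621, matching the calls in the question.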
A good alternative is Regular Expressions.
You can separate the words by treating the comma ',' as the delimiter.
Code
String[] nextItem(String data) {
    String[] words=data.split(",");
    return words;
}
This will return an array of strings containing the words in your input string. Then you can use the array in any way you need.
Hope it helps ;)
Processing comes with a split() function that does exactly what you're describing.
From the reference:
String men = "Chernenko,Andropov,Brezhnev";
String[] list = split(men, ',');
// list[0] is now "Chernenko", list[1] is "Andropov"...
Behind the scenes it's using the String#split() function like H. Sodi's answer, but you should just use this function instead of defining your own.
It sounds easy: you can simply iterate and check. But the problem here is optimization: don't do any needless checks, create needless objects, or perform needless operations.
The algorithm will be tested against a huge set of test cases to verify its efficiency.
Examples:
"aaaa" contains "aa" at the beginning, middle and end.
"baabaabaaaabbaab" contains "baab" at the beginning, middle and end. See the intersection.
And one more thing I forgot to say:
You are not given the substring to check for. You need to find out whether such a substring exists: if it doesn't, return false; if it does, return true.
Find the longest substring satisfying those conditions and return it, or print it (your choice).
A simple Boolean function, right?
Update:
The substring needs to be at least 2 characters shorter than the main string.
Sorry, it was my mistake in the "aaa" example, I fixed it.
You can solve it with KMP, a string matching algorithm. Use it to generate an array fail[]:
fail[i] = max {k | k < i and S[1:k] == S[i-k+1:i]}
Then you can enumerate all possible values of fail[n] (fail[n], fail[ fail[n] ], fail[ fail[fail[n]] ] ...) and check for each whether it also occurs in the middle.
The complexity is O(n).
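Here is a hedged Java sketch of this idea, using a 0-based prefix function pi[] in place of the 1-based fail[] above. It relies on a shortcut rather than walking the whole chain: either the longest border pi[n-1] also shows up somewhere in pi[0..n-2], or the next border in the chain always does. The names BorderInMiddle, prefixFunction and longestMatch are made up for illustration.

```java
class BorderInMiddle {
    // pi[i] = length of the longest proper prefix of s[0..i] that is also its suffix.
    static int[] prefixFunction(String s) {
        int n = s.length();
        int[] pi = new int[n];
        for (int i = 1; i < n; i++) {
            int k = pi[i - 1];
            while (k > 0 && s.charAt(i) != s.charAt(k)) k = pi[k - 1];
            if (s.charAt(i) == s.charAt(k)) k++;
            pi[i] = k;
        }
        return pi;
    }

    // Longest string that is a prefix, a suffix, and also occurs strictly inside s.
    // Returns "" when no such string exists.
    static String longestMatch(String s) {
        int n = s.length();
        if (n < 3) return "";
        int[] pi = prefixFunction(s);
        int k = pi[n - 1];               // longest border of the whole string
        if (k == 0) return "";
        for (int i = 1; i < n - 1; i++) {
            if (pi[i] == k) return s.substring(0, k); // border also ends before the last char
        }
        k = pi[k - 1];                   // next shorter border; it occurs inside the longer one
        return s.substring(0, k);
    }
}
```

For the examples in the question, longestMatch("aaaa") gives "aa" and longestMatch("baabaabaaaabbaab") gives "baab"; the Boolean answer is simply whether the result is non-empty.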
Let's jump the shark:
function the_best_match_at_the_beginning_the_middle_and_the_end( s ){
    print( s );
    return true;
}
This is one of those "you might do significantly better in terms of theoretical complexity, but in reality a linear operation is always faster" answers:
Assume in is your input string and pattern is what you're looking for, and that you're able to read or look up C-standard-library-style functions like strncmp. Let l_in be the number of characters in the input and l_pattern the number of characters in the pattern.
Simply check the start explicitly (strncmp(in, pattern, l_pattern)); then use a bog-standard linear search from the second letter on (strstr(in+1, pattern)):
If strstr didn't find anything, there's neither a middle match nor an end match.
If it's found at the end (the result of strstr is in+l_in-l_pattern), you've got no middle match.
If it's found but not at the end, you've got a middle match. Manually check (strncmp(in+l_in-l_pattern, pattern, l_pattern)) for the end match.
Why is this faster? Because modern computers are pretty well optimized for searching through data linearly; see Bjarne "C++" Stroustrup's "Why you should avoid linked lists". Simply put, letting your CPU run over a contiguous block of memory prefetched into a CPU cache is much, much faster than being "clever" about avoiding a few duplicate checks.
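The same three checks can be sketched in Java, with startsWith, indexOf and endsWith standing in for strncmp and strstr. This is only an illustration of the steps above; the class and method names are made up, and the pattern is assumed to be given.

```java
class LinearCheck {
    // True if pattern occurs at the start, strictly in the middle, and at the end of in.
    static boolean atStartMiddleAndEnd(String in, String pattern) {
        if (pattern.isEmpty() || !in.startsWith(pattern)) return false; // start check
        int pos = in.indexOf(pattern, 1);        // linear search from the second character on
        if (pos == -1) return false;             // neither a middle nor an end match
        if (pos == in.length() - pattern.length()) return false; // found only at the end
        return in.endsWith(pattern);             // middle match found, check the end
    }
}
```

With the examples from the question, atStartMiddleAndEnd("aaaa", "aa") and atStartMiddleAndEnd("baabaabaaaabbaab", "baab") both return true.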
One clean way to approach this is to just check all substrings in the input from the beginning. Compare each substring to see that it exists at the end, and then check to see if it exists in the middle. For the middle check, you can compare against the input string with its first and last characters removed.
public boolean subStrings(String input) {
    if (input == null || input.equals("")) {
        return false;
    }
    if (input.length() == 1) {
        System.out.println(input + " is a match!");
        return true;
    }
    boolean foundIt = false;
    String longestMatch = "";
    // Try every proper prefix; later (longer) matches overwrite earlier ones
    for (int i=1; i < input.length(); ++i) {
        String substring = input.substring(0, i);
        boolean endMatch = input.substring(input.length()-i, input.length()).equals(substring);
        boolean midMatch = input.substring(1, input.length()-1).contains(substring);
        if (endMatch && midMatch) {
            longestMatch = substring;
            foundIt = true;
        }
    }
    if (foundIt) {
        System.out.println(longestMatch + " is a match!");
        return true;
    }
    else {
        return false;
    }
}
subStrings("baabaabaaaabbaab");
Output:
baab is a match!
I came across the word break problem which goes something like this:
Given an input string and a dictionary of words, segment the input
string into a space-separated sequence of dictionary words if
possible.
For example, if the input string is "applepie" and the dictionary contains a standard set of English words, then we would return the string "apple pie" as output.
Now I came up with a quadratic-time solution myself, and I came across various other quadratic-time solutions using DP.
However, on Quora a user posted a linear-time solution to this problem.
I can't figure out how it comes out to be linear. Is there some mistake in the time complexity calculation? What is the best possible worst-case time complexity for this problem? I am posting the most common DP solution here:
String SegmentString(String input, Set<String> dict) {
    int len = input.length();
    for (int i = 1; i < len; i++) {
        String prefix = input.substring(0, i);
        if (dict.contains(prefix)) {
            String suffix = input.substring(i, len);
            if (dict.contains(suffix)) {
                return prefix + " " + suffix;
            }
        }
    }
    return null;
}
The 'linear' time algorithm that you linked here works as follows:
If the string is sharperneedle and dictionary is sharp, sharper, needle,
It pushes sharp onto the list of words.
Then it sees that er is not in the dictionary, but that combining it with the last word added gives sharper, which exists. Hence it pops the last element and pushes sharper instead.
IMO the above logic fails for the string eaterror and the dictionary eat, eater, error.
Here er would pop eat from the list and push eater. The remaining string ror would then not be recognized and would be discarded.
As for the code you posted, as mentioned in the comments, it works only for two words with a single partition point.
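For reference, here is a sketch of a DP that handles any number of words. It is not the Quora algorithm and not the code from the question; WordBreak and segment are made-up names. It performs a quadratic number of dictionary lookups, and prev[end] records where the last word of a valid segmentation of the first end characters begins.

```java
import java.util.ArrayDeque;
import java.util.Arrays;
import java.util.Deque;
import java.util.Set;

class WordBreak {
    // Returns a space-separated segmentation of input, or null if none exists.
    static String segment(String input, Set<String> dict) {
        int n = input.length();
        int[] prev = new int[n + 1];
        Arrays.fill(prev, -1); // -1 means: this prefix cannot be segmented
        prev[0] = 0;           // the empty prefix is trivially segmentable
        for (int end = 1; end <= n; end++) {
            for (int start = 0; start < end; start++) {
                if (prev[start] != -1 && dict.contains(input.substring(start, end))) {
                    prev[end] = start;
                    break;
                }
            }
        }
        if (prev[n] == -1) return null;
        // Walk back from the end to recover the words in order.
        Deque<String> words = new ArrayDeque<>();
        for (int end = n; end > 0; end = prev[end]) {
            words.addFirst(input.substring(prev[end], end));
        }
        return String.join(" ", words);
    }
}
```

On the counterexample above, segment("eaterror", Set.of("eat", "eater", "error")) returns "eat error", because the DP keeps both eat and eater as reachable split points instead of committing to one of them.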