Check if string contains three same letters in Octave - string

I'm trying to make a function in Octave to check whether a string contains three consecutive same characters. That is, if my string is "asdf" it should return 0 and if it's like "asdfffg" it should return 1. What I did so far is this
if(length(findstr(word,"aaa",0)) > 1 || length(findstr(word,"bbb",0)) > 1 || ..
It's verbose and, I think, really inefficient. Any suggestions?

Use a regular expression:
match = regexp(word, '(.)\1{2}', 'once');
This means: match any character ((.)), followed by that same character (\1) twice ({2}). It will return the starting index of the first match, or an empty array if there isn't any match. So your desired result would be
result = ~isempty(match);
Another possibility is to use convolution:
result = any(conv([1 1], +~diff(word))==2);
This works as follows: diff will give 0 when two consecutive characters are the same. So you want to detect if the output of diff contains two consecutive zeros. This is done by negating (~), converting to double (+), convolving with the sequence [1 1] (conv([1 1], ...)), and seeing if 2 is present in the output.
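For readers who want to compare the two ideas outside Octave, here is a minimal sketch in Python (not the answer's own code). The function names `has_triple_regex` and `has_triple_diff` are made up for illustration. The second function mirrors the diff/conv reasoning by looking for two adjacent "equal neighbour" flags.

```python
import re

def has_triple_regex(word):
    """True if some character occurs three times in a row (backreference approach)."""
    return re.search(r"(.)\1{2}", word) is not None

def has_triple_diff(word):
    """Same check, mirroring the diff/conv idea: two adjacent equal-neighbour flags."""
    # same[i] is True when word[i] == word[i + 1], like ~diff(word) in Octave
    same = [a == b for a, b in zip(word, word[1:])]
    # two consecutive True flags mean three identical characters in a row
    return any(x and y for x, y in zip(same, same[1:]))
```

Both functions return a boolean, so `int(has_triple_regex(word))` gives the 0/1 result asked for in the question.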

Related

find number of repeating substrings in a string

I am looking for an algorithm that will find the number of repeating substrings in a single string.
For this, I was looking for some dynamic programming algorithms but didn't find any that would help me. I just want some tutorial on how to do this.
Let's say I have a string ABCDABCDABCD. The expected output for this would be 3, because there is ABCD 3 times.
For input AAAA, output would be 4, since A is repeated 4 times.
For input ASDF, output would be 1, since every individual character is repeated 1 time only.
I hope that someone can point me in the right direction. Thank you.
I am taking the following assumptions:
The repeating substrings must be consecutive. That is, in case of ABCDABC, ABC would not count as a repeating substring, but it would in case of ABCABC.
The repeating substrings must be non-overlapping. That is, in case of ABABA, ABA would not count as a repeating substring, because its two occurrences overlap.
In case of multiple possible answers, we want the one with the maximum value. That is, in the case of AAAA, the answer should be 4 (a is the substring) rather than 2 (aa is the substring).
Under these assumptions, the algorithm is as follows:
Let the input string be denoted as inputString.
Calculate the KMP failure function array for the input string. Let this array be denoted as failure[]. This operation is of linear time complexity with respect to the length of the string. By definition, failure[i] denotes the length of the longest proper prefix of the substring inputString[0....i] that is also a proper suffix of the same substring.
Let len = inputString.length - failure.lastIndexValue. At this point, we know that if there is any repeating string at all, then it has to be of this length len. But we still need to check for that. First, check whether len perfectly divides inputString.length (that is, inputString.length % len == 0). If yes, then check whether every consecutive (non-overlapping) substring of len characters is the same; this operation is again of linear time complexity with respect to the length of the input string.
If it turns out that every consecutive non-overlapping substring is the same, then the answer is inputString.length / len. Otherwise, the answer is simply inputString.length, as there is no such repeating substring present.
The overall time complexity would be O(n), where n is the number of characters in the input string.
A sample code for calculating the KMP failure array is given here.
For example,
Let the input string be abcaabcaabca.
Its KMP failure array would be - [0, 0, 0, 1, 1, 2, 3, 4, 5, 6, 7, 8].
So, our len = (12 - 8) = 4.
And every consecutive non-overlapping substring of length 4 is the same (abca).
Therefore the answer is 12/4 = 3. That is, abca occurs 3 times in a row.
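The steps above can be sketched in Python as follows. This is an illustrative sketch, not the code the answer links to; the names `failure_function` and `repeat_count` are made up here. One assumption to flag: when no repeating unit exists, this sketch returns 1 (the whole string occurs once), following the question's ASDF example, rather than the string length mentioned in the steps.

```python
def failure_function(s):
    """KMP failure array: fail[i] is the length of the longest proper prefix
    of s[0..i] that is also a suffix of s[0..i]."""
    fail = [0] * len(s)
    k = 0  # length of the currently matched prefix
    for i in range(1, len(s)):
        while k > 0 and s[i] != s[k]:
            k = fail[k - 1]  # fall back to the next shorter border
        if s[i] == s[k]:
            k += 1
        fail[i] = k
    return fail

def repeat_count(s):
    """Number of times the shortest repeating unit occurs back to back in s."""
    if not s:
        return 0
    n = len(s)
    unit = n - failure_function(s)[-1]  # candidate length of the repeating unit
    # The unit must divide the length and rebuild the string when repeated.
    if n % unit == 0 and s == s[:unit] * (n // unit):
        return n // unit
    return 1  # assumption: no repetition counts as a single occurrence
```

For the worked example, `failure_function("abcaabcaabca")` ends in 8, so the unit length is 4 and `repeat_count` returns 3.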
The solution for this with C# is:
using System;
using System.Text;
using System.Text.RegularExpressions;

class Program
{
    public static string CountOfRepeatedSubstring(string str)
    {
        if (str.Length < 2)
        {
            return "-1";
        }
        StringBuilder substr = new StringBuilder();
        // The repeating unit cannot be longer than half of the actual string
        for (int i = 0; i < str.Length / 2; i++)
        {
            // Grow the candidate unit by appending the current character
            substr.Append(str[i]);
            string candidate = substr.ToString();
            // Remove every occurrence of the candidate from the actual string.
            // If nothing is left, the string is made only of that candidate.
            string clearedOfNewSubstrings = str.Replace(candidate, "");
            if (clearedOfNewSubstrings.Length == 0)
            {
                // Count the occurrences; Regex.Escape keeps regex
                // metacharacters in the candidate from being interpreted.
                var countOccurrences = Regex.Matches(str, Regex.Escape(candidate)).Count;
                return countOccurrences.ToString();
            }
        }
        return "-1";
    }

    static void Main(string[] args)
    {
        // Input: "abcdaabcdaabcda"  Output: 3
        // Input: "barrybarrybarry"  Output: 3
        var s = "asdf"; // Output will be -1
        Console.WriteLine(CountOfRepeatedSubstring(s));
    }
}
How do you want to specify the "repeating string"? Is it simply the first group of characters up until either a) the first character is found again, b) the pattern begins to repeat, or c) some other criteria?
So, if your string is "ABBAABBA", is that a 2 because "ABBA" repeats twice or is it 1 because you have "ABB" followed by "AAB"? What about "ABCDABCE" -- does "ABC" count (despite the "D" in between repetitions?) In "ABCDABCABCDABC", is the repeating string "ABCD" (1) or "ABCDABC" (2)?
What about "AAABBAAABB" -- is that 3 ("AAA") or 2 ("AAABB")?
If the end of the repeating string is another instance of the first letter, it's pretty simple:
Work your way through the string character by character, putting each character into another variable as you go, until the next character matches the first one. Then, given the length of the substring in your second variable, check the next bit of your string to see if it matches. Continue until it doesn't match or you hit the end of the string.
If you just want to find any length pattern that repeats regardless of whether the first character is repeated within the pattern, it gets more complicated (but, fortunately, it's the sort of thing computers are good at).
You'll need to go character by character building a pattern in another variable as above, but you'll also have to watch for the first character to reappear and start building a second substring as you go, to see if it matches the first. This should probably go in an array as you might encounter a third (or more) instance of the first character which would trigger the need to track yet another possible match.
It's not difficult but there is a lot to keep track of and it's a rather annoying problem. Is there a particular reason you're doing this?

Convert string S to another string T by performing exactly K operations (append to / delete from the end of the string S)

I am trying to solve a problem. But I am missing some corner case. Please help me. The problem statement is:
You have a string, S , of lowercase English alphabetic letters. You can perform two types of operations on S:
Append a lowercase English alphabetic letter to the end of the string.
Delete the last character in the string. Performing this operation on an empty string results in an empty string.
Given an integer, k, and two strings, s and t , determine whether or not you can convert s to t by performing exactly k of the above operations on s.
If it's possible, print Yes; otherwise, print No.
Examples
Input Output
hackerhappy Yes
hackerrank
9
5 delete operations (h,a,p,p,y) and 4 append operations (r,a,n,k)
aba Yes
aba
7
4 delete operations (delete on empty = empty) and 3 append operations
I tried in this way (C language):
int sl = strlen(s); int tl = strlen(t); int diffi=0;
int i;
for(i=0;s[i]&&t[i]&&s[i]==t[i];i++); //going till matching
diffi=i;
((sl-diffi+tl-diffi<=k)||(sl+tl<=k))?printf("Yes"):printf("No");
Please help me to solve this.
Thank You
The number of leftover operations must also be even, because the only way to waste operations is to append a letter and then delete it again.
so maybe:
// c language - strcmp(s,t) returns 0 if s==t.
if(strcmp(s,t))
((sl-diffi+tl-diffi<=k && (k-(sl-diffi+tl-diffi))%2==0)||(sl+tl<=k))?printf("Yes"):printf("No");
else
if(sl+tl<=k||k%2==0) printf("Yes"); else printf("No");
You can also do it another way, using binary search.
Take the shorter string and use its first half (length/2 characters) as the pattern.
1. Compare the pattern, character by character, against both strings. If it matches, append length/4 more characters to the pattern; if that still matches, keep extending by length/2^n. Otherwise, append one character to the original pattern (of length/2) and try again.
2. If the pattern of length/2 does not match, reduce the pattern to length/4; if that matches, append the next character.
Repeat steps 1 and 2 until the matching part is found, and let n1 and n2 be the numbers of mismatched characters in the two strings.
If n1+n2 <= k then the answer is Yes;
otherwise the answer is No.
Example:
s1=Hackerhappy
s2=Hackerrank
pattern=Hacker // s2 is the shorter string: its length is 10, so length/2 = 5
// Do a binary search for the pattern; you will get a match by steps 1 and 2
n1, the number of mismatched characters in s1, is 5
n2, the number of mismatched characters in s2, is 4
Now n1+n2 <= k // because we need that many operations to make the two strings equal
So Yes
This should work for all cases:
int sl = strlen(s); int tl = strlen(t); int diffi=0;
int i,m;
for(i=0;s[i]&&t[i]&&s[i]==t[i];i++); //going till matching
diffi=i;
m = sl+tl-2*diffi;
((k>=m&&(k-m)%2==0)||(sl+tl<=k))?printf("Yes"):printf("No");
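The condition in that last snippet can be restated as a small Python sketch, which makes the two cases easier to test. The function name `can_convert` is made up for illustration; the logic follows the C expression above.

```python
def can_convert(s, t, k):
    """True if s can be turned into t using exactly k append/delete-last operations."""
    # Length of the common prefix of s and t
    common = 0
    for a, b in zip(s, t):
        if a != b:
            break
        common += 1
    # Minimum operations: delete the tail of s, then append the tail of t
    needed = len(s) + len(t) - 2 * common
    # With k >= len(s) + len(t) we can delete all of s (extra deletes on the
    # empty string are free no-ops) and then rebuild t from scratch.
    if k >= len(s) + len(t):
        return True
    # Otherwise spare operations can only be burned in append+delete pairs.
    return k >= needed and (k - needed) % 2 == 0
```

For the two examples from the question, `can_convert("hackerhappy", "hackerrank", 9)` and `can_convert("aba", "aba", 7)` both return True.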

Comparison between strings and integers in matlab

I am doing some classification and needed to convert an integer code to strings for that reason. I wrote something like this:
s(1).class = 1;
s(2).class = 7;
s(3).class = 9;
[s([find([s.class] == 1)]).class] = deal('c1'); %first conversion
[s([find([s.class] > 1)]).class] = deal('c2'); %second conversion
and was surprised to find s being a 1x4 struct array after the second conversion instead of the expected 1x3 struct array with the values.
Now, after some research, I understand that after the first conversion the value of s(1).class is 'c1' and the argument to find in the second conversion is not what I assumed it would be. The [s.class] statement actually returns something like the string 'c1\a\t' with ASCII escape sequences for bell and horizontal tab.
As the comparison does work (returning the matrix [1 1 1 1] and thus expanding my structure) I assume that matlab converts either the operand [s.class] or the operand 1.
Which is it? What actually is compared here numbers or characters?
On the other hand, is there a built-in way to make > more restrictive, i.e. to require the operands to be of the same type and to throw an error if they are not?
When you do the comparison 'ab' > 1, the char array 'ab' gets converted to a double array, namely the ASCII codes of the characters. So 'ab' > 1 is equivalent to double('ab') > 1, which gives [1 1].
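To make the conversion concrete, here is a small Python sketch that imitates it (Python itself refuses to compare a string with a number, so the character codes are taken explicitly with `ord`). The helper name `matlab_like_gt` is made up for illustration. It also reproduces the question's situation: after the first conversion, [s.class] concatenates 'c1' with the numbers 7 and 9 into a four-character array.

```python
def matlab_like_gt(chars, threshold):
    """Compare each character's code with a number, as MATLAB does for char > double."""
    return [ord(c) > threshold for c in chars]

# Fields after the first conversion: 'c1', 7 and 9. Concatenating them turns
# the numbers into the characters with codes 7 (bell) and 9 (tab).
concatenated = "c1" + chr(7) + chr(9)
codes = [ord(c) for c in concatenated]    # character codes being compared
flags = matlab_like_gt(concatenated, 1)   # one flag per character
```

Here `codes` is `[99, 49, 7, 9]`, so all four flags are true, which is why find returns four indices and the struct array grows to 1x4.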
To get the behaviour you want (issue an error if one of the arguments is char) you could define a function:
function z = greaterthan(x,y)
if ischar(x) || ischar(y)
    error('Invalid comparison: one of the input arguments is of type char')
else
    z = x>y;
end
so that
>> greaterthan([0 1 2], 1)
ans =
0 0 1
>> greaterthan('ab', 1)
??? Error using ==> greaterthan at 3
Invalid comparison: one of the input arguments is of type char
Because you have not provided any expected output yet, I am going with the observations.
You are using find to determine which elements of the struct s will be populated with the values passed to deal ('c1' and 'c2'). You have already set the type of s(whatever).class in the first snippet you provided, which means it is a number you are comparing, not a character.
The isa function (or class) tells you which class a variable belongs to. Use it to see what you are actually putting in (for a numeric literal such as 1 it is double, MATLAB's default numeric type).

Match first letter of a string in Tcl

I want to compare the first letter of a string with a known character. For example, I want to check if the string "example"'s first letter matches with "e" or not. I'm sure there must be a very simple way to do it, but I could not find it.
One way is to get the first character with string index:
if {[string index $yourstring 0] eq "e"} {
...
I think it's a good idea to collect the different methods in a single answer.
Assume
set mystring example
set mychar e
The goal is to test whether the first character in $mystring is equal to $mychar.
My suggestion was (slightly edited):
if {[string match $mychar* $mystring]} {
...
This invocation does a glob-style match, comparing $mystring to the character $mychar followed by a sequence of zero or more arbitrary characters. Due to shortcuts in the algorithm, the comparison stops after the first character and is quite efficient.
Donal Fellows:
if {[string index $mystring 0] eq $mychar} {
...
This invocation specifically compares a string consisting of the first character in $mystring with the string $mychar. It uses the efficient eq operator rather than ==; note that eq was only added in Tcl 8.4, so == is the only one available in older versions of Tcl.
Another way to construct a string consisting of the first character in $mystring is by invoking string range $mystring 0 0.
Mark Kadlec:
if {[string first $mychar $mystring] == 0} {
...
This invocation searches the string $mystring for the first occurrence of the character $mychar. If it finds any, it returns with the index where the character was found. This index number is then compared to 0. If they are equal the first character of $mystring was $mychar.
This solution is rather inefficient in the worst case, where $mystring is long and $mychar does not occur in it. The command will then examine the whole string even though only the first character is of interest.
One more string-based solution:
if {[string compare -length 1 $mychar $mystring] == 0} {
...
This invocation compares the first n characters of both strings (n being hardcoded to 1 here): if there is a difference the command will return -1 or 1 (depending on alphabetical order), and if they are equal 0 will be returned.
Another solution is to use a regular expression match:
if {[regexp -- ^$mychar.* $mystring]} {
...
This solution is similar to the string match solution above, but uses regular expression syntax rather than glob syntax. Don't forget the ^ anchor, otherwise the invocation will return true if $mychar occurs anywhere in $mystring.
Documentation: eq and ==, regexp, string
if {[string first e $yourString] == 0} {
...
set mychar "e"
if {[string first $mychar $myString] == 0} {
....

Groovy split without final trim

I'm parsing a CSV file like the following:
"07555555555",25.70,18/11/2010,01/03/2011,N,133,0,36,,896,537,547,,Mr,John,Doe,,
"07555555555",10.15,26/01/2011,01/03/2011,N,16,0,100,,896,537,547,,Mrs,Jane,Doe,,jane#doe.com
The thing is that when using a script like this:
file.eachLine{ line ->
items = line.split(",")
println items.length
}
The result is like the following:
16
18
This makes me think that the split function removes the trailing empty values. I need it to return all the items even if they are empty. Any idea?
I think you want to do:
items = line.split(',', -1)
to make sure you get all the tokens
(according to the javadoc):
The limit parameter controls the number of times the pattern is applied and therefore affects the length of the resulting array. If the limit n is greater than zero then the pattern will be applied at most n - 1 times, the array's length will be no greater than n, and the array's last entry will contain all input beyond the last matched delimiter. If n is non-positive then the pattern will be applied as many times as possible and the array can have any length. If n is zero then the pattern will be applied as many times as possible, the array can have any length, and trailing empty strings will be discarded.
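The effect of the limit on the first sample line can be imitated with a rough Python sketch. This is only a simplified model of the quoted behaviour, not Java's implementation: the helper `split_like_java` is made up for illustration, it takes a literal separator rather than a regular expression, and it only distinguishes a zero limit from a negative one.

```python
def split_like_java(line, sep=",", limit=0):
    """Simplified model: limit == 0 drops trailing empty strings, limit < 0 keeps them."""
    if sep not in line:
        return [line]  # no delimiter found: the whole input is the only item
    parts = line.split(sep)  # Python keeps trailing empty strings by default
    if limit == 0:
        while parts and parts[-1] == "":
            parts.pop()  # discard trailing empty strings only
    return parts

line = '"07555555555",25.70,18/11/2010,01/03/2011,N,133,0,36,,896,537,547,,Mr,John,Doe,,'
default_count = len(split_like_java(line))          # two trailing empties dropped
full_count = len(split_like_java(line, limit=-1))   # all fields kept
```

With the first sample line this gives 16 items for the default limit and 18 with a limit of -1. The empty fields in the middle of the line are kept either way.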

Resources