Essentially, I have two strings of equal length, say 'AGGTCT' and 'AGGCCT' for example's sake. I want to compare them position by position and get a readout of where they do not match. Here I would hope to get 1 out, because there is only one position where they do not match (position 4). If anyone has ideas for the positional comparison code, that would help me a lot to get started.
Thank you!!
Use the following syntax to get the number of dissimilar characters for strings of equal size:
sum( str1 ~= str2 )
If you want to be case insensitive, use:
sum( lower(str1) ~= lower(str2) )
The expression str1 ~= str2 performs char-by-char comparison of the two strings, yielding a logical vector of the same size as the strings, with true where they mismatch (using ~=) and false where they match. To get your result simply sum the number of true values (mismatches).
EDIT: if you want to count the number of matching chars you can:
Use "equal to" == operator (instead of "not-equal to" ~= operator):
sum( str1 == str2 )
Subtract the number of mismatches from the total number of characters:
numel(str1) - sum( str1 ~= str2 )
You can compare all the elements of the two strings:
r = all(seq1 == seq2)
This compares char by char and returns true if all the elements of the resulting array are true. If the strings can have different sizes, you may want to compare the sizes first. An alternative, which returns true when the strings differ in at least one position, is
r = any(seq1 ~= seq2)
Another solution is to use strcmp:
r = strcmp(seq1, seq2)
Just would like to point out that you are asking to calculate the Hamming distance (as you ask for alternatives - the article contains links to some). This is already discussed here. In short, the pdist command (which ships with the Statistics Toolbox rather than the base product) can do it.
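For readers working outside MATLAB, the same positional comparison can be sketched in Python. This is only an illustration: the function name mismatch_positions is made up for this example, and it reports 1-based positions to match the question's "position 4".

```python
def mismatch_positions(a, b):
    """Return the 1-based positions at which two equal-length strings differ."""
    if len(a) != len(b):
        raise ValueError("strings must have equal length")
    # zip pairs up the characters; enumerate numbers the pairs starting at 1
    return [i for i, (x, y) in enumerate(zip(a, b), start=1) if x != y]


positions = mismatch_positions("AGGTCT", "AGGCCT")
print(len(positions), positions)  # 1 [4]
```

The length of the returned list is the Hamming distance, and the list itself gives the readout of where the mismatches are.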
How can I, in ABAP, split a string into n parts AND determine which one is the biggest element? In my solution I would need to know how many elements there are, but I want to solve it for WHATEVER NUMBER of elements.
I tried the code below, and I searched the web.
DATA: string TYPE string VALUE 'this is a string'.
DATA: part1 TYPE c LENGTH 20.
DATA: part2 TYPE c LENGTH 20.
DATA: part3 TYPE c LENGTH 20.
DATA: part4 TYPE c LENGTH 20.
DATA: del TYPE c VALUE ' '.
DATA: bigger TYPE c LENGTH 20.
split: string AT del INTO part1 part2 part3 part4.
bigger = part1.
IF bigger > part2.
bigger = part1.
ELSEIF bigger > part3.
bigger = part2.
ELSE.
bigger = part4.
ENDIF.
WRITE: bigger.
Expected result: Works with any number of elements in a string and determines which one is biggest.
Actual result: I need to know how many elements there are
Here is one way to solve it:
DATA: string TYPE string VALUE 'this is a string'.
TYPES: BEGIN OF ty_words,
word TYPE string,
length TYPE i,
END OF ty_words.
DATA: ls_words TYPE ty_words.
DATA: gt_words TYPE STANDARD TABLE OF ty_words.
START-OF-SELECTION.
WHILE string IS NOT INITIAL.
SPLIT string AT space INTO ls_words-word string.
ls_words-length = strlen( ls_words-word ).
APPEND ls_words TO gt_words.
ENDWHILE.
SORT gt_words BY length DESCENDING.
READ TABLE gt_words
ASSIGNING FIELD-SYMBOL(<ls_longest_word>)
INDEX 1.
IF sy-subrc EQ 0.
WRITE: 'The longest word is:', <ls_longest_word>-word.
ENDIF.
Please note that this does not handle the case where several words share the longest length; it will just show one of them.
You don't need to know the number of split parts if you split the string into an array (an internal table). Then you LOOP over the array and check the string length to find the longest one.
While József Szikszai's solution works, it may be too complex for the functionality you need. This would work just as well (with the same limitation that it will only output the first longest word and no others of the same length):
DATA string TYPE string VALUE 'this is a string'.
DATA parts TYPE STANDARD TABLE OF string.
DATA biggest TYPE string.
FIELD-SYMBOLS <part> TYPE string.
SPLIT string AT space INTO TABLE parts.
LOOP AT parts ASSIGNING <part>.
IF STRLEN( <part> ) > STRLEN( biggest ).
biggest = <part>.
ENDIF.
ENDLOOP.
WRITE biggest.
Edit: I assumed 'biggest' meant longest, but if you actually wanted the word that comes last in alphabetical order, then you could sort the array descending and just output the first entry like this:
DATA string TYPE string VALUE 'this is a string'.
DATA parts TYPE STANDARD TABLE OF string.
DATA biggest TYPE string.
SPLIT string AT space INTO TABLE parts.
SORT parts DESCENDING.
READ TABLE parts INDEX 1 INTO biggest.
WRITE biggest.
With ABAP 7.40, you can also shorten it to:
SPLIT lv_s AT space INTO TABLE DATA(lt_word).
DATA(lv_longest) = REDUCE string( INIT longest = `` FOR <word> IN lt_word NEXT longest = COND #( WHEN strlen( <word> ) > strlen( longest ) THEN <word> ELSE longest ) ).
DATA(lv_alphabetic) = REDUCE string( INIT alph = `` FOR <word> IN lt_word NEXT alph = COND #( WHEN <word> > alph THEN <word> ELSE alph ) ).
If "biggest" means "longest" word, here is the regex way to do this:
FIND ALL OCCURRENCES OF REGEX '\w+' IN string RESULTS DATA(words).
SORT words BY length DESCENDING.
WRITE substring( val = string off = words[ 1 ]-offset len = words[ 1 ]-length ).
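The answers above all report a single word when several share the maximum length. As a language-neutral illustration of the split-then-scan idea that also keeps ties, here is a small Python sketch. The function name longest_words is hypothetical, and note that str.split() splits on any run of whitespace, which is slightly more lenient than SPLIT ... AT space.

```python
def longest_words(text):
    """Return every word of maximal length, in their original order."""
    words = text.split()
    if not words:
        return []
    longest = max(len(word) for word in words)
    return [word for word in words if len(word) == longest]


print(longest_words("this is a string"))  # ['string']
print(longest_words("one two six"))       # ['one', 'two', 'six']
```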
s = 'eljwboboblejr' # don't paste into grader
count = 0
for i in range(len(s)):
    if s[i:i+3] == 'bob':
        count += 1
print('Number of times bob occurs is: ' + str(count))
I do not get how len is working here, or if s[i:i+3] == 'bob'
What happens here is that i steps through every index of s, and in each iteration the slice s[i:i+3] takes the three characters starting at index i. len simply returns the length of s (how many characters it contains) as an integer, so range(len(s)) yields every valid index. The test s[i:i+3] == 'bob' checks whether that three-character slice is exactly 'bob'; if it is, the expression is True and count is incremented. It's not the greatest of explanations, but I hope it helps.
documentation for len is here:
https://docs.python.org/3.2/library/functions.html#len
For strings it is implemented through the special ("magic") method __len__, I believe.
documentation for range is here:
https://docs.python.org/3.2/library/functions.html#range
With one arg, range generates integers 0 to that arg (excluding arg itself).
The slice in the loop evaluates to 'elj', then 'ljw', then 'jwb', ... in subsequent iterations. The slice [a:b] doesn't include the b'th element.
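To see the slices for yourself, here is a small sketch that prints each index next to its three-character window. The helper name count_overlapping is made up for this illustration. Note that near the end of the string the slice is simply shorter than three characters; slicing past the end does not raise an error.

```python
def count_overlapping(s, sub):
    """Count occurrences of sub in s, allowing overlaps."""
    return sum(1 for i in range(len(s)) if s[i:i + len(sub)] == sub)


s = 'eljwboboblejr'

# Show what s[i:i+3] looks like for every index i.
for i in range(len(s)):
    window = s[i:i + 3]
    marker = '  <-- match' if window == 'bob' else ''
    print(i, repr(window) + marker)

print(count_overlapping(s, 'bob'))  # 2
print(s.count('bob'))               # 1
```

The last line shows why the loop is needed: the built-in str.count only counts non-overlapping occurrences, so it misses the second 'bob' that shares a 'b' with the first.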
Is there a function in Octave that returns the position of the first occurrence of a string in a cell array?
I found findstr but this returns a vector, which I do not want. I want what index does but it only works for strings.
If there is no such function, are there any tips on how to go about it?
As findstr is being deprecated, a combination of find and strcmpi may prove useful. strcmpi compares strings by ignoring the case of the letters which may be useful for your purposes. If this is not what you want, use the function without the trailing i, so strcmp. The input into strcmpi or strcmp are the string to search for str and for your case the additional input parameter is a cell array A of strings to search in. The output of strcmpi or strcmp will give you a vector of logical values where each location k tells you whether the string k in the cell array A matched with str. You would then use find to find all locations of where the string matched, but you can further restrain it by specifying the maximum number of locations n as well as where to constrain your search - specifically if you want to look at the first or last n locations where the string matched.
If the desired string is in str and your cell array is stored in A, simply do:
index = find(strcmpi(str, A), 1, 'first');
To reiterate, find on its own would return all locations where the string matched, while the second and third parameters tell it to return only the first index of the result. Specifically, this will return the first occurrence of the desired search string, or the empty array if it can't be found.
Example Run
octave:8> A = {'hello', 'hello', 'how', 'how', 'are', 'you'};
octave:9> str = 'hello';
octave:10> index = find(strcmpi(str, A), 1, 'first')
index = 1
octave:11> str = 'goodbye';
octave:12> index = find(strcmpi(str, A), 1, 'first')
index = [](1x0)
I have been given a binary string of length n, and I need to find the minimum number of operations to perform so that the string does not contain more than k consecutive equal characters.
The only kind of operation I am allowed to perform is to flip any ith character of the string. Flipping a character means changing a '1' to '0' or a '0' to '1'.
for example:
if n = 4 , k = 1 and string = 1001
then Answer:
string = 1010 and minimum operations = 2
I need to also find the new string.
Can anyone tell me an efficient algorithm for solving this problem, considering n <= 10^5?
There's one way:
if k>1:
    if k+1 matching characters are found:
        if a[k+1]==a[k+2]:
            flip a[k+1]
        else if a[k+1]!=a[k+2]:
            flip a[k]
for k=1 you can do it!
Here flipping means from 1 to 0 and vice-versa
For k=1 there are only two possible output strings - the one beginning with 0 and the one beginning with 1. You can check which of them is closer to the input string.
For larger k, you can just look at every run of more than k identical characters, and fix it internally - without changing the characters at either end. For a run of length k' > k you would need floor(k'/(k+1)) flips. It should not be hard to show that this is optimal.
Running time is linear and extra space is constant.
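A minimal Python sketch of the approach described in this answer follows. The function name limit_runs is made up, and the rule for choosing which characters inside a run to flip (every (k+1)-th character, shifted back by one when the last flip would otherwise land on the final character of the run) is one way of "fixing it internally", not something the answer spells out. Unlike the constant-space claim above, this sketch copies the string into a list for simplicity.

```python
def limit_runs(s, k):
    """Return (flips, new_string) such that no run in new_string exceeds k."""
    if k < 1:
        raise ValueError("k must be at least 1")
    n = len(s)
    if k == 1:
        # Only the two alternating strings are valid; pick the closer one.
        candidates = ["".join(p[i % 2] for i in range(n)) for p in ("10", "01")]
        best = min(candidates, key=lambda t: sum(a != b for a, b in zip(s, t)))
        return sum(a != b for a, b in zip(s, best)), best

    chars = list(s)
    flips = 0
    i = 0
    while i < n:
        j = i
        while j < n and s[j] == s[i]:
            j += 1                      # s[i:j] is one maximal run
        length = j - i
        if length > k:
            # Shift by one if the last flip would hit the run's final character,
            # so the flipped character cannot merge with the next run.
            first = i + k if length % (k + 1) else i + k - 1
            for p in range(first, j, k + 1):
                chars[p] = "1" if chars[p] == "0" else "0"
                flips += 1
        i = j
    return flips, "".join(chars)


print(limit_runs("1001", 1))    # (2, '1010')
print(limit_runs("111111", 2))  # (2, '101101')
```

Each run of length L contributes floor(L/(k+1)) flips, matching the count given above.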
There are 2 cases:
1) For k=1
We have 2 possibilities:
a) one that starts with 0:
e.g. 0101010101
b) one that starts with 1:
e.g. 10101010.....
We should now calculate the distance (the number of positions at which the two strings differ) between the input and each possibility. The answer is the possibility that needs the fewest changes.
2) For k>1
res2 = 0; res1 = 1;      // res2 counts the changes, res1 the length of the current run
c1 = A[0];               // c1 holds the previous element
i = 1;
while (A[i] != '\0') {
    if (A[i] == c1) {
        res1++;          // one more consecutive equal element
        if (res1 > k) {
            if (A[i] == A[i+1])
                flip(i);     // flips the ith element
            else
                flip(i-1);   // the run ends here, so flip the previous element instead
            res2++;
            res1 = 1;
        }
    }
    else
        res1 = 1;
    c1 = A[i];
    i++;
}
Many languages have functions for converting string to integer and vice versa. So what happens there? What algorithm is being executed during conversion?
I don't ask in specific language because I think it should be similar in all of them.
To convert a string to an integer, take each character in turn and if it's in the range '0' through '9', convert it to its decimal equivalent. Usually that's simply subtracting the character value of '0'. Now multiply any previous result by 10 and add the new value. Repeat until there are no digits left. If there was a leading '-' minus sign, negate the result.
To convert an integer to a string, start by negating the number if it is negative. Divide the integer by 10 and save the remainder. Convert the remainder to a character by adding the character value of '0'. Push this to the beginning of the string; now repeat with the quotient that you obtained from the division. Repeat until the quotient is zero. Put out a leading '-' minus sign if the number started out negative.
Here are concrete implementations in Python, which in my opinion is the language closest to pseudo-code.
def string_to_int(s):
    i = 0
    sign = 1
    if s[0] == '-':
        sign = -1
        s = s[1:]
    for c in s:
        if not ('0' <= c <= '9'):
            raise ValueError
        i = 10 * i + ord(c) - ord('0')
    return sign * i

def int_to_string(i):
    s = ''
    sign = ''
    if i < 0:
        sign = '-'
        i = -i
    while True:
        remainder = i % 10
        i = i // 10  # integer division; plain / would produce a float in Python 3
        s = chr(ord('0') + remainder) + s
        if i == 0:
            break
    return sign + s
I wouldn't call it an algorithm per se, but depending on the language it will involve the conversion of characters into their integral equivalent. Many languages will either stop on the first character that cannot be represented as an integer (e.g. the letter a), will blindly convert all characters into their ASCII value (e.g. the letter a becomes 97), or will ignore characters that cannot be represented as integers and only convert the ones that can - or return 0 / empty. You have to get more specific on the framework/language to provide more information.
String to integer:
Many (most) languages represent strings, on some level or another, as an array (or list) of characters, which are also short integers. Map the ones corresponding to number characters to their number value. For example, '0' in ascii is represented by 48. So you map 48 to 0, 49 to 1, and so on to 9.
Starting from the left, you multiply your current total by 10, add the next character's value, and move on. (You can make a larger or smaller map, change the number you multiply by at each step, and convert strings of any base you like.)
Integer to string is a longer process involving base conversion to 10. I suppose that since most integers have limited bits (32 or 64, usually), you know that it will come to a certain number of characters at most in a string (20?). So you can set up your own adder and iterate through each place for each bit after calculating its value (2^place).
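As an illustration of the remark about changing the multiplier to handle other bases, here is a Python sketch that generalizes both directions. The names DIGITS, parse_int and format_int are made up for this example. For the integer-to-string direction it uses repeated division by the base, as in the earlier answer, rather than the bit-by-bit adder described above.

```python
DIGITS = "0123456789abcdefghijklmnopqrstuvwxyz"


def parse_int(text, base=10):
    """Convert a string of digits in the given base to an integer."""
    if not 2 <= base <= len(DIGITS):
        raise ValueError("base must be between 2 and 36")
    negative = text.startswith("-")
    digits = text[1:] if negative else text
    if not digits:
        raise ValueError("no digits to convert")
    total = 0
    for ch in digits.lower():
        value = DIGITS.find(ch)          # position in DIGITS is the digit's value
        if value < 0 or value >= base:
            raise ValueError("invalid digit: " + repr(ch))
        total = total * base + value     # shift previous digits left, add new one
    return -total if negative else total


def format_int(number, base=10):
    """Convert an integer to its string representation in the given base."""
    if not 2 <= base <= len(DIGITS):
        raise ValueError("base must be between 2 and 36")
    if number == 0:
        return "0"
    negative = number < 0
    number = abs(number)
    out = []
    while number:
        number, remainder = divmod(number, base)
        out.append(DIGITS[remainder])    # least significant digit comes first
    if negative:
        out.append("-")
    return "".join(reversed(out))


print(parse_int("-1a", 16))  # -26
print(format_int(255, 2))    # 11111111
```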