Octave - return the position of the first occurrence of a string in a cell array - string

Is there a function in Octave that returns the position of the first occurrence of a string in a cell array?
I found findstr but this returns a vector, which I do not want. I want what index does but it only works for strings.
If there is no such function, are there any tips on how to go about it?

As findstr is being deprecated, a combination of find and strcmpi may prove useful. strcmpi compares strings by ignoring the case of the letters which may be useful for your purposes. If this is not what you want, use the function without the trailing i, so strcmp. The input into strcmpi or strcmp are the string to search for str and for your case the additional input parameter is a cell array A of strings to search in. The output of strcmpi or strcmp will give you a vector of logical values where each location k tells you whether the string k in the cell array A matched with str. You would then use find to find all locations of where the string matched, but you can further restrain it by specifying the maximum number of locations n as well as where to constrain your search - specifically if you want to look at the first or last n locations where the string matched.
If the desired string is in str and your cell array is stored in A, simply do:
index = find(strcmpi(str, A)), 1, 'first');
To reiterate, find will find all locations where the string matched, while the second and third parameters tell you to only return the first index of the result. Specifically, this will return the first occurrence of the desired searched string, or the empty array if it can't be found.
Example Run
octave:8> A = {'hello', 'hello', 'how', 'how', 'are', 'you'};
octave:9> str = 'hello';
octave:10> index = find(strcmpi(str, A), 1, 'first')
index = 1
octave:11> str = 'goodbye';
octave:12> index = find(strcmpi(str, A), 1, 'first')
index = [](1x0)

Related

Find biggest element in a String with words?

How can I, in ABAP, split a string into n parts AND determine which one is the biggest element? In my solution I would need to know how many elements there are, but I want to solve it for WHATEVER NUMBER of elements.
I tried the below code. And i searched the web.
DATA: string TYPE string VALUE 'this is a string'.
DATA: part1 TYPE c LENGTH 20.
DATA: part2 TYPE c LENGTH 20.
DATA: part3 TYPE c LENGTH 20.
DATA: part4 TYPE c LENGTH 20.
DATA: del TYPE c VALUE ' '.
DATA: bigger TYPE c LENGTH 20.
split: string AT del INTO part1 part2 part3 part4.
bigger = part1.
IF bigger > part2.
bigger = part1.
ELSEIF bigger > part3.
bigger = part2.
ELSE.
bigger = part4.
ENDIF.
WRITE: bigger.
Expected result: Works with any number of elements in a string and determines which one is biggest.
Actual result: I need to know how many elements there are
Here is one way to solve it:
DATA: string TYPE string VALUE 'this is a string'.
TYPES: BEGIN OF ty_words,
word TYPE string,
length TYPE i,
END OF ty_words.
DATA: ls_words TYPE ty_words.
DATA: gt_words TYPE STANDARD TABLE OF ty_words.
START-OF-SELECTION.
WHILE string IS NOT INITIAL.
SPLIT string AT space INTO ls_words-word string.
ls_words-length = strlen( ls_words-word ).
APPEND ls_words TO gt_words.
ENDWHILE.
SORT gt_words BY length DESCENDING.
READ TABLE gt_words
ASSIGNING FIELD-SYMBOL(<ls_longest_word>)
INDEX 1.
IF sy-subrc EQ 0.
WRITE: 'The longest word is:', <ls_longest_word>-word.
ENDIF.
Please note, it does not cover the case if there are more longest words with the same length, it will just show one of them.
You don't need to know the number of splitted parts if you split the string into an array. Then you LOOP over the array and check the string length to find the longest one.
While József Szikszai's solution works, it may be too complex for the functionality you need. This would work just as well: (also with the same limitation that it willl only output the first longest word and no other ones of the same length)
DATA string TYPE string VALUE 'this is a string'.
DATA parts TYPE STANDARD TABLE OF string.
DATA biggest TYPE string.
FIELD-SYMBOLS <part> TYPE string.
SPLIT string AT space INTO TABLE parts.
LOOP AT parts ASSIGNING <part>.
IF STRLEN( <part> ) > STRLEN( biggest ).
biggest = <part>.
ENDIF.
ENDLOOP.
WRITE biggest.
Edit: I assumed 'biggest' meant longest, but if you actually wanted the word that would be last in an alphabet, then you could sort the array descending and just output the first entry like this:
DATA string TYPE string VALUE 'this is a string'.
DATA parts TYPE STANDARD TABLE OF string.
DATA biggest TYPE string.
SPLIT string AT space INTO TABLE parts.
SORT parts DESCENDING.
READ TABLE parts INDEX 1 INTO biggest.
WRITE biggest.
With ABAP 740, you can also shorten it to:
SPLIT lv_s AT space INTO TABLE DATA(lt_word).
DATA(lv_longest) = REDUCE string( INIT longest = `` FOR <word> IN lt_word NEXT longest = COND #( WHEN strlen( <word> ) > strlen( longest ) THEN <word> ELSE longest ) ).
DATA(lv_alphabetic) = REDUCE string( INIT alph = `` FOR <word> IN lt_word NEXT alph = COND #( WHEN <word> > alph THEN <word> ELSE alph ) ).
If "biggest" means "longest" word here is the Regex way to do this:
FIND ALL OCCURRENCES OF REGEX '\w+' IN string RESULTS DATA(words).
SORT words BY length DESCENDING.
WRITE substring( val = string off = words[ 1 ]-offset len = words[ 1 ]-length ).

VBA: How to find the values after a "#" symbol in a string

I am trying to set the letters after a # symbol to a variable.
For example, x = #BAL
I want to set y = BAL
Or x = #NE
I want y = NE
I am using VBA.
Split() in my opinion is the easiest way to do it:
Dim myStr As String
myStr = "#BAL"
If InStr(, myStr, "#") > 0 Then '<-- Check for your string to not throw error
MsgBox Split(myStr, "#")(1)
End If
As wisely pointed out by Scott Craner, you should check to ensure the string contains the value, which he checks in this comment by doing: y = Split(x,"#")(ubound(Split(x,"#")). Another way you can do it is using InStr(): If InStr(, x, "#") > 0 Then...
The (1) will take everything after the first instance of the character you are looking for. If you were to have used (0), then this would have taken everything before the #.
Similar but different example:
Dim myStr As String
myStr = "#BAL#TEST"
MsgBox Split(myStr, "#")(2)
The message box would have returned TEST because you used (2), and this was the second instance of your # character.
Then you can even split them into an array:
Dim myStr As String, splitArr() As String
myStr = "#BAL#TEST"
splitArr = Split(myStr, "#") '< -- don't append the collection number this time
MsgBox SplitArr(1) '< -- This would return "BAL"
MsgBox SplitArr(2) '< -- This would return "TEST"
If you are looking for additional reading, here is more from the MSDN:
Split Function
Description Returns a zero-based, one-dimensional array containing a specified number of substrings. SyntaxSplit( expression [ ,delimiter [ ,limit [ ,compare ]]] ) The Split function syntax has thesenamed arguments:
expression
Required. String expression containing substrings and delimiters. If expression is a zero-length string(""), Split returns an empty array, that is, an array with no elements and no data.
delimiter
Optional. String character used to identify substring limits. If omitted, the space character (" ") is assumed to be the delimiter. If delimiter is a zero-length string, a single-element array containing the entire expression string is returned.
limit
Optional. Number of substrings to be returned; -1 indicates that all substrings are returned.
compare
Optional. Numeric value indicating the kind of comparison to use when evaluating substrings. See Settings section for values.
You can do the following to get the substring after the # symbol.
x = "#BAL"
y = Right(x,len(x)-InStr(x,"#"))
Where x can be any string, with characters before or after the # symbol.

Is there a way to substring, which is between two words in the string in Python?

My question is more or less similar to:
Is there a way to substring a string in Python?
but it's more specifically oriented.
How can I get a par of a string which is located between two known words in the initial string.
Example:
mySrting = "this is the initial string"
Substring = "initial"
knowing that "the" and "string" are the two known words in the string that can be used to get the substring.
Thank you!
You can start with simple string manipulation here. str.index is your best friend there, as it will tell you the position of a substring within a string; and you can also start searching somewhere later in the string:
>>> myString = "this is the initial string"
>>> myString.index('the')
8
>>> myString.index('string', 8)
20
Looking at the slice [8:20], we already get close to what we want:
>>> myString[8:20]
'the initial '
Of course, since we found the beginning position of 'the', we need to account for its length. And finally, we might want to strip whitespace:
>>> myString[8 + 3:20]
' initial '
>>> myString[8 + 3:20].strip()
'initial'
Combined, you would do this:
startIndex = myString.index('the')
substring = myString[startIndex + 3 : myString.index('string', startIndex)].strip()
If you want to look for matches multiple times, then you just need to repeat doing this while looking only at the rest of the string. Since str.index will only ever find the first match, you can use this to scan the string very efficiently:
searchString = 'this is the initial string but I added the relevant string pair a few more times into the search string.'
startWord = 'the'
endWord = 'string'
results = []
index = 0
while True:
try:
startIndex = searchString.index(startWord, index)
endIndex = searchString.index(endWord, startIndex)
results.append(searchString[startIndex + len(startWord):endIndex].strip())
# move the index to the end
index = endIndex + len(endWord)
except ValueError:
# str.index raises a ValueError if there is no match; in that
# case we know that we’re done looking at the string, so we can
# break out of the loop
break
print(results)
# ['initial', 'relevant', 'search']
You can also try something like this:
mystring = "this is the initial string"
mystring = mystring.strip().split(" ")
for i in range(1,len(mystring)-1):
if(mystring[i-1] == "the" and mystring[i+1] == "string"):
print(mystring[i])
I suggest using a combination of list, split and join methods.
This should help if you are looking for more than 1 word in the substring.
Turn the string into array:
words = list(string.split())
Get the index of your opening and closing markers then return the substring:
open = words.index('the')
close = words.index('string')
substring = ''.join(words[open+1:close])
You may want to improve a bit with the checking for the validity before proceeding.
If your problem gets more complex, i.e multiple occurrences of the pair values, I suggest using regular expression.
import re
substring = ''.join(re.findall(r'the (.+?) string', string))
The re should store substrings separately if you view them in list.
I am using the spaces between the description to rule out the spaces between words, you can modify to your needs as well.

Zipping strings together at arbitrary index and step (Python)

I am working in Python 2.7. I am trying to create a function which can zip a string into a larger string starting at an arbitrary index and with an arbitrary step.
For example, I may want to zip the string ##*#* into the larger string TNAXHAXMKQWGZESEJFPYDMYP starting at the 5th character with a step of 3. The resulting string should be:
TNAXHAX#MK#QW*GZ#ES*EJFPYDMYP
The working function that I came up with is
#Insert one character of string every nth position starting after ith position of text
text="TNAXHAXMKQWGZESEJFPYDMYP"
def zip_in(string,text,i,n):
text=list(text)
for c in string:
text.insert(i+n-1,c)
i +=n
text = ''.join(text)
print text
This function produces the desired result, but I feel that it is not as elegant as it could be.
Further, I would like it to be general enough that I can zip in a string backwards, that is, starting at the ith position of the text, I would like to insert the string in one character at a time with a backwards step.
For example, I may want to zip the string ##*#* into the larger string TNAXHAXMKQWGZESEJFPYDMYP starting at the 22nd position with a step of -3. The resulting string should be:
TNAXHAXMKQW*GZ#ES*EJ#FP#YDMYP
With my current function, I can do this by setting n negative, but if I want a step of -3, I need to set n as -2.
All of this leads me to my question:
Is there a more elegant (or Pythonic) way to achieve my end?
Here are some related questions which don't provide a general answer:
Pythonic way to insert every 2 elements in a string
Insert element in Python list after every nth element
Merge Two strings Together at N & X
You can use some functions from the itertools and more_itertools libraries (make sure to have them) and combine them to get your result : chunked and izip_longest.
# Parameters
s1 = 'ABCDEFGHIJKLMNOPQ' # your string
s2 = '####' # your string of elements to add
int_from = 4 # position from which we start adding letters
step = 2 # we will add in elements of s2 each 2 letters
return_list = list(s1)[:int_from] # keep the first int_from elements unchanged
for letter, char in izip_longest(chunked(list(s1)[int_from:], step), s2, fillvalue=''):
return_list.extend(letter)
return_list.append(char)
Then get your string back by doing :
''.join(return_list)
Output :
# For the parameters above the output is :
>> 'ABCDEF#GH#IJ#KL#MNOPQ'
What does izip_longest(chunked(list(s1)[int_from:], step), s2, fillvalue='') return ?:
for letter, char in izip_longest(chunked(list(s1)[int_from:], step), s2, fillvalue=''):
print(letter, char)
>> Output
>> (['E', 'F'], '#')
(['G', 'H'], '#')
(['I', 'J'], '#')
(['K', 'L'], '#')
(['M', 'N'], '')
(['O', 'P'], '')
(['Q'], '')

Finding indexes of strings in a string array in Matlab

I have two string arrays and I want to find where each string from the first array is in the second array, so i tried this:
for i = 1:length(array1);
cmp(i) = strfind(array2,array1(i,:));
end
This doesn't seem to work and I get an error: "must be one row".
Just for the sake of completeness, an array of strings is nothing but a char matrix. This can be quite restrictive because all of your strings must have the same number of elements. And that's what #neerad29 solution is all about.
However, instead of an array of strings you might want to consider a cell array of strings, in which every string can be arbitrarily long. I will report the very same #neerad29 solution, but with cell arrays. The code will also look a little bit smarter:
a = {'abcd'; 'efgh'; 'ijkl'};
b = {'efgh'; 'abcd'; 'ijkl'};
pos=[];
for i=1:size(a,1)
AreStringFound=cellfun(#(x) strcmp(x,a(i,:)),b);
pos=[pos find(AreStringFound)];
end
But some additional words might be needed:
pos will contain the indices, 2 1 3 in our case, just like #neerad29 's solution
cellfun() is a function which applies a given function, the strcmp() in our case, to every cell of a given cell array. x will be the generic cell from array b which will be compared with a(i,:)
the cellfun() returns a boolean array (AreStringFound) with true in position j if a(i,:) is found in the j-th cell of b and the find() will indeed return the value of j, our proper index. This code is more robust and works also if a given string is found in more than one position in b.
strfind won't work, because it is used to find a string within another string, not within an array of strings. So, how about this:
a = ['abcd'; 'efgh'; 'ijkl'];
b = ['efgh'; 'abcd'; 'ijkl'];
cmp = zeros(1, size(a, 1));
for i = 1:size(a, 1)
for j = 1:size(b, 1)
if strcmp(a(i, :), b(j, :))
cmp(i) = j;
break;
end
end
end
cmp =
2 1 3

Resources