Find biggest element in a String with words? - string

How can I, in ABAP, split a string into n parts AND determine which one is the biggest element? In my solution I would need to know how many elements there are, but I want to solve it for WHATEVER NUMBER of elements.
I tried the code below and searched the web.
DATA: string TYPE string VALUE 'this is a string'.
DATA: part1 TYPE c LENGTH 20.
DATA: part2 TYPE c LENGTH 20.
DATA: part3 TYPE c LENGTH 20.
DATA: part4 TYPE c LENGTH 20.
DATA: del TYPE c VALUE ' '.
DATA: bigger TYPE c LENGTH 20.
split: string AT del INTO part1 part2 part3 part4.
bigger = part1.
IF bigger > part2.
bigger = part1.
ELSEIF bigger > part3.
bigger = part2.
ELSE.
bigger = part4.
ENDIF.
WRITE: bigger.
Expected result: Works with any number of elements in a string and determines which one is biggest.
Actual result: I need to know how many elements there are

Here is one way to solve it:
DATA: string TYPE string VALUE 'this is a string'.
TYPES: BEGIN OF ty_words,
word TYPE string,
length TYPE i,
END OF ty_words.
DATA: ls_words TYPE ty_words.
DATA: gt_words TYPE STANDARD TABLE OF ty_words.
START-OF-SELECTION.
WHILE string IS NOT INITIAL.
SPLIT string AT space INTO ls_words-word string.
ls_words-length = strlen( ls_words-word ).
APPEND ls_words TO gt_words.
ENDWHILE.
SORT gt_words BY length DESCENDING.
READ TABLE gt_words
ASSIGNING FIELD-SYMBOL(<ls_longest_word>)
INDEX 1.
IF sy-subrc EQ 0.
WRITE: 'The longest word is:', <ls_longest_word>-word.
ENDIF.
Please note that this does not handle ties: if several words share the maximum length, it shows only one of them.
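For comparison, here is a minimal sketch of the tie case in Python (not ABAP), assuming "biggest" means "longest". The function name longest_words is made up for illustration; the idea is to find the maximum length first and then keep every word that reaches it.

```python
def longest_words(text):
    """Return every word that shares the maximum length, in original order."""
    words = text.split()  # whitespace split; works for any number of parts
    if not words:
        return []
    max_len = max(len(w) for w in words)
    return [w for w in words if len(w) == max_len]


print(longest_words("this is a string"))     # ['string']
print(longest_words("one two three seven"))  # ['three', 'seven']
```

The same two-pass idea should carry over to the internal table above: determine the maximum length once, then LOOP and keep every row whose length equals it.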

You don't need to know the number of parts if you split the string into an internal table. Then you LOOP over the table and check the string length to find the longest one.
While József Szikszai's solution works, it may be too complex for the functionality you need. This would work just as well (with the same limitation that it will only output the first longest word and not any others of the same length):
DATA string TYPE string VALUE 'this is a string'.
DATA parts TYPE STANDARD TABLE OF string.
DATA biggest TYPE string.
FIELD-SYMBOLS <part> TYPE string.
SPLIT string AT space INTO TABLE parts.
LOOP AT parts ASSIGNING <part>.
IF STRLEN( <part> ) > STRLEN( biggest ).
biggest = <part>.
ENDIF.
ENDLOOP.
WRITE biggest.
Edit: I assumed 'biggest' meant longest, but if you actually wanted the word that comes last in alphabetical order, you could sort the table descending and just output the first entry like this:
DATA string TYPE string VALUE 'this is a string'.
DATA parts TYPE STANDARD TABLE OF string.
DATA biggest TYPE string.
SPLIT string AT space INTO TABLE parts.
SORT parts DESCENDING.
READ TABLE parts INDEX 1 INTO biggest.
WRITE biggest.

With ABAP 740, you can also shorten it to:
SPLIT lv_s AT space INTO TABLE DATA(lt_word).
DATA(lv_longest) = REDUCE string( INIT longest = `` FOR <word> IN lt_word NEXT longest = COND #( WHEN strlen( <word> ) > strlen( longest ) THEN <word> ELSE longest ) ).
DATA(lv_alphabetic) = REDUCE string( INIT alph = `` FOR <word> IN lt_word NEXT alph = COND #( WHEN <word> > alph THEN <word> ELSE alph ) ).
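The REDUCE expressions above are folds: they carry a "best so far" value through the table and replace it whenever the current word beats it. As a rough illustration of the same pattern outside ABAP, here is a sketch in Python using functools.reduce; the variable names are only for illustration.

```python
from functools import reduce

words = "this is a string".split()

# Keep the current best unless the next word is strictly longer.
longest = reduce(lambda best, w: w if len(w) > len(best) else best, words, "")

# Same fold, but comparing the words themselves instead of their lengths.
alphabetic = reduce(lambda best, w: w if w > best else best, words, "")

print(longest)     # string
print(alphabetic)  # this
```

Because the comparison is strict, the first of several equally long words wins, which matches the behaviour of the loop-based answers.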

If "biggest" means "longest" word here is the Regex way to do this:
FIND ALL OCCURRENCES OF REGEX '\w+' IN string RESULTS DATA(words).
SORT words BY length DESCENDING.
WRITE substring( val = string off = words[ 1 ]-offset len = words[ 1 ]-length ).

Related

How to separate a string by Capital Letter?

I currently have code in ABAP that contains a string made up of multiple words, each starting with a capital (uppercase) letter, with no spaces in between.
I have to separate it into an internal table like this:
INPUT :
NameAgeAddress
OUTPUT :
Name
Age
Address
Here is the shortest code I could find, which uses a regular expression combined with SPLIT:
SPLIT replace( val = 'NameAgeAddress' regex = `(?!^.)\u` with = ` $0` occ = 0 )
AT ` `
INTO TABLE itab.
So, replace converts 'NameAgeAddress' into 'Name Age Address' and SPLIT puts the 3 words into an internal table.
Details:
(?!^.) to say the next character to find (\u) should not be the first character
\u being any upper case letter
$0 to replace the found string ($0) by itself preceded with a space character
occ = 0 to replace all occurrences
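To show the "insert a space, then split" idea in isolation, here is a small sketch in Python rather than ABAP. It is a variation, not a translation: instead of matching the capital and re-inserting it with $0, it uses a zero-width lookahead so that only a space is inserted. The function name split_on_capitals is hypothetical.

```python
import re


def split_on_capitals(text):
    # Insert a space before every capital letter except one at the very start,
    # then split on that space.
    spaced = re.sub(r"(?<!^)(?=[A-Z])", " ", text)
    return spaced.split(" ")


print(split_on_capitals("NameAgeAddress"))  # ['Name', 'Age', 'Address']
```

Like the ABAP version, this assumes the input itself contains no spaces.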
Unfortunately, the SPLIT statement in ABAP does not allow a regex as separator expression. Therefore, we have to use progressive matching, which is a bit awkward in ABAP:
report zz_test_split_capital.
parameters: p_input type string default 'NameAgeAddress' lower case.
data: output type stringtab,
off type i,
moff type i,
mlen type i.
while off < strlen( p_input ).
find regex '[A-Z][^A-Z]*'
in section offset off of p_input
match offset moff match length mlen.
if sy-subrc eq 0.
append substring( val = p_input off = moff len = mlen ) to output.
off = moff + mlen.
else.
exit.
endif.
endwhile.
cl_demo_output=>display_data( output ).
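The progressive-matching loop above can be sketched in Python as follows, purely as an illustration of the technique: search from the current offset, store the match, then continue after the end of the match. The function name progressive_split is made up for this example.

```python
import re


def progressive_split(text):
    pattern = re.compile(r"[A-Z][^A-Z]*")  # one capital plus following non-capitals
    parts = []
    offset = 0
    while offset < len(text):
        match = pattern.search(text, offset)  # start searching at the offset
        if match is None:
            break
        parts.append(match.group())
        offset = match.end()  # continue right after the match
    return parts


print(progressive_split("NameAgeAddress"))  # ['Name', 'Age', 'Address']
```

As in the ABAP report, any characters before the first capital letter are skipped.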
Just for comparison, the following statement would do the job in Perl:
my $input = "NameAgeAddress";
my @output = split /(?=[A-Z])/, $input;
# gives @output = ('Name','Age','Address')
It is easy using regular expressions. The solution could look like this:
REPORT ZZZ.
DATA: g_string TYPE string VALUE `NameAgeAddress`.
DATA(gcl_regex) = NEW cl_abap_regex( pattern = `[A-Z]{1}[a-z]+` ).
DATA(gcl_matcher) = gcl_regex->create_matcher( text = g_string ).
WHILE gcl_matcher->find_next( ).
DATA(g_match_result) = gcl_matcher->get_match( ).
WRITE / g_string+g_match_result-offset(g_match_result-length).
ENDWHILE.
For when regular expressions are just overkill and plain old ABAP will do:
DATA(str) = 'NameAgeAddress'.
IF str CA sy-abcde.
DATA(off) = 0.
DO.
data(tailstart) = off + 1.
IF str+tailstart CA sy-abcde.
DATA(len) = sy-fdpos + 1.
WRITE: / str+off(len).
add len to off.
ELSE.
EXIT.
ENDIF.
ENDDO.
write / str+off.
ENDIF.
If you do not want to use regex, or cannot use it, here is another solution:
DATA: lf_input TYPE string VALUE 'NameAgeAddress',
lf_offset TYPE i,
lf_current_letter TYPE char1,
lf_letter_in_capital TYPE char1,
lf_word TYPE string,
lt_word LIKE TABLE OF lf_word.
DO strlen( lf_input ) TIMES.
lf_offset = sy-index - 1.
lf_current_letter = lf_input+lf_offset(1).
lf_letter_in_capital = to_upper( lf_current_letter ).
IF lf_current_letter = lf_letter_in_capital.
APPEND INITIAL LINE TO lt_word ASSIGNING FIELD-SYMBOL(<ls_word>).
ENDIF.
IF <ls_word> IS ASSIGNED. "if input string does not start with capital letter
<ls_word> = <ls_word> && lf_current_letter.
ENDIF.
ENDDO.

Octave - return the position of the first occurrence of a string in a cell array

Is there a function in Octave that returns the position of the first occurrence of a string in a cell array?
I found findstr but this returns a vector, which I do not want. I want what index does but it only works for strings.
If there is no such function, are there any tips on how to go about it?
Since findstr is being deprecated, a combination of find and strcmpi may prove useful. strcmpi compares strings while ignoring case; if that is not what you want, use strcmp (the same function without the trailing i). Both take the string to search for, str, and, in your case, a cell array A of strings to search in. The output is a vector of logical values in which element k tells you whether string k in A matched str. You then use find to locate the matches. By default it returns every matching location, but you can restrict it by specifying the maximum number of locations n and whether to take the first or the last n matches.
If the desired string is in str and your cell array is stored in A, simply do:
index = find(strcmpi(str, A), 1, 'first');
To reiterate, find locates all positions where the string matched, while the second and third parameters tell it to return only the first index of the result. Specifically, this returns the first occurrence of the searched string, or the empty array if it can't be found.
Example Run
octave:8> A = {'hello', 'hello', 'how', 'how', 'are', 'you'};
octave:9> str = 'hello';
octave:10> index = find(strcmpi(str, A), 1, 'first')
index = 1
octave:11> str = 'goodbye';
octave:12> index = find(strcmpi(str, A), 1, 'first')
index = [](1x0)
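For readers comparing languages, the same "first match or nothing" lookup can be sketched in Python. This is only an illustration: the function name first_index and the ignore_case flag are made up, positions are 0-based rather than Octave's 1-based indices, and None stands in for the empty array.

```python
def first_index(target, items, ignore_case=True):
    """Return the 0-based position of the first match, or None if absent."""
    key = str.lower if ignore_case else (lambda s: s)
    wanted = key(target)
    for position, item in enumerate(items):
        if key(item) == wanted:
            return position  # stop at the first hit, like find(..., 1, 'first')
    return None


A = ["hello", "hello", "how", "how", "are", "you"]
print(first_index("hello", A))    # 0
print(first_index("goodbye", A))  # None
```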

Matlab. Find the indices of a cell array of strings with characters all contained in a given string (without repetition)

I have one string and a cell array of strings.
str = 'actaz';
dic = {'aaccttzz', 'ac', 'zt', 'ctu', 'bdu', 'zac', 'zaz', 'aac'};
I want to obtain:
idx = [2, 3, 6, 8];
I have written a very long code that:
finds the elements with length not greater than length(str);
removes the elements with characters not included in str;
finally, for each remaining element, checks the characters one by one
Essentially, it's an almost brute force code and runs very slowly. I wonder if there is a simple way to do it fast.
NB: I have just edited the question to make clear that characters can be repeated n times if they appear n times in str. Thanks Shai for pointing it out.
You can sort the strings and then match them using regular expression. For your example the pattern will be ^a{0,2}c{0,1}t{0,1}z{0,1}$:
u = unique(str);
t = ['^' sprintf('%c{0,%d}', [u; histc(str,u)]) '$'];
s = cellfun(@sort, dic, 'uni', 0);
idx = find(~cellfun('isempty', regexp(s, t)));
I came up with this :
>> g=@(x,y) sum(x==y) <= sum(str==y);
>> h=@(t)sum(arrayfun(@(x)g(t,x),t))==length(t);
>> f=cellfun(@(x)h(x),dic);
>> find(f)
ans =
2 3 6
g & h: check that each letter occurs in the search string no more often than it occurs in str.
f : finally use g and h for each element in dic
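Both answers boil down to a per-letter count comparison: a word qualifies if it uses no letter more often than str contains it. As a cross-check of the expected result, here is a sketch of that counting rule in Python with collections.Counter. The name fits_within is hypothetical, and the variable letters plays the role of str from the question.

```python
from collections import Counter


def fits_within(word, letters):
    """True if word uses no letter more often than it occurs in letters."""
    available = Counter(letters)  # missing letters count as 0
    needed = Counter(word)
    return all(count <= available[ch] for ch, count in needed.items())


letters = "actaz"
dic = ["aaccttzz", "ac", "zt", "ctu", "bdu", "zac", "zaz", "aac"]

# 1-based indices, to match the MATLAB convention used in the question.
idx = [i for i, word in enumerate(dic, start=1) if fits_within(word, letters)]
print(idx)  # [2, 3, 6, 8]
```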

Matlab - How do I compare two strings letter by letter?

Essentially, I have two strings of equal length, let's say 'AGGTCT' and 'AGGCCT' for example's sake. I want to compare them position by position and get a readout of where they do not match. So here I would hope to get 1 out, because there is only one position where they do not match (position 4). If anyone has ideas for the positional comparison code, that would help me a lot to get started.
Thank you!!
Use the following syntax to get the number of dissimilar characters for strings of equal size:
sum( str1 ~= str2 )
If you want to be case insensitive, use:
sum( lower(str1) ~= lower(str2) )
The expression str1 ~= str2 performs char-by-char comparison of the two strings, yielding a logical vector of the same size as the strings, with true where they mismatch (using ~=) and false where they match. To get your result simply sum the number of true values (mismatches).
EDIT: if you want to count the number of matching chars you can:
Use "equal to" == operator (instead of "not-equal to" ~= operator):
sum( str1 == str2 )
Subtract the number of mismatches from the total number of characters:
numel(str1) - sum( str1 ~= str2 )
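The element-wise comparison can also be spelled out position by position. The following Python sketch is only an illustration of the same idea, with the hypothetical function name mismatch_positions. It returns the 1-based positions that differ, so both the count and the locations are available.

```python
def mismatch_positions(a, b):
    """Return the 1-based positions at which two equal-length strings differ."""
    if len(a) != len(b):
        raise ValueError("strings must have equal length")
    return [i for i, (x, y) in enumerate(zip(a, b), start=1) if x != y]


positions = mismatch_positions("AGGTCT", "AGGCCT")
print(len(positions), positions)  # 1 [4]
```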
You can compare all the elements of the strings:
r = all(seq1 == seq2)
This compares char by char and returns true if all the elements in the resulting array are true. If the strings can have different sizes, you may want to compare the sizes first. An alternative, which returns true as soon as at least one position differs, is
r = any(seq1 ~= seq2)
Another solution is to use strcmp:
r = strcmp(seq1, seq2)
I would just like to point out that you are asking how to calculate the Hamming distance (since you ask for alternatives, the article contains links to some). This is already discussed here. In short, the pdist command (part of the Statistics Toolbox) can do it.

Looping string characters?

How can I read each character in a String? For example, I want to read each character in String "a7m4d0". After that I want to verify that each character is a character or a number. Any tips or ideas?
DATA: smth TYPE string VALUE `qwert1yua22sd123bnm,`,
index TYPE i,
length TYPE i,
char TYPE c,
num TYPE i.
length = STRLEN( smth ).
WHILE index < length.
char = smth+index(1).
TRY .
num = char.
WRITE: / num,'was a number'.
CATCH cx_sy_conversion_no_number.
WRITE: / char,'was no number'.
ENDTRY.
ADD 1 TO index.
ENDWHILE.
Here's your problem solved :P
A bit convoluted and on a recent 740 ABAP server. :)
DATA: lv_text TYPE string VALUE `a7m4d0`.
DO strlen( lv_text ) TIMES.
DATA(lv_single) = substring( val = lv_text off = sy-index - 1 len = 1 ) && ` is ` &&
COND string( WHEN substring( val = lv_text off = sy-index - 1 len = 1 ) CO '0123456789' THEN 'Numeric'
ELSE 'Character' ).
WRITE : / lv_single.
ENDDO.
Here is how you can access a single character within a string:
This example will extract out the character "t" into the variable "lv_char1".
DATA: lv_string TYPE char10,
lv_char1 TYPE char1.
lv_string = 'Something'.
lv_char1 = lv_string+4(1).
Appending "+4" to the string name specifies the offset from the start of the string (in this case 4), and "(1)" specifies the number of characters to pick up.
See the documentation here for more info:
http://help.sap.com/saphelp_nw04/Helpdata/EN/fc/eb341a358411d1829f0000e829fbfe/content.htm
If you want to look at each character in turn, you could get the length of the field using "strlen( )" and do a loop for each character.
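The "loop over every position and test the character" idea can be sketched like this in Python, purely as an illustration. The name classify and the constant DIGITS are made up here, and, as in the ABAP answers, anything that is not one of the ten digits is simply labelled a character.

```python
DIGITS = "0123456789"


def classify(text):
    """Pair each character with 'number' or 'character'."""
    return [(ch, "number" if ch in DIGITS else "character") for ch in text]


for ch, kind in classify("a7m4d0"):
    print(ch, "is a", kind)
```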
One more approach
PERFORM analyze_string USING `qwert1yua22sd123bnm,`.
FORM analyze_string USING VALUE(p_string) TYPE string.
WHILE p_string IS NOT INITIAL.
IF p_string(1) CA '0123456789'.
WRITE: / p_string(1) , 'was a number'.
ELSE.
WRITE: / p_string(1) , 'was no number'.
ENDIF.
p_string = p_string+1.
ENDWHILE.
ENDFORM.
No DATA statements, string functions or explicit indexing required.
I know the post is old, but this might be useful. This is what I use :)
DATA lv_counter TYPE i.
DO STRLEN( lv_word ) TIMES.
IF lv_word+lv_counter(1) CA '0123456789'.
"It's a number
ENDIF.
lv_counter = lv_counter + 1.
ENDDO.

Resources