Delete repeated characters in strings in a cell array

I have a cell array like this:
Input = {'CEEEGH';'CCEEG';'ABCDEFF';'BCFGG';'BCDEEG';'BEFFH';'AACEGH'}
How can I delete all of the repeated characters and keep only one occurrence of each character in every string of Input? The expected output should be:
Output = {'CEGH';'CEG';'ABCDEF';'BCFG';'BCDEG';'BEFH';'ACEGH'}

Use:
cellfun(@unique,Input,'UniformOutput',0)
ans =
'CEGH'
'CEG'
'ABCDEF'
'BCFG'
'BCDEG'
'BEFH'
'ACEGH'
EDIT:
To preserve the original order in case the letters are not sorted, as @thewaywewalk commented, you can use:
cellfun(@(x) unique(x,'stable'),Input,'UniformOutput',0)
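For comparison, here is a minimal Python sketch of the same two behaviours, assuming a plain list of strings in place of the cell array: sorting the distinct characters (what unique returns by default) and keeping first occurrences in their original order (the analogue of 'stable'). The function names are hypothetical.

```python
def unique_sorted(s):
    # Like MATLAB unique(s): distinct characters in sorted order.
    return "".join(sorted(set(s)))

def unique_stable(s):
    # Like unique(s, 'stable'): dict keys keep insertion order (Python 3.7+),
    # so each character is kept at its first occurrence.
    return "".join(dict.fromkeys(s))

inputs = ['CEEEGH', 'CCEEG', 'ABCDEFF', 'BCFGG', 'BCDEEG', 'BEFFH', 'AACEGH']
print([unique_stable(s) for s in inputs])
# ['CEGH', 'CEG', 'ABCDEF', 'BCFG', 'BCDEG', 'BEFH', 'ACEGH']
```

Because every input here is already sorted, both functions give the same result; they only differ for input such as 'BCAB', where the sorted version returns 'ABC' and the stable one returns 'BCA'.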

Related

Splitting a string at capital letters into separate words

Suppose you have a string "TodayIsABeautifulDay". How can we separate it in Python into words like this: ["Today", "Is", "A", "Beautiful", "Day"]?
First, create a list ‘words’ whose first entry holds the first letter of ‘word’.
Then loop over the remaining characters: if the current character is uppercase, begin a new word with it; otherwise, append it to the current word.
def split_words(word):
    # Start the first word with the first character.
    words = [[word[0]]]
    for char in word[1:]:
        if char.isupper():
            # An uppercase letter begins a new word.
            words.append([char])
        else:
            # Anything else continues the current word.
            words[-1].append(char)
    return [''.join(w) for w in words]
You can use this function like this:
word = "TodayIsABeautifulDay"
print(split_words(word))
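A shorter alternative swaps in a regular expression for the loop above. The sketch below uses Python's standard re module and assumes the string starts with an uppercase letter and contains only ASCII letters; any lowercase characters before the first capital would be dropped. The function name is hypothetical.

```python
import re

def split_words_re(word):
    # Each match is one uppercase letter followed by any run of lowercase letters,
    # so a lone capital such as the "A" in "ABeautiful" becomes its own word.
    return re.findall(r"[A-Z][a-z]*", word)

print(split_words_re("TodayIsABeautifulDay"))
# ['Today', 'Is', 'A', 'Beautiful', 'Day']
```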

How to remove the non-alphanumeric characters from a string and split the result?

def tokenize(s):
    string = s.lower().split()
    getVals = list([val for val in s if val.isalnum()])
    result = "".join(getVals)
    print(result)
tokenize('AKKK#eastern B!##est!')
I'm trying to get the output ('akkkeastern', 'best'),
but the output of the code above is AKKKeasternBest.
What changes should I make?
A list comprehension is a good way to filter elements out of a sequence such as a string. In the example below, the list comprehension builds a list of the characters (in Python, characters are also strings) that are either alphanumeric or a space; the space is kept so it can be used later to split the result. The comprehension also lowercases each character. Once the filtered list is built, join turns it back into a string, and split breaks that string into words at the space.
Example:
string = 'AKKK#eastern B!##est!'
# Removes non-alpha chars, but preserves space
filtered = [
    char.lower()
    for char in string
    if char.isalnum() or char == " "
]
# String-ifies filtered list, and splits on space
result = "".join(filtered).split()
print(result)
Output:
['akkkeastern', 'best']
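Since the question asks for a tuple, the same filter can be wrapped back into the original tokenize function and made to return a value instead of printing it. This is a minimal sketch; it uses str.isspace(), so tabs and newlines also count as separators, which is a small departure from the char == " " test above.

```python
def tokenize(s):
    # Keep alphanumeric characters and whitespace, lowercasing as we go.
    filtered = "".join(ch.lower() for ch in s if ch.isalnum() or ch.isspace())
    # split() with no argument splits on runs of whitespace and drops empties.
    return tuple(filtered.split())

print(tokenize('AKKK#eastern B!##est!'))  # ('akkkeastern', 'best')
```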

Find the biggest element in a string of words?

How can I, in ABAP, split a string into n parts AND determine which one is the biggest element? My solution needs to know how many elements there are, but I want it to work for ANY number of elements.
I tried the code below and searched the web.
DATA: string TYPE string VALUE 'this is a string'.
DATA: part1 TYPE c LENGTH 20.
DATA: part2 TYPE c LENGTH 20.
DATA: part3 TYPE c LENGTH 20.
DATA: part4 TYPE c LENGTH 20.
DATA: del TYPE c VALUE ' '.
DATA: bigger TYPE c LENGTH 20.
split: string AT del INTO part1 part2 part3 part4.
bigger = part1.
IF bigger > part2.
bigger = part1.
ELSEIF bigger > part3.
bigger = part2.
ELSE.
bigger = part4.
ENDIF.
WRITE: bigger.
Expected result: Works with any number of elements in a string and determines which one is biggest.
Actual result: I need to know in advance how many elements there are.
Here is one way to solve it:
DATA: string TYPE string VALUE 'this is a string'.
TYPES: BEGIN OF ty_words,
word TYPE string,
length TYPE i,
END OF ty_words.
DATA: ls_words TYPE ty_words.
DATA: gt_words TYPE STANDARD TABLE OF ty_words.
START-OF-SELECTION.
WHILE string IS NOT INITIAL.
SPLIT string AT space INTO ls_words-word string.
ls_words-length = strlen( ls_words-word ).
APPEND ls_words TO gt_words.
ENDWHILE.
SORT gt_words BY length DESCENDING.
READ TABLE gt_words
ASSIGNING FIELD-SYMBOL(<ls_longest_word>)
INDEX 1.
IF sy-subrc EQ 0.
WRITE: 'The longest word is:', <ls_longest_word>-word.
ENDIF.
Please note that it does not handle ties: if several words share the maximum length, it shows only one of them.
You don't need to know the number of parts if you split the string into an internal table. Then LOOP over the table and compare the string lengths to find the longest one.
While József Szikszai's solution works, it may be more complex than you need. This works just as well (with the same limitation: it only outputs the first longest word, not others of the same length):
DATA string TYPE string VALUE 'this is a string'.
DATA parts TYPE STANDARD TABLE OF string.
DATA biggest TYPE string.
FIELD-SYMBOLS <part> TYPE string.
SPLIT string AT space INTO TABLE parts.
LOOP AT parts ASSIGNING <part>.
IF STRLEN( <part> ) > STRLEN( biggest ).
biggest = <part>.
ENDIF.
ENDLOOP.
WRITE biggest.
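For comparison outside ABAP, the same split-then-compare idea is sketched below in Python. max with key=len returns the first of several equally long words, which matches the strict > comparison in the loop above, and the default argument covers an empty input. The function name is hypothetical.

```python
def longest_word(text):
    # str.split() with no argument splits on runs of whitespace.
    parts = text.split()
    # max() keeps the first maximal element, so ties resolve to the earliest word.
    return max(parts, key=len, default="")

print(longest_word("this is a string"))  # string
```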
Edit: I assumed 'biggest' meant longest, but if you actually wanted the word that comes last alphabetically, you could sort the table in descending order and output the first entry like this:
DATA string TYPE string VALUE 'this is a string'.
DATA parts TYPE STANDARD TABLE OF string.
DATA biggest TYPE string.
SPLIT string AT space INTO TABLE parts.
SORT parts DESCENDING.
READ TABLE parts INDEX 1 INTO biggest.
WRITE biggest.
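In Python, the alphabetical variant needs no sort at all: max over the words returns the entry a descending sort would put first. This sketch assumes plain code-point ordering (uppercase before lowercase), which may differ from a locale-aware or case-insensitive comparison; the function name is hypothetical.

```python
def last_alphabetical(text):
    # Plain string comparison orders by character code; lowercase the
    # words first if mixed case should compare case-insensitively.
    return max(text.split(), default="")

print(last_alphabetical("this is a string"))  # this
```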
With ABAP 7.40, you can shorten it further (lv_s being the input string):
SPLIT lv_s AT space INTO TABLE DATA(lt_word).
DATA(lv_longest) = REDUCE string( INIT longest = `` FOR <word> IN lt_word NEXT longest = COND #( WHEN strlen( <word> ) > strlen( longest ) THEN <word> ELSE longest ) ).
DATA(lv_alphabetic) = REDUCE string( INIT alph = `` FOR <word> IN lt_word NEXT alph = COND #( WHEN <word> > alph THEN <word> ELSE alph ) ).
If "biggest" means the longest word, here is a regex-based way to do it:
FIND ALL OCCURRENCES OF REGEX '\w+' IN string RESULTS DATA(words).
SORT words BY length DESCENDING.
WRITE substring( val = string off = words[ 1 ]-offset len = words[ 1 ]-length ).

Delete repeated pairs in a string

I have a string S='BDBCFBCFABDDEABCCDGAEAABCEAAHF'. S is made up of consecutive pairs: 'BD', 'BC', 'FB', ..., 'HF'.
How can I delete all of the repeated pairs in this string? I would also like to delete pairs made of the same character twice, such as 'AA', 'BB', ..., 'ZZ'.
The output should be:
Out = 'BDBCFBCFABEABCCDGAEACEHF'
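Depending on your restrictions, maybe you're after this (reshaping to 2 rows and transposing makes each row one pair, whereas reshape(S,[],2) would put the first and second halves of S side by side):
U = unique(reshape(S,2,[]).','rows','stable')
From there you can delete the rows of double letters and turn the remaining rows back into a string:
out = U(U(:,1)~=U(:,2),:);
out = reshape(out.',1,[])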
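The same pipeline (cut into pairs, keep first occurrences in order, drop double letters) can be mirrored in Python as a minimal sketch. It assumes the string has even length and that "repeated" means a pair already seen earlier in the string; the function name is hypothetical.

```python
def dedupe_pairs(s):
    # Cut the string into consecutive two-character pairs (assumes even length).
    pairs = [s[k:k + 2] for k in range(0, len(s), 2)]
    # Keep the first occurrence of each pair, in original order.
    unique_pairs = dict.fromkeys(pairs)
    # Drop pairs made of the same character twice, e.g. 'AA' or 'DD'.
    return "".join(p for p in unique_pairs if p[0] != p[1])

print(dedupe_pairs('BDBCFBCFABDDEABCCDGAEAABCEAAHF'))
# BDBCFBCFABEACDGACEHF
```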

Matlab fints doesn't like a string value I pass as an argument

I have a program that takes the columns of a fints object, multiplies them together pairwise in all combinations, and outputs the result in a new fints object. The data part works, but I also want the series labels to carry through so that the product of columns a and b has the label a*b.
function tsB = MulTS(tsA)
anames = fieldnames(tsA,1)';
A = fts2mat(tsA);
[i,j] = meshgrid(1:size(A,2),1:size(A,2));
B = Mul(A(:,i(:)),A(:,j(:)));
q = [anames(:,i(:)); anames(:,j(:))];
bnames = strcat(q(1,:),'*', q(2,:));
tsB=fints(tsA.dates, B, bnames);
end
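To make the intended behaviour concrete, here is a minimal Python sketch of what MulTS computes, using plain lists of columns instead of a fints object and assuming Mul multiplies element-wise. The ordered pairs come out in the same order as the meshgrid indexing above (a*a, a*b, b*a, b*b); the function name and the sep parameter are hypothetical.

```python
from itertools import product

def pairwise_products(columns, names, sep="*"):
    """Multiply every ordered pair of columns and label each product."""
    out_cols, out_names = [], []
    # product(..., repeat=2) yields (a, a), (a, b), (b, a), (b, b), ...
    for (ci, ni), (cj, nj) in product(list(zip(columns, names)), repeat=2):
        # Element-wise product of the two columns.
        out_cols.append([x * y for x, y in zip(ci, cj)])
        out_names.append(ni + sep + nj)
    return out_cols, out_names

cols, labels = pairwise_products([[1, 1, 1], [2, 2, 2]], ["a", "b"])
print(labels)  # ['a*a', 'a*b', 'b*a', 'b*b']
```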
I get an error when I run it:
tsA= fints([1 2 3]', [[1 1 1]' [2 2 2]'],{'a','b'}');
MulTS(tsA)
??? Error using ==> fints.fints at 188
Illegal name(s) detected. Please check the name(s).
Error in ==> MulTS at 10
tsB=fints(tsA.dates, B, bnames);
It seems MATLAB doesn't like the format of bnames. I've tried searching for things like "convert cell array to string matlab" and tried things like b = {bnames}. What am I doing wrong?
Your datanames (bnames in MulTS) contain a "*" character, which is illegal according to the fints documentation:
datanames
Cell array of data series names. Overrides the default data series names. Default data series names are series1, series2, and so on.
Note: Not all strings are accepted as datanames parameters. Supported data series names cannot start with a number and must contain only these characters:
Lowercase Latin alphabet, a to z
Uppercase Latin alphabet, A to Z
Underscore, _
Try replacing the "*" with "_" or another allowed character, for example:
bnames = strcat(q(1,:),'_', q(2,:));
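A quick way to catch this before calling fints is to check each name against the rule quoted above. The Python sketch below is a hypothetical helper that encodes only that quoted rule (Latin letters and underscore); it is not how fints itself validates names.

```python
import re

# Only the characters listed in the quoted rule: a-z, A-Z and underscore.
_ALLOWED = re.compile(r"[A-Za-z_]+")

def is_valid_series_name(name):
    return _ALLOWED.fullmatch(name) is not None

def sanitize_series_name(name):
    # Replace every character outside the allowed set with an underscore.
    return re.sub(r"[^A-Za-z_]", "_", name)

names = ["a*a", "a*b", "b*a", "b*b"]
print([n for n in names if not is_valid_series_name(n)])  # all four fail
print([sanitize_series_name(n) for n in names])  # ['a_a', 'a_b', 'b_a', 'b_b']
```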
