MATLAB search cell array for string subset

I'm trying to find the locations where a substring occurs in a cell array in MATLAB. The code below works, but is rather ugly. It seems to me there should be an easier solution.
cellArray = [{'these'} 'are' 'some' 'nicewords' 'and' 'some' 'morewords'];
wordPlaces = cellfun(@length,strfind(cellArray,'words'));
wordPlaces = find(wordPlaces); % Word places is the locations.
cellArray(wordPlaces);
This is similar to, but not the same as this and this.

The thing to do is to encapsulate this idea as a function. Either inline:
substrmatch = @(x,y) ~cellfun(@isempty,strfind(y,x))
findmatching = @(x,y) y(substrmatch(x,y))
Or contained in two m-files:
function idx = substrmatch(word,cellarray)
idx = ~cellfun(@isempty,strfind(cellarray,word));
and
function newcell = findmatching(word,oldcell)
newcell = oldcell(substrmatch(word,oldcell));
So now you can just type
>> findmatching('words',cellArray)
ans =
'nicewords' 'morewords'

I don't know if you would consider it a simpler solution than yours, but regular expressions are a very good general-purpose utility I often use for searching strings. One way to extract the cells from cellArray that contain 'words' is as follows:
>> matches = regexp(cellArray,'^.*words.*$','match'); %# Extract the matches
>> matches = [matches{:}] %# Remove empty cells
matches =
'nicewords' 'morewords'
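Both answers come down to building a logical mask over the cell array and indexing with it. A minimal sketch of that idea, reusing the question's data (the variable name mask is only illustrative):

```matlab
% Cell array of words from the question
cellArray = {'these', 'are', 'some', 'nicewords', 'and', 'some', 'morewords'};

% strfind returns one index vector per cell; an empty vector means "no match"
mask = ~cellfun(@isempty, strfind(cellArray, 'words'));

wordPlaces = find(mask);    % positions of the matching cells: [4 7]
matches = cellArray(mask);  % the matching cells themselves

% Assumption: on R2016b or later, contains(cellArray, 'words') should
% produce the same logical mask in one call.
```

Keeping the mask around is handy when both the positions and the matching strings are needed, since find(mask) and cellArray(mask) come from the same intermediate result.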

Related

MATLAB cell to string

I am trying to read an Excel sheet and find the cells that are not empty and contain date information, which I detect by looking for two '/' characters in the string,
but MATLAB keeps throwing errors about the cell type:
"Undefined operator '~=' for input arguments of type 'cell'."
"Undefined function 'string' for input arguments of type 'cell'."
"Undefined function 'char' for input arguments of type 'cell'."
MyFolderInfo = dir('C:\');
filename = 'Export.xls';
[num,txt,raw] = xlsread(filename,'A1:G200');
for i = 1:length(txt)
if ~isnan(raw(i,1))
if sum(ismember(char(raw(i,1)),'/')) == 2
A(i,1) = raw(i,1);
end
end
end
Please help me fix it.
There are multiple issues with your code. Since raw is a cell array, you can't run isnan on it; isnan is for numeric arrays. Since all you're interested in is cells with text in them, you don't strictly need raw at all: blank cells contribute no text to txt.
My approach is to create a logical array, has_2_slashes, and then use it to extract the elements from raw that have two slashes in them.
Here is my code. I generalized it to read multiple columns since your original code only seemed to be written to handle one column.
filename = 'Export.xls';
[~, ~, raw] = xlsread(filename, 'A1:G200');
[num_rows, num_cols] = size(raw);
has_2_slashes = false(num_rows, num_cols);
for row = 1:num_rows
for col = 1:num_cols
has_2_slashes(row, col) = sum(ismember(raw{row, col}, '/')) == 2;
end
end
A = raw(has_2_slashes);
cellfun(@numel,strfind(txt,'/'))
should give you a numerical array where the (i,j)th element contains the number of slashes. For example,
>> cellfun(@numel,strfind({'a','b';'/','/abc/'},'/'))
ans =
0 0
1 2
The key here is to use strfind.
You may want to expand your question to say what you intend to do next with txt; specifying the desired output is always a good idea. If you intend to read the dates, it may be better to parse them up front, for example with regexp or datetime, rather than building an array that maps to where the dates are. As is, applying ans>=2 next gives you a logical array that lets you extract the matched entries.
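The two answers can be combined into a loop-free sketch that also guards against non-text cells. The contents of raw below are made-up sample data standing in for what xlsread might return, not values from the asker's spreadsheet:

```matlab
% Hypothetical stand-in for the raw output of xlsread (mixed cell contents)
raw = {'01/02/2020', NaN; 'text', '3/4/21'; 42, 'a/b'};

% Only character cells can contain slashes, so find those first
is_text = cellfun(@ischar, raw);

% Count the '/' characters in each text cell; other cells keep a count of 0
n_slashes = zeros(size(raw));
n_slashes(is_text) = cellfun(@(s) sum(s == '/'), raw(is_text));

% Logical mask of cells with exactly two slashes, then extract them
has_2_slashes = (n_slashes == 2);
A = raw(has_2_slashes);
```

Checking ischar first avoids applying string operations to NaN or numeric cells, which is the kind of input that triggered the errors quoted in the question.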

How to remove several substrings within a string in MATLAB?

I'm trying to implement, in a different way, something I can already do with some custom MATLAB functions. Suppose we have the string 'AAAAAAAAAAAaaaaaaaaaaaTTTTTTTTTTTTTTTTsssssssssssTTTTTTTTTT'. I know how to remove every lowercase substring with
regexprep(String, '[a-z]*', '')
But I want to understand how to obtain the indices of these substrings and use them to check and remove the substrings, perhaps with a for loop.
regexp gives the indices:
[Start,End] = regexp(Seq,'[a-z]{1,}');
but I'm not succeeding in figuring out how to use them to check these sequences and eliminate them.
With the indexing approach you get several start and end indices (two in your example), so you need a loop to remove the corresponding sections from the string. You should remove them from last to first, otherwise indices that haven't been used yet will become invalid as you remove sections:
x = 'AAAAAAAAAAAaaaaaaaaaaaTTTTTTTTTTTTTTTTsssssssssssTTTTTTTTTT'; % input
y = x; % initialize result
[Start, End] = regexp(x, '[a-z]{1,}');
for k = numel(Start):-1:1 % note: from last to first
y(Start(k):End(k)) = []; % remove section
end
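If the removal order feels fragile, a variation (a sketch, not the answerer's method) is to mark the characters to keep in a logical mask and index once at the end. Because the original string is never modified inside the loop, the indices stay valid in any order. The shorter input string here is made up for readability:

```matlab
x = 'AAAaaTTTTssTT';                  % shortened sample input
[Start, End] = regexp(x, '[a-z]+');   % start/end index of each lowercase run

keep = true(size(x));                 % one flag per character
for k = 1:numel(Start)
    keep(Start(k):End(k)) = false;    % drop the k-th lowercase run
end

y = x(keep);                          % 'AAATTTTTT'
```

The result should agree with regexprep(x, '[a-z]+', ''), which makes a convenient cross-check.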

Compare two strings and extract common word out

In MATLAB, how can we compare two strings and print out the common word? For example, with string1 = "hello my name is bob"; and string2 = "today bob went to the park"; the word bob is common to both. What structure should I follow?
Use intersect with strsplit for a one-liner -
common_word = intersect(strsplit(string1),strsplit(string2))
strsplit splits each string into a cell array of words, and intersect then finds the words common to both.
If you would like to avoid strsplit, you can use regexp instead -
common_word = intersect(regexp(string1,'\s','split'),regexp(string2,'\s','split'))
Bonus: Removing stop-words from the common words
Let's add some stop-words that are common to these two strings -
string1 = 'hello my name is bob and I am going to the zoo'
string2 = 'today bob went to the park'
Using the solution presented earlier, you would get -
common_word =
'bob' 'the' 'to'
Now, the words 'the' and 'to' are stop-words. If you would like to have them removed, let me suggest this: Removing stop words from single string, and its accepted solution.
The final output would be 'bob', whom you were looking for!
If you are looking for matching words only, that are separated by spaces, you can use strsplit to change each string into cell arrays of words, then loop through and search for each one.
str1 = 'test if this works';
str2 = 'does this work?';
cell1 = strsplit(str1);
cell2 = strsplit(str2);
for n = 1:length(cell1)
for m = 1:length(cell2)
if strcmp(cell1{n},cell2{m})
disp(cell1{n});
end
end
end
Notice that in my example the last member of cell2 is 'work?' so if you have punctuation in your strings, you'll have to do a check for that (isletter might help).
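One way to handle the punctuation issue, sketched under the assumption that only ASCII letters matter and that case should be ignored, is to strip everything except letters and whitespace before splitting. The helper name clean is hypothetical:

```matlab
str1 = 'test if this works';
str2 = 'does this work?';

% Hypothetical helper: lowercase the text and drop anything that is
% not a letter or whitespace (so 'work?' becomes 'work')
clean = @(s) regexprep(lower(s), '[^a-z\s]', '');

words1 = strsplit(strtrim(clean(str1)));
words2 = strsplit(strtrim(clean(str2)));

common = intersect(words1, words2);   % words present in both strings
```

Note that 'works' and 'work' still count as different words here; matching word stems would need additional logic beyond this sketch.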

saving and retrieving string data in matlab

Hi, can anyone help me with handling strings in MATLAB? For example, given the string
A = 'A good looking boy'
how can we store the individual words in an array and later retrieve any of them?
As found here, you could use
>> A = 'A good looking boy';
>> C = regexp(A,'[A-Za-z]*', 'match')
C =
'A' 'good' 'looking' 'boy'
so that
>> C{1}
ans =
A
>> C{4}
ans =
boy
>> [C{:}]
ans =
Agoodlookingboy
The most intuitive way would be using strsplit
C = strsplit(A,' ')
However, it is not available in my version, so I suppose it is a built-in function only in MATLAB R2013a and above. You can find the documentation here.
If you are using an older version of MATLAB, you can also get this File Exchange solution, which basically does the same thing.
You can use the simple function textscan for that:
C = textscan(A,'%s');
C will be a 1-by-1 cell array whose single element, C{1}, is itself a cell array holding the words. This function has been in MATLAB at least since R14.
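Because textscan wraps its result in an outer cell, retrieving a word takes one extra indexing step. A minimal sketch (the variable name words is illustrative):

```matlab
A = 'A good looking boy';

C = textscan(A, '%s');   % 1-by-1 cell; C{1} holds the words
words = C{1};            % 4-by-1 cell array of character vectors

second = words{2};       % 'good'
last = words{end};       % 'boy'
```

Unlike the regexp and strsplit results, which are row cell arrays, words here is a column, so transpose it if the orientation matters.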

Can you treat a string as one object in a list in MATLAB?

I would like to make a list of strings in MATLAB using the example below:
x = ['fun', 'today', 'sunny']
I want to be able to call x(1) and have it return 'fun', but instead I keep getting 'f'.
Also, is there a way to add a string to a list without the list giving back a number where the string should be? I have tried using str2double and a few other things. It seems like both of these things should be possible in MATLAB.
The easiest way to store a list of strings that have different lengths is to use cell arrays. For example:
>> x = {'fun', 'today', 'sunny'}; %# Create a cell array of strings
>> x{1} %# Get the string from the first cell
ans =
fun
It's kind of a kludgy workaround, but
x = strsplit('fun.today.sunny', '.')
produces a list with individual, callable strings.
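The second part of the question, adding a string to the list, is not shown above. A short sketch with a cell array, where the appended word 'warm' is made up for illustration:

```matlab
x = {'fun', 'today', 'sunny'};   % cell array of character vectors

x{end+1} = 'warm';               % append a new string as its own cell

first = x{1};                    % curly braces return the string: 'fun'
sub = x(1);                      % parentheses return a 1-by-1 cell: {'fun'}
n = numel(x);                    % number of strings in the list: 4
```

The distinction between x{1} and x(1) is the usual stumbling block: braces unwrap the content, parentheses return a smaller cell array.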
