MATLAB find a row of chars in a matrix - string

There is a char matrix. Each row is a word. All the words/rows have the same length.
Is there anyway to find a target word in this matrix using find() function?
Say M=[o k a y; g o o d; h a v e]; A target word W='have'; Is there any function to search W in M and return the row index?
When I try index = find(isequal(W,M)) and index = find(isequal(W,M(:,:))), they both return empty array.
I know that I can use linear search, i.e. to compare row by row, just wonder if there is a built-in function to solve this problem. Thanks!

Another approach would be to use ismember. Assuming that M is a character array like you have defined, you would thus do:
W = 'have';
[~,idx] = ismember(M, W, 'rows');
The first input is the character array defined, the second input is the string we are searching for, and we choose 'rows' as the flag as we want to search on a row basis. Each word is defined in one row. The first output is true/false that is of the same size as M that tells you whether or not we have found the word indexed by that row in M is a match to the one you're searching for. Because we only want to know where that word is, we can suppress that first output. idx tells you which row the word was found.
When we call this function, we get:
idx =
3
This means that the third row contains the word you're looking for. However, if you're dead set on using find, consider using ismember in conjunction with find:
W = 'have';
idx = find(ismember(M, W, 'rows'));
idx =
3
However, the advantage with the above approach is that it will find all locations that match the particular word you're looking for.

You need to compare M with a string. I am assuming M contains of individual characters as follows:
M=['o' 'k' 'a' 'y'; 'g' 'o' 'o' 'd'; 'h' 'a' 'v' 'e']
and W='have'
So in order to compare each row of M with W, you need to use strcmp. For that, you need M to be a cell array. You can convert each row of M into a cell array by using mat2cell.
equalRows=strcmp(W, mat2cell(M,ones(1,size(M,1)),size(M,2)));
% Answer
equalRows =
0
0
1
Use find command on the above output to get the indices.

Related

2D vector lookup [x;y]

I've got a matrix, with two coordinates [i;j]
I'm trying to automatize a lookup:
As an example, this would have the coordinates of [1;2]
Here's a table of all the coordinates:
So here, obviously [1;2] would equate to 143,33
To simplify the issue:
I'll try to go step by step over what I'm trying to do to make the question bit less confusing.
Think of what I'm trying to do as a function, lookup(i, j) => value
Now, refer to the second picture (table)
I find all rows containing index [i] (inside column C) and then
only for those rows find row containing index [j] (inside column D ∩ for rows from previous step)
Return [i;j] value
So if u invoked lookup(2, 4)
Find all rows matching i = 2
Row 5: i = 2 ; j = 3
Row 6: i = 2 ; j = 4
Row 7: i = 2 ; j = 5
Lookup column j for j=4 from found rows
Found row 6: i = 2 ; j = 4.
Return value (offset for yij column = 143,33)
Now this isn't an issue algorhitmically speaking, but I have no idea how to go about doing this with excel formulas.
PS: I know this is reltively simple vba issue but I would prefer formulas
PSS: I removed what I tried to make the question more readable.
You can use SUMPRODUCT, which return 0 for not found values:
=SUMPRODUCT(($C$4:$C$18=$I4)*($D$4:$D$18=J$3)*$E$4:$E$18)
or AGGREGATE, which returns an error that can be hidden by the IFERROR function:
=IFERROR(AGGREGATE(15,6,(1/(($C$4:$C$18=$I12)*($D$4:$D$18=J$3)))*$E$4:$E$18,1),"")
You can use SUMIFS here assuming you will not have exact duplicate combinations of [i, j]. If you did have a duplicate combination, the amounts will be summed and placed in the corresponding cell
In cell B2 place this equation: =SUMIFS($Q$2:$Q$16,$P$2:$P$16,B$1,$O$2:$O$16,$A2) and drag across and over as needed
IF you want to convert the 0's to blanks you can nest the above formula inside a text formatter like so:
=TEXT([formula], "0;-0;;#")

How to find common elements in string cells?

I want to find the common elements in multiple (>=2) cell arrays of strings.
A related question is here, and the answer proposes to use the function intersect(), however it works for only 2 inputs.
In my case, I have more than two cells, and I want to obtain a single common subset. Here is an example of what I want to achieve:
c1 = {'a','b','c','d'}
c2 = {'b','c','d'}
c3 = {'c','d'}
c_common = my_fun({c1,c2,c3});
in the end, I want c_common={'c','d'}, since only these two strings occur in all the inputs.
How can I do this with MATLAB?
Thanks in advance,
P.S. I also need the indices from each input, but I can probably do that myself using the output c_common, so not necessary in the answer. But if anyone wants to tackle that too, my actual output will be like this:
[c_common, indices] = my_fun({c1,c2,c3});
where indices = {[3,4], [2,3], [1,2]} for this case.
Thanks,
Listed in this post is a vectorized approach to give us the common strings and indices using unique and accumarray. This would work even when the strings are not sorted within each cell array to give us indices corresponding to their positions within it, but they have to be unique. Please have a look at the sample input, output section* to see such a case run. Here's the implementation -
C = {c1,c2,c3}; % Add more cell arrays here
% Get unique strings and ID each of the strings based on their uniqueness
[unqC,~,unqID] = unique([C{:}]);
% Get count of each ID and the IDs that have counts equal to the number of
% cells arrays in C indicate that they are present in all cell arrays and
% thus are the ones to be finally selected
match_ID = find(accumarray(unqID(:),1)==numel(C));
common_str = unqC(match_ID)
% ------------ Additional work to get indices ----------------
N_str = numel(common_str);
% Store matches as a logical array to be used at later stages
matches = ismember(unqID,match_ID);
% Use ismember to find all those indices in unqID and subtract group
% lengths from them to give us the indices within each cell array
clens = [0 cumsum(cellfun('length',C(1:end-1)))];
match_index = reshape(find(matches),N_str,[]);
% Sort match_index along each column based on the respective unqID elements
[m,n] = size(match_index);
[~,sidx] = sort(reshape(unqID(matches),N_str,[]),1);
sorted_match_index = match_index(bsxfun(#plus,sidx,(0:n-1)*m));
% Subtract cumulative group lens to give us indices corres. to each cell array
common_idx = bsxfun(#minus,sorted_match_index,clens).'
Please note that at the step that calculates match_ID : accumarray(unqID(:),1) could be replaced by histc(unqID,1:max(unqID)). Also, histcounts be another alternative there.
*Sample input, output -
c1 =
'a' 'b' 'c' 'd'
c2 =
'b' 'c' 'a' 'd'
c3 =
'c' 'd' 'a'
common_str =
'a' 'c' 'd'
common_idx =
1 3 4
3 2 4
3 1 2
As noted in the comments to this question, there is a file in File Exchange called "MINTERSECT -- Multiple set intersection." at http://www.mathworks.com/matlabcentral/fileexchange/6144-mintersect-multiple-set-intersection that contains simple code to generalize intersect to multiple sets. In a nutshell, the code gets the output from performing intersect on the first pair of cells and then perform intersect on this output with the next cell. This process continues until all cells have been compared. Note that the author points out that the code is not particularly efficient but it may be sufficient for your use case.

How to extract excel column and import it to MATLAB?

Hello I have an excel file with multiple columns up to "CD". My code words perfectly for excel files with 26 columns but after that it doesn't work.
[ia ib] = ismember(header, {item});
letter = find(ia)+'A'-1;
cell = fprintf('%c:%c', letter, letter);
out = xlsread('filename', cell);
This code works until I get to Z:Z. When I get to AA, AB, AC,... it won't work. How do I extract the AA, CD, BG columns?
It doesn't work because you are assuming that your letter for the header is only one character as indicated by:
letter = find(ia) + 'A' - 1;
What are you doing is essentially building the ASCII code for a capital letter between A to Z. This will obviously fail if you are trying to find a header with more than one letter. What you'll need to do is build a dictionary of all possible characters of AA to ZZ, then you can use the output of find(ia) on this dictionary if we exceed the column Z in your Excel sheet to extract out the right sequence of characters you need, then finally use this sequence of characters to index into your Excel sheet.
Referencing this question, I'm going to take Rody Oldenhuis's answer. Therefore, construct this dictionary of all possible two characters:
x = 'ABCDEFGHIJKLMNOPQRSTUVWXYZ';
y = unique(nchoosek(repmat(x, 1,26), 2), 'rows');
y will be a N x 2 character matrix where each row is a unique permutation of two letters from A-Z (so AA, AB, etc.). The way the code is written, it should maintain the exact ordering like how Excel does it for columns that go beyond Z, so AA, AB, AC, ... AZ, BA, BB, BC, ... BZ, ..., ZX, ZY, ZZ. Next, we need to see whether or not the found index is between 1 and 26. If it is, you can use your previous code. If it isn't, then we'll do what we outlined above. Note that I will have to subtract this found index by 26 so I can index into this character array that we created. Assuming that header has all unique entries, we can do:
[ia ib] = ismember(header, {item});
index = find(ia, 1);
if index <= 26 %// Check if we are within columns A - Z
letter = index + 'A' - 1;
else %// If not, we are at a column that is beyond Z.
x = 'ABCDEFGHIJKLMNOPQRSTUVWXYZ';
y = unique(nchoosek(repmat(x, 1,26), 2), 'rows');
index = index - 26; %// Subtract by 26 to reference into character array
letter = y(index,:);
end
cell = sprintf('%s:%s', letter, letter);
out = xlsread('filename', cell);
Note that I changed your fprintf call to sprintf as you desire to store the string representation of which cells you want to access. fprintf (in your case) will print to the screen, which is probably not what you want. Also, I've changed the variable cell to ce as cell is an actual function in MATLAB.
Also note that I've changed the %c formatting string to %s as the header may consist of more than one character.

Counting the occurence of substrings in matlab

I have a cell, something like this P= {Face1 Face6 Scene6 Both9 Face9 Scene11 Both12 Face15}. I would like to count how many Face values, Scene values, Both values in P. I don't care about the numeric values after the string (i.e., Face1 and Face23 would be counted as two). I've tried the following (for the Face) but I got the error "If any of the input arguments are cell arrays, the first must be a cell array of strings and the second must be a character array".
strToSearch='Face';
numel(strfind(P,strToSearch));
Does anyone have any suggestion? Thank you!
Use regexp to find strings that start (^) with the desired text (such as 'Face'). The result will be a cell array, where each cell contains 1 if there is a match, or [] otherwise. So determine if each cell is nonempty (~cellfun('isempty', ...): will give a logical 1 for nonempty cells, and 0 for empty cells), and sum the results (sum):
>> P = {'Face1' 'Face6' 'Scene6' 'Both9' 'Face9' 'Scene11' 'Both12' 'Face15'};
>> sum(~cellfun('isempty', regexp(P, '^Face')))
ans =
4
>> sum(~cellfun('isempty', regexp(P, '^Scene')))
ans =
2
Your example should work with some small tweaks, provided all of P contains strings, but may give the error you get if there are any non-string values in the cell array.
P= {'Face1' 'Face6' 'Scene6' 'Both9' 'Face9' 'Scene11' 'Both12' 'Face15'};
strToSearch='Face';
n = strfind(P,strToSearch);
numel([n{:}])
(returns 4)

Union of cell array of cells

I'm looking for the way to do the union of two cell arrays of cell arrays of strings. For example:
A = {{'one' 'two'};{'three' 'four'};{'five' 'six'}};
B = {{'five' 'six'};{'seven' 'eight'};{'nine' 'ten'}};
And I'd like to get something like:
C = {{'one' 'two'};{'three' 'four'};{'five' 'six'};{'seven' 'eight'};{'nine' 'ten'}};
But when I use C = union(A, B) MATLAB returns an error saying:
Input A of class cell and input B of class cell must be cell arrays of strings, unless one is a string.
Does anyone know how to do something like this in a hopefully simple way? I'd greatly appreciate it.
ALTERNATIVE: A way to have a cell array of separated strings in any other way than a cell array of cell array of strings would be also useful, but as far as I know, it's not possible.
Thank you!
C=[A;B]
allWords=unique([A{:};B{:}])
F=cell2mat(cellfun(#(x)(ismember(allWords,x{1})+2*ismember(allWords,x{2}))',C,'uni',false))
[~,uniqueindices,~]=unique(F,'rows')
C(sort(uniqueindices))
What my code does: it builds up a list of all words allwords, then this list is used to build up a matrix which contains the correlation between the rows and which word they contain. 1=Match for first wird, 2=Match for second word. Finally, on this numeric matrix unique can be applied to get the indices.
Including my update, now the 2 words per cell is hardcoded. To get rid of this limitation it would be neseccary to replace the anonymous function (#(x)(ismember(allWords,x{1})+2*ismember(allWords,x{2}))) with a more generic implementation. Probably using cellfun again.
Union doesn't seem like compatible for cell arrays of cells. So, we need to look for some workaround.
One approach would be to get the data from A and B concatenated vertically. Then, along each column assign each cell of strings an unique ID. Those IDs can then be combined into a double array that opens up the possibility of of using unique with 'rows' option to get us the desired output. This is precisely achieved here.
%// Slightly complicated input for safest verification of results
A = {{'three' 'four'};
{'five' 'six'};
{'five' 'seven'};
{'one' 'two'}};
B = {{'seven' 'eight'};
{'five' 'six'};
{'nine' 'ten'};
{'three' 'six'};};
t1 = [A ; B] %// concatenate all cells from A and B vertically
t2 = vertcat(t1{:}) %// Get all the cells of strings from A and B
t22 = mat2cell(t2,size(t2,1),ones(1,size(t2,2)));
[~,~,row_ind] = cellfun(#(x) unique(x,'stable'),t22,'uni',0)
mat1 = horzcat(row_ind{:})
[~,ind] = unique(mat1,'rows','stable')
out1 = t2(ind,:) %// output as a cell array of strings, used for verification too
out = mat2cell(out1, ones(1,size(out1,1)),size(out1,2)) %//desired output
Output -
out1 =
'three' 'four'
'five' 'six'
'five' 'seven'
'one' 'two'
'seven' 'eight'
'nine' 'ten'
'three' 'six'

Resources