Union of cell array of cells - string

I'm looking for a way to take the union of two cell arrays of cell arrays of strings. For example:
A = {{'one' 'two'};{'three' 'four'};{'five' 'six'}};
B = {{'five' 'six'};{'seven' 'eight'};{'nine' 'ten'}};
And I'd like to get something like:
C = {{'one' 'two'};{'three' 'four'};{'five' 'six'};{'seven' 'eight'};{'nine' 'ten'}};
But when I use C = union(A, B) MATLAB returns an error saying:
Input A of class cell and input B of class cell must be cell arrays of strings, unless one is a string.
Does anyone know how to do something like this in a hopefully simple way? I'd greatly appreciate it.
ALTERNATIVE: A way to have a cell array of separated strings in any other way than a cell array of cell array of strings would be also useful, but as far as I know, it's not possible.
Thank you!

C=[A;B]
allWords=unique([A{:};B{:}])
F=cell2mat(cellfun(@(x)(ismember(allWords,x{1})+2*ismember(allWords,x{2}))',C,'uni',false))
[~,uniqueindices,~]=unique(F,'rows')
C(sort(uniqueindices))
What my code does: it builds a list of all words, allWords, and uses this list to build a matrix recording which words each row contains (1 = match for the first word, 2 = match for the second word). Finally, unique can be applied to this numeric matrix to get the indices.
Including my update, the 2 words per cell are now hardcoded. To get rid of this limitation it would be necessary to replace the anonymous function (@(x)(ismember(allWords,x{1})+2*ismember(allWords,x{2}))) with a more generic implementation, probably using cellfun again.
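One way to lift the two-words-per-cell limitation is to skip the numeric matrix and compare joined strings instead. This is a minimal sketch, not the original answer's method. It assumes the delimiter '|' never occurs inside any of the strings; the delimiter and the variable name keys are my own choices.

```matlab
A = {{'one' 'two'};{'three' 'four'};{'five' 'six'}};
B = {{'five' 'six'};{'seven' 'eight'};{'nine' 'ten'}};
C = [A; B];
% Join the words of each inner cell into one key string.
% Assumption: '|' does not appear in any of the words.
keys = cellfun(@(c) strjoin(c, '|'), C, 'UniformOutput', false);
% Keep the first occurrence of every key, preserving the original order.
[~, idx] = unique(keys, 'stable');
C = C(idx);
```

Because each inner cell is reduced to a single key, the inner cells may contain any number of words, and they do not all need the same number.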

union doesn't seem to be compatible with cell arrays of cells, so we need to look for a workaround.
One approach is to concatenate the data from A and B vertically. Then, along each column, assign each string a unique ID. Those IDs can then be combined into a double array, which opens up the possibility of using unique with the 'rows' option to get the desired output. This is precisely what is achieved here.
%// Slightly complicated input for safest verification of results
A = {{'three' 'four'};
{'five' 'six'};
{'five' 'seven'};
{'one' 'two'}};
B = {{'seven' 'eight'};
{'five' 'six'};
{'nine' 'ten'};
{'three' 'six'};};
t1 = [A ; B] %// concatenate all cells from A and B vertically
t2 = vertcat(t1{:}) %// Get all the cells of strings from A and B
t22 = mat2cell(t2,size(t2,1),ones(1,size(t2,2)));
[~,~,row_ind] = cellfun(@(x) unique(x,'stable'),t22,'uni',0)
mat1 = horzcat(row_ind{:})
[~,ind] = unique(mat1,'rows','stable')
out1 = t2(ind,:) %// output as a cell array of strings, used for verification too
out = mat2cell(out1, ones(1,size(out1,1)),size(out1,2)) %//desired output
Output -
out1 =
'three' 'four'
'five' 'six'
'five' 'seven'
'one' 'two'
'seven' 'eight'
'nine' 'ten'
'three' 'six'

Related

How to find common elements in string cells?

I want to find the common elements in multiple (>=2) cell arrays of strings.
A related question is here, and the answer proposes to use the function intersect(), however it works for only 2 inputs.
In my case, I have more than two cells, and I want to obtain a single common subset. Here is an example of what I want to achieve:
c1 = {'a','b','c','d'}
c2 = {'b','c','d'}
c3 = {'c','d'}
c_common = my_fun({c1,c2,c3});
In the end, I want c_common={'c','d'}, since only these two strings occur in all the inputs.
How can I do this with MATLAB?
Thanks in advance,
P.S. I also need the indices from each input, but I can probably do that myself using the output c_common, so not necessary in the answer. But if anyone wants to tackle that too, my actual output will be like this:
[c_common, indices] = my_fun({c1,c2,c3});
where indices = {[3,4], [2,3], [1,2]} for this case.
Thanks,
Listed in this post is a vectorized approach to give us the common strings and indices using unique and accumarray. This would work even when the strings are not sorted within each cell array to give us indices corresponding to their positions within it, but they have to be unique. Please have a look at the sample input, output section* to see such a case run. Here's the implementation -
C = {c1,c2,c3}; % Add more cell arrays here
% Get unique strings and ID each of the strings based on their uniqueness
[unqC,~,unqID] = unique([C{:}]);
% Get count of each ID and the IDs that have counts equal to the number of
% cells arrays in C indicate that they are present in all cell arrays and
% thus are the ones to be finally selected
match_ID = find(accumarray(unqID(:),1)==numel(C));
common_str = unqC(match_ID)
% ------------ Additional work to get indices ----------------
N_str = numel(common_str);
% Store matches as a logical array to be used at later stages
matches = ismember(unqID,match_ID);
% Use ismember to find all those indices in unqID and subtract group
% lengths from them to give us the indices within each cell array
clens = [0 cumsum(cellfun('length',C(1:end-1)))];
match_index = reshape(find(matches),N_str,[]);
% Sort match_index along each column based on the respective unqID elements
[m,n] = size(match_index);
[~,sidx] = sort(reshape(unqID(matches),N_str,[]),1);
sorted_match_index = match_index(bsxfun(@plus,sidx,(0:n-1)*m));
% Subtract cumulative group lens to give us indices corres. to each cell array
common_idx = bsxfun(@minus,sorted_match_index,clens).'
Please note that at the step that calculates match_ID, accumarray(unqID(:),1) could be replaced by histc(unqID,1:max(unqID)). Also, histcounts could be another alternative there.
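To illustrate the counting alternatives, here is a small sketch on a made-up ID vector (the values of unqID below are hypothetical, not taken from the answer). It places the histcounts bin edges halfway between the integer IDs, which is one way of getting one bin per ID.

```matlab
% Hypothetical ID vector, standing in for the third output of unique
unqID = [1;2;3;4;2;3;1;4;3;4;1];
% Count occurrences of each ID with accumarray (column vector result)
counts_acc = accumarray(unqID(:), 1);
% Same counts with histcounts, using edges at 0.5, 1.5, ..., max+0.5
counts_hc = histcounts(unqID, 0.5:1:max(unqID)+0.5).';
```

Both vectors hold the number of times each ID occurs, so either can be compared against numel(C) in the match_ID step.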
*Sample input, output -
c1 =
'a' 'b' 'c' 'd'
c2 =
'b' 'c' 'a' 'd'
c3 =
'c' 'd' 'a'
common_str =
'a' 'c' 'd'
common_idx =
1 3 4
3 2 4
3 1 2
As noted in the comments to this question, there is a file in File Exchange called "MINTERSECT -- Multiple set intersection." at http://www.mathworks.com/matlabcentral/fileexchange/6144-mintersect-multiple-set-intersection that contains simple code to generalize intersect to multiple sets. In a nutshell, the code gets the output from performing intersect on the first pair of cells and then perform intersect on this output with the next cell. This process continues until all cells have been compared. Note that the author points out that the code is not particularly efficient but it may be sufficient for your use case.
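The idea described above can be sketched as a short loop. This is not the File Exchange code itself, only a minimal illustration of repeated pairwise intersect. The ismember step for the indices is my own addition and assumes the strings are unique within each cell array.

```matlab
c1 = {'a','b','c','d'};
c2 = {'b','c','d'};
c3 = {'c','d'};
C = {c1, c2, c3};
% Intersect the running result with each remaining cell array in turn
c_common = C{1};
for k = 2:numel(C)
    c_common = intersect(c_common, C{k});
end
% Position of each common string within every input cell array
indices = cell(size(C));
for k = 1:numel(C)
    [~, indices{k}] = ismember(c_common, C{k});
end
```

For the example in the question this gives c_common = {'c','d'} and indices = {[3,4], [2,3], [1,2]}.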

How to print a number within a string in matlab

I would like to use the command text to type numbers within 57 hexagons. I want to use a loop:
for mm=1:57
text(x(m),y(m),'m')
end
where x(m) and y(m) are the coordinates of the text .
The script above types the string "m" and not the value of m. What am I doing wrong?
Jubobs pretty much told you how to do it. Use the num2str function. BTW, small typo in your for loop. You mean to use mm:
for mm=1:57
text(x(mm),y(mm),num2str(mm));
end
The reason why I've even decided to post an answer is because you can do this vectorized without a loop, which I'd also like to write an answer for. What you can do is place each number into a character array where each row denotes a unique number, and then use text to print out all numbers simultaneously.
m = sprintfc('%2d', 1:57);
d = reshape([m{:}], 2, 57).';
text(x, y, d);
The (undocumented!) function sprintfc takes a format specifier and an array and creates a cell array of strings where each cell is the string version of the corresponding element in the array you supply. To ensure that the character array has the same number of columns per row, each string takes up 2 characters, so any number less than 10 will have a blank space at the beginning. I then convert the cell array of strings into a character array by expanding the cell array into a comma-separated list of strings and concatenating them, reshape the result into a 57-by-2 matrix, and call text with all of the pairs of x and y and the corresponding labels in d, so that everything is placed on the screen together.
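If you would rather avoid an undocumented function, the same character array can be built with plain sprintf. This is a sketch under the assumption that x and y are vectors of 57 coordinates; the random values below are placeholders, not the hexagon centres from the question.

```matlab
n = 57;
% Placeholder coordinates (hypothetical); use your hexagon centres instead
x = rand(1, n);
y = rand(1, n);
% sprintf cycles through the array, writing each number in 2 characters
s = sprintf('%2d', 1:n);
% Cut the long string into n rows of 2 characters each
labels = reshape(s, 2, []).';
text(x, y, labels);
```

The reshape to 2-by-n followed by a transpose is needed because MATLAB fills arrays column by column.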

Find string (from table) in cell in matlab

I want to find the location of one string (which I take it from a table) inside of a cell:
A is my table, and B is the cell.
I have tested :
strncmp(A(1,8),B(:,1),1)
but it couldn't find the location.
I have tested many commands like:
ismember, strmatch, find(strcmp), find(strcmpi), find(ismember), strfind, etc., but they all give me errors, mostly because of the type of my data!
So please suggest me a solution.
You want strfind:
>> strfind('0123abcdefgcde', 'cde')
ans =
7 12
If A is a table and B a cell array, you need to index this way:
strfind(B{1}, A.VarName{1});
For example:
>> A = cell2table({'cde'},'VariableNames',{'VarName'}); %// create A as table
>> B = {'0123abcdefgcde'}; %// create B as cell array of strings
>> strfind(B{1}, A.VarName{1})
ans =
7 12
Luis Mendo's answer is absolutely correct, but I want to add some general information.
Your problem is that all the functions you tried (strfind, ...) only work for normal strings, but not for cell arrays. The way you index A and B in your code snippet, they still stay a cell array (of dimension (1,1)). You need to use curly brackets {} to "get rid of" the cell array and get the containing string. Luis Mendo shows how to do this.
Modified solution from a Mathworks forum, for the case of a single-column table with ragged strings
find(strcmp('mystring',mytable{:,:}))
will give you the row number.
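Putting the pieces together, here is a minimal sketch of looking up one table entry in the first column of a cell array. The table contents, the variable name VarName and the contents of B are made up for illustration.

```matlab
% Hypothetical table with one text column, and a cell array to search in
A = cell2table({'cde'; 'xyz'}, 'VariableNames', {'VarName'});
B = {'abc'; 'xyz'; 'cde'};
% Curly braces extract the string itself from the table variable
needle = A.VarName{1};
% strcmp compares the string against every cell; find gives the row
row = find(strcmp(needle, B(:,1)));
```

Here strcmp is used because the whole string has to match. strfind would be the choice when searching for a substring inside a longer string.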

Counting the occurrence of substrings in matlab

I have a cell, something like this P= {Face1 Face6 Scene6 Both9 Face9 Scene11 Both12 Face15}. I would like to count how many Face values, Scene values, Both values in P. I don't care about the numeric values after the string (i.e., Face1 and Face23 would be counted as two). I've tried the following (for the Face) but I got the error "If any of the input arguments are cell arrays, the first must be a cell array of strings and the second must be a character array".
strToSearch='Face';
numel(strfind(P,strToSearch));
Does anyone have any suggestion? Thank you!
Use regexp to find strings that start (^) with the desired text (such as 'Face'). The result will be a cell array, where each cell contains 1 if there is a match, or [] otherwise. So determine if each cell is nonempty (~cellfun('isempty', ...): will give a logical 1 for nonempty cells, and 0 for empty cells), and sum the results (sum):
>> P = {'Face1' 'Face6' 'Scene6' 'Both9' 'Face9' 'Scene11' 'Both12' 'Face15'};
>> sum(~cellfun('isempty', regexp(P, '^Face')))
ans =
4
>> sum(~cellfun('isempty', regexp(P, '^Scene')))
ans =
2
Your example should work with some small tweaks, provided all of P contains strings, but may give the error you get if there are any non-string values in the cell array.
P= {'Face1' 'Face6' 'Scene6' 'Both9' 'Face9' 'Scene11' 'Both12' 'Face15'};
strToSearch='Face';
n = strfind(P,strToSearch);
numel([n{:}])
(returns 4)
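To count all the categories in one pass rather than one prefix at a time, the leading letters can be extracted and tallied. This is a sketch that assumes every entry starts with letters followed by digits, as in the example.

```matlab
P = {'Face1' 'Face6' 'Scene6' 'Both9' 'Face9' 'Scene11' 'Both12' 'Face15'};
% Pull the leading run of letters out of each string
prefixes = regexp(P, '^[A-Za-z]+', 'match', 'once');
% Give each distinct prefix an ID and count how often each ID occurs
[names, ~, id] = unique(prefixes);
counts = accumarray(id(:), 1);
```

names holds the distinct prefixes in sorted order and counts holds the matching number of occurrences.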

Finding but not removing duplicates in Matlab

I have a big array in Matlab like this:
A =
{1x5 cell}
{1x7 cell}
{1x27 cell}
{1x11 cell}
...
where the cells look like this:
C{1}
ans =
'apple' 'banana' 'kiwi' 'orange'
I want to find which cells in A contain duplicate information, like:
C{27}
ans =
'turtle' 'kiwi' 'fox' 'badger'
I.e. here I want to see that C(1) and C(27) have a duplicate word, 'kiwi'.
So I can manually look at them and decide where I should remove the duplicate where I see fit.
Sorry, I'm not going to provide a coded solution, more the process I'd use so that you can start coding. If you then have any specific problems, feel free to post a question.
I would use nchoosek to generate an array of all the pairwise combinations of the cells in C (called nPerms below), so
nCells = length(C);
nPerms = nchoosek(1:nCells,2);
You can then loop over all the pairs, using intersect to see if there are common strings.
result{i} = intersect(C{nPerms(i,1)},C{nPerms(i,2)});
This will give you a cell array listing all common strings, and with the nPerms array you'll have the two rows with the common string. Note that result is indexed with curly braces because intersect returns a cell array, which cannot be stored in a single element with result(i). As far as I know, intersect does not need the two cell arrays to have the same number of elements, so the padding below is optional.
If you do want each element in C to be the same length, I'd create a temporary cell array padded out with blank cells prior to the loop.
This will calculate the longest cell in the array C by calculating the number of elements (@numel) in each cell, followed by calculating the maximum.
cSize = cellfun(@numel,C);
maxSize = max(cSize);
We can then define a function to pad out blank cells
fcn = @(x) [x cell(1,maxSize - numel(x))];
paddedC = cellfun(fcn,C,'UniformOutput',false);
This should give you a cell array with the same number of elements in each cell. Note that the padding cells are empty ([]) rather than strings, so set functions such as intersect, which expect cell arrays of strings, may reject the padded cells.
No doubt someone will turn up with a one line cellfun solution but I hope that this is enough to get you started.
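The process above can be sketched as follows. This is a minimal example on made-up data, calling intersect on the unpadded cells. The variable names pairs, hasDup and dupPairs are my own, and the second cell is invented to show a pair without duplicates.

```matlab
% Small hypothetical stand-in for the big cell array
C = {{'apple','banana','kiwi','orange'}, ...
     {'cat','dog'}, ...
     {'turtle','kiwi','fox','badger'}};
% Every unordered pair of cells, one pair per row
pairs = nchoosek(1:numel(C), 2);
result = cell(size(pairs,1), 1);
for i = 1:size(pairs,1)
    % Strings shared by the two cells of this pair
    result{i} = intersect(C{pairs(i,1)}, C{pairs(i,2)});
end
% Keep only the pairs that share at least one string
hasDup = ~cellfun('isempty', result);
dupPairs = pairs(hasDup, :);
dupWords = result(hasDup);
```

Each row of dupPairs names two cells to inspect by hand, and the matching entry of dupWords lists the strings they share.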

Resources