I have a cell, something like this P= {Face1 Face6 Scene6 Both9 Face9 Scene11 Both12 Face15}. I would like to count how many Face values, Scene values, Both values in P. I don't care about the numeric values after the string (i.e., Face1 and Face23 would be counted as two). I've tried the following (for the Face) but I got the error "If any of the input arguments are cell arrays, the first must be a cell array of strings and the second must be a character array".
strToSearch='Face';
numel(strfind(P,strToSearch));
Does anyone have any suggestion? Thank you!
Use regexp to find strings that start (^) with the desired text (such as 'Face'). The result will be a cell array, where each cell contains 1 if there is a match, or [] otherwise. So determine if each cell is nonempty (~cellfun('isempty', ...): will give a logical 1 for nonempty cells, and 0 for empty cells), and sum the results (sum):
>> P = {'Face1' 'Face6' 'Scene6' 'Both9' 'Face9' 'Scene11' 'Both12' 'Face15'};
>> sum(~cellfun('isempty', regexp(P, '^Face')))
ans =
4
>> sum(~cellfun('isempty', regexp(P, '^Scene')))
ans =
2
Your example should work with some small tweaks, provided all of P contains strings, but may give the error you get if there are any non-string values in the cell array.
P= {'Face1' 'Face6' 'Scene6' 'Both9' 'Face9' 'Scene11' 'Both12' 'Face15'};
strToSearch='Face';
n = strfind(P,strToSearch);
numel([n{:}])
(returns 4)
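If you want all three counts at once, here is a minimal sketch that loops over the prefixes. It uses strncmp instead of regexp or strfind, which is a swapped-in technique rather than what either answer above does; the variable names prefixes and counts are made up for illustration, and it assumes every element of P is a string.

```matlab
P = {'Face1' 'Face6' 'Scene6' 'Both9' 'Face9' 'Scene11' 'Both12' 'Face15'};
prefixes = {'Face' 'Scene' 'Both'};   % hypothetical list of prefixes to count
counts = zeros(1, numel(prefixes));
for k = 1:numel(prefixes)
    % strncmp compares only the first numel(prefixes{k}) characters,
    % so the trailing number in each string is ignored
    counts(k) = sum(strncmp(P, prefixes{k}, numel(prefixes{k})));
end
```

counts(k) then holds the number of strings in P that start with prefixes{k}.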
I want to find the common elements in multiple (>=2) cell arrays of strings.
A related question is here, and the answer proposes to use the function intersect(), however it works for only 2 inputs.
In my case, I have more than two cells, and I want to obtain a single common subset. Here is an example of what I want to achieve:
c1 = {'a','b','c','d'}
c2 = {'b','c','d'}
c3 = {'c','d'}
c_common = my_fun({c1,c2,c3});
in the end, I want c_common={'c','d'}, since only these two strings occur in all the inputs.
How can I do this with MATLAB?
Thanks in advance,
P.S. I also need the indices from each input, but I can probably do that myself using the output c_common, so not necessary in the answer. But if anyone wants to tackle that too, my actual output will be like this:
[c_common, indices] = my_fun({c1,c2,c3});
where indices = {[3,4], [2,3], [1,2]} for this case.
Thanks,
Listed in this post is a vectorized approach to give us the common strings and indices using unique and accumarray. This would work even when the strings are not sorted within each cell array to give us indices corresponding to their positions within it, but they have to be unique. Please have a look at the sample input, output section* to see such a case run. Here's the implementation -
C = {c1,c2,c3}; % Add more cell arrays here
% Get unique strings and ID each of the strings based on their uniqueness
[unqC,~,unqID] = unique([C{:}]);
% Get count of each ID and the IDs that have counts equal to the number of
% cells arrays in C indicate that they are present in all cell arrays and
% thus are the ones to be finally selected
match_ID = find(accumarray(unqID(:),1)==numel(C));
common_str = unqC(match_ID)
% ------------ Additional work to get indices ----------------
N_str = numel(common_str);
% Store matches as a logical array to be used at later stages
matches = ismember(unqID,match_ID);
% Use ismember to find all those indices in unqID and subtract group
% lengths from them to give us the indices within each cell array
clens = [0 cumsum(cellfun('length',C(1:end-1)))];
match_index = reshape(find(matches),N_str,[]);
% Sort match_index along each column based on the respective unqID elements
[m,n] = size(match_index);
[~,sidx] = sort(reshape(unqID(matches),N_str,[]),1);
sorted_match_index = match_index(bsxfun(@plus,sidx,(0:n-1)*m));
% Subtract cumulative group lens to give us indices corres. to each cell array
common_idx = bsxfun(@minus,sorted_match_index,clens).'
Please note that at the step that calculates match_ID, accumarray(unqID(:),1) could be replaced by histc(unqID,1:max(unqID)). histcounts would be another alternative there.
*Sample input, output -
c1 =
'a' 'b' 'c' 'd'
c2 =
'b' 'c' 'a' 'd'
c3 =
'c' 'd' 'a'
common_str =
'a' 'c' 'd'
common_idx =
1 3 4
3 2 4
3 1 2
As noted in the comments to this question, there is a file in File Exchange called "MINTERSECT -- Multiple set intersection." at http://www.mathworks.com/matlabcentral/fileexchange/6144-mintersect-multiple-set-intersection that contains simple code to generalize intersect to multiple sets. In a nutshell, the code takes the output of intersect on the first pair of cells and then performs intersect on this output with the next cell. This process continues until all cells have been compared. Note that the author points out that the code is not particularly efficient, but it may be sufficient for your use case.
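The idea of intersecting one pair at a time can be sketched in a few lines. To be clear, this is not the File Exchange code, just a minimal loop of my own under the assumption that each input is a cell array of unique strings; the indices part uses ismember and is likewise only one way to do it.

```matlab
c1 = {'a','b','c','d'};
c2 = {'b','c','d'};
c3 = {'c','d'};
C = {c1, c2, c3};          % add more cell arrays here

% Intersect the running result with each remaining cell array in turn
c_common = C{1};
for k = 2:numel(C)
    c_common = intersect(c_common, C{k});
end

% Position of each common string inside every input
indices = cell(size(C));
for k = 1:numel(C)
    [~, indices{k}] = ismember(c_common, C{k});
end
```

Note that intersect returns its result sorted, so c_common and the indices follow alphabetical order rather than the order of c1.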
I would like to use the command text to type numbers within 57 hexagons. I want to use a loop:
for mm=1:57
text(x(m),y(m),'m')
end
where x(m) and y(m) are the coordinates of the text .
The script above types the string "m" and not the value of m. What am I doing wrong?
Jubobs pretty much told you how to do it. Use the num2str function. BTW, small typo in your for loop. You mean to use mm:
for mm=1:57
text(x(mm),y(mm),num2str(mm));
end
The reason why I've even decided to post an answer is because you can do this vectorized without a loop, which I'd also like to write an answer for. What you can do is place each number into a character array where each row denotes a unique number, and then use text to print out all numbers simultaneously.
m = sprintfc('%2d', 1:57);
d = reshape([m{:}], 2, 57).';
text(x, y, d);
The (undocumented!) function sprintfc takes a formatting specifier and an array and creates a cell array of strings where each cell is the string version of each element in the array you supply. In order to ensure that the character array has the same number of columns per row, I ensure that each string takes up 2 characters, and so any number less than 10 will have a blank space at the beginning. I then convert the cell array of strings into a character array by expanding the cell array into a comma-separated list of strings and reshaping the result into an acceptable form. Finally, I call text with all of the pairs of x and y and the corresponding labels in d, which places them all on the screen together.
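If you would rather avoid the undocumented sprintfc, a sketch using only documented functions is below. It relies on text accepting a cell array of labels, one per (x, y) pair, which is my understanding of the documented behaviour; the coordinates here are made-up placeholders standing in for your hexagon centres.

```matlab
% Hypothetical coordinates, only so the example runs on its own
x = mod((0:56).', 8);
y = floor((0:56).' / 8);

% One label per point: {'1'; '2'; ...; '57'}
labels = arrayfun(@num2str, (1:57).', 'UniformOutput', false);

% A single call places every label at its own (x, y) position
text(x, y, labels);
```

Because the labels live in a cell array, no padding to a common width is needed.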
I'm looking for the way to do the union of two cell arrays of cell arrays of strings. For example:
A = {{'one' 'two'};{'three' 'four'};{'five' 'six'}};
B = {{'five' 'six'};{'seven' 'eight'};{'nine' 'ten'}};
And I'd like to get something like:
C = {{'one' 'two'};{'three' 'four'};{'five' 'six'};{'seven' 'eight'};{'nine' 'ten'}};
But when I use C = union(A, B) MATLAB returns an error saying:
Input A of class cell and input B of class cell must be cell arrays of strings, unless one is a string.
Does anyone know how to do something like this in a hopefully simple way? I'd greatly appreciate it.
ALTERNATIVE: A way to have a cell array of separated strings in any other way than a cell array of cell array of strings would be also useful, but as far as I know, it's not possible.
Thank you!
C=[A;B]
allWords=unique([A{:};B{:}])
F=cell2mat(cellfun(@(x)(ismember(allWords,x{1})+2*ismember(allWords,x{2}))',C,'uni',false))
[~,uniqueindices,~]=unique(F,'rows')
C(sort(uniqueindices))
What my code does: it builds a list of all words, allWords, and then uses this list to build a matrix that records which words each row contains (1 = match for the first word, 2 = match for the second word). Finally, unique can be applied to this numeric matrix to get the indices.
With my update, the 2 words per cell are now hardcoded. To get rid of this limitation it would be necessary to replace the anonymous function (@(x)(ismember(allWords,x{1})+2*ismember(allWords,x{2}))) with a more generic implementation, probably using cellfun again.
union doesn't seem to be compatible with cell arrays of cells, so we need to look for a workaround.
One approach would be to concatenate the data from A and B vertically. Then, along each column, assign each string a unique ID. Those IDs can then be combined into a double array, which opens up the possibility of using unique with the 'rows' option to get the desired output. This is precisely what is achieved here.
%// Slightly complicated input for safest verification of results
A = {{'three' 'four'};
{'five' 'six'};
{'five' 'seven'};
{'one' 'two'}};
B = {{'seven' 'eight'};
{'five' 'six'};
{'nine' 'ten'};
{'three' 'six'};};
t1 = [A ; B] %// concatenate all cells from A and B vertically
t2 = vertcat(t1{:}) %// Get all the cells of strings from A and B
t22 = mat2cell(t2,size(t2,1),ones(1,size(t2,2)));
[~,~,row_ind] = cellfun(@(x) unique(x,'stable'),t22,'uni',0)
mat1 = horzcat(row_ind{:})
[~,ind] = unique(mat1,'rows','stable')
out1 = t2(ind,:) %// output as a cell array of strings, used for verification too
out = mat2cell(out1, ones(1,size(out1,1)),size(out1,2)) %//desired output
Output -
out1 =
'three' 'four'
'five' 'six'
'five' 'seven'
'one' 'two'
'seven' 'eight'
'nine' 'ten'
'three' 'six'
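Another way to think about the same workaround: turn every inner cell into a single key string and run unique on those keys. The sketch below is an alternative to the ID-matrix approach, not a restatement of it. It assumes none of the strings contains a newline (used here as the separator), and the names keys and U are made up for illustration. Unlike the approaches above, the inner cells do not have to hold the same number of words.

```matlab
A = {{'one' 'two'};{'three' 'four'};{'five' 'six'}};
B = {{'five' 'six'};{'seven' 'eight'};{'nine' 'ten'}};

AB = [A; B];   % stack all inner cells vertically

% Build one key per inner cell by joining its strings with newlines
keys = cellfun(@(c) sprintf('%s\n', c{:}), AB, 'UniformOutput', false);

% Keep the first occurrence of every key, preserving the original order
[~, idx] = unique(keys, 'stable');
U = AB(idx);
```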
Why does =FALSE<10000000000 evaluate as FALSE and =FALSE>10000000000 evaluate as TRUE? I have tried some different numbers and this seems to always be the case.
This is by design. Search help for "Troubleshoot Sort" to see the default sort order.
In an ascending sort, Microsoft Excel uses the following order.
Numbers: Numbers are sorted from the smallest negative number to the largest positive number.
Alphanumeric sort: When you sort alphanumeric text, Excel sorts left to right, character by character. For example, if a cell contains the text "A100," Excel places the cell after a cell that contains the entry "A1" and before a cell that contains the entry "A11."
Text and text that includes numbers are sorted in the following order:
0 1 2 3 4 5 6 7 8 9 (space) ! " # $ % & ( ) * , . / : ; ? @ [ \ ] ^ _ ` { | } ~ + < = > A B C D E F G H I J K L M N O P Q R S T U V W X Y Z
Apostrophes (') and hyphens (-) are ignored, with one exception: If two text strings are the same except for a hyphen, the text with the hyphen is sorted last.
Logical values: In logical values, FALSE is placed before TRUE.
Error values: All error values are equal.
Blanks: Blanks are always placed last.
The default sort order matters because that is how Excel was designed to compare different data types. Logical values are always after text and numbers. Error values are always after that. Blanks are always last. When you use comparison operators (<, <=, =, etc.), Excel uses the same comparison algorithm as the sort (or more likely, the sort algorithm uses the comparison operator code, which makes them identical).
TRUE<>1 according to the sort order, but --TRUE=1. The formula parser recognized that you're trying to negate something. If it's a Boolean value, it converts it to 0 or 1. There's nothing 0-ish or 1-ish about the Boolean value, it's just the result of an internal Type Coercion function. If you type --"SomeString" it does the same thing. It sends the string into the Type Coercion function that reports back 'Unable to coerce' and ends up as #VALUE! in the cell.
That's the 'Why it behaves that way' answer. I don't know the 'Why did they design it that way' answer.
Obviously the Boolean values TRUE/FALSE are a different data type from numbers. Check this (http://msdn.microsoft.com/en-us/library/office/bb687869.aspx) to see that Boolean values are stored in 2 bytes (or whatever a short integer is for a given architecture). However, that is only the memory where the data is stored; Excel actually has a special data class for Boolean values. Specifically: xltypeNum for numbers, xltypeStr for strings, and xltypeBool for the type we are discussing.
Comparisons between values of the same type are clear, but what does TRUE<1000 do? Probably nothing meaningful or useful.
Ways to overcome this issue:
=ABS(BOOLEAN_VAR), i.e. =ABS(FALSE) --> 0 and =ABS(TRUE) --> 1
or
=INT(BOOLEAN_VAR), i.e. =INT(FALSE) --> 0 and =INT(TRUE) --> 1
or
=BOOLEAN_VAR*1, i.e. =FALSE*1 --> 0 and =TRUE*1 --> 1
or
=+BOOLEAN_VAR, i.e. =+FALSE --> 0 and =+TRUE --> 1
As you can see, each of these forces Excel to output a numeric type, either by passing the Boolean to a function or by using it in an expression.
I have a big array in Matlab like this:
A =
{1x5 cell}
{1x7 cell}
{1x27 cell}
{1x11 cell}
...
where the cells look like this:
C{1}
ans =
'apple' 'banana' 'kiwi' 'orange'
I want to find which cells in A contain duplicated information, like:
C{27}
ans =
'turtle' 'kiwi' 'fox' 'badger'
I.e. here I want to see that C{1} and C{27} share the duplicate word 'kiwi'.
So I can manually look at them and decide where I should remove the duplicate where I see fit.
Sorry, I'm not going to provide a coded solution, more the process I'd use so that you can start coding. If you then have any specific problems, feel free to post a question.
I would use nchoosek to generate an array of all the pairs (2-element combinations) of cells in the cell array C, so
nCells = length(C);
nPerms = nchoosek(1:nCells,2);
You can then loop over all the pairs, using intersect to see if there are common strings. Store each result in a cell, since the number of common strings varies from pair to pair.
result{i} = intersect(C{nPerms(i,1)},C{nPerms(i,2)});
This will give you a cell array listing all common strings, and with the nPerms array you'll have the two rows with the common string. If intersect gives you trouble because the cells hold different numbers of elements, one option is a temporary cell array padded out with blank cells so that each element in C is the same length, created prior to the loop.
This will calculate the longest cell in the array C by calculating the number of elements (@numel) in each cell, followed by calculating the maximum.
cSize = cellfun(@numel,C);
maxSize = max(cSize);
We can then define a function to pad out blank cells
fcn = @(x) [x cell(1,maxSize - numel(x))];
paddedC = cellfun(fcn,C,'UniformOutput',false);
This should give you a cell array with same number of elements in each cell. You can then use this cell array in your loop testing each permutation.
No doubt someone will turn up with a one line cellfun solution but I hope that this is enough to get you started.
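For what it's worth, here is a minimal sketch of the pairwise loop itself. In my experience intersect accepts cell arrays of strings of different lengths directly, so this version skips the padding step; treat that as an assumption to check on your MATLAB version. The small C below is made-up sample data, and the names pairs, shared and dupPairs are only illustrative.

```matlab
% Hypothetical sample data standing in for the real array
C = {{'apple' 'banana' 'kiwi' 'orange'}, ...
     {'cat' 'dog'}, ...
     {'turtle' 'kiwi' 'fox' 'badger'}};

pairs = nchoosek(1:numel(C), 2);      % every pair of cell indices
shared = cell(size(pairs, 1), 1);     % common strings for each pair

for i = 1:size(pairs, 1)
    shared{i} = intersect(C{pairs(i,1)}, C{pairs(i,2)});
end

% Keep only the pairs that actually share at least one string
hasDup = ~cellfun('isempty', shared);
dupPairs = pairs(hasDup, :);
dupWords = shared(hasDup);
```

Each row of dupPairs names two cells to inspect by hand, and the matching entry of dupWords lists the strings they have in common.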