I use the intersect function to find the common strings in two cell arrays A and B ([~,idx]=intersect(A,B)) and save the indices in idx. Then I extract the common strings with A(idx). I see that the results are sorted in alphabetical order. I want them in the same order as they appear in A. Why are these strings sorted alphabetically?
Thanks.
As explained in the documentation, you can add the option setOrder='stable' to preserve the order of the elements:
[C,ia,ib] = intersect(A,B,'stable');
You don't even have to capture the indices (unless used elsewhere), as the example shows:
C = intersect([7 0 5],[7 1 5],'stable')
returns
C = [7 5]
and
A='hgfedcba';
B='hac';
[~,ia]=intersect(A,B,'stable');
ia'
> 1 6 8
A(ia)
> hca
For Matlab R2011b and older:
If your MATLAB version doesn't support the 'stable' option, you can just use sort on the indices:
[~,ia]=intersect(A,B);
ia=sort(ia);
ia'
> 1 6 8
A(ia)
> hca
Duplicates
If there are duplicates in A, intersect will only find each of them once. ismember might be better suited if you want to find all of the duplicates:
A='hhggffeeddccbbaa';
B='hac';
[~,ia]=intersect(A,B);
ia=sort(ia);
A(ia)
> hca
[~,loc] = ismember(A,B);
ia=find(loc~=0); % because you want the indices (logical indexing is also an option of course)
A(ia)
> hhccaa
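For comparison, the two behaviours above (one hit per distinct element versus every occurrence) can be sketched in Python. This only illustrates the logic and says nothing about how MATLAB implements intersect or ismember; the function names `stable_intersect` and `all_matches` are made up for this example, and the indices are 0-based.

```python
def stable_intersect(a, b):
    """Return 0-based indices into `a` of the first occurrence of each
    element that also appears in `b`, in the order they occur in `a`."""
    wanted = set(b)
    seen = set()
    idx = []
    for i, x in enumerate(a):
        if x in wanted and x not in seen:
            seen.add(x)
            idx.append(i)
    return idx


def all_matches(a, b):
    """Return 0-based indices into `a` of every element found in `b`."""
    wanted = set(b)
    return [i for i, x in enumerate(a) if x in wanted]


A = "hhggffeeddccbbaa"
B = "hac"
first_hits = "".join(A[i] for i in stable_intersect(A, B))  # "hca"
every_hit = "".join(A[i] for i in all_matches(A, B))        # "hhccaa"
```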
List 1 = ['_','_','_','a','_']
List 2 = ['d','_','_','_','_']
I'd like to merge two lists of identical length where:
Alphabets in List 2 must replace special characters in List 1 but
Special characters in List 2 must not replace alphabets in List 1.
The merged list would look like this:
Merged List = ['d','_','_','a','_']
Any tips on the fastest way to accomplish this would be much appreciated!
You can use enumerate and list comprehension:
mergedlist = [c if c.isalnum() else list2[i] for i, c in enumerate(list1)]
NB: I have used isalnum to distinguish letters from "special" characters. Depending on your definition of what makes a character special, you may need a different function to do that.
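If you would rather not index into the second list, zip gives the same result by pairing the elements directly. A minimal sketch, assuming both lists have the same length and that "special" means "not alphanumeric":

```python
list1 = ['_', '_', '_', 'a', '_']
list2 = ['d', '_', '_', '_', '_']

# Keep the character from list1 when it is alphanumeric; otherwise take
# whatever list2 has at the same position.
merged = [a if a.isalnum() else b for a, b in zip(list1, list2)]
print(merged)  # ['d', '_', '_', 'a', '_']
```

Note that when both lists hold a letter at the same position, the one from list1 wins.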
So there are a bunch of ways to reverse a list to turn it into a new list, using helper columns. I've written some code that does use helper columns to reverse a list and then use it.
I'm wondering how I would reverse a list without using a helper column for use in a sumproduct - for example,
=sumproduct(Reverse(A1:A3),B1:B3)
This array formula will reverse the order of a vertical array:
= INDEX(B18:B21,N(IF({1},MAX(ROW(B18:B21))-ROW(B18:B21)+1)))
Also, this reverses a horizontal array:
= INDEX(A1:D1,N(IF({1},MAX(COLUMN(A1:D1))-COLUMN(A1:D1)+1)))
EDIT
More generally, to vertically flip a matrix instead of just an array (which is just a one-dimensional matrix), use this array formula: (e.g. for range A1:D2)
= INDEX(A1:D2,N(IF({1},MAX(ROW(A1:D2))-ROW(A1:D2)+1)),
N(IF({1},COLUMN(A1:D2)-MIN(COLUMN(A1:D2))+1)))
And to horizontally flip a matrix, use this:
= INDEX(A1:D2,N(IF({1},ROW(A1:D2)-MIN(ROW(A1:D2))+1)),
N(IF({1},MAX(COLUMN(A1:D2))-COLUMN(A1:D2)+1)))
And a bonus... to flip a matrix horizontally and vertically in one shot (i.e. rotate it 180 degrees):
= INDEX(A1:D2,N(IF({1},MAX(ROW(A1:D2))-ROW(A1:D2)+1)),
N(IF({1},MAX(COLUMN(A1:D2))-COLUMN(A1:D2)+1)))
Actually this last one here could more generally be used to flip either a horizontal or vertical array.
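The index arithmetic behind these formulas (new position = MAX - position + 1 along the flipped axis, unchanged along the other) may be easier to follow outside of Excel. Here is a rough Python sketch of the same idea; `flip` is a hypothetical helper, and it works on 0-based list indices rather than worksheet rows and columns.

```python
def flip(matrix, vertical=True, horizontal=True):
    """Flip a list-of-rows matrix by remapping indices, mirroring the
    MAX(ROW(...)) - ROW(...) + 1 arithmetic used in the formulas."""
    n_rows, n_cols = len(matrix), len(matrix[0])
    out = []
    for r in range(n_rows):
        # Source row: counted from the bottom when flipping vertically.
        src_r = n_rows - 1 - r if vertical else r
        out.append([
            # Source column: counted from the right when flipping horizontally.
            matrix[src_r][n_cols - 1 - c if horizontal else c]
            for c in range(n_cols)
        ])
    return out


grid = [[1, 2, 3, 4],
        [5, 6, 7, 8]]

rotated = flip(grid)                       # [[8, 7, 6, 5], [4, 3, 2, 1]]
flipped_v = flip(grid, horizontal=False)   # [[5, 6, 7, 8], [1, 2, 3, 4]]
flipped_h = flip(grid, vertical=False)     # [[4, 3, 2, 1], [8, 7, 6, 5]]
```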
This will do what you are asking:
=SUMPRODUCT(INDEX(A:A,N(IF(1,{3;2;1}))),B1:B3)
To make a little more dynamic you can use this array formula:
=SUM(INDEX(A:A,N(IF(1,LARGE(ROW(A1:A3),ROW(A1:A3)))))*B1:B3)
Being an array formula, it needs to be confirmed with Ctrl-Shift-Enter, instead of Enter when exiting Edit mode.
Found an easy solution that works in the latest Excel versions:
=SORTBY(rowarray, COLUMN(rowarray), -1)
=SORTBY(columnarray, ROW(columnarray), -1)
For what it's worth, here's another completely different method to reverse an array. (I'm posting this as a separate answer just because it is apples and oranges to the other answer I already provided.)
Instead of reversing the order of the array by reversing the indexing, it is also possible to use matrix multiplication (MMULT) to accomplish this.
If your data in A1:A3 is {1;3;5} (for example) then the following matrix multiplication effectively reverses this array:
[0 0 1]   [1]   [5]
[0 1 0] * [3] = [3]
[1 0 0]   [5]   [1]
In order to generate that matrix of 1's and 0's above, you can do this (line break added for readability):
= (ROW(INDEX(A:A,1):INDEX(A:A,ROWS(A1:A3)))=
(COLUMN(INDEX(1:1,ROWS(A1:A3)))-COLUMN(INDEX(1:1,1):INDEX(1:1,ROWS(A1:A3)))+1))+0
So in the end, the formula to reverse this array would be:
= MMULT((ROW(INDEX(A:A,1):INDEX(A:A,ROWS(A1:A3)))=
(COLUMN(INDEX(1:1,ROWS(A1:A3)))-COLUMN(INDEX(1:1,1):INDEX(1:1,ROWS(A1:A3)))+1))+0,A1:A3)
This same line of thinking can be used to reverse a horizontal array. For example if A1:C1 is {1,3,5}, then:
            [0 0 1]
[1 3 5]  *  [0 1 0]  =  [5 3 1]
            [1 0 0]
Note how the matrix of 1's and 0's is the second argument this time instead of the first argument.
Using the same general line of reasoning, you can get to this formula to reverse a horizontal array.
= MMULT(A1:C1,(ROW(INDEX(A:A,1):INDEX(A:A,COLUMNS(A1:C1)))=
(COLUMN(INDEX(1:1,COLUMNS(A1:C1)))-COLUMN(INDEX(1:1,1):INDEX(1:1,COLUMNS(A1:C1)))+1))+0)
This method has two major disadvantages compared to the N(IF(...)) solution, namely:
It's way longer.
It only works for numbers since MMULT requires numbers, but the other method works if the cells contain anything (e.g. text).
I was using this solution to reverse arrays without helper columns until just recently when I learned about the N(IF(...)) alternative.
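To see why the matrix of 1's and 0's does the job, here is a small Python sketch of the same multiplication. It is only an illustration of the idea, not of what MMULT does internally; `exchange_matrix` and `matvec` are names invented for this example.

```python
def exchange_matrix(n):
    """n x n matrix with ones on the anti-diagonal and zeros elsewhere."""
    return [[1 if r + c == n - 1 else 0 for c in range(n)] for r in range(n)]


def matvec(m, v):
    """Multiply matrix m (list of rows) by column vector v (flat list)."""
    return [sum(m_rc * v_c for m_rc, v_c in zip(row, v)) for row in m]


data = [1, 3, 5]
reversed_data = matvec(exchange_matrix(len(data)), data)  # [5, 3, 1]
```

Each row of the matrix has a single 1, so each output element simply picks one input element; row r picks element n - 1 - r, which is the reversal. This also shows why the approach is limited to numbers: the values really are multiplied and summed.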
Actually you can make the formula in your Question work (with a small UDF()):
Pick a cell and enter:
=SUMPRODUCT(reverse(A1:A3),B1:B3)
with this in a standard module:
Public Function reverse(rng As Range)
Dim ary(), N As Long, i As Long, r As Range
N = rng.Count
ReDim ary(1 To N)
i = N
For Each r In rng
ary(i) = r.Value
i = i - 1
Next r
With Application.WorksheetFunction
reverse = .Transpose(ary)
End With
End Function
With Dynamic Arrays
This is the best solution I have found.
=SORTBY(list,SEQUENCE(ROWS(list),1,ROWS(list),-1))
https://exceljet.net/formula/reverse-a-list-or-range
=INDIRECT(ADDRESS(ROW()+COUNTA(A1:$A$3)+1-ROW(A1),1))
I want to find the common elements in multiple (>=2) cell arrays of strings.
A related question is here, and the answer proposes to use the function intersect(), however it works for only 2 inputs.
In my case, I have more than two cells, and I want to obtain a single common subset. Here is an example of what I want to achieve:
c1 = {'a','b','c','d'}
c2 = {'b','c','d'}
c3 = {'c','d'}
c_common = my_fun({c1,c2,c3});
in the end, I want c_common={'c','d'}, since only these two strings occur in all the inputs.
How can I do this with MATLAB?
Thanks in advance,
P.S. I also need the indices from each input, but I can probably do that myself using the output c_common, so not necessary in the answer. But if anyone wants to tackle that too, my actual output will be like this:
[c_common, indices] = my_fun({c1,c2,c3});
where indices = {[3,4], [2,3], [1,2]} for this case.
Thanks,
Listed in this post is a vectorized approach to give us the common strings and indices using unique and accumarray. This would work even when the strings are not sorted within each cell array to give us indices corresponding to their positions within it, but they have to be unique. Please have a look at the sample input, output section* to see such a case run. Here's the implementation -
C = {c1,c2,c3}; % Add more cell arrays here
% Get unique strings and ID each of the strings based on their uniqueness
[unqC,~,unqID] = unique([C{:}]);
% Get count of each ID and the IDs that have counts equal to the number of
% cells arrays in C indicate that they are present in all cell arrays and
% thus are the ones to be finally selected
match_ID = find(accumarray(unqID(:),1)==numel(C));
common_str = unqC(match_ID)
% ------------ Additional work to get indices ----------------
N_str = numel(common_str);
% Store matches as a logical array to be used at later stages
matches = ismember(unqID,match_ID);
% Use ismember to find all those indices in unqID and subtract group
% lengths from them to give us the indices within each cell array
clens = [0 cumsum(cellfun('length',C(1:end-1)))];
match_index = reshape(find(matches),N_str,[]);
% Sort match_index along each column based on the respective unqID elements
[m,n] = size(match_index);
[~,sidx] = sort(reshape(unqID(matches),N_str,[]),1);
sorted_match_index = match_index(bsxfun(@plus,sidx,(0:n-1)*m));
% Subtract cumulative group lens to give us indices corres. to each cell array
common_idx = bsxfun(@minus,sorted_match_index,clens).'
Please note that at the step that calculates match_ID, accumarray(unqID(:),1) could be replaced by histc(unqID,1:max(unqID)). histcounts could be another alternative there.
*Sample input, output -
c1 =
'a' 'b' 'c' 'd'
c2 =
'b' 'c' 'a' 'd'
c3 =
'c' 'd' 'a'
common_str =
'a' 'c' 'd'
common_idx =
1 3 4
3 2 4
3 1 2
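The counting idea (a string is common to all inputs exactly when it shows up as many times as there are inputs) can also be written down in a few lines of Python. This is a sketch of the logic rather than a translation of the MATLAB code; `common_with_indices` is a hypothetical name, and like the approach above it assumes that no single list contains duplicates. The positions are 1-based to match the sample output.

```python
from collections import Counter


def common_with_indices(cells):
    """Return the strings present in every list (sorted) and, per list,
    their 1-based positions. Assumes no list contains duplicates."""
    counts = Counter(s for cell in cells for s in cell)
    # A string seen len(cells) times must occur once in every list.
    common = sorted(s for s, n in counts.items() if n == len(cells))
    indices = [[cell.index(s) + 1 for s in common] for cell in cells]
    return common, indices


c1 = ['a', 'b', 'c', 'd']
c2 = ['b', 'c', 'a', 'd']
c3 = ['c', 'd', 'a']
common_str, common_idx = common_with_indices([c1, c2, c3])
# common_str -> ['a', 'c', 'd']
# common_idx -> [[1, 3, 4], [3, 2, 4], [3, 1, 2]]
```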
As noted in the comments to this question, there is a file in File Exchange called "MINTERSECT -- Multiple set intersection." at http://www.mathworks.com/matlabcentral/fileexchange/6144-mintersect-multiple-set-intersection that contains simple code to generalize intersect to multiple sets. In a nutshell, the code gets the output from performing intersect on the first pair of cells and then perform intersect on this output with the next cell. This process continues until all cells have been compared. Note that the author points out that the code is not particularly efficient but it may be sufficient for your use case.
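The pairwise approach described here is a fold over the list of inputs. As a rough Python illustration (not the File Exchange code itself; `intersect_all` is a made-up name, and it assumes at least one input list):

```python
from functools import reduce


def intersect_all(cells):
    """Intersect the first two lists, then intersect that result with the
    next list, and so on. The order follows the first list."""
    def pair(a, b):
        members = set(b)
        return [x for x in a if x in members]
    return reduce(pair, cells)


c_common = intersect_all([['a', 'b', 'c', 'd'], ['b', 'c', 'd'], ['c', 'd']])
# c_common -> ['c', 'd']
```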
I have a n x 1 cell that contains something like this:
chair
chair
chair
chair
table
table
table
table
bike
bike
bike
bike
pen
pen
pen
pen
chair
chair
chair
chair
table
table
etc.
I would like to rename these elements so they will reflect the number of occurrences up to that point. The output should look like this:
chair_1
chair_2
chair_3
chair_4
table_1
table_2
table_3
table_4
bike_1
bike_2
bike_3
bike_4
pen_1
pen_2
pen_3
pen_4
chair_5
chair_6
chair_7
chair_8
table_5
table_6
etc.
Please note that the underscore (_) is necessary. Could anyone help? Thank you.
Interesting problem! This is the procedure that I would try:
Use unique - the third output parameter in particular to assign each string in your cell array to a unique ID.
Initialize an empty array, then create a for loop that goes through each unique string - given by the first output of unique - and creates a numerical sequence from 1 up to as many times as we have encountered this string. Place this numerical sequence in the corresponding positions where we have found each string.
Use strcat to attach each element in the array created in Step #2 to each cell array element in your problem.
Step #1
Assuming that your cell array is defined as a bunch of strings stored in A, we would call unique this way:
[names, ~, ids] = unique(A, 'stable');
The 'stable' is important as the IDs that get assigned to each unique string are done without re-ordering the elements in alphabetical order, which is important to get the job done. names will store the unique names found in your array A while ids would contain unique IDs for each string that is encountered. For your example, this is what names and ids would be:
names =
'chair'
'table'
'bike'
'pen'
ids =
1
1
1
1
2
2
2
2
3
3
3
3
4
4
4
4
1
1
1
1
2
2
names is actually not needed in this algorithm. However, I have shown it here so you can see how unique works. Also, ids is very useful because it assigns a unique ID for each string that is encountered. As such, chair gets assigned the ID 1, followed by table getting assigned the ID of 2, etc. These IDs will be important because we will use these IDs to find the exact locations of where each unique string is located so that we can assign those linear numerical ranges that you desire. These locations will get stored in an array computed in the next step.
Step #2
Let's pre-allocate this array for efficiency. Let's call it loc. Then, your code would look something like this:
loc = zeros(numel(A), 1);
for idx = 1 : numel(names)
id = find(ids == idx);
loc(id) = 1 : numel(id);
end
As such, for each unique name we find, we look for every location in the ids array that matches this particular name found. find will help us find those locations in ids that match a particular name. Once we find these locations, we simply assign an increasing linear sequence from 1 up to as many names as we have found to these locations in loc. The output of loc in your example would be:
loc =
1
2
3
4
1
2
3
4
1
2
3
4
1
2
3
4
5
6
7
8
5
6
Notice that this corresponds with the numerical sequence (the right most part of each string) of your desired output.
Step #3
Now all we have to do is piece loc together with each string in our cell array. We would thus do it like so:
out = strcat(A, '_', num2str(loc));
What this does is that it takes each element in A, concatenates a _ character and then attaches the corresponding numbers to the end of each element in A. Because we want to output strings, you need to convert the numbers stored in loc into strings. To do this, you must use num2str to convert each number in loc into their corresponding string equivalents. Once you find these, you would concatenate each number in loc with each element in A (with the _ character of course). The output is stored in out, and we thus get:
out =
'chair_1'
'chair_2'
'chair_3'
'chair_4'
'table_1'
'table_2'
'table_3'
'table_4'
'bike_1'
'bike_2'
'bike_3'
'bike_4'
'pen_1'
'pen_2'
'pen_3'
'pen_4'
'chair_5'
'chair_6'
'chair_7'
'chair_8'
'table_5'
'table_6'
For your copying and pasting pleasure, this is the full code. Be advised that I've nulled out the first output of unique as we don't need it for your desired output:
[~, ~, ids] = unique(A, 'stable');
loc = zeros(numel(A), 1);
for idx = 1 : max(ids) % names is not captured here, so loop over the IDs instead
id = find(ids == idx);
loc(id) = 1 : numel(id);
end
out = strcat(A, '_', num2str(loc));
If you want an alternative to unique, you can work with a hash table, which in MATLAB means using a containers.Map object. You can then store the number of occurrences of each individual label and create the new labels on the go, as in the code below.
data={'table','table','chair','bike','bike','bike'};
map=containers.Map(data,zeros(numel(data),1)); % labels=keys, counts=values (zeroed)
new_data=data; % initialize matrix that will have outputs
for ii=1:numel(data)
map(data{ii}) = map(data{ii})+1; % increment counts of current labels
new_data{ii} = sprintf('%s_%d',data{ii},map(data{ii})); % format outputs
end
This is similar to rayryeng's answer but replaces the for loop by bsxfun. After the strings have been reduced to unique labels (line 1 of code below), bsxfun is applied to create a matrix of pairwise comparisons between all (possibly repeated) labels. Keeping only the lower "half" of that matrix and summing along rows gives how many times each label has previously appeared (line 2). Finally, this is appended to each original string (line 3).
Let your cell array of strings be denoted as c.
[~, ~, labels] = unique(c); %// transform each string into a unique label
s = sum(tril(bsxfun(@eq, labels, labels.')), 2); %'// accumulated occurrence number
result = strcat(c, '_', num2str(s)); %// build result
Alternatively, the second line could be replaced by the more memory-efficient
n = numel(labels);
M = cumsum(full(sparse(1:n, labels, 1)));
s = M((1:n).' + (labels-1)*n);
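If the lower-triangle trick is not obvious: row i of the comparison matrix marks every position whose label equals label i, and keeping only the lower triangle restricts that to positions up to and including i. Summing the row therefore counts how often the label has occurred so far. A deliberately naive Python sketch of just that step (the name `occurrence_by_comparison` is invented here, and it is quadratic like the bsxfun version):

```python
def occurrence_by_comparison(labels):
    """s[i] = number of positions j <= i with labels[j] == labels[i],
    i.e. the row sums of the lower triangle of the equality matrix."""
    return [
        sum(1 for j in range(i + 1) if labels[j] == labels[i])
        for i in range(len(labels))
    ]


labels = [3, 3, 1, 2, 2, 2]
s = occurrence_by_comparison(labels)  # [1, 2, 1, 1, 2, 3]
```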
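I'll give you some pseudocode; try it yourself, and post your code if it doesn't work.
Initialize a counter to 1
Iterate over the cell
If this is not the first element, check whether the string is the same as the previous value
Yes - increment the counter
else
No - reset the counter to 1
end
sprintf the string value + counter into a new array
Note that this numbers consecutive runs only. For a string that reappears later (chair_5 in your example), keep a separate counter per string instead of resetting it.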
Hope this helps!
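A counter kept per distinct string handles strings that come back later in the list. Here is a minimal Python sketch of that variant, with a dict holding the counts; `number_occurrences` is a hypothetical name and the sample data is made up for illustration.

```python
def number_occurrences(items):
    """Append _k to each string, where k is how many times that string
    has appeared so far (including the current position)."""
    counts = {}
    out = []
    for item in items:
        counts[item] = counts.get(item, 0) + 1
        out.append(f"{item}_{counts[item]}")
    return out


cells = ["chair", "chair", "table", "chair", "pen", "table"]
renamed = number_occurrences(cells)
# ['chair_1', 'chair_2', 'table_1', 'chair_3', 'pen_1', 'table_2']
```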
Is there a way to reference a data frame's column names as a variable, not a string (in R)? Say I want to get the first column name of data frame df. the code colnames returns...
> colnames(df)[[1]]
[1] "colname1"
The reason I ask is I'm having a hard time making the function subset generalized to any data frame. Say I wish to do a conditional subset on a data frame with a known conditional, but I don't know the column name at runtime (just the column number). Example --
> df<-data.frame( x=c(1:3), y=c(4:6))
> df.sub <- subset(df, df$y >5 )
But let's say I don't know the column name of df at runtime, only that it's column number 2. The function call
> df.sub <- subset(df, colnames(df)[[2]] >5 )
Doesn't work because colnames returns a string, and subset is 'smart' and looks inside df for the object name. Is there a good way around this? I could use [ 's instead but I feel the problem would be the same.
You should be able to use double square brackets successfully for either name or index number:
> subset(df, df[["y"]] > 5)
x y
3 3 6
> subset(df, df[[2]] > 5)
x y
3 3 6
However, note the following from the help page to subset:
Warning
This is a convenience function intended for use interactively. For
programming it is better to use the standard subsetting functions like
[, and in particular the non-standard evaluation of argument subset
can have unanticipated consequences.
And, to give some bad advice, you could also use get:
> subset(df, get(colnames(df)[2]) > 5)
x y
3 3 6
As @Roland notes in the comments, most R users would actually use something along the lines of:
> df[df[[2]] > 5, ]
x y
3 3 6