How to extract excel column and import it to MATLAB? - excel

Hello I have an excel file with multiple columns up to "CD". My code words perfectly for excel files with 26 columns but after that it doesn't work.
[ia ib] = ismember(header, {item});
letter = find(ia)+'A'-1;
cell = fprintf('%c:%c', letter, letter);
out = xlsread('filename', cell);
This code works until I get to Z:Z. When I get to AA, AB, AC,... it won't work. How do I extract the AA, CD, BG columns?

It doesn't work because you are assuming that your letter for the header is only one character as indicated by:
letter = find(ia) + 'A' - 1;
What are you doing is essentially building the ASCII code for a capital letter between A to Z. This will obviously fail if you are trying to find a header with more than one letter. What you'll need to do is build a dictionary of all possible characters of AA to ZZ, then you can use the output of find(ia) on this dictionary if we exceed the column Z in your Excel sheet to extract out the right sequence of characters you need, then finally use this sequence of characters to index into your Excel sheet.
Referencing this question, I'm going to take Rody Oldenhuis's answer. Therefore, construct this dictionary of all possible two characters:
x = 'ABCDEFGHIJKLMNOPQRSTUVWXYZ';
y = unique(nchoosek(repmat(x, 1,26), 2), 'rows');
y will be a N x 2 character matrix where each row is a unique permutation of two letters from A-Z (so AA, AB, etc.). The way the code is written, it should maintain the exact ordering like how Excel does it for columns that go beyond Z, so AA, AB, AC, ... AZ, BA, BB, BC, ... BZ, ..., ZX, ZY, ZZ. Next, we need to see whether or not the found index is between 1 and 26. If it is, you can use your previous code. If it isn't, then we'll do what we outlined above. Note that I will have to subtract this found index by 26 so I can index into this character array that we created. Assuming that header has all unique entries, we can do:
[ia ib] = ismember(header, {item});
index = find(ia, 1);
if index <= 26 %// Check if we are within columns A - Z
letter = index + 'A' - 1;
else %// If not, we are at a column that is beyond Z.
x = 'ABCDEFGHIJKLMNOPQRSTUVWXYZ';
y = unique(nchoosek(repmat(x, 1,26), 2), 'rows');
index = index - 26; %// Subtract by 26 to reference into character array
letter = y(index,:);
end
cell = sprintf('%s:%s', letter, letter);
out = xlsread('filename', cell);
Note that I changed your fprintf call to sprintf as you desire to store the string representation of which cells you want to access. fprintf (in your case) will print to the screen, which is probably not what you want. Also, I've changed the variable cell to ce as cell is an actual function in MATLAB.
Also note that I've changed the %c formatting string to %s as the header may consist of more than one character.

Related

How to find common elements in string cells?

I want to find the common elements in multiple (>=2) cell arrays of strings.
A related question is here, and the answer proposes to use the function intersect(), however it works for only 2 inputs.
In my case, I have more than two cells, and I want to obtain a single common subset. Here is an example of what I want to achieve:
c1 = {'a','b','c','d'}
c2 = {'b','c','d'}
c3 = {'c','d'}
c_common = my_fun({c1,c2,c3});
in the end, I want c_common={'c','d'}, since only these two strings occur in all the inputs.
How can I do this with MATLAB?
Thanks in advance,
P.S. I also need the indices from each input, but I can probably do that myself using the output c_common, so not necessary in the answer. But if anyone wants to tackle that too, my actual output will be like this:
[c_common, indices] = my_fun({c1,c2,c3});
where indices = {[3,4], [2,3], [1,2]} for this case.
Thanks,
Listed in this post is a vectorized approach to give us the common strings and indices using unique and accumarray. This would work even when the strings are not sorted within each cell array to give us indices corresponding to their positions within it, but they have to be unique. Please have a look at the sample input, output section* to see such a case run. Here's the implementation -
C = {c1,c2,c3}; % Add more cell arrays here
% Get unique strings and ID each of the strings based on their uniqueness
[unqC,~,unqID] = unique([C{:}]);
% Get count of each ID and the IDs that have counts equal to the number of
% cells arrays in C indicate that they are present in all cell arrays and
% thus are the ones to be finally selected
match_ID = find(accumarray(unqID(:),1)==numel(C));
common_str = unqC(match_ID)
% ------------ Additional work to get indices ----------------
N_str = numel(common_str);
% Store matches as a logical array to be used at later stages
matches = ismember(unqID,match_ID);
% Use ismember to find all those indices in unqID and subtract group
% lengths from them to give us the indices within each cell array
clens = [0 cumsum(cellfun('length',C(1:end-1)))];
match_index = reshape(find(matches),N_str,[]);
% Sort match_index along each column based on the respective unqID elements
[m,n] = size(match_index);
[~,sidx] = sort(reshape(unqID(matches),N_str,[]),1);
sorted_match_index = match_index(bsxfun(#plus,sidx,(0:n-1)*m));
% Subtract cumulative group lens to give us indices corres. to each cell array
common_idx = bsxfun(#minus,sorted_match_index,clens).'
Please note that at the step that calculates match_ID : accumarray(unqID(:),1) could be replaced by histc(unqID,1:max(unqID)). Also, histcounts be another alternative there.
*Sample input, output -
c1 =
'a' 'b' 'c' 'd'
c2 =
'b' 'c' 'a' 'd'
c3 =
'c' 'd' 'a'
common_str =
'a' 'c' 'd'
common_idx =
1 3 4
3 2 4
3 1 2
As noted in the comments to this question, there is a file in File Exchange called "MINTERSECT -- Multiple set intersection." at http://www.mathworks.com/matlabcentral/fileexchange/6144-mintersect-multiple-set-intersection that contains simple code to generalize intersect to multiple sets. In a nutshell, the code gets the output from performing intersect on the first pair of cells and then perform intersect on this output with the next cell. This process continues until all cells have been compared. Note that the author points out that the code is not particularly efficient but it may be sufficient for your use case.

How to print a number within a string in matlab

I would like to use the command text to type numbers within 57 hexagons. I want to use a loop:
for mm=1:57
text(x(m),y(m),'m')
end
where x(m) and y(m) are the coordinates of the text .
The script above types the string "m" and not the value of m. What am I doing wrong?
Jubobs pretty much told you how to do it. Use the num2str function. BTW, small typo in your for loop. You mean to use mm:
for mm=1:57
text(x(mm),y(mm),num2str(mm));
end
The reason why I've even decided to post an answer is because you can do this vectorized without a loop, which I'd also like to write an answer for. What you can do place each number into a character array where each row denotes a unique number, and you can use text to print out all numbers simultaneously.
m = sprintfc('%2d', 1:57);
d = reshape([m{:}], 2, 57).';
text(x, y, d);
The (undocumented!) function sprintfc takes a formatting specifier and an array and creates a cell array of strings where each cell is the string version of each element in the array you supply. In order to ensure that the character array has the same number of columns per row, I ensure that each string takes up 2 characters, and so any number less than 10 will have a blank space at the beginning. I then convert the cell array of strings into a character array by converting the cell array into a comma-separated list of strings and I reshape the matrix into an acceptable form, and then I call text with all of the pairs of x and y, with the corresponding labels in m together on the screen.

Matlab Split-String

Hello,
I have a little problem.
I have a txt file with over 200mb.
It looks like:
%Hello World
%second sentences
%third;
%example
12.02.2014
;-400;-200;200
;123;233;434
%Hello World
%second sentences
%third
%example
12.02.2014
;-410;200;20300
;63;23;43
;23;44;78213
..
... ...
I need only the Values after the semicolon like:
Value1{1,1}=[-400]; Value{1,2}=[-200]; and Value{1,3}=[200]
Value2{1,1}=[123]; Value{1,2}=[233]; and Value{1,3}=[434]
and so on.
Hase someone an ideas, how i can split the values in a cell array or vektor?
Thus, the variables must be:
Var1=[-400 -200 200;
434 233 434;
Var2=[
-410 200 20300;
63 23 43;
23 44 28213]
I will seperate, after every date in a another Value. Example when i have 55 Dates, i will have 55 Values.
shareeditundeleteflag
This could be one approach assuming a uniformly structured data (3 valid numbers per row) -
%// Read in entire text data into a cell array
data = importdata('sample.txt','');
%// Remove empty lines
data = data(~cellfun('isempty',data))
%// Find boundaries based on delimiter "%example"
exmp_delim_matches = arrayfun(#(n) strcmp(data{n},'%example'),1:numel(data))
bound_idx = [find(exmp_delim_matches) numel(exmp_delim_matches)]
%// Find lines that start with delimiter ";"
matches_idx = find(arrayfun(#(n) strcmp(data{n}(1),';'),1:numel(data)))
%// Select lines that start with character ";" and split lines based on it
%// Split selected lines based on the delimiter ";"
split_data = regexp(data(matches_idx),';','split')
%// Collect all cells data into a 1D cell array
all_data = [split_data{:}]
%// Select only non-empty cells and convert them to a numeric array.
%// Finally reshape into a format with 3 numbers per row as final output
out = reshape(str2double(all_data(~cellfun('isempty',all_data))),3,[]).' %//'
%// Separate out lines based on the earlier set bounds
out_sep = arrayfun(#(n) out(matches_idx>bound_idx(n) & ...
matches_idx<bound_idx(n+1),:),1:numel(bound_idx)-1,'Uni',0)
%// Display results for verification
celldisp(out_sep)
Code run -
out_sep{1} =
-400 -200 200
123 233 434
out_sep{2} =
-410 200 20300
63 23 43
23 44 78213
A brute force approach would be to open up your file, then read each line one at a time. With each line, you check to see if the first character is a semi-colon and if it is, split up the string by the ; delimiter from the second character of the line up until the end. You will receive a cell array of strings, so you'd have to convert this into an array of numbers. Because you will probably have each line containing a different amount of numbers, let's store each array into a cell array where each element in this cell array will contain the numbers per line. As such, do something like this. Let's assume your text file is stored in text.txt:
fid = fopen('text.txt');
if fid == -1
error('Cannot find file');
end
nums = {};
while true
st = fgetl(fid);
if st == -1
break;
end
if st(1) == ';'
st_split = strsplit(st(2:end), ';');
arr = cellfun(#str2num, st_split);
nums = [nums arr];
end
end
Let's go through the above code slowly. We first use fopen to open up the file for reading. We check to see if the ID returned from fopen is -1 and if that's the case, we couldn't find or open the file so spit out an error. Next, we declare an empty cell array called nums which will store our numbers that you are getting when parsing your text file.
Now, until we reach the end of the file, get one line of text starting from the top of the file and we proceed to the end. We use fgetl for this. If we read a -1, this means we have reached the end of the file, so get out of the loop. Else, we check to see if the first character is ;. If it is, then we take a look at the second character until the end of this line, and split the string based on the ; character with strsplit. The result of this will be a cell array of strings where each element is the string representation of your number. You need to convert this cell array back into a numeric array, and so what you would need to do is apply str2num to each element in this cell. You can either use a loop to go through each cell, or you can conveniently use [cellfun](http://www.mathworks.com/help/matlab/ref/cellfun.html to allow you to go through each element in this cell and convert the string representation into a numeric value. The resulting output of cellfun will give you a numeric array representation of each value delimited by the ; character for that line. We then place this array into a single cell stored in nums.
The end result of this entire code will give you numeric arrays that are based on what you are looking for stored in nums.
Warning
I am assuming that your text file only has numbers delimited by ; characters if we encounter a line that starts with ;. If this is not the case, then my code will not work. I'm assuming this isn't the case!

MATLAB find a row of chars in a matrix

There is a char matrix. Each row is a word. All the words/rows have the same length.
Is there anyway to find a target word in this matrix using find() function?
Say M=[o k a y; g o o d; h a v e]; A target word W='have'; Is there any function to search W in M and return the row index?
When I try index = find(isequal(W,M)) and index = find(isequal(W,M(:,:))), they both return empty array.
I know that I can use linear search, i.e. to compare row by row, just wonder if there is a built-in function to solve this problem. Thanks!
Another approach would be to use ismember. Assuming that M is a character array like you have defined, you would thus do:
W = 'have';
[~,idx] = ismember(M, W, 'rows');
The first input is the character array defined, the second input is the string we are searching for, and we choose 'rows' as the flag as we want to search on a row basis. Each word is defined in one row. The first output is true/false that is of the same size as M that tells you whether or not we have found the word indexed by that row in M is a match to the one you're searching for. Because we only want to know where that word is, we can suppress that first output. idx tells you which row the word was found.
When we call this function, we get:
idx =
3
This means that the third row contains the word you're looking for. However, if you're dead set on using find, consider using ismember in conjunction with find:
W = 'have';
idx = find(ismember(M, W, 'rows'));
idx =
3
However, the advantage with the above approach is that it will find all locations that match the particular word you're looking for.
You need to compare M with a string. I am assuming M contains of individual characters as follows:
M=['o' 'k' 'a' 'y'; 'g' 'o' 'o' 'd'; 'h' 'a' 'v' 'e']
and W='have'
So in order to compare each row of M with W, you need to use strcmp. For that, you need M to be a cell array. You can convert each row of M into a cell array by using mat2cell.
equalRows=strcmp(W, mat2cell(M,ones(1,size(M,1)),size(M,2)));
% Answer
equalRows =
0
0
1
Use find command on the above output to get the indices.

Union of cell array of cells

I'm looking for the way to do the union of two cell arrays of cell arrays of strings. For example:
A = {{'one' 'two'};{'three' 'four'};{'five' 'six'}};
B = {{'five' 'six'};{'seven' 'eight'};{'nine' 'ten'}};
And I'd like to get something like:
C = {{'one' 'two'};{'three' 'four'};{'five' 'six'};{'seven' 'eight'};{'nine' 'ten'}};
But when I use C = union(A, B) MATLAB returns an error saying:
Input A of class cell and input B of class cell must be cell arrays of strings, unless one is a string.
Does anyone know how to do something like this in a hopefully simple way? I'd greatly appreciate it.
ALTERNATIVE: A way to have a cell array of separated strings in any other way than a cell array of cell array of strings would be also useful, but as far as I know, it's not possible.
Thank you!
C=[A;B]
allWords=unique([A{:};B{:}])
F=cell2mat(cellfun(#(x)(ismember(allWords,x{1})+2*ismember(allWords,x{2}))',C,'uni',false))
[~,uniqueindices,~]=unique(F,'rows')
C(sort(uniqueindices))
What my code does: it builds up a list of all words allwords, then this list is used to build up a matrix which contains the correlation between the rows and which word they contain. 1=Match for first wird, 2=Match for second word. Finally, on this numeric matrix unique can be applied to get the indices.
Including my update, now the 2 words per cell is hardcoded. To get rid of this limitation it would be neseccary to replace the anonymous function (#(x)(ismember(allWords,x{1})+2*ismember(allWords,x{2}))) with a more generic implementation. Probably using cellfun again.
Union doesn't seem like compatible for cell arrays of cells. So, we need to look for some workaround.
One approach would be to get the data from A and B concatenated vertically. Then, along each column assign each cell of strings an unique ID. Those IDs can then be combined into a double array that opens up the possibility of of using unique with 'rows' option to get us the desired output. This is precisely achieved here.
%// Slightly complicated input for safest verification of results
A = {{'three' 'four'};
{'five' 'six'};
{'five' 'seven'};
{'one' 'two'}};
B = {{'seven' 'eight'};
{'five' 'six'};
{'nine' 'ten'};
{'three' 'six'};};
t1 = [A ; B] %// concatenate all cells from A and B vertically
t2 = vertcat(t1{:}) %// Get all the cells of strings from A and B
t22 = mat2cell(t2,size(t2,1),ones(1,size(t2,2)));
[~,~,row_ind] = cellfun(#(x) unique(x,'stable'),t22,'uni',0)
mat1 = horzcat(row_ind{:})
[~,ind] = unique(mat1,'rows','stable')
out1 = t2(ind,:) %// output as a cell array of strings, used for verification too
out = mat2cell(out1, ones(1,size(out1,1)),size(out1,2)) %//desired output
Output -
out1 =
'three' 'four'
'five' 'six'
'five' 'seven'
'one' 'two'
'seven' 'eight'
'nine' 'ten'
'three' 'six'

Resources