sort string according to first characters matlab - string

I have a cell array composed of several strings:
names = {'2name_19surn', '3name_2surn', '1name_2surn', '10name_1surn'}
and I would like to sort them according to their numeric prefix.
I tried
[~,index] = sortrows(names.');
sorted_names = names(index);
but I get
sorted_names = {'10name_1surn', '1name_2surn', '2name_19surn', '3name_2surn'}
instead of the desired
sorted_names = {'1name_2surn', '2name_19surn', '3name_2surn','10name_1surn'}
Any suggestions?

Simple approach using regular expressions:
r = regexp(names,'^\d+','match'); %// get prefixes
[~, ind] = sort(cellfun(@(c) str2num(c{1}), r)); %// convert to numbers and sort
sorted_names = names(ind); %// use index to build result
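A slightly shorter variant of the same idea, as a sketch: the 'once' option makes regexp return one char vector per cell instead of a nested cell, and str2double accepts a cell array directly. It assumes every name starts with at least one digit; a name without a numeric prefix would become NaN and sort last.

```matlab
names = {'2name_19surn', '3name_2surn', '1name_2surn', '10name_1surn'};
prefixes = regexp(names, '^\d+', 'match', 'once'); % cell array of digit strings
[~, ind] = sort(str2double(prefixes));              % numeric, not lexicographic, order
sorted_names = names(ind);
```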

As long as speed is not a concern, you can loop through all strings and save the leading digits in an array. Then sort the array as usual:
names = {'2name_2', '3name', '1name', '10name'}
number_in_string = zeros(1,length(names));
% Read numbers from the strings
for ii = 1:length(names)
    % '%d' reads a decimal integer ('%i' would treat a leading 0 as octal)
    number_in_string(ii) = sscanf(names{ii}, '%d', 1);
end
% Sort names using number_in_string
[sorted, idx] = sort(number_in_string)
sorted_names = names(idx)

Take the file sort_nat from here
Then
names = {'2name', '3name', '1name', '10name'}
sort_nat(names)
returns
sorted_names = {'1name', '2name', '3name','10name'}

You can deal with arbitrary patterns using a regular expression:
names = {'2name', '3name', '1name', '10name'}
match = regexpi(names,'(?<number>\d+)\D+','names'); % created with regex editor on rubular.com
match = cell2mat(match); % cell array to struct array
clear numbersStr
[numbersStr{1:length(match)}] = match.number; % cell array with number strings
numbers = str2double(numbersStr); % vector of numbers
[B,I] = sort(numbers); % sorted vector of numbers (B) and the indices (I)
clear namesSorted
[namesSorted{1:length(names)}] = names{I} % cell array with sorted name strings

Related

Matlab substring

I am trying to average a specific value from a long string and cannot figure out how to pull a value out of the middle of the string. I would like to pull out the 27 from this string, and the corresponding value from the other strings, and add them up:
2015-10-1,33,27,20,29,24,20,96,85,70,30.51,30.40,30.13,10,9,4,10,6,,T,5,Snow,35
2015-10-1,33,27,20,29,24,20,96,85,70,30.51,30.40,30.13,10,9,4,10,6,,T,5,Snow,35
2015-10-2,37,32,27,32,27,23,92,80,67,30.35,30.31,30.28,10,10,7,7,4,,T,8,Rain-Snow,19
2015-10-3,39,36,32,35,33,29,100,90,79,30.30,30.17,30.11,10,7,0,8,3,,0.21,8,Fog-Rain,13
2015-10-4,40,37,34,38,36,33,100,96,92,30.23,30.19,30.14,2,1,0,6,0,,0.13,8,Fog-Rain,27
2015-10-5,46,38,30,38,34,30,100,91,61,30.19,30.08,29.93,10,7,0,6,2,,T,6,Fog-Rain,23
fid = fopen('MonthlyHistory.html');
for i=1:2
str = fgets(fid);
c = strsplit(str,',');
mean=mean+c;
end
fprintf('Average Daily Temperature: %d\n',mean);
Method 1: use readtable
I'm guessing this is pulled from Weather Underground? Take your CSV file and make sure it is saved with a .csv extension. Then what I would do is:
my_data = readtable('MonthlyHistory.csv');
This reads the whole file into the highly convenient table variable type. Then you can do:
average_daily_temp = my_data.MeanTemperatureF; %or whatever it is called in the table
I find tables are a super convenient way to keep track of tabular data. (plus readtable is pretty good).
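To make the readtable route concrete, here is a minimal self-contained sketch. It writes a tiny stand-in CSV first so that it can run anywhere; the file contents and the column names (Date, MaxTemperatureF, MeanTemperatureF) are assumptions for illustration, not the real headers of the downloaded file.

```matlab
% Build a small stand-in file (hypothetical column names).
T = table({'2015-10-1'; '2015-10-2'; '2015-10-3'}, [33; 37; 39], [27; 32; 36], ...
    'VariableNames', {'Date', 'MaxTemperatureF', 'MeanTemperatureF'});
csv_file = [tempname '.csv'];
writetable(T, csv_file);

% Read it back and average one column by name.
my_data = readtable(csv_file);
average_temp = mean(my_data.MeanTemperatureF);
fprintf('Average daily temperature: %.1f\n', average_temp);

delete(csv_file); % clean up the temporary file
```

With the real file, only the readtable and mean lines are needed, using whatever variable name readtable assigns to the column of interest.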
Method 2: continue your approach...
fid = fopen('mh2.csv');
str = fgets(fid); % May need to read off a few lines to get to the
str = fgets(fid); % numbers
my_data = []; % initialize an empty array
while true
    str = fgets(fid); % read off a line
    if ~ischar(str) % fgets returns -1 (not a char array) at end of file
        break; % exit loop
    end
    ca = strsplit(str,','); % split string into a cell array of strings
    my_data(end+1,:) = str2double(ca{3}); % convert the 3rd element to a number and store it
end
fclose(fid);
Now my_data is an array holding the 3rd element of each line.
You can use textscan. You might be able to simplify your code with it as well, but for a single string it works like this:
S='2015-10-1,33,27,20,29,24,20,96,85,70,30.51,30.40,30.13,10,9,4,10,6,,T,5,Snow,35'
T=textscan(S,'%s','Delimiter',',')
str2double(T{1}{3}) %// the value we want is the 3rd field
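Extending the single-string case to the averaging the question asks for might look like the sketch below. The three rows are copied from the question; holding them in a cell array named rows is an assumption made here so the example runs without the file.

```matlab
rows = { ...
    '2015-10-1,33,27,20,29,24,20,96,85,70,30.51,30.40,30.13,10,9,4,10,6,,T,5,Snow,35', ...
    '2015-10-2,37,32,27,32,27,23,92,80,67,30.35,30.31,30.28,10,10,7,7,4,,T,8,Rain-Snow,19', ...
    '2015-10-3,39,36,32,35,33,29,100,90,79,30.30,30.17,30.11,10,7,0,8,3,,0.21,8,Fog-Rain,13'};

vals = zeros(numel(rows), 1);
for k = 1:numel(rows)
    T = textscan(rows{k}, '%s', 'Delimiter', ','); % split one row into fields
    vals(k) = str2double(T{1}{3});                 % third field of this row
end
average_temp = mean(vals);
fprintf('Average Daily Temperature: %.2f\n', average_temp);
```

Note that fprintf needs a floating-point format such as %.2f here, since the mean is generally not an integer.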

Compare two arrays of strings

I have two lists of strings as a column in a table (PM25_spr{i}.MonitorID and O3_spr{i}.MonitorID). The lists are of different lengths. I want to compare the first 11 characters of each entry and pull out the index for each list where they are the same.
Example
List 1:
'01-003-0010-44201'
'01-027-0001-44201'
'01-051-0001-44201'
'01-073-0023-44201'
'01-073-1003-44201'
'01-073-1005-44201'
'01-073-1009-44201'
'01-073-1010-44201'
'01-073-2006-44201'
'01-073-5002-44201'
'01-073-5003-44201'
'01-073-6002-44201'
List 2:
'01-073-0023-88101'
'01-073-2003-88101'
'04-013-0019-88101'
'04-013-9992-88101'
'04-013-9997-88101'
'05-119-0007-88101'
'05-119-1008-88101'
'06-019-0008-88101'
'06-029-0014-88101'
'06-037-0002-88101'
'06-037-1103-88101'
'06-037-4002-88101'
'06-059-0001-88101'
'06-065-8001-88101'
'06-067-0010-88101'
'06-073-0003-88101'
'06-073-1002-88101'
'06-073-1007-88101'
'08-001-0006-88101'
'08-031-0002-88101'
I tried intersect, which isn't the right approach for what I want to do. I'm not sure how to use ismember given that I only want to look at the first 11 characters.
I tried strncmp, but it fails with the error "Inputs must be the same size or either one can be a scalar":
chars2compare = length('18-097-0083');
strncmp(O3_spr{i}.MonitorID, PM25_spr{i}.MonitorID,chars2compare)
PM25_spr_MID = cell(length(years),1); % Preallocate cell array
for n = 1:length(PM25_spr{i}.MonitorID)
    s = char(PM25_spr{i}.MonitorID(n)); % Convert string to char
    PM25_spr_MID{i}(n) = cellstr(s(1:11)); % Pull out 1-11 characters and convert to cell
end
O3_spr_MID = cell(length(years),1); % Preallocate cell array
for n = 1:length(O3_spr{i}.MonitorID)
    s = char(O3_spr{i}.MonitorID(n));
    O3_spr_MID{i}(n) = cellstr(s(1:11));
end
[C, ia, ib] = intersect(O3_spr_MID{i}, PM25_spr_MID{i})
PerCap_spr_O3{i} = O3_spr{i}(ia,:);
PerCap_spr_PM25{i} = PM25_spr{i}(ib,:);
Assuming list1 and list2 are the two input cell arrays, you can use a few approaches.
I. Operate on cell arrays
With intersect -
%// Clip off after first 11 characters in each cell of the input cell arrays
list1_f11 = arrayfun(@(n) list1{n}(1:11),1:numel(list1),'uni',0)
list2_f11 = arrayfun(@(n) list2{n}(1:11),1:numel(list2),'uni',0)
%// Use intersect to find common indices in the input cell arrays
[~,idx_list1,idx_list2] = intersect(list1_f11,list2_f11)
With ismember -
%// Clip off after first 11 characters in each cell of the input cell arrays
list1_f11 = arrayfun(@(n) list1{n}(1:11),1:numel(list1),'uni',0)
list2_f11 = arrayfun(@(n) list2{n}(1:11),1:numel(list2),'uni',0)
%// Use ismember to find common indices in the input cell arrays
[LocA,LocB] = ismember(list1_f11,list2_f11);
idx_list1 = find(LocA)
idx_list2 = LocB(LocA)
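As a worked example of the cell-array approach, the sketch below uses the first five entries of List 1 and the first three of List 2 from the question. It clips with cellfun rather than arrayfun, which is only a stylistic choice, and it assumes every entry has at least 11 characters.

```matlab
list1 = {'01-003-0010-44201'; '01-027-0001-44201'; '01-051-0001-44201'; ...
         '01-073-0023-44201'; '01-073-1003-44201'};
list2 = {'01-073-0023-88101'; '01-073-2003-88101'; '04-013-0019-88101'};

% Keep only the first 11 characters of every entry.
list1_f11 = cellfun(@(s) s(1:11), list1, 'UniformOutput', false);
list2_f11 = cellfun(@(s) s(1:11), list2, 'UniformOutput', false);

% Indices of the entries whose 11-character prefixes match.
[common, idx_list1, idx_list2] = intersect(list1_f11, list2_f11);
```

Here the only shared prefix is '01-073-0023', found at position 4 of list1 and position 1 of list2.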
II. Operate on char arrays
We can use char directly on the input cell arrays to get 2D char arrays, as working with them could be faster than working with cells.
With intersect + 'rows' -
%// Convert to char arrays
list1c = char(list1)
list2c = char(list2)
%// Clip char arrays after first 11 columns
list1c_f11 = list1c(:,1:11)
list2c_f11 = list2c(:,1:11)
%// Use intersect with 'rows' option
[~,idx_list1,idx_list2] = intersect(list1c_f11,list2c_f11,'rows')
III. Operate on numeric arrays
We can convert the char arrays further to numeric arrays with just one column as that could lead to faster solutions.
%// Convert to char arrays
list1c = char(list1)
list2c = char(list2)
%// Clip char arrays after first 11 columns
list1c_f11 = list1c(:,1:11)
list2c_f11 = list2c(:,1:11)
%// Remove char columns of hyphens (3 and 7 for the given input)
list1c_f11(:,[3 7])=[];
list2c_f11(:,[3 7])=[];
%// Convert char arrays to numeric arrays
ncols = size(list1c_f11,2);
list1c_f11num = (list1c_f11 - '0')*(10.^(ncols-1:-1:0))'
list2c_f11num = (list2c_f11 - '0')*(10.^(ncols-1:-1:0))'
From this point on, there are three more approaches, listed next.
With ismember (would be memory efficient, but maybe not fast across all data sizes) -
[LocA,LocB] = ismember(list1c_f11num,list2c_f11num);
idx_list1 = find(LocA)
idx_list2 = LocB(LocA)
With intersect (could be slow) -
[~,idx_list1,idx_list2] = intersect(list1c_f11num,list2c_f11num)
With bsxfun (would be memory inefficient, but maybe fast for small to decent sized inputs) -
[idx_list1,idx_list2] = find(bsxfun(@eq,list1c_f11num,list2c_f11num'))
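The numeric route can be checked on a small case. The sketch below takes a few entries from the question's lists, keeps the nine digit columns of the 11-character prefix, and compares with implicit expansion (available from R2016b) in place of bsxfun; on older releases the bsxfun call above does the same job. The variable names n1, n2 and w are chosen here for illustration.

```matlab
list1 = {'01-003-0010-44201'; '01-073-0023-44201'; '01-073-1003-44201'};
list2 = {'01-073-0023-88101'; '04-013-0019-88101'};

digit_cols = [1 2 4 5 6 8 9 10 11];        % first 11 columns minus the hyphens at 3 and 7
c1 = char(list1); c1 = c1(:, digit_cols);
c2 = char(list2); c2 = c2(:, digit_cols);

w  = 10.^(numel(digit_cols)-1:-1:0).';     % place values, most significant digit first
n1 = (c1 - '0') * w;                       % one number per row of list1
n2 = (c2 - '0') * w;                       % one number per row of list2

% Rows of the comparison matrix index list1, columns index list2.
[idx_list1, idx_list2] = find(n1 == n2.');
```

The comparison matrix has numel(list1)*numel(list2) elements, which is the memory cost mentioned above.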

meshgrid equivalent for strings

I have two cells:
Months1 = {'F','G','H','J','K','M','N','Q','U','V','X','Z'};
Months2 = 2009:2014;
How do I generate all combinations without running a loop so that I achieve the following:
Combined = {'F09','F10','F11','',...,'G09',.....};
Basically all combinations of Months1 and Months2 as in meshgrid.
If you don't need cells and can use char arrays only, this can work:
Months1 = ['F','G','H','J','K','M','N','Q','U','V','X','Z']';
Months2 = num2str((2009:2014)');
[x, y] = meshgrid(1:12, 1:6);
Combined = strcat(Months1(x(:)), Months2(y(:),:));
and you can then reshape if required. I'm not yet sure how to do this with cells, though.
Inspired by this post.
My take on the problem would apply ndgrid, datestr (to handle any millennium) and strcat to do the work:
yearStrings = datestr(datenum(num2str(Months2(:)),'yyyy'),'yy');
[ii,jj] = ndgrid(1:numel(Months2),1:numel(Months1));
Combined = strcat(Months1(jj(:)).',yearStrings(ii(:),:)).'
Note: Years change faster than the prefixed letters, so Months2 goes first in ndgrid, then Months1. IMO, this is more intuitive behavior than meshgrid, which forces you to think in x,y space to predict how the outputs vary.
Or instead of the strcat line:
tmp = [Months1(jj(:)).',yearStrings(ii(:),:)].';
Combined = cellstr(reshape([tmp{:}],[],numel(ii)).').'
You can convert the cell array to indices with grp2idx, then use meshgrid, then strcat to combine the strings. First, you also need to convert the numeric Months2 vector to a cell array of strings.
[id1,id2] = meshgrid(grp2idx(Months1),Months2);
Months2cell = cellstr(num2str(id2(:)-2000,'%02d'))';
Combined = strcat( Months1(id1(:)), Months2cell );
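For a version that stays with cell arrays and needs no toolbox function, a minimal sketch with ndgrid and strcat follows. It uses shortened inputs, and the helper variable yy (two-digit year strings built with sprintf and mod) is an assumption introduced here.

```matlab
Months1 = {'F', 'G', 'H'};
Months2 = 2009:2011;

% Two-digit year strings, e.g. 2009 -> '09'.
yy = arrayfun(@(y) sprintf('%02d', mod(y, 100)), Months2, 'UniformOutput', false);

% ii runs over years (fastest), jj over month letters.
[ii, jj] = ndgrid(1:numel(yy), 1:numel(Months1));

% Element-wise concatenation of two equally sized cell arrays, then flatten to a row.
Combined = reshape(strcat(Months1(jj), yy(ii)), 1, []);
```

Because ndgrid varies its first output fastest, the result runs through all years for 'F' before moving on to 'G'.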

Is it possible to concatenate a string with series of number?

I have a string (e.g. 'STA') and I want to make a cell array that is a concatenation of my string with the numbers from 1 to X.
I want the code to do something like the for loop below:
for i = 1:Num
a = [{a} {strcat('STA',num2str(i))}]
end
I want the end results to be in the form of {<1xNum cell>}
a = 'STA1' 'STA2' 'STA3' ...
(I want to set this to a uitable in the ColumnFormat array)
ColumnFormat = {{a},... % 1
'numeric',... % 2
'numeric'}; % 3
I'm not sure about starting with STA1, but this should get you a list that starts with STA (from which I guess you could remove the first entry).
N = 5;
[X{1:N+1}] = deal('STA');
a = genvarname(X);
a = a(2:end);
You can do it with a combination of the NUM2STR (converts numbers to strings), CELLSTR (converts strings to a cell array), STRTRIM (removes extra spaces) and STRCAT (combines with another string) functions.
You need (:) to make sure the numeric vector is a column.
x = 1:Num;
a = strcat( 'STA', strtrim( cellstr( num2str(x(:)) ) ) );
As an alternative for matrices with more dimensions, I have this helper function:
function c = num2cellstr(xx, varargin)
%Converts matrix of numeric data to cell array of strings
c = cellfun(@(x) num2str(x,varargin{:}), num2cell(xx), 'UniformOutput', false);
Try this:
N = 10;
a = cell(1,N);
for i = 1:N
    a(i) = {['STA',num2str(i)]};
end
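The same result can be sketched without an explicit loop by combining arrayfun and sprintf. Num is set to 3 here only to keep the example small.

```matlab
Num = 3;
% One formatted string per number, collected into a 1-by-Num cell array.
a = arrayfun(@(k) sprintf('STA%d', k), 1:Num, 'UniformOutput', false);
```

For the uitable use mentioned in the question, a pop-up column is, as far as I know, specified as a cell array of choices placed directly inside ColumnFormat, so {a, 'numeric', 'numeric'} is probably what is wanted rather than {{a}, ...}, which adds one more level of nesting.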

MATLAB empty cell(n,m) array of strings?

What is the quickest way to create an empty cell array of strings?
cell(n,m)
creates an empty cell array of double.
Is there a similar command that creates empty strings instead?
Depends on what you want to achieve really. I guess the simplest method would be:
repmat({''},n,m);
Assignment to all cell elements using the colon operator will do the job:
m = 3; n = 5;
C = cell(m,n);
C(:) = {''}
The cell array created by cell(n,m) contains empty matrices ([]), which are 0-by-0 doubles, not empty strings.
If you really need to pre-populate your cell array with empty strings:
test = cell(n,m);
test(:) = {''};
test(1,:) = {'1st row'};
test(:,1) = {'1st col'};
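The difference between the two kinds of "empty" cell array can be checked with iscellstr, as in this small sketch (the sizes and the names A and B are arbitrary):

```matlab
n = 2; m = 3;

A = cell(n, m);          % every cell holds [] (a 0-by-0 double)
B = cell(n, m);
B(:) = {''};             % every cell holds '' (a 0-by-0 char)

a_is_cellstr = iscellstr(A);   % false: the contents are not char arrays
b_is_cellstr = iscellstr(B);   % true
first_class  = class(A{1});    % 'double'
```

This matters for functions that require a cell array of strings as input.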
This is a super old post, but I'd like to add an approach that may also work. I am not sure whether it works in earlier versions of MATLAB; I tried it in 2018+ versions and it works.
Instead of using repmat, it seems even more convenient and intuitive to start a cell string array like this:
C(1:10) = {''} % Array of empty char
The same approach can be used to generate cell arrays holding other data types:
C(1:10) = {""} % Array of empty string
C(1:10) = {[]} % Array of empty double, same as cell(1,10)
But be careful with scalars:
C(1:10) = {1} % a 1x10 cell with all values = {[1]}
C(1:10) = 1 % !!!Error
C(1:10) = '1' % !!!Error
C(1:10) = [] % an 1x0 empty cell array

Resources