Produce a table/array with a string column and a numeric column

I have a cell array of strings such as:
my_cell = {'apple.csv', 'banana.csv', 'orange.csv'}
(from reading in datasets). I also have a vector of data:
my_number = [1 2 3]
I want to output a table/array that has the names in the first column and some numbers in the second.
my_output=['apple' 1; 'banana' 2; 'orange' 3]

I think this will only work if your vector is the same length as your cell array:
my_output = cell(numel(my_cell), 2);
for i = 1:numel(my_cell)
my_output(i,:) = {my_cell{i}, my_number(i)}; % name in column 1, number in column 2
end

In order to mix strings and numbers in the same array, your "table" needs to be a cell array:
my_cell = {'apple.csv','banana.csv','orange.csv'}; % data
my_number = [1 2 3]; % data
my_output = cell(length(my_cell),2); % initialize output cell array
[my_output{:,1}] = deal(my_cell{:}); % assign first column of cell array
my_number_cell = num2cell(my_number); % convert vector to cell
[my_output{:,2}] = deal(my_number_cell{:}); % assign second column of cell array
gives
>> disp(my_output)
'apple.csv' [1]
'banana.csv' [2]
'orange.csv' [3]
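The desired output in the question shows the names without the .csv extension, which the answers above keep. A minimal sketch that strips it, assuming every name ends in '.csv'; it also checks the lengths explicitly, since the pairing is purely positional:

```matlab
my_cell = {'apple.csv', 'banana.csv', 'orange.csv'};
my_number = [1 2 3];

% The pairing is element by element, so the lengths must agree
assert(numel(my_cell) == numel(my_number), 'my_cell and my_number must have the same length');

% Remove a trailing '.csv' from each name (regexprep accepts a cell array of strings)
names = regexprep(my_cell, '\.csv$', '');

% Column 1: names, column 2: the numbers, each wrapped in its own cell
my_output = [names(:), num2cell(my_number(:))];
disp(my_output)
```

If the extensions vary, a pattern such as '\.[^.]*$' would remove whatever follows the last dot instead.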

Other than a cell array, you can also use dataset() (Statistics Toolbox), which, after some initial overhead, is lighter than a cell array and also lets you access its fields with . (dot) syntax, i.e. 'struct' syntax:
% Example input
my_cell = repmat({'apple.csv'; 'banana.csv'; 'orange.csv'} ,1000,1);
my_number = repmat([1; 2; 3],1000,1);
% a is a cell array, b is a dataset
a = [my_cell(:), num2cell(my_number(:))]
b = dataset({my_cell(:), 'name'},{my_number(:),'number'})
Displayed variables (first three rows shown):
a =
'apple.csv' [1]
'banana.csv' [2]
'orange.csv' [3]
b =
name number
'apple.csv' 1
'banana.csv' 2
'orange.csv' 3
Alternative ways to index a dataset():
b(:,1)
ans =
name
'apple.csv'
'banana.csv'
'orange.csv'
b.name
ans =
'apple.csv'
'banana.csv'
'orange.csv'
b(:,'number')
ans =
number
1
2
3
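In MATLAB R2013b and later, dataset is marked as not recommended and table is the suggested replacement; it supports the same dot syntax and indexing by variable name without needing a toolbox. A minimal sketch with the small three-row input (the variable names name and number are simply the ones used above):

```matlab
my_cell = {'apple.csv'; 'banana.csv'; 'orange.csv'};
my_number = [1; 2; 3];

% Build a table whose variables (columns) are called 'name' and 'number'
t = table(my_cell(:), my_number(:), 'VariableNames', {'name', 'number'});

names = t.name;          % cell array of the names (dot syntax)
numbers = t.number;      % numeric column vector
sub = t(:, 'number');    % one-variable sub-table, indexed by name
disp(t)
```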

Related

How to split a string into substrings of length n?

How would I split a string into subarrays of length n in MATLAB?
E.g.:
Input: "ABCDEFGHIJKL", with sub arrays of length 3
Output: {ABC}, {DEF}, {GHI}, {JKL}
If the string length is not a multiple of n, you probably need a loop or arrayfun:
x = 'ABCDEFGHIJK'; % length 11
n = 3;
result = arrayfun(@(k) x(k:min(k+n-1, end)), 1:n:numel(x), 'UniformOutput', false)
Alternatively, accumarray can be used as well:
x = 'ABCDEFGHIJK';
n = 3;
result = accumarray(floor((0:numel(x)-1).'/n)+1, x, [], @(t) {t.'}).';
Either of the above gives, in this example,
result =
1×4 cell array
{'ABC'} {'DEF'} {'GHI'} {'JK'}
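Another loop-free option is mat2cell, which splits an array into pieces of given widths; the only work is computing those widths, with a shorter last piece when numel(x) is not a multiple of n. A sketch, assuming x is a non-empty character row vector:

```matlab
x = 'ABCDEFGHIJK';   % length 11
n = 3;

% Chunk boundaries 0, n, 2n, ... plus the end; their differences are the chunk widths
widths = diff([0:n:numel(x)-1, numel(x)]);

% Split the 1-by-11 char array into a single row and columns of the given widths
result = mat2cell(x, 1, widths)
```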
A regular expression can do the job here:
str = 'abcdefgh';
pattern = '.{1,3}'; % groups of 3 characters; if fewer than 3 are left at the end, take the rest
res = regexp(str, pattern, 'match')
which gives:
res =
1×3 cell array
{'abc'} {'def'} {'gh'}
If you only want to match complete groups of 3 characters:
pattern = '.{3}'; % this gives {'abc'} {'def'} but no {'gh'}
If the string length is a multiple of 3, this should do it :)
s = 'ABCDEFGHIJKL';
result = cellstr(reshape(s, 3, []).') % 4x1 cell: 'ABC', 'DEF', 'GHI', 'JKL'
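The reshape one-liner errors when the length is not a multiple of 3. One workaround is to pad with spaces first and let cellstr strip the trailing padding again. A sketch, with the caveat that any genuine trailing spaces in the last chunk are stripped as well:

```matlab
str = 'ABCDEFGHIJK';   % length 11, not a multiple of 3
n = 3;

% Spaces needed to reach the next multiple of n (0 if already a multiple)
padLen = mod(-numel(str), n);
padded = [str, repmat(' ', 1, padLen)];

% Each column of the reshaped matrix is one chunk; transpose so each row is a chunk.
% cellstr removes trailing whitespace, i.e. the padding; the final .' makes a 1-by-k cell.
result = cellstr(reshape(padded, n, []).').'
```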

Compare 1 string with a cell array of strings with indexes (Matlab)

I have one string and one cell array of strings:
F = 'ABCD'
R = {'ACBD','CDAB','CABD'};
I would like to compare F with each string in R position by position: F(1) = 'A' and R{1}(1) = 'A' are equal, so we count 1; F(2) = 'B' and R{1}(2) = 'C' differ, so we count 0; and so on through every position of every string.
For 'ABCD' and 'ACBD' we get same = 2 and dif = 2.
How can I compare F with all the elements of R using this rule and get total(same) and total(dif)?
Assuming all strings in R have the same length as F, you can use cellfun:
same = cellfun( @(r) sum(F==r), R )
This results in
2 0 1
That is, the number of matching positions for each string in R. If you want dif:
dif = numel(F)-same;
If you want the totals:
tot_same = sum(same);
tot_dif = sum(dif);
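Because all strings have the same length as F, they can also be stacked into a character matrix with char(R) and compared in a single step. A sketch using bsxfun (on R2016b and later, char(R) == F works directly through implicit expansion); it assumes equal lengths, since char pads shorter strings with spaces:

```matlab
F = 'ABCD';
R = {'ACBD', 'CDAB', 'CABD'};

% One row per string in R: a 3-by-4 char matrix
M = char(R);

% Compare every row with F position by position, then count matches per row
same = sum(bsxfun(@eq, M, F), 2).';
dif = numel(F) - same;

tot_same = sum(same);
tot_dif = sum(dif);
```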

cell array of strings to matrix

A = {'a','b','c','b','a',...}
A is a 1x400 cell array, and I want to create a matrix from A in which each 'a' becomes 1, each 'b' becomes 2 and each 'c' becomes 3.
Thank you.
Specific Case
For the simple specific case in the question, you can use char to convert all the cell elements to characters and then subtract 96, which is the ASCII code of 'a' minus 1:
A_numeric = char(A)-96
Sample run:
>> A
A =
'a' 'b' 'c' 'b' 'a'
>> A_numeric = char(A)-96
A_numeric =
1
2
3
2
1
Generic Case
For a generic substitution case, you need to do a bit more work:
%// Inputs
A = {'correct','boss','cat','boss','correct','cat'}
newcellval = {'correct','cat','boss'}
newnumval = [8,2,5]
[unqcell,~,idx] = unique(A,'stable')
[~,newcell_idx,unqcell_idx] = intersect(newcellval,unqcell,'stable')
A_numeric = newnumval(changem(idx,newcell_idx,unqcell_idx))
Sample input-output -
>> A,newcellval,newnumval
A =
'correct' 'boss' 'cat' 'boss' 'correct' 'cat'
newcellval =
'correct' 'cat' 'boss'
newnumval =
8 2 5
>> A_numeric
A_numeric =
8 5 2 5 8 2
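changem belongs to the Mapping Toolbox. Without it, ismember can do the lookup directly: its second output gives, for each element of A, the position of that string in newcellval, which then indexes newnumval. A sketch, assuming every string in A appears in newcellval (a missing one yields position 0, which is checked below):

```matlab
A = {'correct','boss','cat','boss','correct','cat'};
newcellval = {'correct','cat','boss'};
newnumval = [8,2,5];

% loc(k) is the index of A{k} within newcellval (0 if it is missing)
[found, loc] = ismember(A, newcellval);
if ~all(found)
    error('Some strings in A have no entry in newcellval.');
end

A_numeric = newnumval(loc)
```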
That's easy:
result = cell2mat(A)-'a'+1
For a generic association of letters to numbers 1,2,3...:
letters2numbers = 'abc'; %// 'a'->1, 'b'->2 etc.
[~, result] = ismember(cell2mat(A), letters2numbers)
For a generic association of strings to numbers 1,2,3...:
strings2numbers = {'hi', 'hello', 'hey', 'good morning', 'howdy'};
A = {'hello', 'hi', 'hello', 'howdy', 'bye'};
[~, result] = ismember(A, strings2numbers)
In this example,
result =
2 1 2 5 0
Use a for loop that iterates over A and converts each character to a number:
outMat = zeros(1, numel(A)); % preallocate the result
for k = 1:numel(A)
outMat(k) = A{k} - 96; % 'a' (ASCII 97) -> 1, 'b' -> 2, ...
end
I hope it works.

Given two strings, how do I find number of reoccurences of one in another?

For example, s1='abc', s2='kokoabckokabckoab'.
Output should be 2 (the number of times s1 appears in s2).
I am not allowed to use for or strfind; I can only use reshape, repmat and size.
I thought of reshaping s2 so that it would contain all possible substrings of length 3:
s2 =
kok
oko
koa
oab
.... etc
But I'm having trouble from here.
Assuming you have reshaped s2 into the format shown in your post, you can replicate s1 so that it has as many rows as the reshaped s2 matrix and then compare the two with the equality operator. A row consisting of all 1s means we have found a match, so you simply look for the rows whose sum equals the length of s1. Referring back to my post on dividing a string into overlapping substrings, we can decompose your string into the form you posted like so:
%// Define s1 and s2 here
s1 = 'abc';
len = length(s1);
s2 = 'kokoabckokabckoab';
%// Hankel starts here
c = (1 : len).';
r = (len : length(s2)).';
nr = length(r);
nc = length(c);
x = [ c; r((2:nr)') ]; %-- build vector of user data
cidx = (1:nc)';
ridx = 0:(nr-1);
H = cidx(:,ones(nr,1)) + ridx(ones(nc,1),:); % Hankel subscripts
ind = x(H); % actual data
%// End Hankel script
%// Now get our data
subseqs = s2(ind.');
%// Case where string length is 1
if len == 1
subseqs = subseqs.';
end
subseqs contains the matrix of overlapping substrings that you described in your post. There is one small catch: when s1 has length 1, the substring matrix must be a single column vector, but without the length check above the code produces a row vector, so we simply transpose the result in that case.
Now replicate s1 as many times as there are rows in subseqs, so that the copies are stacked into a 2D matrix, and then apply the equality operator:
eqs = subseqs == repmat(s1, size(subseqs,1), 1);
Next, sum along each row (dimension 2) and check which sums equal the length of your string. This produces a single column vector where 1 indicates a match and 0 otherwise:
sum(eqs, 2) == len
ans =
0
0
0
0
1
0
0
0
0
0
1
0
0
0
0
Finally, to add up how many times the substring matched, you just have to add up all elements in this vector:
out = sum(sum(eqs, 2) == len)
out =
2
As such, we have two instances where abc is found in your string.
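The Hankel construction can also be replaced by plain index arithmetic built with repmat, one of the functions the question allows. Wrapping the indexing in reshape keeps the windows as rows even when s1 has a single character, which removes the special case above. A sketch, assuming numel(s2) >= numel(s1):

```matlab
s1 = 'abc';
s2 = 'kokoabckokabckoab';
len = numel(s1);
nWin = numel(s2) - len + 1;   % number of windows of length len

% Row k of idx holds k, k+1, ..., k+len-1
idx = repmat((1:nWin).', 1, len) + repmat(0:len-1, nWin, 1);

% Force an nWin-by-len result (plain s2(idx) returns a row when len == 1)
subseqs = reshape(s2(idx), size(idx));

% A window matches when all len characters agree with s1
matches = sum(subseqs == repmat(s1, nWin, 1), 2) == len;
out = sum(matches)
```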
Here is another approach:
s1='abc';
s2='bkcokbacaabcsoabckokabckoabc';
[a,b] = ismember(s2,s1);
b = [0 0 b 0 0];
a1=circshift(b,[0 -1]);
a2=circshift(b,[0 -2]);
sum((b==1)&(a1==2)&(a2==3))
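It gives 2 for your input (s1 occurs twice in s2) and 4 for my example, and it seems to work well if ismember is okay. Note that the final line is written for a three-character s1 whose characters are all distinct.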
Just for the fun of it: this can be done with nlfilter from the Image Processing Toolbox (I just discovered this function today and am eager to apply it!):
ds1 = double(s1);
ds2 = double(s2);
result = sum(nlfilter(ds2, [1 numel(ds1)], @(x) all(x==ds1)));

Is it possible to concatenate a string with a series of numbers?

I have a string (e.g. 'STA') and I want to make a cell array that is a concatenation of my string with the numbers from 1 to X.
I want the code to do something like the for loop below:
a = {};
for i = 1:Num
a = [a {strcat('STA',num2str(i))}];
end
I want the end result to be a 1xNum cell array:
a = 'STA1' 'STA2' 'STA3' ...
(I want to use it in the ColumnFormat property of a uitable.)
ColumnFormat = {a,... % 1
'numeric',... % 2
'numeric'}; % 3
I'm not sure how to start directly with STA1, but this gets you a list that starts with a bare STA (genvarname makes repeated names unique by appending numbers), so you can simply remove the first entry:
N = 5;
[X{1:N+1}] = deal('STA');
a = genvarname(X);
a = a(2:end);
You can do it with a combination of the NUM2STR (converts numbers to strings), CELLSTR (converts strings to a cell array), STRTRIM (removes extra spaces) and STRCAT (combines with another string) functions.
You need (:) to make sure the numeric vector is a column.
x = 1:Num;
a = strcat( 'STA', strtrim( cellstr( num2str(x(:)) ) ) );
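A shorter variant builds each label with sprintf inside arrayfun; sprintf adds no padding for multi-digit numbers, so no strtrim is needed. A sketch, where Num = 12 is just an example value for the count in the question:

```matlab
Num = 12;   % example count (assumed value for illustration)

% One 'STA<k>' label per k; 'UniformOutput', false collects the char results in a cell array
a = arrayfun(@(k) sprintf('STA%d', k), 1:Num, 'UniformOutput', false);
```

The result is a 1-by-Num cell array {'STA1', 'STA2', ..., 'STA12'}.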
As an alternative that works for matrices of any size and dimension, I have this helper function:
function c = num2cellstr(xx, varargin)
%Converts matrix of numeric data to cell array of strings
c = cellfun(@(x) num2str(x,varargin{:}), num2cell(xx), 'UniformOutput', false);
Try this:
N = 10;
a = cell(1,N);
for i = 1:N
a(i) = {['STA',num2str(i)]};
end
