Remove characters from a cell array of strings (Matlab) - string

I have a cell array of strings. I need to extract say 1-to-n characters for each item. Strings are always longer than n characters. Please see:
data = { 'msft05/01/2010' ;
'ap01/01/2013' }
% For each string, the last 10 characters are removed and placed in the next column
answer = { 'msft' '05/01/2010' ;
'ap' '01/01/2013' }
Is there a vectorized solution possible? I have tried using cellfun but wasn't successful. Thanks.

data = { 'msft05/01/2010' ;
'ap01/01/2013' };
for i = 1:length(data)
s = data{i};
data{i} = {s(1:end-10) s(end-9:end)};
end
Sorry, I didn't notice that you need a vectorized solution. I can only suggest a one-liner with cellfun:
data = cellfun(@(s) {s(1:end-10) s(end-9:end)}, data, 'UniformOutput', false);

Related

MATLAB: Find locations of multiple characters inside string

How can I find the locations of certain characters within a string? This is my attempt:
Example = "Hello, this is Tom. I wonder, should I go run?";
SearchedCharacters = {'.','!',',','?'};
%Plan one
Locations = strfind(Example, SearchedCharacters);
%Plan two
Locations = cellfun(@(s)find(~cellfun('isempty',strfind(C,s))),SearchedCharacters,'uni',0);
Both of my plans give errors.
Finally, having the locations of the characters within the string, I would like to determine the second-to-last character of interest in the string. In this case it would be "," (just after the word "wonder"), at location 29.
Help will be appreciated.
Thanks.
You can use ismember and find.
Find the second last location:
Example = 'Hello, this is Tom. I wonder, should I go run?' ;
SearchedCharacters = '.!,?' ;
idx = ismember (Example, SearchedCharacters);
Loc = find (idx, 2, 'last');
if numel (Loc) < 2
error ('the requested character cannot be found')
end
SecondLast = Loc (1);
Find all locations:
Locations = find (idx);

Finding indexes of strings in a string array in Matlab

I have two string arrays and I want to find where each string from the first array is in the second array, so I tried this:
for i = 1:length(array1);
cmp(i) = strfind(array2,array1(i,:));
end
This doesn't seem to work and I get an error: "must be one row".
Just for the sake of completeness, an array of strings is nothing but a char matrix. This can be quite restrictive because all of your strings must have the same number of characters. That is the case @neerad29's solution handles.
However, instead of an array of strings you might want to consider a cell array of strings, in which every string can be arbitrarily long. Here is the very same @neerad29 solution, but with cell arrays. The code is also a little more compact:
a = {'abcd'; 'efgh'; 'ijkl'};
b = {'efgh'; 'abcd'; 'ijkl'};
pos=[];
for i=1:size(a,1)
AreStringFound=cellfun(@(x) strcmp(x,a(i,:)),b);
pos=[pos find(AreStringFound).']; % transpose so that several matches still concatenate as a row
end
But some additional words might be needed:
pos will contain the indices, 2 1 3 in our case, just like @neerad29's solution
cellfun() is a function which applies a given function, the strcmp() in our case, to every cell of a given cell array. x will be the generic cell from array b which will be compared with a(i,:)
the cellfun() returns a boolean array (AreStringFound) with true in position j if a(i,:) is found in the j-th cell of b and the find() will indeed return the value of j, our proper index. This code is more robust and works also if a given string is found in more than one position in b.
strfind won't work, because it is used to find a string within another string, not within an array of strings. So, how about this:
a = ['abcd'; 'efgh'; 'ijkl'];
b = ['efgh'; 'abcd'; 'ijkl'];
cmp = zeros(1, size(a, 1));
for i = 1:size(a, 1)
for j = 1:size(b, 1)
if strcmp(a(i, :), b(j, :))
cmp(i) = j;
break;
end
end
end
cmp =
2 1 3

Logic to find minimum number of strings for complete coverage of characters

I have set of strings
[abcd,
efgh,
abefg]
How to find the minimum number of strings that covers all the characters (abcdefgh)
Answer would be abcd and efgh. But what would be the algorithm to find this answer?
The "set cover problem" can be reduced to your problem. You can read about it on Wikipedia. No polynomial-time algorithm is known for it.
@j_random_hacker: That's what I meant. Corrected.
@Yuvaraj: Check the following pseudo code:
str = input string
S = input set
for each subset s of S in ascending order of cardinality:
if s covers str
return s
return none
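To make the pseudo code above concrete, here is a minimal JavaScript sketch of the same idea: try subsets in ascending order of size and return the first one that covers every character. The names covers, someSubsetOfSize and minimalCover are made up for this illustration, and the sketch is only practical for small inputs because the number of subsets grows exponentially.

```javascript
// Hypothetical helper: true if every character of chars occurs in some string of subset.
function covers(chars, subset) {
  var joined = subset.join("");
  for (var i = 0; i < chars.length; i++) {
    if (joined.indexOf(chars.charAt(i)) === -1) return false;
  }
  return true;
}

// Hypothetical helper: call visit() on each k-element subset of items,
// stopping as soon as visit() returns true.
function someSubsetOfSize(items, k, visit, start, picked) {
  start = start || 0;
  picked = picked || [];
  if (picked.length === k) return visit(picked);
  for (var i = start; i < items.length; i++) {
    if (someSubsetOfSize(items, k, visit, i + 1, picked.concat([items[i]]))) return true;
  }
  return false;
}

// Try subset sizes 1, 2, 3, ... and return the first covering subset, or null.
function minimalCover(chars, strings) {
  for (var k = 1; k <= strings.length; k++) {
    var found = null;
    someSubsetOfSize(strings, k, function (subset) {
      if (covers(chars, subset)) { found = subset; return true; }
      return false;
    });
    if (found) return found;
  }
  return null; // no subset covers all characters
}

console.log(minimalCover("abcdefgh", ["abcd", "efgh", "abefg"])); // [ 'abcd', 'efgh' ]
```

Because the sizes are tried in ascending order, the first hit is guaranteed to use the minimum number of strings.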
python
>>> a="abcd efgh abefg"
>>> set(a)
set(['a', ' ', 'c', 'b', 'e', 'd', 'g', 'f', 'h'])
>>> ''.join(set(a))
'a cbedgfh'
>>> ''.join(set(a)-set(' '))
'acbedgfh'
If you want to check every possible combination of strings to find the shortest combination which covers a set of characters, there are two basic approaches:
Generating every combination of strings, and for each one, checking whether it covers the whole character set.
For each character in the set, making a list of strings it appears in, and then combining those lists to find combinations of strings which cover the character set.
(If the number of characters or strings is too big to check all combinations in reasonable time, you'll have to use an approximation algorithm, which will find a good-enough solution, but can't guarantee to find the optimal solution.)
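For the approximation route mentioned in the parenthesis above, one common choice is the greedy heuristic for set cover: keep picking the string that covers the most still-uncovered characters. The sketch below is only an illustration of that heuristic (greedyCover is a made-up name, not something from this thread), and it does not guarantee the minimum.

```javascript
// Greedy sketch: repeatedly take the string that covers the most uncovered characters.
function greedyCover(chars, strings) {
  var remaining = chars.split("");
  var chosen = [];
  while (remaining.length > 0) {
    var best = -1, bestCount = 0;
    for (var i = 0; i < strings.length; i++) {
      // Count the uncovered characters that occur in strings[i].
      var count = remaining.filter(function (ch) {
        return strings[i].indexOf(ch) > -1;
      }).length;
      if (count > bestCount) { best = i; bestCount = count; }
    }
    if (best === -1) return null; // some character occurs in no string
    chosen.push(strings[best]);
    // Drop the characters that the chosen string covers.
    remaining = remaining.filter(function (ch) {
      return strings[best].indexOf(ch) === -1;
    });
  }
  return chosen;
}

console.log(greedyCover("abcdefgh", ["abcd", "efgh", "abefg"]));
// [ 'abefg', 'abcd', 'efgh' ]
```

On the three strings from the question the greedy choice starts with abefg (it covers five characters) and ends up with three strings, while the optimum is two. That is exactly the "good enough but not guaranteed optimal" trade-off.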
The first approach, as implemented in code example 1, considers up to 2^N - 1 combinations of strings (every non-empty subset, where N is the number of strings), so e.g. for 33 strings that is more than 2^32 combinations, and for 65 strings more than 2^64. For large numbers of strings, this may become too inefficient. On the other hand, the size of the character set doesn't have much impact on the efficiency of this approach.
The second approach generates N lists of indexes pointing to strings (where N is the number of characters in the set), and each of these lists holds at most M indexes (where M is the number of strings). So there are potentially M^N combinations. However, the number of combinations that are actually considered is much lower; consider this example with 7 characters and 8 strings:
character set: abcdefg
strings: 0:pack, 1:my, 2:bag, 3:with, 4:five, 5:dozen, 6:beige, 7:eggs
string matches for each character:
a: [0,2]
b: [2,6]
c: [0]
d: [5]
e: [4,5,6,7]
f: [4]
g: [2,6,7]
optimal combinations (size 4):
[0,2,4,5] = ["pack","bag","five","dozen"]
[0,4,5,6] = ["pack","five","dozen","beige"]
Potentially there are 2x2x1x1x4x1x3 = 48 combinations. However, if string 0 is selected for character "a", that also covers character "c"; if string 2 is selected for character "a", that also covers characters "b" and "g". In fact, only three combinations are ever considered: [0,2,5,4], [0,6,5,4] and [2,0,5,4].
If the number of strings is much greater than the number of characters, approach 2 is the better choice.
code example 1
This is a simple algorithm which uses recursion to try all possible combinations of strings to find the combinations which contain all characters.
Running the snippet finds solutions for 12 strings and the whole alphabet (the output goes to the console).
// FIND COMBINATIONS OF STRINGS WHICH COVER THE CHARACTER SET
function charCover(chars, strings, used) {
used = used || [];
// ITERATE THROUGH THE LIST OF STRINGS
for (var i = 0; i < strings.length; i++) {
// MAKE A COPY OF THE CHARS AND DELETE THOSE WHICH OCCUR IN THE CURRENT STRING
var c = chars.replace(new RegExp("[" + strings[i] + "]","g"), "");
// MAKE A COPY OF THE STRINGS AFTER THE CURRENT STRING
var s = strings.slice(i + 1);
// ADD THE CURRENT STRING TO THE LIST OF USED STRINGS
var u = used.concat([strings[i]]);
// IF NO CHARACTERS ARE LEFT, PRINT THE LIST OF USED STRINGS
if (c.length == 0) console.log(u.length + " strings:\t" + u)
// IF CHARACTERS AND STRINGS ARE LEFT, RECURSE WITH THE REST
else if (s.length > 0) charCover(c, s, u);
}
}
var strings = ["the","quick","brown","cow","fox","jumps","over","my","lazy","cats","dogs","unicorns"];
var chars = "abcdefghijklmnopqrstuvwxyz";
charCover(chars, strings);
You can prune some unnecessary paths by adding this line after the characters are removed with replace():
// IF NO CHARS WERE DELETED, THIS STRING IS UNNECESSARY
if (c.length == chars.length) continue;
code example 2
This is an algorithm which first creates a list of matching strings for every character, and then uses recursion to combine the lists to find combinations of strings that cover the character set.
Running the snippet finds solutions for 25 strings and 12 characters (the output goes to the console).
// FIND COMBINATIONS OF STRINGS WHICH COVER THE CHARACTER SET
function charCover(chars, strings) {
// CREATE LIST OF STRINGS MATCHING EACH CHARACTER
var matches = [], min = strings.length, output = [];
for (var i = 0; i < chars.length; i++) {
matches[i] = [];
for (var j = 0; j < strings.length; j++) {
if (strings[j].indexOf(chars.charAt(i)) > -1) {
matches[i].push(j);
}
}
}
combine(matches);
return output;
// RECURSIVE FUNCTION TO COMBINE MATCHES
function combine(matches, used) {
var m = []; used = used || [];
// COPY ONLY MATCHES FOR CHARACTERS NOT ALREADY COVERED
for (var i = 0; i < matches.length; i++) {
for (var j = 0, skip = false; j < matches[i].length; j++) {
if (used.indexOf(matches[i][j]) > -1) {
skip = true;
break;
}
}
if (! skip) m.push(matches[i].slice());
}
// IF ALL CHARACTERS ARE COVERED, STORE COMBINATION
if (m.length == 0) {
// IF COMBINATION IS SHORTER THAN MINIMUM, DELETE PREVIOUSLY STORED COMBINATIONS
if (used.length < min) {
min = used.length;
output = [];
}
// CONVERT INDEXES TO STRINGS AND STORE COMBINATION
var u = [];
for (var i = 0; i < used.length; i++) {
u.push(strings[used[i]]);
}
output.push(u);
}
// RECURSE IF CURRENT MINIMUM NUMBER OF STRINGS HAS NOT BEEN REACHED
else if (used.length < min) {
// ITERATE OVER STRINGS MATCHING NEXT CHARACTER AND RECURSE
for (var i = 0; i < m[0].length; i++) {
combine(m, used.concat([m[0][i]]));
}
}
}
}
var strings = ["the","quick","brown","fox","jumps","over","lazy","dogs","pack","my","bag","with","five","dozen","liquor","jugs","jaws","love","sphynx","of","black","quartz","this","should","do"];
var chars = "abcdefghijkl";
var result = charCover(chars, strings);
for (var i in result) console.log(result[i]);
This algorithm can be further optimised to avoid finding duplicate combinations with the same strings in different order. Sorting the matches by size before combining them may also improve efficiency.
Thanks everyone for the responses.
I finally completed it, and have given the algorithm below in simple words as a reference for others:
Sub optimize_strings()
Capture list of strings in an array variable & number of strings in an integer
Initialize array of optimized strings as empty & pointer to it as zero
Get the list of all characters in an array & number of characters in a variable
Do While number of characters>0
Reset the frequency of all characters as zero & then calculate the frequency of all characters in uncovered strings in separate array
Reset the number of uncovered characters for each strings as zero & then calculate the number of uncovered characters in each strings in separate array
Sort the characters in characters array in ascending order based on their characters frequency array
Fetch list of strings that contains the character present in the top of the character array & place them in filtered strings array
Bubble sort filtered strings array in descending order based on the number of uncovered characters which was stored in step 2 of this loop
Store the top of the filtered strings array in the optimized strings array & increase its pointer by 1
Iterate through all the characters in the optimized string & remove all the characters present in it from characters array
Loop
Print the result of optimized strings present in optimized strings array
End Sub
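As a rough reading of the steps above, the heuristic could look like this in JavaScript. This is a sketch under assumptions: rarestFirstCover and uncoveredIn are hypothetical names, a plain sort replaces the bubble sort, and ties are broken by input order. Like any heuristic of this kind it is not guaranteed to find the minimum.

```javascript
// Heuristic sketch: cover the rarest uncovered character first, using the
// string that contains it and covers the most uncovered characters.
function rarestFirstCover(chars, strings) {
  var remaining = chars.split("");
  var chosen = [];

  // Number of still-uncovered characters that occur in string s.
  function uncoveredIn(s) {
    return remaining.filter(function (ch) { return s.indexOf(ch) > -1; }).length;
  }

  while (remaining.length > 0) {
    // Find the uncovered character that occurs in the fewest strings.
    var rarest = null, rarestFreq = Infinity;
    remaining.forEach(function (ch) {
      var freq = strings.filter(function (s) { return s.indexOf(ch) > -1; }).length;
      if (freq < rarestFreq) { rarest = ch; rarestFreq = freq; }
    });
    if (rarestFreq === 0) return null; // this character occurs in no string

    // Among the strings containing it, take the one covering the most uncovered characters.
    var candidates = strings.filter(function (s) { return s.indexOf(rarest) > -1; });
    candidates.sort(function (x, y) { return uncoveredIn(y) - uncoveredIn(x); });
    var pick = candidates[0];
    chosen.push(pick);

    // Remove the characters covered by the picked string.
    remaining = remaining.filter(function (ch) { return pick.indexOf(ch) === -1; });
  }
  return chosen;
}

console.log(rarestFirstCover("abcdefgh", ["abcd", "efgh", "abefg"])); // [ 'abcd', 'efgh' ]
```

On the strings from the question, "c" occurs only in abcd and "h" only in efgh, so those two strings are forced and the result happens to be optimal here.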

Is it possible to concatenate a string with series of number?

I have a string (e.g. 'STA') and I want to make a cell array that will be a concatenation of my string with the numbers from 1 to X.
I want the code to do something like the for loop below:
for i = 1:Num
a = [{a} {strcat('STA',num2str(i))}]
end
I want the end results to be in the form of {<1xNum cell>}
a = 'STA1' 'STA2' 'STA3' ...
(I want to set this to a uitable in the ColumnFormat array)
ColumnFormat = {{a},... % 1
'numeric',... % 2
'numeric'}; % 3
I'm not sure about starting with STA1, but this should get you a list that starts with STA (from which I guess you could remove the first entry).
N = 5;
[X{1:N+1}] = deal('STA');
a = genvarname(X);
a = a(2:end);
You can do it with a combination of the NUM2STR (converts numbers to strings), CELLSTR (converts strings to a cell array), STRTRIM (removes extra spaces) and STRCAT (combines with another string) functions.
You need (:) to make sure the numeric vector is column.
x = 1:Num;
a = strcat( 'STA', strtrim( cellstr( num2str(x(:)) ) ) );
As an alternative for matrix with more dimensions I have this helper function:
function c = num2cellstr(xx, varargin)
%Converts matrix of numeric data to cell array of strings
c = cellfun(@(x) num2str(x,varargin{:}), num2cell(xx), 'UniformOutput', false);
Try this:
N = 10;
a = cell(1,N);
for i = 1:N
a(i) = {['STA',num2str(i)]};
end

MATLAB empty cell(n,m) array of strings?

What is the quickest way to create an empty cell array of strings ?
cell(n,m)
creates an empty cell array of double.
How about a similar command but creating empty strings ?
Depends on what you want to achieve really. I guess the simplest method would be:
repmat({''},n,m);
Assignment to all cell elements using the colon operator will do the job:
m = 3; n = 5;
C = cell(m,n);
C(:) = {''}
The cell array created by cell(n,m) contains empty matrices, not doubles.
If you really need to pre populate your cell array with empty strings
test = cell(n,m);
test(:) = {''};
test(1,:) = {'1st row'};
test(:,1) = {'1st col'};
This is a very old post, but I'd like to add another approach. I am not sure whether it works in earlier versions of MATLAB; I tried it in 2018+ versions and it works.
Instead of using repmat, it seems even more convenient and intuitive to initialize a cell array of strings like this:
C(1:10) = {''} % Array of empty char
And the same approach can be used to generate cell array with other data types
C(1:10) = {""} % Array of empty string
C(1:10) = {[]} % Array of empty double, same as cell(1,10)
But be careful with scalars
C(1:10) = {1} % a 1x10 cell with all values = {[1]}
C(1:10) = 1 % !!!Error
C(1:10) = '1' % !!!Error
C(1:10) = [] % an 1x0 empty cell array
