Find strings from an cell array and create a new cell array - string

I want to find the strings from an cell array (m x n) and add those identified strings in new cell array (m x n), by using matlab, for example:
Human(i,1)={0
1
34
eyes_two
55
33
ears_two
nose_one
mouth_one
631
49
Tounge_one}
I want to remove the numbers and have just strings
New_Human(i,1)={eyes_two
ears_two
nose_one
mouth_one
tounge_one}

Based on your comment it sounds like all your data is being stored as strings. In that case you can use the following method to remove all strings which represent a valid number.
H = {'0'; '1'; '34'; 'eyes_two'; '55'; '33'; 'ears_two'; 'nose_one'; 'mouth_one'; '631'; '49'; 'Tounge_one'};
idx = cellfun(#(x)isnan(str2double(x)), H);
Hstr = H(idx)
Output
Hstr =
'eyes_two'
'ears_two'
'nose_one'
'mouth_one'
'Tounge_one'
The code determines which strings do not represent valid numeric values. This is accomplished by checking if the str2double function returns a NaN result on each string. If you want to understand more about how this works I suggest you read the documentation on cellfun.

Related

Finding indexes of strings in a string array in Matlab

I have two string arrays and I want to find where each string from the first array is in the second array, so i tried this:
for i = 1:length(array1);
cmp(i) = strfind(array2,array1(i,:));
end
This doesn't seem to work and I get an error: "must be one row".
Just for the sake of completeness, an array of strings is nothing but a char matrix. This can be quite restrictive because all of your strings must have the same number of elements. And that's what #neerad29 solution is all about.
However, instead of an array of strings you might want to consider a cell array of strings, in which every string can be arbitrarily long. I will report the very same #neerad29 solution, but with cell arrays. The code will also look a little bit smarter:
a = {'abcd'; 'efgh'; 'ijkl'};
b = {'efgh'; 'abcd'; 'ijkl'};
pos=[];
for i=1:size(a,1)
AreStringFound=cellfun(#(x) strcmp(x,a(i,:)),b);
pos=[pos find(AreStringFound)];
end
But some additional words might be needed:
pos will contain the indices, 2 1 3 in our case, just like #neerad29 's solution
cellfun() is a function which applies a given function, the strcmp() in our case, to every cell of a given cell array. x will be the generic cell from array b which will be compared with a(i,:)
the cellfun() returns a boolean array (AreStringFound) with true in position j if a(i,:) is found in the j-th cell of b and the find() will indeed return the value of j, our proper index. This code is more robust and works also if a given string is found in more than one position in b.
strfind won't work, because it is used to find a string within another string, not within an array of strings. So, how about this:
a = ['abcd'; 'efgh'; 'ijkl'];
b = ['efgh'; 'abcd'; 'ijkl'];
cmp = zeros(1, size(a, 1));
for i = 1:size(a, 1)
for j = 1:size(b, 1)
if strcmp(a(i, :), b(j, :))
cmp(i) = j;
break;
end
end
end
cmp =
2 1 3

How to compute word scores in Scrabble using MATLAB

I have a homework program I have run into a problem with. We basically have to take a word (such as MATLAB) and have the function give us the correct score value for it using the rules of Scrabble. There are other things involved such as double word and double point values, but what I'm struggling with is converting to ASCII. I need to get my string into ASCII form and then sum up those values. We only know the bare basics of strings and our teacher is pretty useless. I've tried converting the string into numbers, but that's not exactly working out. Any suggestions?
function[score] = scrabble(word, letterPoints)
doubleword = '#';
doubleletter = '!';
doublew = [findstr(word, doubleword)]
trouble = [findstr(word, doubleletter)]
word = char(word)
gameplay = word;
ASCII = double(gameplay)
score = lower(sum(ASCII));
Building on Francis's post, what I would recommend you do is create a lookup array. You can certainly convert each character into its ASCII equivalent, but then what I would do is have an array where the input is the ASCII code of the character you want (with a bit of modification), and the output will be the point value of the character. Once you find this, you can sum over the points to get your final point score.
I'm going to leave out double points, double letters, blank tiles and that whole gamut of fun stuff in Scrabble for now in order to get what you want working. By consulting Wikipedia, this is the point distribution for each letter encountered in Scrabble.
1 point: A, E, I, O, N, R, T, L, S, U
2 points: D, G
3 points: B, C, M, P
4 points: F, H, V, W, Y
5 points: K
8 points: J, X
10 points: Q, Z
What we're going to do is convert your word into lower case to ensure consistency. Now, if you take a look at the letter a, this corresponds to ASCII code 97. You can verify that by using the double function we talked about earlier:
>> double('a')
97
As there are 26 letters in the alphabet, this means that going from a to z should go from 97 to 122. Because MATLAB starts indexing arrays at 1, what we can do is subtract each of our characters by 96 so that we'll be able to figure out the numerical position of these characters from 1 to 26.
Let's start by building our lookup table. First, I'm going to define a whole bunch of strings. Each string denotes the letters that are associated with each point in Scrabble:
string1point = 'aeionrtlsu';
string2point = 'dg';
string3point = 'bcmp';
string4point = 'fhvwy';
string5point = 'k';
string8point = 'jx';
string10point = 'qz';
Now, we can use each of the strings, convert to double, subtract by 96 then assign each of the corresponding locations to the points for each letter. Let's create our lookup table like so:
lookup = zeros(1,26);
lookup(double(string1point) - 96) = 1;
lookup(double(string2point) - 96) = 2;
lookup(double(string3point) - 96) = 3;
lookup(double(string4point) - 96) = 4;
lookup(double(string5point) - 96) = 5;
lookup(double(string8point) - 96) = 8;
lookup(double(string10point) - 96) = 10;
I first create an array of length 26 through the zeros function. I then figure out where each letter goes and assign to each letter their point values.
Now, the last thing you need to do is take a string, take the lower case to be sure, then convert each character into its ASCII equivalent, subtract by 96, then sum up the values. If we are given... say... MATLAB:
stringToConvert = 'MATLAB';
stringToConvert = lower(stringToConvert);
ASCII = double(stringToConvert) - 96;
value = sum(lookup(ASCII));
Lo and behold... we get:
value =
10
The last line of the above code is crucial. Basically, ASCII will contain a bunch of indexing locations where each number corresponds to the numerical position of where the letter occurs in the alphabet. We use these positions to look up what point / score each letter gives us, and we sum over all of these values.
Part #2
The next part where double point values and double words come to play can be found in my other StackOverflow post here:
Calculate Scrabble word scores for double letters and double words MATLAB
Convert from string to ASCII:
>> myString = 'hello, world';
>> ASCII = double(myString)
ASCII =
104 101 108 108 111 44 32 119 111 114 108 100
Sum up the values:
>> total = sum(ASCII)
total =
1160
The MATLAB help for char() says (emphasis added):
S = char(X) converts array X of nonnegative integer codes into a character array. Valid codes range from 0 to 65535, where codes 0 through 127 correspond to 7-bit ASCII characters. The characters that MATLABĀ® can process (other than 7-bit ASCII characters) depend upon your current locale setting. To convert characters into a numeric array, use the double function.
ASCII chart here.

Matlab. Find the indices of a cell array of strings with characters all contained in a given string (without repetition)

I have one string and a cell array of strings.
str = 'actaz';
dic = {'aaccttzz', 'ac', 'zt', 'ctu', 'bdu', 'zac', 'zaz', 'aac'};
I want to obtain:
idx = [2, 3, 6, 8];
I have written a very long code that:
finds the elements with length not greater than length(str);
removes the elements with characters not included in str;
finally, for each remaining element, checks the characters one by one
Essentially, it's an almost brute force code and runs very slowly. I wonder if there is a simple way to do it fast.
NB: I have just edited the question to make clear that characters can be repeated n times if they appear n times in str. Thanks Shai for pointing it out.
You can sort the strings and then match them using regular expression. For your example the pattern will be ^a{0,2}c{0,1}t{0,1}z{0,1}$:
u = unique(str);
t = ['^' sprintf('%c{0,%d}', [u; histc(str,u)]) '$'];
s = cellfun(#sort, dic, 'uni', 0);
idx = find(~cellfun('isempty', regexp(s, t)));
I came up with this :
>> g=#(x,y) sum(x==y) <= sum(str==y);
>> h=#(t)sum(arrayfun(#(x)g(t,x),t))==length(t);
>> f=cellfun(#(x)h(x),dic);
>> find(f)
ans =
2 3 6
g & h: check if number of count of each letter in search string <= number of count in str.
f : finally use g and h for each element in dic

Matlab fints doesn't like a string value I pass as an argument

I have a program that takes the columns of a fints-object, multiplies them together pairwise in all combinations and output the result in a new fints object. I have the code for the data, but I also want the series labels to carry through so that the product of column a and b has label a*b.
function tsB = MulTS(tsA)
anames = fieldnames(tsA,1)';
A = fts2mat(tsA);
[i,j] = meshgrid(1:size(A,2),1:size(A,2));
B = Mul(A(:,i(:)),A(:,j(:)));
q = [anames(:,i(:)); anames(:,j(:))];
bnames = strcat(q(1,:),'*', q(2,:));
tsB=fints(tsA.dates, B, bnames);
end
I get warnings when I run it.
tsA= fints([1 2 3]', [[1 1 1]' [2 2 2]'],{'a','b'}');
MulTS(tsA)
??? Error using ==> fints.fints at 188
Illegal name(s) detected. Please check the name(s).
Error in ==> MulTS at 10
tsB=fints(tsA.dates, B, bnames);"
It seems Matlab doesn't like the format of bnames. I've tried googling stuff like "convert cell array to string matlab" and trying things like b = {bnames}. What am I doing wrong?
Your datanames (bnames in MulTS) seems to contain a "*" character, which is illegal according to fints documentation:
datanames
Cell array of data series names. Overrides the default data series names. Default data series names are series1, series2, and so on.
Note: Not all strings are accepted as datanames parameters. Supported data series names cannot start with a number and must contain only these characters:
Lowercase Latin alphabet, a to z
Uppercase Latin alphabet, A to Z
Underscore, _
Try replacing the "*" with "_" or something else.

Cell to Char in MATLAB doesn't work

I have used this code to read data from a plaintext file:
[race sex age namef] = textread('Fusion.txt', '%s %s %d %s');
I convert race from cell to char using: race = char(race); to do a string comparison (if(strcmp(race(k),'W')==1)) and it works as expected. I also need to namef to char but when I do, MATLAB returns 0 for every element of namef.
Here is a sample of my file:
W M 50 00001_930831_fb_a.ppm
W M 30 00002_930831_fa.ppm
W M 30 00002_930831_fb.ppm
W M 30 00002_931230_fa.ppm
W M 30 00002_931230_fb.ppm
W M 31 00002_940128_fa.ppm
W M 31 00002_940128_fb.ppm
Why is this happening?
From your question it is not clear whether conversion to char is necessary later on. For what you want to do, it is OK to compare to the individual elements of the cells race or namef:
strcmp(race{k}, 'W')
strcmp(named{k}, '00002_930831_fa.ppm')
Since strcmp operates on cell arrays of strings as well, you can also do things such as strcmp(race, 'W').
Since what you're doing should work fine, you're probably missing one thing: the last column in your file has multiple characters, so you need to access the whole row of the resulting string matrix, rather than a single element:
race = char(race); %// cell to character array of size [N,1]
namef = char(namef); %// cell to character array of size [N,M], padding added
for k=1:size(race,1)
condition_col1 = strcmp(race(k),'W')==1;
condition_col4 = strcmp(strtrim(namef(k,:)),'00002_930831_fa.ppm');
%// ... code goes here
end
If you use namef(k), you'll get the first character of each row, i.e. '0'. So namef(k,:) is my main point.
Also note that I added strtrim to the condition: turning to a character array will pad the fields to the length of the longest element (since matrices have to be rectangular).

Resources