I have a string:
sen = '0.31431 0.64431 Using drugs is not cool Speaker2';
I am trying to write code that will generate:
cell = {'0.31431','0.64431', 'Using drugs is not cool', 'Speaker2'};
The problem is that I don't want to use the number of words in 'Using drugs is not cool' because these will change in other examples.
I tried:
output = sscanf(sen,'%s %s %c %Speaker%d');
But it doesn't work as desired.
If you know you will always have to remove the first two words and last word, collecting everything else together, then you can use strsplit and strjoin as follows:
sen = '0.31431 0.64431 Using drugs is not cool Speaker2';
words = strsplit(sen); % Split all words up
words = [words(1:2) {strjoin(words(3:end-1), ' ')} words(end)] % Join words 3 to end-1
words =
1×4 cell array
'0.31431' '0.64431' 'Using drugs is not cool' 'Speaker2'
You can use regexp, but it's a bit ugly:
>> str = '0.31431 0.64431 Using drugs is not cool Speaker2';
>> regexp(str,'(\d+\.\d+)\s(\d+\.\d+)\s(.*?)\s(Speaker\d+)','tokens')
ans =
1×1 cell array
{1×4 cell}
>> ans{:}
ans =
1×4 cell array
{'0.31431'} {'0.64431'} {'Using drugs is not cool'} {'Speaker2'}
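If the nested 1×1 cell is unwanted, the 'once' option makes regexp return the tokens of the first match directly as a flat cell. The following is only a sketch: it assumes the first two fields and the last field never contain whitespace, and the variable names tok and times are made up for illustration.

```matlab
sen = '0.31431 0.64431 Using drugs is not cool Speaker2';

% Anchor the pattern to the whole string. The first two fields and the
% last one are runs of non-whitespace; the greedy (.*) takes what is left.
tok = regexp(sen, '^(\S+)\s+(\S+)\s+(.*)\s+(\S+)$', 'tokens', 'once');
% tok is {'0.31431', '0.64431', 'Using drugs is not cool', 'Speaker2'}

% Optional: convert the two leading fields to numbers.
times = str2double(tok(1:2));
```

Because the pattern does not hard-code "Speaker", it should also work when the last word has a different form, at the price of being less strict about the input.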
In MATLAB, consider the string:
str = 'text text text [[word1,word2,word3]] text text'
I want to pick one word at random from the list ('word1','word2','word3'), say 'word2', and then write, possibly to a new file, the string:
strnew = 'text text text word2 text text'
My approach is as follows (certainly pretty bad):
Isolating the string '[[word1,word2,word3]]' can be achieved via
str2 = regexp(str,'\[\[(.*?)\]\]','match','once') % 'once' returns a char vector instead of a cell
Removing the opening and closing square brackets in the string is achieved via
str3=str2(3:end-2)
Finally we can split str3 into a list of words (stored in a cell)
ListOfWords = split(str3,',')
which outputs {'word1'}{'word2'}{'word3'} and I am stuck there. How can I pick one of the entries and plug it back into the initial string (or a copy of it...)? Note that the delimiters [[ and ]] could both be changed to || if it can help.
You can do it as follows:
1. Use regexp with the 'split' option;
2. Split the middle part into words;
3. Select a random word;
4. Concatenate back.
str = 'text text text [[word1,word2,word3]] text text'; % input
str_split = regexp(str, '\[\[|\]\]', 'split'); % step 1
list_of_words = split(str_split{2}, ','); % step 2
chosen_word = list_of_words{randi(numel(list_of_words))}; % step 3
strnew = [str_split{1} chosen_word str_split{3}]; % step 4
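The question also mentions writing the result to a new file. A minimal sketch of that last step might look as follows; the output path under tempdir is a hypothetical choice, and the code assumes the input contains exactly one [[...]] group.

```matlab
str = 'text text text [[word1,word2,word3]] text text';

% Same four steps as above, condensed.
parts  = regexp(str, '\[\[|\]\]', 'split');   % {before, list, after}
words  = strsplit(parts{2}, ',');
strnew = [parts{1} words{randi(numel(words))} parts{3}];

% Write the new string to a file (hypothetical location and file name).
outfile = fullfile(tempdir, 'strnew.txt');
fid = fopen(outfile, 'w');
assert(fid ~= -1, 'Could not open %s for writing', outfile);
fprintf(fid, '%s\n', strnew);
fclose(fid);
```

Calling rng with a fixed seed beforehand makes the random choice reproducible, which can help when testing.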
I have a horrible solution. I was trying to see if I could do it in one function call. You can... but at what cost! Abusing dynamic regular expressions like this barely counts as one function call.
I use a dynamic expression to process the comma separated list. The tricky part is selecting a random element. This is made exceedingly difficult because MATLAB's syntax doesn't support paren indexing off the result of a function call. To get around this, I stick it in a struct so I can dot index. This is terrible.
>> regexprep(str,'\[\[(.*)\]\]',"${struct('tmp',split(string($1),',')).tmp(randi(count($1,',')+1))}")
ans =
'text text text word3 text text'
Luis definitely has the best answer, but I think it could be simplified a smidge by not using regular expressions.
str = 'text text text [[word1,word2,word3]] text text'; % input
tmp = extractBetween(str,"[[","]]"); % step 1
tmp = split(tmp, ','); % step 2
chosen_word = tmp(randi(numel(tmp))) ; % step 3
strnew = replaceBetween(str,"[[","]]",chosen_word,"Boundaries","Inclusive") % step 4
How do you check whether two adjacent words in a string are the same? For instance, given the string "Hello world, how how are you doing today", how would I write code to report that the word "how" is repeated in that sentence? I know I would start with something like this, but I have no clue where to go after that.
x=input("Please type a sentence.")
x.split()
Python is a great language to start with. I suggest using a dictionary: split the string into words, count the occurrences of each word in the dictionary, and then look up how many times a given word is repeated.
mystring = "Hello world, how how are you doing today"
words = mystring.split()
mydict = {}
for word in words:
    if word in mydict:
        mydict[word] += 1
    else:
        mydict[word] = 1
print(mydict['how'])
Update: to check only for words repeated side by side:
mystring = "Hello world, how how are you doing today"
words = mystring.split()
lastword = ""
for word in words:
    if lastword.lower() == word.lower():
        print("The word " + word + " is repeated")
        break
    lastword = word
I have a cell array composed of several strings
names = {'2name_19surn', '3name_2surn', '1name_2surn', '10name_1surn'}
and I would like to sort them according to their numeric prefix.
I tried
[~,index] = sortrows(names.');
sorted_names = names(index);
but I get
sorted_names = {'10name_1surn', '1name_2surn', '2name_19surn', '3name_2surn'}
instead of the desired
sorted_names = {'1name_2surn', '2name_19surn', '3name_2surn','10name_1surn'}
Any suggestions?
Simple approach using regular expressions:
r = regexp(names,'^\d+','match'); % get prefixes
[~, ind] = sort(cellfun(@(c) str2num(c{1}), r)); % convert to numbers and sort
sorted_names = names(ind); % use index to build result
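The same idea can be written without cellfun. This is a sketch rather than a drop-in replacement: with the 'once' option regexp returns one char vector per name, and str2double converts the whole cell at once, giving NaN for any name that has no leading digits (sort places those last). The variable name prefix is an assumption for illustration.

```matlab
names = {'2name_19surn', '3name_2surn', '1name_2surn', '10name_1surn'};

% One char vector per name ('' when there is no numeric prefix).
prefix = str2double(regexp(names, '^\d+', 'match', 'once'));

% Sort numerically and reorder the original names.
[~, ind] = sort(prefix);
sorted_names = names(ind);
% sorted_names is {'1name_2surn', '2name_19surn', '3name_2surn', '10name_1surn'}
```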
As long as speed is not a concern, you can loop through all the strings and save the leading digits of each one in a numeric array. Then sort that array as usual and use the resulting indices to reorder the names.
names = {'2name_2', '3name', '1name', '10name'}
number_in_string = zeros(1,length(names));
% Read numbers from the strings
for ii = 1:length(names)
number_in_string(ii) = sscanf(names{ii}, '%d'); % '%d' always reads base 10; '%i' would read a prefix such as '010' as octal
end
% Sort names using number_in_string
[sorted, idx] = sort(number_in_string)
sorted_names = names(idx)
Take the file sort_nat from here
Then
names = {'2name', '3name', '1name', '10name'}
sort_nat(names)
returns
sorted_names = {'1name', '2name', '3name','10name'}
You can deal with arbitrary patterns using a regular expression:
names = {'2name', '3name', '1name', '10name'}
match = regexpi(names,'(?<number>\d+)\D+','names'); % created with regex editor on rubular.com
match = cell2mat(match); % cell array to struct array
clear numbersStr
[numbersStr{1:length(match)}] = match.number; % cell array with number strings
numbers = str2double(numbersStr); % vector of numbers
[B,I] = sort(numbers); % sorted vector of numbers (B) and the indices (I)
clear namesSorted
[namesSorted{1:length(names)}] = names{I} % cell array with sorted name strings
What is the quickest way to create an empty cell array of strings?
cell(n,m)
creates an empty cell array of double.
Is there a similar command that creates empty strings instead?
Depends on what you want to achieve really. I guess the simplest method would be:
repmat({''},n,m);
Assignment to all cell elements using the colon operator will do the job:
m = 3; n = 5;
C = cell(m,n);
C(:) = {''}
The cell array created by cell(n,m) contains empty matrices, not doubles.
If you really need to prepopulate your cell array with empty strings:
test = cell(n,m);
test(:) = {''};
test(1,:) = {'1st row'};
test(:,1) = {'1st col'};
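To see the difference between the two kinds of "empty" cell array, one can inspect the class of the contents. The sketch below uses made-up variable names (A, B, and the is*/class* results) and only built-in functions.

```matlab
n = 2; m = 3;

A = cell(n, m);            % every cell holds [] (an empty double matrix)
B = repmat({''}, n, m);    % every cell holds '' (an empty char array)

classA = class(A{1});      % 'double'
classB = class(B{1});      % 'char'

isStrA = iscellstr(A);     % false: the contents are not char arrays
isStrB = iscellstr(B);     % true
```

This matters for functions that require a cell array of character vectors as input: they typically accept B but reject A.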
This is a super old post, but I'd like to add another approach. I am not sure whether it works in earlier versions of MATLAB; I tried it in 2018+ versions and it works.
Instead of using repmat, it seems even more convenient and intuitive to create the cell array of strings like this:
C(1:10) = {''} % Array of empty char
The same approach can be used to generate cell arrays holding other data types:
C(1:10) = {""} % Array of empty string
C(1:10) = {[]} % Array of empty double, same as cell(1,10)
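But be careful with scalars (the lines below assume C already exists as a 1x10 cell array):
C(1:10) = {1} % a 1x10 cell with all values = {[1]}
C(1:10) = 1 % !!!Error: a double cannot be converted to a cell
C(1:10) = '1' % !!!Error: a char cannot be converted to a cell
C(1:10) = [] % deletes the elements, leaving a 1x0 empty cell array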
I want to concatenate (padding with spaces) the strings in a cell array {'a', 'b'} to give a single string 'a b'. How can I do this in MATLAB?
You can cheat a bit by passing the cell array as a set of arguments to the sprintf function, then cleaning up the extra trailing space with strtrim:
strs = {'a', 'b', 'c'};
strs_spaces = sprintf('%s ' ,strs{:});
trimmed = strtrim(strs_spaces);
Dirty, but I like it...
MATLAB has a function to do this, strjoin:
http://www.mathworks.com/help/matlab/ref/strjoin.html
strjoin
Join strings in cell array into single string
Syntax
str = strjoin(C)
str = strjoin(C,delimiter)
Example:
Join List of Words with Whitespace
Join individual strings in a cell array of strings, C, with a single space.
C = {'one','two','three'};
str = strjoin(C)
str =
one two three
Small improvement (?) on the answer by Alex
strs = {'a','b','c'};
strs_spaces = [strs{1} sprintf(' %s', strs{2:end})];
You can accomplish this using the function STRCAT to append blanks to all but the last cell of your cell array and then concatenate all the strings together:
>> strCell = {'a' 'b' 'c' 'd' 'e'};
>> nCells = numel(strCell);
>> strCell(1:nCells-1) = strcat(strCell(1:nCells-1),{' '});
>> fullString = [strCell{:}]
fullString =
a b c d e
strjoin was introduced in R2013a. However, the MathWorks documentation page for strjoin reads:
Starting in R2016b, the join function is recommended to join elements of a string array.
>> C = {'one','two','three'};
>> join(C) %same result as: >> join(C, ' ')
ans =
string
"one two three"
>> join(C, ', and-ah ')
ans =
string
"one, and-ah two, and-ah three"
Personally I like Alex' solution as well, as older versions of Matlab are abundant in research groups around the world.
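For code that has to run on both old and new installations, one option is to check for strjoin at run time and fall back to the sprintf approach. This is a sketch under the assumption that strjoin, where available, is visible to exist as a file on the path; the variable names are hypothetical.

```matlab
C = {'one', 'two', 'three'};

if exist('strjoin', 'file')
    % Newer releases: use the built-in joiner.
    s = strjoin(C, ' ');
else
    % Older releases: sprintf repeats the format for each element,
    % and strtrim removes the trailing space.
    s = strtrim(sprintf('%s ', C{:}));
end
% s is 'one two three' in both branches
```

Note that the fallback also strips any leading or trailing whitespace that belongs to the first or last element, so the two branches agree only when the elements themselves are trimmed.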