How to assign multiple lines to a string variable in Matlab - string

I have a few lines of text like this:
abc
def
ghi
and I want to assign these multiple lines to a Matlab variable for further processing.
I am copying these from very large text file and want to process it in Matlab Instead of saving the text into a file and then reading line by line for processing.
I tried to handle the above text lines as single string but am getting an error whilst trying to assign to a variable:
x = 'abc
def
ghi'
Error:
x = 'abc
|
Error: String is not terminated properly.
Any suggestions which could help me understand and solve the issue will be highly appreciated.

I frequently do this, namely copy text from elsewhere which I want to hard-code into a MATLAB script (in my case it's generally SQL code I want to manipulate and call from MATLAB).
To achieve this I have a helper function in clipboard2cellstr.m defined as follows:
function clipboard2cellstr
str = clipboard('paste');
str = regexprep(str, '''', ''''''); % Double any single quotes
strs = regexp(str, '\s*\r?\n\r?', 'split');
cs = sprintf('{\n''%s''\n}', strjoin(strs, sprintf('''\n''')));
clipboard('copy', cs);
disp(cs)
disp('(Copied to Clipboard)')
end
I then copy the text using Ctrl-c (or however) and run clipboard2cellstr. This changes the contents of the clipboard to something I can paste into the MATLAB editor using Ctrl-v (or however).
For example, copying this line
and this line
and this one, and then running the function generates this:
{
'For example, copying this line'
'and this line'
'and this one, and then running the function generates this:'
}
which is valid MATLAB which can be pasted directly in.

Your error is because you ended the line when MATLAB was expecting a closing quote character. You must use array notation to have multi-line or multi-element arrays.
You can assign like this if you use array notation
x = ['abc'
'def'
'hij']
>> x = 3×3 char array
Note: with this method, your rows must have the same number of characters, as you are really dealing with a character array. You can think of a character array like a numeric matrix, hence why it must be "rectangular".
If you have MATLAB R2016b or newer, you can use the string data type. This uses double quotes "..." rather than single quotes '...', and can be multi-line. You must still use array notation:
x = ["abc"
"def"
"hijk"]
>> x = 3×1 string array
We can have different numbers of characters in each line, as this is simply a 3 element string array, not a character array.
Alternatively, use a cell array of character arrays (or strings)
x = {'abc'
'def'
'hijk'}
>> x = 3×1 cell array
Again, you can have character arrays or strings of different lengths within a cell array.
In all of the above examples, a newline is simply for readability and can be replaced by a semi-colon ; to denote the next line of the array.
The option you choose will depend on what you want to do with the text. If you're reading from a file, I would suggest the string array or the cell array, as they can deal with different length lines. For backwards compatibility, use a cell array. You may find cellfun relevant for operating on cell arrays. For native string operations, use a string array.

Related

Reading a comma-separated string (not text file) in Matlab

I want to read a string in Matlab (not an external text file) which has numerical values separated by commas, such as
a = {'1,2,3'}
and I'd like to store it in a vector as numbers. Is there any function which does that? I only find processes and functions used to do that with text files.
I think you're looking for sscanf
A = sscanf(str,formatSpec) reads data from str, converts it according
to the format specified by formatSpec, and returns the results in an
array. str is either a character array or a string scalar.
You can try the str2num function:
vec = str2num('1,2,3')
If you have to use the cell a, per your example, it would be: vec=str2num(a{1})
There are some security warnings in the documentation to consider so be cognizant of how your code is being employed.
Another, more flexible, option is textscan. It can handle strings as well as file handles.
Here's an example:
cellResult = textscan('1,2,3', '%f','delimiter',',');
vec = cellResult{1};
I will use the eval function to "evaluate" the vector. If that is the structure, I will also use the cell2mat to get the '1,2,3' text (this can be approached by other methods too.
% Generate the variable "a" that contains the "vector"
a = {'1,2,3'};
% Generate the vector using the eval function
myVector = eval(['[' cell2mat(a) ']']);
Let me know if this solution works for you

Text to array conversion

How do you convert a text file which has commas into an array delimited by those commas? For example, I use
var = importdata(filename)
disp(var)
This obviously displays the contents which are
'please, help, me'
How do I then get var to be an array such that I could extract a single word using something similar to var(2)?
Use strsplit (requires ≥ R2013a) or split (requires ≥ R2016b) to split the character array var{1} into a cell array that contains those words at its separate cells.
v = strsplit(var{1},', '); %or v = split(var{1},' ,');
Now v{1}, v{2} and v{3} give 'please', 'help' and 'me' respectively.
var{1} is used since you must be returned a cell array var from importdata. If var was not a cell array but a character array then you would not be getting single quotes in the output of disp.

Removing first character from a string in octave

I wanted to know how to remove first character of a string in octave. I am manipulating the string in a loop and after every loop, I want to remove the first character of the remaining string.
Thanks in advance.
If it's just a one-line string then:
short_string = long_string(2:end)
But if you have a cell array of strings then either do it as above if you have a loop already, otherwise you can use this shorthand to do it in one line:
short_strings = cellfun(#(x)(x(2:end)), long_strings, 'uni', false)
Or else if you have a matrix of strings (i.e. all the same length), then you can vectorize it as:
short_strings = long_strings(:, 2:end)

Replace multiple substrings using strrep in Matlab

I have a big string (around 25M characters) where I need to replace multiple substrings of a specific pattern in it.
Frame 1
0,0,0,0,0,1,2,34,0
0,1,2,3,34,12,3,4,0
...........
Frame 2
0,0,0,0,0,1,2,34,0
0,1,2,3,34,12,3,4,0
...........
Frame 7670
0,0,0,0,0,1,2,34,0
0,1,2,3,34,12,3,4,0
...........
The substring I need to remove is the 'Frame #' and it occurs around 7670 times. I can give multiple search strings in strrep, using a cell array
strrep(text,{'Frame 1','Frame 2',..,'Frame 7670'},';')
However that returns a cell array, where in each cell, I have the original string with the corresponding substring of one of my input cell changed.
Is there a way to replace multiple substrings from a string, other than using regexprep? I noticed that it is considerably slower than strrep, that's why I am trying to avoid it.
With regexprep it would be:
regexprep(text,'Frame \d*',';')
and for a string of 25MB it takes around 47 seconds to replace all the instances.
EDIT 1: added the equivalent regexprep command
EDIT 2: added size of the string for reference, number of occurences for the substring and timing of execution for the regexprep
Ok, in the end I found a way to go around the problem. Instead of using regexprep to change the substring, I remove the 'Frame ' substring (including whitespace, but not the number)
rawData = strrep(text,'Frame ','');
This results in something like this:
1
0,0,0,0,0,1,2,34,0
0,1,2,3,34,12,3,4,0
...........
2
0,0,0,0,0,1,2,34,0
0,1,2,3,34,12,3,4,0
...........
7670
0,0,0,0,0,1,2,34,0
0,1,2,3,34,12,3,4,0
...........
Then, I change all the commas (,) and newline characters (\n) into a semicolon (;), using again strrep, and I create a big vector with all the numbers
rawData = strrep(rawData,sprintf('\r\n'),';');
rawData = strrep(rawData,';;',';');
rawData = strrep(rawData,';;',';');
rawData = strrep(rawData,',',';');
rawData = textscan(rawData,'%f','Delimiter',';');
then I remove the unnecessary numbers (1,2,...,7670), since they are located at a specific point in the array (each frame contains a specific amount of numbers).
rawData{1}(firstInstance:spacing:lastInstance)=[];
And then I go on with my manipulations. It seems that the additional strrep and removal of the values from the array is much much faster than the equivalent regexprep. With a string of 25M chars with regexprep I can do the whole operation in about 47", while with this workaround it takes only 5"!
Hope this helps somehow.
I think that this can be done using only textscan, which is known to be very fast. Be specifying a 'CommentStyle' the 'Frame #' lines are stripped out. This may only work because these 'Frame #' lines are on their own lines. This code returns the raw data as one big vector:
s = textscan(text,'%f','CommentStyle','Frame','Delimiter',',');
s = s{:}
You may want to know how many elements are in each frame or even reshape the data into a matrix. You can use textscan again (or before the above) to get just the data for the first frame:
f1 = textscan(text,'%f','CommentStyle','Frame 1','Delimiter',',');
f1 = s{:}
In fact, if you just want the elements from the first line, you can use this:
l1 = textscan(text,'%f,','CommentStyle','Frame 1')
l1 = l1{:}
However, the other nice thing about textscan is that you can use it to read in the file directly (it looks like you may be using some other means currently) using just fopen to get an FID. Thus the string data text doesn't have to be in memory.
Using regular expressions:
result = regexprep(text,'Frame [0-9]+','');
It's possible to avoid regular expressions as follows. I use strrep with suitable replacement strings that act as masks. The obtained strings are equal-length and are assured to be aligned, and can thus be combined into the final result using the masks. I've also included the ; you want. I don't know if it will be faster than regexprep or not, but it's definitely more fun :-)
% Data
text = 'Hello Frame 1 test string Frame 22 end of Frame 2 this'; %//example text
rep_orig = {'Frame 1','Frame 2','Frame 22'}; %//strings to be replaced.
%//May be of different lengths
% Computations
rep_dest = cellfun(#(s) char(zeros(1,length(s))), rep_orig, 'uni', false);
%//series of char(0) of same length as strings to be replaced (to be used as mask)
aux = cell2mat(strrep(text,rep_orig.',rep_dest.'));
ind_keep = all(double(aux)); %//keep characters according to mask
ind_semicolon = diff(ind_keep)==1; %//where to insert ';'
ind_keep = ind_keep | [ind_semicolon 0]; %// semicolons will also be kept
result = aux(1,:); %//for now
result(ind_semicolon) = ';'; %//include `;`
result = result(ind_keep); %//remove unwanted characters
With these example data:
>> text
text =
Hello Frame 1 test string Frame 22 end of Frame 2 this
>> result
result =
Hello ; test string ; end of ; this

Multiline string literal in Matlab?

Is there a multiline string literal syntax in Matlab or is it necessary to concatenate multiple lines?
I found the verbatim package, but it only works in an m-file or function and not interactively within editor cells.
EDIT: I am particularly after readbility and ease of modifying the literal in the code (imagine it contains indented blocks of different levels) - it is easy to make multiline strings, but I am looking for the most convenient sytax for doing that.
So far I have
t = {...
'abc'...
'def'};
t = cellfun(#(x) [x sprintf('\n')],t,'Unif',false);
t = horzcat(t{:});
which gives size(t) = 1 8, but is obviously a bit of a mess.
EDIT 2: Basically verbatim does what I want except it doesn't work in Editor cells, but maybe my best bet is to update it so it does. I think it should be possible to get current open file and cursor position from the java interface to the Editor. The problem would be if there were multiple verbatim calls in the same cell how would you distinguish between them.
I'd go for:
multiline = sprintf([ ...
'Line 1\n'...
'Line 2\n'...
]);
Matlab is an oddball in that escape processing in strings is a function of the printf family of functions instead of the string literal syntax. And no multiline literals. Oh well.
I've ended up doing two things. First, make CR() and LF() functions that just return processed \r and \n respectively, so you can use them as pseudo-literals in your code. I prefer doing this way rather than sending entire strings through sprintf(), because there might be other backslashes in there you didn't want processed as escape sequences (e.g. if some of your strings came from function arguments or input read from elsewhere).
function out = CR()
out = char(13); % # sprintf('\r')
function out = LF()
out = char(10); % # sprintf('\n');
Second, make a join(glue, strs) function that works like Perl's join or the cellfun/horzcat code in your example, but without the final trailing separator.
function out = join(glue, strs)
strs = strs(:)';
strs(2,:) = {glue};
strs = strs(:)';
strs(end) = [];
out = cat(2, strs{:});
And then use it with cell literals like you do.
str = join(LF, {
'abc'
'defghi'
'jklm'
});
You don't need the "..." ellipses in cell literals like this; omitting them does a vertical vector construction, and it's fine if the rows have different lengths of char strings because they're each getting stuck inside a cell. That alone should save you some typing.
Bit of an old thread but I got this
multiline = join([
"Line 1"
"Line 2"
], newline)
I think if makes things pretty easy but obviously it depends on what one is looking for :)

Resources