How to extract a substring in a text file when the substring is between two parentheses?

I have a text file that contains sections as shown below
V1('ww', '6deg')
V2('bb', '15meter')
V3('cc','25yards')
.
.
V4('dd', '72cm')
These sections are randomly distributed inside the text file.
Using MATLAB, I need to find all the occurrences of VariableProp(VarName, VarValue) in the file, and change the VarValue.
Any ideas?
Thank you

You can do this with textscan. (You could also probably do it with regexp). Here's a textscan approach:
str = "V4('dd', '72cm')"; % a line from the file
% Call textscan on a single line of text
x = textscan(str, "%[^(](%[^']%[^'])", ...
MultipleDelimsAsOne=true, Delimiter=[","," ", "'"]);
% x is a 3-element cell array. If we got a match, each element in the
% outer cell is a scalar. Use vertcat to unwrap a layer of cell-ness:
x = vertcat(x{:});
% If we're left with 3 elements, it was a match
isMatch = numel(x) == 3;
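Since a regexp route is mentioned above, here is a minimal sketch of it, assuming each entry sits on its own line and both arguments are single-quoted. The sample lines and the replacement value '99cm' are made up for illustration; in practice the lines would be read from the file.

```matlab
% Hypothetical sample lines standing in for the file contents.
lines = {'V1(''ww'', ''6deg'')', 'unrelated text', 'V3(''cc'',''25yards'')'};
newValue = '99cm';   % assumed replacement for every VarValue

% Tokens: (1) property name, (2) VarName, (3) current VarValue.
pattern = '^(\w+)\(\s*''([^'']*)''\s*,\s*''([^'']*)''\s*\)$';

updated = lines;
for k = 1:numel(lines)
    tok = regexp(strtrim(lines{k}), pattern, 'tokens', 'once');
    if ~isempty(tok)
        % Rebuild the line, keeping the name and VarName, swapping the value.
        updated{k} = sprintf('%s(''%s'', ''%s'')', tok{1}, tok{2}, newValue);
    end
end
disp(updated(:));
```

Lines that do not match the pattern are copied through unchanged. Note that the rebuilt lines always use a comma followed by one space, so the original spacing is normalized.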

Related

Matlab: Find string pattern with a list of words and replace in text with one word of the list

In MATLAB, consider the string:
str = 'text text text [[word1,word2,word3]] text text'
I want to isolate randomly one word of the list ('word1','word2','word3'), say 'word2', and then write, in a possibly new file, the string:
strnew = 'text text text word2 text text'
My approach is as follows (certainly pretty bad):
Isolating the string '[[word1,word2,word3]]' can be achieved via
str2=regexp(str,'\[\[(.*?)\]\]','match')
Removing the opening and closing square brackets in the string is achieved via
str3=str2(3:end-2)
Finally we can split str3 into a list of words (stored in a cell)
ListOfWords = split(str3,',')
which outputs {'word1'}{'word2'}{'word3'} and I am stuck there. How can I pick one of the entries and plug it back into the initial string (or a copy of it...)? Note that the delimiters [[ and ]] could both be changed to || if it can help.
You can do it as follows:
Use regexp with the 'split' option;
Split the middle part into words;
Select a random word;
Concatenate back.
str = 'text text text [[word1,word2,word3]] text text'; % input
str_split = regexp(str, '\[\[|\]\]', 'split'); % step 1
list_of_words = split(str_split{2}, ','); % step 2
chosen_word = list_of_words{randi(numel(list_of_words))}; % step 3
strnew = [str_split{1} chosen_word str_split{3}]; % step 4
I have a horrible solution. I was trying to see if I could do it in one function call. You can... but at what cost! Abusing dynamic regular expressions like this barely counts as one function call.
I use a dynamic expression to process the comma separated list. The tricky part is selecting a random element. This is made exceedingly difficult because MATLAB's syntax doesn't support paren indexing off the result of a function call. To get around this, I stick it in a struct so I can dot index. This is terrible.
>> regexprep(str,'\[\[(.*)\]\]',"${struct('tmp',split(string($1),',')).tmp(randi(count($1,',')+1))}")
ans =
'text text text word3 text text'
Luis definitely has the best answer, but I think it could be simplified a smidge by not using regular expressions.
str = 'text text text [[word1,word2,word3]] text text'; % input
tmp = extractBetween(str,"[[","]]"); % step 1
tmp = split(tmp, ','); % step 2
chosen_word = tmp(randi(numel(tmp))) ; % step 3
strnew = replaceBetween(str,"[[","]]",chosen_word,"Boundaries","Inclusive") % step 4
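If a string may contain more than one bracketed list, the same idea can be applied in a loop. This is only a sketch that goes slightly beyond the question; the input string is made up, and it assumes the lists are not nested.

```matlab
str = 'pick [[a,b,c]] and [[x,y]] here';   % hypothetical input with two lists

% tok holds the contents of each [[...]]; parts holds the text around them.
[tok, parts] = regexp(str, '\[\[(.*?)\]\]', 'tokens', 'split');

strnew = parts{1};
for k = 1:numel(tok)
    words = strsplit(tok{k}{1}, ',');            % split one list into words
    chosen = words{randi(numel(words))};         % pick one at random
    strnew = [strnew chosen parts{k+1}];         % stitch the text back together
end
disp(strnew);
```

Because 'split' always returns one more piece than there are matches, parts{k+1} is the text that follows the k-th list.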

how to find two different strings in the same line in matlab

I have a cell array obtained from textscan, and I want to find the indices of the lines containing a particular string:
fid = fopen('data.txt');
E = textscan(fid, '%s', 'Delimiter', '\n');
I want to know the line numbers (indices) of the lines that contain a specific text. For example, to find the rows that have the keyword "2016":
rows = find(contains(E{1},"2016"));
But I want to find the indices of the lines that contain both keywords, "2016" and "Mathew Perry" (only the lines which have both).
I tried this code, but it does not work:
rows = find(contains(E{1},"2016") && contains(E{1},"Mathew Perry"));
the error I get is:
Operands to the || and && operators must be convertible to logical scalar values.
To find a single string:
idx = strfind(E{1}, '2016');
idx = find(not(cellfun('isempty', idx)));
This uses strfind to locate the keyword. You may try the above combined with and/or. If that works, fine; if not, get the indices separately for each keyword and take the intersection of the two index sets (for example with intersect).
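The error in the question arises because && only accepts scalar operands, while the per-line tests produce one logical value per line. A minimal sketch of combining them with the element-wise & operator follows; the sample cell array is made up and stands in for the textscan output.

```matlab
% Hypothetical stand-in for E = textscan(fid, '%s', 'Delimiter', '\n');
E = {{'2016 Mathew Perry'; '2016 someone else'; ...
      '2015 Mathew Perry'; 'Mathew Perry, 2016'}};

% One logical value per line for each keyword.
has2016 = ~cellfun('isempty', strfind(E{1}, '2016'));
hasName = ~cellfun('isempty', strfind(E{1}, 'Mathew Perry'));

% Element-wise AND keeps only the lines that contain both keywords.
rows = find(has2016 & hasName);
disp(rows);

% With contains (R2016b or later) the same idea reads:
% rows = find(contains(E{1}, "2016") & contains(E{1}, "Mathew Perry"));
```

Working with logical masks also avoids the separate intersection step, since the two conditions are merged before find is called.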

Matlab substring

I am trying to average a specific value in a long string and cannot figure out how to pull a value out of the middle of the string. I would like to pull out the 27 from this string, and the corresponding value from the other strings, and add them up:
2015-10-1,33,27,20,29,24,20,96,85,70,30.51,30.40,30.13,10,9,4,10,6,,T,5,Snow,35
2015-10-1,33,27,20,29,24,20,96,85,70,30.51,30.40,30.13,10,9,4,10,6,,T,5,Snow,35
2015-10-2,37,32,27,32,27,23,92,80,67,30.35,30.31,30.28,10,10,7,7,4,,T,8,Rain-Snow,19
2015-10-3,39,36,32,35,33,29,100,90,79,30.30,30.17,30.11,10,7,0,8,3,,0.21,8,Fog-Rain,13
2015-10-4,40,37,34,38,36,33,100,96,92,30.23,30.19,30.14,2,1,0,6,0,,0.13,8,Fog-Rain,27
2015-10-5,46,38,30,38,34,30,100,91,61,30.19,30.08,29.93,10,7,0,6,2,,T,6,Fog-Rain,23
fid = fopen('MonthlyHistory.html');
for i=1:2
str = fgets(fid);
c = strsplit(str,',');
mean=mean+c;
end
fprintf('Average Daily Temperature: %d\n',mean);
Method 1: use readtable
I'm guessing this is pulled from weather underground? Take your csv file and make sure it is saved with a .csv ending. Then what I would do is:
my_data = readtable('MonthlyHistory.csv');
This reads the whole file into the highly convenient table variable type. Then you can do:
average_daily_temp = my_data.MeanTemperatureF; %or whatever it is called in the table
I find tables are a super convenient way to keep track of tabular data. (plus readtable is pretty good).
Method 2: continue your approach...
fid = fopen('mh2.csv');
str = fgets(fid); % May need to read off a few lines to get to the
str = fgets(fid); % numbers
my_data = []; %initialize an empty array
while(true)
str = fgets(fid); % read off a line
if(str == -1) % if str is -1, which signifies end of file
break; %exit loop
end
ca = strsplit(str,','); % split string into a cell array of strings
my_data(end+1,:) = str2num(ca{3}); % convert the 3rd element to a number and store it
end
fclose(fid);
Now my_data is an array holding the 3rd element of each line.
You can use textscan. You might be able to simplify your code with it as well, but for a single string it works like this:
S='2015-10-1,33,27,20,29,24,20,96,85,70,30.51,30.40,30.13,10,9,4,10,6,,T,5,Snow,35'
T=textscan(S,'%s','Delimiter',',')
str2double(T{1}{3}) %// the value we want is the 3rd field
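To get from a single extracted field to the average the question asks for, the per-line value can be collected and passed to mean. A minimal sketch, assuming shortened made-up lines and that the value of interest is always the third comma-separated field:

```matlab
% Hypothetical shortened lines; real ones would come from fgetl on the file.
lines = {'2015-10-1,33,27,20', '2015-10-2,37,32,27', '2015-10-3,39,36,32'};

vals = zeros(numel(lines), 1);
for k = 1:numel(lines)
    fields = strsplit(lines{k}, ',');   % note: collapses empty fields by default
    vals(k) = str2double(fields{3});    % third field is the value of interest
end

avgTemp = mean(vals);                   % do not name a variable "mean"
fprintf('Average Daily Temperature: %.2f\n', avgTemp);
```

The result is stored in avgTemp rather than in a variable called mean, because assigning to mean would shadow the built-in function.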

sort string according to first characters matlab

I have a cell array composed of several strings
names = {'2name_19surn', '3name_2surn', '1name_2surn', '10name_1surn'}
and I would like to sort them according to the numeric prefix.
I tried
[~,index] = sortrows(names.');
sorted_names = names(index);
but I get
sorted_names = {'10name_1surn', '1name_2surn', '2name_19surn', '3name_2surn'}
instead of the desired
sorted_names = {'1name_2surn', '2name_19surn', '3name_2surn','10name_1surn'}
any suggestion?
Simple approach using regular expressions:
r = regexp(names,'^\d+','match'); %// get prefixes
[~, ind] = sort(cellfun(@(c) str2num(c{1}), r)); %// convert to numbers and sort
sorted_names = names(ind); %// use index to build result
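A slightly shorter variant of the same idea is sketched below. It assumes every name starts with at least one digit, and it swaps in the 'once' option and str2double so that no cellfun or str2num call is needed.

```matlab
names = {'2name_19surn', '3name_2surn', '1name_2surn', '10name_1surn'};

% With 'once', each cell holds the prefix as a plain character vector.
prefixes = regexp(names, '^\d+', 'match', 'once');

% str2double accepts a cell array of strings and returns a numeric array.
[~, idx] = sort(str2double(prefixes));
sorted_names = names(idx);
disp(sorted_names);
```

A name without a numeric prefix would yield NaN from str2double, and sort places NaN values last in ascending order.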
As long as speed is not a concern, you can loop through all the strings and save the leading digits in an array, then sort that array as usual:
names = {'2name_2', '3name', '1name', '10name'}
number_in_string = zeros(1,length(names));
% Read numbers from the strings
for ii = 1:length(names)
number_in_string(ii) = sscanf(names{ii}, '%i');
end
% Sort names using number_in_string
[sorted, idx] = sort(number_in_string)
sorted_names = names(idx)
Take the file sort_nat from here
Then
names = {'2name', '3name', '1name', '10name'}
sort_nat(names)
returns
sorted_names = {'1name', '2name', '3name','10name'}
You can deal with arbitrary patterns using a regular expression:
names = {'2name', '3name', '1name', '10name'}
match = regexpi(names,'(?<number>\d+)\D+','names'); % created with regex editor on rubular.com
match = cell2mat(match); % cell array to struct array
clear numbersStr
[numbersStr{1:length(match)}] = match.number; % cell array with number strings
numbers = str2double(numbersStr); % vector of numbers
[B,I] = sort(numbers); % sorted vector of numbers (B) and the indices (I)
clear namesSorted
[namesSorted{1:length(names)}] = names{I} % cell array with sorted name strings

How do I read a delimited file with strings/numbers with Octave?

I am trying to read a text file containing digits and strings using Octave. The file format is something like this:
A B C
a 10 100
b 20 200
c 30 300
d 40 400
e 50 500
but the delimiter can be space, tab, comma or semicolon. The textread function works fine if the delimiter is space/tab:
[A,B,C] = textread ('test.dat','%s %d %d','headerlines',1)
However, it does not work if the delimiter is a comma or semicolon. I tried to use dlmread:
dlmread ('test.dat',';',1,0)
but it does not work because the first column is a string.
Basically, with textread I can't specify the delimiter and with dlmread I can't specify the format of the first column. Not with the versions of these functions in Octave, at least. Has anybody ever had this problem before?
textread allows you to specify the delimiter-- it honors the property arguments of strread. The following code worked for me:
[A,B,C] = textread('test.dat', '%s %d %d', 'delimiter', ',', 'headerlines', 1)
I couldn't find an easy way to do this in Octave currently. You could use fopen() to loop through the file and manually extract the data. I wrote a function that would do this on arbitrary data:
function varargout = coltextread(fname, delim)
% Initialize the variable output argument
varargout = cell(nargout, 1);
% Initialize elements of the cell array to nested cell arrays
% This syntax is due to {:} producing a comma-separated list
[varargout{:}] = deal(cell());
fid = fopen(fname, 'r');
while true
% Get the current line
ln = fgetl(fid);
% Stop if EOF
if ~ischar(ln)
break;
endif
% Split the line string into components and parse numbers
elems = strsplit(ln, delim);
nums = str2double(elems);
nans = isnan(nums);
% Special case of all strings (header line)
if all(nans)
continue;
endif
% Find the indices of the NaNs
% (i.e. the indices of the strings in the original data)
idxnans = find(nans);
% Assign each corresponding element in the current line
% into the corresponding cell array of varargout
for i = 1:nargout
% Detect if the current index is a string or a num
if any(ismember(idxnans, i))
varargout{i}{end+1} = elems{i};
else
varargout{i}{end+1} = nums(i);
endif
endfor
endwhile
% Release the file handle once the whole file has been read
fclose(fid);
endfunction
It accepts two arguments: the file name, and the delimiter. The function is governed by the number of return variables that are specified, so, for example, [A B C] = coltextread('data.txt', ';'); will try to parse three different data elements from each row in the file, while A = coltextread('data.txt', ';'); will only parse the first elements. If no return variable is given, then the function won't return anything.
The function ignores rows that have all-strings (e.g. the 'A B C' header). Just remove the if all(nans)... section if you want everything.
By default, the 'columns' are returned as cell arrays, although the numbers within those arrays are actually converted numbers, not strings. If you know that a cell array contains only numbers, then you can easily convert it to a column vector with: cell2mat(A)'.
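Since the question allows space, tab, comma or semicolon as the delimiter, the per-line splitting step could be given all of them at once. This is a minimal sketch of that step on a single made-up line, not a change to the function above; the variable names are illustrative.

```matlab
ln = 'a;10;100';   % hypothetical data line; could also use ',' or tabs

% strsplit accepts a cell array of delimiters; sprintf yields a real tab.
delims = {' ', sprintf('\t'), ',', ';'};
elems = strsplit(ln, delims);

label  = elems{1};                  % first column stays a string
values = str2double(elems(2:end));  % remaining columns become numbers
disp(label);
disp(values);
```

Passing a cell array like delims as the second argument of coltextread should then let one call handle any of the four file variants, since the function forwards delim directly to strsplit.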

Resources