How can I use a specified order of strings to index from a cell array?

I am trying to index into a cell array of potential reference files to pick one for a comparison. The comparison files have distinct parts in their file names that I'd like to use to identify a single reference file.
However, I'm only able to return reference files that contain the three distinct parts, in any order. How can I enforce the order?
Example:
The comparison file is:
deg_baseFileName = "Test1_female_44k_70dBA_babble7ch_1sp_20k_00dBA_48k"
I use strsplit to break the filename into parts:
deg_parts = strsplit(deg_baseFileName, "_");
The distinguishing parts are:
deg_parts(2), deg_parts(4), deg_parts(8)
In this case: "female", "70dBA", "00dBA" - in that order.
I use these functions to identify and index with the distinguishing parts:
strToFind = {string(deg_parts(2)),string(deg_parts(4)),string(deg_parts(8))}'; % Strings to match
fun = @(s)~cellfun('isempty',strfind(ref_files,s));
out = cellfun(fun,strToFind,'UniformOutput',false);
idx = all(horzcat(out{:}),2);
However, the index matches two entries in my reference file cell array:
Ref_female_44k_00dBA_babble7ch_1sp_20k_70dBA_48k.wav
Ref_female_44k_70dBA_babble7ch_1sp_20k_00dBA_48k.wav
Both contain the distinguishing parts, but only the second in the correct order.
Is there a way I can enforce the order in my out call?
Thanks!

In the simplest case, where the comparison and reference files only differ in their first part, you can use strrep:
refFile = strrep(deg_baseFileName, 'Test1', 'Ref');
If you know what the other parts of the file name will be, and they are the same for all the reference files but differ from the comparison file, you can just use sprintf to create your file name:
refFile = sprintf('Ref_%s_44k_%s_babble7ch_1sp_20k_%s_48k.wav', ...
deg_parts(2), deg_parts(4), deg_parts(8));
If you don't know or care what the other parts could be, you can generalize the above to create a match expression for use with regexp to find the index of reference files with the correct order:
% "\\." becomes "\." after sprintf, i.e. a literal dot in the pattern
expr = sprintf('Ref_%s_[^_]+_%s_[^_]+_[^_]+_[^_]+_%s_[^_]+\\.wav', ...
deg_parts(2), deg_parts(4), deg_parts(8));
index = ~cellfun('isempty', regexp(ref_files, expr));
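A self-contained check of the regexp approach, using the question's comparison file name and its two reference file names as hardcoded test data. Two assumptions: char vectors are used instead of string objects (so deg_parts is indexed with braces), and the pattern is anchored with ^ and $, which the answer above does not do, so that a longer name that merely contains the same sequence is not matched:

```matlab
% Example inputs from the question (char vectors for portability)
deg_baseFileName = 'Test1_female_44k_70dBA_babble7ch_1sp_20k_00dBA_48k';
ref_files = {'Ref_female_44k_00dBA_babble7ch_1sp_20k_70dBA_48k.wav'; ...
             'Ref_female_44k_70dBA_babble7ch_1sp_20k_00dBA_48k.wav'};

deg_parts = strsplit(deg_baseFileName, '_');

% Anchored pattern: the three parts must sit at these positions, in this
% order, with a fixed number of "_"-free fields around them
expr = sprintf('^Ref_%s_[^_]+_%s_[^_]+_[^_]+_[^_]+_%s_[^_]+\\.wav$', ...
    deg_parts{2}, deg_parts{4}, deg_parts{8});

% regexp on a cell array returns one (possibly empty) result per file
idx = ~cellfun('isempty', regexp(ref_files, expr));
match = ref_files(idx);
```

Only the second reference file, whose parts appear in the order female, 70dBA, 00dBA, should be selected.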

Related

Calling a vector from a string

I am attempting to write an algorithm that selects a specific reference standard (vector) as a function of temperature. The temperature values are stored in a structure (procspectra(i).temperature). My reference standards are stored in another structure (standards.interp.zeroed.ClOxxx), where xxx are numbers such as 200, 210, 220, etc. I have built the rounding construct and pasted it below.
for i = 1:length(procspectra);
if mod(-procspectra(i).temperature,10) > mod(procspectra(i).temperature,10);
%if mod(-) > mod(+) round down, else round up
tempvector(i) = procspectra(i).temperature - mod(procspectra(i).temperature,10);
else
tempvector(i) = procspectra(i).temperature + mod(-procspectra(i).temperature,10);
end
clostd = strcat('standards.interp.zeroed.ClO',num2str(tempvector(i)));
end
This construct works well. Now, I have built a string which is identical to the name of the vector I want to invoke, but I'm uncertain how to actually call the vector given that this is encoded as a string. Ideally I want to do something within the for-loop like:
parameters(i).standards.ClOstandard = clostd
where I actually assign to that parameter structure the vector I have saved in the standards structure I generated earlier (and not just a string).
Could anyone help out?
Don't construct clostd like that (containing the full variable name); make it contain only the last field name instead:
clostd = ['ClO' num2str(tempvector(i))];
parameters(i).standards.ClOstandard = standards.interp.zeroed.(clostd);
This is the dynamic field name syntax: it accesses a structure field whose name is given by a string. So the following three are equivalent:
struc.Cl0123
struc.('Cl0123')
fieldn='Cl0123'; struc.(fieldn)
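A runnable sketch of the same idea, with a made-up standards structure and two made-up temperatures (all values here are hypothetical). It also swaps the mod-based rounding for round(T/10)*10, which should give the same multiple of 10 for non-negative temperatures (halves round up in both); this is a simplification, not the asker's construct:

```matlab
% Hypothetical standards and temperatures, for illustration only
standards.interp.zeroed.ClO200 = [1 2 3];
standards.interp.zeroed.ClO210 = [4 5 6];
standards.interp.zeroed.ClO220 = [7 8 9];
procspectra(1).temperature = 203;
procspectra(2).temperature = 217.5;

parameters = struct();
for i = 1:numel(procspectra)
    % Nearest multiple of 10 (assumes non-negative temperatures)
    T = round(procspectra(i).temperature / 10) * 10;
    % Build only the field name, e.g. 'ClO200'
    clostd = sprintf('ClO%d', T);
    % Dynamic field access: the value of clostd selects the field
    parameters(i).standards.ClOstandard = standards.interp.zeroed.(clostd);
end
```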

Matlab: sorting strings on size in a struct field

This problem is bugging me and the solution is probably obvious, but I can't find it.
I have a bunch of data files which I want to load:
ex_file-1.txt, ex_file-2.txt, ..., ex_file-10.txt
To get their filenames I use:
files = dir('ex_file-*.txt');
This returns a struct array with fields such as name, date and bytes. The name field contains:
ex_file-1.txt, ex_file-10.txt, ex_file-2.txt, ..., ex_file-9.txt
I would like to sort this such that ex_file-10.txt is the last file rather than the second.
I have tried concatenating, converting to cells and sorting, but none of these give what I need. I know the most obvious solution would be to rename all the files so that all names have the same length, but I'd prefer not to do that.
This could be one approach -
%// Input cell array of filenames
names = {'ex_file-1.txt', 'ex_file-10.txt', 'ex_file-2.txt', 'ex_file-3.txt', ...
'ex_file-4.txt', 'ex_file-5.txt'}
%// Remove the common "ex_file-" prefix
stripped_names = strrep(names,'ex_file-','')
%// Remove the ending extension part
stripped_names = strrep(stripped_names,'.txt','')
%// Convert to doubles and then get the sorted indices
[~,idx] = sort(str2double(stripped_names))
%// Use sorted indices to rearrange names array, for the final output
names_out = names(idx)
Code run -
>> names
names =
'ex_file-1.txt' 'ex_file-10.txt' 'ex_file-2.txt' 'ex_file-3.txt' 'ex_file-4.txt' 'ex_file-5.txt'
>> names_out
names_out =
'ex_file-1.txt' 'ex_file-2.txt' 'ex_file-3.txt' 'ex_file-4.txt' 'ex_file-5.txt' 'ex_file-10.txt'
This can be done using regular expressions. The numeric part of the file name is detected as a run of digits right before the .txt part.
files = dir('ex_file-*.txt'); %// get file struct array
names = {files.name}; %// get file names. Cell array of strings
numbers = regexp(names, '\d+(?=\.txt)', 'match'); %// strings with numeric part of name
numbers = str2double([numbers{:}]); %// convert from strings to numbers
[~, ind] = sort(numbers); %// sort those numbers
names_sorted = names(ind); %// apply that order to file names
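To try the regular-expression approach without files on disk, here is a variant on a hardcoded list standing in for {files.name}. Instead of the lookahead, it captures the digits with a group and the 'tokens' and 'once' options; it assumes every name ends in digits followed by .txt (otherwise t{1} fails on an empty result):

```matlab
% Hardcoded stand-in for {files.name}
names = {'ex_file-1.txt', 'ex_file-10.txt', 'ex_file-2.txt', 'ex_file-9.txt'};
% One 1x1 cell of captured digits per name, e.g. {'10'}
tok = regexp(names, '(\d+)\.txt$', 'tokens', 'once');
% Unwrap the captures and convert them to numbers
numbers = str2double(cellfun(@(t) t{1}, tok, 'UniformOutput', false));
[~, ind] = sort(numbers);
names_sorted = names(ind);
```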
Here is an alternative that does not require any details about the file names. The primary sorting rule is shortest first; the secondary rule is lexicographic:
list = {files.name};
%secondary sorting
list = sort(list);
%primary sorting by length (sort is stable, so ties keep the lexicographic order)
[~,b] = sort(cellfun(@numel,list));
list = list(b);
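A quick check of the length-then-lexicographic rule on a hardcoded list (again standing in for {files.name}). Note that this rule only gives numeric order when every name has the same prefix and extension, so that length differences come only from the number of digits:

```matlab
% Hardcoded stand-in for {files.name}, deliberately out of order
list = {'ex_file-10.txt', 'ex_file-2.txt', 'ex_file-1.txt', 'ex_file-9.txt'};
list = sort(list);                      % secondary key: lexicographic
[~, b] = sort(cellfun(@numel, list));   % primary key: length (sort is stable)
list = list(b);
```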

MATLAB: Only pick filenames coinciding with some input string

Say I have a directory full of filenames such as:
1242349_blabla.wav
fdp23424_asdf.wav
o2349_0.wav
and I have an input text file listing unique IDs on each newline coinciding with numbers within these filenames (e.g. '23424' for the second filename above).
I'd like to construct a struct of filenames only containing those filenames in that directory that coincide with some ID in the input text file:
fid = fopen('input.txt');
input = textscan(fid, '%s', 'Delimiter', '\n');
filenames = dir(fullfile('/somedir/', '*.wav'));
for i = 1:length(filenames)
for j = 1:length(input)
if (strfind(input{1}(j), filenames(i).name) ~= [])
% create new struct with chosen filenames
end
end
end
However, I get the error "undefined function 'ne' for input arguments of type 'cell'". I've tried lots of options to no avail. Also, input displays as a 38x1 cell but has length 1, so the inner loop only runs once... Any ideas?
Regular expressions are definitely the most flexible and powerful solution. But if your needs are simpler, you can get away with something simpler, like using wildcards in your dir command. Try something like this:
%get your file IDs from the input file
fid = fopen('input.txt');
input = textscan(fid, '%s', 'Delimiter', '\n');
fclose(fid);
IDs = input{1};
%loop over each string
myfilenames = {};
for idx = 1:length(IDs)
%get all files whose names contain the given ID
fnames = dir(['somedir/*' IDs{idx} '*.wav']); %wildcards!
%gather the new filenames that match
for Ifname=1:length(fnames)
myfilenames{end+1}=fnames(Ifname).name;
end
end
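A self-contained sketch of the wildcard approach. It creates the three example files in a temporary folder (tempname and the single ID are assumptions made only for the demo) and collects the matches with {fnames.name} instead of the inner loop:

```matlab
% Hypothetical setup: a temporary folder holding the three example files
d = tempname;
mkdir(d);
names = {'1242349_blabla.wav', 'fdp23424_asdf.wav', 'o2349_0.wav'};
for k = 1:numel(names)
    fid = fopen(fullfile(d, names{k}), 'w');   % create an empty file
    fclose(fid);
end

IDs = {'23424'};   % stands in for input{1} from textscan
myfilenames = {};
for idx = 1:numel(IDs)
    % Wildcards on both sides of the ID, as in the answer above
    fnames = dir(fullfile(d, ['*' IDs{idx} '*.wav']));
    % Append all matching names at once instead of looping over fnames
    myfilenames = [myfilenames, {fnames.name}];
end

% Clean up the temporary files and folder
delete(fullfile(d, '*.wav'));
rmdir(d);
```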
I would use regular expressions to search for occurrences of the ID in your cell array. Regular expressions are designed to search for patterns in strings, and because you want to find specific numbers in a set of strings, I would certainly recommend them here. Specifically, use the regexp function with the ID you are searching for as the pattern.
When you pass regexp a cell array of strings, the output is another cell array in which each element is a numeric array holding the starting indices where the pattern occurs in the corresponding string. An empty array means the pattern was not found; a non-empty one gives the position(s) of the ID in that string. The positions themselves don't matter here: you only want to know whether the ID occurs in each string, so checking whether each array is empty is all that's needed.
So, given the files returned by dir, we can create a cell array that stores just the file names, run regexp on it, then filter out the file names that don't contain the ID you want. Something like this:
f = dir(fullfile('/somedir/', '*.wav'));
filenames = {f.name};
ID = 23424;
check = regexp(filenames, num2str(ID));
filtered_ind = cellfun(@isempty, check);
final_files = f(~filtered_ind);
The first line of code reads the files from your desired directory. The second line extracts the name field of each element of the structure array into a cell array. The third line sets the ID you want to check for. The fourth line calls regexp on the file names to search for your desired number. Note that we need to convert the number to a string, as the pattern is expected to be a string. The fifth line flags the file names that do not contain the ID, and the last line keeps only the files that do.
You can then start your processing. Specifically, you can loop over the resulting structure array and create your structures for each element:
for i = 1:length(final_files)
s = final_files(i); %// Get the dir structure for a file that passed the ID check
%// Create your structure now...
%// ...
end
However, you have a series of IDs that you want to check, so we can simply wrap the code above in a loop. In other words, you'd do something like:
fid = fopen('input.txt');
input = textscan(fid, '%s', 'Delimiter', '\n');
fclose(fid);
IDs = input{1};
f = dir(fullfile('/somedir/', '*.wav'));
filenames = {f.name};
for idx = 1 : length(IDs)
%// Get an ID
ID = IDs{idx};
%// Do our checking and filter out those files that don't contain our ID
check = regexp(filenames,ID);
filtered_ind = cellfun(@isempty, check);
final_files = f(~filtered_ind);
%// Do your final processing
for i = 1:length(final_files)
s = final_files(i); %// Get the dir structure for a file that passed the ID check
%// Create your structure now...
%// ...
end
end
With the above code, we open the text file, parse each line into a string, and place the strings into a cell array called IDs. Note that the IDs are now all strings, so there's no need for any conversion. Then, for each ID, we search our file names for that ID, filter out the ones that don't contain it, and loop over the remaining files to create our structures.
Just to demonstrate that this regexp approach works, here is a small example using the three filenames from your post. I've placed these names in a cell array, then I'll run lines 3 to 5 of the code I wrote, then filter out the filenames that don't contain the ID we're looking for:
filenames = {'1242349_blabla.wav'; 'fdp23424_asdf.wav'; 'o2349_0.wav'};
ID = 23424;
check = regexp(filenames, num2str(ID));
filtered_ind = cellfun(@isempty, check);
final_filenames = filenames(~filtered_ind);
final_filenames is a cell array of the filenames that contain our ID. We thus get:
final_filenames =
'fdp23424_asdf.wav'
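One caveat with searching for the raw ID: it is a substring match, so an ID such as 2349 would also match 1242349_blabla.wav. If IDs should only match whole runs of digits, one option (a swap-in, not part of the answer above) is to surround each ID with lookarounds that reject neighboring digits. This sketch also accumulates the matches over all IDs into one logical mask; the IDs listed are hypothetical:

```matlab
filenames = {'1242349_blabla.wav'; 'fdp23424_asdf.wav'; 'o2349_0.wav'};
IDs = {'23424'; '2349'};   % hypothetical contents of input.txt

keep = false(size(filenames));
for k = 1:numel(IDs)
    % (?<!\d) and (?!\d): the ID must not be preceded or followed by a digit
    pattern = ['(?<!\d)' IDs{k} '(?!\d)'];
    % OR this ID's hits into the running mask
    keep = keep | ~cellfun(@isempty, regexp(filenames, pattern));
end
final_filenames = filenames(keep);
```

Here 1242349_blabla.wav is skipped because the 2349 inside it is preceded by another digit.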
Good luck!

Matlab: Remove fields with similar string names in a single command

So I have a structure, r, that contains multiple headers of the form:
Header_0001
Header_0002
Header_0003, and so on, whose names are represented as strings.
Is there a way to format the strings so that I can remove these headers with a single command?
i.e.
r=rmfield(r,Header_00XX)
where each X can be any digit. I have tried using wildcards, anchors, etc., but have not found a method that works so far.
Try this:
fields = fieldnames(r);
r = rmfield(r, fields(~cellfun(@isempty, strfind(fields, 'Header_00'))));
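The strfind test above matches 'Header_00' anywhere in a field name. If you want to remove only fields named exactly Header_ followed by digits, a regexp with ^ and $ anchors is a stricter alternative. A sketch with a made-up structure:

```matlab
% Made-up structure with two matching and two non-matching fields
r = struct('Header_0001', 1, 'Header_0002', 2, 'Other', 3, 'Header_notes', 4);
fields = fieldnames(r);
% True where the whole name is "Header_" followed by one or more digits
isHeader = ~cellfun(@isempty, regexp(fields, '^Header_\d+$', 'once'));
% rmfield accepts a cell array of field names
r = rmfield(r, fields(isHeader));
```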

xlsread: not the file name itself, but a string contained in an element of an array that holds the file name

I would like to read an Excel file (xlsread), but instead of typing the file name manually every time, I want to xlsread the file name that is stored in an array.
For example, my array B is:
B =
'john.xlsx'
'mais.xlsx'
'car.xlsx'
Then I would like to read the Excel file whose name is in the first element, i.e. "john.xlsx".
How can I do this?
data = xlsread(B{1});
Or, if you want to read all of them:
for i=1:length(B)
data(i).nums = xlsread(B{i});
end
This assumes, of course, that B is a cell array. If it's not, it can't exist the way you described it, unless all the strings have the same length (for example, padded with spaces), in which case B is a char matrix. You can then split the char array into a cell array using
B = mat2cell(B,ones(size(B,1),1),size(B,2));
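To see why the padding matters, here is a small example with the names from the question stored in a char matrix (a hypothetical setup). mat2cell keeps the trailing spaces, which xlsread would likely treat as part of the file name, whereas cellstr removes them:

```matlab
% Hypothetical char matrix: char() pads shorter rows with trailing spaces
B = char('john.xlsx', 'mais.xlsx', 'car.xlsx');       % 3x9, 'car.xlsx ' is padded
% Splitting with mat2cell keeps the padding in each cell
padded = mat2cell(B, ones(size(B,1),1), size(B,2));
% cellstr splits the rows and strips the trailing spaces
names = cellstr(B);
% Each cleaned name could then be passed on, e.g. xlsread(names{1})
```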
Strings of different lengths have to be stored in a cell array, whose elements you access with curly braces {}. So, you can call xlsread on the first element this way:
names{1} = 'john.xlsx';
names{2} = 'mais.xlsx';
names{3} = 'car.xlsx';
num = xlsread(names{1});
