(matlab) unique variable name from string

I have a simple script to import some spectroscopy data from files with a base filename (YYYYMMDD) and a header. My current method pushes the actual spectral intensities into a cell array 'rawspectra', and I can access the data with `rawspectra{m,n}.data(q,r)`.
In the script, I specify the base filename by hand and save it as a string 'filebase'.
I would like to prepend 'filebase' to the name of the rawspectra variable, so that I can use the script to import files acquired on different dates into the same workspace without overwriting the rawspectra variable (and also to make it easy to see which variables belong to which experimental conditions). I can easily do this by manually renaming a variable, but I'd rather make it automatic.
My importation script follows:
%for the importation of multiple sequential files, starting at startfile
%and ending at numfiles. All raw scans are subsequently plotted.
numfiles = input('How many spectra?');
startfile = input('What is the starting file number?');
numberspectra = numfiles - (startfile - 1);
filebase = strcat(num2str(input('what is the base file number?')), '_');
rawspectra = cell(startfile, numberspectra);
for k = startfile:numberspectra
    filename = strcat(filebase, sprintf('%.3d.txt', k));
    %eval(strcat(filebase,'rawspectra')){k} = importdata(filename); - This does not work.
    rawspectra{k} = importdata(filename);
    figure;
    plot(rawspectra{1,k}.data(:,1), rawspectra{1,k}.data(:,2))
end
If any of you can help me out with what should be a seemingly simple task, I would be very appreciative. Basically, I want 'filebase' to go in front of 'rawspectra' and then increment that by k++ within the loop.
Thanks!

Why not just store the base name with the data, instead of generating dynamically named variables:
rawspectra(k) = importdata(filename);
rawspectra(k).filebase = filebase;


I'm trying to 'shuffle' a folder of music, but random.choice() keeps choosing values that are supposed to have been removed

I'm trying to make a Python script that renames files randomly from a list, and I used numbers.remove(place) on it, but it keeps choosing values that are supposed to have been removed.
I used to just use random.randint, but I have now moved to choosing from a list and then removing the chosen value from the list, yet it seems to keep choosing already-chosen values.
```python
from os import chdir, listdir, rename
from random import choice

def main():
    chdir('C:\\Users\\user\\Desktop\\Folders\\Music')
    for f in listdir():
        if f.endswith('.mp4'):
            numbers = [str(x) for x in range(0, 100)]
            had = []
            print(f'numbers = {numbers}')
            place = choice(numbers)
            print(f'place = {place}')
            numbers.remove(place)
            print(f'numbers = {numbers}')
            while place in had:
                input('Place has been had.')
                place = choice(numbers)
            had.append(place)
            name = place + '.mp4'
            print(f'name = {name}')
            print(f'\n\nRenaming {f} to {name}.\n\n')
            try:
                rename(f, name)
            except FileExistsError:
                pass

if __name__ == '__main__':
    main()
```
It should randomly number the files without choosing the same value twice, but it keeps doing exactly that and I have no idea why.
When you call listdir() the first time, that's the same list that you're iterating over the entire time. Yes, you're changing the contents of the directory, but python doesn't really care about that because you only asked for the contents of the directory at a specific point in time - before you began modifying it.
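That snapshot behavior is easy to demonstrate in isolation. Here is a self-contained sketch using a throwaway temporary directory (the file names are made up):

```python
import os
import tempfile

# listdir() returns a one-time snapshot: renaming files afterwards
# does not update the list that was already returned.
with tempfile.TemporaryDirectory() as d:
    for name in ("a.mp4", "b.mp4"):
        open(os.path.join(d, name), "w").close()
    snapshot = os.listdir(d)                                   # taken once, up front
    os.rename(os.path.join(d, "a.mp4"), os.path.join(d, "0.mp4"))
    after = os.listdir(d)                                      # re-queried after the rename
print(sorted(snapshot))  # still lists 'a.mp4'
print(sorted(after))     # lists '0.mp4' instead
```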
I would do this in two separate steps:
# get the current list of files in the directory
dirlist = os.listdir()
# choose a new name for each file
to_rename = zip(
    dirlist,
    [f'{num}.mp4' for num in random.sample(range(100), len(dirlist))]
)
# actually rename each file
for oldname, newname in to_rename:
    try:
        os.rename(oldname, newname)
    except FileExistsError:
        pass
This method is more concise than the one you're using. First, I use random.sample() on the iterable range(100) to generate non-overlapping numbers from that range (without having to do the extra step of using had like you're doing now). I generate exactly as many as I need, and then use the built-in zip() function to bind together the original filenames with these new numbers.
Then, I do the rename() operations all at once.
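The sample()/zip() pairing can also be seen in isolation. A minimal illustration with a made-up file list and no actual renaming:

```python
import random

files = ["song_a.mp4", "song_b.mp4", "song_c.mp4"]  # hypothetical names
# sample() draws len(files) *distinct* numbers, so no target name repeats
new_names = [f"{n}.mp4" for n in random.sample(range(100), len(files))]
pairs = list(zip(files, new_names))
print(pairs)
```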

Caching a parsed document

I have a set of YAML files. I would like to cache these files so that as much work as possible is re-used.
Each of these files contains two documents. The first document contains “static” information that will always be interpreted in the same way. The second document contains “dynamic” information that must be reinterpreted every time the file is used. Specifically, it uses a tag-based macro system, and the document must be constructed anew each time the file is used. However, the file itself will not change, so the results of parsing the entire file could be cached (at a considerable resource savings).
In ruamel.yaml, is there a simple way to parse an entire file into multiple parsed documents, then run construction on each document individually? This would allow me to cache the result of constructing the first “static” document and cache the parse of the second “dynamic” document for later construction.
Example file:
---
default_argument: name
...
%YAML 1.2
%TAG ! tag:yaml-macros:yamlmacros.lib.extend,yamlmacros.lib.arguments:
---
!merge
name: !argument name
The first document contains metadata that is used (along with other data from elsewhere) in the construction of the second document.
If you don't want to process all YAML documents in a stream completely, you'll have to split up the stream by hand, which is not entirely easy to do in a generic way.
What you need to know is what a YAML stream can consist of:
zero or more documents. Subsequent documents require some sort of separation marker line. If a document is not terminated by a document end marker line, then the following document must begin with a directives end marker line.
A document end marker line is a line that starts with ... followed by space/newline and a directives end marker line is --- followed by space/newline.
The actual production rules are slightly more complicated and "starts with" should ignore the fact that you need to skip any mid-stream byte-order marks.
If you don't have any directives, byte-order marks, or document-end markers (and most multi-document YAML streams that I have seen do not have those), then you can just read the multi-document YAML into a string with `data = Path(filename).read_text()`, split it using `l = data.split('\n---')`, and process only the appropriate element of the resulting list with `YAML().load(l[N])`.
I am not sure the following properly handles all cases, but it does handle your multi-doc stream:
import sys
from pathlib import Path

import ruamel.yaml

docs = []
current = ""
state = "EOD"
for line in Path("example.yaml").open():
    if state in ["EOD", "DIR"]:
        if line.startswith("%"):
            state = "DIR"
        else:
            state = "BODY"
        current += line
        continue
    if line.startswith('...') and line[3].isspace():
        state = "EOD"
        docs.append(current)
        current = ""
        continue
    if state == "BODY" and current and line.startswith('---') and line[3].isspace():
        docs.append(current)
        current = ""
        continue
    current += line
if current:
    docs.append(current)

yaml = ruamel.yaml.YAML()
data = yaml.load(docs[1])
print(data['name'])
which gives:
name
Looks like you can indeed directly operate on the parser internals of ruamel.yaml; it just isn't documented. The following function will parse a YAML string into document nodes:
from ruamel.yaml import SafeLoader

def parse_documents(text):
    loader = SafeLoader(text)
    composer = loader.composer
    while composer.check_node():
        yield composer.get_node()
From there, the documents can be individually constructed. In order to solve my problem, something like the following should work:
def process_yaml(text):
    my_constructor = get_my_custom_constructor()
    parsed_documents = list(parse_documents(text))
    metadata = my_constructor.construct_document(parsed_documents[0])
    return (metadata, parsed_documents[1])

cache = {}

def do_the_thing(file_path):
    if file_path not in cache:
        cache[file_path] = process_yaml(Path(file_path).read_text())
    metadata, document = cache[file_path]
    my_constructor = get_my_custom_constructor(metadata)
    return my_constructor.construct_document(document)
This way, all of the file IO and parsing is cached, and only the last construction step need be performed each time.

ReportLab PDF creation with Python duplicating text

I am trying to automate the production of PDFs by reading data from a pandas data frame and writing it to a page of an existing PDF form using PyPDF2 and ReportLab. The main meat of the program is here:
def pdfOperations(row, bp):
    packet = io.BytesIO()
    can = canvas.Canvas(packet, pagesize=letter)
    createText(row, can)
    packet.seek(0)
    new_pdf = PdfFileReader(packet)
    textPage = new_pdf.getPage(0)
    secondPage = bp.getPage(1)
    secondPage.mergePage(textPage)
    assemblePDF(frontPage, secondPage, row)
    del packet, can, new_pdf, textPage, secondPage

def main():
    df = openData()
    bp = readPDF()
    frontPage = bp.getPage(0)
    for ind in df.index:
        row = df.loc[ind]
        pdfOperations(row, bp)
This works fine for the first row of data and the first PDF generated, but for the subsequent ones all the text is overwritten; i.e. the second PDF contains text from both the first and the second iteration. I thought garbage collection would take care of all the in-memory changes, but that does not seem to be happening. Anyone know why?
I even tried forcing the objects to be deleted after the function has run its course, but no luck...
You read bp only once, before the loop. Then in the loop, you obtain its second page via getPage(1) and merge stuff onto it. But since it's always the same object (bp), each iteration merges onto the same page, so all the merges done before add up.
While I can't find any way to create a "deepcopy" of a page in PyPDF2's docs, it should work to just create a new bp object for each iteration.
Somewhere in readPDF you presumably open your template PDF as a binary stream and pass it to PdfFileReader. Instead, you could read the data into a variable:
with open(filename, "rb") as f:
    bp_bin = f.read()
And from that, create a new PdfFileReader instance for each loop iteration (wrapping the bytes in a fresh BytesIO stream, since PdfFileReader expects a file-like object):
for ind in df.index:
    row = df.loc[ind]
    bp = PdfFileReader(io.BytesIO(bp_bin))
    pdfOperations(row, bp)
This should "reset" secondPage every time without any additional file I/O overhead. Only the parsing is done again each time, but depending on the file size and contents, that may be cheap enough to live with.

Matlab sprintf incorrect result using random strings from list

I want to create a string variable using `sprintf` and a random name from a list (in order to save an image with that name). A draft of the code is the following:
Names = [{'C'} {'CL'} {'SCL'} {'A'}];
nameroulette = ceil(rand(1)*4)
filename = sprintf('DG_%d.png', Names{1,nameroulette});
But when I check filename, what I get is the text I typed followed not by one of the strings but by a number that I have no idea where it comes from. For example, if nameroulette = 1 then filename is 'DG_67.png', and if nameroulette = 4, filename = 'DG_65.png'. Where does this number come from, and how can I fix this problem?
You just need to change
filename = sprintf('DG_%d.png', Names{1,nameroulette});
to
filename = sprintf('DG_%s.png', Names{1,nameroulette});
With %d, sprintf interprets the character array numerically, so each character is printed as its character code ('C' is 67, 'A' is 65), which is where those numbers come from; %s prints the characters themselves.
By the way, you may want to have a look at the randi command for drawing random integers.

matlab iterative filenames for saving

This question is about MATLAB:
I'm running a loop, and each iteration produces a new set of data that I want saved to a new file each time. I also avoid overwriting old files by changing the name. It looks like this:
name_each_iter = strrep(some_source,'.string.mat','string_new.(j).mat')
and what I'm struggling with here is the iteration, so that I obtain files:
...string_new.1.mat
...string_new.2.mat
etc.
I was trying various combinations of () [] {} as well as 'string_new.'j'.mat' (which gave a syntax error).
How can it be done?
Strings are just vectors of characters. So if you want to iteratively create filenames here's an example of how you would do it:
for j = 1:10
    filename = ['string_new.' num2str(j) '.mat'];
    disp(filename)
end
The above code will create the following output:
string_new.1.mat
string_new.2.mat
string_new.3.mat
string_new.4.mat
string_new.5.mat
string_new.6.mat
string_new.7.mat
string_new.8.mat
string_new.9.mat
string_new.10.mat
You could also generate all file names in advance using NUM2STR:
>> filenames = cellstr(num2str((1:10)','string_new.%02d.mat'))
filenames =
'string_new.01.mat'
'string_new.02.mat'
'string_new.03.mat'
'string_new.04.mat'
'string_new.05.mat'
'string_new.06.mat'
'string_new.07.mat'
'string_new.08.mat'
'string_new.09.mat'
'string_new.10.mat'
Now access the cell array contents as filenames{i} in each iteration.
sprintf is very useful for this:
for ii = 5:12
    filename = sprintf('data_%02d.mat', ii)
end
this assigns the following strings to filename:
data_05.mat
data_06.mat
data_07.mat
data_08.mat
data_09.mat
data_10.mat
data_11.mat
data_12.mat
notice the zero padding. sprintf in general is useful if you want parameterized formatted strings.
For creating a name based of an already existing file, you can use regexp to detect the '_new.(number).mat' and change the string depending on what regexp finds:
original_filename = 'data.string.mat';
im = regexp(original_filename,'_new.\d+.mat');
if isempty(im) % original file, no _new.(j) detected
    newname = [original_filename(1:end-4) '_new.1.mat'];
else
    num = str2double(original_filename(im(end)+5:end-4));
    newname = sprintf('%s_new.%d.mat', original_filename(1:im(end)-1), num+1);
end
This does exactly that, and produces:
data.string_new.1.mat
data.string_new.2.mat
data.string_new.3.mat
...
data.string_new.9.mat
data.string_new.10.mat
data.string_new.11.mat
when iterating the above function, starting with 'data.string.mat'.
