Matlab: filesystem, string manipulation and figures saving - string

In the workspace I have many m-files containing data I'd like to plot.
I have to read them all and save their plot without showing the results (I'll see them after all is done).
The last part can be done this way?
f = figure('Visible', 'off');
plot(x,y);
saveas(f,'figure.fig');
but I don't want to load manually each m-file where x and y are stored.
So I need a way to explore the filesystem and run these statements for each file, manipulate their name and save a jpg with the same name of its m-file.

The dir function will return a structure containing info on the Folders and Files in the current directory
>> FileInfo = dir
Then you need to write code to use that info to automatically navigate the directory structure (using cd for instance), and select the files you want to read.
The function what can also be useful if you're wanting to only look for certain file types, e.g. .mat files.
Not surprisingly, similar questions to this have been asked before, for instance see here

Related

How to read the most recent Excel export into a Pandas dataframe without specifying the file name?

I frequent a real estate website that shows recent transactions, from which I will download data to parse within a Pandas dataframe. Everything about this dataset remains identical every time I download it (regarding the column names, that is).
The name of the Excel output may change, though. For example, if I already have download a few of these in my Downloads folder, the file that's exported may read "Generic_File_(3)" or "Generic_File_(21)" if I already have a few older "Generic_File" exports in that folder from a previous export.
Ideally, I'd like my workflow to look like this: export this Excel file of real estate sales, then run a Python script to read in the most recent export as a Pandas dataframe. The catch is, I don't want to have to go in and change the filename in the script to match the appending number of the Excel export everytime. I want the pd.read_excel method to simply read the "Generic_File" that is appended with the largest number (which will obviously correspond to the most rent export).
I suppose I could always just delete old exports out of my Downloads folder so the newest, freshest export is always named the same ("Generic_File", in this case), but I'm looking for a way to ensure I don't have to do this. Are wildcards the best path forward, or is there some other method to always read in the most recently downloaded Excel file from my Downloads folder?
I would use the OS package and create a method to read to file names in the downloads folder. Parsing string filenames you could then find the file following your specified format with the highest copy number. Something like the following might help you get started.
import os
downloads = os.listdir('C:/Users/[username here]/Downloads/')
is_file = [True if '.' in item else False for item in downloads]
files = [item for keep, item in zip(is_file, downloads) if keep]
** INSERT CODE HERE TO IDENTIFY THE FILE OF INTEREST **
Regex might be the best way to find matches if you have a diverse listing of files in your downloads folder.

copying files from one folder to another folder based on the file names in python 3

In Python 3.7, I want to write a scrip that
creates folders based on a list
iterates through a list (elements represent different "runs")
searches for .txt files in predifined directories derived from certain operations
copies certain .txt files to the previously created folders
I managed to do that via following script:
from shutil import copy
import os
import glob
# define folders and batches
folders = ['folder_ce', 'folder_se']
runs = ['A001', 'A002', 'A003']
# make folders
for f in folders:
os.mkdir(f)
# iterate through batches,
# extract files for every operation,
# and copy them to target folder
for r in runs:
# operation 1
ce = glob.glob(f'{r}/{r}/several/more/folders/{r}*.txt')
for c in ce:
copy(c, 'folder_ce')
# operation 2
se = glob.glob(f'{r}/{r}/several/other/folders/{r}*.txt')
for s in se:
copy(s, 'folder_se')
In the predifined directories there are several .txt files
one file with the format A001.txt (where the "A001"-part is derived from the list "runs" specified above)
plus sometimes several files with the format A001.20200624.1354.56.txt
If a file with the format A001.txt is there, I only want to copy this one to the target directory.
If the format A001.txt is not available, I want to copy all files with the longer format (e.g. A001.20200624.1354.56.txt).
After the comment of #adamkwm, I tried
if f'{b}/{b}/pcs.target/data/xmanager/CEPA_Station/Verwaltung_CEPA_44S4/{b}.txt' in cepa:
copy(f'{b}/{b}/pcs.target/data/xmanager/CEPA_Station/Verwaltung_CEPA_44S4/{b}.txt', 'c_py_1')
else:
for c in cepa:
copy(c, 'c_py_1')
but that still copies both files (A001.txt and A001.20200624.1354.56.txt), which I understand. I think the trick is to first check in ce, which is a list, if the {r}.txt format is present and if it is, only copy that one. If not, copy all files. However, I don't seem to get the logic right or use the wrong modules or methods, it seems.
After searching for answers, i didn't find one resolving this specific case.
Can you help me with a solution for this "selective copying" of the files?
Thanks!

Python 3 - Copy files if they do not exist in destination folder

I am attempting to move a couple thousand pdfs from one file location to another. The source folder contains multiple subfolders and I am combining just the pdfs (technical drawings) into one folder to simplify searching for the rest of my team.
The main goal is to only copy over files that do not already exist in the destination folder. I have tried a couple different options, most recently what is shown below, and in all cases, every file is copied every time. Prior to today, any time I attempted a bulk file move, I would received errors if the file existed in the destination folder but I no longer do.
I have verified that some of the files exist in both locations but are still being copied. Is there something I am missing or can modify to correct?
Thanks for the assistance.
import os.path
import shutil
source_folder = os.path.abspath(r'\\source\file\location')
dest_folder = os.path.abspath(r'\\dest\folder\location')
for folder, subfolders, files in os.walk(source_folder):
for file in files:
path_file=os.path.join(folder, file)
if os.path.exists(file) in os.walk(dest_folder):
print(file+" exists.")
if not os.path.exists(file) in os.walk(dest_folder):
print(file+' does not exist.')
shutil.copy2(path_file, dest_folder)
os.path.exists returns a Boolean value. os.walk creates a generator which produces triples of the form (dirpath, dirnames, filenames). So, that first conditional will never be true.
Also, even if that conditional were correct, your second conditional has a redundancy since it's merely the negation of the first. You could replace it with else.
What you want is something like
if file in os.listdir(dest_folder):
...
else:
...

Avoid overwriting of files with "for" loop

I have a list of dataframes (df_cleaned) created from multiple csv files chosen by the user.
My objective is to save each dataframe within the df_cleaned list as a separate csv file locally.
I have the following code done which saves the file with its original title. But I see that it overwrites and manages to save a copy of only the last dataframe.
How can I fix it? According to my very basic knowledge perhaps I could use a break-continue statement in the loop? But I do not know how to implement it correctly.
for i in range(len(df_cleaned)):
outputFile = df_cleaned[i].to_csv(r'C:\...\Data Docs\TrainData\{}.csv'.format(name))
print('Saving of files as csv is complete.')
You can create a different name for each file, as an example in the following I attach the index to name:
for i in range(len(df_cleaned)):
outputFile = df_cleaned[i].to_csv(r'C:\...\Data Docs\TrainData\{0}_{1}.csv'.format(name,i))
print('Saving of files as csv is complete.')
this will create a list of files named <name>_N.csv with N = 0, ..., len(df_cleaned)-1.
A very easy way of solving. Just figured out the answer myself. Posting to help someone else.
fileNames is a list I created at the start of the code to save the
names of the files chosen by the user.
for i in range(len(df_cleaned)):
outputFile = df_cleaned[i].to_csv(r'C:\...\TrainData\{}.csv'.format(fileNames[i]))
print('Saving of files as csv is complete.')
Saves a separate copy for each file in the defined directory.

Importing a whole folder of python files

In the current python program I'm working on, I need to access a lot of stored data. I store it in the form of a bunch of dictionaries, each in their own file. Each file has a single command: giveArchive(). So to access one of the files, I use:
import fileName
return fileName.giveArchive()
And this has worked well so far, but as the number of files I need grows, I want to streamline this a little bit. I'd like to store all of these files in the same folder, and that folder in the same directory as my main file. Is there some way I can import every file in a folder? And if I do, how can I use 'giveArchive()' from specific files in it?
You can do something like:
from folder.subfolder.deepersubfolder import filename
return filename.giveArchive()
this assumes folder can be accessed from the directory your script is running in

Resources