In the current Python program I'm working on, I need to access a lot of stored data. I store it as a bunch of dictionaries, each in its own file. Each file exposes a single function, giveArchive(). So to access one of the files, I use:
import fileName
return fileName.giveArchive()
And this has worked well so far, but as the number of files I need grows, I want to streamline this a little bit. I'd like to store all of these files in the same folder, and that folder in the same directory as my main file. Is there some way I can import every file in a folder? And if I do, how can I use 'giveArchive()' from specific files in it?
You can do something like:
from folder.subfolder.deepersubfolder import filename
return filename.giveArchive()
This assumes folder can be accessed from the directory your script is running in.
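If the goal is to import every file in the folder without naming each one, here is a minimal sketch using pkgutil and importlib. It assumes the folder is a package (it contains an __init__.py), and the name archives is hypothetical:

import importlib
import pkgutil

import archives  # hypothetical package: a folder named "archives" with an __init__.py

# Import every module in the package, keyed by its module name.
modules = {}
for info in pkgutil.iter_modules(archives.__path__):
    modules[info.name] = importlib.import_module(f"archives.{info.name}")

# Call giveArchive() on a specific file by name.
data = modules["fileName"].giveArchive()

This way, adding a new data file to the folder requires no change to the main script.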
I am trying to loop through multiple folders and subfolders in an Azure Blob container and read multiple XML files.
E.g., I have files in YYYY/MM/DD/HH/123.xml format.
Similarly, I have multiple subfolders under month, date, and hour, and multiple XML files at the end.
My intention is to loop through all these folders and read the XML files. I have tried a few Pythonic approaches, which did not give the intended result. Can you please help me with any ideas for implementing this?
import glob, os

for filename in glob.iglob('2022/08/18/08/225.xml'):
    if os.path.isfile(filename):  # code does not enter the for loop
        print(filename)
import os

def collect_xml(dir='2022/08/19/08/'):
    r = []
    for root, dirs, files in os.walk(dir):  # code not moving past this for loop, no exception
        for name in files:
            filepath = root + os.sep + name
            if filepath.endswith(".xml"):
                r.append(os.path.join(root, name))
    return r
glob is a Python function, and it won't recognize the blob folder paths directly when the code runs in PySpark; you have to give the path from the root. Also, make sure to specify recursive=True.
For example, I checked both the glob code and the os code above in Databricks and got no results, because the path must be given from the absolute root, i.e. the root folder.
glob code:
import glob, os

for file in glob.iglob('/path_from_root_to_folder/**/*.xml', recursive=True):
    print(file)
For me, in Databricks, the root to access is /dbfs (I used CSV files for my repro).
Using os: the same applies; when you walk from the root, the blob files are listed from all folders and subfolders.
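A minimal sketch of the os.walk version under that assumption; the /dbfs mount root and the container/folder names are placeholders:

import os

r = []
# Walk from the absolute root of the mount, not a relative blob path.
for root, dirs, files in os.walk('/dbfs/mnt/container/2022/'):
    for name in files:
        if name.endswith('.xml'):
            r.append(os.path.join(root, name))
print(r)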
I used Databricks for my repro after mounting the container. Wherever you run this code in PySpark, make sure you give the root of the folder in the path, and when using glob, set recursive=True as well.
There is an easier way to solve this problem with PySpark!
The tough part is that all the files have to have the same format. In the Azure Databricks sample directory, there is a /cs100 folder that has a bunch of files that can be read in as text (line by line).
The trick is the option called "recursiveFileLookup". It assumes the directories were created by Spark; you cannot mix and match file formats.
I added the name of the input file to the dataframe. Last but not least, I converted the dataframe to a temporary view.
Looking at a simple aggregate query, we have 10 unique files. The biggest has a little more than 1 M records.
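A minimal sketch of those steps, assuming a hypothetical mount path /mnt/sample/cs100 with line-oriented text files; recursiveFileLookup and input_file_name are standard Spark features, while the path and view name are placeholders:

from pyspark.sql.functions import input_file_name

# Read every text file under the folder tree, line by line.
df = (spark.read.format("text")
      .option("recursiveFileLookup", "true")
      .load("/mnt/sample/cs100/"))

# Record which file each line came from, then expose the result to SQL.
df = df.withColumn("source_file", input_file_name())
df.createOrReplaceTempView("raw_lines")

# Simple aggregate: count records per input file.
spark.sql("SELECT source_file, COUNT(*) AS n FROM raw_lines GROUP BY source_file").show()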
If you need to cherry-pick files from a mixed directory, this method will not work.
However, I think that is an organizational cleanup task rather than a reading one.
Last but not least, use the correct formatter to read the XML:
spark.read.format("com.databricks.spark.xml")
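A hedged sketch of what the full read might look like, assuming the spark-xml library is attached to the cluster; the rowTag value and path are placeholders, and I'm assuming recursiveFileLookup applies here too since spark-xml is a file-based source:

df = (spark.read.format("com.databricks.spark.xml")
      .option("rowTag", "record")              # placeholder: the XML element that becomes a row
      .option("recursiveFileLookup", "true")   # assumption: works for file-based sources in Spark 3+
      .load("/mnt/sample/2022/"))              # placeholder root folder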
I need help creating Python code to rename a specific folder inside many subfolders of a main folder.
Example: there is a folder named fromDrawing inside many of the subfolders, and each one needs to be renamed to Drawing:
main folder
  sub-folders
    fromDrawing (rename to Drawing)
To rename a folder using Python, you can use os.rename.
Example:
import os
os.rename("path_to_initial_folder", "same_path_with_changed_name")
Try using sys/os? I'm not good with sys and os, so I can't give you an example. (I have just never needed it really.)
I am attempting to move a couple thousand PDFs from one file location to another. The source folder contains multiple subfolders, and I am combining just the PDFs (technical drawings) into one folder to simplify searching for the rest of my team.
The main goal is to only copy over files that do not already exist in the destination folder. I have tried a couple of different options, most recently what is shown below, and in all cases, every file is copied every time. Prior to today, any time I attempted a bulk file move, I would receive errors if the file existed in the destination folder, but I no longer do.
I have verified that some of the files exist in both locations but are still being copied. Is there something I am missing or can modify to correct?
Thanks for the assistance.
import os.path
import shutil

source_folder = os.path.abspath(r'\\source\file\location')
dest_folder = os.path.abspath(r'\\dest\folder\location')

for folder, subfolders, files in os.walk(source_folder):
    for file in files:
        path_file = os.path.join(folder, file)
        if os.path.exists(file) in os.walk(dest_folder):
            print(file + " exists.")
        if not os.path.exists(file) in os.walk(dest_folder):
            print(file + ' does not exist.')
            shutil.copy2(path_file, dest_folder)
os.path.exists returns a Boolean value. os.walk creates a generator which produces triples of the form (dirpath, dirnames, filenames). So, that first conditional will never be true.
Also, even if that conditional were correct, your second conditional has a redundancy since it's merely the negation of the first. You could replace it with else.
What you want is something like

if file in os.listdir(dest_folder):
    ...
else:
    ...
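Putting it together, a sketch of the corrected loop; as my own refinement, it builds the destination listing once into a set, since calling os.listdir for every file repeats work:

import os
import shutil

source_folder = os.path.abspath(r'\\source\file\location')
dest_folder = os.path.abspath(r'\\dest\folder\location')

existing = set(os.listdir(dest_folder))  # snapshot of destination file names

for folder, subfolders, files in os.walk(source_folder):
    for file in files:
        if file in existing:
            print(file + " exists.")
        else:
            print(file + ' does not exist.')
            shutil.copy2(os.path.join(folder, file), dest_folder)
            existing.add(file)  # so later duplicates found in the walk are skipped too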
I'm trying to make a file-searching, Python-based program with a GUI.
It's going to be used to search specified directories and subdirectories for files whose filenames are entered in an Entry box.
While I'm fairly new to Python programming, I searched the web and gained some information on the os module.
Then I moved on and tried to write a simple script with os.walk and without the GUI:
import os

# Use a raw string so backslashes in the Windows path aren't treated as escapes.
for root, dirs, files in os.walk(r'Path\to\files'):
    for file in files:
        if file.endswith('.doc'):
            print(os.path.join(root, file))
That worked fine; however, file.endswith() only looks at the last part of the filename.
The problem is that the file path holds over 1000 .doc files, and I want the code to be able to search parts of the file name, for example "Caliper" in the filename "Hilka_Vernier_Caliper.doc".
So I went on and searched for methods other than file.endswith() and found something about file.index(). So I changed the code to:
import os

for root, dirs, files in os.walk(r'Path\to\files'):
    for file in files:
        if file.index('Caliper'):
            print(os.path.join(root, file))
But that didn't work as planned...
Does someone on here have an idea how I could make this work?
You may use pathlib instead of the old os module: https://docs.python.org/3/library/pathlib.html#pathlib.Path.rglob
BTW, file.index raises an exception if the substring is not found, so you would need a try/except clause.
Another way is to use if "Caliper" in str(file):
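A minimal sketch combining both suggestions with pathlib's rglob; the path is a placeholder:

from pathlib import Path

# rglob('*.doc') recurses through all subdirectories for .doc files;
# the `in` test then matches any part of the filename.
for path in Path(r'Path\to\files').rglob('*.doc'):
    if 'Caliper' in path.name:
        print(path)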
I am writing two simple scripts, one to move all files into a folder, and one to move all files back to said folder. I am not getting any errors, but the files aren't moving so I am likely missing something stupidly obvious somewhere.
I tried making sure the file paths were correct, looked up how the syntax of the commands worked, and checked for any basic errors.
import shutil
import os

source = r'C:\\Users\JonTh\Saved Games\DCS\Mods\aircraft'
destination = r'C:\\Users\JonTh\Saved Games\dcs planes'

files = os.listdir(source)

for index in files:
    shutil.move(source, destination)
You should modify your code to move each file inside the loop, rather than the source folder itself:

for index in files:
    shutil.move(source + "\\" + index, destination)