os.walk() ignore few directories and subdirectories under them - python-3.x

I have a top-level directory with a few subdirectories underneath it, which in turn contain a few subdirectories and files.
The tree view looks like this:
treeview
├── abc
│ ├── pqr
│ │ ├── 1
│ │ ├── 2
│ │ └── 3
│ └── sty
└── xyz
Now I want my script to skip walking into "abc" and "1" (the latter is a directory, not a file).
I have come up with the script below:
import os
import shutil
files_to_keep = []
files_grandlist = []
dirs_to_keep = []
dirs_grandlist = []
rootpath="./treeview"
exclude = ['./treeview/abc', './treeview/abc/pqr/1']
exclude_temp = set(['abc','1'])
for path in exclude:
    for root, dirs, files in os.walk(path):
        dirs[:] = [d for d in dirs if d not in exclude_temp]
        dirs_to_keep.append(root)
print("================dirs to keep======")
for i in dirs_to_keep:
    print(i)
for root, dirs, files in os.walk(rootpath):
    dirs_grandlist.append(root)
print("===============grandlist==========")
for i in dirs_grandlist:
    print(i)
print("filter============================")
for i in list(set(dirs_grandlist)^set(dirs_to_keep)):
    print(i)
This gives me output like below when run:
================dirs to keep======
./treeview/abc
./treeview/abc/pqr
./treeview/abc/pqr/2
./treeview/abc/pqr/3
./treeview/abc/sty
./treeview/abc/pqr/1
===============grandlist==========
./treeview
./treeview/abc
./treeview/abc/pqr
./treeview/abc/pqr/1
./treeview/abc/pqr/2
./treeview/abc/pqr/3
./treeview/abc/sty
./treeview/xyz
filter============================
./treeview
./treeview/xyz
The idea is to capture a list of directories and subdirectories under the treeview top-level directory, and then capture the same information for a list of excluded directories.
The output under the "filter============================" line should give me a list of directories that I want to remove from the filesystem.
I'd appreciate any help here.
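One way to get the "filter" list in a single pass is to prune the excluded directories while walking, instead of walking twice and taking a set difference. The following is only a sketch under that assumption; the function name walk_without and the demo tree are made up for illustration. It relies on the documented behaviour that editing dirs in place during a top-down os.walk() stops the walk from descending into the removed entries.

```python
import os
import tempfile


def walk_without(rootpath, exclude):
    """Return every directory under rootpath, never entering excluded paths."""
    excluded = {os.path.normpath(p) for p in exclude}
    visited = []
    for root, dirs, _files in os.walk(rootpath, topdown=True):
        # Editing dirs in place tells os.walk which subdirectories to skip.
        dirs[:] = [
            d for d in dirs
            if os.path.normpath(os.path.join(root, d)) not in excluded
        ]
        visited.append(root)
    return visited


if __name__ == "__main__":
    # Rebuild the tree from the question in a temporary directory.
    with tempfile.TemporaryDirectory() as tmp:
        top = os.path.join(tmp, "treeview")
        for sub in ("abc/pqr/1", "abc/pqr/2", "abc/pqr/3", "abc/sty", "xyz"):
            os.makedirs(os.path.join(top, sub))
        skip = [os.path.join(top, "abc"), os.path.join(top, "abc", "pqr", "1")]
        for directory in walk_without(top, skip):
            print(os.path.relpath(directory, tmp))
```

Because the exclusion compares full paths rather than bare names, a directory called "1" elsewhere in the tree would not be skipped by accident. Note that excluding "abc" already hides everything below it, including "abc/pqr/1".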

Related

Copy subfolders from one location to another that has the same structure but only overwrite the new versions

I have 2 locations with folders and multiple subfolders as follows:
Location 1
2022_10
├── FolderA
│   └── Version 3
│       └── FolderA.docx
├── FolderB
│   └── Version 2
│       └── FolderB.docx
├── FolderC
│   └── Version 2
│       └── FolderC.docx
└── FolderD
    └── Version 3
        └── FolderD.docx
Location 2
2022_10
├── FolderA
│   ├── Version 1
│   │   └── FolderA.docx
│   └── Version 2
│       └── FolderA.docx
├── FolderB
│   └── Version 1
│       └── FolderB.docx
├── FolderC
│   └── Version 1
│       └── FolderC.docx
└── FolderD
    ├── Version 1
    │   └── FolderA.docx
    └── Version 2
        └── FolderA.docx
Location 1 has the latest version of the subfolders, which I need to copy to Location 2 (a centralized repository) while respecting the folder structure and the earlier version folders that already exist there.
In the end, my objective is for Location 2 to ingest only the latest version, as shown below; if that version already exists there, the script should overwrite it with the copy from Location 1.
Location 2
2022_10
├── FolderA
│   ├── Version 1
│   │   └── FolderA.docx
│   ├── Version 2
│   │   └── FolderA.docx
│   └── Version 3
│       └── FolderA.docx
├── FolderB
│   ├── Version 1
│   │   └── FolderB.docx
│   └── Version 2
│       └── FolderB.docx
With some help I got a script up and running that creates the Location 1 structure, so that part is done. Now I'm thinking about the best way to have the same script also perform the copy between locations, perhaps using shutil with a multi-option menu: -A for generating the Location 1 structure and -C for copying from Location 1 to Location 2.
Here is the code I have:
#!/usr/bin/env python3
import docx, os, glob, re, shutil, sys
from pathlib import Path
#Taking the folder to process from user input (second argument is considered)
folder = sys.argv[1]
#Function to create a new folder if the path does not exist
def create_dir(path):
    is_exist = os.path.exists(path)
    if not is_exist:
        os.makedirs(path)
for file in glob.glob(os.path.join(folder, '*.docx')):
    main_folder = os.path.join(folder,Path(file).stem)
    file_name = os.path.basename(file)
    #Getting the version information from every word
    doc = docx.Document(file).paragraphs[6].text
    #Getting the version number line = Version Number: (.*) and extracting the number only portion
    version_number = re.search("(Version Number: (.*))", doc).group(1)
    version_subfolder = version_number.split(':')[1].strip()
    # path to actual sub_folder with version_no
    version_subfolder = os.path.join(main_folder, version_subfolder)
    # destination path
    dest_file_path = os.path.join(version_subfolder, file_name)
    for i in [main_folder,version_subfolder]:
        create_dir(i) # function call
    # to move the file to the corresponding version folder (overwrite if exists)
    if os.path.exists(dest_file_path):
        os.remove(dest_file_path)
        shutil.move(file, version_subfolder)
    else:
        shutil.move(file, version_subfolder)
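For the copy step, one possible approach (a sketch, not a tested drop-in for the script above) is shutil.copytree() with dirs_exist_ok=True, which is available from Python 3.8. It merges the source tree into an existing destination, adds version folders that are missing, and overwrites files that already exist. The names copy_latest_versions, arrange_by_version, build_parser and main are hypothetical, and the -A branch is only a placeholder for the existing loop.

```python
import argparse
import shutil


def copy_latest_versions(src_root, dst_root):
    """Merge src_root into dst_root, overwriting files that already exist."""
    # dirs_exist_ok=True (Python 3.8+) allows copying into an existing tree.
    shutil.copytree(src_root, dst_root, dirs_exist_ok=True)


def arrange_by_version(folder):
    """Placeholder for the existing loop that sorts .docx files by version."""
    raise NotImplementedError("plug the existing script's loop in here")


def build_parser():
    parser = argparse.ArgumentParser(description="Arrange or copy version folders")
    group = parser.add_mutually_exclusive_group(required=True)
    group.add_argument("-A", "--arrange", metavar="FOLDER",
                       help="build the version structure inside FOLDER")
    group.add_argument("-C", "--copy", nargs=2, metavar=("SRC", "DST"),
                       help="copy Location 1 (SRC) into Location 2 (DST)")
    return parser


def main(argv=None):
    args = build_parser().parse_args(argv)
    if args.copy:
        copy_latest_versions(*args.copy)
    else:
        arrange_by_version(args.arrange)
    return args
```

Calling main(["-C", "Location 1/2022_10", "Location 2/2022_10"]) would leave the older version folders in Location 2 untouched, since copytree only writes the paths that exist in the source.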

After uploading the files, what shall I do in order to display their output on the web page?

I am creating a CRUD web app using python and flask.
Here is the link to see the output:
http://zcds4327.pythonanywhere.com/
Here is the repository's link:
https://github.com/OCTRACORE/cs_proj_pro_1
Tree structure:
.
├── Password_creator.py
├── __pycache__
│ ├── Password_creator.cpython-38.pyc
│ ├── exec_prog.cpython-38.pyc
│ ├── global_data.cpython-38.pyc
│ └── table.cpython-38.pyc
├── exec_prog.py
├── global_data.py
├── passwd_file
├── requirements.txt
├── start.sh
├── static
│ ├── AddPasswdInp.css
│ ├── dispManipMenu.css
│ ├── favicon_io
│ │ ├── android-chrome-192x192.png
│ │ ├── apple-touch-icon.png
│ │ ├── favicon-32x32.png
│ │ ├── favicon.ico
│ │ └── site.webmanifest
│ ├── input.css
│ ├── show.css
│ └── showOutput.css
├── table.py
└── template
    ├── add.html
    ├── file-manip.html
    ├── input.html
    ├── modify.html
    ├── output_disp.html
    ├── semantic-error.html
    └── str_main.html
Parent directory of the code:cs_proj_pro_1
Code snippet (from the file exec_prog.py):
@app.route('/generate/file-manip-upload',methods = ["GET","POST"])
def upload_file(): #for uploading the file
    decrypted_pass_output_lst = []
    decrypted_desc_output_lst = []
    decrypted_password = []
    decrypted_data = ""
    Upload_key = "" #for storing the key
    if request.method == "POST":
        Uploadpassfile= request.files['Uploadpassfile'] #requesting the name of the required file
        Uploadpassfile_name = secure_filename(Uploadpassfile.filename) #storing the name of the password saving file
        Uploadkeyfile = request.files['Uploadkeyfile'] #requesting the name of the key storing file
        Uploadkeyfile_name = secure_filename(Uploadkeyfile.filename) #storing the name of the key storing file
        descfileUpload = request.files['descfileUpload'] #requesting the name of the file storing descriptions
        descfileUpload_name = secure_filename(descfileUpload.filename) #storing the name of the file storing descriptions
        #print(Uploadkeyfile_name,Uploadpassfile_name,descfileUpload_name)
        with open(Uploadkeyfile_name,"rb") as k: # reading the containing the key
            Upload_key = k.read()
        key_init_sec = Fernet(Upload_key)
        with open(Uploadpassfile_name,"rb") as k: # reading the file containing the passwords by decrypting them
            for i in k:
                decrypted_pass_output_lst.append(str(key_init_sec.decrypt(i),encoding="utf-8"))
        with open(descfileUpload_name,"rb") as k: #reading the file containing the descriptions by decrypting them
            for i in k:
                decrypted_desc_output_lst.append(str(key_init_sec.decrypt(i),encoding="utf-8"))
        x = len(decrypted_pass_output_lst)
        (GlobalData.password).clear() #clearing the GlobalData.password list to avoid piling up of data
        for s in range(x): #for inserting the decrypted data inside the GlobalData.password list for display
            data_decrypted = Table(decrypted_pass_output_lst[s],decrypted_desc_output_lst[s])
            (GlobalData.password).append(data_decrypted)
Working directory:
/home/ZCDS4327/proj/cs_proj_pro_1
Environment used:
PythonAnywhere
When I try to save the passwords and their corresponding descriptions, I need to go to the file manipulation page by clicking the save/upload link on the home page or the upload/save link on the output page.
The routes of the pages are:
1.) http://zcds4327.pythonanywhere.com/generate/file-manip-disp
2.) http://zcds4327.pythonanywhere.com/generate/show-output
After that, I have to select three files from the local machine. One file will be used for storing the passwords by encrypting them. The other one will store the description of all the passwords by encrypting them. The third one will store the key used for encrypting both the files. After that, I have to click the submit button to save them.
The files finally get stored inside the ZCDS4327 folder.
To show the outputs again in the web app in decrypted form, I have to select the files containing the required data in the uploading section of the page using the corresponding buttons, then click submit to receive the output.
I have created a virtual environment on my local machine that runs a replica of this program on a localhost server. If I take the three files (with their contents) produced by the app on the localhost server and try to upload them to the app hosted on PythonAnywhere, I receive an internal server error. My aim is for the user to be able to upload the files from wherever they are stored on their system.
The error log
2020-10-07 08:47:16,087: Exception on /generate/file-manip-upload [POST]
Traceback (most recent call last):
File "/usr/lib/python3.8/site-packages/flask/app.py", line 2446, in wsgi_app
response = self.full_dispatch_request()
File "/usr/lib/python3.8/site-packages/flask/app.py", line 1951, in full_dispatch_request
rv = self.handle_user_exception(e)
File "/usr/lib/python3.8/site-packages/flask/app.py", line 1820, in handle_user_exception
reraise(exc_type, exc_value, tb)
File "/usr/lib/python3.8/site-packages/flask/_compat.py", line 39, in reraise
raise value
File "/usr/lib/python3.8/site-packages/flask/app.py", line 1949, in full_dispatch_request
rv = self.dispatch_request()
File "/usr/lib/python3.8/site-packages/flask/app.py", line 1935, in dispatch_request
return self.view_functions[rule.endpoint](**req.view_args)
File "/home/ZCDS4327/proj/cs_proj_pro_1/exec_prog.py", line 155, in upload_file
with open(Uploadkeyfile_name,"rb") as k: # reading the containing the key
FileNotFoundError: [Errno 2] No such file or directory: 'keyfile.txt'
Here, I am the user as well as the developer.
So what shall I do in order to resolve this error? Every suggestion will be appreciated.
Notes:
The localhost server has been taken as an example. I believe the problem will be encountered with any other server too.
The files saved by the user will get downloaded to their system. The code for that is not yet written, and I do not think it is needed in this case.
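The traceback shows open() looking for 'keyfile.txt' in the server's working directory, where the uploaded file was never saved. A possible way around this, sketched below under that reading of the error, is to read the bytes directly from the uploaded objects instead of opening a path built from their filenames. The objects in request.files are Werkzeug FileStorage objects, which expose read() and save(). The helper name decrypt_lines is hypothetical, and the demo uses io.BytesIO and a byte-reversing function as stand-ins for an upload and for Fernet(key).decrypt so that it runs without Flask.

```python
import io


def decrypt_lines(uploaded, decrypt):
    """Decrypt every non-empty line read from an uploaded file-like object."""
    lines = uploaded.read().splitlines()
    return [decrypt(line).decode("utf-8") for line in lines if line.strip()]


if __name__ == "__main__":
    # Stand-ins (assumptions for the demo only):
    #   fake_upload  plays the role of request.files['Uploadpassfile']
    #   fake_decrypt plays the role of Fernet(key).decrypt
    fake_upload = io.BytesIO(b"terces\ndrowssap\n")
    fake_decrypt = lambda token: token[::-1]
    print(decrypt_lines(fake_upload, fake_decrypt))
```

Inside the view, the key could then come from request.files['Uploadkeyfile'].read() and the two lists from decrypt_lines(Uploadpassfile, key_init_sec.decrypt) and decrypt_lines(descfileUpload, key_init_sec.decrypt). If the files must also live on the server, calling save() on each uploaded object with an explicit absolute path before opening it should avoid the FileNotFoundError as well.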

Move files from specific subdirectories to another based on names in Python

Given a directory ./ and its subdirectories ('project1, project2, project3, ...') and files structure as follows:
├── project1
│   ├── file1
│   ├── file2
│   ├── process
│   │   └── f1
│   │       ├── pic1.png
│   │       └── pic2.jpg
│   ├── progress
│   │   └── f2
│   │       ├── 623.jpg
│   │       └── pic2323.png
│   └── session
│       └── f3
│           ├── 5632116.jpg
│           └── 72163.png
└── project2
    ├── file1
    ├── file2
    ├── process
    │   └── f1
    │       ├── pic1.png
    │       └── pic2.jpg
    ├── progress
    │   └── f2
    │       ├── 623.jpg
    │       └── pic2323.png
    └── session
        └── f3
            ├── 5632116.jpg
            └── 72163.png
For each project folder, I need to move the pictures from process into the empty files1 folder, and those from progress and session into the empty files2 folder.
The expected result will look like this:
├── project1
│   ├── file1
│   │   ├── pic1.png
│   │   └── pic2.jpg
│   └── file2
│       ├── 623.jpg
│       ├── pic2323.png
│       ├── 5632116.jpg
│       └── 72163.png
└── project2
    ├── file1
    │   ├── pic1.png
    │   └── pic2.jpg
    └── file2
        ├── 623.jpg
        ├── pic2323.png
        ├── 5632116.jpg
        └── 72163.png
My trial code works, but I don't think it's concise enough; improvements are welcome:
import os
import shutil

base_dir = './'
src_dst_map = {
    'session': 'files1',
    'process': 'files2',
    'progress': 'files2'
}
for child in os.listdir(base_dir):
    # print(child)
    child_path = os.path.join(base_dir, child)
    # print(child_path)
    # src_path = os.path.join(child_path, 'session')
    # print(src_path)
    for src, dst in src_dst_map.items():
        # print(src)
        src_path = os.path.join(child_path, src)
        dst_path = os.path.join(child_path, dst)
        print(src_path)
        print(dst_path)
        for root, dirs, files in os.walk(src_path):
            # print(root)
            for file in files:
                # print(file)
                srce_path = os.path.join(root, file)
                print(srce_path)
                shutil.move(srce_path, dst_path)
        shutil.rmtree(src_path, ignore_errors = True)
To be honest, this is already quite concise and it does the job, so I would not bother too much about changing it. Your code runs in 10 lines (plus the setup of the dictionary and base path).
Regarding duplicated images, you could check whether the file already exists in dst_path and, if it does, add a prefix (or suffix) to the duplicated file.
In the following, a prefix based on the source folder is added until a unique file name is found. This also covers the case where there is more than one duplicate of the same file in one subfolder. You could change this according to your specific needs, but it should give you the general idea.
...
for root, dirs, files in os.walk(src_path):
    for file in files:
        srce_path = os.path.join(root, file)
        while os.path.isfile(os.path.join(dst_path, file)):
            file = '_'.join([src, file])
        shutil.move(srce_path, os.path.join(dst_path, file))
I opted for a prefix, as this is much simpler to implement. A suffix would have to be inserted between the filename and the file extension, which would require a bit of additional code.
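For completeness, the suffix variant mentioned above might look something like this. It is a minimal sketch: the helper name unique_name and the numeric counter are illustrative choices, not part of the original answer. os.path.splitext() splits the name so the counter can go before the extension.

```python
import os


def unique_name(dst_dir, filename):
    """Return filename, or filename with a numeric suffix, unused in dst_dir."""
    stem, ext = os.path.splitext(filename)
    candidate = filename
    counter = 1
    # Keep increasing the counter until no file of that name exists.
    while os.path.exists(os.path.join(dst_dir, candidate)):
        candidate = f"{stem}_{counter}{ext}"
        counter += 1
    return candidate
```

In the loop above, the move would then become shutil.move(srce_path, os.path.join(dst_path, unique_name(dst_path, file))).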

How to loop in Python to create files with names in list

I have a list of colors with a txt file containing URLs of images of those colors. I am trying to create a folder to contain images of each color and move this directory so that I may ultimately download the images.
I am able to perform this for each element of the list individually, but this is tedious and I would prefer to automate it.
classes = ['red', 'orange', 'yellow', 'green', 'blue', 'purple']
This is the code I currently have for each color:
folder = 'red'
file = 'red.txt'
mv red.txt data/colors
path = Path('data/colors')
dest = path/colors
dest.mkdir(parents=True, exist_ok=True)
download_images(path/file, dest, max_pics=200)
I expect to have a folder per color containing the respective downloaded images.
Your list of colors is in the classes Python list, and you have <color name>.txt files containing URLs of images for each color in that list. So your initial directory structure looks like the following tree:
.
├── blue.txt
├── green.txt
├── orange.txt
├── purple.txt
├── red.txt
├── script.py
└── yellow.txt
Now you want to create a separate directory for each color, so your final directory structure should look like this:
.
├── data
│   └── colors
│       ├── blue
│       ├── blue.txt
│       ├── green
│       ├── green.txt
│       ├── orange
│       ├── orange.txt
│       ├── purple
│       ├── purple.txt
│       ├── red
│       ├── red.txt
│       ├── yellow
│       └── yellow.txt
└── script.py
Your download_image() function downloads the images for the URLs in the <color name>.txt file it receives as one of its arguments. It also receives the destination directory for the images and the maximum number of images to download.
If I understood your problem correctly, the following code should solve it. The code is commented; feel free to leave a comment if you need further clarification.
import os
base_path = "data/colors/"
# create base path directories if not already present
os.system("mkdir -p data")
os.system("mkdir -p data/colors")
classes = ['red', 'orange', 'yellow', 'green', 'blue', 'purple']
# dummy download image function
def download_image(path, dest, max_pics):
    print("URL file path: " + path)
    print("Image destination: " + dest)
    print("No of Images to be downloaded: " + str(max_pics))
if __name__ == "__main__":
    for colour in classes:
        # create directories for each colour if not already present
        os.system("mkdir -p " + base_path + colour)
        # move <colour_name>.txt file into base path
        os.system("mv " + colour+".txt " + base_path)
        dest = base_path + colour
        # call download_image method
        download_image(base_path+colour+".txt", dest, max_pics=200)
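The same loop can also be written without shelling out to mkdir and mv, which keeps it portable to systems where those commands are not available. This is a sketch using pathlib and shutil; the function name prepare_color_dirs and its return value are illustrative assumptions, and the download call itself is left to the caller.

```python
import shutil
from pathlib import Path


def prepare_color_dirs(classes, src_dir, base_path):
    """Create one folder per color and move each <color>.txt into base_path.

    Returns a list of (url_file, destination_folder) pairs to download from.
    """
    src_dir = Path(src_dir)
    base_path = Path(base_path)
    base_path.mkdir(parents=True, exist_ok=True)
    jobs = []
    for color in classes:
        dest = base_path / color
        dest.mkdir(exist_ok=True)
        url_file = src_dir / f"{color}.txt"
        target = base_path / url_file.name
        # Only move the URL list if it is still in the source directory.
        if url_file.exists():
            shutil.move(str(url_file), str(target))
        jobs.append((target, dest))
    return jobs
```

Each returned pair could then be passed on, for example as download_image(str(url_file), str(dest), max_pics=200) inside a loop over prepare_color_dirs(classes, ".", "data/colors").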

Auto importing modules in the folder and allowing them to be selected by name from a variable string

How it's organized
Currently, this is my folder structure:
├ Websites
│ ├ Prototypes
│ │ ├ __init__.py
│ │ ├ Website.py
│ │ └ XML.py
│ ├ __init__.py
│ └ FunHouse.py
└ scrape_sites.py
This is FunHouse.py:
from Websites.Prototypes.Website import Website, Search
class FunHouse(Website):
    def doStuff(self):
        # does stuff
        pass
This is my __init__.py in the Websites folder:
from Websites.Prototypes.Website import Website
from Websites.FunHouse import FunHouse
def choose_site(website):
    if website == "FunHouse":
        return FunHouse()
    else:
        return Website()
And in my scrape_sites.py file is the following:
import Websites
# Some code that loads a text file and sets website_string to "FunHouse"
website = Websites.choose_site(website_string)
website.doStuff()
My Question
If I want to add a website, I have to edit __init__.py. Is there any way to make it so that I don't have to edit the __init__.py file whenever I add a new website? So if I create Google.py, I can just throw it into the Websites folder and it will be available to call?
You can add the __all__ variable to your __init__.py file once.
__all__ = ["Google", "Yahoo", "Bing"]
Now a wildcard import (from Websites import *) will load each of these modules so that they are available in your code.
I figured it out. I don't even need the __init__.py files:
def choose_site(website_str):
    mod = __import__('Websites.' + website_str, fromlist=[website_str])
    obj = getattr(mod, website_str)
    return obj()
This relies on the class name being the same as the Python filename: if you pass in "Google" as the argument, it will import the module Websites.Google and return an instance of Websites.Google.Google.
If you're looking at this and want more flexibility, this will work:
def choose_class(module_name, filename, class_name):
    """
    :param module_name: The folder
    :param filename: The .py file
    :param class_name: The class in the .py file
    :return: The selected class
    """
    mod = __import__(module_name+ '.' + filename, fromlist=[filename])
    obj = getattr(mod, class_name)
    return obj()
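The same idea can be expressed with importlib.import_module(), which the Python documentation recommends over calling __import__() directly. The sketch below assumes, as above, that each module defines a class with the same name as the file. The package name DemoSites and the Google class in the demo are made up so the example can run on its own.

```python
import importlib
import sys
import tempfile
from pathlib import Path


def choose_site(package, name):
    """Import package.name and return an instance of the class called name."""
    module = importlib.import_module(f"{package}.{name}")
    return getattr(module, name)()


if __name__ == "__main__":
    # Build a throwaway package on disk so the sketch is self-contained.
    with tempfile.TemporaryDirectory() as tmp:
        pkg = Path(tmp) / "DemoSites"
        pkg.mkdir()
        (pkg / "Google.py").write_text(
            "class Google:\n"
            "    def doStuff(self):\n"
            "        return 'searching'\n"
        )
        sys.path.insert(0, tmp)
        importlib.invalidate_caches()
        print(choose_site("DemoSites", "Google").doStuff())
```

With the layout from the question, the call would be choose_site("Websites", website_string). An unknown name raises ModuleNotFoundError, which could be caught to fall back to the base Website class.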
