Determine if a path is valid in a class constructor - python-3.x

Without violating the guideline that a constructor should not do work, I need to determine whether the provided string (destination_directory) is a valid path before assigning it in the constructor.
It doesn't have to exist, but the provided string must be a valid one, i.e. no invalid symbols or illegal characters. My project will run on Windows only, not Linux.
I looked at this page, but the answers seem to try and open the directory to test if the provided string is valid.
I also tried os.path.isabs(path), but it doesn't provide the results I require. For example, it says that T:\\\\Pictures is an absolute path; while that may be true, the \\\\ should mean the path is invalid.
Is there a clean, perhaps one line way of achieving what I want?
def __init__(self, destination_directory: str):
    self._validate_path(path=destination_directory)
    self.destination_directory = destination_directory

def _validate_path(self, path):
    # code to validate path should go here.
    pass

We know a few things about a path: it contains at least a drive letter and subdirectories.
We also have rules about what symbols are not allowed in directories. We also know that a drive letter contains a single character.
Instead of allowing users of our class to pass in a full path, we break it down and only allow valid strings for directory names and one letter for the drive. When everything is validated, we can use the os module to build our path.
Here is how I would structure my Folder class:
import os

class Folder:
    def __init__(self, *subdirectories, root_drive):
        self._validate_drive_letter(letter=root_drive)
        self._validate_path(path=subdirectories)
        self._root_drive = root_drive
        self._subdirectories = subdirectories

    def _validate_drive_letter(self, letter):
        # A drive letter is exactly one alphabetic character.
        if not letter or len(letter) != 1 or not letter.isalpha():
            raise ValueError("Drive letter is invalid")

    def _validate_path(self, path):
        self._forbidden_characters = ["<", ">", ":", "/", '"', "|", "?", "*", '\\']
        # path is a tuple of directory names; check every character of every name.
        for directory in path:
            for character in directory:
                if character in self._forbidden_characters:
                    raise ValueError("Directory cannot contain invalid characters")

    def construct_full_path(self) -> str:
        # use the os module and constructor parameters to build a valid path
        return os.path.join(f"{self._root_drive}:\\", *self._subdirectories)

    def __str__(self) -> str:
        return f"Drive Letter: {self._root_drive} Subdirectories: {self._subdirectories}"
Main:
def main():
    try:
        portable_drive = Folder("Pictures", "Landscape", root_drive="R")  # Valid
        # Using the construct_full_path() function, the returned string would be:
        # R:\Pictures\Landscape
        # Notice the user doesn't provide the : or the \, the class will do it.
        vacation_pictures = Folder("Vac??tion", root_drive="T")  # Will raise ValueError
        # If we fix the error and call construct_full_path() we will get T:\Vacation
    except ValueError as error:
        print(error)
    else:
        print(portable_drive)
        print(vacation_pictures)

if __name__ == "__main__":
    main()
It may not be the best approach, but it works. I know a nested for loop is bad, but I don't see any other way to validate the individual characters of a string.
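One way to avoid the explicit nested loop is set intersection. The sketch below assumes the same forbidden characters as the list above; the names FORBIDDEN_CHARACTERS and find_invalid_names are made up for this example and are not part of the Folder class.

```python
# Hypothetical names; the characters are copied from the list in the answer above.
FORBIDDEN_CHARACTERS = frozenset('<>:/"|?*\\')


def find_invalid_names(names):
    """Return the directory names that contain at least one forbidden character."""
    # A non-empty intersection means the name uses a forbidden character.
    return [name for name in names if FORBIDDEN_CHARACTERS & set(name)]


if __name__ == "__main__":
    print(find_invalid_names(["Pictures", "Vac??tion", "Land*scape"]))
```

Returning the offending names, instead of raising on the first one, also lets the caller report every problem at once; _validate_path could raise ValueError if the returned list is not empty.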

A regex solution:
import re

windows_path_regex = re.compile(r"""
    \A
    (?:(?:[a-z]:|\\\\[a-z0-9_.$\ -]+\\[a-z0-9_.$\ -]+)\\|  # Drive
       \\?[^\\/:*?"<>|\r\n]+\\?)                           # Relative path
    (?:[^\\/:*?"<>|\r\n]+\\)*                              # Folder
    [^\\/:*?"<>|\r\n]*                                     # File
    \Z
    """, re.VERBOSE | re.I)

d = windows_path_regex.match(r"\test\txt.txt")
print(bool(d))
Note that \ is a valid path but / is not.
I used 8.18. Validate Windows Paths as a reference.
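To tie this back to the question, here is a minimal sketch that wraps the pattern in a function a constructor could call. The wrapper name is_valid_windows_path is hypothetical, and allowing a literal space in share names is my assumption about what the pattern intends.

```python
import re

# The pattern from the answer above; "\ " allows a literal space in share names.
WINDOWS_PATH_REGEX = re.compile(r"""
    \A
    (?:(?:[a-z]:|\\\\[a-z0-9_.$\ -]+\\[a-z0-9_.$\ -]+)\\|  # Drive or UNC share
       \\?[^\\/:*?"<>|\r\n]+\\?)                           # Relative path
    (?:[^\\/:*?"<>|\r\n]+\\)*                              # Folders
    [^\\/:*?"<>|\r\n]*                                     # File
    \Z
    """, re.VERBOSE | re.IGNORECASE)


def is_valid_windows_path(path: str) -> bool:
    """Hypothetical wrapper: True if the string is shaped like a Windows path."""
    return WINDOWS_PATH_REGEX.match(path) is not None


if __name__ == "__main__":
    for candidate in (r"T:\Pictures", r"T:\\Pictures", "Vac??tion"):
        print(candidate, is_valid_windows_path(candidate))
```

This only checks the shape of the string and never touches the file system, so it should be cheap enough to call from a constructor.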

Related

List is empty when appending when using recursion

I have two functions. The first one gets a list of paths to text files, and the second one iterates over this list of paths and checks whether each file includes the word password. Because of the try/except statement in the second function, I had to use recursion to make it continue running (if there is another way, please provide it below). My problem is that the list returned by the second function is empty. Why, and how do I fix it?
import os

def search_txt():
    """Function to search the C:\\ for .txt files -> then add them (including full path to file) to a list."""
    list_of_txt = []
    for dir_path, sub_dir, files in os.walk("C:\\"):
        """Method 1 -> checks the end of the file name (could be used for specific extensions)"""
        for file in files:
            if file.endswith(".txt"):
                list_of_txt.append(os.path.join(dir_path, file))
    return list_of_txt

def search_pass_file(list_of_files: list):
    """Function to iterate over each text file, searching if the word "password" is included -> Returns the text
    file's path """
    list_of_pass = []
    if len(list_of_files) != 0:
        for i in range(len(list_of_files)):
            file = list_of_files.pop()
            try:
                with open(file, encoding="utf8") as f:
                    for line in f.readlines():
                        if "password" in line:
                            list_of_pass.append(file)
            except UnicodeDecodeError:
                return search_pass_file(list_of_files)
            except PermissionError:
                return search_pass_file(list_of_files)
    else:
        return list_of_pass

if __name__ == '__main__':
    myList = search_txt()
    print(search_pass_file(myList))
You're returning list_of_pass only if len(list_of_files) == 0 (it's in the else block). Your return statement should occur after the loop (which should be a while loop, by the way).
You can catch several errors in one line by putting them in parentheses: except (UnicodeDecodeError, PermissionError), or catch all exceptions (for instance, you're not handling FileNotFoundError).
I'd reduce your function to:
def search_pass_file(list_of_files: list):
    """Function to iterate over each text file, searching if the word "password" is included -> Returns the text
    file's path """
    list_of_pass = []
    while list_of_files:
        file = list_of_files.pop()
        try:
            with open(file, encoding="utf8") as f:
                for line in f.readlines():
                    if "password" in line:
                        list_of_pass.append(file)
                        break
        except Exception:
            list_of_pass += search_pass_file(list_of_files)
    return list_of_pass
Edit: also in your except block, you should append the returned value of the recursive function to list_of_pass otherwise you'll lose the files found after the error occurs.
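Since the question also asks whether recursion can be avoided, here is a minimal sketch of a purely iterative variant. It assumes that skipping unreadable files is acceptable; catching OSError covers PermissionError and FileNotFoundError, which are its subclasses.

```python
def search_pass_file(list_of_files):
    """Return the paths of the files that contain the word "password"."""
    list_of_pass = []
    for file in list_of_files:
        try:
            with open(file, encoding="utf8") as f:
                # any() stops reading at the first matching line.
                if any("password" in line for line in f):
                    list_of_pass.append(file)
        except (UnicodeDecodeError, OSError):
            # Undecodable or unreadable file: skip it and move on to the next one.
            continue
    return list_of_pass
```

Unlike the pop()-based versions, this one leaves the caller's list untouched and returns the matches in their original order.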

Python: command line, sys.argv, "if __name__ == '__main__' "

I have a moderate amount of experience using Python in Jupyter but am pretty clueless about how to use the command line. I have this prompt for a homework assignment -- I understand how the algorithms work, but I don't know how to format everything so it works from the command line in the way that is specified.
The prompt:
Question 1: 80 points
Input: a text file that specifies a travel problem (see travel-input.txt
for the format) and a search algorithm
(more details are below).
python map.py [file] [search] should read
the travel problem from “file” and run the “search” algorithm to find
a solution. It will print the solution and its cost.
search is one of
[DFTS, DFGS, BFTS, BFGS, UCTS, UCGS, GBFTS, GBFGS, ASTS, ASGS]
Here is the template I was given:
from search import ...  # TODO import the necessary classes and methods
import sys

if __name__ == '__main__':
    input_file = sys.argv[1]
    search_algo_str = sys.argv[2]

    # TODO implement
    goal_node = ...  # TODO call the appropriate search function with appropriate parameters

    # Do not change the code below.
    if goal_node is not None:
        print("Solution path", goal_node.solution())
        print("Solution cost", goal_node.path_cost)
    else:
        print("No solution was found.")
So as far as python map.py [file] [search] goes, 'file' refers to travel-input.txt and 'search' refers to one of DFTS, DFGS, BFTS,... etc - a user-specified choice. My questions:
Where do I put my search functions? Should they all just be back-to-back in the same block of code?
How do I get the command line to recognize each function from its four or five-letter code? Is it just the name of the function? If I call it just using those letters, how can the functions receive input?
Do I need to reference the input file anywhere in my code?
Does it matter where I save my files in order for them to be accessible from the command line - .py files, travel-input.txt, etc? I've tried accessing them from the command line, with no success.
Thanks for the help!
The function definitions go before the if __name__ == "__main__" block. To select the correct function you can put them in a dict and use the four-letter abbreviations as keys, i.e.
def dfts_search(...):
    ...

def dfgs_search(...):
    ...

...

if __name__ == "__main__":
    input_file = sys.argv[1]
    search_algo_str = sys.argv[2]
    search_dict = {"DFTS": dfts_search, "DFGS": dfgs_search, ...}
    try:
        func = search_dict[search_algo_str]
        result = func(...)
    except KeyError:
        print(f'{search_algo_str} is an unknown search algorithm')
Not sure what you mean by reference, but input_file already refers to the input file. You will need to write a function to read the file and process the contents.
The location of the files shouldn't matter too much. Putting everything in the same directory is probably easiest. In the command window, just cd to the directory where the files are located and run the script as described in the assignment.
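The dispatch idea can be tried out with a small runnable sketch. The two search functions here are placeholders that only return a string, and the names SEARCH_FUNCTIONS and run are made up for illustration; the real functions and their parameters depend on the course's search module, which I have not seen.

```python
import sys


def dfts_search(problem):
    """Placeholder: a real implementation would run depth-first tree search."""
    return f"DFTS on {problem}"


def bfts_search(problem):
    """Placeholder: a real implementation would run breadth-first tree search."""
    return f"BFTS on {problem}"


# Maps the abbreviation typed on the command line to the function to call.
SEARCH_FUNCTIONS = {"DFTS": dfts_search, "BFTS": bfts_search}


def run(argv):
    """argv is sys.argv[1:], i.e. [file, search]. Returns the text to print."""
    if len(argv) != 2:
        return "usage: python map.py [file] [search]"
    input_file, search_algo_str = argv
    search_function = SEARCH_FUNCTIONS.get(search_algo_str)
    if search_function is None:
        return f"{search_algo_str} is an unknown search algorithm"
    return search_function(input_file)


if __name__ == "__main__":
    print(run(sys.argv[1:]))
```

Saved as map.py, this would be run from the directory containing the file as python map.py travel-input.txt DFTS. Keeping the logic in a function that takes the argument list also makes it easy to try from Jupyter without a command line.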

How to save files to a directory and append those files to a list in Python?

Scenario:
I want to check whether a directory contains a certain '.png' image file. If so, this image file, along with all the other files (with the png extension only), gets stored in a different directory. The solution I am looking for should work on all OS platforms (Windows, Unix, etc.) and on a remote server (FTP, etc.).
I have tried the following code below:
import os, sys
import shutil
import pathlib
import glob

def search():
    image_file = 'picture.png'
    try:
        arr = []  # List will be used to append all the files in a particular directory.
        directory = pathlib.Path("collection")  # checks if the collection directory exists.
        files = []
        # need to convert the PosixPath (directory) to a string.
        [files.extend(glob.glob(str(directory) + "/**/*.png", recursive = True))]
        res = [img for img in files if(img in image_file)]  # checks if the image is within the list of files i.e 'picture.png' == 'collection\\picture.png'
        if str(bool(res)):  # If True...proceed
            print("Image is available in image upload storage directory")
            for file in files:
                transfer_file = str(file)
                shutil.copy(file, 'PNG_files/')  # send all the files to a different directory i.e 'PNG_files' by using the shutil module.
                arr.append(transfer_file)
            return arr
        else:
            print("image not found in directory")
    except OSError as e:
        return e.errno

result = search()  # result should return the 'arr' list. This list should contain png images only.
However, during execution, the For loop is not getting executed. Which means:
The image files are not stored in the 'PNG_files' directory.
The images are not getting appended in the 'arr' list.
The code above the For loop worked as expected. Can anyone explain to me what went wrong?
There are several issues:
In this line
res = [img for img in files if(img in image_file)] #checks if the image is within the list of files i.e 'picture.png' == 'collection\\picture.png'
you should check the other way around (as written in the comment): image_file in img, e.g. picture.png in collection/picture.png.
str(directory) + "/**/*.png" is not OS independent. If you need this to work on Windows, too, you should use os.path.join(str(directory), '**', '*.png') instead!
This check is incorrect: if str(bool(res)):. It's actually always true, because bool(res) is either True or False, so str(bool(res)) is either "True" or "False", and both are truthy, as neither is an empty string. Correctly: if res:.
And finally, you're missing the creation of the PNG_files directory. You need to either manually create it before running the script, or call os.mkdir().
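Putting the fixes together, a minimal sketch for local directories could look like this. It uses pathlib for OS-independent paths; the function name copy_pngs_if_present and its parameters are hypothetical, and it does not cover the remote (FTP) case from the question.

```python
import shutil
from pathlib import Path


def copy_pngs_if_present(source_dir, target_dir, image_name="picture.png"):
    """Copy every .png under source_dir to target_dir if image_name is among them."""
    png_files = sorted(Path(source_dir).rglob("*.png"))
    # Compare file names only, so 'picture.png' matches 'collection/picture.png'.
    if not any(path.name == image_name for path in png_files):
        return []
    target = Path(target_dir)
    target.mkdir(parents=True, exist_ok=True)  # create the target directory if missing
    copied = []
    for path in png_files:
        shutil.copy(path, target / path.name)
        copied.append(str(path))
    return copied
```

Note that files with the same name in different subdirectories overwrite each other in the target directory, and that the target directory should live outside the source directory.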

How to test if object is a pathlib path?

I want to test if obj is a pathlib path and realized that the condition type(obj) is pathlib.PosixPath will be False for a path generated on a Windows machine.
Thus the question: is there a way to test if an object is a pathlib path (any of Path, PosixPath, WindowsPath, or the Pure... analogs) without checking for all 6 versions explicitly?
Yes, using isinstance(). Some sample code:
# Python 3.4+
import pathlib

path = pathlib.Path("foo/test.txt")
# path = pathlib.PureWindowsPath(r'C:\foo\file.txt')

# checks if the variable is any instance of pathlib
if isinstance(path, pathlib.PurePath):
    print("It's pathlib!")

    # No PurePath
    if isinstance(path, pathlib.Path):
        print("No Pure path found here")
        if isinstance(path, pathlib.WindowsPath):
            print("We're on Windows")
        elif isinstance(path, pathlib.PosixPath):
            print("We're on Linux / Mac")
    # PurePath
    else:
        print("We're a Pure path")
Why does isinstance(path, pathlib.PurePath) work for all types? Take a look at this diagram:
We see that PurePath is at the top, that means everything else is a subclass of it. Therefore, we only have to check this one.
Same reasoning for Path to check non-pure Paths.
Bonus: You can use a tuple in isinstance(path, (pathlib.WindowsPath, pathlib.PosixPath)) to check 2 types at once.
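The class hierarchy described above can also be checked directly with issubclass. This is a small sketch; the helper names is_pathlib_path and is_concrete_path are made up for illustration.

```python
import pathlib

# The six public path classes of pathlib.
PATH_CLASSES = [
    pathlib.PurePath, pathlib.PurePosixPath, pathlib.PureWindowsPath,
    pathlib.Path, pathlib.PosixPath, pathlib.WindowsPath,
]


def is_pathlib_path(obj) -> bool:
    """True for any pathlib path, pure or concrete."""
    return isinstance(obj, pathlib.PurePath)


def is_concrete_path(obj) -> bool:
    """True only for paths that can touch the file system (Path and subclasses)."""
    return isinstance(obj, pathlib.Path)


if __name__ == "__main__":
    for cls in PATH_CLASSES:
        print(cls.__name__, issubclass(cls, pathlib.PurePath))
```

Referring to pathlib.WindowsPath as a class works on any platform; only instantiating it on a non-Windows system raises an error.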
I liked NumesSanguis's answer, and this is how I used what I learnt:
import logging
import pathlib
import sys

def check_path_instance(obj: object, name: str) -> pathlib.Path:
    """ Check path instance type then convert and return
    :param obj: object to check and convert
    :param name: name of the object to check (apparently there is no sane way to get the name of the variable)
    :return: pathlib.Path of the object else exit the program with critical error
    """
    if isinstance(obj, (pathlib.WindowsPath, pathlib.PosixPath)):
        return pathlib.Path(obj)
    else:
        if isinstance(obj, str):
            return pathlib.Path(str(obj))
        else:
            logging.critical(
                f'{name} type is: {type(obj)}, not pathlib.WindowsPath or pathlib.PosixPath or str'
            )
            sys.exit(1)

Stuck in a recursive directory search using os.scandir

I'm searching through a large directory to sort an old archive into a specific order. I have embedded a function which is called recursively and when it finds a directory whose file path matches the search criteria it adds it to the 'found' dictionary fdict.
The expected outcome is that when the function is called on a directory with no subdirectories it completes with no actions and moves back up a level.
When run it gets stuck in the first directory it finds that contains no sub-directories and simply recursively calls the current directory for a search, getting stuck in a loop.
Below is the code abstract, any insight into why it is looping would be much appreciated.
def scan(queries, directory):
    fdict = {}

    def search(queries, directory, fdict):
        for entry in os.scandir(directory):
            if entry.is_dir():
                for x in queries:
                    if str(x) in entry.path:
                        fdict[str(x)] = entry.path
                        print("{} found and dicted".format(str(x)))
                    else:
                        search(queries, entry.path, fdict)
            else: pass

    search(queries, directory, fdict)
    return fdict
The whole thing can be written as
import os

# let qs be a list of queries [q]
# root be the start dir
for path, dirnames, filenames in os.walk(root):
    for dirname in dirnames:
        full_path = os.path.join(path, dirname)  # optional (depends)
        for q in qs:
            if q in full_path:
                pass  # do whatever
os.walk is recursive. You can do some set operation as well, to eliminate for q in qs. Comment if it doesn't work for you.
OK so it turns out the problem was in the for x in queries: statement.
The apparent loop was caused by bad design: only the first value in the queries list was compared to entry.path before the else branch ran and the search function was called on the current entry.path.
Once a directory with no sub-directories was reached, it would then step back up one level and test the second entry in queries against entry.path.
Although the code would eventually produce the required result, this approach would take absolutely ages (in this instance queries is a 4000 value long list!) and gave the appearance of a loop on inspection.
Below is the corrected code for future reference if anyone stumbles across a similar problem.
import os
import time

def scan(queries, directory):
    fdict = {}

    def search(queries, directory, fdict):
        for entry in os.scandir(directory):
            if entry.is_dir():
                if entry.name in queries:
                    # x no longer exists here; key the result by the directory name.
                    fdict[entry.name] = entry.path
                else:
                    time.sleep(2)
                    search(queries, entry.path, fdict)
            else: pass

    search(queries, directory, fdict)
    return fdict
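For comparison, the same matching rule can be sketched with os.walk and a set, which avoids both the manual recursion and the linear scan of a 4000-entry list. This assumes the queries are directory names given as strings; the name scan_with_walk is made up for this example.

```python
import os


def scan_with_walk(queries, directory):
    """Map each queried directory name to the path where it was found."""
    wanted = set(queries)  # set membership is a constant-time lookup
    found = {}
    for path, dirnames, _filenames in os.walk(directory):
        for dirname in dirnames:
            if dirname in wanted:
                found[dirname] = os.path.join(path, dirname)
        # Mirror the recursive version: do not descend into directories that matched.
        dirnames[:] = [d for d in dirnames if d not in wanted]
    return found
```

Editing dirnames in place is the documented way to prune an os.walk traversal when walking top-down, which is the default.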

Resources