Double backslashes for filepath_or_buffer with pd.read_csv - python-3.x

Python 3.6, OS Windows 7
I am trying to read a .txt file with pd.read_csv() using a relative file path. From the pd.read_csv() API documentation, I checked that the filepath argument can be any valid string path.
To build the relative path, I use the pathlib module. I have defined it as:
import pathlib
import pandas as pd
df_rel_path = pathlib.Path.cwd() / "folder1" / "folder2" / "file.txt"
a = str(df_rel_path)
Finally, I pass it to pd.read_csv():
df = pd.read_csv(a, engine="python", sep=r"\s+")
However, I get the error "No such file or directory: ...", and the path in the message shows double backslashes.
I have also tried writing the path manually in pd.read_csv() as a raw string, i.e. r"relative/path", but I still get the same result: double backslashes. Is there something I am overlooking?

You can get what you want by using the os module:
import os
df_rel_path = os.path.abspath(os.path.join(os.getcwd(), "folder1", "folder2", "file.txt"))
This way the os module takes care of joining the path parts with the proper separator. You can omit os.path.abspath (os.getcwd() already returns an absolute path), but I included it for the sake of completeness.
For more info, refer to this SO question: Find current directory and file's directory

You need a file name to call pd.read_csv. In the example, 'a' is only the folder path and does not point to a specific file. You could do something like this:
import pathlib
import pandas as pd
df_rel_path = pathlib.Path.cwd() / "folder1" / "folder2"
a = str(df_rel_path)
df = pd.read_csv(a + '/' + 'filename.txt')
With the filename your code works for me (on Windows 10):
df_rel_path = pathlib.Path.cwd() / "folder1" / "folder2" / "file.txt"
a = str(df_rel_path)
df = pd.read_csv(a)
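The doubled backslashes in the error are most likely only how Python displays the path: the FileNotFoundError message shows the file name through repr(), which escapes every backslash. A small sketch to check this (the folder names are made up and assumed not to exist):

```python
import pathlib

# Build a Windows-style path; str() of it contains SINGLE backslashes.
path_str = str(pathlib.PureWindowsPath("C:/no_such_dir_example/folder1/file.txt"))
print(path_str)                # C:\no_such_dir_example\folder1\file.txt
print(repr(path_str))          # 'C:\\no_such_dir_example\\folder1\\file.txt'
print(path_str.count("\\"))    # 3

message = ""
try:
    open(path_str)
except FileNotFoundError as err:
    # The message embeds repr(path_str), so the backslashes appear doubled here
    message = str(err)
print(message)
```

If the error persists, the file is simply not at that location relative to the current working directory, which print(pathlib.Path.cwd()) helps to confirm.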

Related

Python filepaths have double backslashes

Ultimately, I want to loop through every PDF in a specified directory ('C:\Users\dude\pdfs_for_parsing') and print the metadata for each PDF. The issue is that when I loop through the "directory" list, I receive the error "FileNotFoundError: [Errno 2] No such file or directory:". I understand this error is occurring because my file paths now contain double slashes for some reason.
Example Code
import PyPDF2
import os
path_of_the_directory = r'C:\Users\dude\pdfs_for_parsing'
directory = []
ext = ('.pdf')
def isolate_pdfs():
    for files in os.listdir(path_of_the_directory):
        if files.endswith(ext):
            x = os.path.abspath(files)
            directory.append(x)
    for pdf in directory:
        reader = PyPDF2.PdfReader(pdf)
        information = reader.metadata
        print(information)
isolate_pdfs()
If I print the file paths one at a time, I see that they contain single '/' separators, as I expect:
for pdf in directory:
    print(pdf)
The '//' seems to be added when I try to open each PDF with PDFFile = open(pdf, 'rb').
Your issue has nothing to do with //; it's here:
os.path.abspath(files)
Say you have C:\Users....\x.pdf. When you list that directory, files will contain only x.pdf. You then take the absolute path of x.pdf, and abspath assumes it is in the current working directory. You should replace it with:
x = os.path.join(path_of_the_directory, files)
Other notes:
PDFFile and PDF shouldn't be in uppercase. Prefer pdf_file and pdf_reader. The latter also avoids the confusion with the for pdf in...
Try to use a debugger rather than print statements; this is how I found your bug. It can be the one in your IDE or, on the command line, python -m pdb (python -i also lets you inspect variables after the script has run). You can step through your code, test a few variations and fiddle with the variables.
Why is ext = ('.pdf') in parentheses? They don't do anything, but they suggest it might be a tuple (it isn't; a one-element tuple needs a trailing comma, as in ('.pdf',)).
As an exercise, the first for loop can be written as a list comprehension:
directory = [os.path.join(path_of_the_directory, x) for x in os.listdir(path_of_the_directory) if x.endswith(ext)]
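Putting the fix together, a minimal sketch that builds each full path with os.path.join instead of os.path.abspath. The helper name list_pdf_paths and the throwaway test files are made up for illustration:

```python
import os
import tempfile

def list_pdf_paths(folder, ext=".pdf"):
    """Return the full path of every file in `folder` whose name ends with `ext`."""
    # os.listdir yields bare names like 'x.pdf', so each one is joined with the
    # folder it was listed from; os.path.abspath would instead resolve the bare
    # name against the current working directory.
    return [
        os.path.join(folder, name)
        for name in os.listdir(folder)
        if name.endswith(ext)
    ]

# Self-check in a temporary folder (file names are made up)
with tempfile.TemporaryDirectory() as tmp:
    for name in ("a.pdf", "b.txt", "c.pdf"):
        with open(os.path.join(tmp, name), "w"):
            pass
    found = sorted(list_pdf_paths(tmp))
    expected = [os.path.join(tmp, "a.pdf"), os.path.join(tmp, "c.pdf")]
    print(found == expected)  # True
```

Each returned path can then be passed to PyPDF2.PdfReader(pdf) as in the question.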

Python: redirect a path by detecting ../ relative to another path

I have a path in variable A:
A=r'\\omega3t.cr.in.com\shop\recipe\fad\prod\CPL\Wite\Proton\Coach_Color_Dress.xml'
I have another path in variable B:
B=r"..\..\Type\Car\Proton.xml"
Using Python, I would like to print the full path C obtained by resolving path B relative to path A.
Expected output for C is:
C=r'\\omega3t.cr.in.com\shop\recipe\fad\prod\CPL\Type\Car\Proton.xml'
Anyone have ideas?
You could use the powerful pathlib module:
from pathlib import Path
a = A.replace('\\', '/')
b = B.replace('\\', '/')
c = Path(a) / Path(b)
print(c.resolve())
gives /omega3t.cr.in.com/shop/recipe/fad/prod/CPL/Wite/Type/Car/Proton.xml
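You should check whether that's really what you want. It's unusual to apply .. to a file path; it is usually applied to a directory path. Here the first .. only strips the file name Coach_Color_Dress.xml, so the result stays under Wite instead of the expected CPL\Type.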
If you really need backslashes you can still replace them:
str(c.resolve()).replace('/', '\\')
gives \omega3t.cr.in.com\shop\recipe\fad\prod\CPL\Wite\Type\Car\Proton.xml
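If the goal is exactly the C shown in the question, one option (a sketch, not the answer's method) is to resolve B against the folder that contains A's file rather than against the file itself, and to collapse the .. parts lexically with ntpath.normpath, which applies Windows path rules on any OS and never touches the filesystem:

```python
import ntpath
from pathlib import PureWindowsPath

A = r'\\omega3t.cr.in.com\shop\recipe\fad\prod\CPL\Wite\Proton\Coach_Color_Dress.xml'
B = r"..\..\Type\Car\Proton.xml"

# B is relative to the folder holding A's file, so start from A's parent.
# Pure paths keep '..' components as they are.
joined = PureWindowsPath(A).parent / B
# normpath cancels each '..' against the component before it (no disk access)
C = ntpath.normpath(str(joined))
print(C)  # \\omega3t.cr.in.com\shop\recipe\fad\prod\CPL\Type\Car\Proton.xml
```

Because nothing is resolved on disk, the UNC prefix \\omega3t.cr.in.com\shop is kept intact and no backslash/slash conversion is needed.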

Get filename and arguments from path on Windows system (with Python)

I'm running into an issue when I try to get the filename and arguments from a binary path.
For example, here is the binary path that is giving me trouble:
binaryPath = "C:\Windows\System32\msiexec \V"
Ideally, I would like the result to be:
filename: "msiexec.exe"
arguments: "\V"
Here is what I have tried (it works for 99% of paths with arguments, just not the one above). Obviously the "\V" is messing this up, and os.path is lumping it in with the file path.
import os
binaryPath = "C:\Windows\System32\msiexec \V"
fn_with_arguments = os.path.basename(binaryPath)
image = fn_with_arguments[0].replace("'","")
arguments = " ".join(fn_with_arguments[1:])
if image:
    print("Image: {}".format(image))
if arguments:
    print("Arguments: {}".format(arguments))
>>> Image: V
Any ideas? Speed is important here, so I don't really want to split the path into pieces and then iterate to find the piece with a "dot" in it...
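One possible sketch, under two assumptions that may not hold for every entry: an unquoted executable path contains no spaces (a quoted one, such as the made-up "C:\Program Files\App\app.exe", is handled separately), and a name without an extension refers to a .exe. split_binary_path is a hypothetical helper name:

```python
import ntpath

def split_binary_path(binary_path):
    """Split 'executable [arguments]' into (file name, argument string).

    Assumes the executable path is either quoted or free of spaces.
    """
    binary_path = binary_path.strip()
    if binary_path.startswith('"'):
        # Quoted path: everything up to the closing quote is the executable
        exe, _, args = binary_path[1:].partition('"')
    else:
        # Unquoted path: the first space separates executable and arguments
        exe, _, args = binary_path.partition(" ")
    filename = ntpath.basename(exe)  # Windows rules, whatever the host OS
    if not ntpath.splitext(filename)[1]:
        filename += ".exe"  # assumption: no extension means an .exe
    return filename, args.strip()

print(split_binary_path(r"C:\Windows\System32\msiexec \V"))  # ('msiexec.exe', '\\V')
```

A single str.partition call scans the string once, so there is no need to split the whole path and search each piece for a dot.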

Problem with multivariables in string formatting

I have several files in a folder named t_000.png, t_001.png, t_002.png and so on.
I wrote a for loop that imports them using string formatting, but when I run it I get the error
No such file or directory: '/file/t_0.png'
This is the code I used. I think I should use multiple %s, but I do not understand how.
for i in range(file.shape[0]):
    im = Image.open(dir + 't_%s.png' % str(i))
    file[i] = im
You need to pad the string with leading zeroes. With the type of formatting you're currently using, this should work:
im = Image.open(dir + 't_%03d.png' % i)
where the format specifier %03d means "format the integer with a width of 3 characters, padding the empty space with leading zeroes".
You can also use Python's newer f-string syntax, which is somewhat more succinct:
im = Image.open(f"{dir}t_{i:03d}.png")
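A quick sketch comparing the three zero-padding styles mentioned in this thread; the folder name "images" is made up:

```python
import os

i = 7
print('t_%03d.png' % i)          # t_007.png  (printf-style)
print('t_{:03d}.png'.format(i))  # t_007.png  (str.format)
print(f't_{i:03d}.png')          # t_007.png  (f-string, Python 3.6+)

# Building the first three file names inside a placeholder folder
names = [os.path.join("images", f"t_{n:03d}.png") for n in range(3)]
print(names)
```

All three produce the same string; the width 3 matches the t_000.png naming scheme from the question.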
You are not padding the number with zeros, thus you get t_0.png instead of t_000.png.
The recommended way of doing this in Python 3 is via the str.format method:
for i in range(file.shape[0]):
    im = Image.open(dir + 't_{:03d}.png'.format(i))
    file[i] = im
You can see more examples in the documentation.
Formatted string literals are also an option if you are using Python 3.6 or a more recent version, see Green Cloak Guy's answer for that.
Try this:
import os
for i in range(file.shape[0]):
    im = Image.open(os.path.join(dir, f't_{i:03d}.png'))
    file[i] = im
(change: f't_{i:03d}.png' to 't_{:03d}.png'.format(i) or 't_%03d.png' % i for versions of Python prior to 3.6).
The trick is to pad the number to a fixed width with leading zeros; take a look at the official docs for more info.
Also, you should replace 'dir + file' with the more robust os.path.join(dir, file), which works regardless of whether dir ends with a directory separator (i.e. '/' on your platform).
Note also that dir is a built-in function in Python (and file was a built-in in Python 2), so you may want to rename these variables to avoid shadowing them.
Also, if file is a NumPy array, check that file[i] = im actually does what you expect.

Python 3.6 pathlib Path change name parent directory

The Path class from the pathlib module, which was added in Python 3.4, seems a powerful replacement for approaches such as os.path.join(), but I'm having some trouble working with it.
I have a path that can be anything from
folder_foo/file.csv
to
long/path/to/folder_foo/file.csv
I read the .csv file in folder_foo with pandas, modify it and want to save it to
folder_bar/file.csv
or
long/path/to/folder_bar/file.csv
Essentially I want to rename folder_foo to folder_bar in the Path object.
EDIT: example path code
csv_path = Path("long/path/to/folder_foo/file.csv")
Attempts
1
csv_path.parents[0] = csv_path.parents[0] + "_clean"
Which leads to the error TypeError: unsupported operand type(s) for +: 'PosixPath' and 'str', which means you cannot use + to combine a PosixPath with a str as described in TypeError: unsupported operand type(s) for +: 'PosixPath' and 'str'.
2
To solve this I tried the following:
csv_path.parents[0] = Path(str(csv_path.parents[0]) + "_clean")
Which, however, results in the error: TypeError: '_PathParents' object does not support item assignment.
Since PosixPath is not a list, this error is understandable.
3
Maybe .parts is a better approach, but
csv_path.parts[-2] = csv_path.parts[-2][:-3] + "bar"
results in: TypeError: 'tuple' object does not support item assignment.
Question
How can I easily rename the file's parent folder?
I would rather split this up for readability:
bar_folder = csv_path.parent.parent / 'folder_bar'
csv_path2 = bar_folder / csv_path.name
Having the destination folder as a variable also enables you to create the folder using for example:
bar_folder.mkdir(exist_ok=True)
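A variant sketch using PurePath.with_name, which swaps the last component of the parent folder and works for both path shapes in the question; swap_parent_folder is a hypothetical helper name:

```python
from pathlib import PurePosixPath

def swap_parent_folder(path, new_folder_name):
    """Return `path` with its immediate parent folder renamed to `new_folder_name`."""
    # path.parent is e.g. long/path/to/folder_foo; with_name replaces its last
    # component, then the original file name is re-attached with /
    return path.parent.with_name(new_folder_name) / path.name

for p in ["folder_foo/file.csv", "long/path/to/folder_foo/file.csv"]:
    print(swap_parent_folder(PurePosixPath(p), "folder_bar"))
# folder_bar/file.csv
# long/path/to/folder_bar/file.csv
```

The same call works on a concrete Path; note that with_name raises ValueError when the file has no parent folder (e.g. Path("file.csv"), whose parent is ".").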
You could also write a little function to replace the part of the path you want to change. Here's a runnable example:
from pathlib import Path
path1 = Path("a/b/c.txt")
path2 = Path("b/c.txt")
def rename_dir(path, src, dst):
    # convert to list so that we can change elements
    parts = list(path.parts)
    # replace part that matches src with dst
    parts[parts.index(src)] = dst
    return Path(*parts)
rename_dir(path1, 'b', 'q')
#> PosixPath('a/q/c.txt')
rename_dir(path2, 'b', 'q')
#> PosixPath('q/c.txt')
Created at 2021-03-06 10:44:00 PST by reprexlite v0.4.2
EDIT: Found a cleaner solution without str()
csv_path2 = csv_path.parents[1] / (csv_path.parts[-2][:-3] + "bar") / csv_path.parts[-1]
# result
PosixPath('long/path/to/folder_bar/file.csv')
Path.parents is the sequence of ancestor paths: parents[0] is the containing folder (the whole path minus the file), and parents[1] goes two levels up (long/path/to/), which is still a Path object. Then we get the last folder name with csv_path.parts[-2], which is a string. Applying [:-3] drops the trailing "foo", leaving "folder_". Adding + "bar" gives "folder_bar", which is joined onto our Path object. Finally, we re-add the file name with / csv_path.parts[-1].
Hack-like solution
csv_path = Path(str(csv_path.parents[0])[:-3] + 'bar/' + csv_path.parts[-1])
It seems a bit unintuitive to me, however. Is there a cleaner solution?
