I have the following code. It prints out the file name, but it doesn't assign it to the variable file in a way that lets me open it:
for file in os.listdir('C:\\Users\\####\\Documents\\Visual Studio 2015\\Projects\\Data\\'):
    if fnmatch.fnmatch(file, '*.csv'):
        scanReport = open(file)
        scanReader = csv.reader(scanReport)
fnmatch doesn't (and cannot) expand file into the full path. It's just a wildcard pattern test.
os.listdir returns file names, not file paths. Match the file name (as you already do), but pass the full path to open by using os.path.join with your source directory:
the_dir = r'C:\Users\####\Documents\Visual Studio 2015\Projects\Data'
for file in os.listdir(the_dir):
    if fnmatch.fnmatch(file, '*.csv'):
        scanReport = open(os.path.join(the_dir, file))
Or it may be better to use glob.glob here: it filters by pattern and returns each match with the directory prefix included (an absolute path in this case), all in one step.
import glob
for file in glob.glob(r'C:\Users\####\Documents\Visual Studio 2015\Projects\Data\*.csv'):
    scanReport = open(file)
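For completeness, here is a minimal sketch of the glob approach that also reads each CSV. The helper name read_csv_files and its the_dir argument are made up for illustration, and the with block is my own addition so that each file is closed after reading.

```python
import csv
import glob
import os


def read_csv_files(the_dir):
    """Return {path: rows} for every .csv file directly inside the_dir.

    Hypothetical helper for illustration; not from the original post.
    """
    results = {}
    # glob returns each match with the_dir already prefixed, so the path
    # can be passed to open() as is.
    for path in sorted(glob.glob(os.path.join(the_dir, '*.csv'))):
        # newline='' is what the csv module documentation recommends.
        with open(path, newline='') as scan_report:
            results[path] = list(csv.reader(scan_report))
    return results
```

Calling read_csv_files(the_dir) with the directory from the question would give one entry per CSV file, keyed by a path that open can use directly.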
Related
Ultimately, I want to loop through every pdf in specified directory ('C:\Users\dude\pdfs_for_parsing') and print the metadata for each pdf. The issue is that when I try to loop through the "directory" I'm receiving the error "FileNotFoundError: [Errno 2] No such file or directory:". I understand this error is occurring because I now have double slashes in my filepaths for some reason.
Example Code
import PyPDF2
import os
path_of_the_directory = r'C:\Users\dude\pdfs_for_parsing'
directory = []
ext = ('.pdf')
def isolate_pdfs():
    for files in os.listdir(path_of_the_directory):
        if files.endswith(ext):
            x = os.path.abspath(files)
            directory.append(x)
    for pdf in directory:
        reader = PyPDF2.PdfReader(pdf)
        information = reader.metadata
        print(information)
isolate_pdfs()
If I print the file paths one at a time, I see that the files have single '/' like I'm expecting:
for pdf in directory:
    print(pdf)
The '//' seems to get added when I try to open each of the PDFs 'PDFFile = open(pdf,'rb')'
Your issue has nothing to do with //, it's here:
os.path.abspath(files)
Say you have C:\Users....\x.pdf. You list that directory, so files will contain x.pdf. You then take the absolute path of x.pdf, which abspath assumes to be in the current working directory, not in the directory you listed. You should replace it with:
x = os.path.join(path_of_the_directory, files)
Other notes:
PDFFile and PDF shouldn't be in uppercase. Prefer pdf_file and pdf_reader. The latter also avoids the confusion with the for pdf in...
Try to use a debugger rather than print statements. This is how I found your bug. It can be in your IDE or on the command line with python -i. You can step through your code, test a few variations, fiddle with the variables...
Why is ext = ('.pdf') in parentheses? They don't do anything, but they suggest that it might be a tuple (it isn't; a one-element tuple needs a trailing comma, as in ('.pdf',)).
As an exercise the first for can be written as: directory = [os.path.join(path_of_the_directory, x) for x in os.listdir(path_of_the_directory) if x.endswith(ext)]
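To make the abspath point concrete, here is a small sketch that runs against a scratch directory instead of the real PDF folder. The scratch directory and the empty x.pdf are made up for the demonstration.

```python
import os
import tempfile

# Create a scratch directory containing one file, to stand in for the
# PDF folder from the question (the names here are made up).
scratch = tempfile.mkdtemp()
with open(os.path.join(scratch, 'x.pdf'), 'wb') as handle:
    handle.write(b'')

name = os.listdir(scratch)[0]  # just 'x.pdf', no directory part

# abspath() resolves the bare name against the current working directory,
# not against the directory that was listed.
wrong = os.path.abspath(name)
# join() attaches the name to the directory it actually came from.
right = os.path.join(scratch, name)

print(wrong)                   # somewhere under os.getcwd()
print(os.path.exists(right))   # True
```

Unless the script happens to run from inside the listed directory, the first path points at a file that was never there, which is what produces the FileNotFoundError.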
I ran into a strange problem using glob (Python 3.10.0/Linux).
I use glob to locate the required file with the following construct:
def get_last_file(folder, date=datetime.today().date()):
    os.chdir(folder)
    _files = glob.glob("*.csv")
    _files.sort(key=os.path.getctime)
    os.chdir(os.path.join("..", ".."))
    for _filename in _files[::-1]:
        string = str(date).split("-")
        if "".join(string) in _filename:
            return _filename
    # if cannot find the specific date, return newest file
    return _files[-1]
but when I try to
os.path.join(fileDir, file)
with the resulting file, I get the relative path which leads to:
FileNotFoundError: [Errno 2] No such file or directory: 'data/1109.csv'.
The file certainly exists, and when I try os.path.join(fileDir, '1109.csv'), the file is found.
The weirdest thing: if I do
filez = get_last_file(fileDir, datetime.today().date())
file = '1109.csv'
I still get file not found for file after os.path.join(fileDir, file).
Should I avoid using glob at all?
I came up with this solution:
file = ''
_mtime = 0
for root, dirs, filenames in os.walk(fileDir):
    for f in sorted(filenames):
        if f.endswith(".csv"):
            # os.path.join works whether or not fileDir ends with a separator
            if os.path.getmtime(os.path.join(fileDir, f)) > _mtime:
                _mtime = os.path.getmtime(os.path.join(fileDir, f))
                file = f
print(f'fails {file}')
The resulting os.path.join(fileDir, file) gives a (relative) path fit for further operations.
Also the difference between getctime and getmtime is accounted for.
While not a direct solution, try looking at Python's Pathlib library. It often leads to cleaner, less buggy solutions.
import os
from datetime import datetime
from pathlib import Path
def get_last_file(folder, date=datetime.today().date()):
    folder = Path(folder) # Works for both relative and absolute paths
    # glob inside the folder itself, so no os.chdir() is needed
    _files = sorted(folder.glob("*.csv"), key=os.path.getctime)
    for _filename in _files[::-1]:
        string = str(date).split("-")
        if "".join(string) in _filename.name:
            return _filename # a Path that already includes folder
    # if cannot find the specific date, return newest file
    return _files[-1]
Then, instead of using os.path.join(), you can write path_dir / file_name, where path_dir is a Path object. It may also be that changing the working directory inside your function (the os.chdir calls) is what leads to the unexpected behaviour.
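Here is a small sketch of how a changed working directory can make a relative path stop resolving. The layout (a project folder containing data/1109.csv) is made up and built in a scratch location; it is only meant to mirror the chdir calls in the question, not the asker's real folders.

```python
import os
import tempfile

# Build root/project/data/1109.csv in a scratch location (made-up layout).
root = os.path.realpath(tempfile.mkdtemp())
base = os.path.join(root, 'project')
os.makedirs(os.path.join(base, 'data'))
with open(os.path.join(base, 'data', '1109.csv'), 'w') as handle:
    handle.write('a,b\n')

start = os.getcwd()
relative = os.path.join('data', '1109.csv')
try:
    os.chdir(base)
    found_before = os.path.exists(relative)   # resolved against project/

    # Same moves as get_last_file(): into the folder, then up two levels.
    os.chdir('data')
    os.chdir(os.path.join('..', '..'))
    found_after = os.path.exists(relative)    # now resolved against root/
finally:
    os.chdir(start)  # always restore the original working directory

absolute = os.path.join(base, relative)
print(found_before, found_after, os.path.exists(absolute))  # True False True
```

The same relative path is found before the chdir calls and missing after them, while the absolute path keeps working regardless of the working directory.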
I'm asking for help in trying to create a loop to make this script go through all files in a local directory. Currently I have this script working with a single HTML file, but would like it so it picks the first file in the directory and just loops until it gets to the last file in the directory.
It would also help to have a line that appends (1), (2), (3), etc. to the end of the name when names are duplicated.
Can anyone help with renaming thousands of files using a string that is parsed with BeautifulSoup4? Each file contains a name and reference number at the same position/line. It could be the same name and reference number, or a different reference number with the same name.
import bs4, shutil, os
src_dir = os.getcwd()
print(src_dir)
dest_dir = os.mkdir('subfolder')
os.listdir()
dest_dir = src_dir+"/subfolder"
src_file = os.path.join(src_dir, 'example_filename_here.html')
shutil.copy(src_file, dest_dir)
exampleFile = open('example_filename_here.html')
exampleSoup = bs4.BeautifulSoup(exampleFile.read(), 'html.parser')
elems = exampleSoup.select('.bodycopy')
type(elems)
elems[2].getText()
dst_file = os.path.join(dest_dir, 'example_filename_here.html')
new_dst_file_name = os.path.join(dest_dir, elems[2].getText()+ '.html')
os.rename(dst_file, new_dst_file_name)
os.chdir(dest_dir)
print(elems[2].getText())
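For the duplicate-name part, one possible approach is a small helper that picks a free file name before renaming. This is only a sketch: unique_path is a made-up name, it is not part of the script above, and it does not clean up characters in the parsed text that are invalid in file names.

```python
import os


def unique_path(dest_dir, stem, ext='.html'):
    """Return a path in dest_dir that does not exist yet.

    Hypothetical helper: tries 'stem.html', then 'stem (1).html',
    'stem (2).html', and so on.
    """
    candidate = os.path.join(dest_dir, stem + ext)
    counter = 1
    while os.path.exists(candidate):
        candidate = os.path.join(dest_dir, f'{stem} ({counter}){ext}')
        counter += 1
    return candidate
```

Inside a loop over os.listdir(src_dir), the text from elems[2].getText() would be passed as stem, and the returned path used as the target of os.rename.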
I have a scenario where I have to rename the files in a folder. The scenario is as follows:
Example :
Elements(Main Folder)
    2(subfolder-1)
        sample_2_description.txt(filename1)
        sample_2_video.avi(filename2)
    3(subfolder2)
        sample_3_tag.jpg(filename1)
        sample_3_analysis.GIF(filename2)
        sample_3_word.docx(filename3)
I want to modify the names of the files as,
Elements(Main Folder)
    2(subfolder1)
        description.txt(filename1)
        video.avi(filename2)
    3(subfolder2)
        tag.jpg(filename1)
        analysis.GIF(filename2)
        word.docx(filename3)
Could anyone guide on how to write the code?
Recursive directory traversal to rename a file can be based on this answer. All we are required to do is to replace the file name instead of the extension in the accepted answer.
Here is one way - split the file name by _ and use the last index of the split list as the new name
import os
directory = "/path/to/parent/folder" #the main folder (Elements)
for subdir, dirs, files in os.walk(directory):
    for filename in files:
        filePath = os.path.join(subdir, filename) #get the full path to your file
        newName = filename.split("_")[-1] #create the new name by splitting the old name by _ and grabbing last index
        newFilePath = os.path.join(subdir, newName) #keep the file in the same subfolder
        os.rename(filePath, newFilePath) #rename your file
Hope this helps.
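The same idea can be wrapped in a function that is a little more defensive. This is a sketch under my own assumptions: strip_prefixes is a made-up name, and skipping files whose target name already exists is a choice I added, not something the question asked for.

```python
import os


def strip_prefixes(root):
    """Rename 'sample_2_description.txt' to 'description.txt' under root.

    Hypothetical helper; returns the list of (old, new) paths it renamed.
    """
    renamed = []
    for subdir, _dirs, files in os.walk(root):
        for filename in files:
            new_name = filename.split('_')[-1]
            old_path = os.path.join(subdir, filename)
            new_path = os.path.join(subdir, new_name)
            # Skip names without a prefix and never overwrite an existing file.
            if new_name == filename or os.path.exists(new_path):
                continue
            os.rename(old_path, new_path)
            renamed.append((old_path, new_path))
    return renamed
```

Both paths are built from subdir, so each file stays in its own subfolder after the rename.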
Check the code example below for the first file (filename1); replace path with the actual path of the file:
import os
os.rename(r'path\sample_2_description.txt', r'path\description.txt')
print("File Renamed!")
I've researched and tested this issue for a while and can't seem to get it to work.
user_path is provided by the user, and the folder contains .xlsm, .xlsb and .xlsx file types. I'm trying to catch all of them and convert them to .csv. This works for one extension at a time if I substitute the extension:
all_files = glob.glob(os.path.join(user_path, "*.xlsm")) #xlsb, xlsm
I've tried the following two methods, neither of which work (win32com just tells me Excel can't access the out_folder.)
all_files = glob.glob(os.path.join(user_path, "*"))
all_files = glob.glob(user_path)
How can I send these two file types together with user_path?
Thanks in advance.
With just *, glob matches all files AND directories under the given folder, including those you have no access to, which in your case is the out_folder directory. So when you iterate over the file names, check that each one ends with one of the file extensions you're looking for before you try to open it.
Since glob can't test for multiple file extensions at a time, it's actually better to use os.listdir and do the filtering of multiple file extensions on your own.
for filename in os.listdir(user_path):
    if any(map(filename.endswith, ('.xlsm', '.xlsb', '.xlsx'))):
        do_something(filename)
Or, with list comprehension,
all_files = [filename for filename in os.listdir(user_path) if any(map(filename.endswith, ('.xlsm', '.xlsb', '.xlsx')))]
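Since os.listdir returns bare file names, the matches still need to be joined with user_path before they can be opened. A minimal sketch follows; find_excel_files is a made-up name, and the case-insensitive comparison and the isfile check are my own assumptions about what is wanted.

```python
import os

EXCEL_EXTENSIONS = ('.xlsm', '.xlsb', '.xlsx')


def find_excel_files(user_path):
    """Return full paths of the Excel workbooks directly inside user_path.

    Hypothetical helper for illustration.
    """
    matches = []
    for filename in sorted(os.listdir(user_path)):
        full_path = os.path.join(user_path, filename)
        # str.endswith accepts a tuple of suffixes; lower() makes the
        # comparison case-insensitive (an added assumption).
        if filename.lower().endswith(EXCEL_EXTENSIONS) and os.path.isfile(full_path):
            matches.append(full_path)
    return matches
```

The result can be used in place of the all_files list built by glob.glob, and directories such as out_folder are skipped.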
Edit by the OP (actual code):
pathlib.Path(path, 'out_folder').mkdir(parents = True, exist_ok = True)
newpath = os.path.join(path,'out_folder')
#this is the line I can't seem to get to read both file types - it works as is.
all_files_test = glob.glob(os.path.join(user_path, "*.xlsm")) #xlsb, xlsm
for file in all_files_test:
    name1 = os.path.splitext(os.path.split(file)[1])[0]