create a list of files to be deleted - python-3.x

I am working on a search-and-destroy type program. What I need it to do is search all directories for files with a certain file name and append them to a list, and after that delete all those files (the files themselves, not the objects in the list or the list itself).
import os
file_list = []
for root, dirs, files in os.walk('path-to-dir'):
    for f_name in files:
        if f_name.startswith("file-name"):
            file_list.append(f_name)
I could write the code up to the appending part, but I don't know what to do next.
Some help, please.

To remove a file from your computer, use os.remove(). It takes the full path to the file as its parameter, so instead of calling os.remove("infectedFile.dll") you would call os.remove("C:/program files/avira/infectedFile.dll").
So your file_list should contain full paths to the files, and then you just call:
for file in file_list:
    os.remove(file)

Modify your file_list.append(f_name). The f_name is only a bare name. You need to add the path to the file name at the time of processing, because afterwards you no longer know where in the directory hierarchy the file was found:
file_list.append(os.path.join(root, f_name))
The root variable holds the path of the directory currently being walked.
To check whether your code works, just print the content of the list:
print('\n'.join(file_list))
Or you can do it in the loop to get ready for the later part:
for fname in file_list:
    print(fname)
Then you just add os.remove(fname) to remove the file:
for fname in file_list:
    print('removing', fname)
    os.remove(fname)
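Putting the pieces from the answers above together, here is a minimal sketch. The function names find_files and delete_files and the dry_run flag are my own additions for illustration, not part of the original code; the dry run lets you inspect the list before anything is actually deleted.

```python
import os


def find_files(top, prefix):
    """Return full paths of files under `top` whose names start with `prefix`."""
    matches = []
    for root, dirs, files in os.walk(top):
        for f_name in files:
            if f_name.startswith(prefix):
                # root is the directory being walked, so this is a usable path
                matches.append(os.path.join(root, f_name))
    return matches


def delete_files(paths, dry_run=True):
    """Delete each path; with dry_run=True only report what would be removed."""
    for path in paths:
        if dry_run:
            print('would remove', path)
        else:
            print('removing', path)
            os.remove(path)
```

You would call delete_files(find_files('path-to-dir', 'file-name')) first, check the output, and only then repeat the call with dry_run=False.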

Related

Extract text from first page of word document and use it as a folder name, then move the file inside that folder

I have hundreds of Word documents that need to be processed, but I need to organize them first by version in subfolders.
I basically get a drop of these Word documents within a single folder and need to automate the organization going forward before I go nuts.
So I have a script that creates a folder with the same name as the file and moves the file inside that folder. This part is done.
Now I need to go into each subfolder, get the document version from the first page of each document, then create a sub-folder with the version number and move the Word file into that subfolder.
The structure should be as follows (taking two folders as examples):
(Folder) Test
    (Subfolder) 12.0
        Test.docx
(Folder) Test1
    (Subfolder) 13.0
        Test1.docx
Luckily I was able to figure out that "doc.paragraphs[6].text" will always return the version information in a single line as follows:
>>> doc.paragraphs[6].text
'Version Number: 12.0'
I would appreciate it if someone could point me in the right direction.
This is the script I have so far:
#!/usr/bin/env python3
import glob, os, shutil, docx, sys
folder = sys.argv[1]
#print(folder)
for file_path in glob.glob(os.path.join(folder, '*.docx')):
    new_dir = file_path.rsplit('.', 1)[0]
    #print(new_dir)
    try:
        os.mkdir(os.path.join(folder, new_dir))
    except WindowsError:
        # Handle the case where the target dir already exist.
        pass
    shutil.move(file_path, os.path.join(new_dir, os.path.basename(file_path)))
Please see below the complete solution to your requirement.
Note: To know about re.search go through https://www.geeksforgeeks.org/python-regex-re-search-vs-re-findall/
import docx, os, glob, re, shutil
from pathlib import Path
def create_dir(path): # function to check if a given path exists and create it if not
    # Check whether the specified path exists or not
    is_exist = os.path.exists(path)
    # Create a new directory if the path does not exist
    if not is_exist:
        os.makedirs(path)
folder = fr"C:\Users\rams\Documents\word_docs" #my local folder
for file in glob.glob(os.path.join(folder, '*.docx')):
    # Test, Test1, Test2 in your structure
    main_folder = os.path.join(folder,Path(file).stem)
    file_name = os.path.basename(file)
    # Get the first line from the docx
    # (in the question's documents the version line is paragraphs[6])
    doc = docx.Document(file).paragraphs[0].text
    # group(1) = Version Number: (.*)
    version_no = re.search("(Version Number: (.*))", doc).group(1)
    # extract the number portion from version_no
    sub_folder = version_no.split(':')[1].strip()
    # path to actual sub_folder with version_no
    sub_folder = os.path.join(main_folder, sub_folder)
    # destination path
    dest_file_path = os.path.join(sub_folder, file_name)
    for i in [main_folder,sub_folder]:
        create_dir(i) # function call
    # to move the file to the corresponding version folder (overwrite if exists)
    if os.path.exists(dest_file_path):
        os.remove(dest_file_path)
    shutil.move(file, sub_folder)
So you have a script that creates a folder named after each file and moves the file inside that folder. This part is done. OK.
Now that you know how to get the document version from the first page of each document, you need to create a sub-folder named after this version number and move the Word file into it. This can be done using the same code as before, replacing:
new_dir = file_path.rsplit('.', 1)[0]
with
document_dir = os.path.dirname(file_path)
document_name = os.path.basename(file_path)
# check if the document is already in the right directory:
assert os.path.basename(document_dir) == document_name.rsplit('.', 1)[0]
# here comes: doc = some_function_getting_the_doc_object(file_path)
doc_version_tuple = doc.paragraphs[6].text.rsplit(': ', 1)
# check if doc_version_tuple has the right content:
assert doc_version_tuple[0] == 'Version Number'
doc_version = doc_version_tuple[1]
new_dir = os.path.join(document_dir, doc_version)
Notice that you can also do both steps in one run over the list of full document paths.
Notice further that running the script you posted in your question twice will destroy what you have already achieved, and you will need to write another script to reverse it, unless you add the check:
assert os.path.basename(document_dir) != document_name.rsplit('.', 1)[0]
which raises an error if the script was already run and the documents are already in folders named after them.
This is why it is a good idea to keep a backup copy of all the documents, which you can use to re-create the directory in case something goes wrong. In general, it is a good idea to always have a backup copy when you work on files, especially when using a self-written script.
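As a rough sketch of doing both steps in one pass, the helpers below take the version line as plain text, so they can be tried without python-docx installed. The names parse_version and file_into_version_folder are hypothetical, and the pattern assumes the line always looks like 'Version Number: 12.0' as shown in the question.

```python
import os
import re
import shutil


def parse_version(text):
    """Extract '12.0' from a line such as 'Version Number: 12.0'."""
    match = re.search(r'Version Number:\s*(\S+)', text)
    if match is None:
        raise ValueError('no version number in: %r' % text)
    return match.group(1)


def file_into_version_folder(file_path, version_text):
    """Move folder/Test.docx to folder/Test/<version>/Test.docx; return the new path."""
    folder, file_name = os.path.split(file_path)
    stem = os.path.splitext(file_name)[0]
    target_dir = os.path.join(folder, stem, parse_version(version_text))
    # exist_ok=True makes this safe when the folders are already there
    os.makedirs(target_dir, exist_ok=True)
    return shutil.move(file_path, os.path.join(target_dir, file_name))


# Assumed usage inside the glob loop from the question:
#     doc = docx.Document(file_path)
#     file_into_version_folder(file_path, doc.paragraphs[6].text)
```

Because the file ends up two levels below the top folder, a later non-recursive glob over folder/*.docx should no longer pick it up.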

Problem while giving strings in os module

import os
def compute(path, fileExt):
    if os.path.exists(path): # checks if the path entered by the user exists
        print("The path exists")
        for foldername, subfolder, filename in os.walk(path):
            for file in filename:
                if file.endswith(fileExt):
                    print(os.path.join(foldername, subfolder, *file))
    else:
        print("The path does not exist !")
This program should print the full path of every file in a folder given by the user whose name ends with an extension also given by the user.
When I run the program I get an error on line 10 stating: TypeError: join() argument must be str or bytes, not 'list'. I checked the type of file using type(file) and it returns str.
What am I doing wrong?
You should double check the os.walk() documentation and look closely at the error message you are getting. Your error message indicates you are trying to combine a list and a string. You have verified that file is a string, but what about foldername and subfolder?
os.walk() yields three things for each directory it visits: the path of the current directory, a list of the names of all subdirectories of the current directory, and a list of the names of all files in the current directory. This tells us that foldername is also a string, but subfolder is actually a list of strings, which causes the error message you are seeing.
Additionally, all of the files listed in one step of os.walk() sit directly in the current directory (foldername), not in its subdirectories. Therefore, to print the path of a file, you join the file name to the current directory path.
Just focusing on the os.walk() portion of your question, the following will print the full path of all files with the given file extension:
import os
# top: the directory to search; fileExt: the extension to match
for foldername, subfolder, files in os.walk(top, topdown=False):
    for file in files:
        if file.endswith(fileExt):
            print(os.path.join(foldername, file))
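To see this for yourself, a small check like the one below prints the types of the three values and triggers the same kind of error on purpose. It is only an illustration: first_walk_entry is a made-up helper name, and it looks at the current directory.

```python
import os


def first_walk_entry(path):
    """Return the first (foldername, subfolders, files) tuple os.walk yields."""
    return next(os.walk(path))


foldername, subfolders, files = first_walk_entry(os.curdir)
print(type(foldername).__name__)   # str
print(type(subfolders).__name__)   # list
print(type(files).__name__)        # list

try:
    # Passing the list of subfolders to join is what the question's code does.
    os.path.join(foldername, subfolders, "example.txt")
except TypeError as exc:
    print("TypeError:", exc)
```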

Does the following program access a file in a subfolder of a folder?

Using
import sys
folder = sys.argv[1]
for i in folder:
    for file in i:
        if file == "test.txt":
            print (file)
would this access a file in a subfolder of the folder? For example, 1 main folder with 20 subfolders, where each subfolder has 35 files. I want to pass the folder on the command line and access the first subfolder and the second file in it.
Neither. This doesn't look at files or folders.
sys.argv[1] is just a string. i is each character of that string in turn. for file in i does run, because a one-character string is itself iterable, but it only yields that same character again, so file can never equal "test.txt".
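A quick way to convince yourself is to run the two loops on a plain string. Here "my_folder" is just a stand-in for whatever sys.argv[1] would contain.

```python
folder = "my_folder"  # stand-in for sys.argv[1]

# What the outer loop sees: one character at a time.
outer = [i for i in folder]

# What the inner loop sees: the same single characters again.
inner = [file for i in folder for file in i]

print(outer)
print(inner == outer)        # True
print("test.txt" in inner)   # False
```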
Maybe you want to glob or walk a directory instead?
Here's a short example using the os.walk method.
import os
import sys
input_path = sys.argv[1]
filters = ["test.txt"]
print(f"Searching input path '{input_path}' for matches in {filters}...")
for root, dirs, files in os.walk(input_path):
    for file in files:
        if file in filters:
            print("Found a match!")
            match_path = os.path.join(root, file)
            print(f"The path is: {match_path}")
If the above file was named file_finder.py, and you wanted to search the directory my_folder, you would call python file_finder.py my_folder from the command line. Note that if my_folder is not in the directory you run the command from, you have to provide its path (relative or absolute).
No, this won't work, because folder will be a string, so you'll be iterating through the characters of the string. You could use the os module (e.g., the os.listdir() function). I don't know what exactly you are passing to the script, but it would probably be easiest to pass an absolute path. Have a look at the other path-manipulation functions in the module as well.
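For the "first subfolder, second file" case from the question, a sketch based on os.listdir() could look like this. The function name pick_file is made up, and sorting the names alphabetically is my assumption, since os.listdir() returns entries in arbitrary order.

```python
import os


def pick_file(folder, subfolder_index=0, file_index=1):
    """Return the path of the file at `file_index` in the subfolder at `subfolder_index`."""
    # Keep only directories, sorted so that "first" has a stable meaning.
    subfolders = sorted(
        name for name in os.listdir(folder)
        if os.path.isdir(os.path.join(folder, name))
    )
    subfolder = os.path.join(folder, subfolders[subfolder_index])
    # Keep only regular files inside the chosen subfolder.
    files = sorted(
        name for name in os.listdir(subfolder)
        if os.path.isfile(os.path.join(subfolder, name))
    )
    return os.path.join(subfolder, files[file_index])
```

With the defaults, pick_file(sys.argv[1]) would give the second file of the first subfolder; an IndexError means there were not enough subfolders or files.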

Can't use the files inside my subdirectories

I'm creating a program that reads certain data from some txt files. The problem comes when I try to use the files inside subdirectories (the subdirectories are inside the main directory of the program). I'm using a for loop to find all the files and then create a new file with the info that I found. The main problem is that I can't read those files.
I tried using a for loop over os.walk, which gives me the roots, directories and files. This works fine, but at the moment of opening a file it says the txt file cannot be found. The if not condition is there so the program excludes all .DS_Store files. I think the problem could be the way I open the file, but I'm not sure.
for root, directories, filenames in os.walk("Files_to_Insert"):
    if not (filenames[-1] == ".DS_Store"):
        lastFile = filenames[-1]
        print lastFile
        with open (lastFile, 'rt') as myfile:
IOError: [Errno 2] No such file or directory: txt
The error happens in the with open because it can't find the file.
When I print I get all the txt files, but I can't use them in the "with open".
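A typical os.walk loop I use goes like this:
import os
for root, directories, filenames in os.walk("."):
    for f in filenames:
        if f.endswith(".DS_Store"):
            continue
        # f is only the bare file name, so join it with root first
        full_path = os.path.abspath(os.path.join(root, f))
        print(full_path)
        with open(full_path, 'rt') as myfile:
            ...  # work with myfile here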
I solved it by passing the path and the file name as separate strings:
for root, directories, filenames in os.walk("Files_to_Insert"):
    if not(filenames[-1] == ".DS_Store"):
        lastFile = filenames[-1]
        # print (lastFile)
        with open(str(root) + '/' + lastFile,'rt') as myfile:
            ...  # read from myfile here
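For reference, here is a slightly more general sketch of the same idea. Note that it reads every file in every directory rather than only the last one, which is an assumption on my part about what is wanted, and the name read_text_files is made up. It uses os.path.join so the separator is right on any platform.

```python
import os


def read_text_files(top):
    """Return {full_path: contents} for every file under `top` except .DS_Store."""
    contents = {}
    for root, directories, filenames in os.walk(top):
        for name in filenames:
            if name == ".DS_Store":
                continue
            # Open via root + name; the bare name only works in the current directory.
            full_path = os.path.join(root, name)
            with open(full_path, "rt") as myfile:
                contents[full_path] = myfile.read()
    return contents
```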

Loop through files in a directory for a particular file. If the file is not there, append a "File not found" to a list else append the file location

I have a list of file names in an excel sheet that I need the file location to. I want the code to run through all the files in the directory specified and if it finds the file, to append the file location to a list. If it doesn't find the file, I want it to append a "File not found" to the list.
I've got my code to work if the file does exist, however, I can't seem to find a way to solve the problem when the file doesn't exist. I've tried:
if name not in file:
    file_location.append("File Not Found")
but what this does is append a "File Not Found" for each file it loops through.
def file_loc(basedir, filename):
    """Searches through the directory for a particular file and then saves that path into a list"""
    for root, dirs, files in os.walk(basedir):
        for name in files:
            if filename in name:
                location = os.path.abspath(os.path.join(root, name))
                file_location.append(location)
I would like a list that prints out all the file locations of the files in a list and if a file location cannot be found, the list should contain a "File Not Found". So for example, if the files I wanted to search for were [foo, bar, soap] and the soap file was not in my directory it would print out:
[File/path/to/foo, File/path/to/bar, "File Not Found"]
Any help would be much appreciated!
The issue is that you check each file name individually, whereas what you want to know is whether any of them matched. Only if none matched after you have run through all the files can you say it was not found. So I would suggest using a boolean variable to record whether it was found at all, then checking it after the loop is complete and appending if necessary. It should look something like this.
def file_loc(basedir, filename):
    """Searches through the directory for a particular file and then saves that path into a list"""
    found = False
    for root, dirs, files in os.walk(basedir):
        for name in files:
            if filename in name:
                location = os.path.abspath(os.path.join(root, name))
                file_location.append(location)
                found = True
    if not found:
        file_location.append("File Not Found")
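If you have a whole list of names to look up, as in the [foo, bar, soap] example, one possible variation is sketched below. It walks the tree once and returns exactly one entry per requested name. The name locate_all is hypothetical, and keeping only the first match per name is my assumption; the function above appends every match instead.

```python
import os


def locate_all(basedir, filenames):
    """Return one entry per requested name: first matching path or "File Not Found"."""
    # Collect every file path once so each lookup does not need a fresh walk.
    paths = []
    for root, dirs, files in os.walk(basedir):
        for name in files:
            paths.append(os.path.abspath(os.path.join(root, name)))

    results = []
    for wanted in filenames:
        # Same substring test as `filename in name` in the original function.
        match = next((p for p in paths if wanted in os.path.basename(p)), None)
        results.append(match if match is not None else "File Not Found")
    return results
```

The result lines up with the input list position by position, which should make it easy to write back next to the names in the spreadsheet.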

Resources