I am trying to write a script that loops through all the files in a folder and compares each file name to a list. If a file name in that folder matches an item in the list, I want to save a copy of that file in another folder.
This is what I have tried so far. My code runs, but no files are saved in the new folder. Is anyone able to tell me why it's not working?
import os
import shutil
import fnmatch
import csv

sample = open('...../Sample.csv', 'r')
reader = csv.reader(sample)
samplelist = []
for row in reader:
    if row != " ":
        samplelist.append(row)

source = '..... /My Files'
destination = '.../Sample'
for file in os.listdir(source):
    if file in samplelist:
        shutil.copy(source, destination)
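Two things in the code above probably stop any copy from happening. `csv.reader` yields each row as a list, so `samplelist` ends up holding lists rather than strings and `file in samplelist` never matches. And `shutil.copy(source, destination)` is handed the folder, not the individual file. Here is a minimal sketch of a corrected version, assuming the file names sit in the first column of the CSV; the function name `copy_listed_files` and its parameters are invented for illustration.

```python
import csv
import os
import shutil


def copy_listed_files(csv_path, source, destination):
    """Copy files from source to destination if their names appear in the CSV."""
    # Assumption: one file name per row, in the first column.
    with open(csv_path, newline="") as sample:
        wanted = {
            row[0].strip()
            for row in csv.reader(sample)
            if row and row[0].strip()  # skip blank rows
        }

    os.makedirs(destination, exist_ok=True)
    copied = []
    for name in os.listdir(source):
        path = os.path.join(source, name)  # full path to the individual file
        if os.path.isfile(path) and name in wanted:
            shutil.copy(path, destination)
            copied.append(name)
    return sorted(copied)
```

Returning the list of copied names makes it easy to check which entries of the CSV actually matched a file in the folder.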
First off, what I'm trying to do: create a PDF parser that takes ONLY the tables out of any given PDF. I currently have some PDFs of parts manuals, each containing an image of the part followed by a table of part details, and I want to scrape and parse the table data into a CSV or similar spreadsheet file (csv, xls, etc.).
What I've tried: I am currently using Python 3 and tabula (I have no preference for either and am open to other options). My program can scrape all the data from any PDF or directory of PDFs, but it takes EVERYTHING, including the image data, which comes out as a bunch of 0, 1 and NaN values (adding examples at the bottom). I was thinking of writing a filter function that removes these, but that feels like overkill, and I was hoping there is a way to filter out the images with tabula or another library. (Side note: I've also tried camelot, but the module does not import correctly even though it shows up in my pip freeze. This happens on both my Mac M1 and Mac M2, so I'm assuming there is no ARM support.)
If anyone could help me, or point me towards a library or method that can iterate through all pages in a PDF and grab JUST the tables for export to CSV, that would be AMAZING!
Current main file:
from tabula.io import read_pdf
import pandas as pd
from tabulate import tabulate
import os

def parser(fileName, count):
    print("\nFile Number: ", count, "\nNow parsing file: ", fileName)
    tables = read_pdf(fileName, pages="all")  # returns a list of DataFrames
    for i, table in enumerate(tables):
        table.to_excel("./output/test" + str(i) + ".xlsx")
        print(tabulate(table, headers="keys"))

def reader(type):
    filecount = 1
    if type == 'f':
        file = input("\nFile(f) type selected\nplease enter full file name with path (ex. Users/Name/directory1/filename.pdf): ")
        parser(file, filecount)
    elif type == 'd':
        # directory selected
        location = input("\nPlease enter directory path, if in the same folder just enter a period(.)")
        print("Opening directory: ", location)
        # loop through and parse directory
        for filename in os.listdir(location):
            f = os.path.join(location, filename)
            # checking if it is a file
            if os.path.isfile(f):
                parser(f, filecount)
                filecount += 1  # was "filecount + 1", which never updated the counter
            else:
                print('\n\n ERROR, path given does not contain a file or is not a directory type..')
    else:
        print("Error: please select directory(d) or file(f)")

fileType = input("\n-----> Hello!\n----> Would you like to parse a directory(d) or file(f)?").lower()
reader(fileType)
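The filter function mentioned in the question does not have to be large. The sketch below is only one possible heuristic, not something tabula provides: it treats a DataFrame as image debris when it has too few columns or when most of its cells are NaN. The names `looks_like_table` and `keep_real_tables` and the 0.5 threshold are assumptions chosen for illustration and would need tuning against the real manuals.

```python
import pandas as pd


def looks_like_table(df, max_nan_ratio=0.5, min_cols=2):
    """Heuristic: a real table has several columns and is not mostly NaN."""
    if df.empty or df.shape[1] < min_cols:
        return False
    nan_ratio = df.isna().to_numpy().mean()  # fraction of cells that are NaN
    return nan_ratio <= max_nan_ratio


def keep_real_tables(tables, **kwargs):
    """Filter a list of DataFrames, such as the one read_pdf returns."""
    return [t for t in tables if looks_like_table(t, **kwargs)]
```

In the script above this would be applied to the list returned by `read_pdf` before the export loop.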
I'm asking for help creating a loop so that this script goes through all the files in a local directory. Currently the script works with a single HTML file, but I would like it to start with the first file in the directory and loop until it reaches the last one.
It would also help to have a line that appends (1), (2), (3), etc. to the end of the new name when names are duplicated.
Can anyone help with renaming thousands of files using a string parsed with BeautifulSoup4? Each file contains a name and a reference number at the same position/line. Two files may share both name and reference number, or share a name but have different reference numbers.
import bs4, shutil, os

src_dir = os.getcwd()
print(src_dir)
dest_dir = os.path.join(src_dir, 'subfolder')
os.makedirs(dest_dir, exist_ok=True)

# copy the original file into the subfolder
src_file = os.path.join(src_dir, 'example_filename_here.html')
shutil.copy(src_file, dest_dir)

# parse the file and pick out the text used as the new name
exampleFile = open('example_filename_here.html')
exampleSoup = bs4.BeautifulSoup(exampleFile.read(), 'html.parser')
elems = exampleSoup.select('.bodycopy')
new_name = elems[2].getText()

# rename the copy after the parsed text
dst_file = os.path.join(dest_dir, 'example_filename_here.html')
new_dst_file_name = os.path.join(dest_dir, new_name + '.html')
os.rename(dst_file, new_dst_file_name)
os.chdir(dest_dir)
print(new_name)
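One way to turn the single-file script into a loop with duplicate handling is sketched below. It is a rough outline under a few assumptions: the helper names `unique_path` and `copy_with_parsed_names` are made up, and `get_name` stands for whatever function extracts the new name from a file, so the sketch itself needs only the standard library.

```python
import os
import shutil


def unique_path(dest_dir, stem, ext):
    """Return dest_dir/stem+ext, adding (1), (2), ... if that name is taken."""
    candidate = os.path.join(dest_dir, stem + ext)
    counter = 1
    while os.path.exists(candidate):
        candidate = os.path.join(dest_dir, f"{stem} ({counter}){ext}")
        counter += 1
    return candidate


def copy_with_parsed_names(src_dir, dest_dir, get_name, ext=".html"):
    """Copy every file ending in ext, naming each copy after get_name(path)."""
    os.makedirs(dest_dir, exist_ok=True)
    created = []
    for filename in sorted(os.listdir(src_dir)):
        src_file = os.path.join(src_dir, filename)
        # skip sub-directories (including dest_dir) and other file types
        if not (os.path.isfile(src_file) and filename.lower().endswith(ext)):
            continue
        target = unique_path(dest_dir, get_name(src_file).strip(), ext)
        shutil.copy(src_file, target)
        created.append(os.path.basename(target))
    return created
```

For the files described here, `get_name` could open the file, build `bs4.BeautifulSoup(..., 'html.parser')` and return `select('.bodycopy')[2].getText()`, as the single-file script does. Parsed text containing characters that are not allowed in file names (such as `/`) would need cleaning first.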
I have a scenario where I need to rename the files in a folder. The scenario is as follows.
Example:
Elements (Main Folder)
    2 (subfolder1)
        sample_2_description.txt (filename1)
        sample_2_video.avi (filename2)
    3 (subfolder2)
        sample_3_tag.jpg (filename1)
        sample_3_analysis.GIF (filename2)
        sample_3_word.docx (filename3)
I want to change the file names to:
Elements (Main Folder)
    2 (subfolder1)
        description.txt (filename1)
        video.avi (filename2)
    3 (subfolder2)
        tag.jpg (filename1)
        analysis.GIF (filename2)
        word.docx (filename3)
Could anyone guide me on how to write the code?
Recursive directory traversal to rename a file can be based on this answer. All we are required to do is to replace the file name instead of the extension in the accepted answer.
Here is one way: split the file name on _ and use the last element of the resulting list as the new name.
import os

directory = "/path/to/parent/folder"  # the main folder (Elements in the example)
for subdir, dirs, files in os.walk(directory):
    for filename in files:
        # create the new name by splitting the old name on _ and taking the last part
        newName = filename.split("_")[-1]
        if newName != filename:
            filePath = os.path.join(subdir, filename)  # full path to the existing file
            newFilePath = os.path.join(subdir, newName)  # same subfolder, new name
            os.rename(filePath, newFilePath)  # rename your file
Hope this helps.
Check the code example below for the first file (filename1), replacing path with the actual path of the file:
import os
os.rename(r'path\sample_2_description.txt', r'path\description.txt')
print("File Renamed!")
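Splitting on the last underscore would also shorten a name such as sample_3_my_tag.jpg to tag.jpg. If that matters, a slightly stricter variant is sketched below, assuming every file follows the pattern sample_<subfolder name>_<rest> shown in the question. The function name `strip_sample_prefix` is invented for illustration, and files whose new name already exists are skipped, not overwritten.

```python
import os


def strip_sample_prefix(root):
    """Rename sample_<folder>_<rest> to <rest> inside each subfolder of root."""
    renamed = []
    for subdir, _dirs, files in os.walk(root):
        # Assumption: the prefix embeds the name of the containing folder.
        prefix = "sample_" + os.path.basename(subdir) + "_"
        for filename in files:
            if not filename.startswith(prefix):
                continue
            new_name = filename[len(prefix):]
            target = os.path.join(subdir, new_name)
            if not new_name or os.path.exists(target):
                continue  # skip so that an existing file is never overwritten
            os.rename(os.path.join(subdir, filename), target)
            renamed.append(os.path.join(os.path.relpath(subdir, root), new_name))
    return sorted(renamed)
```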
In Python, is there a way to import CSV or text files dynamically? We process multiple files a week that have different names, and I don't want to update the with open statement manually each time the script runs. I have a function that reads the file name, which I pass to a variable for later use in my code.
I can see and read the files in the directory, but I am not sure whether I can put the contents of the folder into a variable that can then be used in the with open statement.
import os
import csv

os.chdir(r'T:\Credit Suite')  # raw string so the backslash is taken literally
DIR = os.listdir()
print(DIR)

with open('July 19.csv', mode='r') as csv_file:
    ROWCOUNT = 0
    FILENAME = csv_file.name
    output = csv.writer(open('test2.txt', 'w', newline=''))
    reader = csv.DictReader(csv_file)
    for records in reader:
        ROWCOUNT += 1
        EIN = records['EIN']
        DATE = records['Date Established']
        DUNS = records['DUNS #']
        COMPANYNAME = records['Company Name']
        lineout = ('<S>' + EIN + '$EIN ' + EIN + '*' + DATE + ')' + COMPANYNAME + '#D-U-N-S ' + DUNS).upper()
        output.writerow([lineout])
print("writing completed")
I will be running my script whenever a file lands in a folder, using a monitor and scheduler in an automated process. I want the code to run no matter what the inbound file is called, so that I don't have to update the code for each file name or rename the file to a standard name each time.
os.chdir(r'T:\Credit Suite')
for root, dirs, files in os.walk("."):
    for filename in files:
        if filename.endswith('.csv'):
            f = filename
import csv,sys
with open(f, mode='r') as csv_file:
    ...  # same processing as before
os.listdir() returns a list of all the entries in the directory, so you can simply loop over them:
import os
import csv

os.chdir(r'T:\Credit Suite')
DIR = os.listdir()
print(DIR)

for file in DIR:
    if file.endswith('.csv'):
        with open(file, mode='r') as csv_file, \
                open(file + '_output.txt', 'w', newline='') as out_file:
            ROWCOUNT = 0
            FILENAME = csv_file.name
            output = csv.writer(out_file)
            reader = csv.DictReader(csv_file)
            for records in reader:
                ROWCOUNT += 1
                EIN = records['EIN']
                DATE = records['Date Established']
                DUNS = records['DUNS #']
                COMPANYNAME = records['Company Name']
                lineout = ('<S>' + EIN + '$EIN ' + EIN + '*' + DATE + ')' + COMPANYNAME + '#D-U-N-S ' + DUNS).upper()
                output.writerow([lineout])  # one output line per record
        print("writing completed")
        # remove file to avoid reprocessing the file again in the next run
        # of the script, or just move it elsewhere with os.rename
        # (done after the with block so the file is closed first)
        os.remove(file)
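If only the most recent arrival should be processed on each run, as the monitor-and-scheduler setup suggests, one option is to pick the newest CSV by modification time. This is a small sketch under that assumption; the helper name `newest_csv` is made up for illustration.

```python
import glob
import os


def newest_csv(folder):
    """Return the path of the most recently modified .csv in folder, or None."""
    # glob.escape guards against special characters such as [ ] in the folder name
    candidates = glob.glob(os.path.join(glob.escape(folder), "*.csv"))
    if not candidates:
        return None
    return max(candidates, key=os.path.getmtime)
```

The returned path can be passed straight to `open(...)` in the with statement; a return value of `None` means there is nothing to process on that run.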
Relatively new to Python (not using it every day), but I am trying to simplify some things. I have keys (file names) with long names, and a subset of each key matches the name of the associated folder. (Excuse the indentation; it is properly indented in my script.) For example:
file1 would be: 101010-CDFGH-8271.dat and folder is CDFGH-82
file2 would be: 101010-QWERT-7425.dat and folder is QWERT-74
import os
import glob
import shutil

files = os.listdir("files/location")
dest_1 = os.listdir("dest/location")
for f in files:
    file = f[10:21]
    for d in dest_1:
        dire = d
        if file == dire:
            shutil.move(file, dest_1)
The code runs with no errors, however nothing moves. Look forward to your reply and chance to learn.
Sorry updated the format.
Try a variation of:
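import os
import shutil

basedir = "dest/location"
for fname in os.listdir("files/location"):
    # the slice must pick out exactly the part of the name that matches the folder
    dirname = os.path.join(basedir, fname[10:21])
    if os.path.isdir(dirname):
        path = os.path.join("files/location", fname)  # full path of the file to move
        shutil.move(path, dirname)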
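For the example names given in the question, the slice `[10:21]` of `101010-CDFGH-8271.dat` is `GH-8271.dat`, which does not equal `CDFGH-82`, so the slice bounds would need adjusting to the real key names. An alternative that avoids counting characters is to split on the hyphens. The sketch below assumes every key looks like `<id>-<code>-<digits>.dat` and that the folder is `<code>-<first two digits>`; the function names are invented for illustration.

```python
import os
import shutil


def folder_for(filename):
    """Map '101010-CDFGH-8271.dat' to 'CDFGH-82' (assumed naming pattern)."""
    parts = os.path.splitext(filename)[0].split("-")
    if len(parts) < 3 or len(parts[2]) < 2:
        return None  # name does not follow the assumed pattern
    return parts[1] + "-" + parts[2][:2]


def move_into_matching_folders(src_dir, base_dir):
    """Move each file into base_dir/<folder> when that folder already exists."""
    moved = []
    for fname in sorted(os.listdir(src_dir)):
        folder = folder_for(fname)
        if folder is None:
            continue
        target_dir = os.path.join(base_dir, folder)
        if os.path.isdir(target_dir):
            shutil.move(os.path.join(src_dir, fname), target_dir)
            moved.append((fname, folder))
    return moved
```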