Wildcard in string as input - string

I have a variable that is an input for a process. It's essentially the full path of a file, with a value from a list injected to get the correct name:
fipsList = ['06001','06037','06059']
for fip in fipsList:
    file = r"T:\CCSI\TECH\FEMA\Datasets\NFHL\NFHL_06122018\NFHL_{}_20180518.gdb".format(fip)
What I want to do now is make everything between "NFHL_{}_" and ".gdb" a wildcard "*". Simply using
file = r"T:\CCSI\TECH\FEMA\Datasets\NFHL\NFHL_06122018\NFHL_{}_*.gdb".format(fip)
doesn't seem to work. Essentially, this is what that produces:
>>> 'T:\\CCSI\\TECH\\FEMA\\Datasets\\NFHL\\NFHL_06122018\\NFHL_06_*.gdb'
Any suggestions on how to get it to work?

Maybe some good old concatenation?
Like:
fipsList = ['06001','06037','06059']
for fip in fipsList:
    # raw string, so the backslashes (including "\N") are not read as escape sequences
    file = r"T:\CCSI\TECH\FEMA\Datasets\NFHL\NFHL_06122018\NFHL_{}_" + fip + ".gdb"

Simply adding '*' into a string this way will not work. The setup of the question is poor (my own fault), but for clarification's sake, here's how I resolved the issue:
import os

fipsList = ['06001','06037','06059']
path = r"T:\CCSI\TECH\FEMA\Datasets\NFHL\NFHL_06122018"
for fip in fipsList:
    for root, dirs, files in os.walk(path):
        for dirname in dirs:
            # the .gdb entries are matched among the directory names
            if 'NFHL_' + fip[:2] in dirname and '.gdb' in dirname:
                file = os.path.join(root, dirname)
Essentially, I had to walk through the folder and use an if conditional to make sure that both conditions were met: the name contains the fip value and the .gdb extension.
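A "*" inside a Python string is just a character; nothing expands it until the string is handed to something that matches it against the file system. The standard library's glob module does that. The following is a minimal sketch, assuming the .gdb entries sit directly inside the base folder; the helper name find_gdbs is made up for illustration.

```python
import glob
import os


def find_gdbs(base_dir, fip):
    """Return the paths in base_dir that match NFHL_<fip>_*.gdb, sorted."""
    # glob.escape protects any wildcard characters that occur in base_dir itself.
    pattern = os.path.join(glob.escape(base_dir), "NFHL_{}_*.gdb".format(fip))
    # glob.glob is what expands '*' against the file system.
    return sorted(glob.glob(pattern))
```

Calling find_gdbs(path, fip) inside the loop over fipsList returns a list, which may be empty or hold several matches, so the caller has to decide which entry to use. If the folder names only carry the two-digit prefix, as the fip[:2] check in the loop above suggests, pass fip[:2] instead.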

Related

How to copy and merge files from two directories with different extensions into one directory and remove the duplicates

I need a Python function that performs the following action:
I have two directories: one contains files in .xml format and the other contains files in .pdf format. To simplify things, consider this example:
Directory 1: a.xml, b.xml, c.xml
Directory 2: a.pdf, c.pdf, d.pdf
Output:
Directory 3: a.xml, b.xml, c.xml, d.pdf
As you can see, the xml files take priority when both directories contain a file with the same base name.
I would be thankful for your help.
You need the shutil and os modules to achieve this. The function below works under the following assumptions:
A given directory has all files with the same extension
The priority_directory will be the directory with file extensions to be prioritized
The secondary_directory will be the directory with file extensions to be dropped in case of a name collision
Try:
import os, shutil

def copy_files(priority_directory, secondary_directory, destination="new_directory"):
    # base names (without extension) in the priority directory, used to check for collisions
    file_names = [os.path.splitext(filename)[0] for filename in os.listdir(priority_directory)]
    os.mkdir(destination)  # make a new directory
    for file in os.listdir(priority_directory):  # this loop copies the first directory as it is
        file_path = os.path.join(priority_directory, file)
        dst_path = os.path.join(destination, file)
        shutil.copy(file_path, dst_path)
    for file in os.listdir(secondary_directory):  # this loop checks for collisions and drops files whose names collide
        if os.path.splitext(file)[0] not in file_names:
            file_path = os.path.join(secondary_directory, file)
            dst_path = os.path.join(destination, file)
            shutil.copy(file_path, dst_path)
    print(os.listdir(destination))
Let's run it with your directory names as arguments:
copy_files('directory_1','directory_2','directory_3')
You can now check that a new directory named directory_3 has been created with the desired files in it.
This will work for all similar cases, no matter what the extensions are.
Note: There should not be a need to do this, I guess, because a directory can hold two files with the same name as long as their extensions differ.
Rough working solution:
import os
from shutil import copy2

d1 = './d1/'
d2 = './d2/'
d3 = './d3/'
ext_1 = '.xml'
ext_2 = '.pdf'

def get_files(d: str, files: list):
    for filename in os.listdir(d):
        dup = False
        if filename[-4:] == ext_2:
            # skip a .pdf when a .xml with the same base name was already collected
            for (x, y) in files:
                if y == filename[:-4] + ext_1:
                    dup = True
                    break
        if dup:
            continue
        files.append((d, filename))

files = []
get_files(d1, files)
get_files(d2, files)
os.makedirs(d3, exist_ok=True)  # the target directory has to exist before copying
for d, file in files:
    copy2(d + file, d3)
I'll see if I can get it to look/perform better.
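One possible tidier variant, offered as a sketch and not as the answerer's follow-up: it uses pathlib and a set of base names, so the collision check does not rescan a list for every file. The function name merge_with_priority and its parameters are made up for illustration, and it assumes both source directories are flat.

```python
import shutil
from pathlib import Path


def merge_with_priority(priority_dir, secondary_dir, dest_dir):
    """Copy every file from priority_dir, then only those files from
    secondary_dir whose base name (stem) is not already taken."""
    dest = Path(dest_dir)
    dest.mkdir(parents=True, exist_ok=True)
    taken = set()
    for src in sorted(Path(priority_dir).iterdir()):
        if src.is_file():
            shutil.copy2(src, dest / src.name)
            taken.add(src.stem)
    for src in sorted(Path(secondary_dir).iterdir()):
        # a.pdf is skipped when a.xml was already copied
        if src.is_file() and src.stem not in taken:
            shutil.copy2(src, dest / src.name)
    return sorted(p.name for p in dest.iterdir())
```

With the directories from the question, merge_with_priority('./d1', './d2', './d3') would leave a.xml, b.xml, c.xml and d.pdf in ./d3, provided ./d3 was empty or absent beforehand.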

For Loop to Move and Rename .html Files - Python 3

I'm asking for help creating a loop so that this script goes through all files in a local directory. Currently I have this script working with a single HTML file, but I would like it to pick the first file in the directory and loop until it gets to the last file in the directory.
It would also help to have a line that appends (1), (2), (3), etc. to the end of the name when names are duplicated.
Can anyone help with renaming thousands of files using a string that is parsed with BeautifulSoup4? Each file contains a name and a reference number at the same position/line. Two files could have the same name and reference number, or the same name with a different reference number.
import bs4, shutil, os
src_dir = os.getcwd()
print(src_dir)
dest_dir = os.mkdir('subfolder')
os.listdir()
dest_dir = src_dir+"/subfolder"
src_file = os.path.join(src_dir, 'example_filename_here.html')
shutil.copy(src_file, dest_dir)
exampleFile = open('example_filename_here.html')
exampleSoup = bs4.BeautifulSoup(exampleFile.read(), 'html.parser')
elems = exampleSoup.select('.bodycopy')
type(elems)
elems[2].getText()
dst_file = os.path.join(dest_dir, 'example_filename_here.html')
new_dst_file_name = os.path.join(dest_dir, elems[2].getText()+ '.html')
os.rename(dst_file, new_dst_file_name)
os.chdir(dest_dir)
print(elems[2].getText())
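The loop and the (1), (2), (3) suffix can be sketched with the standard library alone. This is a rough outline under assumptions: the helper names unique_path and copy_and_rename_all are made up, and get_title stands in for whatever extracts the new name from a file (in the script above, the BeautifulSoup lookup that ends in elems[2].getText()).

```python
import os
import shutil


def unique_path(dest_dir, stem, suffix=".html"):
    """Return a path in dest_dir that is not taken yet, adding (1), (2), ... if needed."""
    candidate = os.path.join(dest_dir, stem + suffix)
    counter = 1
    while os.path.exists(candidate):
        candidate = os.path.join(dest_dir, "{} ({}){}".format(stem, counter, suffix))
        counter += 1
    return candidate


def copy_and_rename_all(src_dir, dest_dir, get_title):
    """Copy every .html file in src_dir to dest_dir under the name get_title returns."""
    os.makedirs(dest_dir, exist_ok=True)
    new_paths = []
    for name in sorted(os.listdir(src_dir)):
        src_file = os.path.join(src_dir, name)
        # skip sub-directories and anything that is not an .html file
        if not (name.lower().endswith(".html") and os.path.isfile(src_file)):
            continue
        target = unique_path(dest_dir, get_title(src_file))
        shutil.copy(src_file, target)
        new_paths.append(target)
    return new_paths
```

A get_title function for this case would open the file, parse it with bs4.BeautifulSoup(..., 'html.parser') and return the text of the third '.bodycopy' element, as the single-file script does. Extracted text may contain characters that are not valid in file names, so it may need cleaning first.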

How do I dynamically create a variable name in a loop to assign to a file name in python 3

I'm still relatively new to programming and Python. I am sure this must be possible, but my searches are not turning up what I'm looking for.
In my current directory, I have 6 PDF files that I wish to read in via the loop below.
What I would like to do is open each of the PDFs under a new variable name, imaginatively called pdf[1-6]File.
I can list the files in the console and pull them in via the code when I stick breaks in to stop it executing, but I can't for the life of me work out how to create the variable name. I thought something like "pdf" + str(i) + "File" would have worked, but I'm missing something.
The code is below. It is not complete, but it is enough to show what I'm looking at:
#Open the PDF files in the current directory for
#reading in binary mode
def opensource():
    listOfFiles = os.listdir('.')
    pattern = "*.pdf"
    for entry in listOfFiles:
        if fnmatch.fnmatch(entry, pattern):
            # Works to here perfectly
            for i in range(len(entry)):
                # print(len(entry))
                # Trying to create the variable name with
                # an incremental numeral in the file name
                "pdf" + i + "File" = open(entry, 'rb')
The bit below is how I'm currently doing it, and it's a pain in the backside. I'm sure it can be done programmatically.
#This is the old way. Monolithic and horrid
#Open the files that have to be merged one by one
pdf1File = open('file1.pdf', 'rb')
pdf2File = open('file2.pdf', 'rb')
pdf3File = open('file3.pdf', 'rb')
pdf4File = open('file4.pdf', 'rb')
pdf5File = open('file5.pdf', 'rb')
pdf6File = open('file6.pdf', 'rb')
All help gratefully received.
Thanks
If you are going to use the file pointers outside this for loop, you can use a dictionary to do that:
import fnmatch
import os

def opensource():
    listOfFiles = os.listdir('.')
    pattern = "*.pdf"
    file_ptrs = {}
    i = 0
    for entry in listOfFiles:
        if fnmatch.fnmatch(entry, pattern):
            # count the matching files so the keys become pdf1File, pdf2File, ...
            i += 1
            file_ptrs["pdf" + str(i) + "File"] = open(entry, 'rb')
    return file_ptrs
Caution: It's always advisable to use open together with a "with" statement in Python. It takes care of closing the file once execution leaves the with block.
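A single "with" statement handles a fixed number of files, but here the number of PDFs is only known at run time. contextlib.ExitStack from the standard library covers that case. The sketch below is one way to combine it with the dictionary idea; the function name pdf_sizes is made up, and reading each file's length only stands in for the real work (for example, handing the open files to a PDF merger).

```python
import fnmatch
import os
from contextlib import ExitStack


def pdf_sizes(directory="."):
    """Open every *.pdf in directory together and return {file name: size in bytes}."""
    with ExitStack() as stack:
        handles = {}
        for entry in sorted(os.listdir(directory)):
            if fnmatch.fnmatch(entry, "*.pdf"):
                path = os.path.join(directory, entry)
                # enter_context registers the file so the stack closes it on exit
                handles[entry] = stack.enter_context(open(path, "rb"))
        # Every matching file is open at this point, keyed by its name.
        sizes = {name: len(handle.read()) for name, handle in handles.items()}
    # Leaving the with block has closed all the handles.
    return sizes
```

Keying the dictionary by file name instead of by a generated label such as "pdf1File" is a design choice of this sketch; either works as a dictionary key.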

How to find files in multilevel subdirectories

Suppose I have a directory that contains multiple subdirectories:
one_meter = r"C:\Projects\NED_1m"
Within the directory one_meter I want to find all of the files that end with '.xml' and contain the string "_meta". My problem is that some of the subdirectories have that file one level down, while others have it two levels down.
Example:
one_meter > USGS_NED_one_meter_x19y329_LA_Jean_Lafitte_2013_IMG_2015 > USGS_NED_one_meter_x19y329_LA_Jean_Lafitte_2013_IMG_2015_meta.xml
one_meter > NY_Long_Island > USGS_NED_one_meter_x23y454_NY_LongIsland_Z18_2014_IMG_2015 > USGS_NED_one_meter_x23y454_NY_LongIsland_Z18_2014_IMG_2015_meta.xml
I want to look in my main directory (one_meter), find all of the _meta.xml files (regardless of the subdirectory), and append them to a list (one_m_list = []).
I tried the following but it doesn't produce any results. What am I doing incorrectly?
one_m_list = []
for filename in os.listdir(one_meter):
    if filename.endswith(".xml") and "_meta" in filename:
        print(filename)
        one_m_list.append(filename)
The answer by @JonathanDavidArndt is good but quite outdated. Since Python 3.5, you can use pathlib.Path.glob to search for a pattern in any subdirectory.
For instance:
import pathlib
destination_root = r"C:\Projects\NED_1m"
pattern = "**/*_meta*.xml"
master_list = list(pathlib.Path(destination_root).glob(pattern))
The function you are looking for is os.walk.
A simple and minimal working example is below. You should be able to modify this to suit your needs:
import os

# raw string, so the backslashes in the Windows path are not read as escape sequences
destination_root = r"C:\Projects\NED_1m"
extension_to_find = ".xml"
master_list = []
extension_to_find_len = len(extension_to_find)
for path, dirs, files in os.walk(destination_root):
    for filename in files:
        # and of course, you can add extra filter criteria
        # such as "contains _meta" right in here
        if filename[-extension_to_find_len:] == extension_to_find:
            print(os.path.join(path, filename))
            master_list.append(os.path.join(path, filename))
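The attempt in the question finds nothing because os.listdir only returns the entries directly inside one_meter, which here are subdirectories, and never descends into them. A short sketch of the os.walk approach with the "_meta" filter filled in follows; the function name find_meta_xml is made up for illustration.

```python
import os


def find_meta_xml(root):
    """Return full paths of all *.xml files under root whose name contains '_meta'."""
    matches = []
    # os.walk visits root and every subdirectory below it, at any depth
    for path, _dirs, files in os.walk(root):
        for filename in files:
            if filename.endswith(".xml") and "_meta" in filename:
                matches.append(os.path.join(path, filename))
    return sorted(matches)
```

For the question's layout this would be called as one_m_list = find_meta_xml(one_meter).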

The code seems correct but my files aren't getting deleted

I heard that Python can make life easier. I wanted to remove duplicates in folderA by comparing folderB with folderA, so I decided to download Python and try coding with it. My code seems correct, but my files are not being deleted. What's wrong with it?
I tried unlink, but that doesn't work either.
import os
with open(r"C:\pathto\output.txt", "w") as a:
    for path, subdirs, files in os.walk(r'C:\pathto\directoryb'):
        for filename in files:
            #f = os.path.join(path, filename)
            #a.write(str(f) + os.linesep)
            a.write(str(filename) + '\n')
textFile = open(r'C:\output.txt', 'r')
line = textFile.readline()
while line:
    target = str(line)
    todelete = 'C:\directorya' + target
    if (os.path.exists(todelete)):
        os.remove(todelete)
    else:
        print("failed")
    line = textFile.readline()
textFile.close()
I want my files deleted. Basically, folderA contains some of the files in folderB, and I'm trying to delete them.
The call to os.remove is not the problem: os.remove(todelete) deletes the file whose path is stored in todelete, so passing the path through a variable is fine. The problem is the path itself.
todelete = 'C:\directorya' + target
if (os.path.exists(todelete)):
    os.remove(todelete) # never reached, because todelete does not name an existing file
target comes from readline(), so it still ends with '\n', and there is no path separator between 'C:\directorya' and the file name. The resulting string does not name an existing file, so os.path.exists returns False and the code prints "failed".
Strip the newline from each line and build the path with os.path.join instead of string concatenation. Note also that the script writes the list to C:\pathto\output.txt but reads it back from C:\output.txt, so check that both refer to the same file.
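For reference, os.remove takes a path string, whether written literally or held in a variable, and deletes the file at that path; what matters is that the string is a valid path. The sketch below is one possible rework of the question's script, not a confirmed fix for the asker's setup. It assumes duplicates are identified by file name alone and that the files to delete sit directly in folder A. It skips the intermediate text file, which avoids the trailing newline, and builds paths with os.path.join, which supplies the separator. The name delete_duplicates is made up for illustration.

```python
import os


def delete_duplicates(folder_a, folder_b):
    """Delete files in folder_a whose names also occur anywhere under folder_b."""
    names_in_b = set()
    for _path, _subdirs, files in os.walk(folder_b):
        names_in_b.update(files)

    deleted = []
    for name in sorted(names_in_b):
        candidate = os.path.join(folder_a, name)  # join inserts the path separator
        if os.path.isfile(candidate):
            os.remove(candidate)  # removes the file at this path
            deleted.append(name)
    return deleted
```

Since this deletes files, it is worth running it on a copy of the folder first, or temporarily replacing the os.remove call with a print to see what would be removed.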

Resources