I have an issue when I run a basic function that looks in the default location of VMs on a Windows machine. Each VM is stored as a single file.
For some reason, the loop over the paths returned by glob is not finding any files.
I have to use wildcards at the beginning and the end of the path so that this script can be used across my department.
I have also tried os.walk() and os.listdir(); both fail because, the way I have written it, I get the error TypeError: expected str, bytes or os.PathLike object, not list.
I need a list of the VMs so that I can write a script that clones all of the VMs within that list through the vix API.
import glob

def getVMs():
    vmloc = glob.glob('**\\Documents\\Virtual Machines\\*.vmdk', recursive=True)
    for f in vmloc:
        print(f)
The problem is that it prints nothing, and I cannot figure out why. Any help would be appreciated.
EDIT:
I also tried building the full path to the VM folder with os.path.join:
def getVMs():
    path = os.path.join('..','C:','\\','Users',os.getlogin(),'Documents','Virtual Machines\\',)
    for vmloc in glob.glob(path +'**.vmdk', recursive=True):
        print(vmloc)
It still produces no output.
The issue was that I was looking for .vmdk files in the directory above the one I actually needed.
import glob
import os

path = os.path.join('..','C:','\\','Users',os.getlogin(),'Documents','Virtual Machines\\',)
# '**\\' descends into the VM subfolders; '**.vmdk' then matches like '*.vmdk'
path1 = glob.glob(path + '**\\' + '**.vmdk', recursive=True)
for vms in path1:
    print(vms)
I was able to find all of the VMDK files after that.
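A slightly more portable sketch of the same fix, assuming the VMs live under the current user's Documents\Virtual Machines folder (the location used above). It swaps os.getlogin() for os.path.expanduser("~"), so each user in the department gets their own home folder, and it puts '**' in its own path component, which is the only place glob treats it as recursive. find_vmdks is a hypothetical name:

```python
import glob
import os


def find_vmdks(base=None):
    """Return every .vmdk file under base, searching all subfolders.

    base defaults to ~/Documents/Virtual Machines (assumed default location).
    """
    if base is None:
        base = os.path.join(os.path.expanduser("~"), "Documents", "Virtual Machines")
    # '**' is only recursive when it is a whole path component, so it must be
    # followed by a separator; a pattern like 'Virtual Machines\\**.vmdk'
    # behaves like '*.vmdk' and never looks inside the VM subfolders.
    pattern = os.path.join(base, "**", "*.vmdk")
    return sorted(glob.glob(pattern, recursive=True))


for vm in find_vmdks():
    print(vm)
```

The list returned by find_vmdks() is what you would hand to the cloning step; passing the whole list to os.walk() is what raises the "not list" TypeError, since os.walk() expects a single path.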
I am trying to loop through multiple folders and subfolders in an Azure Blob container and read multiple XML files.
For example, my files are laid out as YYYY/MM/DD/HH/123.xml.
There are multiple subfolders under each month, day, and hour, with multiple XML files at the lowest level.
My intention is to loop through all these folders and read the XML files. I have tried a few Python approaches, which did not give me the intended result. Can you suggest any ideas for implementing this?
import glob, os
for filename in glob.iglob('2022/08/18/08/225.xml'):
    if os.path.isfile(filename): #code does not enter the for loop
        print(filename)
import os

def list_xml(folder='2022/08/19/08/'):
    r = []
    for root, dirs, files in os.walk(folder): #Code not moving past this for loop, no exception
        for name in files:
            filepath = root + os.sep + name
            if filepath.endswith(".xml"):
                r.append(os.path.join(root, name))
    return r
glob is a plain Python function, so it won't recognize the blob folder path the way PySpark code does; you have to give it the path from the root. Also, make sure to specify recursive=True.
For example, I ran the glob code above in Databricks, and the os code as well.
Both returned no results, because they need the absolute path, starting from the root folder.
glob code:
import glob, os
for file in glob.iglob('/path_from_root_to_folder/**/*.xml',recursive=True):
    print(file)
In Databricks, the root to access is /dbfs (for my repro I used CSV files).
Using os.walk with the same absolute root, my blob files were listed from the folders and subfolders as well.
I used Databricks for my repro, after mounting the container. Wherever you run this code alongside PySpark, make sure the path starts from the root of the file system, and when using glob, set recursive=True as well.
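The os.walk version is described but not shown here. A minimal sketch, assuming the container is mounted so it appears under /dbfs on the driver; the mount path below is a placeholder, and find_xml_files is a hypothetical name:

```python
import os


def find_xml_files(root):
    """Return the full path of every .xml file under root, at any depth.

    root must be an absolute path on the driver's local file system, for
    example a /dbfs/mnt/<your-mount>/ path on Databricks (placeholder).
    """
    matches = []
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            if name.lower().endswith(".xml"):
                # dirpath already starts with root, so this is a full path
                matches.append(os.path.join(dirpath, name))
    return sorted(matches)


# Placeholder mount path; os.walk on a path that does not exist yields nothing.
for xml_path in find_xml_files("/dbfs/mnt/<your-mount>/2022"):
    print(xml_path)
```

Note that os.walk silently ignores a missing root by default, which matches the "no exception, no output" symptom in the question when a relative path is used.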
There is an easier way to solve this problem with PySpark!
The catch is that all the files have to have the same format. In the Azure Databricks sample directory, there is a /cs100 folder with a bunch of files that can be read in as text (line by line).
The trick is the reader option "recursiveFileLookup", which makes Spark load files from all nested subdirectories (it also disables partition discovery). You cannot mix and match file formats.
I added a column with each row's input file name to the DataFrame, then registered the DataFrame as a temporary view.
A simple aggregate query shows 10 unique files; the biggest has a little more than 1M records.
If you need to cherry-pick files from a mixed directory, this method will not work.
However, I think that is an organizational cleanup task rather than a simple reading one.
Finally, use the correct data source format to read XML:
spark.read.format("com.databricks.spark.xml")
I am trying to loop through a list of file paths for files stored across my company's network. The paths point to various network drives.
Each file was submitted by a user at some point, and its path was recorded at submission. However, the drive letter is not the same for every user, and it is not necessarily the letter that drive has on my machine.
For example, a path like X:\Users\Submissions\Bob's File.xlsx may refer to the same drive and file, mapped to a different letter on my machine:
K:\Users\Submissions\Bob's File.xlsx
Each user may use a different letter for that drive, for various reasons.
Is there a way to make the path I pass in smart enough to find the proper directory and locate the file? Any ideas would be great.
Thank you
import pandas as pd
import shutil as sh
copydir = r"C:\Users\me\Desktop\PythonSpyderDesktop\Extractor\Models"
file_path_list = r"C:\Users\me\Desktop\PythonSpyderDesktop\Extractor\FilePathList.csv"
data = pd.read_csv(file_path_list)
i = 1 #Start at 2nd row
for i in range(1, len(data)):
    try:
        sh.copyfile(data.FilePath[i], copydir)
        print("Copied over file: " + data.FilePath[i])
    except:
        print ("File not found.")
Your question is a bit unclear, but it revolves around the source and destination arguments passed to copyfile:
sh.copyfile(data.FilePath[i], copydir)
It's hard to tell what pathnames you're extracting from the .CSV, but apparently source files may have the "wrong" drive letter, and/or the destination directory copydir may have the "wrong" drive letter. The script apparently runs on multiple machines, and those machines have diverse drive letters mounted.
Write a helper function that finds the "right" drive letter. It should accept a pathname like copydir, then probe a search list, then return a corrected pathname.
Given a list of drive letters, you can iterate through them and test whether a pathname exists using os.path.exists(). Return the first one found.
Use os.path.splitdrive() to split the input pathname into its drive letter and the rest of the path.
Suppose that both source and dest may need their drive letters fixed up. Then the call might look like this:
sh.copyfile(fix_path(data.FilePath[i]), fix_path(copydir))
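A minimal sketch of such a fix_path helper (hypothetical, as in the answer). It assumes drive-letter paths such as X:\Users\...; UNC paths (\\server\share\...) are rejected rather than guessed at. The drives and exists parameters are additions for illustration; exists is injectable so the probing logic can be tried without real drives:

```python
import ntpath
import os
import string


def fix_path(pathname, drives=string.ascii_uppercase, exists=os.path.exists):
    """Return pathname with its drive letter replaced by the first letter in
    drives under which the same path exists (hypothetical helper)."""
    # A path that already resolves on this machine is returned unchanged.
    if exists(pathname):
        return pathname
    # ntpath parses Windows paths on any OS (on Windows it is os.path itself).
    drive, rest = ntpath.splitdrive(pathname)
    if len(drive) != 2 or drive[1] != ":":
        raise ValueError("not a drive-letter path: " + pathname)
    for letter in drives:
        candidate = letter + ":" + rest
        if exists(candidate):
            return candidate
    raise FileNotFoundError(pathname)
```

Raising FileNotFoundError when no drive matches lets the calling loop catch that specific error instead of a bare except. Passing a short string such as drives="KXYZ" limits how many letters are probed.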
The following is a piece of the code:
files = glob.iglob(studentDir + '/**/*.py',recursive=True)
for file in files:
    shutil.copy(file, newDir)
The thing is: I want to get all files with the .py extension and also all files whose names contain "write". How can I change my code to do that? Many thanks for your time and attention.
If you want to keep the recursive search, you could use:
patterns = ['/**/*write*','/**/*.py']
for p in patterns:
    files = glob.iglob(studentDir + p, recursive=True)
    for file in files:
        shutil.copy(file, newDir)
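If the files you want are all in the current directory, you could simply use:
# Flatten the matches for both patterns into one list of file names
certainfiles = [f for pattern in ['*.py', '*write*'] for f in glob.glob(pattern)]
for file in certainfiles:
    shutil.copy(file, newDir)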
I would suggest using pathlib, which has been available since Python 3.4. It makes many things considerably easier.
Here '**' means "this folder and all of its subfolders, recursively".
'*.py' has its usual meaning.
path is a Path object; str(path) gives you its string form (built from studentDir), and path.name gives just the file name.
When you want the absolute path, call path.absolute() and take the str of that.
Don't worry, you'll get used to it. :) If you look at the other goodies in pathlib you'll see it's worth it.
import shutil
from pathlib import Path

studentDir = '<something>'       # placeholder
newDir = '<something else>'      # placeholder
for path in Path(studentDir).glob('**/*.py'):
    if 'write' in str(path):
        shutil.copy(str(path.absolute()), newDir)
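The pathlib loop in this answer copies only .py files whose path also contains "write". If you want files that match either condition, as the question asks, a sketch in the same style could test both in one pass. copy_matching is a hypothetical name, and rglob("*") is pathlib's shorthand for glob("**/*"):

```python
import shutil
from pathlib import Path


def copy_matching(student_dir, new_dir):
    """Copy every file under student_dir (at any depth) that is a .py file
    or has "write" in its file name; return the copied file names."""
    copied = []
    # Materialize the listing first so files copied into new_dir are not
    # picked up again if new_dir happens to sit inside student_dir.
    for path in sorted(Path(student_dir).rglob("*")):
        if path.is_file() and (path.suffix == ".py" or "write" in path.name):
            shutil.copy(path, new_dir)
            copied.append(path.name)
    return copied
```

Checking path.name rather than str(path) means a folder called "write" does not make every file inside it match.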
I'm attempting to create a script that looks into a specific directory and then lists all the files of my chosen types in addition to all folders within the original location.
I have managed the first part, listing all the files of the chosen types, but I am having trouble listing the folders.
The code I have is:
import datetime, os
now = datetime.datetime.now()
myFolder = 'F:\\'
textFile = 'myTextFile.txt'
outToFile = open(textFile, mode='w', encoding='utf-8')
filmDir = os.listdir(path=myFolder)
for file in filmDir:
    if file.endswith(('avi','mp4','mkv','pdf')):
        outToFile.write(os.path.splitext(file)[0] + '\n')
    if os.path.isdir(file):
        outToFile.write(os.path.splitext(file)[0] + '\n')
outToFile.close()
It successfully lists all avi/mp4/mkv/pdf files, but it never enters the if os.path.isdir(file): branch, even though there are multiple folders in my F: directory.
Any help would be greatly appreciated, including suggestions for a more effective or efficient method entirely.
Solution found thanks to Son of a Beach
if os.path.isdir(file):
changed to
if os.path.isdir(os.path.join(myFolder, file)):
os.listdir returns the names of the files, not the fully-qualified paths to the files.
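You should pass a fully qualified path to os.path.isdir(); a bare name is looked up relative to the current working directory, so it only works if that directory is already myFolder (for example after os.chdir(myFolder)).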
Eg, instead of using if os.path.isdir(file): you could use:
if os.path.isdir(os.path.join(myFolder, file)):
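Since the question also invited a different method entirely: os.scandir() yields entries that already carry their full path, so entry.is_dir() needs no os.path.join. A sketch assuming the same extensions as above, with case-insensitive matching added (so ".PDF" also counts); list_entries is a hypothetical name:

```python
import os

WANTED_EXTS = (".avi", ".mp4", ".mkv", ".pdf")


def list_entries(folder):
    """Return matching file names (without extension) and folder names
    found directly inside folder, sorted."""
    names = []
    with os.scandir(folder) as entries:
        for entry in entries:
            # is_dir() checks entry.path (folder joined with the name),
            # not a bare name relative to the current working directory.
            if entry.is_dir():
                names.append(entry.name)
            elif entry.name.lower().endswith(WANTED_EXTS):
                names.append(os.path.splitext(entry.name)[0])
    return sorted(names)
```

To produce the same text file as the original script, write "\n".join(list_entries(myFolder)) to myTextFile.txt inside a with open(..., encoding='utf-8') block, which also closes the file for you.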
I am a Python newbie and need to create a script that will parse some files and load them into a SQL database. So I am trying to create smaller scripts that do what I want, then combine them into a larger script.
To that end, I am trying to run this code:
import os
fileList = []
testDir = "/home/me/somedir/dir1/test"
for i in os.listdir(testDir):
    if os.path.isfile(i):
        fileList.append(i)
for fileName in fileList:
    print(fileName)
When I look at the output, no files are listed. I tried the path without quotes and got errors; searching showed that I need the double quotes.
Where did I go wrong?
I found this code that works fine:
import os
in_path = "/home/me/dir/"
for dir_path, subdir_list, file_list in os.walk(in_path):
    for fname in file_list:
        full_path = os.path.join(dir_path, fname)
        print(full_path)
I can use full_path to do my next step.
If anyone has performance tips, feel free to share them or point me in the right direction.
That is most likely because you're executing your script from a folder other than testDir. os.listdir returns only the names of the entries, and os.path.isfile needs the full path to check whether something is a file; given just a name, it looks for a file with that name in the current working directory (the folder from which the script is run). To fix this, give it the full path, which you can build with os.path.join like this:
for name in os.listdir(testDir):
    if os.path.isfile(os.path.join(testDir, name)):
        fileList.append(name)
Or, if you want to keep the full path as well:
for name in os.listdir(testDir):
    path = os.path.join(testDir, name)
    if os.path.isfile(path):
        fileList.append(path)
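On the performance question: os.walk has used os.scandir internally since Python 3.5, and for a single folder you can call os.scandir directly. Each entry already carries its full path (entry.path), and entry.is_file() can often answer from the directory listing without an extra system call. A sketch of the loop above in that style; list_files is a hypothetical name:

```python
import os


def list_files(test_dir):
    """Return the full paths of the regular files directly inside test_dir."""
    with os.scandir(test_dir) as entries:
        # entry.path is test_dir joined with entry.name, so the result does
        # not depend on the folder the script is run from.
        return sorted(entry.path for entry in entries if entry.is_file())
```

For a whole directory tree, the os.walk loop from the question already benefits from scandir, so it is a reasonable choice as it stands.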