Run code on specific files in a directory separately (by the name of file) - python-3.x

I have N files in the same folder with different index numbers like
Fe_1Sec_1_.txt
Fe_1Sec_2_.txt
Fe_1Sec_3_.txt
Fe_2Sec_1_.txt
Fe_2Sec_2_.txt
Fe_2Sec_3_.txt
... and so on.
For example: if I need to run my code on only the files with time = 1 Sec, I can do it manually as follows:
path = "input/*_1Sec_*.txt"
files = glob.glob(path)
print(files)
which gave me:
Out[103]: ['input\\Fe_1Sec_1_.txt', 'input\\Fe_1Sec_2_.txt', 'input\\Fe_1Sec_3_.txt']
If I need to run my code on every group of files separately (grouped by measurement time in seconds, i.e. by file name), I need the path for each group.
I tried this code to build the path for each measurement time:
time = 0
while time < 4:
    time += 1
    t = str(time)
    path = ('"input/*_' + t + 'Sec_*.txt"')
which gives me:
"input/*_1Sec_*.txt"
"input/*_2Sec_*.txt"
"input/*_3Sec_*.txt"
"input/*_4Sec_*.txt"
After that I tried to use this path as follow:
files = glob.glob(path)
print(files)
But it doesn't find the wanted files and gives me:
"input/*_1Sec_*.txt"
[]
"input/*_2Sec_*.txt"
[]
"input/*_3Sec_*.txt"
[]
"input/*_4Sec_*.txt"
[]
Any suggestions, please?

The pattern string itself contains literal double-quote characters (`path = ('"input/*_'+t+'Sec_*.txt"')`), so no file name can ever match it. Drop the quotes; I think the best way would be to simply do
for time in range(1, 5):  # 1, 2, 3, 4
    glob_path = 'input/*_{}Sec_*.txt'.format(time)
    for file_path in glob.glob(glob_path):
        do_something(file_path, time)  # or whatever
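A self-contained sketch of that approach, run against a throwaway temp directory instead of `input/` (the file names are copied from the question; the `by_time` dict is invented here just to collect the groups):

```python
import glob
import os
import tempfile

# Build a throwaway directory that mimics the asker's file layout.
base = tempfile.mkdtemp()
for t in (1, 2):
    for idx in (1, 2):
        open(os.path.join(base, 'Fe_{}Sec_{}_.txt'.format(t, idx)), 'w').close()

# One glob pattern per measurement time -- note there are no literal
# quote characters inside the pattern string.
by_time = {}
for t in range(1, 3):
    pattern = os.path.join(base, '*_{}Sec_*.txt'.format(t))
    by_time[t] = sorted(os.path.basename(p) for p in glob.glob(pattern))

print(by_time[1])  # ['Fe_1Sec_1_.txt', 'Fe_1Sec_2_.txt']
```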


Python Selenium: Check if a new file in the download folder is added

I have a link that, when clicked, downloads a file to the download folder.
My URL looks something like this:
url='https://vle......ac.uk/pluginfile.php/2814969/mod_page/content/16/Statistics_for_Business_and_Economics_----_%28Unit_I_Introduction%29.pdf'
driver.execute_script("window.open('%s', '_blank')" % url)
where url points to a PDF file I am trying to download.
I want to write code that waits until the number of files in the download folder increases before moving on to the next iteration of the loop.
I wrote this code:
def wait_till_number_of_files_is_byound_the_current_file():
    path_download = r'\\Mac\Home\Downloads\*'
    list_of_files = glob.glob(path_download)
    a = len(list_of_files)
    while len(list_of_files) == a:
        time.sleep(1)
        list_of_files = glob.glob(path_download)
In my for loop I also tried this code:
item = WebDriverWait(driver, 10).until(lambda driver: driver.execute_script("window.open('%s', '_blank')" % url))
but this made the link be clicked repeatedly, not just once.
The best way to get around this (I hope there is a better way) is to use the following function:
def download_wait(directory, timeout, nfiles=None):
    """
    Wait for downloads to finish with a specified timeout.

    Args
    ----
    directory : str
        The path to the folder where the files will be downloaded.
    timeout : int
        How many seconds to wait until timing out.
    nfiles : int, defaults to None
        If provided, also wait for the expected number of files.
    """
    seconds = 0
    dl_wait = True
    while dl_wait and seconds < timeout:
        time.sleep(1)
        dl_wait = False
        files = os.listdir(directory)
        if nfiles and len(files) != nfiles:
            dl_wait = True
        for fname in files:
            if fname.endswith('.crdownload'):
                dl_wait = True
        seconds += 1
    return seconds
In my for loop, I wrote the following:
for url in hyper_link_of_files:
    # Click on this link
    driver.execute_script("window.open('%s', '_blank')" % url)
    # time.sleep(2)
    download_wait(r'\\Mac\Home\Downloads', 10, nfiles=None)
    time.sleep(2)
    # Move the last downloaded file into the destination folder
    Move_File(dest_folder)
I will share my Move_File function for reference, for those interested in moving the downloaded file into a new destination:
def Move_File(path_needed):
    # Find the most recently created file in the downloads folder
    path_download = r'\\Mac\Home\Downloads\*'
    list_of_files = glob.glob(path_download)
    latest_file = max(list_of_files, key=os.path.getctime)
    # Move that file into the destination
    path_destination = os.path.join(path_needed, os.path.basename(latest_file))
    shutil.move(latest_file, path_destination)
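For reference, here is a minimal, testable sketch of the polling-with-timeout idea; `wait_for_new_file` and the fake-download thread are names invented for this demo, not part of Selenium:

```python
import os
import tempfile
import threading
import time

def wait_for_new_file(directory, before_count, timeout=10):
    """Poll `directory` until it holds more entries than `before_count`,
    or until `timeout` seconds pass. Returns True if a new file appeared."""
    deadline = time.time() + timeout
    while time.time() < deadline:
        if len(os.listdir(directory)) > before_count:
            return True
        time.sleep(0.1)
    return False

# Simulate a browser finishing a download ~0.3 s from now.
downloads = tempfile.mkdtemp()
def fake_download():
    time.sleep(0.3)
    open(os.path.join(downloads, 'report.pdf'), 'w').close()

count = len(os.listdir(downloads))
threading.Thread(target=fake_download).start()
print(wait_for_new_file(downloads, count, timeout=5))  # True once the file lands
```

Unlike the while-loop in the question, this version cannot hang forever: it gives up and returns False once the deadline passes.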

Calculate size of data in each directory using pyspark

I am using the following code snippet to calculate the size of all the folders in each directory. I can pass the file path as a parameter in the form of widgets. I can meet the requirement by giving the directory names one after the other; however, the requirement is to get the sizes of the folders recursively:
For example, the following input paths are as below:
/mnt/stoREC/datamart/export/
/mnt/stoREC/datamart/export//BRedem/
/mnt/stoREC/datamart/export/Sell/
/mnt/ADLS/Prepared/ModelExecution/gen/
/mnt/ADLS/Prepared/ModelExecution/hhp/
The expected output is:
/mnt/stoREC/datamart/export/ 457783298
/mnt/stoREC/datamart/export//BRedem/ 846262827
/mnt/stoREC/datamart/export/Sell/ 88736291
/mnt/ADLS/Prepared/ModelExecution/gen/ 346727682
/mnt/ADLS/Prepared/ModelExecution/hhp/ 52781528
Below is the code I am using
def dirsize(path):
    total = 0
    dir_files = dbutils.fs.ls(path)
    for file in dir_files:
        if file.isDir():
            total += dirsize(file.path)
        else:
            total = file.size
    # print(path)
    return total

path = dbutils.fs.ls("/mnt/stoREC/datamart/export")
path2 = []
for i in path:
    path1 = i[0]
    # print(path1)
    x = path1.replace('dbfs:', '')
    path2 = path1[i]
    path1[i] = path2[i]
    path2[i] = path2[]
print(x)
I believe I am getting the logic incorrect somewhere since I am trying to use the swapping logic.
I modified the code and used it to find the size of the folders that each path contains. The following is a demonstration of the same.
The following are the list of input files for which I need to calculate size.
print(mypaths)
#output (my sample paths for demo)
['/mnt/repro', '/mnt/repro2', '/mnt/repro/a', 'mnt/repro2/b']
I have made slight changes to the dirsize() function code that you have provided.
def dirsize(path):
    total = 0
    dir_files = dbutils.fs.ls(path)
    for file in dir_files:
        #print(file)
        if file.isDir():
            #print(total)
            total += dirsize(file.path)
        else:
            total += file.size
            #print(file.path+"/ "+str(total))
    return total
Now you can iterate through your input paths (mypaths) to get the total size of the files/folders each one holds.
for path in mypaths:
    print(path + " : " + str(dirsize(path)))
#output
/mnt/repro : 3445
/mnt/repro2 : 254
/mnt/repro/a : 2292
mnt/repro2/b : 254
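`dbutils.fs` only exists inside Databricks, so for anyone testing locally, here is a rough local-filesystem equivalent of the same recursion; `dirsize_local` and the demo tree are invented for illustration:

```python
import os
import tempfile

def dirsize_local(path):
    """Recursively total the file sizes under `path` (a local-filesystem
    stand-in for the dbutils.fs version above)."""
    total = 0
    for entry in os.scandir(path):
        if entry.is_dir(follow_symlinks=False):
            total += dirsize_local(entry.path)
        else:
            total += entry.stat(follow_symlinks=False).st_size
    return total

# Small demo tree: 10 bytes at the top level, 5 bytes one level down.
root = tempfile.mkdtemp()
os.mkdir(os.path.join(root, 'sub'))
with open(os.path.join(root, 'a.bin'), 'wb') as f:
    f.write(b'x' * 10)
with open(os.path.join(root, 'sub', 'b.bin'), 'wb') as f:
    f.write(b'y' * 5)
print(dirsize_local(root))  # 15
```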

Python program to list all folders with date modified

I need a Python program to list all folders with date modified. When I run it, all of the modification dates are the same. What am I doing wrong?
Here is the code that I'm using:
import os, time, stat
path = 'h:\\lance\\'
folders = []
# r=root, d=directories, f=files
for r, d, f in os.walk(path):
    for folder in d:
        modTimesinceEpoc = os.path.getctime(path)
        modificationTime = time.strftime('%Y-%m-%d', time.localtime(modTimesinceEpoc))
        folders.append(os.path.join(r, folder))
# [0:5] just grabs the first 5 folders, helpful if the total amount of folders is large
for f in folders[0:5]:
    print(f, "Last Modified Time : ", modificationTime)
Output:
h:\lance\Return series project Last Modified Time : 2019-09-23
h:\lance\Forecast Pro Last Modified Time : 2019-09-23
h:\lance\Custom Price Files Last Modified Time : 2019-09-23
h:\lance\MBO and responsibilities Last Modified Time : 2019-09-23
h:\lance.vscode Last Modified Time : 2019-09-23
The problem is that os.path.getctime(path) is always called on the top-level path rather than on each folder, so every entry gets the same timestamp. I think this is what you are looking for:
import os, time, stat
path = 'h:\\lance\\'
folders = []
# r=root, d=directories, f=files
for r, d, f in os.walk(path):
    for folder in d:
        location = os.path.join(r, folder)
        # getmtime gives the modification time (on Windows, getctime is the creation time)
        modTimesinceEpoc = os.path.getmtime(location)
        modificationTime = time.strftime('%Y-%m-%d', time.localtime(modTimesinceEpoc))
        folders.append((location, modificationTime))
# [0:5] just grabs the first 5 folders, helpful if the total amount of folders is large
for f in folders[:5]:
    print(f)
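A pathlib variant of the same listing, sorted newest-modified first; the folder names and temp base directory here are made-up stand-ins, and pathlib's `stat()` wraps the same os-level calls:

```python
import tempfile
import time
from pathlib import Path

# Stand-in tree with two subfolders created at different times.
base = Path(tempfile.mkdtemp())
for name in ('old_project', 'new_project'):
    (base / name).mkdir()
    time.sleep(0.1)  # ensure the two folders get distinct mtimes

# All directories under `base`, newest-modified first, with their dates.
dirs = sorted((p for p in base.rglob('*') if p.is_dir()),
              key=lambda p: p.stat().st_mtime, reverse=True)
for d in dirs:
    stamp = time.strftime('%Y-%m-%d', time.localtime(d.stat().st_mtime))
    print(d.name, stamp)
```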

os.path.getsize returns 0 although folder has files in it (Python 3.5)

I am trying to create a program which auto-backups some folders under certain circumstances.
I try to compare the sizes of two folders (source and dest): source has files in it, a flac file and a subfolder with a text file, whereas dest is empty.
This is the code I've written so far:
import os.path
sls = os.path.getsize('D:/autobu/source/')
dls = os.path.getsize('D:/autobu/dest/')
print(sls)
print(dls)
if sls > dls:
    print('success')
else:
    print('fail')
And the output is this:
0
0
fail
What have I done wrong? Have I misunderstood how getsize functions?
os.path.getsize() on a directory returns the size of the directory entry itself (0 on NTFS), not the total size of the files inside it, and os.stat().st_size reports the same number. To compare the contents of two folders, walk them and sum the individual file sizes:
sls = sum(os.path.getsize(os.path.join(root, name))
          for root, dirs, files in os.walk('D:/autobu/source/')
          for name in files)
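A runnable sketch of totalling a folder's contents by walking it; temp directories stand in for `D:/autobu/source/` and `D:/autobu/dest/`, and `folder_contents_size` is a name invented here:

```python
import os
import tempfile

def folder_contents_size(path):
    """Total size of all files under `path`, including subfolders."""
    return sum(os.path.getsize(os.path.join(root, name))
               for root, dirs, files in os.walk(path)
               for name in files)

# source: one 10-byte file plus a subfolder with a 5-byte file; dest: empty.
source = tempfile.mkdtemp()
dest = tempfile.mkdtemp()
os.mkdir(os.path.join(source, 'sub'))
with open(os.path.join(source, 'song.flac'), 'wb') as f:
    f.write(b'x' * 10)
with open(os.path.join(source, 'sub', 'note.txt'), 'wb') as f:
    f.write(b'y' * 5)

print(folder_contents_size(source))  # 15
print(folder_contents_size(dest))    # 0
print('success' if folder_contents_size(source) > folder_contents_size(dest) else 'fail')
```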

Error in using os.path.walk() correctly

So I created the folder C:\TempFiles to test-run the following code snippet.
Inside this folder I had two files -> nd1.txt, nd2.txt, and a folder C:\TempFiles\Temp2, inside which I had only one file, nd3.txt.
Now when I execute this code:
import os, file, storage

database = file.dictionary()
tools = storage.misc()
lui = -1  # last used file index
fileIndex = 1

def sendWord(wrd, findex):  # where findex is the file index
    global lui
    if findex != lui:
        tools.refreshRecentList()
        lui = findex
    if tools.mustIgnore(wrd) == 0 and tools.toRecentList(wrd) == 1:
        database.addWord(wrd, findex)  # else there's no point adding the word to the database, because it's either trivial, or has recently been added

def showPostingsList():
    print("\nPOSTING's LIST")
    database.display()

def parseFile(nfile, findex):
    for line in nfile:
        pl = line.split()
        for word in pl:
            sendWord(word.lower(), findex)

def parseDirectory(dirname):
    global fileIndex
    for root, dirs, files in os.walk(dirname):
        for name in dirs:
            parseDirectory(os.path.join(root, name))
        for filename in files:
            nf = open(os.path.join(root, filename), 'r')
            parseFile(nf, fileIndex)
            print(" --> " + nf.name)
            fileIndex += 1
            nf.close()

def main():
    dirname = input("Enter the base directory :-\n")
    print("\nParsing Files...")
    parseDirectory(dirname)
    print("\nPostings List has been successfully created.\n", database.entries(), " word(s) sent to database")
    choice = ""
    while choice != 'y' and choice != 'n':
        choice = str(input("View List?\n(Y)es\n(N)o\n -> ")).lower()
        if choice != 'y' and choice != 'n':
            print("Invalid Entry. Re-enter\n")
    if choice == 'y':
        showPostingsList()

main()
Now each of the three files should be traversed only once, and I put a print(filename) in to test that, but apparently I am traversing the inner folder twice:
Enter the base directory :-
C:\TempFiles
Parsing Files...
--> C:\TempFiles\Temp2\nd3.txt
--> C:\TempFiles\nd1.txt
--> C:\TempFiles\nd2.txt
--> C:\TempFiles\Temp2\nd3.txt
Postings List has Been successfully created.
34 word(s) sent to database
View List?
(Y)es
(N)o
-> n
Can anyone tell me how to modify the os.walk() traversal to avoid this?
It's not that my output is incorrect, but it traverses one entire folder twice, and that's not very efficient.
Your issue isn't specific to Python 3, it's how os.walk() works - iterating already recurses into the subfolders, so you can take out your recursive call:
def parseDirectory(dirname):
    global fileIndex
    for root, dirs, files in os.walk(dirname):
        for filename in files:
            nf = open(os.path.join(root, filename), 'r')
            parseFile(nf, fileIndex)
            print(" --> " + nf.name)
            fileIndex += 1
            nf.close()
By calling parseDirectory() for the dirs, you were starting another, independent walk of your only subfolder.
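A minimal, self-contained check of that behaviour, using a throwaway tree shaped like the asker's C:\TempFiles layout:

```python
import os
import tempfile

# Tiny tree: base/nd1.txt, base/nd2.txt, base/Temp2/nd3.txt
base = tempfile.mkdtemp()
os.mkdir(os.path.join(base, 'Temp2'))
for rel in ('nd1.txt', 'nd2.txt', os.path.join('Temp2', 'nd3.txt')):
    open(os.path.join(base, rel), 'w').close()

# A single os.walk() visits every file exactly once -- no manual
# recursion into `dirs` is needed.
visited = []
for root, dirs, files in os.walk(base):
    for filename in files:
        visited.append(os.path.join(root, filename))
print(len(visited))  # 3
```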
