Python program to list all folders with date modified

I need a Python program to list all folders with date modified. When I run it, all of the modification dates are the same. What am I doing wrong?
Here is the code that I'm using:
import os, time, stat
path = 'h:\\lance\\'
folders = []
# r=root, d=directories, f=files
for r, d, f in os.walk(path):
    for folder in d:
        modTimesinceEpoc = os.path.getctime(path)
        modificationTime = time.strftime('%Y-%m-%d', time.localtime(modTimesinceEpoc))
        folders.append(os.path.join(r, folder))
# [0:5] just grabs the first 5 folders, helpful if the total amount of folders is large
for f in folders[0:5]:
    print(f, "Last Modified Time : ", modificationTime)
Output:
h:\lance\Return series project Last Modified Time : 2019-09-23
h:\lance\Forecast Pro Last Modified Time : 2019-09-23
h:\lance\Custom Price Files Last Modified Time : 2019-09-23
h:\lance\MBO and responsibilities Last Modified Time : 2019-09-23
h:\lance\.vscode Last Modified Time : 2019-09-23

I think this is what you are looking for:
import os, time
path = 'h:\\lance\\'
folders = []
# r=root, d=directories, f=files
for r, d, f in os.walk(path):
    for folder in d:
        location = os.path.join(r, folder)
        modTimesinceEpoc = os.path.getctime(location)
        modificationTime = time.strftime('%Y-%m-%d', time.localtime(modTimesinceEpoc))
        folders.append((location, modificationTime))
# [0:5] just grabs the first 5 folders, helpful if the total amount of folders is large
for f in folders[:5]:
    print(f)
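One caveat: on Windows, os.path.getctime reports the creation time, not the last modification time. Since the question asks for the modified date, a variant using os.path.getmtime may be closer to the intent. Here is a sketch that also sorts newest first, under the same assumed h:\lance\ layout:

import os, time

path = 'h:\\lance\\'
folders = []
for r, d, f in os.walk(path):
    for folder in d:
        location = os.path.join(r, folder)
        mtime = os.path.getmtime(location)  # last-modification time, seconds since epoch
        folders.append((location, time.strftime('%Y-%m-%d', time.localtime(mtime))))

# ISO-formatted dates sort correctly as plain strings; newest first
folders.sort(key=lambda t: t[1], reverse=True)
for f in folders[:5]:
    print(f)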

Related

How to get a list of all folders that list in a specific s3 location using spark in databricks?

Currently, I am using this code but it gives me all folders plus sub-folders/files for a specified s3 location. I want only the names of the folders that live in s3://production/product/:
def get_dir_content(ls_path):
    dir_paths = dbutils.fs.ls(ls_path)
    subdir_paths = [get_dir_content(p.path) for p in dir_paths if p.isDir() and p.path != ls_path]
    flat_subdir_paths = [p for subdir in subdir_paths for p in subdir]
    return list(map(lambda p: p.path, dir_paths)) + flat_subdir_paths

paths = get_dir_content('s3://production/product/')
[print(p) for p in paths]
Current output returns all folders plus sub-directories where files live, which is too much. I only need the folders that live on that hierarchical level of the specified s3 location (no deeper levels). How do I tweak this code?
Just use dbutils.fs.ls(ls_path) without the recursion.
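For the names of the immediate sub-folders only, filtering the listing on isDir() should be enough. A sketch, assuming a Databricks notebook where dbutils is available and each entry returned by dbutils.fs.ls is a FileInfo with .name, .path, and .isDir():

# list only the entries directly under the given location, no recursion
entries = dbutils.fs.ls('s3://production/product/')
# keep only the sub-folders; .name ends with '/' for directories
folder_names = [e.name for e in entries if e.isDir()]
for name in folder_names:
    print(name)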

Setting variable to compare file modification between multiple dates

I'm trying to compare two text files that are date specific, but I'm stumped. I created a test folder that has three text files in it with modified dates between one and 35 days old.
I.e., red.txt is 35 days old, blue.txt is one day old, and green.txt is 15 days old.
For my two compared files, the first file must be between a range of 13-15 days and the second one day old or less. So for this example, 'green.txt' will become 'file1' and 'blue.txt' will become 'file2' and then be compared with difflib, but I'm having trouble with the syntax, or maybe even the logic. I am using datetime with timedelta to try to get this working, but my results will always store the oldest modified file that is past 15 days for 'file1'. Here's my code:
import os, glob, sys, difflib, datetime as d

p_path = 'C:/test/Text_file_compare_test/'
f_list = glob.glob(os.path.join(p_path, '*.txt'))
file1 = ''
file2 = ''
min_days_ago = d.datetime.now() - d.timedelta(days=1)
max_days_ago = d.datetime.now() - d.timedelta(days=13 <= 15)
for file in f_list:
    filetime = d.datetime.fromtimestamp(os.path.getmtime(file))
    if filetime < max_days_ago:
        file1 = file
    if filetime > min_days_ago:
        file2 = file
with open(file1) as f1, open(file2) as f2:
    d = difflib.Differ()
    result = list(d.compare(f1.readlines(), f2.readlines()))
    sys.stdout.writelines(result)
I'm certain there is something wrong with this line:
max_days_ago = d.datetime.now() - d.timedelta(days=13 <= 15)
Maybe I'm just not seeing something in the datetime module that's obvious. Can someone shed some light for me? Also, this is on Windows 10 Python 3.7.2. Thanks in advance!
As per my comment, your d.timedelta(days=13 <= 15) isn't quite right: you are passing days a boolean value of True, which is equivalent to d.timedelta(days=1). You need to store three separate time points and do your 13-15 day comparison against two different dates. The code below demonstrates what you are looking for, I believe:
import datetime as d

files = {
    'red': d.datetime.now() - d.timedelta(days=35),
    'blue': d.datetime.now() - d.timedelta(days=0, hours=12),
    'green': d.datetime.now() - d.timedelta(days=14),
}

days_ago_1 = d.datetime.now() - d.timedelta(days=1)
days_ago_13 = d.datetime.now() - d.timedelta(days=13)
days_ago_15 = d.datetime.now() - d.timedelta(days=15)

file1 = None
file2 = None
for file, filetime in files.items():
    if days_ago_13 >= filetime >= days_ago_15:
        file1 = file
    elif filetime > days_ago_1:
        file2 = file
    # need to break out of the loop when we are finished
    if file1 and file2:
        break

print(file1, file2)
This prints green blue.
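Applied to real files on disk instead of the hard-coded dict, the same window check might look like this. A sketch, assuming the asker's folder and using os.path.getmtime for the file age (note that difflib.Differ is bound to a fresh name so the datetime alias d is not shadowed):

import os, glob, sys, difflib, datetime as d

p_path = 'C:/test/Text_file_compare_test/'
days_ago_1 = d.datetime.now() - d.timedelta(days=1)
days_ago_13 = d.datetime.now() - d.timedelta(days=13)
days_ago_15 = d.datetime.now() - d.timedelta(days=15)

file1 = file2 = None
for file in glob.glob(os.path.join(p_path, '*.txt')):
    filetime = d.datetime.fromtimestamp(os.path.getmtime(file))
    if days_ago_15 <= filetime <= days_ago_13:  # between 13 and 15 days old
        file1 = file
    elif filetime > days_ago_1:                 # less than one day old
        file2 = file

if file1 and file2:
    with open(file1) as f1, open(file2) as f2:
        differ = difflib.Differ()
        sys.stdout.writelines(differ.compare(f1.readlines(), f2.readlines()))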

Run code on specific files in a directory separately (by the name of file)

I have N files in the same folder with different index numbers, like:
Fe_1Sec_1_.txt
Fe_1Sec_2_.txt
Fe_1Sec_3_.txt
Fe_2Sec_1_.txt
Fe_2Sec_2_.txt
Fe_2Sec_3_.txt
... and so on
Ex: If I need to run my code on only the files with time = 1 Sec, I can do it manually as follows:
path = "input/*_1Sec_*.txt"
files = glob.glob(path)
print(files)
which gave me:
Out[103]: ['input\\Fe_1Sec_1_.txt', 'input\\Fe_1Sec_2_.txt', 'input\\Fe_1Sec_3_.txt']
In case I need to run my code on all files separately (depending on the measurement time in seconds, i.e. the file name), I tried this code to get the path for each measurement time:
time = 0
while time < 4:
    time += 1
    t = str(time)
    path = ('"input/*_' + t + 'Sec_*.txt"')
which gives me:
"input/*_1Sec_*.txt"
"input/*_2Sec_*.txt"
"input/*_3Sec_*.txt"
"input/*_4Sec_*.txt"
After that I tried to use this path as follows:
files = glob.glob(path)
print(files)
But it doesn't find the wanted files and gives me:
"input/*_1Sec_*.txt"
[]
"input/*_2Sec_*.txt"
[]
"input/*_3Sec_*.txt"
[]
"input/*_4Sec_*.txt"
[]
Any suggestions, please?
Your pattern is the problem: path = ('"input/*_' + t + 'Sec_*.txt"') embeds literal double-quote characters in the string, so glob matches nothing. I think the best way would be to simply do:
for time in range(1, 5):  # 1, 2, 3, 4
    glob_path = 'input/*_{}Sec_*.txt'.format(time)
    for file_path in glob.glob(glob_path):
        do_something(file_path)  # whatever per-file processing you need
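For what it's worth, a pathlib version of the same loop looks like this, assuming an input/ directory relative to the working directory; do_something stands for whatever per-file processing is needed, as above:

from pathlib import Path

for time in range(1, 5):
    # Path.glob takes the bare pattern, no extra quoting needed
    for file_path in Path('input').glob('*_{}Sec_*.txt'.format(time)):
        do_something(file_path)  # hypothetical per-file processing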

Compare files in 2 directories and keep copying the newly added files from source to destination folder

import os, filecmp

d1_contents = set(os.listdir(r'C:\Users\AUTHORITAH\Desktop\comp_2'))
d2_contents = set(os.listdir(r'E:\comp_1'))
common = list(d1_contents & d2_contents)
common_files = [
    f for f in common
    if os.path.isfile(os.path.join(r'C:\Users\AUTHORITAH\Desktop\comp_2', f))
]
print('Common files:', common_files)
match, mismatch, errors = filecmp.cmpfiles(r'C:\Users\AUTHORITAH\Desktop\comp_2',
                                           r'E:\comp_1',
                                           common_files)
print('Match:', match)
print('Mismatch:', mismatch)
print('Errors:', errors)
I am trying to compare the contents of two directories, and if I add new data into dir 1 it should get added into dir 2. For example, if dir 1 has 8 files in it and I add 2 more, then dir 2 should get updated with those 2 new files as well.
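One way to get the behaviour described (files newly added to the first directory copied over to the second) is a set difference plus shutil.copy2. This is only a sketch, assuming the asker's two paths, top-level files only, and that the Desktop comp_2 folder is the source:

import os, shutil

src = r'C:\Users\AUTHORITAH\Desktop\comp_2'
dst = r'E:\comp_1'

src_files = {f for f in os.listdir(src) if os.path.isfile(os.path.join(src, f))}
dst_files = set(os.listdir(dst))

# copy anything present in the source but missing from the destination
for name in src_files - dst_files:
    shutil.copy2(os.path.join(src, name), os.path.join(dst, name))
    print('Copied:', name)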

Delete each file older than X days in Y folder

I wrote this but it doesn't work.
In giorni I put the maximum number of days a file may stay on the SD card, and file_dir is the default location where the files are analyzed.
import os
from datetime import datetime, timedelta

file_dir = "/home/pi/"  # location
giorni = 2  # max number of days
giorni_pass = datetime.now() - timedelta(giorni)
for root, dirs, files in os.walk(file_dir):
    for file in files:
        filetime = datetime.fromtimestamp(os.path.getctime(file))
        if filetime > giorni_pass:
            os.remove(file)
Solved with:
for root, dirs, files in os.walk(file_dir):
    for file in files:
        path = os.path.join(root, file)
        filetime = datetime.fromtimestamp(os.path.getctime(path))
        if filetime < giorni_pass:  # older than giorni days
            os.remove(path)
Because "Filenames" contains a list of files whose path name is relative to "file_dir" and to make operations on those files should first get the absolute path, using path = os.path.join(file_dir, file)
