Python: move files based on a logical pattern - python-3.x

I need to move files using Python 3.6.*.
I have listed the files in the directory recursively using this filter:
files = glob.glob(my_save_path + '/**/*.csv', recursive=True)
Directory Structure:
/tmp/input/
/tmp/output/
Result:
['/tmp/input/ODKDD_MN_K5_ID02_20230216_152227713019/part-00000-fd4123e2-4779-4850-b6fa-ee86dea3fba7-c100.csv', '/tmp/input/ODKDD_EY_GE_PE_K5_ID02_20230216_152227713019/part-00000-fd4173e2-4779-4850-b6fa-ee86dea3fba7-c000.csv', '/tmp/input/ODKDD_LY_OP_ST_K5_ID02_20230216_152227713019/part-00000-fd4173e2-4779-4850-b6fa-ee86dea3fba7-c000.csv']
Notes for common Pattern:
ODKDD_MN_K5_ID02
ODKDD_EY_GE_PE_K5_ID02
ODKDD_LY_OP_ST_K5_ID02
Prefix:ODKDD
Suffix: K5_ID02
BaseWord:
MN
EY_GE_PE
LY_OP_ST
Timestamp:20230216_152227713019
End result:
I need to move each file found in the file list and rename it; after the move, the original needs to be deleted.
Expected result:
/tmp/input/ODKDD_MN_K5_ID02_20230216_152227713019/part-00000-fd4123e2-4779-4850-b6fa-ee86dea3fba7-c100.csv -> /tmp/output/test_MN_20230216_152227713019.csv
/tmp/input/ODKDD_EY_GE_PE_K5_ID02_20230216_152227713019/part-00000-fd4173e2-4779-4850-b6fa-ee86dea3fba7-c000.csv -> /tmp/output/test_EY_GE_PE_20230216_152227713019.csv
/tmp/input/ODKDD_LY_OP_ST_K5_ID02_20230216_152227713019/part-00000-fd4173e2-4779-4850-b6fa-ee86dea3fba7-c000.csv -> /tmp/output/test_LY_OP_ST_20230216_152227713019.csv
Note for renaming:
Prefix:
test
Suffix:
timestamp from the folder level
BaseWord:
MN
EY_GE_PE
LY_OP_ST
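A minimal sketch of one way to do this in Python 3.6, assuming the base word and timestamp can be pulled out of the parent folder name with a regular expression (the output directory name and the cleanup step are assumptions based on the description above, not a confirmed requirement):
import glob
import os
import re
import shutil

my_save_path = '/tmp/input'
output_dir = '/tmp/output'

# parent folder name pattern: ODKDD_<BaseWord>_K5_ID02_<Timestamp>
folder_pattern = re.compile(r'^ODKDD_(?P<base>.+)_K5_ID02_(?P<ts>\d{8}_\d+)$')

files = glob.glob(my_save_path + '/**/*.csv', recursive=True)
for path in files:
    source_dir = os.path.dirname(path)
    match = folder_pattern.match(os.path.basename(source_dir))
    if not match:
        continue
    # rename to test_<BaseWord>_<Timestamp>.csv, e.g. test_MN_20230216_152227713019.csv
    new_name = 'test_{}_{}.csv'.format(match.group('base'), match.group('ts'))
    shutil.move(path, os.path.join(output_dir, new_name))
    # delete the original (now empty) source folder after the move
    shutil.rmtree(source_dir, ignore_errors=True)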

Related

When using shutil.move, I get the subfolders of the last moved folder instead of the folder itself

I am trying to move each folder (and its subfolders) present in the generated_log_folder list. I am able to move all folders (with their subfolders), but for the last folder I get its subfolders in the destination instead of the folder itself.
for f in generated_log_folder:
    if f in generated_log_folder:
        destination = 'log-files-'
        d1 = datetime.now()
        folder_time = d1.strftime("%Y-%m-%d_%I-%M-%S")
        folder_to_save_files = folder_time
        destination += folder_to_save_files
        source = os.path.join(user_folder, f)
        if os.path.isdir(source):
            shutil.move(source, destination)
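One likely explanation, offered as a hedged sketch rather than a confirmed fix: shutil.move(src, dst) renames src to dst when dst does not yet exist, but moves src inside dst when dst is an existing directory, so whether the folder name survives depends on which case each iteration hits. Creating the destination once and joining each folder's basename onto it makes the outcome explicit (user_folder and generated_log_folder are taken from the question):
import os
import shutil
from datetime import datetime

# build the timestamped destination once, before the loop
destination = 'log-files-' + datetime.now().strftime("%Y-%m-%d_%I-%M-%S")
os.makedirs(destination, exist_ok=True)

for f in generated_log_folder:
    source = os.path.join(user_folder, f)
    if os.path.isdir(source):
        # move the folder itself (keeping its name) into the destination
        shutil.move(source, os.path.join(destination, os.path.basename(f)))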

How to get a list of all folders that live in a specific S3 location using Spark in Databricks?

Currently, I am using this code, but it gives me all folders plus sub-folders/files for a specified S3 location. I want only the names of the folders that live in s3://production/product/:
def get_dir_content(ls_path):
    dir_paths = dbutils.fs.ls(ls_path)
    subdir_paths = [get_dir_content(p.path) for p in dir_paths if p.isDir() and p.path != ls_path]
    flat_subdir_paths = [p for subdir in subdir_paths for p in subdir]
    return list(map(lambda p: p.path, dir_paths)) + flat_subdir_paths

paths = get_dir_content('s3://production/product/')
[print(p) for p in paths]
The current output returns all folders plus the sub-directories where files live, which is too much. I only need the folders that live at that hierarchical level of the specified S3 location (no deeper levels). How do I tweak this code?
Just use dbutils.fs.ls(ls_path).
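A short sketch of what that looks like in a Databricks notebook (where dbutils is available): dbutils.fs.ls is not recursive, so a single call already stays at the requested level, and filtering on isDir() drops any files sitting there.
entries = dbutils.fs.ls('s3://production/product/')
# keep only directories at this level; files at the same level are skipped
folders = [e.path for e in entries if e.isDir()]
for p in folders:
    print(p)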

Python Glob - Get Full Filenames, but no directory-only names

This code works, but it returns directory names as well as filenames. I haven't found a parameter that tells it to return only files or only directories.
Can glob.glob do this, or do I have to call os.something to test whether I have a directory or a file? In my case my files all end with .csv, but I would like to know for more general knowledge as well.
In the loop I'm reading each file, so it currently fails when it tries to open a directory name as a filename.
files = sorted(glob.glob(input_watch_directory + "/**", recursive=True))
for loop_full_filename in files:
    print(loop_full_filename)
Results:
c:\Demo\WatchDir\
c:\Demo\WatchDir\2202
c:\Demo\WatchDir\2202\07
c:\Demo\WatchDir\2202\07\01
c:\Demo\WatchDir\2202\07\01\polygonData_2022_07_01__15_51.csv
c:\Demo\WatchDir\2202\07\01\polygonData_2022_07_01__15_52.csv
c:\Demo\WatchDir\2202\07\01\polygonData_2022_07_01__15_53.csv
c:\Demo\WatchDir\2202\07\01\polygonData_2022_07_01__15_54.csv
c:\Demo\WatchDir\2202\07\01\polygonData_2022_07_01__15_55.csv
c:\Demo\WatchDir\2202\07\05
c:\Demo\WatchDir\2202\07\05\polygonData_2022_07_05__12_00.csv
c:\Demo\WatchDir\2202\07\05\polygonData_2022_07_05__12_01.csv
Results needed:
c:\Demo\WatchDir\2202\07\01\polygonData_2022_07_01__15_51.csv
c:\Demo\WatchDir\2202\07\01\polygonData_2022_07_01__15_52.csv
c:\Demo\WatchDir\2202\07\01\polygonData_2022_07_01__15_53.csv
c:\Demo\WatchDir\2202\07\01\polygonData_2022_07_01__15_54.csv
c:\Demo\WatchDir\2202\07\01\polygonData_2022_07_01__15_55.csv
c:\Demo\WatchDir\2202\07\05\polygonData_2022_07_05__12_00.csv
c:\Demo\WatchDir\2202\07\05\polygonData_2022_07_05__12_01.csv
For this specific program I can just check whether the file name contains .csv, but I would like to know in general for future reference.
Replace the line:
files = sorted(glob.glob(input_watch_directory + "/**", recursive=True))
with:
files = sorted(glob.glob(input_watch_directory + "/**/*.*", recursive=True))
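For the general case, one alternative (a sketch) is to keep the original "/**" pattern and filter out directories with os.path.isfile, since the "/**/*.*" pattern above only matches names that contain a dot:
import glob
import os

paths = sorted(glob.glob(input_watch_directory + "/**", recursive=True))
# keep only regular files; directory entries are dropped regardless of naming
files = [p for p in paths if os.path.isfile(p)]
for loop_full_filename in files:
    print(loop_full_filename)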

Zipping files using AntBuilder and excludes not working as expected in Groovy

I am trying to zip files by tokenizing the file names in a directory. The files with that token should be zipped into the respective folder.
This code is doing that, but it is not filtering exactly. In the abc folder only files with abc should be present, but files with def are also included, which is not expected. The same goes for the other folders. But if there is a file with a, then the filtering happens correctly and zipping is done properly as per the exclude string for all tokens except abc. Please find the code below.
Any suggestions, please?
tokenList.each { token ->
    for (i in tokenList) {
        excludeString = tokenList - token
        println "excludeString for " + token + "is:" + excludeString
        println "Creating zip folder for " + token
        ant.zip(basedir: outputDir, destfile: token.substring(1, token.length() - 1) + ".zip", excludes: excludeString, update: true)
        break
    }
}
output
TokenList: [*abc*, *def*, *ghi*, *jkl*]
excludeString for *abc*is:[*def*, *ghi*, *jkl*]
Creating zip folder for *abc*
excludeString for *def*is:[*abc*, *ghi*, *jkl*]
Creating zip folder for *def*
excludeString for *ghi*is:[*abc*, *def*, *jkl*]
Creating zip folder for *ghi*
excludeString for *jkl*is:[*abc*, *def*, *ghi*]
Creating zip folder for *jkl*
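Not a Groovy/Ant fix, but a language-agnostic illustration of the intended filtering, sketched in Python: for each token, include only the files whose names match that token's wildcard, rather than relying on an exclude list (the token list and output directory name mirror the question and are otherwise placeholders):
import fnmatch
import os
import zipfile

token_list = ['*abc*', '*def*', '*ghi*', '*jkl*']
output_dir = 'outputDir'  # placeholder for the question's outputDir

for token in token_list:
    zip_name = token.strip('*') + '.zip'  # e.g. abc.zip
    with zipfile.ZipFile(zip_name, 'w') as zf:
        for name in os.listdir(output_dir):
            path = os.path.join(output_dir, name)
            # include a file only when it matches the current token's pattern
            if os.path.isfile(path) and fnmatch.fnmatch(name, token):
                zf.write(path, arcname=name)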

How to find missing files?

I have several files (with the same dim) in a folder called data for certain dates:
file2011001.bin (names follow the pattern "file<year><day-of-year>")
file2011009.bin
file2011020.bin
.
.
file2011322.bin
Certain dates (files) are missing. What I need is to just loop through these files:
if file2011001.bin exists, OK; if not, copy any file in the directory and name it file2011001.bin
if file2011002.bin exists, OK; if not, copy any file in the directory and name it file2011002.bin, and so on until file2011365.bin
I can list them in R:
dir<- list.files("/data/", "*.bin", full.names = TRUE)
I wonder if it is possible through R or any other language!
Pretty much what you'd expect:
# generate every expected name, pairing each year with zero-padded days 001-365
AllFiles <- sprintf("file%d%03d.bin", rep(2010:2015, each = 365), 1:365)
for (file in AllFiles) {
    if (file.exists(file)) {
        ## do something
    }
}
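Since any other language is acceptable, here is a minimal Python sketch of the same idea; it assumes the files live in /data and that any existing .bin file can serve as the filler copy:
import glob
import os
import shutil

data_dir = '/data'
existing = sorted(glob.glob(os.path.join(data_dir, 'file*.bin')))
template = existing[0]  # arbitrary existing file used to fill the gaps

for day in range(1, 366):
    target = os.path.join(data_dir, 'file2011{:03d}.bin'.format(day))
    if not os.path.exists(target):
        # missing date: copy an existing file under the expected name
        shutil.copy(template, target)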
