Zipping files using AntBuilder: excludes not working as expected in Groovy - groovy

I am trying to zip files by tokenizing the file names in a directory. The files containing a given token should be zipped into the respective folder.
This code does that, but it is not filtering exactly. In the abc folder, only files with abc should be present, but files with def are also included, which is not expected. The same happens for the other folders. But if there is a file with a, then the filtering happens correctly and the zipping is done properly as per excludeString for all tokens except abc. Please find the code below.
Any suggestions, please?
tokenList.each { token ->
    for (i in tokenList)
    {
        excludeString = tokenList - token
        println "excludeString for " + token + "is:" + excludeString
        println "Creating zip folder for " + token
        ant.zip(basedir: outputDir, destfile: token.substring(1, token.length() - 1) + ".zip", excludes: excludeString, update: true)
        break
    }
}
Output:
TokenList: [*abc*, *def*, *ghi*, *jkl*]
excludeString for *abc*is:[*def*, *ghi*, *jkl*]
Creating zip folder for *abc*
excludeString for *def*is:[*abc*, *ghi*, *jkl*]
Creating zip folder for *def*
excludeString for *ghi*is:[*abc*, *def*, *jkl*]
Creating zip folder for *ghi*
excludeString for *jkl*is:[*abc*, *def*, *ghi*]
Creating zip folder for *jkl*
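One plausible cause, offered as an assumption rather than a confirmed diagnosis: Ant's excludes attribute takes a comma- or space-separated string of patterns, while the code passes a Groovy list, whose text form is [*def*, *ghi*, *jkl*]. Split on commas and spaces, that text yields "[*def*" and "*jkl*]" as the first and last patterns, and those no longer match the intended files. The Python sketch below only imitates that splitting, with fnmatch standing in for Ant's pattern matcher; the helper names and the sample file name are hypothetical.

```python
import fnmatch
import re

def split_patterns(excludes):
    """Split an excludes string on commas and whitespace."""
    return [p for p in re.split(r"[,\s]+", excludes) if p]

def is_excluded(name, patterns):
    """True if the file name matches any of the exclude patterns."""
    return any(fnmatch.fnmatchcase(name, p) for p in patterns)

tokens = ["*abc*", "*def*", "*ghi*", "*jkl*"]
others = [t for t in tokens if t != "*abc*"]

# Mimics the text form of a Groovy list: "[*def*, *ghi*, *jkl*]"
as_list_text = str(others).replace("'", "")
# An explicit comma-separated string: "*def*,*ghi*,*jkl*"
as_joined = ",".join(others)

sample = "report_def.txt"  # hypothetical file name
print(split_patterns(as_list_text), is_excluded(sample, split_patterns(as_list_text)))
print(split_patterns(as_joined), is_excluded(sample, split_patterns(as_joined)))
```

If this is indeed the cause, passing excludeString.join(',') as the excludes value, or simply includes: token, would be the corresponding change in the Groovy code.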

Related

Does pysmb support copying .zip files?

I am trying to copy .zip files from a shared network folder to a Unix environment using pysmb. The process copies the .zip file names, but not the contents of the files:
smbFolder = "networkdrive"
conn = SMBConnection('username', 'password', smbFolder,'')
conn.connect(smbFolder)
Share='shareFolder'
ShareFolder='TargetFolder'
ShareFilename = ShareFolder
Contents = conn.listPath(Share, ShareFolder)
for Content in Contents:
    try:
        conn.retrieveFile(Content, open(savePath + '/' + Content.filename, 'wb'))
    except: None
conn.close()
I expected this to copy the zip files to the savePath folder along with their contents, but the zip files are copied as empty folders.
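In pysmb, retrieveFile takes the share name, the remote path, and a writable file object, in that order, whereas the loop above passes the listing entry itself; the bare except then hides the resulting error, which would explain the empty output. A minimal sketch of the loop under that reading; download_files is a hypothetical helper, and conn is assumed to be an already connected SMBConnection:

```python
import os

def download_files(conn, share, folder, save_dir):
    """Copy every regular file in `folder` on `share` into `save_dir`.

    `conn` is assumed to be a connected pysmb SMBConnection.
    Returns the list of local paths written.
    """
    os.makedirs(save_dir, exist_ok=True)
    written = []
    for entry in conn.listPath(share, folder):
        if entry.isDirectory:  # skips ".", ".." and subfolders
            continue
        remote_path = folder.rstrip("/") + "/" + entry.filename
        local_path = os.path.join(save_dir, entry.filename)
        with open(local_path, "wb") as fh:
            # retrieveFile(service_name, path, file_obj) writes the remote bytes into fh
            conn.retrieveFile(share, remote_path, fh)
        written.append(local_path)
    return written
```

With the names from the question, this would be called as download_files(conn, Share, ShareFolder, savePath). Leaving exceptions unhandled here is deliberate, so that a wrong path or share name shows up as an error instead of an empty file.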

How to get a list of all folders that list in a specific s3 location using spark in databricks?

Currently, I am using this code, but it gives me all folders plus sub-folders/files for a specified s3 location. I want only the names of the folders that live in s3://production/product/:
def get_dir_content(ls_path):
    dir_paths = dbutils.fs.ls(ls_path)
    subdir_paths = [get_dir_content(p.path) for p in dir_paths if p.isDir() and p.path != ls_path]
    flat_subdir_paths = [p for subdir in subdir_paths for p in subdir]
    return list(map(lambda p: p.path, dir_paths)) + flat_subdir_paths

paths = get_dir_content('s3://production/product/')
[print(p) for p in paths]
The current output returns all folders plus the sub-directories where files live, which is too much. I only need the folders that live on that hierarchical level of the specified s3 location (no deeper levels). How do I tweak this code?
Just use dbutils.fs.ls(ls_path).
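dbutils.fs.ls is not recursive, so it returns only the entries directly under the given path, files included. To keep just the folders, the result can be filtered with isDir(), which the question's own code already uses. A minimal sketch; list_top_level_dirs is a hypothetical helper, and the listing function is passed in as a parameter so that the sketch can also run outside a Databricks notebook:

```python
def list_top_level_dirs(ls, path):
    """Return the paths of the directories directly under `path`.

    `ls` is a listing function such as dbutils.fs.ls, whose entries
    expose a `path` attribute and an `isDir()` method.
    """
    return [entry.path for entry in ls(path) if entry.isDir()]
```

In a notebook this would be called as list_top_level_dirs(dbutils.fs.ls, 's3://production/product/').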

Python Glob - Get Full Filenames, but no directory-only names

This code works, but it's returning directory names and filenames. I haven't found a parameter that tells it to return only files or only directories.
Can glob.glob do this, or do I have to call os.something to test whether I have a directory or a file? In my case, my files all end with .csv, but I would like to know for more general knowledge as well.
In the loop, I'm reading each file, so it is currently bombing when it tries to open a directory name as a filename.
files = sorted(glob.glob(input_watch_directory + "/**", recursive=True))
for loop_full_filename in files:
    print(loop_full_filename)
Results:
c:\Demo\WatchDir\
c:\Demo\WatchDir\2202
c:\Demo\WatchDir\2202\07
c:\Demo\WatchDir\2202\07\01
c:\Demo\WatchDir\2202\07\01\polygonData_2022_07_01__15_51.csv
c:\Demo\WatchDir\2202\07\01\polygonData_2022_07_01__15_52.csv
c:\Demo\WatchDir\2202\07\01\polygonData_2022_07_01__15_53.csv
c:\Demo\WatchDir\2202\07\01\polygonData_2022_07_01__15_54.csv
c:\Demo\WatchDir\2202\07\01\polygonData_2022_07_01__15_55.csv
c:\Demo\WatchDir\2202\07\05
c:\Demo\WatchDir\2202\07\05\polygonData_2022_07_05__12_00.csv
c:\Demo\WatchDir\2202\07\05\polygonData_2022_07_05__12_01.csv
Results needed:
c:\Demo\WatchDir\2202\07\01\polygonData_2022_07_01__15_51.csv
c:\Demo\WatchDir\2202\07\01\polygonData_2022_07_01__15_52.csv
c:\Demo\WatchDir\2202\07\01\polygonData_2022_07_01__15_53.csv
c:\Demo\WatchDir\2202\07\01\polygonData_2022_07_01__15_54.csv
c:\Demo\WatchDir\2202\07\01\polygonData_2022_07_01__15_55.csv
c:\Demo\WatchDir\2202\07\05\polygonData_2022_07_05__12_00.csv
c:\Demo\WatchDir\2202\07\05\polygonData_2022_07_05__12_01.csv
For this specific program, I can just check if the file name contains .csv, but I would like to know in general for future reference.
Replace the line:
files = sorted(glob.glob(input_watch_directory + "/**", recursive=True))
with the line:
files = sorted(glob.glob(input_watch_directory + "/**/*.*", recursive=True))
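The /**/*.* pattern only selects names that contain a dot, so it would still return a directory whose name has a dot in it and would skip files without an extension. For the general case, the glob results can be filtered with os.path.isfile. A minimal sketch; list_files is a hypothetical helper name:

```python
import glob
import os

def list_files(root):
    """Return all regular files under `root`, recursively, sorted."""
    pattern = os.path.join(root, "**")
    # glob returns files and directories alike; isfile keeps only the files
    return sorted(p for p in glob.glob(pattern, recursive=True) if os.path.isfile(p))
```

Swapping os.path.isfile for os.path.isdir gives the directories instead.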

os.walk() not showing parent folder of a sub folder

So I was using os.walk(), and I created a test folder with this structure:
testFolder
|---folder1
|   |---folder2
|---folder3
I tried to list the folders and subfolders using this code:
import os
parentPath = "C:\\Users\\name\\Desktop\\Projects\\testFolder"
for parent, directories, files in os.walk(parentPath, topdown = True):
    for name in directories:
        print("{}".format(os.path.join(parentPath,name)))
Here is the output of that code:
C:\Users\name\Desktop\Projects\testFolder\folder1
C:\Users\name\Desktop\Projects\testFolder\folder3
C:\Users\name\Desktop\Projects\testFolder\folder2
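The output places folder2 directly under testFolder because each name is joined to parentPath, the fixed root, rather than to parent, the directory os.walk is currently visiting. A minimal sketch with that one change; list_dirs is a hypothetical helper name:

```python
import os

def list_dirs(root):
    """Return the full path of every directory below `root`."""
    found = []
    for parent, directories, files in os.walk(root, topdown=True):
        for name in directories:
            # join with the directory being visited, not with the root
            found.append(os.path.join(parent, name))
    return found
```

For the test folder above, folder2 would then be listed under folder1 rather than next to it.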

How to find missing files?

I have several files (with the same dim) in a folder called data for certain dates:
file2011001.bin (named like this: "fileyearday")
file2011009.bin
file2011020.bin
.
.
file2011322.bin
Certain dates (files) are missing. What I need is to loop through these files:
if file2011001.bin exists, ok; if not, copy any file in the directory and name it file2011001.bin
if file2011002.bin exists, ok; if not, copy any file in the directory and name it file2011002.bin, and so on until file2011365.bin
I can list them in R:
dir<- list.files("/data/", "*.bin", full.names = TRUE)
I wonder if it is possible through R or any other language!
Pretty much what you'd expect:
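## files that already exist; the first one serves as the copy source
existing = list.files("/data", pattern = "\\.bin$", full.names = TRUE)
## expected names: file2011001.bin ... file2011365.bin (day zero-padded to 3 digits)
AllFiles = file.path("/data", sprintf("file2011%03d.bin", 1:365))
for(file in AllFiles)
{
    if(!file.exists(file))
    {
        ## missing date: copy an existing file under the expected name
        file.copy(existing[1], file)
    }
}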
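Since the question allows any language, the same loop can also be sketched in Python. The sketch assumes the naming scheme file2011DDD.bin with a zero-padded day from 001 to 365 and uses the first existing .bin file as the copy source; fill_missing and its parameters are hypothetical names:

```python
import glob
import os
import shutil

def fill_missing(data_dir, year=2011, days=365):
    """Create every missing file<year><day>.bin by copying an existing one.

    Returns the names of the files that were created.
    """
    existing = sorted(glob.glob(os.path.join(data_dir, "*.bin")))
    if not existing:
        raise FileNotFoundError("no .bin file available to copy from")
    source = existing[0]
    created = []
    for day in range(1, days + 1):
        name = f"file{year}{day:03d}.bin"
        target = os.path.join(data_dir, name)
        if not os.path.exists(target):
            shutil.copyfile(source, target)
            created.append(name)
    return created
```

Calling fill_missing("/data") would then return the list of file names that had to be created.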
