How to find missing files? - linux

I have several files (all with the same dimensions) in a folder called data, one per date:
file2011001.bin (named like this: "file" + year + day of year)
file2011009.bin
file2011020.bin
.
.
file2011322.bin
Certain dates (files) are missing. What I need is to loop through the expected file names:
if file2011001.bin exists, fine; if not, copy any file in the directory and name it file2011001.bin
if file2011002.bin exists, fine; if not, copy any file in the directory and name it file2011002.bin, and so on until file2011365.bin
I can list them in R:
dir <- list.files("/data/", pattern = "\\.bin$", full.names = TRUE)
I wonder whether this is possible in R or any other language.

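Pretty much what you'd expect: build the full list of expected names, then copy an existing file into each gap.
AllFiles <- file.path("/data", sprintf("file2011%03d.bin", 1:365))
## any file that is already present can serve as the copy source
src <- AllFiles[file.exists(AllFiles)][1]
for (file in AllFiles)
{
  if (!file.exists(file))
  {
    ## missing date: fill it with a copy of an existing file
    file.copy(src, file)
  }
}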
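Since the question is open to other languages, here is a rough Python sketch of the same idea. The function name `fill_missing_days` and its parameters are made up for illustration, and it assumes the files follow the file<year><3-digit day>.bin pattern from the question and that "any file" can simply be the first expected file that already exists.

```python
import shutil
from pathlib import Path


def fill_missing_days(data_dir, year=2011, days=365):
    """Copy an existing file into every missing file<year><day>.bin slot.

    Returns the names of the files that were created.
    """
    data_dir = Path(data_dir)
    # Every file name we expect to find, with the day zero-padded to 3 digits.
    expected = [data_dir / f"file{year}{day:03d}.bin" for day in range(1, days + 1)]
    existing = [p for p in expected if p.is_file()]
    if not existing:
        raise FileNotFoundError(f"no file{year}DDD.bin files found in {data_dir}")
    template = existing[0]  # assumption: any present file is an acceptable source
    created = []
    for path in expected:
        if not path.exists():
            shutil.copyfile(template, path)
            created.append(path.name)
    return created
```

Calling `fill_missing_days("/data")` would then fill the gaps for 2011 and return the list of names it had to create.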

Related

Python Glob - Get Full Filenames, but no directory-only names

This code works, but it's returning directory names and filenames. I haven't found a parameter that tells it to return only files or only directories.
Can glob.glob do this, or do I have to call os.something to test whether I have a directory or a file? In my case, my files all end with .csv, but I would like to know for more general knowledge as well.
In the loop, I'm reading each file, so it currently fails when it tries to open a directory name as a file.
import glob
files = sorted(glob.glob(input_watch_directory + "/**", recursive=True))
for loop_full_filename in files:
    print(loop_full_filename)
Results:
c:\Demo\WatchDir\
c:\Demo\WatchDir\2202
c:\Demo\WatchDir\2202\07
c:\Demo\WatchDir\2202\07\01
c:\Demo\WatchDir\2202\07\01\polygonData_2022_07_01__15_51.csv
c:\Demo\WatchDir\2202\07\01\polygonData_2022_07_01__15_52.csv
c:\Demo\WatchDir\2202\07\01\polygonData_2022_07_01__15_53.csv
c:\Demo\WatchDir\2202\07\01\polygonData_2022_07_01__15_54.csv
c:\Demo\WatchDir\2202\07\01\polygonData_2022_07_01__15_55.csv
c:\Demo\WatchDir\2202\07\05
c:\Demo\WatchDir\2202\07\05\polygonData_2022_07_05__12_00.csv
c:\Demo\WatchDir\2202\07\05\polygonData_2022_07_05__12_01.csv
Results needed:
c:\Demo\WatchDir\2202\07\01\polygonData_2022_07_01__15_51.csv
c:\Demo\WatchDir\2202\07\01\polygonData_2022_07_01__15_52.csv
c:\Demo\WatchDir\2202\07\01\polygonData_2022_07_01__15_53.csv
c:\Demo\WatchDir\2202\07\01\polygonData_2022_07_01__15_54.csv
c:\Demo\WatchDir\2202\07\01\polygonData_2022_07_01__15_55.csv
c:\Demo\WatchDir\2202\07\05\polygonData_2022_07_05__12_00.csv
c:\Demo\WatchDir\2202\07\05\polygonData_2022_07_05__12_01.csv
For this specific program, I can just check whether the file name contains .csv, but I would like to know in general for future reference.
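Replace the line:
files = sorted(glob.glob(input_watch_directory + "/**", recursive=True))
with:
files = sorted(glob.glob(input_watch_directory + "/**/*.*", recursive=True))
This only returns names that contain a dot, so it works when every file has an extension and no directory name contains one. It is not a true files-only filter; for the general case, test each result with os.path.isfile.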
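For the general case asked about in the question, glob has no files-only switch as far as I know, so one option is to filter the results with os.path.isfile. A minimal sketch (the helper name `list_files_only` is just for illustration):

```python
import glob
import os


def list_files_only(root):
    """Return every regular file under root, recursively, skipping directories."""
    pattern = os.path.join(root, "**", "*")
    # glob returns files and directories alike; keep only the regular files.
    return sorted(p for p in glob.glob(pattern, recursive=True) if os.path.isfile(p))
```

Unlike the "/**/*.*" pattern, this also keeps files without an extension and drops directories whose names happen to contain a dot. Note that glob skips names starting with a dot by default.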

How to get the name of the directory from the name of the directory + the file

In an application, I can get the path to a file which resides in a directory as a string:
"/path/to/the/file.txt"
In order to write another file into that same directory, I want to take the string "/path/to/the/file.txt" and remove the part "file.txt" to finally get only
"/path/to/the/"
as a string.
I could use
string = "/path/to/the/file.txt"
string.split('/')
and then glue all the terms (except the last one) together with a loop.
Is there an easy way to do it?
You can use os.path.basename to get the last part of the path and remove it with replace.
import os
path = "/path/to/the/file.txt"
delete = os.path.basename(os.path.normpath(path))
print(delete)  # prints file.txt
# Remove file.txt from the path.
# Caveat: str.replace removes every occurrence of the name, not only the last one.
path = path.replace(delete, '')
print(path)
OUTPUT :
file.txt
/path/to/the/
Say you have a list of .txt file names. You can build all their paths like this:
new_path = ['file2.txt', 'file3.txt', 'file4.txt']
for get_new_path in new_path:
    print(path + get_new_path)
OUTPUT :
/path/to/the/file2.txt
/path/to/the/file3.txt
/path/to/the/file4.txt
Here is what I finally used:
n_parts = len(string.split('/')) - 1
directory_path_str = ""
for i in range(0, n_parts):
    directory_path_str = directory_path_str + string.split('/')[i] + "/"
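For what it's worth, the standard library can do the split directly: os.path.dirname returns everything before the last path separator, and os.path.join builds the path of a sibling file. A small sketch (the helper `sibling_path` and the name "other.txt" are only examples):

```python
import os


def sibling_path(file_path, new_name):
    """Return the path of new_name placed in the same directory as file_path."""
    directory = os.path.dirname(file_path)  # drops the final path component
    return os.path.join(directory, new_name)


directory = os.path.dirname("/path/to/the/file.txt")
other_file = sibling_path("/path/to/the/file.txt", "other.txt")
```

On a POSIX system `directory` is "/path/to/the" (without the trailing slash) and `other_file` is "/path/to/the/other.txt". This also avoids the pitfall of a replace-based approach when the file name occurs more than once in the path.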

zipping files using ant builder and excludes not working as expected in groovy

I am trying to zip files by tokenizing the file names in a directory. The files whose names contain a token should be zipped into the archive for that token.
The code below does that, but it is not filtering exactly. The abc archive should contain only files with abc, but files with def are also included, which is not expected. The same happens for the other archives. But if there is a file with a then the filtering happens correctly and the zipping follows excludeString for all tokens except abc.
Any suggestions, please?
tokenList.each { token ->
    for (i in tokenList)
    {
        excludeString = tokenList - token
        println "excludeString for " + token + "is:" + excludeString
        println "Creating zip folder for " + token
        ant.zip(basedir: outputDir, destfile: token.substring(1, token.length() - 1) + ".zip", excludes: excludeString, update: true)
        break
    }
}
output
TokenList: [*abc*, *def*, *ghi*, *jkl*]
excludeString for *abc*is:[*def*, *ghi*, *jkl*]
Creating zip folder for *abc*
excludeString for *def*is:[*abc*, *ghi*, *jkl*]
Creating zip folder for *def*
excludeString for *ghi*is:[*abc*, *def*, *jkl*]
Creating zip folder for *ghi*
excludeString for *jkl*is:[*abc*, *def*, *ghi*]
Creating zip folder for *jkl*

Spark: Traverse HDFS subfolders and find all files with name "X"

I have an HDFS path and I want to traverse all of its subfolders and find all the files within that have the name "X".
I have tried to do this:
FileSystem.get( sc.hadoopConfiguration )
.listStatus( new Path("hdfs://..."))
.foreach( x => println(x.getPath))
But this only searches for files within 1 level and I want all levels.
You need to get all the files recursively. Loop through the path and collect the files; whenever an entry is a directory, call the same function on it again.
Below is a simple example you can adapt to your configuration and test.
import org.apache.hadoop.conf.Configuration
import org.apache.hadoop.fs.{FileStatus, FileSystem, Path}

var fileSystem: FileSystem = _
var configuration: Configuration = _

def init(): Array[FileStatus] = {
  configuration = new Configuration
  fileSystem = FileSystem.get(configuration)
  // "" is a placeholder: put the root HDFS path to search here.
  val fileStatus: Array[FileStatus] = fileSystem.listStatus(new Path(""))
  getAllFiles(fileStatus)
}

// Recursively flatten directories into a single array of file entries.
def getAllFiles(fileStatus: Array[FileStatus]): Array[FileStatus] =
  fileStatus.flatMap { fs =>
    if (fs.isDirectory) getAllFiles(fileSystem.listStatus(fs.getPath))
    else Array(fs)
  }
Then filter the resulting list for the files whose name contains 'X'.
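To make the recurse-then-filter idea concrete without a cluster, here is the same pattern sketched in Python against the local filesystem. This is only an analogy: it uses os.scandir rather than the Hadoop API, and the function name `find_files_named` is made up for the example.

```python
import os


def find_files_named(root, name):
    """Recursively collect paths under root whose file name equals name."""
    matches = []
    for entry in os.scandir(root):
        if entry.is_dir(follow_symlinks=False):
            # Directory: descend into it and keep whatever it finds.
            matches.extend(find_files_named(entry.path, name))
        elif entry.is_file() and entry.name == name:
            # Regular file with the wanted name: keep it.
            matches.append(entry.path)
    return sorted(matches)
```

The filter is applied while walking here; filtering the full list afterwards, as suggested for the HDFS version, gives the same result.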

Have R look for files in a library directory

I am using R on Linux.
I have a set of functions that I use often, saved in different .r script files. Those files are in ~/r_lib/.
I would like to include those files without having to use the fully qualified name, just "file.r". Basically I am looking for the equivalent of the -I option of the C++ compiler.
Is there a way to set the include path from R, in the .Rprofile or .Renviron file?
Thanks
You can use the sourceDir function in the Examples section of ?source:
sourceDir <- function(path, trace = TRUE, ...) {
  for (nm in list.files(path, pattern = "\\.[RrSsQq]$")) {
    if (trace) cat(nm, ":")
    source(file.path(path, nm), ...)
    if (trace) cat("\n")
  }
}
And you may want to use sys.source to avoid cluttering your global environment.
If you set the chdir parameter of source to TRUE, then the source calls within the included file will be relative to its path. Hence, you can call:
source("~/r_lib/file.R", chdir = TRUE)
It would probably be better not to have source calls within your "library" and make your code into a package, but sometimes this is convenient.
Get all the files in your directory, in your case
d <- list.files("~/r_lib/")
then you can load them with a function from the plyr package
library(plyr)
l_ply(d, function(x) source(paste("~/r_lib/", x, sep = "")))
If you like, you can do it in a loop as well, or use a different function instead of l_ply. A conventional loop:
for (i in seq_along(d)) source(paste("~/r_lib/", d[[i]], sep = ""))
Write your own source() wrapper?
mySource <- function(script, path = "~/r_lib/", ...) {
  ## paste path+filename
  fname <- paste(path, script, sep = "")
  ## source the file
  source(fname, ...)
}
You could stick that in your .Rprofile so it will be loaded each time you start R.
If you want to load all the R files, you can extend the above easily to source all files at once
mySource <- function(path = "~/r_lib/", ...) {
  ## list of files
  fnames <- list.files(path, pattern = "\\.[RrSsQq]$")
  ## add path
  fnames <- paste(path, fnames, sep = "")
  ## source the files
  lapply(fnames, source, ...)
  invisible()
}
Actually, though, you'd be better off starting your own private package and loading that.

Resources