Nested array comprehension in Julia

I'm trying to get a list of files with the new walkdir function in Julia. The following works, but I would like the result to be a flat list of files. Can this be achieved with array comprehension, without flattening the array after it has been created?
files = [[joinpath(root, file) for file in files] for (root, dirs, files) in collect(walkdir(AUDIO_PATH))]

As far as I know, this cannot be solved with an array comprehension without flattening the array after it has been created. But you could define a function that iterates over walkdir:
function files_func(path)
    function it()
        for (root, dirs, files) in walkdir(path)
            for file in files
                produce(joinpath(root, file))
            end
        end
    end
    Task(it)
end
Once this function is defined, a list of the files can be obtained with collect(files_func(AUDIO_PATH)). Alternatively, a list of files can be built by looping over walkdir directly:
allfiles = ASCIIString[]
for (root, dirs, files) in walkdir(path)
    for file in files
        push!(allfiles, joinpath(root, file))
    end
end
allfiles

As @Daniel Høegh points out, it seems you can't. But you can flatten it easily with the vcat function:
all_files(path::AbstractString) = vcat([[joinpath(root, file) for file in files] for (root, dirs, files) in collect(walkdir(path))]...)
This other, more readable version is like Daniel's iterator/generator, but it uses the cartesian-product for loop syntax, the alternative @task macro (just to show an example of it) and the compact assignment-form function definition syntax:
function each_file(path::AbstractString)
    iter() = for (root, dirs, files) in walkdir(path), file in files
        produce(joinpath(root, file))
    end
    @task iter()
end
# No need to flatten anything:
all_files(path::AbstractString) = collect(each_file(path))
for file in each_file(AUDIO_PATH)
    @show file
end
audio_files = all_files(AUDIO_PATH)

An option without comprehension but readable (to my human neuralware):
filelist = AbstractString[]
for (root, dirs, files) in walkdir(AUDIO_PATH)
    append!(filelist, map(file -> joinpath(root, file), files))
end
Anonymous functions and map may incur a performance cost, but in file-mapping code this should matter less than readability.

Related

Return a list of the paths of all the parts.txt files

Write a function list_files_walk that returns a list of the paths of all the parts.txt files, using the os module's walk generator. The function takes no input parameters.
def list_files_walk():
    for dirpath, dirnames, filenames in os.walk("CarItems"):
        if 'parts.txt' in dirpath:
            list_files.append(filenames)
    print(list_files)
    return list_files
Currently, list_files is still empty. The output is supposed to look similar to this:
CarItems/Chevrolet/Chevelle/2011/parts.txt
CarItems/Chevrolet/Chevelle/1982/parts.txt
How can I produce this output?
You pretty much have it here--the only adjustments I'd make are:
Make sure list_files is scoped locally to the function to avoid side effects.
Use parameters so that the function can work on any arbitrary path.
Return a generator with the yield keyword which allows for the next file to be fetched lazily.
'parts.txt' in dirpath could be error-prone if the filename happens to be a substring elsewhere in a path. I'd use endswith or check the third item in the tuple that os.walk yields, which is a list of the files in the current directory, e.g. 'parts.txt' in filenames.
Along the same line of thought as above, you might want to make sure that your target is a file with os.path.isfile.
Here's an example:
import os
def find_files_rec(path, fname):
    for dirpath, dirnames, files in os.walk(path):
        if fname in files:
            yield f"{dirpath}/{fname}"
if __name__ == "__main__":
    print(list(find_files_rec(".", "parts.txt")))
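The last two suggestions in the list (match the name exactly, and confirm the match is a regular file) could be sketched roughly like this. The name find_named_files is made up for illustration, and joining with os.path.join instead of "/" is my own choice, not something the question requires.

```python
import os


def find_named_files(top, fname):
    """Lazily yield paths under `top` whose final component equals `fname`."""
    for dirpath, dirnames, filenames in os.walk(top):
        # Exact match against the names in this directory,
        # not a substring test against the directory path.
        if fname in filenames:
            candidate = os.path.join(dirpath, fname)
            # Skip anything that is not a regular file (e.g. a broken symlink).
            if os.path.isfile(candidate):
                yield candidate
```

Calling list(find_named_files("CarItems", "parts.txt")) would then collect the paths into a list, which is what the exercise asks for.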

Exclude directories in a List Comprehension

I want to get a list of all picture files in a directory, excluding certain subdirectories.
I have a List Comprehension I normally use to extract files, which works, but includes subdirectories I do not want.
This is on macOS, and 'Photos Library.photoslibrary' is a "package". The contents are normally hidden by the OS and the library appears to the user as a file, but to Unix this is just a normal directory which contains a massive number of files.
I have attempted to exclude the directory, as os.walk() describes, but my attempts all produce syntax errors.
the caller can modify the dirnames list in-place
(e.g., via del or slice assignment), and walk will only recurse into the
subdirectories whose names remain in dirnames
Is it possible to exclude directories within a list comprehension?
#!/usr/bin/python3
import os
pdir = "/Users/ian/Pictures"
def get_files(top, extension=".jpg"):
    """
    For each directory in the directory tree rooted at top,
    return all files which match extension.
    """
    files = [os.path.join(dirname, filename)
        for dirname, dirnames, filenames in os.walk(top)
            # if 'Photos Library.photoslibrary' in dirnames:
            #     dirnames.remove('Photos Library.photoslibrary')
            for filename in filenames
                if filename.endswith(extension)
                if 'Photos Library.photoslibrary' in dirnames:
                    dirnames.remove('Photos Library.photoslibrary')
        ]
    return files
for file in get_files(pdir, (".JPG", ".JPEG", ".jpg", ".jpeg")):
    print(file)
I couldn't get a list comprehension to work, so I rewrote the code as a generator function and build a list from the result.
The code below works.
def get_files(top, exclude=None, extension=".jpg"):
    """
    For each directory in the directory tree rooted at top,
    return all files which match extension.
    exclude is an optional string or tuple/list of strings
    to exclude named subdirectories.
    """
    for dirname, dirnames, filenames in os.walk(top):
        if(exclude is not None):
            if(type(exclude) == str): # prevent Python treating str as sequence
                if exclude in dirnames:
                    dirnames.remove(exclude)
            else:
                for excl in exclude:
                    if excl in dirnames:
                        dirnames.remove(excl)
        for filename in filenames:
            if filename.endswith(extension):
                yield(os.path.join(dirname, filename))
for file in get_files(pdir, ('Photos Library.photoslibrary', 'iPhoto Library.photolibrary'), (".JPG", ".JPEG", ".jpg", ".jpeg")):
    print(file)
The type test for exclude is inelegant, but without it Python would treat a lone string as a sequence of characters.
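For what it's worth, the pruning could probably be written more compactly with the slice assignment that the os.walk documentation quoted above mentions. A comprehension only accepts for and if clauses, so the remove() statement cannot live inside one, which is why a generator function is used here. This is only a sketch: the name iter_files and the plural extensions parameter are my own, not part of the original code.

```python
import os


def iter_files(top, exclude=(), extensions=(".jpg",)):
    """Yield matching files under `top`, skipping directories named in `exclude`."""
    # A lone string would otherwise be iterated character by character.
    if isinstance(exclude, str):
        exclude = (exclude,)
    if isinstance(extensions, str):
        extensions = (extensions,)
    excluded = set(exclude)
    suffixes = tuple(extensions)
    for dirname, dirnames, filenames in os.walk(top):
        # Slice assignment mutates the list that os.walk itself holds,
        # so the pruned directories are never descended into.
        dirnames[:] = [d for d in dirnames if d not in excluded]
        for filename in filenames:
            if filename.endswith(suffixes):
                yield os.path.join(dirname, filename)
```

A plain list is then just list(iter_files(pdir, exclude='Photos Library.photoslibrary', extensions=(".jpg", ".JPG"))). Rebinding the name instead (dirnames = [...]) would not work, because os.walk would still see the original list.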

Python loop files from a specific filename

For example:
Under the folder, the file list is like:
20110101
20110102
20110103
...
20140101
20140102
...
20171231
How can I start looping those files not from the natural beginning (20110101)
but from a middle one (20140101)?
Well you can get an unsorted list of all the files in the current directory with os.listdir(). So you need to first sort this alphabetically (the default when using the sorted() function), and find the index of that "beginning file" and iterate from there.
So, in code, the above would look something like:
import os
b = '20140101'  # the file name to start from
fs = sorted(os.listdir())
for f in fs[fs.index(b):]:
    ...
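One caveat: list.index raises ValueError when no file has exactly the starting name. If that can happen, a binary search over the sorted names is one possible alternative. This is a sketch only; the helper name files_from and its directory parameter are hypothetical.

```python
import os
from bisect import bisect_left


def files_from(start, directory="."):
    """Return the sorted names in `directory` that compare >= `start`."""
    names = sorted(os.listdir(directory))
    # bisect_left finds the first position whose name is >= start,
    # so this also works when no file is named exactly `start`.
    return names[bisect_left(names, start):]
```

With names like 20110101, plain string comparison agrees with date order because every name has the same number of digits.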

Rename files according to list

I'm trying to rename files in a directory using a list. My code so far will only rename the first file before giving me a FileNotFoundError. How can I read the list and rename my files in the same order as it?
import os
import glob
fileLib = ('/filepath1/')
ref = ('/filepath2/ref.csv')
for file in glob.glob(os.path.join(fileLib, '*.csv')):
    with open(ref) as list1:
        line = list1.read().split(',\n')
        for name in line:
            os.rename(file, os.path.join(fileLib, '{}.csv'.format(name)))
You're applying the rename to the same file, since the loops are nested.
So the first time it works, and the next time it tries to rename a file that has been already renamed.
Reorganize your code. First, read the new names file:
fileLib = '/filepath1/'
ref = '/filepath2/ref.csv'
with open(ref) as list1:
    newnames = list1.read().split(',\n')
Then zip the directory contents and the new names list together in a single loop:
for file, newname in zip(glob.glob(os.path.join(fileLib, '*.csv')), newnames):
    os.rename(file, os.path.join(fileLib, '{}.csv'.format(newname)))
Since zip stops when one of its iterables is exhausted, renaming will be only partially done if the glob result is longer than the new names list, so it is better to check that both have the same length before renaming.
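A rough sketch of that length check follows. The function name rename_in_order is made up, and sorting the glob result is my own addition: glob makes no promise about ordering, so some explicit order is needed if the pairing with the list is to be predictable.

```python
import glob
import os


def rename_in_order(directory, new_names, pattern="*.csv"):
    """Rename the sorted matches of `pattern` in `directory` to `new_names`, pairwise."""
    # Sort so that the pairing with new_names is predictable.
    old_paths = sorted(glob.glob(os.path.join(directory, pattern)))
    # Refuse to start if the two lists cannot be paired one to one.
    if len(old_paths) != len(new_names):
        raise ValueError(
            f"{len(old_paths)} files but {len(new_names)} new names"
        )
    # Assumes no new name collides with a file that has not been renamed yet.
    for old_path, new_name in zip(old_paths, new_names):
        os.rename(old_path, os.path.join(directory, f"{new_name}.csv"))
```

It would be called as rename_in_order(fileLib, newnames) after reading the names file as shown above.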

Can I force os.walk to visit directories in alphabetical order?

I would like to know if it's possible to force os.walk in python3 to visit directories in alphabetical order. For example, here is a directory and some code that will walk this directory:
ryan:~/bktest$ ls -1 sample
CD01
CD02
CD03
CD04
CD05
--------
def main_work_subdirs(gl):
    for root, dirs, files in os.walk(gl['pwd']):
        if root == gl['pwd']:
            for d2i in dirs:
                print(d2i)
When the python code hits the directory above, here is the output:
ryan:~/bktest$ ~/test.py sample
CD03
CD01
CD05
CD02
CD04
I would like to force walk to visit these dirs in alphabetical order, 01, 02 ... 05. In the python3 doc for os.walk, it says:
When topdown is True, the caller can modify the dirnames list in-place
(perhaps using del or slice assignment), and walk() will only recurse
into the subdirectories whose names remain in dirnames; this can be
used to prune the search, impose a specific order of visiting
Does that mean that I can impose an alphabetical visiting order on os.walk? If so, how?
Yes. You sort dirs in the loop.
def main_work_subdirs(gl):
    for root, dirs, files in os.walk(gl['pwd']):
        dirs.sort()
        if root == gl['pwd']:
            for d2i in dirs:
                print(d2i)
I know this has already been answered but I wanted to add one little detail and adding more than a single line of code in the comments is wonky.
In addition to wanting the directories sorted I also wanted the files sorted so that my iteration through "gl" was consistent and predictable. To do this one more sort was required:
for root, dirs, files in os.walk(gl['pwd']):
    dirs.sort()
    for filename in sorted(files):
        print(os.path.join(root, filename))
And, with benefit of learning more about Python, a different (better) way:
from pathlib import Path
# Directories, per original question.
[print(p) for p in sorted(Path(gl['pwd']).glob('**/*')) if p.is_dir()]
# Files, like I usually need.
[print(p) for p in sorted(Path(gl['pwd']).glob('**/*')) if p.is_file()]
This answer is not specific to this question and the problem is a little different but the solution can be used in either case.
Consider the files "one1.txt", "one2.txt" and "one10.txt", each containing the string "default".
I want to loop through the directory that contains these files, find a specific string in every file and replace it with the name of the file.
If you use the methods already mentioned here and in other questions (such as dirs.sort(), sorted(files) and sorted(dirs)), the result will be something like this:
"one1.txt"--> "one10"
"one2.txt"--> "one1"
"one10.txt" --> "one2"
But we want it to be:
"one1.txt"--> "one1"
"one2.txt"--> "one2"
"one10.txt" --> "one10"
I found this method, which processes the files in natural (human) order:
import re, os, fnmatch
def atoi(text):
    return int(text) if text.isdigit() else text
def natural_keys(text):
    '''
    alist.sort(key=natural_keys) sorts in human order
    http://nedbatchelder.com/blog/200712/human_sorting.html
    (See Toothy's implementation in the comments)
    '''
    # Raw string so that \d is not treated as a string escape.
    return [ atoi(c) for c in re.split(r'(\d+)', text) ]
def findReplace(directory, find, replace, filePattern):
    count = 0
    for path, dirs, files in sorted(os.walk(os.path.abspath(directory))):
        dirs.sort()
        for filename in sorted(fnmatch.filter(files, filePattern), key=natural_keys):
            count = count +1
            filepath = os.path.join(path, filename)
            with open(filepath) as f:
                s = f.read()
            s = s.replace(find, replace+str(count)+".png")
            with open(filepath, "w") as f:
                f.write(s)
Then run this line:
findReplace(os.getcwd(), "default", "one", "*.xml")
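To see in isolation why the key function matters, here is a small standalone sketch of the same idea applied to the three example names. natural_key is just a condensed, renamed version of natural_keys above.

```python
import re


def natural_key(text):
    """Split `text` into digit and non-digit runs, converting digit runs to int."""
    return [int(part) if part.isdigit() else part
            for part in re.split(r"(\d+)", text)]


names = ["one10.txt", "one1.txt", "one2.txt"]

# Plain string comparison goes character by character, so "10" sorts before "2".
lexicographic = sorted(names)            # ['one1.txt', 'one10.txt', 'one2.txt']

# With the key, the digit runs are compared as integers: 1 < 2 < 10.
natural = sorted(names, key=natural_key)  # ['one1.txt', 'one2.txt', 'one10.txt']
```

The same key can be passed to dirs.sort(key=natural_key) inside an os.walk loop if the directories are numbered as well.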
