I have a directory with many sub-directories, each of which follows the same naming convention: the day's date. Today a folder was made: 2021-04-22.
I occasionally need to go through these directories and read a file from one, but once I've read it I don't need to read it again.
import os

li = []
for root, dirs, files in os.walk(path):
    for f in files:
        li.append(f)
The list shows me the order in which the files are read, which appears to be alphabetical (alphanumeric?) order. I know the newest files are going to be towards the bottom because of the naming convention.
How can I start my for loop from the 'end' rather than the 'beginning'?
If this is possible, I'd then exit the loop when my criteria are met; otherwise, what would be the point of starting at the end?
EDIT: My original naming convention was mistyped. It is YYYY-MM-DD, thank you @null.
To reverse a sequence (such as the files list) in Python, wrap it in reversed().
In your code:
import os

li = []
for root, dirs, files in os.walk(path):
    for f in reversed(files):
        li.append(f)
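Note that os.walk itself gives no guaranteed ordering; the dirs and files lists come back in arbitrary, OS-dependent order. A minimal sketch that forces a newest-first walk by sorting explicitly (this works because YYYY-MM-DD names sort lexicographically in date order):

import os

li = []
for root, dirs, files in os.walk(path):
    # Sorting dirs in place steers the walk itself toward the newest
    # date-named directories first (os.walk honors in-place edits
    # when topdown=True, the default).
    dirs.sort(reverse=True)
    for f in sorted(files, reverse=True):
        li.append(f)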
Suppose you have this tree of directories:
.
├── 1
│   ├── a
│   │   ├── 03-01-2021
│   │   └── 04-22-2021
│   ├── b
│   │   └── 04-21-2021
│   └── c
├── 2
│   ├── a
│   │   └── 05-01-2020
│   ├── b
│   └── c
│       └── 01-01-1966
└── 3
    ├── a
    │   ├── 12-15-2001
    │   └── 12-15-2001_blah
    ├── b
    └── c
You can use pathlib with a recursive glob to get your directories, then use a regex to rewrite the date pattern into the ISO 8601 format YYYY-MM-DD and sort in reverse:
import re
from pathlib import Path

p = Path('/tmp/test/')
my_glob = '**/[0-9][0-9]-[0-9][0-9]-[0-9][0-9][0-9][0-9]*'
my_regex = r'.*/(\d{2})-(\d{2})-(\d{4}).*'

for pa in sorted(
        [pa for pa in p.glob(my_glob) if pa.is_dir()],
        key=lambda pa: re.sub(my_regex, r'\3-\1-\2', str(pa)), reverse=True):
    print(pa)
Prints:
/tmp/test/1/a/04-22-2021
/tmp/test/1/b/04-21-2021
/tmp/test/1/a/03-01-2021
/tmp/test/2/a/05-01-2020
/tmp/test/3/a/12-15-2001_blah
/tmp/test/3/a/12-15-2001
/tmp/test/2/c/01-01-1966
The **/ prefix makes the search recursive, and the rest of the pattern:
**/[0-9][0-9]-[0-9][0-9]-[0-9][0-9][0-9][0-9]*
will only return files and directories that match that naming convention. By adding the test if pa.is_dir() we are only looking at directories -- not files.
The regex:
my_regex = r'.*/(\d{2})-(\d{2})-(\d{4}).*'
re.sub(my_regex, r'\3-\1-\2', str(pa))
removes everything other than the date and rewrites MM-DD-YYYY as ISO 8601 YYYY-MM-DD for the key passed to sorted.
You also asked about the default order in which files are returned. Whatever order you observe is OS and implementation dependent, so don't rely on it -- sort explicitly if the order matters.
You updated the question to say your directories DO use the YYYY-MM-DD naming convention. If so, just change or remove the regex; the same basic method handles both.
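For instance, a minimal sketch of the YYYY-MM-DD variant, assuming the same /tmp/test layout: ISO 8601 names already sort chronologically as plain strings, so the directory name itself can serve as the key and the regex disappears.

from pathlib import Path

p = Path('/tmp/test/')
# YYYY-MM-DD directory names sort chronologically on their own
my_glob = '**/[0-9][0-9][0-9][0-9]-[0-9][0-9]-[0-9][0-9]*'

for pa in sorted((pa for pa in p.glob(my_glob) if pa.is_dir()),
                 key=lambda pa: pa.name, reverse=True):
    print(pa)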
Since files is a list, you can use extended slice notation to reverse it:
import os

li = []
for root, dirs, files in os.walk(path):
    for f in files[::-1]:
        li.append(f)
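As for stopping early once your criteria are met, a sketch along these lines should work (meets_criteria is a hypothetical stand-in for your own check):

import os

found = None
for root, dirs, files in os.walk(path):
    for f in files[::-1]:
        if meets_criteria(f):  # hypothetical predicate: your own test
            found = os.path.join(root, f)
            break
    if found:
        break  # stop walking once we have a hit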
I am using the following structure to separate my host_vars into plaintext and encrypted:
ansible
├── ansible.cfg
├── host_vars
│   ├── host1
│   │   ├── vars
│   │   └── vault
│   └── host2
│       ├── vars
│       └── vault
├── inventory
├── site.yaml
└── vars
    └── ansible_vars.yaml
Is there a way, using ansible-vault, to encrypt both files named vault, or do I have to do them one by one?
I'm asking since there are more to come, e.g. in future directories under group_vars etc.
I know this works:
ansible-vault encrypt host_vars/host1/vault host_vars/host2/vault
I'm just asking whether there is a more elegant/quicker solution.
There are a lot of possibilities offered by shell expansions.
Here are two that would be interesting in your case:
The asterisk * expansion, which is used as a wildcard.
This means that host_vars/*/vault would match both host_vars/host1/vault and host_vars/host2/vault, as well as any others in the future.
Mind that, if you later have a more complex folder hierarchy, host_vars/*/vault will only match one folder level (e.g. it won't match host_vars/level1/host1/vault). Matching multiple folder levels can be achieved with a double asterisk (actually named globstar): host_vars/**/vault will match
host_vars/host1/vault as well as host_vars/level1/host1/vault.
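Applied to your current layout, that single-level wildcard is all you need; something like:
ansible-vault encrypt host_vars/*/vault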
The brace expansion, on the other hand, offers a more granular set of possibilities. For example, if I have hosts named after distributions, like RedHat[1..5], Ubuntu[1..5] and Debian[1..5], I could target only the Ubuntu and RedHat ones via host_vars/{Ubuntu*,RedHat*}/vault.
Or target only the first three of each with host_vars/{Ubuntu{1..3},RedHat{1..3}}/vault, or the first three of them all via host_vars/*{1..3}/vault.
As a more practical example: if you were to handle SE via Ansible and wanted to encrypt the files for *.stackexchange.com and stackoverflow.com, but not superuser.com or any other Q&A site having a specific domain name, then, given that the hosts are named after their DNS names, you could do
ansible-vault encrypt host_vars/{stackoverflow.com,*.stackexchange.com}/vault
I will just throw in my quick, super simple shell script, which worked for my simple use case.
It can surely be improved, but I think it's a good starting point.
You could also use a secret file via the --vault-password-file parameter.
#!/bin/sh
echo "Choose the option:"
echo "(d) Decrypt all Ansible vault files"
echo "(e) Encrypt all Ansible vault files"
read option

# POSIX function definitions; each one handles every vault.yaml
# under the development environment's group_vars and host_vars.
decrypt() {
    ansible-vault decrypt --ask-vault-pass \
        ansible/*/environments/development/group_vars/*/vault.yaml \
        ansible/*/environments/development/host_vars/*/vault.yaml
}

encrypt() {
    ansible-vault encrypt --ask-vault-pass \
        ansible/*/environments/development/group_vars/*/vault.yaml \
        ansible/*/environments/development/host_vars/*/vault.yaml
}

case $option in
    d)
        decrypt
        ;;
    e)
        encrypt
        ;;
    *)
        echo "Wrong option"
        ;;
esac
I have a directory that contains a .sql file and multiple data files.
/home/barmar/test.dir
├── database.sql
├── table1.unl
├── table2.unl
├── table3.unl
├── table4.unl
├── table5.unl
└── table6.unl
The .sql file contains an unload instruction for every .unl file. The issue I have is that the names of the .unl files do not match the instructions in the .sql file.
Normally the names should be of the form TABLE_TABID.unl. I'm looking for a way to retrieve the names from the .sql file and rename the .unl files accordingly.
The .sql file contains multiple instructions; here's an example of the lines that contain the correct names:
{ unload file name = table1_89747.unl number of rows = 8376}
As you can see, the only thing in common is the table name (table1 in the example).
The expected result should be something like this:
/home/barmar/test.dir
├── database.sql
├── table1_89747.unl
├── table2_89765.unl
├── table3_89745.unl
├── table4_00047.unl
├── table5_00787.unl
└── table6_42538.unl
This sed line will generate commands to rename files like table1.unl to names like table1_89747.unl:
sed -n 's/.*name = \([^_]*\)\(_[^.]*\.unl\).*/mv '\''\1.unl'\'' '\''\1\2'\''/p' <database.sql
Assumptions: spaces exist around the = sign, and the filename is of the form FOO_BAR.unl, i.e. the underscore character and the extension are always present.
Sample output:
$ echo '{ unload file name = table1_89747.unl number of rows = 8376}' | sed -n 's/.*name = \([^_]*\)\(_[^.]*\.unl\).*/mv '\''\1.unl'\'' '\''\1\2'\''/p'
mv 'table1.unl' 'table1_89747.unl'
To generate and execute the commands:
eval $(sed -n 's/.*name = \([^_]*\)\(_[^.]*\.unl\).*/mv '\''\1.unl'\'' '\''\1\2'\'';/p' <database.sql | tr -d '\n')
It goes without saying: before running this, make sure your database.sql doesn't contain malicious strings that could lead to renaming files outside the current directory.
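If you'd rather not eval generated text directly, a more cautious variant (rename.sh is an arbitrary name) is to write the commands to a file, inspect them, and only then execute:
sed -n 's/.*name = \([^_]*\)\(_[^.]*\.unl\).*/mv '\''\1.unl'\'' '\''\1\2'\''/p' <database.sql > rename.sh
cat rename.sh    # review the generated mv commands first
sh rename.sh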
I would like to use rsync to sync only the folder f itself, without its contents, between directories a and b:
> rsync -zaSH --delete -vv --delete -f"+ f/" -f"- f/*" a/ b
sending incremental file list
[sender] showing directory f because of pattern f/
[sender] hiding file f/1.data because of pattern f/*
[sender] hiding file f/4.data because of pattern f/*
[sender] hiding file f/8.data because of pattern f/*
[sender] hiding file f/10.data because of pattern f/*
[sender] hiding file f/6.data because of pattern f/*
[sender] hiding file f/3.data because of pattern f/*
[sender] hiding file f/5.data because of pattern f/*
[sender] hiding file f/9.data because of pattern f/*
[sender] hiding file f/7.data because of pattern f/*
[sender] hiding file f/2.data because of pattern f/*
delta-transmission disabled for local transfer or --whole-file
f/
total: matches=0 hash_hits=0 false_alarms=0 data=0
sent 88 bytes received 90 bytes 356.00 bytes/sec
total size is 0 speedup is 0.00
> tree
.
├── a
│   └── f
│       ├── 10.data
│       ├── 1.data
│       ├── 2.data
│       ├── 3.data
│       ├── 4.data
│       ├── 5.data
│       ├── 6.data
│       ├── 7.data
│       ├── 8.data
│       └── 9.data
└── b
    └── f
As can be seen from the verbose output, the files were still scanned. If there are millions of files in the folder, that could take very long.
Is there a way to achieve the same goal without scanning the files? Thanks.
You can't stop it scanning the files directly inside a/f/, but it shouldn't scan any subdirectories of those unless they are also called f.
This command will exclude those too:
rsync -aSH --delete -vv --delete -f"+ /f/" -f"- /f/*" a/ b/
Prefixing a pattern with / anchors it at the root of the transfer (the source directory).
(I also removed -z because compression makes no sense for local copies.)
I would like to recursively search through "project" directories for a "Feedback Report" folder, and if that folder has no further sub-directories, process its files in a particular manner.
After we have reached the target directory, I want to find the latest feedback report.xlsx in that directory (which will contain many previous versions of it).
The data is really huge and inconsistent in its directory structure. I believe the following algorithm should bring me close to my desired behavior, but I'm still not sure. I have tried multiple quick-and-dirty scripts that convert the tree into a JSON path hierarchy and then parse it, but the inconsistency makes the code really long and unreadable.
The path of the file is important.
My algorithm that I would like to implement is:
import os
import re
import time

dictionary_of_files_paths = {}
cutoff_2016 = time.mktime((2016, 1, 1, 0, 0, 0, 0, 0, -1))  # only files modified after 2016

def recursive_traverse(path):
    # not sure if this is the right base case
    if os.path.isdir(path):
        entries = [os.path.join(path, c) for c in os.listdir(path)]
        if re.search(r'eedback.*port', os.path.basename(path)) and not any(os.path.isdir(e) for e in entries):
            process(path, entries)
            return
        for entry in entries:
            recursive_traverse(entry)
    return

def process(path, files):
    # keep only .xlsx files whose names look like feedback reports
    files = [f for f in files if f.endswith('.xlsx')]
    files = [f for f in files if re.search(r'eedback.*port', os.path.basename(f))]
    files = [f for f in files if os.path.getmtime(f) > cutoff_2016]
    files.sort(key=os.path.getmtime, reverse=True)
    if files:
        dictionary_of_files_paths[path] = files[0]

recursive_traverse("T:\\Something\\Something\\Projects")
I need guidance before I actually implement this and want to validate whether it is correct.
There is another snippet for path hierarchy that I got from Stack Overflow, which is:
try:
    for contents in os.listdir(path):
        recursive_traverse(os.path.join(path, contents))
except OSError as e:
    if e.errno != errno.ENOTDIR:
        raise
    # path is a file
Use pathlib and glob.
Test directory structure:
.
├── Untitled.ipynb
├── bar
│ └── foo
│ └── file2.txt
└── foo
├── bar
│ └── file3.txt
├── foo
│ └── file1.txt
└── test4.txt
Code:
from pathlib import Path

here = Path('.')
for subpath in here.glob('**/foo/'):
    if any(child.is_dir() for child in subpath.iterdir()):
        continue  # skip the current path if it has child directories
    for file in subpath.iterdir():
        print(file.name)
        # process your files here according to whatever logic you need
Output:
file1.txt
file2.txt
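To then pick only the latest feedback report per directory, as the original question asks, one possible extension of the same approach (assuming the target folders match the glob below and that "latest" means newest modification time):

from pathlib import Path

here = Path('.')
for subpath in here.glob('**/*eedback*port*'):
    if not subpath.is_dir():
        continue
    if any(child.is_dir() for child in subpath.iterdir()):
        continue  # only leaf "Feedback Report" directories
    # newest .xlsx by modification time, if any exist
    reports = sorted(subpath.glob('*.xlsx'), key=lambda f: f.stat().st_mtime)
    if reports:
        print(subpath, '->', reports[-1].name)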
In my package.json, I have a scripts block that uses **/*Test.js to match files. When run via npm, the pattern does not match sub-directories more than one level deep. When executed on the command line directly, it works as expected.
Can anyone explain what is happening, and provide a workaround or solution?
package.json
{
  "name": "immutable-ts",
  "scripts": {
    "test": "echo mocha dist/**/*Test.js"
  }
}
Execution
% npm run test
> immutable-ts@0.0.0 test:unit .../immutable-ts
> echo mocha dist/**/*Test.js
mocha dist/queue/QueueTest.js dist/stack/StackTest.js
% echo mocha dist/**/*Test.js
mocha dist/queue/QueueTest.js dist/stack/StackTest.js dist/tree/binary/BinaryTreeTest.js
% ls dist/**/*
dist/collections.js dist/queue/QueueTest.js dist/tree/binary/BinaryTree.js dist/immutable.js.map dist/stack/Stack.js.map dist/tree/binary/BinaryTreeTest.js.map
dist/immutable.js dist/stack/Stack.js dist/tree/binary/BinaryTreeTest.js dist/queue/Queue.js.map dist/stack/StackTest.js.map
dist/queue/Queue.js dist/stack/StackTest.js dist/collections.js.map dist/queue/QueueTest.js.map dist/tree/binary/BinaryTree.js.map
Solution
Change your scripts so that what you pass to Mocha is protected from expansion by the shell:
"scripts": {
"test": "mocha 'dist/**/*Test.js'",
}
Note the single quotes around the parameter given to mocha.
Explanation
This issue is fixable without resorting to external tools. The root cause of your problem is that npm uses sh as the shell that runs your script commands.
It is overwhelmingly the case that when a *nix process starts a shell, it will start sh unless something tells it to do otherwise. The shell preference you set for logins does not constitute a way to "tell it otherwise". So if you have, say, zsh as your login shell, it does not follow that npm will use zsh.
Implementations of sh that do not include extensions beyond what sh should provide do not understand the ** glob the way you want. As far as I can tell, it is interpreted as *. However, Mocha interprets the paths passed to it using a JavaScript implementation of globs, so you can work around the issue by protecting your globs from being interpreted by sh. Consider the following package.json:
{
  "name": "immutable-ts",
  "scripts": {
    "bad": "mocha test/**/*a.js",
    "good": "mocha 'test/**/*a.js'",
    "shell": "echo $0"
  }
}
The shell script entry is just there so we can check which shell is running the scripts. If you run it, you should see sh.
Now, given the following tree:
test/
├── a.js
├── b.js
├── x
│   ├── a
│   │   ├── a.js
│   │   └── b.js
│   ├── a.js
│   └── b
│       └── a.js
└── y
    ├── a.js
    └── q
With all the a.js and b.js files containing it(__filename);, you get the following results:
$ npm run bad
> immutable-ts@ bad /tmp/t2
> mocha test/**/*a.js
- /tmp/t2/test/x/a.js
- /tmp/t2/test/y/a.js
0 passing (6ms)
2 pending
$ npm run good
> immutable-ts@ good /tmp/t2
> mocha 'test/**/*a.js'
- /tmp/t2/test/a.js
- /tmp/t2/test/x/a.js
- /tmp/t2/test/x/a/a.js
- /tmp/t2/test/x/b/a.js
- /tmp/t2/test/y/a.js
0 passing (5ms)
5 pending
You can inline the find command with the -name option in your scripts to replace the extended globbing syntax provided by zsh.
In your case, the command would be:
mocha `find dist -type f -name '*Test.js'`
You can realistically omit the -type f part if you're confident that you won't ever put "Test.js" in a directory name. (A safe assumption, most likely, but I included it for completeness' sake.)
The glob expansion is actually done by your shell, and that's why it works from the command line.
Alternatively, you can use mocha --recursive and point it at your test directory.
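For example, something along these lines (the dist/ path is assumed from your layout):
"scripts": {
  "test": "mocha --recursive dist/"
}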