rsync only folder without files inside and without scanning files - linux

I would like to use rsync to sync only the folder itself (not the files inside it) between a and b:
> rsync -zaSH --delete -vv --delete -f"+ f/" -f"- f/*" a/ b
sending incremental file list
[sender] showing directory f because of pattern f/
[sender] hiding file f/1.data because of pattern f/*
[sender] hiding file f/4.data because of pattern f/*
[sender] hiding file f/8.data because of pattern f/*
[sender] hiding file f/10.data because of pattern f/*
[sender] hiding file f/6.data because of pattern f/*
[sender] hiding file f/3.data because of pattern f/*
[sender] hiding file f/5.data because of pattern f/*
[sender] hiding file f/9.data because of pattern f/*
[sender] hiding file f/7.data because of pattern f/*
[sender] hiding file f/2.data because of pattern f/*
delta-transmission disabled for local transfer or --whole-file
f/
total: matches=0 hash_hits=0 false_alarms=0 data=0
sent 88 bytes received 90 bytes 356.00 bytes/sec
total size is 0 speedup is 0.00
> tree
.
├── a
│   └── f
│       ├── 10.data
│       ├── 1.data
│       ├── 2.data
│       ├── 3.data
│       ├── 4.data
│       ├── 5.data
│       ├── 6.data
│       ├── 7.data
│       ├── 8.data
│       └── 9.data
└── b
    └── f
As can be seen from the verbose output, the files were still scanned. If there are millions of files in the folder, that could take a very long time.
Is there a way to achieve the same goal without scanning files? Thanks.

You can't stop it from scanning the files directly inside a/f/, but it shouldn't scan any subdirectories of those unless they are also called f.
This command will exclude those too:
rsync -aSH --delete -vv -f"+ /f/" -f"- /f/*" a/ b/
Prefixing a pattern with / anchors it at the root of the transfer, so it only matches the top-level f.
(I also removed -z because compression makes no sense for local copies.)

Related

Is it possible to move files within a zip file? [closed]

example.zip/
└── example/
    ├── nice.md
    ├── tree.md
    └── diagram.md
Expected:
example.zip/
├── nice.md
├── tree.md
└── diagram.md
example.zip contains a folder with the same name. In it are files that I want to move to the root of the zip file and remove the empty directory.
I looked at the zip man page but could not find any flags related to the issue, or I could be missing something.
I tried the --copy-entries flag. This creates a new zip with selected files from the existing zip, but it also copies over the folder hierarchy.
zip example.zip "*.md" --copy-entries --out example1.zip
I am trying to write a shell script to do this.
Is it possible to do without extracting the zip?
If you have (or can install) 7z (aka p7zip), you can make use of the d (delete) and rn (rename) commands, e.g.:
$ mkdir example
$ touch example/{nice.md,tree.md,diagram.md}
$ zip -r example.zip example
adding: example/ (stored 0%)
adding: example/diagram.md (stored 0%)
adding: example/nice.md (stored 0%)
adding: example/tree.md (stored 0%)
$ unzip -l example.zip
Archive:  example.zip
  Length      Date    Time    Name
---------  ---------- -----   ----
        0  09-15-2022 09:29   example/
        0  09-15-2022 09:29   example/diagram.md
        0  09-15-2022 09:29   example/nice.md
        0  09-15-2022 09:29   example/tree.md
---------                     -------
        0                     4 files
# rename the *.md files first and then delete the directory; if you delete
# the directory first you'll lose all files under the directory; the 7z d/rn
# commands will generate a lot of output (not shown here)
$ 7z rn example.zip example/nice.md nice.md
$ 7z rn example.zip example/tree.md tree.md
$ 7z rn example.zip example/diagram.md diagram.md
$ 7z d example.zip example
$ unzip -l example.zip
Archive:  example.zip
  Length      Date    Time    Name
---------  ---------- -----   ----
        0  09-15-2022 09:29   diagram.md
        0  09-15-2022 09:29   nice.md
        0  09-15-2022 09:29   tree.md
---------                     -------
        0                     3 files
$ unzip example.zip
Archive: example.zip
extracting: diagram.md
extracting: nice.md
extracting: tree.md
I'm guessing that in OP's real-life example the names of the directories and/or files may not be known in advance; the 7z commands do work with bash variables (e.g., 7z d "${zipfile}" "${dir_to_delete}"); if OP has issues dynamically processing the contents of a given zip, then I'd recommend asking a new question ...
For a large number of renames (or deletes) it looks like you can also:
specify multiple source/destination pairs on the single command line
use a list file
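The multiple-pairs form can be sketched like this (assuming 7z and zip are installed; the names follow the example above, and the src/dst pairs are positional):

```shell
tmp=$(mktemp -d); cd "$tmp"
mkdir example
touch example/nice.md example/tree.md
zip -qr example.zip example
# rename both entries in a single invocation: src1 dst1 src2 dst2 ...
7z rn example.zip example/nice.md nice.md example/tree.md tree.md >/dev/null
7z d example.zip example >/dev/null
unzip -Z1 example.zip    # both .md entries are now at the archive root
```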
Good answer. Just to be clear, 7z does not do an in-place edit on the zip file when it does the rename/delete. Under the hood it copies the old zip into a temporary file (example.zip.tmp in this instance), renaming and deleting as it does that copy. Then it deletes the original zip file and renames the temporary file, example.zip.tmp, back to the original filename, example.zip. For the most part this is a perfectly acceptable (and safe) approach.
Here are the relevant lines from an strace run that show the deletion of the original example.zip file, followed by the renaming of example.zip.tmp to example.zip:
$ strace 7z rn example.zip example/tree.md tree.md
...
unlink("example.zip") = 0
rename("example.zip.tmp", "example.zip") = 0
...
Main edge condition of this approach is with very large zip files where you are strapped for disk space -- you need to have space available to store the zip file twice when it creates the temporary copy.

use ansible-vault to encrypt multiple files at once

I am using the following structure to separate my host_vars into plaintext and encrypted
ansible
├── ansible.cfg
├── host_vars
│   ├── host1
│   │   ├── vars
│   │   └── vault
│   └── host2
│       ├── vars
│       └── vault
├── inventory
├── site.yaml
└── vars
    └── ansible_vars.yaml
Is there a way, using ansible-vault, to encrypt both files named vault, or do I have to do them one by one?
Just asking since there are more to come, e.g. in future group_vars directories etc.
I know this works:
ansible-vault encrypt host_vars/host1/vault host_vars/host2/vault
I'm just asking whether there is a more elegant / quicker solution.
There are a lot of possibilities given by shell expansions.
Here are two that would be interesting in your case:
The asterisk * expansion, which is used as a wildcard.
This means that host_vars/*/vault would match both host_vars/host1/vault and host_vars/host2/vault, but also any others added in the future.
Mind that, if you later have a more complex folder hierarchy, host_vars/*/vault will only match one folder level (e.g. it won't match host_vars/level1/host1/vault). Multiple folder levels can be matched with a double asterisk (actually named globstar): host_vars/**/vault will match
host_vars/host1/vault as well as host_vars/level1/host1/vault.
The brace expansion, on the other hand, offers a more granular set of possibilities. For example, if I have hosts named after distributions, like RedHat[1..5], Ubuntu[1..5] and Debian[1..5], I could target only the Ubuntu and RedHat ones via host_vars/{Ubuntu*,RedHat*}/vault,
or target only the first three of each with host_vars/{Ubuntu{1..3},RedHat{1..3}}/vault, or the first three of them all via host_vars/*{1..3}/vault.
As a more practical example, if you were to handle SE via Ansible and wanted to encrypt the files for *.stackexchange.com and stackoverflow.com, but not superuser.com or any other Q&A site with its own domain name, then, given that the hosts are named after their DNS names, you could do
ansible-vault encrypt host_vars/{stackoverflow.com,*.stackexchange.com}/vault
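Since these are plain shell expansions, you can preview what each pattern matches with echo before handing it to ansible-vault. A small bash sketch (the directory names are illustrative, not from the question):

```shell
tmp=$(mktemp -d); cd "$tmp"
mkdir -p host_vars/{host1,host2,level1/host3}
touch host_vars/host1/vault host_vars/host2/vault host_vars/level1/host3/vault
echo host_vars/*/vault              # one level only: host1 and host2
shopt -s globstar                   # bash >= 4: enable ** for multi-level matches
echo host_vars/**/vault             # now host3 is matched too
echo host_vars/{host1,host2}/vault  # brace expansion: explicit list, no globbing
```

Note that brace expansion happens regardless of whether the paths exist, while globs only expand to paths that are actually there.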
I will just throw in my quick and simple shell script, which worked for my simple use case.
It can surely be improved, but I think it's a good starting point.
You could also use a secret file via the --vault-password-file parameter.
#!/bin/bash
echo "Choose the option:"
echo "(d) Decrypt all Ansible vault files"
echo "(e) Encrypt all Ansible vault files"
read -r option

decrypt() {
    ansible-vault decrypt --ask-vault-pass \
        ansible/*/environments/development/group_vars/*/vault.yaml \
        ansible/*/environments/development/host_vars/*/vault.yaml
}

encrypt() {
    ansible-vault encrypt --ask-vault-pass \
        ansible/*/environments/development/group_vars/*/vault.yaml \
        ansible/*/environments/development/host_vars/*/vault.yaml
}

case $option in
    d)
        decrypt
        ;;
    e)
        encrypt
        ;;
    *)
        echo "Wrong option"
        ;;
esac

Renaming files in a directory based on an instructions file

I have a directory that contains a .sql file and multiple data files.
/home/barmar/test.dir
├── database.sql
├── table1.unl
├── table2.unl
├── table3.unl
├── table4.unl
├── table5.unl
└── table6.unl
The .sql file contains an unload instruction for every .unl file. The issue I have is that the names of the .unl files do not match the instructions in the .sql file.
Usually the name should be TABLE_TABID.unl. I'm looking for a way to retrieve the names from the .sql file and rename the .unl files correctly.
The .sql file contains multiple instructions here's an example of the lines that contain the correct names.
{ unload file name = table1_89747.unl number of rows = 8376}
As you can see the only thing in common is the table name (table1) in the example
The expected result should be something like that:
/home/barmar/test.dir
├── database.sql
├── table1_89747.unl
├── table2_89765.unl
├── table3_89745.unl
├── table4_00047.unl
├── table5_00787.unl
└── table6_42538.unl
This sed line will generate commands to rename files like table1.unl to names like table1_89747.unl:
sed -n 's/.*name = \([^_]*\)\(_[^.]*\.unl\).*/mv '\''\1.unl'\'' '\''\1\2'\''/p' <database.sql
Assumptions: spaces exist around the = sign, and the filename is of the form FOO_BAR.unl, i.e. the underscore character and the extension are always present.
Sample output:
$ echo '{ unload file name = table1_89747.unl number of rows = 8376}' | sed -n 's/.*name = \([^_]*\)\(_[^.]*\.unl\).*/mv '\''\1.unl'\'' '\''\1\2'\''/p'
mv 'table1.unl' 'table1_89747.unl'
To generate and execute the commands:
eval $(sed -n 's/.*name = \([^_]*\)\(_[^.]*\.unl\).*/mv '\''\1.unl'\'' '\''\1\2'\'';/p' <database.sql | tr -d '\n')
Goes without saying, before running this make sure your database.sql doesn't have malicious strings that could lead to renaming files outside the current directory.
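If the eval makes you nervous, the same sed output can instead be fed to a while read loop, which never re-parses the text as shell code (a sketch; the sample line mirrors the question's format, and it assumes the filenames contain no whitespace):

```shell
tmp=$(mktemp -d); cd "$tmp"
# sample setup mirroring the question's layout
echo '{ unload file name = table1_89747.unl number of rows = 8376}' > database.sql
touch table1.unl
# emit "oldname newname" pairs, then rename one pair per line
sed -n 's/.*name = \([^_]*\)\(_[^.]*\.unl\).*/\1.unl \1\2/p' database.sql |
while read -r old new; do
    mv -- "$old" "$new"
done
```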

Loop over a directory, but starting from the end?

I have a directory with many sub-directories, each of which follow the same naming convention; the day's date. Today a folder was made: 2021-04-22
I occasionally need to go through these directories and read a file from one, but once I've read it I don't need to again.
li = []
for root, dirs, files in os.walk(path):
for f in files:
li.append(f)
The list shows me the order the files are read, which is an alphabetic(numeric?) order. I know the newest files are going to be towards the bottom because of the naming convention.
How can I start my for loop from the 'end' rather than the 'beginning'?
If this is possible, I'd then exit the loop when my criteria are met, else, what would be the point of starting at the end?
EDIT: My original naming convention was mistyped. It is YYYY-MM-DD thank you #null
To reverse a sequence in Python, wrap it in reversed().
In your code:
li = []
for root, dirs, files in os.walk(path):
    for f in reversed(files):
        li.append(f)
Suppose you have this tree of directories:
.
├── 1
│   ├── a
│   │   ├── 03-01-2021
│   │   └── 04-22-2021
│   ├── b
│   │   └── 04-21-2021
│   └── c
├── 2
│   ├── a
│   │   └── 05-01-2020
│   ├── b
│   └── c
│       └── 01-01-1966
└── 3
    ├── a
    │   ├── 12-15-2001
    │   └── 12-15-2001_blah
    ├── b
    └── c
You can use pathlib with a recursive glob to get your directories. Then use a regex to reverse the date pattern to ISO 8601 format of YYYY-MM-DD and sort in reverse fashion:
import re
from pathlib import Path

p = Path('/tmp/test/')
my_glob = '**/[0-9][0-9]-[0-9][0-9]-[0-9][0-9][0-9][0-9]*'
my_regex = r'.*/(\d{2})-(\d{2})-(\d{4}).*'

for pa in sorted(
        [pa for pa in p.glob(my_glob) if pa.is_dir()],
        key=lambda pa: re.sub(my_regex, r'\3-\2-\1', str(pa)),
        reverse=True):
    print(pa)
Prints:
/tmp/test/1/a/04-22-2021
/tmp/test/1/b/04-21-2021
/tmp/test/1/a/03-01-2021
/tmp/test/2/a/05-01-2020
/tmp/test/3/a/12-15-2001_blah
/tmp/test/3/a/12-15-2001
/tmp/test/2/c/01-01-1966
The **/ prefix makes the search recursive, and the pattern:
**/[0-9][0-9]-[0-9][0-9]-[0-9][0-9][0-9][0-9]*
only returns files and directories that match that naming pattern. By adding the test if pa.is_dir() we are only looking at directories -- not files.
The regex:
my_regex = r'.*/(\d{2})-(\d{2})-(\d{4}).*'
re.sub(my_regex, r'\3-\2-\1', str(pa))
removes everything other than the date and rearranges it into ISO 8601 order for the key passed to sorted.
You asked about the default order files are returned in: os.walk does not sort its results; it yields entries in whatever order os.scandir returns them, which is OS and filesystem dependent, so you should not rely on it.
You updated the question to say your files DO have the YYYY-MM-DD naming convention. If so, just change or remove the regex; the same basic method handles both.
Since files is a list, you can use extended list slicing to reverse the list:
li = []
for root, dirs, files in os.walk(path):
    for f in files[::-1]:
        li.append(f)

Recursive code to traverse through directories in python and filter files

I would like to recursively search through "project" directories for a "Feedback Report" folder, and if that folder has no more subdirectories, I would like to process its files in a particular manner.
After reaching the target directory, I want to find the latest feedback report.xlsx in that directory (which will contain many previous versions of it).
The data is really huge and its directory structure is inconsistent. I believe the following algorithm should bring me close to my desired behavior, but I'm still not sure. I have tried multiple scrappy scripts that convert the tree into a JSON path hierarchy and parse it, but the inconsistency makes the code really long and unreadable.
The path of the file is important.
My algorithm that I would like to implement is:
dictionary_of_files_paths = {}

def recursive_traverse(path):
    # not sure if this is the right base case
    if path.isdir:
        if re.match(dir_name, *eedback*port*) and dir has no sub directory:
            process(path, files)
            return
    for contents in os.listdir(path):
        recursive_traverse(os.path.join(path, contents))
    return

def process(path, files):
    files.filter(filter files only with xlsx)
    files.filter(filter files only that have *eedback*port* in it)
    files.filter(os.path.getmtime > 2016)
    files.sort(key=lambda x: os.path.getmtime(x))
    reversed(files)
    dictionary_of_files_paths[path] = files[0]

recursive_traverse("T:\\Something\\Something\\Projects")
I need guidance before I actually implement this, and I would like to validate that the approach is correct.
There is another snippet for the path hierarchy that I got from Stack Overflow:
try:
    for contents in os.listdir(path):
        recursive_traverse(os.path.join(path, contents))
except OSError as e:
    if e.errno != errno.ENOTDIR:
        raise
    # file
Use pathlib and glob.
Test directory structure:
.
├── Untitled.ipynb
├── bar
│   └── foo
│       └── file2.txt
└── foo
    ├── bar
    │   └── file3.txt
    ├── foo
    │   └── file1.txt
    └── test4.txt
Code:
from pathlib import Path

here = Path('.')
for subpath in here.glob('**/foo/'):
    if any(child.is_dir() for child in subpath.iterdir()):
        continue  # skip the current path if it has child directories
    for file in subpath.iterdir():
        print(file.name)
        # process your files here according to whatever logic you need
Output:
file1.txt
file2.txt
