rsync copy over only certain types of files using include option - linux

I use the following bash script to copy only files with a certain extension (in this case *.sh); however, it still copies over all the files. What's wrong?
from=$1
to=$2
rsync -zarv --include="*.sh" $from $to

I think --include is used to include a subset of files that are otherwise excluded by --exclude, rather than including only those files.
In other words: you have to think about include meaning don't exclude.
Try instead:
rsync -zarv --include "*/" --exclude="*" --include="*.sh" "$from" "$to"
For rsync version 3.0.6 or higher, the order needs to be modified as follows (see comments):
rsync -zarv --include="*/" --include="*.sh" --exclude="*" "$from" "$to"
Adding the -m flag will avoid creating empty directory structures in the destination. Tested in version 3.1.2.
So if we only want *.sh files, we have to exclude all files (--exclude="*"), include all directories (--include="*/"), and include all *.sh files (--include="*.sh").
You can find some good examples in the section Include/Exclude Pattern Rules of the man page.
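To sanity-check a rule set before copying anything, a dry run is handy (a minimal sketch; the src/ and dst/ paths are hypothetical):
rsync -zarvn --include="*/" --include="*.sh" --exclude="*" src/ dst/
The -n (--dry-run) flag makes rsync list what it would transfer without actually copying anything.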

The answer by @chepner will copy all the sub-directories whether they contain files or not. If you need to exclude the sub-directories that don't contain matching files while still retaining the directory structure, use
rsync -zarv --prune-empty-dirs --include "*/" --include="*.sh" --exclude="*" "$from" "$to"

Here's the important part from the man page:
As the list of files/directories to transfer is built, rsync checks each name to be transferred against the list of include/exclude patterns in turn, and the first matching pattern is acted on: if it is an exclude pattern, then that file is skipped; if it is an include pattern then that filename is not skipped; if no matching pattern is found, then the filename is not skipped.
To summarize:
Not matching any pattern means a file will be copied!
The algorithm quits once any pattern matches
Also, a pattern ending with a slash matches only directories (as find -type d would).
Let's pull apart this answer from above.
rsync -zarv --prune-empty-dirs --include "*/" --include="*.sh" --exclude="*" "$from" "$to"
Don't skip any directories
Don't skip any .sh files
Skip everything
(Implicitly, don't skip anything, but the rule above prevents the default rule from ever happening.)
Finally, the --prune-empty-dirs option keeps the first rule from making empty directories all over the place.
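To see the first-match-wins behavior in action, compare a deliberately wrong ordering (a sketch; src/ and dst/ are hypothetical paths):
rsync -zarvn --exclude="*" --include="*/" --include="*.sh" src/ dst/
Here every name hits --exclude="*" first, so the two includes are never consulted and nothing is transferred.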

One more addition: if you need to sync files by their extension in one directory only (without recursion), you should use a construction like this:
rsync -auzv --include './' --include '*.ext' --exclude '*' /source/dir/ /destination/dir/
Note the dot in the first --include; --no-r does not work in this construction.
EDIT:
Thanks to gbyte.co for the valuable comment!
EDIT:
The -uzv flags are not related to this question directly, but I included them because I usually use them.

I wrote this handy function and put it in my bash scripts or ~/.bash_aliases. Tested syncing locally on Linux with bash and awk installed. It works.
selrsync(){
    # Selective rsync to sync only certain filetypes;
    # based on: https://stackoverflow.com/a/11111793/588867
    # Example: selrsync 'tsv,csv' ./source ./target --dry-run
    types="$1"; shift # accepts a comma-separated list of types; must be the first argument
    # Turn each extension into an --include=*.ext option.
    includes=$(echo "$types" | awk -F',' \
        '{ for (i = 1; i <= NF; i++) if (length($i) > 0) $i = "--include=*." $i; print }')
    # $includes is deliberately unquoted so it splits into separate options;
    # the remaining arguments ("$@") are passed straight through to rsync.
    echo Command: rsync -avz --prune-empty-dirs --include="*/" $includes --exclude="*" "$@"
    rsync -avz --prune-empty-dirs --include="*/" $includes --exclude="*" "$@"
}
Advantages:
Short, handy, and extensible when one wants to add more arguments (e.g. --dry-run).
Example:
selrsync 'tsv,csv' ./source ./target --dry-run

If someone is looking for this…
I wanted to rsync only specific files and folders and managed to do it with this command: rsync --include-from=rsync-files
With rsync-files:
my-dir/
my-file.txt
- /*
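The command above omits the source and destination; a complete invocation might look like this (a sketch, with hypothetical paths and the -av flags assumed):
rsync -av --include-from=rsync-files ./source/ ./dest/
In the filter file, lines without a prefix are treated as include patterns, while the leading "- " on the last line turns /* into an exclude, so everything not listed is skipped.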

Related

rsync link_stat No such file or directory when using wildcard

I'm trying to copy all XML files whose name start with foo
rsync /source/dir/foo*.xml /dest/dir
If there aren't any files matching this pattern rsync throws error:
rsync: link_stat "/source/dir/foo*.xml" failed: No such file or directory (2)
Should I care about this error? Is there a way to suppress it? If there's at least one file matching the pattern then the command runs without errors.
There is a bash setting to avoid this:
shopt -s failglob
From the man page:
failglob
If set, patterns which fail to match filenames during pathname expansion result in an expansion error.
Otherwise, you can use an if to just not run the rsync when there's nothing to do.
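For example, a guard like this only invokes rsync when the glob actually expands to something (a sketch; paths are hypothetical):
shopt -s nullglob
files=(/source/dir/foo*.xml)
if (( ${#files[@]} > 0 )); then
    rsync "${files[@]}" /dest/dir
fi
With nullglob set, a non-matching pattern expands to nothing instead of being passed through literally, so the array-length test is reliable.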
If you really do want to sync nothing, so that it deletes files that don't exist, then the command might be like this:
rsync --include 'foo*.xml' \
--exclude '**' \
--delete \
/source/dir/ /dest/dir
The trailing slash is significant.

find returning inverted results

In a few words, I wrote this little script to clean up some directories where I had consolidated directories/files from multiple sources. I had used the cp command with the --backup=numbered feature so that files with identical names would get a suffix like .~1~ appended instead of being overwritten. I then ran fdupes to remove duplicate files; in some cases fdupes removed the file which did not have the suffix appended by cp (the original file). So I wanted to scan the directories for files with the suffix appended by cp and, if no file exists with the suffix removed, mv the file to that name; otherwise I would leave it alone, to avoid deleting anything fdupes did not consider a duplicate.
The issue is that the test condition (the if [ -f ... ] part of the code below) returns results inverted from what it should, and I cannot understand why. For example, when the file exists it returns false, and when the file does not exist it returns true. I fixed it by reversing the actions based on the inverted return code, verified it was working as intended, and ran it that way, but I would like to know why it behaves as it does. I am not a bash script expert by any means, so it's possible I missed something simple.
#!/bin/bash
logfile=$$.log
exec > $logfile 2>&1
IFS='
'
#set -f
for FILE in $(find . -type f -regextype posix-extended -regex '^.*(\.~[0-9]+~)+$')
do
    FILE2=${FILE%%.~[0-9]*} # remove the suffix
    if [ -f "${FILE2}" ]
    then
        echo ERROR: "${FILE2}" already exists!
    else
        echo "${FILE}" renamed "${FILE2}"
        mv "${FILE}" "${FILE2}"
    fi
done
You might be able to see the problem by modifying your script to show both FILE and FILE2 in the error message. There are a few minor problems with the script which could cause some confusion (but not the "inverted" logic):
find output is not sorted. If you had more than one backup file, a randomly chosen one would replace the original file;
you could sort the output using an expression like |sort -t~ -n -k2 on the end of the find-command.
the regular expression allows multiple matches of the .~[0-9]+~ pattern. Conceivably you could have some odd file which ends with .~1~.~2~.
the part where the suffix is removed assumes a single .~[0-9]+~ is on the end of the filename. An embedded .~0, e.g., foo.~0bar.~1~, would reduce FILE2 to foo. The workaround for that would be more cumbersome (since the suffix-stripping uses globbing), but could be done with a case statement which matches an explicit number of digits (likely three digits would be enough), as sketched below.
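A sketch of that case-statement workaround (assuming three digits is enough; this code is not from the original answer):
case "$FILE" in
*.~[0-9]~ | *.~[0-9][0-9]~ | *.~[0-9][0-9][0-9]~)
    FILE2=${FILE%.~*~} # a single % strips only the final .~N~ suffix
    ;;
esac
Using % (shortest match) instead of %% avoids eating into an embedded .~0 earlier in the name.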

How to find, copy and rename files in Linux?

I am trying to find all files in a directory and its sub-directories and then copy them to a different directory. However, some of them have the same name, so I need to copy the files over and then, if two files have the same name, rename one of them.
So far I have managed to copy all found files with a unique name over using:
#!/bin/bash
if [ ! -e "$2" ] ; then
    mkdir "$2"
    echo "Directory created"
fi
if [ ! -e "$1" ] ; then
    echo "image source does not exist"
fi
find "$1" -name 'IMG_****.JPG' -exec cp {} "$2" \;
However, I now need some sort of if statement to figure out if a file has the same name as another file that has been copied.
Since you are on Linux, you are probably using cp from coreutils. If that is the case, let it do the backup for you by using cp --backup=t.
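For instance, combining it with the find from the question (a sketch):
find "$1" -name 'IMG_*.JPG' -exec cp --backup=t {} "$2" \;
With --backup=t (numbered backups), an existing destination file is renamed to, e.g., IMG_0001.JPG.~1~ before the new file is copied in, so nothing is overwritten.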
Try this approach: put the list of files in a variable and copy each file, checking whether the copy operation succeeds. If not, try a different name.
In code:
FILES=$(find "$1" -name 'IMG_****.JPG')
for FILE in $FILES; do
    cp -n "$FILE" destination
    # Check the return status of the latest command (i.e. cp)
    # through the $? variable and, if it failed,
    # choose a different name for the destination
done
Inside the for statement, you can also put some incremental integer to try different names incrementally (e.g., name_1, name_2 and so on, until the cp command succeeds).
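A sketch of that incremental idea, testing for existence instead of relying on cp -n's exit status (which varies across coreutils versions); the destination directory name is hypothetical:
base=$(basename "$FILE")
target="destination/$base"
n=0
while [ -e "$target" ]; do
    n=$((n + 1))
    target="destination/${base}_$n" # name_1, name_2, ...
done
cp "$FILE" "$target"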
You can do:
shopt -s globstar # let ** match recursively (bash 4+)
for file in "$1"/**/IMG_*.jpg ; do
    target=$2/$(basename "$file")
    SUFF=""
    while [[ -f "$target$SUFF" ]] ; do
        (( SUFF++ )) # an empty SUFF evaluates as 0, so the first clash gets suffix 1
    done
    cp "$file" "$target$SUFF"
done
in your script in place of the find command to append integer suffixes to identically-named files
You can use rsync with the following switches for more control
rsync --backup --backup-dir=DIR --suffix=SUFFIX -az <source dire> <destination dir>
Here (from man page)
-b, --backup
With this option, preexisting destination files are renamed as each file is transferred or deleted. You can control where the backup file goes and what (if any) suffix gets appended using the --backup-dir and --suffix options.
--backup-dir=DIR
In combination with the --backup option, this tells rsync to store all backups in the specified directory on the receiving side. This can be used for incremental backups. You can additionally specify a backup suffix using the --suffix option (otherwise the files backed up in the specified directory will keep their original filenames).
--suffix=SUFFIX
This option allows you to override the default backup suffix used with the --backup (-b) option. The default suffix is a ~ if no --backup-dir was specified, otherwise it is an empty string.
You can use rsync either to sync two folders on the local file system or to sync with a remote file system. You can even sync over an ssh connection.
rsync is amazingly powerful. See the man page for all the options.
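For example, over ssh (a sketch; the host and paths are hypothetical):
rsync -az --backup --backup-dir=.rsync-backups /source/dir/ user@host:/dest/dir/
Any file that would be overwritten on the receiving side is moved into .rsync-backups (relative to the destination) instead of being lost.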

rsync using shopt globstar and **/. - how to exclude directories?

I'm attempting to sync all files from within a large directory structure into a single root directory (i.e. not recreating the sub-directories, but still including all files recursively).
Environment:
Ubuntu 12.04 x86
RSYNC version 3.0.9
GNU bash version 4.2.25(1)
So far I have this command called from a bash script which works fine and provides the basic core functionality required:
shopt -s globstar
rsync -adv /path/to/source/**/. /path/to/dest/. --exclude-from=/myexcludefile
The contents of myexcludefile are:
filename
*/
# the */ prevents all of the directories appearing in /path/to/dest/
# other failed attempts have included:
directory1
directory1/
directory1/*
I now need to exclude files that are located inside certain directories in the source tree. However, due to the globstar approach of looking in all directories, rsync is unable to match the directories to exclude. In other words, with the exception of my */ and filename rules, everything else is completely ignored.
So I'm looking for some assistance on either the excludes syntax or if there's another way of achieving the rsync of many directories into a single destination directory that doesn't use my globstar approach.
Any help or advice would be very gratefully received.
If you want to exclude directories from a globstar match, you can save those to an array, then filter the contents of that array based on a file.
Example:
#!/bin/bash
shopt -s globstar
declare -A X
readarray -t XLIST < exclude_file.txt
for A in "${XLIST[@]}"; do
    X[$A]=.
done
DIRS=(/path/to/source/**/.)
for I in "${!DIRS[@]}"; do
    D=${DIRS[I]}
    [[ -n ${X[$D]} ]] && unset 'DIRS[I]'
done
rsync -adv "${DIRS[@]}" /path/to/dest/.
Run with:
bash script.sh
Note that the values in exclude_file.txt have to match the expanded values of /path/to/source/**/. exactly.
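For instance, if the glob expands to entries like /path/to/source/dir1/., the exclude file needs lines in exactly that form (hypothetical contents):
/path/to/source/dir1/.
/path/to/source/dir2/sub/.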

rsync selected sub folders

I want to transfer selected sub-folders from a range of parent folders:
/home/user/sample_rsync/
FolderA/sub1
FolderA/sub2
FolderA/sub3
FolderB/sub1
FolderB/sub2
FolderB/sub3
FolderC/sub1
FolderC/sub2
FolderC/sub3
Say, from the above example, I want to copy just sub1 from each directory, i.e. in my destination I want the following folders to be created (along with the files they contain):
/destination/
sample_rsync/FolderA/sub1
sample_rsync/FolderB/sub1
sample_rsync/FolderC/sub1
How do I go about doing this?
I tried out
rsync -avh -f"- *" -f"+ *sub1/*" /home/user/sample_rsync /destination/
This was an attempt to exclude everything and then just include the sub1 directories, but it didn't work.
Any way I can get this working?
Assuming your source folders are listed in a file called "sources" as typed in your first code segment (without trailing / characters):
for s in $(cat sources)
do
    rsync -av ${s} /destination/sample_rsync/$(echo ${s} | awk -F "/" '{print $1}')
done
Of course, this is only valid if the directories in your sources file are all at the same depth. If the depth of the directories to be copied varies, this script will need heavy modification, but at least it is a starting point, I hope.
Regarding your question below, you might want to use something like this (ignore the code segment above; I just left it there for history purposes):
cd /home/user/sample_rsync
for dir in $(find ./ -type d -name sub1)
do
    dest=$(echo ${dir} | sed -e "1,1s+/sub1++") # strip the trailing /sub1
    mkdir -p /destination/sample_rsync/${dest}  # -p creates missing parent directories
    rsync -av ${dir} /destination/sample_rsync/${dest}
done
Please do not take it as gospel. I have not tested the code whatsoever, so it might yield some unexpected results. Please test it on a system where you wouldn't mind having problems if it goes haywire.
