rsync using shopt globstar and **/. - how to exclude directories?

I'm attempting to sync all files from within a large directory structure into a single root directory (i.e. not creating the subdirectories but still including all files recursively).
Environment:
Ubuntu 12.04 x86
RSYNC version 3.0.9
GNU bash version 4.2.25(1)
So far I have this command called from a bash script which works fine and provides the basic core functionality required:
shopt -s globstar
rsync -adv /path/to/source/**/. /path/to/dest/. --exclude-from=/myexcludefile
The contents of myexcludefile are:
filename
*/
# the */ prevents all of the directories appearing in /path/to/dest/
# other failed attempts have included:
directory1
directory1/
directory1/*
I now need to exclude files that are located inside certain directories in the source tree. However, due to the globstar approach of looking in all directories, rsync is unable to match the directories to exclude. In other words, with the exception of my */ and filename rules, everything else is completely ignored.
So I'm looking for some assistance on either the excludes syntax or if there's another way of achieving the rsync of many directories into a single destination directory that doesn't use my globstar approach.
Any help or advice would be very gratefully received.

If you want to exclude directories from a globstar match, you can save the matches to an array, then filter that array's contents against a file.
Example:
#!/bin/bash
shopt -s globstar
declare -A X
# Build an associative set of the paths to exclude, read line by line from the file
readarray -t XLIST < exclude_file.txt
for A in "${XLIST[@]}"; do
  X[$A]=.
done
# Expand the globstar pattern, then drop every entry found in the exclude set
DIRS=(/path/to/source/**/.)
for I in "${!DIRS[@]}"; do
  D=${DIRS[I]}
  [[ -n ${X[$D]} ]] && unset 'DIRS[I]'
done
rsync -adv "${DIRS[@]}" /path/to/dest/.
Run with:
bash script.sh
Note that the values in exclude_file.txt must exactly match the expanded values of /path/to/source/**/. (trailing /. included).
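For example (hypothetical directory names), if the glob expands to entries such as /path/to/source/directory1/., the exclude file needs exactly that form:
/path/to/source/directory1/.
/path/to/source/directory2/nested/.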

Related

How do I navigate fast through directories with the command line?

I spent some time looking for a solution to my problem but Google couldn't provide a sufficient answer. I work a lot with the command line in Linux and I simply need a way to navigate fast through my file system. I don't want to type cd [relative or absolute path] all the time. I know there are pushd and popd, but that still seems too complicated for a simple problem like this.
When I'm in ~/Desktop/sampleFile I simply want to use sampleCommand fileToGo to get to ~/Desktop/anotherFile/anotherFile/fileToGo, no matter where the file is located. Is there an easy command for this?
Thanks in advance!
This can be done with native Bash features without involving a sub-shell fork:
You can insert this into your "$HOME/.bashrc":
cdf(){
  # Query globstar state
  shopt -q globstar
  # and save it in the gs variable (gs=0 if set, 1 if not)
  local gs=$?
  # Need globstar to glob find files in sub-directories
  shopt -s globstar
  # Find the file in directories
  # and store the result into the matches array
  matches=(**/"$1")
  # globstar no longer needed, so restore its previous state
  [ $gs -gt 0 ] && shopt -u globstar
  # Change to the directory containing the first matched file
  cd "${matches[0]%/*}" # cd EXIT status is preserved
}
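Then, mirroring the question's example:
cdf fileToGo # changes into the directory holding the first fileToGo found beneath the current directory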
Hmm, you could do something like this:
cd "$(dirname "$(find . -name name-of-your-file | head -n 1)")"
That will search the current directory (use / instead of . to search all directories) for a file called name-of-your-file and cd into the parent directory of the first file with that name that it finds.
If you're in a large directory, typing the path and using cd will probably be faster than this, but it works alright for small directories.
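If GNU find is available, its -print -quit actions stop at the first match and make the head pipeline unnecessary; a variant under that assumption:
cd "$(dirname "$(find . -name name-of-your-file -print -quit)")"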

`mv somedir/* someotherdir` when somedir is empty

I am writing an automated bash script that moves some files from one directory to another directory, but the first directory may be empty:
$ mv somedir/* someotherdir/
mv: cannot stat 'somedir/*': No such file or directory
How can I write this command without generating an error if the directory is empty? Should I just use rm and cp instead? I could write a conditional check to see if the directory is empty first, but that feels like overkill.
I'm surprised the command fails if the directory is empty, so I'm trying to find out if I'm missing some simple solution.
Environment:
bash
RHEL
If you really want full control over the process, it might look like:
#!/usr/bin/env bash
# ^^^^- bash, not sh
restore_nullglob=$(shopt -p nullglob) # store the initial state of the nullglob setting
shopt -s nullglob # unconditionally enable nullglob
source_files=( somedir/* ) # store matching files in an array
if (( ${#source_files[@]} )); then             # if that array isn't empty...
  mv -- "${source_files[@]}" someotherdir/     # ...move the files it contains...
else                                           # otherwise...
  echo "No files to move; doing nothing" >&2   # ...write an error message.
fi
eval "$restore_nullglob" # restore nullglob to its original setting
Explaining the moving parts:
When nullglob is set, the shell expands *.txt to an empty list if no .txt files exist; when nullglob is unset (the default), *.txt expands to the literal string *.txt if there are no matching files.
source_files is an array above -- bash's native mechanism for storing a list. ${#source_files[@]} expands to the length of that array, whereas "${source_files[@]}" on its own expands to its contents.
(( )) creates an arithmetic context, in which expressions are treated as math. In such a context, 0 is falsy and positive numbers are truthy. Thus, (( ${#source_files[@]} )) is true only if the array source_files contains at least one entry.
BTW, note that saving and restoring nullglob isn't really essential in a standalone script; it is shown here so you can safely drop this code into larger scripts that may make assumptions about whether nullglob is set, without disrupting other code.
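A quick way to see the difference (assuming no .txt files exist in the current directory):
shopt -s nullglob
files=( *.txt )
echo "${#files[@]} match(es)" # prints: 0 match(es)
shopt -u nullglob
files=( *.txt )
echo "${files[0]}" # prints the literal pattern: *.txt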
find somedir -type f -exec mv -t someotherdir/. '{}' +
This saves you the check, though it may not be what you want: it also moves files out of any subdirectories of somedir (flattening them into someotherdir) and leaves the now-empty subdirectories behind.
Are you aware of the output stream and the error stream? The output stream has number 1, while the error stream has number 2. If you don't want to see a result, you can redirect it to the garbage bin.
Excuse me?
Well, let's have a look at this case: when the directory is empty, an error is generated and that error is shown in the error stream (2). You can redirect this, using 2>/dev/null (/dev/null being the UNIX/Linux garbage bin), so your command becomes:
$ mv somedir/* someotherdir/ 2>/dev/null
Following up on Dominique's answer: to report all errors except the 'No such file' one, use:
mv somedir/* someotherdir 2>&1 | grep -v 'No such file'
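One caveat with that pipeline: its exit status is grep's, not mv's. If a script needs mv's real status, bash's PIPESTATUS array preserves it; a sketch:
mv somedir/* someotherdir 2>&1 | grep -v 'No such file'
mv_status=${PIPESTATUS[0]} # exit status of mv, independent of grep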

How to recursively get all files filtered by multiple extensions within a folder including working folder without using find in Bash script

I have this question after quite a day of searching the net; perhaps I'm doing something wrong. Here is my script:
#!/bin/bash
shopt -s extglob
FILE_EXTENSIONS=properties\|xml\|sh\|sql\|ksh
SOURCE_FOLDER=$1
if [ -z "$SOURCE_FOLDER" ]; then
SOURCE_FOLDER=$(pwd)
fi # Set directory to current working folder if no input parameter.
for file in $SOURCE_FOLDER/**/*.*($FILE_EXTENSIONS)
do
echo Working with file: $file
done
Basically, I want to recursively get all the files filtered by a list of extensions within folders from a directory that is passed as an argument including the directory itself.
I would like to know if there is a way of doing this and how without the use of the find command.
Imagine I have this file tree:
bin/props.properties
bin/xmls.xml
bin/source/sources.sh
bin/config/props.properties
bin/config/folders/moreProps.xml
My script, as it is right now and running from /bin, would echo:
bin/source/sources.sh
bin/config/props.properties
bin/config/folders/moreProps.xml
Leaving the ones in the working path aside.
P.S. I know this can be done with find but I really want to know if there's another way for the sake of learning.
Thanks!
You can use find with grep, just like this:
#!/bin/bash
SOURCE_FOLDER=$1
EXTENSIONS="properties|xml|sh|sql|ksh"
find "$SOURCE_FOLDER" | grep -E "\.(${EXTENSIONS})$"
# or even better
find "$SOURCE_FOLDER" -regextype posix-egrep -regex ".*\.(${EXTENSIONS})$"
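That said, since the question explicitly asks for a find-free approach: the original script is close; it mainly needs globstar enabled and an extglob pattern of the form *.@(...). A sketch, assuming bash 4+:
#!/bin/bash
shopt -s globstar extglob nullglob
SOURCE_FOLDER=${1:-$PWD} # default to the current working directory
# **/ also matches zero directories, so files directly under $SOURCE_FOLDER are included
for file in "$SOURCE_FOLDER"/**/*.@(properties|xml|sh|sql|ksh)
do
  echo "Working with file: $file"
done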

rsync copy over only certain types of files using include option

I use the following bash script to copy only files of a certain extension (in this case *.sh); however, it still copies over all the files. What's wrong?
from=$1
to=$2
rsync -zarv --include="*.sh" $from $to
I think --include is used to include a subset of files that are otherwise excluded by --exclude, rather than including only those files.
In other words: you have to think about include meaning don't exclude.
Try instead:
rsync -zarv --include "*/" --exclude="*" --include="*.sh" "$from" "$to"
For rsync version 3.0.6 or higher, the order needs to be modified as follows:
rsync -zarv --include="*/" --include="*.sh" --exclude="*" "$from" "$to"
Adding the -m flag will avoid creating empty directory structures in the destination. Tested in version 3.1.2.
So if we only want *.sh files we have to exclude all files --exclude="*", include all directories --include="*/" and include all *.sh files --include="*.sh".
You can find some good examples in the section Include/Exclude Pattern Rules of the man page.
The answer by @chepner will copy all the sub-directories whether they contain files or not. If you need to exclude the sub-directories that don't contain the file and still retain the directory structure, use
rsync -zarv --prune-empty-dirs --include "*/" --include="*.sh" --exclude="*" "$from" "$to"
Here's the important part from the man page:
As the list of files/directories to transfer is built, rsync checks each name to be transferred against the list of include/exclude patterns in turn, and the first matching pattern is acted on: if it is an exclude pattern, then that file is skipped; if it is an include pattern then that filename is not skipped; if no matching pattern is found, then the filename is not skipped.
To summarize:
Not matching any pattern means a file will be copied!
The algorithm quits once any pattern matches
Also, a pattern ending with a slash matches only directories (like find -type d would).
Let's pull apart this answer from above.
rsync -zarv --prune-empty-dirs --include "*/" --include="*.sh" --exclude="*" "$from" "$to"
Don't skip any directories
Don't skip any .sh files
Skip everything
(Implicitly, don't skip anything, but the rule above prevents the default rule from ever happening.)
Finally, --prune-empty-dirs keeps the first rule from creating empty directories all over the place.
One more addition: if you need to sync files by their extensions in one directory only (without recursion), you can use a construction like this:
rsync -auzv --include './' --include '*.ext' --exclude '*' /source/dir/ /destination/dir/
Pay attention to the dot in the first --include; --no-r does not work in this construction.
EDIT:
Thanks to gbyte.co for the valuable comment!
EDIT:
The -uzv flags are not related to this question directly, but I included them because I usually use them.
I wrote this handy function and put it in my bash scripts or ~/.bash_aliases. Tested syncing locally on Linux with bash and awk installed. It works:
selrsync(){
# selective rsync to sync only certain filetypes;
# based on: https://stackoverflow.com/a/11111793/588867
# Example: selrsync 'tsv,csv' ./source ./target --dry-run
types="$1"; shift; #accepts comma separated list of types. Must be the first argument.
includes=$(echo $types| awk -F',' \
'BEGIN{OFS=" ";}
{
for (i = 1; i <= NF; i++ ) { if (length($i) > 0) $i="--include=*."$i; } print
}')
restargs="$#"
echo Command: rsync -avz --prune-empty-dirs --include="*/" $includes --exclude="*" "$restargs"
eval rsync -avz --prune-empty-dirs --include="*/" "$includes" --exclude="*" $restargs
}
Advantages:
Short, handy, and extensible when one wants to add more arguments (e.g. --dry-run).
Example:
selrsync 'tsv,csv' ./source ./target --dry-run
If someone looks for this…
I wanted to rsync only specific files and folders and managed to do it with this command: rsync --include-from=rsync-files
With rsync-files:
my-dir/
my-file.txt
- /*
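For reference, a fuller invocation of that idea (source and destination paths are hypothetical):
rsync -av --include-from=rsync-files /path/to/source/ /path/to/dest/
The trailing - /* rule in the file excludes everything not matched by the earlier include lines.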

How to recursively search for files with certain extensions?

I need to find all the .psd files on my Linux system (dedicated web hosting). I tried something like this: ls -R *.psd, but that's not working. Suggestions?
You can use the following find command to do that:
find /path/to/search -iname '*.psd'
-iname does a case-insensitive search.
You can also do:
ls ./**/*.psd
but:
you must have bash version 4+
you must have shopt -s globstar set (in your .bashrc or .profile, etc.)
the search is case-sensitive (unless you also set shopt -s nocaseglob)
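Putting those requirements together, a minimal sketch:
shopt -s globstar nocaseglob # recursive ** plus case-insensitive globbing
ls -l ./**/*.psd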
