How can I recursively copy a directory into another and replace only the files that have changed? - linux

I am looking to do a specific copy in Fedora.
I have two folders:
'webroot': holding ALL web files/images etc
'export': folder containing thousands of PHP, CSS, JS documents that are exported from my SVN repo.
The export directory contains many of the same files/folders that the webroot does; however, the webroot contains additional ones not found in export.
I'd like to merge all of the contents of export with my webroot, with the following options:
- Overwrite the file in webroot if export's version contains different code than what is inside webroot's version (live).
- Preserve the permissions/users/groups of the file if it is overwritten (the export version replacing the live version). NOTE: I would like webroot's permissions/ownership maintained, but with export's contents.
- No prompting/stopping of the copy of any kind (i.e. not verbose or interactive).
- Recursive copy - obviously I would like to copy all files, folders and subfolders found in export.
I've done a bit of research into cp - would this do the job?:
cp -pruf ./export /path/to/webroot

It might, but any time the corresponding files in export and webroot have the same content but different modification times, you'd wind up performing an unnecessary copy operation. You'd probably get slightly smarter behavior from rsync:
rsync -pr ./export /path/to/webroot
Besides, rsync can copy files from one host to another over an SSH connection, if you ever have a need to do that. Plus, it has a zillion options you can specify to tweak its behavior - look in the man page for details.
EDIT: with respect to your clarification about what you mean by preserving permissions: you'd probably want to leave off the -p option.
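For example, one possible refinement (my sketch, not part of the original answer): use -c so rsync compares file contents rather than timestamps, and add trailing slashes so the contents of export are merged into webroot instead of creating a webroot/export subdirectory. Leaving off -p means files that already exist in webroot keep their existing permissions when they are updated:
rsync -rc ./export/ /path/to/webroot/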

-u copies a file only if the source is newer than the destination (or the destination is missing)
-p preserves permissions, ownership and timestamps
-f forces the copy (if an existing destination file cannot be opened, it is removed and the copy retried) - it does not control verbosity; cp is quiet unless you pass -v
-r makes the copy recursive
So it looks like you have all the correct arguments to cp.
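One caveat worth noting: cp -pruf ./export /path/to/webroot copies the export directory itself into webroot, creating /path/to/webroot/export. To merge export's contents directly into webroot, a sketch would be:
cp -pruf ./export/. /path/to/webroot/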

Sounds like a job for cpio (and hence, probably, GNU tar can do it too):
cd export
find . -print | cpio -pvdm /path/to/webroot
If you need owners preserved, you have to do it as root, of course. The -p option is 'pass mode', meaning copy between locations; -v is verbose (but not interactive; there's a difference); -d means create directories as necessary; -m means preserve modification time. By default, without the -u option, cpio won't overwrite files in the target area that are newer than the one from the source area.
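For reference, a roughly equivalent GNU tar pipeline (my sketch, not from the original answer): --keep-newer-files tells tar not to replace destination files that are newer than the incoming copies, and -p preserves permissions:
(cd export && tar -cf - .) | tar -xpf - -C /path/to/webroot --keep-newer-files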

Related

rsync only certain types of files

I know there has been a huge discussion about this but I have not found something this specific.
I'm trying to copy all .key files under the /home/*/ directories.
This does not work
/usr/bin/rsync -auPA --include="*/*.key" --exclude="*" /home/* /tmp/test
This works but it copies over unwanted empty directories like /home/uname/Documents
/usr/bin/rsync -auPA --include="*/" --include="*.key" --exclude="*" /home /tmp/test
Basically, what I need rsync to do is copy only files with the .key extension, and create only the folders needed to hold those .key files.
I think you are looking for the -m option. From the man page:
-m, --prune-empty-dirs
This option tells the receiving rsync to get rid of empty directories from the file-list, including nested directories that
have no non-directory children. This is useful for avoiding the creation of a bunch of useless directories when the sending
rsync is recursively scanning a hierarchy of files using include/exclude/filter rules.
Note that the use of transfer rules, such as the --min-size option, does not affect what goes into the file list, and thus
does not leave directories empty, even if none of the files in a directory match the transfer rule.
Because the file-list is actually being pruned, this option also affects what directories get deleted when a delete is active.
However, keep in mind that excluded files and directories can prevent existing items from being deleted due to an exclude both
hiding source files and protecting destination files. See the perishable filter-rule option for how to avoid this.
You can prevent the pruning of certain empty directories from the file-list by using a global "protect" filter. For instance,
this option would ensure that the directory "emptydir" was kept in the file-list:
--filter ’protect emptydir/’
Here’s an example that copies all .pdf files in a hierarchy, only creating the necessary destination directories to hold the
.pdf files, and ensures that any superfluous files and directories in the destination are removed (note the hide filter of
non-directories being used instead of an exclude):
rsync -avm --del --include=’*.pdf’ -f ’hide,! */’ src/ dest
If you didn’t want to remove superfluous destination files, the more time-honored options of "--include='*/' --exclude='*'"
would work fine in place of the hide-filter (if that is more natural to you).
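Applied to the command from the question, that would look something like this (same include/exclude rules as before, with -m added so directories that end up containing no .key files are not created):
/usr/bin/rsync -auPAm --include="*/" --include="*.key" --exclude="*" /home /tmp/test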

How can I preserve aliases when copying folders on the command line in OSX?

I'm trying to write a personal backup command-line utility on OSX. Let's say I have two folders:
foo/bar/
foo/baz/
foo/bar contains, among other things, OSX aliases to files in foo/baz:
foo/bar/file_alias# -> foo/baz/file
I want to copy both foo/bar and foo/baz to an external hard drive, but for various reasons I do not just want to copy the entire folder foo. I can't figure out a way to copy these folders separately and make the aliases come out right in the end:
cp -r foo/bar /external_hd/foo/bar follows the aliases, replacing them with the original files.
cp -R foo/bar /external_hd/foo/bar preserves the aliases, but they (not surprisingly) continue to point to the original files (e.g. foo/baz/file, not external_hd/foo/baz/file).
rsync -avE foo/bar /external_hd/foo/bar (see this question) seems to do the same thing as cp -R.
Is there any way to accomplish this without copying the entire parent folder foo?
I know of no way to automatically copy folders and relink symbolic links to a new destination without some manual intervention. If you know the new paths, it's quite simple to script, though.
For your specific example; the following should do the trick to relink:
cd /external_hd/foo
find . -type l | while read x; do y=$(readlink "$x" | sed 's|/foo|/external_hd/foo|'); ln -sf "$y" "$x"; done
rsync will get you close. The command:
rsync -avHER --safe-links foo/{bar,baz} /external_hd/
will copy the two folders, preserve "safe" relative symlinks between them, and ignore "unsafe" symlinks - those that may reference files outside of the copied tree. Change it to:
rsync -avHER --copy-unsafe-links foo/{bar,baz} /external_hd/
and "safe" relative symlinks are preserved while "unsafe" symlinks are replaced by the files they point to.
If you only have "safe" relative symlinks, the first option will do; the second option may do if some extra copying is OK.
However, the definition of "safe" is over-restrictive. Any absolute symlink is "unsafe" even if its target is within the copied tree. Furthermore, even a relative link that goes too far up towards the root, or is simply too complicated, is also considered "unsafe".
If you need to fix this it should be possible, as the above options show rsync is pretty close to what you need and the source code is available from Apple's Open Source site. Examine the code around the options --links, --copy-links, --copy-unsafe-links & unsafe-links and you may find fixing the definition of "safe" is fairly easy (and you can re-write the symlinks to use the shortest possible relative path at the same time).
HTH

Compare two folders containing source files & hardlinks, remove orphaned files

I am looking for a way to compare two folders containing source files and hard links (let's use /media/store/download and /media/store/complete as an example) and then remove orphaned files that don't exist in both folders. These files may have been renamed and may be stored in subdirectories.
I'd like to set this up as a cron script to run regularly. I just can't figure out the logic of the script myself - could anyone be so kind as to help?
Many thanks
rsync can do what you want, using the --existing, --ignore-existing, and --delete options. You'll have to run it twice, once in each "direction" to clean orphans from both source and target directories.
rsync -avn --existing --ignore-existing --delete /media/store/download/ /media/store/complete
rsync -avn --existing --ignore-existing --delete /media/store/complete/ /media/store/download
--existing says don't copy orphan files
--ignore-existing says don't update existing files
--delete says delete orphans on target dir
The trailing slash on the source dir, and no trailing slash on the target dir, are mandatory for your task.
The 'n' in -avn means not to really do anything, and I always do a "dry run" with the -n option to make sure the command is going to do what I want, ESPECIALLY when using --delete. Once you're confident your command is correct, run it with just -av to actually do the work.
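Once the dry run looks right, the real commands are the same with -n dropped:
rsync -av --existing --ignore-existing --delete /media/store/download/ /media/store/complete
rsync -av --existing --ignore-existing --delete /media/store/complete/ /media/store/download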
Perhaps rsync is of use?
Rsync is a fast and extraordinarily versatile file copying tool. It
can copy locally, to/from another host over any remote shell, or
to/from a remote rsync daemon. It offers a large number of options
that control every aspect of its behavior and permit very flexible
specification of the set of files to be copied. It is famous for its
delta-transfer algorithm, which reduces the amount of data sent over
the network by sending only the differences between the source files
and the existing files in the destination. Rsync is widely used for
backups and mirroring and as an improved copy command for everyday
use.
Note it has a --delete option
--delete delete extraneous files from dest dirs
which could help with your specific use case above.
You can also use the diff command to list all the files that differ between two folders.
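For example (a sketch: -r recurses into subdirectories and -q only reports which files differ or exist on one side only):
diff -rq /media/store/download /media/store/complete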

rsync not synchronizing .htaccess file

I am trying to rsync directory A of server1 with directory B of server2.
Sitting in the directory A of server1, I ran the following commands.
rsync -av * server2::sharename/B
but the interesting thing is, it synchronizes all files and directories except .htaccess or any other hidden file directly in directory A. Hidden files within subdirectories do get synchronized.
I also tried the following command:
rsync -av --include=".htaccess" * server2::sharename/B
but the results are the same.
Any ideas why the hidden files in directory A are not getting synchronized, and how to fix it? I am running as the root user.
thanks
This is due to the fact that * is by default expanded to all files in the current working directory except the files whose name starts with a dot. Thus, rsync never receives these files as arguments.
You can pass . denoting current working directory to rsync:
rsync -av . server2::sharename/B
This way rsync will look for files to transfer in the current working directory as opposed to looking for them in what * expands to.
Alternatively, you can use the following command to make * expand to all files including those which start with a dot:
shopt -s dotglob
See also shopt manpage.
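So, applied to the command from the question, a sketch of the dotglob variant would be:
shopt -s dotglob
rsync -av * server2::sharename/B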
For anyone who's just trying to sync directories between servers (including all hidden files) -- e.g., syncing somedirA on source-server to somedirB on a destination server -- try this:
rsync -avz -e ssh --progress user@source-server:/somedirA/ somedirB/
Note the slashes at the end of both paths. Any other syntax may lead to unexpected results!
Also, for me it's easiest to perform rsync commands from the destination server, because it's easier to make sure I've got proper write access (i.e., I might need to add sudo to the command above).
Probably goes without saying, but obviously your remote user also needs read access to somedirA on your source server. :)
I had the same issue.
For me, when I ran the following command, the hidden files did not get rsync'ed:
rsync -av /home/user1 server02:/home/user1
But when I added the slashes at the end of the paths, the hidden files were rsync'ed.
rsync -av /home/user1/ server02:/home/user1/
Note the slashes at the end of the paths, as Brian Lacy said the slashes are the key. I don't have the reputation to comment on his post or I would have done that.
I think the problem is due to shell wildcard expansion. Use . instead of star.
Consider the following example directory content
$ ls -a .
. .. .htaccess a.html z.js
The shell's wildcard expansion translates the argument list that the rsync program gets from
-av * server2::sharename/B
into
-av a.html z.js server2::sharename/B
before the command starts getting executed.
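You can see the same expansion with echo (using the example directory above):
$ echo *
a.html z.js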
The * makes the shell skip hidden files, so rsync never sees them; drop it and pass . instead.
On a related note, in case anyone is coming in from Google etc. trying to figure out why rsync is not copying hidden subfolders, I found one additional reason why this can happen and figured I'd pay it forward for the next person running into the same thing: it can happen if you are using the -C option (obviously --exclude would do it too, but I figure that one's a bit easier to spot).
In my case, I had a script that was copying several folders across computers, including a directory with several git projects, and I noticed that I couldn't run any of the normal git commands in the copied repos (yes, normally one should use git clone, but this was part of a larger backup that included other things). After looking at the script, I found that it was calling rsync with 7 or 8 options.
After googling didn't turn up any obvious answers, I started going through the switches one by one. After dropping the -C option, it worked correctly. In the case of the script, the -C flag appears to have been added as a mistake, likely because sftp was originally used and -C is a compression-related option under that tool.
per man rsync, the option is described as
--cvs-exclude, -C auto-ignore files in the same way CVS does
Since CVS is an older version control system, and given the man page description, it makes perfect sense that it would behave this way.

How do I mirror a directory with wget without creating parent directories?

I want to mirror a folder via FTP, like this:
wget --mirror --user=x --password=x ftp://ftp.site.com/folder/subfolder/evendeeper
But I do not want to create a directory structure like this:
ftp.site.com -> folder -> subfolder -> evendeeper
I just want:
evendeeper
And anything below it to be the resulting structure. It would also be acceptable for the contents of evendeeper to wind up in the current directory as long as subdirectories are created for subdirectories of evendeeper on the server.
I am aware of the -np option, according to the documentation that just keeps it from following links to parent pages (a non-issue for the binary files I'm mirroring via FTP). I am also aware of the -nd option, but this prevents creating any directory structure at all, even for subdirectories of evendeeper.
I would consider alternatives as long as they are command-line-based, readily available as Ubuntu packages and easily automated like wget.
For a path like: ftp.site.com/a/b/c/d
-nH would download all files to the directory a/b/c/d in the current directory, and -nH --cut-dirs=3 would download all files to the directory d in the current directory.
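Applied to the URL from the question (folder/subfolder/evendeeper is three directories deep, so cutting two components leaves just evendeeper), a sketch would be:
wget --mirror -nH --cut-dirs=2 --user=x --password=x ftp://ftp.site.com/folder/subfolder/evendeeper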
I had a similar requirement and the following combination seems to be the perfect choice:
In the below example, all the files in http://url/dir1/dir2 (alone) are downloaded to local directory /dest/dir
wget -nd -np -P /dest/dir --recursive http://url/dir1/dir2
Thanks @ffledgling for the hint on "-nd".
For the above example:
wget -nd -np --mirror --user=x --password=x ftp://ftp.site.com/folder/subfolder/evendeeper
Snippets from manual:
-nd
--no-directories
Do not create a hierarchy of directories when retrieving recursively. With this option turned on, all files will get saved to the current directory, without clobbering (if a name shows up more than once, the
filenames will get extensions .n).
-np
--no-parent
Do not ever ascend to the parent directory when retrieving recursively. This is a useful option, since it guarantees that only the files below a certain hierarchy will be downloaded.
The -np (no parent) option will probably do what you want, tied in with -l 1 (I think; I don't have a wget install in front of me), which limits the recursion to one level.
EDIT: OK, gah... maybe I should wait until I've had coffee. There is a --cut-dirs (or similar) option, which allows you to "cut" a specified number of directories from the output path, so for /a/b/c/d, a cut of 2 would force wget to create c/d on your local machine.
Instead of using:
-nH --cut-dirs=1
use:
-nH --cut-dirs=100
This will cut more directories and no folders will be created.
Note: 100 = the number of folders to skip creating.
You can change 100 to any number.
