Merge two directories with large content on slow storage efficiently - linux

I have two directory structures on a USB drive that have various files that are the same, and each have files that the other one doesn't.
What I want to do is to move over directory structure B to A. It is ok that B's content is gone after the merge. Directories in A must not be erased, because otherwise I lose A's content. The mv command won't work I think, because it will complain that it can't move a directory because a destination directory in the same place is not empty. mv B/* A/ won't work either because some sub directory will also not be empty.
cp -a B/* A/ is bad (even with -u), because it will take far too long: the files are on a USB drive, and there are possibly so many of them that the drive would run out of space.
rsync has the same problem, because it doesn't appear to have a move/rename feature, and it can only move files by copying them.
So it seems I'm going to have to write a script that recursively walks through B, creating missing directories in A and moving missing files there.
But I'm hoping that there is a command or option or utility that I don't know about.

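I believe cpio has the capabilities you want. These commands:
cd B
find . -type f -print0 | cpio -0dumpl ../A
will find all files in B, pass them to cpio with null termination to properly handle odd file names, and run cpio in pass-through mode (-p) so that it creates the necessary directories (-d), preserves modification times (-m), and uses hard links to create the destination files where possible (-l), overwriting unconditionally (-u). The destination is written as ../A on the assumption that A and B are sibling directories; after cd B, a relative A/. would resolve inside B. Where linking is not possible (for example on a FAT filesystem, which has no hard links), cpio has to copy instead. B is left in place and still has to be removed afterwards.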
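If the goal is a true move (a rename within the same filesystem, so no file data is copied), the script the question contemplates is fairly short. The following is a minimal sketch, not a tested tool: merge_move is a hypothetical helper name, and leaving files that already exist in A untouched (so they stay behind in B) is a policy assumed here.

```shell
#!/bin/sh
# merge_move SRC DST (hypothetical helper): move everything under SRC that
# is not a directory to the same relative path under DST, creating
# directories as needed. Existing files in DST are never overwritten.
merge_move() {
    src=${1%/} dst=${2%/}
    find "$src" ! -type d -exec sh -c '
        src=$1 dst=$2; shift 2
        for f do
            rel=${f#"$src"/}
            mkdir -p "$dst/$(dirname "$rel")"
            # Assumed policy: skip anything that already exists in DST
            if [ ! -e "$dst/$rel" ] && [ ! -L "$dst/$rel" ]; then
                mv "$f" "$dst/$rel"
            fi
        done' sh "$src" "$dst" {} +
    # Remove the directories in SRC that are now empty (deepest first)
    find "$src" -depth -type d -exec rmdir {} + 2>/dev/null || true
}

# Demo in a scratch directory
tmp=$(mktemp -d)
mkdir -p "$tmp/A/docs" "$tmp/B/docs/new"
echo keep  > "$tmp/A/docs/same.txt"
echo other > "$tmp/B/docs/same.txt"
echo fresh > "$tmp/B/docs/new/only-in-b.txt"
merge_move "$tmp/B" "$tmp/A"
find "$tmp" -print
```

Whatever remains in B afterwards is exactly the set of files that collided with something in A, which can then be inspected or deleted by hand.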

Related

rsync only certain types of files

I know there has been a huge discussion about this but I have not found something this specific.
I'm trying to copy all .key files in the /home// directory
This does not work
/usr/bin/rsync -auPA --include="*/*.key" --exclude="*" /home/* /tmp/test
This works but it copies over unwanted empty directories like /home/uname/Documents
/usr/bin/rsync -auPA --include="*/" --include="*.key" --exclude="*" /home /tmp/test
Basically, what I need is for rsync to copy only files with the .key extension and to create only the folders that contain .key files
I think you are looking for the -m option. From the man page:
-m, --prune-empty-dirs
This option tells the receiving rsync to get rid of empty directories from the file-list, including nested directories that
have no non-directory children. This is useful for avoiding the creation of a bunch of useless directories when the sending
rsync is recursively scanning a hierarchy of files using include/exclude/filter rules.
Note that the use of transfer rules, such as the --min-size option, does not affect what goes into the file list, and thus
does not leave directories empty, even if none of the files in a directory match the transfer rule.
Because the file-list is actually being pruned, this option also affects what directories get deleted when a delete is active.
However, keep in mind that excluded files and directories can prevent existing items from being deleted due to an exclude both
hiding source files and protecting destination files. See the perishable filter-rule option for how to avoid this.
You can prevent the pruning of certain empty directories from the file-list by using a global "protect" filter. For instance,
this option would ensure that the directory "emptydir" was kept in the file-list:
--filter 'protect emptydir/'
Here’s an example that copies all .pdf files in a hierarchy, only creating the necessary destination directories to hold the
.pdf files, and ensures that any superfluous files and directories in the destination are removed (note the hide filter of
non-directories being used instead of an exclude):
rsync -avm --del --include='*.pdf' -f 'hide,! */' src/ dest
If you didn’t want to remove superfluous destination files, the more time-honored options of "--include='*/' --exclude='*'"
would work fine in place of the hide-filter (if that is more natural to you).
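Applied to the .key question, the -m option is simply added to the include/exclude rules that already worked. The sketch below builds a small throwaway tree (the user and file names are made up for illustration) and runs the command against it; the demo is skipped if rsync is not installed.

```shell
#!/bin/sh
# Throwaway tree standing in for /home; all names here are hypothetical
tmp=$(mktemp -d)
mkdir -p "$tmp/home/alice/keys" "$tmp/home/alice/Documents" "$tmp/home/bob"
echo secret > "$tmp/home/alice/keys/id.key"
echo text   > "$tmp/home/alice/Documents/readme.txt"
echo text   > "$tmp/home/bob/notes.txt"

if command -v rsync >/dev/null 2>&1; then
    # --include='*/'     descend into every directory
    # --include='*.key'  keep the files we want
    # --exclude='*'      drop everything else
    # -m                 prune directories that end up empty
    rsync -am --include='*/' --include='*.key' --exclude='*' \
        "$tmp/home/" "$tmp/out/"
    find "$tmp/out" -print   # lists what was actually created
fi
```

Without -m the same rules would also create empty alice/Documents and bob directories in the destination.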

How do I copy differing content files from one directory to another?

There exist two directories: a/ and b/.
I'd like to copy all the files (recursively) from a/ into b/.
However, I only want to copy over an a file if its content is different than the already existing b file. If the corresponding b file does not exist, then you would still copy over the a file.
*by "corresponding file", I mean a file with the same name and relative path from its parent directory.
note:
The reason I don't want to overwrite a b file with the same exact contents, is because the b directory is being monitored by another program, and I don't want the file date to change causing the program to do more work than required.
I'm essentially looking for a way to perform a cp -rf a/ b/ while performing a diff check on each file. If the files are different, perform the copy; otherwise skip the copy.
I see that cp has an update flag:
-u, --update
copy only when the SOURCE file is newer than the destination file or when the
destination file is missing
but this will not work because I'm not concerned about newer files; I'm concerned about different file contents.
Any shell language will do.
I've been attempting to get this to work by injecting my diff check into a find command:
find a/ ??? -exec cp {} b \;
This doesn't seem like an uncommon thing to do between two directories, so I'm hoping there is an elegant command line solution as opposed to me having to write a python script.
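You can achieve this using rsync. By default rsync skips files whose size and modification time already match; add -c (--checksum) to make it compare file contents instead. Use -r rather than -a here, because -a implies -t, which would reset the modification time of otherwise identical files in b/ whenever it differs from a/:
$ rsync -rcv --progress a/ b/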
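The diff check the question wanted to inject into find can also be written directly. This is a minimal sketch, assuming cmp is available; sync_changed is a hypothetical helper name, not an existing command.

```shell
#!/bin/sh
# sync_changed SRC DST (hypothetical helper): copy each regular file from
# SRC to the same relative path in DST only if the destination is missing
# or its bytes differ. Identical files are not touched at all.
sync_changed() {
    src=${1%/} dst=${2%/}
    find "$src" -type f -exec sh -c '
        src=$1 dst=$2; shift 2
        for f do
            target=$dst/${f#"$src"/}
            # cmp -s prints nothing and exits 0 only for identical files
            if ! cmp -s "$f" "$target" 2>/dev/null; then
                mkdir -p "$(dirname "$target")"
                cp "$f" "$target"
            fi
        done' sh "$src" "$dst" {} +
}

# Demo in a scratch directory
tmp=$(mktemp -d)
mkdir -p "$tmp/a/sub" "$tmp/b/sub"
echo same  > "$tmp/a/sub/same.txt"; echo same > "$tmp/b/sub/same.txt"
echo new   > "$tmp/a/sub/diff.txt"; echo old  > "$tmp/b/sub/diff.txt"
echo extra > "$tmp/a/only-a.txt"
# Give the unchanged destination file an old date so a rewrite would show
touch -t 200001010000 "$tmp/b/sub/same.txt"
sync_changed "$tmp/a" "$tmp/b"
ls -lR "$tmp/b"
```

Because unchanged files are never opened for writing, the program monitoring b/ sees new timestamps only on files whose contents really changed.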

How can I preserve aliases when copying folders on the command line in OSX?

I'm trying to write a personal backup command-line utility on OSX. Let's say I have two folders:
foo/bar/
foo/baz/
foo/bar contains, among other things, OSX aliases to files in foo/baz:
foo/bar/file_alias# -> foo/baz/file
I want to copy both foo/bar and foo/baz to an external hard drive, but for various reasons I do not just want to copy the entire folder foo. I can't figure out a way to copy these folders separately and make the aliases come out right in the end:
cp -r foo/bar /external_hd/foo/bar follows the aliases, replacing them with the original files.
cp -R foo/bar /external_hd/foo/bar preserves the aliases, but they (not surprisingly) continue to point to the original files (e.g. foo/baz/file, not external_hd/foo/baz/file).
rsync -avE foo/bar /external_hd/foo/bar (see this question) seems to do the same thing as cp -R.
Is there any way to accomplish this without copying the entire parent folder foo?
I know of no way to automatically copy folders and relink symbolic links to a new destination without some manual intervention. If you know the new paths it's quite simple to script, though.
For your specific example; the following should do the trick to relink:
cd /external_hd/foo
find . -type l | while IFS= read -r x; do y=$(readlink "$x" | sed 's|/foo|/external_hd/foo|'); ln -sf "$y" "$x"; done
rsync will get you close, the command:
rsync -avHER --safe-links foo/{bar,baz} /external_hd/
will copy the two folders, preserve "safe" relative symlinks between, and ignore "unsafe" symlinks - those that may reference files outside of the copied tree. Change it to:
rsync -avHER --copy-unsafe-links foo/{bar,baz} /external_hd/
and "safe" relative symlinks are preserved and "unsafe" symlinks are replaced by their destination.
If you only have "safe" relative symlinks the first option will do, the second option may do if some extra copying is OK.
However, the definition of "safe" is over-restrictive. Any absolute symlink is "unsafe" even if its target is within the copied tree. Furthermore even a relative link which goes too far towards the root, or maybe is just too complicated, is also "unsafe".
If you need to fix this, it should be possible: as the options above show, rsync is pretty close to what you need, and the source code is available from Apple's Open Source site. Examine the code around the options --links, --copy-links, --copy-unsafe-links and --safe-links, and you may find that fixing the definition of "safe" is fairly easy (and you can rewrite the symlinks to use the shortest possible relative path at the same time).
HTH

move (or copy) files from a list in Linux

So, I have a list of files in a text file. I believe it's about 100,000 files.
The files in said list are spread across many directories, have different sizes, filenames, extensions, ages, etc.
I am trying to find a way to move those files, and just those, to another drive.
Complicating factor: some of the files have the same name, but are not the same file. They can't just be moved into one folder with an overwriting or ignoring policy towards multiples.
Preferably, I would like them to retain their directory structure, but only have the files that I want inside the destination directory. (the destination drive isn't big enough to simply copy everything).
Below is an example of some lines in the file:
media/dave/xdd/cruzer/F#(NTFS 1)/Raw Files/Portable Network Graphic file/3601-3900/FILE3776.PNG/Windows/winsxs/amd64_microsoft-windows-o..disc-style-memories_31bf3856ad364e35_6.1.7600.16385_none_51190840a935f980/Title_mainImage-mask.png
media/dave/xdd/d1/other/hd1/Program Files/DVD Maker/Shared/DvdStyles/Memories/Title_content-background.png
I have tried to use
rsync -a --files-from=/sourcefile.txt / /media/destinationhdd
However, this just tries to copy my root directory to the destination. Please help, how do I just copy the accursed files that I want to?
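tar -cf - -T list | (cd dest && tar -xvpf -)
Where list is the file which contains all the file paths, one per line, and dest is the target directory.
Reading the list with tar's -T (--files-from) option, rather than piping it through xargs, keeps paths containing spaces intact and produces a single archive however long the list is. Note that this copies the files; the originals still have to be deleted afterwards if a move is wanted.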
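Since the question asks for a move that keeps the directory structure, a plain shell loop is another option. The following is a minimal sketch, assuming the list holds one path per line, that the paths are relative to the directory the loop is run from, and that no path contains a newline; move_listed is a hypothetical helper name.

```shell
#!/bin/sh
# move_listed LIST DEST (hypothetical helper): move every path named in
# LIST to the same relative path under DEST, creating directories as needed.
move_listed() {
    list=$1 dest=${2%/}
    while IFS= read -r path; do
        [ -n "$path" ] || continue              # skip blank lines
        if [ ! -e "$path" ]; then
            printf 'missing: %s\n' "$path" >&2  # report and carry on
            continue
        fi
        mkdir -p "$dest/$(dirname "$path")"
        mv -- "$path" "$dest/$path"
    done < "$list"
}

# Demo in a scratch directory: two files share a name but live in
# different directories, so neither overwrites the other.
tmp=$(mktemp -d)
(
    cd "$tmp" || exit 1
    mkdir -p "src/Raw Files" "src/other" dest
    echo one > "src/Raw Files/FILE1.PNG"
    echo two > "src/other/FILE1.PNG"
    echo no  > "src/other/unlisted.txt"
    printf '%s\n' "src/Raw Files/FILE1.PNG" "src/other/FILE1.PNG" > list.txt
    move_listed list.txt dest
)
find "$tmp/dest" -print
```

When the destination is on another drive, mv copies each file and then removes the original, so only the listed files ever take up space on the destination.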

How can I recursively copy a directory into another and replace only the files that have not changed?

I am looking to do a specific copy in Fedora.
I have two folders:
'webroot': holding ALL web files/images etc
'export': folder containing thousands of PHP, CSS, JS documents that are exported from my SVN repo.
The export directory contains many of the same files/folders that the root does, however the root contains additional ones not found in export.
I'd like to merge all of the contents of export with my webroot with the following options:
Overwriting the file in webroot if export's version contains different code than what is inside of webroot's version (live)
Preserve the permissions/users/groups of the file if it is overwritten (the export version replacing the live version) *NOTE I would like the webroot's permissions/ownership maintained, but with export's contents
No prompting/stopping of the copy of any kind (ie not verbose)
Recursive copy - obviously I would like to copy all files, folders and subfolders found in export
I've done a bit of research into cp - would this do the job?:
cp -pruf ./export /path/to/webroot
It might, but any time the corresponding files in export and webroot have the same content but different modification times, you'd wind up performing an unnecessary copy operation. You'd probably get slightly smarter behavior from rsync, especially with -c (--checksum), which compares file contents instead of size and modification time:
rsync -prc ./export/ /path/to/webroot
The trailing slash on ./export/ makes rsync copy the directory's contents into webroot rather than creating webroot/export. Besides, rsync can copy files from one host to another over an SSH connection, if you ever have a need to do that. Plus, it has a zillion options you can specify to tweak its behavior - look in the man page for details.
EDIT: with respect to your clarification about what you mean by preserving permissions: you'd probably want to leave off the -p option.
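-u overwrites an existing file only if the destination is older than the source (or is missing)
-p preserves the permissions, ownership and dates
-f forces the copy: if an existing destination file cannot be opened, cp removes it and tries again (it has nothing to do with verbosity; cp is quiet unless -v is given)
-r makes the copy recursive
So it looks like you've got all the correct args to cp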
Sounds like a job for cpio (and hence, probably, GNU tar can do it too):
cd export
find . -print | cpio -pvdm /path/to/webroot
If you need owners preserved, you have to do it as root, of course. The -p option is 'pass mode', meaning copy between locations; -v is verbose (but not interactive; there's a difference); -d means create directories as necessary; -m means preserve modification time. By default, without the -u option, cpio won't overwrite files in the target area that are newer than the one from the source area.
