rsync: copy files only if the local file doesn't exist. Don't check filesize, time, checksum etc. - Cygwin

I am using rsync to backup a million images from my linux server to my computer (windows 7 using Cygwin).
The command I am using now is:
rsync -rt --quiet --rsh='ssh -p2200' root@X.X.X.X:/home/XXX/public_html/XXX /cygdrive/images
Whenever the process is interrupted and I start it again, it takes a long time to start the copying process.
I think it is checking each file for updates.
The images on my server won't change once they are created.
So, is there a faster way to run the command so that it copies a file only if the local file doesn't exist, without checking filesize, time, checksum, etc.?
Please suggest.
Thank you

Did you try this flag? It might help, but it might still take some time to resume the transfer:
--ignore-existing
This tells rsync to skip updating files that already exist on the destination (this does not ignore existing directories, or nothing would get done). See also --existing.
This option is a transfer rule, not an exclude, so it doesn't affect the data that goes into the file-lists, and thus it doesn't affect deletions. It just limits the files that the receiver requests to be transferred.
This option can be useful for those doing backups using the --link-dest option when they need to continue a backup run that got interrupted. Since a --link-dest run is copied into a new directory hierarchy (when it is used properly), using --ignore-existing will ensure that the already-handled files don't get tweaked (which avoids a change in permissions on the hard-linked files). This does mean that this option is only looking at the existing files in the destination hierarchy itself.
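Applied to the command from the question (the host and paths are the same placeholders used above), that would look something like:
rsync -rt --quiet --ignore-existing --rsh='ssh -p2200' root@X.X.X.X:/home/XXX/public_html/XXX /cygdrive/images
The restart will still take some time, because rsync still has to build the file list, but files that already exist locally are skipped outright instead of being compared.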

Related

Empty log files daily using cron task

I want to empty (not delete) log files daily at a particular time, with something like
echo "" > /home/user/dir/log/*.log
but it returns
-bash: /home/user/dir/log/*.log: ambiguous redirect
Is there any way to achieve this?
You can't redirect to more than one file, but you can tee to multiple files.
tee /home/user/dir/log/*.log </dev/null
The redirect from /dev/null also avoids writing an empty line to the beginning of each file, which was another bug in your attempt. (Perhaps specify nullglob to avoid creating a file with the name *.log if the wildcard doesn't match any existing files, though.)
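A rough sketch of how that could be wired up for a daily cron job, assuming bash and the log directory from the question (the script name is hypothetical):
#!/bin/bash
# truncate-logs.sh: truncate every existing *.log file.
# nullglob makes the pattern expand to nothing when there are no matches,
# so tee never creates a literal file named "*.log".
shopt -s nullglob
tee /home/user/dir/log/*.log </dev/null >/dev/null
A crontab entry such as 0 0 * * * /home/user/truncate-logs.sh would then run it daily at midnight.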
However, a much better solution is probably to use the utility logrotate, which is installed out of the box on every Debian (and thus also Ubuntu, Mint, etc.) installation. It runs nightly by default, and can be configured by dropping a file in its configuration directory. It lets you compress the previous version of a log file instead of just overwriting it, and takes care to preserve ownership, permissions, etc.
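Purely as an illustration (the filename and rotation policy below are assumptions, not something from the question), a drop-in file such as /etc/logrotate.d/userlogs might look like:
/home/user/dir/log/*.log {
    daily
    rotate 7
    compress
    missingok
    notifempty
    copytruncate
}
copytruncate copies the log and then truncates it in place, which matches the "empty, don't delete" requirement even when a process keeps the file open.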

How to find and remove partially transferred files after numerous failed rsync attempts

I have launched a few rsyncs over sshfs (sftp) that left temporary files behind.
Is there any way to clean up those files?
I don't want to run rsync with the --partial option, because there are many big files and it can take ages.
I tried to find them this way:
find -name ".*.??????"
and it finds some temporary files. But I'm not 100% sure if there are any files that are not discovered using this pattern.
Is this solution sufficient?
You could run rsync again with both the --delete and --dry-run options, and perhaps with --itemize-changes. This would show you a list of all the changes that would be made. Just take note of any deletions, ignoring changed files. Unless your files have odd names, it should be obvious what are rsync temp files left behind and what are not.
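A sketch of that check with placeholder paths (substitute your real source and destination):
rsync -avn --delete --itemize-changes /path/to/source/ /path/to/destination/
Lines flagged as deletions (they usually appear as *deleting entries in the itemized output) are files present only on the destination; rsync's leftover temp files will match the .NAME.XXXXXX pattern you already searched for.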

Compare two folders containing source files & hardlinks, remove orphaned files

I am looking for a way to compare two folders containing source files and hard links (let's use /media/store/download and /media/store/complete as an example) and then remove orphaned files that don't exist in both folders. These files may have been renamed and may be stored in subdirectories.
I'd like to set this up as a cron script to run regularly. I just can't figure out the logic of the script myself - could anyone be so kind as to help?
Many thanks
rsync can do what you want, using the --existing, --ignore-existing, and --delete options. You'll have to run it twice, once in each "direction" to clean orphans from both source and target directories.
rsync -avn --existing --ignore-existing --delete /media/store/download/ /media/store/complete
rsync -avn --existing --ignore-existing --delete /media/store/complete/ /media/store/download
--existing says don't copy orphan files
--ignore-existing says don't update existing files
--delete says delete orphans on target dir
The trailing slash on the source dir, and no trailing slash on the target dir, are mandatory for your task.
The 'n' in -avn means not to really do anything, and I always do a "dry run" with the -n option to make sure the command is going to do what I want, ESPECIALLY when using --delete. Once you're confident your command is correct, run it with just -av to actually do the work.
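So once the dry run looks right, the first command above becomes:
rsync -av --existing --ignore-existing --delete /media/store/download/ /media/store/complete
and the same change applies to the second direction.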
Perhaps rsync is of use?
Rsync is a fast and extraordinarily versatile file copying tool. It
can copy locally, to/from another host over any remote shell, or
to/from a remote rsync daemon. It offers a large number of options
that control every aspect of its behavior and permit very flexible
specification of the set of files to be copied. It is famous for its
delta-transfer algorithm, which reduces the amount of data sent over
the network by sending only the differences between the source files
and the existing files in the destination. Rsync is widely used for
backups and mirroring and as an improved copy command for everyday
use.
Note it has a --delete option
--delete delete extraneous files from dest dirs
which could help with your specific use case above.
You can also use the diff command to list all the files that differ between two folders.
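For example, with the paths from the question, a brief recursive comparison would be:
diff -qr /media/store/download /media/store/complete
Files that exist in only one of the two trees show up as "Only in ..." lines in the output.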

lftp mirroring directories that don't meet my criteria

I've been writing an lftp script that should mirror a remote directory to a local directory efficiently, possibly transferring multiple gigabyte files at a time.
One of the requirements is that a local user can delete the local file when it is no longer needed, and since I will have multiple "local" computers running this script, I don't want to delete the remote file until I know everyone who needs it, has it. So the script uses the --newer-than flag to only mirror files that are new/modified on the remote server since the last time the lftp script ran locally.
Here's the important bits of the script:
lftp -u $login,$pass $host << EOF
set ftp:ssl-allow yes
set ftp:ssl-protect-data yes
set ftp:ssl-protect-list yes
set ftp:ssl-force yes
set mirror:use-pget-n 5
mirror -X * -I share*/* --newer-than=/local/file/last.run --continue --parallel=5 $remote_dir $local_dir
quit
EOF
Note that the EOF isn't the actual end of the bash script.
So I EXCLUDE everything in $remote_dir except anything in the share/ directory (including the share/ directory itself) that is NEWER than the last.run file's timestamp.
This works as expected, except in one case: say I have another specifically named directory in share/ called shareWHATEVER/.
So share/shareWHATEVER/stuff.txt exists.
The first time it runs, shareWHATEVER/stuff.txt is copied from the remote to the local machine, and all is well.
If I delete the shareWHATEVER directory locally in its entirety, including stuff.txt, then the next time the script runs, stuff.txt is NOT mirrored, but shareWHATEVER is, even though the timestamps have not changed on the remote server.
So locally it looks like share/shareWHATEVER/ where the directory is empty.
Any idea why shareWHATEVER is being copied over even though neither its own timestamp nor any of its files' timestamps are --newer-than my local check?
Thanks.
Apparently, creating directories even when no files are copied is just the way lftp works (and the mirror option --no-empty-dirs doesn't change this behaviour).
You could discuss this in the lftp mailing list.

How can I recursively copy a directory into another and replace only the files that have changed?

I am looking to do a specific copy in Fedora.
I have two folders:
'webroot': holding ALL web files/images etc
'export': folder containing thousands of PHP, CSS, JS documents that are exported from my SVN repo.
The export directory contains many of the same files/folders that the root does; however, the root contains additional ones not found in export.
I'd like to merge all of the contents of export with my webroot, with the following options:
Overwriting the file in webroot if export's version contains different code than what is inside of webroot's version (live)
Preserve the permissions/users/groups of the file if it is overwritten (the export version replacing the live version) *NOTE I would like the webroot's permissions/ownership maintained, but with export's contents
No prompting/stopping of the copy of any kind (i.e. not verbose)
Recursive copy - obviously I would like to copy all files, folders and subfolders found in export
I've done a bit of research into cp - would this do the job?:
cp -pruf ./export /path/to/webroot
It might, but any time the corresponding files in export and webroot have the same content but different modification times, you'd wind up performing an unnecessary copy operation. You'd probably get slightly smarter behavior from rsync:
rsync -pr ./export /path/to/webroot
Besides, rsync can copy files from one host to another over an SSH connection, if you ever have a need to do that. Plus, it has a zillion options you can specify to tweak its behavior - look in the man page for details.
EDIT: with respect to your clarification about what you mean by preserving permissions: you'd probably want to leave off the -p option.
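Under that reading, the suggestion reduces to something like:
rsync -r ./export /path/to/webroot
the idea being that, without -p, files that already exist in webroot keep their current permissions while their contents are still updated from export.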
-u overwrites an existing file only if the destination is older than the source
-p preserves the permissions, ownership and timestamps
-f forces the copy by removing destination files that cannot be opened (cp is not verbose by default, so there is no verbosity to turn off)
-r makes the copy recursive
So it looks like you have all the correct args to cp
Sounds like a job for cpio (and hence, probably, GNU tar can do it too):
cd export
find . -print | cpio -pvdm /path/to/webroot
If you need owners preserved, you have to do it as root, of course. The -p option is 'pass mode', meaning copy between locations; -v is verbose (but not interactive; there's a difference); -d means create directories as necessary; -m means preserve modification time. By default, without the -u option, cpio won't overwrite files in the target area that are newer than the one from the source area.
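Since the answer mentions GNU tar, here is a rough sketch of an equivalent pipeline (an assumption on my part, not something spelled out in the answer); GNU tar's --keep-newer-files plays the role of cpio's default don't-overwrite-newer behaviour:
(cd export && tar cf - .) | tar xpf - -C /path/to/webroot --keep-newer-files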
