How to find and remove partially transferred files after numerous failed rsync attempts - linux

I have launched a few rsyncs over sshfs (SFTP) that left temporary files behind.
Is there any way to clean up those files?
I don't want to rerun rsync with the --partial option, because there are many big files and it can take ages.
I tried to find them this way:
find . -name ".*.??????"
and it finds some temporary files, but I'm not 100% sure this pattern discovers every leftover file.
Is this solution sufficient?

You could run rsync again with both the --delete and --dry-run options, and perhaps with --itemize-changes. This would show you a list of all the changes that would be made. Just take note of any deletions, ignoring changed files. Unless your files have odd names, it should be obvious which entries are leftover rsync temp files and which are not.
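A hedged sketch of that approach (the source and destination paths are placeholders for the ones from your failed transfers):

rsync -avn --delete --itemize-changes /path/to/source/ /path/to/destination/ | grep deleting

The -n (dry-run) flag means nothing is actually changed; the lines beginning with *deleting name the files rsync would remove, and any .name.XXXXXX entries among them are the leftover temp files.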

Related

Compare two folders containing source files & hardlinks, remove orphaned files

I am looking for a way to compare two folders containing source files and hard links (let's use /media/store/download and /media/store/complete as an example) and then remove orphaned files that don't exist in both folders. These files may have been renamed and may be stored in subdirectories.
I'd like to set this up as a cron script to run regularly. I just can't work out the logic of the script myself - could anyone be so kind as to help?
Many thanks
rsync can do what you want, using the --existing, --ignore-existing, and --delete options. You'll have to run it twice, once in each "direction" to clean orphans from both source and target directories.
rsync -avn --existing --ignore-existing --delete /media/store/download/ /media/store/complete
rsync -avn --existing --ignore-existing --delete /media/store/complete/ /media/store/download
--existing says don't copy orphan files
--ignore-existing says don't update existing files
--delete says delete orphans on target dir
The trailing slash on the source dir, and no trailing slash on the target dir, are mandatory for your task.
The 'n' in -avn means don't actually do anything, and I always do a "dry run" with the -n option to make sure the command is going to do what I want, ESPECIALLY when using --delete. Once you're confident your command is correct, run it with just -av to actually do the work, as shown below.
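Once the dry run looks right, the real commands are the same two as above with the n dropped:

rsync -av --existing --ignore-existing --delete /media/store/download/ /media/store/complete
rsync -av --existing --ignore-existing --delete /media/store/complete/ /media/store/download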
Perhaps rsync is of use?
Rsync is a fast and extraordinarily versatile file copying tool. It can copy locally, to/from another host over any remote shell, or to/from a remote rsync daemon. It offers a large number of options that control every aspect of its behavior and permit very flexible specification of the set of files to be copied. It is famous for its delta-transfer algorithm, which reduces the amount of data sent over the network by sending only the differences between the source files and the existing files in the destination. Rsync is widely used for backups and mirroring and as an improved copy command for everyday use.
Note it has a --delete option
--delete delete extraneous files from dest dirs
which could help with your specific use case above.
You can also use the diff command to list all files that differ between the two folders.
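For example, sticking with the directories from the question:

diff -rq /media/store/download /media/store/complete

Here -r recurses into subdirectories and -q only reports which files differ or exist on one side, rather than printing the differences themselves.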

rsync: copy files if the local file doesn't exist. Don't check file size, time, checksum, etc.

I am using rsync to back up a million images from my Linux server to my computer (Windows 7 using Cygwin).
The command I am using now is:
rsync -rt --quiet --rsh='ssh -p2200' root@X.X.X.X:/home/XXX/public_html/XXX /cygdrive/images
Whenever the process is interrupted and I start it again, it takes a long time to start the copying process.
I think it is checking each file for updates.
The images on my server won't change once they are created.
So, is there any faster way to run the command so that it copies files only if the local file doesn't exist, without checking file size, time, checksum, etc.?
Please suggest.
Thank you
Did you try this flag? It might help, but it might still take some time to resume the transfer:
--ignore-existing
This tells rsync to skip updating files that already exist on the destination (this does not ignore existing directories, or nothing would get done). See also --existing.
This option is a transfer rule, not an exclude, so it doesn't affect the data that goes into the file-lists, and thus it doesn't affect deletions. It just limits the files that the receiver requests to be transferred.
This option can be useful for those doing backups using the --link-dest option when they need to continue a backup run that got interrupted. Since a --link-dest run is copied into a new directory hierarchy (when it is used properly), using --ignore-existing will ensure that the already-handled files don't get tweaked (which avoids a change in permissions on the hard-linked files). This does mean that this option is only looking at the existing files in the destination hierarchy itself.

cronjob to remove files older than 99 days

I have to make a cron job to remove files older than 99 days from a particular directory, but I'm not sure the file names were made by trustworthy Linux users. I must expect special characters, spaces, slash characters, and others.
Here is what I think could work:
find /path/to/files -mtime +99 -exec rm {} \;
But I suspect this will fail if there are special characters, or if it finds a file that's read-only (cron may not run with superuser privileges). I need it to carry on if it meets such files.
When you use -exec rm {} \;, you shouldn't have any problems with spaces, tabs, returns, or special characters because find calls the rm command directly and passes it the name of each file one at a time.
Directories won't be removed with that command because you aren't passing the -r parameter, and you probably don't want to; that could end up being a bit dangerous. You might also want to include the -f parameter to force removal in case you don't have write permission. Run the cron script as root, and you should be fine.
The only thing I'd worry about is that you might end up hitting a file that you don't want to remove but that has not been modified in the past 100 days. For example, the password to stop the autodestruct sequence at your work. Chances are that file hasn't been modified in the past 100 days, but once that autodestruct sequence starts, you wouldn't want to be the one blamed because the password was lost.
Okay, a more reasonable example might be applications that are used but rarely modified. Maybe someone's resume that hasn't been updated because they are holding a current job, etc.
So, be careful with your assumptions. Just because a file hasn't been modified in 100 days doesn't mean it isn't used. A better criterion (although still questionable) is whether the file has been accessed in the last 100 days. Maybe use this as a final command:
find /path/to/files -atime +99 -type f -exec rm -f {} \;
One more thing...
Some find commands have a -delete parameter which can be used instead of the -exec rm parameter:
find /path/to/files -atime +99 -delete
That will delete found files as well as (empty) directories.
One more small recommendation: for the first week, save the files found to a log file instead of removing them, and then examine the log file. This way you make sure you're not deleting anything important. Once you're happy that there's nothing in the log you want to keep, revert the find command to do the delete for you.
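A minimal sketch of that log-first approach (the log path is just an example):

find /path/to/files -atime +99 -type f >> /var/log/old-files.log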
If you run rm with the -f option, your file is going to be deleted regardless of whether you have write permission on the file or not (all that matters is write permission on the containing folder). So either you can erase all the files in the folder, or none. Also add -r if you want to erase subfolders.
And I have to say it: be very careful! You're playing with fire ;) I suggest you debug with something less harmful, like the file command.
You can test this out by creating a bunch of files, e.g.:
touch {a,b,c,d,e,f}
and setting permissions as desired on each of them, as in the sketch below.
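A quick sketch of such a test setup (directory name hypothetical; file stands in for rm as the harmless command):

mkdir /tmp/findtest && cd /tmp/findtest
touch {a,b,c,d,e,f}
chmod a-w a b                      # make a couple of files read-only
find . -type f -exec file {} \;    # dry run: identifies each file instead of removing it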
You should use -execdir instead of -exec. Even better, read the full Security considerations for find chapter in the findutils manual.
Please, always use rm [opts] -- [files]; this will save you from errors with files named things like -rf, which would otherwise be parsed as options. The -- marks the end of the options, so everything after it is treated as a file name.
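Putting the pieces together, the full command might look like this (same placeholder path as above):

find /path/to/files -atime +99 -type f -exec rm -f -- {} \;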

rsync not synchronizing .htaccess file

I am trying to rsync directory A of server1 with directory B of server2.
Sitting in the directory A of server1, I ran the following commands.
rsync -av * server2::sharename/B
but the interesting thing is that it synchronizes all files and directories except .htaccess and the other hidden files in directory A. Hidden files within subdirectories do get synchronized.
I also tried the following command:
rsync -av --include=".htaccess" * server2::sharename/B
but the results are the same.
Any ideas why hidden files in directory A are not getting synchronized, and how to fix it? I am running as the root user.
thanks
This is due to the fact that * is by default expanded to all files in the current working directory except those whose names start with a dot. Thus, rsync never receives these files as arguments.
You can pass ., denoting the current working directory, to rsync:
rsync -av . server2::sharename/B
This way rsync will look for files to transfer in the current working directory, as opposed to looking for them in whatever * expands to.
Alternatively, you can use the following command to make * expand to all files including those which start with a dot:
shopt -s dotglob
See also shopt manpage.
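A quick sketch of the dotglob route (same command as in the question):

shopt -s dotglob                   # make * match dot-files too
rsync -av * server2::sharename/B
shopt -u dotglob                   # restore the default globbing behaviour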
For anyone who's just trying to sync directories between servers (including all hidden files) -- e.g., syncing somedirA on source-server to somedirB on a destination server -- try this:
rsync -avz -e ssh --progress user@source-server:/somedirA/ somedirB/
Note the slashes at the end of both paths. Any other syntax may lead to unexpected results!
Also, for me it's easiest to perform rsync commands from the destination server, because it's easier to make sure I've got proper write access (i.e., I might need to add sudo to the command above).
Probably goes without saying, but obviously your remote user also needs read access to somedirA on your source server. :)
I had the same issue.
For me, the hidden files did not get rsync'ed when I used the following command:
rsync -av /home/user1 server02:/home/user1
But when I added the slashes at the end of the paths, the hidden files were rsync'ed.
rsync -av /home/user1/ server02:/home/user1/
Note the slashes at the end of the paths, as Brian Lacy said the slashes are the key. I don't have the reputation to comment on his post or I would have done that.
I think the problem is due to shell wildcard expansion. Use . instead of *.
Consider the following example directory content
$ ls -a .
. .. .htaccess a.html z.js
The shell's wildcard expansion translates the argument list that the rsync program gets from
-av * server2::sharename/B
into
-av a.html z.js server2::sharename/B
before the command starts getting executed.
The *, as expanded by the shell, effectively tells rsync not to sync hidden files. You should omit it and use . instead.
On a related note, in case anyone comes in from Google etc. trying to work out why rsync is not copying hidden subfolders, I found one additional reason why this can happen and figured I'd pay it forward for the next person running into the same thing: the -C option (obviously --exclude would do it too, but that one's a bit easier to spot).
In my case, I had a script that was copying several folders across computers, including a directory with several git projects, and I noticed that I couldn't run any of the normal git commands in the copied repos (yes, normally one should use git clone, but this was part of a larger backup that included other things). Looking at the script, I found that it was calling rsync with 7 or 8 options.
After googling didn't turn up any obvious answers, I started going through the switches one by one. After dropping the -C option, it worked correctly. In the case of the script, the -C flag appears to have been added by mistake, likely because sftp was originally used and -C is a compression-related option for that tool.
Per man rsync, the option is described as:
--cvs-exclude, -C auto-ignore files in the same way CVS does
Since CVS is an older version control system, and given the man page description, it makes perfect sense that it would behave this way.
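A quick illustration of the difference (paths hypothetical). rsync's built-in CVS ignore list covers patterns such as .git/, *.o and *~, so the first command silently skips them:

rsync -av -C /src/projects/ /backup/projects/    # .git/ and friends are skipped
rsync -av /src/projects/ /backup/projects/       # everything is copied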

How can I recursively copy a directory into another and replace only the files that have not changed?

I am looking to do a specific copy in Fedora.
I have two folders:
'webroot': holding ALL web files/images etc
'export': folder containing thousands of PHP, CSS, JS documents that are exported from my SVN repo.
The export directory contains many of the same files/folders that the root does; however, the root contains additional ones not found in export.
I'd like to merge all of the contents of export with my webroot, with the following requirements:
Overwrite the file in webroot if export's version contains different code than webroot's (live) version
Preserve the permissions/users/groups of the file if it is overwritten (the export version replacing the live version). NOTE: I would like webroot's permissions/ownership maintained, but with export's contents
No prompting/stopping of the copy of any kind (i.e. not verbose)
Recursive copy - obviously I would like to copy all files, folders and subfolders found in export
I've done a bit of research into cp - would this do the job?:
cp -pruf ./export /path/to/webroot
It might, but any time the corresponding files in export and webroot have the same content but different modification times, you'd wind up performing an unnecessary copy operation. You'd probably get slightly smarter behavior from rsync:
rsync -pr ./export /path/to/webroot
Besides, rsync can copy files from one host to another over an SSH connection, if you ever need to do that. Plus, it has a zillion options you can specify to tweak its behavior; look in the man page for details.
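For example, copying the same tree to a remote host (user and hostname hypothetical):

rsync -pr ./export user@remotehost:/path/to/webroot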
EDIT: with respect to your clarification about what you mean by preserving permissions: you'd probably want to leave off the -p option.
-u overwrites an existing file only if the destination is older than the source
-p preserves the permissions and dates
-f forces the copy by removing destination files that cannot be opened
-r makes the copy recursive
So it looks like you have all the correct args to cp.
Sounds like a job for cpio (and hence, probably, GNU tar can do it too):
cd export
find . -print | cpio -pvdm /path/to/webroot
If you need owners preserved, you have to do it as root, of course. The -p option is 'pass mode', meaning copy between locations; -v is verbose (but not interactive; there's a difference); -d means create directories as necessary; -m means preserve modification times. By default, without the -u option, cpio won't overwrite files in the target area that are newer than the ones in the source area.
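A hedged sketch of the GNU tar equivalent hinted at above; --keep-newer-files gives roughly the same "don't clobber newer targets" behaviour as cpio without -u:

(cd export && tar cf - .) | (cd /path/to/webroot && tar xpf - --keep-newer-files)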
