lftp mirroring directories that don't meet my criteria - linux

I've been writing an lftp script that should mirror a remote directory to a local directory efficiently, possibly transferring multiple gigabyte files at a time.
One of the requirements is that a local user can delete the local file when it is no longer needed, and since I will have multiple "local" computers running this script, I don't want to delete the remote file until I know everyone who needs it, has it. So the script uses the --newer-than flag to only mirror files that are new/modified on the remote server since the last time the lftp script ran locally.
Here's the important bits of the script:
lftp -u $login,$pass $host << EOF
set ftp:ssl-allow yes
set ftp:ssl-protect-data yes
set ftp:ssl-protect-list yes
set ftp:ssl-force yes
set mirror:use-pget-n 5
mirror -X * -I share*/* --newer-than=/local/file/last.run --continue --parallel=5 $remote_dir $local_dir
quit
EOF
Note that the EOF isn't the actual end of the bash script.
So I EXCLUDE everything in $remote_dir except anything in the share/ directory, including the share/ directory itself that are NEWER than the last.run file's timestamp.
This works as expected except in one case where say I have another specifically named directory in share/ called shareWHATEVER/
So share/shareWHATEVER/stuff.txt exists.
The first time it runs, shareWHATEVER/stuff.txt is copied from the remote server to the local machine, and all is well.
If I delete the shareWHATEVER directory locally in its entirety, including stuff.txt, then the next time the script runs, stuff.txt is NOT mirrored, but shareWHATEVER is, even though the timestamps have not changed on the remote server.
So locally it looks like share/shareWHATEVER/ where the directory is empty.
Any idea why shareWHATEVER is being copied over even though neither its own timestamp nor any of its files' timestamps are --newer-than my local check?
Thanks.

Apparently, creating directories even when no files are copied is just the way lftp works (and the mirror option --no-empty-dirs doesn't change this behaviour).
You could discuss this in the lftp mailing list.
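One possible workaround, which is not something lftp provides, is to remove the empty directories after the mirror run. This is a minimal sketch, assuming a find that supports -empty and -delete (GNU and BSD find both do). prune_empty_dirs is a hypothetical helper name, and it would also remove empty directories that a local user created on purpose.

```shell
#!/usr/bin/env bash
# Hypothetical helper: remove empty directories left under a mirror root.
# -mindepth 1 keeps the root itself. -delete implies depth-first traversal,
# so a directory that held only empty directories is removed as well.
prune_empty_dirs() {
    local root=$1
    [ -d "$root" ] || return 1
    find "$root" -mindepth 1 -type d -empty -delete
}
```

In the script from the question, this could be called as prune_empty_dirs "$local_dir" right after the lftp here-document finishes.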

Related

copy directory from another computer on Linux

On a computer with IP address like 10.11.12.123, I have a folder document. I want to copy that folder to my local folder /home/my-pc/doc/ using the shell.
I tried like this:
scp -r smb:10.11.12.123/other-pc/document /home/my-pc/doc/
but it's not working.
You can use the command below to copy your files.
scp -r <source> <destination>
(-r: Recursively copy entire directories)
eg:
scp -r user@10.11.12.123:/other-pc/document /home/my-pc/doc
To identify the location you can use the pwd command, eg:
kasun@kasunr:~/Downloads$ pwd
/home/kasun/Downloads
If you are logged into B and want to copy from B to A:
scp /source username@a:/destination
If you are logged into A and want to copy from B to A:
scp username@b:/source /destination
In addition to the comment, when you look at your host-to-host copy options on Linux today, rsync is by far the most robust solution around. It is brought to you by the Samba team[1] and continues to enjoy active development. Most distributions include the rsync package by default (if not, you should find an easily installable package for your distro, or you can download it from rsync.samba.org).
The basic use of rsync for host-to-host directory copy is:
$ rsync -uav srchost:/path/to/dir /path/to/dest
The options -uav break down as follows: -a (archive) copies recursively while preserving file and directory times and permissions, -u skips files that are already newer on the destination, so only new or changed files are copied, and -v gives verbose output. You will be prompted for the username/password on 10.11.12.123 unless you have set up SSH keys to allow public/private key authentication (see ssh-keygen for key generation).
If you notice, the syntax is basically the same as that for scp, with a slight difference in the options (e.g. scp -rv srchost:/path/to/dir /path/to/dest). rsync will use ssh for secure transport by default, so you will want to ensure sshd is running on your srchost (10.11.12.123 in your case). If you have name resolution working (or a simple entry in /etc/hosts for 10.11.12.123) you can use the hostname for the remote host instead of the remote IP. Regardless, you can always transfer the files you are interested in with:
$ rsync -uav 10.11.12.123:/other-pc/document /home/my-pc/doc/
Note: do NOT include a trailing / after document if you want to copy the directory itself. If you do include a trailing / after document (i.e. 10.11.12.123:/other-pc/document/), you are telling rsync to copy the contents of document (i.e. the files and directories under it) into /home/my-pc/doc/ without also copying the document directory.
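The trailing-slash rule can be checked safely with local paths before pointing rsync at a remote host. This is a minimal sketch, assuming rsync is installed. demo_trailing_slash and the directory names with_dir and contents_only are hypothetical and only serve the illustration.

```shell
#!/usr/bin/env bash
# Hypothetical demo: compare both source forms using local directories only.
demo_trailing_slash() {
    local work=$1
    mkdir -p "$work/src/document" "$work/with_dir" "$work/contents_only"
    echo hello > "$work/src/document/a.txt"
    # No trailing slash: the directory itself is copied,
    # giving with_dir/document/a.txt
    rsync -a "$work/src/document" "$work/with_dir/"
    # Trailing slash: only the contents are copied,
    # giving contents_only/a.txt
    rsync -a "$work/src/document/" "$work/contents_only/"
}
```

Running demo_trailing_slash "$(mktemp -d)" and listing the two destination directories shows the difference.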
The reason rsync is far superior to other copy apps is it provides options to truly synchronize filesystems and directory trees both locally and between your local machine and remote host. Meaning, in your case, if you have used rsync to transfer files to /home/my-pc/doc/ and then make changes to the files or delete files on 10.11.12.123, you can simply call rsync again and have the changes/deletions reflected in /home/my-pc/doc/. (look at the several flavors of the --delete option for details in rsync --help or in man 1 rsync)
For these, and many more reasons, it is well worth the time to familiarize yourself with rsync. It is an invaluable tool in any Linux user's hip pocket. Hopefully this will solve your problem and get you started.
Footnotes
[1] the same folks that "Opened Windows to a Wider World", allowing seamless connection between Windows/Linux hosts via the native Windows Server Message Block (SMB) protocol. samba.org
If the two directories you mentioned (document and /home/my-pc/doc/) are on the same machine (10.11.12.123), then:
cp -ai document /home/my-pc/doc/
Otherwise:
scp -r document/ root@10.11.12.123:/home/my-pc/doc/

rsync : copy files if local file doesn't exist. Don't check filesize, time, checksum etc

I am using rsync to backup a million images from my linux server to my computer (windows 7 using Cygwin).
The command I am using now is :
rsync -rt --quiet --rsh='ssh -p2200' root@X.X.X.X:/home/XXX/public_html/XXX /cygdrive/images
Whenever the process is interrupted, and I start it again, it takes long time to start the copying process.
I think it is checking each file if there is any update.
The images on my server won't change once they are created.
So, is there any faster way to run the command so that it may copy files if local file doesn't exist without checking filesize, time, checksum etc...
Please suggest.
Thank you
Did you try this flag? It might help, but it might still take some time to resume the transfer:
--ignore-existing
This tells rsync to skip updating files that already exist on the destination (this does not ignore existing directories, or nothing would get done). See also --existing.
This option is a transfer rule, not an exclude, so it doesn't affect the data that goes into the file-lists, and thus it doesn't affect deletions. It just limits the files that the receiver requests to be transferred.
This option can be useful for those doing backups using the --link-dest option when they need to continue a backup run that got interrupted. Since a --link-dest run is copied into a new directory hierarchy (when it is used properly), using --ignore-existing will ensure that the already-handled files don't get tweaked (which avoids a change in permissions on the hard-linked files). This does mean that this option is only looking at the existing files in the destination hierarchy itself.
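The effect of --ignore-existing can be seen on a small local tree. This is a minimal sketch, assuming rsync is installed. demo_ignore_existing and the file names are hypothetical. It only shows that existing destination files are left alone, and says nothing about how long the file-list phase takes on a million files.

```shell
#!/usr/bin/env bash
# Hypothetical demo: local-only illustration of --ignore-existing.
demo_ignore_existing() {
    local work=$1
    mkdir -p "$work/src" "$work/dest"
    echo "remote copy" > "$work/src/old.jpg"
    echo "remote copy" > "$work/src/new.jpg"
    # Pretend an interrupted run already delivered old.jpg
    # (the content differs on purpose so a re-transfer would be visible)
    echo "local copy" > "$work/dest/old.jpg"
    # Files already present in dest are skipped regardless of size or mtime
    rsync -rt --ignore-existing "$work/src/" "$work/dest/"
}
```

After the call, dest/old.jpg still holds the local content and dest/new.jpg has been transferred.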

Keep files updated from remote server

I have a server at hostname.com/files. Whenever a file has been uploaded I want to download it.
I was thinking of creating a script that constantly checked the files directory. It would check the timestamp of the files on the server and download them based on that.
Is it possible to check the files timestamp using a bash script? Are there better ways of doing this?
I could just download all the files in the server every 1 hour. Would it therefore be better to use a cron job?
If you have a regular interval at which you'd like to update your files, yes, a cron job is probably your best bet. Just write a script that does the checking and run that at an hourly interval.
As @Barmar commented above, rsync could be another option. Put something like this in the crontab and you should be set:
# min hour day month day-of-week user command
17 * * * * user rsync -av http://hostname.com/ >> rsync.log
would grab files from the server in that location and append the details to rsync.log on the 17th minute of every hour. Right now, though, I can't seem to get rsync to get files from a webserver.
Another option using wget is:
wget -Nrb -np -o wget.log http://hostname.com/
where -N re-downloads only files newer than the timestamp on the local version, -b sends the process to the background, -r recurses into directories and -o specifies a log file. This works from an arbitrary web server. -np makes sure wget doesn't ascend into a parent directory, which could otherwise end up spidering the entire server's content.
More details, as usual, will be in the man pages of rsync or wget.

rsync not synchronizing .htaccess file

I am trying to rsync directory A of server1 with directory B of server2.
Sitting in the directory A of server1, I ran the following commands.
rsync -av * server2::sharename/B
but the interesting thing is, it synchronizes all files and directories except .htaccess or any hidden file in the directory A. Any hidden files within subdirectories get synchronized.
I also tried the following command:
rsync -av --include=".htaccess" * server2::sharename/B
but the results are the same.
Any ideas why hidden files of A directory are not getting synchronized and how to fix it. I am running as root user.
thanks
This is due to the fact that * is by default expanded to all files in the current working directory except the files whose name starts with a dot. Thus, rsync never receives these files as arguments.
You can pass . denoting current working directory to rsync:
rsync -av . server2::sharename/B
This way rsync will look for files to transfer in the current working directory as opposed to looking for them in what * expands to.
Alternatively, you can use the following command to make * expand to all files including those which start with a dot:
shopt -s dotglob
See also the shopt entry in the bash man page.
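The difference is easy to observe by counting what * expands to with and without dotglob. This is a minimal sketch that assumes bash, since shopt is a bash builtin. count_matches and count_matches_dotglob are hypothetical helper names.

```shell
#!/usr/bin/env bash
# Both helpers run in a subshell (parentheses as the function body), so the
# cd and shopt calls do not leak into the calling shell.

# Print how many names "*" expands to inside the given directory.
count_matches() (
    cd "$1" || exit 1
    shopt -s nullglob          # an empty directory expands to nothing
    set -- *
    echo "$#"
)

# Same, but with dotglob so names starting with a dot are included.
# "." and ".." are never matched by "*", even with dotglob.
count_matches_dotglob() (
    cd "$1" || exit 1
    shopt -s nullglob dotglob
    set -- *
    echo "$#"
)
```

For a directory holding .htaccess, a.html and z.js, the first helper prints 2 and the second prints 3, which matches what rsync would receive as arguments in each case.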
For anyone who's just trying to sync directories between servers (including all hidden files) -- e.g., syncing somedirA on source-server to somedirB on a destination server -- try this:
rsync -avz -e ssh --progress user@source-server:/somedirA/ somedirB/
Note the slashes at the end of both paths. Any other syntax may lead to unexpected results!
Also, for me it's easiest to perform rsync commands from the destination server, because it's easier to make sure I've got proper write access (i.e., I might need to add sudo to the command above).
Probably goes without saying, but obviously your remote user also needs read access to somedirA on your source server. :)
I had the same issue.
For me when I did the following command the hidden files did not get rsync'ed
rsync -av /home/user1 server02:/home/user1
But when I added the slashes at the end of the paths, the hidden files were rsync'ed.
rsync -av /home/user1/ server02:/home/user1/
Note the slashes at the end of the paths, as Brian Lacy said the slashes are the key. I don't have the reputation to comment on his post or I would have done that.
I think the problem is due to shell wildcard expansion. Use . instead of star.
Consider the following example directory content
$ ls -a .
. .. .htaccess a.html z.js
The shell's wildcard expansion translates the argument list that the rsync program gets from
-av * server2::sharename/B
into
-av a.html z.js server2::sharename/B
before the command starts getting executed.
The * is what keeps rsync from syncing the hidden files, because the shell never passes them to it. You should omit it and use . instead.
On a related note, in case anyone is coming in from Google etc. trying to find out why rsync is not copying hidden subfolders, I found one additional reason why this can happen and figured I'd pay it forward for the next guy running into the same thing: if you are using the -C option (obviously --exclude would do it too, but I figure that one's a bit easier to spot).
In my case, I had a script that was copying several folders across computers, including a directory with several git projects, and I noticed that I couldn't run any of the normal git commands in the copied repos (yes, normally one should use git clone, but this was part of a larger backup that included other things). After looking at the script, I found that it was calling rsync with 7 or 8 options.
After googling didn't turn up any obvious answers, I started going through the switches one by one. After dropping the -C option, it worked correctly. In the case of the script, the -C flag appears to have been added as a mistake, likely because sftp was originally used and -C is a compression-related option under that tool.
per man rsync, the option is described as
--cvs-exclude, -C auto-ignore files in the same way CVS does
Since CVS is an older version control system, and given the man page description, it makes perfect sense that it would behave this way.

Keep Remote Directory Up-to-date

I absolutely love the Keep Remote Directory Up-to-date feature in Winscp. Unfortunately, I can't find anything as simple to use in OS X or Linux. I know the same thing can theoretically be accomplished using changedfiles or rsync, but I've always found the tutorials for both tools to be lacking and/or contradictory.
I basically just need a tool that works in OSX or Linux and keeps a remote directory in sync (mirrored) with a local directory while I make changes to the local directory.
Update
Looking through the solutions, I see a couple which solve the general problem of keeping a remote directory in sync with a local directory manually. I know that I can set a cron task to run rsync every minute, and this should be fairly close to real time.
This is not the exact solution I was looking for as winscp does this and more: it detects file changes in a directory (while I work on them) and then automatically pushes the changes to the remote server. I know this is not the best solution (no code repository), but it allows me to very quickly test code on a server while I develop it. Does anyone know how to combine rsync with any other commands to get this functionality?
lsyncd seems to be the perfect solution. It combines inotify (a kernel built-in facility that watches for file changes in directory trees) and rsync (a cross-platform file-syncing tool).
lsyncd -rsyncssh /home remotehost.org backup-home/
Quote from github:
Lsyncd watches a local directory trees event monitor interface (inotify or fsevents). It aggregates and combines events for a few seconds and then spawns one (or more) process(es) to synchronize the changes. By default this is rsync. Lsyncd is thus a light-weight live mirror solution that is comparatively easy to install not requiring new filesystems or blockdevices and does not hamper local filesystem performance.
How "real-time" do you want the syncing? I would still lean toward rsync since you know it is going to be fully supported on both platforms (Windows, too, with cygwin) and you can run it via a cron job. I have a super-simple bash file that I run on my system (this does not remove old files):
#!/bin/sh
rsync -avrz --progress --exclude-from .rsync_exclude_remote . remote_login@remote_computer:remote_dir
# options
# -a archive
# -v verbose
# -r recursive
# -z compress
Your best bet is to set it up and try it out. The -n (--dry-run) option is your friend!
Keep in mind that rsync (at least in cygwin) does not support unicode file names (as of 16 Aug 2008).
What you want to do for linux remote access is use 'sshfs' - the SSH File System.
# sshfs username@host:path/to/directory local_dir
Then treat it like an network mount, which it is...
A bit more detail, like how to set it up so you can do this as a regular user, on my blog
If you want the asynchronous behavior of winSCP, you'll want to use rsync combined with something that executes it periodically. The cron solution above works, but may be overkill for the winscp use case.
The following command will execute rsync every 5 seconds to push content to the remote host. You can adjust the sleep time as needed to reduce server load.
# while true; do rsync -avrz localdir user@host:path; sleep 5; done
If you have a very large directory structure and need to reduce the overhead of the polling, you can use 'find':
# touch -d 01/01/1970 last; while true; do if [ "`find localdir -newer last -print -quit`" ]; then touch last; rsync -avrz localdir user@host:path; else echo -ne .; fi; sleep 5; done
And I said cron may be overkill? But at least this is all just done from the command line, and can be stopped via a ctrl-C.
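The find-based one-liner above can be unpacked into two small functions, which makes the marker-file logic easier to follow. This is a minimal sketch, not the answerer's exact method. changed_since and sync_if_changed are hypothetical names, -quit assumes GNU or BSD find, and unlike the one-liner the marker is only advanced when the sync command succeeds.

```shell
#!/usr/bin/env bash
# changed_since MARKER DIR: succeed if anything under DIR is newer than MARKER.
changed_since() {
    local marker=$1 dir=$2
    [ -e "$marker" ] || return 0      # no marker yet: treat as changed
    [ -n "$(find "$dir" -newer "$marker" -print -quit)" ]
}

# sync_if_changed MARKER DIR CMD...: run CMD only when DIR has changed.
# The marker is set to a timestamp taken before CMD started, and only if
# CMD succeeded, so a failed sync is retried on the next round.
sync_if_changed() {
    local marker=$1 dir=$2 stamp rc
    shift 2
    changed_since "$marker" "$dir" || return 0
    stamp=$(mktemp) || return 1
    if "$@"; then
        touch -r "$stamp" "$marker"
        rm -f "$stamp"
    else
        rc=$?
        rm -f "$stamp"
        return "$rc"
    fi
}
```

A polling loop would then call something like sync_if_changed last localdir rsync -avrz localdir user@host:path followed by sleep 5.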
kb
To detect changed files, you could try fam (file alteration monitor) or inotify. The latter is linux-specific, fam has a bsd port which might work on OS X. Both have userspace tools that could be used in a script together with rsync.
I have the same issue. I loved winscp "keep remote directory up to date" command. However, in my quest to rid myself of Windows, I lost winscp. I did write a script that uses fileschanged and rsync to do something similar much closer to real time.
How to use:
Make sure you have fileschanged installed
Save this script in /usr/local/bin/livesync or somewhere reachable in your $PATH and make it executable
Use Nautilus to connect to the remote host (sftp or ftp)
Run this script by doing livesync SOURCE DEST
The DEST directory will be in /home/[username]/.gvfs/[path to ftp scp or whatever]
A Couple downsides:
It is slower than winscp (my guess is because it goes through Nautilus and has to detect changes through rsync as well)
You have to manually create the destination directory if it doesn't already exist. So if you're adding a directory, it won't detect and create the directory on the DEST side.
Probably more that I haven't noticed yet
Also, do not attempt to synchronize a SRC directory named "rsyncThis". That will probably not be good :)
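#!/bin/sh
# Read changed file paths from stdin and rsync each one to the destination.
upload_files()
{
    if [ "$HOMEDIR" = "." ]
    then
        HOMEDIR=`pwd`
    fi
    # IFS= and -r keep leading spaces and backslashes in file names intact
    while IFS= read -r input
    do
        # Strip the source prefix to get the path relative to HOMEDIR
        SYNCFILE=${input#"$HOMEDIR"}
        # printf instead of echo -n, which is not portable under /bin/sh
        printf 'Sync File: %s...' "$SYNCFILE"
        rsync -Cvz --temp-dir="$REMOTEDIR" "$HOMEDIR/$SYNCFILE" "$REMOTEDIR/$SYNCFILE" > /dev/null
        echo "Done."
    done
}
help()
{
    echo "Live rsync copy from one directory to another. This will overwrite the existing files on DEST."
    echo "Usage: $0 SOURCE DEST"
}
case "$1" in
    rsyncThis)
        # Internal mode: the script re-invokes itself with this keyword
        HOMEDIR=$2
        REMOTEDIR=$3
        echo "HOMEDIR=$HOMEDIR"
        echo "REMOTEDIR=$REMOTEDIR"
        upload_files
        ;;
    help)
        help
        ;;
    *)
        if [ -n "$1" ] && [ -n "$2" ]
        then
            # Pipe the list of changed files into the rsyncThis mode
            fileschanged -r "$1" | "$0" rsyncThis "$1" "$2"
        else
            help
        fi
        ;;
esac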
You could always use version control, like SVN, so all you have to do is have the server run svn up on a folder every night. This runs into security issues if you are sharing your files publicly, but it works.
If you are using Linux though, learn to use rsync. It's really not that difficult as you can test every command with -n. Go through the man page, the basic format you will want is
rsync [OPTION...] SRC... [USER@]HOST:DEST
the command I run from my school server to my home backup machine is this
rsync -avi --delete ~ me@homeserv:~/School/ >> BackupLog.txt
This takes all of the files in my home directory (~) and uses rsync's archive mode (-a), verbosely (-v), lists all of the changes made (-i), while deleting any files that don't exist anymore (--delete), and puts them in the folder /home/me/School/ on my remote server. All of the information it prints out (what was copied, what was deleted, etc.) is also appended to the file BackupLog.txt.
I know that's a whirlwind tour of rsync, but I hope it helps.
The rsync solutions are really good, especially if you're only pushing changes one way. Another great tool is unison -- it attempts to synchronize changes in both directions. Read more at the Unison homepage.
Great question, I searched for an answer for hours!
I tested lsyncd, and the problem is that the default delay is far too long and no example command line shows the -delay option.
The other problem is that by default rsync asks for the password each time!
Solution with lsyncd:
lsyncd --nodaemon -rsyncssh local_dir remote_user@remote_host remote_dir -delay .2
Another way is to use inotifywait in a script:
while inotifywait -r -e modify,create,delete local_dir ; do
    # if you need to, you can add a wait here
    rsync -avz local_dir remote_user@remote_host:remote_dir
done
For this second solution you will have to install inotify-tools package
To suppress the need to enter password at each change, simply use ssh-keygen :
https://superuser.com/a/555800/510714
It seems like perhaps you're solving the wrong problem. If you're trying to edit files on a remote computer then you might try using something like the ftp plugin for jedit. http://plugins.jedit.org/plugins/?FTP This ensures that you have only one version of the file so it can't ever be out of sync.
Building off of icco's suggestion of SVN, I'd actually suggest that if you are using subversion or similar for source control (and if you aren't, you should probably start) you can keep the production environment up to date by putting the command to update the repository into the post-commit hook.
There are a lot of variables in how you'd want to do that, but what I've seen work is have the development or live site be a working copy and then have the post-commit use an ssh key with a forced command to log into the remote site and trigger an svn up on the working copy. Alternatively in the post-commit hook you could trigger an svn export on the remote machine, or a local (to the svn repository) svn export and then an rsync to the remote machine.
I would be worried about things that detect changes and push them, and I'd even be worried about things that ran every minute, just because of race conditions. How do you know it's not going to transfer the file at the very same instant it's being written to? Stumble across that once or twice and you'll lose all of the time-saving advantage you had by constantly rsyncing or similar.
Will DropBox (http://www.getdropbox.com/) do what you want?
Use watcher.py and rsync to automate this. Read the step-by-step instructions here:
http://kushellig.de/linux-file-auto-sync-directories/
I used to have the same setup under Windows as you, that is a local filetree (versioned) and a test environment on a remote server, which I kept mirrored in realtime with WinSCP. When I switched to Mac I had to do quite some digging before I was happy, but finally ended up using:
SmartSVN as my subversion client
Sublime Text 2 as my editor (already used it on Windows)
SFTP-plugin to ST2 which handles the uploading on save (sorry, can't post more than 2 links)
I can really recommend this setup, hope it helps!
I have been using WinSCP on Wine for a few years now and it works fine for the syncing operations you mention.
Here are some instructions I posted to Github on how to setup via wine: WinSCP_On_Wine
Just be aware that WinSCP is not being actively tested on Wine, so there may be some quirky issues. However, I use it daily on Ubuntu 20.04 for all my devops and have never lost a file and rarely experience any such quirks.
You can also use Fetch as an SFTP client, and then edit files directly on the server from within that. There are also SSHFS (mount an ssh folder as a Volume) options. This is in line with what stimms said - are you sure you want stuff kept in sync, or just want to edit files on the server?
OS X has its own file notifications system - this is what Spotlight is based upon. I haven't heard of any program that uses this to then keep things in sync, but it's certainly conceivable.
I personally use RCS for this type of thing:- whilst it's got a manual aspect, it's unlikely I want to push something to even the test server from my dev machine without testing it first. And if I am working on a development server, then I use one of the options given above.
Well, I had the same kind of problem and it is possible using these together: rsync, SSH Passwordless Login, Watchdog (a Python sync utility) and Terminal Notifier (an OS X notification utility made with Ruby. Not needed, but helps to know when the sync has finished).
1. I created the key for passwordless login using this tutorial from the Dreamhost wiki: http://cl.ly/MIw5
1.1. When you finish, test that everything is OK. If passwordless login doesn't work, you may have to try an afp mount instead. Dreamhost (where my site is) does not allow afp mount, but allows passwordless login. In Terminal, type:
ssh username@host.com
You should log in without being asked for a password :P
2. I installed the Terminal Notifier from the Github page: http://cl.ly/MJ5x
2.1. I used the Gem installer command. In Terminal, type:
gem install terminal-notifier
2.2. Test if the notification works. In Terminal, type:
terminal-notifier -message "Starting sync"
3. Create a sh script to test the rsync + notification. Save it anywhere you like, with the name you like. In this example, I'll call it ~/Scripts/sync.sh. I used the ".sh" extension, but I don't know if it's needed.
#!/bin/bash
terminal-notifier -message "Starting sync"
rsync -azP ~/Sites/folder/ user@host.com:site_folder/
terminal-notifier -message "Sync has finished"
3.1. Remember to give execution permission to this sh script. In Terminal, type:
chmod +x ~/Scripts/sync.sh
3.2. Run the script and verify that the messages are displayed correctly and that rsync actually syncs your local folder with the remote folder.
4. Finally, I downloaded and installed Watchdog from the Github page: http://cl.ly/MJfb
4.1. First, I installed the libyaml dependency using Brew (there is lots of help on how to install Brew, which is like an "aptitude" for OS X). In Terminal, type:
brew install libyaml
4.2. Then, I used the easy_install command. Go to the Watchdog folder, and type in Terminal:
easy_install watchdog
5. Now, everything is installed! Go to the folder you want to be synced, change this code to your needs, and type in Terminal:
watchmedo shell-command \
    --patterns="*.php;*.txt;*.js;*.css" \
    --recursive \
    --command='~/Scripts/sync.sh' \
    .
It has to be EXACTLY this way, with the slashes and line breaks, so you'll have to copy these lines to a text editor, change the script, paste in terminal and press return.
I tried without the line breaks, and it doesn't work!
In my Mac, I always get an error, but it doesn't seem to affect anything:
/Library/Python/2.7/site-packages/argh-0.22.0-py2.7.egg/argh/completion.py:84: UserWarning: Bash completion not available. Install argcomplete.
Now, make some changes in a file inside the folder, and watch the magic!
I'm using this little Ruby-Script:
#!/usr/bin/env ruby
#~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
# Rsyncs 2Folders
#
# watchAndSync by Mike Mitterer, 2014 <http://www.MikeMitterer.at>
# with credit to Brett Terpstra <http://brettterpstra.com>
# and Carlo Zottmann <https://github.com/carlo/haml-sass-file-watcher>
# Found link on: http://brettterpstra.com/2011/03/07/watch-for-file-changes-and-refresh-your-browser-automatically/
#
trap("SIGINT") { exit }
if ARGV.length < 2
  puts "Usage: #{$0} watch_folder sync_folder"
  puts "Example: #{$0} web keepInSync"
  exit
end
dev_extension = 'dev'
filetypes = ['css','html','htm','less','js', 'dart']
watch_folder = ARGV[0]
sync_folder = ARGV[1]
puts "Watching #{watch_folder} and subfolders for changes in project files..."
puts "Syncing with #{sync_folder}..."
while true do
  files = []
  filetypes.each {|type|
    files += Dir.glob( File.join( watch_folder, "**", "*.#{type}" ) )
  }
  new_hash = files.collect {|f| [ f, File.stat(f).mtime.to_i ] }
  hash ||= new_hash
  diff_hash = new_hash - hash
  unless diff_hash.empty?
    hash = new_hash
    diff_hash.each do |df|
      puts "Detected change in #{df[0]}, syncing..."
      system("rsync -avzh #{watch_folder} #{sync_folder}")
    end
  end
  sleep 1
end
Adapt it for your needs!
If you are developing Python on a remote server, PyCharm may be a good choice for you. You can synchronize your remote files with your local files using PyCharm's remote development feature. See the guide:
https://www.jetbrains.com/help/pycharm/creating-a-remote-server-configuration.html
