Rsync files across a dodgy network link - hangs instead of timeout - linux

I am trying to keep 3 large directories (9G, 400G, 800G) in sync between our home site and another in a land far, far away across a network link that is a bit dodgy (slow and drops occasionally). Data was copied onto disks prior to installation so the rsync only needs to send updates.
The problem I'm having is the rsync hangs for hours on the client side.
The smaller 9G job completed, the 400G job has been in limbo for 15 hours - no output to the log file in that time, but has not timed out.
What I've done to set up for this (after reading many forum articles about rsync, rsync servers, and partial transfers, since I am not really a system admin):
I set up an rsync server (/etc/rsyncd.conf) on our home system, entered it into xinetd, and wrote a script to run rsync on the distant server; the script loops if rsync fails, in an attempt to deal with the dodgy network. The rsync command in the script looks like this:
rsync -avzAXP --append root@homesys01::tools /disk1/tools
Note the "-P" option is equivalent to "--progress --partial"
I can see in the log file that rsync did fail at one point and the loop restarted it; data was transferred after that, based on entries in the log file, but the last update to the log file was 15 hours ago and the rsync process on the client is still running.
CNT=0
while [ 1 ]
do
    rsync -avzAXP --append root@homesys01::tools /disk1/tools
    STATUS=$?
    if [ $STATUS -eq 0 ] ; then
        echo "Successful completion of tools rsync."
        exit 0
    else
        CNT=`expr ${CNT} + 1`
        echo " Rsync of tools failure. Status returned: ${STATUS}"
        echo " Backing off and retrying(${CNT})..."
        sleep 180
    fi
done
So I expected these jobs to take a long time; I expected to see the occasional failure message in the log files (which I have) and to see rsync restart (which it has). I was not expecting rsync to just hang for 15 hours or more with no progress and no timeout error.
Is there a way to tell if rsync on the client is hung versus dealing with the dodgy network?
I set no timeout in the /etc/rsyncd.conf file. Should I, and how do I determine a reasonable timeout setting?
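For reference, the daemon-side timeout is a per-module setting in /etc/rsyncd.conf; a minimal sketch, assuming the tools module (the path is a placeholder):
[tools]
    path = /export/tools
    # drop a connection after 5 minutes with no I/O; 0 (the default) means never time out
    timeout = 300
A value a few times larger than the longest quiet period you expect between transfers (say 300-600 seconds) is a reasonable starting point.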
I set rsync up to be available through xinetd, but I don't always see the "rsync --daemon" process running. It restarts if I run rsync from the remote system, but shouldn't it always be running?
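For what it's worth, when rsync is run from xinetd it is normal not to see a persistent "rsync --daemon" process: xinetd listens on the rsync port and spawns rsync only when a client connects. A typical /etc/xinetd.d/rsync looks roughly like this (details vary by distribution):
service rsync
{
    disable         = no
    socket_type     = stream
    wait            = no
    user            = root
    server          = /usr/bin/rsync
    server_args     = --daemon
    log_on_failure  += USERID
}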
Any guidance or suggestions would be appreciated.

To tell whether the rsync client is making progress, keep the verbose option and write the output to a log file.
Change this line:
rsync -avzAXP --append root@homesys01::tools /disk1/tools
to:
rsync -avzAXP --append root@homesys01::tools /disk1/tools >> /tmp/rsync.log.`date +%F`
This produces one log file per day under the /tmp directory.
You can then use the tail -f command to follow the most recent log file;
if it is still rolling, rsync is still working.
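For example, to follow today's log file produced by the command above:
tail -f /tmp/rsync.log.`date +%F`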
See also
rsync - what means the f+++++++++ on rsync logs?
to understand more about the log output.

I thought I would post my final solution, in case it can help anyone else. I added --timeout 300 and --append-verify. The timeout eliminates the case of rsync hanging indefinitely; the loop will restart it after the timeout. The --append-verify option is necessary to have rsync verify any partial file it previously updated.
Note the following code is in a shell script and the output is redirected to a log file.
CNT=0
while [ 1 ]
do
    rsync -avzAXP --append-verify --timeout 300 root@homesys01::tools /disk1/tools
    STATUS=$?
    if [ $STATUS -eq 0 ] ; then
        echo "Successful completion of tools rsync."
        exit 0
    else
        CNT=`expr ${CNT} + 1`
        echo " Rsync of tools failure. Status returned: ${STATUS}"
        echo " Backing off and retrying(${CNT})..."
        sleep 180
    fi
done

Related

rsync between a master and slave server to a remote repository storage

I am putting together a backup/archive mechanism using rsync.
My situation is that I have two servers: one is the primary production server, the other is a passive failover. I have rsync archiving running on a single system, the primary, and it is working as expected. I also have the same scripts and cron setup on the failover server; however, at this time I have the cron on the failover commented out so it doesn't execute. I would like to not have to deal with switching over as a manual operation.
Both servers rsync the exact same locations, just on different servers.
I am worried that if I let the failover rsync run as scheduled on the failover box, it will somehow destroy or corrupt the archives. I'm not sure my concern is justified. My thinking is: if the failover cron kicks off, runs the rsync archive scripts, and there are no files to be archived, nothing will happen; if the failover server does have files to sync, those files will simply be written to the archive by rsync and build on the incremental backup, so essentially it is safe. I'm just not sure. Should I create an environment variable to control the rsync archives, or am I overthinking this? (A sketch of such a guard follows the script below.)
Rsync is working as expected on each server separately; I just don't know how it should be implemented in this use case.
Hourly and daily snapshots are taken; the hourly example is below:
set -x
SNAPSHOT=/SystemArchiveEndpoint/PassThru
EXCLUDES=/data1/backup/scripts/data/passthru_backup_exclude.txt

# Rotate the hourly snapshots: drop the oldest, shift the rest up by one.
if [ -d $SNAPSHOT/processed-hourly.3 ]
then
    rm -rf $SNAPSHOT/processed-hourly.3
fi
if [ -d $SNAPSHOT/processed-hourly.2 ]
then
    mv $SNAPSHOT/processed-hourly.2 $SNAPSHOT/processed-hourly.3
fi
if [ -d $SNAPSHOT/processed-hourly.1 ]
then
    mv $SNAPSHOT/processed-hourly.1 $SNAPSHOT/processed-hourly.2
fi
if [ -d $SNAPSHOT/processed-hourly.0 ]
then
    cp -al $SNAPSHOT/processed-hourly.0 $SNAPSHOT/processed-hourly.1
fi

# Sync the latest data into the newest snapshot and update its timestamp.
rsync -va /app/PassThruMultiTenantMT1/latest/digital/ /SystemArchiveEndpoint/PassThru/processed-hourly.0
touch $SNAPSHOT/processed-hourly.0
exit 0
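Regarding the environment-variable idea above: one lightweight possibility, sketched here with /etc/backup-role as a hypothetical marker file, is to gate the archive scripts on a role marker that reads "active" only on the node that should currently be archiving, so the same cron entry can stay enabled on both servers:
# Hypothetical guard at the top of each archive script: skip the run
# unless this node is marked as the active backup node.
ROLE_FILE=/etc/backup-role
if [ "$(cat "$ROLE_FILE" 2>/dev/null)" != "active" ]; then
    echo "This node is not the active backup node; skipping archive run."
    exit 0
fi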

How would you make a shell script to monitor mounts and log issues?

I am looking for a good way to monitor and log mounts on a CentOS 6.5 box. Since I am new to Linux shell scripting, I am somewhat at a loss as to whether there is something already around and proven that I could just plug in, or whether there is a good method I should direct my research toward to build my own.
In the end, what I am hoping to have running is a check of each of the 9 mounts on the server to confirm they are up and working. If there is an issue, I would like to log the information to a file, possibly email out the info, and check the next mount. 5-10 minutes later I would like to run it again. This probably isn't strictly needed, but we are trying to gather evidence if there is an issue, or show a vendor that what they claim is the issue is not the problem.
This shell script will test each mountpoint and send mail to root if any of them is not mounted:
#!/bin/bash
while sleep 10m; do
    status=$(for mnt in /mnt/disk1 /mnt/disk2 /mnt/disk3; do mountpoint -q "$mnt" || echo "$mnt missing"; done)
    [ "$status" ] && echo "$status" | mail root -s "Missing mount"
done
My intention here is not to give a complete turn-key solution but, instead, to give you a starting point for your research.
To make this fit your precise needs, you will need to learn about bash and shell scripts, cron jobs, and other of Unix's very useful tools.
How it works
#!/bin/bash
This announces that this is a bash script.
while sleep 10m; do
This repeats the commands in the loop once every 10 minutes.
status=$(for mnt in /mnt/disk1 /mnt/disk2 /mnt/disk3; do mountpoint -q "$mnt" || echo "$mnt missing"; done)
This cycles through mount points /mnt/disk1, /mnt/disk2, and /mnt/disk3 and tests that each one is mounted. If it isn't, a message is created and stored in the shell variable status.
You will want to replace /mnt/disk1 /mnt/disk2 /mnt/disk3 with your list of mount points, whatever they are.
This uses the mountpoint command, which is standard on modern Linux systems. It is part of the util-linux package. It might be missing on old installations.
[ "$status" ] && echo "$status" | mail root -s "Missing mount"
If status contains any messages, they will be mailed to root with the subject line Missing mount.
There are a few different versions of the mail command. You may need to adjust the argument list to work with the version on your system.
done
This marks the end of the while loop.
Notes
The above script uses a while loop that runs the tests every ten minutes. If you are familiar with the cron system, you may want to use that to run the commands every 10 minutes instead of the while loop.
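If you go that route, one approach (a sketch; the script path is a placeholder) is to keep only the body of the loop in a small script and let cron do the scheduling:
#!/bin/bash
# /usr/local/bin/check-mounts.sh -- one-shot version of the check above
status=$(for mnt in /mnt/disk1 /mnt/disk2 /mnt/disk3; do mountpoint -q "$mnt" || echo "$mnt missing"; done)
[ "$status" ] && echo "$status" | mail root -s "Missing mount"
with a crontab entry such as:
*/10 * * * * /usr/local/bin/check-mounts.sh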

How to check if file size is not incrementing, and if not, kill the $$ of the script

I am trying to figure out a way to monitor the files I am dumping from my script. If no growth is seen in the child files, then kill my script.
I am doing this to free up resources when they are not needed. Here is what I have thought of, but I think my approach is going to add burden to the CPU. Can anyone please suggest a more efficient way of doing this?
The script below is supposed to poll every 15 seconds and collect two sizes of the same file; if the two samples are the same, it exits.
checkUsage() {
    while true; do
        sleep 15
        fileSize=$(stat -c%s "$1")
        sleep 10
        fileSizeNew=$(stat -c%s "$1")
        if [ "$fileSize" == "$fileSizeNew" ]; then
            echo -e "[Error]: No activity noted on this file in the last 10 seconds. Exiting..."
            kill -9 $$
        fi
    done
}
And I am planning to call it as follows (in the background):
checkUsage /var/log/messages &
I would also welcome a solution that monitors a tail command and exits if tail prints nothing. To be clear, the end goal of this question is to check whether a file has been edited in the last 15 seconds, and if not, to exit or throw some error.
I have achieved this with the script above, but I don't know if it is the smartest way of achieving it. I am asking this question to hear from others whether there is an alternative or better way of doing it.
I would base the check on the file modification time instead of size, so something like this (untested code):
checkUsage() {
    while true; do
        # Test if the file mtime is more than 'second arg' seconds old; default to 10 seconds
        if [ $(( $(date +"%s") - $(stat -c%Y /var/log/messages) )) -gt ${2-10} ]; then
            echo -e "[Error]: No activity noted on this file in the last ${2-10} seconds. Exiting..."
            return 1
        fi
        # Sleep 'first arg' seconds, 15 seconds by default
        sleep ${1-15}
    done
}
The idea is to compare the file mtime with the current time; if the difference is greater than the second argument (in seconds), print the message and return.
And then I would call it like this later (or with no args to use defaults):
checkUsage 20 10 || exit 1
This exits the script with code 1 when the function returns from its loop, i.e. once the file stops being modified.
Edit: rereading this, the target file could be a parameter too, to allow better reuse of the function; a possible version is sketched below.
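Untested as well, but a parameterized version could look something like this (file as the first argument, poll interval and allowed idle time as optional second and third arguments):
checkUsage() {
    local file=$1
    local interval=${2-15}
    local maxidle=${3-10}
    while true; do
        # Return an error once the file has gone 'maxidle' seconds without being modified
        if [ $(( $(date +"%s") - $(stat -c%Y "$file") )) -gt "$maxidle" ]; then
            echo "[Error]: No activity noted on $file in the last $maxidle seconds. Exiting..."
            return 1
        fi
        sleep "$interval"
    done
}

checkUsage /var/log/messages 15 10 || exit 1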
If you are on Linux with a local file system (ext4, Btrfs, ...), not a network file system, then you could consider the inotify(7) facilities: something can be triggered when a file or directory changes or is accessed.
In particular, you might set up an incron job through an incrontab(5) file; maybe it could communicate with some other job ...
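An incrontab entry has the form "<path> <event mask> <command>"; a minimal sketch, where the handler script is a hypothetical placeholder:
/var/log/messages IN_MODIFY /usr/local/bin/handle-change.sh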
PS. I am not sure I understand what you really want to do...
I suppose an external programme is modifying /var/log/messages.
If this is the case, below is my script (with minor changes to yours)
#!/bin/bash
# Bash script to monitor changes to a file

checkUsage() # Note that U is in caps
{
    while true
    do
        sleep 15
        fileSize=$(stat -c%s "$1")
        sleep 10
        fileSizeNew=$(stat -c%s "$1")
        if [ "$fileSize" == "$fileSizeNew" ]
        then
            echo -e "[Notice]: No changes noted in $1 : exiting gracefully"
            exit # previously this was kill -9 $$
                 # changing this to exit ends the program gracefully;
                 # use kill -9 to kill a process which is not under your control.
                 # kill -9 sends the SIGKILL signal.
        fi
    done
}

checkUsage "$1" # I have added this to your script
# End of the script
Save the script as checkusage and run it like:
./checkusage /var/log/messages &
Edit: Since you're looking for better solutions, I would suggest inotifywait (thanks to the other answerer for the suggestion). Below is my code:
while inotifywait -t 10 -q -e modify "$1" >/dev/null
do
    sleep 15 # as you said, the polling should happen every 15 seconds
done
echo "Script exited gracefully: $1 has not been changed"
Below are the details from the inotifywait manpage
-t <seconds>, --timeout <seconds>
    Exit if an appropriate event has not occurred within <seconds> seconds. If <seconds> is zero (the default), wait indefinitely for an event.
-e <event>, --event <event>
    Listen for specific event(s) only. The events which can be listened for are listed in the EVENTS section. This option can be specified more than once. If omitted, all events are listened for.
-q, --quiet
    If specified once, the program will be less verbose. Specifically, it will not state when it has completed establishing all inotify watches.
modify (event)
    A watched file or a file within a watched directory was written to.
Notes
You might have to install inotify-tools first to make use of the inotifywait command. Check the inotify-tools page on GitHub.

Rsync cronjob that will only run if rsync isn't already running

I have checked for a solution here but cannot seem to find one. I am dealing with a very slow WAN connection, about 300 kb/sec. For my downloads I use a remote box and then download the files to my house. I am trying to run a cronjob that will rsync two directories on my remote and local servers every hour. I got everything working, but if there is a lot of data to transfer, the rsyncs overlap and end up creating two instances of the same file, so duplicate data gets sent.
How can I instead call a script that runs my rsync command only if rsync isn't already running?
The problem with creating a "lock" file, as suggested in a previous solution, is that the lock file might already exist if the script responsible for removing it terminated abnormally.
This could for example happen if the user terminates the rsync process, or due to a power outage. Instead one should use flock, which does not suffer from this problem.
As it happens flock is also easy to use, so the solution would simply look like this:
flock -n lock_file -c "rsync ..."
The command after the -c option is only executed if there is no other process holding a lock on lock_file. If the locking process terminates for any reason, the lock on lock_file is released. The -n option tells flock to be non-blocking, so if another process holds the lock, nothing happens.
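Applied to the hourly cron job from the question, this could look something like the following (the lock file, paths, and rsync arguments are placeholders):
0 * * * * flock -n /home/myhomedir/rsyncjob.lock -c "rsync -avz remotebox:/downloads/ /home/myhomedir/downloads/" >> /home/myhomedir/rsyncjob.log 2>&1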
Via the script you can create a "lock" file. If the file exists, the cronjob should skip the run; otherwise it should proceed. Once the script completes, it should delete the lock file.
if [ -e /home/myhomedir/rsyncjob.lock ]
then
    echo "Rsync job already running...exiting"
    exit
fi

touch /home/myhomedir/rsyncjob.lock

#your code in here

#delete lock file at end of your job
rm /home/myhomedir/rsyncjob.lock
To use the lock file example given by @User above, a trap should be used to ensure that the lock file is removed when the script exits for any reason.
if [ -e /home/myhomedir/rsyncjob.lock ]
then
    echo "Rsync job already running...exiting"
    exit
fi

touch /home/myhomedir/rsyncjob.lock

#delete lock file at end of your job
trap 'rm /home/myhomedir/rsyncjob.lock' EXIT

#your code in here
This way the lock file will be removed even if the script exits before the end of the script.
A simple solution without using a lock file is to just do this:
pgrep rsync > /dev/null || rsync -avz ...
This will work as long as it is the only rsync job you run on the server, and you can then run this directly in cron, but you will need to redirect the output to a log file.
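In cron, with the output redirected to a log as mentioned, a single job could look roughly like this (the paths, host, and schedule are placeholders):
0 * * * * pgrep rsync > /dev/null || rsync -avz remotebox:/downloads/ /home/myhomedir/downloads/ >> /home/myhomedir/rsync-cron.log 2>&1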
If you do run multiple rsync jobs, you can get pgrep to match against the full command line with a pattern like this:
pgrep -f 'rsync.*/data' > /dev/null || rsync -avz --delete /data/ otherhost:/data/
pgrep -f 'rsync.*/www' > /dev/null || rsync -avz --delete /var/www/ otherhost:/var/www/
As a blunter solution, you can kill any running rsync processes in the crontab just before the new one starts.

How to re-run the "curl" command automatically when an error occurs

Sometimes when I execute a bash script with the curl command to upload some files to my FTP server, it returns an error like:
56 response reading failed
and I have to find the failed line and re-run it manually, after which it is OK.
I'm wondering whether the command could be re-run automatically when an error occurs.
My script is like this:
#there are some files(A,B,C,D,E) in my to_upload directory,
# which I'm trying to upload to my ftp server with curl command
for files in `ls` ;
do curl -T $files ftp.myserver.com --user ID:pw ;
done
But sometimes A, B, and C upload successfully and only D is left with an "error 56", so I have to rerun the curl command manually. Also, as Will Bickford said, I prefer that no confirmation be required, because I'm always asleep at the time the script is running. :)
Here's a bash snippet I use to perform exponential back-off:
# Retries a command a configurable number of times with backoff.
#
# The retry count is given by ATTEMPTS (default 5), the initial backoff
# timeout is given by TIMEOUT in seconds (default 1.)
#
# Successive backoffs double the timeout.
function with_backoff {
local max_attempts=${ATTEMPTS-5}
local timeout=${TIMEOUT-1}
local attempt=1
local exitCode=0
while (( $attempt < $max_attempts ))
do
if "$#"
then
return 0
else
exitCode=$?
fi
echo "Failure! Retrying in $timeout.." 1>&2
sleep $timeout
attempt=$(( attempt + 1 ))
timeout=$(( timeout * 2 ))
done
if [[ $exitCode != 0 ]]
then
echo "You've failed me for the last time! ($#)" 1>&2
fi
return $exitCode
}
Then use it in conjunction with any command that properly sets a failing exit code:
with_backoff curl 'http://monkeyfeathers.example.com/'
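Since the function reads ATTEMPTS and TIMEOUT from the environment, they can also be overridden per call; for example, reusing the curl invocation from the question:
ATTEMPTS=10 TIMEOUT=2 with_backoff curl -T "$files" ftp.myserver.com --user ID:pw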
Perhaps this will help. It will try the command, and if it fails, it will tell you and pause, giving you a chance to fix run-my-script.
COMMAND=./run-my-script.sh
until $COMMAND; do
read -p "command failed, fix and hit enter to try again."
done
I have faced a similar problem where I need to make contact with servers using curl that are in the process of starting up and haven't started up yet, or services that are temporarily unavailable for whatever reason. The scripting was getting out of hand, so I made a dedicated retry tool that will retry a command until it succeeds:
#there are some files(A,B,C,D,E) in my to_upload directory,
# which I'm trying to upload to my ftp server with curl command
for files in `ls` ;
do retry curl -f -T $files ftp.myserver.com --user ID:pw ;
done
The curl command has the -f (--fail) option, which makes curl return exit code 22 when the server reports an error, instead of exiting with success.
The retry tool will by default run the curl command over and over until the command returns status zero, backing off for 10 seconds between retries. In addition, retry reads from stdin once and only once, writes to stdout once and only once, and writes all stdout to stderr if the command fails.
Retry is available from here: https://github.com/minfrin/retry
