rsync between a master and slave server to a remote repository storage - linux

I am putting together a backup/archive mechanism using rsync.
My situation: I have two servers. Server 1 is the primary production box, server 2 is a passive failover. Rsync archiving runs on the primary and works as expected. The failover server has the same scripts and cron setup, but at the moment its cron entries are commented out so they don't execute. I would like to avoid having to switch over as a manual operation.
Both servers rsync exactly the same locations, just from different machines.
I am worried that if I let the scheduled rsync run on the failover box, it will somehow destroy or corrupt the archives. I'm not sure the concern is justified: if the failover cron kicks off, runs the rsync archive scripts, and there are no files to be archived, nothing will happen; if the failover does have files to sync, rsync will simply write them into the archive and build on the incremental backup, so it should essentially be safe. I'm just not certain. Should I create an environment variable to control the rsync archives, or am I overthinking this?
Rsync works as expected on each server separately; I just don't know how it should be implemented in this use case.
Hourly and daily snapshots are taken; the hourly script is below:
#!/bin/bash
set -x
SNAPSHOT=/SystemArchiveEndpoint/PassThru
EXCLUDES=/data1/backup/scripts/data/passthru_backup_exclude.txt
# rotate the hourly snapshots: drop .3, shift .2 -> .3 and .1 -> .2,
# then hard-link-copy .0 into .1 so unchanged files share disk space
if [ -d "$SNAPSHOT/processed-hourly.3" ]; then
    rm -rf "$SNAPSHOT/processed-hourly.3"
fi
if [ -d "$SNAPSHOT/processed-hourly.2" ]; then
    mv "$SNAPSHOT/processed-hourly.2" "$SNAPSHOT/processed-hourly.3"
fi
if [ -d "$SNAPSHOT/processed-hourly.1" ]; then
    mv "$SNAPSHOT/processed-hourly.1" "$SNAPSHOT/processed-hourly.2"
fi
if [ -d "$SNAPSHOT/processed-hourly.0" ]; then
    cp -al "$SNAPSHOT/processed-hourly.0" "$SNAPSHOT/processed-hourly.1"
fi
# sync the live data into the newest snapshot and stamp its mtime
rsync -va --exclude-from="$EXCLUDES" /app/PassThruMultiTenantMT1/latest/digital/ "$SNAPSHOT/processed-hourly.0"
touch "$SNAPSHOT/processed-hourly.0"
exit 0
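As for gating the failover copy: rather than commenting the cron entries in and out by hand, the archive script itself can check a role flag and exit early on the passive box. The following is only a sketch of that idea, not part of the original setup; the flag file /etc/backup-role, its "active"/"passive" values, and the log path are assumptions, and a shared environment variable or a check for the floating service IP would work just as well.

#!/bin/bash
# Run the archive rotation only on the box currently acting as primary.
# /etc/backup-role is a hypothetical one-line file containing "active" or "passive".
ROLE_FILE=/etc/backup-role
ROLE=$(cat "$ROLE_FILE" 2>/dev/null)

if [ "$ROLE" != "active" ]; then
    echo "$(date): role is '${ROLE:-unknown}', skipping archive run" >> /var/log/passthru_archive.log
    exit 0
fi

# ...the rotation and rsync commands from the hourly script above go here...

With a guard like this, the same crontab can stay enabled on both servers, and a failover only requires flipping the contents of the flag file on each box.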

Related

Rsync files across a dodgy network link - hangs instead of timeout

I am trying to keep 3 large directories (9G, 400G, 800G) in sync between our home site and another in a land far, far away across a network link that is a bit dodgy (slow and drops occasionally). Data was copied onto disks prior to installation so the rsync only needs to send updates.
The problem I'm having is the rsync hangs for hours on the client side.
The smaller 9G job completed, the 400G job has been in limbo for 15 hours - no output to the log file in that time, but has not timed out.
What I've done to set up for this (after reading many forum articles about rsync, the rsync daemon, and --partial, since I am not really a system admin):
I set up an rsync server (/etc/rsyncd.conf) on our home system, entered it into xinetd, and wrote a script to run rsync on the distant server; the script loops if rsync fails, in an attempt to deal with the dodgy network. The rsync command in the script looks like this:
rsync -avzAXP --append root@homesys01::tools /disk1/tools
Note: the "-P" option is equivalent to "--progress --partial".
I can see in the log file that rsync did fail at one point and the loop restarted it; data was transferred after that, based on entries in the log file, but the last update to the log file was 15 hours ago and the rsync process on the client is still running.
CNT=0
while true
do
    rsync -avzAXP --append root@homesys01::tools /disk1/tools
    STATUS=$?
    if [ $STATUS -eq 0 ] ; then
        echo "Successful completion of tools rsync."
        exit 0
    else
        CNT=$((CNT + 1))
        echo " Rsync of tools failure. Status returned: ${STATUS}"
        echo " Backing off and retrying (${CNT})..."
        sleep 180
    fi
done
So I expected these jobs to take a long time; I expected to see the occasional failure message in the log files (which I have) and to see rsync restart (which it has). I was not expecting rsync to just hang for 15 hours or more with no progress and no timeout error.
Is there a way to tell whether rsync on the client is hung versus just dealing with the dodgy network?
I set no timeout in the /etc/rsyncd.conf file. Should I, and how do I determine a reasonable timeout setting?
I set rsync up to be available through xinetd, but I don't always see the "rsync --daemon" process running. It restarts if I run rsync from the remote system, but shouldn't it always be running?
Any guidance or suggestions would be appreciated.
To tell whether the rsync client is making progress, keep the verbose option and write to a log file.
Change this line
rsync -avzAXP --append root@homesys01::tools /disk1/tools
to
rsync -avzAXP --append root@homesys01::tools /disk1/tools >>/tmp/rsync.log.`date +%F`
This produces one log file per day under the /tmp directory.
You can then use the tail -f command to follow the most recent log file;
if it is rolling, rsync is still working.
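For example, to follow today's log file under that naming scheme, you could run:
tail -f /tmp/rsync.log.`date +%F`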
See also
rsync - what means the f+++++++++ on rsync logs?
to understand more about the log format.
I thought I would post my final solution, in case it can help anyone else. I added --timeout 300 and --append-verify. The timeout eliminates the case of rsync hanging indefinitely; the loop will restart it after the timeout. --append-verify is necessary so that rsync verifies any partial file it previously updated.
Note that the following code is in a shell script and the output is redirected to a log file.
CNT=0
while true
do
    rsync -avzAXP --append-verify --timeout 300 root@homesys01::tools /disk1/tools
    STATUS=$?
    if [ $STATUS -eq 0 ] ; then
        echo "Successful completion of tools rsync."
        exit 0
    else
        CNT=$((CNT + 1))
        echo " Rsync of tools failure. Status returned: ${STATUS}"
        echo " Backing off and retrying (${CNT})..."
        sleep 180
    fi
done
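On the question about /etc/rsyncd.conf: the daemon side also supports a per-module timeout parameter, which drops connections that have been idle too long. Below is only a minimal sketch; the module name tools, the path, and the 600-second value are illustrative assumptions, not values from the original configuration.

# /etc/rsyncd.conf (sketch)
uid = root
gid = root
use chroot = yes

[tools]
    path = /export/tools        # assumed path exported as the "tools" module
    read only = yes
    timeout = 600               # drop a connection after 10 minutes of inactivity

A server-side timeout like this complements the client's --timeout: either end can then break a dead connection so the retry loop gets a chance to run.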

How would you make a shell script to monitor mounts and log issues?

I am looking for a good way to monitor and log mounts on a CentOS 6.5 box. Since I am new to Linux shell scripting, I am somewhat at a loss as to whether there is something already around and proven that I could just plug in, or whether there is a good approach I should research to build my own.
In the end, what I am hoping to have running is a check of each of the 9 mounts on the server to confirm they are up and working. If there is an issue, I would like to log the information to a file, possibly email out the info, and check the next mount, then run it again 5-10 minutes later. I know this probably isn't needed, but we are trying to gather evidence when there is an issue, or show a vendor that what they say is the issue is not actually the problem.
This shell script will test each mountpoint and send mail to root if any of them is not mounted:
#!/bin/bash
while sleep 10m;
do
status=$(for mnt in /mnt/disk1 /mnt/disk2 /mnt/disk3; do mountpoint -q "$mnt" || echo "$mnt missing"; done)
[ "$status" ] && echo "$status" | mail root -s "Missing mount"
done
My intention here is not to give a complete turn-key solution but, instead, to give you a starting point for your research.
To make this fit your precise needs, you will need to learn about bash and shell scripts, cron jobs, and other of Unix's very useful tools.
How it works
#!/bin/bash
This announces that this is a bash script.
while sleep 10m; do
This repeats the commands in the loop once every 10 minutes.
status=$(for mnt in /mnt/disk1 /mnt/disk2 /mnt/disk3; do mountpoint -q "$mnt" || echo "$mnt missing"; done)
This cycles through mount points /mnt/disk1, /mnt/disk2, and /mnt/disk3 and tests that each one is mounted. If it isn't, a message is created and stored in the shell variable status.
You will want to replace /mnt/disk1 /mnt/disk2 /mnt/disk3 with your list of mount points, whatever they are.
This uses the mountpoint command, which is standard on modern Linux systems. It is part of the util-linux package and might be missing on old installations.
[ "$status" ] && echo "$status" | mail root -s "Missing mount"
If status contains any messages, they will be mailed to root with the subject line Missing mount.
There are a few different versions of the mail command. You may need to adjust the argument list to work with the version on your system.
done
This marks the end of the while loop.
Notes
The above script uses a while loop that runs the tests every ten minutes. If you are familiar with the cron system, you may want to use that to run the commands every 10 minutes instead of the while loop.
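If you do go the cron route, the while loop and the sleep go away: the script runs once per invocation and cron provides the every-10-minutes scheduling. The sketch below assumes a script saved as /usr/local/bin/check-mounts (a placeholder name) and the same example mount points as above.

#!/bin/bash
# /usr/local/bin/check-mounts: check each mount point once, mail root about any that are missing, then exit.
status=$(for mnt in /mnt/disk1 /mnt/disk2 /mnt/disk3; do
    mountpoint -q "$mnt" || echo "$mnt missing"
done)
[ "$status" ] && echo "$status" | mail root -s "Missing mount"

The matching crontab entry (added with crontab -e) would then be something like:
*/10 * * * * /usr/local/bin/check-mounts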

split scp of backup files to different smb shares based on date

I back up files to tar files once a day, grab them from our Ubuntu servers with a backup shell script, and put them in a share. We only have 5TB shares but can have several.
At the moment we need more space, as we keep 30 days' worth of tar files.
I need a method where the first 10 days go to share one, the next ten to share two, and the next 11 to share three.
Currently each server VM runs the following script to back up and tar folders and place them in another folder, ready to be grabbed by the backup server:
#!/bin/bash
appname=myapp.com
dbname=mydb
dbuser=myDBuser
dbpass=MyDBpass
datestamp=`date +%d%m%y`
# clear out the previous archives, then dump the database and tar the app folders
rm -f /var/mybackupTars/* > /dev/null 2>&1
mysqldump -u$dbuser -p$dbpass $dbname > /var/mybackups/$dbname-$datestamp.sql && gzip /var/mybackups/$dbname-$datestamp.sql
tar -zcf /var/mybackups/myapp-$datestamp.tar.gz /var/www/myapp > /dev/null 2>&1
tar -zcf /var/mydirectory/myapp-$datestamp.tar.gz /var/www/html/myapp > /dev/null 2>&1
I then grab the backups using a script on the backup server and put them in a share
#!/bin/bash
#
# Generate a list of myapps to grab
df | grep myappbackups | awk -F/ '{ print $NF }' > /tmp/myapplistsmb
# Get each app in turn
for APPNAME in `cat /tmp/myapplistsmb`
do
    cd /srv/myappbackups/$APPNAME || continue
    scp $APPNAME:* .
done
I know this is a tough one, but I really need 3 shares with roughly ten days' worth in each.
I do not anticipate changing the backup script on each server VM that backs up to itself;
only the grabber script that puts the dated backups in the share on the backup server would change.
Or am I wrong?
Any help would be great.
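One rough sketch of how the grabber side could route files, assuming the three SMB shares are mounted locally at /srv/share1, /srv/share2 and /srv/share3 (placeholder paths) and that filenames carry the %d%m%y datestamp used by the backup script above:

#!/bin/bash
# Route each grabbed tar file to a share based on the day of month in its name.
# Filenames look like myapp-DDMMYY.tar.gz (the %d%m%y datestamp from the backup script).
for f in /srv/myappbackups/*/*.tar.gz; do
    base=$(basename "$f")
    stamp=${base##*-}                 # e.g. 150324.tar.gz
    day=$((10#${stamp:0:2}))          # first two digits = day of month, forced to base 10
    if   [ "$day" -le 10 ]; then dest=/srv/share1
    elif [ "$day" -le 20 ]; then dest=/srv/share2
    else                         dest=/srv/share3
    fi
    cp "$f" "$dest/"
done

Under this approach the backup script on each VM stays unchanged; only the grabber (or a small post-processing step like this) decides which share a given day's tar files land on.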

How to write a script for backup using bacula?

I am very new to shell scripting and Bacula. I want to create a script that schedules backups using Bacula.
How do I do that?
Any lead is appreciated.
Thanks.
If you are going to administer your own Linux system, learn bash. The man page is really quite detailed and useful. Do man bash.
If you are really new to Linux and command lines, be aware that administering Bacula is not for newbies. It is a fairly comprehensive backup system for multiple machines, with a central database, which means that it is also complex.
There are much simpler tools available on Linux to perform simple system backups, which are just as reliable. If you just want to back up your home directory, tar or zip are excellent tools. In particular, tar can do both full backups and incremental backups.
Assuming that you really do want to use Bacula and have enough information to write a couple of simple scripts, the original request is still ambiguous.
Do you mean scheduling a periodic cron job to accomplish backups unattended? Or do you mean scheduling a single invocation of Bacula at a determined time and date?
In either case, it's a good idea to create two simple scripts: one to perform a full backup, and one to perform an incremental backup. The full backup should be run, say, once a week or once a month, and the incremental backup should be run every day, or once a week -- depending on how often your system data changes.
Most modest sites undergoing daily usage would have a daily incremental backup with a full backup on the weekends (say, Sunday). This way, if the system crashed on, say, Friday, you would need to recover with the most recent full backup (on the previous Sunday), and then recover with each day's incremental backup (Mon, Tue, Wed, Thu). You would probably lose data changes that had occurred on the day of the crash.
If the rate of data change was hourly, and recovery at an hourly rate was important, then the incrementals should be scheduled for each hour, with full backups each night.
An important consideration is knowing what, exactly, is to be backed up. Most home users want their home directory to be recoverable. The OS root and application partitions are often easily recoverable without backups; alternatively, they are backed up on a very infrequent schedule (say once a month or so), since they change much less frequently than the user's home directory.
Another important consideration is where to put the backups. Bacula supports external storage devices, such as tapes, which are not mounted filesystems. tar also supports tape archives. Most home users have some kind of USB or network-attached storage that is used to store backups.
Let's assume that the backups are to be stored on /mnt/backups/, and let's assume that the user's home directory (and subdirectories) are all to be backed up and made recoverable.
% cat <<'EOF' >/usr/local/bin/full-backup
#!/bin/bash
# full-backup SRCDIRS [--options]
# incr-backup SRCDIRS [--options]
#
# set destdir to the path at which the backups will be stored
# each backup will be stored in a directory of the date of the
# archive, grouped by month. The directories will be:
#
# /mnt/backups/2014/01
# /mnt/backups/2014/02
# ...
# the full and incremental files will be named this way:
#
# /mnt/backups/2014/01/DIR-full-2014-01-24.192832.tgz
# /mnt/backups/2014/01/DIR-incr-2014-01-25.192531.tgz
# ...
# where DIR is the name of the source directory.
#
# There is also a file named "lastrun" whose last mod-time
# is used to select files changed since the last backup.
PROG=${0##*/}                 # prog name: full-backup or incr-backup
destdir=/mnt/backups
now=`date +"%F-%H%M%S"`
monthdir=`date +%Y/%m`
dest=$destdir/$monthdir
lastrun=$destdir/lastrun
mkdir -p "$dest"              # make sure this month's directory exists
while (( $# > 0 )) ; do
    dir="$1" ; shift
    options=''                # collect options
    while [[ $# -gt 0 && "$1" == --* ]]; do   # any options?
        options="$options $1"
        shift
    done
    basedir=`basename $dir`
    fullfile=$dest/$basedir-full-$now.tgz
    incrfile=$dest/$basedir-incr-$now.tgz
    case "$PROG" in
        full*) archive="$fullfile" newer= kind=Full ;;
        incr*) archive="$incrfile" newer="--newer $lastrun" kind=Incremental ;;
    esac
    cmd="tar cfz $archive $newer $options $dir"
    echo "$kind backup starting at `date`"
    echo ">> $cmd"
    eval "$cmd"
    echo "$kind backup done at `date`"
done
touch $lastrun                # mark the end of the backup date/time
exit 0
EOF
(cd /usr/local/bin ; ln -s full-backup incr-backup )
chmod +x /usr/local/bin/full-backup
Once this script is configured and available, it can be scheduled with cron. See man cron. Use crontab -e to create a crontab entry that invokes full-backup once a week (say), and another crontab entry that invokes incr-backup once a day. The following are sample crontab entries (see man 5 crontab for details on syntax) for performing incremental and full backups, as well as removing old archives.
# run incremental backups on all user home dirs at 3:15 every day
15 3 * * * /usr/local/bin/incr-backup /Users
# run full backups on all user home dirs every Sunday at 3:15
15 3 * * 7 /usr/local/bin/full-backup /Users
# run a full backup of the entire system (but not the home dirs) on the 1st of every month
30 4 1 * * /usr/local/bin/full-backup / --exclude=/Users --exclude=/tmp --exclude=/var
# delete old backup files (more than 60 days old) on the 1st of every month
15 3 1 * * find /mnt/backups -type f -mtime +60 -delete
Recovering from these backups is an exercise left for later.
Good luck.
I don't think it makes sense to have a cron-scheduled script activate Bacula.
The standard way to schedule backups using Bacula is:
1) Install the Bacula file daemon on the machine you want to back up, and then
2) Configure your Bacula Director to schedule the backup.
Regarding 1)
If the machine to back up runs Debian or Ubuntu, you can install the Bacula file daemon from the shell like this:
shell> apt-get install bacula-fd (bacula-fd stands for Bacula File Daemon)
If the machine to back up is Windows, then you need to download the Bacula file daemon and install it. You can download it here: http://sourceforge.net/projects/bacula/files/Win32_64/ (select the version that matches your Bacula server version)
Regarding 2)
You need to find the bacula-dir.conf file on your Bacula server (if you installed the Bacula Director on an Ubuntu machine, the path is /etc/bacula/bacula-dir.conf).
The bacula-dir.conf Schedule section is very flexible and therefore also somewhat complicated; here is an example:
Schedule {
Name = "MonthlyCycle"
Run = Level=Full on 1 at 2:05             # full backup on the 1st of every month at 2:05
Run = Level=Incremental on 2-31 at 2:05   # incremental backup on all other days
}
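A Schedule by itself doesn't back anything up; it has to be referenced from a Job resource in the same bacula-dir.conf. The following is only a rough sketch of what such a Job (and the FileSet it points at) might look like; the resource names, client name, and paths are placeholders, not values from an actual configuration.

# bacula-dir.conf (sketch) -- names and paths below are placeholders
FileSet {
  Name = "HomeDirs"
  Include {
    Options {
      signature = MD5
      compression = GZIP
    }
    File = /home
  }
}

Job {
  Name = "BackupMyClient"
  Type = Backup
  Client = myclient-fd            # must match a Client resource
  FileSet = "HomeDirs"
  Schedule = "MonthlyCycle"       # the Schedule defined above
  Storage = File                  # must match a Storage resource
  Messages = Standard
  Pool = Default
}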
Note that a lot more configuration is necessary to run Bacula; here is a full tutorial on how to install, configure, back up and restore with Bacula: http://webmodelling.com/webbits/miscellaneous/bacula.aspx (disclaimer: I wrote the Bacula tutorial myself)

Rsync cronjob that will only run if rsync isn't already running

I have checked for a solution here but cannot seem to find one. I am dealing with a very slow WAN connection, about 300 kB/sec. For my downloads I use a remote box and then download the files to my house. I am trying to run a cron job that rsyncs two directories on my remote and local servers every hour. I got everything working, but if there is a lot of data to transfer the rsyncs overlap and end up transferring the same files twice, so duplicate data is sent.
I want to instead call a script that runs my rsync command, but only if rsync isn't already running.
The problem with creating a "lock" file, as suggested in a previous solution, is that the lock file might already exist if the script responsible for removing it terminates abnormally.
This could happen, for example, if the user terminates the rsync process, or due to a power outage. Instead, one should use flock, which does not suffer from this problem.
As it happens flock is also easy to use, so the solution would simply look like this:
flock -n lock_file -c "rsync ..."
The command after the -c option is only executed if no other process holds a lock on lock_file. If the locking process terminates for any reason, the lock on lock_file is released. The -n option tells flock to be non-blocking, so if another process is holding the lock, nothing happens.
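Dropped into a crontab, that approach might look like the following line; the lock file path, the rsync arguments, and the log file are placeholders to adapt to your own setup:

# run the transfer hourly; skip this run if the previous one still holds the lock
0 * * * * flock -n /home/myhomedir/rsyncjob.lock -c "rsync -az remotebox:/downloads/ /home/myhomedir/downloads/" >> /home/myhomedir/rsync.log 2>&1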
Via the script you can create a "lock" file. If the file exists, the cron job should skip the run; otherwise it should proceed. Once the script completes, it should delete the lock file.
if [ -e /home/myhomedir/rsyncjob.lock ]
then
echo "Rsync job already running...exiting"
exit
fi
touch /home/myhomedir/rsyncjob.lock
#your code in here
#delete lock file at end of your job
rm /home/myhomedir/rsyncjob.lock
To use the lock file example given by the user above, a trap should be used to ensure that the lock file is removed when the script exits for any reason.
if [ -e /home/myhomedir/rsyncjob.lock ]
then
echo "Rsync job already running...exiting"
exit
fi
touch /home/myhomedir/rsyncjob.lock
#delete lock file at end of your job
trap 'rm /home/myhomedir/rsyncjob.lock' EXIT
#your code in here
This way the lock file will be removed even if the script exits before reaching its final line.
A simple solution without using a lock file is to just do this:
pgrep rsync > /dev/null || rsync -avz ...
This will work as long as it is the only rsync job you run on the server, and you can then run this directly in cron, but you will need to redirect the output to a log file.
If you do run multiple rsync jobs, you can get pgrep to match against the full command line with a pattern like this:
pgrep -f 'rsync.*/data' > /dev/null || rsync -avz --delete /data/ otherhost:/data/
pgrep -f 'rsync.*/www' > /dev/null || rsync -avz --delete /var/www/ otherhost:/var/www/
As a blunt last-resort solution, you can kill any running rsync processes from the crontab before the new one starts.
