Rsync cronjob that will only run if rsync isn't already running - cron

I have checked for a solution here but cannot seem to find one. I am dealing with a very slow WAN connection of about 300 kB/sec. I download to a remote box first and then transfer the files to my house. I am trying to run a cronjob that will rsync two directories on my remote and local server every hour. I got everything working, but if there is a lot of data to transfer, the rsyncs overlap and end up creating two instances of the same file, so duplicate data gets sent.
I want to instead call a script that would run my rsync command, but only if rsync isn't already running.

The problem with creating a "lock" file, as suggested in a previous solution, is that the lock file might already exist if the script responsible for removing it terminates abnormally.
This could happen, for example, if the user terminates the rsync process, or due to a power outage. Instead, one should use flock, which does not suffer from this problem.
As it happens flock is also easy to use, so the solution would simply look like this:
flock -n lock_file -c "rsync ..."
The command after the -c option is only executed if there is no other process holding a lock on the lock_file. If the locking process terminates for any reason, the lock on the lock_file is released. The -n option says that flock should be non-blocking, so if another process is already holding the lock, nothing will happen.
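For example, an hourly crontab entry using flock might look like this (the lock file path, rsync arguments, and log file are placeholders):
0 * * * * flock -n /home/myhomedir/rsync.lock -c "rsync -avz /local/dir/ remotehost:/remote/dir/" >> /home/myhomedir/rsync.log 2>&1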

Via the script you can create a "lock" file. If the file exists, the cronjob should skip the run; otherwise it should proceed. Once the script completes, it should delete the lock file.
if [ -e /home/myhomedir/rsyncjob.lock ]
then
echo "Rsync job already running...exiting"
exit
fi
touch /home/myhomedir/rsyncjob.lock
#your code in here
#delete lock file at end of your job
rm /home/myhomedir/rsyncjob.lock

To use the lock file example given in the answer above, a trap should be used to make sure the lock file is removed when the script exits for any reason.
if [ -e /home/myhomedir/rsyncjob.lock ]
then
  echo "Rsync job already running...exiting"
  exit
fi
touch /home/myhomedir/rsyncjob.lock
#remove the lock file whenever the script exits, for any reason
trap 'rm /home/myhomedir/rsyncjob.lock' EXIT
#your code in here
This way the lock file will be removed even if the script exits prematurely.

A simple solution without using a lock file is to just do this:
pgrep rsync > /dev/null || rsync -avz ...
This will work as long as it is the only rsync job you run on the server, and you can then run this directly in cron, but you will need to redirect the output to a log file.
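For instance, an hourly crontab entry along those lines might look like this (paths and log file are placeholders):
0 * * * * pgrep rsync > /dev/null || rsync -avz /data/ otherhost:/data/ >> /var/log/rsync-data.log 2>&1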
If you do run multiple rsync jobs, you can get pgrep to match against the full command line with a pattern like this:
pgrep -f 'rsync.*/data' > /dev/null || rsync -avz --delete /data/ otherhost:/data/
pgrep -f 'rsync.*/www' > /dev/null || rsync -avz --delete /var/www/ otherhost:/var/www/

As a blunt last-resort solution, kill any running rsync processes in the crontab entry before the new one starts.
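If you go that route, a sketch of such a crontab entry could look like the following; note that pkill aborts a transfer that is still legitimately in progress, so this really is a blunt instrument (paths are placeholders):
0 * * * * pkill -x rsync; rsync -avz /data/ otherhost:/data/ >> /var/log/rsync.log 2>&1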

Related

BASH: simultaneous execution of a multiloop function without waiting

Use case:
I need to transfer binary files (1 GB) to an array of IPs and start executing them as soon as they arrive at their destinations, without waiting for all of the binaries to be transferred or executed first. A sort of parallel mode.
Situation:
I have 2 functions - transfer and execution (depending on the approach it can be shortened to 1 function with 2 loops).
for N in "${NODES[@]}"; do
  rsync -Pcz -e "ssh -o StrictHostKeyChecking=no -o UserKnownHostsFile=/dev/null" --timeout=10 $FILE user@$N
done
and
for N in "${NODES[@]}"; do
  ssh user@$N "cd ~/; ./exec.sh"
done
The point is that in this case I have to wait until all the transfers finish first (and there can sometimes be tens of addresses), and only afterwards start the execution.
If I combine the loops into a single one, I have to wait again - this time for transfer+execution per node.
Expectation:
I'd like to transfer a file to the first node, start its execution, and switch to the second node with the same process, and so on. So timing would count for the transfers only, whereas each node executes the file on its own in parallel.
Obstacles:
1 - I need to be able to capture the execution output from each node.
2 - Additional packages, like screen, are not an option.
What did I try:
I was thinking about injecting some script into the remote nodes via the loop to control the execution from there. But I'm sure there must be some less barbaric option.
What can be done here?
You should be able to use a single loop, and run the ssh command with a & suffix, which runs it in the background (i.e. without waiting for it to finish), and then after the loop use wait to wait for all of them to finish. Collecting output will be more interesting... I think you'll need to collect each run's output into a file, and then print the files at the end. Something like this (note that I have not tested this properly):
tmpdir="$(mktemp -qd -t "$(basename "$0")")" || {
  echo "Error creating temporary directory" >&2
  exit 1
}
for nodenum in "${!NODES[@]}"; do
  # The ${!array[@]} idiom gets a list of array *indexes*, not elements; get the element by index:
  N=${NODES[nodenum]}
  # Copy file, and wait for copy to finish:
  rsync -Pcz -e "ssh -o StrictHostKeyChecking=no -o UserKnownHostsFile=/dev/null" --timeout=10 $FILE user@$N
  # Start the script, and *don't* wait for it to finish:
  ssh user@$N "cd ~/ sh exec.sh" >"$tmpdir/$nodenum.out" 2>&1 &
done
# Wait for all of the scripts to finish
wait
# Print all of the outputs (in order)
for nodenum in "${!NODES[@]}"; do
  echo
  echo "Output from ${NODES[nodenum]}:"
  cat "$tmpdir/$nodenum.out"
done
# Clean up the temp directory
rm -R "$tmpdir"
BTW, the remote command "cd ~/ sh exec.sh" doesn't make sense. Is there supposed to be a semicolon in there? Also, I recommend using lower or mixed-case variable names to avoid conflicts with the many all-caps variables that have some sort of special meaning, and putting double-quotes around variable references (i.e. rsync ... "$FILE" "user@$N" instead of rsync ... $FILE user@$N).
EDIT: this assumes you want to start the script on each host as soon as that particular copy is done; if you want to wait until all copies are done, then fire all scripts at once, use two loops: one to do the copies, then a second that does the ssh commands in the background (collecting output as above), then wait for those to all finish, then print all of the outputs.
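A rough sketch of that two-loop variant (untested, reusing the same tmpdir and placeholder names as above):
# First pass: copy the file to every node, one at a time
for N in "${NODES[@]}"; do
  rsync -Pcz -e "ssh -o StrictHostKeyChecking=no -o UserKnownHostsFile=/dev/null" --timeout=10 "$FILE" "user@$N"
done
# Second pass: fire off all of the remote scripts in the background, one output file per node
for nodenum in "${!NODES[@]}"; do
  ssh "user@${NODES[nodenum]}" "cd ~/; ./exec.sh" >"$tmpdir/$nodenum.out" 2>&1 &
done
# Wait for every background ssh to finish, then print the outputs as in the loop above
wait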
You could do the transfer and script as a single background task, so that the script on a particular host starts as soon as its transfer is complete
for N in "${NODES[@]}"; do
  (rsync -Pcz -e "ssh -o StrictHostKeyChecking=no -o UserKnownHostsFile=/dev/null" --timeout=10 $FILE user@$N
   ssh user@$N "cd ~/; ./exec.sh") > ${N}.log 2>&1 &
done
You then collect all of the hostname.log files.
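Collecting the per-host logs afterwards could then be as simple as (a sketch):
for N in "${NODES[@]}"; do
  echo "Output from $N:"
  cat "${N}.log"
done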

inotifywait shell script run as daemon

I have a script that watches a directory (recursively) and performs a command when a file changes. This is working correctly when the monitoring flag is used as below:
#!/bin/sh
inotifywait -m -r /path/to/directory |
while read path action file; do
  if [ <perform a check> ]
  then
    my_command
  fi
done
However, I want to run this on startup and in the background, so I naïvely thought I could change the -m flag to -d (run inotifywait as a daemon, and include an --outfile location) and then add this to rc.local to have it run at startup. Where am I going wrong?
Well... with -d it backgrounds itself and writes ONLY to the --outfile, so your whole pipe & loop construct is moot, and it never sees any data.
Incron is a cron-like daemon for inotify events.
Just need to use incrontab and an entry for your task:
/path/to/directory IN_ALL_EVENTS /usr/local/bin/my-script $@ $# $%
And /usr/local/bin/my-script would be:
#! /bin/bash
path=$1
action=$2
file=$3
if [ <perform a check> ]
then
  my_command
fi
You need to add a single & to the end of the command in your /etc/rc.local.
Putting a single & at the end of a command means "run this program in the background so the user can still have input".
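For example, if the watcher script above were saved as /usr/local/bin/watch-directory.sh (a hypothetical name and path), the /etc/rc.local line might look something like this:
/usr/local/bin/watch-directory.sh > /var/log/watch-directory.log 2>&1 &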

inotifywait misses events while script is running

I am running an inotifywait script that triggers a bash script to call a function to synchronize my database of files whenever files are modified, created, or deleted.
#!/bin/sh
while inotifywait -r -e modify -e create -e delete /var/my/path/Documents; do
  cd /var/scripts/
  ./sync.sh
done
This actually works quite well, except that during the 10 seconds it takes my sync script to run, the watch doesn't pick up any additional changes. There are instances where the sync has already looked at a directory and an additional change occurs that isn't detected by inotifywait because it hasn't re-established its watches.
Is there any way for inotifywait to trigger the script and still maintain the watch?
Use the -m option so that it runs continuously, instead of exiting after each event.
inotifywait -q -m -r -e modify -e create -e delete /var/my/path/Documents | \
while read event; do
  cd /var/scripts
  ./sync.sh
done
This would actually have the opposite problem: if multiple changes occur while the sync script is running, it will run it again that many times. You might want to put something in the sync.sh script that prevents it from running again if it has run too recently.
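One way to do that, as a minimal sketch assuming a guard added to the top of sync.sh and a throwaway timestamp file in /tmp, is to skip any run that starts too soon after the previous one:
#!/bin/sh
# hypothetical guard at the top of sync.sh: bail out if the last run was under 10 seconds ago
STAMP=/tmp/sync.stamp
now=$(date +%s)
last=$(cat "$STAMP" 2>/dev/null || echo 0)
if [ $((now - last)) -lt 10 ]; then
  exit 0
fi
echo "$now" > "$STAMP"
# ... the existing sync logic goes here ...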

Simple bash script runs asynchronously when run as a cron job

I have a backup script written that will do the following in this order:
Zip up files via SSH on a remote backup server
Dump my local database
Transfer my local database via SSH rsync to the backup server
Now, when I run this script from the command line in RHEL, it works perfectly fine.
BUT when I set this script to run via a cronjob, the script does run, but from what I can tell it's somehow running those 3 commands above simultaneously. Because of that, things are getting done out of order (my local database finishes dumping and is transferred before the #1 zip job is actually complete).
Has anyone run across such a strange scenario? As the simplest fix, is there a way to force a script to run synchronously? Maybe add some kind of command to wait for the prior line to complete before moving on?
EDIT: I added an example version of my backup script below. It seems that the second line of my script runs at the same time as the first, so while the SSH command has been issued, it has not yet completed before my second line triggers and the SQL dump begins.
#!/bin/bash
THEDIR="sample"
THEDBNAME="mydatabase"
ssh -i /rsync/mirror-rsync-key sample@sample.com "tar zcvpf /$THEDIR/old-1.tar /$THEDIR/public_html/*"
mysqldump --opt -Q $THEDBNAME > mySampleDb
/usr/bin/rsync -avz --delete --exclude=**/stats --exclude=**/error -e "ssh -i /rsync/mirror-rsync-key" /$THEDIR/public_html/ sample@sample.com:/$THEDIR/public_html/
/usr/bin/rsync -avz --delete --exclude=**/stats --exclude=**/error -e "ssh -i /rsync/mirror-rsync-key" /$THEDIR/ sample@sample.com:/$THEDIR/
Unless you're explicitly using backgrounding (&), everything should run one-by-one, each command waiting until the prior one finishes.
Perhaps you are actually seeing overlapping prior executions by cron? If so, you can prevent multi-execution by calling your script with flock
e.g. change the midnight cron entry from
0 0 * * * backup.sh
to
0 0 * * * flock -n /tmp/backup.lock -c backup.sh
If you want to run commands in a sequential order you can use the ; operator.
; – semicolon operator
This operator runs multiple commands in one go, but in sequential order. If we take three commands separated by semicolons, the second command will run after the first command completes, and the third command will run only after the second command completes. One point to note is that the second command runs regardless of the first command's exit status.
Execute the ls, pwd, and whoami commands on one line, sequentially one after the other:
ls;pwd;whoami
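By contrast, && only runs the second command if the first one succeeded, which is the behaviour you want when a later step depends on an earlier one:
mkdir /tmp/backupdir && cd /tmp/backupdir   # cd runs only if mkdir succeeded
mkdir /tmp/backupdir ; cd /tmp/backupdir    # cd runs even if mkdir failed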
Please correct me if I am not understanding your question correctly.

How can I check a file exists and execute a command if not?

I have a daemon I have written using Python. When it is running, it has a PID file located at /tmp/filename.pid. If the daemon isn't running, the PID file doesn't exist.
On Linux, how can I check to ensure that the PID file exists and if not, execute a command to restart it?
The command would be
python daemon.py restart
which has to be executed from a specific directory.
[ -f /tmp/filename.pid ] || python daemon.py restart
-f checks if the given path exists and is a regular file (just -e checks if the path exists)
the [ ] performs the test and returns 0 on success, 1 otherwise
the || is a C-like or, so if the command on the left fails, the command on the right is executed.
So the final statement says: if /tmp/filename.pid does NOT exist, then start the daemon.
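Since the question says the restart command has to be executed from a specific directory, the corresponding cron entry could cd there first, for example (a sketch with placeholder paths and an every-minute schedule):
* * * * * [ -f /tmp/filename.pid ] || (cd /path/to/daemon/dir && python daemon.py restart)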
test -f filename && daemon.py restart || echo "File doesn't exist"
If it is bash scripting you are wondering about, something like this would work:
if [ ! -f "$FILENAME" ]; then
python daemon.py restart
fi
A better option may be to look into lockfile
The other answers are fine for detecting the existence of the file. However, for a complete solution, you should probably also check that the PID in the pidfile still belongs to a running process, and that it's your program.
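A minimal sketch of that fuller check, assuming the pidfile contains only the PID and the daemon's command line contains daemon.py (Linux-specific because of /proc):
PIDFILE=/tmp/filename.pid
if [ -f "$PIDFILE" ] && kill -0 "$(cat "$PIDFILE")" 2>/dev/null &&
   grep -q daemon.py "/proc/$(cat "$PIDFILE")/cmdline"; then
    : # the daemon really is running; nothing to do
else
    python daemon.py restart
fi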
Another approach to solving the problem is a script that ensures that your daemon "stays" alive...
Something like this (note: signal handling should be added for proper startup/shutdown):
PIDFILE="/path/to/pidfile"
if [ -f "$PIDFILE" ]; then
  echo "Pid file exists!"
  exit 1
fi
while true; do
  # the server writes its own pid file
  python your-server.py
  # force removal of pid in case of unexpected death
  rm -f "$PIDFILE"
  # sleep for 2 seconds
  sleep 2
done
In this way, the server will be restarted automatically even if it dies unexpectedly.
You can also use a ready solution like Monit.
ls /tmp/filename.pid
It exits with status 0 (true) if the file exists and non-zero (false) if it does not.
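Used in the same spirit as the [ -f ... ] answer above, that would look like this (output suppressed so cron stays quiet):
ls /tmp/filename.pid > /dev/null 2>&1 || python daemon.py restart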
