Delaying not preventing Bash function from simultaneous execution - linux

I need to block the simultaneous calling of highCpuFunction function. I have tried to create a blocking mechanism, but it is not working. How can I do this?
nameOftheScript="$(basename $0)"
pidOftheScript="$$"
highCpuFunction()
{
# Function with code causing high CPU usage. Like tar, zip, etc.
while [ -f /tmp/"$nameOftheScript"* ];
do
sleep 5;
done
touch /tmp/"$nameOftheScript"_"$pidOftheScript"
echo "$(date +%s) I am a Bad function you do not want to call me simultaniously..."
# Real high CPU usage code for reaching the database and
# parsing logs. It takes the heck out of the CPU.
rm -rf /tmp/"$nameOftheScript"_"$pidOftheScript" 2>/dev/null
}
while true
do
sleep 2
highCpuFunction
done
# The rest of the code...
In short, I want to run highCpuFunction at least with a gap of 5 seconds. Regardless of the instance/user/terminal. I need to allow other users to run this function but in proper sequence and with a gap of at least 5 seconds.

Use the flock tool. Consider this code (let's call it 'onlyoneofme.sh'):
#!/bin/sh
exec 9>/var/lock/myexclusivelock
flock 9
echo start
sleep 10
echo stop
It will open file /var/lock/myexclusivelock as descriptor 9 and then try to lock it exclusively. Only one instance of the script will be allowed to pass behind the flock 9 command. The rest of them will wait for the other script to finish (so the descriptor will be closed and the lock freed). After this, the next script will acquire the lock and execute, and so on.

In the following solution the # rest of the script part can be executed only by one process. The test and set is atomic, and there isn't any race condition, whereas test -f file .. touch file, two processes can touch the file.
try_acquire_lock() {
local lock_file=$1
# Noclobber option to fail if the file already exists
# in a sub-shell to avoid modifying current shell options
( set -o noclobber; : >"$lock_file")
}
# Trap to remove the file when the process exits
trap 'rm "$lock_file"' EXIT
lock_file=/tmp/"$nameOftheScript"_"$pidOftheScript"
while ! try_acquire_lock "$lock_file";
do
echo "failed to acquire lock, sleeping 5sec.."
sleep 5;
done
# The rest of the script
It's not optimal, because the wait is done in a loop with sleep. To improve, one can use inter process communication (FIFO), or operating system notifications or signals.
# Block current shell process
kill -STOP $BASHPID
# Unblock blocked shell process (where <pid> is the id of the blocked process)
kill -CONT <pid>

Related

Linux command execution rate limiting

I have a Linux command that can be called by another application multiple times (in quick succession) with different parameters. The problem is, if the command gets executed in too quick of succession, the function that it performs will not work properly.
What I’m looking for is some simple way to ensure that each call to the command will be properly delayed/spaced (by a couple milliseconds) from each other.
Order of execution does not matter in this case and I have no control over how the application makes the calls.
Edit: The command being called is used to transmit an RF signal on a Raspberry Pi. As such, the command execution must be exclusive (no concurrency) with an additional delay between executions to prevent the receivers from misreading the signals.
For anyone with the same problem, this worked for me: https://unix.stackexchange.com/questions/408934/how-to-serialize-command-execution-on-linux
CMD="<some command> && sleep <some delay in seconds>"
flock /tmp/some_lockfile $CMD
For a simple concurrency control, which will limit concurrent execution to instances, consider the following while loop (modify as needed).
Note that the script must be invoked as /path/to/script.sh so that it will find other instances. Starting with 'bash /path/to/script.sh' will require changes!
#! /bin/bash
# Process identifier.
echo "START $$"
ME=${0##*/}
# Max number of instances
N=5
# Sleep while there are more than N instances.
while [[ "$(pgrep -c -x $ME)" -gt "$N" ]] ; do echo Waiting ... ; sleep 1 ; done
# Execute the job
sleep "$#"
echo "Done $$"

nohup node service using cron job on CentOS 7 [duplicate]

I have a python script that'll be checking a queue and performing an action on each item:
# checkqueue.py
while True:
check_queue()
do_something()
How do I write a bash script that will check if it's running, and if not, start it. Roughly the following pseudo code (or maybe it should do something like ps | grep?):
# keepalivescript.sh
if processidfile exists:
if processid is running:
exit, all ok
run checkqueue.py
write processid to processidfile
I'll call that from a crontab:
# crontab
*/5 * * * * /path/to/keepalivescript.sh
Avoid PID-files, crons, or anything else that tries to evaluate processes that aren't their children.
There is a very good reason why in UNIX, you can ONLY wait on your children. Any method (ps parsing, pgrep, storing a PID, ...) that tries to work around that is flawed and has gaping holes in it. Just say no.
Instead you need the process that monitors your process to be the process' parent. What does this mean? It means only the process that starts your process can reliably wait for it to end. In bash, this is absolutely trivial.
until myserver; do
echo "Server 'myserver' crashed with exit code $?. Respawning.." >&2
sleep 1
done
The above piece of bash code runs myserver in an until loop. The first line starts myserver and waits for it to end. When it ends, until checks its exit status. If the exit status is 0, it means it ended gracefully (which means you asked it to shut down somehow, and it did so successfully). In that case we don't want to restart it (we just asked it to shut down!). If the exit status is not 0, until will run the loop body, which emits an error message on STDERR and restarts the loop (back to line 1) after 1 second.
Why do we wait a second? Because if something's wrong with the startup sequence of myserver and it crashes immediately, you'll have a very intensive loop of constant restarting and crashing on your hands. The sleep 1 takes away the strain from that.
Now all you need to do is start this bash script (asynchronously, probably), and it will monitor myserver and restart it as necessary. If you want to start the monitor on boot (making the server "survive" reboots), you can schedule it in your user's cron(1) with an #reboot rule. Open your cron rules with crontab:
crontab -e
Then add a rule to start your monitor script:
#reboot /usr/local/bin/myservermonitor
Alternatively; look at inittab(5) and /etc/inittab. You can add a line in there to have myserver start at a certain init level and be respawned automatically.
Edit.
Let me add some information on why not to use PID files. While they are very popular; they are also very flawed and there's no reason why you wouldn't just do it the correct way.
Consider this:
PID recycling (killing the wrong process):
/etc/init.d/foo start: start foo, write foo's PID to /var/run/foo.pid
A while later: foo dies somehow.
A while later: any random process that starts (call it bar) takes a random PID, imagine it taking foo's old PID.
You notice foo's gone: /etc/init.d/foo/restart reads /var/run/foo.pid, checks to see if it's still alive, finds bar, thinks it's foo, kills it, starts a new foo.
PID files go stale. You need over-complicated (or should I say, non-trivial) logic to check whether the PID file is stale, and any such logic is again vulnerable to 1..
What if you don't even have write access or are in a read-only environment?
It's pointless overcomplication; see how simple my example above is. No need to complicate that, at all.
See also: Are PID-files still flawed when doing it 'right'?
By the way; even worse than PID files is parsing ps! Don't ever do this.
ps is very unportable. While you find it on almost every UNIX system; its arguments vary greatly if you want non-standard output. And standard output is ONLY for human consumption, not for scripted parsing!
Parsing ps leads to a LOT of false positives. Take the ps aux | grep PID example, and now imagine someone starting a process with a number somewhere as argument that happens to be the same as the PID you stared your daemon with! Imagine two people starting an X session and you grepping for X to kill yours. It's just all kinds of bad.
If you don't want to manage the process yourself; there are some perfectly good systems out there that will act as monitor for your processes. Look into runit, for example.
Have a look at monit (http://mmonit.com/monit/). It handles start, stop and restart of your script and can do health checks plus restarts if necessary.
Or do a simple script:
while true
do
/your/script
sleep 1
done
In-line:
while true; do <your-bash-snippet> && break; done
This will restart continuously <your-bash-snippet> if it fails: && break will stop the loop if <your-bash-snippet> stop gracefully (return code 0).
To restart <your-bash-snippet> in all cases:
while true; do <your-bash-snippet>; done
e.g. #1
while true; do openconnect x.x.x.x:xxxx && break; done
e.g. #2
while true; do docker logs -f container-name; sleep 2; done
The easiest way to do it is using flock on file. In Python script you'd do
lf = open('/tmp/script.lock','w')
if(fcntl.flock(lf, fcntl.LOCK_EX|fcntl.LOCK_NB) != 0):
sys.exit('other instance already running')
lf.write('%d\n'%os.getpid())
lf.flush()
In shell you can actually test if it's running:
if [ `flock -xn /tmp/script.lock -c 'echo 1'` ]; then
echo 'it's not running'
restart.
else
echo -n 'it's already running with PID '
cat /tmp/script.lock
fi
But of course you don't have to test, because if it's already running and you restart it, it'll exit with 'other instance already running'
When process dies, all it's file descriptors are closed and all locks are automatically removed.
You should use monit, a standard unix tool that can monitor different things on the system and react accordingly.
From the docs: http://mmonit.com/monit/documentation/monit.html#pid_testing
check process checkqueue.py with pidfile /var/run/checkqueue.pid
if changed pid then exec "checkqueue_restart.sh"
You can also configure monit to email you when it does do a restart.
if ! test -f $PIDFILE || ! psgrep `cat $PIDFILE`; then
restart_process
# Write PIDFILE
echo $! >$PIDFILE
fi
watch "yourcommand"
It will restart the process if/when it stops (after a 2s delay).
watch -n 0.1 "yourcommand"
To restart it after 0.1s instead of the default 2 seconds
watch -e "yourcommand"
To stop restarts if the program exits with an error.
Advantages:
built-in command
one line
easy to use and remember.
Drawbacks:
Only display the result of the command on the screen once it's finished
I'm not sure how portable it is across operating systems, but you might check if your system contains the 'run-one' command, i.e. "man run-one".
Specifically, this set of commands includes 'run-one-constantly', which seems to be exactly what is needed.
From man page:
run-one-constantly COMMAND [ARGS]
Note: obviously this could be called from within your script, but also it removes the need for having a script at all.
I've used the following script with great success on numerous servers:
pid=`jps -v | grep $INSTALLATION | awk '{print $1}'`
echo $INSTALLATION found at PID $pid
while [ -e /proc/$pid ]; do sleep 0.1; done
notes:
It's looking for a java process, so I
can use jps, this is much more
consistent across distributions than
ps
$INSTALLATION contains enough of the process path that's it's totally unambiguous
Use sleep while waiting for the process to die, avoid hogging resources :)
This script is actually used to shut down a running instance of tomcat, which I want to shut down (and wait for) at the command line, so launching it as a child process simply isn't an option for me.
I use this for my npm Process
#!/bin/bash
for (( ; ; ))
do
date +"%T"
echo Start Process
cd /toFolder
sudo process
date +"%T"
echo Crash
sleep 1
done

BASH - why the infinite loop is not infinite and failing to restart the crashed process? [duplicate]

I have a python script that'll be checking a queue and performing an action on each item:
# checkqueue.py
while True:
check_queue()
do_something()
How do I write a bash script that will check if it's running, and if not, start it. Roughly the following pseudo code (or maybe it should do something like ps | grep?):
# keepalivescript.sh
if processidfile exists:
if processid is running:
exit, all ok
run checkqueue.py
write processid to processidfile
I'll call that from a crontab:
# crontab
*/5 * * * * /path/to/keepalivescript.sh
Avoid PID-files, crons, or anything else that tries to evaluate processes that aren't their children.
There is a very good reason why in UNIX, you can ONLY wait on your children. Any method (ps parsing, pgrep, storing a PID, ...) that tries to work around that is flawed and has gaping holes in it. Just say no.
Instead you need the process that monitors your process to be the process' parent. What does this mean? It means only the process that starts your process can reliably wait for it to end. In bash, this is absolutely trivial.
until myserver; do
echo "Server 'myserver' crashed with exit code $?. Respawning.." >&2
sleep 1
done
The above piece of bash code runs myserver in an until loop. The first line starts myserver and waits for it to end. When it ends, until checks its exit status. If the exit status is 0, it means it ended gracefully (which means you asked it to shut down somehow, and it did so successfully). In that case we don't want to restart it (we just asked it to shut down!). If the exit status is not 0, until will run the loop body, which emits an error message on STDERR and restarts the loop (back to line 1) after 1 second.
Why do we wait a second? Because if something's wrong with the startup sequence of myserver and it crashes immediately, you'll have a very intensive loop of constant restarting and crashing on your hands. The sleep 1 takes away the strain from that.
Now all you need to do is start this bash script (asynchronously, probably), and it will monitor myserver and restart it as necessary. If you want to start the monitor on boot (making the server "survive" reboots), you can schedule it in your user's cron(1) with an #reboot rule. Open your cron rules with crontab:
crontab -e
Then add a rule to start your monitor script:
#reboot /usr/local/bin/myservermonitor
Alternatively; look at inittab(5) and /etc/inittab. You can add a line in there to have myserver start at a certain init level and be respawned automatically.
Edit.
Let me add some information on why not to use PID files. While they are very popular; they are also very flawed and there's no reason why you wouldn't just do it the correct way.
Consider this:
PID recycling (killing the wrong process):
/etc/init.d/foo start: start foo, write foo's PID to /var/run/foo.pid
A while later: foo dies somehow.
A while later: any random process that starts (call it bar) takes a random PID, imagine it taking foo's old PID.
You notice foo's gone: /etc/init.d/foo/restart reads /var/run/foo.pid, checks to see if it's still alive, finds bar, thinks it's foo, kills it, starts a new foo.
PID files go stale. You need over-complicated (or should I say, non-trivial) logic to check whether the PID file is stale, and any such logic is again vulnerable to 1..
What if you don't even have write access or are in a read-only environment?
It's pointless overcomplication; see how simple my example above is. No need to complicate that, at all.
See also: Are PID-files still flawed when doing it 'right'?
By the way; even worse than PID files is parsing ps! Don't ever do this.
ps is very unportable. While you find it on almost every UNIX system; its arguments vary greatly if you want non-standard output. And standard output is ONLY for human consumption, not for scripted parsing!
Parsing ps leads to a LOT of false positives. Take the ps aux | grep PID example, and now imagine someone starting a process with a number somewhere as argument that happens to be the same as the PID you stared your daemon with! Imagine two people starting an X session and you grepping for X to kill yours. It's just all kinds of bad.
If you don't want to manage the process yourself; there are some perfectly good systems out there that will act as monitor for your processes. Look into runit, for example.
Have a look at monit (http://mmonit.com/monit/). It handles start, stop and restart of your script and can do health checks plus restarts if necessary.
Or do a simple script:
while true
do
/your/script
sleep 1
done
In-line:
while true; do <your-bash-snippet> && break; done
This will restart continuously <your-bash-snippet> if it fails: && break will stop the loop if <your-bash-snippet> stop gracefully (return code 0).
To restart <your-bash-snippet> in all cases:
while true; do <your-bash-snippet>; done
e.g. #1
while true; do openconnect x.x.x.x:xxxx && break; done
e.g. #2
while true; do docker logs -f container-name; sleep 2; done
The easiest way to do it is using flock on file. In Python script you'd do
lf = open('/tmp/script.lock','w')
if(fcntl.flock(lf, fcntl.LOCK_EX|fcntl.LOCK_NB) != 0):
sys.exit('other instance already running')
lf.write('%d\n'%os.getpid())
lf.flush()
In shell you can actually test if it's running:
if [ `flock -xn /tmp/script.lock -c 'echo 1'` ]; then
echo 'it's not running'
restart.
else
echo -n 'it's already running with PID '
cat /tmp/script.lock
fi
But of course you don't have to test, because if it's already running and you restart it, it'll exit with 'other instance already running'
When process dies, all it's file descriptors are closed and all locks are automatically removed.
You should use monit, a standard unix tool that can monitor different things on the system and react accordingly.
From the docs: http://mmonit.com/monit/documentation/monit.html#pid_testing
check process checkqueue.py with pidfile /var/run/checkqueue.pid
if changed pid then exec "checkqueue_restart.sh"
You can also configure monit to email you when it does do a restart.
if ! test -f $PIDFILE || ! psgrep `cat $PIDFILE`; then
restart_process
# Write PIDFILE
echo $! >$PIDFILE
fi
watch "yourcommand"
It will restart the process if/when it stops (after a 2s delay).
watch -n 0.1 "yourcommand"
To restart it after 0.1s instead of the default 2 seconds
watch -e "yourcommand"
To stop restarts if the program exits with an error.
Advantages:
built-in command
one line
easy to use and remember.
Drawbacks:
Only display the result of the command on the screen once it's finished
I'm not sure how portable it is across operating systems, but you might check if your system contains the 'run-one' command, i.e. "man run-one".
Specifically, this set of commands includes 'run-one-constantly', which seems to be exactly what is needed.
From man page:
run-one-constantly COMMAND [ARGS]
Note: obviously this could be called from within your script, but also it removes the need for having a script at all.
I've used the following script with great success on numerous servers:
pid=`jps -v | grep $INSTALLATION | awk '{print $1}'`
echo $INSTALLATION found at PID $pid
while [ -e /proc/$pid ]; do sleep 0.1; done
notes:
It's looking for a java process, so I
can use jps, this is much more
consistent across distributions than
ps
$INSTALLATION contains enough of the process path that's it's totally unambiguous
Use sleep while waiting for the process to die, avoid hogging resources :)
This script is actually used to shut down a running instance of tomcat, which I want to shut down (and wait for) at the command line, so launching it as a child process simply isn't an option for me.
I use this for my npm Process
#!/bin/bash
for (( ; ; ))
do
date +"%T"
echo Start Process
cd /toFolder
sudo process
date +"%T"
echo Crash
sleep 1
done

Launch the same program with different arguments in parallel via bash

I have a program that has very big computation times. I need to call it with different arguments. I want to run them on a server with a lot of processors, so I'd like to launch them in parallel in order to save time. (One program instance only uses one processor)
I have tried my best to write a bash script which looks like this:
#!/bin/bash
# set maximal number of parallel jobs
MAXPAR=5
# fill the PID array with nonsense pid numbers
for (( PAR=1; PAR<=MAXPAR; PAR++ ))
do
PID[$PAR]=-18
done
# loop over the arguments
for ARG in 50 60 70 90
do
# endless loop that checks, if one of the parallel jobs has finished
while true
do
# check if PID[PAR] is still running, suppress error output of kill
if ! kill -0 ${PID[PAR]} 2> /dev/null
then
# if PID[PAR] is not running, the next job
# can run as parellel job number PAR
break
fi
# if it is still running, check the next parallel job
if [ $PAR -eq $MAXPAR ]
then
PAR=1
else
PAR=$[$PAR+1]
fi
# but sleep 10 seconds before going on
sleep 10
done
# call to the actual program (here sleep for example)
#./complicated_program $ARG &
sleep $ARG &
# get the pid of the process we just started and save it as PID[PAR]
PID[$PAR]=$!
# give some output, so we know where we are
echo ARG=$ARG, par=$PAR, pid=${PID[PAR]}
done
Now, this script works, but I don't quite like it.
Is there any better way to deal with the beginning? (Setting PID[*]=-18 looks wrong to me)
How do I wait for the first job to finish without the ugly infinite loop and sleeping some seconds? I know there is wait, but I'm not sure how to use it here.
I'd be grateful for any comments on how to improve style and conciseness.
I have a much more complicated code that, more or less, does the same thing.
The things you need to consider:
Does the user need to approve the spawning of a new thread
Does the user need to approve the killing of an old thread
Does the thread terminate on it's own or it needs to be killed
Does the user want the script to run endlessly, as long as it has MAXPAR threads
If so, does the user need an escape sequence to stop further spawning
Here is some code for you:
spawn() #function that spawns a thread
{ #usage: spawn 1 ls -l
i=$1 #save the thread index
shift 1 #shift arguments to the left
[ ${thread[$i]} -ne 0 ] && #if the thread is not already running
[ ${#thread[#]} -lt $threads] && #and if we didn't reach maximum number of threads,
$# & #run the thread in the background, with all the arguments
thread[$1]=$! #associate thread id with thread index
}
terminate() #function that terminates threads
{ #usage: terminate 1
[ your condition ] && #if your condition is met,
kill {thread[$1]} && #kill the thread and if so,
thread[$1]=0 #mark the thread as terminated
}
Now, the rest of the code depends on your needs (things to consider), so you will either loop through input arguments and call spawn, and then after some time loop through threads indexes and call terminate. Or, if the threads end on their own, loop through input arguments and call both spawn and terminate,but the condition for the terminate is then:
[ ps -aux 2>/dev/null | grep " ${thread[$i]} " &>/dev/null ]
#look for thread id in process list (note spaces around id)
Or, something along the lines of that, you get the point.
Using the tips #theotherguy gave in the comments, I rewrote the script in a better way using the sem command that comes with GNU Parallel:
#!/bin/bash
# set maximal number of parallel jobs
MAXPAR=5
# loop over the arguments
for ARG in 50 60 70 90
do
# call to the actual program (here sleep for example)
# prefixed by sem -j $MAXPAR
#sem -j $MAXPAR ./complicated_program $ARG
sem -j $MAXPAR sleep $ARG
# give some output, so we know where we are
echo ARG=$ARG
done

How do I write a bash script to restart a process if it dies?

I have a python script that'll be checking a queue and performing an action on each item:
# checkqueue.py
while True:
check_queue()
do_something()
How do I write a bash script that will check if it's running, and if not, start it. Roughly the following pseudo code (or maybe it should do something like ps | grep?):
# keepalivescript.sh
if processidfile exists:
if processid is running:
exit, all ok
run checkqueue.py
write processid to processidfile
I'll call that from a crontab:
# crontab
*/5 * * * * /path/to/keepalivescript.sh
Avoid PID-files, crons, or anything else that tries to evaluate processes that aren't their children.
There is a very good reason why in UNIX, you can ONLY wait on your children. Any method (ps parsing, pgrep, storing a PID, ...) that tries to work around that is flawed and has gaping holes in it. Just say no.
Instead you need the process that monitors your process to be the process' parent. What does this mean? It means only the process that starts your process can reliably wait for it to end. In bash, this is absolutely trivial.
until myserver; do
echo "Server 'myserver' crashed with exit code $?. Respawning.." >&2
sleep 1
done
The above piece of bash code runs myserver in an until loop. The first line starts myserver and waits for it to end. When it ends, until checks its exit status. If the exit status is 0, it means it ended gracefully (which means you asked it to shut down somehow, and it did so successfully). In that case we don't want to restart it (we just asked it to shut down!). If the exit status is not 0, until will run the loop body, which emits an error message on STDERR and restarts the loop (back to line 1) after 1 second.
Why do we wait a second? Because if something's wrong with the startup sequence of myserver and it crashes immediately, you'll have a very intensive loop of constant restarting and crashing on your hands. The sleep 1 takes away the strain from that.
Now all you need to do is start this bash script (asynchronously, probably), and it will monitor myserver and restart it as necessary. If you want to start the monitor on boot (making the server "survive" reboots), you can schedule it in your user's cron(1) with an #reboot rule. Open your cron rules with crontab:
crontab -e
Then add a rule to start your monitor script:
#reboot /usr/local/bin/myservermonitor
Alternatively; look at inittab(5) and /etc/inittab. You can add a line in there to have myserver start at a certain init level and be respawned automatically.
Edit.
Let me add some information on why not to use PID files. While they are very popular; they are also very flawed and there's no reason why you wouldn't just do it the correct way.
Consider this:
PID recycling (killing the wrong process):
/etc/init.d/foo start: start foo, write foo's PID to /var/run/foo.pid
A while later: foo dies somehow.
A while later: any random process that starts (call it bar) takes a random PID, imagine it taking foo's old PID.
You notice foo's gone: /etc/init.d/foo/restart reads /var/run/foo.pid, checks to see if it's still alive, finds bar, thinks it's foo, kills it, starts a new foo.
PID files go stale. You need over-complicated (or should I say, non-trivial) logic to check whether the PID file is stale, and any such logic is again vulnerable to 1..
What if you don't even have write access or are in a read-only environment?
It's pointless overcomplication; see how simple my example above is. No need to complicate that, at all.
See also: Are PID-files still flawed when doing it 'right'?
By the way; even worse than PID files is parsing ps! Don't ever do this.
ps is very unportable. While you find it on almost every UNIX system; its arguments vary greatly if you want non-standard output. And standard output is ONLY for human consumption, not for scripted parsing!
Parsing ps leads to a LOT of false positives. Take the ps aux | grep PID example, and now imagine someone starting a process with a number somewhere as argument that happens to be the same as the PID you stared your daemon with! Imagine two people starting an X session and you grepping for X to kill yours. It's just all kinds of bad.
If you don't want to manage the process yourself; there are some perfectly good systems out there that will act as monitor for your processes. Look into runit, for example.
Have a look at monit (http://mmonit.com/monit/). It handles start, stop and restart of your script and can do health checks plus restarts if necessary.
Or do a simple script:
while true
do
/your/script
sleep 1
done
In-line:
while true; do <your-bash-snippet> && break; done
This will restart continuously <your-bash-snippet> if it fails: && break will stop the loop if <your-bash-snippet> stop gracefully (return code 0).
To restart <your-bash-snippet> in all cases:
while true; do <your-bash-snippet>; done
e.g. #1
while true; do openconnect x.x.x.x:xxxx && break; done
e.g. #2
while true; do docker logs -f container-name; sleep 2; done
The easiest way to do it is using flock on file. In Python script you'd do
lf = open('/tmp/script.lock','w')
if(fcntl.flock(lf, fcntl.LOCK_EX|fcntl.LOCK_NB) != 0):
sys.exit('other instance already running')
lf.write('%d\n'%os.getpid())
lf.flush()
In shell you can actually test if it's running:
if [ `flock -xn /tmp/script.lock -c 'echo 1'` ]; then
echo 'it's not running'
restart.
else
echo -n 'it's already running with PID '
cat /tmp/script.lock
fi
But of course you don't have to test, because if it's already running and you restart it, it'll exit with 'other instance already running'
When process dies, all it's file descriptors are closed and all locks are automatically removed.
You should use monit, a standard unix tool that can monitor different things on the system and react accordingly.
From the docs: http://mmonit.com/monit/documentation/monit.html#pid_testing
check process checkqueue.py with pidfile /var/run/checkqueue.pid
if changed pid then exec "checkqueue_restart.sh"
You can also configure monit to email you when it does do a restart.
if ! test -f $PIDFILE || ! psgrep `cat $PIDFILE`; then
restart_process
# Write PIDFILE
echo $! >$PIDFILE
fi
watch "yourcommand"
It will restart the process if/when it stops (after a 2s delay).
watch -n 0.1 "yourcommand"
To restart it after 0.1s instead of the default 2 seconds
watch -e "yourcommand"
To stop restarts if the program exits with an error.
Advantages:
built-in command
one line
easy to use and remember.
Drawbacks:
Only display the result of the command on the screen once it's finished
I'm not sure how portable it is across operating systems, but you might check if your system contains the 'run-one' command, i.e. "man run-one".
Specifically, this set of commands includes 'run-one-constantly', which seems to be exactly what is needed.
From man page:
run-one-constantly COMMAND [ARGS]
Note: obviously this could be called from within your script, but also it removes the need for having a script at all.
I've used the following script with great success on numerous servers:
pid=`jps -v | grep $INSTALLATION | awk '{print $1}'`
echo $INSTALLATION found at PID $pid
while [ -e /proc/$pid ]; do sleep 0.1; done
notes:
It's looking for a java process, so I
can use jps, this is much more
consistent across distributions than
ps
$INSTALLATION contains enough of the process path that's it's totally unambiguous
Use sleep while waiting for the process to die, avoid hogging resources :)
This script is actually used to shut down a running instance of tomcat, which I want to shut down (and wait for) at the command line, so launching it as a child process simply isn't an option for me.
I use this for my npm Process
#!/bin/bash
for (( ; ; ))
do
date +"%T"
echo Start Process
cd /toFolder
sudo process
date +"%T"
echo Crash
sleep 1
done

Resources