Bash - start and monitor an application using a watchDog - linux

I'm writing a watchDog script which starts my application and restarts it whenever it is detected as down (according to the process PID).
code:
while [ -n "$pid" ]; do
    pid=$(getPID)
    # if the server app is down, start it!
    if [ -z "$pid" ]; then
        echo -e "$(date) [INFO]: watchDog activated -> starting service since it was detected as down!\n" >> "$watchDogLogger" 2>&1
        startApp > /dev/null 2>&1
        pid=$(getPID)
    fi
done
The issue is that when a user starts the application with the watchDog (watchdog start), the while loop blocks the console and I can't continue using it.
I know that I can run the watchDog as a background job (watchDog start &) and then get the console back, but that causes trouble: when I want to stop it (watchDog stop) it does stop the application, but the original watchDog job (watchDog start) is still alive.
Is there any other way to start the watchDog and get the console back?
Thank you!
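One possible approach (a sketch only; getPID, startApp and $watchDogLogger are the question's own helpers and are assumed to be defined elsewhere, and the pidfile path is a placeholder) is to run the monitoring loop in the background from inside the script and record the loop's own PID, so that watchDog stop can kill the loop as well as the application:
#!/bin/bash
# Sketch: getPID, startApp and $watchDogLogger come from the question and are assumed to exist.
WATCHDOG_PIDFILE=/var/run/myapp-watchdog.pid   # hypothetical path

monitor_loop() {
    while :; do
        pid=$(getPID)
        if [ -z "$pid" ]; then
            echo "$(date) [INFO]: watchDog activated -> restarting service" >> "$watchDogLogger" 2>&1
            startApp > /dev/null 2>&1
        fi
        sleep 5
    done
}

case "$1" in
    start)
        monitor_loop &                  # run the loop in the background, the console comes back
        echo $! > "$WATCHDOG_PIDFILE"   # remember the loop's own PID
        ;;
    stop)
        if [ -f "$WATCHDOG_PIDFILE" ]; then
            kill "$(cat "$WATCHDOG_PIDFILE")"   # stop the monitoring loop first
            rm -f "$WATCHDOG_PIDFILE"
        fi
        pid=$(getPID)
        [ -n "$pid" ] && kill "$pid"            # then stop the application itself
        ;;
esac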

Related

How to check if supervisor process is running or stopped using bash script

I run multiple supervisor processes. Sometimes, due to server overload, some processes stop indefinitely until you manually restart them. Is there a way to write a bash script, regularly executed by a crontab, that checks which processes have stopped and restarts them?
This is how I can check the status of, stop, or restart a process on a terminal:
root@cloud:~# supervisorctl status birthday-sms:birthday-sms_00
birthday-sms:birthday-sms_00 RUNNING pid 2696895, uptime 0:02:08
root@cloud:~# supervisorctl stop birthday-sms:birthday-sms_00
birthday-sms:birthday-sms_00: stopped
root@cloud:~# supervisorctl status birthday-sms:birthday-sms_00
birthday-sms:birthday-sms_00 STOPPED May 07 11:07 AM
I don't want to use crontab to restart all processes at a fixed interval (*/5 * * * * /usr/bin/supervisorctl restart all); I want to check and restart stopped processes only.
First make and test a script, without crontab, that performs the actions you want.
When the script is working, check that it still runs without the settings in your .bashrc, such as the path to supervisorctl.
Also check that your script doesn't write to stdout; perhaps introduce a logfile.
Next, add the script to crontab.
#!/bin/bash
for proc in 'birthday-sms:birthday-sms_00' 'process2' 'process3'; do
    # Only during development -- you don't want this output from cron
    echo "Checking ${proc}"
    status=$(supervisorctl status "${proc}" 2>&1)
    echo "$status"
    if [[ "$status" == *STOPPED* ]]; then
        echo "Restarting ${proc}"
        supervisorctl restart "${proc}"
    fi
done
Or, using an array and shorter testing:
#!/bin/bash
processes=('birthday-sms:birthday-sms_00' 'process2' 'process3')
for proc in "${processes[@]}"; do
    supervisorctl status "${proc}" 2>&1 |
        grep -q STOPPED &&
        supervisorctl restart "${proc}"
done
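Once the script behaves, a possible crontab entry for it (the path /usr/local/bin/check_supervisor.sh and the logfile are placeholders; the absolute path and the redirection follow the advice above about .bashrc settings and stdout):
*/5 * * * * /usr/local/bin/check_supervisor.sh >> /var/log/check_supervisor.log 2>&1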

Different behaviour of bash script on supervisor start and restart

I have a bash script which does something, for example:
[program:long_script]
command=/usr/local/bin/long.sh
autostart=true
autorestart=true
stderr_logfile=/var/log/long.err.log
stdout_logfile=/var/log/long.out.log
and it is bound to supervisor.
I want to add an if check to this script to determine whether it is executed by:
supervisor> start long_script
or
supervisor> restart long_script
I want something like this:
if [ executed by start command ]
then
    echo "start"
else
    echo "restart"
fi
but I don't know what should be in the if clause.
Is it possible to determine this?
If not, how can I achieve different behaviour of the script for the start and restart commands?
Please help.
Within the code there is currently no difference between a restart and a stop/start. A restart within supervisorctl calls:
self.do_stop(arg)
self.do_start(arg)
There is no status within the app for "restart", though there is some discussion of allowing different signals; the supervisor is already able to send different signals to the process. (Allowing more control over reload/restart has been a long-standing "gap".)
This means you have at least two options, but the key to making either work is that the process needs to record some state at shutdown.
Option 1. The easiest option would be to use supervisorctl signal <signal> <process> instead of calling supervisorctl restart <process>, and record somewhere which signal was sent so that on startup you can read back the last signal.
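A sketch of option 1 (the state-file path and the choice of USR2 are arbitrary): trap a dedicated signal in long.sh, record which signal ended the previous run, and read it back at startup:
# At the top of long.sh (sketch only):
STATEFILE=/var/run/long/last.signal              # hypothetical state file
mkdir -p /var/run/long

trap 'echo USR2 > "$STATEFILE"; exit 0' USR2     # sent via: supervisorctl signal USR2 long_script
trap 'echo TERM > "$STATEFILE"; exit 0' TERM     # sent by a plain supervisorctl stop

last=$(cat "$STATEFILE" 2>/dev/null)
rm -f "$STATEFILE"                               # clear it so a later crash isn't mistaken for a restart
if [ "$last" = "USR2" ]; then
    echo "restart detected"
else
    echo "start detected"
fi

# ... main work loop here ...
A "restart" would then be requested with supervisorctl signal USR2 long_script; since autorestart=true is set, supervisord should start the process again once it exits.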
Option 2. A more interesting solution, however, is not to expect any upstream changes, i.e. continue to allow restart to be used, and distinguish between stop, crash and restart.
In this case, the only information that will differ between a start and a restart is that a restart should have a much shorter gap between the shutdown of the old process and the start of the new one. So if a timestamp is recorded on shutdown, then on startup the difference between now and the last shutdown will distinguish between a start and a restart.
To do this, I've got a definition like yours but with stopsignal defined:
[program:long_script]
command=/usr/local/bin/long.sh
autostart=true
autorestart=true
stderr_logfile=/var/log/long.err.log
stdout_logfile=/var/log/long.out.log
stopsignal=SIGUSR1
By making the stop from supervisord a specific signal, you can tell the difference between a crash and a normal stop event, without interfering with normal kill or interrupt signals.
Then, as the very first line in the bash script, I set a trap for this signal:
trap "mkdir -p /var/run/long/; date +%s > /var/run/long/last.stop; exit 0" SIGUSR1
This means the date, as an epoch timestamp, will be recorded in the file /var/run/long/last.stop every time we are sent a stop by supervisord.
Then, in the immediately following lines of the script, calculate the difference between the last stop and now:
stopdiff=0
if [ -e /var/run/long/last.stop ]; then
    curtime=$(date +%s)
    stoptime=$(grep "[0-9]*" /var/run/long/last.stop)
    if [ -n "${stoptime}" ]; then
        stopdiff=$(( curtime - stoptime ))
    fi
else
    stopdiff=9999
fi
stopdiff will now contain the difference in seconds between the stop and the start, or 9999 if the stop file didn't exist.
This can then be used to decide what to do:
if [ ${stopdiff} -gt 2 ]; then
    echo "Start detected (${stopdiff} sec difference)"
elif [ ${stopdiff} -ge 0 ]; then
    echo "Restart detected (${stopdiff} sec difference)"
else
    echo "Error detected (${stopdiff} sec difference)"
fi
You'll have to make some choices about how long it actually takes to get from sending a stop to the script actually starting: here, I've allowed only 2 seconds, and anything greater is considered a "start". If the shutdown of the script needs to happen in a specific way, you'll need a bit more complexity in the trap statement (rather than just exit 0).
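For example, the trap could call a small function instead of a bare command list (cleanup and its contents are hypothetical, standing in for whatever orderly shutdown the script really needs before recording the timestamp):
cleanup() {
    # hypothetical shutdown work: flush buffers, close connections, etc.
    mkdir -p /var/run/long
    date +%s > /var/run/long/last.stop
    exit 0
}
trap cleanup SIGUSR1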
Since a crash shouldn't record any timestamp to the stop file, you should also be able to tell that a startup is occurring because of a crash, provided you regularly record a running timestamp somewhere as well.
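A minimal sketch of such a running timestamp (the heartbeat file and the interval are arbitrary): if, at startup, last.heartbeat is recent but there is no matching last.stop, the previous run most likely crashed:
# Inside the script's main loop (sketch):
while :; do
    date +%s > /var/run/long/last.heartbeat   # hypothetical heartbeat file
    # ... real work for this iteration ...
    sleep 5
done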
I understand your problem, but I don't know much about supervisor. Please check whether this idea works.
Instantiate a global string variable and put a value into it before you run the supervisor commands. Here I am turning each of your start and restart commands into its own bash program.
program : supervisor_start.sh
#!/bin/bash
echo "Starting.."
supervisorctl start long_script
supervisor_started_command="start" # This is the one
echo "Started.."
program : supervisor_restart.sh
#!/bin/bash
echo "ReStarting.."
supervisorctl restart long_script
supervisor_started_command="restart" # This is the one
echo "ReStarted.."
Now you can check what is in the "supervisor_started_command" variable :)
#!/bin/bash
if [ "$supervisor_started_command" == "start" ]
then
    echo "start"
elif [ "$supervisor_started_command" == "restart" ]
then
    echo "restart"
fi
Well, I don't know whether this idea will work for you or not.

Killing process started by bash script but not script itself

So basically I have one script that keeps a server alive. It starts the server process and then starts it again after the process stops. Sometimes, though, the server becomes non-responsive. For that I want to have another script which would ping the server and kill the process if it doesn't respond within 60 seconds.
The problem is that if I kill the server process, the bash script also gets terminated.
The start script is just a while loop running sh Server.sh. That calls another shell script which has additional parameters for starting the server. The server runs on Java, so it starts a java process. If the server hangs I use kill -9 pid because nothing else stops it. If the server doesn't hang and does the usual restart, it stops gracefully and the bash script starts the next loop iteration.
Doing The Right Thing
Use a real process supervision system -- your Linux distribution almost certainly includes one.
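For example, on a distribution that ships systemd, a minimal unit file (the name myserver.service and the paths are placeholders) already gives you start/stop/restart plus automatic restart when the process dies:
# /etc/systemd/system/myserver.service  (placeholder name and paths)
[Unit]
Description=Keep the game server running

[Service]
ExecStart=/opt/server/Server.sh
Restart=always
RestartSec=5

[Install]
WantedBy=multi-user.target
It would then be managed with systemctl start/stop/restart myserver and enabled at boot with systemctl enable myserver.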
Directly monitoring the supervised process by PID
An awful, ugly, moderately buggy approach (for instance, able to kill the wrong process in the event of a PID collision) is the following:
while :; do
    ./Server.sh & server_pid=$!
    echo "$server_pid" > server.pid
    wait "$server_pid"
done
...and, to kill the process:
#!/bin/bash
# ^^^^ - DO NOT run this with "sh scriptname"; it must be "bash scriptname".
server_pid="$(<server.pid)"; [[ $server_pid ]] || exit
kill "$server_pid"   # request a clean shutdown first
# allow 5 seconds for clean shutdown -- adjust to taste
for (( i=0; i<5; i++ )); do
    if kill -0 "$server_pid"; then
        sleep 1
    else
        exit 0 # server exited gracefully, nothing else to do
    fi
done
# escalate to a SIGKILL
kill -9 "$server_pid"
Note that we're storing the PID of the server in our pidfile, and killing that directly -- thus, avoiding inadvertently targeting the supervision script.
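For completeness, a possible way to wire the two snippets together (keepalive.sh and kill-server.sh are hypothetical names for the two scripts above):
./keepalive.sh &     # the supervision loop above, backgrounded
# ... later, when the server stops responding ...
./kill-server.sh     # kills only the PID recorded in server.pid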
Monitoring the supervised process and all children via lockfile
Note that this uses some Linux-specific tools -- but you do have linux tagged on your question.
A more robust approach -- which will work across reboots even in the case of pidfile reuse -- is to use a lockfile:
while :; do
    flock -x Server.lock sh Server.sh
done
...and, on the other end:
#!/bin/bash
# kill all programs having a handle on Server.lock
fuser -k Server.lock
for ((i=0; i<5; i++)); do
    if fuser -s Server.lock; then
        sleep 1
    else
        exit 0
    fi
done
fuser -k -KILL Server.lock

Terminate bash script if process already running

I'm using a startup script to start our Minecraft server via webmin on CentOS. It backs up a few files before starting the server itself. Recently we messed up our data by accidentally executing the script twice in a row, which resulted in two instances of the Minecraft server running, and everything went haywire with the data files.
To prevent this from happening, I want the script to terminate if it detects that the process is already running. I've searched around for similar problems, and things like lock files are suggested, but I don't have an opportunity to remove those, since the startup script only sets up a screen for the Minecraft server process, and stopping the server is usually done by terminating the screen or through in-game commands.
The server process is started using this command:
screen -dmS minecraft java -Xincgc -Xmx2G -jar server.jar
How can I make the startup script detect if this process is already running, and then terminate itself?
Use this script:
#!/bin/bash
LOCKDIR="/path/to/lockdir"
if ! mkdir "$LOCKDIR"; then
    echo >&2 "Server is already running"
    exit 1
fi
# Here: when exiting, or receiving any of the mentioned signals, remove the lock directory
trap "rmdir \"$LOCKDIR\"" EXIT INT HUP TERM QUIT

# It would be tempting to exec instead, but DON'T DO IT: otherwise the trap is forgotten
java -Xincgc -Xmx2G -jar server.jar
exit $?
and launch it within your screen.
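For example (assuming the script is saved as /opt/minecraft/start_server.sh, a placeholder path), the startup command would become:
screen -dmS minecraft /opt/minecraft/start_server.sh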
These links may give you some ideas: http://mywiki.wooledge.org/ProcessManagement - http://mywiki.wooledge.org/BashFAQ/042 - http://mywiki.wooledge.org/BashFAQ/033 - http://mywiki.wooledge.org/BashFAQ/045

Linux Daemon Stopping start-stop-daemon

I have a daemon I am creating in Linux. I created the init.d file and have successfully started the daemon process using:
/etc/init.d/mydaemon start
When I try to stop it (with /etc/init.d/mydaemon stop), however, it stops successfully, but start-stop-daemon never seems to complete, as evidenced by no echoes occurring immediately after the call to start-stop-daemon.
Verbose mode shows that it stopped the process, and looking at system monitor, it does stop the process.
Stopped mydaemon (pid 13292 13310).
Here is the stop function of my init.d file:
do_stop()
{
    # Return
    #   0 if daemon has been stopped
    #   1 if daemon was already stopped
    #   2 if daemon could not be stopped
    #   other if a failure occurred
    start-stop-daemon --stop --name $NAME -v
    echo "stopped" # This is never printed and the script never formally gives the shell back.
    RETVAL="$?"
    [ "$RETVAL" = 2 ] && return 2
    # Wait for children to finish too if this is a daemon that forks
    # and if the daemon is only ever run from this initscript.
    # If the above conditions are not satisfied then add some other code
    # that waits for the process to drop all resources that could be
    # needed by services started subsequently. A last resort is to
    # sleep for some time.
    start-stop-daemon --stop --quiet --oknodo --retry=0/30/KILL/5 --exec $DAEMON
    [ "$?" = 2 ] && return 2
    # Many daemons don't delete their pidfiles when they exit.
    return "$RETVAL"
}
I am running this on a virtual machine; does this affect anything?
Running on a virtual machine shouldn't affect this.
And I have no idea why this is happening or how it is taking over control of the parent script.
However, I just encountered this issue and discovered that if I do:
start-stop-daemon ... && echo -n
it will work as expected and relinquish control of the shell.
I have no idea why this works, but it seems to work.
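Applied to the do_stop() from the question, the workaround is a one-line change (sketch only):
# in do_stop(), replace the bare call with:
start-stop-daemon --stop --name $NAME -v && echo -n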
