How to check if supervisor process is running or stopped using bash script - linux

I run multiple supervisor processes. Sometimes, due to server overload, some processes stop indefinitely until they are manually restarted. Is there a way to write a bash script, executed regularly from crontab, that checks which processes have stopped and restarts them?
This is how I can check the status of, stop, or restart a process in a terminal:
root@cloud:~# supervisorctl status birthday-sms:birthday-sms_00
birthday-sms:birthday-sms_00 RUNNING pid 2696895, uptime 0:02:08
root@cloud:~# supervisorctl stop birthday-sms:birthday-sms_00
birthday-sms:birthday-sms_00: stopped
root@cloud:~# supervisorctl status birthday-sms:birthday-sms_00
birthday-sms:birthday-sms_00 STOPPED May 07 11:07 AM
I don't want to use crontab to restart everything at a fixed interval (*/5 * * * * /usr/bin/supervisorctl restart all); I want to check for stopped processes and restart only those.

First write and test a script, without crontab, that performs the actions you want.
When the script is working, check that it still runs without the settings in your .bashrc, such as the path to supervisorctl.
Also check that your script doesn't write to stdout; consider writing to a logfile instead.
Then add the script to crontab.
#!/bin/bash
for proc in 'birthday-sms:birthday-sms_00' 'process2' 'process3'; do
    # Only during development, you don't want output from cron
    echo "Checking ${proc}"
    status=$(supervisorctl status "${proc}" 2>&1)
    echo "$status"
    if [[ "$status" == *STOPPED* ]]; then
        echo "Restarting ${proc}"
        supervisorctl restart "${proc}"
    fi
done
Or, using an array and a shorter test:
#!/bin/bash
processes=('birthday-sms:birthday-sms_00' 'process2' 'process3')
for proc in "${processes[@]}"; do
    supervisorctl status "${proc}" 2>&1 |
        grep -q STOPPED &&
        supervisorctl restart "${proc}"
done
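The advice above (no reliance on .bashrc, nothing written to stdout, a logfile) could be sketched as follows. This is a minimal sketch rather than a tested drop-in: the logfile path, the SUPERVISORCTL and LOGFILE variable names, and the restart_if_stopped function are assumptions made for illustration. Only the supervisorctl status and restart subcommands and the /usr/bin/supervisorctl path come from the question.

```shell
#!/bin/bash
# Absolute path, so the script does not depend on the PATH set in .bashrc.
SUPERVISORCTL="${SUPERVISORCTL:-/usr/bin/supervisorctl}"
# Hypothetical logfile location; adjust for your system.
LOGFILE="${LOGFILE:-/var/log/supervisor-watchdog.log}"

# restart_if_stopped NAME: restart NAME only when its status reads STOPPED.
restart_if_stopped() {
    local proc=$1 status
    status=$("$SUPERVISORCTL" status "$proc" 2>&1)
    if [[ $status == *STOPPED* ]]; then
        # Everything goes to the logfile, so cron sees no output.
        printf '%s restarting %s\n' "$(date '+%F %T')" "$proc" >> "$LOGFILE"
        "$SUPERVISORCTL" restart "$proc" >> "$LOGFILE" 2>&1
    fi
}

# Process names are passed as arguments.
for proc in "$@"; do
    restart_if_stopped "$proc"
done
```

A crontab entry such as */5 * * * * /path/to/script 'birthday-sms:birthday-sms_00' (script path hypothetical) would then check that one process every five minutes.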

Related

Script file restart Tomcat runs manually success, but fails on Crontab

I'm new to shell scripting.
I have a Tomcat server installed at /APP/apache-tomcat-7.0.42.
I want Tomcat to restart automatically once a day, so I wrote a file test.sh (/APP/apache-tomcat-7.0.42/test.sh) with this content:
/APP/apache-tomcat-7.0.42/bin/shutdown.sh && echo "Tomcat was already shutdown"
kill -9 $(lsof -t -i:8080 -sTCP:LISTEN)
/APP/apache-tomcat-7.0.42/bin/startup.sh
I installed it in crontab: 0 9 * * * /APP/apache-tomcat-7.0.42/test.sh
But it does not work from cron, although it succeeds when I run it manually.
I checked cron with /etc/init.d/crond status; it is running.
I don't understand why. Please help!
Oh, I resolved it!
If you can do it manually, from a login session, but not automatically from
startup or from cron, I'm 99% sure it's because environment variables like
JAVA_HOME and CATALINA_HOME are not being set for the startup and cron environments.
You need to get these values from your login session:
> echo $JAVA_HOME
> JAVA_HOME="/usr/java/jdk1.6.0_41"
> echo $CATALINA_HOME
> CATALINA_HOME="/APP/apache-tomcat-7.0.42"
The resulting script, as run from crontab:
export PATH="/usr/lib64/qt-3.3/bin:/usr/kerberos/sbin:/usr/kerberos/bin:/usr/local/bin:/bin:/usr/bin:/usr/local/sbin:/usr/sbin:/sbin:/home/tomcat/bin"
export JAVA_HOME="/usr/java/jdk1.6.0_41"
export CATALINA_HOME="/APP/apache-tomcat-7.0.42"
/APP/apache-tomcat-7.0.42/bin/shutdown.sh
kill -9 $(lsof -t -i:8080 -sTCP:LISTEN)
/APP/apache-tomcat-7.0.42/bin/startup.sh
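One way to make this kind of failure visible is to have the script check its own environment before touching Tomcat. A minimal sketch, assuming bash; the function name missing_vars is made up for illustration, and only JAVA_HOME and CATALINA_HOME come from the answer above.

```shell
#!/bin/bash
# missing_vars NAME...: print each NAME whose variable is unset or empty.
missing_vars() {
    local name
    for name in "$@"; do
        # ${!name} is bash indirect expansion: the value of the variable named by $name
        [[ -n ${!name} ]] || printf '%s\n' "$name"
    done
}

missing=$(missing_vars JAVA_HOME CATALINA_HOME)
if [[ -n $missing ]]; then
    # Under cron there is no terminal; this ends up in cron's mail or log.
    # A real wrapper would probably exit here instead of calling startup.sh.
    echo "not set in this environment:" $missing >&2
fi
```

Run from cron without the export lines, this would most likely report both variables, which points straight at the cause.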

How to debug an upstart script that intermittently fails?

I have a process that I want to start whenever my system is rebooted, by whatever means, so I am using an upstart script for it. Sometimes, however, the process does not start after a hard reboot (pulling the plug and powering the machine back on), so I think my upstart script is not triggered after a hard reboot. I believe there is no runlevel for a hard reboot.
I am confused about why it sometimes works after a reboot and sometimes doesn't. How can I debug this?
Below is my upstart script:
# sudo start helper
# sudo stop helper
# sudo status helper
start on runlevel [2345]
stop on runlevel [!2345]
chdir /data
respawn
pre-start script
    echo "[`date`] Agent Starting" >> /data/agent.log
    sleep 30
end script
post-stop script
    echo "[`date`] Agent Stopping" >> /data/agent.log
    sleep 30
end script
limit core unlimited unlimited
limit nofile 100000 100000
setuid goldy
exec python helper.py
Is there any way to debug what's happening? I believe I can reproduce this easily. Any pointers on what I can do here?
Note:
After a reboot I sometimes see the logging from my pre-start script, but sometimes I see no logging at all, which means my upstart script was not triggered. Is there anything I need to change in the runlevel to make it work?
I have a VM running on a hypervisor, and I am working with Ubuntu.
Your process is running fine, but during system startup many things happen in parallel.
If the mount that makes the /data folder available runs later than your pre-start script, you will not see the output of the pre-start script (the log line is written before the real /data is there, so it either fails or ends up hidden underneath the mount).
I suggest moving sleep 30 earlier (by the way, 30 seconds seems too long):
pre-start script
    sleep 30 # sleep 10 should be enough
    echo "[`date`] Agent Starting" >> /data/agent.log
end script
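A fixed sleep only guesses how long the mount takes. An alternative is to poll until /data is actually mounted. This is a sketch, assuming the mountpoint utility from util-linux is available and that /data is a separate mount (as the answer suggests). wait_until is a hypothetical helper name, written in plain sh because upstart runs script stanzas with /bin/sh.

```shell
#!/bin/sh
# wait_until TRIES CMD...: run CMD once per second until it succeeds,
# giving up after TRIES attempts. Returns 0 on success, 1 on timeout.
wait_until() {
    tries=$1
    shift
    while [ "$tries" -gt 0 ]; do
        "$@" && return 0
        tries=$((tries - 1))
        sleep 1
    done
    return 1
}
```

Inside the pre-start script stanza, the function definition followed by wait_until 30 mountpoint -q /data || exit 1 could take the place of the sleep, with the echo to /data/agent.log after it.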

How to run gdb on httpd processes within a shell script

I would like to get all my httpd processes, put them in an array, run gdb on each process, and save the output to a file, all from a cron job. For instance:
#!/bin/bash
# Make a list of current httpd pid's and then run "gdb" on each one
pids=( $(pgrep 'httpd') )
for each in "${pids[@]}"
do
    echo "$each"
    gdb httpd $each >> gdbscipt.out
    echo "Done with: $each"
done
When I run it, it only runs on the first pid:
# ./gdbscript
2046
Then it just stops after each pid is processed. It seems there is a breakpoint (?) within gdb after processing each pid.
I want to run it overnight a few times via cron.
Is there a better approach to running gdb on a list of active httpd processes via cron and outputting to a file(s)?
Thanks
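A likely cause is that gdb starts an interactive session and waits at its prompt after attaching. Below is a sketch of a non-interactive variant using gdb's -batch, -ex and -p options. The GDB and OUT variables and the dump_backtraces function are hypothetical names, and "thread apply all bt" is just one example of a command to run against each process.

```shell
#!/bin/bash
GDB="${GDB:-gdb}"            # overridable, so the loop can be tried with a stand-in
OUT="${OUT:-gdbscipt.out}"   # same output file name as in the question

# dump_backtraces PID...: attach to each PID in turn and print its backtraces.
dump_backtraces() {
    local pid
    for pid in "$@"; do
        echo "=== pid $pid ==="
        # -batch makes gdb exit after running the -ex commands
        # instead of waiting at the (gdb) prompt.
        "$GDB" -batch -ex 'thread apply all bt' -p "$pid" 2>&1
    done
}

pids=( $(pgrep httpd) )
if (( ${#pids[@]} > 0 )); then
    dump_backtraces "${pids[@]}" >> "$OUT"
fi
```

Each process is paused only while gdb is attached, and the loop moves on to the next pid without any input.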

Check if process runs if not execute script.sh

I am trying to find a way to monitor a process. If the process is not running, it should be checked again to make sure it has really crashed. If it has really crashed, a script (start.sh) should be run.
I have tried monit with no success. I have also tried adding the script below to crontab; I made it executable with chmod +x monitor.sh.
The actual program is called program1.
case "$(pidof program | wc -w)" in
0)  echo "Restarting program1: $(date)" >> /var/log/program1_log.txt
    /home/user/files/start.sh &
    ;;
1)  # all ok
    ;;
*)  echo "Removed double program1: $(date)" >> /var/log/program1_log.txt
    kill $(pidof program1 | awk '{print $1}')
    ;;
esac
The problem is that this script does not work. I added it to crontab and set it to run every 2 minutes, but if I close the program it is not restarted.
Is there any other way to check a process, and run start.sh when it has crashed?
Not to be rude, but have you considered a more obvious solution?
When a shell (e.g. bash or tcsh) starts a subprocess, by default it waits for that subprocess to complete.
So why not have a shell that runs your process in a while(1) loop? Whenever the process terminates, for any reason, legitimate or not, it will automatically restart your process.
I ran into this same problem with mythtv. The backend keeps crashing on me. It's a Heisenbug. Happens like once a month (on average). Very hard to track down. So I just wrote a little script that I run in an xterm.
The, ahh, onintr business means that control-c will terminate the subprocess and not my (parent-process) script. Similarly, the sleep is in there so I can control-c several times to kill the subprocess and then kill the parent-process script while it's sleeping...
Coredumpsize is limited just because I don't want to fill up my disk with corefiles that I cannot use.
#!/bin/tcsh -f
limit coredumpsize 0
while( 1 )
    echo "`date`: Running mythtv-backend"
    # Now we cannot control-c this (tcsh) process...
    onintr -
    # This will let /bin/ls directory-sort my logfiles based on day & time.
    # It also keeps the logfile names pretty unique.
    mythbackend |& tee /....../mythbackend.log.`date "+%Y.%m.%d.%H.%M.%S"`
    # Now we can control-c this (tcsh) process.
    onintr
    echo "`date`: mythtv-backend exited. Sleeping for 30 seconds, then restarting..."
    sleep 30
end
p.s. That sleep will also save you in the event your subprocess dies immediately. Otherwise the constant respawning without delay will drive your IO and CPU through the roof, making it difficult to correct the problem.
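For readers on bash rather than tcsh, the same restart loop might look like the sketch below. The respawn function name and its MAX and DELAY arguments are assumptions added for illustration (the tcsh script loops forever and always sleeps 30 seconds), and the control-c handling done with onintr is not reproduced.

```shell
#!/bin/bash
# respawn MAX DELAY CMD...: run CMD, and each time it exits wait DELAY
# seconds and run it again, up to MAX runs in total.
respawn() {
    local max=$1 delay=$2 runs=0 status
    shift 2
    while (( runs < max )); do
        echo "$(date): starting $*"
        "$@"
        status=$?   # capture immediately, before any other command overwrites it
        echo "$(date): $* exited with status $status, sleeping ${delay}s"
        runs=$((runs + 1))
        # The pause keeps a command that dies instantly from pegging CPU and IO.
        sleep "$delay"
    done
}
```

For example, respawn 100 30 mythbackend would restart the backend up to 100 times with a 30 second pause between runs.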

Upstart: Error when using command substitution in post-start script stanza during startup sequence

I'm seeing an issue in upstart where using command substitution inside a post-start script stanza causes an error (syslog reports "terminated with status 1"), but only during the initial system startup.
I've tried using just about every startup event hook under the sun. local-filesystems and net-device-up worked without error about 1/100 tries, so it looks like a race condition. It works just fine on manual start/stop. The command substitutions I've seen trigger the error are a simple cat or date, and I've tried using both the $() way and the backtick way. I've also tried using sleep in pre-start to beat the race condition but that did nothing.
I'm running Ubuntu 11.10 on VMWare with a Win7 host. Spent too many hours troubleshooting this already... Anyone got any ideas?
Here is my .conf file for reference:
start on runlevel [2345]
stop on runlevel [016]
env NODE_ENV=production
env MYAPP_PIDFILE=/var/run/myapp.pid
respawn
exec start-stop-daemon --start --make-pidfile --pidfile $MYAPP_PIDFILE --chuid node-svc --exec /usr/local/n/versions/0.6.14/bin/node /opt/myapp/live/app.js >> /var/log/myapp/audit.node.log 2>&1
post-start script
MYAPP_PID=`cat $MYAPP_PIDFILE`
echo "[`date -u +%Y-%m-%dT%T.%3NZ`] + Started $UPSTART_JOB [$MYAPP_PID]: PROCESS=$PROCESS UPSTART_EVENTS=$UPSTART_EVENTS" >> /var/log/myapp/audit.upstart.log
end script
post-stop script
MYAPP_PID=`cat $MYAPP_PIDFILE`
echo "[`date -u +%Y-%m-%dT%T.%3NZ`] - Stopped $UPSTART_JOB [$MYAPP_PID]: PROCESS=$PROCESS UPSTART_STOP_EVENTS=$UPSTART_STOP_EVENTS EXIT_SIGNAL=$EXIT_SIGNAL EXIT_STATUS=$EXIT_STATUS" >> /var/log/myapp/audit.upstart.log
end script
The most likely scenario I can think of is that $MYAPP_PIDFILE has not been created yet.
Because you have not specified an 'expect' stanza, the post-start is run as soon as the main process has forked and execed. So, as you suspected, there is probably a race between start-stop-daemon running node and writing that pidfile and /bin/sh forking, execing, and forking again to exec cat $MYAPP_PIDFILE.
The right way to do this is to rewrite your post-start as such:
post-start script
    for i in 1 2 3 4 5 ; do
        if [ -f $MYAPP_PIDFILE ] ; then
            echo ...
            exit 0
        fi
        sleep 1
    done
    echo "timed out waiting for pidfile"
    exit 1
end script
It's worth noting that Upstart 1.4 (first included in Ubuntu 12.04) added a logging facility, so there's no need to redirect output into a special log file. All console output defaults to /var/log/upstart/$UPSTART_JOB.log (which is rotated by logrotate). So those echos could just be bare echos.
