I am creating a daemon on Linux. I created the init.d file and have successfully started the daemon process using
/etc/init.d/mydaemon start
When I try to stop it (with /etc/init.d/mydaemon stop), however, the process stops successfully, but start-stop-daemon never seems to return, as evidenced by the fact that an echo placed immediately after the call to start-stop-daemon never runs.
Verbose mode shows that it stopped the process, and looking at system monitor, it does stop the process.
Stopped mydaemon (pid 13292 13310).
Here is the stop function of my init.d file.
do_stop()
{
# Return
# 0 if daemon has been stopped
# 1 if daemon was already stopped
# 2 if daemon could not be stopped
# other if a failure occurred
start-stop-daemon --stop --name "$NAME" -v
RETVAL="$?"
echo "stopped" # This is never printed, and the script never gives the shell back.
[ "$RETVAL" = 2 ] && return 2
# Wait for children to finish too if this is a daemon that forks
# and if the daemon is only ever run from this initscript.
# If the above conditions are not satisfied then add some other code
# that waits for the process to drop all resources that could be
# needed by services started subsequently. A last resort is to
# sleep for some time.
start-stop-daemon --stop --quiet --oknodo --retry=0/30/KILL/5 --exec $DAEMON
[ "$?" = 2 ] && return 2
# Many daemons don't delete their pidfiles when they exit.
return "$RETVAL"
}
I am running this on a virtual machine. Does that affect anything?
Running on a virtual machine shouldn't affect this.
I have no idea why this is happening or how start-stop-daemon takes control away from the parent script.
However, I just encountered this issue and discovered that if I do:
start-stop-daemon ... && echo -n
it will work as expected and relinquish control of the shell.
I have no idea why this works, but it seems to work.
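A separate thing to watch when placing a debug echo between a command and a RETVAL="$?" assignment: $? always holds the status of the most recent command, so an echo in between overwrites it. The sketch below illustrates this ordering issue only, not the hang itself; fake_stop is a hypothetical stand-in for the real stop command.

```shell
#!/bin/sh
# fake_stop is a hypothetical stand-in for the real stop command; it fails with status 1.
fake_stop() { return 1; }

# Wrong order: echo succeeds, so $? is already 0 by the time it is read.
fake_stop
echo "stopped" >/dev/null
WRONG="$?"

# Right order: read $? immediately, then print any debug output.
fake_stop
RIGHT="$?"
echo "stopped" >/dev/null

echo "wrong=$WRONG right=$RIGHT"
```

In an init script this means assigning RETVAL on the line directly after start-stop-daemon and putting any echo after that assignment.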
Related
I have a process xyz, whose upstart script is as below
description "Run the xyz daemon"
author "xyz"
import SETTINGS
start on (
start-ap-services SETTINGS
)
stop on (
stop-ap-services SETTINGS
) or stopping system-services
respawn
oom score 0
script
. /usr/share/settings.sh
# directory for data persisted between boots
XYZ_DIR="/var/lib/xyz"
mkdir -p "${XYZ_DIR}"
chown -R xyz:xyz "${XYZ_DIR}"
if [ "${SETTING}" = 1 ]
ARGS="$ARGS --enable_stats=true"
fi
# CAP_NET_BIND_SERVICE, CAP_DAC_OVERRIDE.
exec /sbin/minijail0 -p -c 0x0402 -u xyz -g xyz \
-G /usr/bin/xyz ${ARGS}
else
exec sleep inf
fi
end script
# Prevent the job from respawning too quickly.
post-stop exec sleep 3
Now, due to an OOM issue, xyz is killed based on its OOM score and gets respawned as expected. After several restarts of xyz, the post-stop sleep is killed, after which xyz is never respawned.
How can this be prevented, or is there any solution to this?
Note: xyz is a dummy process name used only to illustrate my actual question.
I haven't worked on upstart scripts before. Any help would be greatly appreciated.
Upstart can get confused when post-stop, pre-start, and post-start sections remain running across respawns.
I prefer to keep any command that takes longer than a few hundred milliseconds in a main job section, using auxiliary jobs if necessary.
For example, this will stall a job xyz that is being respawned or otherwise stopped:
start on stopping xyz RESULT='ok'
task
exec sleep 3
This has the same effect as your post-stop stanza, except that Upstart can better handle the state tracking for the simplified main job.
I'm writing a watchDog script that starts my application and restarts it whenever it is detected as down (based on the process PID).
code:
while [ "$pid" ]; do
    pid=$(getPID)
    # If the server app is down, start it.
    if [ -z "$pid" ]; then
        # Redirect stderr to the same place as stdout (2>&1, not 2>$1).
        echo -e "$(date) [INFO]: watchDog activated -> starting service since it was detected as down!\n" >> "$watchDogLogger" 2>&1
        startApp > /dev/null 2>&1
        pid=$(getPID)
    fi
done
The issue is that when a user starts the application with the watchDog (watchDog start), the console is "stuck" because of the while loop, and I can't continue using it.
I know that I can run the watchDog as a background job (watchDog start &) and get the console back, but that causes trouble: when I want to stop it (watchDog stop), it does stop the application, but the first watchDog job (watchDog start) is still alive.
Is there any other way to start the watchDog and get the console back?
Thank you!
On my Raspberry Pi (running Raspbian) I would like the current desktop to switch to desktop 0 after 5 minutes of system idle time (no mouse or keyboard activity), using wmctrl -s 0 and xprintidle to check the idle time.
Please keep in mind I'm no expert...
I tried two different approaches, neither of which works, and I was wondering which of them is the better way to get the job done:
Bash script and crontab
I wrote a simple script that checks whether xprintidle is greater than a previously set $IDLE_TIME and, if so, switches desktops (saved in /usr/local/bin/switchDesktop0OnIdle):
#!/bin/bash
# 5 minutes in ms
IDLE_TIME=$((5*60*1000))
# Sequence to execute when timeout triggers.
trigger_cmd() {
wmctrl -s 0
}
sleep_time=$IDLE_TIME
triggered=false
while sleep $(((sleep_time+999)/1000)); do
idle=$(xprintidle)
if [ $idle -ge $IDLE_TIME ]; then
if ! $triggered; then
trigger_cmd
triggered=true
sleep_time=$IDLE_TIME
fi
else
triggered=false
# Give 100 ms buffer to avoid frantic loops shortly before triggers.
sleep_time=$((IDLE_TIME-idle+100))
fi
done
The script itself works.
Then I added it to crontab (crontab -e) to have it run every 6 minutes:
*/6 * * * * sudo /usr/local/bin/switchDesktop0OnIdle
I'm not sure whether sudo is necessary.
Anyway, it doesn't work. Googling around, I understood that crontab runs in its own environment with its own variables. Even though I don't remember how to access this environment (oops), I do remember that I get these two errors when running the script in it (it works correctly in a "normal" shell):
could not open display (is it important?)
a "-ge: unary operator expected" error or similar: basically xprintidle doesn't work in this environment and gives back an empty value
What am I missing?
Infinite-while bash script running as a daemon
As a second method, I tried to set up a script with an internal infinite while loop that checks whether xprintidle is greater than 5 minutes, in which case the desktop is switched (less elegant?). It is also saved in /usr/local/bin/switchDesktop0OnIdle:
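#!/bin/bash
triggered=false
while :
do
    # xprintidle reports the idle time in milliseconds; 300000 ms = 5 minutes.
    if [ "$(xprintidle)" -ge 300000 ]; then
        if [ "$triggered" = false ]; then
            wmctrl -s 0
            triggered=true
        fi
    else
        triggered=false
    fi
    # Pause between checks so the loop does not spin at full CPU.
    sleep 1
done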
Again, the script itself works.
I tried to create a daemon in /etc/init.d/switchDesktop0OnIdle (I'm really not an expert here; I modified an existing one):
#! /bin/sh
# /etc/init.d/switchDesktop0OnIdle
### BEGIN INIT INFO
# Provides: switchDesktop0OnIdle
# Required-Start: $all
# Required-Stop: $all
# Should-Start:
# Should-Stop:
# Default-Start: 2 3 4 5
# Default-Stop: 0 1 6
# Short-Description:
# Description:
### END INIT INFO
DAEMON=/usr/local/bin/switchDesktop0OnIdle
NAME=switchDesktop0OnIdle
test -x $DAEMON || exit 0
case "$1" in
start)
echo -n "Starting daemon: "
start-stop-daemon --start --exec $DAEMON
echo "switchDesktop0OnIdle."
;;
stop)
echo -n "Shutting down daemon:"
start-stop-daemon --stop --oknodo --retry 30 --exec $DAEMON
echo "switchDesktop0OnIdle."
;;
restart)
echo -n "Restarting daemon: "
start-stop-daemon --stop --oknodo --retry 30 --exec $DAEMON
start-stop-daemon --start --exec $DAEMON
echo "switchDesktop0OnIdle."
;;
*)
echo "Usage: $0 {start|stop|restart}"
exit 1
esac
exit 0
I set it up with
sudo update-rc.d switchDesktop0OnIdle defaults
and
sudo service switchDesktop0OnIdle start
(necessary?)
...and nothing happens.
Also, I can't find the process with ps -ef | grep switchDesktop0OnIdle, but it appears to be running according to sudo service switchDesktop0OnIdle status.
Can anyone please help?
Thank you
Giuseppe
As you suspected, the issue is that when you run your scripts from init or from cron, they are not running within the GUI environment you want them to control. In principle, a Linux system can have multiple X environments running. When you are using one, there are environment variables that direct the executables you are using to the environment you are in.
There are two parts to the solution: your scripts have to know which environment they are acting on, and they have to have authorization to interact with that environment.
You almost certainly are using a DISPLAY value of ":0", so export DISPLAY=:0 at the beginning of your script will handle the first part of the problem. (It might be ":0.0", which is effectively equivalent).
Authorization is a bit more complex. X can be set up to do authorization in different ways, but the most common is to have a file .Xauthority in your home directory which contains a token that is checked by the X server. If you install a script in your own crontab, it will run under your own user id (you probably shouldn't use sudo), so it will read the right .Xauthority file. If you run from the root crontab, or from an init script, it will run as the root user, so it will have access to everything but will still need to know where to take the token from. I think that adding export XAUTHORITY=/home/joe/.Xauthority to the script will work (assuming your user id is joe).
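Putting both parts together, a minimal sketch of a one-shot check that could be run from cron might look like the following. The display :0 and the path /home/joe/.Xauthority are the same assumptions as above, idle_exceeded and IDLE_LIMIT_MS are hypothetical names introduced for illustration, and the numeric check is there so that an empty reply from xprintidle no longer produces the "unary operator expected" error.

```shell
#!/bin/bash
# Point X clients at the running session unless the caller already did.
# ":0" and /home/joe/.Xauthority are assumptions; adjust them for your system.
export DISPLAY="${DISPLAY:-:0}"
export XAUTHORITY="${XAUTHORITY:-/home/joe/.Xauthority}"

# 5 minutes in milliseconds (xprintidle reports milliseconds).
IDLE_LIMIT_MS=$((5 * 60 * 1000))

# Succeeds only when $1 is a number at or above the limit.
idle_exceeded() {
    case "$1" in
        ''|*[!0-9]*) return 1 ;;   # empty or non-numeric: xprintidle failed
    esac
    [ "$1" -ge "$IDLE_LIMIT_MS" ]
}

# Do nothing unless both tools are installed.
if command -v xprintidle >/dev/null 2>&1 && command -v wmctrl >/dev/null 2>&1; then
    if idle_exceeded "$(xprintidle 2>/dev/null)"; then
        wmctrl -s 0
    fi
fi
```

Because this version checks once and exits, it suits a crontab entry in your own user's crontab; a script with an internal loop would instead be started once at login.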
So basically I have one script that keeps a server alive. It starts the server process and then starts it again after the process stops. Sometimes, though, the server becomes unresponsive. For that I want to have another script which pings the server and kills the process if it doesn't respond within 60 seconds.
The problem is that if I kill the server process, the bash script also gets terminated.
The start script is just while do: sh Server.sh. It calls another shell script that has additional parameters for starting the server. The server uses Java, so it starts a java process. If the server hangs, I use kill -9 pid because nothing else stops it. If the server doesn't hang and does the usual restart, it stops gracefully and the bash script starts a second loop.
Doing The Right Thing
Use a real process supervision system -- your Linux distribution almost certainly includes one.
Directly monitoring the supervised process by PID
An awful, ugly, moderately buggy approach (for instance, able to kill the wrong process in the event of a PID collision) is the following:
while :; do
./Server.sh & server_pid=$!
echo "$server_pid" > server.pid
wait "$server_pid"
done
...and, to kill the process:
#!/bin/bash
# ^^^^ - DO NOT run this with "sh scriptname"; it must be "bash scriptname".
server_pid="$(<server.pid)"; [[ $server_pid ]] || exit
# request a clean shutdown (SIGTERM), then allow 5 seconds for it -- adjust to taste
kill "$server_pid"
for (( i=0; i<5; i++ )); do
if kill -0 "$server_pid"; then
sleep 1
else
exit 0 # server exited gracefully, nothing else to do
fi
done
# escalate to a SIGKILL
kill -9 "$server_pid"
Note that we're storing the PID of the server in our pidfile, and killing that directly -- thus, avoiding inadvertently targeting the supervision script.
Monitoring the supervised process and all children via lockfile
Note that this uses some Linux-specific tools, but your question is tagged linux.
A more robust approach -- which will work across reboots even in the case of pidfile reuse -- is to use a lockfile:
while :; do
flock -x Server.lock sh Server.sh
done
...and, on the other end:
#!/bin/bash
# ask all programs having a handle on Server.lock to exit
# (fuser -k sends SIGKILL unless another signal is given, so request SIGTERM first)
fuser -k -TERM Server.lock
for ((i=0; i<5; i++)); do
if fuser -s Server.lock; then
sleep 1
else
exit 0
fi
done
fuser -k -KILL Server.lock
I'm seeing an issue in upstart where using command substitution inside a post-start script stanza causes an error (syslog reports "terminated with status 1"), but only during the initial system startup.
I've tried using just about every startup event hook under the sun. local-filesystems and net-device-up worked without error about 1/100 tries, so it looks like a race condition. It works just fine on manual start/stop. The command substitutions I've seen trigger the error are a simple cat or date, and I've tried using both the $() way and the backtick way. I've also tried using sleep in pre-start to beat the race condition but that did nothing.
I'm running Ubuntu 11.10 on VMware with a Win7 host. I've spent too many hours troubleshooting this already... Anyone got any ideas?
Here is my .conf file for reference:
start on runlevel [2345]
stop on runlevel [016]
env NODE_ENV=production
env MYAPP_PIDFILE=/var/run/myapp.pid
respawn
exec start-stop-daemon --start --make-pidfile --pidfile $MYAPP_PIDFILE --chuid node-svc --exec /usr/local/n/versions/0.6.14/bin/node /opt/myapp/live/app.js >> /var/log/myapp/audit.node.log 2>&1
post-start script
MYAPP_PID=`cat $MYAPP_PIDFILE`
echo "[`date -u +%Y-%m-%dT%T.%3NZ`] + Started $UPSTART_JOB [$MYAPP_PID]: PROCESS=$PROCESS UPSTART_EVENTS=$UPSTART_EVENTS" >> /var/log/myapp/audit.upstart.log
end script
post-stop script
MYAPP_PID=`cat $MYAPP_PIDFILE`
echo "[`date -u +%Y-%m-%dT%T.%3NZ`] - Stopped $UPSTART_JOB [$MYAPP_PID]: PROCESS=$PROCESS UPSTART_STOP_EVENTS=$UPSTART_STOP_EVENTS EXIT_SIGNAL=$EXIT_SIGNAL EXIT_STATUS=$EXIT_STATUS" >> /var/log/myapp/audit.upstart.log
end script
The most likely scenario I can think of is that $MYAPP_PIDFILE has not been created yet.
Because you have not specified an 'expect' stanza, the post-start is run as soon as the main process has forked and execed. So, as you suspected, there is probably a race between start-stop-daemon running node and writing that pidfile and /bin/sh forking, execing, and forking again to exec cat $MYAPP_PIDFILE.
The right way to do this is to rewrite your post-start as such:
post-start script
for i in 1 2 3 4 5 ; do
if [ -f $MYAPP_PIDFILE ] ; then
echo ...
exit 0
fi
sleep 1
done
echo "timed out waiting for pidfile"
exit 1
end script
It's worth noting that Upstart 1.4 (first included in Ubuntu 12.04) added a logging facility, so there's no need to redirect output into a special log file. All console output goes to /var/log/upstart/$UPSTART_JOB.log by default (which is rotated by logrotate). So those echoes could just be bare echoes.