How can I make a working upstart job with yas3fs?

I've got a very simple upstart config for maintaining a yas3fs mount.
start on filesystem
stop on runlevel [!2345]
respawn
kill timeout 15
oom never
expect fork
script
. /etc/s3.env
export AWS_ACCESS_KEY_ID
export AWS_SECRET_ACCESS_KEY
exec /opt/yas3fs/yas3fs.py /mnt/something --url=s3://something --cache-path=/mnt/s3fs-cache --mp-size=5120 --mp-num=8
end script
What happens is that I get two copies of yas3fs.py running. One appears to mount the S3 bucket correctly, but the other is constantly respawned by upstart (presumably because it errors out since the first copy is already running).
If I throw in an "expect fork", the job never starts correctly. I just want this simple mount to be safely restartable, stoppable, etc. as an upstart job. Ideas?

I'm not an upstart expert, but this script should work:
start on (filesystem and net-device-up IFACE=eth0)
stop on runlevel [!2345]
env S3URL="s3://BUCKET[/PREFIX]"
env MOUNTPOINT="/SOME/PATH"
respawn
kill timeout 15
oom never
script
    # clean up a stale mount left behind if yas3fs died uncleanly
    MOUNTED=$(mount | grep -c " $MOUNTPOINT ")
    if [ "$MOUNTED" -ge 1 ]; then
        umount "$MOUNTPOINT"
    fi
    # -f keeps yas3fs in the foreground so upstart tracks the right PID
    exec /opt/yas3fs/yas3fs.py "$MOUNTPOINT" --url="$S3URL" --mp-size=5120 --mp-num=8 -f
end script
pre-stop script
umount "$MOUNTPOINT"
end script
The trick is to leave yas3fs in the foreground with the '-f' option; otherwise there seem to be too many forks for upstart to track.
I added a check to clean up (i.e. unmount) the mount point if yas3fs dies uncleanly (e.g. from a kill -9).
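With that in place, the job behaves like any other upstart job. A quick sanity check, assuming the file is saved as /etc/init/yas3fs.conf:
sudo start yas3fs
sudo status yas3fs   # should report start/running with a single PID
sudo stop yas3fs     # pre-stop unmounts the bucket cleanly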

Related

How to debug an upstart script that intermittently fails?

I have a process that I want to start as soon as my system is rebooted by whatever means, so I was using an upstart script for that. Sometimes, though, I notice that my process doesn't get started during a hard reboot (pulling the plug and starting the machine again), so I think my upstart script is not kicking in after a hard reboot. I believe there is no runlevel for a hard reboot.
I am confused about why it sometimes works during a reboot and sometimes doesn't. How can I debug this?
Below is my upstart script:
# sudo start helper
# sudo stop helper
# sudo status helper
start on runlevel [2345]
stop on runlevel [!2345]
chdir /data
respawn
pre-start script
echo "[`date`] Agent Starting" >> /data/agent.log
sleep 30
end script
post-stop script
echo "[`date`] Agent Stopping" >> /data/agent.log
sleep 30
end script
limit core unlimited unlimited
limit nofile 100000 100000
setuid goldy
exec python helper.py
Is there any way to debug what's happening? I believe I can reproduce this easily. Any pointers on what I can do here?
Note:
During reboot I sometimes see the logging from my pre-start script, but sometimes I don't see it at all after the reboot, which means my upstart script was not triggered. Is there anything I need to change about the runlevels to make it work?
I have a VM running in a hypervisor, and I am working with Ubuntu.
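A general way to get more visibility here, assuming Ubuntu's upstart (both techniques are described in the upstart cookbook's debugging notes), is to make init itself log job state changes:
sudo initctl log-priority debug   # at runtime; does not survive a reboot
# or add --debug to the kernel command line in grub for a verbose boot,
# then after the reboot:
grep helper /var/log/syslog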
Your process runs nicely, but during system startup many things happen in parallel.
If the mount that makes the /data folder available runs later than your pre-start script, you will not see the "results" of the pre-start script.
I suggest moving the sleep 30 earlier (by the way, 30 seconds seems too long):
pre-start script
sleep 30 # sleep 10 should be enough
echo "[`date`] Agent Starting" >> /data/agent.log
end script
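A more robust alternative to sleeping, assuming Ubuntu's mountall is emitting mount events, is to start the job only once /data is actually mounted:
# hypothetical variant of the start condition above
start on (runlevel [2345] and mounted MOUNTPOINT=/data)
stop on runlevel [!2345]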

Ubuntu upstart gets incorrect PID from Play 1.3

The upstart script using start-stop-daemon that we'd been using with Play 1.2.7 is unable to stop/restart Play since Play 1.3, because upstart holds an incorrect PID.
Framework version: 1.3.0 on Ubuntu 12.04.5 LTS
Reproduction steps:
1. Set up an upstart script (playframework.conf) for a Play application.
2. The Play application starts successfully on server reboot.
3. Running 'sudo status playframework' returns "playframework start/running, process 28912"; at this point process 28912 doesn't exist.
4. vi {playapplicationfolder}/server.pid shows 28927.
5. 'stop playframework' then fails due to unknown pid 28912, and 'status playframework' results in "playframework stop/killed, process 28912".
The only way to restart the Play framework after this point is to either find the actual process, kill it, and start Play manually with the usual 'play start' command, or to restart the server.
This has broken our deployment scripts, as we used to install the new version of our app and then do a play restart before reconnecting to the load balancer.
Upstart Script:
#Upstart script for a play application that binds to an unprivileged user.
# put this into a file like /etc/init/playframework
# you can then start/stop it using either initctl or start/stop/restart
# e.g.
# start playframework
description "PlayApp"
author "-----"
version "1.0"
env PLAY_BINARY=/opt/play/play
env JAVA_HOME=/usr/lib/jvm/java-7-openjdk-amd64
env HOME=/opt/myapp/latest
env USER=ubuntu
env GROUP=admin
env PROFILE=prod
start on (filesystem and net-device-up IFACE=lo) or runlevel [2345]
stop on runlevel [!2345]
limit nofile 65536 65536
respawn
respawn limit 10 5
umask 022
expect fork
pre-start script
test -x $PLAY_BINARY || { stop; exit 0; }
test -c /dev/null || { stop; exit 0; }
chdir ${HOME}
rm ${HOME}/server.pid || true
/opt/configurer.sh
end script
pre-stop script
exec $PLAY_BINARY stop $HOME
end script
post-stop script
rm ${HOME}/server.pid || true
end script
script
exec start-stop-daemon --start --exec $PLAY_BINARY --chuid $USER:$GROUP --chdir $HOME -- start $HOME -javaagent:/opt/newrelic/newrelic.jar --%$PROFILE -Dprecompiled=true --http.port=8080 --https.port=4443
end script
We've tried specifying the PID file in the start-stop-daemon options as per http://man.he.net/man8/start-stop-daemon, but this also didn't seem to have any effect.
I have found some threads on similar issues (https://askubuntu.com/questions/319199/upstart-tracking-wrong-pid-of-process-not-respawning) but have been unable to find a way around this so far. I have tried changing 'fork' to 'daemon', but the same issue remains. I also can't see what has changed between Play 1.2.7 and 1.3 to cause this.
Another SO post has asked a similar question but has not had an answer as yet: https://stackoverflow.com/questions/23117345/upstart-gets-wrong-pid-after-launching-celery-with-start-stop-daemon
This is because getJavaVersion() spawns a subprocess, which bumps the PID count and breaks upstart, since upstart expects Play to fork exactly zero, one, or two times, depending on which expect stanza you use.
I've fixed this in a pull request.
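For context, upstart's process tracking counts forks very literally, so a single extra early subprocess is enough to throw it off. Roughly (per the upstart cookbook):
# no 'expect'   -> upstart tracks the exec'd process itself
# expect fork   -> the daemon must fork() exactly once; upstart tracks the child
# expect daemon -> the daemon must fork() exactly twice; upstart tracks the grandchild
# An extra early fork (here, getJavaVersion() spawning a subprocess) makes
# upstart latch onto a PID that may have already exited.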

upstart conf file wait for process

I have an upstart script that looks like this:
description "my script"
# Start and stop runlevels
start on runlevel [2345]
stop on runlevel [!2345]
# Automatically respawn
respawn
respawn limit 15 5
script
exec /home/myscript.sh
end script
This script launches a VPN, so the process takes some time to come up. When I look at the logs in /var/log/upstart/myscript.sh, I see that the process is being relaunched continually, so it never finishes starting. What can I do to make upstart wait for the process to finish launching?
Your upstart script has a respawn stanza but no expect stanza. Perhaps upstart is tracking the wrong process.
The documentation has the following warning: "If you are creating a new Job Configuration File, do not specify the respawn stanza until you are fully satisfied you have specified the expect stanza correctly. If you do, you will find the behaviour potentially very confusing."
http://upstart.ubuntu.com/cookbook/#respawn
http://upstart.ubuntu.com/cookbook/#how-to-establish-fork-count
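The second link describes how to establish the fork count empirically. A minimal sketch of that approach, assuming strace is installed and the script can be run outside upstart:
# count process-creating syscalls to choose the right 'expect' stanza
strace -o /tmp/vpn.trace -f -e trace=fork,vfork,clone /home/myscript.sh
grep -c -e 'fork(' -e 'clone(' /tmp/vpn.trace
# one fork -> 'expect fork'; two forks -> 'expect daemon'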

Can upstart expect/respawn be used on processes that fork more than twice?

I am using upstart to start/stop/automatically restart daemons. One of the daemons forks 4 times. The upstart cookbook states that it only supports forking twice. Is there a workaround?
How it fails
If I try to use expect daemon or expect fork, upstart uses the PID of the second fork. When I try to stop the job, nobody responds to upstart's SIGKILL signal, and it hangs until you exhaust the PID space and loop back around. It gets worse if you add respawn: upstart thinks the job died and immediately starts another one.
Bug acknowledged by upstream
A bug has been filed against upstart. The solutions presented are to stick with the old sysvinit, rewrite your daemon, or wait for a rewrite of upstart. RHEL is close to 2 years behind the latest upstart package, so by the time the rewrite is released and we get the update, the wait will probably be 4 years. The daemon is written by a subcontractor of a subcontractor of a contractor, so it will not be fixed any time soon either.
I came up with an ugly hack to make this work. It works for my application on my system. YMMV.
Start the application in the pre-start section.
In the script section, run a script that runs for as long as the application does; the PID of this script is what upstart will track.
In the post-stop section, kill the application.
example
env DAEMON=/usr/bin/forky-application
pre-start script
    # start the real daemon; upstart does not track this PID
    su -s /bin/sh -c "$DAEMON" joeuseraccount
end script
script
    # upstart tracks this loop, which lives exactly as long as the daemon
    sleepWhileAppIsUp(){
        while pidof $1 >/dev/null; do
            sleep 1
        done
    }
    sleepWhileAppIsUp $DAEMON
end script
post-stop script
    if pidof $DAEMON >/dev/null; then
        kill `pidof $DAEMON`
        #pkill $DAEMON # post-stop process (19300) terminated with status 1
    fi
end script
A similar approach could be taken with PID files, as sketched below.
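A minimal sketch of that pidfile variant; the path, and the assumption that the daemon writes its own pidfile, are hypothetical:
env PIDFILE=/var/run/forky-application.pid
script
    # upstart tracks this loop; it exits once the recorded PID is gone
    while [ -f "$PIDFILE" ] && kill -0 "$(cat "$PIDFILE")" 2>/dev/null; do
        sleep 1
    done
end script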

Upstart: Error when using command substitution in post-start script stanza during startup sequence

I'm seeing an issue in upstart where using command substitution inside a post-start script stanza causes an error (syslog reports "terminated with status 1"), but only during the initial system startup.
I've tried using just about every startup event hook under the sun. local-filesystems and net-device-up worked without error on about 1 in 100 tries, so it looks like a race condition. The command substitutions I've seen trigger the error are a simple cat or date, and I've tried both the $() form and the backtick form. I've also tried using sleep in pre-start to beat the race condition, but that did nothing.
I'm running Ubuntu 11.10 on VMware with a Win7 host. I've spent too many hours troubleshooting this already... Anyone got any ideas?
Here is my .conf file for reference:
start on runlevel [2345]
stop on runlevel [016]
env NODE_ENV=production
env MYAPP_PIDFILE=/var/run/myapp.pid
respawn
exec start-stop-daemon --start --make-pidfile --pidfile $MYAPP_PIDFILE --chuid node-svc --exec /usr/local/n/versions/0.6.14/bin/node /opt/myapp/live/app.js >> /var/log/myapp/audit.node.log 2>&1
post-start script
MYAPP_PID=`cat $MYAPP_PIDFILE`
echo "[`date -u +%Y-%m-%dT%T.%3NZ`] + Started $UPSTART_JOB [$MYAPP_PID]: PROCESS=$PROCESS UPSTART_EVENTS=$UPSTART_EVENTS" >> /var/log/myapp/audit.upstart.log
end script
post-stop script
MYAPP_PID=`cat $MYAPP_PIDFILE`
echo "[`date -u +%Y-%m-%dT%T.%3NZ`] - Stopped $UPSTART_JOB [$MYAPP_PID]: PROCESS=$PROCESS UPSTART_STOP_EVENTS=$UPSTART_STOP_EVENTS EXIT_SIGNAL=$EXIT_SIGNAL EXIT_STATUS=$EXIT_STATUS" >> /var/log/myapp/audit.upstart.log
end script
The most likely scenario I can think of is that $MYAPP_PIDFILE has not been created yet.
Because you have not specified an 'expect' stanza, post-start is run as soon as the main process has forked and exec'd. So, as you suspected, there is probably a race between start-stop-daemon running node and writing that pidfile, and /bin/sh forking, exec'ing, and forking again to exec cat $MYAPP_PIDFILE.
The right way to do this is to rewrite your post-start as follows:
post-start script
    # poll for up to 5 seconds, waiting for start-stop-daemon to write the pidfile
    for i in 1 2 3 4 5 ; do
        if [ -f $MYAPP_PIDFILE ] ; then
            echo ...
            exit 0
        fi
        sleep 1
    done
    echo "timed out waiting for pidfile"
    exit 1
end script
It's worth noting that upstart 1.4 (first included in Ubuntu 12.04) added logging ability, so there's no need to redirect output into a special log file. All console output defaults to /var/log/upstart/$UPSTART_JOB.log (which is rotated by logrotate), so those echos could just be bare echos.
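Under that assumption (upstart 1.4+, and a job file named myapp.conf, which is hypothetical), the main stanza could drop the redirect entirely and let output land in /var/log/upstart/myapp.log:
exec start-stop-daemon --start --make-pidfile --pidfile $MYAPP_PIDFILE --chuid node-svc --exec /usr/local/n/versions/0.6.14/bin/node /opt/myapp/live/app.js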
