Upstart: each process on different core - node.js

I'm trying to use Upstart to launch multiple instances of node.js, each on a separate CPU core and listening on a different port.
Launch configuration:
start on startup
task
env NUM_WORKERS=2
script
for i in `seq 1 $NUM_WORKERS`
do
start worker N=$i
done
end script
Worker configuration:
instance $N
script
export HOME="/node"
echo $$ > /var/run/worker-$N.pid
exec sudo -u root /usr/local/bin/node /node/server.js >> /var/log/worker-$N.sys.log 2>&1
end script
How do I specify that each process should be launched on a separate core to scale node.js inside the box?

taskset lets you set the CPU affinity of any Linux process. Note, however, that the Linux kernel already favors keeping a process on the same CPU to optimize performance, so explicit pinning is often unnecessary.
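As a rough illustration of the taskset approach, the sketch below maps a one-based worker number (the N passed by the launch job) to a zero-based core index and runs a command pinned to that core. The function names core_for_worker and launch_pinned are made up for this example, and it assumes taskset (util-linux) and nproc (coreutils) are installed.

```shell
# core_for_worker N NCORES: zero-based core index for one-based worker number N.
# Wraps around when there are more workers than cores.
core_for_worker() {
  echo $(( ($1 - 1) % $2 ))
}

# launch_pinned N CMD...: replace the current shell with CMD, pinned to the
# core chosen for worker N. Hypothetical helper, not part of Upstart.
launch_pinned() {
  n=$1
  shift
  ncores=$(nproc)
  exec taskset -c "$(core_for_worker "$n" "$ncores")" "$@"
}
```

In the worker job from the question, the exec line could then start with something like exec taskset -c $((N - 1)) /usr/local/bin/node /node/server.js, keeping the existing log redirection.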

Related

ubuntu crontab celery beat

I have several celery tasks that execute via beat. In development, I used a single command to set this up, like:
celery worker -A my_tasks -n XXXXX@%h -Q for_mytasks -c 1 -E -l INFO -B -s ./config/celerybeat-schedule --pidfile ./config/celerybeat.pid
On moving to production, I inserted this into a script that activated my venv, set the PYTHONPATH, removed old beat files, cd to correct directory and then run celery. This works absolutely fine. However, in production I want to separate the worker from the beat scheduler, like:
celery worker -A my_tasks -n XXXXX@%h -Q for_mytasks -c 1 -E -l INFO -f ./logs/celeryworker.log
celery beat -A my_tasks -s ./config/celerybeat-schedule --pidfile ./config/celerybeat.pid -l INFO -f ./logs/celerybeat.log
Now this all works fine when put into the relevant bash scripts. However, I need these to be run on server start-up. I encountered several problems:
1) In crontab -e, @reboot my_script will not work. I have to insert a delay to allow rabbitmq to fully start, i.e. @reboot sleep 60 && my_script. This seems a little 'messy' to me but I can live with it.
2) celery worker takes several seconds to finish starting up before celery beat can be run properly. I tried all manner of cron directives to get beat to run only after the worker had started successfully, but couldn't get beat to run. My current solution in crontab is something like this:
@reboot sleep 60 && my_script_worker
@reboot sleep 120 && my_script_beat
So basically, ubuntu boots, waits 60 seconds and runs celery worker then waits another 60 seconds before running celery beat. This works fine but it seems even more 'messy' to me. In an ideal world I would like to flag when rabbitmq is ready to run worker, then flag when worker has executed successfully so that I can run beat.
My question is : has anybody encountered this problem and if so do they have a more elegant way of kicking off celery worker & beat on server reboot?
EDIT: 24/09/2019
Thanks to DejanLekic & Greenev
I have spent some hours converting from cron to systemd. Yes, I agree totally that this is a far more robust solution. My celery worker & beat are now started as services by systemd on reboot.
There is one tip I have for people trying this that is not mentioned in the celery documentation. The template beat command will create a 'celery beat database' file called celerybeat-schedule in your working directory. If you restart your beat service, this file will cause spurious celery tasks to be spawned that don't seem to fit with your actual celery schedule. The solution is to delete this file each time the beat service starts. I also delete the pid file, if it's there. I did this by adding 2 ExecStartPre and a -s option to the beat service :
ExecStartPre=/bin/sh -c 'rm -f ${CELERYBEAT_DB_FILE}'
ExecStartPre=/bin/sh -c 'rm -f ${CELERYBEAT_PID_FILE}'
ExecStart=/bin/sh -c '${CELERY_BIN} beat \
-A ${CELERY_APP} --pidfile=${CELERYBEAT_PID_FILE} \
-s ${CELERYBEAT_DB_FILE} \
--logfile=${CELERYBEAT_LOG_FILE} --loglevel=${CELERYD_LOG_LEVEL}'
Thanks guys.
To daemonize the celery worker we use systemd. The worker and beat can then run as separate services, and enabling those services is all it takes to have them start when the server reboots.
What you really want is to run Celery beat process as a systemd or SysV service. It is described in depth in the Daemonization section of the Celery documentation. In fact, same goes for the worker process too.
Why? Unlike your solution, which relies on crontab with an @reboot line, systemd, for example, can check the health of the service and restart it if needed. All Linux services on your Linux boxes are started this way because it was made for this particular purpose.
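Where a fixed sleep still has to bridge the gap between services (in a cron line, or in a pre-start step of a service), polling for readiness is usually less fragile than guessing a delay. The sketch below is a minimal version of that idea. wait_until is a made-up helper name, and treating rabbitmqctl status and celery inspect ping as readiness checks is an assumption to verify against your own setup.

```shell
# wait_until TIMEOUT_SECONDS CMD...: rerun CMD about once per second.
# Returns 0 as soon as CMD succeeds, 1 if TIMEOUT_SECONDS elapse first.
wait_until() {
  timeout=$1
  shift
  elapsed=0
  until "$@" >/dev/null 2>&1; do
    if [ "$elapsed" -ge "$timeout" ]; then
      return 1
    fi
    sleep 1
    elapsed=$((elapsed + 1))
  done
  return 0
}

# Possible use in the start-up scripts (assumed readiness checks):
#   wait_until 120 rabbitmqctl status && my_script_worker
#   wait_until 60 celery -A my_tasks inspect ping && my_script_beat
```

This keeps the ordering from the question (rabbitmq, then worker, then beat) without hard-coding 60 and 120 second delays.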

How to add a different scheduler to a shell script in Linux?

I am trying to use different schedulers to measure CPU usage among various programs. I am currently having trouble figuring out how to add a different scheduler to the script. I have tried using the chrt command, but I can not reliably get the pid for the script.
PIDs are fickle and racy (only the parent process of a PID can be sure it hasn't died and been recycled).
I'd use the first form (chrt [options] prio command [arg]...) instead, relying on two scripts:
wrapper_script:
exec chrt --fifo 99 wrapee #wrapee must be in $PATH
wrapee:
echo "I'm a hi-priority hello world"
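To compare several schedulers from one script, the same wrapper idea can be parameterized by policy. The sketch below is only an illustration: policy_flag and run_with_policy are hypothetical names, and it assumes the util-linux chrt, whose long options --fifo, --rr, --other, --batch and --idle select the policy. Real-time policies (fifo, rr) normally need root, and the non-real-time ones expect priority 0.

```shell
# policy_flag NAME: print the chrt option for a scheduling policy name.
# Returns non-zero for names this sketch does not know.
policy_flag() {
  case "$1" in
    fifo)  echo "--fifo" ;;
    rr)    echo "--rr" ;;
    other) echo "--other" ;;
    batch) echo "--batch" ;;
    idle)  echo "--idle" ;;
    *)     return 1 ;;
  esac
}

# run_with_policy NAME PRIO CMD...: run CMD under the given policy and priority.
# No PID lookup is needed because chrt starts the command itself.
run_with_policy() {
  policy=$1
  prio=$2
  shift 2
  flag=$(policy_flag "$policy") || {
    echo "unknown policy: $policy" >&2
    return 2
  }
  chrt "$flag" "$prio" "$@"
}
```

For example, run_with_policy rr 10 wrapee and run_with_policy batch 0 wrapee would run the same program under two policies so that their CPU usage can be compared.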

Ubuntu upstart gets incorrect PID from Play 1.3

The Upstart script using the start-stop-daemon we've been using for Play 1.2.7 is now unable to stop/restart Play since Play 1.3 due to it having an incorrect PID.
Framework version: 1.3.0 on Ubuntu 12.04.5 LTS
Reproduction steps:
Setup an upstart script (playframework.conf) for a Play application
Play application starts successfully on server reboot
Run 'sudo status playframework', which returns playframework start/running, process 28912 - at this point process 28912 doesn't exist
vi {playapplicationfolder}/server.pid shows 28927
'stop playframework' then fails due to unknown pid 28912
'status playframework' results in playframework stop/killed, process 28912
The only way to restart Play after this point is either to find the actual process, kill it, and then start Play manually with the usual 'play start' command, or to restart the server.
This has broken our deployment scripts, as we used to install the new version of our app and then do a play restart before reconnecting to the load balancer.
Upstart Script:
#Upstart script for a play application that binds to an unprivileged user.
# put this into a file like /etc/init/playframework.conf
# you can then start/stop it using either initctl or start/stop/restart
# e.g.
# start playframework
description "PlayApp"
author "-----"
version "1.0"
env PLAY_BINARY=/opt/play/play
env JAVA_HOME=/usr/lib/jvm/java-7-openjdk-amd64
env HOME=/opt/myapp/latest
env USER=ubuntu
env GROUP=admin
env PROFILE=prod
start on (filesystem and net-device-up IFACE=lo) or runlevel [2345]
stop on runlevel [!2345]
limit nofile 65536 65536
respawn
respawn limit 10 5
umask 022
expect fork
pre-start script
test -x $PLAY_BINARY || { stop; exit 0; }
test -c /dev/null || { stop; exit 0; }
chdir ${HOME}
rm ${HOME}/server.pid || true
/opt/configurer.sh
end script
pre-stop script
exec $PLAY_BINARY stop $HOME
end script
post-stop script
rm ${HOME}/server.pid || true
end script
script
exec start-stop-daemon --start --exec $PLAY_BINARY --chuid $USER:$GROUP --chdir $HOME -- start $HOME -javaagent:/opt/newrelic/newrelic.jar --%$PROFILE -Dprecompiled=true --http.port=8080 --https.port=4443
end script
We've tried specifying the PID file in the start-stop-daemon as per: http://man.he.net/man8/start-stop-daemon however this also didn't seem to have any effect.
I have found some threads on similar issues https://askubuntu.com/questions/319199/upstart-tracking-wrong-pid-of-process-not-respawning but have been unable to find a way round this so far. I have tried changing fork to daemon but the same issue remains. I also can't see what has changed between Play 1.2.7 and 1.3 to cause this.
Another SO post has also asked a similar question but not had an answer as yet: https://stackoverflow.com/questions/23117345/upstart-gets-wrong-pid-after-launching-celery-with-start-stop-daemon
This is because getJavaVersion() spawns a subprocess, which bumps the PID count and breaks Upstart's tracking: Upstart expects Play to fork exactly zero, one or two times, depending on which expect stanza you use.
I've fixed this in a pull request.
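Until a fixed Play version is deployed, the manual recovery described in the question (find the real process and kill it) can be scripted from the PID that Play itself records in server.pid, rather than the one Upstart reports. This is a minimal sketch under that assumption, and kill_from_pidfile is a hypothetical helper name.

```shell
# kill_from_pidfile FILE: send SIGTERM to the process whose PID is stored in FILE.
# Returns 1 if the file is unreadable, holds no valid PID, or the process is gone.
kill_from_pidfile() {
  pidfile=$1
  if [ ! -r "$pidfile" ]; then
    echo "cannot read pid file: $pidfile" >&2
    return 1
  fi
  pid=$(cat "$pidfile")
  case "$pid" in
    ''|*[!0-9]*)
      echo "not a valid pid: $pid" >&2
      return 1
      ;;
  esac
  if kill -0 "$pid" 2>/dev/null; then
    kill "$pid"
  else
    echo "process $pid is not running" >&2
    return 1
  fi
}
```

With the environment from the job above this would be called as kill_from_pidfile "$HOME/server.pid". As noted elsewhere on this page, a stored PID can be stale, so it is worth checking what the PID refers to before relying on this in a deployment script.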

Script not starting on boot with start-stop-daemon

My script (located in /etc/init.d) is creating a pid file ($PIDFILE), but there is no process running. My daemon script includes:
start-stop-daemon --start --quiet --pidfile $PIDFILE -m -b --startas $DAEMON --test > /dev/null || return 1
The script works fine when executing it manually.
You need to create startup links.
sudo update-rc.d SCRIPT_NAME defaults
then reboot. SCRIPT_NAME is the name of the script in /etc/init.d (Without the path)
Was able to get it working, but tried so many things, don't know exactly what fixed it (probably an error in script or config). However, learned a lot and wanted to share since I can't find much of the same in the internet abyss.
It seems Ubuntu (and many other distros based on Ubuntu, including Mint) has migrated to Upstart for job and service management. Upstart includes SysVinit (using /etc/init.d daemons) compatibility that still can use update-rc.d to manage daemons (so if you are familiar with that usage, you can keep on using it). The Upstart method is to use a single .conf file in the /etc/init folder. My SCRIPT.conf file is very simple (I'm using a python script):
start on filesystem or runlevel [2345]
stop on runlevel [016]
exec python /usr/share/python-support/SCRIPT/SCRIPT.py
This simple file completely replaces the standard script in /etc/init.d with the case statement to provide [start|stop|restart|reload] functions and the pointer to /usr/bin/SCRIPT. You can see that it includes runlevel control that would normally be found in the /etc/rc*.d files (thus eliminating several files).
I tried update-rc.d to create the necessary /etc/rc*.d/ files for my daemon. My daemon bash script is located in /etc/init.d and includes the start-stop-daemon command as in my original question. (That command also works fine from terminal.)
I had the /etc/rc*.d/ files, the bash script in /etc/init.d and the /etc/init/SCRIPT.conf file in place during boot. It seems that Upstart looks to the .conf file first for its direction, because the SysVinit command service SCRIPT [start|stop|restart|reload] returns Unknown Instance, yet you can see that the process is running with ps -elf | grep SCRIPT_FILE.
One interesting thing to note is how your daemon forks when using a .conf file. The script as written above starts a single daemon process. If the daemon forks itself, use expect fork or expect daemon so that Upstart tracks the right PID, and add respawn to have it restarted automatically (see the Upstart Cookbook for reference). With respawn, killing the daemon with the kill command just causes Upstart to start it again.
I continued to test both my daemon and the boot process by utilizing the sudo initctl reload-configuration command. This reloads the conf files where you can test your daemon by the sudo [start|stop|restart] SCRIPT command. The result of the start command is:
$ sudo start SCRIPT
SCRIPT start/running, process xxxx
$ sudo restart SCRIPT
SCRIPT start/running, process xxxx
$ sudo stop SCRIPT
SCRIPT stop/waiting
Also, there is a nice log in /var/log/upstart/SCRIPT.log that gives you useful information for your daemon during boot. Mine still has a very annoying bug that prevents root from displaying osd messages with notify-send from my daemon. My log file includes a gtk warning (I will open another question to solicit help).
Hope this helps others in developing their daemons.

How to dump core of an init spawned process

I am trying to force core dump of a program. Core dumping is enabled via
ulimit -c unlimited
If my program is launched by the init process, and I kill it like this
kill -6 <pid_of_prog>
I can't find the core.
However, if it is launched from a terminal, and I kill it with the above command, then it dumps core. The program chdir to a directory when it is launched, and the core file is found in this directory.
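ulimit does not change the limits of an already running process; it only applies to the shell that runs it and to that shell's children, so my init-launched process is not affected by the ulimit command. I guess the correct answer is to call setrlimit in the program itself.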
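Two shell-level checks can help here. On Linux, /proc/PID/limits shows the limit a running process actually has, and a small wrapper can raise the limit in the same shell that then execs the program. The sketch below assumes Linux and the usual layout of /proc/PID/limits. core_limit_of and run_with_cores are made-up names.

```shell
# core_limit_of PID: print the soft core-file size limit of a running process,
# read from the Linux-specific /proc/PID/limits file.
core_limit_of() {
  awk '/^Max core file size/ { print $5 }' "/proc/$1/limits"
}

# run_with_cores CMD...: raise the soft core limit, then replace this shell
# with CMD so the program inherits the new limit. Meant to be the last step
# of the script that init runs. The ulimit call fails if the hard limit is lower.
run_with_cores() {
  ulimit -c unlimited
  exec "$@"
}
```

Running core_limit_of <pid_of_prog> against the init-launched process shows whether its limit really differs from the one in the terminal session.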

Resources