I'm having an issue where upstart is respawning a Node.js (v0.8.8) process that is completely healthy. I'm on Ubuntu 11.10. When I run the program from the command line it is completely stable and does not crash, but when I run it with upstart, it gets respawned pretty consistently every few seconds. I'm not sure what is going on, and none of the logs seem to help. In fact, there are no error messages in any of the upstart logs for the job. Below is my upstart script:
#!upstart
description "server.js"
start on (local-filesystems and net-device-up IFACE=eth0)
stop on shutdown
# Automatically respawn
respawn # restart when job dies
respawn limit 99 5 # give up restarting after 99 respawns in 5 seconds
script
export HOME="/home/www-data"
exec sudo -u www-data NODE_ENV="production" /usr/local/bin/node /var/www/server/current/server.js >> /var/log/node.log 2>> /var/log/node.error.log
end script
post-start script
echo "server-2 has started!"
end script
The strange thing is that server-1 works perfectly fine and is set up the same way.
The syslog messages look like this:
Sep 24 15:40:28 domU-xx-xx-xx-xx-xx-xx kernel: [5272182.027977] init: server-2 main process (3638) terminated with status 1
Sep 24 15:40:35 domU-xx-xx-xx-xx-xx-xx kernel: [5272189.039308] init: server-2 main process (3647) terminated with status 1
Sep 24 15:40:42 domU-xx-xx-xx-xx-xx-xx kernel: [5272196.050805] init: server-2 main process (3656) terminated with status 1
Sep 24 15:40:49 domU-xx-xx-xx-xx-xx-xx kernel: [5272203.064022] init: server-2 main process (3665) terminated with status 1
Any help would be appreciated. Thanks.
OK, it turns out it was actually monit that was restarting it. Problem solved. Thanks.
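In case anyone else runs into the same symptom, a quick way to confirm that monit (rather than upstart) is the one restarting the process is to ask monit what it is watching and to look for a matching check stanza in its configuration (this assumes a stock monit install; adjust the grep pattern to your app):
sudo monit summary
grep -ri "server" /etc/monit/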
I'm on Debian and I have a systemd service that calls a bash script.
The script contains an infinite while loop, as I need it to check something every X seconds infinitely.
The systemd service crashes once it hits the "while true; do" line.
The script runs fine if I execute it manually.
Why doesn't systemd like it? What do I do?
Here are the service and the script. As I've indicated, an echo statement before the "while true; do" prints. The echo statement after the "while true; do" line does not print.
/etc/systemd/system/stream.service:
[Service]
WorkingDirectory=/home/pi/
ExecStart=/home/pi/joi_main.sh
Restart=no
StandardOutput=syslog
StandardError=syslog
SyslogIdentifier=stream_service
User=pi
Group=pi
Environment=NODE_ENV=production
[Install]
WantedBy=multi-user.target
/home/pi/joi_main.sh:
#!/bin/bash -e
today=`/bin/date '+%Y_%m_%d__%H_%M_%S'`
exec 2> "/home/pi/stream_logs/$today.$RANDOM.log"
exec 1>&2
#Wait 120s for system to finish booting
sleep 120
#Initial config
export AUDIODEV=mic_mono
export AUDIODRIVER=alsa
sudo sysctl fs.pipe-max-size=1048576
echo "This line prints"
# Check if the video buffer is full every minute. If full, the stream needs to restart
while true; do
echo "This line doesn't"
if grep "100% full" /home/pi/video_buffer_usage.txt; then
echo "Buffer is full!"
# Kill existing processes
pkill -f "raspivid|rec|buffer|ffmpeg"
# Wait 10s
sleep 10
./joi_stream.sh &
fi
sleep 60
done
journalctl seems completely unhelpful, but here it is. No errors. Why does it say "session closed"?
Mar 31 02:13:41 raspberrypi sudo[1369]: pi : TTY=unknown ; PWD=/home/pi ; USER=root ; COMMAND=/sbin/sysctl fs.pipe-max-size=1048576
Mar 31 02:13:41 raspberrypi sudo[1369]: pam_unix(sudo:session): session opened for user root by (uid=0)
Mar 31 02:13:41 raspberrypi sudo[1369]: pam_unix(sudo:session): session closed for user root
(Please don't tell me to start yet another systemd service for just this while loop. I want it to be part of this main script because it needs to run after everything else, and if I turn off the main service I don't want the while loop running either, so maintaining two systemd services would only add trouble.)
The contents of ./joi_stream.sh were not shared, but here's a problem I see with your systemd solution. It doesn't directly explain your behavior, but may be related:
In your systemd configuration, you redirect both STDOUT and STDERR to syslog, but in your script, you redirect STDERR (file descriptor "2") to a file, and redirect STDOUT (file descriptor "1") to STDERR.
exec 2> "/home/pi/stream_logs/$today.$RANDOM.log"
exec 1>&2
If your ./joi_stream.sh expects your redirection of these file descriptors to another file to work, it may not. If the file is just for logging, I would get rid of these lines and let the systemd journal handle that; it will tag the logs with your unit name so you can review them specifically:
journalctl -u your-unit-name.service
Also, in systemd, you wouldn't normally put in a sleep to wait until the system has finished booting. Instead, you would use a .timer unit.
The .timer file would instruct systemd to run the main logic every minute, so the "while" loop would not be required. The timer unit would contain directives like:
# Run for the first time 2 minutes after boot
# and every minute after that
OnBootSec=120
OnUnitActiveSec=60
It would be the timer unit that is enabled to start on boot. Timer files can be super simple. Just create a .timer file in /etc/systemd/system and give it the same name as the service file you want it to activate:
[Unit]
Description=Runs my service every minute
[Timer]
# Run for the first time 2 minutes after boot
# and every minute after that
OnBootSec=120
OnUnitActiveSec=60
[Install]
WantedBy=timers.target
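The service unit that the timer activates would then do a single check and exit instead of looping. A minimal sketch, assuming the body of the while loop is moved into its own script (the unit contents and the script name here are placeholders):
[Unit]
Description=Check the video buffer once

[Service]
Type=oneshot
User=pi
Group=pi
WorkingDirectory=/home/pi/
# Hypothetical script containing one pass of the old while-loop body
ExecStart=/home/pi/joi_check.sh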
To start and test your timer immediately, run:
sudo systemctl start my-service.timer
You can review the status of timers with:
sudo systemctl list-timers
The systemd solution is more robust than the rc.local solution. If your rc.local solution dies for any reason, it will not restart. However, if your script dies while running under systemd, the timer will still run it again a minute later.
FYI, everything works if I call /home/pi/joi_main.sh from /etc/rc.local instead of using a systemd service. I'll use rc.local and kill the service.
I would like to start an interactive script from systemd after the getty.target has been reached. This works so far; however, systemd kills the script after a couple of seconds. The systemd unit looks like the following:
[Unit]
Description = Some interactive script
Requires = getty@tty1.service
After = getty@tty1.service
[Service]
Type = oneshot
ExecStart = /usr/local/bin/my-script
StandardInput = tty
StandardOutput = tty
TTYPath = /dev/tty1
TTYReset = yes
TTYVHangup = yes
[Install]
WantedBy = multi-user.target
Within the script there are calls to dialog, mount etc. Nothing very special but it is an interactive script. Systemd keeps killing the script and I don't understand why. The output of systemctl status interactive-script.service looks like:
● interactive-script.service - Some interactive script
Loaded: loaded (/etc/systemd/system/interactive-script.service; enabled)
Active: inactive (dead) since Tue 2016-06-28 10:18:07 UTC; 14min ago
Main PID: 364 (code=killed, signal=HUP)
And the log output obtained with journalctl -b -u interactive-script.service is empty:
-- Logs begin at Mon 2015-11-09 11:49:52 UTC, end at Tue 2016-06-28 10:30:28 UTC. --
I already tried adding KillMode=none, with no luck. Then I tried TimeoutStartSec=infinity; systemd complains that it doesn't understand that value, so I set it to 10000, but the script still gets killed after just a few seconds. I also tried running it as Type=simple and Type=forking, all to no avail.
The point is that starting the script seems to work fine (the dialogs appear), but systemd keeps killing it. How can I keep systemd from killing this interactive script?
I have an application that, after it has finished and exited normally, should not be restarted. After this app has done its business I'd like to shut down the instance (EC2). I was thinking of doing this using systemd unit files with the options
Restart=on-failure
ExecStopPost=/path/to/script.sh
The script that should run on ExecStopPost:
#!/usr/bin/env bash
# sleep 1; adding sleep didn't help
# this always comes out deactivating
service_status=$(systemctl is-failed app-importer)
# could also do the other way round and check for failed
if [ "$service_status" = "inactive" ]
then
echo "Service exited normally: $service_status . Shutting down..."
#shutdown -t 5
else
echo "Service did not exit normally - $service_status"
fi
exit 0
The problem is that when the post-stop script runs, I can't seem to detect whether the service ended normally or not; the status at that point is deactivating, and only afterwards do I know whether it enters a failed state.
Your problem is that systemd considers the service to be deactivating until the ExecStopPost process finishes. Putting sleeps in doesn't help, since it's just going to wait longer. The idea of ExecStopPost was to clean up anything the service might leave behind, like temp files, UNIX sockets, etc. The service is not done, and ready to start again, until the cleanup is finished. So what systemd is doing does make sense if you look at it that way.
What you should do is check $SERVICE_RESULT, $EXIT_CODE and/or $EXIT_STATUS in your script, which will tell you how the service stopped. Example:
#!/bin/sh
echo running exec post script | logger
systemctl is-failed foobar.service | logger
echo $SERVICE_RESULT, $EXIT_CODE and $EXIT_STATUS | logger
When the service is allowed to run to completion:
Sep 17 05:58:14 systemd[1]: Started foobar.
Sep 17 05:58:17 root[1663]: foobar service will now exit
Sep 17 05:58:17 root[1669]: running exec post script
Sep 17 05:58:17 root[1671]: deactivating
Sep 17 05:58:17 root[1673]: success, exited and 0
And when the service is stopped before it finishes:
Sep 17 05:57:22 systemd[1]: Started foobar.
Sep 17 05:57:24 systemd[1]: Stopping foobar...
Sep 17 05:57:24 root[1643]: running exec post script
Sep 17 05:57:24 root[1645]: deactivating
Sep 17 05:57:24 root[1647]: success, killed and TERM
Sep 17 05:57:24 systemd[1]: Stopped foobar.
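Applied to the original question, the ExecStopPost= script could branch on those variables instead of polling systemctl is-failed. A rough sketch based on the log output above (the shutdown line is left commented out, as in the question):
#!/bin/sh
# Run via ExecStopPost=. A clean run shows SERVICE_RESULT=success,
# EXIT_CODE=exited and EXIT_STATUS=0; a manual stop shows EXIT_CODE=killed.
if [ "$SERVICE_RESULT" = "success" ] && [ "$EXIT_CODE" = "exited" ] && [ "$EXIT_STATUS" = "0" ]
then
    echo "app-importer exited normally. Shutting down..." | logger
    #shutdown -t 5
else
    echo "app-importer did not exit normally: $SERVICE_RESULT ($EXIT_CODE/$EXIT_STATUS)" | logger
fi
exit 0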
I have a node.js server app which is being started twice for some reason. I have a cron job that runs every minute, checking for a node main.js process and, if it's not found, starting it. The cron entry looks like this:
* * * * * ~/startmain.sh >> startmain.log 2>&1
And the startmain.sh file looks like this:
if ps -ef | grep -v grep | grep "node main.js" > /dev/null
then
echo "`date` Server is running."
else
echo "`date` Server is not running! Starting..."
sudo node main.js > main.log
fi
The log file storing the output of startmain.sh shows this:
Fri Aug 8 19:22:00 UTC 2014 Server is running.
Fri Aug 8 19:23:00 UTC 2014 Server is running.
Fri Aug 8 19:24:00 UTC 2014 Server is not running! Starting...
Fri Aug 8 19:25:00 UTC 2014 Server is running.
Fri Aug 8 19:26:00 UTC 2014 Server is running.
Fri Aug 8 19:27:00 UTC 2014 Server is running.
That is what I expect, but when I look at processes, it seems that two are running. One under sudo and one without. Check out the top two processes:
$ ps -ef | grep node
root 99240 99232 0 19:24:01 ? 0:01 node main.js
root 99232 5664 0 19:24:01 ? 0:00 sudo node main.js
admin 2777 87580 0 19:37:41 pts/1 0:00 grep node
Indeed, when I look at the application logs, I see startup entries happening in duplicate. To kill these processes, I have to use sudo, even for the process that does not start with sudo. When I kill one of these, the other one dies too.
Any idea why I am kicking off two processes?
First, you are starting your node main.js application with sudo in the script startmain.sh. According to the sudo man page:
When sudo runs a command, it calls fork(2), sets up the execution environment as described above, and calls the execve system call in the child process. The main sudo process waits until the command has completed, then passes the command's exit status to the security policy's close method and exits.
So, in your case the process named sudo node main.js is the sudo command itself and the process node main.js is the node.js app. You can easily verify this: run ps auxfw and you will see that the sudo node main.js process is the parent of node main.js.
Another way to verify this is to run lsof -p [process id] and see that the txt entry for the sudo node main.js process points to /usr/bin/sudo, while the txt entry for the node main.js process shows the path to your node binary.
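Using the PIDs from the ps listing above (assuming those processes are still running), the two checks look like this:
ps auxfw | grep "[n]ode main.js"   # forest view shows sudo as the parent
lsof -p 99232 | grep txt           # the sudo wrapper: /usr/bin/sudo
lsof -p 99240 | grep txt           # the node.js app: the path to the node binary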
The bottom line is that you should not worry: your node.js app is not actually starting twice.
So the configuration file for monitoring the gearman server is:
set logfile /var/log/monit.log
check process gearmand with pidfile /var/run/gearmand.pid
start program = "sudo gearmand --pid-file=/var/run/gearmand.pid"
stop program = "sudo kill all gearmand"
if failed port 4730 protocol http then restart
From monit.log:
[EST Nov 26 19:42:39] info : 'gearmand' start: sudo
[EST Nov 26 19:42:39] error : Error: Could not execute sudo
[EST Nov 26 19:43:09] error : 'gearmand' failed to start
but Monit says that the process failed to start. Does anyone know how to make it work? Thanks in advance.
Monit executes start and stop programs directly rather than through a shell, so they generally need to be given as absolute paths, and the monit daemon typically runs as root anyway, so sudo is unnecessary; that is most likely why monit reports "Could not execute sudo". A working configuration looks like this:
check process gearman_daemon with pidfile /var/run/gearmand/gearmand.pid
start program = "/bin/bash -c '/usr/sbin/gearmand -d --job-retries 3 --log-file /var/log/gearmand/gearmand.log --pid-file /var/run/gearmand/gearmand.pid --queue-type libsqlite3 --libsqlite3-db /var/tmp/gearman-queue.sqlite3'"
stop program = "/bin/bash -c '/bin/killall gearmand'"