supervisord not killing all spawned node processes on stop command - node.js

I encountered something weird when deploying a new service with supervisord. These are the relevant parts:
# supervisord.conf
[program:express]
command=yarn re-express-start
# package.json
{
  "scripts": {
    "re-express-start": "node lib/js/client/Express.bs.js"
  }
}
When I run supervisorctl start, the node server is started as expected. But after I run supervisorctl stop, the server keeps on running even though supervisor thinks it's been killed.
If I change the supervisord.conf file to execute node lib/js/client/Express.bs.js directly (without going through yarn), then this works as expected. But I want to go through the package.json-defined script.
I looked at what the process tree looks like, but I don't quite understand why this happens. Below are the processes before and after stopping the supervisord-managed service.
$ ps aux | grep node
user 12785 1.4 3.5 846404 72912 ? Sl 16:30 0:00 node /usr/bin/yarn re-express-start
user 12796 0.0 0.0 4516 708 ? S 16:30 0:00 /bin/sh -c node lib/js/client/Express.bs.js
user 12797 5.2 2.7 697648 56384 ? Sl 16:30 0:00 /usr/bin/node lib/js/client/Express.bs.js
root 12830 0.0 0.0 14216 1004 pts/1 S+ 16:30 0:00 grep --color=auto node
$ pstree -c -l -p -s 12785
systemd(1)───supervisord(7153)───node(12785)─┬─sh(12796)───node(12797)─┬─{node}(12798)
                                             │                         └─{node}(12807)
                                             ├─{node}(12786)
                                             └─{node}(12795)
$ supervisorctl stop express
$ ps aux | grep node
user 12797 0.7 2.7 697648 56384 ? Sl 16:30 0:00 /usr/bin/node lib/js/client/Express.bs.js
root 12975 0.0 0.0 14216 980 pts/1 S+ 16:32 0:00 grep --color=auto node
$ pstree -c -l -p -s 12797
systemd(1)───node(12797)─┬─{node}(12798)
                         └─{node}(12807)
$ kill 12797
$ ps aux | grep node
root 13426 0.0 0.0 14216 976 pts/1 S+ 16:37 0:00 grep --color=auto node
From the above, the "actual" workload process doing the server work has PID 12797. It sits a couple of levels below the process that supervisord spawned.
Stopping the service stops the processes with PIDs 12785 and 12796, but not 12797, which instead gets reparented to the init process.
Any ideas on what is happening here? Is something ignoring certain SIGxxx signals? I assume the yarn invocation is somehow swallowing them, but I don't know how, or how to reconfigure it.

I ran into this issue as well when I was running a node Express app. The problem seemed to be that I was having supervisor call npm start which refers to the package.json start script. That script simply calls node app.js. The solution seemed to be to directly call that command from the supervisor config file like so:
[program:node]
...
command=node app.js
...
stopasgroup=true
stopsignal=QUIT
In addition, I added stopasgroup and changed the stopsignal to QUIT. The stopsignal seemed to be required in order to properly kill the process.
I can now freely call supervisorctl restart node:node_00 without having any ERROR (spawn error) errors.
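If you do want to keep going through yarn, another option worth trying (a sketch based on supervisord's process-group options, not something tested in this answer) is to leave the command as-is and tell supervisord to signal the whole process group, so the intermediate sh and the actual node child receive the stop signal too:
[program:express]
command=yarn re-express-start
; send the stop signal to the entire process group, so the sh and node children get it too
stopasgroup=true
; escalate to SIGKILL for the entire group if it does not exit in time
killasgroup=true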

Related

Why does trying to kill a process in Docker container take me out of it?

I have a v6.10.0 Node server on my macOS that is automatically started from the CMD in the Dockerfile. Normally in my local development un-containerized environment I will use CTRL+C to kill the server. Not being able to (or not knowing how to) do this in the container, I resort to ps aux | grep node to try to manually kill the processes. So, I get something like this:
myapp [master] :> kubectl exec -it web-3127363242-xb50k bash
root@web-3127363242-xb50k:/usr/src/app# ps aux | grep node
root 15 0.4 0.9 883000 35804 ? Sl 05:49 0:00 node /usr/src/app/node_modules/.bin/concurrent --kill-others npm run start-prod npm run start-prod-api
root 43 0.1 0.6 743636 25240 ? Sl 05:49 0:00 node /usr/src/app/node_modules/.bin/better-npm-run start-prod
root 44 0.1 0.6 743636 25140 ? Sl 05:49 0:00 node /usr/src/app/node_modules/.bin/better-npm-run start-prod-api
root 55 0.0 0.0 4356 740 ? S 05:49 0:00 sh -c node ./bin/server.js
root 56 0.0 0.0 4356 820 ? S 05:49 0:00 sh -c node ./bin/api.js
root 57 18.6 4.9 1018088 189416 ? Sl 05:49 0:08 node ./bin/server.js
root 58 13.9 5.2 1343296 197576 ? Sl 05:49 0:06 node ./bin/api.js
root 77 0.0 0.0 11128 1024 ? S+ 05:50 0:00 grep node
When I try to kill one of them by
kill -9 15
I am taken out of my container's shell and back to my computer's shell. When I enter the container again, I see that the process is still there with the same process id. This example uses a Kubernetes pod but I believe I have the same result with entering a Docker container using the docker exec command.
Every docker container has an ENTRYPOINT that will either be set in the Dockerfile, using ENTRYPOINT or CMD declarations, or specified in the run command: docker run myimage:tag "entrypoint_command". When the ENTRYPOINT process is killed, I think the container gets killed as well. docker exec, as I understand it, is kind of like attaching a command to a running container. But if the ENTRYPOINT is down, there is no container to attach to.
Kubernetes will restart a container after failure, as far as I understand it, which might be the reason you see the process back up. I haven't really worked with Kubernetes, but I'd try playing around with how the replicas are scaled if you want your process to stay terminated.
Containers isolate your desired app as PID 1 inside the namespace; that app is your ENTRYPOINT, or your CMD if you don't have an ENTRYPOINT defined. If killing a process results in PID 1 exiting, the container will immediately stop (similar to killing PID 1 on a Linux host), along with all of the other PIDs. If the container has a restart policy, it will be restarted, and the processes will get the same PIDs as the last time it ran (all else being equal, which it often is inside a container).
To keep the container from stopping, you'll need to adjust your ENTRYPOINT to remain up even when the child process is killed. That said, having the container exit is typically the preferred behavior for handling unexpected errors, by getting back to a clean state.
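To illustrate the PID 1 point, here is a minimal exec-form ENTRYPOINT sketch. The file name ./bin/server.js is taken from the ps output in the question; the base image and COPY step are assumptions. Running node directly as PID 1 means signals sent to the container reach the server process itself rather than a wrapping shell:
FROM node:6
WORKDIR /usr/src/app
COPY . .
# exec form (JSON array): node becomes PID 1 and receives SIGTERM/SIGINT directly
ENTRYPOINT ["node", "./bin/server.js"]
With a shell-form ENTRYPOINT (or an npm/concurrent wrapper), PID 1 is a shell that may not forward signals to its children, which is the kind of process tree visible in the ps listing above.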

Why does this program hang on exit? (interaction between signals and sudo)

I am debugging a legacy program (on Linux). To synchronise it with another process I tried naively adding a raise(SIGSTOP). However, when run under sudo I get a defunct (zombie) process and a hung terminal. Can someone explain what is happening here and how it can be avoided?
I've reduced the problem to the following simple C program (selfstop.c):
#include <signal.h>
#include <stdio.h>

int main(void)
{
    printf("about to stop\n");
    (void)raise(SIGSTOP);
    printf("resumed\n");
    return 0;
}
If run as normal it displays "about to stop" and halts itself with SIGSTOP.
kill -18 <pid> causes it to display "resumed" and exit as desired.
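In concrete commands, the normal (non-sudo) case looks like this (a sketch, assuming gcc is available and that signal 18 is SIGCONT, which is the case on x86 Linux):
gcc -o selfstop selfstop.c
./selfstop &        # prints "about to stop", then stops itself with SIGSTOP
sleep 1             # give it a moment to reach the raise()
kill -CONT $!       # equivalent to kill -18; prints "resumed" and exits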
However, if I run it under sudo i.e.
sudo ./selfstop
in another terminal:
sudo kill -18 <pid>
It displays "resumed" and returns control to the terminal but I am left with a defunct process:
>ps aux | grep [s]elf
root 7619 0.0 0.0 215476 4136 pts/4 T 18:16 0:00 sudo ./selfstop
root 7623 0.0 0.0 0 0 pts/4 Z 18:16 0:00 [selfstop] <defunct>
Things get worse if the program is run in a script (runselfstop):
#!/bin/sh
sudo ./selfstop
Now when the process exits it hangs the terminal.
In both cases normal service is resumed by killing the sudo process (in this case PID 7619, the sudo ./selfstop process):
sudo kill -9 7619
My question is: why do we get the zombie, and how do we avoid it?
Note: The reason for using sudo is irrelevant here. It relates to the legacy application.
sudo will suspend itself if the command it's running suspends itself. This allows you to, for example, run sudo -s to start a shell, then type suspend in that shell to get back to your top-level shell. If you have the source code for sudo, you can look at the suspend_parent function to see how this is done.
When sudo (or any process) has been suspended, the only way to resume it is to send it a SIGCONT signal. Sending SIGCONT to the selfstop process won't do that.
>ps aux | grep [s]elf
root 7619 0.0 0.0 215476 4136 pts/4 T 18:16 0:00 sudo ./selfstop
root 7623 0.0 0.0 0 0 pts/4 Z 18:16 0:00 [selfstop] <defunct>
That indicates that selfstop has exited but hasn't yet been waited for by its parent. It will remain a zombie until sudo is either resumed or killed.
How can you work around this? sudo and selfstop will be in the same process group (unless selfstop does something to change that). So you could send SIGCONT to sudo's process group, which will resume both processes, by doing kill -CONT -the-pid-of-sudo (note the minus sign before the pid to denote a pgrp).
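With the PIDs from the question, that would look roughly like this (assuming sudo, PID 7619, is the process group leader, which it normally is when started from an interactive shell):
ps -o pgid= -p 7619           # confirm the process group id of the sudo process
sudo kill -CONT -- -7619      # the leading "-" before the pid targets the whole group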

ERROR : invalid PID number "" in "/run/nginx.pid"

My nginx is not starting on port 80.
I have added the following details:
$ nginx -s reload
2016/03/23 16:11:27 [error] 24992#0: invalid PID number "" in "/run/nginx.pid"
$ ps -ef | grep nginx
root 25057 2840 0 16:16 pts/1 00:00:00 grep --color=auto nginx
$ kill -9 25057
bash: kill: (25057) - No such process
$ service nginx start
Nothing..
Please provide a solution.
Trying to run nginx -s reload without first starting nginx will result in an error, because nginx looks for the file containing its master PID when you tell it to reload. In your case it seems that nginx wasn't running, so the file containing that PID doesn't exist.
By running kill -9 25057 you tried to kill the grep process from your own ps -ef | grep nginx pipeline, which had already exited, so you got "No such process".
To make sure all is well, I would stop nginx with nginx -s stop, then start it with nginx, followed by nginx -s reload to check that reloading works. In any case, the error log at /var/log/nginx/error.log might tell you if something bad is going on.
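Put as commands, that stop/start/reload sequence looks like this (run as root or with sudo; the first command will complain if nginx isn't running yet, which is fine):
nginx -s stop                        # stop any running instance
nginx                                # start a fresh master, which writes /run/nginx.pid
nginx -s reload                      # should now find a valid PID in the pid file
tail -f /var/log/nginx/error.log     # watch for anything bad going on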
If that works, you can try accessing http://localhost:80 (or however you have configured nginx), and also follow the error log and access log under /var/log/nginx/.
As a sidenote: if this by any chance happens to be a case where nginx is reloaded by some other tool like confd, you should also check whether nginx actually stores its PID in /run/nginx.pid as opposed to /var/run/nginx/nginx.pid.
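A quick way to check which pid file nginx is configured to use (a sketch; nginx -T, which dumps the effective configuration, is available in nginx 1.9.2 and later):
sudo nginx -T 2>/dev/null | grep -E '^\s*pid\s'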
Let's talk about what we have here first:
$ nginx -s reload
2016/03/23 16:11:27 [error] 24992#0: invalid PID number "" in "/run/nginx.pid"
This is probably because the /run/nginx.pid file is empty, which causes issues with the stop/start/restart commands, so you have to edit it (with sudo) and put the PID of the currently running nginx master process in it. Now let's have a look at the next lines, which are connected to this.
$ ps -ef | grep nginx
root 25057 2840 0 16:16 pts/1 00:00:00 grep --color=auto nginx
$ kill -9 25057
bash: kill: (25057) - No such process
Here you're trying to kill a process that is NOT the nginx master process. First run the following command to see the PIDs of the nginx master process and its worker:
$ ps -aux | grep "nginx"
root 17711 0.0 0.3 126416 6632 ? Ss 18:29 0:00 nginx: master process nginx -c /etc/nginx/nginx.conf
www-data 17857 0.0 0.2 126732 5588 ? S 18:32 0:00 nginx: worker process
ubuntu 18264 0.0 0.0 12916 984 pts/0 S+ 18:51 0:00 grep --color=auto nginx
Next, kill both:
$ sudo kill -9 17711
$ sudo kill -9 17857
and then try to run nginx again.
$ service nginx start
Nothing..
I have nothing to say here ;)
Summary:
I think editing the /run/nginx.pid file with the nginx master process PID should solve the issue. So, according to my example above, this file should look like this:
17711
Hope that helps!
I had this problem. I restarted nginx.service and that fixed it.
Run sudo systemctl restart nginx.service and then sudo nginx -s reload on Ubuntu.
ps -ef | grep nginx | grep root | grep -v grep | awk '{ print $2 }' > /run/nginx.pid
nginx -s reload
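A slightly more targeted variant of the same idea (a sketch, assuming the master process runs as root and pgrep is available; run as root so the pid file is writable):
pgrep -f "nginx: master process" | head -n 1 > /run/nginx.pid
nginx -s reload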

Why does the command in /root/.bash_profile start twice?

Here is my /root/.bash_profile:
export DISPLAY=:42 && cd /home/df/SimulatedRpu-ex/bin && ./SimulatedRpu-V1 &
When I start my server, I run ps aux | grep SimulatedRpu and here is the output:
root 2758 0.2 1.0 62316 20416 ? Sl 14:35 0:00 ./SimulatedRpu-V1
root 3197 0.5 0.9 61428 19912 pts/0 Sl 14:35 0:00 ./SimulatedRpu-V1
root 3314 0.0 0.0 5112 716 pts/0 S+ 14:35 0:00 grep SimulatedRpu
So the program prints an error message saying the port is already in use.
But why does the command in /root/.bash_profile start twice?
Please help me, thank you! By the way, I use Red Hat Enterprise Linux 5.5.
The profile is read every time you log in. So just by logging in to run the ps aux | grep SimulatedRpu, you run the profile once more and thus start a new process.
You should put the command into an init script instead.
[EDIT] You should also run Xvnc in the same script - that way, you can start and stop the display server together with your app.
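On RHEL 5.5 that would be a SysV init script. A minimal sketch (the paths and the DISPLAY value are taken from the question; the pkill-based stop is an assumption, and a real script should also write and check a pid file):
#!/bin/sh
# /etc/init.d/simulatedrpu - minimal start/stop wrapper for SimulatedRpu-V1
case "$1" in
  start)
    export DISPLAY=:42
    cd /home/df/SimulatedRpu-ex/bin && ./SimulatedRpu-V1 &
    ;;
  stop)
    pkill -f SimulatedRpu-V1
    ;;
  *)
    echo "Usage: $0 {start|stop}"
    exit 1
    ;;
esac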
Try it like
if ! ps aux | grep '[S]imulatedRpu'; then
    export DISPLAY=:42 && cd /home/df/SimulatedRpu-ex/bin && ./SimulatedRpu-V1 &
fi
This way it will first check if if the application is not running yet. The [] around the S are to prevent grep from finding itself ;)

"forever list" says "No forever process running" but it is running an app

I have started an app with
forever start app.js
After that I typed,
forever list
and it shows that
The "sys" module is now called "util". It should have a similar interface.
info: No forever processes running
But I checked my processes with
ps aux | grep node
and it shows that
root 1184 0.1 1.5 642916 9672 ? Ss 05:37 0:00 node /usr/local/bin/forever start app.js
root 1185 0.1 2.1 641408 13200 ? Sl 05:37 0:00 node /var/www/app.js
ubuntu 1217 0.0 0.1 7928 1060 pts/0 S+ 05:41 0:00 grep --color=auto node
I cannot control the process, since I cannot list it with "forever list".
How can I make forever aware of its running processes so that I can control them again?
forever list should be invoked as the same user that owns the processes.
Generally that is the root user (in the case of Ubuntu upstart, unless specified otherwise), so you can switch to the root user using sudo su and then try forever list.
PS: I moved to pm2 recently, which has a lot more features than forever.
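For comparison, the same workflow with pm2 looks roughly like this (pm2 also keeps a per-user daemon and process registry, so the same run-as-the-same-user rule applies; the path is taken from the ps output above):
pm2 start /var/www/app.js    # start and register the app with pm2's daemon
pm2 list                     # shows it, when run as the same user that started it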
I had the same problem today.
In my case, I'm using NVM and forgot that it doesn't set/modify the global node path, so I had to set it manually:
export NODE_PATH="/root/.nvm/v0.6.0/bin/node"
If you run forever start app.js from an init.d script, you should later type sudo HOME=/home/pi/devel/web-app -u root forever list to get the correct list.
A fix for this would be great.
Encountered this one as well.
I believe this issue was logged here.
What I can recommend for now is to find the process that's using your node port, e.g. 3000, like so:
sudo lsof -t -i:3000
That command will show the process id.
Then kill the process by running:
kill <PID>
sudo su
forever list
This will output the correct list (processes started by root user).
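If forever has lost track of the app entirely, one way to get back to a known state is to restart it under root so that the registry and the process owner match (a sketch; the path is taken from the ps output above):
sudo su
forever stopall                  # stop anything forever still tracks for root
forever start /var/www/app.js
forever list                     # the app should now show up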
