zombies inside of docker - node.js

I have a Docker container which runs a Node.js application. This application starts a headless Chrome instance.
Everything works well, but if I kill the Chrome instance and check the list of running processes, I see 2 (actually 3, plus 2 cat processes) zombie (defunct) Chrome processes still in the system. I know these processes are children
of the killed parent Chrome process; they were not terminated correctly and got reparented to the init process.
I tried to kill them directly - rejected. I also tried to spawn Chrome with the detached: true flag and kill all child processes directly when the main Chrome process receives the "exit" signal; either way, ps -A | grep chrome still shows two defunct processes. Any ideas?
UPD:
Thanks, everyone, for the help. Adding --init totally solves my issue. Using another base image also works well, but I decided that approach was not necessary. A good description of the root cause can also be found here

larsks pretty much nails the reason: on Linux systems, init (or systemd) reaps zombie processes when their parent dies. The parent should clean up its own zombie processes with the wait syscall. However, that automatic cleanup does not cross the namespace boundary of a container. So whatever process you run as your entrypoint, which becomes PID 1, needs to handle these zombies for you.
With recent versions of Docker, you can include an init process just by passing --init to your docker run command. If you are using a version 2.2 Compose file, there's an init: true option you can set on your service for the same result.
In addition to dumb-init, there is also tini, which is what Docker uses under the covers as its own docker-init.
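As a rough sketch of both options (the image and service names are illustrative):

docker run --init my-node-image

# docker-compose.yml, version 2.2 syntax
version: "2.2"
services:
  app:
    image: my-node-image
    init: true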

You need a process that will call wait() in order to reap any zombie processes. On a regular system this is handled by /sbin/init, but inside a container you will need to provide your own tooling. If you're developing your own application, consider just calling wait() in a loop periodically.
Alternatively, you can consider a container-specific init such as dumb-init and see if that resolves the problem.
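For the dumb-init route, a rough Dockerfile sketch (base image, paths and entry file are illustrative; it assumes dumb-init is available in the distribution's package repository):

FROM node:18-slim
RUN apt-get update && apt-get install -y dumb-init && rm -rf /var/lib/apt/lists/*
WORKDIR /app
COPY . .
# dumb-init runs as PID 1, forwards signals and reaps zombie children
ENTRYPOINT ["/usr/bin/dumb-init", "--"]
CMD ["node", "index.js"]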

Related

PID 1 for application automatically restarted in bash loop

Given is a microservice that needs to quit itself after some time. This is not an error condition but, in this case, normal behavior.
After exiting it should be automatically restarted.
So currently I have a script run_app.sh:
#!/usr/bin/env bash
while true; do ./app ; done
And in the Dockerfile (inheriting FROM ubuntu:16.04) I run it like this:
CMD ["./run_app.sh"]
It "works" but since app does not have PID 1, it does not receive SIGTERM etc. coming from Kubernetes, which is needed for a graceful shutdown during rolling updates etc.
Using while true; do exec ./app ; done in run_app.sh does not solve the problem since the loop no longer exists when app finished and no restart is performed.
How can I restart the app automatically inside the container without restarting the container / pod every time it quits, but still have the advantages of PID 1?
Well, your loop does restart your app, so this is not your problem. Your problem is that the signal sent to the Docker container is not propagated to the app inside the container. Docker just isn't (AFAIK) meant to be used like this, so it doesn't propagate signals to its app.
You have two ways of handling this:
You can teach the signal sender (Kubernetes or whatever), instead of sending a signal to the Docker container, to do something more elaborate to get the information to the app inside the container. I guess this is not easy (if possible at all).
You can migrate the looping shell script out of the container and let Kubernetes (or whatever) run this script instead. Within the loop you can then start the Docker container with your app inside. In this case you will need to catch SIGTERM in the outer looping shell script (see help trap) and either send a SIGTERM to the Docker container directly or to the app within the Docker container (using docker exec or similar); a sketch follows below.
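A rough sketch of that second option, an outer wrapper running on the host (the container and image names are illustrative):

#!/usr/bin/env bash
# Restart the container whenever the app exits; forward SIGTERM/SIGINT into it.
trap 'docker kill --signal=TERM my-app; exit 0' TERM INT
while true; do
    docker run --rm --name my-app my-app-image &
    wait $!
done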

How to find out why daemon tools supervise is exiting

I run a node process (websocket server) on an AWS instance. I used to start it like this:
node websocket/index.js
But I have recently switched to using daemontools' supervise to run this process so that it respawns if it quits or dies for any reason.
So, I now run the process like this (from within the dir): supervise . &. The following is my ./run file:
#!/bin/sh
node websocket/index.js
This generally works well. When I manually kill -9 the Node process to test it out, Supervise respawns it correctly.
However, every morning when I check in on things, the Node process and the Supervise process are both dead and nowhere to be found in ps. I confirmed that the system is not rebooting by looking at uptime.
How can I find out why the Supervise and Node processes are dying overnight? And how can I prevent this?
Update: I switched to using the node module forever to keep my process running, and it has been great so far.
https://www.npmjs.com/package/forever
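For reference, basic forever usage looks roughly like this (the script path is taken from the question):

npm install -g forever
forever start websocket/index.js
forever list                       # show managed processes
forever stop websocket/index.js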

docker stop spark container from exiting

I know Docker only watches PID 1, and if that PID exits (or turns into a daemon) it thinks the program has exited and the container is shut down.
When apache-spark is started via the ./start-master.sh script, how can I keep the container running?
I do not think: while true; do sleep 1000; done is an appropriate solution.
E.g. I used command: sbin/start-master.sh to start the master, but it keeps shutting down.
How to keep it running when started with docker-compose?
As mentioned in "Use of Supervisor in docker", you could use phusion/baseimage-docker as a base image in which you can register scripts as "services".
The my_init script included in that image will take care of the exit signals management.
And the processes launched by start-master.sh would still be running.
Again, that supposes you are building your apache-spark image starting from phusion/baseimage-docker.
As commented by thaJeztah, using an existing image works too: gettyimages/spark/~/dockerfile/. Its default CMD will keep the container running.
Both options are cleaner than relying on a tail -f trick, which won't handle the kill/exit signals gracefully.
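A rough Dockerfile sketch of the phusion/baseimage-docker approach (the tag is just an example, and start-spark-master.sh is a hypothetical wrapper script that calls /spark/sbin/start-master.sh):

FROM phusion/baseimage:0.11
# One-shot startup script: my_init runs everything in /etc/my_init.d/ at container start
RUN mkdir -p /etc/my_init.d
COPY start-spark-master.sh /etc/my_init.d/start-spark-master.sh
RUN chmod +x /etc/my_init.d/start-spark-master.sh
# my_init stays as PID 1, handles signals and reaps children
CMD ["/sbin/my_init"]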
Here is another solution.
Create a file spark-env.sh with the following contents and copy it into the spark conf directory.
SPARK_NO_DAEMONIZE=true
If your CMD in the Dockerfile looks like this:
CMD ["/spark/sbin/start-master.sh"]
the container will not exit.
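For the docker-compose part of the question, a minimal sketch might look like this (the image name is illustrative, the ports are the usual Spark standalone defaults, and it assumes spark-env.sh with SPARK_NO_DAEMONIZE=true has been copied into the conf directory as above):

version: "2.2"
services:
  spark-master:
    image: my-spark-image
    command: /spark/sbin/start-master.sh
    ports:
      - "8080:8080"   # master web UI
      - "7077:7077"   # master port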
tail -f -n 50 /path/to/spark/logfile
This will keep the container alive and also provide useful info if you run it with -it (interactive mode). You can run it with -d (detached) and it will stay alive.

Docker not responding to CTRL+C in terminal

Having an issue with Docker at the moment; I'm using it to run an image that launches an ipython notebook on startup. I'm looking to make some edits to ipython notebook itself, so I need to close it after launch.
However, hitting CTRL+C in the terminal just inputs "^C" as a string. There seems to be no real way of using CTRL+C to actually close the ipython notebook instance.
Would anyone have any clues as to what can cause this, or know of any solutions for it?
Most likely the container image you use is not handling process signals properly.
If you are authoring the image then change it as Roland Weber's answer suggests.
Otherwise try to run it with --init.
docker run -it --init ....
This fixes Ctrl+C for me.
Source: https://docs.docker.com/v17.09/engine/reference/run/#specify-an-init-process
The problem is that Ctrl-C sends a signal to the top-level process inside the container, but that process doesn't necessarily react as you would expect. The top-level process has ID 1 inside the container, which means that it doesn't get the default signal handlers that processes usually have. If the top-level process is a shell, then it can receive the signal through its own handler, but doesn't forward it to the command that is executed within the shell. Details are explained here. In both cases, the docker container acts as if it simply ignores Ctrl-C.
If you're building your own images, the solution is to run a minimal init process, such as tini or dumb-init, as the top-level process inside the container.
This post proposes CTRL-Z as a workaround for sending the process to background and then killing the process by its process id:
Cannot kill Python script with Ctrl-C
Possible problems:
The program catches ctrl-c and does nothing, very unlikely.
There are background processes that are not managed correctly. Only the main process receives the signal and sub-processes hang. Very likely what's happening.
Proposed Solution:
Check the program's documentation on how it's properly started and stopped. ctrl-c seems not to be the proper way.
Wrap the program with a docker-entrypoint.sh bash script that blocks the container process and is able to catch ctrl-c. This bash example should help: https://rimuhosting.com/knowledgebase/linux/misc/trapping-ctrl-c-in-bash
After catching ctrl-c invoke the proper shutdown method for ipython notebook.
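A rough docker-entrypoint.sh sketch of that idea (the notebook command is illustrative; replace it with whatever the image actually starts):

#!/usr/bin/env bash
set -e
# Start the notebook in the background and remember its PID
jupyter notebook --ip=0.0.0.0 --no-browser &
child=$!
# Forward Ctrl+C (SIGINT) and docker stop (SIGTERM) to the notebook process
trap 'kill -TERM "$child" 2>/dev/null' INT TERM
# Keep this script (PID 1) alive until the notebook exits
wait "$child"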
From this post on the Docker message boards:
Open a new shell and execute
$ docker ps # get the id of the running container
$ docker stop <container> # kill it (gracefully)
This worked well for me. CTRL-Z, CTRL-\, etc. only came up as strings, but this killed the Docker container and returned the tab to terminal input.
@maybeg's answer already explains very well why this might be happening.
Regarding stopping the unresponsive container, another solution is to simply issue a docker stop <container-id> in another terminal. As opposed to CTRL-C, docker stop does not send a SIGINT but a SIGTERM signal, to which the process might react differently.
Usage: docker stop [OPTIONS] CONTAINER [CONTAINER...]
Stop a running container by sending SIGTERM and then SIGKILL after a grace period
If that fails, use docker kill <container-id> which sends a SIGKILL immediately.
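If the process needs more time to shut down before the SIGKILL, the grace period can be extended (the container name is illustrative):

docker stop --time 30 my-container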

how to automatically restart a node server?

We are finishing development of a project; the client is already using it, but occasionally some errors occur that crash the server.
I know I could register the service as an 'upstart' script on Linux in order to have my node service restart when it crashes.
But our server is running other stuff, so we can't restart it.
Well, actually, while writing, I realize I have two questions then:
Will 'upstart' work without having to reboot? Something is just whispering yes to me :)
If not, what other option would I have to 'respawn' my node server when it crashes?
Yes, upstart will restart your process without a reboot.
Also, you should look into forever.
PM2 is a production process manager for Node.js apps.
If your focus for automatic restart is an always-running application, I suggest using a process manager. A process manager, in general, handles the node process (or processes, if cluster mode is enabled) and is responsible for its execution. The process manager leans on the operating system: your node app and the OS are not so tightly coupled, because the process manager sits in the middle. Final trick: put the process manager itself on upstart. Here is a complete improvement path to follow.
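Rough PM2 usage as a sketch (the script path is illustrative):

npm install -g pm2
pm2 start server/main.js --name my-app   # start and supervise the app
pm2 list                                 # show managed processes
pm2 startup                              # print the command that registers pm2 with the init system
pm2 save                                 # persist the current process list across reboots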
Using a shared server and not having root privileges, I can't download or install any of the previously mentioned libraries. What I am able to do is use a simple infinite bash loop to solve my issue. First, I created the file ./startup.sh in the base directory ($ vim startup.sh):
#!/bin/bash
while :
do
node ./dist/sophisticatedPrimate/server/main.js
done
Then I run it with:
$ bash startup.sh
and it works fine. There is a downside to this, which is that it doesn't have a graceful way to end the loop (at least not once I exit the server). What I ended up doing is simply finding the process with:
$ ps aux | grep startup.sh
Then killing it with
$ kill <process id>
example
$ kill 555555
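A rough variant of the same loop with a trap, so that a plain kill (or Ctrl+C) ends the loop instead of restarting the server again (the path is taken from the answer above):

#!/bin/bash
# On SIGINT/SIGTERM, exit once the current node process ends instead of looping again
trap 'echo "stopping"; exit 0' INT TERM
while :
do
    node ./dist/sophisticatedPrimate/server/main.js
done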
