How do I make sure a process launched by Docker entrypoint can be killed? - linux

I'm building a Docker image that launches a long-running Java process. I want to make sure that it can be killed together with the container (e.g. by using Ctrl+C) yet still perform cleanup.
If I'm using exec java -jar in my entrypoint, it works as expected.
If I'm simply executing java -jar, the process cannot be killed.
However exec makes the container exit even on success, and that is a problem if this command is not the last one in the entrypoint. For example, if some file conversion or cleanup follows, it will not get executed:
exec java -jar "./lib/Saxon-HE-${SAXON_VER}.jar" -s:"$json_xml" -xsl:"$STYLESHEET" base-uri="$base"
rm "$json_xml"
I think the explanation is that using exec the process (java in this case) becomes PID=1 and receives the kill signals, while without exec it gets some other PID and does not receive the signals and therefore can't be killed.
So my question is two-fold:
is there a workaround that allows the process to be killed without exiting the container on success as exec does?
how do I make sure the cleanup after exec (rm in this case) gets executed even if the process is killed/exits?

You could create an entrypoint bash script that traps SIGINT (Ctrl+C), kills (or gracefully stops) the Java process, and does your cleanup afterwards.
Example (not tested):
#!/bin/bash
ctrl_c() {
    echo "Trapped CTRL-C"
    kill -TERM "$java_pid" 2>/dev/null   # stop the Java process gracefully
}
# trap ctrl-c and call ctrl_c()
trap ctrl_c INT TERM
# Run Java in the background and wait on it: a shell only runs a trap
# after the foreground child exits, so a foreground java would block it.
java -jar "./lib/Saxon-HE-${SAXON_VER}.jar" -s:"$json_xml" -xsl:"$STYLESHEET" base-uri="$base" &
java_pid=$!
wait "$java_pid"
rm "$json_xml"   # cleanup runs whether Java finished or was killed
Add it to your Docker image and make it your entrypoint:
FROM java:8
ADD docker-entrypoint.sh /docker-entrypoint.sh
ENTRYPOINT ["/docker-entrypoint.sh"]

I use tini. Here is the reference link
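A minimal Dockerfile sketch using tini as PID 1 (how tini gets into the image here is illustrative; in practice pin a real release and verify its checksum):

```dockerfile
FROM java:8
# Assumes a tini binary is available in the build context
ADD tini /tini
RUN chmod +x /tini
# tini runs as PID 1, reaps zombies, and forwards signals to the command
ENTRYPOINT ["/tini", "--"]
CMD ["java", "-jar", "app.jar"]
```

Newer Docker can also inject tini for you with `docker run --init`, without changing the image.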

You could also build a small supervisor program that manages the Java child process and the cleanup, written for example in Java itself, Go, or Rust. These languages have proper process-control tools, and you can catch Ctrl+C and other events to stop the child process and do the cleanup. It would probably take less time than searching for an existing tool that has limited behavior anyway. It might even be worth open-sourcing it for problems like this.

Related

Docker 'exitpoint' - run a command before a container is `down`ed or `stop`ped?

Why? Because I'm propagating binds to allow a container to mount a union filesystem, and when it exits it leaves its mess behind. fusermount -uz /mount/point cleans it up, so I want that to happen on exit.
Is there any way of providing something like an exit-point or exit command for a Docker container?
I've tried appending ; echo EXITING ; myexitcmd to the entrypoint, the existing command being long-running, but it seems not to run.
This entirely makes sense, since what's running is sh -c "myentrycmd; echo EXITING; myexitcmd", and it's that shell that's getting killed, not myentrycmd within it.
So a solution need not be Docker-specific, I could alternatively phrase my question: How can I catch all 'exit' signals, and finish running my (inline) script first/instead?
I've also tried as an entrypoint:
#!/bin/sh
cleanup() {
    echo EXITING
    myexitcmd
}
trap 'cleanup' INT
myentrycmd
with STOPSIGNAL SIGINT in the Dockerfile. No cigars there either.
Use minit as an entrypoint. It runs /etc/minit/startup on container startup and /etc/minit/shutdown when a container is stopped.
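If an extra init binary is not an option, a plain-shell entrypoint can get the same effect. The catch with the trap attempt above is that a shell only runs a trap after the foreground child exits, so the main command has to run in the background. A sketch (`myentrycmd` and `myexitcmd` below are stand-in functions for the question's commands):

```shell
#!/bin/sh
# Stand-ins for the question's commands (assumptions for this demo):
myentrycmd() { sleep 1; }
myexitcmd()  { echo "cleanup ran"; }

cleanup() {
    echo EXITING
    myexitcmd
}

# Forward termination signals to the child; a foreground child would
# delay trap handling until it exits, so run it in the background.
trap 'kill -TERM "$pid" 2>/dev/null' INT TERM
myentrycmd &
pid=$!
wait "$pid"
status=$?
cleanup
# a real entrypoint would finish with: exit "$status"
```

With STOPSIGNAL SIGINT (or the default SIGTERM) in the Dockerfile, this runs the cleanup on `docker stop` as well.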

What happens to other processes when a Docker container's PID1 exits?

Consider the following, which runs sleep 60 in the background and then exits:
$ cat run.sh
sleep 60&
ps
echo Goodbye!!!
$ docker run --rm -v $(pwd)/run.sh:/run.sh ubuntu:16.04 bash /run.sh
  PID TTY          TIME CMD
    1 ?        00:00:00 bash
    5 ?        00:00:00 sleep
    6 ?        00:00:00 ps
Goodbye!!!
This will start a Docker container, with bash as PID1. It then fork/execs a sleep process, and then bash exits. When the Docker container dies, the sleep process somehow dies too.
My question is: what is the mechanism by which the sleep process is killed? I tried trapping SIGTERM in a child process, and that appears to not get tripped. My presumption is that something (either Docker or the Linux kernel) is sending SIGKILL when shutting down the cgroup the container is using, but I've found no documentation anywhere clarifying this.
EDIT The closest I've come to an explanation is the following quote from baseimage-docker:
If your init process is your app, then it'll probably only shut down itself, not all the other processes in the container. The kernel will then forcefully kill those other processes, not giving them a chance to gracefully shut down, potentially resulting in file corruption, stale temporary files, etc. You really want to shut down all your processes gracefully.
So at least according to this, the implication is that when the container exits, the kernel will send a SIGKILL to all remaining processes. But I'd still like clarity on how it decides to do that (i.e., is it a feature of cgroups?), and ideally a more authoritative source would be nice.
OK, I seem to have come up with some more solid evidence that this is, in fact, the Linux kernel doing the terminating. In the clone(2) man page, there's this useful section:
CLONE_NEWPID (since Linux 2.6.24)
    The first process created in a new namespace (i.e., the process
    created using the CLONE_NEWPID flag) has the PID 1, and is the
    "init" process for the namespace. Children that are orphaned
    within the namespace will be reparented to this process rather
    than init(8). Unlike the traditional init process, the "init"
    process of a PID namespace can terminate, and if it does, all of
    the processes in the namespace are terminated.
Unfortunately this is still vague on how exactly the processes in the namespace are terminated, but perhaps that's because, unlike a normal process exit, no entry is left in the process table. Whatever the case is, it seems clear that:
The kernel itself is killing the other processes
They are not killed in a way that allows them any chance to do cleanup, making it (almost?) identical to a SIGKILL

How to detach all processes from terminal and still get stdout in Docker?

So it's easy to detach applications if you're calling them directly using something like
myapplication &
But what if I want to call myapplication which then forks off 100 mychildapplication processes? Well, apparently the same command still works. I can run it, exit the terminal, and see that the child processes are still there.
It gets complicated when I introduce a Docker container.
If I were to run docker exec -it --user myuser mycontainer sh -c 'source ~/.bashrc; cd mydir; ./myapplication myarg' in a Docker container, the child processes get killed right away. I can hack it by appending a sleep 10000000 but of course then my terminal will hang indefinitely.
I can also use nohup, but then I don't get the stdout. disown does not work because it's not running in the background.
My workaround right now is using jpetazzo:nsenter, but I don't want to type my password when I run the command.
Note: the reason I have all this sudo stuff is because exec doesn't source bashrc. I can hack it by manually sourcing bashrc and it would work. It doesn't really impact what I'm trying to do.
TLDR: I want myapplication to print to my terminal, finish executing, and have the child processes stick around after I exit, all in a Docker container. The only way this can happen is if somehow I can "nohup" all processes associated with my terminal.
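One workaround (not from the thread; a sketch) is to start the application in its own session with `setsid`, logging to a file, so its children are no longer tied to the terminal; you then read the log back to keep the output. A self-contained demo of the detaching part:

```shell
#!/bin/sh
# Sketch: detach a worker into its own session so it survives the caller,
# while its output still lands somewhere readable.
setsid sh -c 'sleep 0.2; echo "child done"' > /tmp/detached.log 2>&1 < /dev/null &
# The launching shell could exit here; the detached child keeps running.
sleep 0.6
output=$(cat /tmp/detached.log)
echo "$output"
```

Inside Docker, the same idea applies to `docker exec`: wrap the application in `setsid ... > logfile` and stream the file (e.g. with `tail -f logfile`) instead of holding the terminal open with the process itself.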

Bash script on background: how to kill child processes

Well, I'm basically trying to make a bash script runs a node script forever. I made the following bash script:
#!/bin/bash
while true ; do
    cd /myscope/
    unlink nohup.out
    node myscript.js
    sleep 6
done & echo $! > pid
I'm expecting that when it runs, it starts up node with the given script, checks whether node has exited, sleeps for 6 seconds if so, and restarts node. I'm also expecting it to run in the background and write its PID (the bash PID) to a file called "pid".
Everything explained above apparently works as expected, but I'm also expecting that when the bash script is killed, the node script stops running. I don't know why that made sense in my mind, but in practice it doesn't: the bash script is indeed killed, but the node script keeps running, and that is freaking me out.
I've tested it in the terminal: when I don't send the bash script to the background and press Ctrl+C, both scripts get killed.
I'm obviously misunderstanding something about the way background processes work. For god's sake, can anybody help me?
There are lots of tools that let you do what you're trying to do; just two off the top of my head:
https://github.com/nodejitsu/forever - A simple CLI tool for ensuring that a given script runs continuously (i.e. forever)
https://github.com/remy/nodemon - Monitor for any changes in your node.js application and automatically restart the server - perfect for development
Maybe the second one is not what you're looking for, but it's still worth a look.
If you can't or don't want to use those, then the problem is that if you kill the parent process, the child is still there, so you should kill it too:
pkill -TERM -P $PID
where $PID is the parent PID.
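`pkill -P` only reaches direct children. Ctrl+C works because the terminal signals the whole foreground process group, so another option is to reproduce that: give the script its own process group with `setsid` and signal the negative PGID. A self-contained sketch (the inner `sh -c 'sleep 30 & wait'` stands in for the bash loop and its node child):

```shell
#!/bin/sh
# Start a group leader that itself spawns a child:
setsid sh -c 'sleep 30 & wait' &
pgid=$!            # setsid gives the new leader PID == PGID
sleep 0.3          # let it spawn its child

kill -s TERM -- "-$pgid"   # negative target: signal the whole group
sleep 0.3

# the leader (and its child) should be gone now
if kill -0 "$pgid" 2>/dev/null; then alive=yes; else alive=no; fi
echo "$alive"
```

Caveat: this relies on the launcher running without job control; in an interactive shell a background job is already a process-group leader, `setsid` forks, and `$!` would no longer equal the PGID.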

linux: suspend process at startup

I would like to spawn a process suspended, possibly in the context of another user (e.g. via sudo -u ...), set up some iptables rules for the spawned process, resume the process, and remove the iptables rules when the process exits.
Is there any standard means (bash, coreutils, etc.) that allows me to achieve the above? In particular, how can I spawn a process in a suspended state and get its PID?
Write a wrapper script start-stopped.sh like this:
#!/bin/sh
kill -STOP $$ # suspend myself
# ... until I receive SIGCONT
exec "$@" # exec the argument list
And then call it like:
sudo -u $SOME_USER start-stopped.sh mycommand & # start mycommand in stopped state
MYCOMMAND_PID=$!
setup_iptables $MYCOMMAND_PID # use its PID to setup iptables
sudo -u $SOME_USER kill -CONT $MYCOMMAND_PID # make mycommand continue
wait $MYCOMMAND_PID # wait for its termination
MYCOMMAND_EXIT_STATUS=$?
teardown_iptables # remove iptables rules
report $MYCOMMAND_EXIT_STATUS # report errors, if necessary
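With `"$@"` (the argument list) rather than `$#` (the argument count), the stop-then-continue wrapper can be exercised end to end without sudo or iptables; a minimal self-contained sketch:

```shell
#!/bin/sh
# Write the wrapper:
cat > /tmp/start-stopped.sh <<'EOF'
#!/bin/sh
kill -STOP $$   # suspend myself until someone sends SIGCONT
exec "$@"       # then become the real command, keeping this PID
EOF
chmod +x /tmp/start-stopped.sh

/tmp/start-stopped.sh sh -c 'echo ran' > /tmp/wrapped.out &
pid=$!
sleep 0.3                      # give the wrapper time to stop itself
kill -CONT "$pid"              # ...the iptables setup would go before this
wait "$pid"
result=$(cat /tmp/wrapped.out)
echo "$result"
```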
All this is overkill, however. You don't need to spawn your process in a suspended state to get the job done. Just make a wrapper script setup_iptables_and_start:
#!/bin/sh
setup_iptables $$ # use my own PID to setup iptables
exec sudo -u $SOME_USER "$@" # exec'ed command will have the same PID
And then call it like
setup_iptables_and_start mycommand || report errors
teardown_iptables
You can write a C wrapper for your program that will do something like this :
1. Fork and print the child PID.
2. In the child, wait for the user to press Enter. This puts the child to sleep, and you can add the rules using the PID.
3. Once the rules are added, the user presses Enter. The child runs your original program, either using exec or system.
Will this work?
Edit:
Actually, you can do the above procedure with a shell script. Try the following bash script:
#!/bin/bash
echo "Pid is $$"
echo -n "Press Enter.."
read
exec "$@"
You can run this as /bin/bash ./run.sh <your command>
One way to do it is to enlist gdb to pause the program at the start of its main function (using the command "break main"). This will guarantee that the process is suspended fast enough (although some initialisation routines can run before main, they probably won't do anything relevant). However, for this you will need debugging information for the program you want to start suspended.
I suggest you try this manually first, see how it works, and then work out how to script what you've done.
Alternatively, it may be possible to constrain the process (if indeed that is what you're trying to do!) without using iptables, using SELinux or a ptrace-based tool like sydbox instead.
I suppose you could write a utility yourself that forks, where the child of the fork suspends itself just before doing an exec. Otherwise, consider using an LD_PRELOAD library to do your 'custom' business.
If you care about making that secure, you should probably look at bigger guns (chroot, perhaps paravirtualization, User-Mode Linux, etc.).
Last tip: if you don't mind doing some more coding, the ptrace interface should allow you to do what you describe (it is what debuggers are implemented with).
You probably need the PID of a program you're starting before that program actually starts running. You could do it like this:
1. Start a plain script.
2. Force the script to wait. You can probably use suspend, which is a bash builtin, but in the worst case you can make it stop itself with a signal.
3. Use the PID of the bash process in every way you want.
4. Restart the stopped bash process (SIGCONT) and do an exec (another builtin), starting your real process (it will inherit the PID).
