Can an unprivileged docker container be paused from the inside? - linux

Is there a simple way I can completely pause an unprivileged docker container from the inside while retaining the ability to unpause/exec it from the outside?

TL;DR:
On a Linux container, the answer is definitely yes, because these two are equivalent:
From the host:
docker pause [container-id]
From inside the container:
kill -SIGSTOP [process-id(s)]
or, even shorter:
kill -SIGSTOP -1
Mind that:
If your process ID (PID) is 1, you fall under an edge case, because PID 1, the init process, has a specific meaning and behaviour in Linux.
Some processes might spawn child workers, as in the NGINX example below.
And these two are also equivalent:
From the host:
docker unpause [container-id]
From inside the container:
kill -SIGCONT [process-id(s)]
or, even shorter:
kill -SIGCONT -1
Also mind that, in some edge cases, this won't work. The main edge case is when the process you want to stop is PID 1: the kernel will not deliver SIGSTOP (or SIGKILL) to a PID namespace's init process from inside that namespace, so the signal is simply ignored.
In those cases, you will have to
either be a privileged user, because the cgroup freezer is exposed under a path that is read-only in Docker by default; and this will probably lead you to a dead end anyway, because once frozen you will not be able to jump into the container any more;
or run your container with the --init flag, so that PID 1 is just a wrapper process started by Docker and you no longer need to pause it in order to pause the processes running inside your container.
You can use the --init flag to indicate that an init process should be used as the PID 1 in the container. Specifying an init process ensures the usual responsibilities of an init system, such as reaping zombie processes, are performed inside the created container.
The default init process used is the first docker-init executable found in the system path of the Docker daemon process. This docker-init binary, included in the default installation, is backed by tini.
This is definitely possible for Linux containers, and is explained, to some extent, in the documentation, which points out that running docker pause [container-id] means Docker uses a mechanism equivalent to sending the SIGSTOP signal to the processes running in your container.
The docker pause command suspends all processes in the specified containers. On Linux, this uses the freezer cgroup. Traditionally, when suspending a process the SIGSTOP signal is used, which is observable by the process being suspended. With the freezer cgroup the process is unaware, and unable to capture, that it is being suspended, and subsequently resumed. On Windows, only Hyper-V containers can be paused.
See the freezer cgroup documentation for further details.
Source: https://docs.docker.com/engine/reference/commandline/pause/
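For reference, here is a quick way to observe the host-side behaviour described above (a minimal sketch; the container ID is a placeholder):
### From the host
$ docker pause [container-id]
### The container's state is now reported as "paused"
$ docker inspect -f '{{.State.Status}}' [container-id]
$ docker unpause [container-id]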
Here is an example with an NGINX Alpine container:
### For now, we are on the host machine
$ docker run -p 8080:80 -d nginx:alpine
f444eaf8464e30c18f7f83bb0d1bd07b48d0d99f9d9e588b2bd77659db520524
### Testing if NGINX answers, successful
$ curl -I -m 1 http://localhost:8080/
HTTP/1.1 200 OK
Server: nginx/1.19.0
Date: Sun, 28 Jun 2020 11:49:33 GMT
Content-Type: text/html
Content-Length: 612
Last-Modified: Tue, 26 May 2020 15:37:18 GMT
Connection: keep-alive
ETag: "5ecd37ae-264"
Accept-Ranges: bytes
### Jumping into the container
$ docker exec -ti f444eaf8464e30c18f7f83bb0d1bd07b48d0d99f9d9e588b2bd77659db520524 ash
### Now inside the container, let's see the running processes
/ # ps -o pid,vsz,rss,tty,stat,time,ruser,args
PID VSZ RSS TT STAT TIME RUSER COMMAND
1 6000 4536 ? S 0:00 root nginx: master process nginx -g daemon off;
29 6440 1828 ? S 0:00 nginx nginx: worker process
30 6440 1828 ? S 0:00 nginx nginx: worker process
31 6440 1828 ? S 0:00 nginx nginx: worker process
32 6440 1828 ? S 0:00 nginx nginx: worker process
49 1648 1052 136,0 S 0:00 root ash
55 1576 4 136,0 R 0:00 root ps -o pid,vsz,rss,tty,stat,time,ruser,args
### Now let's send the SIGSTOP signal to the workers of NGINX, as docker pause would do
/ # kill -SIGSTOP 29 30 31 32
### Running ps again just to observe the T (stopped) state of the processes
/ # ps -o pid,vsz,rss,tty,stat,time,ruser,args
PID VSZ RSS TT STAT TIME RUSER COMMAND
1 6000 4536 ? S 0:00 root nginx: master process nginx -g daemon off;
29 6440 1828 ? T 0:00 nginx nginx: worker process
30 6440 1828 ? T 0:00 nginx nginx: worker process
31 6440 1828 ? T 0:00 nginx nginx: worker process
32 6440 1828 ? T 0:00 nginx nginx: worker process
57 1648 1052 136,0 S 0:00 root ash
63 1576 4 136,0 R 0:00 root ps -o pid,vsz,rss,tty,stat,time,ruser,args
/ # exit
### Back on the host to confirm NGINX doesn't answer anymore
$ curl -I -m 1 http://localhost:8080/
curl: (28) Operation timed out after 1000 milliseconds with 0 bytes received
$ docker exec -ti f444eaf8464e30c18f7f83bb0d1bd07b48d0d99f9d9e588b2bd77659db520524 ash
### Sending the SIGCONT signal as docker unpause would do
/ # kill -SIGCONT 29 30 31 32
/ # ps -o pid,vsz,rss,tty,stat,time,ruser,args
PID VSZ RSS TT STAT TIME RUSER COMMAND
1 6000 4536 ? S 0:00 root nginx: master process nginx -g daemon off;
29 6440 1828 ? S 0:00 nginx nginx: worker process
30 6440 1828 ? S 0:00 nginx nginx: worker process
31 6440 1828 ? S 0:00 nginx nginx: worker process
32 6440 1828 ? S 0:00 nginx nginx: worker process
57 1648 1052 136,0 S 0:00 root ash
62 1576 4 136,0 R 0:00 root ps -o pid,vsz,rss,tty,stat,time,ruser,args
/ # exit
### Back on the host to confirm NGINX is back
$ curl -I http://localhost:8080/
HTTP/1.1 200 OK
Server: nginx/1.19.0
Date: Sun, 28 Jun 2020 11:56:23 GMT
Content-Type: text/html
Content-Length: 612
Last-Modified: Tue, 26 May 2020 15:37:18 GMT
Connection: keep-alive
ETag: "5ecd37ae-264"
Accept-Ranges: bytes
For the cases where the meaningful process is PID 1, and is therefore protected by the Linux kernel, you can use the --init flag when running your container, so that Docker creates a wrapper process as PID 1 that forwards signals to your application.
$ docker run -p 8080:80 -d --init nginx:alpine
e61e9158b2aab95007b97aa50bc77fff6b5c15cf3b16aa20a486891724bec6e9
$ docker exec -ti e61e9158b2aab95007b97aa50bc77fff6b5c15cf3b16aa20a486891724bec6e9 ash
/ # ps -o pid,vsz,rss,tty,stat,time,ruser,args
PID VSZ RSS TT STAT TIME RUSER COMMAND
1 1052 4 ? S 0:00 root /sbin/docker-init -- /docker-entrypoint.sh nginx -g daemon off;
7 6000 4320 ? S 0:00 root nginx: master process nginx -g daemon off;
31 6440 1820 ? S 0:00 nginx nginx: worker process
32 6440 1820 ? S 0:00 nginx nginx: worker process
33 6440 1820 ? S 0:00 nginx nginx: worker process
34 6440 1820 ? S 0:00 nginx nginx: worker process
35 1648 4 136,0 S 0:00 root ash
40 1576 4 136,0 R 0:00 root ps -o pid,vsz,rss,tty,stat,time,ruser,args
See how nginx: master process nginx -g daemon off;, which was PID 1 in the previous run, is now PID 7?
This enables us to run kill -SIGSTOP -1 and be sure all the meaningful processes are stopped, while still not being locked out of the container.
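To tie it together, here is a minimal, hypothetical entrypoint sketch for a container that pauses itself, assuming it is started with --init so that PID 1 (docker-init) stays alive:
#!/bin/sh
### Placeholder for your actual workload
do_some_work
### Stop every process we are allowed to signal; PID 1 is excluded by the
### kernel, but the calling shell itself is stopped too
kill -SIGSTOP -1
### To resume later, from the host (docker exec still works, since PID 1 is alive):
### docker exec [container-id] kill -SIGCONT -1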
While digging into this, I found this blog post, which seems like a good read on the topic: https://major.io/2009/06/15/two-great-signals-sigstop-and-sigcont/
Also related is this extract from the ps manual page about process state codes:
Here are the different values that the s, stat and state output specifiers (header "STAT" or "S") will display to describe the state of a process:
D    uninterruptible sleep (usually IO)
I    Idle kernel thread
R    running or runnable (on run queue)
S    interruptible sleep (waiting for an event to complete)
T    stopped by job control signal
t    stopped by debugger during the tracing
W    paging (not valid since the 2.6.xx kernel)
X    dead (should never be seen)
Z    defunct ("zombie") process, terminated but not reaped by its parent
For BSD formats and when the stat keyword is used, additional characters may be displayed:
<    high-priority (not nice to other users)
N    low-priority (nice to other users)
L    has pages locked into memory (for real-time and custom IO)
s    is a session leader
l    is multi-threaded (using CLONE_THREAD, like NPTL pthreads do)
+    is in the foreground process group
Source: https://man7.org/linux/man-pages/man1/ps.1.html#PROCESS_STATE_CODES

Running the docker pause command from the inside is not possible for an unprivileged container; it would need access to the Docker daemon, for example by mounting its socket.
You would need to build a custom solution. Just the basic idea: you could bind-mount a folder from the host. Inside this folder you create a file which acts as a lock. When you want to pause inside the container, you create the file; while the file exists, you actively wait/sleep. As soon as the host deletes the file at the mounted path, your code resumes. That is a rather naive approach because of the active waiting, but it should do the trick. A sketch of the idea follows below.
You can also look into inotify to avoid active waiting.
https://lwn.net/Articles/604686/
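Here is a minimal sketch of that idea, assuming the host directory is bind-mounted at /pause inside the container and that inotify-tools is installed for the inotifywait variant (both the path and the tool are assumptions, not part of the original answer):
#!/bin/sh
LOCK=/pause/paused
### Naive variant: poll for the lock file and sleep while it exists
while [ -f "$LOCK" ]; do
    sleep 1
done
### inotify variant: block until something changes in /pause, then re-check
while [ -f "$LOCK" ]; do
    inotifywait -e delete -e move -t 60 /pause >/dev/null 2>&1
done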

Related

`ps` of specific container from host

From the host, is there any way to get the ps output of a specific container?
If a container with cgroup foo has the processes bar, baz and bam,
then something like ps --cgroup-id foo should print the result of ps as if run inside the container (cgroup), as follows:
PID USER TIME COMMAND
1 root 0:00 bar
60 root 0:00 baz
206 root 0:00 bam
It doesn't have to be ps though; I hope it can be done with just one or two commands.
Thanks!
There's a docker top command, e.g.:
$ docker top 9f2
UID PID PPID C STIME TTY TIME CMD
root 20659 20621 0 Oct08 ? 00:00:00 nginx: master process nginx -g daemon off;
systemd+ 20825 20659 0 Oct08 ? 00:00:00 nginx: worker process
systemd+ 20826 20659 0 Oct08 ? 00:00:00 nginx: worker process
systemd+ 20827 20659 0 Oct08 ? 00:00:00 nginx: worker process
systemd+ 20828 20659 0 Oct08 ? 00:00:00 nginx: worker process
systemd+ 20829 20659 0 Oct08 ? 00:00:00 nginx: worker process
systemd+ 20830 20659 0 Oct08 ? 00:00:00 nginx: worker process
systemd+ 20831 20659 0 Oct08 ? 00:00:00 nginx: worker process
systemd+ 20832 20659 0 Oct08 ? 00:00:00 nginx: worker process
Or you can exec into the container if the container ships with ps:
docker exec $container_name ps
And if ps isn't included in the container, you can run a different container in the same pid namespace:
$ docker run --pid container:9f2 busybox ps -ef
PID USER TIME COMMAND
1 root 0:00 nginx: master process nginx -g daemon off;
23 101 0:00 nginx: worker process
24 101 0:00 nginx: worker process
25 101 0:00 nginx: worker process
26 101 0:00 nginx: worker process
27 101 0:00 nginx: worker process
28 101 0:00 nginx: worker process
29 101 0:00 nginx: worker process
30 101 0:00 nginx: worker process
31 root 0:00 ps -ef
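If you do this often, you could wrap that last approach in a small shell function (a hypothetical helper, not part of the original answer; the busybox image is an assumption):
### Usage: cps <container-name-or-id>
cps() {
    docker run --rm --pid "container:$1" busybox ps -ef
}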

Killing subprocess from inside a Docker container kills the entire container

On my Windows machine, I started a Docker container from docker compose. My entrypoint is a Go filewatcher that runs a task of a task manager on every file change. The executed task builds and runs the Go program.
But before I can build and run the program again after a file change, I have to kill the previously running version. But every time I kill the app process, the container is gone too.
The goal is to kill only the svc1 process with PID 74 in this example. I tried pkill -9 svc1 and kill $(pgrep svc1), but every time the parent processes are killed too.
The commandline output from inside the container:
root@bf073c39e6a2:/app/cmd/svc1# ps -aux
USER PID %CPU %MEM VSZ RSS TTY STAT START TIME COMMAND
root 1 2.5 0.0 104812 2940 ? Ssl 13:38 0:00 /go/bin/watcher
root 13 0.0 0.0 294316 7576 ? Sl 13:38 0:00 /go/bin/task de
root 74 0.0 0.0 219284 4908 ? Sl 13:38 0:00 /svc1
root 82 0.2 0.0 18184 3160 pts/0 Ss 13:38 0:00 /bin/bash
root 87 0.0 0.0 36632 2824 pts/0 R+ 13:38 0:00 ps -aux
root@bf073c39e6a2:/app/cmd/svc1# ps -afx
PID TTY STAT TIME COMMAND
82 pts/0 Ss 0:00 /bin/bash
88 pts/0 R+ 0:00 \_ ps -afx
1 ? Ssl 0:01 /go/bin/watcher -cmd /go/bin/task dev -startcmd
13 ? Sl 0:00 /go/bin/task dev
74 ? Sl 0:00 \_ /svc1
root@bf073c39e6a2:/app/cmd/svc1# pkill -9 svc1
root@bf073c39e6a2:/app/cmd/svc1#
Switching to the container log:
task: Failed to run task "dev": exit status 255
2019/08/16 14:20:21 exit status 1
"dev" is the name of the task in the taskmanger.
The Dockerfile:
FROM golang:stretch
RUN go get -u -v github.com/radovskyb/watcher/... \
&& go get -u -v github.com/go-task/task/cmd/task
WORKDIR /app
COPY ./Taskfile.yml ./Taskfile.yml
ENTRYPOINT ["/go/bin/watcher", "-cmd", "/go/bin/task dev", "-startcmd"]
I expect only the process with the target PID to be killed, and not the parent process that spawned it.
You can use a process manager like supervisord and configure it to re-execute your script or command even after you kill its process, which will keep your container up and running.
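A minimal sketch of that idea, assuming supervisord is installed in the image (the config path follows the Debian/Ubuntu packaging convention, and the program name and command are placeholders taken from the question):
### Write a minimal program definition for supervisord
cat > /etc/supervisor/conf.d/svc1.conf <<'EOF'
[program:svc1]
command=/svc1
autostart=true
autorestart=true
EOF
### Run supervisord in the foreground as the container's main process
supervisord -n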

Apache2: "Address already in use" when trying to start it ('httpd.pid' issue?)

Using Apache2 on Linux, I get this error message when trying to start it.
$ sudo /usr/local/apache2/bin/apachectl start
httpd not running, trying to start
(98)Address already in use: make_sock: unable to listen for connections on address 127.0.0.1:80
no listening sockets available, shutting down
Unable to open logs
$ sudo /usr/local/apache2/bin/apachectl stop
httpd (no pid file) not running
Some facts:
This is one of the last lines in my Apache logs:
[Mon Jun 19 18:29:01 2017] [warn] pid file /usr/local/apache2/logs/httpd.pid overwritten -- Unclean shutdown of previous Apache run?
My '/usr/local/apache2/conf/httpd.conf' contains
Listen 127.0.0.1:80
I have "Listen 80" configured at '/etc/apache2/ports.conf'
Disk is not full
I've checked that I do not have two or more "Listen" at '/usr/local/apache2/conf/httpd.conf'
Some outputs:
$ sudo ps -ef | grep apache
root 1432 1 0 17:35 ? 00:00:00 /usr/sbin/apache2 -k start
www-data 1435 1432 0 17:35 ? 00:00:00 /usr/sbin/apache2 -k start
www-data 1436 1432 0 17:35 ? 00:00:00 /usr/sbin/apache2 -k start
myuserr 1775 1685 0 17:37 pts/1 00:00:00 grep --color=auto apache
$ sudo grep -ri listen /etc/apache2
/etc/apache2/apache2.conf:# supposed to determine listening ports for incoming connections which can be
/etc/apache2/apache2.conf:# Include list of ports to listen on
/etc/apache2/ports.conf:Listen 80
/etc/apache2/ports.conf: Listen 443
/etc/apache2/ports.conf: Listen 443
What can I do to restart Apache? Should I repair 'httpd.pid'?
This error means that something is already using port 80.
If you really don't have two Listen 80 lines in your Apache configuration, then run this command to see what is using port 80: netstat -antp | grep 80.
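A couple of equivalent checks, in case netstat is not available (a sketch; pick whichever tools are installed):
$ sudo ss -ltnp 'sport = :80'    # which process is listening on port 80
$ sudo fuser -v 80/tcp           # alternative: list the PIDs bound to 80/tcp
$ sudo lsof -i tcp:80            # alternative: lsof view of the same socket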
I fixed it by killing the three processes
root 1621 1 0 18:46 ? 00:00:00 /usr/sbin/apache2 -k start
www-data 1624 1621 0 18:46 ? 00:00:00 /usr/sbin/apache2 -k start
www-data 1625 1621 0 18:46 ? 00:00:00 /usr/sbin/apache2 -k start
However, each time I reboot my server, I must kill these processes. What is starting them?

Can't stop/restart Apache2 service

Trying to stop the Apache2 service, but I get a PID error:
#service apache2 stop
[FAIL] Stopping web server: apache2 failed!
[....] There are processes named 'apache2' running which do not match your pid file which are left untouched in the name of safety, Please review the situation by hand. ... (warning).
Trying to kill those processes:
#kill -9 $(ps aux | grep apache2 | awk '{print $2}')
but they get re-spawned again:
#ps aux | grep apache2
root 19279 0.0 0.0 4080 348 ? Ss 05:10 0:00 runsv apache2
root 19280 0.0 0.0 4316 648 ? S 05:10 0:00 /bin/sh /usr/sbin/apache2ctl -D FOREGROUND
root 19282 0.0 0.0 91344 5424 ? S 05:10 0:00 /usr/sbin/apache2 -D FOREGROUND
www-data 19284 0.0 0.0 380500 2812 ? Sl 05:10 0:00 /usr/sbin/apache2 -D FOREGROUND
www-data 19285 0.0 0.0 380500 2812 ? Sl 05:10 0:00 /usr/sbin/apache2 -D FOREGROUND
And though the processes are running, I can't connect to the server on port 80. /var/log/apache2/error.log.1 has no new messages when I do the kill -9.
Before I tried to restart, everything worked perfectly.
Running on Debian: Linux adara 3.2.0-4-amd64 #1 SMP Debian 3.2.54-2 x86_64 GNU/Linux
UPD:
also tried apache2ctl:
#/usr/sbin/apache2ctl -k stop
AH00526: Syntax error on line 76 of /etc/apache2/apache2.conf:
PidFile takes one argument, A file for logging the server process ID
Action '-k stop' failed.
The Apache error log may have more information.
but there is no pid file in /var/run/apache2.
I'm new to Linux; it looks like it has something to do with startup scripts, but I can't figure out what exactly.
Below is the command to find out which process is running on port 80:
lsof -i tcp:80
Kill the process with that PID. Restart the system once to check whether any startup script is executing and using port 80, preventing you from starting your service.
For startup scripts you can check
/etc/init.d/, /etc/rc.local or crontab -e
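To trace what keeps re-spawning a process, checking its parent can also help; a short sketch (the PIDs are placeholders):
### Find the parent PID of one of the re-spawned apache2 processes
$ ps -o ppid= -p [apache2-pid]
### Then see what that parent process actually is
$ ps -fp [parent-pid]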
You can also try the official Apache documentation for stop/restart operations.

Vagrant provisioning script runs before VM is fully up?

I have a Vagrant box that I am preparing, along with a shell script as the provisioning file. I have spent a couple of days getting it working, and it now seems stable and complete. The base box is Ubuntu 12.04 (32-bit), with Postgres, Redis and Memcached running on the VM. The provisioning script sets up Nginx configuration, creates a blank database, and does some basic housekeeping.
When I came to package the VM, and attempt to re-run it on a different machine at home, I kept coming across a problem on the first run (vagrant up) whereby none of the services were running - and so my attempt to run dropdb or createdb failed.
Digging into why this occurred (and I'm an ex-Windows guy, so this took some doing) I found myself in the bowels of run levels and the /etc/rc[0-6,S].d files.
I have relevant S (start) files for the three services I'm interested in:
vagrant@precise32:~$ ls -l /etc/rc2.d
total 4
-rw-r--r-- 1 root root 677 Apr 14 2012 README
lrwxrwxrwx 1 root root 20 Dec 29 10:05 S19postgresql -> ../init.d/postgresql
lrwxrwxrwx 1 root root 19 Dec 29 10:05 S20memcached -> ../init.d/memcached
lrwxrwxrwx 1 root root 15 Dec 29 10:05 S20nginx -> ../init.d/nginx
lrwxrwxrwx 1 root root 22 Dec 29 10:05 S20redis-server -> ../init.d/redis-server
...
and K files for run level 0 (shutdown), and so all seems in order:
vagrant@precise32:~$ ls -l /etc/rc0.d
total 4
lrwxrwxrwx 1 root root 19 Dec 29 10:05 K20memcached -> ../init.d/memcached
lrwxrwxrwx 1 root root 15 Dec 29 10:05 K20nginx -> ../init.d/nginx
lrwxrwxrwx 1 root root 22 Dec 29 10:05 K20redis-server -> ../init.d/redis-server
lrwxrwxrwx 1 root root 20 Dec 29 10:05 K21postgresql -> ../init.d/postgresql
....
This seemed to suggest that the underlying VM runlevel wasn't 2, and so in order to debug this issue, I created a new provisioning script to output a.) the runlevel at the time of provisioning, and b.) whether the expected processes were running (memcache, postgres, redis):
ps aux | grep memcache
ps aux | grep postgres
ps aux | grep redis
# expected output is 'N 2'
runlevel
I ran vagrant destroy, then vagrant up on this, and the result is as follows:
[default] Running provisioner: Vagrant::Provisioners::Shell...
root 791 0.0 0.2 4624 840 ? S 10:33 0:00 grep memcache
root 793 0.0 0.2 4624 836 ? S 10:33 0:00 grep postgres
root 795 0.0 0.2 4624 840 ? S 10:33 0:00 grep redis
unknown
i.e. the services aren't running at the time the provisioning script is run, and more confusingly, the runlevel command isn't even recognised.
If I then repeatedly re-run the provisioning script on the running VM, using vagrant provision, I get the same results for the first few times I run it, and then eventually (after 2-3mins) I see what I had expected first time round:
[default] Running provisioner: Vagrant::Provisioners::Shell...
memcache 1103 0.2 0.2 46336 1072 ? Sl 10:56 0:00 /usr/bin/memcached -m 64 -p 11211 -u memcache -l 127.0.0.1
root 1267 0.0 0.2 4624 840 ? S 10:56 0:00 grep memcache
postgres 1073 13.0 2.0 50440 7828 ? S 10:56 0:02 /usr/lib/postgresql/9.1/bin/postgres -D /var/lib/postgresql/9.1/main -c config_file=/etc/postgresql/9.1/main/postgresql.conf
postgres 1077 0.3 0.3 50440 1248 ? Ss 10:56 0:00 postgres: writer process
postgres 1078 0.3 0.3 50440 1244 ? Ss 10:56 0:00 postgres: wal writer process
postgres 1079 0.1 0.6 50860 2296 ? Ss 10:56 0:00 postgres: autovacuum launcher process
postgres 1080 0.0 0.3 20640 1284 ? Ss 10:56 0:00 postgres: stats collector process
root 1269 0.0 0.2 4624 836 ? S 10:56 0:00 grep postgres
redis 1123 0.6 0.2 3292 1036 ? Ss 10:56 0:00 /usr/bin/redis-server /etc/redis/redis.conf
root 1271 0.0 0.2 4624 840 ? S 10:56 0:00 grep redis
N 2
It looks like it's just taking a bit of time for everything to come up, which makes sense; however, this presents a huge problem for me in that the provisioning script will always fail the first time.
Is this a known situation, and if so, what is the solution? Ideally, the provisioning script would pause until the runlevel had changed to 2, i.e. the box was ready to accept the shell commands.
[UPDATE: HACK]
I have managed to work around this issue by hacking together the following script:
while [ "`runlevel`" = "unknown" ]; do
echo "runlevel is 'unknown' - waiting for 10s"
sleep 10
done
echo "runlevel is now valid ('`runlevel`'), kicking off provisioning..."
I have saved this as 'pre-provision.sh' and my Vagrantfile now looks like:
# Enable provisioning with a shell script.
config.vm.provision :shell, :path => "pre-provision.sh"
config.vm.provision :shell, :path => "provision.sh", :args => "myapp"
which gives the following output:
[default] Running provisioner: Vagrant::Provisioners::Shell...
runlevel is 'unknown' - waiting for 10s
runlevel is 'unknown' - waiting for 10s
runlevel is 'unknown' - waiting for 10s
runlevel is 'unknown' - waiting for 10s
runlevel is 'unknown' - waiting for 10s
runlevel is 'unknown' - waiting for 10s
runlevel is 'unknown' - waiting for 10s
runlevel is 'unknown' - waiting for 10s
runlevel is 'unknown' - waiting for 10s
runlevel is 'unknown' - waiting for 10s
runlevel is 'unknown' - waiting for 10s
runlevel is now valid ('N 2'), kicking off provisioning...
[default] Running provisioner: Vagrant::Provisioners::Shell...
...
and then the original provision.sh runs, and everything is OK.
I have not marked this as the answer (although it is an answer) because I still want to know what I should have done - this cannot be the way it works, surely?
It turns out that the easiest way to do this is to look for the PID file of the relevant process (see this article for an explanation of the pid file: What is a .pid file and what does it contain?).
NGINX_PID=/var/run/nginx.pid
...
## set up nginx configuration using .config from shared directory
if [ ! -f "$NGINX_PID" ]; then
    echo "---> Waiting for the Nginx process to spin up"
    while [ ! -f "$NGINX_PID" ]; do
        echo .
        sleep 1
    done
fi
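A slightly stricter variant (a sketch, not from the original answer) also checks that the PID recorded in the file belongs to a live process, not just that the file exists:
while [ ! -s "$NGINX_PID" ] || ! kill -0 "$(cat "$NGINX_PID")" 2>/dev/null; do
    echo "---> Still waiting for Nginx..."
    sleep 1
done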
