Problem
Docker containers started via the Jenkins pipeline command
docker.image(imageToStart).inside('--init')
cannot be stopped due to zombie processes left behind by the container.
Questions
How is it possible to get zombie processes from a Docker container when it was started with the '--init' option?
Has anyone else hit the same issue?
Used environment
Docker 18.03.1-ce
Jenkins 2.60.2
Docker Pipeline plugin 1.12
Details
When a container is started from a Jenkins pipeline with a command like:
docker.image('alpine').inside('--init') {
sh ('ps -efa -o pid,ppid,user,comm')
}
There are several processes in this container with parent PID 0:
[Pipeline] withDockerContainer
loco does not seem to be running inside a container
$ docker run -t -d -u 1001:1002 \
--init \
-w /lhome/ciadmin/jenkins/workspace/bli-groovy-test \
-v /lhome/ciadmin/jenkins/workspace/bli-groovy-test:/lhome/ciadmin/jenkins/workspace/bli-groovy-test:rw,z \
-v /lhome/ciadmin/jenkins/workspace/bli-groovy-test-tmp:/lhome/ciadmin/jenkins/workspace/bli-groovy-test-tmp:rw,z \
-e ******** \
--entrypoint cat alpine
[Pipeline] {
[Pipeline] sh
[bli-groovy-test] Running shell script
+ ps -efa -o pid,ppid,user,comm
PID PPID USER COMMAND
1 0 1001 init
7 1 1001 cat
8 0 1001 sh
14 8 1001 script.sh
15 14 1001 ps
[Pipeline] }
PID 1 / PPID 0 is the 'init' command used to start the container
PID 8 / PPID 0 is the 'sh' command from the closure to execute 'ps' command
The 'sh' process does not reap its child processes. When it exits, its descendants are not reparented to PID 1 (the 'init' process of the container) but to a process outside the container, which shows up inside the container as PPID 0.
Seen from the host, the new parent is the 'docker-containerd-shim' process of the container.
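From the host, the shim that owns a container's processes can be identified roughly like this (the container name is a placeholder; the shim's command line carries the full container ID, as visible in the listing further below):
CONTAINER_ID=$(docker inspect --format '{{.Id}}' my-container)
ps -eo pid,ppid,args | grep docker-containerd-shim | grep "$CONTAINER_ID"
# processes whose PPID equals the shim's PID sit outside the container's own init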
With the small example I could not reproduce the zombie processes, but here is the situation from a more complex Jenkins job:
Docker command from Jenkins job
$ docker run -t -d -u 1001:1002 \
--init \
-w /lhome/testadmin/jenkins-coreloops/workspace/test-job/database \
-v /lhome/testadmin/jenkins-coreloops/workspace/test-job/database:/lhome/testadmin/jenkins-coreloops/workspace/test-job/database:rw,z \
-v /lhome/testadmin/jenkins-coreloops/workspace/test-job/database-tmp:/lhome/testadmin/jenkins-coreloops/workspace/test-job/database-tmp:rw,z \
-e ******** \
--entrypoint cat richmond.lhs-systems.com:5000/ait/mpde
[Pipeline] {
[Pipeline] sh
10:03:09 [database] Running shell script
10:03:09 + ./db-upgrade.sh
- inside the container, shell scripts are started
- the shell scripts call Perl scripts
- the Perl scripts start SQL*Plus (one instance per database login)
- the Perl scripts send SQL commands to the SQL*Plus instances via STDIN
When the closure ends and Jenkins tries to stop the container, the following processes are left:
[testadmin#testhost] ~ # ps -efa | grep -vw grep | grep -w 47077
root 1725 47077 0 10:03 ? 00:00:00 [ps] <defunct>
root 1732 47077 0 10:03 ? 00:00:00 [docker-runc] <defunct>
root 2887 47077 0 10:04 ? 00:00:00 [sqlplus] <defunct>
root 2915 47077 0 10:04 ? 00:00:00 [sqlplus] <defunct>
root 47077 17349 0 10:03 ? 00:00:00 docker-containerd-shim
-namespace moby
-workdir /var/lib/docker/containerd/daemon/io.containerd.runtime.v1.linux/moby/1863503ca54f75168db8ce20c78b821c0e5280f07d59875e8f651db4f0b67d9f
-address /var/run/docker/containerd/docker-containerd.sock
-containerd-binary /usr/bin/docker-containerd
-runtime-root /var/run/docker/runtime-runc
root 47098 47077 0 10:03 pts/0 00:00:00 /dev/init -- cat
root 47506 47077 0 10:03 ? 00:00:00 [sh] <defunct>
[testadmin#testhost] ~ #
and the 'docker stop' command is aborted after its 180-second timeout.
To clean up the remaining processes of the container, this docker-containerd-shim process has to be killed with SIGKILL.
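In shell terms the manual cleanup is roughly this (47077 is the shim PID from the listing above and is of course specific to this host):
# forcefully terminate the stuck docker-containerd-shim; its leftover defunct children
# are then reparented to the host's init and reaped
kill -9 47077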
Note
We observed this issue on our recently installed CentOS server:
- CentOS Linux release 7.5.1804 (Core)
- environment related parts from 'docker info':
Server Version: 18.03.1-ce
Storage Driver: overlay2
Backing Filesystem: extfs
Supports d_type: true
Native Overlay Diff: true
Logging Driver: json-file
Cgroup Driver: cgroupfs
Plugins:
Volume: local
Network: bridge host macvlan null overlay
Log: awslogs fluentd gcplogs gelf journald json-file logentries splunk syslog
Swarm: inactive
Runtimes: runc
Default Runtime: runc
Init Binary: docker-init
containerd version: 773c489c9c1b21a6d78b5c538cd395416ec50f88
runc version: 4fc53a81fb7c994640722ac585fa9ca548971871
init version: 949e6fa
Security Options:
seccomp
Profile: default
Kernel Version: 3.10.0-862.6.3.el7.x86_64
Operating System: CentOS Linux 7 (Core)
OSType: linux
Architecture: x86_64
CPUs: 48
Total Memory: 377.6GiB
Docker Root Dir: /var/lib/docker
Debug Mode (client): false
Debug Mode (server): false
Registry: https://index.docker.io/v1/
Experimental: false
Insecure Registries:
127.0.0.0/8
Live Restore Enabled: false
The behavior on our other Docker hosts is similar with respect to the multiple processes with parent PID 0, but there we did not observe containers hanging on shutdown or a similar number of zombie processes.
For comparison, here is the corresponding 'docker info' extract from one of these other hosts:
Server Version: 17.05.0-ce
Storage Driver: devicemapper
Pool Name: dock-thinpool
Pool Blocksize: 524.3kB
Base Device Size: 16.11GB
Backing Filesystem: xfs
Data file:
Metadata file:
Udev Sync Supported: true
Deferred Removal Enabled: true
Deferred Deletion Enabled: true
Deferred Deleted Device Count: 0
Library Version: 1.02.140-RHEL7 (2017-05-03)
Logging Driver: json-file
Cgroup Driver: cgroupfs
Plugins:
Volume: local
Network: bridge host macvlan null overlay
Swarm: inactive
Runtimes: runc
Default Runtime: runc
Init Binary: docker-init
containerd version: 9048e5e50717ea4497b757314bad98ea3763c145
runc version: 9c2d8d184e5da67c95d601382adf14862e4f2228
init version: 949e6fa
Security Options:
seccomp
Profile: default
Kernel Version: 3.10.0-693.11.6.el7.x86_64
Operating System: CentOS Linux 7 (Core)
OSType: linux
Architecture: x86_64
CPUs: 48
Total Memory: 377.6GiB
Docker Root Dir: /var/lib/docker
Debug Mode (client): false
Debug Mode (server): false
Registry: https://index.docker.io/v1/
Experimental: false
Insecure Registries:
127.0.0.0/8
Live Restore Enabled: false
Edit 2018-Aug-01:
As a workaround I added an init process as subreaper to the problematic 'docker exec' calls and the problematic 'sh' calls in the Jenkins docker().inside() closure as well.
This eliminated the zombie processes in our environment.
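The exact call depends on the job, but the idea is roughly the following; it assumes the container's /dev/init (docker-init/tini, see the 'ps' output above) is used to wrap the command, with -s registering it as a subreaper since it is not PID 1 here (container name and script are placeholders):
# run the problematic command under an init that reaps its orphaned descendants
docker exec my-container /dev/init -s -- ./db-upgrade.sh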
Related
I enabled user namespace isolation as described on the Arch Wiki and in the Docker documentation.
However, when I try to use a bind mount:
# this is fish shell
$ docker run --rm -it -v (pwd)/bmount/:/bmount alpine sh
I get:
docker: Error response from daemon: failed to create shim: OCI runtime
create failed: container_linux.go:380: starting container process
caused: process_linux.go:545: container init caused:
rootfs_linux.go:76: mounting "xxxx/bmount" to rootfs at "/bmount"
caused: stat xxxx/bmount: permission denied: unknown.
I managed to get a regular Docker volume to work without issue. As far as I understand this setup, the bind mount should work as long as the container has rights on the folder, right?
Here are some more details:
I created a dockremap user and group, both with a UID/GID of 200000.
Here is the content of /etc/subuid and /etc/subgid
dockremap:200000:65536
dockremap:200000:65536
And docker is configured to use user namespace isolation, here is the /etc/docker/daemon.json
{ "userns-remap": "dockremap" }
I made sure the bmount folder I'm trying to bind mount is chown to dockremap
drwxr-xr-x 2 dockremap dockremap 4,0K 24 juin 13:08 bmount
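After changing daemon.json the daemon needs a restart; a quick way to confirm the remap is active is to query docker info (the expected values match the output below):
sudo systemctl restart docker
docker info --format '{{.SecurityOptions}}'   # should include name=userns
docker info --format '{{.DockerRootDir}}'     # should end in .../200000.200000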
docker info gives this:
Client:
Context: default
Debug Mode: false
Plugins:
app: Docker App (Docker Inc., v0.9.1-beta3)
buildx: Build with BuildKit (Docker Inc., v0.5.1-tp-docker)
Server:
Containers: 0
Running: 0
Paused: 0
Stopped: 0
Images: 2
Server Version: 20.10.7
Storage Driver: overlay2
Backing Filesystem: extfs
Supports d_type: true
Native Overlay Diff: false
userxattr: false
Logging Driver: json-file
Cgroup Driver: systemd
Cgroup Version: 2
Plugins:
Volume: local
Network: bridge host ipvlan macvlan null overlay
Log: awslogs fluentd gcplogs gelf journald json-file local logentries splunk syslog
Swarm: inactive
Runtimes: io.containerd.runtime.v1.linux runc io.containerd.runc.v2
Default Runtime: runc
Init Binary: docker-init
containerd version: 36cc874494a56a253cd181a1a685b44b58a2e34a.m
runc version: b9ee9c6314599f1b4a7f497e1f1f856fe433d3b7
init version: de40ad0
Security Options:
seccomp
Profile: default
userns
cgroupns
Kernel Version: 5.12.11-arch1-1
Operating System: Arch Linux
OSType: linux
Architecture: x86_64
CPUs: 2
Total Memory: 3.735GiB
Name: CRIQUET
ID: AT7Q:T3HO:HBFF:H7AM:XABE:KY4B:AQ2H:POBZ:YUE3:XGOT:4ROB:SDUX
Docker Root Dir: /var/lib/docker/200000.200000
Debug Mode: false
Registry: https://index.docker.io/v1/
Labels:
Experimental: false
Insecure Registries:
127.0.0.0/8
Live Restore Enabled: false
$ uname -a
Linux CRIQUET 5.12.11-arch1-1 #1 SMP PREEMPT Wed, 16 Jun 2021 15:25:28
+0000 x86_64 GNU/Linux
I am trying to install the Sentry on-premise Docker setup on my Linux machine. After cloning its repository:
git clone https://github.com/getsentry/onpremise
I ran:
$ ./install.sh
but I got this error:
alt#mx-alt:/mnt/Software/Linux/sentry/onpremise
$ ./install.sh
Checking minimum requirements...
FAIL: Expected minimum RAM available to Docker to be 2400 MB but found MB
this is my docker info:
$ sudo docker info
Client:
Debug Mode: false
Server:
Containers: 1
Running: 0
Paused: 0
Stopped: 1
Images: 1
Server Version: 19.03.13
Storage Driver: overlay2
Backing Filesystem: extfs
Supports d_type: true
Native Overlay Diff: true
Logging Driver: json-file
Cgroup Driver: cgroupfs
Plugins:
Volume: local
Network: bridge host ipvlan macvlan null overlay
Log: awslogs fluentd gcplogs gelf journald json-file local logentries splunk syslog
Swarm: inactive
Runtimes: runc
Default Runtime: runc
Init Binary: docker-init
containerd version: 8fba4e9a7d01810a393d5d25a3621dc101981175
runc version: dc9208a3303feef5b3839f4323d9beb36df0a9dd
init version: fec3683
Security Options:
seccomp
Profile: default
Kernel Version: 4.19.0-12-amd64
Operating System: Debian GNU/Linux 10 (buster)
OSType: linux
Architecture: x86_64
CPUs: 4
Total Memory: 15.63GiB
Name: mx-alt
ID: DRNU:OLX2:5VCT:GPNW:I3OV:4OHB:43UU:OVZL:OH5Y:5A2U:7MJA:SBHU
Docker Root Dir: /var/lib/docker
Debug Mode: false
Registry: https://index.docker.io/v1/
Labels:
Experimental: false
Insecure Registries:
127.0.0.0/8
Live Restore Enabled: false
How can I increase the RAM available to Docker on Linux?
I'm writing down my answer; maybe it will help someone!
1 - I pulled busybox first:
$ docker pull busybox
Using default tag: latest
latest: Pulling from library/busybox
5f5dd3e95e9f: Pull complete
Digest: sha256:9f1c79411e054199210b4d489ae600a061595967adb643cd923f8515ad8123d2
Status: Downloaded newer image for busybox:latest
docker.io/library/busybox:latest
alt#mx-alt:~
$ sudo docker run --rm busybox free -m 2
total used free shared buff/cache available
Mem: 16009 2176 11539 208 2293 13350
Swap: 8191 0 8191
alt#mx-alt:~
2 - Then I ran ./install.sh again:
Done.
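For what it's worth, my guess is that the pre-flight check reads the RAM through a busybox container, so when the image cannot be run it gets an empty value and prints "found MB". Something like this reproduces the number it should see (only a guess at what install.sh does internally):
sudo docker run --rm busybox free -m | awk '/^Mem/ {print $2}'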
I am new to Docker and, for learning purposes, I followed the official Node.js Docker instructions, but it keeps throwing an error on the same command.
I am trying to build the Docker image on a Raspberry Pi that I want to use as a server, but the build fails so I cannot run it.
This is my Dockerfile:
FROM node:4.3.2
WORKDIR /app
RUN npm install
EXPOSE (80)
CMD ["node", "index.js"]
ERROR
docker build -t hello-world .
Sending build context to Docker daemon 2.212MB
Step 1/5 : FROM node:4.3.2
---> 3538b8c69182
Step 2/5 : WORKDIR /app
---> Using cache
---> 7b8a5c56f23d
Step 3/5 : RUN npm install
---> Running in bbd6026d01d9
standard_init_linux.go:190: exec user process caused "exec format error"
The command '/bin/sh -c npm install' returned a non-zero code: 1
and docker version
Containers: 19
Running: 0
Paused: 0
Stopped: 19
Images: 10
Server Version: 18.06.1-ce
Storage Driver: overlay2
Backing Filesystem: extfs
Supports d_type: true
Native Overlay Diff: true
Logging Driver: json-file
Cgroup Driver: cgroupfs
Plugins:
Volume: local
Network: bridge host macvlan null overlay
Log: awslogs fluentd gcplogs gelf journald json-file logentries splunk syslog
Swarm: inactive
Runtimes: runc
Default Runtime: runc
Init Binary: docker-init
containerd version: 468a545b9edcd5932818eb9de8e72413e616e86e
runc version: 69663f0bd4b60df09991c08812a60108003fa340
init version: fec3683
Security Options:
seccomp
Profile: default
Kernel Version: 4.14.62-v7+
Operating System: Raspbian GNU/Linux 9 (stretch)
OSType: linux
Architecture: armv7l
CPUs: 4
Total Memory: 976.7MiB
Name: raspi2
ID: MJNK:BGTA:EFDS:B7VD:QZIL:T65S:IJRJ:ZO74:RG6D:BITS:AZNB:LDSC
Docker Root Dir: /var/lib/docker
Debug Mode (client): false
Debug Mode (server): false
Registry: https://index.docker.io/v1/
Labels:
Experimental: false
Insecure Registries:
127.0.0.0/8
Live Restore Enabled: false
WARNING: No memory limit support
WARNING: No swap limit support
WARNING: No kernel memory limit support
WARNING: No oom kill disable support
WARNING: No cpu cfs quota support
WARNING: No cpu cfs period support
The RUN instruction has two forms.
In your case you have chosen the shell form, in which the RUN statement is executed as /bin/sh -c <command> (npm install here).
Unfortunately, it returns a non-zero exit code, which is what a shell command does when an error occurs.
I am not familiar with npm, but I can see that the error is not related to Docker itself; it comes from the command being run.
Maybe it is the syntax or the npm version issue mentioned in the other comment; it is not about the Docker platform.
You could just upgrade the Node image as the comment above said. Hope this helps~
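As a rough sketch of the kind of upgrade meant above (the node tag is only an example and assumes one that is published for the Pi's armv7 architecture; the package files also need to be copied in before npm install has anything to install):
FROM node:10
WORKDIR /app
# copy the dependency manifests first so 'npm install' has a package.json to work with
COPY package*.json ./
RUN npm install
# copy the rest of the application
COPY . .
EXPOSE 80
CMD ["node", "index.js"]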
I'm new to Docker, and I tried to find a solution on Google before asking this question, with no result.
I decided to learn Docker via a practical use case: creating a PostgreSQL container on my VM instance for a development environment.
I was on vacation and didn't check my server for several days. Later I tried to connect to my DB and couldn't: all of my active containers had exited with code 128.
I tried to start the DB container again with docker start django-postgres and got the error message: Error response from daemon: OCI runtime create failed: container with id exists: 5c11e724bf52dd1cb6fd10ebda40710385e412981eb269c30071ecc8aac9e805: unknown
Error: failed to start containers: django-postgres
I suspect that somewhere in my system Docker keeps some metadata of my container which wasn't removed after the container went down with code 128, but my knowledge of Unix isn't enough to determine where that could be. Also, I'm afraid of losing the DB data connected with the container.
Some technical info:
docker version:
Version: 18.03.0-ce
API version: 1.37
Go version: go1.9.4
Git commit: 0520e24
Built: Wed Mar 21 23:10:01 2018
OS/Arch: linux/amd64
Experimental: false
Orchestrator: swarm
docker info
Containers: 9
Running: 2
Paused: 0
Stopped: 7
Images: 5
Server Version: 18.03.0-ce
Storage Driver: overlay2
Backing Filesystem: extfs
Supports d_type: true
Native Overlay Diff: true
Logging Driver: json-file
Cgroup Driver: cgroupfs
Plugins:
Volume: local
Network: bridge host macvlan null overlay
Log: awslogs fluentd gcplogs gelf journald json-file logentries splunk syslog
Swarm: inactive
Runtimes: runc
Default Runtime: runc
Init Binary: docker-init
containerd version: cfd04396dc68220d1cecbe686a6cc3aa5ce3667c
runc version: 4fc53a81fb7c994640722ac585fa9ca548971871
init version: 949e6fa
Security Options:
apparmor
seccomp
Profile: default
Kernel Version: 4.4.0-116-generic
Operating System: Ubuntu 16.04.4 LTS
OSType: linux
Architecture: x86_64
CPUs: 1
Total Memory: 488.3MiB
ID: NDUH:OH24:4M4L:TR5O:TOIH:ARV4:LNRP:6QNE:WEYW:TMXR:7KNK:ZPDD
Docker Root Dir: /var/lib/docker
Debug Mode (client): false
Debug Mode (server): false
Registry: https://index.docker.io/v1/
Labels:
Experimental: false
Insecure Registries:
127.0.0.0/8
Live Restore Enabled: false
WARNING: No swap limit support
Can anyone help me understand this issue and how to fix it without losing data?
N.B. The second container that exited with code 128 was OpenVPN. I can't restart it either, but the error is different: cgroups: cannot found cgroup mount destination: unknown
I found a solution for that one here (GitHub):
Temp fix is:
sudo mkdir /sys/fs/cgroup/systemd
sudo mount -t cgroup -o none,name=systemd cgroup /sys/fs/cgroup/systemd
This fix didn't help with the Postgres container.
It is possible to list all running and stopped containers using docker ps -a (-a or --all shows all containers; by default only running ones are shown).
You can find the volumes attached to your old Postgres container using docker inspect <container-id> (maybe pipe it to less and search for the Mounts section).
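To narrow that down to just the mounts, a format string works as well (same <container-id> as above):
docker inspect --format '{{json .Mounts}}' <container-id>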
If you want to recover your data, you can attach that volume to a new Postgres container and recover it there (if it is a root volume, change target to /):
docker run --name new-postgres \
--mount source=myoldvol,target=/var/lib/postgresql/data -d postgres
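Once the new container is up, a quick check that the data came along (assuming the default postgres superuser; adjust to your setup):
docker exec -it new-postgres psql -U postgres -l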
And then you can remove the old one by using docker rm <container-id>.
For more information please see the documentation for docker ps, docker volume, and docker rm.
I want Docker to start with the systemd cgroup driver. For some reason it is using only cgroupfs on my CentOS 7 server.
Here is the startup config file:
# systemctl cat docker
# /usr/lib/systemd/system/docker.service
[Unit]
Description=Docker Application Container Engine
Documentation=http://docs.docker.com
After=network.target
Wants=docker-storage-setup.service
Requires=docker-cleanup.timer
[Service]
Type=notify
NotifyAccess=all
EnvironmentFile=-/etc/sysconfig/docker
EnvironmentFile=-/etc/sysconfig/docker-storage
EnvironmentFile=-/etc/sysconfig/docker-network
Environment=GOTRACEBACK=crash
Environment=DOCKER_HTTP_HOST_COMPAT=1
Environment=PATH=/usr/libexec/docker:/usr/bin:/usr/sbin
ExecStart=/usr/bin/dockerd-current \
--add-runtime docker-runc=/usr/libexec/docker/docker-runc-current \
--default-runtime=docker-runc \
--exec-opt native.cgroupdriver=systemd \
--userland-proxy-path=/usr/libexec/docker/docker-proxy-current \
$OPTIONS \
$DOCKER_STORAGE_OPTIONS \
$DOCKER_NETWORK_OPTIONS \
$ADD_REGISTRY \
$BLOCK_REGISTRY \
$INSECURE_REGISTRY
ExecReload=/bin/kill -s HUP $MAINPID
LimitNOFILE=1048576
LimitNPROC=1048576
LimitCORE=infinity
TimeoutStartSec=0
Restart=on-abnormal
MountFlags=slave
[Install]
WantedBy=multi-user.target
# /etc/systemd/system/docker.service.d/docker-thinpool.conf
[Service]
ExecStart=
ExecStart=/usr/bin/dockerd --storage-driver=devicemapper --storage-opt=dm.thinpooldev=/dev/mapper/docker-thinpool \
--storage-opt=dm.use_deferred_removal=true --storage-opt=dm.use_deferred_deletion=true
When I start Docker, it's running like this:
# ps -fed | grep docker
root 8436 1 0 19:13 ? 00:00:00 /usr/bin/dockerd-current --storage-driver=devicemapper --storage-opt=dm.thinpooldev=/dev/mapper/docker-thinpool --storage-opt=dm.use_deferred_removal=true --storage-opt=dm.use_deferred_deletion=true
root 8439 8436 0 19:13 ? 00:00:00 /usr/bin/docker-containerd-current -l unix:///var/run/docker/libcontainerd/docker-containerd.sock --shim docker-containerd-shim --metrics-interval=0 --start-timeout 2m --state-dir /var/run/docker/libcontainerd/containerd --runtime docker-runc
Here is the output of docker info:
# docker info
Containers: 0
Running: 0
Paused: 0
Stopped: 0
Images: 1
Server Version: 1.12.6
Storage Driver: devicemapper
Pool Name: docker-thinpool
Pool Blocksize: 524.3 kB
Base Device Size: 10.74 GB
Backing Filesystem: xfs
Data file:
Metadata file:
Data Space Used: 185.6 MB
Data Space Total: 1.015 GB
Data Space Available: 829.4 MB
Metadata Space Used: 77.82 kB
Metadata Space Total: 8.389 MB
Metadata Space Available: 8.311 MB
Thin Pool Minimum Free Space: 101.2 MB
Udev Sync Supported: true
Deferred Removal Enabled: true
Deferred Deletion Enabled: true
Deferred Deleted Device Count: 0
Library Version: 1.02.135-RHEL7 (2016-11-16)
Logging Driver: json-file
Cgroup Driver: cgroupfs
Plugins:
Volume: local
Network: null bridge overlay host
Swarm: inactive
Runtimes: runc
Default Runtime: runc
Security Options: seccomp
Kernel Version: 3.10.0-514.16.1.el7.x86_64
Operating System: CentOS Linux 7 (Core)
OSType: linux
Architecture: x86_64
Number of Docker Hooks: 2
CPUs: 1
Total Memory: 992.7 MiB
Name: master
ID: 6CFR:H7SN:MEU7:PNJH:UMSO:6MNE:43Q5:SF4K:Z25I:BKHP:53U4:63SO
Docker Root Dir: /var/lib/docker
Debug Mode (client): false
Debug Mode (server): false
Registry: https://index.docker.io/v1/
Insecure Registries:
127.0.0.0/8
Registries: docker.io (secure)
How can I make it run with systemd?
Thanks
SR
A solution that does not involve editing systemd units or drop-ins would be to create (or edit) the /etc/docker/daemon.json configuration file and to include the following:
{
"exec-opts": ["native.cgroupdriver=systemd"]
}
After saving it, restart your docker service.
sudo systemctl restart docker
This solution obviously is only feasible if you would want to apply this system-wide.
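To confirm the change took effect after the restart, check the driver reported by docker info:
docker info | grep -i 'cgroup driver'
# expected: Cgroup Driver: systemd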
Since I have two configuration files, I needed to add the entry to the second config file as well, /etc/systemd/system/docker.service.d/docker-thinpool.conf:
--exec-opt native.cgroupdriver=systemd \
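Put together, the drop-in from the question then looks roughly like this (storage options unchanged, only the cgroup driver flag appended), followed by systemctl daemon-reload and a restart of the docker service:
# /etc/systemd/system/docker.service.d/docker-thinpool.conf
[Service]
ExecStart=
ExecStart=/usr/bin/dockerd --storage-driver=devicemapper --storage-opt=dm.thinpooldev=/dev/mapper/docker-thinpool \
    --storage-opt=dm.use_deferred_removal=true --storage-opt=dm.use_deferred_deletion=true \
    --exec-opt native.cgroupdriver=systemd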
Just to add: cgroupfs is Docker's own control group manager. However, for the majority of Linux distributions systemd is now the default init system, and systemd has tight integration with Linux control groups. The Kubernetes site recommends using systemd (see below), as using cgroupfs alongside systemd seems to be non-optimal.
So it is better to use systemd for cgroup management. The kubelet is configured by default to use systemd, so it is easier and better to change Docker to use the systemd cgroup driver.
A history of this overlap is here: https://lwn.net/Articles/676831/
On the Kubernetes site they recommend using systemd: https://kubernetes.io/docs/setup/production-environment/container-runtimes/
Cgroup drivers When systemd is chosen as the init system for a Linux
distribution, the init process generates and consumes a root control
group (cgroup) and acts as a cgroup manager. Systemd has a tight
integration with cgroups and will allocate cgroups per process. It’s
possible to configure your container runtime and the kubelet to use
cgroupfs. Using cgroupfs alongside systemd means that there will then
be two different cgroup managers.
Control groups are used to constrain resources that are allocated to
processes. A single cgroup manager will simplify the view of what
resources are being allocated and will by default have a more
consistent view of the available and in-use resources. When we have
two managers we end up with two views of those resources. We have seen
cases in the field where nodes that are configured to use cgroupfs for
the kubelet and Docker, and systemd for the rest of the processes
running on the node becomes unstable under resource pressure.
OS: CentOS 7.4. Kubernetes 1.23.1 recommends the systemd cgroup driver, while Docker 20.10.20 uses cgroupfs, so you have to change the Docker service file.
Step 1: Stop the Docker service:
systemctl stop docker
Step 2: Change the ExecStart line in the files /etc/systemd/system/multi-user.target.wants/docker.service and /usr/lib/systemd/system/docker.service
From:
ExecStart=/usr/bin/dockerd -H fd:// --containerd=/run/containerd/containerd.sock
TO:
ExecStart=/usr/bin/dockerd -H fd:// --containerd=/run/containerd/containerd.sock --exec-opt native.cgroupdriver=systemd
Step 3: Start the Docker service and the kubelet:
systemctl start docker
kubeadm init phase kubelet-start
Make sure you are logged in as root and execute the below two commands :
echo '{"exec-opts": ["native.cgroupdriver=systemd"]}' >> /etc/docker/daemon.json
systemctl restart docker
Try to restart the docker service:
systemctl daemon-reload
systemctl restart docker.service